# A Random Walk through a Random Forest

From Wikipedia, the free encyclopedia

A Random Walk Down Wall Street, written by Burton Gordon Malkiel, a Princeton economist, is a book about stock markets that popularized the random walk hypothesis. Malkiel argues that asset prices typically exhibit signs of a random walk and that one cannot consistently outperform market averages. The book is frequently cited by those in favor of the efficient-market hypothesis. As of 2015, there have been eleven editions and over 1.5 million copies sold.[1] A practical popularization is The Random Walk Guide to Investing: Ten Rules for Financial Success.[2]

A random walk is a mathematical object, known as a stochastic or random process, that describes a path consisting of a succession of random steps on some mathematical space such as the integers. An elementary example is the random walk on the integer number line, which starts at 0 and at each step moves +1 or −1 with equal probability. Other examples include the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock and the financial status of a gambler: all of these can be approximated by random walk models, even though they may not be truly random in reality. As these examples illustrate, random walks have applications in many scientific fields, including ecology, psychology, computer science, physics, chemistry and biology as well as economics. Random walks explain the observed behaviors of many processes in these fields, and thus serve as a fundamental model for the recorded stochastic activity. As a more mathematical application, the value of pi can be approximated using a random walk in an agent-based modelling environment.[b1][b2] The term random walk was first introduced by Karl Pearson in 1905.[b3]
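
The integer-line walk described above is easy to simulate. The following is a minimal sketch using only the Python standard library; the function name `simple_random_walk` and the `seed` parameter are illustrative choices, not part of any source cited here.

```python
import random


def simple_random_walk(n_steps, seed=None):
    """Return the positions of a walk that starts at 0 and moves +1 or -1 per step."""
    rng = random.Random(seed)  # a private generator keeps the walk reproducible
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((1, -1))  # both directions are equally likely
        path.append(position)
    return path


path = simple_random_walk(1000, seed=42)
print("first positions:", path[:10])
print("final position:", path[-1])
```

Each call with a different seed gives a different path, but every path has the same structure: it starts at 0 and changes by exactly one unit per step.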

Various types of random walks are of interest, and they can differ in several ways. The term itself most often refers to a special category of Markov chains or Markov processes, but many time-dependent processes are referred to as random walks, with a modifier indicating their specific properties. Random walks (Markov or not) can also take place on a variety of spaces: commonly studied ones include graphs, the integers or the real line, the plane or higher-dimensional vector spaces, curved surfaces or higher-dimensional Riemannian manifolds, and groups (finite, finitely generated or Lie). The time parameter can also be manipulated. In the simplest context the walk is in discrete time, that is, a sequence of random variables (Xt) = (X1, X2, ...) indexed by the natural numbers. However, it is also possible to define random walks which take their steps at random times, and in that case the position Xt has to be defined for all times t ∈ [0,+∞). Specific cases or limits of random walks include the Lévy flight and diffusion models such as Brownian motion.

Random walks are a fundamental topic in discussions of Markov processes. Their mathematical study has been extensive. Several properties, including dispersal distributions, first-passage or hitting times, encounter rates, recurrence or transience, have been introduced to quantify their behavior.

Well-known technical analysis metrics serve as the feature set for a Random Forest.

The money flow index (MFI) is used to measure the "enthusiasm" of the market. It combines price and volume into an oscillator that ranges from 0 to 100; in other words, it shows how heavily a stock was traded.

A value of 80 or more is generally considered overbought, a value of 20 or less oversold. Divergences between MFI and price action are also considered significant, for instance if price makes a new rally high but the MFI high is less than its previous high then that may indicate a weak advance that is likely to reverse.
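
To make the 80/20 thresholds concrete, here is a rough sketch of the usual MFI calculation (typical price, positive and negative money flow, money ratio) in plain Python. It is not TA-Lib's implementation; the function names and the choice to return `None` during the warm-up window are assumptions made for illustration.

```python
def money_flow_index(high, low, close, volume, period=14):
    """Return the MFI (0-100) per bar; None until `period` money flows exist."""
    typical = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    pos_flow, neg_flow = [0.0], [0.0]  # the first bar has no previous bar to compare
    for i in range(1, len(typical)):
        flow = typical[i] * volume[i]
        pos_flow.append(flow if typical[i] > typical[i - 1] else 0.0)
        neg_flow.append(flow if typical[i] < typical[i - 1] else 0.0)
    mfi = [None] * len(typical)
    for i in range(period, len(typical)):
        pos = sum(pos_flow[i - period + 1:i + 1])
        neg = sum(neg_flow[i - period + 1:i + 1])
        # With no negative flow the money ratio is unbounded, so the index is 100.
        mfi[i] = 100.0 if neg == 0 else 100.0 - 100.0 / (1.0 + pos / neg)
    return mfi


def mfi_signal(value):
    """Apply the thresholds from the text: >= 80 overbought, <= 20 oversold."""
    if value is None:
        return "n/a"
    if value >= 80:
        return "overbought"
    if value <= 20:
        return "oversold"
    return "neutral"


prices = [10, 11, 10, 11]
values = money_flow_index(prices, prices, prices, [1, 1, 1, 1], period=3)
print(values[-1], mfi_signal(values[-1]))
```

In the toy series, two up-bars carry a flow of 22 and one down-bar carries 10, so the index lands between the two thresholds.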

The StochRSI is an indicator used in technical analysis that ranges between zero and one and is created by applying the Stochastic Oscillator formula to a set of Relative Strength Index (RSI) values rather than standard price data. Using RSI values within the Stochastic formula gives traders an idea of whether the current RSI value is overbought or oversold - a measure that becomes specifically useful when the RSI value is confined between its signal levels of 20 and 80.
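
As a sketch of what "applying the Stochastic Oscillator formula to RSI values" means, the function below rescales each RSI value by the lowest and highest RSI in a trailing window. The name `stoch_rsi`, the default window of 14, and the handling of a flat window (returning 0.0) are assumptions for illustration only.

```python
def stoch_rsi(rsi_values, period=14):
    """Return (RSI - min RSI) / (max RSI - min RSI) over a trailing window."""
    out = [None] * len(rsi_values)
    for i in range(period - 1, len(rsi_values)):
        window = rsi_values[i - period + 1:i + 1]
        lowest, highest = min(window), max(window)
        if highest == lowest:
            out[i] = 0.0  # assumption: a flat window maps to 0.0 to avoid dividing by zero
        else:
            out[i] = (rsi_values[i] - lowest) / (highest - lowest)
    return out


print(stoch_rsi([30, 40, 50, 45, 60], period=3))
```

Because the current RSI always lies inside its own window, the result stays between zero and one, as the definition requires.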

Kaufman's Adaptive Moving Average (KAMA) is a moving average designed to account for market noise or volatility. KAMA will closely follow prices when the price swings are relatively small and the noise is low. KAMA will adjust when the price swings widen and follow prices from a greater distance. This trend-following indicator can be used to identify the overall trend, time turning points and filter price movements.
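
The adaptive behavior comes from an efficiency ratio: net price change divided by the sum of bar-to-bar moves. A minimal sketch follows, assuming the commonly used defaults (10-bar efficiency ratio, fast and slow smoothing periods of 2 and 30) and seeding the average with a raw price; both are assumptions, and library implementations may initialize differently.

```python
def kama(prices, period=10, fast=2, slow=30):
    """Return Kaufman's Adaptive Moving Average; None before the seed bar."""
    fast_sc = 2.0 / (fast + 1)
    slow_sc = 2.0 / (slow + 1)
    out = [None] * len(prices)
    if len(prices) < period:
        return out
    out[period - 1] = float(prices[period - 1])  # assumption: seed with the raw price
    for i in range(period, len(prices)):
        change = abs(prices[i] - prices[i - period])          # net move over the window
        volatility = sum(abs(prices[j] - prices[j - 1])        # total path length
                         for j in range(i - period + 1, i + 1))
        er = change / volatility if volatility else 0.0        # efficiency ratio in [0, 1]
        sc = (er * (fast_sc - slow_sc) + slow_sc) ** 2         # smoothing constant
        out[i] = out[i - 1] + sc * (prices[i] - out[i - 1])
    return out


trend = list(range(1, 13))
print(kama(trend)[-2:])
```

When prices move in a straight line the efficiency ratio is 1 and the average tracks prices quickly; when prices chop back and forth the ratio falls toward 0 and the average barely moves.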

    import pandas as pd
    # Assumption: the indicator functions are TA-Lib's abstract API, which accepts
    # a DataFrame with open/high/low/close/volume columns.
    from talib.abstract import (ADOSC, CCI, MACD, MFI, MOM, ROCP, RSI, SMA,
                                STOCHF, ULTOSC, WILLR, WMA)


    def get_indicators(stocks, period):
        """Return {ticker: DataFrame of indicator features plus the label column}."""
        stocks_indicators = {}
        for i in stocks:
            prices = stocks[i]
            features = pd.DataFrame(SMA(prices, timeperiod=10))
            features.columns = ['sma_10']
            features['mfi_10'] = MFI(prices, timeperiod=10)
            features['mom_10'] = MOM(prices, timeperiod=10)
            features['wma_10'] = WMA(prices, timeperiod=10)
            features['ultosc_4'] = ULTOSC(prices, timeperiod1=4, timeperiod2=7,
                                          timeperiod3=14)
            # STOCHF returns two columns, fastk and fastd.
            features = pd.concat(
                [features, STOCHF(prices, fastk_period=14, fastd_period=3)],
                axis=1)
            features['macd'] = MACD(prices, fastperiod=5, slowperiod=14)['macd']
            features['rsi'] = RSI(prices, timeperiod=14)
            features['willr'] = WILLR(prices, timeperiod=14)
            features['cci'] = CCI(prices, timeperiod=14)
            features['adosc'] = ADOSC(prices, fastperiod=3, slowperiod=10)
            # Shifting the trailing rate of change back by `period` rows turns it
            # into the return over the *next* `period` rows.
            features['raw_pct_change'] = ROCP(prices, timeperiod=period)
            features['raw_pct_change'] = features['raw_pct_change'].shift(-period)
            # Label '1' when the forward return exceeds 3%, otherwise '0'.
            features['pct_change'] = features['raw_pct_change'].apply(
                lambda x: '1' if x > 0.03 else '0')
            # Drop indicator warm-up rows and the last `period` rows, which have
            # no forward return.
            features = features.dropna()
            stocks_indicators[i] = features
        return stocks_indicators
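
The label construction in `get_indicators` (rate of change, shifted back by `period`, thresholded at 3%) can be hard to read at a glance. The sketch below reproduces the same idea on a plain list of closing prices, without pandas or TA-Lib; `forward_labels` is a hypothetical helper written for this illustration.

```python
def forward_labels(close, period, threshold=0.03):
    """Label bar t '1' if close rises by more than `threshold` over the next `period` bars."""
    labels = []
    # The last `period` bars have no future price yet, so they get no label.
    for t in range(len(close) - period):
        forward_return = (close[t + period] - close[t]) / close[t]
        labels.append('1' if forward_return > threshold else '0')
    return labels


close = [100, 101, 104, 103, 110]
print(forward_labels(close, period=2))
```

Here the first bar is labelled '1' because the price two bars later is 4% higher, while the second bar is labelled '0' because its two-bar forward return is just under 2%.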

    >>> pred = clf.predict(test[features])
    >>> list(zip(train[features], clf.feature_importances_))
    [('sma_10', '0.08805410862396797'), ('mfi_10', '0.09451084071039871'),
     ('mom_10', '0.069575191927391'), ('wma_10', '0.13220290244507055'),
     ('ultosc_4', '0.06845575657410827'), ('fastk', '0.05566000353802879'),
     ('fastd', '0.09610385618777093'), ('macd', '0.07816429691621057'),
     ('rsi', '0.06785865605512234'), ('willr', '0.07602636981442927'),
     ('cci', '0.06132717952251619'), ('adosc', '0.1120608376849855')]
    >>> pd.crosstab(test['pct_change'], pred)
    col_0         0   1
    pct_change
    0           142   7
    1            30  11
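
The crosstab is a confusion matrix, with actual labels in rows and predictions in columns. As a quick sketch, the usual summary metrics can be derived from its four counts; the variable names are illustrative, and the comparison against "always predict 0" is added here only as a point of reference.

```python
# Counts read from the crosstab: rows are actual labels, columns are predictions.
tn, fp = 142, 7   # actual 0: predicted 0, predicted 1
fn, tp = 30, 11   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)     # share of predicted 1s that were right
recall = tp / (tp + fn)        # share of actual 1s that were found
baseline = (tn + fp) / total   # accuracy of always predicting 0

print(f"accuracy  {accuracy:.3f}")   # 0.805
print(f"precision {precision:.3f}")  # 0.611
print(f"recall    {recall:.3f}")     # 0.268
print(f"baseline  {baseline:.3f}")   # 0.784
```

On these 190 test rows the classifier is only about two percentage points more accurate than the constant guess, and it finds roughly a quarter of the actual gains above 3%.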

Correlation Matrix

    >>> corr_df.head(12)
                  sma_5      mfi_7      mom_7      wma_7   ultosc_4      fastk      fastd       macd      rsi_7      willr        cci      adosc  raw_pct_change
    sma_5      1.000000  -0.159846  -0.021672   0.999855  -0.223942  -0.122112  -0.097125  -0.045893  -0.240654  -0.179791  -0.192162  -0.201078       -0.121511
    mfi_7     -0.159846   1.000000   0.756359  -0.160371   0.602693   0.743536   0.723341   0.708150   0.751577   0.670372   0.606966   0.573771       -0.008000
    mom_7     -0.021672   0.756359   1.000000  -0.021746   0.724770   0.799565   0.747480   0.944961   0.805183   0.768140   0.659933   0.659134        0.045912
    wma_7      0.999855  -0.160371  -0.021746   1.000000  -0.220404  -0.121428  -0.098823  -0.042838  -0.239385  -0.176723  -0.187203  -0.201753       -0.121632
    ultosc_4  -0.223942   0.602693   0.724770  -0.220404   1.000000   0.766973   0.672136   0.797311   0.813730   0.828482   0.723102   0.773014        0.043718
    fastk     -0.122112   0.743536   0.799565  -0.121428   0.766973   1.000000   0.925347   0.824724   0.933722   0.838674   0.748511   0.682123       -0.002816
    fastd     -0.097125   0.723341   0.747480  -0.098823   0.672136   0.925347   1.000000   0.736916   0.867633   0.658746   0.559054   0.696500       -0.028984
    macd      -0.045893   0.708150   0.944961  -0.042838   0.797311   0.824724   0.736916   1.000000   0.842905   0.817824   0.752590   0.660587        0.042007
    rsi_7     -0.240654   0.751577   0.805183  -0.239385   0.813730   0.933722   0.867633   0.842905   1.000000   0.837893   0.766954   0.726777        0.014199
    willr     -0.179791   0.670372   0.768140  -0.176723   0.828482   0.838674   0.658746   0.817824   0.837893   1.000000   0.894718   0.577140        0.032947
    cci       -0.192162   0.606966   0.659933  -0.187203   0.723102   0.748511   0.559054   0.752590   0.766954   0.894718   1.000000   0.463627        0.040750
    adosc     -0.201078   0.573771   0.659134  -0.201753   0.773014   0.682123   0.696500   0.660587   0.726777   0.577140   0.463627   1.000000        0.006484
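
One way to read a matrix like this is to list the feature pairs whose correlation is close to 1, since such features carry largely the same information. The sketch below does that for a subset of the values copied from the matrix; the helper name `highly_correlated_pairs` and the 0.9 cut-off are arbitrary choices for illustration.

```python
# Upper-triangle entries copied from the correlation matrix (subset of features).
CORR = {
    ("sma_5", "wma_7"): 0.999855,
    ("sma_5", "mom_7"): -0.021672,
    ("sma_5", "macd"): -0.045893,
    ("sma_5", "fastk"): -0.122112,
    ("sma_5", "rsi_7"): -0.240654,
    ("wma_7", "mom_7"): -0.021746,
    ("wma_7", "macd"): -0.042838,
    ("wma_7", "fastk"): -0.121428,
    ("wma_7", "rsi_7"): -0.239385,
    ("mom_7", "macd"): 0.944961,
    ("mom_7", "fastk"): 0.799565,
    ("mom_7", "rsi_7"): 0.805183,
    ("macd", "fastk"): 0.824724,
    ("macd", "rsi_7"): 0.842905,
    ("fastk", "rsi_7"): 0.933722,
}


def highly_correlated_pairs(corr, threshold=0.9):
    """Return (feature_a, feature_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = [(a, b, r) for (a, b), r in corr.items() if abs(r) >= threshold]
    return sorted(pairs, key=lambda item: abs(item[2]), reverse=True)


for a, b, r in highly_correlated_pairs(CORR):
    print(f"{a:6s} {b:6s} {r:.6f}")
```

The two moving averages (`sma_5` and `wma_7`) are almost perfectly correlated, so one of them is likely redundant as a feature. By contrast, every indicator's correlation with `raw_pct_change` in the matrix is below 0.13 in absolute value.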