AI For Trading: Breakout Strategy Project 2 (32)
Project 2: Breakout Strategy
Instructions
Each problem consists of a function to implement and instructions on how to implement the function. The parts of the function that need to be implemented are marked with a # TODO comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our project_tests package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.
Packages
When you implement the functions, you'll only need to use the packages you've used in the classroom, like Pandas and Numpy. These packages will be imported for you. We recommend you don't add any import statements, otherwise the grader might not be able to run your code.
The other packages that we're importing are helper, project_helper, and project_tests. These are custom packages built to help you solve the problems. The helper and project_helper modules contain utility functions and graph functions. The project_tests module contains the unit tests for all the problems.
Install Packages
import sys
!{sys.executable} -m pip install -r requirements.txt
Requirement already satisfied: colour==0.1.5 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied: cvxpy==1.0.3 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 2))
Requirement already satisfied: cycler==0.10.0 in /opt/conda/lib/python3.6/site-packages/cycler-0.10.0-py3.6.egg (from -r requirements.txt (line 3))
Requirement already satisfied: numpy==1.13.3 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 4))
Requirement already satisfied: pytz==2017.3 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 9))
Requirement already satisfied: python-editor>=0.3 in /opt/conda/lib/python3.6/site-packages (from alembic>=0.7.7->zipline==1.2.0->-r requirements.txt (line 15))
You are using pip version 9.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Load Packages
import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests
Market Data
Load Data
While using real data will give you hands-on experience, it doesn't cover all the topics we try to condense into one project. We'll solve this by creating new stocks. We've created a scenario where companies mining Terbium are making huge profits. All the companies in this sector of the market are made up. They represent a sector with large growth that will be used for demonstration later in this project.
df_original = pd.read_csv('../../data/project_2/eod-quotemedia.csv', parse_dates=['date'], index_col=False)
# Add TB sector to the market
df = df_original
# print
print(df.head(5))
df = pd.concat([df] + project_helper.generate_tb_sector(df[df['ticker'] == 'AAPL']['date']), ignore_index=True)
# print concat
print("PD CONCAT")
print(df.head(5))
close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
# print close
print("CLOSE:")
print(close.head(5))
high = df.reset_index().pivot(index='date', columns='ticker', values='adj_high')
low = df.reset_index().pivot(index='date', columns='ticker', values='adj_low')
print('Loaded Data')
date ticker adj_close adj_high adj_low
0 2013-07-01 A 29.99418563 30.11804324 29.52627909
1 2013-07-02 A 29.65013670 30.20061499 29.55380300
2 2013-07-03 A 29.70518453 29.90473291 29.39554049
3 2013-07-05 A 30.43456826 30.47929462 29.86344704
4 2013-07-08 A 30.52402098 30.73733132 30.35887750
PD CONCAT
adj_close adj_high adj_low adj_open date ticker
0 29.99418563 30.11804324 29.52627909 nan 2013-07-01 A
1 29.65013670 30.20061499 29.55380300 nan 2013-07-02 A
2 29.70518453 29.90473291 29.39554049 nan 2013-07-03 A
3 30.43456826 30.47929462 29.86344704 nan 2013-07-05 A
4 30.52402098 30.73733132 30.35887750 nan 2013-07-08 A
CLOSE:
ticker A AAL AAP AAPL ABBV \
date
2013-07-01 29.99418563 16.17609308 81.13821681 53.10917319 34.92447839
2013-07-02 29.65013670 15.81983388 80.72207258 54.31224742 35.42807578
2013-07-03 29.70518453 16.12794994 81.23729877 54.61204262 35.44486235
2013-07-05 30.43456826 16.21460758 81.82188233 54.17338125 35.85613355
2013-07-08 30.52402098 16.31089385 82.95141667 53.86579916 36.66188936
ticker ABC ABT ACN ADBE ADI \
date
2013-07-01 50.86319750 31.42538772 64.69409505 46.23500000 39.91336014
2013-07-02 50.69676639 31.27288084 64.71204071 46.03000000 39.86057632
2013-07-03 50.93716689 30.72565028 65.21451912 46.42000000 40.18607651
2013-07-05 51.37173702 31.32670680 66.07591068 47.00000000 40.65233352
2013-07-08 52.03746147 31.76628544 66.82065546 46.62500000 40.25645492
ticker ... XL XLNX XOM XRAY \
date ...
2013-07-01 ... 27.66879066 35.28892781 76.32080247 40.02387348
2013-07-02 ... 27.54228410 35.05903252 76.60816761 39.96552964
2013-07-03 ... 27.33445191 35.28008569 76.65042719 40.00442554
2013-07-05 ... 27.69589920 35.80177117 77.39419581 40.67537968
2013-07-08 ... 27.98505704 35.20050655 77.96892611 40.64620776
ticker XRX XYL YUM ZBH ZION \
date
2013-07-01 22.10666494 25.75338607 45.48038323 71.89882693 27.85858718
2013-07-02 22.08273998 25.61367511 45.40266113 72.93417195 28.03893238
2013-07-03 22.20236479 25.73475794 46.06329899 72.30145844 28.18131017
2013-07-05 22.58516418 26.06075017 46.41304845 73.16424628 29.39626730
2013-07-08 22.48946433 26.22840332 46.95062632 73.89282298 29.57661249
ticker ZTS
date
2013-07-01 29.44789315
2013-07-02 28.57244125
2013-07-03 28.16838652
2013-07-05 29.02459772
2013-07-08 29.76536472
[5 rows x 519 columns]
Loaded Data
View Data
To see what one of these 2-d matrices looks like, let's take a look at the closing prices matrix.
close.head(5)
ticker | A | AAL | AAP | AAPL | ABBV | ABC | ABT | ACN | ADBE | ADI | ... | XL | XLNX | XOM | XRAY | XRX | XYL | YUM | ZBH | ZION | ZTS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||||||||||||
2013-07-01 | 29.99418563 | 16.17609308 | 81.13821681 | 53.10917319 | 34.92447839 | 50.86319750 | 31.42538772 | 64.69409505 | 46.23500000 | 39.91336014 | ... | 27.66879066 | 35.28892781 | 76.32080247 | 40.02387348 | 22.10666494 | 25.75338607 | 45.48038323 | 71.89882693 | 27.85858718 | 29.44789315 |
2013-07-02 | 29.65013670 | 15.81983388 | 80.72207258 | 54.31224742 | 35.42807578 | 50.69676639 | 31.27288084 | 64.71204071 | 46.03000000 | 39.86057632 | ... | 27.54228410 | 35.05903252 | 76.60816761 | 39.96552964 | 22.08273998 | 25.61367511 | 45.40266113 | 72.93417195 | 28.03893238 | 28.57244125 |
2013-07-03 | 29.70518453 | 16.12794994 | 81.23729877 | 54.61204262 | 35.44486235 | 50.93716689 | 30.72565028 | 65.21451912 | 46.42000000 | 40.18607651 | ... | 27.33445191 | 35.28008569 | 76.65042719 | 40.00442554 | 22.20236479 | 25.73475794 | 46.06329899 | 72.30145844 | 28.18131017 | 28.16838652 |
2013-07-05 | 30.43456826 | 16.21460758 | 81.82188233 | 54.17338125 | 35.85613355 | 51.37173702 | 31.32670680 | 66.07591068 | 47.00000000 | 40.65233352 | ... | 27.69589920 | 35.80177117 | 77.39419581 | 40.67537968 | 22.58516418 | 26.06075017 | 46.41304845 | 73.16424628 | 29.39626730 | 29.02459772 |
2013-07-08 | 30.52402098 | 16.31089385 | 82.95141667 | 53.86579916 | 36.66188936 | 52.03746147 | 31.76628544 | 66.82065546 | 46.62500000 | 40.25645492 | ... | 27.98505704 | 35.20050655 | 77.96892611 | 40.64620776 | 22.48946433 | 26.22840332 | 46.95062632 | 73.89282298 | 29.57661249 | 29.76536472 |
5 rows × 519 columns
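To make the shape of these matrices concrete, here is a minimal sketch of the same pivot on a tiny, made-up long-format table. The tickers AAA and BBB and their prices are hypothetical and are not part of the project data.

```python
import pandas as pd

# Hypothetical long-format quotes: one row per (date, ticker) pair.
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02']),
    'ticker': ['AAA', 'BBB', 'AAA', 'BBB'],
    'adj_close': [10.0, 20.0, 11.0, 19.0],
})

# Pivot to a wide matrix: one row per date, one column per ticker.
wide = long_df.pivot(index='date', columns='ticker', values='adj_close')
print(wide)
```

Each column of the wide frame is one ticker's price series, which is why close[apple_ticker] returns a single Series indexed by date.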
Stock Example
Let's see what a single stock looks like from the closing prices. For this example and future display examples in this project, we'll use Apple's stock (AAPL). If we tried to graph all the stocks, it would be too much information.
apple_ticker = 'AAPL'
project_helper.plot_stock(close[apple_ticker], '{} Stock'.format(apple_ticker))
The Alpha Research Process
In this project you will code and evaluate a "breakout" signal. It is important to understand where these steps fit in the alpha research workflow. The signal-to-noise ratio in trading signals is very low and, as such, it is very easy to fall into the trap of overfitting to noise. It is therefore inadvisable to jump right into signal coding. To help mitigate overfitting, it is best to start with a general observation and hypothesis; i.e., you should be able to answer the following question before you touch any data:
What feature of markets or investor behaviour would lead to a persistent anomaly that my signal will try to use?
Ideally the assumptions behind the hypothesis will be testable before you actually code and evaluate the signal itself. The workflow therefore is as follows:
In this project, we assume that the first three steps are done ("observe & research", "form hypothesis", "validate hypothesis"). The hypothesis you'll be using for this project is the following:
- In the absence of news or significant investor trading interest, stocks oscillate in a range.
- Traders seek to capitalize on this range-bound behaviour periodically by selling/shorting at the top of the range and buying/covering at the bottom of the range. This behaviour reinforces the existence of the range.
- When stocks break out of the range, due to, e.g., a significant news release or from market pressure from a large investor:
- the liquidity traders who have been providing liquidity at the bounds of the range seek to cover their positions to mitigate losses, thus magnifying the move out of the range, and
- the move out of the range attracts other investor interest; these investors, due to the behavioural bias of herding (e.g., Herd Behavior) build positions which favor continuation of the trend.
Using this hypothesis, let's start coding.
Compute the Highs and Lows in a Window
You'll use the price highs and lows as an indicator for the breakout strategy. In this section, implement get_high_lows_lookback to get the maximum high price and minimum low price over a window of days. The variable lookback_days contains the number of days to look in the past. Make sure this doesn't include the current day.
def get_high_lows_lookback(high, low, lookback_days):
    """
    Get the highs and lows in a lookback window.
    Parameters
    ----------
    high : DataFrame
        High price for each ticker and date
    low : DataFrame
        Low price for each ticker and date
    lookback_days : int
        The number of days to look back
    Returns
    -------
    lookback_high : DataFrame
        Lookback high price for each ticker and date
    lookback_low : DataFrame
        Lookback low price for each ticker and date
    """
    # Rolling max/min over the window, then shift by one row
    # so that the current day is excluded from its own lookback.
    lookback_high = high.rolling(window=lookback_days).max().shift()
    lookback_low = low.rolling(window=lookback_days).min().shift()
    return lookback_high, lookback_low
project_tests.test_get_high_lows_lookback(get_high_lows_lookback)
Tests Passed
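The requirement that the lookback window excludes the current day is easiest to see on a toy series. The following is a minimal sketch with a hypothetical ticker XYZ and a 2-day window; the prices are made up for illustration.

```python
import pandas as pd

# Hypothetical daily highs and lows for one made-up ticker.
dates = pd.date_range('2020-01-01', periods=5)
high = pd.DataFrame({'XYZ': [10.0, 12.0, 11.0, 13.0, 9.0]}, index=dates)
low = pd.DataFrame({'XYZ': [8.0, 9.0, 7.0, 10.0, 6.0]}, index=dates)

lookback_days = 2
# rolling() includes the current row, so shift(1) moves each result
# down one day: the value on day t only uses days t-2 and t-1.
lookback_high = high.rolling(window=lookback_days).max().shift(1)
lookback_low = low.rolling(window=lookback_days).min().shift(1)

print(lookback_high['XYZ'].tolist())  # first two entries are NaN
print(lookback_low['XYZ'].tolist())
```

The first lookback_days rows are NaN because there is not yet a full window of past data to look at.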
View Data
Let's use your implementation of get_high_lows_lookback to get the highs and lows for the past 50 days and compare them to their respective stock. Just like last time, we'll use Apple's stock as the example to look at.
lookback_days = 50
lookback_high, lookback_low = get_high_lows_lookback(high, low, lookback_days)
project_helper.plot_high_low(
close[apple_ticker],
lookback_high[apple_ticker],
lookback_low[apple_ticker],
'High and Low of {} Stock'.format(apple_ticker))
Compute Long and Short Signals
Using the generated indicator of highs and lows, create long and short signals using a breakout strategy. Implement get_long_short to generate the following signals:
Signal | Condition |
---|---|
-1 | Low > Close Price |
1 | High < Close Price |
0 | Otherwise |
In this chart, Close Price is the close parameter. Low and High are the values generated from get_high_lows_lookback, the lookback_high and lookback_low parameters.
def get_long_short(close, lookback_high, lookback_low):
    """
    Generate the signals long, short, and do nothing.
    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    lookback_high : DataFrame
        Lookback high price for each ticker and date
    lookback_low : DataFrame
        Lookback low price for each ticker and date
    Returns
    -------
    long_short : DataFrame
        The long, short, and do nothing signals for each ticker and date
    """
    # Start with "do nothing" (0) for every ticker and date
    long_short = close.copy(deep=True)
    long_short[:] = 0
    long_short = long_short.astype(np.int64)
    for i in close.index:
        for j in close.columns:
            close_price = close.loc[i, j]
            # Use a single .loc[i, j] assignment; chained .loc[i].loc[j]
            # may write to a temporary copy instead of long_short.
            if close_price > lookback_high.loc[i, j]:
                long_short.loc[i, j] = 1
            elif close_price < lookback_low.loc[i, j]:
                long_short.loc[i, j] = -1
    return long_short
project_tests.test_get_long_short(get_long_short)
Tests Passed
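The signal table can also be expressed without explicit loops. The sketch below is a vectorized alternative on made-up data (the ticker XYZ and all prices are hypothetical); it is not the solution used above, only an illustration of the same three conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices and lookback bounds for one made-up ticker.
close = pd.DataFrame({'XYZ': [10.0, 13.0, 11.0, 6.0]})
lookback_high = pd.DataFrame({'XYZ': [np.nan, 12.0, 12.0, 12.0]})
lookback_low = pd.DataFrame({'XYZ': [np.nan, 8.0, 8.0, 8.0]})

# True/False comparisons become 1/0; subtracting gives 1, -1, or 0.
# Comparisons against NaN are False, so rows without a lookback stay 0.
long_signal = (close > lookback_high).astype(np.int64)
short_signal = (close < lookback_low).astype(np.int64)
long_short = long_signal - short_signal

print(long_short['XYZ'].tolist())  # [0, 1, 0, -1]
```

On a matrix with several hundred tickers and more than a thousand dates, the comparison form avoids a Python-level loop over every cell.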
View Data
Let's compare the signals you generated against the close prices. This chart will show a lot of signals. Too many in fact. We'll talk about filtering the redundant signals in the next problem.
signal = get_long_short(close, lookback_high, lookback_low)
project_helper.plot_signal(
close[apple_ticker],
signal[apple_ticker],
'Long and Short of {} Stock'.format(apple_ticker))
Filter Signals
That was a lot of repeated signals! If we're already shorting a stock, having an additional signal to short a stock isn't helpful for this strategy. This also applies to additional long signals when the last signal was long.
Implement filter_signals to filter out repeated long or short signals within the lookahead_days. If the previous signal was the same, change the signal to 0 (do nothing signal). For example, say you have a single stock time series that is
[1, 0, 1, 0, 1, 0, -1, -1]
Running filter_signals with a lookahead of 3 days should turn those signals into
[1, 0, 0, 0, 1, 0, -1, 0]
To help you implement the function, we have provided you with the clear_signals function. This will remove all signals within a window after the last signal. For example, say you're using a window size of 3 with clear_signals. It would turn the Series of long signals
[0, 1, 0, 0, 1, 1, 0, 1, 0]
into
[0, 1, 0, 0, 0, 1, 0, 0, 0]
clear_signals only takes a Series of the same type of signals, where 1 is the signal and 0 is no signal. It can't take a mix of long and short signals. Using this function, implement filter_signals.
For implementing filter_signals, we don't recommend you try to find a vectorized solution. Instead, you should iterate over each column.
def clear_signals(signals, window_size):
    """
    Clear out signals in a Series of just long or short signals.
    Remove the number of signals down to 1 within the window size time period.
    Parameters
    ----------
    signals : Pandas Series
        The long, short, or do nothing signals
    window_size : int
        The number of days to have a single signal
    Returns
    -------
    signals : Pandas Series
        Signals with the signals removed from the window size
    """
    # Start with buffer of window size
    # This handles the edge case of calculating past_signal in the beginning
    clean_signals = [0]*window_size
    for signal_i, current_signal in enumerate(signals):
        # Check if there was a signal in the past window_size of days
        has_past_signal = bool(sum(clean_signals[signal_i:signal_i+window_size]))
        # Use the current signal if there's no past signal, else 0/False
        clean_signals.append(not has_past_signal and current_signal)
    # Remove buffer
    clean_signals = clean_signals[window_size:]
    # Return the signals as a Series of ints
    # (the np.int alias was removed in newer NumPy releases, so use int)
    return pd.Series(np.array(clean_signals).astype(int), signals.index)
def filter_signals(signal, lookahead_days):
    """
    Filter out signals in a DataFrame.
    Parameters
    ----------
    signal : DataFrame
        The long, short, and do nothing signals for each ticker and date
    lookahead_days : int
        The number of days to look ahead
    Returns
    -------
    filtered_signal : DataFrame
        The filtered long, short, and do nothing signals for each ticker and date
    """
    # Split into long-only (1/0) and short-only (-1/0) frames
    signal_high = signal.replace({-1: 0})
    signal_low = signal.replace({1: 0})
    # Filter each ticker's long and short signals separately.
    # clear_signals treats any non-zero value as a signal, so the -1 shorts pass through unchanged.
    for i in signal.columns:
        signal_high[i] = clear_signals(signal_high[i], lookahead_days)
        signal_low[i] = clear_signals(signal_low[i], lookahead_days)
    # Recombine: at most one of the two frames is non-zero at any position
    filtered_signal = signal_high + signal_low
    filtered_signal = filtered_signal.astype(np.int64)
    return filtered_signal
project_tests.test_filter_signals(filter_signals)
Tests Passed
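To check the worked example from the problem statement, here is a minimal, self-contained sketch. The helper clear_repeats is a hypothetical stand-in written for this illustration; it is assumed to follow the same rule as the provided clear_signals (keep a signal only if none was kept in the previous window_size entries), but it is not the project's function.

```python
import pandas as pd

def clear_repeats(flags, window_size):
    """Hypothetical helper: keep a 1 only if no 1 was kept in the
    previous window_size positions of the cleaned output."""
    kept = []
    for flag in flags:
        recent = kept[-window_size:] if window_size > 0 else []
        kept.append(0 if any(recent) else int(flag))
    return kept

signal = pd.Series([1, 0, 1, 0, 1, 0, -1, -1])

# Handle longs and shorts separately, as 1/0 flags, then recombine.
longs = clear_repeats((signal == 1).astype(int), 3)
shorts = clear_repeats((signal == -1).astype(int), 3)
filtered = pd.Series(longs, index=signal.index) - pd.Series(shorts, index=signal.index)

print(filtered.tolist())  # [1, 0, 0, 0, 1, 0, -1, 0]
```

Splitting by direction first matters: a short signal shortly after a long one is a new piece of information and should not be cleared by the long signal's window.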
View Data
Let's view the same chart as before, but with the redundant signals removed.
signal_5 = filter_signals(signal, 5)
signal_10 = filter_signals(signal, 10)
signal_20 = filter_signals(signal, 20)
for signal_data, signal_days in [(signal_5, 5), (signal_10, 10), (signal_20, 20)]:
    project_helper.plot_signal(
        close[apple_ticker],
        signal_data[apple_ticker],
        'Long and Short of {} Stock with {} day signal window'.format(apple_ticker, signal_days))
Lookahead Close Prices
With the trading signal done, we can start working on evaluating how many days to short or long the stocks. In this problem, implement get_lookahead_prices to get the close price days ahead in time. You can get the number of days from the variable lookahead_days. We'll use the lookahead prices to calculate future returns in another problem.
def get_lookahead_prices(close, lookahead_days):
    """
    Get the lookahead prices for `lookahead_days` number of days.
    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    lookahead_days : int
        The number of days to look ahead
    Returns
    -------
    lookahead_prices : DataFrame
        The lookahead prices for each ticker and date
    """
    # A negative shift pulls future rows up, so each date holds
    # the close price lookahead_days rows later.
    return close.shift(-lookahead_days)
project_tests.test_get_lookahead_prices(get_lookahead_prices)
Tests Passed
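As a quick illustration of what a negative shift does, here is a minimal sketch on a made-up price series (the ticker XYZ and its prices are hypothetical).

```python
import pandas as pd

# Hypothetical close prices for one made-up ticker.
close = pd.DataFrame({'XYZ': [10.0, 11.0, 12.0, 13.0, 14.0]})

# Each row now holds the close price two rows later.
lookahead_prices = close.shift(-2)
print(lookahead_prices['XYZ'].tolist())  # last two entries are NaN
```

The trailing NaN values are expected: the last lookahead_days rows have no future price available yet.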
View Data
Using the get_lookahead_prices function, let's generate lookahead closing prices for 5, 10, and 20 days.
Let's also chart a subsection of a few months of the Apple stock instead of years. This will allow you to view the differences between the 5, 10, and 20 day lookaheads. Otherwise, they will mesh together when looking at a chart that is zoomed out.
lookahead_5 = get_lookahead_prices(close, 5)
lookahead_10 = get_lookahead_prices(close, 10)
lookahead_20 = get_lookahead_prices(close, 20)
project_helper.plot_lookahead_prices(
close[apple_ticker].iloc[150:250],
[
(lookahead_5[apple_ticker].iloc[150:250], 5),
(lookahead_10[apple_ticker].iloc[150:250], 10),
(lookahead_20[apple_ticker].iloc[150:250], 20)],
'5, 10, and 20 day Lookahead Prices for Slice of {} Stock'.format(apple_ticker))
Lookahead Price Returns
Implement get_return_lookahead to generate the log price return between the closing price and the lookahead price.
def get_return_lookahead(close, lookahead_prices):
    """
    Calculate the log returns from the lookahead days to the signal day.
    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    lookahead_prices : DataFrame
        The lookahead prices for each ticker and date
    Returns
    -------
    lookahead_returns : DataFrame
        The lookahead log returns for each ticker and date
    """
    # log(lookahead / close), written as a difference of logs
    return np.log(lookahead_prices) - np.log(close)
project_tests.test_get_return_lookahead(get_return_lookahead)
Tests Passed
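One property of log returns worth seeing in numbers is that they add up across time: the sum of consecutive 1-day log returns equals the log return over the whole span. A minimal sketch on made-up prices (hypothetical ticker XYZ):

```python
import numpy as np
import pandas as pd

# Hypothetical prices that grow by 10% per day.
close = pd.DataFrame({'XYZ': [100.0, 110.0, 121.0]})

# 1-day and 2-day lookahead log returns.
one_day = np.log(close.shift(-1)) - np.log(close)
two_day = np.log(close.shift(-2)) - np.log(close)

# Two consecutive 1-day log returns add up to the 2-day log return.
summed = one_day['XYZ'].iloc[0] + one_day['XYZ'].iloc[1]
print(summed, two_day['XYZ'].iloc[0])
```

Simple percentage returns do not have this property (10% twice compounds to 21%, not 20%), which is one reason log returns are convenient when comparing different lookahead horizons.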
View Data
Using the same lookahead prices and same subsection of the Apple stock from the previous problem, we'll view the lookahead returns.
In order to view price returns on the same chart as the stock, a second y-axis will be added. When viewing this chart, the axis for the price of the stock will be on the left side, like previous charts. The axis for price returns will be located on the right side.
price_return_5 = get_return_lookahead(close, lookahead_5)
price_return_10 = get_return_lookahead(close, lookahead_10)
price_return_20 = get_return_lookahead(close, lookahead_20)
project_helper.plot_price_returns(
close[apple_ticker].iloc[150:250],
[
(price_return_5[apple_ticker].iloc[150:250], 5),
(price_return_10[apple_ticker].iloc[150:250], 10),
(price_return_20[apple_ticker].iloc[150:250], 20)],
'5, 10, and 20 day Lookahead Returns for Slice {} Stock'.format(apple_ticker))
Compute the Signal Return
Using the price returns, generate the signal returns.
def get_signal_return(signal, lookahead_returns):
    """
    Compute the signal returns.
    Parameters
    ----------
    signal : DataFrame
        The long, short, and do nothing signals for each ticker and date
    lookahead_returns : DataFrame
        The lookahead log returns for each ticker and date
    Returns
    -------
    signal_return : DataFrame
        Signal returns for each ticker and date
    """
    # Long (1) keeps the return, short (-1) flips its sign, no signal (0) gives 0
    return signal * lookahead_returns
project_tests.test_get_signal_return(get_signal_return)
Tests Passed
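The element-wise product is short, so a small numeric example helps show what it encodes. This is a minimal sketch with made-up signals and returns for a hypothetical ticker XYZ.

```python
import numpy as np
import pandas as pd

# Hypothetical signals and lookahead log returns for one made-up ticker.
signal = pd.DataFrame({'XYZ': [1, -1, 0, -1]})
lookahead_returns = pd.DataFrame({'XYZ': [0.02, -0.03, 0.05, 0.01]})

signal_return = signal * lookahead_returns
print(signal_return['XYZ'].tolist())
```

A short position (-1) followed by a falling price yields a positive signal return, a short followed by a rising price yields a loss, and rows without a signal contribute zero regardless of what the price did.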
View Data
Let's continue using the previous lookahead prices to view the signal returns. Just like before, the axis for the signal returns is on the right side of the chart.
title_string = '{} day Lookahead Signal Returns for {} Stock'
signal_return_5 = get_signal_return(signal_5, price_return_5)
# print(signal_return_5)
signal_return_10 = get_signal_return(signal_10, price_return_10)
signal_return_20 = get_signal_return(signal_20, price_return_20)
project_helper.plot_signal_returns(
close[apple_ticker],
[
(signal_return_5[apple_ticker], signal_5[apple_ticker], 5),
(signal_return_10[apple_ticker], signal_10[apple_ticker], 10),
(signal_return_20[apple_ticker], signal_20[apple_ticker], 20)],
[title_string.format(5, apple_ticker), title_string.format(10, apple_ticker), title_string.format(20, apple_ticker)])
Test for Significance
Histogram
Let's plot a histogram of the signal return values.
project_helper.plot_signal_histograms(
[signal_return_5, signal_return_10, signal_return_20],
'Signal Return',
('5 Days', '10 Days', '20 Days'))
Question: What do the histograms tell you about the signal returns?
The 5-day signal returns look closest to a normal distribution; the 10-day and 20-day signal returns do not look close to normal.
Outliers
You might have noticed the outliers in the 10 and 20 day histograms. To better visualize the outliers, let's compare the 5, 10, and 20 day signal returns to normal distributions with the same mean and standard deviation as each signal return distribution.
project_helper.plot_signal_to_normal_histograms(
[signal_return_5, signal_return_10, signal_return_20],
'Signal Return',
('5 Days', '10 Days', '20 Days'))
Kolmogorov-Smirnov Test
While you can see the outliers in the histogram, we need to find the stocks that are causing these outlying returns. We'll use the Kolmogorov-Smirnov Test or KS-Test. This test will be applied to each ticker's signal returns where a long or short signal exists.
# Filter out returns that don't have a long or short signal.
long_short_signal_returns_5 = signal_return_5[signal_5 != 0].stack()
long_short_signal_returns_10 = signal_return_10[signal_10 != 0].stack()
long_short_signal_returns_20 = signal_return_20[signal_20 != 0].stack()
# Get just ticker and signal return
long_short_signal_returns_5 = long_short_signal_returns_5.reset_index().iloc[:, [1,2]]
long_short_signal_returns_5.columns = ['ticker', 'signal_return']
long_short_signal_returns_10 = long_short_signal_returns_10.reset_index().iloc[:, [1,2]]
long_short_signal_returns_10.columns = ['ticker', 'signal_return']
long_short_signal_returns_20 = long_short_signal_returns_20.reset_index().iloc[:, [1,2]]
long_short_signal_returns_20.columns = ['ticker', 'signal_return']
# View some of the data
long_short_signal_returns_5.head(10)
 | ticker | signal_return |
---|---|---|
0 | A | 0.00732604 |
1 | ABC | 0.01639650 |
2 | ADP | 0.00981520 |
3 | AKAM | 0.04400495 |
4 | ALGN | 0.01545561 |
5 | ALTAIC | 0.01956370 |
6 | APC | 0.00305859 |
7 | BA | 0.08061297 |
8 | BCR | 0.00933418 |
9 | BIFLOR | 0.03372771 |
This gives you the data to use in the KS-Test.
Now it's time to implement the function calculate_kstest to run the Kolmogorov-Smirnov test (KS test) between a normal distribution and each stock's signal returns. Use scipy.stats.kstest to perform the KS test. When calculating the standard deviation of the signal returns, make sure to set the delta degrees of freedom to 0.
For this function, we don't recommend you try to find a vectorized solution. Instead, you should iterate over the groupby function.
from scipy.stats import kstest
def calculate_kstest(long_short_signal_returns):
    """
    Calculate the KS-Test against the signal returns with a long or short signal.
    Parameters
    ----------
    long_short_signal_returns : DataFrame
        The signal returns which have a signal.
        This DataFrame contains two columns, "ticker" and "signal_return"
    Returns
    -------
    ks_values : Pandas Series
        KS statistic for all the tickers
    p_values : Pandas Series
        P value for all the tickers
    """
    ks_values = []
    p_values = []
    tickers = []
    # Reference normal distribution: mean and population standard
    # deviation (ddof=0) of all signal returns pooled together
    mean = long_short_signal_returns['signal_return'].mean()
    std = long_short_signal_returns['signal_return'].std(ddof=0)
    returns_grouped = long_short_signal_returns.groupby('ticker')
    # scipy.stats.kstest reference:
    # https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.kstest.html#scipy-stats-kstest
    for name, group in returns_grouped:
        # Compare this ticker's signal returns to the pooled normal distribution
        ks_value, p_value = kstest(group['signal_return'].values, cdf='norm', args=(mean, std))
        ks_values.append(ks_value)
        p_values.append(p_value)
        tickers.append(name)
    ks_values = pd.Series(ks_values, index=tickers)
    p_values = pd.Series(p_values, index=tickers)
    return ks_values, p_values
project_tests.test_calculate_kstest(calculate_kstest)
Tests Passed
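To get a feel for how the KS statistic separates an unusual ticker from the rest, here is a minimal sketch on synthetic data. The tickers T0 to T8 and OUT, the seed, and the distribution parameters are all assumptions made for this illustration; they are not taken from the project data.

```python
import numpy as np
import pandas as pd
from scipy.stats import kstest

rng = np.random.RandomState(0)

# Nine hypothetical tickers with returns centred on 0,
# plus one hypothetical outlier ticker centred on 0.5.
frames = [
    pd.DataFrame({'ticker': 'T{}'.format(i),
                  'signal_return': rng.normal(0.0, 0.02, 100)})
    for i in range(9)
]
frames.append(pd.DataFrame({'ticker': 'OUT',
                            'signal_return': rng.normal(0.5, 0.02, 100)}))
returns = pd.concat(frames, ignore_index=True)

# Pooled mean and population standard deviation (ddof=0).
mean = returns['signal_return'].mean()
std = returns['signal_return'].std(ddof=0)

ks, p = {}, {}
for ticker, group in returns.groupby('ticker'):
    ks[ticker], p[ticker] = kstest(group['signal_return'].values, 'norm', args=(mean, std))
ks, p = pd.Series(ks), pd.Series(p)

print(ks.sort_values(ascending=False).head(3))
```

In this synthetic setup the OUT ticker's returns sit far in the tail of the pooled normal distribution, so its KS statistic comes out close to 1. Note that the pooled reference is itself distorted by the outlier, so the ordinary tickers also show a sizeable KS statistic; thresholding on the KS value is what separates the two groups.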
View Data
Using the signal returns we created above, let's calculate the ks and p values.
ks_values_5, p_values_5 = calculate_kstest(long_short_signal_returns_5)
ks_values_10, p_values_10 = calculate_kstest(long_short_signal_returns_10)
ks_values_20, p_values_20 = calculate_kstest(long_short_signal_returns_20)
print('ks_values_5')
print(ks_values_5.head(5))
print('p_values_5')
print(p_values_5.head(5))
ks_values_5
A 0.17230853
AAL 0.10732936
AAP 0.19714000
AAPL 0.15575372
ABBV 0.16833363
dtype: float64
p_values_5
A 0.18625858
AAL 0.72589551
AAP 0.04471855
AAPL 0.24650325
ABBV 0.24582663
dtype: float64
Find Outliers
With the ks and p values calculated, let's find which symbols are the outliers. Implement the find_outliers function to find the following outliers:
- Symbols that reject the null hypothesis with a p-value less than pvalue_threshold.
- Symbols with a KS value above ks_threshold.
def find_outliers(ks_values, p_values, ks_threshold, pvalue_threshold=0.05):
    """
    Find outlying symbols using KS values and P-values
    Parameters
    ----------
    ks_values : Pandas Series
        KS statistic for all the tickers
    p_values : Pandas Series
        P value for all the tickers
    ks_threshold : float
        The threshold for the KS statistic
    pvalue_threshold : float
        The threshold for the p-value
    Returns
    -------
    outliers : set of str
        Symbols that are outliers
    """
    outliers = set()
    for ticker in p_values.index:
        # A ticker is an outlier only if it meets both conditions
        if p_values[ticker] < pvalue_threshold and ks_values[ticker] > ks_threshold:
            outliers.add(ticker)
    return outliers
project_tests.test_find_outliers(find_outliers)
Tests Passed
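The same two-condition filter can be written with a boolean mask. This is a minimal sketch on made-up values (the tickers AAA, BBB, CCC and their statistics are hypothetical), shown as an alternative to the loop above rather than as the solution that was submitted.

```python
import pandas as pd

# Hypothetical KS statistics and p-values for three made-up tickers.
ks_values = pd.Series({'AAA': 0.10, 'BBB': 0.90, 'CCC': 0.85})
p_values = pd.Series({'AAA': 0.60, 'BBB': 0.01, 'CCC': 0.20})

ks_threshold = 0.8
pvalue_threshold = 0.05

# Both conditions must hold; pandas aligns the two Series on their index.
mask = (p_values < pvalue_threshold) & (ks_values > ks_threshold)
outliers = set(mask[mask].index)

print(outliers)  # {'BBB'}
```

CCC has a high KS value but its p-value is above the threshold, so it is not flagged; only BBB satisfies both conditions.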
View Data
Using the find_outliers
function you implemented, let's see what we found.
ks_threshold = 0.8
outliers_5 = find_outliers(ks_values_5, p_values_5, ks_threshold)
outliers_10 = find_outliers(ks_values_10, p_values_10, ks_threshold)
outliers_20 = find_outliers(ks_values_20, p_values_20, ks_threshold)
outlier_tickers = outliers_5.union(outliers_10).union(outliers_20)
print('{} Outliers Found:\n{}'.format(len(outlier_tickers), ', '.join(list(outlier_tickers))))
24 Outliers Found:
GREIGI, VVEDEN, KAUFMA, URUMIE, BIFLOR, LINIFO, ORPHAN, SCHREN, PULCHE, TURKES, SPRENG, TARDA, ALTAIC, BAKERI, AGENEN, CLUSIA, DASYST, SAXATI, KOLPAK, PRAEST, GESNER, HUMILI, ARMENA, SYLVES
Show Significance without Outliers
Let's compare the 5, 10, and 20 day signal returns without outliers to normal distributions. Also, let's see how the P-Value has changed with the outliers removed.
good_tickers = list(set(close.columns) - outlier_tickers)
project_helper.plot_signal_to_normal_histograms(
[signal_return_5[good_tickers], signal_return_10[good_tickers], signal_return_20[good_tickers]],
'Signal Return Without Outliers',
('5 Days', '10 Days', '20 Days'))
That's more like it! The returns are closer to a normal distribution. You have finished the research phase of a Breakout Strategy. You can now submit your project.
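Those who keep doing will succeed; those who keep walking will arrive.
Free to share - NonCommercial - NoDerivatives - Attribution (Creative Commons 3.0 license)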