docs(ai-branch): add link to files for project 4

pull/68/head
eslopfer 2 years ago
commit 5355b60e9e
projects/project4/README.md

@@ -2,12 +2,10 @@
In this project we will apply machine learning to finance. You are a Quant/Data Scientist, and your goal is to create a financial strategy, based on a signal output by a machine learning model, that outperforms the [S&P 500](https://en.wikipedia.org/wiki/S%26P_500).
The Standard & Poor's 500 Index is a collection of stocks intended to reflect the overall return characteristics of the stock market as a whole. The stocks that make up the S&P 500 are selected by market capitalization, liquidity, and industry. Companies included in the S&P 500 are chosen by the S&P 500 Index Committee, which consists of a group of analysts employed by Standard & Poor's.
The S&P 500 Index originally began in 1926 as the "composite index", composed of only 90 stocks. According to historical records, the average annual return from its inception in 1926 through 2018 is approximately 10%–11%. The average annual return from 1957, when the index expanded to 500 stocks, through 2018 is roughly 8%.
As a Quant Researcher, you may beat the S&P 500 for one year or a few years. The real challenge, though, is to beat it consistently over decades. That's what most hedge funds in the world are trying to do.
The project is divided into the following parts:
- **Data processing and feature engineering**: build a dataset made of insightful features and the target
@@ -28,20 +26,16 @@ Note: Financial data can be complex and tricky to analyse for a lot of reasons.
**"No leakage" small guide:**
We assume it is day D and we want to take a position for the next h days, starting the next day. The position starts on day D+1 (included). To decide whether to take a short or a long position, the return between day D+1 and day D+2 is computed and used as the target. Finally, since the features on day D contain information up to day D 11:59pm, the target needs to be shifted. As a result, the final DataFrame schema is:
| Index   | Features                 | Target           |
| ------- | :----------------------: | ---------------: |
| Day D-1 | Features until D-1 23:59 | return(D, D+1)   |
| Day D   | Features until D 23:59   | return(D+1, D+2) |
| Day D+1 | Features until D+1 23:59 | return(D+2, D+3) |
**Note: This table is simplified; the index of your DataFrame is actually a MultiIndex of date and ticker.**
- Features:
  - Bollinger Bands
  - RSI (Relative Strength Index)
  - MACD (Moving Average Convergence Divergence)
**Note: you can use any library to compute these features; you don't need to implement all financial features from scratch.**
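Below is a minimal pandas sketch of the three indicators for a single ticker's daily closes. It assumes common textbook parameters: 20-day Bollinger Bands at ±2 standard deviations, expressed as %B; a 14-day RSI that uses simple rolling averages of gains and losses rather than Wilder's smoothing; and MACD 12/26/9. The function name `compute_features` and the column names are hypothetical. A dedicated technical-analysis library is an equally valid choice.

```python
import pandas as pd


def compute_features(close: pd.Series) -> pd.DataFrame:
    """Bollinger %B, RSI and MACD for ONE ticker's daily closes (hypothetical helper)."""
    # Bollinger Bands (assumed 20 days, +/- 2 rolling standard deviations)
    ma = close.rolling(20).mean()
    sd = close.rolling(20).std()
    upper, lower = ma + 2 * sd, ma - 2 * sd
    bollinger_pct_b = (close - lower) / (upper - lower)  # position of the close inside the bands

    # RSI (assumed 14 days), simple-average variant: average gain vs average loss
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta).clip(lower=0).rolling(14).mean()
    rsi = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD: 12-day EMA minus 26-day EMA, plus its 9-day EMA signal line
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    macd = ema_fast - ema_slow
    macd_signal = macd.ewm(span=9, adjust=False).mean()

    # Each value on day D uses closes up to day D only: no look-ahead
    return pd.DataFrame(
        {"bollinger_pct_b": bollinger_pct_b, "rsi": rsi, "macd": macd, "macd_signal": macd_signal},
        index=close.index,
    )
```

With the (date, ticker) MultiIndex sorted by date, you could apply it ticker by ticker with `close.groupby(level="ticker", group_keys=False).apply(compute_features)`, so that no rolling window mixes two tickers.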
- Target:
  - On day D, the target is **sign(return(D+1, D+2))**
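Below is a minimal pandas sketch of the shifted target. It assumes a `close` Series indexed by a (date, ticker) MultiIndex sorted by date within each ticker. It also assumes that return(a, b) means the close-to-close return close(b) / close(a) - 1. The function name `make_target` is hypothetical.

```python
import numpy as np
import pandas as pd


def make_target(close: pd.Series) -> pd.Series:
    """Row (D, ticker) -> sign(return(D+1, D+2)) for that ticker (hypothetical helper)."""
    by_ticker = close.groupby(level="ticker")  # never shift across two tickers
    close_d1 = by_ticker.shift(-1)  # close on day D+1
    close_d2 = by_ticker.shift(-2)  # close on day D+2
    forward_return = close_d2 / close_d1 - 1  # return(D+1, D+2), stored on row D
    # The last two days of each ticker have no future prices, so their target is NaN
    return np.sign(forward_return).rename("target")
```

Only the target is shifted; the features stay on row D. This reproduces the schema table of this section. Rows with a NaN target should be dropped before training.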
@@ -61,11 +55,11 @@ We assume it is day D and we want to take a position on the next h days on the n
![Blocking Time Series split][blocking]
[blocking]: blocking_time_series_split.png "Blocking Time Series split"
![Time Series split][timeseries]
[timeseries]: Time_series_split.png "Time Series split"
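Below is a hedged sketch of both splits, done at the date level so that all tickers of a given day always fall in the same fold. The function names, `n_splits`, and the train/validation ratio inside each block are assumptions. scikit-learn's `TimeSeriesSplit` resembles the expanding version, but it splits rows rather than dates.

```python
import numpy as np


def time_series_split(dates, n_splits=5):
    """Expanding window: train on all date blocks before block k, validate on block k."""
    dates = np.asarray(dates)
    blocks = np.array_split(np.unique(dates), n_splits + 1)  # np.unique returns sorted dates
    for k in range(1, n_splits + 1):
        train_dates = np.concatenate(blocks[:k])
        val_dates = blocks[k]
        yield np.flatnonzero(np.isin(dates, train_dates)), np.flatnonzero(np.isin(dates, val_dates))


def blocking_time_series_split(dates, n_splits=5, train_ratio=0.8):
    """Non-overlapping date blocks, each cut into its own train part and validation part."""
    dates = np.asarray(dates)
    for block in np.array_split(np.unique(dates), n_splits):
        cut = int(len(block) * train_ratio)
        train_dates, val_dates = block[:cut], block[cut:]
        yield np.flatnonzero(np.isin(dates, train_dates)), np.flatnonzero(np.isin(dates, val_dates))
```

Both functions yield `(train_idx, val_idx)` positional arrays, which is the format `GridSearchCV` accepts as `cv=`. For example, pass `dates = X.index.get_level_values("date")`. Validation dates always come after their training dates.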
Once you have run the grid search with cross-validation (choose either the Blocking or the Time Series split), select the best pipeline on the train set and save it as `selected_model.pkl`, with its hyperparameters in `selected_model.txt`.
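Below is a minimal scikit-learn sketch of this step. The scaler plus logistic-regression pipeline and the small grid over `C` are purely illustrative assumptions. `cv` can be anything `GridSearchCV` accepts: a splitter object, or an iterable of `(train_idx, val_idx)` pairs encoding your chosen Blocking or Time Series split. Writing the hyperparameters as JSON is also an assumption; the project only asks for a `.txt` file.

```python
import json
import os
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_and_save(X, y, cv, out_dir="."):
    """Grid-search a pipeline with time-aware CV, then save the best one (hypothetical helper)."""
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)  # refit=True by default: the best pipeline is refit on all of X, y

    with open(os.path.join(out_dir, "selected_model.pkl"), "wb") as f:
        pickle.dump(search.best_estimator_, f)
    with open(os.path.join(out_dir, "selected_model.txt"), "w") as f:
        f.write(json.dumps(search.best_params_, indent=2))
    return search
```

A typical call would be `select_and_save(X_train, y_train, cv=TimeSeriesSplit(n_splits=5))` on rows sorted by date. Note that a row-level splitter can cut a single date between two folds when several tickers share that date.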
@@ -78,7 +72,7 @@ Once you'll have run the gridsearch on the cross validation (choose either Block
![Metric plot][barplot]
[barplot]: metric_plot.png "Metric plot"
- The signal has to be generated with the chosen cross-validation: train the model on the train set of the first fold, then predict on its validation set; train the model on the train set of the second fold, then predict on its validation set; and so on. Then, concatenate the predictions on the validation sets to build the machine learning signal. **The pipeline must not be trained once and then used to predict on all data points!**
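Below is a hedged sketch of this walk-forward signal. It assumes a scikit-learn classifier with `predict_proba`, features `X` and target `y` as pandas objects sharing the same index, and `splits` given as `(train_idx, val_idx)` positional pairs from the chosen split. Using the predicted probability of an up move (class `1`) as the signal is an assumption. It fits the 0.5 threshold used in the conversion example further down. The sketch does not use scikit-learn's `cross_val_predict`, because that function requires every sample to appear in exactly one validation set, which time-series splits do not satisfy.

```python
import pandas as pd
from sklearn.base import clone


def walk_forward_signal(model, X: pd.DataFrame, y: pd.Series, splits) -> pd.Series:
    """Fit a fresh copy of `model` on each fold's train set and predict its validation set."""
    pieces = []
    for train_idx, val_idx in splits:
        fold_model = clone(model)  # unfitted copy, so nothing carries over between folds
        fold_model.fit(X.iloc[train_idx], y.iloc[train_idx])
        up_column = list(fold_model.classes_).index(1)  # column holding P(target == 1)
        proba = fold_model.predict_proba(X.iloc[val_idx])[:, up_column]
        pieces.append(pd.Series(proba, index=X.index[val_idx]))
    # Concatenate the out-of-sample predictions into one machine learning signal
    return pd.concat(pieces).sort_index().rename("signal")
```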
@@ -86,7 +80,6 @@ Once you'll have run the gridsearch on the cross validation (choose either Block
- (optional): [Train an RNN/LSTM](https://towardsdatascience.com/predicting-stock-price-with-lstm-13af86a74944). This is a nice way to discover and learn about recurrent neural networks. Keep in mind, though, that some newer neural network architectures seem to outperform recurrent neural networks: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0.
## Strategy backtesting
- Backtesting module deliverables: the module takes a machine learning signal as input and converts it into a financial strategy. A financial strategy DataFrame gives the amount invested at time t in asset i. The module returns the following metrics on the train set and the test set.
@@ -115,6 +108,7 @@ Once you'll have run the gridsearch on the cross validation (choose either Block
- strategy metrics on the train set and test set
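Below is a minimal sketch of the core backtest computation. It assumes `strategy` holds the amount invested on row (D, ticker). It also assumes `forward_return` holds return(D+1, D+2) on the same row, that is, the unsigned version of the target, so each position earns exactly the return it was decided for. The function name is hypothetical. The two metrics shown, total PnL and the maximum drawdown of the cumulative PnL, are illustrative; use the metrics the project asks for.

```python
import pandas as pd


def backtest(strategy: pd.Series, forward_return: pd.Series) -> dict:
    """Daily PnL of a strategy on a (date, ticker) index (hypothetical helper)."""
    # PnL of each position: amount invested times the return over the holding day
    position_pnl = strategy * forward_return
    daily_pnl = position_pnl.groupby(level="date").sum()
    cumulative_pnl = daily_pnl.cumsum()  # additive: a fixed amount is invested each day
    drawdown = cumulative_pnl - cumulative_pnl.cummax()
    return {
        "daily_pnl": daily_pnl,
        "total_pnl": float(cumulative_pnl.iloc[-1]),
        "max_drawdown": float(drawdown.min()),
    }
```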
### Examples of strategies:
- Long only:
  - Binary signal:
    - 0: do nothing for one day on asset i
@@ -122,6 +116,7 @@ Once you'll have run the gridsearch on the cross validation (choose either Block
  - Weights proportional to the machine learning signals:
    - invest x in asset i for one day
- Long and short: for those who search "long short strategy" on Google, don't get it wrong: this has nothing to do with pair trading.
  - Binary signal:
    - -1: take a short position on asset i for 1 day
    - 1: take a long position on asset i for 1 day
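Below is a hedged sketch of two of the conversions listed above. It assumes the machine learning signal is a probability of an up move indexed by (date, ticker), and that 0.5 is the long/short threshold. The function names are hypothetical. Scaling the positions to a daily budget is a separate step.

```python
import numpy as np
import pandas as pd


def long_only_proportional(signal: pd.Series) -> pd.Series:
    """Long only: invest an amount x proportional to the signal on asset i for one day."""
    return signal.clip(lower=0)  # a long-only position can never be negative


def long_short_binary(signal: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Long and short: 1 = long position for 1 day, -1 = short position for 1 day."""
    return pd.Series(np.where(signal > threshold, 1, -1), index=signal.index)
```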
@@ -142,27 +137,26 @@ Here's an example on how to convert a machine learning signal into a financial s
- Input:
| Date    | Ticker | Machine Learning signal |
| ------- | :----: | ----------------------: |
| Day D-1 | AAPL   |                    0.55 |
| Day D-1 | C      |                    0.36 |
| Day D   | AAPL   |                    0.59 |
| Day D   | C      |                    0.33 |
| Day D+1 | AAPL   |                    0.61 |
| Day D+1 | C      |                    0.33 |
- Convert it into a binary long-only strategy:
  - Binary signal = 1 if the machine learning signal > 0.5, else 0
| Date    | Ticker | Binary signal |
| ------- | :----: | ------------: |
| Day D-1 | AAPL   |             1 |
| Day D-1 | C      |             0 |
| Day D   | AAPL   |             1 |
| Day D   | C      |             0 |
| Day D+1 | AAPL   |             1 |
| Day D+1 | C      |             0 |
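The threshold rule above takes a single line of pandas. This sketch rebuilds the two example tables. The function name is hypothetical, and the 0.5 threshold is the one stated in the rule.

```python
import pandas as pd


def to_binary_long_only(signal: pd.Series, threshold: float = 0.5) -> pd.Series:
    """1 = take a long position for one day, 0 = do nothing."""
    return (signal > threshold).astype(int).rename("binary_signal")


index = pd.MultiIndex.from_product(
    [["Day D-1", "Day D", "Day D+1"], ["AAPL", "C"]], names=["date", "ticker"]
)
ml_signal = pd.Series([0.55, 0.36, 0.59, 0.33, 0.61, 0.33], index=index)
print(to_binary_long_only(ml_signal).tolist())  # [1, 0, 1, 0, 1, 0]
```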
!!! BE CAREFUL !!! THIS IS EXTREMELY IMPORTANT.
@@ -172,7 +166,6 @@ Here's an example on how to convert a machine learning signal into a financial s
**Assumption**: you have $1 per day to invest in your strategy.
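Below is one way to apply this budget. It assumes the strategy is a Series of positions indexed by (date, ticker). Each day's positions are divided by that day's gross exposure (the sum of absolute positions). As a result, exactly $1 is invested on every day the strategy holds something, and nothing on days without a position. The function name is hypothetical.

```python
import pandas as pd


def scale_to_budget(positions: pd.Series, budget: float = 1.0) -> pd.Series:
    """Rescale positions so that sum(|position|) == budget on each date."""
    gross = positions.abs().groupby(level="date").transform("sum")
    # Days with no position have gross == 0: divide by NaN instead, then reset to 0
    scaled = budget * positions / gross.where(gross != 0)
    return scaled.fillna(0.0)
```

This works for long-only and long/short strategies alike. For example, a long/short day with one long and one short position becomes +$0.5 and -$0.5.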
## Project repository structure:
```
@@ -209,6 +202,9 @@ project
│ │ strategy.py
```
Note: `features_engineering.py` can be used in `gridsearch.py`
### Files for this project
You can find the data required for this project at this [link](https://assets.01-edu.org/ai-branch/project4/project04-20221031T173034Z-001.zip).
