Financial strategies on the SP500

This document is the correction of project 4. Some steps are detailed in W1D5E4.

Data processing and feature engineering

Split train/test: this step is validated if you
No leakage: unfortunately, there's no automated way to check whether the dataset is leaked. This step is validated if the features of date d are built as follows:

Features: some of the features, such as the MACD, are computed in a rolling fashion. This step is validated if you grouped the data by ticker before computing the features.
Target: this step is validated if you grouped the prices by ticker before computing the returns.
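A minimal pandas sketch of this per-ticker grouping, assuming a long DataFrame with hypothetical columns 'date', 'ticker' and 'close' (one row per ticker and date). The MACD (EMA 12 minus EMA 26, plus a 9-period signal line) and the target are both computed inside a groupby on the ticker, so one ticker's history never enters another ticker's rolling window. The exponential averages only look backward, so the features of date d use prices up to d only. The target follows the convention stated later in this document: at date d it encodes whether the return between d+1 and d+2 is positive.

```python
import pandas as pd


def add_features_and_target(prices: pd.DataFrame) -> pd.DataFrame:
    """Per-ticker MACD features and target (hypothetical helper).

    Assumes columns 'date', 'ticker', 'close'.
    """
    df = prices.sort_values(["ticker", "date"]).copy()
    close = df.groupby("ticker")["close"]

    # MACD = EMA(12) - EMA(26), computed separately for each ticker.
    ema12 = close.transform(lambda s: s.ewm(span=12, adjust=False).mean())
    ema26 = close.transform(lambda s: s.ewm(span=26, adjust=False).mean())
    df["macd"] = ema12 - ema26
    df["macd_signal"] = df.groupby("ticker")["macd"].transform(
        lambda s: s.ewm(span=9, adjust=False).mean()
    )

    # Return between d+1 and d+2, aligned on date d, still grouped by ticker:
    # close(d+2) / close(d+1) - 1. The last two dates of each ticker are NaN.
    df["future_return"] = close.shift(-2) / close.shift(-1) - 1
    # Binary target (1 = increase), left as NaN where the return is unknown.
    df["target"] = (df["future_return"] > 0).astype(float).where(
        df["future_return"].notna()
    )
    return df
```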

This step is validated if:
- the test set is not used
- the selected model is saved in the pkl file and described in a txt file
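One way to save the selected model, sketched with pickle and a scikit-learn Pipeline. The helper name, the output folder and the file names selected_model.pkl and selected_model.txt are hypothetical. The pipeline passed in is assumed to have been selected and fitted without touching the test set.

```python
import pickle
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def save_selected_model(pipeline, description, out_dir="results/selected-model"):
    """Pickle the selected pipeline and write a short text description."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Serialized pipeline (preprocessing + model).
    with open(out / "selected_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)
    # Human-readable description of the chosen pipeline and its parameters.
    (out / "selected_model.txt").write_text(description)
    return out
```

For example, save_selected_model(Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(C=0.1))]), "StandardScaler + LogisticRegression(C=0.1)") writes both files.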

This step is validated if:
- the ML metrics computed on the train set are aggregated (sum or median).
- the ml metrics are saved in a csv file.
- metric_train.png shows a plot similar to the one below
- the top 10 most important features per fold are saved in top_10_feature_importance.csv.
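A sketch of the aggregation and export, assuming the per-fold metrics are collected as a list of dicts and the per-fold feature importances as a list of pandas Series indexed by feature name. The function name and the metrics file name ml_metrics_train.csv are hypothetical; top_10_feature_importance.csv is the name given above. The median and the sum across folds are both appended as extra rows.

```python
import pandas as pd


def summarize_cv(fold_metrics, fold_importances,
                 metrics_path="ml_metrics_train.csv",
                 top10_path="top_10_feature_importance.csv"):
    # One row per fold, one column per metric (e.g. accuracy, AUC).
    metrics = pd.DataFrame(fold_metrics)
    # Aggregate each metric across folds and save folds + summary rows.
    summary = metrics.agg(["median", "sum"])
    pd.concat([metrics, summary]).to_csv(metrics_path)

    # Top 10 features per fold (rank 0 = most important).
    top10 = pd.DataFrame({
        f"fold_{i}": pd.Series(imp.nlargest(10).index.to_list())
        for i, imp in enumerate(fold_importances)
    })
    top10.index.name = "rank"
    top10.to_csv(top10_path)
    return summary, top10
```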

Note that this can also be done on the test set IF this hasn't helped to select the pipeline.

This step is validated if:
- the pipeline is not trained once and then used to predict on all data points!
- as explained, the signal is generated with the chosen cross-validation: train the model on the train set of the first fold, then predict on its validation set; train the model on the train set of the second fold, then predict on its validation set, etc. Then, concatenate the predictions on the validation sets to build the machine learning signal.
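The procedure above can be sketched as a walk-forward loop over blocks of dates. This is a minimal sketch, assuming a long DataFrame with a 'date' column, numeric feature columns and a binary 'target' column, with LogisticRegression as a stand-in model. The split (dates cut into n_folds + 1 consecutive blocks, each fold training on all earlier blocks) is one possible time-series cross-validation, not necessarily the one chosen in the project. The gap parameter is an added assumption: it drops the last training dates, whose targets (built from prices up to d+2) would overlap the validation period.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def walk_forward_signal(df, features, target="target", n_folds=4, gap=2,
                        make_model=LogisticRegression):
    """Concatenate out-of-fold probabilities into one ML signal (hypothetical helper)."""
    dates = np.sort(df["date"].unique())
    blocks = np.array_split(dates, n_folds + 1)
    preds = []
    for k in range(1, n_folds + 1):
        # Train on all blocks before k, minus `gap` dates to avoid overlap.
        train_dates = np.concatenate(blocks[:k])
        if gap:
            train_dates = train_dates[:-gap]
        train = df[df["date"].isin(train_dates)].dropna(subset=features + [target])
        val = df[df["date"].isin(blocks[k])].dropna(subset=features)

        # A fresh model is fitted on each fold, then predicts its validation set.
        model = make_model()
        model.fit(train[features], train[target])
        proba = model.predict_proba(val[features])[:, 1]
        preds.append(pd.Series(proba, index=val.index))
    # The concatenated validation predictions form the ML signal.
    return pd.concat(preds).rename("ml_signal")
```

The first block is never predicted, since no past data is available to train on; the signal therefore starts at the first validation date.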

This step is validated if:
- the transformed machine learning signal (long only, long-short, binary, ternary, stock picking, proportional to probability or custom) is multiplied by the return between d+1 and d+2. As a reminder, the signal at date d predicts whether the return between d+1 and d+2 is increasing or decreasing. The PnL of date d can then be associated with date d, d+1 or d+2. This choice is arbitrary and should not impact the value of the PnL.
- the same amount of money is invested every day. One exception: if you invest 1$ per day per stock, the amount invested every day may change depending on the chosen strategy. If you take into account the different values of capital invested every day in the calculation of the PnL, the step is still validated.
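A minimal sketch of the PnL computation for a long-short transformation of the signal. It assumes hypothetical columns 'date', a predicted probability, and 'future_return' holding the return between d+1 and d+2 aligned on date d. Positions are rescaled so that the gross exposure is 1$ every day, which is one way to invest the same amount of money daily; the PnL is indexed by d, one of the date conventions mentioned above.

```python
import numpy as np
import pandas as pd


def to_long_short(proba, threshold=0.5):
    # Long (+1) when the model predicts an increase, short (-1) otherwise.
    return np.where(np.asarray(proba) > threshold, 1.0, -1.0)


def daily_pnl(df, position_col="position", return_col="future_return"):
    data = df.dropna(subset=[position_col, return_col])
    # Gross exposure per day; positions are scaled so it sums to 1$ each day.
    gross = data.groupby("date")[position_col].transform(lambda s: s.abs().sum())
    # Days with no position get a weight of 0 instead of a division by zero.
    weights = (data[position_col] / gross.where(gross > 0)).fillna(0.0)
    # PnL of date d: weight decided at d times the return between d+1 and d+2.
    return (weights * data[return_col]).groupby(data["date"]).sum()
```

Because the same capital is invested every day, the cumulative PnL is simply daily_pnl(df).cumsum().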