
chore(backtesting-sp500): fix small grammar mistakes

pull/2316/head
nprimo 6 months ago committed by Niccolò Primo
parent
commit
7dc8770894
  1. 25
      subjects/ai/backtesting-sp500/README.md


@ -6,22 +6,27 @@ The goal of this project is to perform a Backtest on the SP500 constituents. The
## Data
The input file are:
The input files are:
- [`sp500.csv`](./data/sp500.csv) contains the SP500 data. The SP500 is a stock market index that measures the stock performance of 500 large companies listed on stock exchanges in the United States.
- [`stock_prices.csv`](./data/stock_prices.csv): contains the close prices for all the companies that had been in the SP500. It contains a lot of missing data. The adjusted close price may be unavailable for three main reasons:
- [`stock_prices.csv`](./data/stock_prices.csv): contains the close prices for
all the companies that had been in the SP500. It contains a lot of missing
data.
- The company doesn't exist at date d
- The company is not public, pas coté
The adjusted close price may be unavailable for three main reasons:
- The company doesn't exist at date `d`
- The company is not public
- Its close price hasn't been reported
- Note: The quality of this data set is not good: some prices are wrong, there are some prices spikes, there are some prices adjustments (share split, dividend distribution) - the prices adjustment are corrected in the adjusted close. But I'm not providing this data for this project to let you understand what is bad quality data and how important it is to detect outliers and missing values. The idea is not to correct the full data set manually, but to correct the main problems.
_Note: The quality of this data set is not good: some prices are wrong, and there are price spikes and price adjustments (share splits, dividend distributions). The price adjustments are corrected in the adjusted close. That data is not provided for this project, so that you can see what bad-quality data looks like and how important it is to detect outliers and missing values. The idea is not to correct the full data set manually, but to correct the main problems._
_Note: The corrections will not fully fix the data, so the results may be abnormal compared to results from cleaned financial data. That's not a problem for this small project!_
## Problem
Once preprocessed this data, it will be used to generate a signal that is, for each asset at each date a metric that indicates if the asset price will increase the next month. At each date (once a month) we will take the 20 highest metrics and invest 1$ per company. This strategy is called **stock picking**. It consists in picking stock in an index and try to overperform the index. Finally we will compare the performance of our strategy compared to the benchmark: the SP500
Once preprocessed, this data will be used to generate a signal, that is, for each asset at each date, a metric that indicates whether the asset price will increase the next month. At each date (once a month) we will take the 20 highest metrics and invest $1 per company. This strategy is called **stock picking**. It consists of picking stocks in an index and trying to outperform the index. Finally, we will compare the performance of our strategy to the benchmark: the SP500.
It is important to understand that the SP500 components change over time. The reason is simple: Facebook entered the SP500 in 2013 thus meaning that another company had to be removed from the 500 companies.
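
The selection step described above can be sketched as follows. This is only an illustration, not the project's reference solution: the function name `pick_top_n` and the `(date, ticker)` index layout are assumptions made for the example.

```python
import pandas as pd


def pick_top_n(signal: pd.Series, n: int = 20) -> pd.Series:
    """Flag the n highest signal values at each date.

    Assumes `signal` is indexed by a MultiIndex with levels named
    "date" and "ticker" (a hypothetical layout chosen for this sketch).
    """
    # Rank within each date, highest signal first; ties are broken by order.
    ranks = signal.groupby(level="date").rank(ascending=False, method="first")
    # Missing signals get a NaN rank, so they are never selected.
    return ranks <= n


# Tiny made-up example: 2 dates, 3 tickers, keep the top 2 per date.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-31", "2020-02-29"]), ["AAA", "BBB", "CCC"]],
    names=["date", "ticker"],
)
signal = pd.Series([0.1, 0.3, 0.2, 0.5, 0.4, 0.6], index=index)
selected = pick_top_n(signal, n=2)
```

Investing $1 in each selected company then amounts to using `selected` as a 0/1 position for that month.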
@ -75,20 +80,20 @@ There are four parts:
- One of average price for companies for all variables (save the plot with the images).
- Describe at least 5 outliers ('ticker', 'date', 'price'). Put them, with the 3 fields, in an `outliers.txt` file in the `results` folder.
_Note: create functions that generate the plots and save them in the images folder. Add a parameter `plot` with a default value `False` which doesn't return the plot. This will be useful for the correction to let people run your code without overriding your plots._
_Note: create functions that generate the plots and save them in the `images` directory. Add a parameter `plot` with a default value `False` which doesn't return the plot. This will be useful for the correction to let people run your code without overriding your plots._
- Here is how the `prices` data should be preprocessed:
- Resample data on month and keep the last value
- Filter prices outliers: Remove prices outside of the range 0.1$, 10k$
- Filter price outliers: remove prices outside the range $0.1 to $10k
- Compute monthly returns:
- Historical returns. **returns(current month) = (price(current month) - price(previous month)) / price(previous month)**
- Future returns. **returns(current month) = (price(next month) - price(current month)) / price(current month)**
- Replace returns outliers by the last value available regarding the company. This corrects prices spikes that corresponds to a monthly return greater than 1 and smaller than -0.5. This correction should not consider the 2008 and 2009 period as the financial crisis impacted the market brutally. **Don't forget that a value is considered as an outlier comparing to the other returns/prices of the same company**
- Replace return outliers with the last value available for the company. This corrects price spikes that correspond to a monthly return greater than 1 or smaller than -0.5. This correction should not apply to the 2008 and 2009 period, as the financial crisis impacted the market brutally. **Don't forget that a value is considered an outlier compared to the other returns/prices of the same company**
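
The preprocessing steps listed above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions, not the expected solution: it assumes a wide DataFrame of daily close prices (one column per ticker, `DatetimeIndex` rows), it reads "last value available" as the company's last non-outlier return, and the function names are hypothetical.

```python
import pandas as pd


def replace_return_outliers(returns: pd.DataFrame) -> pd.DataFrame:
    """Replace returns > 1 or < -0.5 with the company's last available return."""
    outlier = (returns > 1) | (returns < -0.5)
    # Leave 2008-2009 untouched: the task excludes the financial crisis.
    crisis = returns.index.year.isin([2008, 2009])
    outlier.loc[crisis, :] = False
    # Forward-fill per column (per company); use the filled value only
    # where an outlier was flagged, so genuine gaps stay missing.
    filled = returns.mask(outlier).ffill()
    return returns.where(~outlier, filled)


def preprocess_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Assumed input: daily close prices, one column per ticker."""
    # Keep the last available price of each calendar month.
    # Note: months with no rows at all are simply absent from the result.
    monthly = prices.groupby(prices.index.to_period("M")).last()
    # Drop prices outside the [0.1, 10_000] range (set them to NaN).
    monthly = monthly.where((monthly >= 0.1) & (monthly <= 10_000))
    # (p_t - p_{t-1}) / p_{t-1} and (p_{t+1} - p_t) / p_t
    past = replace_return_outliers(monthly / monthly.shift(1) - 1)
    future = replace_return_outliers(monthly.shift(-1) / monthly - 1)
    # Long format: one row per (ticker, month).
    return pd.DataFrame({
        "Price": monthly.unstack(),
        "monthly_past_return": past.unstack(),
        "monthly_future_return": future.unstack(),
    }).rename_axis(["ticker", "date"])


# Made-up prices: the March month-end value (50) is a spike.
demo = pd.DataFrame(
    {"AAA": [9.0, 10.0, 11.0, 50.0, 12.0]},
    index=pd.to_datetime(
        ["2020-01-15", "2020-01-31", "2020-02-28", "2020-03-31", "2020-04-30"]
    ),
)
out = preprocess_prices(demo)
```

In this sketch an outlier with no earlier valid return for the same company stays missing, and only the returns are corrected: the spiked price itself remains in the `Price` column.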
At this stage the DataFrame should looks like this:
At this stage the DataFrame should look like this:
| | Price | monthly_past_return | monthly_future_return |
| :--------------------------------------------------- | ------: | ------------------: | --------------------: |
