mirror of https://github.com/01-edu/Branch-AI.git
Badr Ghazlane
3 years ago
17 changed files with 32354 additions and 0 deletions
@ -0,0 +1,72 @@
Citation Request:

This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
[Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
[bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

1. Title: Wine Quality

2. Sources
Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

3. Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples.
The inputs include objective tests (e.g. pH values) and the output is based on sensory data
(median of at least 3 evaluations made by wine experts). Each expert graded the wine quality
between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model
these datasets under a regression approach. The support vector machine model achieved the
best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T),
etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity
analysis procedure).

4. Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables
are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks.
The classes are ordered and not balanced (e.g. there are many more normal wines than
excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent
or poor wines. Also, we are not sure whether all input variables are relevant, so
it could be interesting to test feature selection methods.

5. Number of Instances: red wine - 1599; white wine - 4898.

6. Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, thus it makes sense to apply some sort of
feature selection.

7. Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

8. Missing Attribute Values: None
@ -0,0 +1,10 @@
nan -9.480000000000000426e+00 1.415000000000000036e+01 1.126999999999999957e+01 -5.650000000000000355e+00 3.330000000000000071e+00 1.094999999999999929e+01 -2.149999999999999911e+00 5.339999999999999858e+00 -2.830000000000000071e+00
9.480000000000000426e+00 nan 4.860000000000000320e+00 -8.609999999999999432e+00 7.820000000000000284e+00 -1.128999999999999915e+01 1.324000000000000021e+01 4.919999999999999929e+00 2.859999999999999876e+00 9.039999999999999147e+00
-1.415000000000000036e+01 -1.126999999999999957e+01 nan 1.227999999999999936e+01 -2.410000000000000142e+00 6.040000000000000036e+00 -5.160000000000000142e+00 -3.870000000000000107e+00 -1.281000000000000050e+01 1.790000000000000036e+00
5.650000000000000355e+00 -3.330000000000000071e+00 -1.094999999999999929e+01 nan -1.364000000000000057e+01 0.000000000000000000e+00 2.240000000000000213e+00 -3.609999999999999876e+00 -7.730000000000000426e+00 8.000000000000000167e-02
2.149999999999999911e+00 -5.339999999999999858e+00 2.830000000000000071e+00 -4.860000000000000320e+00 nan -8.800000000000000044e-01 -8.570000000000000284e+00 2.560000000000000053e+00 -7.030000000000000249e+00 -6.330000000000000071e+00
8.609999999999999432e+00 -7.820000000000000284e+00 1.128999999999999915e+01 -1.324000000000000021e+01 -4.919999999999999929e+00 nan -1.296000000000000085e+01 -1.282000000000000028e+01 -1.403999999999999915e+01 1.456000000000000050e+01
-2.859999999999999876e+00 -9.039999999999999147e+00 -1.227999999999999936e+01 2.410000000000000142e+00 -6.040000000000000036e+00 5.160000000000000142e+00 nan -1.091000000000000014e+01 -1.443999999999999950e+01 -1.372000000000000064e+01
3.870000000000000107e+00 1.281000000000000050e+01 -1.790000000000000036e+00 1.364000000000000057e+01 -0.000000000000000000e+00 -2.240000000000000213e+00 3.609999999999999876e+00 nan 1.053999999999999915e+01 -1.417999999999999972e+01
7.730000000000000426e+00 -8.000000000000000167e-02 8.800000000000000044e-01 8.570000000000000284e+00 -2.560000000000000053e+00 7.030000000000000249e+00 6.330000000000000071e+00 1.296000000000000085e+01 nan -1.169999999999999929e+01
1.282000000000000028e+01 1.403999999999999915e+01 -1.456000000000000050e+01 1.091000000000000014e+01 1.443999999999999950e+01 1.372000000000000064e+01 -1.053999999999999915e+01 1.417999999999999972e+01 1.169999999999999929e+01 nan
@ -0,0 +1 @@
Empty file. The original is too big to be pushed on GitHub.
@ -0,0 +1,152 @@
sepal_length,sepal_width,petal_length,petal_width, flower
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,-3.6,-1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
-4.4,2.9,1400,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1500,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,-1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,-3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,"3.5",1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3809,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3200,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1600,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,-4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.-4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,"5.1",1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6900,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
580,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,-2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
@ -0,0 +1,37 @@
1. This question is validated if the output is:

```console
2010-01-01       0
2010-01-02       1
2010-01-03       2
2010-01-04       3
2010-01-05       4
              ...
2020-12-27    4013
2020-12-28    4014
2020-12-29    4015
2020-12-30    4016
2020-12-31    4017
Freq: D, Name: integer_series, Length: 4018, dtype: int64
```

The best solution uses `pd.date_range` to generate the index and `range` to generate the integer series.
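A minimal sketch of that approach (the name `integer_series` follows the exercise; the rest is illustrative):

```python
import pandas as pd

# Daily index covering 2010-01-01 through 2020-12-31 (4018 days)
index = pd.date_range(start="2010-01-01", end="2020-12-31", freq="D")

# One integer per date: the number of days elapsed since 2010-01-01
integer_series = pd.Series(range(len(index)), index=index, name="integer_series")
print(integer_series)
```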

2. This question is validated if the output is:

```console
2010-01-01       NaN
2010-01-02       NaN
2010-01-03       NaN
2010-01-04       NaN
2010-01-05       NaN
               ...
2020-12-27    4010.0
2020-12-28    4011.0
2020-12-29    4012.0
2020-12-30    4013.0
2020-12-31    4014.0
Freq: D, Name: integer_series, Length: 4018, dtype: float64
```

If the `NaN` values have been dropped, the solution is also accepted. The solution uses `rolling().mean()`.
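A sketch of that computation, rebuilding the series from question 1 (illustrative, not the only accepted form):

```python
import pandas as pd

index = pd.date_range(start="2010-01-01", end="2020-12-31", freq="D")
integer_series = pd.Series(range(len(index)), index=index, name="integer_series")

# 7-day moving average; the first 6 values are NaN because the window is incomplete
moving_average = integer_series.rolling(7).mean()
print(moving_average)
```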
@ -0,0 +1,7 @@
# Exercise 1

The goal of this exercise is to learn to manipulate time series in Pandas.

1. Create a `Series` named `integer_series` from 1st January 2010 to 31 December 2020. Each date is associated with the number of days elapsed since 1st January 2010. It starts with 0.

2. Using Pandas, compute a 7-day moving average **without a for loop**. This transformation smooths the time series by removing small fluctuations.
@ -0,0 +1,50 @@
Preliminary:

- As usual the first steps are:

  - Check missing values and data types
  - Convert string dates to datetime
  - Set dates as index
  - Use `info` or `describe` to have a first look at the data

The exercise is not validated if these steps have not been done.

1. The candlestick chart is based on the Open, High, Low and Close columns, with Date (datetime) as index. As long as you inserted the right columns in the `Candlestick` `Plotly` object, you validate the question.

2. This question is validated if the output of `print(transformed_df.head().to_markdown())` is:

| Date                |     Open |    Close |      Volume |     High |      Low |
|:--------------------|---------:|---------:|------------:|---------:|---------:|
| 1980-12-31 00:00:00 | 0.136075 | 0.135903 | 1.34485e+09 | 0.161272 | 0.112723 |
| 1981-01-30 00:00:00 | 0.141768 | 0.141316 | 6.08989e+08 | 0.155134 | 0.126116 |
| 1981-02-27 00:00:00 | 0.118215 | 0.117892 | 3.21619e+08 | 0.128906 | 0.106027 |
| 1981-03-31 00:00:00 | 0.111328 | 0.110871 | 7.00717e+08 | 0.120536 | 0.09654  |
| 1981-04-30 00:00:00 | 0.121811 | 0.121545 | 5.36928e+08 | 0.131138 | 0.108259 |

To get this result there are two ways: `resample` and `groupby`. There are two key steps:

- Find how to perform the aggregation on the last **business** day of each month. This is already implemented in Pandas: the keyword to use, either as the `resample` parameter or in `Grouper`, is `BM`.
- Choose the right aggregation function for each variable. The prices (Open, Close and Adjusted Close) should be aggregated with the `mean`. Low should be aggregated with the `min`, because it represents the lowest price of the day, so the lowest price of the month is the minimum of the daily lows. The same logic applied to High leads to using the `max`. Volume should be aggregated with the `sum`, because the monthly volume is equal to the sum of the daily volumes over the month.
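The `resample` variant can be sketched as follows; the DataFrame below is synthetic stand-in data (the real stock file is not included here), so only the structure of the call matters:

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV data standing in for the real stock history
index = pd.date_range("1980-12-12", "1981-03-31", freq="B")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Open": rng.uniform(0.10, 0.20, len(index)),
    "Close": rng.uniform(0.10, 0.20, len(index)),
    "High": rng.uniform(0.15, 0.25, len(index)),
    "Low": rng.uniform(0.05, 0.10, len(index)),
    "Volume": rng.integers(10**8, 10**9, len(index)).astype(float),
}, index=index)

# 'BM' anchors each bucket on the last business day of the month;
# each column gets the aggregation that matches its meaning
monthly = df.resample("BM").agg({
    "Open": "mean",
    "Close": "mean",
    "High": "max",
    "Low": "min",
    "Volume": "sum",
})
print(monthly)
```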

There are **482 months**.

3. The solution is accepted if it doesn't involve a for loop and the output is:

```console
Date
1980-12-12         NaN
1980-12-15   -0.047823
1980-12-16   -0.073063
1980-12-17    0.019703
1980-12-18    0.028992
                ...
2021-01-25    0.049824
2021-01-26    0.003704
2021-01-27   -0.001184
2021-01-28   -0.027261
2021-01-29   -0.026448
Name: Open, Length: 10118, dtype: float64
```

- The first way to compute the return without a for loop is to use `pct_change`.
- The second way to compute the return without a for loop is to implement the formula given in the exercise in a vectorized way. To get the value at `t-1` you can use `shift`.
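Both ways can be sketched on a small illustrative price series (the values are made up):

```python
import numpy as np
import pandas as pd

price = pd.Series([10.0, 11.0, 9.9, 10.89],
                  index=pd.date_range("1980-12-12", periods=4, freq="B"),
                  name="Open")

# Way 1: built-in percentage change
returns_pct = price.pct_change()

# Way 2: vectorized formula Return(t) = (Price(t) - Price(t-1)) / Price(t-1),
# where shift(1) gives access to the value at t-1
returns_shift = (price - price.shift(1)) / price.shift(1)

print(returns_pct)
```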
@ -0,0 +1,7 @@
1. This question is validated if, without having used a for loop, the output DataFrame's shape is `(261, 5)` and your output is the same as the one returned by this line of code:

```python
market_data.loc[market_data.index.get_level_values('Ticker')=='AAPL'].sort_index().pct_change()
```

The DataFrame contains random data. Make sure your output and the one returned by this code are based on the same DataFrame.
@ -0,0 +1,69 @@
Preliminary:

- As usual the first steps are:

  - Check missing values and data types
  - Convert string dates to datetime
  - Set dates as index
  - Use `info` or `describe` to have a first look at the data

The exercise is not validated if these steps haven't been done.

My results can be reproduced using `np.random.seed(2712)`. Given the version of NumPy used, I do not guarantee the reproducibility of the results - that is why I also explain the steps to get to the solution.

1. This question is validated if the return is computed as Return(t) = (Price(t+1) - Price(t))/Price(t) and returns this output:

```console
Date
1980-12-12   -0.052170
1980-12-15   -0.073403
1980-12-16    0.024750
1980-12-17    0.029000
1980-12-18    0.061024
                ...
2021-01-25    0.001679
2021-01-26   -0.007684
2021-01-27   -0.034985
2021-01-28   -0.037421
2021-01-29         NaN
Name: Daily_futur_returns, Length: 10118, dtype: float64
```

The answer is also accepted if the return is computed as in exercise 2 and then shifted into the future using `shift`, but I do not recommend this implementation as it adds missing values!

An example of solution is:

```python
def compute_futur_return(price):
    return (price.shift(-1) - price)/price

compute_futur_return(df['Adj Close'])
```

Note that if the index is not sorted in ascending order, the computed future return is wrong.

2. This question is validated if the index of the Series is the same as the index of the DataFrame. The data of the series can be generated using `np.random.randint(0, 2, len(df.index))`.
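As a sketch (the DataFrame `df` below is a tiny stand-in; only its datetime index matters):

```python
import numpy as np
import pandas as pd

# Stand-in DataFrame: only its datetime index is relevant for the signal
df = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.2, 10.8]},
                  index=pd.date_range("2010-01-01", periods=4, freq="D"))

np.random.seed(2712)
# Random 0/1 signal aligned on the DataFrame's index
signal = pd.Series(np.random.randint(0, 2, len(df.index)),
                   index=df.index, name="long_only_signal")
print(signal)
```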

3. This question is validated if the PnL is computed as signal * future return. Both series should have the same index.

```console
Date
1980-12-12   -0.052170
1980-12-15   -0.073403
1980-12-16    0.024750
1980-12-17    0.029000
1980-12-18    0.061024
                ...
2021-01-25    0.001679
2021-01-26   -0.007684
2021-01-27   -0.034985
2021-01-28   -0.037421
2021-01-29         NaN
Name: PnL, Length: 10119, dtype: float64
```

4. The question is validated if you computed the return of the strategy as `(Total earned - Total invested) / Total invested`. The result should be close to 0. The formula given can be simplified as `PnLs.sum()/signal.sum()`.
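The PnL and the strategy return can be sketched together on illustrative data (the names `future_return` and `signal` are assumptions, and the values are made up):

```python
import numpy as np
import pandas as pd

index = pd.date_range("2010-01-01", periods=5, freq="D")
future_return = pd.Series([0.02, -0.01, 0.03, 0.01, np.nan], index=index)
signal = pd.Series([1, 0, 1, 1, 0], index=index)

# Daily PnL: the future return on the days where 1$ was invested, 0 otherwise
pnl = (signal * future_return).rename("PnL")

# Strategy return: (total earned - total invested) / total invested
strategy_return = pnl.sum() / signal.sum()
print(strategy_return)
```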

My return is 0.00043546984088551553, because I invested 5147$ and earned 5149$.

5. The question is validated if you replaced the previous signal Series with 1s. Similarly to the previous question, we earned 10128$ and invested 10118$, which leads to a return of 0.00112670194140969 (0.1%).
@ -0,0 +1,44 @@
# Exercise 4 Backtest

The goal of this exercise is to learn to perform a backtest in Pandas. A backtest is a tool that allows you to know how a strategy would have performed retrospectively using historical data. In this exercise we will focus on the backtesting tool and not on how to build the best strategy.

We will backtest a **long only** strategy on Apple Inc. Long only means that we only consider buying the stock. The input signal at date d says whether the close price will increase at d+1. We assume that the input signal is available before the market closes.

1. Drop the rows with missing values and compute the daily future return of the Apple stock on the adjusted close price. The daily future return means: **Return(t) = (Price(t+1) - Price(t))/Price(t)**.
There are some events, such as splits or dividends, that artificially change the price of the stock. That is why the close price is adjusted, to avoid having outliers in the price data.

2. Create a Series that contains a random boolean array with **p=0.5**

```console
Here is an example of the expected time series
2010-01-01    1
2010-01-02    0
2010-01-03    0
2010-01-04    1
2010-01-05    0
Freq: D, Name: long_only_signal, dtype: int64
```

- The information in this series should be interpreted this way:
  - On 2010-01-01 I receive `1` before the market closes, meaning that, if I trust the signal, the close price of day d+1 will increase. I should buy the stock before the market closes.
  - On 2010-01-02 I receive `0` before the market closes, meaning that, if I trust the signal, the close price of day d+1 will not increase. I should not buy the stock.

3. Backtest the signal created in Question 2. Here are some assumptions made to backtest this signal:

- When, at date d, the signal equals 1, we buy 1$ of stock just before the market closes and we sell the stock just before the market closes the next day.
- When, at date d, the signal equals 0, we do not buy anything.
- The profit is not reinvested: when invested, the amount is always 1$.
- Fees are not considered.

**The expected output** is a **Series that gives for each day the return of the strategy. The return of the strategy is the PnL (Profit and Losses) divided by the invested amount**. The PnL for day d is:
`(money earned this day - money invested this day)`

Let's take the example of a 20% return for an invested amount of 1$. The PnL is `(1.2 - 1) = 0.2`. We notice that the PnL when the signal is 1 equals the daily return. The PnL when the signal is 0 is 0.
By convention, we consider that the PnL of d is assigned to day d and not d+1, even if the underlying return contains the information of d+1.

**The usage of a for loop is not allowed**.

4. Compute the return of the strategy. The return of the strategy is defined as: `(Total earned - Total invested) / Total invested`

5. Now the input signal is: **always buy**. Compute the daily PnL and the total PnL. Plot the daily PnL of Q5 and of Q3 on the same plot.

- https://www.investopedia.com/terms/b/backtesting.asp
@ -0,0 +1,33 @@
# D05 Piscine AI - Data Science

The goal of this day is to understand the practical usage of Pandas.
Today we will discover some important functionalities of Pandas. They will allow you to manipulate the data (DataFrame and Series) in order to clean, delete, add, merge and leverage more information.

In Data Science this is crucial, because without clean data no algorithm can learn.

Author:

# Table of Contents:

Historical part:

# Introduction

Not only is the Pandas library a central component of the data science toolkit, but it is used in conjunction with other libraries in that collection.

Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in Pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.

## Historical

## Rules

...

## Resources

Pandas website

- https://jakevdp.github.io/PythonDataScienceHandbook/

- https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/