
feat: structure exercises of day05 and add data

pull/42/head
Badr Ghazlane 3 years ago
parent commit 8b016c2ec4
  1. 1600
      one_exercise_per_file/day01/ex08/data/winequality-red.csv
  2. 72
      one_exercise_per_file/day01/ex08/data/winequality.names
  3. 10
      one_exercise_per_file/day01/ex09/data/model_forecasts.txt
  4. 1
      one_exercise_per_file/day02/ex02/data/household_power_consumption.txt
  5. 20001
      one_exercise_per_file/day02/ex03/data/Ecommerce_purchases.txt
  6. 151
      one_exercise_per_file/day02/ex04/data/iris.csv
  7. 152
      one_exercise_per_file/day02/ex04/data/iris.data
  8. 37
      one_exercise_per_file/day05/ex01/audit/readme.md
  9. 7
      one_exercise_per_file/day05/ex01/readme.md
  10. 50
      one_exercise_per_file/day05/ex02/audit/readme.md
  11. 10120
      one_exercise_per_file/day05/ex02/data/AAPL.csv
  12. 0
      one_exercise_per_file/day05/ex02/readme.md
  13. 7
      one_exercise_per_file/day05/ex03/audit/readme.md
  14. 0
      one_exercise_per_file/day05/ex03/readme.md
  15. 69
      one_exercise_per_file/day05/ex04/audit/readme.md
  16. 44
      one_exercise_per_file/day05/ex04/readme.md
  17. 33
      one_exercise_per_file/day05/readme.md

1600
one_exercise_per_file/day01/ex08/data/winequality-red.csv

File diff suppressed because it is too large.

72
one_exercise_per_file/day01/ex08/data/winequality.names

@@ -0,0 +1,72 @@
Citation Request:
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
[Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
[bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
1. Title: Wine Quality
2. Sources
Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
3. Past Usage:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
In the above reference, two datasets were created, using red and white wine samples.
The inputs include objective tests (e.g. PH values) and the output is based on sensory data
(median of at least 3 evaluations made by wine experts). Each expert graded the wine quality
between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model
these datasets under a regression approach. The support vector machine model achieved the
best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T),
etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity
analysis procedure).
4. Relevant Information:
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables
are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks.
The classes are ordered and not balanced (e.g. there are many more normal wines than
excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent
or poor wines. Also, we are not sure if all input variables are relevant. So
it could be interesting to test feature selection methods.
5. Number of Instances: red wine - 1599; white wine - 4898.
6. Number of Attributes: 11 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of
feature selection.
7. Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
8. Missing Attribute Values: None

10
one_exercise_per_file/day01/ex09/data/model_forecasts.txt

@@ -0,0 +1,10 @@
nan -9.480000000000000426e+00 1.415000000000000036e+01 1.126999999999999957e+01 -5.650000000000000355e+00 3.330000000000000071e+00 1.094999999999999929e+01 -2.149999999999999911e+00 5.339999999999999858e+00 -2.830000000000000071e+00
9.480000000000000426e+00 nan 4.860000000000000320e+00 -8.609999999999999432e+00 7.820000000000000284e+00 -1.128999999999999915e+01 1.324000000000000021e+01 4.919999999999999929e+00 2.859999999999999876e+00 9.039999999999999147e+00
-1.415000000000000036e+01 -1.126999999999999957e+01 nan 1.227999999999999936e+01 -2.410000000000000142e+00 6.040000000000000036e+00 -5.160000000000000142e+00 -3.870000000000000107e+00 -1.281000000000000050e+01 1.790000000000000036e+00
5.650000000000000355e+00 -3.330000000000000071e+00 -1.094999999999999929e+01 nan -1.364000000000000057e+01 0.000000000000000000e+00 2.240000000000000213e+00 -3.609999999999999876e+00 -7.730000000000000426e+00 8.000000000000000167e-02
2.149999999999999911e+00 -5.339999999999999858e+00 2.830000000000000071e+00 -4.860000000000000320e+00 nan -8.800000000000000044e-01 -8.570000000000000284e+00 2.560000000000000053e+00 -7.030000000000000249e+00 -6.330000000000000071e+00
8.609999999999999432e+00 -7.820000000000000284e+00 1.128999999999999915e+01 -1.324000000000000021e+01 -4.919999999999999929e+00 nan -1.296000000000000085e+01 -1.282000000000000028e+01 -1.403999999999999915e+01 1.456000000000000050e+01
-2.859999999999999876e+00 -9.039999999999999147e+00 -1.227999999999999936e+01 2.410000000000000142e+00 -6.040000000000000036e+00 5.160000000000000142e+00 nan -1.091000000000000014e+01 -1.443999999999999950e+01 -1.372000000000000064e+01
3.870000000000000107e+00 1.281000000000000050e+01 -1.790000000000000036e+00 1.364000000000000057e+01 -0.000000000000000000e+00 -2.240000000000000213e+00 3.609999999999999876e+00 nan 1.053999999999999915e+01 -1.417999999999999972e+01
7.730000000000000426e+00 -8.000000000000000167e-02 8.800000000000000044e-01 8.570000000000000284e+00 -2.560000000000000053e+00 7.030000000000000249e+00 6.330000000000000071e+00 1.296000000000000085e+01 nan -1.169999999999999929e+01
1.282000000000000028e+01 1.403999999999999915e+01 -1.456000000000000050e+01 1.091000000000000014e+01 1.443999999999999950e+01 1.372000000000000064e+01 -1.053999999999999915e+01 1.417999999999999972e+01 1.169999999999999929e+01 nan

1
one_exercise_per_file/day02/ex02/data/household_power_consumption.txt

@@ -0,0 +1 @@
Empty file. The original is too big to be pushed to GitHub.

20001
one_exercise_per_file/day02/ex03/data/Ecommerce_purchases.txt

File diff suppressed because it is too large.

151
one_exercise_per_file/day02/ex04/data/iris.csv

@@ -0,0 +1,151 @@
,sepal_length,sepal_width,petal_length,petal_width, flower
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,-3.6,-1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,-4.4,2.9,1400.0,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa
10,5.4,3.7,,0.2,Iris-setosa
11,4.8,3.4,,0.2,Iris-setosa
12,4.8,3.0,,0.1,Iris-setosa
13,4.3,3.0,,0.1,Iris-setosa
14,5.8,4.0,,0.2,Iris-setosa
15,5.7,4.4,,0.4,Iris-setosa
16,5.4,3.9,,0.4,Iris-setosa
17,5.1,3.5,,0.3,Iris-setosa
18,5.7,3.8,,0.3,Iris-setosa
19,5.1,3.8,,0.3,Iris-setosa
20,5.4,3.4,,0.2,Iris-setosa
21,5.1,3.7,,0.4,Iris-setosa
22,4.6,3.6,,0.2,Iris-setosa
23,5.1,3.3,,0.5,Iris-setosa
24,4.8,3.4,,0.2,Iris-setosa
25,5.0,-3.0,,0.2,Iris-setosa
26,5.0,3.4,,0.4,Iris-setosa
27,5.2,3.5,,0.2,Iris-setosa
28,5.2,3.4,,0.2,Iris-setosa
29,4.7,3.2,,0.2,Iris-setosa
30,4.8,3.1,1.6,0.2,Iris-setosa
31,5.4,3.4,1.5,0.4,Iris-setosa
32,5.2,4.1,1.5,0.1,Iris-setosa
33,5.5,4.2,1.4,0.2,Iris-setosa
34,4.9,3.1,1.5,0.1,Iris-setosa
35,5.0,3.2,1.2,0.2,Iris-setosa
36,5.5,3.5,1.3,0.2,Iris-setosa
37,4.9,,1.5,0.1,Iris-setosa
38,4.4,3.0,1.3,0.2,Iris-setosa
39,5.1,3.4,1.5,0.2,Iris-setosa
40,5.0,3.5,1.3,0.3,Iris-setosa
41,4.5,2.3,1.3,0.3,Iris-setosa
42,4.4,3.2,1.3,0.2,Iris-setosa
43,5.0,3.5,1.6,0.6,Iris-setosa
44,5.1,3.8,1.9,0.4,Iris-setosa
45,4.8,3.0,1.4,0.3,Iris-setosa
46,5.1,3809.0,1.6,0.2,Iris-setosa
47,4.6,3.2,1.4,0.2,Iris-setosa
48,5.3,3.7,1.5,0.2,Iris-setosa
49,5.0,3.3,1.4,0.2,Iris-setosa
50,7.0,3.2,4.7,1.4,Iris-versicolor
51,6.4,3200.0,4.5,1.5,Iris-versicolor
52,6.9,3.1,4.9,1.5,Iris-versicolor
53,5.5,2.3,4.0,1.3,Iris-versicolor
54,6.5,2.8,4.6,1.5,Iris-versicolor
55,5.7,2.8,4.5,1.3,Iris-versicolor
56,6.3,3.3,4.7,1600.0,Iris-versicolor
57,4.9,2.4,3.3,1.0,Iris-versicolor
58,6.6,2.9,4.6,1.3,Iris-versicolor
59,5.2,2.7,3.9,,Iris-versicolor
60,5.0,2.0,3.5,1.0,Iris-versicolor
61,5.9,3.0,4.2,1.5,Iris-versicolor
62,6.0,2.2,4.0,1.0,Iris-versicolor
63,6.1,2.9,4.7,1.4,Iris-versicolor
64,5.6,2.9,3.6,1.3,Iris-versicolor
65,6.7,3.1,4.4,1.4,Iris-versicolor
66,5.6,3.0,4.5,1.5,Iris-versicolor
67,5.8,2.7,4.1,1.0,Iris-versicolor
68,6.2,2.2,4.5,1.5,Iris-versicolor
69,5.6,2.5,3.9,1.1,Iris-versicolor
70,5.9,3.2,4.8,1.8,Iris-versicolor
71,6.1,2.8,4.0,1.3,Iris-versicolor
72,6.3,2.5,4.9,1.5,Iris-versicolor
73,6.1,2.8,4.7,1.2,Iris-versicolor
74,6.4,2.9,4.3,1.3,Iris-versicolor
75,6.6,3.0,4.4,1.4,Iris-versicolor
76,6.8,2.8,4.8,1.4,Iris-versicolor
77,6.7,3.0,5.0,1.7,Iris-versicolor
78,6.0,2.9,4.5,1.5,Iris-versicolor
79,5.7,2.6,3.5,1.0,Iris-versicolor
80,5.5,2.4,3.8,1.1,Iris-versicolor
81,5.5,2.4,3.7,1.0,Iris-versicolor
82,5.8,2.7,3.9,1.2,Iris-versicolor
83,6.0,2.7,5.1,1.6,Iris-versicolor
84,5.4,3.0,4.5,1.5,Iris-versicolor
85,6.0,3.4,4.5,1.6,Iris-versicolor
86,6.7,3.1,4.7,1.5,Iris-versicolor
87,6.3,2.3,4.4,1.3,Iris-versicolor
88,5.6,3.0,4.1,1.3,Iris-versicolor
89,5.5,2.5,4.0,1.3,Iris-versicolor
90,5.5,2.6,4.4,1.2,Iris-versicolor
91,6.1,3.0,4.6,1.4,Iris-versicolor
92,5.8,2.6,4.0,1.2,Iris-versicolor
93,5.0,2.3,3.3,1.0,Iris-versicolor
94,5.6,2.7,4.2,1.3,Iris-versicolor
95,5.7,3.0,4.2,1.2,Iris-versicolor
96,5.7,2.9,4.2,1.3,Iris-versicolor
97,6.2,2.9,4.3,1.3,Iris-versicolor
98,5.1,2.5,3.0,1.1,Iris-versicolor
99,5.7,2.8,,1.3,Iris-versicolor
100,,3.3,,2.5,Iris-virginica
101,5.8,2.7,,1.9,Iris-virginica
102,7.1,3.0,,2.1,Iris-virginica
103,6.3,2.9,,1.8,Iris-virginica
104,6.5,3.0,,2.2,Iris-virginica
105,7.6,3.0,6.6,2.1,Iris-virginica
106,4.9,2.5,4.5,1.7,Iris-virginica
107,7.3,2.9,6.3,1.8,Iris-virginica
108,6.7,2.5,5.8,1.8,Iris-virginica
109,7.2,3.6,6.1,2.5,Iris-virginica
110,6.5,3.2,5.1,2.0,Iris-virginica
111,6.4,2.7,5.3,1.9,Iris-virginica
112,6.8,3.0,5.5,2.1,Iris-virginica
113,5.7,2.5,5.0,2.0,Iris-virginica
114,5.8,,5.1,2.4,Iris-virginica
115,6.4,,5.3,2.3,Iris-virginica
116,6.5,,5.5,1.8,Iris-virginica
117,7.7,,6.7,2.2,Iris-virginica
118,7.7,,,2.3,Iris-virginica
119,6.0,,5.0,1.5,Iris-virginica
120,6.9,,5.7,2.3,Iris-virginica
121,5.6,2.8,4.9,2.0,Iris-virginica
122,always,check,the,data,!!!!!!!!
123,6.3,2.7,4.9,1.8,Iris-virginica
124,6.7,3.3,5.7,2.1,Iris-virginica
125,7.2,3.2,6.0,1.8,Iris-virginica
126,6.2,2.8,-4.8,1.8,Iris-virginica
127,,3.0,4.9,1.8,Iris-virginica
128,6.4,2.8,5.6,2.1,Iris-virginica
129,7.2,3.0,5.8,1.6,Iris-virginica
130,7.4,2.8,6.1,1.9,Iris-virginica
131,7.9,3.8,6.4,2.0,Iris-virginica
132,6.-4,2.8,5.6,2.2,Iris-virginica
133,6.3,2.8,,1.5,Iris-virginica
134,6.1,2.6,5.6,1.4,Iris-virginica
135,7.7,3.0,6.1,2.3,Iris-virginica
136,6.3,3.4,5.6,2.4,Iris-virginica
137,6.4,3.1,5.5,1.8,Iris-virginica
138,6.0,3.0,4.8,1.8,Iris-virginica
139,6900,3.1,5.4,2.1,Iris-virginica
140,6.7,3.1,,2.4,Iris-virginica
141,6.9,3.1,5.1,2.3,Iris-virginica
142,580,2.7,5.1,,Iris-virginica
143,6.8,3.2,5.9,2.3,Iris-virginica
144,6.7,3.3,5.7,-2.5,Iris-virginica
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica

152
one_exercise_per_file/day02/ex04/data/iris.data

@@ -0,0 +1,152 @@
sepal_length,sepal_width,petal_length,petal_width, flower
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,-3.6,-1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
-4.4,2.9,1400,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1500,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,-1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,-3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,"3.5",1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3809,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3200,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1600,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,-4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.-4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,"5.1",1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6900,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
580,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,-2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica

37
one_exercise_per_file/day05/ex01/audit/readme.md

@@ -0,0 +1,37 @@
1. This question is validated if the output is:
```console
2010-01-01 0
2010-01-02 1
2010-01-03 2
2010-01-04 3
2010-01-05 4
...
2020-12-27 4013
2020-12-28 4014
2020-12-29 4015
2020-12-30 4016
2020-12-31 4017
Freq: D, Name: integer_series, Length: 4018, dtype: int64
```
The best solution uses `pd.date_range` to generate the index and `range` to generate the integer series.
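A minimal sketch of that construction (variable names are illustrative):
```python
import pandas as pd

# Daily index from 2010-01-01 to 2020-12-31 (4018 dates) and the number
# of days elapsed since the first date as values.
index = pd.date_range(start='2010-01-01', end='2020-12-31', freq='D')
integer_series = pd.Series(range(len(index)), index=index, name='integer_series')
print(integer_series)
```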
2. This question is validated if the output is:
```console
2010-01-01 NaN
2010-01-02 NaN
2010-01-03 NaN
2010-01-04 NaN
2010-01-05 NaN
...
2020-12-27 4010.0
2020-12-28 4011.0
2020-12-29 4012.0
2020-12-30 4013.0
2020-12-31 4014.0
Freq: D, Name: integer_series, Length: 4018, dtype: float64
```
If the `NaN` values have been dropped, the solution is also accepted. The expected solution uses `rolling().mean()`, as sketched below.
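A minimal sketch, rebuilding the series from question 1:
```python
import pandas as pd

index = pd.date_range(start='2010-01-01', end='2020-12-31', freq='D')
integer_series = pd.Series(range(len(index)), index=index, name='integer_series')

# 7-day moving average; the first 6 values are NaN because the window is
# incomplete there (dropping them is also accepted).
moving_average = integer_series.rolling(7).mean()
print(moving_average)
```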

7
one_exercise_per_file/day05/ex01/readme.md

@@ -0,0 +1,7 @@
# Exercise 1
The goal of this exercise is to learn to manipulate time series in Pandas.
1. Create a `Series` named `integer_series` indexed from 1 January 2010 to 31 December 2020. Each date is associated with the number of days elapsed since 1 January 2010, starting at 0.
2. Using Pandas, compute a 7-day moving average **without a for loop**. This transformation smooths the time series by removing small fluctuations.

50
one_exercise_per_file/day05/ex02/audit/readme.md

@@ -0,0 +1,50 @@
Preliminary:
- As usual the first steps are:
- Check missing values and data types
- Convert string dates to datetime
- Set dates as index
- Use `info` or `describe` to have a first look at the data
The exercise is not validated if these steps have not been done.
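A sketch of these preliminary steps (the file path follows the repository layout and is an assumption):
```python
import pandas as pd

df = pd.read_csv('data/AAPL.csv')
df.info()                                  # missing values and data types
df['Date'] = pd.to_datetime(df['Date'])    # string dates -> datetime
df = df.set_index('Date').sort_index()     # dates as index, ascending order
print(df.describe())                       # first look at the data
```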
1. The candlestick chart is based on the Open, High, Low and Close columns, with Date (datetime) as the index. The question is validated as long as the right columns are passed to the Plotly `Candlestick` object, as sketched below.
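A sketch of such a figure, assuming the prepared `df` from the preliminary steps:
```python
import plotly.graph_objects as go

# Candlestick chart built from the Date index and the OHLC columns.
fig = go.Figure(data=[go.Candlestick(
    x=df.index,
    open=df['Open'],
    high=df['High'],
    low=df['Low'],
    close=df['Close'],
)])
fig.show()
```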
2. This question is validated if the output of `print(transformed_df.head().to_markdown())` is:
| Date | Open | Close | Volume | High | Low |
|:--------------------|---------:|---------:|------------:|---------:|---------:|
| 1980-12-31 00:00:00 | 0.136075 | 0.135903 | 1.34485e+09 | 0.161272 | 0.112723 |
| 1981-01-30 00:00:00 | 0.141768 | 0.141316 | 6.08989e+08 | 0.155134 | 0.126116 |
| 1981-02-27 00:00:00 | 0.118215 | 0.117892 | 3.21619e+08 | 0.128906 | 0.106027 |
| 1981-03-31 00:00:00 | 0.111328 | 0.110871 | 7.00717e+08 | 0.120536 | 0.09654 |
| 1981-04-30 00:00:00 | 0.121811 | 0.121545 | 5.36928e+08 | 0.131138 | 0.108259 |
To get this result there are two ways: `resample` and `groupby`. There are two key steps:
- Find how to anchor the aggregation on the last **business** day of each month. This is already implemented in Pandas: the keyword to use, either as the `resample` rule or in the `Grouper`, is `BM`.
- Choose the right aggregation function for each variable. The prices (Open, Close and Adjusted Close) should be aggregated by taking the `mean`. Low should be aggregated by taking the `min`, because it represents the lowest price of the day, and the lowest price of the month is the lowest of the daily lows. The same logic applied to High leads to using the `max`. Volume should be aggregated using the `sum`, because the monthly volume is the sum of the daily volumes. A sketch follows below.
There are **482 months**.
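A sketch of the `resample` variant, assuming the prepared daily `df` (the `Adj Close` column name is an assumption based on the raw file):
```python
# 'BM' anchors each aggregation on the last business day of the month;
# each column gets the aggregation discussed above.
monthly = df.resample('BM').agg({
    'Open': 'mean',
    'Close': 'mean',
    'Adj Close': 'mean',
    'High': 'max',     # highest of the daily highs
    'Low': 'min',      # lowest of the daily lows
    'Volume': 'sum',   # monthly volume = sum of daily volumes
})
print(len(monthly))    # 482 months
```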
3. The solution is accepted if it doesn't involve a for loop and the output is:
```console
Date
1980-12-12 NaN
1980-12-15 -0.047823
1980-12-16 -0.073063
1980-12-17 0.019703
1980-12-18 0.028992
...
2021-01-25 0.049824
2021-01-26 0.003704
2021-01-27 -0.001184
2021-01-28 -0.027261
2021-01-29 -0.026448
Name: Open, Length: 10118, dtype: float64
```
- The first way to compute the return without a for loop is to use `pct_change`.
- The second way is to implement the formula given in the exercise in a vectorized way; to get the value at `t-1`, use `shift`. Both are sketched below.
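Both variants in a short sketch, assuming the prepared daily `df`:
```python
# First way: built-in percentage change between consecutive rows.
returns = df['Open'].pct_change()

# Second way: the same formula written explicitly; shift(1) gives the
# value at t-1 for every t, so no for loop is needed.
returns_manual = (df['Open'] - df['Open'].shift(1)) / df['Open'].shift(1)
```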

10120
one_exercise_per_file/day05/ex02/data/AAPL.csv

File diff suppressed because it is too large.

0
one_exercise_per_file/day05/ex02/readme.md

7
one_exercise_per_file/day05/ex03/audit/readme.md

@@ -0,0 +1,7 @@
1. This question is validated if, without having used a for loop, the outputted DataFrame's shape is `(261, 5)` and your output is the same as the one returned by this line of code:
```python
market_data.loc[market_data.index.get_level_values('Ticker')=='AAPL'].sort_index().pct_change()
```
The DataFrame contains random data, so make sure your output and the one returned by this code are based on the same DataFrame.
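A hypothetical reconstruction of the setup, only to illustrate the vectorized computation (the exercise provides its own random data; every name here is an assumption):
```python
import numpy as np
import pandas as pd

# A (Ticker, Date) MultiIndex DataFrame filled with random data.
dates = pd.date_range('2021-01-01', periods=261, freq='B')
tickers = ['AAPL', 'AMZN', 'FB']
index = pd.MultiIndex.from_product([tickers, dates], names=['Ticker', 'Date'])
market_data = pd.DataFrame(np.random.randn(len(index), 5), index=index,
                           columns=['Open', 'High', 'Low', 'Close', 'Volume'])

# Per-ticker returns without a for loop: pct_change applied inside each
# Ticker group never mixes rows from different tickers.
returns = market_data.groupby(level='Ticker').pct_change()
```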

0
one_exercise_per_file/day05/ex03/readme.md

69
one_exercise_per_file/day05/ex04/audit/readme.md

@@ -0,0 +1,69 @@
Preliminary:
- As usual the first steps are:
- Check missing values and data types
- Convert string dates to datetime
- Set dates as index
- Use `info` or `describe` to have a first look at the data
The exercise is not validated if these steps haven't been done.
My results can be reproduced using `np.random.seed(2712)`. Depending on the version of NumPy used, I do not guarantee the reproducibility of the results; that is why I also explain the steps that lead to the solution.
1. This question is validated if the return is computed as Return(t) = (Price(t+1) - Price(t))/Price(t) and gives this output:
```console
Date
1980-12-12 -0.052170
1980-12-15 -0.073403
1980-12-16 0.024750
1980-12-17 0.029000
1980-12-18 0.061024
...
2021-01-25 0.001679
2021-01-26 -0.007684
2021-01-27 -0.034985
2021-01-28 -0.037421
2021-01-29 NaN
Name: Daily_futur_returns, Length: 10118, dtype: float64
```
The answer is also accepted if the return is computed as in exercise 2 and then shifted into the future using `shift`, but I do not recommend this implementation as it adds missing values!
An example of a solution is:
```python
def compute_futur_return(price):
    # Return(t) = (Price(t+1) - Price(t)) / Price(t)
    return (price.shift(-1) - price) / price

compute_futur_return(df['Adj Close'])
```
Note that if the index is not sorted in ascending order, the computed future return is wrong.
2. This question is validated if the index of the Series is the same as the index of the DataFrame. The data of the Series can be generated using `np.random.randint(0, 2, len(df.index))`.
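A sketch of that generation, assuming the prepared daily `df` (the seed matches the one mentioned in the preliminary note):
```python
import numpy as np
import pandas as pd

np.random.seed(2712)
# 0/1 signal with p=0.5, aligned on the price index.
signal = pd.Series(np.random.randint(0, 2, len(df.index)),
                   index=df.index, name='long_only_signal')
```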
3. This question is validated if the PnL is computed as signal * future return. Both Series should have the same index.
```console
Date
1980-12-12 -0.052170
1980-12-15 -0.073403
1980-12-16 0.024750
1980-12-17 0.029000
1980-12-18 0.061024
...
2021-01-25 0.001679
2021-01-26 -0.007684
2021-01-27 -0.034985
2021-01-28 -0.037421
2021-01-29 NaN
Name: PnL, Length: 10119, dtype: float64
```
4. The question is validated if you computed the return of the strategy as `(Total earned - Total invested) / Total invested`. The result should be close to 0. The formula given can be simplified as `PnLs.sum() / signal.sum()`.
My return is 0.00043546984088551553, because I invested 5147$ and earned 5149$.
5. The question is validated if you replaced the previous signal Series with 1s. Similarly to the previous question, we earned 10128$ and invested 10118$, which leads to a return of 0.00112670194140969 (0.1%).
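A sketch tying questions 1, 3 and 4 together, assuming `df` and `signal` from the sketches above (the `Adj Close` column name is an assumption):
```python
# Question 1: future return on the adjusted close.
futur_returns = (df['Adj Close'].shift(-1) - df['Adj Close']) / df['Adj Close']

# Question 3: PnL of day d = signal * future return (1$ invested when
# the signal is 1, nothing otherwise).
pnl = (signal * futur_returns).rename('PnL')

# Question 4: (total earned - total invested) / total invested; since 1$
# is invested on each day the signal is 1, this is pnl.sum() / signal.sum().
strategy_return = pnl.sum() / signal.sum()
```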

44
one_exercise_per_file/day05/ex04/readme.md

@@ -0,0 +1,44 @@
# Exercise 4 Backtest
The goal of this exercise is to learn to perform a backtest in Pandas. A backtest is a tool that allows you to know how a strategy would have performed retrospectively using historical data. In this exercise we will focus on the backtesting tool and not on how to build the best strategy.
We will backtest a **long only** strategy on Apple Inc. Long only means that we only consider buying the stock. The input signal at date d tells whether the close price will increase at d+1. We assume that the input signal is available before the market closes.
1. Drop the rows with missing values and compute the daily future return on the Apple stock from the adjusted close price. The daily future return means: **Return(t) = (Price(t+1) - Price(t))/Price(t)**.
Some events, such as splits or dividends, artificially change the price of the stock; that is why the close price is adjusted, to avoid outliers in the price data.
2. Create a Series that contains a random boolean (0/1) array with **p=0.5**:
```console
Here is an example of the expected time series:
2010-01-01 1
2010-01-02 0
2010-01-03 0
2010-01-04 1
2010-01-05 0
Freq: D, Name: long_only_signal, dtype: int64
```
- The information in this series should be interpreted this way:
- On 2010-01-01 I receive `1` before the market closes, meaning that, if I trust the signal, the close price of day d+1 will increase; I should buy the stock before the market closes.
- On 2010-01-02 I receive `0` before the market closes, meaning that, if I trust the signal, the close price of day d+1 will not increase; I should not buy the stock.
3. Backtest the signal created in Question 2. Here are some assumptions made to backtest this signal:
- When, at date d, the signal equals 1 we buy 1$ of stock just before the market closes and we sell the stock just before the market closes the next day.
- When, at date d, the signal equals 0, we do not buy anything.
- The profit is not reinvested: when invested, the amount is always 1$.
- Fees are not considered.
**The expected output** is a **Series that gives for each day the return of the strategy. The return of the strategy is the PnL (Profit and Losses) divided by the invested amount**. The PnL for day d is:
`(money earned this day - money invested this day)`
Let's take the example of a 20% return for an invested amount of 1$. The PnL is `(1.2 - 1) = 0.2`. We notice that the PnL when the signal is 1 equals the daily return; the PnL when the signal is 0 is 0.
By convention, we consider that the PnL of day d is attributed to day d and not d+1, even if the underlying return contains the information of d+1.
**The usage of for loop is not allowed**.
4. Compute the return of the strategy. The return of the strategy is defined as: `(Total earned - Total invested) / Total invested`
5. Now the input signal is: **always buy**. Compute the daily PnL and the total PnL. Plot the daily PnL of Q5 and of Q3 on the same plot.
- https://www.investopedia.com/terms/b/backtesting.asp

33
one_exercise_per_file/day05/readme.md

@@ -0,0 +1,33 @@
# D05 Piscine AI - Data Science
The goal of this day is to understand practical usage of Pandas.
Today we will discover some important functionalities of Pandas. They will allow you to manipulate data (DataFrames and Series) in order to clean, delete, add, merge and leverage more information.
This is crucial in Data Science because, without clean data, no algorithm can learn.
Author:
# Table of Contents:
Historical part:
# Introduction
Not only is the pandas library a central component of the data science toolkit, but it is also used in conjunction with other libraries in that collection.
Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
## Historical
## Rules
...
## Resources
- Pandas website
- https://jakevdp.github.io/PythonDataScienceHandbook/
- https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/