fix: clean description of the day md

pull/42/head
Badr Ghazlane 3 years ago
parent
commit
e2c4e44636
  1. 36
      one_exercise_per_file/week01/day01/readme.md
  2. 35
      one_exercise_per_file/week01/day02/readme.md
  3. 52
      one_exercise_per_file/week01/day03/readme.md
  4. 33
      one_exercise_per_file/week01/day04/readme.md
  5. 42
      one_exercise_per_file/week01/day05/readme.md

36
one_exercise_per_file/week01/day01/readme.md

@ -1,27 +1,27 @@
# D01 Piscine AI - Data Science
# W1D01 Piscine AI - Data Science
## NumPy
The goal of this day is to understand practical usage of **NumPy**. **NumPy** is a commonly used Python data analysis package. By using **NumPy**, you can speed up your workflow, and interface with other packages in the Python ecosystem, like scikit-learn, that use **NumPy** under the hood. **NumPy** was originally developed in the mid 2000s, and arose from an even older package called Numeric. This longevity means that almost every data analysis or machine learning package for Python leverages **NumPy** in some way.
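As a minimal sketch of what "speeding up your workflow" can look like, the example below multiplies two arrays element-wise without writing a Python loop. The data and the names `prices`, `quantities`, and `totals` are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical data: unit prices and quantities for five items.
prices = np.array([1.5, 2.0, 3.25, 4.0, 0.75])
quantities = np.array([2, 1, 4, 3, 8])

# Element-wise multiplication runs in compiled code, with no explicit Python loop.
totals = prices * quantities

# Reductions such as sum() are also provided by NumPy.
grand_total = totals.sum()

print(totals)
print(grand_total)
```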
Version of NumPy I used to do the exercises: 1.18.1
## Exercises of the day
I suggest using the most recent one.
Author:
<div style="page-break-after: always"></div>
# Outline: (optional)
- Exercise 1 Your first NumPy array
- Exercise 2 Zeros
- Exercise 3 Slicing
- Exercise 4 Random
- Exercise 5: Split, concatenate, reshape arrays
- Exercise 6: Broadcasting and Slicing
- Exercise 7: NaN
- Exercise 8: Wine
- Exercise 9 Football tournament
A. Introduction
## Virtual Environment
- Python 3.x
- NumPy
- Jupyter or JupyterLab
B. Rules
C. Exercises
## Rules
... Notebook Colabs or Jupyter Notebook
Save one notebook per day or one per exercise. Use markdown to divide your notebook into different exercises.
*Version of NumPy I used to do the exercises: 1.18.1*.
I suggest using the most recent one.
## Resources

35
one_exercise_per_file/week01/day02/readme.md

@ -1,26 +1,31 @@
# D02 Piscine AI - Data Science
# W1D02 Piscine AI - Data Science
Author:
## Pandas
# Table of Contents:
Historical part:
The goal of this day is to understand practical usage of **Pandas**.
As **Pandas** is intensively used in Data Science, other days of the piscine will be dedicated to it.
# Introduction
Not only is the **Pandas** library a central component of the data science toolkit but it is used in conjunction with other libraries in that collection.
The goal of this day is to understand practical usage of Pandas.
As Pandas is intensively used in Data Science, other days of the piscine will be dedicated to it.
Not only is the Pandas library a central component of the data science toolkit but it is used in conjunction with other libraries in that collection.
Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
**Pandas** is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in **Pandas**. Data in **Pandas** is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
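To illustrate the relationship with NumPy, here is a small sketch that moves data between a DataFrame and a NumPy array. The column names and values are hypothetical; `DataFrame.to_numpy()` and applying a NumPy function such as `np.sqrt` to a DataFrame are standard usage.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, for illustration only.
df = pd.DataFrame(
    {"height_cm": [170.0, 182.5, 165.0], "weight_kg": [65.0, 80.0, 58.5]}
)

# The underlying values can be exposed as a plain NumPy array.
values = df.to_numpy()

# NumPy functions work on that array...
col_means = np.mean(values, axis=0)

# ...and many of them can be applied to the DataFrame directly,
# returning a DataFrame with the same labels.
sqrt_df = np.sqrt(df)

print(type(values), values.shape)
print(col_means)
print(sqrt_df)
```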
Most of the topics we will cover today are explained and described with examples in the first resource. The number of exercises is low on purpose: take the time to understand chapter 5 of the resource, even if it is 40 pages long.
The version of Pandas I used is '1.0.1'.
## Exercises of the day
- Exercise 1 Your first DataFrame
- Exercise 2 Electric power consumption
- Exercise 3 E-commerce purchases
- Exercise 4 Handling missing values
## Rules
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab
...
*Version of Pandas I used to do the exercises: 1.0.1*.
I suggest using the most recent one.
## Resources
@ -40,4 +45,4 @@ It contains ALL you need to know about Pandas.
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
- https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html
- https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html

52
one_exercise_per_file/week01/day03/readme.md

@ -1,31 +1,47 @@
# D03 Piscine AI - Data Science
# W1D03 Piscine AI - Data Science
Author:
# Introduction
## Visualizations
While working on a dataset, it is important to check the distribution of the data. For most humans, it is difficult to visualize data in more than 3 dimensions.
Viz is important to understand the data and to show results. We have already seen that there are some basic viz functionalities in Pandas.
Now we'll discover two of the best-known viz libraries in Python:
"Viz" is important to understand the data and to show results. We'll discover three libraries to visualize data in Python. These are some of the most widely used visualization libraries in Python:
- Pandas viz
- Pandas visualization module
- Matplotlib
- Plotly
Pandas viz is convenient: it produces plots quickly and relies on Matplotlib. (Check the Matplotlib documentation, since not all parameters are detailed in the Pandas documentation.)
For more elaborate plots, Matplotlib is necessary.
The goal is to understand the basics of those libraries. You'll have time during the project to master one (or the three) of them.
You may wonder why using one library is not enough. The reason is simple: it depends on the usage.
For example, if you want to check the data quickly, you may want to use the Pandas viz module or Matplotlib.
If you want a custom, more elaborate plot, I suggest using Matplotlib or Plotly.
And if you want to create a very nice, interactive plot, I suggest using Plotly.
## Exercises of the day
- Exercise 1 Pandas plot 1
- Exercise 2 Pandas plot 2
- Exercise 3 Matplotlib 1
- Exercise 4 Matplotlib 2
- Exercise 5 Matplotlib subplots
- Exercise 6 Plotly 1
- Exercise 7 Plotly Box plots
And finally, Plotly is an interactive plotting library.
## Rules
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Plotly
- Jupyter or JupyterLab
I suggest using the most recent version of the packages.
Always include a title, legend, ...
## Resources
https://matplotlib.org/3.3.3/tutorials/index.html
https://towardsdatascience.com/matplotlib-tutorial-learn-basics-of-pythons-powerful-plotting-library-b5d1b8f67596
- https://matplotlib.org/3.3.3/tutorials/index.html
- https://towardsdatascience.com/matplotlib-tutorial-learn-basics-of-pythons-powerful-plotting-library-b5d1b8f67596
https://github.com/rougier/matplotlib-tutorial
https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html
- https://github.com/rougier/matplotlib-tutorial
- https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html

33
one_exercise_per_file/week01/day04/readme.md

@ -1,20 +1,35 @@
# D04 Piscine AI - Data Science
# W1D04 Piscine AI - Data Science
Author:
## Data wrangling with Pandas
# Table of Contents:
Data wrangling is one of the crucial tasks in data science and analysis which includes operations like:
Historical part:
- Data Sorting: To rearrange values in ascending or descending order.
- Data Filtration: To create a subset of available data.
- Data Reduction: To eliminate or replace unwanted values.
- Data Access: To read or write data files.
- Data Processing: To perform aggregation, statistical, and similar operations on specific values.
As explained before, Pandas is an open source library, specifically developed for data science and analysis. It is built upon the NumPy package (to handle numeric data in tabular form) and has built-in data structures to ease the process of data manipulation, also known as data munging or wrangling.
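The operations listed above can be sketched on a tiny, made-up dataset as follows. The CSV content and the column names `city`, `year`, and `sales` are assumptions for illustration; the file is read from an in-memory string so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content; one sales value is missing on purpose.
csv_text = """city,year,sales
Paris,2020,100
Lyon,2020,
Paris,2021,150
Lyon,2021,90
"""

# Data access: read tabular data (here from a string instead of a file).
df = pd.read_csv(io.StringIO(csv_text))

# Data sorting: order rows by sales, largest first (missing values go last).
sorted_df = df.sort_values("sales", ascending=False)

# Data filtration: keep only the rows for one city.
paris = df[df["city"] == "Paris"]

# Data reduction: replace the unwanted missing value with 0.
cleaned = df.fillna({"sales": 0})

# Data processing: aggregate sales per city.
totals = cleaned.groupby("city")["sales"].sum()

print(sorted_df)
print(totals)
```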
Data wrangling, unify source of data ...
## Exercises of the day
# Introduction
- Exercise 1 Concatenate
- Exercise 2 Merge
- Exercise 3 Merge MultiIndex
- Exercise 4 Groupby Apply
- Exercise 5 Groupby Agg
- Exercise 6 Unstack
...
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab
## Resources
*Version of Pandas I used to do the exercises: 1.0.1*.
I suggest using the most recent one.
Pandas website
## Resources
- https://jakevdp.github.io/PythonDataScienceHandbook/

42
one_exercise_per_file/week01/day05/readme.md

@ -1,33 +1,37 @@
# D05 Piscine AI - Data Science
# W1D05 Piscine AI - Data Science
The goal of this day is to understand practical usage of Pandas.
Today we will discover some important functionalities of Pandas. They will allow you to manipulate the data (DataFrame and Series) in order to clean, delete, add, merge and leverage more information.
## Time Series with Pandas
In Data Science this is crucial, because without clean data, algorithms cannot learn.
Time series data are data that are indexed by a sequence of dates or times. Today, you'll learn how to use methods built into Pandas to work with this index. For instance, you'll also learn:
- to resample time series to change the frequency
- to calculate rolling and cumulative values for time series
- to build a backtest
Author:
Time series are used A LOT in finance. You'll learn to evaluate financial strategies using Pandas. It is important to keep in mind that Pandas operations are vectorized. That's why some questions forbid the use of a for loop ;-).
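As a rough sketch of the time series methods mentioned above, the example below resamples a short daily series, computes rolling and cumulative values, and derives daily returns without a for loop. The prices and dates are invented for illustration and this is not the solution to any exercise.

```python
import pandas as pd

# Hypothetical daily prices indexed by date.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 110.0], index=dates)

# Resample: change the frequency from daily to two-day bins, keeping the last price.
two_day = prices.resample("2D").last()

# Rolling value: mean over a window of 3 observations (first two are NaN).
rolling_mean = prices.rolling(window=3).mean()

# Cumulative value: running maximum of the price.
running_max = prices.cummax()

# Vectorized computation of daily returns, with no explicit loop.
returns = prices.pct_change()

print(two_day)
print(rolling_mean)
print(running_max)
print(returns)
```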
# Table of Contents:
## Exercises of the day
Historical part:
- Exercise 1 Series
- Exercise 2 Financial data
- Exercise 3 Multi asset returns
- Exercise 4 Backtest
# Introduction
Not only is the pandas library a central component of the data science toolkit but it is used in conjunction with other libraries in that collection.
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab
Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
*Version of Pandas I used to do the exercises: 1.0.1*.
I suggest using the most recent one.
## Historical
## Rules
...
## Ressources
Pandas website
## Resources
- https://jakevdp.github.io/PythonDataScienceHandbook/
- https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
- https://towardsdatascience.com/different-ways-to-iterate-over-rows-in-a-pandas-dataframe-performance-comparison-dc0d5dcef8fe
