
fix: clean description day 1 and 2 of week2

pull/42/head
Badr Ghazlane, 3 years ago
parent commit 5d83c4d776
  1. one_exercise_per_file/week02/day01/readme.md (3 changes)
  2. one_exercise_per_file/week02/day02/readme.md (44 changes)

one_exercise_per_file/week02/day01/readme.md (3 changes)

@@ -21,8 +21,7 @@ We will also learn progressively the Machine Learning methodology for supervised
 - Exercise 2 Linear regression in 1D
 - Exercise 3 Train test split
 - Exercise 4 Forecast diabetes progression
-- Exercise 5 Forecast diabetes progression
-- Bonus: Exercise 6 Gradient Descent - **Optional**
+- Bonus: Exercise 5 Gradient Descent - **Optional**
 ## Virtual Environment

one_exercise_per_file/week02/day02/readme.md (44 changes)

@@ -1,12 +1,9 @@
 # W2D02 Piscine AI - Data Science
-Classification
-## Classification with Scikit Learn
-# Table of Contents:
+The goal of this day is to understand practical classification.
 # Introduction
 Classification
 Today we will learn a different approach in Machine Learning: classification, which is a large domain in the field of statistics and machine learning. Generally, it can be broken down into two areas:
 - **Binary classification**, where we wish to group an outcome into one of two groups.
@@ -25,20 +22,39 @@ Logistic regression steps:
 - Compute sigmoid(size)=0.7 because the sigmoid returns values between 0 and 1
 - Return the class: 0.7 > 0.5 => class 1. Thus, the gender is male
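The thresholding step above can be sketched in a few lines of NumPy. The weight, bias, and `size` value below are made up for illustration; a real logistic regression would learn them from data:

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score to the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model: score = w * size + b (coefficients made up for illustration)
w, b = 0.05, -8.0
size = 185

probability = sigmoid(w * size + b)       # a value between 0 and 1
predicted_class = int(probability > 0.5)  # class 1 if the probability exceeds 0.5
```

With these made-up numbers the probability comes out above 0.5, so the predicted class is 1.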
-More details:
-For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In classification, the MSE loss can't be used because the output of the model is 0 or 1 (for binary classification).
-- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
-The **logloss** or **cross entropy** is the loss used for classification. Similarly, it has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend spending some time reading the related article.
+For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In classification, the MSE loss can't be used because the output of the model is 0 or 1 (for binary classification).
+The **logloss** or **cross entropy** is the loss used for classification. Similarly, it has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend spending some time reading the related article. This article gives a nice example of how it works:
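As a quick illustration of the logloss (a sketch, not part of the exercises), the binary cross entropy can be computed by hand with NumPy and checked against scikit-learn's `log_loss`. The labels and predicted probabilities below are made up:

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up true labels and predicted probabilities of class 1 (binary case)
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])

# Cross entropy by hand: -mean(y*log(p) + (1-y)*log(1-p))
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# scikit-learn computes the same quantity
sk = log_loss(y_true, y_prob)
```

Confident predictions that are wrong (a high probability for the wrong class) are penalized much more heavily than predictions near 0.5, which is what makes this loss well suited to classification.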
+## Exercises of the day
+- Exercise 1 Logistic regression with Scikit-learn
+- Exercise 2 Sigmoid
+- Exercise 3 Decision boundary
+- Exercise 4 Train test split
+- Exercise 5 Breast Cancer prediction
+- Bonus: Exercise 6 Multi-class - **Optional**
-- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-- https://medium.com/swlh/what-is-logistic-regression-62807de62efa
+## Virtual Environment
+- Python 3.x
+- NumPy
+- Pandas
+- Matplotlib
+- Scikit Learn
+- Jupyter or JupyterLab
+## Historical
+*Version of Scikit Learn I used to do the exercises: 0.22*. I suggest using the most recent one. Scikit Learn 1.0 is finally available after ... 14 years.
 ## Rules
+## Ressources
+### Logistic regression
+- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
+### Logloss
+- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-## Ressources
-- https://medium.com/swlh/what-is-logistic-regression-62807de62efa