
fix: clean description day 1 and 2 of week2

pull/42/head
Badr Ghazlane, 3 years ago
parent commit 5d83c4d776
  1. one_exercise_per_file/week02/day01/readme.md (3 changes)
  2. one_exercise_per_file/week02/day02/readme.md (44 changes)

one_exercise_per_file/week02/day01/readme.md (3 changes)

@@ -21,8 +21,7 @@ We will also learn progressively the Machine Learning methodology for supervised
 - Exercise 2 Linear regression in 1D
 - Exercise 3 Train test split
 - Exercise 4 Forecast diabetes progression
-- Exercise 5 Forecast diabetes progression
-- Bonus: Exercise 6 Gradient Descent - **Optional**
+- Bonus: Exercise 5 Gradient Descent - **Optional**
 ## Virtual Environment

one_exercise_per_file/week02/day02/readme.md (44 changes)

@@ -1,12 +1,9 @@
 # W2D02 Piscine AI - Data Science
-Classification
-## Classification with Scikit Learn
-# Table of Contents:
+The goal of this day is to understand practical classification.
 # Introduction
 Classification
 Today we will learn a different approach in Machine Learning: classification, which is a large domain in the field of statistics and machine learning. Generally, it can be broken down into two areas:
 - **Binary classification**, where we wish to group an outcome into one of two groups.
@@ -25,20 +22,39 @@ Logistic regression steps:
 - Compute sigmoid(size)=0.7 because the sigmoid returns values between 0 and 1
 - Return the class: 0.7 > 0.5 => class 1. Thus, the gender is male
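The thresholding step above can be sketched in a few lines of NumPy. The weight, bias, and `size` value below are made up for illustration; a real logistic regression would learn them from data:

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score to the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model: score = w * size + b (coefficients made up for illustration)
w, b = 0.05, -8.0
size = 185

probability = sigmoid(w * size + b)       # a value between 0 and 1
predicted_class = int(probability > 0.5)  # class 1 if the probability exceeds 0.5
```

With these made-up numbers the probability comes out above 0.5, so the predicted class is 1.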
-More details:
-For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In classification, the MSE loss can't be used because the output of the model is 0 or 1 (for binary classification).
-- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
-The **logloss** or **cross entropy** is the loss used for classification. Similarly, it has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend spending some time reading the related article.
+For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In classification, the MSE loss can't be used because the output of the model is 0 or 1 (for binary classification).
+The **logloss** or **cross entropy** is the loss used for classification. Similarly, it has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend spending some time reading the related article. This article gives a nice example of how it works:
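As a quick illustration of the logloss (a sketch, not part of the exercises), the binary cross entropy can be computed by hand with NumPy and checked against scikit-learn's `log_loss`. The labels and predicted probabilities below are made up:

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up true labels and predicted probabilities of class 1 (binary case)
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])

# Cross entropy by hand: -mean(y*log(p) + (1-y)*log(1-p))
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# scikit-learn computes the same quantity
sk = log_loss(y_true, y_prob)
```

Confident predictions that are wrong (a high probability for the wrong class) are penalized much more heavily than predictions near 0.5, which is what makes this loss well suited to classification.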
+## Exercises of the day
+- Exercise 1 Logistic regression with Scikit-learn
+- Exercise 2 Sigmoid
+- Exercise 3 Decision boundary
+- Exercise 4 Train test split
+- Exercise 5 Breast Cancer prediction
+- Bonus: Exercise 6 Multi-class - **Optional**
-- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-- https://medium.com/swlh/what-is-logistic-regression-62807de62efa
+## Virtual Environment
+- Python 3.x
+- NumPy
+- Pandas
+- Matplotlib
+- Scikit Learn
+- Jupyter or JupyterLab
+## Historical
+*Version of Scikit Learn I used to do the exercises: 0.22*. I suggest using the most recent one. Scikit Learn 1.0 is finally available after ... 14 years.
 ## Rules
+## Ressources
+### Logistic regression
+- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
+### Logloss
+- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-## Ressources
-- https://medium.com/swlh/what-is-logistic-regression-62807de62efa