diff --git a/one_exercise_per_file/week02/day01/readme.md b/one_exercise_per_file/week02/day01/readme.md
index 66c437b..f21043d 100644
--- a/one_exercise_per_file/week02/day01/readme.md
+++ b/one_exercise_per_file/week02/day01/readme.md
@@ -21,8 +21,7 @@ We will also learn progressively the Machine Learning methodology for supervised
 - Exercise 2 Linear regression in 1D
 - Exercise 3 Train test split
 - Exercise 4 Forecast diabetes progression
-- Exercise 5 Forecast diabetes progression
-- Bonus: Exercise 6 Gradient Descent - **Optional**
+- Bonus: Exercise 5 Gradient Descent - **Optional**
 
 ## Virtual Environment
diff --git a/one_exercise_per_file/week02/day02/readme.md b/one_exercise_per_file/week02/day02/readme.md
index 85f97c5..c7d3e5c 100644
--- a/one_exercise_per_file/week02/day02/readme.md
+++ b/one_exercise_per_file/week02/day02/readme.md
@@ -1,12 +1,9 @@
 # W2D02 Piscine AI - Data Science
 
-Classification
+## Classification with Scikit Learn
 
-# Table of Contents:
+The goal of this day is to understand practical classification.
 
-# Introduction
-
-Classification
 
 Today we will learn a different approach in Machine Learning: the classification which is a large domain in the field of statistics and machine learning. Generally, it can be broken down in two areas:
 
 - **Binary classification**, where we wish to group an outcome into one of two groups.
@@ -25,20 +22,39 @@ Logistic regression steps:
 
 - Compute sigmoid(size)=0.7 because the sigmoid returns values between 0 and 1
 - Return the class: 0.7 > 0.5 => class 1. Thus, the gender is male
 
-More details:
+For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In classification, the MSE loss can't be used because the output of the model is 0 or 1 (for binary classification).
 
-- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
+The **logloss** or **cross entropy** is the loss used for classification. It also has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend spending some time reading the related article.
 
-For the linear regression exercises, the loss (Mean Square Error - MSE) is minimized with an algorithm called **gradient descent**. In the classification, the loss MSE can't be used because the output of the model is 0 or 1 (for binary classification).
-The **logloss** or **cross entropy** is the loss used for classification. Similarly, it has some nice mathematical properties. The minimization of the **logloss** is not covered in the exercises. However, since it is used in most machine learning models for classification, I recommend to spend some time reading the related article. This article gives a nice example of how it works:
+## Exercises of the day
+
+- Exercise 1 Logistic regression with Scikit-learn
+- Exercise 2 Sigmoid
+- Exercise 3 Decision boundary
+- Exercise 4 Train test split
+- Exercise 5 Breast Cancer prediction
+- Bonus: Exercise 6 Multi-class - **Optional**
 
-- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-- https://medium.com/swlh/what-is-logistic-regression-62807de62efa
+## Virtual Environment
+
+- Python 3.x
+- NumPy
+- Pandas
+- Matplotlib
+- Scikit Learn
+- Jupyter or JupyterLab
 
-## Historical
+*Version of Scikit Learn I used to do the exercises: 0.22*. I suggest using the most recent one. Scikit Learn 1.0 is finally available after ... 14 years.
 
-## Rules
+## Resources
+
+### Logistic regression
+
+- https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102
+
+### Logloss
+
+- https://towardsdatascience.com/cross-entropy-for-classification-d98e7f974451
-
-## Ressources
\ No newline at end of file
+- https://medium.com/swlh/what-is-logistic-regression-62807de62efa
\ No newline at end of file
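Reviewer note: the sigmoid/threshold steps and the logloss described in the new day02 text can be sketched with plain NumPy. This is only an illustration, not part of the exercises; the raw score `0.847` is a made-up value chosen so that `sigmoid(0.847) ≈ 0.7`, matching the worked example in the README.

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(z, threshold=0.5):
    # Class 1 when the sigmoid output exceeds the threshold, else class 0
    return int(sigmoid(z) > threshold)

def logloss(y_true, y_pred, eps=1e-15):
    # Binary cross entropy; clipping the probabilities avoids log(0)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# sigmoid(0.847) ~= 0.7 > 0.5 => class 1, as in the README example
print(round(sigmoid(0.847), 2))  # 0.7
print(predict_class(0.847))      # 1

# Confident correct probabilities yield a lower logloss than hesitant ones
print(logloss([1, 0], [0.9, 0.1]) < logloss([1, 0], [0.6, 0.4]))  # True
```

This also shows why MSE is not the loss of choice here: the model's output is a probability compared against a threshold, and the logloss penalizes confident wrong predictions much more strongly.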