W2D01 Piscine AI - Data Science
The goal of this day is to understand linear regression and supervised learning in practice.
Author:
Historical part:
Introduction
The word "regression" was introduced by Sir Francis Galton (a cousin of Charles Darwin) when he studied how height is passed on from parents to their offspring. He was trying to understand why tall individuals in a population tended to have shorter children, closer to the population's average height; hence the term "regression".
Today we will learn a basic algorithm used in supervised learning: linear regression. We will use Scikit-learn, a machine learning library designed to interoperate with the Python libraries NumPy and Pandas.
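As a first taste of the library, here is a minimal sketch of fitting a linear regression with Scikit-learn on NumPy arrays. The toy data (points lying exactly on the line y = 2x + 1) and the variable names are made up for illustration and are not part of the exercises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: four points lying exactly on y = 2x + 1.
# Scikit-learn expects X as a 2D array of shape (n_samples, n_features).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Fit an ordinary least squares linear model.
model = LinearRegression()
model.fit(X, y)

# The learned parameters: one coefficient per feature, plus an intercept.
slope = model.coef_[0]
intercept = model.intercept_

# Predict the target for a new, unseen input x = 4.
prediction = model.predict(np.array([[4.0]]))[0]

print("slope:", slope)
print("intercept:", intercept)
print("prediction for x=4:", prediction)
```

Because the toy points are perfectly aligned, the fitted slope and intercept recover the line used to generate them, up to floating-point error. Real data is noisy, which is why the model has to be evaluated.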
We will also progressively learn the machine learning methodology for supervised learning. Today we will focus on evaluating a machine learning model by splitting the data set into a train set and a test set.
Scikit-learn version used: 0.22.1
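The train/test evaluation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the synthetic data (a noisy line y = 3x + 2), the 80/20 split, and the random seeds are arbitrary choices, not values prescribed by the exercises.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: y = 3x + 2 plus Gaussian noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Hold out 20% of the samples as a test set the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the train set only.
model = LinearRegression().fit(X_train, y_train)

# score() returns the coefficient of determination R^2.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)

print("train size:", X_train.shape[0], "test size:", X_test.shape[0])
print("R^2 on train set:", r2_train)
print("R^2 on test set:", r2_test)
```

The score on the test set is the one to report, since it estimates how the model behaves on data it has not seen. A test score far below the train score is a sign of overfitting.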
Rules
Resources
To start with Scikit-learn
- https://scikit-learn.org/stable/tutorial/basic/tutorial.html
- https://jakevdp.github.io/PythonDataScienceHandbook/05.02-introducing-scikit-learn.html
Machine learning methodology and algorithms
- This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Andrew Ng is a star in the machine learning community. I recommend spending some time during the projects to study some of the algorithms in depth. Note, however, that the course does not use Python. https://www.coursera.org/learn/machine-learning
- https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
Linear Regression
- https://towardsdatascience.com/laymans-introduction-to-linear-regression-8b334a3dab09
- https://towardsdatascience.com/linear-regression-the-actually-complete-introduction-67152323fcf2