mirror of https://github.com/01-edu/Branch-AI.git
Exercise 6 Multi-class (Optional)
The goal of this exercise is to learn to train a classification algorithm on multi-class labelled data. Some algorithms, such as SVM or Logistic Regression, do not natively support multi-class problems (more than 2 classes) in their basic binary formulation. There are approaches that make it possible to use these algorithms on multi-class data. Let's assume we work with 3 classes: A, B and C.
- One-vs-Rest considers 3 binary classification problems: A vs B,C; B vs A,C and C vs A,B. If there are 10 classes, 10 binary classification problems would be fitted.
- One-vs-One considers 3 binary classification problems: A vs B, A vs C and B vs C. If there are 10 classes, 45 binary classification problems would be fitted, since n classes give n(n-1)/2 pairs. Depending on the volume of data, this technique may not scale well.
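The counts quoted in the two approaches above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `one_vs_rest_problems` and `one_vs_one_problems` are hypothetical and not part of the exercise.

```python
from itertools import combinations


def one_vs_rest_problems(classes):
    """One binary problem per class: that class against all the others."""
    classes = list(classes)
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def one_vs_one_problems(classes):
    """One binary problem per unordered pair of classes: n * (n - 1) / 2 in total."""
    return list(combinations(classes, 2))


three = ["A", "B", "C"]
print(one_vs_rest_problems(three))  # A vs (B, C), B vs (A, C), C vs (A, B)
print(one_vs_one_problems(three))   # A vs B, A vs C, B vs C

ten = range(10)
print(len(one_vs_rest_problems(ten)), len(one_vs_one_problems(ten)))  # 10 45
```

With 3 classes both approaches happen to produce 3 problems; the difference only shows up as the number of classes grows.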
More details:
Let's implement the One-vs-Rest approach from LogisticRegression.
Preliminary:
- Import the Iris data set from Scikit-learn
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
X = pd.DataFrame(data=iris['data'], columns=iris.feature_names)
y = pd.DataFrame(data=iris['target'], columns=['target'])
- Using train_test_split, split the data set into a train set and a test set (20%) with shuffle=True and random_state=43.
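The split described above could look like the following minimal sketch. It repeats the loading step so that it runs on its own; the variable names `X_train`, `X_test`, `y_train` and `y_test` are a common convention, not something the exercise imposes.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the data exactly as in the preliminary step.
iris = load_iris()
X = pd.DataFrame(data=iris['data'], columns=iris.feature_names)
y = pd.DataFrame(data=iris['target'], columns=['target'])

# Hold out 20% of the rows for testing, shuffling with a fixed seed
# so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=43
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```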
- Create a function that takes as input the data and returns three trained classifiers.
  - clf0 takes as input a binary data set where class 1 is the original class 0, and class 0 groups the original classes 1 and 2.
  - clf1 takes as input a binary data set where class 1 is the original class 1, and class 0 groups the original classes 0 and 2.
  - clf2 takes as input a binary data set where class 1 is the original class 2, and class 0 groups the original classes 0 and 1.
def train(X_train, y_train):
    # TODO
    return clf0, clf1, clf2
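One possible way to fill in `train` is sketched below; it is not the reference solution. It assumes the labels are the integers 0, 1 and 2, and it sets `max_iter=1000` on `LogisticRegression`, which is an assumption made here to reduce the risk of convergence warnings on unscaled features.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train(X_train, y_train):
    """Fit one binary LogisticRegression per class (One-vs-Rest)."""
    # Flatten the target so it works for a one-column DataFrame or an array.
    y_arr = np.asarray(y_train).ravel()
    classifiers = []
    for label in (0, 1, 2):
        # Binary target: 1 for the current class, 0 for the two others.
        y_binary = (y_arr == label).astype(int)
        # max_iter=1000 is an assumption, not a value given by the exercise.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train, y_binary)
        classifiers.append(clf)
    clf0, clf1, clf2 = classifiers
    return clf0, clf1, clf2


# Usage on the Iris data, prepared as in the preliminary steps.
iris = load_iris()
X = pd.DataFrame(data=iris['data'], columns=iris.feature_names)
y = pd.DataFrame(data=iris['target'], columns=['target'])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=43
)
clf0, clf1, clf2 = train(X_train, y_train)
print(clf0.classes_)  # each classifier only knows the binary labels 0 and 1
```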
- Create a function that takes as input the trained classifiers and the feature set and that returns the predicted class. Use predict_one_vs_all to output the predicted classes on the test set. Compare the results with the Logistic Regression algorithm from scikit-learn used in One-vs-All mode. The results may differ because the solver may not converge. Later this week, we will learn to preprocess the data to avoid convergence issues.
  - clf0 outputs the probability of belonging to class 1, which is the original class 0.
  - clf1 outputs the probability of belonging to class 1, which is the original class 1.
  - clf2 outputs the probability of belonging to class 1, which is the original class 2.
The predicted class is the one that gets the highest probability among the three models.
def predict_one_vs_all(X, clf0, clf1, clf2):
    # TODO
    return classes
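The prediction rule above (keep the class whose binary model gives the highest probability) can be sketched as follows; this is one possible approach, not the reference solution. To stay self-contained, the sketch works on NumPy arrays and trains the three models inline as a stand-in for `train`. The `max_iter=1000` setting and the use of `OneVsRestClassifier` as the scikit-learn One-vs-All baseline are assumptions made here, not choices stated in the exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier


def predict_one_vs_all(X, clf0, clf1, clf2):
    """Return, for each row of X, the class whose binary model is most confident."""
    # Column 1 of predict_proba is the probability of the positive class (label 1).
    probas = np.column_stack(
        [clf.predict_proba(X)[:, 1] for clf in (clf0, clf1, clf2)]
    )
    # The column index of the highest probability is the predicted class.
    classes = probas.argmax(axis=1)
    return classes


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=43
)

# Stand-in for train(): one binary model per class (max_iter is an assumption).
clf0, clf1, clf2 = (
    LogisticRegression(max_iter=1000).fit(X_train, (y_train == k).astype(int))
    for k in (0, 1, 2)
)
classes = predict_one_vs_all(X_test, clf0, clf1, clf2)

# Assumed baseline: scikit-learn's generic One-vs-Rest wrapper.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
print("agreement with scikit-learn:", (classes == ovr.predict(X_test)).mean())
print("accuracy on the test set:", (classes == y_test).mean())
```

Stacking the three probability columns and taking the argmax keeps the function vectorised, so it handles the whole test set in one call.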