mirror of https://github.com/01-edu/Branch-AI.git
Exercise 4 Classification
The goal of this exercise is to learn to evaluate a machine learning model using many classification metrics.
Preliminary:
- Import the Breast Cancer data set and split it into a train set and a test set (20%). Fit a logistic regression on the train set. The goal is to focus on the metrics, which is why the code to fit the logistic regression is given.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
classifier = LogisticRegression()
classifier.fit(X_train_scaled, y_train)
- Predict on the train set and test set.
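The prediction step can be sketched as follows. This is a minimal sketch, not the reference solution: it repeats the given fitting code so it runs on its own, and the names `X_test_scaled`, `y_pred_train`, `y_pred_test` and the `random_state=0` argument are assumptions added for illustration. Because the classifier was fitted on scaled features, the test set is transformed with the scaler that was fitted on the train set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# random_state is an assumption (not in the exercise) to make the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the mean and variance learned on the train set; do not refit on the test set.
X_test_scaled = scaler.transform(X_test)

classifier = LogisticRegression()
classifier.fit(X_train_scaled, y_train)

# Predicted class labels (0 or 1) for both splits.
y_pred_train = classifier.predict(X_train_scaled)
y_pred_test = classifier.predict(X_test_scaled)

print(y_pred_train[:10])
print(y_pred_test[:10])
```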
- Compute the F1, accuracy, precision, recall and ROC AUC scores on the train set and test set. Print the confusion matrix of the test set results.
Note: the ROC AUC should be computed from predicted probabilities (or decision scores), not from predicted class labels.
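One possible way to compute these scores is sketched below, under a few assumptions: `compute_scores` is a hypothetical helper introduced only for this example, `random_state=0` is added for reproducibility, and the setup code is repeated so the block runs on its own. The class-based metrics use the output of `predict`, while `roc_auc_score` receives the probability of the positive class from `predict_proba`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# random_state is an assumption (not in the exercise) to make the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

classifier = LogisticRegression()
classifier.fit(X_train_scaled, y_train)


def compute_scores(model, features, y_true):
    """Hypothetical helper: return the five requested scores as a dict."""
    y_pred = model.predict(features)               # hard class labels
    y_proba = model.predict_proba(features)[:, 1]  # probability of class 1
    return {
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_proba),
    }


train_scores = compute_scores(classifier, X_train_scaled, y_train)
test_scores = compute_scores(classifier, X_test_scaled, y_test)
for split_name, scores in (("train", train_scores), ("test", test_scores)):
    for metric_name, value in scores.items():
        print(f"{split_name} {metric_name}: {value:.4f}")

# Rows are true classes, columns are predicted classes.
test_confusion = confusion_matrix(y_test, classifier.predict(X_test_scaled))
print(test_confusion)
```

The exact values depend on the random split, so they will differ from run to run if `random_state` is removed.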
- Plot the ROC curve on the test set using roc_curve from scikit-learn. There are many ways to create this plot. It should look like this:
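A plot of this kind could be produced along the following lines. This is only one of many possible approaches and makes several assumptions not found in the exercise: matplotlib is used for plotting, the figure is saved to a file named `roc_curve.png` instead of being displayed, the `Agg` backend is selected so the script also runs without a display, and `random_state=0` fixes the split.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# random_state is an assumption (not in the exercise) to make the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

classifier = LogisticRegression()
classifier.fit(X_train_scaled, y_train)

# roc_curve needs scores, so use the probability of the positive class.
y_proba_test = classifier.predict_proba(X_test_scaled)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_proba_test)
roc_auc = auc(fpr, tpr)  # area under the ROC curve (trapezoidal rule)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance level")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve on the test set")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png")  # hypothetical output file name
```

The dashed diagonal marks a classifier that guesses at random; the further the curve bends toward the top-left corner, the better the model separates the two classes.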