
feat: update project and correction

b.ghazlane · 3 years ago · commit d4872dbe3b · pull/20/head

one_md_per_day_format/projects/project3/project3.md

The two steps are detailed below.
## Face emotion classification
Emotion detection is one of the most researched topics in modern-day machine learning. The ability to accurately detect and identify an emotion opens up numerous doors for advanced human-computer interaction. The aim of this project is to detect up to seven distinct facial emotions in real time. The project runs on top of a Convolutional Neural Network (CNN) built with Keras on a TensorFlow backend in Python. The facial emotions this system can detect and classify are Happy, Sad, Angry, Surprise, Fear, Disgust and Neutral.
Your goal is to implement a program that takes as input a video stream containing a person's face and predicts that person's emotion.
**Step 1**: **Fit the emotion classifier**
- Train a CNN on the dataset `train.csv`. Here is an example of an architecture you can implement: https://www.quora.com/What-is-the-VGG-neural-network . **The CNN has to score more than 70% on the test set**. You will see that CNNs take a lot of time to train, and you don't want to overfit the neural network. I strongly suggest using early stopping, callbacks, and monitoring the training with TensorBoard.
You have to save the trained model in `my_own_model.pkl` and explain the chosen architecture in `my_own_model_architecture.txt`. Use `model.summary()` to print the architecture. You are also expected to explain your iterations and how you ended up choosing your final architecture. Save a screenshot of TensorBoard while the model is training in `tensorboard.png`, and save a plot with the learning curves showing the model training and stopping BEFORE it starts overfitting in `learning_curves.png`.
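As one hedged illustration of this training setup (not the required solution), here is a minimal Keras sketch with early stopping and TensorBoard monitoring. The architecture, the stand-in random data and the log path are assumptions to replace with your own pipeline fed from `train.csv`:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_model(input_shape=(48, 48, 1), n_classes=7):
    # Small VGG-style stack; tune depth and width during your iterations.
    return models.Sequential([
        layers.Conv2D(32, 3, padding="same", activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # copy this output into my_own_model_architecture.txt

cbs = [
    # Stop BEFORE overfitting and keep the best weights.
    callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Run `tensorboard --logdir logs` to monitor the training.
    callbacks.TensorBoard(log_dir="logs"),
]

# Stand-in data with the expected shapes; in the project, load X/y from train.csv.
X = np.random.rand(256, 48, 48, 1).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 7, 256), 7)
model.fit(X, y, validation_split=0.2, epochs=5, callbacks=cbs)
# Deliverable name from the project; newer Keras versions may require a .keras/.h5 extension.
model.save("my_own_model.pkl")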
- Optional: Use a pre-trained CNN to improve the accuracy. You will find some huge CNN architectures that perform well. The issue is that they are expensive to train from scratch: you need a lot of GPUs, memory and time. **Pre-trained CNNs** partially solve this issue because they are already trained on a dataset and perform well on some use cases. However, building a CNN from scratch is still required; as mentioned, this step is optional and doesn't replace the first one. Similarly, save the model and explain the chosen architecture.
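For this optional route, a common pattern is to freeze a pre-trained convolutional base and train only a small classification head on top. The sketch below uses VGG16 from `keras.applications` purely as an illustration; the input handling is an assumption (ImageNet weights expect 3-channel inputs, so the 48 x 48 grayscale frames must be stacked to RGB first):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen ImageNet base; feed it 48 x 48 frames converted to 3 channels.
base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),  # seven emotion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # copy this output into pre_trained_model_architecture.txt
```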
**Step 2**: **Classify emotions from a video stream**
- Use the video stream output by your computer's webcam and preprocess it to make it compatible with the CNN you trained. One of the preprocessing steps is face detection: as you may have seen, the training samples are images centered on a face. I suggest using a pre-trained model to detect faces. Use OpenCV for the image processing: identify a face in the live webcam feed, then process it and feed it into the trained neural network for emotion detection. The preprocessing pipeline will be corrected with a functional test in `preprocessing_test` (a face-detection sketch follows after this list):
- **Input**: a 20-second video stream with a face in it
- **Output**: 20 (or 21) images cropped and centered on the face, 48 x 48 grayscale pixels
- Predict at least one emotion per second from the video stream. The minimum requirement is printing the predicted emotion with its associated probability in the prompt.
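As referenced above, here is a hedged sketch of the face-detection step using OpenCV's bundled Haar cascade. The output file names, the single-face assumption and the sampling logic are illustrative, not the required implementation:

```python
import cv2

# Haar cascade face detector shipped with OpenCV (imported via cv2);
# any other pre-trained detector would also do.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # webcam stream
fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if the driver reports 0
frame_id, saved = 0, 0
while saved <= 20:  # roughly one image per second over a 20 s stream
    ok, frame = cap.read()
    if not ok:
        break
    frame_id += 1
    if frame_id % int(fps):  # keep about one frame per second
        continue
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces[:1]:  # assume a single face per frame
        crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        cv2.imwrite(f"image{saved}.png", crop)  # 48 x 48 grayscale, centered on the face
        saved += 1
cap.release()
```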
Project structure:
```
project
│ environment.yml
└───data
│ │ train.csv
│ │ test.csv
└───results
│ │
│ |───model (free format)
│ │ │ my_own_model.pkl
│ │ │ my_own_model_architecture.txt
│ │ │ tensorboard.png
│ │ │ learning_curves.png
│ │ │ pre_trained_model.pkl (optional)
│ │ │ pre_trained_model_architecture.txt (optional)
│ │
│ |───hack_cnn (free format)
│ │ │ hacked_image.png (optional)
│ │ │ input_image.png
│ │
│ |───preprocessing_test
│ │ │ input_video.mp4 (free format)
│ │ │ image0.png (free format)
│ │ │ image1.png
│ │ │ imagen.png
│ │ │ image20.png
│
|───scripts
│ │ train.py
│ │ predict.py
│ │ predict_live_stream.py
```
- `pre_trained_model_architecture.txt`: architecture and source
- Run **predict.py**; expected output:
```prompt
python predict.py
Accuracy on test set: 72%
```
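A minimal sketch of what `predict.py` might do, assuming a FER-style `test.csv` with an `emotion` label column and a `pixels` column of space-separated grayscale values; the column names and the model path are assumptions to adapt to your own files:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Assumed FER-style layout: 'emotion' label + space-separated 'pixels'.
df = pd.read_csv("data/test.csv")
X = np.stack([
    np.asarray(p.split(), dtype="float32").reshape(48, 48, 1)
    for p in df["pixels"]
]) / 255.0
y = df["emotion"].to_numpy()

model = tf.keras.models.load_model("results/model/my_own_model.pkl")
accuracy = (model.predict(X, verbose=0).argmax(axis=1) == y).mean()
print(f"Accuracy on test set: {accuracy:.0%}")
```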
- Run **predict_live_stream.py**; expected output:
```prompt
python predict_live_stream.py
Reading video stream ...
Preprocessing ...
11:11:11s : Happy , 73%
Preprocessing ...
11:11:12s : Happy , 93%
...
```
- http://ice.dlut.edu.cn/valse2018/ppt/WeihongDeng_VALSE2018.pdf
- https://arxiv.org/pdf/1812.06387.pdf

one_md_per_day_format/projects/project3/project3_correction.md

# Computer vision correction
## CNN emotion classifier
### The first step is validated if:
- the model is trained only on the training set
- the accuracy on the test set is higher than 70%
- the learning curves prove that the model is not overfitting
- the training was stopped early enough to avoid overfitting
- a screenshot shows the usage of TensorBoard
- the text document explains why the architecture was chosen and what the previous iterations were
- **predict.py** runs without any error and returns:
```prompt
python predict.py
Accuracy on test set: 72%
```
### The second step is validated if:
- the preprocessing pipeline takes the webcam video stream as input and saves the preprocessed images in a separate folder
- the preprocessing meets the following requirements:
  - at least one image per second is sampled
  - the face is detected on the image
  - the image is reshaped and centered (the face is at the center of the image)
  - the algorithm that detects the face is imported via cv2
  - the image is converted to a 48 x 48 grayscale image
- the trained model takes as input the preprocessed image and predicts the emotion with the associated probability
- **predict_live_stream.py** runs without any error and returns:
```prompt
python predict_live_stream.py
Reading video stream ...
Preprocessing ...
11:11:11s : Happy , 73%
Preprocessing ...
11:11:12s : Happy , 93%
Preprocessing ...
11:11:13s : Surprise , 71%
Preprocessing ...
11:11:14s : Neutral , 82%
...
Preprocessing ...
11:13:29s : Happy , 63%
```
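To illustrate the expected print format only, a small helper along these lines could produce the timestamped lines above; the label ordering and the model path are assumptions that must match your own training setup:

```python
import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("results/model/my_own_model.pkl")
# Assumed label order; it must match the encoding used at training time.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def report(face_48x48):
    """Print one timestamped prediction for a 48 x 48 grayscale face crop."""
    probs = model.predict(face_48x48[None, ..., None] / 255.0, verbose=0)[0]
    i = int(np.argmax(probs))
    print(f"{time.strftime('%H:%M:%S')}s : {EMOTIONS[i]} , {probs[i]:.0%}")
```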
#### Hack the CNN - explanation:
The neural network trains by updating its weights given the training error. If an image is misclassified, the neural network changes its weights to classify it correctly. The trick is to keep the neural network's weights unchanged and to modify the input pixels in order to force the neural network to predict the wanted class.
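A hedged sketch of that trick: load the trained model, leave its weights frozen, and take small signed gradient steps on the pixels of a Happy image toward the Sad class. The file paths, the class index and the step size are all assumptions; small steps keep the modification barely visible:

```python
import tensorflow as tf

model = tf.keras.models.load_model("results/model/my_own_model.pkl")
SAD = 4  # illustrative index of the Sad class in your label encoding

# Happy input image, 48 x 48 grayscale scaled to [0, 1] (assumed path).
img = tf.keras.utils.load_img("results/hack_cnn/input_image.png",
                              color_mode="grayscale", target_size=(48, 48))
image = tf.Variable(tf.keras.utils.img_to_array(img)[None, ...] / 255.0)

target = tf.one_hot([SAD], depth=7)
loss_fn = tf.keras.losses.CategoricalCrossentropy()

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn(target, model(image))
    grad = tape.gradient(loss, image)
    # Descend the loss w.r.t. the *pixels*; the weights stay untouched.
    image.assign(tf.clip_by_value(image - 0.002 * tf.sign(grad), 0.0, 1.0))

tf.keras.utils.save_img("results/hack_cnn/hacked_image.png", image[0].numpy())
```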
This part is validated if:
- an image from the dataset is found that the model classifies as Happy with more than 90% probability
- the neural network modifies the input pixels to predict Sad
- the modified image is only SLIGHTLY changed, so that the original image is still very easy to recognize
- the predicted class of the modified image is Sad
Here are three resources that detail similar approaches:
- https://github.com/XC-Li/Facial_Expression_Recognition/tree/master/Code/RAFDB
- https://github.com/karansjc1/emotion-detection/tree/master/with%20flask
- https://www.kaggle.com/drbeanesp21/aliaj-final-facial-expression-recognition (simplified)
