
chore(emotion-detector): fix small grammar mistakes

pull/1196/merge
nprimo 3 months ago committed by Niccolò Primo
parent
commit
ce8a0b67a8
  1. 26
      subjects/ai/emotions-detector/README.md
  2. 14
      subjects/ai/emotions-detector/audit/README.md

26
subjects/ai/emotions-detector/README.md

@@ -1,15 +1,15 @@
 # Emotions detection with Deep Learning
 Cameras are everywhere. Videos and images have become one of the most interesting data sets for artificial intelligence.
-Image processing is a quite board research area, not just filtering, compression, and enhancement. Besides, we are even interested in the question, “what is in images?”, i.e., content analysis of visual inputs, which is part of the main task of computer vision. The study of computer vision could make possible such tasks as 3D reconstruction of scenes, motion capturing, and object recognition, which are crucial for even higher-level intelligence such as
+Image processing is a quite broad research area, not just filtering, compression, and enhancement. Besides, we are even interested in the question, “what is in images?”, i.e., content analysis of visual inputs, which is part of the main task of computer vision. The study of computer vision could make possible such tasks as 3D reconstruction of scenes, motion capturing, and object recognition, which are crucial for even higher-level intelligence such as
 image and video understanding, and motion understanding.
 For this 2 months project we will focus on two tasks:
-- emotion classfication
+- emotion classification
 - face tracking
-With the computing power exponentially increasing the computer vision field has been developping exponentially. This is a key element because the computer power allows to use more easily a type of neural networks very powerful on images: CNN's (Convolutional Neural Networks). Before the CNN's were democratized, the algorithms used relied a lot on human analysis to extract features which obviously time consuming and not reliable. If you're interested in the "old school methodology" this article explains it: towardsdatascience.com/classifying-facial-emotions-via-machine-learning-5aac111932d3.
-The history behind this field is fascinating ! Here is a short summary of its history https://kapernikov.com/basic-introduction-to-computer-vision/
+With the computing power exponentially increasing the computer vision field has been developing exponentially. This is a key element because the computer power allows using more easily a type of neural networks very powerful on images: CNN's (Convolutional Neural Networks). Before the CNNs were democratized, the algorithms used relied a lot on human analysis to extract features which obviously time-consuming and not reliable. If you're interested in the "old school methodology" this article explains it: towardsdatascience.com/classifying-facial-emotions-via-machine-learning-5aac111932d3.
+The history behind this field is fascinating! Here is a short summary of its history https://kapernikov.com/basic-introduction-to-computer-vision/
 ### Project goal and suggested timeline
@@ -19,7 +19,7 @@ The goal of the project is to implement a **system that detects the emotion on a
 - detect a face in an image
 - train a CNN to detect the emotion on a face
-That is why I suggest to start the project with a preliminary step. The goal of this step is to understand how CNNs work and how to classify images. This preliminary step should take approximately **two weeks**.
+That is why I suggest starting the project with a preliminary step. The goal of this step is to understand how CNNs work and how to classify images. This preliminary step should take approximately **two weeks**.
 Then starts the emotion detection in a webcam video stream step that will last until the end of the project !
@@ -27,9 +27,9 @@ The two steps are detailed below.
 ### Preliminary:
-- Take this lesson. This course is a reference for many reasons and one of them is the creator: **Andrew Ng**. He explains the basics of CNNs but also some more advanced topics as transfer learning, siamese networks etc ... I suggest to focus on Week 1 and 2 and to spend less time on Week 3 and 4. Don't worry the time scoping of such MOOCs are conservative ;-). Here is the link: https://www.coursera.org/learn/convolutional-neural-networks . You can attend the lessons for free !
+- Take this lesson. This course is a reference for many reasons and one of them is the creator: **Andrew Ng**. He explains the basics of CNNs but also some more advanced topics as transfer learning, siamese networks etc ... I suggest to focus on Week 1 and 2 and to spend less time on Week 3 and 4. Don't worry the time scoping of such MOOCs are conservative ;-). Here is the link: https://www.coursera.org/learn/convolutional-neural-networks. You can attend the lessons for free !
-- Participate to this challenge: https://www.kaggle.com/c/digit-recognizer/code . The MNIST dataset is a reference in computer vision. Researchers use it as a benchmark to compare their models. Start first with a logistic regression to understand how to handle images in Python. And then train your first CNN on this data set.
+- Participate in this challenge: https://www.kaggle.com/c/digit-recognizer/code. The MNIST dataset is a reference in computer vision. Researchers use it as a benchmark to compare their models. Start first with a logistic regression to understand how to handle images in Python. And then train your first CNN on this data set.
 ### Face emotions classification
@@ -39,22 +39,22 @@ Your goal is to implement a program that takes as input a video stream that cont
 **Step 1**: **Fit the emotion classifier**
-- Train a CNN on the dataset `train.csv`. Here is an example of architecture you can implement: https://www.quora.com/What-is-the-VGG-neural-network . **The CNN has to perform more than 70% on the test set**. You will see that the CNNs take a lot of time to train. You don't want to overfit the neural network. I strongly suggest to use early stopping, callbacks and to monitor the training using the tensorboard.
+- Train a CNN on the dataset `train.csv`. Here is an example of architecture you can implement: https://www.quora.com/What-is-the-VGG-neural-network. **The CNN has to perform more than 70% on the test set**. You will see that the CNNs take a lot of time to train. You don't want to overfit the neural network. I strongly suggest to use early stopping, callbacks and to monitor the training using the `TensorBoard`.
-You have to save the trained model in `my_own_model.pkl` and to explain the chosen architecture in `my_own_model_architecture.txt`. Use `model.summary())` to print the architecture. It is also expected that you explains the iterations and how you end up choosing your final architecture. Save a screenshot of the tensorboard while the model's training in `tensorboard.png` and save a plot with the learning curves showing the model training and stopping BEFORE the model starts overfitting in `learning_curves.png`.
+You have to save the trained model in `my_own_model.pkl` and to explain the chosen architecture in `my_own_model_architecture.txt`. Use `model.summary())` to print the architecture. It is also expected that you explain the iterations and how you end up choosing your final architecture. Save a screenshot of the `TensorBoard` while the model's training in `tensorboard.png` and save a plot with the learning curves showing the model training and stopping BEFORE the model starts overfitting in `learning_curves.png`.
 - Optional: Use a pre-trained CNN to improve the accuracy. You will find some huge CNN's architecture that perform well. The issue is that it is expensive to train them from scratch. You'll need a lot of GPUs, memory and time. **Pre-trained CNNs** solve partially this issue because they are already trained on a dataset and perform well on some use cases. However, building a CNN from scratch is required, as mentioned, this step is optional and doesn't replace the first one. Similarly, save the model and explain the chosen architecture.
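The early-stopping advice in Step 1 comes down to a simple rule: stop once the validation loss has not improved for a few epochs, and keep the weights from the best epoch. Here is a minimal, framework-free sketch of that rule. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters and the loss values are illustrative assumptions, not something the subject prescribes. In Keras, the `EarlyStopping` callback plays this role during `model.fit`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops at the first epoch where the loss has failed to improve
    by more than `min_delta` for `patience` consecutive epochs.
    Hypothetical helper, written only to illustrate the rule.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best epoch: this is the checkpoint you would keep.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the model starts overfitting after epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]
print(early_stopping_epoch(losses, patience=2))  # → (4, 2)
```

On a learning-curve plot, `best_epoch` is the point where the validation curve bottoms out, which is what `learning_curves.png` is expected to show.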
 **Step 2**: **Classify emotions from a video stream**
-- Use the video stream outputted by your computer's webcam and preprocess it to make it compatible with the CNN you trained. One of the preprocessing steps is: face detection. As you may have seen the training samples are imaged centered on a face. To do so, I suggest to use a pre-trained model to detect faces. OpenCV for image processing tasks where we identify a face from a live webcam feed which is then processed and fed into the trained neural network for emotion detection. The preprocessing pipeline will be corrected with a functional test in `preprocessing_test`:
+- Use the video stream outputted by your computer's webcam and preprocess it to make it compatible with the CNN you trained. One of the preprocessing steps is: face detection. As you may have seen the training samples are imaged centered on a face. To do so, I suggest using a pre-trained model to detect faces. OpenCV for image processing tasks where we identify a face from a live webcam feed which is then processed and fed into the trained neural network for emotion detection. The preprocessing pipeline will be corrected with a functional test in `preprocessing_test`:
   - **Input**: Video stream of 20 sec with a face on it
   - **Output**: 20 (or 21) images cropped and centered on the face with 48 x 48 grayscale pixels
-- Predict at least one emotion per second from the video stream. The minimum requirement is printing in the prompt the predicted emotion with its associated probability. If there's any problem related to the webcam use as input the a recorded video stream.
+- Predict at least one emotion per second from the video stream. The minimum requirement is printing in the prompt the predicted emotion with its associated probability. If there's any problem related to the webcam use as input the recorded video stream.
-For that step, I suggest again to use **OpenCV** as much as possible. This link shows how to work with a video stream with OpenCV. OpenCV documentation may become deprecated in the futur. However, OpenCV will always provide tools to work with video streams, so search on the internet for OpenCV documentation and more specifically "opencv video streams". https://docs.opencv.org/4.x/dd/d43/tutorial_py_video_display.html
+For that step, I suggest again to use **OpenCV** as much as possible. This link shows how to work with a video stream with OpenCV. OpenCV documentation may become deprecated in the future. However, OpenCV will always provide tools to work with video streams, so search on the internet for OpenCV documentation and more specifically "openCV video streams". https://docs.opencv.org/4.x/dd/d43/tutorial_py_video_display.html
 - Optional: **(very cool)** Hack the CNN. Take a picture for which the prediction of your CNN is **Happy**. Now, hack the CNN: using the same image **SLIGHTLY** modified make the CNN predict **Sad**. https://medium.com/@ageitgey/machine-learning-is-fun-part-8-how-to-intentionally-trick-neural-networks-b55da32b7196
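The preprocessing output in Step 2 (one face crop per second of video) can be planned with two small calculations before any OpenCV call: which frames to keep, and which square region around the detected face to cut out. The sketch below is only an illustration. `frames_to_sample` and `square_crop_box` are hypothetical helpers, and it assumes the face detector returns an `(x, y, w, h)` bounding box, as OpenCV's cascade detectors do. The "20 (or 21)" in the subject is plausibly explained by whether the last frame falls exactly on a one-second boundary.

```python
def frames_to_sample(total_frames, fps):
    """Indices of the frames to keep so that one frame per second is processed."""
    step = max(1, round(fps))  # number of frames in one second of video
    return list(range(0, total_frames, step))


def square_crop_box(x, y, w, h, frame_width, frame_height):
    """Square box (left, top, side) centered on a face bounding box.

    The box is shifted when needed so that it stays inside the frame.
    The resulting crop can then be converted to grayscale and resized
    to 48 x 48 to match the training images.
    """
    side = min(max(w, h), frame_width, frame_height)
    center_x = x + w // 2
    center_y = y + h // 2
    left = min(max(center_x - side // 2, 0), frame_width - side)
    top = min(max(center_y - side // 2, 0), frame_height - side)
    return left, top, side


# A 20 second clip at 30 frames per second: 600 frames give 20 samples,
# 601 frames give 21 samples.
print(len(frames_to_sample(600, 30)), len(frames_to_sample(601, 30)))  # → 20 21
print(square_crop_box(100, 50, 80, 120, 640, 480))  # → (80, 50, 120)
```

With OpenCV, the crop itself would be a NumPy slice of the frame followed by a grayscale conversion and a resize. Those calls are left out here so the sketch runs without a webcam or a video file.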
@@ -127,7 +127,7 @@ Preprocessing ...
 ```
-### Useful ressources:
+### Useful resources:
 - https://machinelearningmastery.com/what-is-computer-vision/

14
subjects/ai/emotions-detector/audit/README.md

@@ -4,7 +4,7 @@
 ###### Does the structure of the project is equivalent to the one described in the subject `Delivery` section?
-###### Does the readme file summarize how to run the code and explain the global approach?
+###### Does the README file summarize how to run the code and explain the global approach?
 ###### Does the environment contain all libraries used and their versions that are necessary to run the code?
@@ -20,9 +20,9 @@
 ###### Has the training been stopped early enough to avoid the overfitting?
-###### Does the screenshot show the usage of the tensorboard to monitor the training?
+###### Does the screenshot show the usage of the `TensorBoard` to monitor the training?
-###### Does the text document explain why the architecture was chosen and what were the previous iterations?
+###### Does the text document explain why the architecture was chosen, and what were the previous iterations?
 ###### Does the following command `python ./scripts/predict.py` run without any error and returns an accuracy greater than 70%?
@@ -45,7 +45,7 @@
 ###### Is the image converted to 48 x 48 grayscale pixels' image?
-###### If there's an issue related to the webcam, does the code takes as input a video recorded video stream?
+###### If there's an issue related to the webcam, does the code take as input a video recorded video stream?
 ###### Does the following command `python ./scripts/predict_live_stream.py` run without any error and return the following?
@@ -74,14 +74,14 @@
 #### Hack the CNN - guidelines:
-The neural network trains by updating its weights given the training error. If an image is misclassfied the neural network changes its weight to classify it correctly. The trick is to keep the neural network's weights unchanged and to modify the input pixels in order to force the neural network to predict the wanted class.
+The neural network trains by updating its weights given the training error. If an image is misclassified the neural network changes its weight to classify it correctly. The trick is to keep the neural network's weights unchanged and to modify the input pixels in order to force the neural network to predict the wanted class.
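The trick described in the guidelines (weights frozen, pixels updated) can be tried on a toy model before attempting it on the real CNN. The sketch below is an assumption-heavy illustration, not the expected solution: it swaps the CNN for a tiny linear classifier over four made-up "pixels", with invented weights, and nudges each pixel along the sign of the gradient of the `Sad` score until the prediction flips. For a linear model that gradient with respect to a pixel is simply the pixel's weight. With a real CNN you would get it from your framework's automatic differentiation instead.

```python
import math


def predict_sad_probability(pixels, weights, bias):
    """Probability of `Sad` from a toy linear model (stand-in for the CNN)."""
    logit = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def hack_image(pixels, weights, bias, step=0.01, max_change=0.35):
    """Modify the pixels, never the weights, until the model predicts `Sad`.

    Each pixel stays within `max_change` of its original value and inside
    the valid intensity range [0, 1].
    """
    adversarial = list(pixels)
    for _ in range(round(max_change / step)):
        if predict_sad_probability(adversarial, weights, bias) > 0.5:
            break  # the prediction has flipped to `Sad`
        for i, w in enumerate(weights):
            direction = (w > 0) - (w < 0)  # sign of the gradient for pixel i
            low = max(0.0, pixels[i] - max_change)
            high = min(1.0, pixels[i] + max_change)
            adversarial[i] = min(high, max(low, adversarial[i] + step * direction))
    return adversarial


# Invented weights and image: the model is about 91% sure the image is `Happy`.
weights, bias = [2.0, -3.0, 1.5, -1.0], -1.2
image = [0.3, 0.6, 0.4, 0.5]
hacked = hack_image(image, weights, bias)
print(predict_sad_probability(image, weights, bias) < 0.5)   # → True
print(predict_sad_probability(hacked, weights, bias) > 0.5)  # → True
```

With only four pixels each one has to move a lot to flip the prediction. On a 48 x 48 image the same total push is spread over 2304 pixels, which is why the change to each pixel can stay small.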
 This part is validated if:
 ##### Choose an image from the database that gives more than 90% probability of `Happy`
-###### Does the neural network modifies the input pixels to predict Sad?
+###### Does the neural network modify the input pixels to predict Sad?
-###### Can you recognize easily the chosen image? The modified image is SLIGHTLY changed. It means that you recognise very easily the original image.
+###### Can you recognize easily the chosen image? The modified image is SLIGHTLY changed. It means that you recognize very easily the original image.
 Here are three resources that detail similar approaches:
