Real-time Facial Emotion Recognition using Deep Learning and OpenCV

Zaid Khan
11 min read · Nov 15, 2023


As humans, our faces are expressive canvases, revealing a spectrum of emotions from joy and surprise to sadness and anger. In this article, we will leverage the power of deep learning and OpenCV to build real-time facial emotion recognition, from constructing a convolutional neural network to training the model and connecting it to a webcam.
Along the way, we will also look at the significance of this technology and its applications in enhancing human-computer interaction and emotional intelligence.

Facial emotion recognition sits at the intersection of artificial intelligence and human psychology. The ability to detect facial expressions not only holds immense value for understanding human emotions but also serves as an interesting direction for enhancing various technological applications. By unraveling the subtle nuances of our facial expressions, machines gain a deeper ability to understand human emotions, paving the way for a host of applications across diverse industries.

Project Overview

Building a Convolutional Neural Network:

We’ll start by understanding the complexities of constructing a CNN, the backbone of our facial emotion recognition model. This involves understanding the architecture, layers, and parameters crucial for accurate emotion detection.

Training the Machine Learning Model:

Once the CNN is in place, we’ll then jump into the process of training the machine learning model. This step involves feeding the model with labeled data to enable it to recognize and classify facial expressions accurately.

Real-Time Detection using OpenCV:

The final stage of our project involves implementing real-time facial emotion recognition. Leveraging the OpenCV library, we’ll connect your computer’s camera to the model, enabling it to detect and display emotions in real-time.

Ensure that you have Visual Studio Code (VSCode) and Python installed on your system.

Downloading the Dataset

Go to Kaggle and download the facial expression recognition dataset (48x48 grayscale face images organized into a folder for each of the seven emotions).

This dataset contains the images we will use to train and test our model.
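
After downloading and extracting the archive, your project folder should contain a structure roughly like the one below. The exact folder names can vary between dataset versions, so treat this as a sketch (I renamed the validation folder to test, as mentioned later):

images/
    train/
        angry/
        disgust/
        fear/
        happy/
        neutral/
        sad/
        surprise/
    test/
        angry/
        ... (same seven emotion folders)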

Required modules

Open a folder in VSCode where you would like to save this project. Create a file named requirements.txt and add the modules listed below.

tensorflow
keras
pandas
numpy
jupyter
notebook
tqdm
opencv-contrib-python
scikit-learn

Now open a new terminal and install all the modules using the command

pip install -r requirements.txt

If this doesn’t work, install every module one by one.

pip install tensorflow
pip install keras
......

These are all the modules required to build our convolutional neural network.

Building a convolutional neural network

Use this command in the terminal to open Jupyter Notebook. We will create and train our neural network here.

jupyter notebook

Now click on new and create a new notebook.

Importing Libraries and Modules

from keras.utils import to_categorical
from keras_preprocessing.image import load_img
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
import os
import pandas as pd
import numpy as np

This imports the necessary libraries and modules for building and training a neural network. We use Keras for building the model, pandas for handling data in tabular form, and the other modules for various supporting functionality.

Data Preparation

TRAIN_DIR = 'images/train'
TEST_DIR = 'images/test'

These lines define the directories where the training and testing images are located. (I renamed the validation folder to test.)

def createdataframe(dir):
    image_paths = []
    labels = []
    for label in os.listdir(dir):
        for imagename in os.listdir(os.path.join(dir, label)):
            image_paths.append(os.path.join(dir, label, imagename))
            labels.append(label)
        print(label, "completed")
    return image_paths, labels

This function collects the file paths of the images and their corresponding labels, organizing them into two lists that we then load into a pandas DataFrame for further processing.

Parameters:
dir: The input directory containing subdirectories for each label (e.g., 'angry', 'happy').

Variables:
image_paths: An empty list to store the full paths of the image files.
labels: An empty list to store the corresponding labels for each image.

Iteration:
The function iterates over each subdirectory (label) in the given directory (os.listdir(dir)). Within each label’s directory, it iterates over each image file (os.listdir(os.path.join(dir, label))).

Image Path and Label Collection:
For each image, the full path is constructed using os.path.join() and added to the image_paths list.
The label of the image is added to the labels list.

Print Statement:
After processing all images in a label, it prints that the label is completed.

Return:
The function returns the lists of image_paths and labels containing information about all the images in the given directory.

train = pd.DataFrame()
train['image'], train['label'] = createdataframe(TRAIN_DIR)
print(train)

It creates the training DataFrame using the createdataframe function.

test = pd.DataFrame()
test['image'], test['label'] = createdataframe(TEST_DIR)
print(test)

Similarly, it creates the testing DataFrame.

Data Preprocessing

from tqdm.notebook import tqdm

This line imports the tqdm module for displaying a progress bar.

def extract_features(images):
    features = []
    for image in tqdm(images):
        img = load_img(image, grayscale=True)
        img = np.array(img)
        features.append(img)
    features = np.array(features)
    features = features.reshape(len(features), 48, 48, 1)
    return features

This function loads each image and converts it into a NumPy array, displaying a tqdm progress bar as it goes.

Parameters:
images:
A list containing the file paths of images for which features need to be extracted.

Variables:
features:
An empty list to store the extracted features.

Iteration:
The function iterates over each image path in the given list of images.

Feature Extraction:
For each image, it loads the image using load_img from Keras with grayscale=True.
Converts the image to a NumPy array.
Appends the image to the features list.

Conversion and Reshaping:
Converts the list of features to a NumPy array.
Reshapes the array to match the required input shape for the model (48x48x1).

Return:
The function returns the extracted features in the form of a NumPy array.

Extraction:

train_features = extract_features(train['image'])
test_features = extract_features(test['image'])

Feature Extraction Function:
The extract_features function is called for both the training and testing datasets, passing the lists of image file paths (train['image'] and test['image']). As described above, each image is loaded in grayscale, converted to a NumPy array, and the stacked array is reshaped to the model's expected input shape (48x48x1).

Result:
train_features and test_features now contain the extracted features from the training and testing datasets, respectively.

Normalization:

x_train = train_features / 255.0
x_test = test_features / 255.0

Each pixel value in the images (ranging from 0 to 255) is divided by 255.0.
This operation scales the pixel values to the range of 0 to 1.

Importance of Normalization:
Normalization is a common preprocessing step in neural network training.
It ensures that the input features are on a similar scale, preventing certain features from dominating the learning process due to their larger magnitude.

Result:
x_train and x_test now contain the normalized feature values for the training and testing datasets, respectively.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(train['label'])
y_train = le.transform(train['label'])
y_test = le.transform(test['label'])
y_train = to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)

This encodes the string labels as integers using LabelEncoder and then converts them into one-hot (categorical) format with to_categorical.
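
As a quick sanity check, you can print the classes the encoder learned. LabelEncoder sorts labels alphabetically, so assuming the seven standard emotion folders, the mapping lines up with the labels list we use later when decoding predictions:

print(le.classes_)
# Expected alphabetical order (assuming the standard seven folders):
# ['angry' 'disgust' 'fear' 'happy' 'neutral' 'sad' 'surprise']
# i.e. 'angry' -> 0, 'disgust' -> 1, ..., 'surprise' -> 6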

Building the Neural Network Model

# sequential model
model = Sequential()

# convolutional layers
model.add(Conv2D(128, kernel_size=(3,3), activation='relu', input_shape=(48,48,1)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(256, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(512, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(512, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

# flattening
model.add(Flatten())

# fully connected layers
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))

# output layer
model.add(Dense(7, activation='softmax'))

# model compilation
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Sequential Model:
The Sequential class is used to create a linear stack of layers for building the model layer by layer.

Convolutional Layers:

  • Four sets of convolutional layers are added.
  • Each set consists of a Conv2D layer followed by a MaxPooling2D layer and a Dropout layer.
  • The number of filters (feature detectors) increases in each set, capturing hierarchical features.

Flattening and Fully Connected Layers:

  • The Flatten layer is added to transform the 2D array into a vector.
  • Two fully connected (Dense) layers follow, each with ReLU activation.
  • Dropout layers are added to prevent overfitting.

Output Layer:

  • The final layer is a Dense layer with 7 neurons (equal to the number of classes) and a softmax activation function, suitable for multi-class classification.

Compilation:

  • The model is compiled with the Adam optimizer, categorical crossentropy loss (appropriate for multi-class classification), and accuracy as the evaluation metric.

This code defines a convolutional neural network (CNN) using the Keras Sequential API. It includes convolutional layers, max-pooling layers, dropout layers, and fully connected layers.
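
If you want to double-check the architecture before training, a quick call to model.summary() prints every layer along with its output shape and parameter count:

model.summary()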

Training the Model

model.fit(x=x_train, y=y_train, batch_size=128, epochs=100, validation_data=(x_test, y_test))

x_train: The input data for training, which consists of the features (images) of the training set.

y_train: The target (label) data for training, which contains the corresponding labels for the training set.

batch_size: The number of samples used in each iteration during training. In this case, it’s set to 128. This parameter controls the number of training samples utilized in one iteration.

epochs: The number of times the entire training dataset is passed forward and backward through the neural network. Here, it’s set to 100, meaning the model will see the entire dataset 100 times during training.

validation_data: A tuple containing the validation data, which is used to evaluate the model after each epoch. It consists of validation features (x_test) and validation labels (y_test).

This function will train the model on the provided training data (x_train and y_train) and validate it on the specified validation data after each epoch. The training process aims to optimize the model’s parameters to minimize the defined loss function, making predictions more accurate. The validation data helps assess the model’s generalization performance on unseen data.

Training took me around 5 hours, and the model reached about 62% accuracy.
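
Since training takes hours, it can be worth saving the best weights as you go rather than only at the end. Below is a minimal sketch using Keras's ModelCheckpoint callback; this is an optional addition, and the filename best_emotiondetector.h5 is just an example:

from keras.callbacks import ModelCheckpoint

# Save the model whenever validation accuracy improves (example filename)
checkpoint = ModelCheckpoint("best_emotiondetector.h5",
                             monitor='val_accuracy',
                             save_best_only=True)

model.fit(x=x_train, y=y_train, batch_size=128, epochs=100,
          validation_data=(x_test, y_test), callbacks=[checkpoint])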

Saving the Model

model_json = model.to_json()
with open("emotiondetector.json", 'w') as json_file:
    json_file.write(model_json)
model.save("emotiondetector.h5")

This saves the trained model's architecture to a JSON file and the full model (including its weights) to an H5 file.

Loading the Model and Making Predictions

from keras.models import model_from_json
json_file = open("emotiondetector.json", "r")
model_json = json_file.read()
json_file.close()
model = model_from_json(model_json)
model.load_weights("emotiondetector.h5")

open("emotiondetector.json", "r"): This line opens the JSON file containing the model architecture in read mode ("r"). The file is assumed to be named "emotiondetector.json".

model_json = json_file.read(): Reads the content of the JSON file and stores it in the variable model_json.

json_file.close(): Closes the opened JSON file.

model_from_json(model_json): This Keras function creates a model from the architecture described in the loaded JSON string (model_json). It constructs the model without any weights.

model.load_weights("emotiondetector.h5"): Loads the pre-trained weights into the model from the H5 file named "emotiondetector.h5". These weights were saved during the training phase.
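
As an aside, because model.save("emotiondetector.h5") stored the full model (architecture plus weights), you could also reload everything in a single call instead of combining the JSON and H5 files:

from keras.models import load_model

# Alternative: the H5 file saved earlier already contains the architecture and weights
model = load_model("emotiondetector.h5")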

Making Predictions on Test Images

label = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
def ef(image):
    img = load_img(image, grayscale=True)
    feature = np.array(img)
    feature = feature.reshape(1, 48, 48, 1)
    return feature / 255.0

label: A list containing emotion labels corresponding to the model’s output classes.

def ef(image): This line defines a function named ef that takes an image file path as an argument.

img = load_img(image, grayscale=True): Loads the image using Keras’s load_img function with the grayscale=True argument, indicating that the image should be loaded in grayscale. The resulting image (img) is a Keras image object.

feature = np.array(img): Converts the image object to a NumPy array, making it suitable for further processing.

feature = feature.reshape(1, 48, 48, 1): Reshapes the array to match the expected input shape of the neural network model. The shape (1, 48, 48, 1) suggests a single image with dimensions 48x48 and one channel (grayscale).

return feature / 255.0: Normalizes the pixel values of the image to be in the range [0, 1] by dividing each pixel value by 255.0. Neural networks often perform better when input data is normalized.

This function takes an image file path, processes it to meet the input requirements of the model, and returns the normalized feature that can be fed into the model for prediction.

image = 'images/train/sad/42.jpg'
print("original image is of sad")
img = ef(image)
pred = model.predict(img)
pred_label = label[pred.argmax()]
print("model prediction is ", pred_label)

This makes a prediction on a sample image and prints the predicted label.

Visualizing Predictions

import matplotlib.pyplot as plt
%matplotlib inline

import matplotlib.pyplot as plt: Imports the matplotlib library’s pyplot module and aliases it as plt. pyplot is a collection of functions that make matplotlib work like MATLAB.

%matplotlib inline: This is a Jupyter magic command that allows the visualizations generated by matplotlib to be displayed directly in the Jupyter Notebook, rather than in a separate window.

image = 'images/train/sad/42.jpg'
print("original image is of sad")
img = ef(image)
pred = model.predict(img)
pred_label = label[pred.argmax()]
print("model prediction is ", pred_label)
plt.imshow(img.reshape(48, 48), cmap='gray')

In this segment of the code, we select an image from the training set labeled as “sad” and preprocess it using the ef function. The trained model is then employed to predict the emotion associated with the image, and the result is displayed. The model's prediction is printed, indicating the emotion it believes the image expresses. Additionally, the original image is visualized in grayscale using matplotlib. This process provides a practical illustration of the model's effectiveness in recognizing and categorizing emotions in facial expressions.

Output of the above block of code

original image is of sad
1/1 [==============================] - 0s 55ms/step
model prediction is sad
<matplotlib.image.AxesImage at 0x16abfe14e80>

Here the model correctly recognizes the emotion as labeled in the dataset 😁. Let's try another image.

image = 'images/train/fear/2.jpg'
print("original image is of fear")
img = ef(image)
pred = model.predict(img)
pred_label = label[pred.argmax()]
print("model prediction is ",pred_label)
plt.imshow(img.reshape(48,48),cmap='gray')

Output

original image is of fear
1/1 [==============================] - 0s 31ms/step
model prediction is sad
<matplotlib.image.AxesImage at 0x16abfe99060>

Here the model predicts the wrong emotion, so the prediction is incorrect 😖.

This is how you can test how well the model is working.
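
Spot-checking individual images is useful, but to measure overall performance you can evaluate the model on the entire test set using the x_test and y_test arrays prepared earlier:

loss, accuracy = model.evaluate(x_test, y_test)
print("test loss:", loss)
print("test accuracy:", accuracy)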

Making the model real time using OpenCV

import cv2
from keras.models import model_from_json
import numpy as np

# Load the pre-trained model architecture from JSON file
json_file = open("emotiondetector.json", "r")
model_json = json_file.read()
json_file.close()
model = model_from_json(model_json)

# Load the pre-trained model weights
model.load_weights("emotiondetector.h5")

# Load the Haar cascade classifier for face detection
haar_file = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(haar_file)

# Define a function to extract features from an image
def extract_features(image):
    feature = np.array(image)
    feature = feature.reshape(1, 48, 48, 1)
    return feature / 255.0

# Open the webcam (camera)
webcam = cv2.VideoCapture(0)

# Define labels for emotion classes
labels = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy', 4: 'neutral', 5: 'sad', 6: 'surprise'}

while True:
    # Read a frame from the webcam
    ret, im = webcam.read()
    if not ret:
        break

    # Convert the frame to grayscale
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    # Detect faces in the grayscale frame
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    try:
        # For each detected face, perform facial emotion recognition
        for (p, q, r, s) in faces:
            # Extract the region of interest (ROI) which contains the face
            image = gray[q:q + s, p:p + r]

            # Draw a rectangle around the detected face
            cv2.rectangle(im, (p, q), (p + r, q + s), (255, 0, 0), 2)

            # Resize the face image to the required input size (48x48)
            image = cv2.resize(image, (48, 48))

            # Extract features from the resized face image
            img = extract_features(image)

            # Make a prediction using the trained model
            pred = model.predict(img)

            # Get the predicted label for emotion
            prediction_label = labels[pred.argmax()]

            # Display the predicted emotion label near the detected face
            cv2.putText(im, f'Emotion: {prediction_label}', (p - 10, q - 10),
                        cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, (0, 0, 255))

        # Display the frame with annotations in real-time
        cv2.imshow("Real-time Facial Emotion Recognition", im)

        # Break the loop if the 'Esc' key is pressed
        if cv2.waitKey(1) == 27:
            break

    except cv2.error:
        pass

# Release the webcam and close all OpenCV windows
webcam.release()
cv2.destroyAllWindows()
  • The code uses OpenCV to capture frames from the webcam and detect faces in real-time.
  • For each detected face, the script performs facial emotion recognition using the pre-trained model.
  • The recognized emotion label is displayed near each detected face.
  • The loop continues until the ‘Esc’ key is pressed, at which point the webcam is released, and OpenCV windows are closed.
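
To run this outside of the notebook, you can save the code above as a standalone script (for example realtimedetection.py; the name is just an example) and launch it from the terminal:

python realtimedetection.py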

This was a project that I replicated with the intent of learning how to build neural networks.
Credits > https://www.youtube.com/watch?v=aoCIoumbWQY
Entire Code on GitHub: https://github.com/kumarvivek9088/Face_Emotion_Recognition_Machine_Learning/blob/main/trainmodel.ipynb
