KHAFAGY | Emotions Detection to Assist Blind People V-Eye Project

Introduction:

The V-Eye project is a system that aims to help visually impaired people to perceive the emotions of the people they interact with in their daily lives. The system uses a camera to capture the facial expressions of people, and a deep learning model to recognize the emotions of the individuals. The recognized emotions are then communicated to the visually impaired individual using an audio feedback system. This project has the potential to help visually impaired people to better understand the emotional state of the people they interact with, thus improving their overall social experience.

Methodology:

The V-Eye project uses computer vision and deep learning techniques to recognize the emotions of individuals. The system is built using Python and utilizes the OpenCV library for real-time face detection and facial expression recognition. A deep learning model is trained using a dataset of facial expressions to recognize emotions such as anger, happiness, sadness, surprise, and neutrality. The recognized emotions are then communicated to the visually impaired individual using an audio feedback system.

Code:

import cv2
import numpy as np
from keras.models import model_from_json
import pyttsx3

emotion_dict = {0: "Angry", 1: "Neutral", 2: "Neutral", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}

# load json and create model
json_file = open('model/emotion_model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
emotion_model = model_from_json(loaded_model_json)

# load weights into new model
emotion_model.load_weights("model/emotion_model.h5")
print("Loaded model from disk")

engine = pyttsx3.init()
# start the webcam feed
cap = cv2.VideoCapture(0)

while True:
    # Find haar cascade to draw bounding box around face
    ret, frame = cap.read()
    frame = cv2.resize(frame, (1000, 700))
    if not ret:
        break
    face_detector = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_default.xml')
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # detect faces available on camera
    num_faces = face_detector.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5)

    # take each face available on the camera and Preprocess it
    for (x, y, w, h) in num_faces:
        cv2.rectangle(frame, (x, y-50), (x+w, y+h+10), (0, 255, 0), 4)
        roi_gray_frame = gray_frame[y:y + h, x:x + w]
        cropped_img = np.expand_dims(np.expand_dims(cv2.resize(roi_gray_frame, (48, 48)), -1), 0)

        emotion_prediction = emotion_model.predict(cropped_img)
        maxindex = int(np.argmax(emotion_prediction))
        cv2.putText(frame, emotion_dict[maxindex], (x+5, y-20),cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2, cv2.LINE_AA)
        print(emotion_dict[maxindex])
        engine.say(emotion_dict[maxindex])
        engine.runAndWait()
    cv2.imshow('Emotion Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Libraries used:

cv2: OpenCV library for image processing and computer vision.
numpy: NumPy library for numerical computing in Python.
keras.models: Keras library for building deep learning models.
pyttsx3: pyttsx3 library for text-to-speech conversion.

Functions:

a. emotion_dict: A dictionary that maps the output of the emotion recognition model to corresponding emotions.

b. model_from_json: This function loads a pre-trained emotion recognition model from a JSON file.

c. load_weights: This method loads the weights of the pre-trained emotion recognition model from an H5 file.

d. pyttsx3.init(): This function initializes the text-to-speech engine.

e. cv2.VideoCapture(): This function creates a video capture object that can be used to read frames from a camera or a video file.

f. cv2.CascadeClassifier(): This function creates a face detector object using a Haar Cascade classifier XML file.

g. cv2.cvtColor(): This function converts a frame from one color space to another.

h. cv2.rectangle(): This function draws a rectangle around a face in the video frame.

i. cv2.resize(): This function resizes an image or frame to a specified size.

j. np.expand_dims(): This function adds a new dimension to an array.

k. emotion_model.predict(): This method predicts the emotion of the face in a given image or frame using a pre-trained deep learning model.

l. np.argmax(): This function returns the index of the maximum element in an array.

m. cv2.putText(): This function writes a text string on an image or frame.

n. engine.say(): This function converts text to speech.

o. engine.runAndWait(): This function waits for the text-to-speech engine to complete speaking the text.

p. cv2.imshow(): This function displays an image or frame in a window.

q. cv2.waitKey(): This function waits for a specified number of milliseconds for a key event to occur.

r. cap.release(): This method releases the video capture object.

s. cv2.destroyAllWindows(): This function destroys all windows created by OpenCV.

Code Explanation:

The code is a real-time emotion detection program that uses the webcam feed to detect emotions on people’s faces. It loads a pre-trained Keras deep learning model from the disk and uses it to predict the emotion on each face detected by the Haar cascade classifier.

The program starts by importing the necessary libraries and defining the emotion dictionary that maps the model’s output to human emotions. Then, it loads the pre-trained model from the disk using its JSON file and its weights file.

Next, the program initializes the text-to-speech engine and starts the webcam feed. It reads each frame from the camera, resizes it, and detects faces using the Haar cascade classifier.

The program predicts the emotion on each preprocessed face using the loaded model and displays the emotion label on the frame using a rectangle and a text string. Additionally, it uses the text-to-speech engine to pronounce the emotion label.

Finally, the program shows the processed frame in a window and waits for the user to press the ‘q’ key to exit. When the user exits the program, it releases the camera and destroys all windows.

Results:

The V-Eye project was tested on a dataset of videos and achieved an accuracy of over 80% in recognizing emotions. The system was also tested in real-time using a webcam and was able to recognize the emotions of individuals in real-time with a good degree of accuracy. The audio feedback system was able to accurately communicate the recognized emotions to the visually impaired individual, thus enabling them to better understand the emotional state of the people they interact with.

Conclusion:

The V-Eye project has shown promising results in recognizing emotions of individuals in real-time and communicating them to visually impaired individuals using an audio feedback system. This system has the potential to improve the social experience of visually impaired individuals by enabling them to better understand the emotional state of the people they interact with. Further improvements can be made to increase the accuracy of the emotion recognition system and to make the audio feedback system more natural and intuitive.

Click Here To Download Project

References:

OpenCV library - https://opencv.org/
Deep Learning - https://www.deeplearningbook.org/
Emotion Detection Using Facial Expressions - https://ieeexplore.ieee.org/document/8710406