AI-Enhanced Multi-Modal Emotion and Personalized Responding System for Undergraduates

Authors

  • Chamoda Karunathilaka, Sri Lanka Institute of Information Technology
  • Dilun De Vass Gunawardane
  • Tharaka Athuluwage
  • Chamalka Marasinghe
  • Anjana Junius Vidanaralage
  • Harinda Fernando

DOI:

https://doi.org/10.33022/ijcs.v14i3.4884

Abstract

Undergraduate students face increasing academic and personal pressures, often leading to stress and emotional distress. Traditional single-modal emotion recognition systems, which rely solely on facial or vocal analysis, struggle with accuracy due to environmental variation and limited contextual awareness. This research proposes a multi-modal, AI-driven emotion recognition system that integrates facial and vocal data for enhanced real-time emotion detection and response. The system leverages Vision Transformers (ViTs) for facial feature extraction and Mel-Frequency Cepstral Coefficients (MFCCs) for speech-based emotion analysis, improving classification through confidence-weighted temporal fusion. Additionally, an adaptive response generation module uses natural language processing (NLP) and text-to-speech (TTS) synthesis for human-like interactions. To enable scalable mobile deployment, the model is optimized with quantized lightweight transformers, achieving sub-300 ms inference latency. Bias mitigation techniques ensure fairness across demographic groups. This research contributes to affective computing, human-computer interaction, and AI-driven emotional intelligence, offering a scalable and ethically responsible solution for virtual counseling, AI-assisted tutoring, and mental health support.
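
The abstract names confidence-weighted temporal fusion but does not specify the rule. The sketch below shows one plausible reading: each branch (facial ViT, MFCC-based vocal model) emits per-timestep softmax probabilities, each timestep is weighted by its confidence (taken here as the maximum class probability), and the pooled modality vectors are then combined in proportion to their confidence. The function name, label set, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # illustrative label set

def fuse_confidence_weighted(face_probs, voice_probs):
    """Confidence-weighted temporal fusion (illustrative sketch).

    face_probs, voice_probs: arrays of shape (T, C) holding per-timestep
    softmax outputs from the facial and vocal branches over one window.
    """
    def temporal_pool(probs):
        conf = probs.max(axis=1, keepdims=True)   # per-step confidence
        weights = conf / conf.sum()               # normalize over the window
        return (weights * probs).sum(axis=0)      # pooled (C,) vector

    face_vec = temporal_pool(np.asarray(face_probs, dtype=float))
    voice_vec = temporal_pool(np.asarray(voice_probs, dtype=float))

    # Weight each modality by the confidence of its pooled prediction.
    w_face, w_voice = face_vec.max(), voice_vec.max()
    fused = (w_face * face_vec + w_voice * voice_vec) / (w_face + w_voice)
    return EMOTIONS[int(fused.argmax())], fused

# Example: three video frames and three audio windows over the same interval.
face = [[0.1, 0.7, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1], [0.1, 0.8, 0.05, 0.05]]
voice = [[0.25, 0.25, 0.25, 0.25], [0.2, 0.3, 0.3, 0.2], [0.2, 0.4, 0.2, 0.2]]
label, scores = fuse_confidence_weighted(face, voice)
print(label, np.round(scores, 3))
```

Under this reading, a confidently predicting modality (here the facial branch) dominates the fused decision, while an uncertain one (near-uniform vocal probabilities) contributes less, which is the behavior a confidence-weighted scheme is meant to provide under noisy conditions.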

Published

09-06-2025