Deep Learning Framework for Person Classification in Video Streams Using Yolo8, EfficientNet-B7, GRU, and Attention Mechanism


Shatha Talib Rashid, Hasanen S. Abdullah

Abstract

This paper introduces an end-to-end deep learning framework for analyzing student behavior in classroom environments through automated video processing. The system first splits classroom video into individual frames and filters them with image hashing to eliminate redundant frames. A GRU-based module then preserves temporal coherence among the selected frames. YOLOv8 detects the people in these frames, and the cropped person images are used to build a labeled dataset. For feature extraction, the framework uses EfficientNet-B7, a pre-trained CNN known for its high accuracy and computational efficiency. GRU layers model temporal dependencies, while an attention mechanism emphasizes critical behavioral sequences. These modules are integrated into a unified classification network. In experiments on the Techno CS dataset, the model classifies student behavior into four distinct categories with 95% validation accuracy, indicating the robustness of the architecture and its potential for real-time implementation in smart classroom settings.
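
Two steps in the abstract, hash-based frame filtering and attention over a frame sequence, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract names neither the hashing algorithm nor the form of attention, so the sketch assumes an average hash and dot-product softmax attention. The function names, the `max_distance` threshold, and the `query` vector are hypothetical, and frames are assumed to be small grayscale grids that have already been downscaled.

```python
import math

def average_hash(frame):
    """Average hash (assumed variant): one bit per pixel, 1 if above the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def drop_redundant_frames(frames, max_distance=0):
    """Return indices of frames whose hash differs from the last kept frame's
    hash by more than max_distance bits (hypothetical threshold)."""
    kept = []
    last_hash = None
    for index, frame in enumerate(frames):
        h = average_hash(frame)
        if last_hash is None or hamming(h, last_hash) > max_distance:
            kept.append(index)
            last_hash = h
    return kept

def attention_pool(features, query):
    """Softmax attention over per-frame feature vectors.

    Scores are dot products with `query` (a stand-in for a learned vector);
    the output is the weight list and the weighted sum of the features.
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)]
    return weights, pooled

# Toy data: the second frame duplicates the first, the third is inverted.
frames = [
    [[0, 255], [0, 255]],
    [[0, 255], [0, 255]],
    [[255, 0], [255, 0]],
]
kept_indices = drop_redundant_frames(frames)

# Toy per-frame features standing in for GRU outputs.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
```

In the framework described above, the feature vectors would come from EfficientNet-B7 followed by GRU layers, and the attention parameters would be learned during training rather than fixed by hand.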
