Deep Learning Framework for Person Classification in Video Streams Using Yolo8, EfficientNet-B7, GRU, and Attention Mechanism

Shatha Talib Rashid

Published: Jun 30, 2025

Keywords:

Long Short-Term Memory (LSTM), YOLOv8, Video classification, Person identification, Gated Recurrent Unit (GRU), Attention mechanism.

Shatha Talib Rashid, Hasanen S. Abdullah

Abstract

This paper introduces an end-to-end deep learning framework for analyzing student behavior in classroom environments through automated video processing. The system first segments classroom video into individual frames and filters them with image hashing to eliminate redundant frames. A GRU-based module then preserves temporal coherence among the selected frames. Human subjects are detected in these frames using YOLOv8, and the cropped person images are used to build a labeled dataset. For feature extraction, the framework uses EfficientNet-B7, a pre-trained CNN known for its high accuracy and computational efficiency. GRU layers model temporal dependencies, while an attention mechanism emphasizes critical behavioral sequences. These modules are integrated into a unified classification network. Experiments on the Techno CS dataset show that the model classifies student behavior into four distinct categories with 95% validation accuracy, which indicates the robustness of the architecture and its potential for real-time implementation in smart classroom settings.
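
The frame-filtering step can be illustrated with a minimal sketch. The abstract does not say which hashing technique is used, so this example assumes an average hash compared by Hamming distance. The function names, the 8x8 hash size, and the distance threshold of 5 are all hypothetical choices made for illustration, not values taken from the paper. Frames are represented as plain nested lists of grayscale intensities so the sketch needs no external libraries.

```python
from typing import List, Sequence

# A grayscale frame: rows of pixel intensities in the range 0..255.
Frame = Sequence[Sequence[int]]


def average_hash(frame: Frame, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit average hash of a grayscale frame.

    Assumes the frame is at least hash_size pixels in each dimension.
    """
    height, width = len(frame), len(frame[0])
    cells = []
    # Downsample by averaging the pixels in each cell of a hash_size x hash_size grid.
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def select_distinct_frames(
    frames: Sequence[Frame], max_distance: int = 5, hash_size: int = 8
) -> List[int]:
    """Return indices of frames whose hash differs from the last kept frame
    by more than max_distance bits (hypothetical threshold)."""
    kept: List[int] = []
    last_hash = None
    for index, frame in enumerate(frames):
        current = average_hash(frame, hash_size)
        if last_hash is None or hamming_distance(current, last_hash) > max_distance:
            kept.append(index)
            last_hash = current
    return kept
```

Comparing each frame only against the most recently kept frame keeps the selected frames in their original order, which matters because the later stages of the framework model temporal dependencies. In practice the frames would come from a video decoder and be converted to grayscale before hashing.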

Issue

Vol. 46 No. 02 (2025): June 2025

Section

Articles
