Deep Learning for Educational Video Analysis: Benchmarking ASR Systems and Pipeline Optimization

Zuev, G.; Kantonistova, E.

doi:10.66693/mathai.1006

Article #1006

Issue MathAI 2026 Selected Papers Special Issue

Received 02 Apr 2026

Accepted 15 May 2026

Published 22 May 2026

Gordey Zuev *

Elena Kantonistova

Mathematics & AI 2026, 1, 6

Keywords: Deep Learning; Automatic Speech Recognition (ASR); Pipeline Optimization; Cost Optimization; Educational Video Transcription

Abstract

We present a comparative analysis of eight managed commercial speech recognition services, in which preprocessing, segmentation, and serving are handled on the provider side, for transcribing and enriching educational videos. The evaluation covers more than 700 lecture recordings (over 900 hours) spanning multiple disciplines. The Fireworks whisper-v3-turbo endpoint offers a favorable cost–quality–latency trade-off relative to the other surveyed providers. Audio preprocessing reduces billed duration by 10–25% with negligible accuracy loss, and a prompt-based “Video Vocabulary” reduces terminology errors without fine-tuning. We implement a parallel pipeline that cuts end-to-end turnaround from over 30 minutes of manual effort per recording to under two minutes, supports up to 50 concurrent jobs, and achieves a roughly 22× speedup. At list prices, it costs about $0.075 per hour of content for transcription plus pedagogical enrichment (summaries, chapter topics, and self-check questions). The system is deployed in production.
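
The abstract does not say how the cap of 50 concurrent jobs is enforced. A minimal sketch, assuming an asyncio-based orchestrator (our assumption, not the authors' stated design), bounds in-flight jobs with a semaphore. The names `process_recording`, `run_pipeline`, and `MAX_CONCURRENT_JOBS` are hypothetical, and the simulated sleep stands in for the real preprocessing, ASR, and enrichment calls.

```python
import asyncio

# Cap taken from the abstract; the asyncio design itself is an assumption.
MAX_CONCURRENT_JOBS = 50


async def process_recording(recording_id: str) -> str:
    """Hypothetical stand-in for one job (preprocess -> ASR -> enrichment).

    A real job would call the provider APIs here; we only simulate
    I/O latency so the concurrency limit can be observed.
    """
    await asyncio.sleep(0.01)
    return f"transcript:{recording_id}"


async def run_pipeline(recording_ids, limit=MAX_CONCURRENT_JOBS):
    """Run all jobs concurrently with at most `limit` in flight at once.

    Returns the results (in input order) and the peak number of jobs
    observed running simultaneously.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(rid):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while `limit` jobs are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await process_recording(rid)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as its inputs
    results = await asyncio.gather(*(guarded(r) for r in recording_ids))
    return results, peak


if __name__ == "__main__":
    ids = [f"lecture-{i:03d}" for i in range(120)]
    results, peak = asyncio.run(run_pipeline(ids))
    print(len(results), peak)
```

A semaphore, rather than fixed-size batches, lets a queued job start as soon as any running job finishes, so one long recording does not hold up an entire batch.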

Cite this article

Zuev, G.; Kantonistova, E. Deep Learning for Educational Video Analysis: Benchmarking ASR Systems and Pipeline Optimization. Mathematics & AI 2026, 1, 6. https://doi.org/10.66693/mathai.1006
