Klagenfurt, June 7, 2023
Congratulations to Dr. Ekrem Çetinkaya for successfully defending his dissertation on “Video Coding Enhancements for HTTP Adaptive Streaming using Machine Learning” at Universität Klagenfurt in the context of the Christian Doppler Laboratory ATHENA.
Abstract
Video has become a crucial medium as daily life increasingly centers on visual communication. From entertainment to business meetings, the demand for better video content is constantly rising, which makes the delivery of video content to users critically important. HTTP adaptive streaming, in which the video content adapts to changing network conditions, has become the de facto method for delivering video over the internet.
As video technology continues to advance, it presents a number of challenges, one of which is the large amount of data required to represent a video accurately. Addressing this challenge requires efficient video coding tools. Historically, these tools have relied on hand-crafted algorithms and heuristics. However, with recent advances in machine learning, these techniques are increasingly being explored as a way to enhance video coding performance.
This thesis proposes eight contributions that enhance video coding performance for HTTP adaptive streaming using machine learning. These contributions are presented in four categories:
- Fast Multi-Rate Encoding with Machine Learning: This category consists of two contributions that target the need to encode multiple representations of the same video for HTTP adaptive streaming. FaME-ML tackles the multi-rate encoding problem by using convolutional neural networks to guide encoding decisions, while FaRes-ML extends the solution to multi-resolution scenarios (a hypothetical sketch of the CNN-guided decision idea follows this list). Evaluations showed that FaME-ML can reduce parallel encoding time by 41% and FaRes-ML can reduce overall encoding time by 46% while preserving visual quality.
- Enhancing Visual Quality on Mobile Devices: The second category consists of three contributions targeting improved visual quality of videos on mobile devices, whose limited hardware makes them a challenging environment for running complex machine learning models. SR-ABR integrates super-resolution into the adaptive bitrate selection algorithm and can save up to 43% of bandwidth (see the super-resolution-aware rate selection sketch after this list). LiDeR addresses the computational complexity of super-resolution networks by proposing an alternative that considers the limitations of mobile devices by design; it can increase execution speed by up to 428% compared to state-of-the-art networks while preserving comparable visual quality. MoViDNN enables straightforward evaluation of machine learning-based solutions for improving visual quality on mobile devices.
- Light-Field Image Coding with Super-Resolution: Emerging media formats provide a more immersive experience at the cost of increased data size. The third category proposes a single contribution that tackles the large data size of light field images by utilizing super-resolution. LFC-SASR can reduce data size by 54% while preserving visual quality.
- Blind Visual Quality Assessment Using Vision Transformers: The final category consists of a single contribution that tackles the blind visual quality assessment problem for videos. BQ-ViT utilizes the recently proposed vision transformer architecture and can predict the visual quality of a video with a high correlation (0.895 PCC) using only the encoded frames (a short note on how PCC is computed follows this list).
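The thesis itself details how FaME-ML and FaRes-ML guide the encoder. As a rough, hypothetical illustration of the general idea only (not the architecture from the dissertation), the following PyTorch sketch shows a small CNN that predicts from a CTU's luma samples whether dependent representations should evaluate a block split, letting the encoder prune its rate-distortion search. All names, layer sizes, and the threshold are illustrative assumptions:

```python
# Illustrative sketch only: NOT the FaME-ML architecture from the thesis.
# Assumption: features from a reference encode's 64x64 luma CTU are used to
# predict whether dependent representations should evaluate a split,
# pruning the rate-distortion search.
import torch
import torch.nn as nn

class CtuSplitPredictor(nn.Module):
    """Tiny CNN mapping a 64x64 luma CTU to a split/no-split logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1),                      # logit for "split"
        )

    def forward(self, ctu_luma):                   # (N, 1, 64, 64) in [0, 1]
        return self.head(self.features(ctu_luma))

def should_evaluate_split(model, ctu_luma, threshold=0.5):
    """Return True if the encoder should test the split, False to skip it."""
    with torch.no_grad():
        prob_split = torch.sigmoid(model(ctu_luma)).item()
    return prob_split >= threshold
```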
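Likewise, SR-ABR's actual algorithm is described in the thesis; the toy sketch below only illustrates why on-device super-resolution can save bandwidth: if upscaling boosts the perceived quality of lower rungs, the client can meet a quality target with a cheaper rung. The bitrate ladder, quality scores, and SR gains are invented numbers, not values from the dissertation:

```python
# Minimal sketch of super-resolution-aware rate selection.
# The ladder, quality scores, and SR gains below are invented placeholders.

LADDER = [
    # (bitrate kbps, resolution, baseline quality score 0-100), low to high
    (750,  "480p",  62),
    (1500, "720p",  74),
    (3000, "1080p", 85),
]

SR_GAIN = {"480p": 14, "720p": 8, "1080p": 0}  # assumed upscaling benefit

def cheapest_rung_for(target_quality, sr_available):
    """Return the lowest-bitrate rung whose (SR-boosted) quality meets target."""
    for bitrate, resolution, quality in LADDER:
        effective = quality + (SR_GAIN[resolution] if sr_available else 0)
        if effective >= target_quality:
            return bitrate, resolution
    return LADDER[-1][:2]                        # fall back to the top rung

# Without SR, quality >= 80 needs the 1080p rung at 3000 kbps; with SR the
# 720p rung reaches 82, halving the bitrate in this toy example.
print(cheapest_rung_for(80, sr_available=False))  # (3000, '1080p')
print(cheapest_rung_for(80, sr_available=True))   # (1500, '720p')
```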
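Finally, the 0.895 figure reported for BQ-ViT is a Pearson correlation coefficient (PCC) between predicted and ground-truth quality scores. For reference, PCC is the covariance of the two score sets divided by the product of their standard deviations; the scores in this small numpy example are placeholders, not data from the thesis:

```python
# Pearson correlation coefficient (PCC) between predicted and subjective
# quality scores -- the metric behind the 0.895 figure above.
# The score arrays below are made-up placeholders.
import numpy as np

predicted  = np.array([72.1, 55.3, 88.0, 61.7, 79.4])  # model outputs
subjective = np.array([70.0, 58.2, 90.1, 60.0, 81.3])  # ground-truth MOS

def pcc(x, y):
    """PCC = cov(x, y) / (std(x) * std(y))."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / np.sqrt((x_c @ x_c) * (y_c @ y_c))

print(f"PCC = {pcc(predicted, subjective):.3f}")  # equivalent: np.corrcoef
```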
The thesis is available for download here. Slides and video are available as follows: