Generative AI for Realistic Voice Dubbing Across Languages

ACM 4th Mile-High Video Conference (MHV’25)

18–20 February 2025 | Denver, CO, USA

[PDF]

Emanuele Artioli (Alpen-Adria Universität Klagenfurt, Austria), Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: The demand for accessible, multilingual video content has grown significantly with the global rise of streaming platforms, social media, and online learning. Traditional solutions for making content accessible across languages include subtitles, which can be generated automatically as on YouTube, and synthesized voiceovers, as offered, for example, by the Yandex Browser. Subtitles are cost-effective and preserve the original voice of the speaker, which is often essential for authenticity. However, they require viewers to divide their attention between reading text and watching visuals, which can diminish engagement, especially for highly visual content. Synthesized voiceovers, on the other hand, eliminate this need by providing an auditory translation, but they typically lack the emotional depth and unique vocal characteristics of the original speaker, which can affect the viewing experience and disconnect audiences from the intended pathos of the content. A straightforward solution would be to have the original actor “perform” in every language, thereby preserving the traits that define their character or narration style. However, recording actors in multiple languages is impractical, time-intensive, and expensive, especially for widely distributed media.

By leveraging generative AI, we aim to develop a client-side tool, to be integrated into a dedicated video streaming player, that combines the accessibility of multilingual dubbing with the authenticity of the original speaker’s performance, effectively allowing a single actor to deliver their voice in any language. To the best of our knowledge, no current streaming system can capture the speaker’s unique voice or emotional tone.
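
As a rough illustration of such a pipeline, the sketch below chains off-the-shelf components: Whisper for transcription, an OPUS-MT model for translation, and Coqui XTTS for zero-shot voice cloning. These component choices, and all file names, are assumptions made for illustration; the abstract does not specify the system's actual building blocks.

```python
# Sketch of a speech-to-speech dubbing pipeline in the spirit of the abstract.
# Component choices (Whisper, OPUS-MT, Coqui XTTS) are illustrative assumptions,
# not the authors' system.
import whisper                      # pip install openai-whisper
from transformers import pipeline   # pip install transformers sentencepiece
from TTS.api import TTS             # pip install TTS (Coqui)

def dub(audio_path: str, target_lang: str = "de") -> str:
    # 1. Transcribe the original speech (assumed to be English here).
    asr = whisper.load_model("base")
    source_text = asr.transcribe(audio_path)["text"]

    # 2. Translate the transcript; an OPUS-MT model must exist for the pair.
    mt = pipeline("translation", model=f"Helsinki-NLP/opus-mt-en-{target_lang}")
    target_text = mt(source_text)[0]["translation_text"]

    # 3. Re-synthesize the translation in the original speaker's voice,
    #    conditioning the zero-shot voice-cloning TTS on the source audio.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    out_path = f"dubbed_{target_lang}.wav"
    tts.tts_to_file(text=target_text, speaker_wav=audio_path,
                    language=target_lang, file_path=out_path)
    return out_path

print(dub("original_clip.wav", "de"))
```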

Index Terms— HTTP adaptive streaming, Generative AI, Audio.

Adaptive Quality and Energy Enhancement in Video Streaming with RecABR

ACM 4th Mile-High Video Conference (MHV’25)

18–20 February 2025 | Denver, CO, USA

[PDF]

Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: HTTP Adaptive Streaming (HAS) dominates video delivery but faces sustainability issues due to its energy demands. Current adaptive bitrate (ABR) algorithms prioritize quality, neglecting the energy costs of higher bitrates. Super-resolution (SR) can enhance quality but increases energy use, especially for GPU-equipped devices in competitive networks. RecABR addresses these challenges by clustering clients based on device attributes (e.g., GPU, resolution) and optimizing parameters via linear programming. This reduces computational overhead and ensures energy-efficient, quality-aware recommendations. Using metrics like VMAF and compressed SR models, RecABR minimizes storage and processing costs, making it scalable for CDN edge deployment.
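
The sketch below illustrates this two-stage structure, assuming k-means for device clustering and a small linear program that trades quality against energy per cluster. All attribute vectors, action sets, and numeric values are invented for illustration and are not RecABR's actual formulation.

```python
# Illustrative two-stage sketch: cluster clients by device attributes, then pick
# a quality/energy-optimal action per cluster with a small LP. All numbers are
# made up for illustration; real values would come from offline measurements.
import numpy as np
from scipy.cluster.vq import kmeans2, whiten   # device clustering
from scipy.optimize import linprog             # per-cluster optimization

# Device attributes per client: [GPU TFLOPS, display height, battery %].
devices = np.array([[10.0, 2160, 80], [0.5, 720, 30], [8.0, 1440, 90],
                    [0.3, 1080, 20], [12.0, 2160, 60]], dtype=float)
_, labels = kmeans2(whiten(devices), 2, minit="++", seed=0)

# Candidate actions per cluster: (VMAF score, energy in joules/segment).
actions = np.array([[60, 4.0],    # low bitrate, no SR
                    [75, 7.0],    # mid bitrate, no SR
                    [83, 9.5],    # low bitrate + super-resolution
                    [92, 15.0]])  # high bitrate

def recommend(min_vmaf: float) -> int:
    """LP relaxation: find a distribution over actions that minimizes energy
    subject to an average-quality floor, then take the dominant action."""
    n = len(actions)
    res = linprog(c=actions[:, 1],                          # minimize energy
                  A_ub=[-actions[:, 0]], b_ub=[-min_vmaf],  # avg VMAF >= floor
                  A_eq=[np.ones(n)], b_eq=[1.0],            # weights sum to 1
                  bounds=[(0, 1)] * n)
    return int(np.argmax(res.x))

for cluster in range(2):
    # GPU-rich clusters can afford a higher quality floor (SR is cheap for them).
    floor = 85 if devices[labels == cluster][:, 0].mean() > 5 else 70
    print(f"cluster {cluster}: recommended action {recommend(floor)}")
```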

Index Terms— QoE, HAS, Super-resolution, Energy.

Patent Approval for “Scalable Per-Title Encoding”

Scalable Per-Title Encoding 

US Patent

[PDF]

Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria) and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: A scalable per-title encoding technique may include detecting scene cuts in an input video received by an encoding network or system, generating segments of the input video, performing per-title encoding of a segment of the input video, training a deep neural network (DNN) for each representation of the segment, thereby generating a trained DNN, compressing the trained DNN, thereby generating a compressed trained DNN, and generating an enhanced bitrate ladder including metadata comprising the compressed trained DNN. In some embodiments, the method also may include generating a base layer bitrate ladder for CPU devices, and providing the enhanced bitrate ladder for GPU-available devices.
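
A schematic sketch of the claimed steps follows, under the assumption that scene cuts are found with PySceneDetect and segments are encoded with ffmpeg/x265; the DNN training step is reduced to a placeholder, since the patent does not prescribe a specific model. All names and parameters are illustrative.

```python
# Schematic sketch of the claimed pipeline: scene-cut segmentation, per-title
# encoding of each segment, and attaching a (compressed) per-representation
# upscaling model to the bitrate ladder as metadata. Library choices
# (PySceneDetect, ffmpeg) and all names are illustrative assumptions.
import subprocess, gzip, pickle
from scenedetect import detect, ContentDetector   # pip install scenedetect

def build_ladder(src: str, rungs=((540, 1200), (720, 2400), (1080, 4800))):
    # 1. Detect scene cuts and derive segment boundaries.
    scenes = detect(src, ContentDetector())
    ladder = []
    for i, (start, end) in enumerate(scenes):
        for height, kbps in rungs:
            out = f"seg{i}_{height}p.mp4"
            # 2. Per-title encoding of this segment/representation.
            subprocess.run(["ffmpeg", "-y", "-i", src,
                            "-ss", str(start.get_seconds()),
                            "-to", str(end.get_seconds()),
                            "-vf", f"scale=-2:{height}",
                            "-c:v", "libx265", "-b:v", f"{kbps}k", out],
                           check=True)
            # 3. Train a lightweight upscaling DNN for this representation
            #    (placeholder: a real system would fit an SR model per rung).
            dnn_weights = {"rung": height, "weights": b"..."}
            # 4. Compress the trained DNN so it ships as ladder metadata.
            blob = gzip.compress(pickle.dumps(dnn_weights))
            ladder.append({"segment": i, "height": height,
                           "bitrate_kbps": kbps, "dnn": blob})
    # GPU-available clients consume this enhanced ladder; CPU devices would
    # fall back to a plain base-layer ladder without the DNN metadata.
    return ladder
```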

In-Band Quality Notification from Users to ISPs

IEEE 13th International Conference on Cloud Networking (CloudNet)

27–29 November 2024 | Rio de Janeiro, Brazil

[PDF]

Leonardo Peroni (UC3M, Spain); Sergey Gorinsky (IMDEA Networks Institute, Spain); Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: While ISPs (Internet service providers) strive to improve QoE (quality of experience) for end users, end-to-end traffic encryption by OTT (over-the-top) providers undermines independent inference of QoE by an ISP. Due to the economic and technological complexity of the modern Internet, ISP-side QoE inference based on OTT assistance or out-of-band signaling sees low adoption. This paper presents IQN (in-band quality notification), a novel mechanism for signaling QoE impairments from an automated agent on the end-user device to the server-to-client ISP responsible for QoE-impairing congestion. Compatible with multi-ISP paths, asymmetric routing, and other Internet realities, IQN does not require OTT support and induces the OTT server to emit distinctive packet patterns that encode QoE information, enabling ISPs to infer QoE by monitoring these patterns in network traffic. We develop a prototype system, YouStall, which applies IQN signaling to ISP-side inference of YouTube stalls. Cloud-based experiments with YouStall on YouTube Live streams validate IQN’s feasibility and effectiveness, demonstrating its potential for accurate user-assisted ISP-side QoE inference from encrypted traffic in real Internet environments.
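
To make the in-band idea concrete, the toy sketch below lets a client-side agent gate its TCP socket reads in a fixed on/off rhythm after detecting a stall; receive-window flow control then shapes the encrypted server stream into a burst pattern an on-path monitor could recognize. The signature and timing are invented for illustration and are much simpler than the paper's actual IQN encoding.

```python
# Toy sketch of in-band signaling: on a QoE impairment (e.g., a stall), gate
# socket reads in an on/off rhythm so TCP flow control forces the (unmodified,
# encrypted) server stream into a distinctive, ISP-observable burst pattern.
# The signature and slot duration are assumptions, not the paper's encoding.
import socket, time

SIGNATURE = [1, 0, 1, 1, 0, 1]   # hypothetical impairment signature
SLOT = 0.5                       # seconds per signaling slot

def signal_impairment(sock: socket.socket, buf_size: int = 65536) -> None:
    """Emit one IQN-style signature by gating reads on an established socket."""
    sock.setblocking(False)
    for bit in SIGNATURE:
        deadline = time.monotonic() + SLOT
        while time.monotonic() < deadline:
            if bit:                    # "on" slot: drain, server keeps sending
                try:
                    sock.recv(buf_size)
                except BlockingIOError:
                    time.sleep(0.01)   # nothing buffered yet
            else:                      # "off" slot: stop reading; the receive
                time.sleep(0.01)       # window fills and the server goes quiet
    sock.setblocking(True)
```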

Index Terms— End user, QoE, ISP, QoE-impairment inference, OTT provider, end-to-end traffic encryption, in-band signaling.

Characterizing the Geometric Complexity of G-PCC Compressed Point Clouds

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Annalisa Gallina (UNIPD, Italy), Hadi Amirpour (AAU, Austria), Sara Baldoni (UNIPD, Italy), Giuseppe Valenzise (UPSaclay, France), Federica Battisti (UNIPD, Italy)

Abstract: Measuring the complexity of visual content is crucial in various applications, such as selecting sources to test processing algorithms, designing subjective studies, and efficiently determining the appropriate encoding parameters and bandwidth allocation for streaming. While spatial and temporal complexity measures exist for 2D videos, a geometric complexity measure for 3D content is still lacking.
In this paper, we present the first study to characterize the geometric complexity of 3D point clouds. Inspired by existing complexity measures, we propose several compression-based definitions of geometric complexity derived from the rate-distortion curves obtained by compressing a dataset of point clouds using G-PCC. Additionally, we introduce density-based and geometry-based descriptors to predict complexity. Our initial results show that even simple density measures can accurately predict the geometric complexity of point clouds.
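
One plausible instance of such a compression-based measure is sketched below under assumed numbers: compress a point cloud at several G-PCC rate points, then report the interpolated bits per point needed to reach a target D1 PSNR, so that harder-to-compress geometry scores as more complex. The rate points and target are illustrative, not the paper's exact definitions.

```python
# Sketch of one compression-based geometric complexity measure: summarize a
# G-PCC rate-distortion curve as the bits per point needed to reach a target
# D1 PSNR. The RD points and the 70 dB target below are made-up examples.
import numpy as np

# (bits per input point, D1 PSNR in dB) at successive G-PCC rate points
# for one point cloud; values are invented for illustration.
rd_points = np.array([[0.1, 58.0], [0.4, 63.5], [1.2, 69.0], [3.5, 74.5]])

def complexity(rd: np.ndarray, target_psnr: float = 70.0) -> float:
    """Bits per point required to hit the target quality: higher means the
    geometry is harder to compress, i.e., geometrically more complex."""
    bpp, psnr = rd[:, 0], rd[:, 1]
    # Interpolate rate as a function of quality; log-rate is near-linear
    # in PSNR, so interpolate in the log domain.
    return float(np.exp(np.interp(target_psnr, psnr, np.log(bpp))))

print(f"complexity: {complexity(rd_points):.2f} bits/point @ 70 dB D1 PSNR")
```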

Index Terms— Point cloud, complexity, compression, G-PCC.

Energy-Efficient Video Streaming: A Study on Bit Depth and Color Subsampling

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Hadi Amirpour (AAU, Austria), Lingfeng Qu (Guangzhou University, China), Jong Hwan Ko (SKKU, South Korea), Cosmin Stejerean (Meta, USA), Christian Timmerer (AAU, Austria)

Abstract: As video dimensions — including resolution, frame rate, and bit depth — increase, a larger bitrate is required to maintain a higher Quality of Experience (QoE). While videos are often optimized for resolution and frame rate to improve compression and energy efficiency, the impact of color space is often overlooked. Larger color spaces are essential for avoiding color banding and delivering High Dynamic Range (HDR) content with richer, more accurate colors, although this comes at the cost of higher processing energy. This paper investigates the effects of bit depth and color subsampling on video compression efficiency and energy consumption. By analyzing different bit depths and subsampling schemes, we aim to determine optimized settings that balance compression efficiency with energy consumption, ultimately contributing to more sustainable and high-quality video delivery. We evaluate both encoding and decoding energy consumption and assess the quality of videos using various metrics including PSNR, VMAF, ColorVideoVDP, and CAMBI. Our findings offer valuable insights for video codec developers and content providers aiming to improve the performance and environmental footprint of their video streaming services.
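
The kind of sweep this entails can be sketched with ffmpeg pixel formats, as below; the exact sources, encoder settings, and measurement setup of the study are not reproduced here, and energy would be captured externally (e.g., with RAPL counters or a power meter).

```python
# Sketch of a bit-depth/chroma-subsampling encoding sweep: encode one source at
# each pixel format and record bitrate. Pixel-format names are standard ffmpeg
# identifiers; the specific sweep is an assumption, not the paper's test setup.
import os, subprocess

SRC = "source_4k.y4m"                # assumed raw/mezzanine source
PIX_FMTS = ["yuv420p",               # 8-bit,  4:2:0
            "yuv420p10le",           # 10-bit, 4:2:0
            "yuv422p10le",           # 10-bit, 4:2:2
            "yuv444p10le"]           # 10-bit, 4:4:4

for fmt in PIX_FMTS:
    out = f"encoded_{fmt}.mp4"
    # 10-bit formats require a 10-bit-capable libx265 build.
    subprocess.run(["ffmpeg", "-y", "-i", SRC, "-pix_fmt", fmt,
                    "-c:v", "libx265", "-crf", "28", out], check=True)
    size_kbit = os.path.getsize(out) * 8 / 1000
    print(f"{fmt:12s} -> {size_kbit:,.0f} kbit")
    # Energy is measured outside ffmpeg (e.g., RAPL or an external power meter)
    # around the encode/decode calls; quality with PSNR, VMAF, ColorVideoVDP,
    # and CAMBI as in the paper.
```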

Index Terms— Video encoding, video decoding, video quality, bit depth, color subsampling, energy.

MVCD: Multi-Dimensional Video Compression Dataset (Best paper candidate)

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Hadi Amirpour (AAU, Austria), Mohammad Ghasempour (AAU, Austria), Farzad Tashtarian (AAU, Austria), Ahmed Telili (TII, UAE), Samira Afzal (AAU, Austria), Wassim Hamidouche (INSA, France), Christian Timmerer (AAU, Austria)

Abstract: In the field of video streaming, the optimization of video encoding and decoding processes is crucial for delivering high-quality video content. Given the growing concern about carbon dioxide emissions, it is equally necessary to consider the energy consumption associated with video streaming. Therefore, to take advantage of machine learning techniques for optimizing video delivery, a dataset encompassing the energy consumption of the encoding and decoding processes is needed. This paper introduces a comprehensive dataset featuring diverse video content, encoded and decoded using various codecs and spanning different devices. The dataset includes 1000 videos encoded at four resolutions (2160p, 1080p, 720p, and 540p) and two frame rates (30 fps and 60 fps), resulting in eight unique encoding configurations per video. Each configuration is further encoded with four codecs — AVC (libx264), HEVC (libx265), AV1 (libsvtav1), and VVC (VVenC) — at four quality levels defined by QPs of 22, 27, 32, and 37; for AV1, three additional QPs of 35, 46, and 55 are also considered. We measure both encoding and decoding time and energy consumption on various devices, employing various metrics and tools, to provide a comprehensive evaluation. Additionally, we assess encoding bitrate and quality using metrics such as PSNR, SSIM, MS-SSIM, and VMAF. All data, together with the commands and scripts needed to reproduce it, are publicly available as part of the dataset, which can be used for applications such as rate and quality control, resource allocation, and energy-efficient streaming.

Dataset URL: https://github.com/cd-athena/MVCD
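
For a sense of scale, the sketch below enumerates the stated encoding matrix (resolutions x frame rates x codecs x QPs) as ffmpeg invocations. VVC is omitted because VVenC is typically driven by its own vvencapp CLI rather than through ffmpeg, and the exact flags used for MVCD may differ from those shown.

```python
# Sketch of the MVCD encoding matrix: 4 resolutions x 2 frame rates x codecs x
# QPs (plus 3 extra AV1 QPs). Flag mappings are standard ffmpeg options; the
# dataset's actual commands are published in its repository and may differ.
import itertools, subprocess

RESOLUTIONS = [2160, 1080, 720, 540]
FPS = [30, 60]
CODECS = {"avc": "libx264", "hevc": "libx265", "av1": "libsvtav1"}
QPS = [22, 27, 32, 37]
AV1_EXTRA_QPS = [35, 46, 55]

def encode(src, codec_key, height, fps, qp):
    lib = CODECS[codec_key]
    out = f"{codec_key}_{height}p_{fps}fps_qp{qp}.mp4"
    # SVT-AV1 is driven via CRF in ffmpeg; x264/x265 support constant QP.
    qp_args = (["-crf", str(qp)] if codec_key == "av1"
               else ["-qp", str(qp)])
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-vf", f"scale=-2:{height}", "-r", str(fps),
                    "-c:v", lib, *qp_args, out], check=True)
    return out

for height, fps, codec in itertools.product(RESOLUTIONS, FPS, CODECS):
    qps = QPS + (AV1_EXTRA_QPS if codec == "av1" else [])
    for qp in qps:
        encode("input.mp4", codec, height, fps, qp)
```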

Index Terms— Video encoding, decoding, energy, complexity, quality.
