DORBINE Project Approved by FFG.
DORBINE is a project
- Funded by the Austrian Research Promotion Agency FFG
- A cooperative project between AIR6 Systems and Alpen-Adria-Universität Klagenfurt (AAU)
Mohammad Ghasempour (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)
Abstract: Live video streaming’s growing demand for high-quality content has resulted in significant energy consumption, creating challenges for sustainable media delivery. Traditional adaptive video streaming approaches rely on the over-provisioning of resources, leading to a fixed bitrate ladder, which is often inefficient for the heterogeneous set of use cases and video content. Although dynamic approaches like per-title encoding optimize the bitrate ladder for each video, they mainly target video-on-demand to avoid latency and fail to address energy consumption. In this paper, we present LiveESTR, a method for building a quality- and energy-aware bitrate ladder for live video streaming. LiveESTR eliminates the need for exhaustive video encoding processes on the server side, ensuring that the bitrate ladder construction process is fast and energy efficient. A lightweight model for multi-label classification, along with a lookup table, is utilized to estimate the optimized resolution-bitrate pair in the bitrate ladder. Furthermore, both spatial and temporal resolutions are supported to achieve high energy savings while preserving compression efficiency. Therefore, a tunable parameter λ and a threshold τ are introduced to balance the trade-off between compression, quality, and energy efficiency. Experimental results show that LiveESTR reduces the encoder and decoder energy consumption by 74.6% and 29.7%, respectively, with only a 2.1% increase in Bjøntegaard Delta Rate (BD-Rate) compared to traditional per-title encoding. Furthermore, it is shown that by increasing λ to prioritize video quality, LiveESTR achieves 2.2% better compression efficiency in terms of BD-Rate while still reducing decoder energy consumption by 7.5%.
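The abstract does not spell out how λ and τ enter the selection, so the following Python sketch is only an illustration under assumed definitions, not the LiveESTR method itself. It assumes a hypothetical lookup table that gives, for one target bitrate, an estimated quality and energy for each candidate spatial/temporal resolution. Candidates whose quality falls more than τ below the best one are dropped, and λ then weights normalised quality against normalised energy among the remaining ones. The names `Candidate`, `select_candidate`, and `LOOKUP`, as well as all numbers, are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    height: int      # spatial resolution (lines)
    fps: int         # temporal resolution (frames per second)
    quality: float   # estimated quality score (illustrative values)
    energy: float    # estimated energy cost (illustrative values)


def _norm(value, lo, hi):
    """Scale value to [0, 1]; a degenerate range maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def select_candidate(candidates, lam, tau):
    """Pick one candidate for a single target bitrate.

    Assumed rule (an illustration, not taken from the paper):
      1. keep candidates whose quality is within tau of the best quality;
      2. among those, maximise lam * quality - (1 - lam) * energy,
         with both terms normalised over the kept candidates.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if tau < 0:
        raise ValueError("tau must be non-negative")

    best_quality = max(c.quality for c in candidates)
    kept = [c for c in candidates if best_quality - c.quality <= tau]

    q_lo, q_hi = min(c.quality for c in kept), max(c.quality for c in kept)
    e_lo, e_hi = min(c.energy for c in kept), max(c.energy for c in kept)

    def score(c):
        return (lam * _norm(c.quality, q_lo, q_hi)
                - (1.0 - lam) * _norm(c.energy, e_lo, e_hi))

    return max(kept, key=score)


# Hypothetical lookup-table entries for one target bitrate.
LOOKUP = [
    Candidate(2160, 60, 92.0, 100.0),
    Candidate(1080, 60, 90.5, 55.0),
    Candidate(1080, 30, 88.0, 35.0),
    Candidate(720, 30, 80.0, 20.0),
]

energy_first = select_candidate(LOOKUP, lam=0.2, tau=5.0)   # 1080 lines at 30 fps
quality_first = select_candidate(LOOKUP, lam=0.9, tau=5.0)  # 2160 lines at 60 fps
```

With these made-up numbers, a small λ favours the lower-energy 1080p30 option, while a large λ moves the choice to 2160p60, which mirrors the quality-versus-energy trade-off the abstract describes.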
Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)
Abstract: HTTP Adaptive Streaming (HAS) is a widely adopted method for delivering video content over the Internet, requiring each video to be encoded at multiple bitrate-resolution pairs, known as representations, to adapt to various network conditions and device capabilities. This multi-bitrate encoding introduces significant challenges due to the computationally and time-intensive nature of encoding multiple representations. Conventional approaches often encode these videos independently without leveraging similarities between different representations of the same input video. This paper proposes an accelerated multi-resolution encoding strategy that utilizes representations of lower resolutions as references to speed up the encoding of higher resolutions when using Versatile Video Coding (VVC), specifically in VVenC, an optimized open-source software implementation. For multi-resolution encoding, a mid-bitrate representation serves as the reference, allowing interpolated encoded partition data to efficiently guide the partitioning process in higher resolutions. The proposed approach uses shared encoding information to reduce redundant calculations, thereby optimizing the partitioning decisions. Experimental results demonstrate that the proposed technique reduces encoding time by up to 17% compared to the medium preset across videos of varying complexities, with a minimal BDBR/BDT of 0.12 compared to the fast preset.
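To make the idea of interpolated partition data more concrete, here is a minimal Python sketch. It is an illustration only and does not use VVenC's API or data structures. It assumes the lower-resolution encode yields a grid of split depths (one value per fixed-size block), upscales that grid by nearest-neighbour interpolation, and uses the result to narrow the depths tried at the higher resolution. The functions `upscale_depth_map` and `allowed_depths`, and the `margin` and `max_depth` parameters, are hypothetical; how depths are actually mapped across resolutions is not described in the abstract.

```python
def upscale_depth_map(depth_map, factor):
    """Nearest-neighbour upscaling: repeat every cell factor x factor times."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upscaled = []
    for row in depth_map:
        wide_row = [depth for depth in row for _ in range(factor)]
        # Copy the widened row so the output rows are independent lists.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


def allowed_depths(guide_depth, margin=1, max_depth=4):
    """Depths the higher-resolution search may try around the guide depth.

    Assumption: the search is restricted to guide_depth +/- margin,
    clamped to [0, max_depth], instead of testing every depth.
    """
    lowest = max(0, guide_depth - margin)
    highest = min(max_depth, guide_depth + margin)
    return list(range(lowest, highest + 1))


# Hypothetical 2x2 depth grid from the lower-resolution reference encode.
low_res_depths = [[0, 2],
                  [1, 3]]

guide = upscale_depth_map(low_res_depths, factor=2)
# guide == [[0, 0, 2, 2],
#           [0, 0, 2, 2],
#           [1, 1, 3, 3],
#           [1, 1, 3, 3]]

print(allowed_depths(guide[0][0]))  # prints [0, 1]
print(allowed_depths(guide[3][3]))  # prints [2, 3, 4]
```

Restricting the search to a small window around the guide depth is one plausible way shared encoding information could cut redundant calculations; the margin controls how much the higher-resolution encode may deviate from the reference.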
Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)
Abstract: In response to the growing demand for high-quality videos, a new coding standard, Versatile Video Coding (VVC), was released in 2020. VVC is based on the same hybrid coding architecture as its predecessor, High-Efficiency Video Coding (HEVC), providing a bitrate reduction of approximately 50% for the same subjective quality. VVC extends HEVC’s Coding Tree Unit (CTU) partitioning with more flexible block sizes, increasing its encoding complexity. Optimization is essential to making efficient use of VVC in practical applications. VVenC, an optimized open-source VVC encoder, introduces multiple presets to address the trade-off between compression efficiency and encoder complexity. Although an optimized set of encoding tools has been selected for each preset, the rate-distortion (RD) search space in the encoder presets still poses a challenge for efficient encoder implementations. This paper proposes Early Termination using Reference Frames (ETRF). It improves the trade-off between encoding efficiency and time complexity and positions itself as a new preset between the medium and fast presets. The CTU partitioning map of the reference frames present in lower temporal layers is employed to accelerate the encoding of frames in higher temporal layers. The results show a reduction in encoding time of around 22% compared to the medium preset. Specifically, for videos with high spatial and temporal complexities, which typically require longer encoding times, the proposed method shows an improved BDBR/BDT compared to the fast preset.
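The early-termination idea can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the ETRF implementation: it considers only quadtree splits (VVC also has a multi-type tree), assumes each reference frame provides one maximum split depth per CTU, and stops the search in the current frame at the deepest co-located reference depth plus an optional margin. The names `depth_limit` and `count_rd_checks` and the example depth maps are hypothetical.

```python
def depth_limit(ref_depth_maps, ctu_index, margin=0, max_depth=3):
    """Depth at which the partition search for one CTU is cut off.

    Assumption: the limit is the deepest split of the co-located CTU
    across the given reference frames, plus a safety margin.
    """
    co_located = max(depth_map[ctu_index] for depth_map in ref_depth_maps)
    return min(max_depth, co_located + margin)


def count_rd_checks(depth, limit):
    """Blocks evaluated by an exhaustive quadtree search stopped at limit."""
    if depth >= limit:
        return 1  # evaluate this block and do not split further
    return 1 + 4 * count_rd_checks(depth + 1, limit)


# Hypothetical per-CTU maximum depths from two lower-temporal-layer frames.
reference_maps = [
    [1, 3, 0],
    [2, 1, 0],
]

full_search = count_rd_checks(0, 3)  # 85 blocks for one CTU
limits = [depth_limit(reference_maps, i) for i in range(3)]  # [2, 3, 0]
limited = [count_rd_checks(0, limit) for limit in limits]    # [21, 85, 1]
```

In this toy model, a CTU whose reference frames were not split at all is evaluated once instead of 85 times, while a CTU with deep reference splits keeps the full search. A non-zero `margin` trades some of that saving for more freedom to deviate from the reference frames.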
Yirui Zeng (Cardiff University, UK), Jun Fu (Cardiff University), Hadi Amirpour (AAU, Austria), Huasheng Wang (Alibaba Group), Guanghui Yue (Shenzhen University, China), Hantao Liu (Cardiff University), Ying Chen (Alibaba Group), Wei Zhou (Cardiff University)
Abstract: Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but the small scale of DQA datasets limits their performance. To address this issue, in this paper, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input of CLIP. To accurately map the input hierarchical information of dehazed images into the quality score, we tune both the vision branch and language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods.
IEEE ICME 2025
IEEE International Conference on Multimedia & Expo (ICME) 2025
Special Session: Advances in Medical Data Analysis for Multimedia Systems
https://2025.ieeeicme.org/ss04-advances-in-medical-data-analysis-for-multimedia-systems/
Motivation & significance
With the growing variety of modalities and volume of medical data available, multimedia systems play a crucial role in supporting clinical decision-making, personalized patient care, and real-time health monitoring. From medical imaging and wearable device data to electronic health records, integrating and analyzing these varied data sources is essential for developing comprehensive diagnostic systems.
Topics of interest
Organizers
Video streaming is ubiquitous across applications, devices, and fields. Delivering video content from a video server to viewers over the Internet is a time-consuming step in the streaming workflow and must be handled carefully to offer an uninterrupted streaming experience. The end-to-end latency, i.e., from camera capture to the user device, is particularly problematic for live streaming. Some streaming-based applications, such as virtual events, esports, online learning, gaming, webinars, and all-hands meetings, require low latency to operate. Delivering a high Quality of Experience (QoE) to viewers is crucial, yet the amount of data that must be processed to achieve such QoE is too large to be handled manually.