QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming

Klagenfurt, July 31, 2025

Congratulations to Dr. Daniele Lorenzi for successfully defending his dissertation on “QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming” at Universität Klagenfurt in the context of the Christian Doppler Laboratory ATHENA.

Abstract

HTTP Adaptive Streaming (HAS) has become the dominant paradigm for video delivery over the Internet, enabling scalable and flexible content consumption across heterogeneous networks and devices. The continuous growth of video traffic, coupled with the increasing complexity of multimedia content and the proliferation of resource-constrained devices, poses significant challenges for streaming systems. In particular, service providers and researchers must jointly address Quality of Experience (QoE), energy consumption, and emerging protocol and content technologies to meet user expectations while ensuring sustainable operation.

This dissertation investigates QoE- and energy-aware content consumption in HAS, with a primary focus on client-side adaptation mechanisms. Through a systematic analysis of existing approaches, the thesis identifies key limitations in current Adaptive Bitrate (ABR) algorithms, which often prioritize bitrate maximization without sufficiently considering perceptual quality, energy efficiency, codec diversity, or new networking capabilities. To address these challenges, the dissertation proposes a set of novel methodologies, algorithms, and datasets that jointly optimize QoE and energy consumption under realistic network and device constraints.

The first contribution explores the exploitation of emerging transport protocols, specifically HTTP/3 and QUIC, to enhance QoE in HAS. The proposed DoFP+ approach leverages advanced protocol features such as stream multiplexing, prioritization, and termination to upgrade previously downloaded low-quality segments during playback. Extensive experimental evaluations demonstrate significant QoE improvements, reduced stall events, and more efficient bandwidth utilization compared to state-of-the-art approaches.
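
To make the upgrade idea concrete, the following Python sketch shows the kind of decision such a client faces when spare bandwidth becomes available during playback. All names and the deadline test are illustrative assumptions, not the actual DoFP+ algorithm or API.

```python
from dataclasses import dataclass

@dataclass
class BufferedSegment:
    index: int               # position in the playback timeline
    quality: int             # representation level currently in the buffer
    playout_deadline: float  # seconds until this segment is rendered

def pick_upgrade(buffer, max_quality, spare_bw_bps, seg_size_bytes):
    """Choose a buffered low-quality segment to re-download at a higher
    quality. Prefers the worst segment whose upgraded version can still
    arrive before its playout deadline; returns None when no upgrade is
    safe. Illustrative sketch only, not DoFP+ itself."""
    candidates = [s for s in buffer if s.quality < max_quality]
    for seg in sorted(candidates, key=lambda s: s.quality):  # worst first
        download_s = seg_size_bytes(seg.index, seg.quality + 1) * 8 / spare_bw_bps
        if download_s < seg.playout_deadline:
            return seg, seg.quality + 1
    return None
```

On HTTP/3, such an upgrade request can run on its own QUIC stream with a lower priority than the ongoing live download, and the stream can be reset if the deadline becomes unreachable; these are precisely the multiplexing, prioritization, and termination features referred to above.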

As a second contribution, the dissertation addresses the limitations of single-codec streaming by introducing MEDUSA, a dynamic multi-codec ABR approach. MEDUSA enables per-segment codec selection based on content-aware perceptual quality and segment size information, allowing the system to adapt to varying content complexity over time. Results show that dynamic codec switching can substantially improve perceptual quality while reducing transmitted data volume, thereby benefiting both end users and streaming providers.
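
A compact sketch of per-segment codec selection in this spirit is shown below; the manifest fields (codec, quality score, segment size) and the tie-breaking rule are assumptions for illustration, not MEDUSA's exact utility function.

```python
def select_version(candidates, throughput_bps, seg_duration_s, supported_codecs):
    """Among the encoded versions of one segment that the device can decode
    and that fit the estimated throughput, pick the highest perceptual
    quality and break ties toward the smaller file.
    `candidates`: list of (codec, quality_score, size_bytes) tuples;
    assumes at least one codec in `supported_codecs` is present."""
    decodable = [c for c in candidates if c[0] in supported_codecs]
    feasible = [c for c in decodable
                if c[2] * 8 / seg_duration_s <= throughput_bps]
    if not feasible:                      # degraded network: smallest file wins
        return min(decodable, key=lambda c: c[2])
    return max(feasible, key=lambda c: (c[1], -c[2]))
```

Because compression efficiency varies with content complexity, a high-motion segment and a static scene may each be served best by a different codec at the same perceptual quality, which is why the choice is made per segment rather than per session.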

The third contribution focuses on sustainable video streaming through energy-aware adaptation. The thesis introduces E-WISH, an energy-aware ABR algorithm that incorporates an explicit energy consumption model into the quality selection process, reducing playback stalls and lowering power usage without compromising QoE. To support systematic energy evaluations, the dissertation further presents COCONUT, a comprehensive dataset of fine-grained energy measurements collected from multiple client devices. This dataset enables in-depth analysis of the impact of video, device, and network parameters on energy consumption in HAS.
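
A minimal sketch of energy-aware quality selection in this spirit follows; the linear scoring form, the weights, and the `energy_model`/`stall_model` callables are all assumptions for illustration, while the actual E-WISH model is defined in the thesis.

```python
from dataclasses import dataclass

@dataclass
class Level:
    bitrate_kbps: int
    quality: float  # perceptual quality score of the next segment at this level

def choose_level(levels, energy_model, stall_model,
                 w_quality=1.0, w_energy=0.5, w_stall=2.0):
    """Score each representation by quality minus weighted energy and
    stall penalties, then take the maximum. `energy_model(level)` returns
    predicted client-side energy in joules and `stall_model(level)` a
    stall probability; both are hypothetical helpers, not E-WISH's API."""
    def score(level):
        return (w_quality * level.quality
                - w_energy * energy_model(level)
                - w_stall * stall_model(level))
    return max(levels, key=score)
```

Datasets such as COCONUT are what make an `energy_model` term like this trainable in practice, since they relate video, device, and network parameters to measured energy draw.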

Finally, the dissertation investigates neural-enhanced streaming (NES), where client-side machine learning techniques are used to improve visual quality at the cost of additional computational overhead. To balance QoE gains and power consumption in heterogeneous client environments, the thesis proposes Receptive, a coordinated system that jointly optimizes ABR decisions and neural enhancement strategies across multiple users. Experimental results demonstrate that Receptive achieves substantial QoE improvements while significantly reducing energy consumption on NES-capable devices.
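
As a rough illustration of such coordination, the greedy sketch below starts every client at its best feasible option and then scales back the heaviest downloader until a shared bottleneck is respected. The option tuples, device fields, and the greedy strategy itself are assumptions for exposition; Receptive's actual optimization is defined in the thesis.

```python
def feasible_options(client):
    """Options the device can actually run: within its power budget and,
    if neural enhancement is requested, only on NES-capable hardware.
    Each option is a hypothetical (download_bps, enhance, quality, power_w)."""
    return [o for o in client["options"]
            if o[3] <= client["power_budget_w"]
            and (not o[1] or client["nes_capable"])]

def plan_clients(clients, bottleneck_bps):
    """Greedy sketch of coordinated ABR + neural-enhancement planning;
    assumes every client has at least one feasible option."""
    feas = {c["id"]: feasible_options(c) for c in clients}
    plan = {cid: max(opts, key=lambda o: (o[2], -o[0]))  # best quality first
            for cid, opts in feas.items()}
    while sum(o[0] for o in plan.values()) > bottleneck_bps:
        cid = max(plan, key=lambda k: plan[k][0])        # heaviest download
        cheaper = [o for o in feas[cid] if o[0] < plan[cid][0]]
        if not cheaper:
            break
        plan[cid] = max(cheaper, key=lambda o: (o[2], -o[0]))
    return plan
```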

Overall, this dissertation advances the state of the art in HTTP Adaptive Streaming by introducing protocol-aware, content-aware, and energy-aware adaptation mechanisms, complemented by realistic datasets and comprehensive evaluations. The presented contributions provide a solid foundation for future research and practical deployments aiming to deliver high-quality, energy-efficient, and sustainable video streaming services.

Slides are available here.


Visual Communication in the Age of AI: VCIP 2025 Highlights from Klagenfurt

VCIP 2025 in Klagenfurt: Advancing Sustainable and Trustworthy Visual Communications in the Age of AI

From December 1–4, 2025, the Department of Information Technology (ITEC) at the University of Klagenfurt (Austria) hosted the International Conference on Visual Communications and Image Processing (VCIP 2025), welcoming an international community of researchers, practitioners, and industry experts to discuss the future of visual communication and image processing. Under the theme “Sustainable and Trustworthy Visual Communications in the Age of AI,” the conference addressed some of the most pressing challenges and opportunities facing the field today.

A forum for cutting-edge research and dialogue

VCIP 2025 continued the long-standing tradition of the conference as a premier venue for both foundational and applied research. Over four days, the program offered a diverse range of tutorials, overview talks, keynotes, oral and poster sessions, and interactive formats. Discussions ranged from adaptive and low-latency streaming, source coding, and compressed-domain processing to volumetric media, computational vision, quality assessment, and AI-driven restoration and enhancement techniques.

Inspiring keynotes on trust, sustainability, and clarity

A particular highlight of VCIP 2025 was the trio of keynote talks, which set the tone for the conference by connecting technical innovation with broader societal concerns. The speakers addressed trustworthy multimedia communication in the era of AI-generated content, the environmental impact and sustainability of visual technologies, and the role of visual analytics in making complex data understandable across time and space. Together, the keynotes sparked lively discussions that extended well beyond the conference halls.

Tutorials, overview talks, and emerging topics

Four half-day tutorials provided in-depth insights into current and emerging technologies, including generative face video coding for ultra-low bitrate communication, JPEG AI standardization and implementation, the convergence of low-level image processing and generative AI, and the past, present, and future of volumetric video. Complementing these, overview talks offered broader perspectives on emotion and quality estimation, AI-enabled video streaming efficiency, 3D scene capture and compression, and the use of large vision–language models for visual quality assessment.

Supporting early-career researchers and innovation

VCIP 2025 placed strong emphasis on nurturing the next generation of researchers. The doctoral symposium and the VSPC Rising Star session provided a platform for early-career scientists to present their work and engage with senior experts. Demo, open-source, and dataset sessions further highlighted the practical impact of research, showcasing tools, prototypes, and resources that bridge theory and application.

Exchange beyond the technical program

In addition to the scientific sessions, the conference offered a vibrant social program, including a welcome reception, a Glühwein gathering, a conference banquet, and a closing ceremony with awards. These events fostered informal exchange, strengthened international collaboration, and contributed to the open and collegial atmosphere that characterizes VCIP.

Best Paper Award: “AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views”, Yijie Gao, Houqiang Zhong, Tianchi Zhu, Li Song, Zhengxue Cheng, Qiang Hu (Shanghai Jiao Tong University)

Rising Star: Heming Sun (Yokohama National University), “Traditional and Learned Image and Video Coding: From Algorithms to Implementations”

A successful event for Klagenfurt and its university

Hosting VCIP 2025 marked an important milestone for the Department of Information Technology (ITEC) at the University of Klagenfurt, reinforcing its international visibility in the fields of visual computing, multimedia systems, and artificial intelligence. By bringing together experts from academia and industry and opening selected sessions to the wider university community, the conference created valuable opportunities for interdisciplinary exchange and long-term collaboration.

VCIP 2025 concluded with a strong sense of momentum, underscoring the importance of responsible, transparent, and sustainable approaches to visual communication in an AI-driven world. The discussions and connections formed in Klagenfurt will continue to shape research and innovation in the field well beyond the conference itself.

Following a successful edition in Klagenfurt, VCIP 2026 will take place in Singapore from December 13–16, 2026, focusing on “Visual Communications and Image Processing at the Frontiers of Generative and Perceptual AI” (https://vcip-2026.org/).


Farzad Tashtarian completed his habilitation on Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration

Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration

Farzad Tashtarian

Habilitation

On November 14, 2025, Farzad Tashtarian defended his habilitation “Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration” at the University of Klagenfurt, Austria.

Abstract: Providing seamless, low-latency, and energy-efficient video streaming experiences remains an ongoing challenge as content delivery infrastructures evolve to support higher resolutions, immersive formats, and heterogeneous networks. This talk explores an end-to-end perspective on network-assisted adaptive streaming, where close coordination between the player, network, and edge/cloud components enables data-driven and context-aware optimization. It discusses adaptive bitrate algorithm design, cost- and delay-conscious edge transcoding, and multi-objective optimization across the streaming pipeline. Emerging AI-based methods—such as reinforcement learning, generative modeling, and large language model (LLM) orchestration—are highlighted as key enablers for intelligent and self-adjusting video delivery. The talk concludes with a discussion of open challenges, scalability, and future research directions toward resilient, efficient, and user-centric streaming infrastructures.

Committee members: Prof. Martin Pinzger (Chairperson), Prof. Oliver Hohlfeld (external member), Prof. Bernhard Rinner, Prof. Angelika Wiegele, Prof. Chitchanok Chuengsatiansup, MSc Zoha Azimi Ourimi, Dr. Alice Tarzariol, Kateryna Taranov, and Gregor Lammer


Interactive Stereoscopic Videos On Head-mounted Displays


IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

Afshin Gholami (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: This paper presents ISV-Demo, a novel system for delivering interactive stereoscopic video experiences on head-mounted displays in spatial environments. Unlike traditional flat or real-time VR media, our approach leverages pre-rendered, high-fidelity 3D video to enable immersive, cinematic storytelling enhanced by user agency. Viewers influence narrative direction by making real-time decisions at branching points within the stereoscopic scene. We propose two interaction models: a timeline-based model with embedded prompts, and a loop-based segmented model offering flexible timing and decision persistence. These models define a new paradigm for authored cinematic interaction in extended reality, addressing the gap between passive 3D video and dynamic VR content.
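
The two interaction models can be pictured as a small branching data structure over pre-rendered clips; the field names below are hypothetical, chosen only to contrast the timeline-based behavior (a timed prompt with a default fall-through) with the loop-based behavior (the segment repeats until the viewer decides).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BranchPoint:
    prompt_time: float                           # when the choice prompt appears (s)
    decision_window: float                       # how long the viewer may respond (s)
    choices: dict = field(default_factory=dict)  # choice label -> next clip id
    default: Optional[str] = None                # fall-through for the timeline model

@dataclass
class Clip:
    clip_id: str
    duration: float
    branch: Optional[BranchPoint] = None  # None = linear, non-interactive clip
    loop_until_choice: bool = False       # loop-based model: replay until decided

def next_clip_id(clip, viewer_choice):
    """Resolve what plays after `clip`: a made choice follows its branch;
    with no choice, a looping clip repeats and a timed clip falls through
    to its default. Hypothetical fields, not ISV-Demo's actual format."""
    if clip.branch is None:
        return None                       # end of a linear path
    if viewer_choice in clip.branch.choices:
        return clip.branch.choices[viewer_choice]
    return clip.clip_id if clip.loop_until_choice else clip.branch.default
```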


STACK: Spatial Tower Assembly using Controlled Kinetics


IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

Milad Ghanbari (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), M.H. Izadimehr (AAU, Austria), Wei Zhou (Cardiff University, UK), Cosmin Stejerean (Meta, US)

Abstract: This paper presents a block stacking simulation developed for Apple Vision Pro (AVP) using Unity’s PolySpatial framework, designed to study both depth perception in spatial computing and physics comprehension of user-driven kinetic controls in augmented reality (AR). The simulation offers two interactive modes: a tower assembly mode and a removal mode. Each game session includes four stages with the virtual table positioned at various distances to observe user adaptation across varying virtual depths. User input is captured through eye tracking and hand tracking, and block behavior is handled by real-time physics simulation, which includes collision response, gravity, and mass-based interactions. The system supports two physics configurations: raw Unity physics and a modified variant with adjusted material and Rigidbody parameters for improved stability and realism. It utilizes spatial computing features such as world anchoring to preserve spatial consistency, and it supports depth perception through stereoscopic rendering and dynamic shadows, so that users can better judge the spatial relationships between virtual blocks and their physical surroundings. The simulation is intended to evaluate how 3D spatial rendering and physically realistic interactions contribute to immersion and task performance in AR environments. To assess user performance, the system records key interaction metrics to support analysis of learning progression, control accuracy, and adaptability across varying distances and physics configurations. This work contributes to the understanding of spatial and physics-based interaction design in AR and may inform future applications in education, simulation, and spatial gaming.
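
The contrast between the two physics configurations can be summarized as two parameter sets of the kind Unity exposes (friction, damping, solver iterations); the names and values below are illustrative assumptions, not the settings used in the actual system.

```python
# Illustrative parameter sets for the two physics configurations:
# engine defaults versus a stability-tuned variant. All values are
# assumptions for exposition, not those used in STACK.
RAW_PHYSICS = {
    "friction": 0.6,
    "bounciness": 0.0,
    "block_mass_kg": 1.0,
    "angular_damping": 0.05,
    "solver_iterations": 6,
}

TUNED_PHYSICS = {
    "friction": 0.9,          # blocks slide less when stacked
    "bounciness": 0.0,
    "block_mass_kg": 1.5,     # heavier blocks settle faster
    "angular_damping": 0.2,   # damp wobble for more stable towers
    "solver_iterations": 12,  # more solver passes, fewer jittery contacts
}
```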


Perceptual JND Prediction for VMAF Using Content-Adaptive Dual-Path Attention


IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Luigi Atzori (University of Cagliari, Italy)

Abstract: Just Noticeable Difference (JND) thresholds, particularly for quality metrics such as Video Multimethod Assessment Fusion (VMAF), are critical in streaming, helping identify when quality changes become perceptible and reducing redundant bitrate representations. The Satisfied User Ratio (SUR) complements JND by quantifying the percentage of users who do not perceive a difference, offering practical guidance for perceptually optimized streaming. This paper proposes a novel two-branch deep neural network (DNN) for predicting the 75% SUR for VMAF, the encoding level where 75% of viewers cannot perceive degradation. The framework combines handcrafted features (e.g., spatial and temporal indicators such as SI and TI) with deep learning-based (DL-based) representations extracted via a convolutional neural network (CNN) backbone. The DL-based branch employs a spatio-temporal attention mechanism and a Long Short-Term Memory (LSTM) network to capture temporal dynamics, while the handcrafted branch encodes interpretable indicators through a fully connected layer. Both outputs are fused and passed through a lightweight Multilayer Perceptron (MLP) to predict the 75% SUR. To improve robustness to noise and label uncertainty, the model is trained using the Smooth-L1 loss. Experiments on the VideoSet dataset show that our method outperforms the state of the art across all metrics, achieving a notably higher R² score (0.46 vs. 0.36) that indicates improved prediction reliability, while its low computational complexity makes it suitable for real-time video streaming.
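
As a rough sketch of this architecture, the PyTorch module below wires a temporal-attention-plus-LSTM deep branch to a handcrafted-feature branch and fuses them in a small MLP. The dimensions, the plain temporal self-attention standing in for the paper's spatio-temporal mechanism, and the assumption that frames arrive as precomputed CNN backbone features are all simplifications.

```python
import torch
import torch.nn as nn

class TwoBranchSUR(nn.Module):
    """Sketch of a two-branch 75% SUR regressor: attention + LSTM over
    per-frame backbone features, handcrafted SI/TI-style indicators
    through a fully connected layer, fused by a lightweight MLP.
    Dimensions are illustrative, not the paper's configuration."""

    def __init__(self, frame_dim=512, hand_dim=8, hidden=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(frame_dim, num_heads=4, batch_first=True)
        self.lstm = nn.LSTM(frame_dim, hidden, batch_first=True)
        self.hand = nn.Sequential(nn.Linear(hand_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, frames, handcrafted):
        # frames: (B, T, frame_dim) CNN features; handcrafted: (B, hand_dim)
        attended, _ = self.attn(frames, frames, frames)  # temporal self-attention
        _, (h, _) = self.lstm(attended)                  # h: (1, B, hidden)
        fused = torch.cat([h[-1], self.hand(handcrafted)], dim=-1)
        return self.head(fused).squeeze(-1)              # predicted SUR level

model = TwoBranchSUR()
loss_fn = nn.SmoothL1Loss()  # robust to noisy labels, as in the paper
```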


Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?


IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

Jingwen Zhu (Nantes Université, France), Hadi Amirpour (AAU, Austria), Wei Zhou (Cardiff University, UK), Patrick Le Callet (Nantes Université, France)

Abstract: Evaluating perceived video quality is essential for ensuring high Quality of Experience (QoE) in modern streaming applications. While existing subjective datasets and Video Quality Metrics (VQMs) cover a broad quality range, many practical use cases—especially for premium users—focus on high-quality scenarios requiring finer granularity. Just Noticeable Difference (JND) has emerged as a key concept for modeling perceptual thresholds in these high-end regions and plays an important role in perceptual bitrate ladder construction. However, the relationship between JND and the more widely used Mean Opinion Score (MOS) remains unclear. In this paper, we conduct a Degradation Category Rating (DCR) subjective study based on an existing JND dataset to examine how MOS corresponds to the 75% Satisfied User Ratio (SUR) points of the 1st and 2nd JNDs. We find that while MOS values at JND points generally align with theoretical expectations (e.g., 4.75 for the 75% SUR of the 1st JND), the reverse mapping—from MOS to JND—is ambiguous due to overlapping confidence intervals across PVS indices. Statistical significance analysis further shows that DCR studies with limited participants may not detect meaningful differences between reference and JND videos.
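
For readers new to SUR points, the sketch below shows one standard empirical way to derive a 75% SUR point from per-subject JND annotations; it illustrates the concept only and is not the procedure used in this study.

```python
import numpy as np

def sur_curve(subject_jnds, encoding_levels):
    """Empirical Satisfied User Ratio: for each encoding level, the share
    of subjects whose (first) JND lies beyond it, i.e. who do not yet
    notice a difference from the reference."""
    jnd = np.asarray(subject_jnds, dtype=float)
    return np.array([(jnd > lvl).mean() for lvl in encoding_levels])

def sur_point(subject_jnds, encoding_levels, target=0.75):
    """First encoding level at which the SUR drops to the target."""
    sur = sur_curve(subject_jnds, encoding_levels)
    below = np.flatnonzero(sur <= target)
    return encoding_levels[below[0]] if below.size else encoding_levels[-1]

# Toy example with hypothetical per-subject JND locations on a QP-like scale.
print(sur_point([30, 32, 35, 28, 33, 31, 36, 29], list(range(25, 40))))  # -> 29
```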
