STACK: Spatial Tower Assembly using Controlled Kinetics

STACK: Spatial Tower Assembly using Controlled Kinetics

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

Milad Ghanbari (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), M.H. Izadimehr (AAU, Austria), Wei Zhou (Cardiff University, UK), Cosmin Stejerean (Meta, US)

Abstract: This paper presents a block stacking simulation developed for Apple Vision Pro (AVP) using Unity’s PolySpatial framework, designed to study both depth perception in spatial computing and physics comprehension of user-driven kinetic controls in augmented reality (AR). The simulation offers two interactive modes: a tower assembly mode and a removal mode. Each game session includes four stages with the virtual table positioned at various distances to observe user adaptation across varying virtual depths. User input is captured through eye tracking and hand tracking, and block behavior is handled by real-time physics simulation, which includes collision response, gravity, and mass-based interactions. The system supports two physics configurations: raw Unity physics and a modified variant with adjusted material and rigidbody parameters for improved stability and realism. It utilizes spatial computing features such as world anchoring to preserve spatial consistency, and supports depth perception through stereoscopic rendering and dynamic shadows, so that users can better judge the spatial relationships between virtual blocks and their physical surroundings. The simulation is intended to evaluate how 3D spatial rendering and physically realistic interactions contribute to immersion and task performance in AR environments. To assess user performance, the system records key interaction metrics to support analysis of learning progression, control accuracy, and adaptability across varying distances and physics configurations. This work contributes to the understanding of spatial and physics-based interaction design in AR and may inform future applications in education, simulation, and spatial gaming.
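The stability tuning mentioned above (adjusted material and rigidbody parameters) is handled by Unity's physics engine in the actual system. As a language-agnostic illustration of how one such parameter affects block behavior, here is a minimal Python sketch in which a restitution value stands in for a physic-material "bounciness" setting; the function and its constants are illustrative, not taken from the STACK implementation:

```python
def simulate_drop(height, restitution=0.3, dt=0.01, steps=2000, g=9.81):
    """Semi-implicit Euler integration of a block dropped onto a table.

    Returns the peak rebound height after the first bounce. A lower
    restitution (the 'bounciness' material parameter in engines such
    as Unity) damps the rebound, which stabilizes stacked blocks.
    """
    y, v = height, 0.0
    bounced = False
    peak = 0.0
    for _ in range(steps):
        v -= g * dt              # gravity accelerates the block downward
        y += v * dt
        if y <= 0.0:             # collision with the table surface
            y = 0.0
            v = -v * restitution # restitution scales the rebound velocity
            bounced = True
        if bounced:
            peak = max(peak, y)  # track the highest point after impact
    return peak
```

With restitution 0.3, a block dropped from 1 m rebounds to roughly 0.09 m (restitution squared times the drop height), while restitution 0 kills the bounce entirely, which is one reason a tuned configuration stacks more stably than raw defaults.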

Posted in ATHENA | Comments Off on STACK: Spatial Tower Assembly using Controlled Kinetics

Perceptual JND Prediction for VMAF Using Content-Adaptive Dual-Path Attention

Perceptual JND Prediction for VMAF Using Content-Adaptive Dual-Path Attention

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Luigi Atzori (University of Cagliari, Italy)

Abstract: Just Noticeable Difference (JND) thresholds, particularly for quality metrics such as Video Multimethod Assessment Fusion (VMAF), are critical in streaming, helping identify when quality changes become perceptible and reducing redundant bitrate representations. The Satisfied User Ratio (SUR) complements JND by quantifying the percentage of users who do not perceive a difference, offering practical guidance for perceptually optimized streaming. This paper proposes a novel two-branch deep neural network (DNN) for predicting the 75% SUR for VMAF, the encoding level where 75% of viewers cannot perceive degradation. The framework combines handcrafted features (e.g., spatial and temporal indicators such as SI and TI) and deep learning-based (DL-based) representations extracted via a convolutional neural network (CNN) backbone. The DL-based branch employs a spatio-temporal attention mechanism and a Long Short-Term Memory (LSTM) network to capture temporal dynamics, while the handcrafted branch encodes interpretable indicators through a fully connected layer. Both outputs are fused and passed through a lightweight Multilayer Perceptron (MLP) to predict the 75% SUR. To improve robustness to noise and label uncertainty, the model is trained using the Smooth-L1 loss. Experiments on the VideoSet dataset show that our method outperforms state-of-the-art (SOTA) methods across all metrics, achieving a notably higher R² score (0.46 vs. 0.36) that indicates improved prediction reliability, while its low computational complexity makes it suitable for real-time video streaming.
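The Smooth-L1 loss used for training is a standard, well-defined function; a minimal Python sketch (with beta = 1.0, a common default) illustrates why it tolerates noisy SUR labels: it is quadratic for small errors but only linear for large ones, so outlier labels contribute bounded gradients:

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss: quadratic for |error| < beta, linear beyond.

    Large residuals (e.g., from noisy SUR labels) are penalized
    linearly rather than quadratically, keeping gradients bounded.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta  # quadratic (L2-like) region
    return diff - 0.5 * beta             # linear (L1-like) region
```

The two branches meet continuously at |error| = beta, where both evaluate to 0.5 * beta.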

Posted in ATHENA | Comments Off on Perceptual JND Prediction for VMAF Using Content-Adaptive Dual-Path Attention

Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?

Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

[PDF]

Jingwen Zhu (Nantes Université, France), Hadi Amirpour (AAU, Austria), Wei Zhou (Cardiff University, UK), Patrick Le Callet (Nantes Université, France)

Abstract: Evaluating perceived video quality is essential for ensuring high Quality of Experience (QoE) in modern streaming applications. While existing subjective datasets and Video Quality Metrics (VQMs) cover a broad quality range, many practical use cases—especially for premium users—focus on high-quality scenarios requiring finer granularity. Just Noticeable Difference (JND) has emerged as a key concept for modeling perceptual thresholds in these high-end regions and plays an important role in perceptual bitrate ladder construction. However, the relationship between JND and the more widely used Mean Opinion Score (MOS) remains unclear. In this paper, we conduct a Degradation Category Rating (DCR) subjective study based on an existing JND dataset to examine how MOS corresponds to the 75% Satisfied User Ratio (SUR) points of the 1st and 2nd JNDs. We find that while MOS values at JND points generally align with theoretical expectations (e.g., 4.75 for the 75% SUR of the 1st JND), the reverse mapping—from MOS to JND—is ambiguous due to overlapping confidence intervals across Processed Video Sequence (PVS) indices. Statistical significance analysis further shows that DCR studies with limited participants may not detect meaningful differences between reference and JND videos.
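By definition, the SUR at an encoding level is the fraction of viewers who do not yet perceive a difference there. A minimal Python sketch, assuming per-subject 1st-JND annotations are given as the first encoding level at which each viewer notices degradation (this data layout is illustrative, not the paper's):

```python
def satisfied_user_ratio(jnd_levels, level):
    """Fraction of viewers who do NOT perceive a difference at `level`.

    `jnd_levels` holds, per subject, the first encoding level at which
    that subject noticed degradation; a viewer is 'satisfied' as long
    as the presented level lies below their personal threshold.
    """
    satisfied = sum(1 for jnd in jnd_levels if level < jnd)
    return satisfied / len(jnd_levels)


def sur_point(jnd_levels, ratio=0.75):
    """Highest encoding level at which at least `ratio` of viewers
    remain satisfied, i.e., the (ratio * 100)% SUR point."""
    level = 0
    while satisfied_user_ratio(jnd_levels, level + 1) >= ratio:
        level += 1
    return level
```

For example, with per-subject thresholds [3, 4, 4, 5], level 3 is the 75% SUR point: three of the four viewers still perceive no difference there.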

Posted in ATHENA | Comments Off on Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?

Patent Approval for “Efficient two-pass encoding scheme for adaptive live streaming”

Efficient two-pass encoding scheme for adaptive live streaming

US Patent

[PDF]

Vignesh Menon (Alpen-Adria-Universität Klagenfurt, Austria), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)


Abstract: Techniques for efficient two-pass encoding for live streaming are described herein. A method for efficient two-pass encoding may include extracting low-complexity features of a video segment, predicting an optimized constant rate factor (CRF) for the video segment using the low-complexity features, and encoding the video segment with the optimized CRF at a target bitrate. A system for efficient two-pass encoding may include a feature extraction module configured to extract low-complexity features from a video segment, a neural network configured to predict an optimized CRF as a function of the low-complexity features and a target bitrate, and an encoder configured to encode the video segment using the optimized CRF at the target bitrate.
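The claimed pipeline can be sketched end to end in Python. Everything below is illustrative: the feature names, the linear stand-in for the trained neural network, and its coefficients are placeholders rather than the patented implementation; only the `-crf`, `-maxrate`, and `-bufsize` flags correspond to real FFmpeg/x264 options:

```python
def predict_crf(features, target_bitrate_kbps):
    """Toy stand-in for the trained CRF predictor.

    Maps low-complexity features and a target bitrate to a constant
    rate factor, clamped to the x264/x265 CRF range [0, 51]. The
    feature names and coefficients are illustrative placeholders.
    """
    complexity = 0.5 * features["spatial"] + 0.5 * features["temporal"]
    crf = 51 - 10 * (target_bitrate_kbps / 1000) + 0.1 * complexity
    return max(0, min(51, round(crf)))


def build_encode_command(segment_path, crf, target_bitrate_kbps):
    """Assemble an FFmpeg command that encodes with the predicted CRF,
    capping the rate at the target bitrate via -maxrate/-bufsize."""
    return [
        "ffmpeg", "-i", segment_path,
        "-c:v", "libx264",
        "-crf", str(crf),
        "-maxrate", f"{target_bitrate_kbps}k",
        "-bufsize", f"{2 * target_bitrate_kbps}k",
        "out.mp4",
    ]
```

The point of the scheme is that the cheap feature extraction and prediction replace a full first encoding pass, so the segment is only encoded once, at the predicted CRF.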

Posted in ATHENA | Comments Off on Patent Approval for “Efficient two-pass encoding scheme for adaptive live streaming”

X4-MATCH: Sustainable Prediction-based Distribution of Video Encoding on Cloud and Edge

X4-MATCH: Sustainable Prediction-based Distribution of Video Encoding on Cloud and Edge

40th IEEE International Parallel & Distributed Processing Symposium

May 25 – May 29, 2026

New Orleans, USA
https://www.ipdps.org/

[PDF]

Samira Afzal (Baylor University), Narges Mehran (University of Salzburg), Andrew C. Freeman (Baylor University), Manuel Hoi (University of Klagenfurt), Armin Lachini (University of Klagenfurt), Radu Prodan (University of Innsbruck), Christian Timmerer (University of Klagenfurt)

Abstract: The rapid expansion of video traffic has made it one of the most energy-intensive workloads on cloud and edge infrastructures. As encoding remains essential for streaming, gaming, and immersive applications, efficient task scheduling is required to balance service quality, cost efficiency, and sustainability. In this work, we propose a sustainable scheduling framework that integrates machine learning–based performance prediction with game-theoretic matching (X4-MATCH), designed to distribute video encoding workloads across cloud–edge infrastructures. The framework formulates four key performance metrics, including processing and transmission time, price, energy use, and CO2 emissions, as optimization objectives to balance performance and sustainability goals. This method leverages the eXtra-trees regressor model to predict performance metrics for video encoding tasks, integrated with a matching-theory-based resource allocation strategy to efficiently utilize computational resources across the cloud and the edge. We experimentally validate the effectiveness of X4-MATCH on a real-world testbed incorporating Amazon Web Services (AWS) cloud virtual machines/instances and local edge servers. Results show that X4-MATCH outperforms state-of-the-art methods by reducing total time by 63.3%, price by 54.2%, and energy by 56.8%.
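Matching-theory allocation in this setting can be illustrated with a small deferred-acceptance sketch in Python: each task proposes to resources in order of predicted cost (a stand-in for the extra-trees predictions of time, price, energy, and CO2), and a full resource keeps only its cheapest proposals. The data layout and cost values are illustrative; the sketch assumes total capacity covers all tasks:

```python
def match_tasks(costs, capacity):
    """Deferred-acceptance matching of encoding tasks to resources.

    costs[t][r] is the predicted cost of running task t on resource r
    (a stand-in for a weighted sum of predicted time, price, energy,
    and CO2); capacity[r] bounds how many tasks resource r may hold.
    Assumes sum(capacity) >= number of tasks, so every task is placed.
    """
    n_res = len(capacity)
    # Each task ranks resources from cheapest to most expensive.
    prefs = {t: sorted(range(n_res), key=lambda r: c[r]) for t, c in costs.items()}
    next_choice = {t: 0 for t in costs}   # next resource index to try
    held = {r: [] for r in range(n_res)}  # tentatively accepted tasks
    free = list(costs)
    while free:
        t = free.pop()
        r = prefs[t][next_choice[t]]
        next_choice[t] += 1
        held[r].append(t)
        if len(held[r]) > capacity[r]:
            # Over capacity: bump whichever held task is most expensive here.
            reject = max(held[r], key=lambda x: costs[x][r])
            held[r].remove(reject)
            free.append(reject)
    return {t: r for r, tasks in held.items() for t in tasks}
```

Deferred acceptance yields a stable assignment: no task-resource pair would both prefer each other over their final match, which is the fairness property matching theory brings over a purely greedy scheduler.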

Index Terms: Video encoding, energy efficiency, cloud and edge, matching theory, extra-trees regressor

X4-MATCH distribution overview

Posted in ATHENA, GAIA | Comments Off on X4-MATCH: Sustainable Prediction-based Distribution of Video Encoding on Cloud and Edge

Eye-Tracking, Quality Assessment, and QoE Prediction Models for Point Cloud Videos: Extended Analysis of the ComPEQ-MR Dataset

Eye-Tracking, Quality Assessment, and QoE Prediction Models for Point Cloud Videos: Extended Analysis of the ComPEQ-MR Dataset

IEEE Access 2025

[PDF]

Shivi Vats (AAU, Austria), Minh Nguyen (AAU, Austria)*, Christian Timmerer (AAU, Austria), Hermann Hellwagner (AAU, Austria)

Abstract: Point cloud videos, also termed dynamic point clouds (DPCs), have the potential to provide immersive experiences with six degrees of freedom (6DoF). However, there are still several open issues in understanding the Quality of Experience (QoE) and visual attention of end users while experiencing 6DoF volumetric videos. For instance, the quality impact of compressing DPCs, which requires a significant amount of both time and computational resources, needs further investigation. Also, QoE prediction models for DPCs in 6DoF have rarely been developed due to the lack of visual quality databases. Furthermore, visual attention in 6DoF is hardly explored, which impedes research into more sophisticated approaches for adaptive streaming of DPCs. In this paper, we review and analyze in detail the open-source Compressed Point cloud dataset with Eye-tracking and Quality assessment in Mixed Reality (ComPEQ–MR). The dataset, initially presented in [24], comprises 4 uncompressed (raw) DPCs as well as compressed versions processed by Moving Picture Experts Group (MPEG) reference tools (i.e., V-PCC and two G-PCC variants). The dataset includes eye-tracking data of 41 study participants watching the raw DPCs with 6DoF, yielding 164 visual attention maps. We analyze this data and present head and gaze movement results here. The dataset also includes results from subjective tests conducted to assess the quality of the DPCs, each both uncompressed and compressed with 12 levels of distortion, resulting in 2132 quality scores. This work presents the QoE performance results of the compression techniques, the factors with significant impact on participant ratings, and the correlation of the objective Peak Signal-to-Noise Ratio (PSNR) metrics with Mean Opinion Scores (MOS). The results indicate superior performance of the V-PCC codec as well as significant variations in quality ratings based on codec choice, bitrate, and quality/distortion level, providing insights for optimizing point cloud video compression in MR applications. Finally, making use of the subjective scores, we trained and evaluated models for QoE prediction for DPCs compressed using the pertinent MPEG tools. We present the models and their prediction results, noting that the fine-tuned ITU-T P.1203 models exhibit good correlation with the subjective ratings. The dataset is available at https://ftp.itec.aau.at/datasets/ComPEQ-MR/.
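The PSNR-MOS correlation reported above is conventionally measured with the Pearson linear correlation coefficient (PLCC); a minimal Python sketch over paired per-sequence scores:

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two score lists,
    e.g., objective PSNR values and subjective MOS per test sequence.
    Returns a value in [-1, 1]; +1 means a perfect linear relation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In quality-metric evaluation, PLCC is often computed after a monotonic (e.g., logistic) mapping of the objective scores; the raw coefficient above is the simplest variant.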

* Minh Nguyen is currently a Research Associate at Fraunhofer FOKUS, Germany, but this work was done while he was working at AAU.

Posted in SPIRIT | Comments Off on Eye-Tracking, Quality Assessment, and QoE Prediction Models for Point Cloud Videos: Extended Analysis of the ComPEQ-MR Dataset

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

MMM 2026

January 29 – January 31, 2026

Prague, Czech Republic

[PDF, Poster]

Shivi Vats (AAU, Austria), Christian Timmerer (AAU, Austria), Hermann Hellwagner (AAU, Austria)

Abstract: The use of point cloud (PC) streaming in mixed reality (MR) environments is of particular interest due to the immersiveness and the six degrees of freedom (6DoF) provided by the 3D content. However, this immersiveness requires significant bandwidth. Innovative solutions have been developed to address these challenges, such as PC compression and/or spatially tiling the PC to stream different portions at different quality levels. This paper presents a brief overview of a Subjective Testing and Eye-tracking Platform for dynamic point clouds in Mixed Reality (STEP-MR) for the Microsoft HoloLens 2. STEP-MR was used to conduct subjective tests (described in [1]) with 41 participants, yielding over 2000 responses and more than 150 visual attention maps, the results of which can be used, among other things, to improve the dynamic (animated) point cloud streaming solutions mentioned above. The new version builds on our previous platform and now enables eye-tracking tests, including calibration and heatmap generation. Additionally, STEP-MR features modifications to the subjective tests’ functionality, such as a new rating scale and adaptability to participant movement during the tests, along with other user experience changes.
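Visual attention maps such as those STEP-MR generates are commonly built by accumulating gaze samples into a 2D grid. A minimal Python sketch follows (normalized to the peak cell; the Gaussian smoothing that heatmap tools typically add is omitted, and the actual STEP-MR internals may differ):

```python
def gaze_heatmap(gaze_points, width, height):
    """Accumulate (x, y) gaze samples into a normalized 2D hit-count grid.

    Samples falling outside the grid are discarded; the result is
    scaled so the most-fixated cell has value 1.0.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in gaze_points:
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] += 1  # count a gaze hit in this cell
    peak = max(max(row) for row in grid)
    if peak == 0:
        return grid          # no valid samples: all-zero map
    return [[cell / peak for cell in row] for row in grid]
```

For eye-tracking on 3D content, the (x, y) samples would typically be gaze-ray intersections projected onto the point cloud's bounding surface before accumulation.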

[1] Nguyen, M., Vats, S., Zhou, X., Viola, I., Cesar, P., Timmerer, C., & Hellwagner, H. (2024). ComPEQ-MR: Compressed Point Cloud Dataset with Eye Tracking and Quality Assessment in Mixed Reality. Proceedings of the 15th ACM Multimedia Systems Conference, 367–373. https://doi.org/10.1145/3625468.3652182

Posted in SPIRIT | Comments Off on STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality