ATHENA Christian Doppler (CD) Laboratory

ACM MM’25: SVD: Spatial Video Dataset

Posted on August 4, 2025

SVD: Spatial Video Dataset

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

MH Izadimehr, Milad Ghanbari, Guodong Chen, Wei Zhou, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour

Abstract: Stereoscopic video has long been the subject of research due to its ability to deliver immersive three-dimensional content to a wide range of applications, from virtual and augmented reality to advanced human–computer interaction. The dual‑view format inherently provides binocular disparity cues that enhance depth perception and realism, making it indispensable for fields such as telepresence, 3D mapping, and robotic vision. Until recently, however, end‑to‑end pipelines for capturing, encoding, and viewing high‑quality 3D video were neither widely accessible nor optimized for consumer‑grade devices. Today’s smartphones, such as the iPhone Pro, and modern head-mounted displays (HMDs), such as the Apple Vision Pro (AVP), offer built‑in support for stereoscopic video capture, hardware‑accelerated encoding, and seamless playback on devices like the AVP and Meta Quest 3, all with minimal user intervention. Apple refers to this streamlined workflow as spatial video. Making the full stereoscopic video process available to everyone has made new applications possible. Despite these advances, there remains a notable absence of publicly available datasets that include the complete spatial video pipeline on consumer platforms, hindering reproducibility and comparative evaluation of emerging algorithms.

In this paper, we introduce SVD, a spatial video dataset comprising 300 five-second video sequences, i.e., 150 captured using an iPhone Pro and 150 with an AVP. Additionally, 10 longer videos with a minimum duration of 2 minutes have been recorded. SVD is publicly released under an open source license to facilitate research in codec performance evaluation, subjective and objective Quality of Experience assessment, depth‑based computer vision, stereoscopic video streaming, and other emerging 3D applications such as neural rendering and volumetric capture. Link to the dataset: https://cd-athena.github.io/SVD/.

Posted in ATHENA

ACM MM’25 BNI: GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric Media

Posted on August 1, 2025

GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric Media

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

Emanuele Artioli (AAU, Austria), Daniele Lorenzi (AAU, Austria), Shivi Vats (AAU, Austria), Farzad Tashtarian (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: Video streaming dominates global internet traffic, yet conventional pipelines remain inefficient for structured, human-centric content such as sports, performance, or interactive media. Standard codecs re-encode entire frames, foreground and background alike, treating all pixels uniformly and ignoring the semantic structure of the scene. This leads to significant bandwidth waste, particularly in scenarios where backgrounds are static and motion is constrained to a few salient actors. We introduce GenStream, a semantic streaming framework that replaces dense video frames with compact, structured metadata. Instead of transmitting pixels, GenStream encodes each scene as a combination of skeletal keypoints, camera viewpoint parameters, and a static 3D background model. These elements are transmitted to the client, where a generative model reconstructs photorealistic human figures and composites them into the 3D scene from the original viewpoint. This paradigm enables extreme compression, achieving over 99.9% bandwidth reduction compared to HEVC. We partially validate GenStream on Olympic figure skating footage and demonstrate potential high perceptual fidelity under minimal data. Looking forward, GenStream opens new directions in volumetric avatar synthesis, canonical 3D actor fusion across views, personalized and immersive viewing experiences at arbitrary viewpoints, and lightweight scene reconstruction, laying the groundwork for scalable, intelligent streaming in the post-codec era.
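To make the "over 99.9% bandwidth reduction" figure concrete, the arithmetic behind it can be sketched as follows. The HEVC bitrate used here is a purely illustrative assumption, not a number from the paper, and the function names are hypothetical.

```python
def reduction_percent(baseline_kbps, semantic_kbps):
    """Bandwidth reduction of a semantic stream relative to a baseline, in percent."""
    if baseline_kbps <= 0:
        raise ValueError("baseline bitrate must be positive")
    return 100.0 * (1.0 - semantic_kbps / baseline_kbps)


def max_semantic_kbps(baseline_kbps, target_reduction_percent=99.9):
    """Largest semantic bitrate that still meets the target reduction."""
    return baseline_kbps * (1.0 - target_reduction_percent / 100.0)


# Assumed HEVC baseline of 5000 kbps (illustrative only, not from the paper).
hevc_kbps = 5000.0
print(max_semantic_kbps(hevc_kbps))       # budget for keypoints + camera metadata
print(reduction_percent(hevc_kbps, 4.0))  # reduction if the metadata needs 4 kbps
```

In other words, a reduction above 99.9% means the transmitted metadata must stay below one thousandth of the baseline bitrate.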

Posted in ATHENA

ACM MM’25: Nature-1k: The Raw Beauty of Nature in 4K at 60FPS

Posted on August 1, 2025

Nature-1k: The Raw Beauty of Nature in 4K at 60FPS

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

Mohammad Ghasempour (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: The push toward data-driven video processing, combined with recent advances in video coding and streaming technologies, has fueled the need for diverse, large-scale, and high-quality video datasets. However, the limited availability of such datasets remains a key barrier to the development of next-generation video processing solutions. In this paper, we introduce Nature-1k, a large-scale video dataset consisting of 1000 professionally captured 4K Ultra High Definition (UHD) videos, each recorded at 60fps. The dataset covers a wide range of environments, lighting conditions, texture complexities, and motion patterns. To maintain temporal consistency, which is crucial for spatio-temporal learning applications, the dataset avoids scene cuts within the sequences. We further characterize the dataset using established metrics, including spatial and temporal video complexity metrics, as well as colorfulness, brightness, and contrast distribution. Moreover, Nature-1k includes a compressed version to support rapid prototyping and lightweight testing. The quality of the compressed videos is evaluated using four commonly used video quality metrics: PSNR, SSIM, MS-SSIM, and VMAF. Finally, we compare Nature-1k with existing datasets to demonstrate its superior quality and content diversity. The dataset is suitable for a wide range of applications, including Generative Artificial Intelligence (AI), video super-resolution and enhancement, video interpolation, as well as video coding, and adaptive video streaming optimization. Dataset URL: Link
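Of the four quality metrics named above, PSNR is the simplest to state. The following is a minimal sketch of the standard PSNR definition on a flat list of 8-bit pixel values; it is not the authors' evaluation pipeline, and the sample values are made up for illustration.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    if not reference:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-sample luma rows: original vs. compressed.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
distorted = [53, 55, 60, 66, 71, 61, 63, 73]
print(round(psnr(reference, distorted), 2))
```

For real video, the same computation is typically applied per frame (often on the luma plane) and averaged over the sequence.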

Posted in ATHENA

Receiving Kernel-Level Insights via eBPF: Can ABR Algorithms Adapt Smarter?

Posted on July 31, 2025

Receiving Kernel-Level Insights via eBPF: Can ABR Algorithms Adapt Smarter?

Würzburg Workshop on Next-Generation Communication Networks (WueWoWAS) 2025

6 – 8 Oct 2025, Würzburg, Germany

[PDF]

Mohsen Ghasemi (Sharif University of Technology, Iran); Daniele Lorenzi (Alpen-Adria-Universität Klagenfurt, Austria); Mahdi Dolati (Sharif University of Technology, Iran); Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt, Austria); Sergey Gorinsky (IMDEA Networks Institute, Spain); Christian Timmerer (Alpen-Adria-Universität Klagenfurt & Bitmovin, Austria)

Abstract: The rapid rise of video streaming services such as Netflix and YouTube has made video delivery the largest driver of global Internet traffic, including mobile networks such as 5G or the upcoming 6G network. To maintain playback quality, client devices employ Adaptive Bitrate (ABR) algorithms that adjust video quality based on metrics like available bandwidth and buffer occupancy. However, these algorithms often react slowly to sudden bandwidth fluctuations due to limited visibility into network conditions, leading to stall events that significantly degrade the user’s Quality of Experience (QoE). In this work, we introduce CaBR, a Congestion-aware adaptive BitRate decision module designed to operate on top of existing ABR algorithms. CaBR enhances video streaming performance by leveraging real-time, in-kernel network telemetry collected via the extended Berkeley Packet Filter (eBPF). By utilizing congestion metrics such as queue lengths observed at network switches, CaBR refines the bitrate selection of the underlying ABR algorithms for upcoming segments, enabling faster adaptation to changing network conditions. Our evaluation shows that CaBR significantly reduces the playback stalls and improves QoE by up to 25% compared to state-of-the-art approaches in a congested environment.
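The idea of a decision module that sits on top of an existing ABR algorithm can be illustrated with a small sketch. The abstract does not describe CaBR's actual decision rule, so the threshold rule below, the function name `refine_bitrate`, and the default of 100 packets are all hypothetical assumptions chosen only to show the layering.

```python
def refine_bitrate(ladder_kbps, abr_choice_kbps, queue_len_pkts,
                   queue_threshold_pkts=100):
    """Refine an ABR decision using an observed queue length.

    Hypothetical rule: if the reported queue exceeds the threshold, step one
    rung down the bitrate ladder; otherwise keep the ABR algorithm's choice.
    """
    ladder = sorted(ladder_kbps)
    if abr_choice_kbps not in ladder:
        raise ValueError("ABR choice must be one of the ladder bitrates")
    if queue_len_pkts <= queue_threshold_pkts:
        return abr_choice_kbps  # no congestion signal: leave the decision alone
    index = ladder.index(abr_choice_kbps)
    return ladder[max(index - 1, 0)]  # congested: one rung lower, floor at lowest


ladder = [500, 1500, 3000, 6000]
print(refine_bitrate(ladder, 3000, queue_len_pkts=20))
print(refine_bitrate(ladder, 3000, queue_len_pkts=250))
```

Keeping the refinement separate from the ABR logic mirrors the "on top of existing ABR algorithms" design: the underlying algorithm is unchanged, and the congestion signal only adjusts its output for upcoming segments.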

Posted in ATHENA

BMVC’25: Cross-Modal Scene Semantic Alignment for Image Complexity Assessment

Posted on July 31, 2025

Cross-Modal Scene Semantic Alignment for Image Complexity Assessment

British Machine Vision Conference (BMVC) 2025

November, 2025

Sheffield, UK

[PDF]

Yuqing Luo, Yixiao Li, Jiang Liu, Jun Fu, Hadi Amirpour, Guanghui Yue, Baoquan Zhao, Padraig Corcoran, Hantao Liu, Wei Zhou

Abstract: Image complexity assessment (ICA) is a challenging task in perceptual evaluation due to the subjective nature of human perception and the inherent semantic diversity in real-world images. Existing ICA methods predominantly rely on hand-crafted or shallow convolutional neural network-based features of a single visual modality, which are insufficient to fully capture the perceived representations closely related to image complexity. Recently, cross-modal scene semantic information has been shown to play a crucial role in various computer vision tasks, particularly those involving perceptual understanding. However, the exploration of cross-modal scene semantic information in the context of ICA remains unaddressed. Therefore, in this paper, we propose a novel ICA method called Cross-Modal Scene Semantic Alignment (CM-SSA), which leverages scene semantic alignment from a cross-modal perspective to enhance ICA performance, enabling complexity predictions to be more consistent with subjective human perception. Specifically, the proposed CM-SSA consists of a complexity regression branch and a scene semantic alignment branch. The complexity regression branch estimates image complexity levels under the guidance of the scene semantic alignment branch, while the scene semantic alignment branch is used to align images with corresponding text prompts that convey rich scene semantic information by pair-wise learning. Extensive experiments on several ICA datasets demonstrate that the proposed CM-SSA significantly outperforms state-of-the-art approaches.

Posted in ATHENA

Interns at ATHENA (Summer 2025)

Posted on July 30, 2025

In July 2025, the ATHENA Christian Doppler Laboratory hosted two interns working on the following topics:

Leon Kordasch: Large-scale 4K 60fps video dataset
Theresa Petschenig: Video generation and quality assessment

At the conclusion of their internships, the interns showcased their projects and findings, earning official certificates from the university. The collaboration proved to be a rewarding experience for both the interns and the researchers at ATHENA. Through personalized mentorship, hands-on training, and ongoing support, the interns benefited from an enriched learning journey. This comprehensive guidance enabled them to build strong practical skills while deepening their understanding of research methodologies and technologies in the video streaming domain. We sincerely thank both interns for their enthusiasm, dedication, and insightful feedback, which contributed meaningfully to the ATHENA lab’s ongoing efforts.

Leon Kordasch: “My internship at ATHENA was an incredibly valuable experience. The team was welcoming and supportive, and I especially appreciated the guidance of my supervisor, Mohammad Ghasempour, who did a great job explaining the theoretical background and technical concepts needed for my work. During my time there, I developed a high-quality and diverse 4K60 video dataset for applications such as AI training, real-time upscaling, and advanced video encoding research.”

Theresa Petschenig: “My four-week internship at ATHENA was a really enjoyable and meaningful experience. I worked on a project related to video generation and quality assessment, which allowed me to dive into some fascinating topics. I got a much better understanding of how AI-generated videos are created and evaluated, and what makes them look realistic. The internship gave me a perfect balance of practical work and learning new concepts. My supervisor, Yiying, was very nice and helpful throughout the internship. The atmosphere in the office was calm and welcoming, and the team was really friendly. I’m grateful for everything I’ve learned and for the chance to be part of such a supportive environment. This experience gave me both valuable knowledge and great memories.”

Posted in ATHENA

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

Posted on July 28, 2025

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

EuroXR 2025

September 03 – September 05, 2025

Winterthur, Switzerland

Shivi Vats (AAU, Austria), Christian Timmerer (AAU, Austria), Hermann Hellwagner (AAU, Austria)

Please Note: While the paper was accepted and presented at EuroXR 2025, it was not published in the proceedings. For the PDF, please see the newer post on the paper accepted to MMM ’26.

Abstract: The use of point cloud (PC) streaming in mixed reality (MR) environments is of particular interest due to the immersiveness and the six degrees of freedom (6DoF) provided by the 3D content. However, this immersiveness requires significant bandwidth. Innovative solutions have been developed to address these challenges, such as PC compression and/or spatially tiling the PC to stream different portions at different quality levels. This paper presents a brief overview of a Subjective Testing and Eye-tracking Platform for dynamic point clouds in Mixed Reality (STEP-MR) for the Microsoft HoloLens 2. STEP-MR was used to conduct subjective tests (described in [1]) with 41 participants, yielding over 2000 responses and more than 150 visual attention maps, the results of which can be used, among other things, to improve the dynamic (animated) point cloud streaming solutions mentioned above. Building on our previous platform, the new version now enables eye-tracking tests, including calibration and heatmap generation. Additionally, STEP-MR features modifications to the subjective tests’ functionality, such as a new rating scale and adaptability to participant movement during the tests, along with other user experience changes.

[1] Nguyen, M., Vats, S., Zhou, X., Viola, I., Cesar, P., Timmerer, C., & Hellwagner, H. (2024). ComPEQ-MR: Compressed Point Cloud Dataset with Eye Tracking and Quality Assessment in Mixed Reality. Proceedings of the 15th ACM Multimedia Systems Conference, 367–373. https://doi.org/10.1145/3625468.3652182

Posted in SPIRIT
