BMVC’25: Cross-Modal Scene Semantic Alignment for Image Complexity Assessment

Cross-Modal Scene Semantic Alignment for Image Complexity Assessment

British Machine Vision Conference (BMVC) 2025

November, 2025

Sheffield, UK

[PDF]

Yuqing Luo, Yixiao Li, Jiang Liu, Jun Fu, Hadi Amirpour, Guanghui Yue, Baoquan Zhao, Padraig Corcoran, Hantao Liu, Wei Zhou

Abstract: Image complexity assessment (ICA) is a challenging task in perceptual evaluation due to the subjective nature of human perception and the inherent semantic diversity in real-world images. Existing ICA methods predominantly rely on hand-crafted or shallow convolutional neural network-based features of a single visual modality, which are insufficient to fully capture the perceived representations closely related to image complexity. Recently, cross-modal scene semantic information has been shown to play a crucial role in various computer vision tasks, particularly those involving perceptual understanding. However, the exploration of cross-modal scene semantic information in the context of ICA remains unaddressed. Therefore, in this paper, we propose a novel ICA method called Cross-Modal Scene Semantic Alignment (CM-SSA), which leverages scene semantic alignment from a cross-modal perspective to enhance ICA performance, enabling complexity predictions to be more consistent with subjective human perception. Specifically, the proposed CM-SSA consists of a complexity regression branch and a scene semantic alignment branch. The complexity regression branch estimates image complexity levels under the guidance of the scene semantic alignment branch, while the scene semantic alignment branch is used to align images with corresponding text prompts that convey rich scene semantic information by pair-wise learning. Extensive experiments on several ICA datasets demonstrate that the proposed CM-SSA significantly outperforms state-of-the-art approaches.
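The two-branch design described in the abstract can be illustrated with a toy training objective: a regression loss on subjective complexity scores plus a pair-wise image-text alignment term. The following is an illustrative NumPy sketch only; the function names, the symmetric InfoNCE-style alignment loss, and the weighting are assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # row-wise cosine similarity matrix between two embedding sets
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def cmssa_style_loss(img_emb, txt_emb, pred, mos, w=0.5, tau=0.07):
    """Toy two-branch objective: complexity regression + pair-wise
    image/text alignment (hypothetical stand-in for CM-SSA's losses)."""
    labels = np.arange(len(img_emb))  # i-th image matches i-th prompt

    def xent(logits):
        # softmax cross-entropy with the matched pair as the target
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # alignment branch: symmetric contrastive loss over matched pairs
    logits = cosine_sim(img_emb, txt_emb) / tau
    align = 0.5 * (xent(logits) + xent(logits.T))
    # regression branch: MSE against subjective complexity scores
    reg = np.mean((pred - mos) ** 2)
    return reg + w * align
```

With perfectly matched embeddings and exact score predictions, the loss approaches zero; mismatched pairs or wrong predictions drive it up, which is the intuition behind guiding the regression branch with the alignment branch.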

Posted in ATHENA | Comments Off on BMVC’25: Cross-Modal Scene Semantic Alignment for Image Complexity Assessment

Interns at ATHENA (Summer 2025)


In July 2025, the ATHENA Christian Doppler Laboratory hosted two interns working on the following topics:

  • Leon Kordasch: Large-scale 4K 60fps video dataset
  • Theresa Petschenig: Video generation and quality assessment

At the conclusion of their internships, the interns showcased their projects and findings, earning official certificates from the university. The collaboration proved to be a rewarding experience for both the interns and the researchers at ATHENA. Through personalized mentorship, hands-on training, and ongoing support, the interns benefited from an enriched learning journey. This comprehensive guidance enabled them to build strong practical skills while deepening their understanding of research methodologies and technologies in the video streaming domain. We sincerely thank both interns for their enthusiasm, dedication, and insightful feedback, which contributed meaningfully to the ATHENA lab’s ongoing efforts.

Leon Kordasch: My internship at ATHENA was an incredibly valuable experience. The team was welcoming and supportive, and I especially appreciated the guidance of my supervisor, Mohammad Ghasempour, who did a great job explaining the theoretical background and technical concepts needed for my work. During my time there, I developed a high-quality and diverse 4K60 video dataset for applications such as AI training, real-time upscaling, and advanced video encoding research.

Theresa Petschenig: My four-week internship at ATHENA was a really enjoyable and meaningful experience. I worked on a project related to video generation and quality assessment, which allowed me to dive into some fascinating topics. I got a much better understanding of how AI-generated videos are created and evaluated, and what makes them look realistic. The internship gave me a perfect balance of practical work and learning new concepts. My supervisor, Yiying, was very nice and helpful throughout the internship. The atmosphere in the office was calm and welcoming, and the team was really friendly. I’m grateful for everything I’ve learned and for the chance to be part of such a supportive environment. This experience gave me both valuable knowledge and great memories.

Posted in ATHENA | Comments Off on Interns at ATHENA (Summer 2025)

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

EuroXR 2025

September 03 – September 05, 2025

Winterthur, Switzerland

Shivi Vats (AAU, Austria), Christian Timmerer (AAU, Austria), Hermann Hellwagner (AAU, Austria)

Please note: While the paper was accepted and presented at EuroXR 2025, it was not published in the proceedings. Please see the post on the accepted MMM ’26 paper here for the PDF.

Abstract: The use of point cloud (PC) streaming in mixed reality (MR) environments is of particular interest due to the immersiveness and the six degrees of freedom (6DoF) provided by the 3D content. However, this immersiveness requires significant bandwidth. Innovative solutions have been developed to address these challenges, such as PC compression and/or spatially tiling the PC to stream different portions at different quality levels. This paper presents a brief overview of a Subjective Testing and Eye-tracking Platform for dynamic point clouds in Mixed Reality (STEP-MR) for the Microsoft HoloLens 2. STEP-MR was used to conduct subjective tests (described in [1]) with 41 participants, yielding over 2000 responses and more than 150 visual attention maps, the results of which can be used, among other things, to improve the dynamic (animated) point cloud streaming solutions mentioned above. Building on our previous platform, the new version now enables eye-tracking tests, including calibration and heatmap generation. Additionally, STEP-MR features modifications to the subjective tests’ functionality, such as a new rating scale and adaptability to participant movement during the tests, along with other user experience changes.

[1] Nguyen, M., Vats, S., Zhou, X., Viola, I., Cesar, P., Timmerer, C., & Hellwagner, H. (2024). ComPEQ-MR: Compressed Point Cloud Dataset with Eye Tracking and Quality Assessment in Mixed Reality. Proceedings of the 15th ACM Multimedia Systems Conference, 367–373. https://doi.org/10.1145/3625468.3652182
Posted in SPIRIT | Comments Off on STEP-MR: A Subjective Testing and Eye-Tracking Platform for Dynamic Point Clouds in Mixed Reality

ACM MM’25 Open Source: diveXplore – An Open-Source Software for Modern Video Retrieval with Image/Text Embeddings

diveXplore – An Open-Source Software for Modern Video Retrieval with Image/Text Embeddings

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

Mario Leopold (AAU, Austria), Farzad Tashtarian (AAU, Austria), Klaus Schöffmann (AAU, Austria)

Abstract: Effective video retrieval in large-scale datasets presents a significant challenge, with existing tools often being too complex, lacking sufficient retrieval capabilities, or being too slow for rapid search tasks. This paper introduces diveXplore, an open-source software designed for interactive video retrieval. Due to its success in various competitions such as the Video Browser Showdown (VBS) and the Interactive Video Retrieval 4 Beginners (IVR4B), as well as its continued development since 2017, diveXplore is a solid foundation for various kinds of retrieval tasks. The system is built on a three-layer architecture, comprising a backend for offline preprocessing, a middleware with a Node.js and Python server for query handling and a MongoDB for metadata storage, and an Angular-based frontend for user interaction. Key functionalities include free-text search using natural language, temporal queries, similarity search, and other specialized search strategies. By open-sourcing diveXplore, we aim to establish a solid baseline for future research and development in the video retrieval community, encouraging contributions and adaptations for a wide range of use cases, even beyond competitive settings.
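Free-text search with image/text embeddings, as mentioned in the abstract, typically boils down to ranking precomputed keyframe embeddings against a text-query embedding in a joint space. The following is a minimal NumPy sketch under that assumption; `rank_frames` and its signature are hypothetical illustrations, not diveXplore's actual API, and the embeddings would in practice come from a joint image/text model during offline preprocessing.

```python
import numpy as np

def rank_frames(query_emb, frame_embs, top_k=5):
    """Rank keyframes by cosine similarity to a text-query embedding.

    query_emb:  (d,) embedding of the free-text query
    frame_embs: (n, d) precomputed keyframe embeddings
    Returns the indices of the top_k frames and their similarities.
    """
    q = query_emb / np.linalg.norm(query_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    sims = f @ q                      # cosine similarity per frame
    order = np.argsort(-sims)[:top_k]  # best matches first
    return order, sims[order]
```

In a real deployment the frame embeddings would live in a database or vector index and be computed once offline, so only the query embedding and the dot products run at search time.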

Posted in ATHENA | Comments Off on ACM MM’25 Open Source: diveXplore – An Open-Source Software for Modern Video Retrieval with Image/Text Embeddings

Patent Approval for “Content-adaptive encoder preset prediction for adaptive live streaming”

Content-adaptive encoder preset prediction for adaptive live streaming

US Patent

[PDF]

Vignesh Menon (Alpen-Adria-Universität Klagenfurt, Austria), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)


Abstract: Techniques for content-adaptive encoder preset prediction for adaptive live streaming are described herein. A method for content-adaptive encoder preset prediction for adaptive live streaming includes performing video complexity feature extraction on a video segment to extract complexity features such as an average texture energy, an average temporal energy, and an average luminance. These inputs may be provided to an encoding time prediction model, along with a bitrate ladder, a resolution set, a target video encoding speed, and a number of CPU threads for the video segment, to predict an encoding time, and an optimized encoding preset may be selected for the video segment by a preset selection function using the predicted encoding time. The video segment may be encoded according to the optimized encoding preset.
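The preset selection step can be sketched as follows: among the presets whose predicted encoding time fits the live deadline (roughly the segment duration), pick the slowest, highest-quality one. This is a minimal Python illustration only; the x264-style preset names and the deadline rule are assumptions for exposition, not the patented method's exact selection function.

```python
def select_preset(predicted_times, segment_duration):
    """Pick an encoding preset from predicted per-preset encoding times.

    predicted_times:  dict mapping preset name -> predicted encoding time (s)
    segment_duration: live deadline (s); encoding must not fall behind it
    """
    # presets that can be encoded within the live deadline
    feasible = {p: t for p, t in predicted_times.items()
                if t <= segment_duration}
    if not feasible:
        # nothing meets the deadline: fall back to the fastest preset
        return min(predicted_times, key=predicted_times.get)
    # slowest feasible preset = highest quality that still meets the deadline
    return max(feasible, key=feasible.get)
```

For example, with predicted times of 0.5 s ("ultrafast"), 1.5 s ("fast"), and 2.5 s ("medium") for a 2-second segment, the sketch would choose "fast": the best quality whose predicted encoding time still keeps the live stream on schedule.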

Posted in ATHENA | Comments Off on Patent Approval for “Content-adaptive encoder preset prediction for adaptive live streaming”

ICCV VQualA’25: VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results

VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results

ICCV VQualA 2025

October 19 – October 23, 2025

Hawai’i, USA

[PDF]

Hadi Amirpour (AAU, Austria), et al.

Abstract: This paper presents the ISRGC-Q Challenge, built upon the Image Super-Resolution Generated Content Quality Assessment (ISRGen-QA) dataset, and organized as part of the Visual Quality Assessment (VQualA) Competition at the ICCV 2025 Workshops. Unlike existing Super-Resolution Image Quality Assessment (SR-IQA) datasets, ISRGen-QA places greater emphasis on SR images generated by the latest generative approaches, including Generative Adversarial Networks (GANs) and diffusion models. The primary goal of this challenge is to analyze the unique artifacts introduced by modern super-resolution techniques and to evaluate their perceptual quality effectively. A total of 108 participants registered for the challenge, with 4 teams submitting valid solutions and fact sheets for the final testing phase. These submissions demonstrated state-of-the-art (SOTA) performance on the ISRGen-QA dataset. The project is publicly available at: https://github.com/Lighting-YXLI/ISRGen-QA.

Posted in ATHENA | Comments Off on ICCV VQualA’25: VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results

ICCV VQualA’25: VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results

VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results

ICCV VQualA 2025

October 19 – October 23, 2025

Hawai’i, USA

[PDF]

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), et al.

Abstract: Face images have become integral to various applications, but real-world capture conditions often lead to degradations such as noise, blur, compression artifacts, and poor lighting. These degradations negatively impact image quality and downstream tasks. To promote advancements in face image quality assessment (FIQA), we introduce the VQualA 2025 Challenge on Face Image Quality Assessment, part of the ICCV 2025 Workshops. Participants developed efficient models (≤0.5 GFLOPs, ≤5M parameters) predicting Mean Opinion Scores (MOS) under realistic degradations. Submissions were rigorously evaluated using objective metrics and human perceptual judgments. The challenge attracted 127 participants, resulting in 1519 valid final submissions. Detailed methodologies and results are presented, contributing to practical FIQA solutions.


Posted in ATHENA | Comments Off on ICCV VQualA’25: VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results