ICCV 2025 Workshop: Visual Quality Assessment Competition

Visual Quality Assessment Competition

VQualA

co-located with ICCV 2025

https://vquala.github.io/


Visual quality assessment (VQA) plays a crucial role in computer vision, underpinning tasks such as image quality assessment (IQA), image super-resolution, document image enhancement, and video restoration. Traditional techniques often rely on scalar metrics such as Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM), which, while effective in certain contexts, fall short of capturing the perceptual quality experienced by human observers. This gap underscores the need for more perceptually aligned and comprehensive evaluation methods that can keep pace with the growing demands of applications such as medical imaging, satellite remote sensing, immersive media, and document processing.

In recent years, advances in deep learning, generative models, and multimodal large language models (MLLMs) have opened new avenues for visual quality assessment. These models extend beyond traditional scalar metrics, enabling more nuanced assessments through natural language explanations, open-ended visual comparisons, and enhanced context awareness. With these innovations, VQA is evolving to better reflect human perceptual judgments, making it a critical enabler for next-generation computer vision applications.
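As a concrete reference point for the scalar metrics mentioned above, the following sketch computes PSNR and SSIM for a reference image and a noisy copy using scikit-image; the synthetic images and the noise level are illustrative placeholders only.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Reference image and a distorted copy (additive Gaussian noise) to show what
# the scalar metrics actually measure. Both images are synthetic placeholders.
rng = np.random.default_rng(0)
reference = rng.random((256, 256))
distorted = np.clip(reference + rng.normal(0.0, 0.05, reference.shape), 0.0, 1.0)

psnr = peak_signal_noise_ratio(reference, distorted, data_range=1.0)
ssim = structural_similarity(reference, distorted, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")

# Two different distortions can score similarly on PSNR/SSIM yet look very
# different to a human observer, which is exactly the gap perceptual metrics target.
```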

The VQualA Workshop aims to bring together researchers and practitioners from academia and industry to discuss and explore the latest trends, challenges, and innovations in visual quality assessment. We welcome original research contributions addressing, but not limited to, the following topics:

  • Image and video quality assessment
  • Perceptual quality assessment techniques
  • Multi-modal quality evaluation (image, video, text)
  • Visual quality assessment for immersive media (VR/AR)
  • Document image enhancement and quality analysis
  • Quality assessment under adverse conditions (low light, weather distortions, motion blur)
  • Robust quality metrics for medical and satellite imaging
  • Perceptual-driven image and video super-resolution
  • Visual quality in restoration tasks (denoising, deblurring, upsampling)
  • Human-centric visual quality assessment
  • Learning-based quality assessment models (CNNs, Transformers, MLLMs)
  • Cross-domain visual quality adaptation
  • Benchmarking and datasets for perceptual quality evaluation
  • Integration of large language models for quality explanation and assessment
  • Open-ended comparative assessments with natural language reasoning
  • Emerging applications of VQA in autonomous driving, surveillance, and smart cities

Pattern Recognition Special Issue

Pattern Recognition Special Issue on

Advances in Multimodal-Driven Video Understanding and Assessment

Link to the Special Issue

The rapid growth of video content across various domains has led to an increasing demand for more intelligent and efficient video understanding and assessment techniques. This Special Issue focuses on the integration of multimodal information, such as audio, text, and sensor data, with video to enhance processing, analysis, and interpretation. Multimodal-driven approaches are crucial for numerous real-world applications, including automated surveillance, content recommendation, and healthcare diagnostics.

This Special Issue invites cutting-edge research on topics such as video capture, compression, transmission, enhancement, and quality assessment, alongside advancements in deep learning, multimodal fusion, and real-time processing frameworks. By exploring innovative methodologies and emerging applications, we aim to provide a comprehensive perspective on the latest developments in this dynamic and evolving field.

Topics of interest include but are not limited to:

  • Multimodal-driven video capture techniques
  • Video compression and efficient transmission for/using multimodal data
  • Deep learning-based video enhancement and super-resolution
  • Multimodal action and activity recognition
  • Audio-visual and text-video fusion methods
  • Video quality assessment with multimodal cues
  • Video captioning and summarization using multimodal data
  • Real-time multimodal video processing frameworks
  • Explainability and interpretability in multimodal video models
  • Applications in surveillance, healthcare, and autonomous systems

Guest editors:

Wei Zhou, PhD
Cardiff University, Cardiff, United Kingdom
Email: zhouw26@cardiff.ac.uk

Yakun Ju, PhD
University of Leicester, Leicester, United Kingdom
Email: yj174@leicester.ac.uk

Hadi Amirpour, PhD
University of Klagenfurt, Klagenfurt, Austria
Email: hadi.amirpour@aau.at

Bruce Lu, PhD
University of Western Australia, Perth, Australia
Email: bruce.lu@uwa.edu.au

Jun Liu, PhD
Lancaster University, Lancaster, United Kingdom
Email: j.liu81@lancaster.ac.uk

Important dates:

Submission Portal Open: April 04, 2025

Submission Deadline: October 30, 2025

Acceptance Deadline: May 30, 2026

Keywords:

Multimodal video analysis, video understanding, deep learning, video quality assessment, action recognition, real-time video processing, audio-visual learning, text-video processing


ACM TOMM: HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges

HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges

ACM Transactions on Multimedia Computing, Communications, and Applications

[PDF]

Christian Timmerer (AAU, AT), Hadi Amirpour (AAU, AT), Farzad Tashtarian (AAU, AT), Samira Afzal (AAU, AT), Amr Rizk (Leibniz University Hannover, DE), Michael Zink (University of Massachusetts Amherst, US), and Hermann Hellwagner (AAU, AT)

Abstract: Video streaming has evolved from push-based broadcast/multicast approaches with dedicated hardware/software infrastructures to pull-based unicast schemes that utilize existing Web-based infrastructure for better scalability. In this article, we provide an overview of the foundational principles of HTTP adaptive streaming (HAS), from video encoding to end-user consumption, focusing on key advancements in adaptive bitrate algorithms, quality of experience (QoE), and energy efficiency. Furthermore, the article highlights the ongoing challenges of optimizing network infrastructure, minimizing latency, and managing the environmental impact of video streaming. Finally, future directions for HAS, including immersive media streaming and neural-network-based video codecs, are discussed, positioning HAS at the forefront of next-generation video delivery technologies.

Keywords: HTTP Adaptive Streaming, HAS, DASH, Video Coding, Video Delivery, Video Consumption, Quality of Experience, QoE
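To make the adaptation step concrete, here is a minimal throughput-based bitrate selection rule in the spirit of HAS; the bitrate ladder and safety margin below are illustrative placeholders, not values or algorithms from the article.

```python
# Minimal throughput-based ABR rule: pick the highest rung of the bitrate
# ladder whose bitrate fits under a safety-scaled throughput estimate.
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 4500, 8000]  # one entry per representation

def select_representation(throughput_estimate_kbps: float, safety: float = 0.8) -> int:
    """Return the index of the highest representation that fits the budget."""
    budget = throughput_estimate_kbps * safety
    chosen = 0
    for i, bitrate in enumerate(BITRATE_LADDER_KBPS):
        if bitrate <= budget:
            chosen = i
    return chosen

# Example: a client that measured ~2.4 Mbps of recent throughput
# would request the 1750 kbps representation for the next segment.
print(BITRATE_LADDER_KBPS[select_representation(2400)])
```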


ICME 2025: Neural Representations for Scalable Video Coding

Neural Representations for Scalable Video Coding

IEEE International Conference on Multimedia & Expo (ICME) 2025

June 30 – July 4, 2025

Nantes, France

[PDF]

Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract: Scalable video coding encodes a video stream into multiple layers so that it can be decoded at different levels of quality/resolution, depending on the device’s capabilities or the available network bandwidth. Recent advances in implicit neural representation (INR)-based video codecs have shown competitive compression performance to both traditional and other learning-based methods. In INR approaches, a neural network is trained to overfit a video sequence, and its parameters are compressed to create a compact representation of the video content. While they achieve promising results, existing INR-based codecs require training separate networks for each resolution/quality of a video, making them challenging for scalable compression. In this paper, we propose Neural representations for Scalable Video Coding (NSVC) that encodes multi-resolution/-quality videos into a single neural network comprising multiple layers. The base layer (BL) of the neural network encodes video streams with the lowest resolution/quality. Enhancement layers (ELs) encode additional information that can be used to reconstruct a higher resolution/quality video during decoding using the BL as a starting point. This multi-layered structure allows the scalable bitstream to be truncated to adapt to the client’s bandwidth conditions or computational decoding requirements. Experimental results show that NSVC outperforms AVC’s Scalable Video Coding (SVC) extension and surpasses HEVC’s scalable extension (SHVC) in terms of VMAF. Additionally, NSVC achieves comparable decoding speeds at high resolutions/qualities.
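As an intuition for the layered structure described above, the toy sketch below reconstructs a frame from a base layer plus a chosen number of enhancement-layer residuals; the arrays are placeholders, and the sketch does not reflect NSVC's actual network architecture or bitstream format.

```python
import numpy as np

# Toy sketch of layered, truncatable reconstruction in the spirit of scalable
# coding: a base layer (BL) gives a coarse frame, each enhancement layer (EL)
# adds a residual. Decoding more layers yields higher quality; truncating the
# bitstream simply drops the trailing ELs. Shapes and values are illustrative.

def decode(base_layer: np.ndarray, enhancement_layers: list, n_layers: int) -> np.ndarray:
    """Reconstruct a frame from the BL plus the first n_layers ELs."""
    frame = base_layer.copy()
    for residual in enhancement_layers[:n_layers]:
        frame += residual
    return frame

bl = np.zeros((4, 4))                           # coarse base-layer frame
els = [np.full((4, 4), 0.1) for _ in range(3)]  # residuals from three ELs

low_quality  = decode(bl, els, n_layers=0)  # BL only (lowest quality)
high_quality = decode(bl, els, n_layers=3)  # BL + all ELs
```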


IEEE TMM: VQM4HAS: A Real-time Quality Metric for HEVC Videos in HTTP Adaptive Streaming

VQM4HAS: A Real-time Quality Metric for HEVC Videos in HTTP Adaptive Streaming

IEEE Transactions on Multimedia

[PDF]

Hadi Amirpour (AAU, AT), Jingwen Zhu (Nantes University, FR), Wei Zhou (Cardiff University, UK), Patrick Le Callet (Nantes University, FR), and Christian Timmerer (AAU, AT)

Abstract: In HTTP Adaptive Streaming (HAS), a video is encoded at various bitrate-resolution pairs, collectively known as the bitrate ladder, allowing users to select the most suitable representation based on their network conditions. Optimizing this set of pairs to enhance the Quality of Experience (QoE) requires accurately measuring the quality of these representations. VMAF and ITU-T’s P.1204.3 are highly reliable metrics for assessing the quality of representations in HAS. In practice, however, using these metrics for optimization is often impractical for live streaming applications due to their high computational cost and the large number of bitrate-resolution pairs in the ladder that need to be evaluated. To address this complexity, our paper introduces VQM4HAS, a method that extracts low-complexity features, including (i) video complexity features, (ii) frame-level encoding statistics logged during the encoding process, and (iii) lightweight video quality metrics. These features are fed into a regression model to predict VMAF and P.1204.3, respectively. The VQM4HAS model is designed to operate on a per bitrate-resolution-pair, per-resolution, and cross-representation basis, optimizing quality predictions across different HAS scenarios. Our experimental results demonstrate that VQM4HAS achieves a high correlation with VMAF and P.1204.3, with Pearson correlation coefficients (PCC) ranging from 0.95 to 0.96 for VMAF and 0.97 to 0.99 for P.1204.3, depending on the resolution. Despite this high correlation, VQM4HAS is significantly less complex than both metrics, requiring 98% and 99% less computation than VMAF and P.1204.3, respectively, making it suitable for live streaming scenarios. We also conduct a feature importance analysis to further reduce the complexity of the proposed method. Furthermore, we evaluate the effectiveness of our method by using it to predict subjective quality scores; the results show that VQM4HAS achieves a higher correlation with subjective scores at various resolutions, despite its minimal complexity.
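The general recipe, cheap per-representation features fed into a regressor that predicts a reference metric, can be sketched as follows; the feature layout and the random-forest model are placeholders and not the exact feature set or regressor used in VQM4HAS.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of the general recipe: cheap features -> regressor -> predicted VMAF.
# Feature columns and the model below are illustrative placeholders.
rng = np.random.default_rng(0)

# Each row: [spatial complexity, temporal complexity, bitrate (kbps),
#            resolution height, mean QP logged by the encoder, lightweight PSNR]
X_train = rng.random((500, 6))
y_train = rng.uniform(20, 100, 500)  # stand-in VMAF labels for illustration

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

new_representation = rng.random((1, 6))
predicted_vmaf = model.predict(new_representation)[0]
print(f"Predicted VMAF: {predicted_vmaf:.1f}")
```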



ACM MM’25 Tutorial: Perceptually Inspired Visual Quality Assessment in Multimedia Communication

ACM MM 2025
October 27, 2025

Dublin, Ireland

https://acmmm2025.org/tutorial/

Tutorial speakers:

  • Wei Zhou (Cardiff University)
  • Hadi Amirpour (University of Klagenfurt)

Tutorial description:

As multimedia services such as video streaming, video conferencing, virtual reality (VR), and online gaming continue to expand, ensuring high perceptual quality becomes a priority for maintaining user satisfaction and competitiveness. However, during acquisition, compression, transmission, and storage, multimedia content undergoes various distortions that degrade the experienced quality. Perceptual quality assessment, which evaluates the quality of multimedia content based on human perception, is therefore essential for optimizing user experiences in advanced communication systems. The assessment process faces several challenges, including the diverse characteristics of multimedia content (image, video, VR, point cloud, mesh, multimodal data, etc.) as well as complex distortion scenarios and viewing conditions.

The tutorial first presents a detailed overview of the principles and methods of perceptually inspired visual quality assessment. This includes both subjective methods, where users directly rate their experience, and objective methods, where algorithms predict human perception from measurable factors such as bitrate, frame rate, and compression level. Building on these fundamentals, metrics for different types of multimedia data are then introduced, covering not only traditional images and videos but also immersive multimedia and AI-generated content.
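As a small illustration of the subjective side discussed in the tutorial, the snippet below turns raw opinion scores from a viewing test into a mean opinion score (MOS) with a 95% confidence interval; the ratings are made up for illustration.

```python
import numpy as np

# Turn raw Absolute Category Rating (ACR) scores from a viewing test into a
# mean opinion score (MOS) with a 95% confidence interval. Ratings are invented.
ratings = np.array([4, 5, 3, 4, 4, 5, 4, 3, 4, 5])  # scores on a 1-5 scale

mos = ratings.mean()
ci95 = 1.96 * ratings.std(ddof=1) / np.sqrt(len(ratings))
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")

# Objective metrics are then judged by how well their predictions correlate
# with such MOS values (e.g., via Pearson or Spearman correlation).
```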


End-to-End Learning-based Video Streaming Enhancement Pipeline: A Generative AI Approach

ACM 35th Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV’25)

31 March – 3 April 2025 | Stellenbosch, South Africa

[PDF]

Emanuele Artioli (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: The primary challenge of video streaming is to balance high video quality with smooth playback. Traditional codecs are well tuned for this trade-off, yet, because they cannot exploit contextual information, they must encode the entire video signal and transmit it to the client.
This paper introduces ELVIS (End-to-end Learning-based Video Streaming Enhancement Pipeline), an end-to-end architecture that combines server-side encoding optimizations with client-side generative in-painting to remove and reconstruct redundant video data. Its modular design allows ELVIS to integrate different codecs, in-painting models, and quality metrics, making it adaptable to future innovations.
Our results show that current technologies achieve improvements of up to 11 VMAF points over baseline benchmarks, though challenges remain for real-time applications due to computational demands. ELVIS represents a foundational step toward incorporating generative AI into video streaming pipelines, enabling higher quality experiences without increased bandwidth requirements.
By leveraging generative AI, we aim to develop a client-side tool, to be incorporated into a dedicated video streaming player, that combines the accessibility of multilingual dubbing with the authenticity of the original speaker’s performance, effectively allowing a single actor to deliver their voice in any language. To the best of our knowledge, no current streaming system can capture the speaker’s unique voice or emotional tone.

Index Terms— HTTP adaptive streaming, Generative AI, End-to-end architecture, Quality of Experience.
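To illustrate the client-side reconstruction idea at a purely conceptual level, the sketch below fills server-dropped regions marked by a mask; a trivial mean fill stands in for the generative in-painting model and only shows the data flow, not ELVIS's actual method.

```python
import numpy as np

# Conceptual sketch of the client-side step: regions the server dropped are
# marked by a mask and reconstructed before display. A trivial mean fill stands
# in here for the generative in-painting model; it only illustrates the data flow.

def reconstruct(frame: np.ndarray, dropped_mask: np.ndarray) -> np.ndarray:
    """Fill pixels flagged in dropped_mask using the mean of received pixels."""
    restored = frame.copy()
    restored[dropped_mask] = frame[~dropped_mask].mean()
    return restored

frame = np.random.rand(8, 8)          # received frame with missing content
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:6] = True                 # region the server chose not to transmit
display_frame = reconstruct(frame, mask)
```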
