Visual Communication in the Age of AI: VCIP 2025 Highlights from Klagenfurt

VCIP 2025 in Klagenfurt: Advancing Sustainable and Trustworthy Visual Communications in the Age of AI

From December 1–4, 2025, the Department of Information Technology (ITEC) at the University of Klagenfurt (Austria) hosted the International Conference on Visual Communications and Image Processing (VCIP 2025), welcoming an international community of researchers, practitioners, and industry experts to discuss the future of visual communication and image processing. Under the theme “Sustainable and Trustworthy Visual Communications in the Age of AI,” the conference addressed some of the most pressing challenges and opportunities facing the field today.

A forum for cutting-edge research and dialogue

VCIP 2025 continued the long-standing tradition of the conference as a premier venue for both foundational and applied research. Over four days, the program offered a diverse range of tutorials, overview talks, keynotes, oral and poster sessions, and interactive formats. Discussions ranged from adaptive and low-latency streaming, source coding, and compressed-domain processing to volumetric media, computational vision, quality assessment, and AI-driven restoration and enhancement techniques.

Inspiring keynotes on trust, sustainability, and clarity

The three keynote talks were a particular highlight of VCIP 2025, setting the tone for the conference by connecting technical innovation with broader societal concerns. The speakers addressed trustworthy multimedia communication in the era of AI-generated content, the environmental impact and sustainability of visual technologies, and the role of visual analytics in making complex data understandable across time and space. Together, the keynotes sparked lively discussions that extended well beyond the conference halls.

Tutorials, overview talks, and emerging topics

Four half-day tutorials provided in-depth insights into current and emerging technologies, including generative face video coding for ultra-low bitrate communication, JPEG AI standardization and implementation, the convergence of low-level image processing and generative AI, and the past, present, and future of volumetric video. Complementing these, overview talks offered broader perspectives on emotion and quality estimation, AI-enabled video streaming efficiency, 3D scene capture and compression, and the use of large vision–language models for visual quality assessment.

Supporting early-career researchers and innovation

VCIP 2025 placed strong emphasis on nurturing the next generation of researchers. The doctoral symposium and the VSPC Rising Star session provided a platform for early-career scientists to present their work and engage with senior experts. Demo, open-source, and dataset sessions further highlighted the practical impact of research, showcasing tools, prototypes, and resources that bridge theory and application.

Exchange beyond the technical program

In addition to the scientific sessions, the conference offered a vibrant social program, including a welcome reception, a Glühwein gathering, a conference banquet, and a closing ceremony with awards. These events fostered informal exchange, strengthened international collaboration, and contributed to the open and collegial atmosphere that characterizes VCIP.

Best paper award: “AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views”, Yijie Gao, Houqiang Zhong, Tianchi Zhu, Li Song, Zhengxue Cheng, Qiang Hu (Shanghai Jiao Tong University)

Rising Star: Heming Sun (Yokohama National University), “Traditional and Learned Image and Video Coding: From Algorithms to Implementations”

A successful event for Klagenfurt, the department, and the university

Hosting VCIP 2025 marked an important milestone for the Department of Information Technology (ITEC) at the University of Klagenfurt, reinforcing its international visibility in the fields of visual computing, multimedia systems, and artificial intelligence. By bringing together experts from academia and industry and opening selected sessions to the wider university community, the conference created valuable opportunities for interdisciplinary exchange and long-term collaboration.

VCIP 2025 concluded with a strong sense of momentum, underscoring the importance of responsible, transparent, and sustainable approaches to visual communication in an AI-driven world. The discussions and connections formed in Klagenfurt will continue to shape research and innovation in the field well beyond the conference itself.

Following a successful edition in Klagenfurt, VCIP 2026 will take place in Singapore from December 13–16, 2026, focusing on “Visual Communications and Image Processing at the Frontiers of Generative and Perceptual AI” (https://vcip-2026.org/).

Farzad Tashtarian completed his habilitation on Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration

Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration

Farzad Tashtarian

Habilitation

On 14.11.2025, Farzad Tashtarian defended his habilitation “Network-Assisted Adaptive Streaming: Toward Optimal QoE through System Collaboration” at the University of Klagenfurt, Austria.

Abstract: Providing seamless, low-latency, and energy-efficient video streaming experiences remains an ongoing challenge as content delivery infrastructures evolve to support higher resolutions, immersive formats, and heterogeneous networks. This talk explores an end-to-end perspective on network-assisted adaptive streaming, where close coordination between the player, network, and edge/cloud components enables data-driven and context-aware optimization. It will discuss adaptive bitrate algorithm design, cost- and delay-conscious edge transcoding, and multi-objective optimization across the streaming pipeline. Emerging AI-based methods—such as reinforcement learning, generative modeling, and large language model (LLM) orchestration—will be highlighted as key enablers for intelligent and self-adjusting video delivery. The talk concludes with a discussion of open challenges, scalability, and future research directions toward resilient, efficient, and user-centric streaming infrastructures.

Committee members: Prof. Martin Pinzger (Chairperson), Prof. Oliver Hohlfeld (external member), Prof. Bernhard Rinner, Prof. Angelika Wiegele, Prof. Chitchanok Chuengsatiansup, MSc Zoha Azimi Ourimi, Dr. Alice Tarzariol, Kateryna Taranov, and Gregor Lammer

Interactive Stereoscopic Videos On Head-mounted Displays

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

Afshin Gholami (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: This paper presents ISV-Demo, a novel system for delivering interactive stereoscopic video experiences on head-mounted displays in spatial environments. Unlike traditional flat or real-time VR media, our approach leverages pre-rendered, high-fidelity 3D video to enable immersive, cinematic storytelling enhanced by user agency. Viewers influence narrative direction by making real-time decisions at branching points within the stereoscopic scene. We propose two interaction models: a timeline-based model with embedded prompts, and a loop-based segmented model offering flexible timing and decision persistence. These models define a new paradigm for authored cinematic interaction in extended reality, addressing the gap between passive 3D video and dynamic VR content.
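
The two interaction models named in the abstract can be illustrated with a small branching graph. The following is only a minimal sketch, not the ISV-Demo implementation: the `Segment` class, the `default` fallback for the timeline model, and the segment names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One pre-rendered video segment in a branching story (hypothetical structure)."""
    name: str
    choices: dict = field(default_factory=dict)  # choice label -> next segment name
    default: Optional[str] = None                # assumed fallback when a prompt is missed


def next_timeline(segment, choice):
    # Timeline-based model: the prompt is embedded in playback, so a missed
    # prompt falls through to the default branch (an assumption of this sketch).
    return segment.choices.get(choice, segment.default)


def next_loop(segment, choice):
    # Loop-based model: the segment repeats until a valid choice arrives.
    return segment.choices.get(choice, segment.name)


def play(segments, start, inputs, model):
    """Walk the story graph; `inputs` holds one user choice (or None) per segment pass."""
    if model not in ("timeline", "loop"):
        raise ValueError("model must be 'timeline' or 'loop'")
    step = next_timeline if model == "timeline" else next_loop
    path, decisions = [start], []
    current = segments[start]
    for choice in inputs:
        if not current.choices:      # terminal segment: nothing left to decide
            break
        nxt = step(current, choice)
        if nxt is None:
            break
        if choice in current.choices:
            decisions.append(choice)  # decisions persist for later use
        path.append(nxt)
        current = segments[nxt]
    return path, decisions


story = {
    "intro": Segment("intro", {"left": "cave", "right": "forest"}, default="forest"),
    "cave": Segment("cave"),
    "forest": Segment("forest"),
}
```

With the same inputs (no choice during the first pass, then "left"), the timeline model moves on to the default branch, while the loop model replays the intro and then honors the late decision.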

STACK: Spatial Tower Assembly using Controlled Kinetics

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

Milad Ghanbari (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), M.H. Izadimehr (AAU, Austria), Wei Zhou (Cardiff University, UK), Cosmin Stejerean (Meta, US)

Abstract: This paper presents a block stacking simulation developed for Apple Vision Pro (AVP) using Unity’s PolySpatial framework, designed to study both depth perception in spatial computing and physics comprehension of user-driven kinetic controls in augmented reality (AR). The simulation offers two interactive modes: a tower assembly mode and a removal mode. Each game session includes four stages with the virtual table positioned at various distances to observe user adaptation across varying virtual depths. User input is captured through eye tracking and hand tracking, and block behavior is handled by real-time physics simulation, which includes collision response, gravity, and mass-based interactions. The system supports two physics configurations: raw Unity physics and a modified variant with adjusted material and rigidbody parameters for improved stability and realism. It utilizes spatial computing features such as world anchoring to preserve spatial consistency and supports depth perception through stereoscopic rendering and dynamic shadows, so that users can better judge the spatial relationships between virtual blocks and their physical surroundings. The simulation is intended to evaluate how 3D spatial rendering and physically realistic interactions contribute to immersion and task performance in AR environments. To assess user performance, the system records key interaction metrics to support analysis of learning progression, control accuracy, and adaptability across varying distances and physics configurations. This work contributes to the understanding of spatial and physics-based interaction design in AR and may inform future applications in education, simulation, and spatial gaming.

Perceptual JND Prediction for VMAF Using Content-Adaptive Dual-Path Attention

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Luigi Atzori (University of Cagliari, Italy)

Abstract: Just Noticeable Difference (JND) thresholds, particularly for quality metrics such as Video Multimethod Assessment Fusion (VMAF), are critical in streaming, helping identify when quality changes become perceptible and reducing redundant bitrate representations. The Satisfied User Ratio (SUR) complements JND by quantifying the percentage of users who do not perceive a difference, offering practical guidance for perceptually optimized streaming. This paper proposes a novel two-branch deep neural network (DNN) for predicting the 75% SUR for VMAF, the encoding level where 75% of viewers cannot perceive degradation. The framework combines handcrafted features (e.g., spatial and temporal indicators such as SI and TI) and deep learning-based (DL-based) representations extracted via a convolutional neural network (CNN) backbone. The DL-based branch employs a spatio-temporal attention mechanism and a Long Short-Term Memory (LSTM) to capture temporal dynamics, while the handcrafted branch encodes interpretable indicators through a fully connected layer. Both outputs are fused and passed through a lightweight Multilayer Perceptron (MLP) to predict 75% SUR. To improve robustness to noise and label uncertainty, the model is trained using the Smooth-L1 loss. Experiments on the VideoSet dataset show our method outperforms the state of the art (SOTA) across all metrics, achieving a notably higher R² score (0.46 vs. 0.36), which indicates improved prediction reliability. Its low computational complexity makes it suitable for real-time video streaming.
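
To make the 75% SUR target and the Smooth-L1 loss concrete, here is a minimal sketch in plain Python. It is not the paper's network: the per-subject JND values are made-up sample data, the distortion levels are treated as generic integers (for example, quantization steps), and the `beta` parameter follows the common Smooth-L1 definition rather than any setting reported by the authors.

```python
def satisfied_user_ratio(jnd_levels, level):
    """Fraction of subjects whose JND lies above `level`,
    i.e. who do not yet perceive a difference at that level."""
    return sum(1 for j in jnd_levels if j > level) / len(jnd_levels)


def sur_point(jnd_levels, levels, target=0.75):
    """Highest distortion level at which at least `target` of subjects stay satisfied."""
    ok = [q for q in levels if satisfied_user_ratio(jnd_levels, q) >= target]
    return max(ok) if ok else None


def smooth_l1(pred, true, beta=1.0):
    """Smooth-L1 loss: quadratic for small errors, linear for large ones,
    which limits the influence of noisy labels."""
    diff = abs(pred - true)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


# Hypothetical JND levels reported by eight subjects for one video.
sample_jnds = [30, 32, 28, 35, 31, 29, 33, 34]
sample_levels = range(20, 41)
```

For the sample data, `sur_point(sample_jnds, sample_levels)` returns 29: six of the eight subjects have a JND above that level, and one step further only five remain satisfied.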

Is there a relationship between Mean Opinion Score (MOS) and Just Noticeable Difference (JND)?

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria

Jingwen Zhu (Nantes Université, France), Hadi Amirpour (AAU, Austria), Wei Zhou (Cardiff University, UK), Patrick Le Callet (Nantes Université, France)

Abstract: Evaluating perceived video quality is essential for ensuring high Quality of Experience (QoE) in modern streaming applications. While existing subjective datasets and Video Quality Metrics (VQMs) cover a broad quality range, many practical use cases—especially for premium users—focus on high-quality scenarios requiring finer granularity. Just Noticeable Difference (JND) has emerged as a key concept for modeling perceptual thresholds in these high-end regions and plays an important role in perceptual bitrate ladder construction. However, the relationship between JND and the more widely used Mean Opinion Score (MOS) remains unclear. In this paper, we conduct a Degradation Category Rating (DCR) subjective study based on an existing JND dataset to examine how MOS corresponds to the 75% Satisfied User Ratio (SUR) points of the 1st and 2nd JNDs. We find that while MOS values at JND points generally align with theoretical expectations (e.g., 4.75 for the 75% SUR of the 1st JND), the reverse mapping—from MOS to JND—is ambiguous due to overlapping confidence intervals across PVS indices. Statistical significance analysis further shows that DCR studies with limited participants may not detect meaningful differences between reference and JND videos.

Patent Approval for “Efficient two-pass encoding scheme for adaptive live streaming”

Efficient two-pass encoding scheme for adaptive live streaming

US Patent

Vignesh Menon (Alpen-Adria-Universität Klagenfurt, Austria), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: Techniques for efficient two-pass encoding for live streaming are described herein. A method for efficient two-pass encoding may include extracting low-complexity features of a video segment, predicting an optimized constant rate factor (CRF) for the video segment using the low-complexity features, and encoding the video segment with the optimized CRF at a target bitrate. A system for efficient two-pass encoding may include a feature extraction module configured to extract low-complexity features from a video segment, a neural network configured to predict an optimized CRF as a function of the low-complexity features and a target bitrate, and an encoder configured to encode the video segment using the optimized CRF at the target bitrate.
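
The three steps in the abstract (feature extraction, CRF prediction, encoding) can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented method: the two features are simple stand-ins, `predict_crf` is a hand-written formula standing in for the neural network, its coefficients are arbitrary, and the capped-CRF ffmpeg/libx264 command is one possible way to encode with a CRF under a bitrate limit.

```python
import math


def extract_features(frames):
    """Cheap stand-in features for a segment given as a list of 2-D luma arrays.
    Returns (spatial, temporal): mean horizontal gradient of the first frame and
    mean absolute difference between consecutive frames."""
    height, width = len(frames[0]), len(frames[0][0])
    first = frames[0]
    spatial = sum(
        abs(row[x + 1] - row[x]) for row in first for x in range(width - 1)
    ) / (height * (width - 1))
    if len(frames) < 2:
        return spatial, 0.0
    total = sum(
        abs(b[y][x] - a[y][x])
        for a, b in zip(frames, frames[1:])
        for y in range(height)
        for x in range(width)
    )
    temporal = total / ((len(frames) - 1) * height * width)
    return spatial, temporal


def predict_crf(features, target_kbps):
    """Hypothetical predictor (placeholder for a trained model): more complex
    content raises the CRF, a higher target bitrate lowers it."""
    spatial, temporal = features
    raw = 40.0 + 0.05 * spatial + 0.05 * temporal - 3.0 * math.log2(target_kbps / 1000.0 + 1.0)
    return max(0, min(51, round(raw)))  # libx264 8-bit CRF range is 0..51


def build_encode_command(src, dst, crf, target_kbps):
    """Capped-CRF encode: CRF sets quality, maxrate/bufsize bound the bitrate."""
    return [
        "ffmpeg", "-i", src, "-c:v", "libx264",
        "-crf", str(crf),
        "-maxrate", f"{target_kbps}k",
        "-bufsize", f"{2 * target_kbps}k",
        dst,
    ]


# Tiny made-up segment: two 2x2 frames.
demo_frames = [[[0, 10], [20, 30]], [[5, 15], [25, 35]]]
demo_features = extract_features(demo_frames)
demo_cmd = build_encode_command("in.mp4", "out.mp4", predict_crf(demo_features, 7000), 7000)
```

The command list can be passed to `subprocess.run` once a real predictor replaces the placeholder formula.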
