Dual-guided Generative Frame Interpolation


2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026)

4 – 8 May, 2026

Barcelona, Spain

[PDF]

Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria) and Christian Timmerer (AAU, Austria)

Abstract: Video frame interpolation (VFI) aims to generate intermediate frames between given keyframes to enhance temporal resolution and visual smoothness. While conventional optical flow–based methods and recent generative approaches achieve promising results, they often struggle with large displacements, failing to maintain temporal coherence and semantic consistency. In this work, we propose dual-guided generative frame interpolation (DGFI), a framework that integrates semantic guidance from vision-language models and flow guidance into a pre-trained diffusion-based image-to-video (I2V) generator. Specifically, DGFI extracts textual descriptions and injects multimodal embeddings to capture high-level semantics, while estimated motion guidance provides smooth transitions. Experiments on public datasets demonstrate the effectiveness of our dual-guided method over state-of-the-art approaches.
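
To make the dual-guidance idea more concrete, the sketch below shows how the two conditioning signals described in the abstract could be combined around a pre-trained I2V diffusion sampler. The objects vlm, flow_net, and i2v_model and their methods are hypothetical placeholders (assumed to operate on tensors), not the DGFI implementation.

```python
# Hedged sketch of dual-guided conditioning; vlm, flow_net, and i2v_model
# (and their methods) are hypothetical placeholders, not the authors' API.
import torch

def interpolate_dual_guided(frame0, frame1, n_mid, vlm, flow_net, i2v_model):
    """Generate n_mid intermediate frames between two keyframe tensors."""
    # Semantic guidance: caption the keyframes with a vision-language model and
    # embed the description as a multimodal conditioning vector.
    caption = vlm.describe([frame0, frame1])            # hypothetical VLM call
    text_emb = vlm.embed(caption)                       # semantic embedding

    # Motion guidance: estimate dense optical flow between the keyframes and
    # scale it linearly to obtain a per-step motion hint for smooth transitions.
    flow_01 = flow_net(frame0, frame1)                  # assumed shape (2, H, W)
    motion_hints = torch.stack(
        [flow_01 * (t / (n_mid + 1)) for t in range(1, n_mid + 1)])

    # Generation: condition a pre-trained image-to-video diffusion sampler on
    # both keyframes, the semantic embedding, and the motion hints.
    with torch.no_grad():
        frames = i2v_model.sample(
            start=frame0,
            end=frame1,
            text_embedding=text_emb,
            motion_guidance=motion_hints,
            num_frames=n_mid,
        )
    return frames
```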


IEEE ICME Workshop on “Physical Principles for Reliable 3D Modelling in Multimedia (P3DMM)”

The IEEE ICME Workshop on

Physical Principles for Reliable 3D Modelling in Multimedia (P3DMM)

July 5 to July 9, 2026, Bangkok, Thailand

CFP

Reliable 3D modelling is a foundational capability for many multimedia applications, yet achieving metrically accurate and physically meaningful 3D representations in real-world environments remains challenging. Variations in illumination, material properties, motion, sensor configurations, and environmental conditions often undermine the robustness and interpretability of purely data-driven approaches. This workshop focuses on advancing physically informed and physically interpretable 3D modelling, learning, and perception methods that explicitly incorporate physical principles to improve reliability, consistency, and trustworthiness across multimedia scenarios. It aims to provide a unified forum for discussing how such physical principles can be systematically embedded into modern learning frameworks, including neural fields, radiance models, and multimodal foundation models.

In addition to physics-guided methods, the workshop welcomes contributions that integrate physical priors with diverse multimedia and sensing modalities, such as RGB-D, multi-view and video data, IMU and robotic kinematics, force and tactile sensing, acoustic measurements, and spectral or hyperspectral imaging. Particular emphasis is placed on methods that enhance physical consistency, interpretability, and measurement fidelity, enabling reliable 3D modelling for applications such as digital twins, intelligent manufacturing, robotics, biomedical imaging, and computational multimedia systems.

Call for Papers

We invite original submissions that address challenges and advances across the full spectrum of physics-informed 3D modelling in multimedia. Topics of interest include, but are not limited to:

  • Learning-based 3D modelling with physical principles
  • Physics-coherent neural fields and radiance models
  • Shape, lighting and material decomposition with physical consistency
  • Modelling contact, collision and rigid/deformable body behaviour
  • Data-driven methods enriched by physical cues
  • Reliable 3D modelling in complex multimedia environments
  • Physical cues for digital twins and manufacturing
  • Multimedia applications requiring physically interpretable 3D models
  • Datasets, metrics and evaluations for physics-informed 3D modelling

Submission Guidance: Submit via CMT

Download CFP (PDF): Click here to download

Important Registration Note: All accepted papers need to be covered by a full registration.


Organizers


ELLMPEG: An Edge-based Agentic LLM Video Processing Tool


The 17th ACM Multimedia Systems Conference (MMSys’26)

Hong Kong SAR

4th – 8th April 2026

Zoha Azimi, Reza Farahani, Radu Prodan, Christian Timmerer

Abstract: Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM deployments face three key limitations: high computational and energy demands, privacy and reliability risks from remote processing, and recurring API costs. Recent advances in agentic AI, especially in structured reasoning and tool use, offer a better way to exploit open, locally deployed tools and LLMs. This paper presents ELLMPEG, an edge-enabled agentic LLM framework for the automated generation of video-processing commands. ELLMPEG integrates tool-aware Retrieval-Augmented Generation (RAG) with iterative self-reflection to produce and locally verify executable FFmpeg and VVenC commands directly at the edge, eliminating reliance on external cloud APIs. To evaluate ELLMPEG, we collect a dedicated prompt dataset comprising 480 diverse queries covering different categories of FFmpeg and Versatile Video Codec (VVC) encoder (VVenC) commands. We validate command-generation accuracy and evaluate four open-source LLMs based on command validity, tokens generated per second, inference time, and energy efficiency. We also execute the generated commands to assess their runtime correctness and practical applicability. Experimental results show that Qwen2.5, when augmented with the ELLMPEG framework, achieves an average command-generation accuracy of 78% with zero recurring API cost, outperforming all other open-source models across both the FFmpeg and VVenC datasets.
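
As a simplified illustration of the retrieve, generate, verify, and reflect loop described above, the Python sketch below shows one way such an agentic pipeline could be structured on an edge device. The callables retrieve_tool_docs and local_llm are hypothetical placeholders for the RAG index and the locally deployed model, and verifying a command by executing it and inspecting its exit code is an assumption rather than ELLMPEG's exact verification procedure.

```python
# Hedged sketch of a tool-aware RAG + self-reflection loop for command
# generation; retrieve_tool_docs and local_llm are hypothetical placeholders.
import shlex
import subprocess

def generate_command(user_query, retrieve_tool_docs, local_llm, max_rounds=3):
    """Retrieve tool docs, generate an FFmpeg/VVenC command, verify it locally,
    and self-reflect on failures; returns a command string or None."""
    docs = retrieve_tool_docs(user_query)               # tool-aware RAG context
    prompt = (f"Documentation:\n{docs}\n\nTask: {user_query}\n"
              "Return a single executable command, nothing else.")
    feedback = ""
    for _ in range(max_rounds):
        cmd = local_llm(prompt + feedback).strip()      # e.g. a local Qwen2.5
        # Local verification: execute the command on the edge device and check
        # its exit status (no cloud API involved). This check is an assumption.
        result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return cmd
        # Iterative self-reflection: feed the tool's error output back to the LLM.
        feedback = ("\n\nThe previous command failed with this error:\n"
                    f"{result.stderr[-500:]}\nPlease correct it.")
    return None
```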


Residual U-Network: 3D Point Cloud-Based Automotive Pressure Field Prediction Model


18th International Congress on Image and Signal Processing, BioMedical Engineering, and Informatics (CISP-BMEI 2025)
October 25 – 27, 2025
Qingdao, China
http://www.cisp-bmei.cn/

[PDF]

Hezhi Li, Hongyou Chen, Lingfeng Qu, Baodan Tian, Yong Fan, Hadi Amirpour, and Christian Timmerer

Abstract: Automotive surface pressure field prediction is important for design optimization and performance evaluation of vehicle aerodynamics, fuel efficiency, and automotive safety. Although traditional computational fluid dynamics methods are accurate, they incur high computational costs and are time-consuming. Most existing deep learning methods show limitations in learning pressure variation features near complex geometric shapes of automotive exteriors. To address these issues, this paper proposes a deep learning method based on a hybrid architecture combining Residual Network (ResNet) and U-Network (UNet). The method processes 3D point cloud representations of automotive geometries by converting them into structured grid formats with signed distance function values for efficient neural network processing. It improves the model’s predictive capability for complex geometric regions by integrating the Convolutional Block Attention Module (CBAM). In the model, the Residual Convolutional Block Attention Module (ResCBAM) combines residual connections with channel and spatial attention mechanisms to improve perception of key pressure field features, while the Decoder Convolutional Block Attention Module (DeCBAM) fuses multi-scale feature information in the decoder pathway to recover feature details. The feature fusion module integrates global flow field distribution features extracted by the encoder with local geometric detail features reconstructed by the decoder. Additionally, an automated hyperparameter optimization strategy is employed to improve the model’s prediction accuracy and generalization capability. To validate model performance, experiments are conducted on three automotive surface pressure datasets. Experimental results demonstrate that the proposed model achieves better prediction accuracy and generalization capability.
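
To illustrate the preprocessing step described in the abstract, the sketch below rasterizes a surface point cloud into a structured distance grid with SciPy. It computes an unsigned nearest-neighbour distance field; a true signed distance function would additionally need an inside/outside test (e.g., via surface normals or ray casting), so this approximates rather than reproduces the paper's pipeline.

```python
# Hedged sketch: point cloud -> structured distance grid (unsigned
# approximation of the SDF-based representation described in the abstract).
import numpy as np
from scipy.spatial import cKDTree

def point_cloud_to_distance_grid(points, resolution=64, padding=0.05):
    """points: (N, 3) surface samples -> (R, R, R) nearest-neighbour distances."""
    lo, hi = points.min(0), points.max(0)
    lo, hi = lo - padding * (hi - lo), hi + padding * (hi - lo)
    axes = [np.linspace(lo[d], hi[d], resolution) for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)   # (R, R, R, 3)

    tree = cKDTree(points)                        # fast nearest-neighbour queries
    dist, _ = tree.query(grid.reshape(-1, 3))     # distance to closest surface point
    return dist.reshape(resolution, resolution, resolution).astype(np.float32)

# Usage: the resulting volume would then be fed to an encoder-decoder
# (ResNet/UNet-style) network for pressure field regression.
# volume = point_cloud_to_distance_grid(car_points, resolution=128)
```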


ICIP’26 Special Session: Generative Visual Coding: Emerging Paradigms for Future Communication

IEEE ICIP 2026

IEEE International Conference on Image Processing (ICIP) 2026

Special Session: Generative Visual Coding: Emerging Paradigms for Future Communication

https://floatbutterfly.github.io/ICIP2026-special-session-GVC/

Generative Visual Coding (GVC) is an emerging paradigm that explores how generative models and structured visual representations can redefine visual communication. By integrating generative capabilities into the coding process, GVC enables new forms of representation, transmission, and reconstruction that enhance perceptual and semantic fidelity while improving communication efficiency. Beyond human-centric reconstruction, GVC supports machine- and task-oriented communication, where compact and semantically meaningful representations benefit downstream analysis and decision-making.

The paradigm also motivates theoretical study on how generative priors interact with information constraints, optimization objectives, and emerging concepts in semantic communication. As generative processes gain prominence, principled evaluation becomes increasingly essential, encouraging advances in quality assessment, distortion modeling, and the development of benchmark datasets for generative and hybrid codec systems. Efficiency remains central to deployment, underscoring the importance of model design, complexity optimization, and computational scalability.

GVC further extends to immersive and spatial communication, including three-dimensional and scene-level content. In these settings, generative models can infer geometry, semantics, and contextual relationships, enabling new modes of multi-view and interactive media delivery. Overall, GVC offers a unified framework that integrates generative modeling, visual coding, and intelligent communication, laying the groundwork for next-generation visual communication systems.

Scope / Topics

  • Generative foundation models, methodologies, frameworks, and analytical perspectives for visual coding and task-oriented communication
  • Theoretical modeling and rate–distortion perspectives for generative and semantic visual communication
  • Evaluation frameworks, quality assessment, and benchmark datasets for generative coding systems
  • Complexity optimization for generative visual communication
  • Generative coding use cases (e.g., Generative Face Video Coding)
  • Generative visual communication for immersive, three-dimensional, and spatially aware media

Submission: Submission Website

  • Paper Format: up to 5 pages + 1 page for references only (see Author Kit)
  • Topic Selection: When submitting, select Special Session ‘Generative Visual Coding: Emerging Paradigms for Future Communication’ as well as up to two additional regular topics (Step 5)

Important Dates

  • Special Session Submission Opens: January 7, 2026
  • Paper Submission Deadline: February 4, 2026 (Extended)
  • Notification of Acceptance: April 22, 2026
  • Camera-Ready Paper Due: May 13, 2026

Organizers

  • Jianhui Chang, China Telecom Cloud Computing Research Institute
  • Hadi Amirpour, University of Klagenfurt
  • Giuseppe Valenzise, Université Paris-Saclay

ICIP’26 Special Session: Visual Information Processing for Human-centered Immersive Experiences

IEEE ICIP 2026

IEEE International Conference on Image Processing (ICIP) 2026

Special Session: Visual Information Processing for Human-centered Immersive Experiences

https://medialab.dei.unipd.it/special-session-icip-2026/

Immersive systems such as Virtual and Extended Reality are becoming widespread thanks to the wide diffusion of relatively low-cost headsets and the increased immersivity and sense of presence they provide with respect to their 2D counterparts. However, the novelty of the involved technologies as well as the variety of available media types, together with the high number of applications, entail endless challenges for the research community. One key feature of immersive systems is that they inherently place users at the center of the experience, allowing them to actively explore, manipulate, and interact with content. As a result, immersive systems introduce new perceptual, behavioral, and interaction aspects that require dedicated investigation. This special session focuses on the role of visual information processing in enabling human-centered immersive experiences, providing complementary insights into how visual information plays a critical role in enhancing effectiveness, comfort, usability, and perceptual quality in next-generation immersive applications.

Topics of interest

  • Visual attention mechanisms
  • Perceptual modelling
  • Emerging media formats (stereoscopic and omnidirectional imagery, light fields, point clouds, meshes, and Gaussian splats)
  • Multimodal immersive applications
  • Quality of Experience

Submission instructions

  • Submission site: https://icip2026.exordo.com/
  • Topic selection: when submitting your paper, you will be able to find the accepted special sessions as part of the list of topics (Step 5). Please make sure to select the Special Session ‘Visual information processing for human-centered immersive experiences’ as well as up to two additional regular topics, to assist in the review process and for program-building purposes.
  • Format: up to 5 pages + 1 page for references only (refer to the Author Kit)
  • Conference website: https://2026.ieeeicip.org/
  • Please note: special session papers will undergo the same rigorous peer-review process as regular papers.

Important dates and deadlines

  • Submission deadline: February 4, 2026 (AoE)
  • Notification of Acceptance: April 22, 2026
  • Camera-Ready Paper Due: May 13, 2026
  • Conference: 13-17 September 2026, Tampere, Finland

Organizers and contacts

  • Sara Baldoni, University of Padova 
  • Hadi Amirpour, University of Klagenfurt

Patent Approval for “Video encoding complexity predictor”

Video encoding complexity predictor

US Patent

[PDF]

Vignesh Menon (Alpen-Adria-Universität Klagenfurt, Austria), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)


Abstract: Techniques for predicting video encoding complexity are described herein. A method for predicting video encoding complexity includes performing video complexity feature extraction on a video segment to extract low-complexity frame-based features, predicting video encoding complexity for the video segment using the low-complexity frame-based features, and outputting a predicted encoding bitrate and a predicted encoding time. An embodiment may include implementing a hybrid model using a CNN, wherein a latent vector extracted from a frame of the video segment may also be used to predict video encoding complexity. The predicted encoding bitrates and encoding times may be provided to encoding infrastructure for use in optimizing a schedule of encodings.
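
A minimal sketch of the predictor's interface is shown below. The specific features (gradient-magnitude energy and frame differences) and the use of a generic fitted multi-output regressor are illustrative assumptions for exposition, not the patented method itself.

```python
# Hedged sketch: low-complexity frame features -> predicted bitrate and
# encoding time. Feature choices and the regressor are illustrative only.
import numpy as np

def frame_features(frames):
    """frames: list of (H, W) luma arrays -> small per-segment feature vector."""
    feats, prev = [], None
    for f in frames:
        f = f.astype(np.float32)
        gy, gx = np.gradient(f)
        feats.append(np.sqrt(gx**2 + gy**2).mean())          # spatial texture energy
        if prev is not None:
            feats.append(np.abs(f - prev).mean())            # temporal activity
        prev = f
    return np.array([np.mean(feats), np.std(feats)])

def predict_complexity(frames, regressor):
    """regressor: any fitted multi-output model mapping features -> (kbps, seconds)."""
    x = frame_features(frames).reshape(1, -1)
    bitrate_kbps, encode_time_s = regressor.predict(x)[0]
    return bitrate_kbps, encode_time_s
```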
