Generative AI for Realistic Voice Dubbing Across Languages

ACM 4th Mile-High Video Conference (MHV’25)

18–20 February 2025 | Denver, CO, USA

[PDF]

Emanuele Artioli (Alpen-Adria Universität Klagenfurt, Austria), Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: The demand for accessible, multilingual video content has grown significantly with the global rise of streaming platforms, social media, and online learning. Traditional solutions for making content accessible across languages include subtitles, which can be generated automatically as on YouTube, and synthesized voiceovers, as offered, for example, by the Yandex Browser. Subtitles are cost-effective and preserve the original voice of the speaker, which is often essential for authenticity. However, they require viewers to divide their attention between reading text and watching visuals, which can diminish engagement, especially for highly visual content. Synthesized voiceovers, on the other hand, eliminate this need by providing an auditory translation, but they typically lack the emotional depth and unique vocal characteristics of the original speaker, which can affect the viewing experience and disconnect audiences from the intended pathos of the content. A straightforward solution would be to have the original actor “perform” in every language, thereby preserving the traits that define their character or narration style. However, recording actors in multiple languages is impractical, time-intensive, and expensive, especially for widely distributed media.

By leveraging generative AI, we aim to develop a client-side tool, to be integrated into a dedicated video streaming player, that combines the accessibility of multilingual dubbing with the authenticity of the original speaker’s performance, effectively allowing a single actor to deliver their voice in any language. To the best of our knowledge, no current streaming system can capture the speaker’s unique voice or emotional tone.
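
As a rough illustration of such a pipeline, the sketch below chains off-the-shelf components: Whisper for transcription, an OPUS-MT model for translation, and Coqui XTTS for zero-shot voice cloning. These component choices, and all file names, are assumptions made for illustration; the abstract does not specify the system's actual building blocks.

```python
# Sketch of a speech-to-speech dubbing pipeline in the spirit of the abstract.
# Component choices (Whisper, OPUS-MT, Coqui XTTS) are illustrative assumptions,
# not the authors' system.
import whisper                      # pip install openai-whisper
from transformers import pipeline   # pip install transformers sentencepiece
from TTS.api import TTS             # pip install TTS (Coqui)

def dub(audio_path: str, target_lang: str = "de") -> str:
    # 1. Transcribe the original speech (assumed to be English here).
    asr = whisper.load_model("base")
    source_text = asr.transcribe(audio_path)["text"]

    # 2. Translate the transcript; an OPUS-MT model must exist for the pair.
    mt = pipeline("translation", model=f"Helsinki-NLP/opus-mt-en-{target_lang}")
    target_text = mt(source_text)[0]["translation_text"]

    # 3. Re-synthesize the translation in the original speaker's voice,
    #    conditioning the zero-shot voice-cloning TTS on the source audio.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    out_path = f"dubbed_{target_lang}.wav"
    tts.tts_to_file(text=target_text, speaker_wav=audio_path,
                    language=target_lang, file_path=out_path)
    return out_path

print(dub("original_clip.wav", "de"))
```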

Index Terms— HTTP adaptive streaming, Generative AI, Audio.

Adaptive Quality and Energy Enhancement in Video Streaming with RecABR

ACM 4th Mile-High Video Conference (MHV’25)

18–20 February 2025 | Denver, CO, USA

[PDF]

Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: HTTP Adaptive Streaming (HAS) dominates video delivery but faces sustainability issues due to its energy demands. Current adaptive bitrate (ABR) algorithms prioritize quality, neglecting the energy costs of higher bitrates. Super-resolution (SR) can enhance quality but increases energy use, especially for GPU-equipped devices in competitive networks. RecABR addresses these challenges by clustering clients based on device attributes (e.g., GPU, resolution) and optimizing parameters via linear programming. This reduces computational overhead and ensures energy-efficient, quality-aware recommendations. Using metrics like VMAF and compressed SR models, RecABR minimizes storage and processing costs, making it scalable for CDN edge deployment.
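
The sketch below illustrates this two-stage structure, assuming k-means for device clustering and a small linear program that trades quality against energy per cluster. All attribute vectors, action sets, and numeric values are invented for illustration and are not RecABR's actual formulation.

```python
# Illustrative two-stage sketch: cluster clients by device attributes, then pick
# a quality/energy-optimal action per cluster with a small LP. All numbers are
# made up for illustration; real values would come from offline measurements.
import numpy as np
from scipy.cluster.vq import kmeans2, whiten   # device clustering
from scipy.optimize import linprog             # per-cluster optimization

# Device attributes per client: [GPU TFLOPS, display height, battery %].
devices = np.array([[10.0, 2160, 80], [0.5, 720, 30], [8.0, 1440, 90],
                    [0.3, 1080, 20], [12.0, 2160, 60]], dtype=float)
_, labels = kmeans2(whiten(devices), 2, minit="++", seed=0)

# Candidate actions per cluster: (VMAF score, energy in joules/segment).
actions = np.array([[60, 4.0],    # low bitrate, no SR
                    [75, 7.0],    # mid bitrate, no SR
                    [83, 9.5],    # low bitrate + super-resolution
                    [92, 15.0]])  # high bitrate

def recommend(min_vmaf: float) -> int:
    """LP relaxation: find a distribution over actions that minimizes energy
    subject to an average-quality floor, then take the dominant action."""
    n = len(actions)
    res = linprog(c=actions[:, 1],                          # minimize energy
                  A_ub=[-actions[:, 0]], b_ub=[-min_vmaf],  # avg VMAF >= floor
                  A_eq=[np.ones(n)], b_eq=[1.0],            # weights sum to 1
                  bounds=[(0, 1)] * n)
    return int(np.argmax(res.x))

for cluster in range(2):
    # GPU-rich clusters can afford a higher quality floor (SR is cheap for them).
    floor = 85 if devices[labels == cluster][:, 0].mean() > 5 else 70
    print(f"cluster {cluster}: recommended action {recommend(floor)}")
```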

Index Terms— QoE, HAS, Super-resolution, Energy.

Patent Approval for “Scalable Per-Title Encoding”

Scalable Per-Title Encoding 

US Patent

[PDF]

Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria) and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: A scalable per-title encoding technique may include detecting scene cuts in an input video received by an encoding network or system, generating segments of the input video, performing per-title encoding of a segment of the input video, training a deep neural network (DNN) for each representation of the segment, thereby generating a trained DNN, compressing the trained DNN, thereby generating a compressed trained DNN, and generating an enhanced bitrate ladder including metadata comprising the compressed trained DNN. In some embodiments, the method also may include generating a base layer bitrate ladder for CPU devices, and providing the enhanced bitrate ladder for GPU-available devices.
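
A schematic sketch of the claimed steps follows, under the assumption that scene cuts are found with PySceneDetect and segments are encoded with ffmpeg/x265; the DNN training step is reduced to a placeholder, since the patent does not prescribe a specific model. All names and parameters are illustrative.

```python
# Schematic sketch of the claimed pipeline: scene-cut segmentation, per-title
# encoding of each segment, and attaching a (compressed) per-representation
# upscaling model to the bitrate ladder as metadata. Library choices
# (PySceneDetect, ffmpeg) and all names are illustrative assumptions.
import subprocess, gzip, pickle
from scenedetect import detect, ContentDetector   # pip install scenedetect

def build_ladder(src: str, rungs=((540, 1200), (720, 2400), (1080, 4800))):
    # 1. Detect scene cuts and derive segment boundaries.
    scenes = detect(src, ContentDetector())
    ladder = []
    for i, (start, end) in enumerate(scenes):
        for height, kbps in rungs:
            out = f"seg{i}_{height}p.mp4"
            # 2. Per-title encoding of this segment/representation.
            subprocess.run(["ffmpeg", "-y", "-i", src,
                            "-ss", str(start.get_seconds()),
                            "-to", str(end.get_seconds()),
                            "-vf", f"scale=-2:{height}",
                            "-c:v", "libx265", "-b:v", f"{kbps}k", out],
                           check=True)
            # 3. Train a lightweight upscaling DNN for this representation
            #    (placeholder: a real system would fit an SR model per rung).
            dnn_weights = {"rung": height, "weights": b"..."}
            # 4. Compress the trained DNN so it ships as ladder metadata.
            blob = gzip.compress(pickle.dumps(dnn_weights))
            ladder.append({"segment": i, "height": height,
                           "bitrate_kbps": kbps, "dnn": blob})
    # GPU-available clients consume this enhanced ladder; CPU devices would
    # fall back to a plain base-layer ladder without the DNN metadata.
    return ladder
```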

In-Band Quality Notification from Users to ISPs

IEEE 13th International Conference on Cloud Networking (CloudNet)

27–29 November 2024 | Rio de Janeiro, Brazil

[PDF]

Leonardo Peroni (UC3M, Spain); Sergey Gorinsky (IMDEA Networks Institute, Spain); Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria)

Abstract: While ISPs (Internet service providers) strive to improve QoE (quality of experience) for end users, end-to-end traffic encryption by OTT (over-the-top) providers undermines independent inference of QoE by an ISP. Due to the economic and technological complexity of the modern Internet, ISP-side QoE inference based on OTT assistance or out-of-band signaling sees low adoption. This paper presents IQN (in-band quality notification), a novel mechanism for signaling QoE impairments from an automated agent on the end-user device to the server-to-client ISP responsible for QoE-impairing congestion. Compatible with multi-ISP paths, asymmetric routing, and other Internet realities, IQN does not require OTT support and induces the OTT server to emit distinctive packet patterns that encode QoE information, enabling ISPs to infer QoE by monitoring these patterns in network traffic. We develop a prototype system, YouStall, which applies IQN signaling to ISP-side inference of YouTube stalls. Cloud-based experiments with YouStall on YouTube Live streams validate IQN’s feasibility and effectiveness, demonstrating its potential for accurate user-assisted ISP-side QoE inference from encrypted traffic in real Internet environments.
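
To make the in-band idea concrete, the toy sketch below lets a client-side agent gate its TCP socket reads in a fixed on/off rhythm after detecting a stall; receive-window flow control then shapes the encrypted server stream into a burst pattern an on-path monitor could recognize. The signature and timing are invented for illustration and are much simpler than the paper's actual IQN encoding.

```python
# Toy sketch of in-band signaling: on a QoE impairment (e.g., a stall), gate
# socket reads in an on/off rhythm so TCP flow control forces the (unmodified,
# encrypted) server stream into a distinctive, ISP-observable burst pattern.
# The signature and slot duration are assumptions, not the paper's encoding.
import socket, time

SIGNATURE = [1, 0, 1, 1, 0, 1]   # hypothetical impairment signature
SLOT = 0.5                       # seconds per signaling slot

def signal_impairment(sock: socket.socket, buf_size: int = 65536) -> None:
    """Emit one IQN-style signature by gating reads on an established socket."""
    sock.setblocking(False)
    for bit in SIGNATURE:
        deadline = time.monotonic() + SLOT
        while time.monotonic() < deadline:
            if bit:                    # "on" slot: drain, server keeps sending
                try:
                    sock.recv(buf_size)
                except BlockingIOError:
                    time.sleep(0.01)   # nothing buffered yet
            else:                      # "off" slot: stop reading; the receive
                time.sleep(0.01)       # window fills and the server goes quiet
    sock.setblocking(True)
```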

Index Terms— End user, QoE, ISP, QoE-impairment inference, OTT provider, end-to-end traffic encryption, in-band signaling.

Characterizing the Geometric Complexity of G-PCC Compressed Point Clouds

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Annalisa Gallina (UNIPD, Italy), Hadi Amirpour (AAU, Austria), Sara Baldoni (UNIPD, Italy), Giuseppe Valenzise (UPSaclay, France), Federica Battisti (UNIPD, Italy)

Abstract: Measuring the complexity of visual content is crucial in various applications, such as selecting sources to test processing algorithms, designing subjective studies, and efficiently determining the appropriate encoding parameters and bandwidth allocation for streaming. While spatial and temporal complexity measures exist for 2D videos, a geometric complexity measure for 3D content is still lacking.
In this paper, we present the first study to characterize the geometric complexity of 3D point clouds. Inspired by existing complexity measures, we propose several compression-based definitions of geometric complexity derived from the rate-distortion curves obtained by compressing a dataset of point clouds using G-PCC. Additionally, we introduce density-based and geometry-based descriptors to predict complexity. Our initial results show that even simple density measures can accurately predict the geometric complexity of point clouds.
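
One plausible instance of such a compression-based measure is sketched below under assumed numbers: compress a point cloud at several G-PCC rate points, then report the interpolated bits per point needed to reach a target D1 PSNR, so that harder-to-compress geometry scores as more complex. The rate points and target are illustrative, not the paper's exact definitions.

```python
# Sketch of one compression-based geometric complexity measure: summarize a
# G-PCC rate-distortion curve as the bits per point needed to reach a target
# D1 PSNR. The RD points and the 70 dB target below are made-up examples.
import numpy as np

# (bits per input point, D1 PSNR in dB) at successive G-PCC rate points
# for one point cloud; values are invented for illustration.
rd_points = np.array([[0.1, 58.0], [0.4, 63.5], [1.2, 69.0], [3.5, 74.5]])

def complexity(rd: np.ndarray, target_psnr: float = 70.0) -> float:
    """Bits per point required to hit the target quality: higher means the
    geometry is harder to compress, i.e., geometrically more complex."""
    bpp, psnr = rd[:, 0], rd[:, 1]
    # Interpolate rate as a function of quality; log-rate is near-linear
    # in PSNR, so interpolate in the log domain.
    return float(np.exp(np.interp(target_psnr, psnr, np.log(bpp))))

print(f"complexity: {complexity(rd_points):.2f} bits/point @ 70 dB D1 PSNR")
```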

Index Terms— Point cloud, complexity, compression, G-PCC.

Energy-Efficient Video Streaming: A Study on Bit Depth and Color Subsampling

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Hadi Amirpour (AAU, Austria), Lingfeng Qu (Guangzhou University, China), Jong Hwan Ko (SKKU, South Korea), Cosmin Stejerean (Meta, USA), Christian Timmerer (AAU, Austria)

Abstract: As video dimensions — including resolution, frame rate, and bit depth — increase, a larger bitrate is required to maintain a higher Quality of Experience (QoE). While videos are often optimized for resolution and frame rate to improve compression and energy efficiency, the impact of color space is often overlooked. Larger color spaces are essential for avoiding color banding and delivering High Dynamic Range (HDR) content with richer, more accurate colors, although this comes at the cost of higher processing energy. This paper investigates the effects of bit depth and color subsampling on video compression efficiency and energy consumption. By analyzing different bit depths and subsampling schemes, we aim to determine optimized settings that balance compression efficiency with energy consumption, ultimately contributing to more sustainable and high-quality video delivery. We evaluate both encoding and decoding energy consumption and assess the quality of videos using various metrics including PSNR, VMAF, ColorVideoVDP, and CAMBI. Our findings offer valuable insights for video codec developers and content providers aiming to improve the performance and environmental footprint of their video streaming services.
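
The kind of sweep this entails can be sketched with ffmpeg pixel formats, as below; the exact sources, encoder settings, and measurement setup of the study are not reproduced here, and energy would be captured externally (e.g., with RAPL counters or a power meter).

```python
# Sketch of a bit-depth/chroma-subsampling encoding sweep: encode one source at
# each pixel format and record bitrate. Pixel-format names are standard ffmpeg
# identifiers; the specific sweep is an assumption, not the paper's test setup.
import os, subprocess

SRC = "source_4k.y4m"                # assumed raw/mezzanine source
PIX_FMTS = ["yuv420p",               # 8-bit,  4:2:0
            "yuv420p10le",           # 10-bit, 4:2:0
            "yuv422p10le",           # 10-bit, 4:2:2
            "yuv444p10le"]           # 10-bit, 4:4:4

for fmt in PIX_FMTS:
    out = f"encoded_{fmt}.mp4"
    # 10-bit formats require a 10-bit-capable libx265 build.
    subprocess.run(["ffmpeg", "-y", "-i", SRC, "-pix_fmt", fmt,
                    "-c:v", "libx265", "-crf", "28", out], check=True)
    size_kbit = os.path.getsize(out) * 8 / 1000
    print(f"{fmt:12s} -> {size_kbit:,.0f} kbit")
    # Energy is measured outside ffmpeg (e.g., RAPL or an external power meter)
    # around the encode/decode calls; quality with PSNR, VMAF, ColorVideoVDP,
    # and CAMBI as in the paper.
```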

Index Terms— Video encoding, video decoding, video quality, bit depth, color subsampling, energy.

MVCD: Multi-Dimensional Video Compression Dataset (Best paper candidate)

IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

8–11 December 2024 | Tokyo, Japan

[PDF]

Hadi Amirpour (AAU, Austria), Mohammad Ghasempour (AAU, Austria), Farzad Tashtarian (AAU, Austria), Ahmed Telili (TII, UAE), Samira Afzal (AAU, Austria), Wassim Hamidouche (INSA, France), Christian Timmerer (AAU, Austria)

Abstract: In the field of video streaming, the optimization of video encoding and decoding processes is crucial for delivering high-quality video content. Given the growing concern about carbon dioxide emissions, it is equally necessary to consider the energy consumption associated with video streaming. Therefore, to take advantage of machine learning techniques for optimizing video delivery, a dataset encompassing the energy consumption of the encoding and decoding processes is needed. This paper introduces a comprehensive dataset featuring diverse video content, encoded and decoded using various codecs and spanning different devices. The dataset includes 1000 videos encoded at four resolutions (2160p, 1080p, 720p, and 540p) and two frame rates (30 fps and 60 fps), resulting in eight unique encoding configurations per video. Each configuration is further encoded with four codecs — AVC (libx264), HEVC (libx265), AV1 (libsvtav1), and VVC (VVenC) — at four quality levels defined by QPs of 22, 27, 32, and 37; for AV1, three additional QPs of 35, 46, and 55 are also considered. We measure both encoding and decoding time and energy consumption on various devices, employing various metrics and tools, to provide a comprehensive evaluation. Additionally, we assess encoding bitrate and quality using metrics such as PSNR, SSIM, MS-SSIM, and VMAF. All data, together with the commands and scripts needed to reproduce it, are publicly available as part of the dataset, which can be used for applications such as rate and quality control, resource allocation, and energy-efficient streaming.

Dataset URL: https://github.com/cd-athena/MVCD
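
For a sense of scale, the sketch below enumerates the stated encoding matrix (resolutions x frame rates x codecs x QPs) as ffmpeg invocations. VVC is omitted because VVenC is typically driven by its own vvencapp CLI rather than through ffmpeg, and the exact flags used for MVCD may differ from those shown.

```python
# Sketch of the MVCD encoding matrix: 4 resolutions x 2 frame rates x codecs x
# QPs (plus 3 extra AV1 QPs). Flag mappings are standard ffmpeg options; the
# dataset's actual commands are published in its repository and may differ.
import itertools, subprocess

RESOLUTIONS = [2160, 1080, 720, 540]
FPS = [30, 60]
CODECS = {"avc": "libx264", "hevc": "libx265", "av1": "libsvtav1"}
QPS = [22, 27, 32, 37]
AV1_EXTRA_QPS = [35, 46, 55]

def encode(src, codec_key, height, fps, qp):
    lib = CODECS[codec_key]
    out = f"{codec_key}_{height}p_{fps}fps_qp{qp}.mp4"
    # SVT-AV1 is driven via CRF in ffmpeg; x264/x265 support constant QP.
    qp_args = (["-crf", str(qp)] if codec_key == "av1"
               else ["-qp", str(qp)])
    subprocess.run(["ffmpeg", "-y", "-i", src,
                    "-vf", f"scale=-2:{height}", "-r", str(fps),
                    "-c:v", lib, *qp_args, out], check=True)
    return out

for height, fps, codec in itertools.product(RESOLUTIONS, FPS, CODECS):
    qps = QPS + (AV1_EXTRA_QPS if codec == "av1" else [])
    for qp in qps:
        encode("input.mp4", codec, height, fps, qp)
```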

Index Terms— Video encoding, decoding, energy, complexity, quality.
