Beyond Curves and Thresholds – Introducing Uncertainty Estimation to Satisfied User Ratios for Compressed Video

Picture Coding Symposium (PCS) 

12-14 June 2024, Taichung, Taiwan

[PDF]

Jingwen Zhu (University of Nantes, France), Hadi Amirpour (AAU, Austria), Raimund Schatz (AIT, Austria), Patrick Le Callet (University of Nantes, France), and Christian Timmerer (AAU, Austria)

Abstract: The Just Noticeable Difference (JND) establishes the threshold between two images or videos below which differences in quality remain imperceptible to an individual. Aggregated over a population of viewers, this threshold yields the Satisfied User Ratio (SUR), which holds significant importance in image and video compression applications: the p%SUR is the threshold at which quality differences remain imperceptible to p% of users. While substantial efforts have been dedicated to predicting the p%SUR for various encoding parameters (e.g., QP) and quality metrics (e.g., VMAF), referred to as proxies, systematic consideration of the prediction uncertainties associated with these proxies has hitherto remained unexplored. In this paper, we analyze the uncertainty of p%SUR through Confidence Interval (CI) estimation and assess the consistency of various Video Quality Metrics (VQMs) as proxies for SUR. The analysis reveals challenges in directly using p%SUR as ground truth for training models and highlights the need for uncertainty estimation for SUR with different proxies.
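
To make the confidence-interval idea concrete, here is a minimal Python sketch that bootstraps a CI for a p%SUR threshold from hypothetical per-subject JND annotations on a QP scale. The data, the quantile-based threshold definition, and all parameter values are illustrative assumptions, not the paper's experimental setup or its CI estimator:

```python
import numpy as np

rng = np.random.default_rng(42)

def p_sur_threshold(jnd_samples, p=0.75):
    """p%SUR threshold: the largest encoding level (here, QP) at which
    at least p of the users still perceive no difference. Since a user is
    satisfied while QP stays below their individual JND, this is the
    (1 - p)-quantile of the per-subject JNDs."""
    return np.quantile(jnd_samples, 1 - p)

def bootstrap_ci(jnd_samples, p=0.75, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the p%SUR threshold."""
    n = len(jnd_samples)
    stats = [p_sur_threshold(rng.choice(jnd_samples, n, replace=True), p)
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Hypothetical per-subject JND annotations (QP at which each of 30
# subjects first notices a difference from the reference).
jnds = rng.normal(loc=32, scale=3, size=30)
print("75%SUR threshold (QP):", p_sur_threshold(jnds))
print("95% CI:", bootstrap_ci(jnds))
```

The width of such an interval shrinks with the number of subjects; it is exactly this spread that a single p%SUR point estimate hides when used directly as ground truth.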

Posted in ATHENA | Comments Off on Beyond Curves and Thresholds – Introducing Uncertainty Estimation to Satisfied User Ratios for Compressed Video

ICIP 2024 Grand Challenge on 360° Video Super Resolution

IEEE International Conference on Image Processing (IEEE ICIP)

Grand Challenge on

360° Video Super Resolution and Quality Enhancement

27-30 October 2024, Abu Dhabi, UAE

https://www.icip24-video360sr.ae/home

 

Abstract: Omnidirectional visual content, commonly referred to as 360-degree images and videos, has garnered significant interest in both academia and industry, establishing itself as the primary media modality for VR/XR applications. 360-degree videos offer numerous features and advantages, allowing users to view scenes in all directions and providing an immersive quality of experience with up to three degrees of freedom (3DoF). When integrated on embedded devices with remote control, 360-degree videos offer additional degrees of freedom, enabling movement within the space (6DoF). However, 360-degree videos come with specific requirements, such as high-resolution content of up to 16K to ensure a high-quality representation of the scene. Moreover, limited bandwidth in wireless communication, especially under mobility conditions, imposes strict constraints on the available throughput to prevent packet loss and maintain low end-to-end latency. Adaptive resolution and efficient compression of 360-degree video content can address these challenges by adapting to the available throughput while maintaining high video quality at the decoder. Nevertheless, downscaling and coding the original content before transmission introduce visible distortions and a loss of detail that cannot be recovered at the decoder side. In this context, machine learning techniques have demonstrated outstanding performance in alleviating coding artifacts and recovering lost details, particularly for 2D video. Compared to 2D video, however, 360-degree video suffers from lower angular resolution, requiring augmentation of both the resolution and the quality of the video. This challenge presents an opportunity for the scientific research and industrial community to propose solutions for quality enhancement and super-resolution of 360-degree videos.
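
As a point of reference for the pipeline the challenge targets (downscale and encode on the sender side, restore on the receiver side), the following minimal Python sketch shows a naive bicubic baseline that learned super-resolution methods would be expected to beat. The file name, scaling factor, and use of an equirectangular frame are assumptions for illustration only:

```python
import cv2

# Hypothetical 360-degree frame in equirectangular projection (ERP).
hr = cv2.imread("erp_frame_4k.png")  # assumed high-resolution input

# Sender side: downscale to reduce the bitrate before encoding.
lr = cv2.resize(hr, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)

# Receiver side: naive bicubic upscaling back to the original size.
sr = cv2.resize(lr, (hr.shape[1], hr.shape[0]), interpolation=cv2.INTER_CUBIC)

print("Bicubic baseline PSNR:", cv2.PSNR(hr, sr), "dB")
```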

 

Posted in ATHENA | Comments Off on ICIP 2024 Grand Challenge on 360° Video Super Resolution

Prof. Mohammad Ghanbari (1948-2024)

In the wake of the passing of Prof. Mohammad Ghanbari, we extend our deepest condolences to his family during this challenging time. Prof. Ghanbari was a distinguished member of our Christian Doppler Laboratory ATHENA since its inception, and we consider ourselves privileged to have had the opportunity to collaborate with him. His contributions comprise more than 30 joint publications on video coding and streaming, accepted at renowned venues such as IEEE TCSVT, ACM TOMM, IEEE TIP, IEEE TNSM, IEEE ICIP, IEEE ICASSP, IEEE ICME, ACM MMSys, PCS, and IEEE MMSP, among others. Prof. Ghanbari played a pivotal role in the success of our research endeavors, and his profound knowledge, insightful input, and invaluable guidance were consistently valued.

The entire Institute for Information Technology, especially those at the Christian Doppler Laboratory ATHENA, feels deeply saddened by the loss of Prof. Ghanbari. As we come to terms with this period of mourning, reflection, and farewell, we extend our warmest wishes and heartfelt sympathies to his family and the wider research community.

Posted in ATHENA | Comments Off on Prof. Mohammad Ghanbari (1948-2024)

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

The 15th ACM Multimedia Systems Conference

15-18 April, 2024 | Bari, Italy

Conference website

[PDF]

Emanuele Artioli (AAU, Austria), Farzad Tashtarian (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract:

As the popularity of video streaming entertainment continues to grow, understanding how users engage with the content and react to its changes becomes a critical success factor for every stakeholder. User engagement, i.e., the percentage of a video the user watches before quitting, is central to customer loyalty, content personalization, ad relevance, and A/B testing. This paper presents DIGITWISE, a digital twin-based approach for modeling adaptive video streaming engagement. Traditional adaptive bitrate (ABR) algorithms assume that all users react similarly to video streaming artifacts and network issues, neglecting individual user sensitivities. DIGITWISE leverages the concept of a digital twin, a digital replica of a physical entity, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement. DIGITWISE employs the XGBoost model in both the digital twins and the unified model. The proposed architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. Furthermore, DIGITWISE can optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%.
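
A minimal sketch of the digital-twin idea, assuming synthetic per-session features and engagement labels (the actual DIGITWISE feature set, data pipeline, and hyperparameters are not reproduced here):

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)

# Hypothetical per-session streaming features for one user:
# [mean bitrate (Mbps), rebuffering time (s), number of quality switches]
X = rng.random((200, 3)) * [10, 30, 15]
# Engagement: fraction of the video watched before quitting (synthetic
# label that penalizes stalls and switches, rewards bitrate).
y = np.clip(0.9 - 0.02 * X[:, 1] - 0.01 * X[:, 2] + 0.01 * X[:, 0]
            + rng.normal(0, 0.05, 200), 0, 1)

# One digital twin per user: an XGBoost regressor trained on that user's
# past sessions (here, a single synthetic user).
twin = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
twin.fit(X, y)

session = np.array([[6.0, 2.5, 3]])  # features of a new session
print("Predicted engagement:", float(twin.predict(session)[0]))
```

In the paper's architecture, a unified model plays the same predictive role across users, while per-user twins capture individual sensitivities; the sketch above shows only the per-user half of that split.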

Keywords: digital twin, user engagement, XGBoost

Posted in ATHENA | Comments Off on DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

E-WISH: An Energy-aware ABR Algorithm For Green HTTP Adaptive Video Streaming

ACM Mile-High Video 2024

February 11-14, 2024, Marriott DTC, Denver, US

[PDF]

Daniele Lorenzi (AAU, Austria), Minh Nguyen (AAU, Austria), Farzad Tashtarian (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract:

HTTP Adaptive Streaming (HAS) is the de facto solution for delivering video content over the Internet. The climate crisis has highlighted the environmental impact of information and communication technology (ICT) solutions and the need for green solutions to reduce ICT’s carbon footprint. As video streaming dominates Internet traffic, research in this direction is vital now more than ever. HAS relies on Adaptive BitRate (ABR) algorithms, which dynamically choose suitable video representations to accommodate device characteristics and network conditions. ABR algorithms typically prioritize video quality, ignoring the energy impact of their decisions. Consequently, they often select the video representation with the highest bitrate under good network conditions, thereby increasing energy consumption. This is problematic, especially for energy-limited devices, because it affects the device’s battery life and the user experience. To address these issues, we propose E-WISH, a novel energy-aware ABR algorithm that extends the existing WISH algorithm to consider energy consumption while selecting the quality for the next video segment. According to the experimental findings, E-WISH improves Quality of Experience (QoE) by up to 52% according to the ITU-T P.1203 model (mode 0) while simultaneously reducing energy consumption by up to 12% with respect to state-of-the-art approaches.
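
The toy Python sketch below illustrates the general idea of folding an energy term into per-segment quality selection. The scoring function, representation set, and energy figures are invented for illustration and are not the actual WISH or E-WISH formulation:

```python
# Hypothetical per-segment representation selection that trades off
# quality against decoding energy, in the spirit of an energy-aware ABR.

representations = [
    # (bitrate in Mbps, quality score, estimated decode energy in J)
    (1.0, 55, 8.0),
    (2.5, 70, 11.0),
    (5.0, 82, 16.0),
    (8.0, 88, 24.0),
]

def select(throughput_mbps, energy_weight=0.5):
    """Pick the feasible representation maximizing quality minus a
    weighted energy penalty (illustrative cost, not E-WISH's)."""
    best, best_score = None, float("-inf")
    for bitrate, quality, energy in representations:
        if bitrate > throughput_mbps:  # avoid rebuffering risk
            continue
        score = quality - energy_weight * energy
        if score > best_score:
            best, best_score = (bitrate, quality, energy), score
    return best

print(select(throughput_mbps=6.0, energy_weight=0.5))  # quality-leaning
print(select(throughput_mbps=6.0, energy_weight=2.0))  # energy-leaning
```

Raising the energy weight steers the client toward lower-energy representations under the same throughput, which is the qualitative behavior an energy-aware ABR aims for.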

Keywords: HTTP adaptive streaming, Energy, Adaptive Bitrate (ABR), DASH

Posted in ATHENA | Comments Off on E-WISH: An Energy-aware ABR Algorithm For Green HTTP Adaptive Video Streaming

Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low Latency Encoding

MHV 2024: ACM Mile High Video

11-14 Feb 2024 | Denver, United States

Conference Website

[PDF][Slides]

Vignesh V Menon (Fraunhofer HHI), Jingwen Zhu (École Centrale Nantes), Prajit T Rajendran (Université Paris-Saclay), Samira Afzal (Alpen-Adria-Universität Klagenfurt), Klaus Schoeffmann (Alpen-Adria-Universität Klagenfurt), Patrick Le Callet (École Centrale Nantes), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract: In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as a bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as a preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and an optimized number of CPU threads for each encoding instance may result in (i) increased quality and (ii) efficient CPU utilization while encoding. For low-latency live encoders, the encoding speed is expected to be greater than or equal to the video framerate. In this light, this paper introduces a Just Noticeable Difference (JND)-Aware Low latency Encoding Scheme (JALE), which uses random forest-based models to jointly determine the optimized encoder preset and thread count for each representation, based on video complexity features, the target encoding speed, the total number of available CPU threads, and the target encoder. Experimental results show that, on average, JALE yields a quality improvement of 1.32 dB PSNR and 5.38 VMAF points at the same bitrate, compared to the fastest-preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using the open-source x265 HEVC encoder with eight CPU threads for each representation. These enhancements are achieved while maintaining the desired encoding speed. Furthermore, on average, JALE results in an overall storage reduction of 72.70%, a 63.83% reduction in the total number of CPU threads used, and a 37.87% reduction in the overall encoding time, considering a JND of six VMAF points.
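
A minimal sketch of the prediction step, assuming synthetic complexity features and preset labels (the real JALE models are trained on measured encodings, and the feature set shown here only approximates the paper's inputs):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical training rows: per-segment complexity features plus the
# target encoding speed and the threads available for this instance.
X = np.column_stack([
    rng.random(500) * 100,    # spatial complexity feature
    rng.random(500) * 50,     # temporal complexity feature
    rng.random(500) * 255,    # average luminance
    np.full(500, 60.0),       # target encoding speed (fps)
    rng.integers(2, 16, 500), # CPU threads available
])
# Synthetic labels: index of the slowest x265 preset that still met the
# target speed in (imaginary) benchmark encodings.
presets = rng.integers(0, 10, 500)

model = RandomForestClassifier(n_estimators=100).fit(X, presets)
print("Predicted preset index:",
      model.predict([[40.0, 12.0, 128.0, 60.0, 8]])[0])
```

A second random forest of the same shape could predict the thread count; JALE determines both jointly, which the independent sketch above deliberately simplifies.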

Keywords: Live streaming, low latency, encoder preset, CPU threads, HEVC.

Posted in GAIA | Comments Off on Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low Latency Encoding

Content-adaptive Video Coding for HTTP Adaptive Streaming

Klagenfurt, January 15, 2024

Congratulations to Dr. Vignesh V Menon for successfully defending his dissertation on “Content-adaptive Video Coding for HTTP Adaptive Streaming” at Universität Klagenfurt in the context of the Christian Doppler Laboratory ATHENA.

Abstract

In today’s dynamic streaming landscape, where viewers access content on various devices and encounter fluctuating network conditions, optimizing video delivery for each unique scenario is imperative. Video content complexity analysis, content-adaptive video coding, and multi-encoding methods are fundamental for the success of adaptive video streaming, as they serve crucial roles in delivering high-quality video experiences to a diverse audience. Video content complexity analysis allows us to comprehend the video content’s intricacies, such as motion, texture, and detail, providing valuable insights to enhance encoding decisions. By understanding the content’s characteristics, we can efficiently allocate bandwidth and encoding resources, thereby improving compression efficiency without compromising quality. Content-adaptive video coding techniques built upon this analysis involve dynamically adjusting encoding parameters based on the content complexity. This adaptability ensures that the video stream remains visually appealing and artifacts are minimized, even under challenging network conditions. Multi-encoding methods further bolster adaptive streaming by offering faster encoding of multiple representations of the same video at different bitrates. This versatility reduces computational overhead and enables efficient resource allocation on the server side. Collectively, these technologies empower adaptive video streaming to deliver optimal visual quality and uninterrupted viewing experiences, catering to viewers’ diverse needs and preferences across a wide range of devices and network conditions. Embracing video content complexity analysis, content-adaptive video coding, and multi-encoding methods is essential to meet modern video streaming platforms’ evolving demands and create immersive experiences that captivate and engage audiences. In this light, this dissertation proposes contributions categorized into four classes:

Video complexity analysis: For the online analysis of video content complexity, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. The spatial information (SI) and temporal information (TI) are state-of-the-art spatial and temporal complexity features. However, these features are not optimized for online analysis in live-streaming applications. Moreover, the correlation of the features to the video coding parameters like bitrate and encoding time is not significant. This thesis proposes discrete cosine transform (DCT)-energy-based spatial and temporal complexity features to overcome these limitations and provide an efficient video complexity analysis regarding accuracy and speed for every video (segment). The proposed features are determined at an average rate of 370 frames per second for ultra high definition (UHD) video content and used in estimating encoding parameters online.
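
As a rough illustration of DCT-energy-based complexity analysis, the sketch below averages block-wise DCT energy over a grayscale frame. The block size, the exact energy definition, and the temporal feature are simplified assumptions rather than the dissertation's precise features:

```python
import numpy as np
import cv2

def block_dct_energy(frame_gray, block=32):
    """Average DCT energy over non-overlapping blocks: a low-complexity
    spatial texture proxy in the spirit of DCT-energy-based analysis.
    cv2.dct requires even-sized float arrays, hence block=32."""
    h, w = frame_gray.shape
    h, w = h - h % block, w - w % block
    energy, n = 0.0, 0
    for y in range(0, h, block):
        for x in range(0, w, block):
            blk = frame_gray[y:y + block, x:x + block].astype(np.float32)
            coeffs = cv2.dct(blk)
            coeffs[0, 0] = 0.0  # drop the DC term: keep texture only
            energy += np.abs(coeffs).sum()
            n += 1
    return energy / n

def temporal_energy(prev_gray, cur_gray, block=32):
    """Crude temporal feature: change in block DCT energy across frames."""
    return abs(block_dct_energy(cur_gray, block) -
               block_dct_energy(prev_gray, block))

# Example usage on two consecutive grayscale frames:
# prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
# cur  = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
# print(block_dct_energy(cur), temporal_energy(prev, cur))
```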

Content-adaptive encoding optimizations: Content-adaptive encoding algorithms enable better control of codec-specific parameters and mode decisions inside the encoder to achieve higher bitrate savings and/or save encoding time. The contributions of this class are listed as follows:

  1. A scene detection algorithm is proposed using the video complexity analysis features (a minimal thresholding sketch follows this list). The proposed algorithm yields a true positive rate of 78.26% and a false positive rate of 0.01%, compared to the state-of-the-art algorithm’s true positive rate of 53.62% and false positive rate of 0.03%.
  2. An intra coding unit depth prediction (INCEPT) algorithm is proposed, which limits rate-distortion optimization for each coding tree unit (CTU) in high efficiency video coding (HEVC) by utilizing the spatial correlation with the neighboring CTUs, which is computed using the luma texture complexity feature introduced in the first contribution class. Experimental results show that INCEPT achieves a 23.24% reduction in the overall encoding time with a negligible loss in compression efficiency.
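
A minimal thresholding sketch for scene detection on top of a per-frame temporal complexity signal, such as the one sketched earlier. The ratio test and window size are hypothetical; the dissertation's detector is more elaborate:

```python
def detect_scene_cuts(temporal_features, ratio=2.5, window=8):
    """Flag frame i as a scene cut when its temporal complexity exceeds
    `ratio` times the mean over the preceding `window` frames."""
    cuts = []
    for i in range(window, len(temporal_features)):
        baseline = sum(temporal_features[i - window:i]) / window
        if baseline > 0 and temporal_features[i] > ratio * baseline:
            cuts.append(i)
    return cuts

# Example: a complexity spike at frame 10 is flagged as a cut.
signal = [1.0] * 10 + [9.0] + [1.2] * 10
print(detect_scene_cuts(signal))  # -> [10]
```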

Online per-title encoding optimizations: Per-title encoding has gained traction over recent years in adaptive streaming applications. Each video is segmented into multiple scenes, and optimal encoding parameters are selected. The contributions in this category are listed as follows:

  1. Online resolution prediction scheme (ORPS), which predicts optimized resolution using the video content complexity of the video segment and the predefined set of target bitrates, is proposed. ORPS yields an average bitrate reduction of 17.28% and 22.79% for the same PSNR and VMAF, respectively, compared to the standard HTTP live streaming (HLS) bitrate ladder using x265 constant bitrate (CBR) encoding.
  2. Online framerate prediction scheme (OFPS) is proposed to predict optimized framerate using the video content complexity of the video segment and the predefined set of target bitrates. OFPS yields an average bitrate reduction of 15.87% and 18.20% for the same PSNR and VMAF, respectively, compared to the original framerate CBR encoding of UHD 120fps sequences using x265, accompanied by an overall encoding time reduction of 21.82%.
  3. Just noticeable difference (JND)-aware bitrate ladder prediction scheme (JBLS) is proposed, which predicts optimized bitrate-resolution pairs such that there is a perceptual quality difference of one JND between representations (see the ladder-pruning sketch after this list). An average bitrate reduction of 12.94% and 17.94% for the same PSNR and VMAF, respectively, is observed, compared to the HLS CBR bitrate ladder encoding using x265. For a target JND of 6 VMAF points, JBLS achieves a storage reduction of 42.48% and a 25.35% reduction in encoding time.
  4. Online encoding preset prediction scheme (OEPS) is proposed, which predicts the optimized encoder preset based on the target bitrate, resolution, and video framerate for every video segment. OEPS yields consistent encoding speed across various representations with an overall quality improvement of 0.83 dB PSNR and 5.81 VMAF points with the same bitrate, compared to the fastest preset encoding of the HLS CBR bitrate ladder using x265.
  5. A JND-aware two-pass per-title encoding scheme, named live variable bitrate encoding (LiveVBR) is proposed, which predicts perceptually-aware bitrate-resolution-framerate-rate factor tuples for the bitrate ladder of each video segment. LiveVBR yields an average bitrate reduction of 18.80% and 32.59% for the same PSNR and VMAF, respectively, compared to the HLS CBR bitrate ladder encoding using x265. For a target JND of six VMAF points, LiveVBR also resulted in a 68.96% reduction in storage space and an 18.58% reduction in encoding time, with a negligible impact on streaming latency.
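
The JND-aware ladder construction in items 3 and 5 can be illustrated with a short greedy pruning sketch: starting from a candidate set, keep only representations whose predicted VMAF exceeds the last kept one by at least one JND. The candidate tuples and the greedy rule are illustrative assumptions, not JBLS or LiveVBR themselves:

```python
def jnd_ladder(candidates, jnd=6.0):
    """Keep only bitrate-resolution candidates whose predicted VMAF is
    at least one JND above the previously kept representation."""
    ladder, last_vmaf = [], float("-inf")
    for bitrate, resolution, vmaf in sorted(candidates):
        if vmaf >= last_vmaf + jnd:
            ladder.append((bitrate, resolution, vmaf))
            last_vmaf = vmaf
    return ladder

# Hypothetical (bitrate in Mbps, resolution, predicted VMAF) candidates.
candidates = [
    (0.5, "640x360", 45.0), (1.2, "960x540", 58.0),
    (2.4, "1280x720", 66.0), (4.8, "1920x1080", 74.0),
    (7.8, "1920x1080", 78.0), (12.0, "2560x1440", 83.0),
]
print(jnd_ladder(candidates, jnd=6.0))
```

Representations closer than one JND to their neighbor are dropped, which is where the reported storage and encoding-time savings come from.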

Multi-encoding optimizations: Presently, most streaming services utilize cloud-based encoding techniques, enabling a fully parallel encoding process to reduce the overall encoding time. This dissertation comprehensively proposes various multi-rate and multi-encoding schemes in serial and parallel encoding scenarios. Furthermore, it introduces novel heuristics to limit the rate-distortion optimization (RDO) process across multiple representations. Based on these heuristics, three multi-encoding schemes are proposed, which rely on encoder analysis sharing across different representations: (i) optimized for the highest compression efficiency, (ii) optimized for the best compression efficiency-encoding time savings trade-off, and (iii) optimized for the best encoding time savings. Experimental results demonstrate that the proposed multi-encoding schemes (i), (ii), and (iii) reduce the overall serial encoding time by 34.71%, 45.27%, and 68.76% with a 2.3%, 3.1%, and 4.5% bitrate increase to maintain the same VMAF, respectively, compared to stand-alone encodings. The overall parallel encoding time is reduced by 22.03%, 20.72%, and 76.82% compared to stand-alone encodings for schemes (i), (ii), and (iii), respectively.
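
A minimal sketch of analysis sharing across representations, assuming an x265 build that supports the --analysis-save/--analysis-load options (flag names and reuse behavior vary across x265 versions, and the plain sequential loop below is only an illustration, not the proposed heuristics):

```python
import subprocess

src = "segment.y4m"  # hypothetical input segment (Y4M carries its own header)

# 1) Encode a reference representation and save its analysis data.
subprocess.run(["x265", "--input", src, "--bitrate", "1000",
                "--analysis-save", "ref.dat", "-o", "rep_1000k.hevc"],
               check=True)

# 2) Reuse the saved analysis (partitioning, mode decisions) to speed up
#    the remaining representations instead of re-running the full RDO.
for bitrate in ["2000", "4000"]:
    subprocess.run(["x265", "--input", src, "--bitrate", bitrate,
                    "--analysis-load", "ref.dat",
                    "-o", f"rep_{bitrate}k.hevc"],
                   check=True)
```

Which representation serves as the reference, and how much of its analysis the others reuse, is exactly the trade-off the three proposed schemes tune between compression efficiency and encoding time.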


Slides available here: https://www.slideshare.net/slideshows/contentadaptive-video-coding-for-http-adaptive-streaming/265462304

Posted in ATHENA | Comments Off on Content-adaptive Video Coding for HTTP Adaptive Streaming