Farzad Tashtarian is invited to talk on “Network-Assisted Video Streaming” at the University of Isfahan, Isfahan, Iran.
Ekrem Çetinkaya received the Best Doctoral Symposium Paper Award at ACM MMSys 2021 for his paper “Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming”.
More information about the paper can be found in the blog post.
FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning
30th IEEE Conference of the Open Innovations Association FRUCT
27-29 October 2021
Anatoliy Zabrovskiy, Prateek Agrawal, Christian Timmerer, and Radu Prodan
Abstract: HTTP adaptive video streaming is a widespread and sought-after technology on the Internet that allows clients to dynamically switch between different stream qualities presented in the bitrate ladder to optimize overall received video quality. Currently, there exist several approaches of different complexity for building such a ladder. The simplest method is to use a static bitrate ladder; a more complex one is to compute a per-title encoding ladder. The main drawback of these approaches is that they do not provide bitrate ladders for scenes of different visual complexity within the video. Moreover, most modern methods require additional computationally intensive test encodings of the entire video to construct the convex hull used to calculate the bitrate ladder. This paper proposes a new fast per-scene encoding approach called FAUST based on 1) quick entropy-based scene detection and 2) prediction of an optimized bitrate ladder for each scene using an artificial neural network. The results show that our model reduces the mean absolute error to 0.15 and the mean square error to 0.08, and reduces the bitrate by 13.5% while increasing the difference in Video Multimethod Assessment Fusion (VMAF) by 5.6 points.
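To make the first stage concrete, here is a minimal sketch of entropy-based scene detection, assuming 8-bit grayscale frames and a hypothetical change threshold; it illustrates the general technique only and is not the authors' implementation (which additionally predicts a per-scene bitrate ladder with an artificial neural network).

```python
# Sketch: declare a scene cut when the histogram entropy of consecutive
# frames changes abruptly. `threshold` is a hypothetical tuning parameter.
import numpy as np

def frame_entropy(gray: np.ndarray) -> float:
    """Shannon entropy of an 8-bit grayscale frame's histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def detect_scene_cuts(frames, threshold=0.5):
    """Return indices of frames assumed to start a new scene."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        e = frame_entropy(frame)
        if prev is not None and abs(e - prev) > threshold:
            cuts.append(i)
        prev = e
    return cuts
```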
The Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning paper from ATHENA lab is nominated for the Best New Streaming Innovation Award in the Streaming Media Readers’ Choice Awards 2021.
Voting can be done on the awards’ website and is open until October 4. You can find the paper under the Best New Streaming Innovation Award section as follows:
More information about the paper can be found here.
IEEE Visual Communications and Image Processing (VCIP 2021)
5-8 December 2021, Munich, Germany
Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Hannaneh Barahouei Pasandi (Virginia Commonwealth University), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
In per-title encoding, to optimize a bitrate ladder over spatial resolution, each video segment is downscaled to a set of spatial resolutions, and each is encoded at a given set of bitrates. To find the highest-quality resolution for each bitrate, the low-resolution encoded videos are upscaled to the original resolution, and a convex hull is formed based on the scaled qualities. Deep learning-based video super-resolution (VSR) approaches show a significant gain over traditional upscaling approaches and are becoming more efficient over time. This paper therefore improves per-title encoding by replacing traditional upscaling with deep neural network-based VSR algorithms: improving the quality of the low-resolution encodings improves the convex hull and, as a result, leads to an improved bitrate ladder. To avoid bandwidth wastage at perceptually lossless bitrates, a maximum quality threshold is set, and encodings beyond it are eliminated from the bitrate ladder. Similarly, a minimum threshold is set to avoid low-quality video delivery. The encodings between the maximum and minimum thresholds are selected based on one Just Noticeable Difference (JND). Our experimental results show that the proposed per-title encoding achieves a 24% bitrate reduction and a 53% storage reduction compared to the state-of-the-art method.
Index Terms—HAS, per-title, deep learning, compression, bitrate ladder.
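The ladder-construction step described above can be sketched as follows. This is an illustrative simplification, not the paper's code: quality scores (e.g., VMAF measured after upscaling each low-resolution encoding with a VSR network) are plain inputs here, and one JND is modeled as a fixed quality step.

```python
# Sketch: build a bitrate ladder from measured (resolution, bitrate, quality)
# points by taking the convex-hull resolution per bitrate and then keeping
# only encodings inside a quality window, spaced at least one JND apart.
def build_ladder(measurements, q_min=40.0, q_max=95.0, jnd=6.0):
    """measurements: iterable of (resolution, bitrate_kbps, quality)."""
    # For every bitrate, keep the resolution that yields the best quality.
    best = {}
    for res, br, q in measurements:
        if br not in best or q > best[br][1]:
            best[br] = (res, q)
    hull = sorted((br, res, q) for br, (res, q) in best.items())

    ladder, last_q = [], None
    for br, res, q in hull:
        if q < q_min or q > q_max:
            continue  # too poor, or already perceptually lossless
        if last_q is None or q - last_q >= jnd:
            ladder.append((res, br, q))
            last_q = q
    return ladder
```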
IEEE Visual Communications and Image Processing (VCIP 2021)
5-8 December 2021, Munich, Germany
Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Raimund Schatz (AIT Austrian Institute of Technology, Austria), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
Due to the growing importance of optimizing the quality and efficiency of video streaming delivery, accurate assessment of user-perceived video quality becomes increasingly relevant. However, given the wide range of viewing distances encountered in real-world settings, actually perceived video quality can vary significantly in everyday viewing situations. In this paper, we investigate and quantify the influence of viewing distance on perceived video quality. A subjective experiment was conducted with full HD sequences at three different stationary viewing distances, with each video sequence being encoded at three different quality levels. Our study results confirm that the viewing distance has a significant influence on the quality assessment. In particular, they show that an increased viewing distance generally leads to an increased perceived video quality, especially at low media encoding quality levels. In this context, we also provide an estimation of the potential bitrate savings that knowledge of the actual viewing distance would enable in practice.
Since current objective video quality metrics do not systematically take into account viewing distance, we also analyze and quantify the influence of viewing distance on the correlation between objective and subjective metrics. Our results confirm the need for distance-aware objective metrics when accurate prediction of perceived video quality in real-world environments is required.
Index Terms—video streaming, QoE, viewing distance, subjective testing.
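As a back-of-envelope illustration (not taken from the paper) of why distance matters: the farther the viewer sits, the more pixels fall within one degree of visual angle, so coding artifacts subtend smaller angles and become harder to see, which is consistent with the reported results. The display dimensions below are hypothetical.

```python
# Sketch: horizontal pixels per degree of visual angle for a flat display.
import math

def pixels_per_degree(h_pixels=1920, screen_width_m=1.2, distance_m=2.0):
    angle_deg = 2 * math.degrees(math.atan(screen_width_m / (2 * distance_m)))
    return h_pixels / angle_deg

for d in (1.5, 3.0, 4.5):  # hypothetical viewing distances in metres
    print(f"{d:.1f} m -> {pixels_per_degree(distance_m=d):.1f} px/deg")
```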
Babak Taraghi, Minh Nguyen, Hadi Amirpour, and Christian Timmerer (all with the Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)
Abstract: With the recent growth of multimedia traffic over the Internet and emerging multimedia streaming service providers, improving Quality of Experience (QoE) for HTTP Adaptive Streaming (HAS) becomes more important. Alongside other factors, such as the media quality, HAS relies on the performance of the media player’s Adaptive Bitrate (ABR) algorithm to optimize QoE in multimedia streaming sessions. QoE in HAS suffers from weak or unstable Internet connections and suboptimal ABR decisions. As a result of imperfect adaptation to the characteristics and conditions of the connection, stall events and quality level switches of varying durations can occur, negatively affecting the QoE. In this paper, we address various identified open issues related to QoE for HAS, notably (i) the minimum noticeable duration of stall events in HAS; (ii) the correlation between the media quality and the impact of stall events on QoE; (iii) the end-user preference regarding multiple shorter stall events versus a single longer stall event; and (iv) the end-user preference for media quality switches over stall events. We have studied these open issues from both objective and subjective evaluation perspectives and present the correlation between the two types of evaluations. The findings documented in this paper can serve as a baseline for improving ABR algorithms and policies in HAS.
Keywords: Crowdsourcing; HTTP Adaptive Streaming; Quality of Experience; Quality Switches; Stall Events; Subjective Evaluation; Objective Evaluation.
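For illustration, a minimal sketch of the kind of stall-event bookkeeping such studies rely on; the noticeability floor below is a hypothetical parameter, not the minimum noticeable duration measured in the paper.

```python
# Sketch: summarize stall events from a playback log so that hypotheses such
# as "several short stalls vs. one long stall" can be compared.
from dataclasses import dataclass

@dataclass
class Stall:
    start: float  # seconds into the session
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

def stall_summary(stalls, noticeable=0.1):
    """Count and total duration of stalls above a noticeability floor."""
    seen = [s for s in stalls if s.duration >= noticeable]
    return {
        "count": len(seen),
        "total_duration": sum(s.duration for s in seen),
        "longest": max((s.duration for s in seen), default=0.0),
    }

# Three short stalls vs. one long stall of equal total duration:
print(stall_summary([Stall(5, 5.5), Stall(20, 20.5), Stall(40, 41)]))
print(stall_summary([Stall(15, 17)]))
```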
In July 2021, the ATHENA Christian Doppler Laboratory hosted three interns working on the following topics:
At the end of the internship, the interns presented their work and results, and they received internship certificates. We believe the joint work was valuable both for the laboratory and for the interns themselves. We would like to thank the interns for their genuine interest, productive work, and excellent feedback about our laboratory.
Kassian Fuger: “Those four weeks of the internship meant a lot for my future career path. The internship gave me a great view of what working with a smart and friendly team is like. Working on my tasks by myself taught me a lot; I learned and improved in every aspect of the internship, and if something was unclear, my supervisor was always there to help. I was well accepted into the team, and we sometimes went for lunch together, which was very nice and helped me to be confident around the team. Anyway, spending a part of my summer at ATHENA was super fun and a good time investment. Being on the ATHENA team was a great experience, and I think everyone can learn something there.”
Vanessa Fröhlich: “I really enjoyed the internship at ATHENA and would even be happy to do it a second time. It was a great experience working with the people there and getting to know certain things that I would have definitely not learned elsewhere. It was very interesting to work with a new type of programming! I also appreciated very much how I felt part of the team in the first days already and also enjoyed working independently but with the help of my supervisor Farzad whenever I needed something or was stuck with an exercise. I am very glad that I found this internship and can say that I have gained a lot of experience there. Thank you!”
We wish the interns every success on their journey, and we hope to see them back at the University of Klagenfurt and ATHENA soon.
By CHRISTIAN TIMMERER, Senior Member IEEE
MATHIAS WIEN, Member IEEE
LU YU, Senior Member IEEE
AMY REIBMAN, Fellow IEEE
Guest Editors
Abstract: Multimedia content (i.e., video, image, audio) is responsible for the majority of today’s Internet traffic, and this share is expected to grow beyond 80% in the near future. For more than 30 years, international standards have provided tools for interoperability and have been both source and sink for challenging research activities in the domain of multimedia compression and system technologies. The goal of this special issue is to review those standards and to focus on (i) the technology developed in the context of these standards and (ii) research questions addressing aspects of these standards that are left open for competition by both academia and industry.
Index Terms—Open Media Standards, MPEG, JPEG, JVET, AOM, Computational Complexity
C. Timmerer, M. Wien, L. Yu and A. Reibman, “Special issue on Open Media Compression: Overview, Design Criteria, and Outlook on Emerging Standards,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1423-1434, Sept. 2021, doi: 10.1109/JPROC.2021.3098048.
J. Han et al., “A Technical Overview of AV1,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1435-1462, Sept. 2021, doi: 10.1109/JPROC.2021.3058584.
Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded video quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.
B. Bross, J. Chen, J. -R. Ohm, G. J. Sullivan and Y. -K. Wang, “Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC),” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1463-1493, Sept. 2021, doi: 10.1109/JPROC.2020.3043399.
Abstract: In the last 17 years, since the finalization of the first version of the now-dominant H.264/Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC) standard in 2003, two major new generations of video coding standards have been developed. These include the standards known as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). HEVC was finalized in 2013, repeating the ten-year cycle time set by its predecessor and providing about 50% bit-rate reduction over AVC. The cycle was shortened by three years for the VVC project, which was finalized in July 2020, yet again achieving about a 50% bit-rate reduction over its predecessor (HEVC). This article summarizes these developments in video coding standardization after AVC. It especially focuses on providing an overview of the first version of VVC, including comparisons against HEVC. Besides further advances in hybrid video compression, as in previous development cycles, the broad versatility of the application domain that is highlighted in the title of VVC is explained. Included in VVC is the support for a wide range of applications beyond the typical standard- and high-definition camera-captured content codings, including features to support computer-generated/screen content, high dynamic range content, multilayer and multiview coding, and support for immersive media such as 360° video.
D. Ding, Z. Ma, D. Chen, Q. Chen, Z. Liu and F. Zhu, “Advances in Video Compression System Using Deep Neural Network: A Review and Case Studies,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1494-1520, Sept. 2021, doi: 10.1109/JPROC.2021.3059994.
Abstract: Significant advances in video compression systems have been made in the past several decades to satisfy the near-exponential growth of Internet-scale video traffic. From the application perspective, we have identified three major functional blocks, including preprocessing, coding, and postprocessing, which have been continuously investigated to maximize the end-user quality of experience (QoE) under a limited bit rate budget. Recently, artificial intelligence (AI)-powered techniques have shown great potential to further increase the efficiency of the aforementioned functional blocks, both individually and jointly. In this article, we review recent technical advances in video compression systems extensively, with an emphasis on deep neural network (DNN)-based approaches, and then present three comprehensive case studies. On preprocessing, we show a switchable texture-based video coding example that leverages DNN-based scene understanding to extract semantic areas for the improvement of a subsequent video coder. On coding, we present an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning. On postprocessing, we demonstrate two neural adaptive filters to, respectively, facilitate the in-loop and postfiltering for the enhancement of compressed frames. Finally, a companion website hosting the contents developed in this work can be accessed publicly at https://purdueviper.github.io/dnn-coding/.
J. M. Boyce et al., “MPEG Immersive Video Coding Standard,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1521-1536, Sept. 2021, doi: 10.1109/JPROC.2021.3062590.
Abstract: This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization. The draft MIV standard provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom (6DoF) within a viewing space that is determined by the camera arrangement in the capture rig. The bitstream format and decoding processes of the draft specification along with aspects of the Test Model for Immersive Video (TMIV) reference software encoder, decoder, and renderer are described. The use cases, test conditions, quality assessment methods, and experimental results are provided. In the TMIV, multiple texture and geometry views are coded as atlases of patches using a legacy 2-D video codec, while optimizing for bitrate, pixel rate, and quality. The design of the bitstream format and decoder is based on the visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) standard, MPEG-I Part 5.
C. Cao, M. Preda, V. Zakharchenko, E. S. Jang and T. Zaharia, “Compression of Sparse and Dense Dynamic Point Clouds—Methods and Standards,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1537-1558, Sept. 2021, doi: 10.1109/JPROC.2021.3085957.
Abstract: In this article, a survey of the point cloud compression (PCC) methods by organizing them with respect to the data structure, coding representation space, and prediction strategies is presented. Two paramount families of approaches reported in the literature—the projection- and octree-based methods—are proven to be efficient for encoding dense and sparse point clouds, respectively. These approaches are the pillars on which the Moving Picture Experts Group Committee developed two PCC standards published as final international standards in 2020 and early 2021, respectively, under the names: video-based PCC and geometry-based PCC. After surveying the current approaches for PCC, the technologies underlying the two standards are described in detail from an encoder perspective, providing guidance for potential standard implementors. In addition, experiment evaluations in terms of compression performances for both solutions are provided.
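As a rough illustration of the octree idea underlying geometry-based PCC (a sketch under simplifying assumptions, not the standard's actual syntax or entropy coding): the bounding cube is split recursively into eight children, and one occupancy byte per internal node records which children contain points.

```python
# Sketch: serialize point-cloud geometry as a stream of occupancy bytes.
# Points are assumed to lie in the half-open cube [origin, origin + size).
import numpy as np

def encode_octree(points, origin, size, depth, stream):
    """Append one occupancy byte per visited internal node to `stream`."""
    if depth == 0 or len(points) == 0:
        return
    half = size / 2.0
    occupancy, children = 0, []
    for child in range(8):
        offset = np.array([(child >> i) & 1 for i in (0, 1, 2)]) * half
        lo = origin + offset
        mask = np.all((points >= lo) & (points < lo + half), axis=1)
        if mask.any():
            occupancy |= 1 << child
            children.append((points[mask], lo))
    stream.append(occupancy)
    for sub, lo in children:
        encode_octree(sub, lo, half, depth - 1, stream)

# Usage: stream = []; encode_octree(pts, np.zeros(3), 1.0, 6, stream)
```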
A. Descampe et al., “JPEG XS—A New Standard for Visually Lossless Low-Latency Lightweight Image Coding,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1559-1577, Sept. 2021, doi: 10.1109/JPROC.2021.3080916.
Abstract: Joint Photographic Experts Group (JPEG) XS is a new International Standard from the JPEG Committee (formally known as ISO/International Electrotechnical Commission (IEC) JTC1/SC29/WG1). It defines an interoperable, visually lossless low-latency lightweight image coding that can be used for mezzanine compression within any AV market. Among the targeted use cases, one can cite video transport over professional video links (serial digital interface (SDI), internet protocol (IP), and Ethernet), real-time video storage, memory buffers, omnidirectional video capture and rendering, and sensor compression (for example, in cameras and the automotive industry). The core coding system is composed of an optional color transform, a wavelet transform, and a novel entropy encoder, processing groups of coefficients by coding their magnitude level and packing the magnitude refinement. Such a design allows for visually transparent quality at moderate compression ratios, scalable end-to-end latency that ranges from less than one line to a maximum of 32 lines of the image, and a low-complexity real-time implementation in application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), central processing unit (CPU), and graphics processing unit (GPU). This article details the key features of this new standard and the profiles and formats that have been defined so far for the various applications. It also gives a technical description of the core coding system. Finally, the latest performance evaluation results of recent implementations of the standard are presented, followed by the current status of the ongoing standardization process and future milestones.
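As a simplified illustration of the wavelet decomposition at the heart of such a codec (JPEG XS itself specifies a 5/3 filter; a plain orthonormal Haar transform is used here for brevity): the signal splits into a low-pass half carrying the structure and a sparse high-pass half that an entropy coder can compress aggressively.

```python
# Sketch: one level of a 1-D Haar wavelet transform and its inverse.
import numpy as np

def haar_1d(x: np.ndarray):
    """Split x (even length) into low-pass and high-pass halves."""
    a, b = x[0::2], x[1::2]
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)

def inverse_haar_1d(low, high):
    x = np.empty(low.size * 2)
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x
```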
S. R. Quackenbush and J. Herre, “MPEG Standards for Compressed Representation of Immersive Audio,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1578-1589, Sept. 2021, doi: 10.1109/JPROC.2021.3075390.
Abstract: The term “immersive audio” is frequently used to describe an audio experience that provides the listener the sensation of being fully immersed or “present” in a sound scene. This can be achieved via different presentation modes, such as surround sound (several loudspeakers horizontally arranged around the listener), 3D audio (with loudspeakers at, above, and below listener ear level), and binaural audio to headphones. This article provides an overview of two recent standards that support the bitrate-efficient carriage of high-quality immersive sound. The first is MPEG-H 3D audio, which is a versatile standard that supports multiple immersive sound signal formats (channels, objects, and higher order ambisonics) and is now being adopted in broadcast and streaming applications. The second is MPEG-I immersive audio, an extension of 3D audio, currently under development, which is targeted for virtual and augmented reality applications. This will support rendering of fully user-interactive immersive sound for three degrees of user movement [three degrees of freedom (3DoF)], i.e., yaw, pitch, and roll head movement, and for six degrees of user movement [six degrees of freedom (6DoF)], i.e., 3DoF plus translational x, y, and z user position movements.
M. M. Hannuksela and Y. -K. Wang, “An Overview of Omnidirectional MediA Format (OMAF),” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1590-1606, Sept. 2021, doi: 10.1109/JPROC.2021.3063544.
Abstract: During recent years, there have been product launches and research for enabling immersive audio–visual media experiences. For example, a variety of head-mounted displays and 360° cameras are available in the market. To facilitate interoperability between devices and media system components by different vendors, the Moving Picture Experts Group (MPEG) developed the Omnidirectional MediA Format (OMAF), which is arguably the first virtual reality (VR) system standard. OMAF is a storage and streaming format for omnidirectional media, including 360° video and images, spatial audio, and associated timed text. This article provides a comprehensive overview of OMAF.
J. Voges, M. Hernaez, M. Mattavelli and J. Ostermann, “An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1607-1622, Sept. 2021, doi: 10.1109/JPROC.2021.3082027.
Abstract: The development and progress of high-throughput sequencing technologies have transformed the sequencing of DNA from a scientific research challenge to practice. With the release of the latest generation of sequencing machines, the cost of sequencing a whole human genome has dropped below $600. Such achievements open the door to personalized medicine, where it is expected that genomic information of patients will be analyzed as a standard practice. However, the associated costs, related to storing, transmitting, and processing the large volumes of data, are already comparable to the costs of sequencing. To support the design of new and interoperable solutions for the representation, compression, and management of genomic sequencing data, the Moving Picture Experts Group (MPEG), jointly with working group 5 of ISO/TC 276 “Biotechnology”, has started to produce the ISO/IEC 23092 series, known as MPEG-G. MPEG-G not only offers higher levels of compression compared with the state of the art but also provides new functionalities, such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities. MPEG-G specifies only the decoding syntax of compressed bitstreams, as well as a file format and a transport format. This allows for the development of new encoding solutions with higher degrees of optimization while maintaining compatibility with any existing MPEG-G decoder.
The Quality of Experience (QoE) is well defined in QUALINET white papers [here, here], but its assessment and metrics remain subjects of ongoing research. The aim of this workshop on “Quality of Immersive Media: Assessment and Metrics” is to provide a forum for researchers and practitioners to discuss the latest findings in this field. The scope of this workshop is (i) to raise awareness about MPEG efforts in the context of quality of immersive visual media and (ii) to invite experts (outside of MPEG) to present new techniques relevant to this workshop.
Quality assessments in the context of the MPEG standardization process typically serve two purposes: (1) to foster decision-making on the tool adoptions during the standardization process and (2) to validate the outcome of a standardization effort compared to an established anchor (i.e., for verification testing).
We kindly invite you to the first online MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics, with the program as follows:
15:00-15:10: Joel Jung & Christian Timmerer (AhG co-chairs): Welcome notice
15:10-15:30: Mathias Wien (AG 5 convenor): MPEG Visual Quality Assessment: Tasks and Perspectives
Abstract: The Advisory Group on MPEG Visual Quality Assessment (ISO/IEC JTC1 SC29/AG5) was founded in 2020 with the goal of selecting and designing subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies in the context of the MPEG standardization work. In this talk, the current work items, as well as perspectives and first achievements of the group, are presented.
15:30-15:50: Aljosa Smolic: Perception and Quality of Immersive Media
Abstract: Interest in immersive media increased significantly over recent years. Besides applications in entertainment, culture, health, industry, etc., telepresence and remote collaboration gained importance due to the pandemic and climate crisis. Immersive media have the potential to increase social integration and to reduce greenhouse gas emissions. As a result, technologies along the whole pipeline from capture to display are maturing and applications are becoming available, creating business opportunities. One aspect of immersive technologies that is still relatively undeveloped is the understanding of perception and quality, including subjective and objective assessment. The interactive nature of immersive media poses new challenges to estimation of saliency or visual attention, and to the development of quality metrics. The V-SENSE lab of Trinity College Dublin addresses these questions in current research. This talk will highlight corresponding examples in 360 VR video, light fields, volumetric video and XR.
16:00-16:20: Jesús Gutiérrez: Quality assessment of immersive media: Recent activities within VQEG
Abstract: This presentation will provide an overview of the recent activities on quality assessment of immersive media within the Video Quality Experts Group (VQEG), particularly within the Immersive Media Group (IMG). Among other efforts, outcomes will be presented from the cross-lab test (carried out by ten different labs) to assess and validate subjective evaluation methodologies for 360° videos, which was instrumental in the development of the ITU-T Recommendation P.919. Also, insights will be provided on current plans to explore the evaluation of the quality of experience of immersive communication systems, considering different technologies such as 360° video, point clouds, free-viewpoint video, etc.
16:20-16:40: Alexander Raake: <to-be-provided>