BPG Specification

version 0.9.2

Copyright (c) 2014 Fabrice Bellard

1) Introduction
---------------

BPG is a lossy and lossless picture compression format based on HEVC
[1]. It supports grayscale, CCIR 601 YCbCr, RGB, YCgCo color spaces
with an optional alpha channel. The CMYK color space is also supported
with an RGB or YCbCr color space. The bit depth of each component is
from 8 to 14 bits.

The chroma can be subsampled by a factor of two in horizontal or both
in horizontal and vertical directions (4:4:4, 4:2:2 or 4:2:0 chroma
formats are supported). The chroma is sampled at the same position
relative to the luma as in the JPEG format [2] and uses the full range
of values. 

Arbitrary metadata (such as EXIF, ICC profile, XMP) are supported.

2) Bitstream conventions
------------------------

The bit stream is byte aligned and bit fields are read from most
significant to least signficant bit in each byte.

- u(n) is an unsigned integer stored on n bits.

- ue7(n) is an unsigned integer of at most n bits stored on a variable
  number of bytes. All the bytes except the last one have a '1' as
  their first bit. The unsigned integer is represented as the
  concatenation of the remaining 7 bit codewords. Only the shortest
  encoding for a given unsigned integer shall be accepted by the
  decoder (i.e. the first byte is never 0x80). Example:

  Encoded bytes       Unsigned integer value
  0x08                8
  0x84 0x1e           542
  0xac 0xbe 0x17      728855

- ue(v) : unsigned integer 0-th order Exp-Golomb-coded (see HEVC
  specification).

- b(8) is an arbitrary byte.

3) File format
--------------

3.1) Syntax
-----------

heic_file() {

     file_magic                                                  u(32)

     pixel_format                                                u(3)
     alpha_present_flag                                          u(1)
     bit_depth_minus_8                                           u(4)

     color_space                                                 u(4)
     extension_present_flag                                      u(1)
     reserved_zeros                                              u(3)
     
     picture_width                                               ue7(32)
     picture_height                                              ue7(32)
     
     picture_data_length                                         ue7(32)
     if (extension_present_flag)  
        extension_data_length                                    ue7(32)
     if (alpha_present_flag)  
        alpha_data_length                                        ue7(32)
    
     if (extension_present_flag) {
         extension_data()
     }

     hevc_header_and_data()

     if (alpha_present_flag) {
         hevc_header_and_data()
     }

}

extension_data() 
{
     for(i = 0; i < v; i++) {
         extension_tag                                           ue7(32)
         extension_tag_length                                    ue7(32)
         for(j = 0; j < extension_tag_length; j++) {
             extension_tag_data_byte                             b(8)
         }
     }
}
     
hevc_header_and_data()
{
     hevc_header_length                                          ue7(32)
     log2_min_luma_coding_block_size_minus3                      ue(v)
     log2_diff_max_min_luma_coding_block_size                    ue(v)
     log2_min_transform_block_size_minus2                        ue(v)
     log2_diff_max_min_transform_block_size                      ue(v)
     max_transform_hierarchy_depth_intra                         ue(v)
     sample_adaptive_offset_enabled_flag                         u(1)
     pcm_enabled_flag                                            u(1)
     if (pcm_enabled_flag) {
         pcm_sample_bit_depth_luma_minus1                        u(4)
         pcm_sample_bit_depth_chroma_minus1                      u(4)
         log2_min_pcm_luma_coding_block_size_minus3              ue(v)
         log2_diff_max_min_pcm_luma_coding_block_size            ue(v)
         pcm_loop_filter_disabled_flag                           u(1)
     }
     strong_intra_smoothing_enabled_flag                         u(1)
     sps_extension_present_flag                                  u(1)
     if (sps_extension_present_flag) {
         sps_range_extension_flag                                u(1)
         sps_extension_7bits                                     u(7)     
     }
     if (sps_range_extension_flag) {
         transform_skip_rotation_enabled_flag                    u(1)
         transform_skip_context_enabled_flag                     u(1)
         implicit_rdpcm_enabled_flag                             u(1)
         explicit_rdpcm_enabled_flag                             u(1)
         extended_precision_processing_flag                      u(1)
         intra_smoothing_disabled_flag                           u(1)
         high_precision_offsets_enabled_flag                     u(1)
         persistent_rice_adaptation_enabled_flag                 u(1)
         cabac_bypass_alignment_enabled_flag                     u(1)
     }
     trailing_bits                                               u(v)

     hevc_data()
}

hevc_data() 
{
     for(i = 0; i < v; i++) {
         hevc_data_byte                                          b(8)
     }
}
        

3.2) Semantics
--------------

     'file_magic' is defined as 0x425047fb.

     'pixel_format' indicates the chroma subsamping:

       0 : Grayscale
       1 : 4:2:0
       2 : 4:2:2
       3 : 4:4:4

       The other values are reserved.
       
       The chroma samples in the 4:2:0 and 4:2:2 formats are sampled
       at the same position as JPEG [2].

     'alpha_present_flag' indicates that the picture contains an alpha
     plane in addition to the color data. When an alpha plane is
     present, the color data is not pre-multiplied.
     
     'bit_depth_minus_8' is the number of bits used for each component
     minus 8. In this version of the specification, bit_depth_minus_8
     <= 6.

     'extension_present_flag' indicates that extension data are
     present. These data are ignored by the current decoders.

     'color_space' indicates the color space. It must be 0 when
     pixel_format = 0 (grayscale):

       0 : YCbCr (CCIR 601, same as JPEG)
       1 : RGB (component order: G B R)
       2 : YCgCo (same as HEVC matrix_coeffs = 8)
       3 : YCbCrK
       4 : CMYK

       The other values are reserved.

       YCbCr is defined as full range CCIR 601 YCbCr.

       RGB is defined as full range RGB. G is stored as the Y plane. B
       in the Cb plane and R in the Cr plane.

       YCgCo is defined as HEVC matrix_coeffs = 8, full range. Y is
       stored in the Y plane. Cg in the Cb plane and Co in the Cr
       plane.
       
       YCbCrK is an encoding for CMYK data. Y, Cb and Cr are stored in
       their respective planes. K' is stored in the alpha plane which
       must be present. The original CMYK data shall be recovered as
       follows:
           - Convert the YCbCr data to RGB data.
           - C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - K')
             
       CMYK is an encoding for CMYK data. G is stored as the Y
       plane. B in the Cb plane and R in the Cr plane. K' is stored in
       the alpha plane which must be present. The original CMYK data
       shall be recovered as follows:
           - C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - K')
          
     'reserved_zeros' must be 0 in this version.

     'picture_width' is the picture width in pixels. The value 0 is
     not allowed.

     'picture_height' is the picture height in pixels. The value 0 is
     not allowed.

     'picture_data_length' is the picture data length in bytes.

     'extension_data_length' is the extension data length in bytes.

     'alpha_data_length' is the alpha data length in bytes.

     'extension_data()' is the extension data.

     'extension_tag' is the extension tag. The following values are defined:

       1: EXIF data.

       2: ICC profile (see [4])

       3: XMP (see [5])

       4: Thumbnail (the thumbnail shall be a lower resolution version
       of the image and stored in BPG format).

     The decoder shall ignore the tags it does not support.

     'extension_tag_length' is the length in bytes of the extension tag.

     'hevc_header_length' is the length in bytes of the following data
     up to and including 'trailing_bits'.
     
     'log2_min_luma_coding_block_size_minus3',
     'log2_diff_max_min_luma_coding_block_size',
     'log2_min_transform_block_size_minus2',
     'log2_diff_max_min_transform_block_size',
     'max_transform_hierarchy_depth_intra',
     'sample_adaptive_offset_enabled_flag', 'pcm_enabled_flag',
     'pcm_sample_bit_depth_luma_minus1',
     'pcm_sample_bit_depth_chroma_minus1',
     'log2_min_pcm_luma_coding_block_size_minus3',
     'log2_diff_max_min_pcm_luma_coding_block_size',
     'pcm_loop_filter_disabled_flag',
     'strong_intra_smoothing_enabled_flag', 'sps_extension_flag'
     'sps_extension_present_flag', 'sps_range_extension_flag'
     'transform_skip_rotation_enabled_flag',
     'transform_skip_context_enabled_flag',
     'implicit_rdpcm_enabled_flag', 'explicit_rdpcm_enabled_flag',
     'extended_precision_processing_flag',
     'intra_smoothing_disabled_flag',
     'high_precision_offsets_enabled_flag',
     'persistent_rice_adaptation_enabled_flag',
     'cabac_bypass_alignment_enabled_flag' are
     the corresponding fields of the HEVC SPS syntax element.
         
     'trailing_bits' has a value of 0 and has a length from 0 to 7
     bits so that the next data is byte aligned.

     'hevc_data()' contains the corresponding HEVC picture data,
     excluding the first NAL start code (i.e. the first 0x00 0x00 0x01
     or 0x00 0x00 0x00 0x01 bytes). The VPS and SPS NALs shall not be
     included in the HEVC picture data. The decoder can recover the
     necessary fields from the header by doing the following
     assumptions:

     - vps_video_parameter_set_id = 0
     - sps_video_parameter_set_id = 0
     - sps_max_sub_layers = 1
     - sps_seq_parameter_set_id = 0
     - chroma_format_idc: for picture data: 
         chroma_format_idc = pixel_format
       for alpha data: 
         chroma_format_idc = 0.
     - separate_colour_plane_flag = 0
     - pic_width_in_luma_samples = ceil(picture_width/cb_size) * cb_size
     - pic_height_in_luma_samples = ceil(picture_height/cb_size) * cb_size
       with cb_size = 1 << log2_min_luma_coding_block_size
     - bit_depth_luma_minus8 = bit_depth_minus_8
     - bit_depth_chroma_minus8 = bit_depth_minus_8
     - scaling_list_enabled_flag = 0
                    
3.3) HEVC Profile
-----------------

Conforming HEVC bit streams shall conform to the Main 4:4:4 16 Still
Picture, Level 8.5 of the HEVC specification with the following
modifications.

- separate_colour_plane_flag shall be 0 when present.

- bit_depth_luma_minus8 <= 6

- bit_depth_chroma_minus8 = bit_depth_luma_minus8

- explicit_rdpcm_enabled_flag = 0 (does not matter for intra frames)

- extended_precision_processing_flag = 0

- cabac_bypass_alignment_enabled_flag = 0

- high_precision_offsets_enabled_flag = 0 (does not matter for intra frames)

- If the encoded image is larger than the size indicated by
picture_width and picture_height, the lower right part of the decoded
image shall be cropped. If a horizontal (resp. vertical) decimation by
two is done for the chroma and that the width (resp. height) is n
pixels, ceil(n/2) pixels must be kept as the resulting chroma
information.

4) Design choices
-----------------

(This section is informative)

- Our design principle was to keep the format as simple as possible
  while taking the HEVC codec as basis. Our main metric to evaluate
  the simplicity was the size of a software decoder which outputs 32
  bit RGBA pixel data.

- Pixel formats: we wanted to be able to convert JPEG images to BPG
  with as little loss as possible. So supporting the same color space
  (CCIR 601 YCbCr) with the same range (full range) and most of the
  allowed JPEG chroma formats (4:4:4, 4:2:2, 4:2:0 or grayscale) was
  mandatory to avoid going back to RGB or doing a subsampling or
  interpolation.

- Alpha support: alpha support is mandatory. We chose to use a
  separate HEVC monochrome plane to handle it instead of another
  format to simplify the decoder.

- Color spaces: In addition to YCbCr, RGB is supported for the high
  quality or lossless cases. YCgCo is supported because it may give
  slightly better results than YCbCr for high quality
  images. CMYK/YCbCrK are supported so that JPEGs containing these
  color space can be converted. The alpha plane is used to store the K
  plane. The data is stored with inverted components (1-X) so that the
  conversion to RGB is simplified.

- Bit depth: we decided to support the HEVC bit depths 8 to 14. The
  added complexity is small and it allows to support high quality
  pictures from cameras. The default bit depth in the encoder is 10 so
  that a slightly higher compression is achieved because there are
  less rounding errors in the decoding steps.

- Picture file format: keeping a completely standard HEVC stream would
  have meant a more difficult parsing for the picture header which is
  a problem for the various image utilities to get the basic picture
  information (pixel format, width, height). So we added a small
  header before the HEVC bit stream. The picture header is byte
  oriended so it is easy to parse.

- HEVC bit stream: the standard HEVC headers (the VPS and SPS NALs)
  give an overhead of about 60 bytes for no added value in the case of
  picture compression. Since the alpha plane uses a different HEVC bit
  stream, it also adds the same overhead again. So we removed the VPS
  and SPS NALs and added a very small header with the equivalent
  information (typically 4 bytes). We also removed the first NAL start
  code which is not useful. It is still possible to reconstruct a
  standard HEVC stream to feed an unmodified hardware decoder if needed.

- Extensions: the metadata are stored at the beginning of the file so
  that they can be read at the same time as the header. Since metadata
  tend to evolve faster than the image formats, we left room for
  extension by using a (tag, lengh) representation. The decoder can
  easily skip all the metadata because their length is explicitly
  stored in the image header.

5) References
-------------

[1] High efficiency video coding (HEVC) version 2 (ITU-T Recommendation H.265)

[2] JPEG File Interchange Format version 1.02 ( http://www.w3.org/Graphics/JPEG/jfif3.pdf )

[3] EXIF version 2.2 (JEITA CP-3451)

[4] The International Color Consortium ( http://www.color.org/ )

[5] Extensible Metadata Platform (XMP) http://www.adobe.com/devnet/xmp.html
