Free Lossless Audio Codec (FLAC)
Abstract
This document defines the Free Lossless Audio Codec (FLAC) format and
its streamable subset. FLAC is designed to reduce the amount of
computer storage space needed to store digital audio signals without
losing information in doing so (i.e., lossless). FLAC is free in the
sense that its specification is open and its reference implementation
is open source. Compared to other lossless (audio) coding formats,
FLAC is a low-complexity format that can be encoded and decoded using
few computing resources. Decoding of FLAC has seen multiple
independent implementations on many different platforms, and both
encoding and decoding can be implemented without floating-point
arithmetic.
Status of This Memo
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Revised BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents
1. Introduction
2. Notation and Conventions
3. Definitions
4. Conceptual Overview
4.1. Blocking
4.2. Interchannel Decorrelation
4.3. Prediction
4.4. Residual Coding
5. Format Principles
6. Format Layout Overview
7. Streamable Subset
8. File-Level Metadata
8.1. Metadata Block Header
8.2. Streaminfo
8.3. Padding
8.4. Application
8.5. Seektable
8.5.1. Seekpoint
8.6. Vorbis Comment
8.6.1. Standard Field Names
8.6.2. Channel Mask
8.7. Cuesheet
8.7.1. Cuesheet Track
8.8. Picture
9. Frame Structure
9.1. Frame Header
9.1.1. Block Size Bits
9.1.2. Sample Rate Bits
9.1.3. Channels Bits
9.1.4. Bit Depth Bits
9.1.5. Coded Number
9.1.6. Uncommon Block Size
9.1.7. Uncommon Sample Rate
9.1.8. Frame Header CRC
9.2. Subframes
9.2.1. Subframe Header
9.2.2. Wasted Bits per Sample
9.2.3. Constant Subframe
9.2.4. Verbatim Subframe
9.2.5. Fixed Predictor Subframe
9.2.6. Linear Predictor Subframe
9.2.7. Coded Residual
9.3. Frame Footer
10. Container Mappings
10.1. Ogg Mapping
10.2. Matroska Mapping
10.3. ISO Base Media File Format (MP4) Mapping
11. Security Considerations
12. IANA Considerations
12.1. Media Type Registration
12.2. FLAC Application Metadata Block IDs Registry
13. References
13.1. Normative References
13.2. Informative References
Appendix A. Numerical Considerations
A.1. Determining the Necessary Data Type Size
A.2. Stereo Decorrelation
A.3. Prediction
A.4. Residual
A.5. Rice Coding
Appendix B. Past Format Changes
B.1. Addition of Blocking Strategy Bit
B.2. Restriction of Encoded Residual Samples
B.3. Addition of 5-Bit Rice Parameters
B.4. Restriction of LPC Shift to Non-negative Values
Appendix C. Interoperability Considerations
C.1. Features outside of the Streamable Subset
C.2. Variable Block Size
C.3. 5-Bit Rice Parameter
C.4. Rice Escape Code
C.5. Uncommon Block Size
C.6. Uncommon Bit Depth
C.7. Multi-Channel Audio and Uncommon Sample Rates
C.8. Changing Audio Properties Mid-Stream
Appendix D. Examples
D.1. Decoding Example 1
D.1.1. Example File 1 in Hexadecimal Representation
D.1.2. Example File 1 in Binary Representation
D.1.3. Signature and Streaminfo
D.1.4. Audio Frames
D.2. Decoding Example 2
D.2.1. Example File 2 in Hexadecimal Representation
D.2.2. Example File 2 in Binary Representation (Only Audio Frames)
D.2.3. Streaminfo Metadata Block
D.2.4. Seektable
D.2.5. Vorbis Comment
D.2.6. Padding
D.2.7. First Audio Frame
D.2.8. Second Audio Frame
D.2.9. MD5 Checksum Verification
D.3. Decoding Example 3
D.3.1. Example File 3 in Hexadecimal Representation
D.3.2. Example File 3 in Binary Representation (Only Audio Frame)
D.3.3. Streaminfo Metadata Block
D.3.4. Audio Frame
Acknowledgments
Authors' Addresses
1. Introduction
This document defines the Free Lossless Audio Codec (FLAC) format and
its streamable subset. FLAC files and streams can code for pulse-code
modulated (PCM) audio with 1 to 8 channels, sample rates from 1 to
1048575 hertz, and bit depths from 4 to 32 bits. Most tools for coding
to and decoding from the FLAC format have been optimized for CD-audio,
which is PCM audio with 2 channels, a sample rate of 44.1 kHz, and a
bit depth of 16 bits.
FLAC is able to achieve lossless compression because samples in audio
signals tend to be highly correlated with their close neighbors. In
contrast with general-purpose compressors, which often use
dictionaries, do run-length coding, or exploit long-term repetition,
FLAC removes redundancy solely in the very short term, looking back at
32 samples at most.
The coding methods provided by the FLAC format work best on PCM audio
signals whose samples have a signed representation and are centered
around zero. Audio signals in which samples have an unsigned
representation must be transformed to a signed representation as
described in this document in order to achieve reasonable compression.
The FLAC format is not suited for compressing audio that is not PCM.
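As an illustration of such a transformation, the sketch below assumes
unsigned samples in offset-binary form, where the midpoint 2^(n-1)
represents zero (as in common 8-bit unsigned PCM). Subtracting that
midpoint centers the signal around zero. The function names are
hypothetical, and this is not necessarily the exact procedure this
document prescribes.

```python
def unsigned_to_signed(samples: list[int], bits: int) -> list[int]:
    """Center unsigned n-bit PCM samples around zero by subtracting 2**(bits - 1).

    Assumes offset-binary input, e.g. 8-bit unsigned PCM where 128 is silence.
    """
    offset = 1 << (bits - 1)
    return [s - offset for s in samples]


def signed_to_unsigned(samples: list[int], bits: int) -> list[int]:
    """Inverse of unsigned_to_signed, so the original data is restored exactly."""
    offset = 1 << (bits - 1)
    return [s + offset for s in samples]


pcm_u8 = [128, 130, 255, 0, 127]
signed = unsigned_to_signed(pcm_u8, 8)
print(signed)  # [0, 2, 127, -128, -1]
assert signed_to_unsigned(signed, 8) == pcm_u8
```

Because the offset is a constant, the transformation is trivially
reversible and therefore does not affect losslessness.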
2. Notation and Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Values expressed as u(n) represent an unsigned big-endian integer
using n bits. Values expressed as s(n) represent a signed big-endian
integer using n bits, signed two's complement. Where necessary, n is
expressed as an equation using * (multiplication), / (division), +
(addition), or - (subtraction). An inclusive range of the number of
bits expressed is represented with an ellipsis, such as u(m...n).
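A minimal sketch of reading u(n) and s(n) fields from a byte string,
most significant bit first, follows. The BitReader class and its u
and s methods are hypothetical illustration names, not part of any
FLAC library.

```python
class BitReader:
    """Reads big-endian, MSB-first bit fields from a bytes object (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits

    def u(self, n: int) -> int:
        """Read u(n): an unsigned big-endian integer of n bits."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def s(self, n: int) -> int:
        """Read s(n): an n-bit two's complement signed integer."""
        value = self.u(n)
        if n > 0 and value & (1 << (n - 1)):  # sign bit set
            value -= 1 << n
        return value


r = BitReader(bytes([0b10110100, 0xFF]))
print(r.u(3))  # 5   (bits 101)
print(r.s(5))  # -12 (bits 10100)
print(r.s(8))  # -1  (bits 11111111)
```

Fields are not aligned to byte boundaries, which is why the reader
tracks a bit position rather than a byte position.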
While the FLAC format can store digital audio as well as other
digital signals, this document uses terminology specific to digital
audio. The use of more generic terminology was deemed less clear, so
a reader interested in non-audio use of the FLAC format is expected
to make the translation from audio-specific terms to more generic
terminology.
3. Definitions
* *Lossless compression*: Reducing the amount of computer storage
space needed to store data without needing to remove or irreversibly
alter any of this data in doing so. In other words, decompressing
losslessly compressed information returns exactly the original data.
* *Lossy compression*: Like lossless compression, but removing,
irreversibly altering, or only approximating information in order to
further reduce the amount of computer storage space needed. In other
words, decompressing lossy compressed information returns an
approximation of the original data.
* *Block*: A (short) section of linear PCM audio with one or more
channels.
* *Subblock*: All samples within a corresponding block for one
channel. One or more subblocks form a block, and all subblocks in a
certain block contain the same number of samples.
* *Frame*: A frame header, one or more subframes, and a frame footer.
It encodes the contents of a corresponding block.
* *Subframe*: An encoded subblock. All subframes within a frame code
for the same number of samples. When interchannel decorrelation is
used, a subframe can correspond to either the (per-sample) average of
two subblocks or the (per-sample) difference between two subblocks,
instead of to a subblock directly; see Section 4.2.
* *Interchannel samples*: A sample count that applies to all channels.
For example, one second of 44.1 kHz audio has 44100 interchannel
samples, meaning each channel has that number of samples.
* *Block size*: The number of interchannel samples contained in a
block or coded in a frame.
* *Bit depth* or *bits per sample*: The number of bits used to contain
each sample. This MUST be the same for all subblocks in a block but
MAY be different for different subframes in a frame because of
interchannel decorrelation. (See Section 4.2 for details on
interchannel decorrelation.)
* *Predictor*: A model used to predict samples in an audio signal
based on past samples. FLAC uses such predictors to remove redundancy
in a signal in order to be able to compress it.
* *Linear predictor*: A predictor using linear prediction (see
[LinearPrediction]). This is also called *linear predictive coding
(LPC)*. With a linear predictor, each prediction is a linear
combination of past samples (hence the name). A linear predictor has
a causal discrete-time finite impulse response (see [FIR]).
* *Muxing*: Short for multiplexing, i.e., combining several streams or
files into a single stream or file. In the context of this document,
muxing specifically refers to embedding a FLAC stream in a container
as described in Section 10.
* *Fixed predictor*: A linear predictor in which the model parameters
are the same across all FLAC files and thus do not need to be stored.
* *Predictor order*: The number of past samples that a predictor uses.
For example, a 4th-order predictor uses the 4 samples directly
preceding a certain sample to predict it. In FLAC, samples used in a
predictor are always consecutive and are always the samples directly
before the sample that is being predicted.
* *Residual*: The audio signal that remains after the predicted signal
has been subtracted from a subblock. If the predictor has been able to
remove redundancy from the signal, the samples of the remaining signal
(the *residual samples*) will have, on average, a smaller numerical
value than the original signal.
* *Rice code*: A variable-length code (see [VarLengthCode]) that
compresses data by making use of the observation that, after using an
effective predictor, most residual samples are closer to zero than the
original samples, while still allowing for a small part of the samples
to be much larger.
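The core idea of a Rice code with parameter k can be sketched as
follows. The sketch assumes, as in FLAC, that signed residuals are
first folded to unsigned values (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4)
and that the quotient is written in unary as a run of 0 bits
terminated by a 1 bit. Bits are kept as Python strings for
readability. Partitioning and parameter signaling, defined in
Section 9.2.7, are left out, and all function names are hypothetical.

```python
def fold(residual: int) -> int:
    """Map a signed residual to an unsigned value: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return residual << 1 if residual >= 0 else ((-residual) << 1) - 1


def unfold(value: int) -> int:
    """Inverse of fold()."""
    return (value >> 1) ^ -(value & 1)


def rice_encode(residual: int, k: int) -> str:
    """Rice-code one residual: quotient in unary (0 bits ended by a 1), then k low bits."""
    u = fold(residual)
    q, r = u >> k, u & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k > 0 else ""
    return "0" * q + "1" + remainder


def rice_decode(bits: str, k: int) -> tuple[int, str]:
    """Decode one residual from the front of a bit string; return it and the rest."""
    q = bits.index("1")  # number of leading 0 bits is the quotient
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return unfold((q << k) | r), bits[q + 1 + k:]


print(rice_encode(-7, 2))  # 000101

residuals = [0, -1, 3, -7, 2]
stream = "".join(rice_encode(x, 2) for x in residuals)
decoded = []
while stream:
    value, stream = rice_decode(stream, 2)
    decoded.append(value)
assert decoded == residuals
```

Small residuals get short codes (3 bits each for 0 and -1 with k = 2),
while a large residual such as -7 still fits, only with a longer unary
prefix.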
4. Conceptual Overview
Similar to many other audio coders, a FLAC file is encoded following
the steps below. To decode a FLAC file, these steps are undone in
reverse order, i.e., from bottom to top.
1. *Blocking* (see Section 4.1). The input is split up into many
contiguous blocks.
2. *Interchannel Decorrelation* (see Section 4.2). In the case of
stereo streams, the FLAC format allows for transforming the left-right
signal into a mid-side signal, a left-side signal, or a side-right
signal to remove redundancy between channels. Choosing between any of
these transformations is done independently for each block.
3. *Prediction* (see Section 4.3). To remove redundancy in a signal, a
predictor is stored for each subblock or its transformation as formed
in the previous step. A predictor consists of a simple mathematical
description that can be used, as the name implies, to predict a
certain sample from the samples that preceded it. As this prediction
is rarely exact, the error of this prediction is passed on to the next
stage. The predictor of each subblock is completely independent of
other subblocks. Since the methods of prediction are known to both the
encoder and decoder, only the parameters of the predictor need to be
included in the compressed stream. If no usable predictor can be found
for a certain subblock, the signal is stored uncompressed, and the
next stage is skipped.
4. *Residual Coding* (see Section 4.4). As the predictor does not
describe the signal exactly, the difference between the original
signal and the predicted signal (called the error or residual signal)
is coded losslessly. If the predictor is effective, the residual
signal will require fewer bits per sample than the original signal.
FLAC uses Rice coding, a subset of Golomb coding, with either 4-bit or
5-bit parameters to code the residual signal.
In addition, FLAC specifies a metadata system (see Section 8) that
allows arbitrary information about the stream to be included at the
beginning of the stream.
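To make the prediction and residual steps concrete, the sketch below
uses a second-order predictor that estimates each sample as
2*x[n-1] - x[n-2] and keeps only the prediction error. This predictor
is chosen purely for illustration (the predictors FLAC actually
defines are described in Sections 9.2.5 and 9.2.6). The first two
samples have no predecessors and are kept unchanged, and the function
names are hypothetical.

```python
def residual_order2(samples: list[int]) -> tuple[list[int], list[int]]:
    """Split a subblock into unpredicted leading samples and a 2nd-order residual.

    Each sample from index 2 on is predicted as 2*x[n-1] - x[n-2];
    only the difference between the sample and its prediction is kept.
    """
    warmup = samples[:2]
    residual = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                for n in range(2, len(samples))]
    return warmup, residual


def restore_order2(warmup: list[int], residual: list[int]) -> list[int]:
    """Undo residual_order2: run the same predictor and add each error back."""
    out = list(warmup)
    for e in residual:
        out.append(2 * out[-1] - out[-2] + e)
    return out


ramp = [100, 110, 121, 131, 140, 150]
warmup, res = residual_order2(ramp)
print(res)  # [1, -1, -1, 1]
assert restore_order2(warmup, res) == ramp
```

The residual values are far smaller than the samples themselves,
which is what makes the subsequent residual coding step effective.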
4.1. Blocking
The block size used for audio data has a direct effect on the
compression ratio. If the block size is too small, the resulting
large number of frames means that a disproportionate number of bytes
will be spent on frame headers. If the block size is too large, the
characteristics of the signal may vary so much that the encoder will
be unable to find a good predictor. In order to simplify encoder/
decoder design, FLAC imposes a minimum block size of 16 samples,
except for the last block, and a maximum block size of 65535 samples.
The last block is allowed to be smaller than 16 samples to be able to
match the length of the encoded audio without using padding.
While the block size does not have to be constant in a FLAC file, it
is often difficult to find the optimal arrangement of block sizes for
maximum compression. Because of this, the FLAC format explicitly
stores whether a file has a constant or a variable block size
throughout the stream and stores a block number instead of a sample
number to slightly improve compression if a stream has a constant
block size.
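The following sketch splits a stream into blocks of a constant size
and enforces the 16 to 65535 range described above, allowing only the
last block to be shorter. The helper name and constants are
hypothetical. It also shows how, with a constant block size, a block
number is enough to recover the first sample number of a block.

```python
MIN_BLOCK_SIZE = 16
MAX_BLOCK_SIZE = 65535


def split_into_blocks(num_samples: int, block_size: int) -> list[tuple[int, int]]:
    """Return (first_sample, length) pairs for a constant block size.

    Every block holds block_size interchannel samples except possibly
    the last one, which may be shorter (even shorter than 16).
    """
    if not MIN_BLOCK_SIZE <= block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block size must be between 16 and 65535")
    return [(start, min(block_size, num_samples - start))
            for start in range(0, num_samples, block_size)]


blocks = split_into_blocks(10000, 4096)
print(blocks)  # [(0, 4096), (4096, 4096), (8192, 1808)]

# With a constant block size, the block number alone gives the first sample.
block_number = 2
assert block_number * 4096 == blocks[block_number][0]
```

A variable block size would instead require storing each block's
first sample number, which takes more bits per frame header.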
4.2. Interchannel Decorrelation
Channels are correlated in many audio files. The FLAC format can
exploit this correlation in stereo files by coding an average of all
samples in both subblocks (a mid channel) or the difference between
all samples in both subblocks (a side channel) instead of directly
coding subblocks into subframes. The following combinations are
possible:
* *Independent*. All channels are coded independently. All non-stereo
files MUST be encoded this way.
* *Mid-side*. A left and right subblock are converted to mid and side
subframes. To calculate a sample for a mid subframe, the
corresponding left and right samples are summed, and the result is
shifted right by 1 bit. To calculate a sample for a side subframe,
the corresponding right sample is subtracted from the corresponding
left sample. On decoding, all mid channel samples have to be shifted
left by 1 bit. Also, if a side channel sample is odd, 1 has to be
added to the corresponding mid channel sample after it has been
shifted left by 1 bit. To reconstruct the left channel, the
corresponding samples in the mid and side subframes are added and the
result shifted right by 1 bit. For the right channel, the side
channel has to be subtracted from the mid channel and the result
shifted right by 1 bit.
* *Left-side*. The left subblock is coded, and the left and right
subblocks are used to code a side subframe. The side subframe is
constructed in the same way as for mid-side. To decode, the right
subblock is restored by subtracting the samples in the side subframe
from the corresponding samples in the left subframe.
* *Side-right*. The left and right subblocks are used to code a side
subframe, and the right subblock is coded. The side subframe is
constructed in the same way as for mid-side. To decode, the left
subblock is restored by adding the samples in the side subframe to
the corresponding samples in the right subframe.
The side channel needs one extra bit of bit depth, as the subtraction
can produce sample values twice as large as the maximum possible in
any given bit depth. The mid channel in mid-side stereo does not
need one extra bit, as it is shifted right 1 bit. The right shift of
the mid channel does not lead to lossy behavior: the sum of a left
and right sample (the mid sample before the shift) is odd exactly
when their difference (the side sample) is odd, so the lost
least-significant bit can be restored by taking it from the
corresponding sample in the side subframe.
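The following C sketch illustrates the stereo decorrelation rules above for single sample pairs. The function names are hypothetical. 64-bit intermediates stand in for the extra side-channel bit. The sketch assumes the compiler implements `>>` on negative signed integers as an arithmetic shift, which is implementation-defined in C but true of common compilers.

```c
#include <stdint.h>

/* Mid-side encoding of one left/right pair. The side value can need one
   more bit than the input, so 64-bit integers are used throughout. */
void ms_encode(int64_t left, int64_t right, int64_t *mid, int64_t *side) {
    *side = left - right;
    *mid = (left + right) >> 1;   /* drops the least-significant bit */
}

/* Mid-side decoding: shift mid left by 1 bit, restore the dropped bit
   from the parity of the side sample, then add or subtract and shift. */
void ms_decode(int64_t mid, int64_t side, int64_t *left, int64_t *right) {
    int64_t m = mid * 2;          /* multiply avoids UB of << on negatives */
    m += side & 1;                /* odd side sample => odd left+right sum */
    *left = (m + side) >> 1;
    *right = (m - side) >> 1;
}

/* Left-side: the right channel is the left channel minus the side. */
int64_t ls_decode_right(int64_t left, int64_t side) { return left - side; }

/* Side-right: the left channel is the side plus the right channel. */
int64_t sr_decode_left(int64_t side, int64_t right) { return side + right; }
```

After the bit is restored, `m` equals the original sum of the left and right samples. `m + side` and `m - side` are therefore exactly twice the left and right samples, and the final shifts are lossless.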
4.3. Prediction
The FLAC format has four methods for modeling the input signal:
1. *Verbatim*. Samples are stored directly, without any modeling.
This method is used for inputs with little correlation, such as
white noise. Since the raw signal is not actually passed through
the residual coding stage (it is added to the stream "verbatim"),
this method is different from using a zero-order fixed predictor.
2. *Constant*. A single sample value is stored. This method is used
whenever a signal is pure DC ("digital silence"), i.e., a
constant value throughout.
3. *Fixed predictor*. Samples are predicted with one of five fixed
(i.e., predefined) predictors, and the error of this prediction
is processed by the residual coder. These fixed predictors are
well suited for predicting simple waveforms. Since the
[...]
predictor, using a generic linear predictor adds overhead, as the
predictor coefficients need to be stored. Therefore, this method
of prediction is best suited for predicting more complex
waveforms, where the added overhead is offset by space savings in
the residual coding stage resulting from more accurate
prediction. A linear predictor in FLAC has two parameters
besides the predictor coefficients and the predictor order: the
number of bits with which each coefficient is stored (the
coefficient precision) and a prediction right shift. A
prediction is formed by multiplying each predictor coefficient
by the corresponding past sample, summing these products, and
dividing the sum by applying the specified right shift. For
more information, see Section 9.2.6.
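To illustrate the fixed and linear predictors described above, the C sketch below computes the prediction residual for one sample. It uses the usual polynomial forms of the fixed predictors for orders 0 to 4. The function names and the order in which linear predictor coefficients are applied (coefficient k multiplies the sample k+1 positions back) are assumptions of this sketch; see Section 9.2.6 for the normative description. The sketch also assumes `>>` on a negative `int64_t` is an arithmetic shift.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Residual of the fixed predictor of the given order (0-4) at index i.
   Returns false if the order is not 0-4 or i is a warm-up sample. */
bool fixed_residual(const int32_t *s, size_t i, unsigned order, int64_t *res) {
    if (order > 4 || i < order) return false;
    int64_t p;
    switch (order) {
    case 0: p = 0; break;
    case 1: p = s[i - 1]; break;
    case 2: p = 2 * (int64_t)s[i - 1] - s[i - 2]; break;
    case 3: p = 3 * (int64_t)s[i - 1] - 3 * (int64_t)s[i - 2] + s[i - 3];
            break;
    default: p = 4 * (int64_t)s[i - 1] - 6 * (int64_t)s[i - 2]
                 + 4 * (int64_t)s[i - 3] - s[i - 4];
             break;
    }
    *res = (int64_t)s[i] - p;
    return true;
}

/* Residual of a linear predictor at index i (requires i >= order): the
   products of coefficients and past samples are summed in 64 bits and
   the sum is divided by applying the right shift. */
int64_t lpc_residual(const int32_t *s, size_t i, const int32_t *coef,
                     unsigned order, unsigned shift) {
    int64_t sum = 0;
    for (unsigned k = 0; k < order; k++)
        sum += (int64_t)coef[k] * s[i - 1 - k];
    return (int64_t)s[i] - (sum >> shift);
}
```

A linear ramp is predicted perfectly by the order-2 fixed predictor. A quadratic is predicted perfectly by orders 3 and 4. The coefficients {4, -2} with a right shift of 1 behave like the order-2 fixed predictor.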
A FLAC encoder is free to select any of the above methods to model
the input. However, to ensure lossless coding, the following
exceptions apply:
* When the samples that need to be stored do not all have the same
value (i.e., the signal is not constant), a constant subframe
cannot be used.
* When an encoder is unable to find a fixed or linear predictor for
which all residual samples are representable in 32-bit signed
integers as stated in Section 9.2.7, a verbatim subframe is used.
For more information on fixed and linear predictors, see
[Lossless-Compression] and [Robinson-TR156].
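The two exceptions above can be checked as in the following C sketch. The function names are hypothetical. The residual check uses the plain 32-bit signed range stated above; the exact requirement is given in Section 9.2.7.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A constant subframe may only be used when every sample is identical. */
bool can_use_constant(const int32_t *s, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (s[i] != s[0]) return false;
    return true;
}

/* A fixed or linear predictor is only usable when every residual sample
   fits in a 32-bit signed integer; otherwise, fall back to verbatim. */
bool residuals_fit_int32(const int64_t *res, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (res[i] < INT32_MIN || res[i] > INT32_MAX) return false;
    return true;
}
```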
4.4. Residual Coding
If a subframe uses a predictor to approximate the audio signal, a
residual is stored to "correct" the approximation to the exact value.
When an effective predictor is used, the average numerical value of
the residual samples is smaller than that of the samples before
prediction. While having smaller values on average, it is possible
that a few "outlier" residual samples are much larger than any of the
original samples. Sometimes these outliers even exceed the range
that the bit depth of the original audio offers.
To efficiently code such a stream of relatively small numbers with an
occasional outlier, Rice coding (a subset of Golomb coding) is used.
A Rice parameter is chosen depending on how small the numbers to be
coded are. The numerical value of each residual sample is split into
two parts by dividing it by 2^(Rice parameter), creating a quotient
and a remainder. The quotient is stored in unary form and the
remainder in binary form. If most residual samples are indeed close
to zero and a suitable Rice parameter is chosen, this form of coding,
with a so-called variable-length code, uses fewer bits than the
residual in unencoded form.
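The split into quotient and remainder can be sketched in C as follows. The function name and the return convention are assumptions of this sketch. The code is returned right-aligned in a 64-bit integer, so the leading zero bits of the unary quotient are implicit in the returned length. Codes longer than 64 bits are rejected rather than handled.

```c
#include <stdint.h>

/* Rice-codes an unsigned (folded) value with parameter k: the quotient
   value >> k in unary (that many 0 bits, then a 1 bit), followed by the
   k low bits of the value in binary. On success, *bits holds the code
   right-aligned and *len its length in bits. Returns -1 if k > 31 or the
   code would not fit in 64 bits. */
int rice_encode(uint32_t value, unsigned k, uint64_t *bits, unsigned *len) {
    if (k > 31) return -1;
    uint32_t q = value >> k;                       /* quotient */
    if ((uint64_t)q + 1 + k > 64) return -1;
    uint64_t rem = value & (((uint64_t)1 << k) - 1); /* remainder */
    *bits = ((uint64_t)1 << k) | rem;  /* terminating 1, then remainder */
    *len = q + 1 + k;                  /* q zeros are the leading bits */
    return 0;
}
```

For example, the value 5 with Rice parameter 2 has quotient 1 and remainder 1. It is coded as 0b0101, i.e., one 0 bit, the terminating 1 bit, and the remainder 01.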
As Rice codes can only handle unsigned numbers, signed numbers are
zigzag encoded to a so-called folded residual. See Section 9.2.7 for
a more thorough explanation.
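A minimal sketch of zigzag folding in C follows. The function names are hypothetical. Non-negative values map to even numbers and negative values to odd numbers, so 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, .... The arithmetic is arranged to avoid signed overflow, even for the most negative 32-bit value.

```c
#include <stdint.h>

/* Folds a signed residual: n >= 0 becomes 2n, n < 0 becomes -2n-1. */
uint32_t fold(int32_t r) {
    return r >= 0 ? (uint32_t)r << 1
                  : ((uint32_t)(-(r + 1)) << 1) | 1u;
}

/* Inverse of fold(): odd values are negative, even values non-negative. */
int32_t unfold(uint32_t u) {
    return (u & 1u) ? -(int32_t)(u >> 1) - 1 : (int32_t)(u >> 1);
}
```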
Quite often, the optimal Rice parameter varies over the course of a
subframe. To accommodate this, the residual can be split up into
partitions, where each partition has its own Rice parameter. To keep
overhead and complexity low, the number of partitions used in a
subframe is limited to powers of two.
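With a partition order p, there are 2^p partitions. Because the warm-up samples of a predicted subframe are not part of the residual, the first partition holds correspondingly fewer residual samples (see Section 9.2.7). The following C sketch computes partition sizes under that reading. The function name is hypothetical. The sketch also rejects block sizes that are not evenly divisible by the number of partitions.

```c
#include <stdint.h>

/* Number of residual samples in partition `index`, or -1 if the
   parameters cannot be valid. The partition order is a 4-bit field. */
long partition_samples(uint32_t block_size, unsigned predictor_order,
                       unsigned partition_order, uint32_t index) {
    if (partition_order > 15) return -1;
    uint32_t parts = (uint32_t)1 << partition_order;
    if (index >= parts || (block_size & (parts - 1)) != 0) return -1;
    long per = (long)(block_size >> partition_order);
    if (index == 0) per -= (long)predictor_order;  /* warm-up samples */
    return per < 0 ? -1 : per;
}
```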
The FLAC format uses two forms of Rice coding, which differ only in
the number of bits used to encode the Rice parameter: either 4 or
5 bits.
5. Format Principles
FLAC has no format version information, but it does contain reserved
space in several places. Future versions of the format MAY use this
reserved space safely without breaking the format of older streams.
Older decoders MAY choose to abort decoding when encountering data
that is encoded using methods they do not recognize. Apart from
reserved patterns, the format specifies forbidden patterns in certain
places, meaning that the patterns MUST NOT appear in any bitstream.
They are listed in the following table.
+=========================================+===============+
| Description                             | Reference     |
+=========================================+===============+
| Metadata block type 127                 | Section 8.1   |
+-----------------------------------------+---------------+
| Minimum and maximum block sizes smaller | Section 8.2   |
| than 16 in streaminfo metadata block    |               |
+-----------------------------------------+---------------+
| Sample rate bits 0b1111                 | Section 9.1.2 |
+-----------------------------------------+---------------+
| Uncommon block size 65536               | Section 9.1.6 |
+-----------------------------------------+---------------+
| Predictor coefficient precision bits    | Section 9.2.6 |
| 0b1111                                  |               |
+-----------------------------------------+---------------+
| Negative predictor right shift          | Section 9.2.6 |
+-----------------------------------------+---------------+
Table 1
All numbers used in a FLAC bitstream are integers; there are no
floating-point representations. All numbers are big-endian coded,
except the field lengths used in Vorbis comments (see Section 8.6),
which are little-endian coded. This exception for Vorbis comments is
to keep as much commonality as possible with Vorbis comments as used
by the Vorbis codec (see [Vorbis]). All numbers are unsigned except
linear predictor coefficients, the linear prediction shift (see
Section 9.2.6), and numbers that directly represent samples, which
are signed. None of these restrictions apply to application metadata
blocks or to Vorbis comment field contents.
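The difference between the two byte orders can be shown with a pair of C helpers. The names are hypothetical. Reading the four bytes of the fLaC marker big-endian yields 0x664C6143, while reading the same bytes little-endian, as is done for Vorbis comment field lengths, yields a different number.

```c
#include <stdint.h>

/* Big-endian: most significant byte first (most FLAC fields). */
uint32_t read_u32_be(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Little-endian: least significant byte first (Vorbis comment lengths). */
uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```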
All samples encoded to and decoded from the FLAC format MUST be in a
signed representation.
There are several ways to convert unsigned sample representations to
signed sample representations, but the coding methods provided by the
FLAC format work best on audio signals whose sample values are
centered around zero, i.e., have no DC offset. In most unsigned
audio formats, signals are centered around halfway within the range
of the unsigned integer type used. If that is the case, converting
sample representations by first copying the number to a signed
integer with a sufficient range and then subtracting half of the
range of the unsigned integer type results in a signal with samples
centered around 0.
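The conversion described above can be written in C as follows. The function name is hypothetical. The sample is copied into a wider signed integer, and then half of the unsigned range, 2^(bits-1), is subtracted. For unsigned 8-bit samples, such as 8-bit PCM in WAV files, the midpoint 128 maps to 0.

```c
#include <stdint.h>

/* Converts an unsigned sample of the given bit depth (1-32) to a signed
   sample centered around zero by subtracting half the unsigned range. */
int32_t unsigned_to_signed(uint32_t u, unsigned bits) {
    int64_t half = (int64_t)1 << (bits - 1);
    return (int32_t)((int64_t)u - half);
}
```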
Unary coding in a FLAC bitstream uses zero bits terminated by a one
bit; e.g., the number 5 is coded in unary as 0b000001. This
prevents the frame sync code from appearing in unary-coded numbers.
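Reading a unary-coded number amounts to counting 0 bits until the terminating 1 bit, as in this C sketch. The function name and the bit-offset convention (most significant bit of each byte first) are assumptions of the sketch. The byte 0x04 is 0b00000100, so reading from its first bit yields 5 and leaves the position just after the 1 bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one unary-coded number starting at bit offset *pos in buf,
   counting 0 bits up to the terminating 1 bit. Advances *pos past the
   1 bit. Returns -1 if the buffer ends before a 1 bit is found. */
long read_unary(const uint8_t *buf, size_t nbytes, size_t *pos) {
    long count = 0;
    while (*pos < nbytes * 8) {
        unsigned bit = (buf[*pos / 8] >> (7 - *pos % 8)) & 1u;
        (*pos)++;
        if (bit) return count;
        count++;
    }
    return -1;
}
```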
When a FLAC file contains data that is forbidden or otherwise not
valid, decoder behavior is left unspecified. A decoder MAY choose to
stop decoding upon encountering such data. Examples of such data
include the following:
* One or more decoded sample values exceed the range offered by the
bit depth as coded for that frame. For example, in a frame with a
bit depth of 8 bits, any samples not in the inclusive range from
-128 to 127 are not valid.
* The number of wasted bits (see Section 9.2.2) used by a subframe
is such that the bit depth of that subframe (see Section 9.2.3 for
a description of subframe bit depth) equals zero or is negative.
* A frame header Cyclic Redundancy Check (CRC) (see Section 9.1.8)
or frame footer CRC (see Section 9.3) does not validate.
* One of the forbidden bit patterns described in Table 1 is used.
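The first two checks in the list above can be sketched in C as follows. The function names are hypothetical. The wasted-bits check assumes, in line with the side-channel discussion in Section 4.2, that a side subframe carries one extra bit of depth. Section 9.2.3 gives the normative definition of subframe bit depth.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a decoded sample lies in the inclusive range
   -2^(bits-1) .. 2^(bits-1)-1 for a bit depth of 1-32. */
bool sample_in_range(int64_t sample, unsigned bits) {
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return sample >= lo && sample <= hi;
}

/* True if the subframe bit depth left after removing wasted bits is
   still positive. */
bool wasted_bits_valid(unsigned frame_bits, bool is_side, unsigned wasted) {
    int sub = (int)frame_bits + (is_side ? 1 : 0) - (int)wasted;
    return sub > 0;
}
```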
6. Format Layout Overview
A FLAC bitstream consists of the fLaC (i.e., 0x664C6143) marker at
the beginning of the stream, followed by a mandatory metadata block
(called the STREAMINFO block), any number of other metadata blocks,
and then the audio frames.
FLAC supports 127 kinds of metadata blocks; currently, 7 kinds are
defined in Section 8.
The audio data is composed of one or more audio frames. Each frame
consists of a frame header that contains a sync code, information
about the frame (like the block size, sample rate, and number of
channels), and an 8-bit CRC. The frame header also contains either
the sample number of the first sample in the frame (for variable
block size streams) or the frame number (for fixed block size
streams). This allows for fast, sample-accurate seeking to be
performed. Following the frame header are encoded subframes, one for
each channel. The frame is then zero-padded to a byte boundary and
finished with a frame footer containing a checksum for the frame.
Each subframe has its own header that specifies how the subframe is
encoded.
In order to allow a decoder to start decoding at any place in the
stream, each frame starts with a byte-aligned 15-bit sync code.
However, since it is not guaranteed that the sync code does not
[...]
frame header contains some basic information about the stream. This
information includes sample rate, bits per sample, number of
channels, etc. Since the frame header is overhead, it has a direct
effect on the compression ratio. To keep the frame header as small
as possible, FLAC uses lookup tables for the most commonly used
values for frame properties. When a certain property has a value
that is not covered by the lookup table, the decoder is directed to
find the value of that property (for example, the sample rate) at the
end of the frame header or in the streaminfo metadata block. If a
frame header refers to the streaminfo metadata block, the file is not
"streamable"; see Section 7 for details. By using lookup tables, the
file is streamable and the frame header size is small for the most
common forms of audio data.
Individual subframes (one for each channel) are coded separately
within a frame and appear serially in the stream. In other words,
the encoded audio data is NOT channel-interleaved. This reduces
decoder complexity at the cost of requiring larger decode buffers.
Each subframe has its own header specifying the attributes of the
subframe, like prediction method and order, residual coding
parameters, etc. Each subframe header is followed by the encoded
audio data for that channel.
7. Streamable Subset
The FLAC format specifies a subset of itself as the FLAC streamable
subset. The purpose of this is to ensure that any streams encoded
according to this subset are truly "streamable", meaning that a
decoder that cannot seek within the stream can still pick up in the
middle of the stream and start decoding. It also makes hardware
decoder implementations more practical by limiting the encoding
parameters in such a way that decoder buffer sizes and other resource
requirements can be easily determined. The streamable subset places
the following limitations on what MAY be used in the stream:
* The sample rate bits (see Section 9.1.2) in the frame header MUST
be 0b0001-0b1110, i.e., the frame header MUST NOT refer to the
streaminfo metadata block to describe the sample rate.
* The bit depth bits (see Section 9.1.4) in the frame header MUST be
0b001-0b111, i.e., the frame header MUST NOT refer to the
streaminfo metadata block to describe the bit depth.
* The stream MUST NOT contain blocks with more than 16384
interchannel samples, i.e., the maximum block size must not be
larger than 16384.
* Audio with a sample rate less than or equal to 48000 Hz MUST NOT
be contained in blocks with more than 4608 interchannel samples,
i.e., the maximum block size used for this audio must not be
larger than 4608.
* Linear prediction subframes (see Section 9.2.6) containing audio
with a sample rate less than or equal to 48000 Hz MUST have a
predictor order less than or equal to 12, i.e., the subframe type
bits in the subframe header (see Section 9.2.1) MUST NOT be
0b101100-0b111111.
* The Rice partition order (see Section 9.2.7) MUST be less than or
equal to 8.
* The channel ordering MUST be equal to one defined in
Section 9.1.3, i.e., the FLAC file MUST NOT need a
ordering. See Section 8.6.2 for details.
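An encoder or validator could check most of the limitations above per frame, as in the following C sketch. The `frame_params` structure and its field names are hypothetical summaries of values found in the frame and subframe headers, not fields defined by this document. The channel-ordering requirement is not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frame summary; field names are illustrative only. */
struct frame_params {
    unsigned sample_rate_bits;     /* 4-bit sample rate code */
    unsigned bit_depth_bits;       /* 3-bit bit depth code */
    uint32_t block_size;           /* interchannel samples in the block */
    uint32_t sample_rate_hz;
    unsigned max_lpc_order;        /* highest LPC order used, 0 if none */
    unsigned max_partition_order;  /* highest Rice partition order used */
};

bool frame_is_streamable(const struct frame_params *f) {
    /* Frame header must not refer to streaminfo for rate or depth. */
    if (f->sample_rate_bits < 0x1 || f->sample_rate_bits > 0xE) return false;
    if (f->bit_depth_bits < 0x1 || f->bit_depth_bits > 0x7) return false;
    if (f->block_size > 16384) return false;
    if (f->sample_rate_hz <= 48000) {
        if (f->block_size > 4608) return false;
        if (f->max_lpc_order > 12) return false;
    }
    if (f->max_partition_order > 8) return false;
    return true;
}
```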
8. File-Level Metadata
At the start of a FLAC file or stream, following the fLaC ASCII file
signature, one or more metadata blocks MUST be present before any
audio frames appear. The first metadata block MUST be a streaminfo
block.
8.1. Metadata Block Header
Each metadata block starts with a 4-byte header. The first bit in
this header flags whether a metadata block is the last one. It is 0
when other metadata blocks follow; otherwise, it is 1. The 7
remaining bits of the first header byte contain the type of the
metadata block as an unsigned number between 0 and 126, according to
the following table. A value of 127 (i.e., 0b1111111) is forbidden.
The three bytes that follow code for the size of the metadata block
in bytes, excluding the 4 header bytes, as an unsigned number coded
big-endian.
+=========+=======================================================+
| Value   | Metadata Block Type                                   |
+=========+=======================================================+
| 0       | Streaminfo                                            |
+---------+-------------------------------------------------------+
| 1       | Padding                                               |
+---------+-------------------------------------------------------+
| 2       | Application                                           |
+---------+-------------------------------------------------------+
| 3       | Seektable                                             |
+---------+-------------------------------------------------------+
| 4       | Vorbis comment                                        |
+---------+-------------------------------------------------------+
| 5       | Cuesheet                                              |
+---------+-------------------------------------------------------+
| 6       | Picture                                               |
+---------+-------------------------------------------------------+
| 7 - 126 | Reserved                                              |
+---------+-------------------------------------------------------+
| 127     | Forbidden (to avoid confusion with a frame sync code) |
+---------+-------------------------------------------------------+
Table 2
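The header layout described above can be parsed with a few bit operations, as in this C sketch. The structure and function names are hypothetical. The first byte supplies the last-block flag (top bit) and the block type (low 7 bits). The next three bytes form a big-endian 24-bit length. The forbidden type 127 is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

struct metadata_header {
    bool is_last;     /* 1 if no further metadata blocks follow */
    unsigned type;    /* 0-126; see Table 2 */
    uint32_t length;  /* block size in bytes, excluding these 4 bytes */
};

/* Parses a 4-byte metadata block header. Returns false if the block
   type is the forbidden value 127. */
bool parse_metadata_header(const uint8_t p[4], struct metadata_header *h) {
    h->is_last = (p[0] & 0x80) != 0;
    h->type = p[0] & 0x7F;
    h->length = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    return h->type != 127;
}
```

For example, the bytes 0x80 0x00 0x00 0x22 describe a streaminfo block (type 0) of 34 bytes that is also the last metadata block.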
8.2. Streaminfo 8.2. Streaminfo
The streaminfo metadata block has information about the whole stream, The streaminfo metadata block has information about the whole stream,
like sample rate, number of channels, total number of samples, etc. such as sample rate, number of channels, total number of samples,
It MUST be present as the first metadata block in the stream. Other etc. It MUST be present as the first metadata block in the stream.
metadata blocks MAY follow. There MUST be no more than one Other metadata blocks MAY follow. There MUST be no more than one
streaminfo metadata block per FLAC stream. streaminfo metadata block per FLAC stream.
If the streaminfo metadata block contains incorrect or incomplete
information, decoder behavior is left unspecified (i.e., it is up to
the decoder implementation). A decoder MAY choose to stop further
decoding when the information supplied by the streaminfo metadata
block turns out to be incorrect or contains forbidden values. A
decoder accepting information from the streaminfo block (most
significantly, the maximum frame size, maximum block size, number of
audio channels, number of bits per sample, and total number of
samples) without doing further checks during decoding of audio frames
could be vulnerable to buffer overflows. See also Section 11.
The following table describes the streaminfo metadata block,
excluding the metadata block header.
+========+=================================================+
| Data | Description |
+========+=================================================+
| u(16) | The minimum block size (in samples) used in the |
| | stream, excluding the last block. |
+--------+-------------------------------------------------+
| u(16) | The maximum block size (in samples) used in the |
| | stream. |
+--------+-------------------------------------------------+
| u(24) | The minimum frame size (in bytes) used in the |
| | stream. A value of 0 signifies that the value |
| | is not known. |
+--------+-------------------------------------------------+
| u(24) | The maximum frame size (in bytes) used in the |
| | stream. A value of 0 signifies that the value |
| | is not known. |
+--------+-------------------------------------------------+
| u(20) | Sample rate in Hz. |
+--------+-------------------------------------------------+
| u(3) | (number of channels)-1. FLAC supports from 1 |
| | to 8 channels. |
+--------+-------------------------------------------------+
| u(5) | (bits per sample)-1. FLAC supports from 4 to |
| | 32 bits per sample. |
+--------+-------------------------------------------------+
| u(36) | Total number of interchannel samples in the |
| | stream. A value of 0 here means the number of |
| | total samples is unknown. |
+--------+-------------------------------------------------+
| u(128) | MD5 checksum of the unencoded audio data. This |
| | allows the decoder to determine if an error |
| | exists in the audio data even when, despite the |
| | error, the bitstream itself is valid. A value |
| | of 0 signifies that the value is not known. |
+--------+-------------------------------------------------+
Table 3
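The fields in Table 3 are packed at bit granularity, so they are typically unpacked with shifts and masks. The Python sketch below assumes that the 34-byte streaminfo body has already been read without its header (16 + 16 + 24 + 24 + 20 + 3 + 5 + 36 + 128 = 272 bits). `StreamInfo` and `parse_streaminfo` are hypothetical names. Apart from the length check, the sketch performs none of the validation discussed in this section.

```python
from dataclasses import dataclass


@dataclass
class StreamInfo:
    min_block_size: int
    max_block_size: int
    min_frame_size: int   # 0 means unknown
    max_frame_size: int   # 0 means unknown
    sample_rate: int
    channels: int
    bits_per_sample: int
    total_samples: int    # 0 means unknown
    md5: bytes            # 16 zero bytes means unknown


def parse_streaminfo(data: bytes) -> StreamInfo:
    """Decode a streaminfo body (metadata block header excluded)."""
    if len(data) != 34:
        raise ValueError(f"streaminfo body must be 34 bytes, got {len(data)}")
    min_block = int.from_bytes(data[0:2], "big")
    max_block = int.from_bytes(data[2:4], "big")
    min_frame = int.from_bytes(data[4:7], "big")
    max_frame = int.from_bytes(data[7:10], "big")
    # The next 64 bits hold u(20) rate, u(3) channels-1, u(5) bps-1, u(36) total.
    packed = int.from_bytes(data[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)
    return StreamInfo(min_block, max_block, min_frame, max_frame,
                      sample_rate, channels, bits_per_sample,
                      total_samples, data[18:34])
```

The stored values of the channel and bit-depth fields are one less than the actual values, so the sketch adds 1 to each when unpacking.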
The minimum block size and the maximum block size MUST be in the
16-65535 range. The minimum block size MUST be equal to or less than
the maximum block size.
Every frame except the last one MUST have a block size equal to or
greater than the minimum block size and equal to or less than the
maximum block size. The last frame MUST have a block size equal to or
less than the maximum block size; it does not have to comply with the
minimum block size, because the block size of that frame must be able
to accommodate whatever audio data remains at the end of the stream.
If the minimum block size is equal to the maximum block size, the
file contains a fixed block size stream, as the minimum block size
excludes the last block. Note that in the case of a stream with a
variable block size, the actual maximum block size MAY be smaller
than the maximum block size listed in the streaminfo block, and the
actual smallest block size excluding the last block MAY be larger
than the minimum block size listed in the streaminfo block. This is
because the encoder has to write these fields before receiving any
input audio data and cannot know beforehand what block sizes it will
use, only the bounds within which the block sizes will be chosen.
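The rules in the two paragraphs above can be checked with a small function. The Python sketch below assumes that the caller has already collected the block size of every frame in stream order. The name `check_block_sizes` and its return values are hypothetical. The function reports a fixed or variable block size stream from the streaminfo bounds alone and raises ValueError on a violation.

```python
def check_block_sizes(min_bs: int, max_bs: int, frame_sizes: list[int]) -> str:
    """Validate frame block sizes against the streaminfo bounds."""
    if not (16 <= min_bs <= 65535 and 16 <= max_bs <= 65535):
        raise ValueError("block size bounds must lie in 16-65535")
    if min_bs > max_bs:
        raise ValueError("minimum block size exceeds maximum block size")
    for index, size in enumerate(frame_sizes):
        is_last = index == len(frame_sizes) - 1
        if size > max_bs:
            raise ValueError(f"frame {index}: block size {size} > maximum")
        # Only the last frame may be shorter than the minimum block size.
        if not is_last and size < min_bs:
            raise ValueError(f"frame {index}: block size {size} < minimum")
    # Equal bounds mean a fixed block size stream (the last block excepted).
    return "fixed" if min_bs == max_bs else "variable"
```

As noted above, the actual block sizes in a variable block size stream can lie strictly inside the stored bounds. The function therefore only checks containment and does not require equality.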
The sample rate MUST NOT be 0 when the FLAC file contains audio. A
sample rate of 0 MAY be used when non-audio is represented. This is
useful if data is encoded that is not along a time axis or when the
sample rate of the data lies outside the range that FLAC can
represent in the streaminfo metadata block. If a sample rate of 0 is
used, it is recommended to store the meaning of the encoded content
in a Vorbis comment field (see Section 8.6) or an application
metadata block (see Section 8.4). This document does not define such
metadata.
The MD5 checksum is computed by applying the MD5 message-digest
algorithm in [RFC1321]. The message to this algorithm consists of
all the samples of all channels interleaved, represented in signed,
little-endian form. This interleaving is on a per-sample basis, so
for a stereo file, this means the first sample of the first channel,
then the first sample of the second channel, then the second sample
of the first channel, etc. Before computing the checksum, all
samples must be byte-aligned. If the bit depth is not a whole number
of bytes, the value of each sample is sign-extended to the next whole
number of bytes.
In the case of a 2-channel stream with 6-bit samples, bits will be
lined up as follows:
SSIIIIIISSIIIIIISSIIIIII
^ ^     ^ ^     ^ ^
| |     | |     | Bits of 2nd sample of 1st channel
| |     | |     Sign extension bits of 2nd sample of 1st channel
| |     | Bits of 1st sample of 2nd channel
| |     Sign extension bits of 1st sample of 2nd channel
| Bits of 1st sample of 1st channel
Sign extension bits of 1st sample of 1st channel
In the case of a 1-channel stream with 12-bit samples, bits are lined
up as follows, showing the little-endian byte order:
IIIIIIIISSSSIIIIIIIIIIIISSSSIIII
^       ^   ^   ^       ^   ^
|       |   |   |       |   Most-significant 4 bits of 2nd sample
|       |   |   |       Sign extension bits of 2nd sample
|       |   |   Least-significant 8 bits of 2nd sample
|       |   Most-significant 4 bits of 1st sample
|       Sign extension bits of 1st sample
Least-significant 8 bits of 1st sample
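The checksum input described above can be built directly from decoded samples. The Python sketch below assumes that the samples are available as one list of signed integers per channel, with all lists of equal length. `interleave_for_md5` and `md5_of_samples` are hypothetical names. Python's `int.to_bytes(..., signed=True)` writes two's complement, which performs the sign extension to the next whole byte.

```python
import hashlib


def interleave_for_md5(channels: list[list[int]], bits_per_sample: int) -> bytes:
    """Interleave per sample, sign-extend to whole bytes, little-endian."""
    width = (bits_per_sample + 7) // 8            # 6 bits -> 1 byte, 12 -> 2
    low = -(1 << (bits_per_sample - 1))
    high = (1 << (bits_per_sample - 1)) - 1
    message = bytearray()
    for frame in zip(*channels):                  # one sample from each channel
        for sample in frame:
            if not low <= sample <= high:
                raise ValueError(f"{sample} does not fit in {bits_per_sample} bits")
            message += sample.to_bytes(width, "little", signed=True)
    return bytes(message)


def md5_of_samples(channels: list[list[int]], bits_per_sample: int) -> bytes:
    """MD5 digest comparable to the u(128) streaminfo field."""
    return hashlib.md5(interleave_for_md5(channels, bits_per_sample)).digest()
```

For the 6-bit stereo case, samples -1 and 5 on the first channel and 3 and -32 on the second yield the bytes 0xFF 0x03 0x05 0xE0. For the 12-bit mono case, samples -1 and 0x123 yield 0xFF 0xFF 0x23 0x01, matching the byte order in the second layout.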
skipping to change at page 19, line 40 skipping to change at line 851
after encoding; the user can instruct the encoder to reserve a
padding block of sufficient size so that when metadata is added, it
will simply overwrite the padding (which is relatively quick) instead
of having to insert it into the existing file (which would normally
require rewriting the entire file). There MAY be one or more padding
metadata blocks per FLAC stream.
+======+======================================================+
| Data | Description |
+======+======================================================+
| u(n) | n "0" bits (n MUST be a multiple of 8, i.e., a whole |
| | number of bytes, and MAY be zero). n is 8 times the |
| | size described in the metadata block header. |
+------+------------------------------------------------------+
Table 4
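A padding block is simply a header followed by zero bytes. The Python sketch below assumes the metadata block header layout of Section 8.1 (1-bit last-metadata-block flag, 7-bit type, 24-bit big-endian length) and padding's block type number 1 from Table 2, neither of which appears in this excerpt. `make_padding_block` is a hypothetical name.

```python
PADDING_TYPE = 1  # assumed block type number for padding


def make_padding_block(size: int, is_last: bool = False) -> bytes:
    """Return a header plus `size` zero bytes; `size` goes in the header."""
    if not 0 <= size < (1 << 24):
        raise ValueError("metadata block size must fit in 24 bits")
    first_byte = (0x80 if is_last else 0x00) | PADDING_TYPE
    return bytes([first_byte]) + size.to_bytes(3, "big") + bytes(size)
```

A tagging tool can later overwrite such a block in place. One approach is to shrink the padding by exactly the number of bytes the new metadata occupies.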
8.4. Application
The application metadata block is for use by third-party
applications. The only mandatory field is a 32-bit identifier.
Application IDs are registered in the IANA "FLAC Application Metadata
Block IDs" registry (see Section 12.2).
+=======+===================================================+
| Data | Description |
+=======+===================================================+
| u(32) | Registered application ID. |
+-------+---------------------------------------------------+
| u(n) | Application data (n MUST be a multiple of 8, |
| | i.e., a whole number of bytes). n is 8 times the |
| | size described in the metadata block header minus |
| | the 32 bits already used for the application ID. |
+-------+---------------------------------------------------+
Table 5
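Table 5 splits the block body into a fixed 4-byte identifier and opaque data. The Python sketch below assumes that the body has already been read using the size from the metadata block header. `parse_application` is a hypothetical name, and looking up the ID in the registry is outside its scope.

```python
def parse_application(body: bytes) -> tuple[int, bytes]:
    """Split an application block body into (application ID, data)."""
    if len(body) < 4:
        raise ValueError("an application block holds at least a 32-bit ID")
    app_id = int.from_bytes(body[:4], "big")  # u(32), big-endian like other FLAC integers
    return app_id, body[4:]                   # the remaining n/8 bytes of data
```

Because the body is read in whole bytes, the requirement that n be a multiple of 8 is satisfied automatically here.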
8.5. Seektable
The seektable metadata block can be used to store seek points. It is
possible to seek to any given sample in a FLAC stream without a seek
table, but the delay can be unpredictable since the bitrate may vary
widely within a stream. By adding seek points to a stream, this
delay can be significantly reduced. There MUST NOT be more than one
seektable metadata block in a stream, but the table can have any
number of seek points.
Each seek point takes 18 bytes, so a seek table with 1% resolution
within a stream (100 seek points, i.e., 1800 bytes) adds less than
2 kilobytes of data. The number of seek points is implied by the size
described in the metadata block header, i.e., equal to size / 18.
There is also a special "placeholder" seekpoint that will be ignored
by decoders but can be used to reserve space for future seek point
insertion.
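One common way to use a seektable is to pick the last seek point at or before the target sample, jump to that frame, and decode forward to the exact sample. This strategy is illustrative and is not mandated by this document. The Python sketch below assumes that the table has already been parsed into (sample number, byte offset, frame samples) tuples, and `find_seek_point` is a hypothetical name. Seek point offsets are relative to the first frame header, so a caller would add that frame's position in the file.

```python
import bisect

PLACEHOLDER = 0xFFFFFFFFFFFFFFFF


def find_seek_point(points, target):
    """Last non-placeholder seek point with sample number <= target, or None.

    `points` holds (sample_number, byte_offset, frame_samples) tuples sorted
    by sample number, as the seektable requires.
    """
    real = [p for p in points if p[0] != PLACEHOLDER]
    index = bisect.bisect_right([p[0] for p in real], target) - 1
    return real[index] if index >= 0 else None
```

Because seek points are sorted, a binary search finds the candidate in logarithmic time. If no seek point precedes the target, the decoder falls back to scanning from the first frame.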
+============+=============================+
| Data | Description |
+============+=============================+
| Seekpoints | Zero or more seek points as |
| | defined in Section 8.5.1. |
+------------+-----------------------------+
Table 6
A seektable is generally not usable for seeking in a FLAC file
embedded in a container (see Section 10), as such containers usually
interleave FLAC data with other data and the offsets used in
seekpoints are those of an unmuxed FLAC stream. Also, containers
often provide their own seeking methods. However, it is possible to
store the seektable in the container along with other metadata when
muxing a FLAC file, so this stored seektable can be restored when
demuxing the FLAC stream into a standalone FLAC file.
8.5.1. Seekpoint
+=======+==========================================================+
| Data | Description |
+=======+==========================================================+
| u(64) | Sample number of the first sample in the target frame or |
| | 0xFFFFFFFFFFFFFFFF for a placeholder point. |
+-------+----------------------------------------------------------+
| u(64) | Offset (in bytes) from the first byte of the first frame |
| | header to the first byte of the target frame's header. |
+-------+----------------------------------------------------------+
| u(16) | Number of samples in the target frame. |
+-------+----------------------------------------------------------+
Table 7
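Table 7 describes a fixed 18-byte record, so a seektable body maps directly onto a big-endian struct with two 64-bit fields and one 16-bit field. The Python sketch below uses the standard `struct` module. `parse_seektable` and `SEEKPOINT` are hypothetical names, and the sketch assumes that the body has already been read using the size from the metadata block header.

```python
import struct

# Two u(64) fields and one u(16) field, big-endian, no padding: 18 bytes.
SEEKPOINT = struct.Struct(">QQH")


def parse_seektable(body: bytes) -> list[tuple[int, int, int]]:
    """Return (sample number, byte offset, frame samples) for each point."""
    if len(body) % SEEKPOINT.size:
        raise ValueError("seektable size must be a multiple of 18 bytes")
    return [SEEKPOINT.unpack_from(body, offset)
            for offset in range(0, len(body), SEEKPOINT.size)]
```

The ">" prefix selects big-endian order and disables alignment padding. Without it, `struct` could insert padding between fields and the record would no longer be 18 bytes.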
Notes:
* For placeholder points, the second and third field values are
undefined.
* Seek points within a table MUST be sorted in ascending order by
sample number.
* Seek points within a table MUST be unique by sample number, with
the exception of placeholder points.
* The previous two notes imply that there MAY be any number of
placeholder points, but they MUST all occur at the end of the
table, since the placeholder sample number is the largest possible
64-bit value.
* The sample offsets are those of an unmuxed FLAC stream. The
offsets MUST NOT be updated on muxing to reflect the new offsets
of FLAC frames in a container.
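The ordering rules in the notes above reduce to one check. Sample numbers, with placeholders excluded, must be strictly increasing, which covers both "sorted" and "unique". In addition, no real seek point may follow a placeholder. The Python sketch below takes only the sample number field of each seek point. `validate_seekpoints` is a hypothetical name.

```python
PLACEHOLDER = 0xFFFFFFFFFFFFFFFF


def validate_seekpoints(sample_numbers: list[int]) -> None:
    """Raise ValueError unless the seektable ordering rules hold."""
    previous = None
    placeholder_seen = False
    for number in sample_numbers:
        if number == PLACEHOLDER:
            placeholder_seen = True   # any number of placeholders is allowed
            continue
        if placeholder_seen:
            raise ValueError("placeholder points must all be at the end")
        # Strictly increasing means both sorted and unique.
        if previous is not None and number <= previous:
            raise ValueError(f"sample number {number} is out of order or repeated")
        previous = number
```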
8.6. Vorbis Comment
A Vorbis comment metadata block contains human-readable information
coded in UTF-8. The name "Vorbis comment" points to the fact that
the Vorbis codec stores such metadata in almost the same way (see
[Vorbis]). A Vorbis comment metadata block consists of a vendor
string optionally followed by a number of fields, which are pairs of
field names and field contents. Many users refer to these fields as
"FLAC tags" or simply as "tags". A FLAC file MUST NOT contain more
than one Vorbis comment metadata block.
In a Vorbis comment metadata block, the metadata block header is
directly followed by 4 bytes containing the length in bytes of the
vendor string as an unsigned number coded little-endian. The vendor
string follows, coded in UTF-8 and not terminated in any way.
Following the vendor string are 4 bytes containing the number of
fields that are in the Vorbis comment block, stored as an unsigned
number coded little-endian. If this number is non-zero, it is
followed by the fields themselves, each of which is stored with a
4-byte length. First, the 4-byte field length in bytes is stored as
an unsigned number coded little-endian. Like the vendor string, the
field itself is UTF-8 coded and not terminated in any way.
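The layout above is a sequence of length-prefixed UTF-8 strings with little-endian 32-bit lengths. The Python sketch below assumes that the block body has already been read using the size from the metadata block header. `parse_vorbis_comment` is a hypothetical name. Each length is checked against the remaining bytes before it is used, because a corrupt length could otherwise point past the end of the block.

```python
def parse_vorbis_comment(body: bytes) -> tuple[str, list[str]]:
    """Return (vendor string, list of raw "NAME=contents" fields)."""
    pos = 0

    def read_u32le() -> int:
        nonlocal pos
        if pos + 4 > len(body):
            raise ValueError("truncated Vorbis comment block")
        value = int.from_bytes(body[pos:pos + 4], "little")
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32le()
        if pos + length > len(body):
            raise ValueError("string length exceeds block size")
        text = body[pos:pos + length].decode("utf-8")  # no terminator to strip
        pos += length
        return text

    vendor = read_string()
    count = read_u32le()
    fields = [read_string() for _ in range(count)]
    return vendor, fields
```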
Each field consists of a field name and field contents, separated by
an = character. The field name MUST only consist of UTF-8 code
points U+0020 through U+007E, excluding U+003D, which is the =
character. In other words, the field name can contain all printable
ASCII characters except the equals sign. The evaluation of the field
names MUST be case insensitive, so U+0041 through U+005A (A-Z) MUST
be considered equivalent to U+0061 through U+007A (a-z). The field
contents can contain any UTF-8 character.
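Field names cannot contain "=", so splitting a field at its first "=" always separates the name from the contents, even when the contents themselves contain "=". The Python sketch below applies the name rules and a case-insensitive lookup. `split_field` and `get_field` are hypothetical names, and returning every match is a choice made by this sketch.

```python
def split_field(field: str) -> tuple[str, str]:
    """Split "NAME=contents" at the first '=' and validate the name."""
    name, sep, contents = field.partition("=")
    if not sep:
        raise ValueError("field has no '=' separator")
    # Names may only use U+0020..U+007E; '=' is already excluded by partition().
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in name):
        raise ValueError(f"invalid field name {name!r}")
    return name, contents


def get_field(fields: list[str], wanted: str) -> list[str]:
    """Contents of every field whose name matches `wanted`, ignoring A-Z case."""
    key = wanted.upper()  # validated names are ASCII, so upper() only folds a-z
    return [contents for name, contents in map(split_field, fields)
            if name.upper() == key]
```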
Note that the Vorbis comment as used in Vorbis allows for 2^64 bytes
of data, whereas the FLAC metadata block is limited to 2^24 bytes.
Given the stated purpose of Vorbis comments, i.e., human-readable
textual information, the FLAC metadata block limit is unlikely to be
restrictive. Also, note that the 32-bit field lengths are coded
little-endian, as opposed to the usual big-endian coding of
fixed-length integers in the rest of the FLAC format.
8.6.1. Standard Field Names
Only one standard field name is defined: the channel mask field (see
Section 8.6.2). No other field names are defined because the
applicability of any field name is strongly tied to the content it is
associated with. For example, field names that are useful for
describing files that contain a single work of music would be
unusable when labeling archived broadcasts, recordings of any kind,
or a collection of music works. Even when describing a single work
of music, different conventions exist depending on the kind of music:
orchestral music differs from music by solo artists or bands.
Despite the fact that no field names are formally defined, there is a
general trend among devices and software capable of FLAC playback
that are meant to play music. Most of those recognize at least the
following field names:
* Title: Name of the current work.
* Artist: Name of the artist generally responsible for the current
work. For orchestral works, this is usually the composer;
otherwise, it is often the performer.
* Album: Name of the collection the current work belongs to.
For a more comprehensive list of possible field names suited for
describing a single work of music in various genres, the list of tags
used in the MusicBrainz project is suggested; see [MusicBrainz].
8.6.2. Channel Mask
Besides fields containing information about the work itself, one
field is defined for technical reasons:
WAVEFORMATEXTENSIBLE_CHANNEL_MASK. This field is used to communicate
that the channels in a file differ from the default channels defined
in Section 9.1.3. For example, by default, a FLAC file containing
two channels is interpreted to contain a left and right channel, but
with this field, it is possible to describe different channel
contents.
The channel mask consists of flag bits indicating which channels are
present. The flags only signal which channels are present, not in
which order, so if a file to be encoded has channels that are not in
ascending order of their mask bits, they have to be reordered. This
mask is stored with a hexadecimal representation preceded by 0x; see
the examples below. Please note that a file in which the channel
order is defined through the WAVEFORMATEXTENSIBLE_CHANNEL_MASK is not
streamable (see Section 7), as the field is not found in each frame
header. The mask bits can be found in the following table.
+============+=============================+
| Bit Number | Channel Description |
+============+=============================+
| 0 | Front left |
+------------+-----------------------------+
| 1 | Front right |
+------------+-----------------------------+
| 2 | Front center |
+------------+-----------------------------+
| 3 | Low-frequency effects (LFE) |
+------------+-----------------------------+
| 4 | Back left |
+------------+-----------------------------+
| 5 | Back right |
+------------+-----------------------------+
| 6 | Front left of center |
+------------+-----------------------------+
| 7 | Front right of center |
+------------+-----------------------------+
| 8 | Back center |
+------------+-----------------------------+
| 9 | Side left |
+------------+-----------------------------+
| 10 | Side right |
+------------+-----------------------------+
| 11 | Top center |
+------------+-----------------------------+
| 12 | Top front left |
+------------+-----------------------------+
| 13 | Top front center |
+------------+-----------------------------+
| 14 | Top front right |
+------------+-----------------------------+
| 15 | Top rear left |
+------------+-----------------------------+
| 16 | Top rear center |
+------------+-----------------------------+
| 17 | Top rear right |
+------------+-----------------------------+
Table 8
Following are three examples:
* A file has a single channel -- an LFE channel. The Vorbis comment
field is WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x8.
* A file has four channels -- front left, front right, top front
left, and top front right. The Vorbis comment field is
WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x5003.
* An input has four channels -- back center, top front center, front
center, and top rear center in that order. These have to be
reordered to front center, back center, top front center, and top
rear center. The Vorbis comment field added is
WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x12104.
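The mask values in the examples above follow from setting bit n of the mask for each channel with bit number n in Table 8. The reordering follows from sorting the input channels by bit number. The Python sketch below reproduces both. `channel_mask_field` and `reorder_permutation` are hypothetical names, and writing the hexadecimal digits in lowercase without leading zeros is a choice made by this sketch.

```python
def channel_mask_field(bits: list[int]) -> str:
    """Vorbis comment field for channels with the given Table 8 bit numbers."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 17:
            raise ValueError(f"bit {bit} is not listed in Table 8")
        if mask & (1 << bit):
            raise ValueError(f"bit {bit} is listed twice")
        mask |= 1 << bit
    return f"WAVEFORMATEXTENSIBLE_CHANNEL_MASK={mask:#x}"


def reorder_permutation(bits: list[int]) -> list[int]:
    """Input channel indices arranged in ascending bit order."""
    return sorted(range(len(bits)), key=lambda i: bits[i])
```

For the third example, the input bit numbers are 8, 13, 2, and 16. `reorder_permutation([8, 13, 2, 16])` gives `[2, 0, 1, 3]`, so the front center input (index 2) is written first.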
for example, 0x0008 for a single LFE channel. Parsing of
both the field name and the field contents.
indicate that none of the audio channels of a file correlate with
speaker positions. This is the case when audio needs to be decoded
skipping to change at page 25, line 33 skipping to change at line 1115
multitrack recording is contained.
It is possible for a WAVEFORMATEXTENSIBLE_CHANNEL_MASK field to code
for fewer channels than are present in the audio. If that is the
case, the remaining channels SHOULD NOT be rendered by a playback
application unfamiliar with their purpose. For example, the
Ambisonics UHJ format is compatible with stereo playback: its first
two channels can be played back on stereo equipment, but all four
channels together can be decoded into surround sound. For that
example, the Vorbis comment field
WAVEFORMATEXTENSIBLE_CHANNEL_MASK=0x3 would be set, indicating that
the first two channels are front left and front right and other
channels do not correlate with speaker positions directly.
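A player can apply the rule above by assigning flagged speaker positions to channels in ascending bit order and leaving any remaining channels unassigned. The Python sketch below makes two assumptions. First, the field contents are parsed with `int(value, 16)`, which accepts an optional 0x or 0X prefix and either letter case. Second, a mask that flags more speakers than there are channels is treated as an error. `assign_speakers` is a hypothetical name, and bits above 17 are ignored.

```python
def assign_speakers(mask_value: str, num_channels: int) -> list:
    """Map each channel to its Table 8 bit number, or None if unassigned."""
    mask = int(mask_value, 16)
    bits = [bit for bit in range(18) if mask & (1 << bit)]
    if len(bits) > num_channels:
        # Assumption of this sketch: more flagged speakers than channels
        # is treated as an error.
        raise ValueError("mask flags more speakers than the stream has channels")
    # Channels are stored in ascending bit order; extra channels get None
    # and should not be rendered by a player unfamiliar with their purpose.
    return bits + [None] * (num_channels - len(bits))
```

For the UHJ example, `assign_speakers("0x3", 4)` maps the first two channels to bits 0 and 1 (front left and front right) and leaves the other two as None.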
If audio channels not assigned to any speaker are contained and
decoding to speaker positions is possible, it is recommended to
provide metadata on how this decoding should take place in another
Vorbis comment field or an application metadata block. This document
does not define such metadata.
8.7. Cuesheet 8.7. Cuesheet
To either store the track and index point structure of a Compact Disc A cuesheet metadata block can be used either to store the track and
Digital Audio (CD-DA) along with its audio or to provide a mechanism index point structure of a Compact Disc Digital Audio (CD-DA) along
to store locations of interest within a FLAC file, a cuesheet with its audio or to provide a mechanism to store locations of
metadata block can be used. Certain aspects of this metadata block interest within a FLAC file. Certain aspects of this metadata block
follow directly from the CD-DA specification, called Red Book, which come directly from the CD-DA specification (called Red Book), which
is standardized as [IEC.60908.1999]. The description below is is standardized as [IEC.60908.1999]. The description below is
complete and further reference to [IEC.60908.1999] is not needed to complete, and further reference to [IEC.60908.1999] is not needed to
implement this metadata block. implement this metadata block.
The structure of a cuesheet metadata block is enumerated in the The structure of a cuesheet metadata block is enumerated in the
following table. following table.
+============+======================================================+
| Data | Description |
+============+======================================================+
| u(128*8) | Media catalog number in ASCII |
| | printable characters 0x20-0x7E. |
+------------+------------------------------------------------------+
| u(64) | Number of lead-in samples. |
+------------+------------------------------------------------------+
| u(1) | 1 if the cuesheet corresponds to a |
| | CD-DA; else 0. |
+------------+------------------------------------------------------+
| u(7+258*8) | Reserved. All bits MUST be set to |
| | zero. |
+------------+------------------------------------------------------+
| u(8) | Number of tracks in this cuesheet. |
+------------+------------------------------------------------------+
| Cuesheet | A number of structures as specified |
| tracks | in Section 8.7.1 equal to the number |
| | of tracks specified previously. |
+------------+------------------------------------------------------+
Table 9
If the media catalog number is less than 128 bytes long, it is
right-padded with 0x00 bytes. For CD-DA, this is a 13-digit number
followed by 115 0x00 bytes.
The number of lead-in samples has meaning only for CD-DA cuesheets;
for other uses, it should be 0. For CD-DA, the lead-in is the TRACK
00 area where the table of contents is stored; more precisely, it is
the number of samples from the first sample of the media to the first
sample of the first index point of the first track. According to
[IEC.60908.1999], the lead-in MUST be silence, and CD grabbing
software does not usually store it; additionally, the lead-in MUST be
at least two seconds but MAY be longer. For these reasons, the
lead-in length is stored here so that the absolute position of the
first track can be computed. Note that the lead-in stored here is the
number of samples up to the first index point of the first track, not
necessarily to INDEX 01 of the first track; even the first track MAY
have INDEX 00 data.
The number of tracks MUST be at least 1, as a cuesheet block MUST
have a lead-out track. For CD-DA, this number MUST be no more than
100 (99 regular tracks and one lead-out track). The lead-out track
is always the last track in the cuesheet. For CD-DA, the lead-out
track number MUST be 170 as specified by [IEC.60908.1999]; otherwise,
it MUST be 255.
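The fixed-size part of Table 9 occupies 128 + 8 + 1 + 258 + 1 = 396 bytes before the first cuesheet track. The Python sketch below parses and checks it; it is illustrative only. It assumes big-endian integers as elsewhere in FLAC, and the names `parse_cuesheet_header` and `CuesheetHeader` are hypothetical.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 128 + 8 + 1 + 258 + 1  # 396 bytes before the track structures


class CuesheetHeader(NamedTuple):
    catalog_number: str
    lead_in_samples: int
    is_cd_da: bool
    track_count: int


def parse_cuesheet_header(block: bytes) -> CuesheetHeader:
    """Parse the fixed-size part of a cuesheet block body (Table 9)."""
    if len(block) < HEADER_SIZE:
        raise ValueError("cuesheet block is too short")
    # Media catalog number: 128 bytes, right-padded with 0x00.
    catalog = block[:128].rstrip(b"\x00")
    if any(not 0x20 <= c <= 0x7E for c in catalog):
        raise ValueError("catalog number must be printable ASCII")
    (lead_in,) = struct.unpack(">Q", block[128:136])  # big-endian u(64)
    flags = block[136]
    is_cd_da = bool(flags & 0x80)  # top bit; the low 7 bits are reserved
    if flags & 0x7F or any(block[137:395]):
        raise ValueError("reserved bits must be zero")
    track_count = block[395]
    if track_count < 1:
        raise ValueError("a cuesheet needs at least the lead-out track")
    if is_cd_da and track_count > 100:
        raise ValueError("CD-DA allows at most 99 tracks plus the lead-out")
    return CuesheetHeader(catalog.decode("ascii"), lead_in, is_cd_da,
                          track_count)


raw = (b"0123456789012".ljust(128, b"\x00")  # 13 digits + 115 zero bytes
       + struct.pack(">Q", 2 * 44100)         # two seconds of lead-in
       + bytes([0x80]) + bytes(258)           # CD-DA flag, reserved zeros
       + bytes([2]))                          # one track plus the lead-out
print(parse_cuesheet_header(raw))
```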
8.7.1. Cuesheet Track
+=============+=====================================================+
| Data | Description |
+=============+=====================================================+
| u(64) | Track offset of the first index point in |
| | samples, relative to the beginning of the |
| | FLAC audio stream. |
+-------------+-----------------------------------------------------+
| u(8) | Track number. |
+-------------+-----------------------------------------------------+
| | points specified previously. |
+-------------+-----------------------------------------------------+
Table 10
Note that the track offset differs from the one in CD-DA, where the
track's offset in the TOC is that of the track's INDEX 01 even if
there is an INDEX 00. For CD-DA, the track offset MUST be evenly
divisible by 588 samples (588 samples = 44100 samples/s * 1/75 s).
A track number of 0 is not allowed because the CD-DA specification
reserves this for the lead-in. For CD-DA, the number MUST be 1-99 or
170 for the lead-out; for non-CD-DA, the track number MUST be 255 for
the lead-out. It is recommended to start with track 1 and increase
sequentially. Track numbers MUST be unique within a cuesheet.
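The track-number and track-offset rules above can be checked mechanically. The following Python sketch is illustrative only: it takes hypothetical `(offset, number)` pairs in cuesheet order, with the lead-out last, and raises `ValueError` on the first violation. The name `validate_tracks` is an assumption, not an API defined here.

```python
from typing import List, Tuple

SAMPLES_PER_CD_SECTOR = 588  # 44100 samples/s * 1/75 s


def validate_tracks(tracks: List[Tuple[int, int]], is_cd_da: bool) -> None:
    """Check (offset, number) pairs, lead-out last, against the rules."""
    if not tracks:
        raise ValueError("a cuesheet MUST have a lead-out track")
    numbers = [number for _, number in tracks]
    if len(set(numbers)) != len(numbers):
        raise ValueError("track numbers MUST be unique")
    lead_out = 170 if is_cd_da else 255
    if numbers[-1] != lead_out:
        raise ValueError(f"the lead-out track MUST be number {lead_out}")
    if is_cd_da and len(tracks) > 100:
        raise ValueError("CD-DA allows at most 100 tracks incl. lead-out")
    for offset, number in tracks:
        if number == 0:
            raise ValueError("track number 0 is reserved for the lead-in")
        if is_cd_da:
            if number != lead_out and not 1 <= number <= 99:
                raise ValueError(f"CD-DA track number {number} out of range")
            if offset % SAMPLES_PER_CD_SECTOR:
                raise ValueError(f"offset {offset} is not a multiple of 588")


# Two CD-DA tracks on sector boundaries, then the lead-out (170).
validate_tracks([(0, 1), (588 * 1000, 2), (588 * 5000, 170)], is_cd_da=True)
```

Because the lead-out number is checked before the per-track loop, a regular CD-DA track numbered 170 is already rejected by the uniqueness test.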
The track ISRC (International Standard Recording Code) is a 12-digit
alphanumeric code; see [ISRC-handbook]. A value of 12 ASCII 0x00
characters MAY be used to denote the absence of an ISRC.
There MUST be at least one index point in every track in a cuesheet
except for the lead-out track, which MUST have zero. For CD-DA, the
number of index points MUST NOT be more than 100.
Cuesheet Track Index Point
+========+====================================+
| Data | Description |
+========+====================================+
| u(64) | Offset in samples, relative to the |
| | track offset, of the index point. |
+--------+------------------------------------+
| u(8) | The track index point number. |
+--------+------------------------------------+
| u(3*8) | Reserved. All bits MUST be set to |
For CD-DA, a track index point number of 0 corresponds to the track
pre-gap. The first index point in a track MUST have a number of 0 or
1, and subsequently, index point numbers MUST increase by 1. Index
point numbers MUST be unique within a track.
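A minimal Python sketch of the index point numbering rules, for illustration only. It checks the index point numbers of one track in stored order. The function name and its flags are hypothetical.

```python
from typing import List


def validate_index_points(numbers: List[int], is_lead_out: bool,
                          is_cd_da: bool) -> None:
    """Check the index point numbers of one track, in stored order."""
    if is_lead_out:
        if numbers:
            raise ValueError("the lead-out track MUST have no index points")
        return
    if not numbers:
        raise ValueError("every other track MUST have an index point")
    if is_cd_da and len(numbers) > 100:
        raise ValueError("CD-DA allows at most 100 index points per track")
    if numbers[0] not in (0, 1):
        raise ValueError("the first index point MUST be number 0 or 1")
    # Increasing by exactly 1 also guarantees uniqueness within the track.
    for previous, current in zip(numbers, numbers[1:]):
        if current != previous + 1:
            raise ValueError(f"index {current} does not follow {previous}")


# A track with a pre-gap (INDEX 00) followed by INDEX 01 and INDEX 02.
validate_index_points([0, 1, 2], is_lead_out=False, is_cd_da=True)
```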
8.8. Picture
The picture metadata block contains image data of a picture that
belongs in some way to the audio contained in the FLAC file. Its
format is derived from the APIC frame in the ID3v2 specification; see
[ID3v2]. However, contrary to the APIC frame in ID3v2, the media type
and description are prepended with a 4-byte length field instead of
being 0x00-delimited strings. A FLAC file MAY contain one or more
picture metadata blocks.
Note that while the length fields for media type, description, and
picture data are 4 bytes in length and could in theory code for a
size up to 4 GiB, the total metadata block size cannot exceed what
can be described by the metadata block header, i.e., 16 MiB.
Instead of picture data, the picture metadata block can also contain
a URI as described in [RFC3986].
The structure of a picture metadata block is enumerated in the
following table.
+========+==========================================================+
| Data | Description |
+========+==========================================================+
| u(32) | The picture type according to the following table. |
+--------+----------------------------------------------------------+
| u(32) | The length of the media type string in bytes. |
+--------+----------------------------------------------------------+
| u(n*8) | The media type string as specified by [RFC2046], |
| | or the text string --> to signify that the data |
| | part is a URI of the picture instead of the |
| | picture data itself. This field must be in |
| | printable ASCII characters 0x20-0x7E. |
+--------+----------------------------------------------------------+
| u(32) | The length of the description string in bytes. |
+--------+----------------------------------------------------------+
| u(n*8) | The description of the picture in UTF-8. |
+--------+----------------------------------------------------------+
| u(32) | The width of the picture in pixels. |
+--------+----------------------------------------------------------+
| u(32) | The height of the picture in pixels. |
+--------+----------------------------------------------------------+
| u(32) | The color depth of the picture in bits per |
| | pixel. |
+--------+----------------------------------------------------------+
| u(32) | For indexed-color pictures (e.g., GIF), the |
| | number of colors used; 0 for non-indexed |
| | pictures. |
+--------+----------------------------------------------------------+
| u(32) | The length of the picture data in bytes. |
+--------+----------------------------------------------------------+
| u(n*8) | The binary picture data. |
+--------+----------------------------------------------------------+
Table 12
The height, width, color depth, and "number of colors" fields are for
informational purposes only. Applications MUST NOT use them in
decoding the picture or deciding how to display it, but applications
MAY use them to decide whether or not to process a block (e.g., when
selecting between different picture blocks) and MAY show them to the
user. If a picture has no concept for any of these fields (e.g.,
vector images may not have a height or width in pixels) or the
content of any field is unknown, the affected fields MUST be set to
zero.
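The layout in Table 12 can be read with a simple cursor over the block body. The Python sketch below is illustrative only. It assumes big-endian integers as elsewhere in FLAC and a 24-bit length field in the metadata block header. The names `parse_picture` and `Picture` are hypothetical, and the payload in the usage lines is placeholder bytes rather than a real image.

```python
import struct
from typing import NamedTuple

MAX_BLOCK_LENGTH = (1 << 24) - 1  # 24-bit length in the metadata block header


class Picture(NamedTuple):
    picture_type: int
    media_type: str
    description: str
    width: int        # informational only
    height: int       # informational only
    color_depth: int  # informational only
    colors: int       # informational only; 0 for non-indexed pictures
    data: bytes       # picture bytes, or a URI when media_type == "-->"


def parse_picture(body: bytes) -> Picture:
    """Parse a picture metadata block body laid out as in Table 12."""
    if len(body) > MAX_BLOCK_LENGTH:
        raise ValueError("block body exceeds what the header can describe")
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise ValueError("picture block is truncated")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    def u32() -> int:
        return struct.unpack(">I", take(4))[0]  # big-endian u(32)

    picture_type = u32()
    media_raw = take(u32())
    if any(not 0x20 <= c <= 0x7E for c in media_raw):
        raise ValueError("media type must be printable ASCII")
    description = take(u32()).decode("utf-8")
    width, height, depth, colors = struct.unpack(">4I", take(16))
    data = take(u32())
    return Picture(picture_type, media_raw.decode("ascii"), description,
                   width, height, depth, colors, data)


mime, desc, payload = b"image/png", "Front cover".encode("utf-8"), b"not-a-real-png"
body = (struct.pack(">II", 3, len(mime)) + mime
        + struct.pack(">I", len(desc)) + desc
        + struct.pack(">4I", 500, 500, 24, 0)
        + struct.pack(">I", len(payload)) + payload)
print(parse_picture(body).media_type)  # image/png
```

The width, height, color depth, and color count are returned but, in line with the text, should not drive decoding or display decisions.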
The following table contains all the defined picture types. Values
other than those listed in the table are reserved. There MAY only be
one each of picture types 1 and 2 in a file. In general practice,
many FLAC playback devices and software display the contents of a
picture metadata block, if present, with picture type 3 (front cover)
during playback.
+=======+=================================================+
| Value | Picture Type |
+=======+=================================================+
| 0 | Other |
+-------+-------------------------------------------------+
| 1 | PNG file icon of 32x32 pixels (see [RFC2083]) |
+-------+-------------------------------------------------+
| 2 | General file icon |
+-------+-------------------------------------------------+
| 3 | Front cover |
+-------+-------------------------------------------------+
| 4 | Back cover |
+-------+-------------------------------------------------+
| 5 | Liner notes page |
+-------+-------------------------------------------------+
| 6 | Media label (e.g., CD, Vinyl or Cassette label) |
+-------+-------------------------------------------------+
| 18 | Illustration |
+-------+-------------------------------------------------+
| 19 | Band or artist logotype |
+-------+-------------------------------------------------+
| 20 | Publisher or studio logotype |
+-------+-------------------------------------------------+
Table 13
The origin and use of value 17 ("A bright colored fish") is unclear.
This was copied to maintain compatibility with ID3v2. Applications
are discouraged from offering this value to users when embedding a
picture.
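The picture type rules above can be sketched in Python as follows, for illustration only. The sketch treats values 0 to 20 as defined. It flags repeated types 1 and 2, leaves type 17 out of the choices offered when embedding, and picks the first front cover as many players do. All names are hypothetical.

```python
from typing import List, Optional

DEFINED_TYPES = range(0, 21)  # values 0-20 are defined; others are reserved
# Types offered to users when embedding; 17 is left out as discouraged.
EMBEDDABLE_TYPES = [t for t in DEFINED_TYPES if t != 17]


def check_picture_types(types: List[int]) -> List[str]:
    """Return problems with the picture types found in one file."""
    problems = []
    for t in types:
        if t not in DEFINED_TYPES:
            problems.append(f"picture type {t} is reserved")
    for single in (1, 2):  # at most one 32x32 PNG icon and one general icon
        if types.count(single) > 1:
            problems.append(f"more than one picture of type {single}")
    return problems


def pick_cover(types: List[int]) -> Optional[int]:
    """Index of the first front cover (type 3), or None if there is none."""
    return types.index(3) if 3 in types else None


print(check_picture_types([1, 1, 21]))
print(pick_cover([0, 4, 3]))  # 2
```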
If a URI (not a picture) is contained in this block, the following
points apply:
* The URI can be in either absolute or relative form. If a URI is
  in relative form, it is resolved relative to the URI of the FLAC
  content being processed.
* Applications MUST obtain explicit user approval to retrieve images
  via remote protocols and to retrieve local images that are not
  located in the same directory as the FLAC file being processed.
* Applications supporting linked images MUST handle unavailability
  of URIs gracefully. They MAY report unavailability to the user.
* Applications MAY reject processing URIs for any reason,
  particularly for security or privacy reasons.
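The following Python sketch illustrates resolving a relative picture URI and deciding whether user approval is needed. It is not a normative procedure. It assumes the FLAC file is itself identified by a `file:` URI and conservatively treats every other scheme as remote. It uses the standard `urllib.parse.urljoin`, which implements RFC 3986 reference resolution; the two function names are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def resolve_picture_uri(flac_uri: str, picture_uri: str) -> str:
    """Resolve a possibly relative picture URI against the FLAC file's URI."""
    return urljoin(flac_uri, picture_uri)


def needs_user_approval(flac_uri: str, resolved_uri: str) -> bool:
    """True for remote URIs and for local files outside the FLAC file's
    directory; this sketch treats every non-file scheme as remote."""
    target = urlparse(resolved_uri)
    if target.scheme != "file":
        return True
    base_dir = posixpath.dirname(urlparse(flac_uri).path)
    return posixpath.dirname(target.path) != base_dir


base = "file:///music/album/track.flac"
cover = resolve_picture_uri(base, "cover.jpg")  # file:///music/album/cover.jpg
print(needs_user_approval(base, cover))  # False
print(needs_user_approval(base, resolve_picture_uri(base, "../x/c.jpg")))  # True
```

Even when no approval is required, an application may still decline to fetch the URI and must cope with it being unavailable.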
9. Frame Structure
One or more frames follow directly after the last metadata block.
Each frame consists of a frame header, one or more subframes, padding
zero bits to achieve byte alignment, and a frame footer. The number
of subframes in each frame is equal to the number of audio channels.
Each frame header stores the audio sample rate, number of bits per
sample, and number of channels independently of the streaminfo
metadata block and other frame headers. This was done to permit
multicasting of FLAC files, but it also allows these properties to
change mid-stream. Because not all environments in which FLAC
decoders are used are able to cope with changes to these properties
during playback, a decoder MAY choose to stop decoding on such a
change. A decoder that does not check for such a change could be
vulnerable to buffer overflows. See also Section 11.
Note that storing audio with changing audio properties in FLAC
results in various practical problems. For example, these changes in
audio properties must happen on a frame boundary or the process will
not be lossless. When a variable block size is chosen to accommodate
this, note that blocks smaller than 16 samples are not allowed;
therefore, it is not possible to store an audio stream in which these
properties change within 16 samples of the last change or the start
of the file. Also, since the streaminfo metadata block can only
accommodate a single set of properties, it is only valid for part of
such an audio stream. Instead, it is RECOMMENDED to store an audio
stream with changing properties in FLAC encapsulated in a container
capable of handling such changes, as these do not suffer from the
mentioned limitations. See Section 10 for details.
9.1. Frame Header
Each frame MUST start on a byte boundary and start with the 15-bit
frame sync code 0b111111111111100. Following the sync code is the
blocking strategy bit, which MUST NOT change during the audio stream.
The blocking strategy bit is 0 for a fixed block size stream or 1 for
a variable block size stream. If the blocking strategy is known, a
decoder can include this bit when searching for the start of a frame
to reduce the possibility of encountering a false positive, as the
first two bytes of a frame are either 0xFFF8 for a fixed block size
stream or 0xFFF9 for a variable block size stream.
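The search described above can be sketched in Python as follows, for illustration only. With an unknown blocking strategy, only the 15-bit sync code is matched. With a known strategy, both full bytes are matched. The result is only a list of candidates that a decoder still has to confirm, for example by validating the rest of the frame header, which this sketch does not do. The function name is hypothetical.

```python
from typing import List, Optional


def find_sync_candidates(buf: bytes,
                         variable_block_size: Optional[bool] = None) -> List[int]:
    """Byte offsets where a frame may start."""
    hits = []
    for i in range(len(buf) - 1):
        if buf[i] != 0xFF:
            continue
        second = buf[i + 1]
        if variable_block_size is None:
            # Match 0b1111100 in the top 7 bits; ignore the strategy bit.
            match = (second & 0xFE) == 0xF8
        else:
            # Known strategy: require exactly 0xFFF8 or 0xFFF9.
            match = second == (0xF9 if variable_block_size else 0xF8)
        if match:
            hits.append(i)
    return hits


data = bytes([0x00, 0xFF, 0xF8, 0x69, 0xFF, 0xF9])
print(find_sync_candidates(data))         # [1, 4]
print(find_sync_candidates(data, False))  # [1]
```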
9.1.1. Block Size Bits
Following the frame sync code and blocking strategy bit are 4 bits
(the first 4 bits of the third byte of each frame) referred to as the
block size bits. Their value relates to the block size according to
the following table, where v is the value of the 4 bits as an
unsigned number. If the block size bits code for an uncommon block
size, this is stored after the coded number; see Section 9.1.6.
+=================+=============================================+
| Value | Block Size |
+=================+=============================================+
| 0b0000 | Reserved |
+-----------------+---------------------------------------------+
| 0b0001 | 192 |
+-----------------+---------------------------------------------+
| 0b0010 - 0b0101 | 144 * (2^v), i.e., 576, 1152, 2304, or 4608 |
+-----------------+---------------------------------------------+
| 0b0110 | uncommon block size minus 1, stored as an |
| | 8-bit number |
+-----------------+---------------------------------------------+
| 0b0111 | uncommon block size minus 1, stored as a |
| | 16-bit number |
+-----------------+---------------------------------------------+
| 0b1000 - 0b1111 | 2^v, i.e., 256, 512, 1024, 2048, 4096, |
| | 8192, 16384, or 32768 |
+-----------------+---------------------------------------------+
Table 14
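Table 14 can be decoded directly from v. The Python sketch below is illustrative only. The caller is assumed to have already read the 8-bit or 16-bit uncommon value that follows the coded number when v is 0b0110 or 0b0111. The function names and the example header byte are hypothetical.

```python
from typing import Optional


def block_size_from_bits(v: int, uncommon: Optional[int] = None) -> int:
    """Decode the 4 block size bits (Table 14) into a block size."""
    if not 0 <= v <= 0b1111:
        raise ValueError("block size bits are a 4-bit value")
    if v == 0b0000:
        raise ValueError("block size bits 0b0000 are reserved")
    if v == 0b0001:
        return 192
    if v <= 0b0101:
        return 144 * (1 << v)  # 576, 1152, 2304, or 4608
    if v in (0b0110, 0b0111):
        if uncommon is None:
            raise ValueError("uncommon block size value is missing")
        return uncommon + 1  # stored as block size minus 1
    return 1 << v  # 0b1000-0b1111: 256 up to 32768


def uncommon_block_size_length(v: int) -> int:
    """Bytes stored after the coded number for an uncommon block size."""
    return {0b0110: 1, 0b0111: 2}.get(v, 0)


third_byte = 0xC9  # hypothetical third byte of a frame header
print(block_size_from_bits(third_byte >> 4))  # 4096
```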
9.1.2. Sample Rate Bits
The next 4 bits (the last 4 bits of the third byte of each frame),
referred to as the sample rate bits, contain the sample rate of the
audio according to the following table. If the sample rate bits code
for an uncommon sample rate, this is stored after the uncommon block
size; if no uncommon block size was used, this is stored after the
coded number. See Section 9.1.7.
+========+====================================+
| Value | Sample Rate |
+========+====================================+
| 0b0000 | sample rate stored only in the |
| | streaminfo metadata block |
+--------+------------------------------------+
| 0b0001 | 88.2 kHz |
+--------+------------------------------------+
| 0b0010 | 176.4 kHz |
+--------+------------------------------------+
| 0b0011 | 192 kHz |
+--------+------------------------------------+
| 0b0100 | 8 kHz |
+--------+------------------------------------+
| 0b0101 | 16 kHz |
+--------+------------------------------------+
| 0b0110 | 22.05 kHz |
+--------+------------------------------------+
| 0b0111 | 24 kHz |
+--------+------------------------------------+
| 0b1000 | 32 kHz |
+--------+------------------------------------+
| 0b1001 | 44.1 kHz |
+--------+------------------------------------+
| 0b1010 | 48 kHz |
+--------+------------------------------------+
| 0b1011 | 96 kHz |
+--------+------------------------------------+
| 0b1100 | uncommon sample rate in kHz, |
| | stored as an 8-bit number |
+--------+------------------------------------+
| 0b1101 | uncommon sample rate in Hz, stored |
| | as a 16-bit number |
+--------+------------------------------------+
| 0b1110 | uncommon sample rate in Hz divided |
| | by 10, stored as a 16-bit number |
+--------+------------------------------------+
| 0b1111 | Forbidden |
+--------+------------------------------------+
Table 15
9.1.3. Channels Bits
The next 4 bits (the first 4 bits of the fourth byte of each frame),
referred to as the channels bits, contain both the number of channels
of the audio and any stereo decorrelation used, according to the
following table.
If a channel layout different from the ones listed in the following
table is used, this can be signaled with a
WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag in a Vorbis comment metadata
block; see Section 8.6.2 for details. Note that even when such a
different channel layout is specified with a
WAVEFORMATEXTENSIBLE_CHANNEL_MASK and the channel ordering in the
following table is overridden, the channels bits still contain the
actual number of channels coded in the frame. For details on the way
left/side, right/side, and mid/side stereo are coded, see
Section 4.2.
+==========+====================================================+
| Value    | Channels                                           |
+==========+====================================================+
+----------+----------------------------------------------------+
| 0b0101   | 6 channels: front left, front right, front center, |
|          | LFE, back/surround left, back/surround right       |
+----------+----------------------------------------------------+
| 0b0110   | 7 channels: front left, front right, front center, |
|          | LFE, back center, side left, side right            |
+----------+----------------------------------------------------+
| 0b0111   | 8 channels: front left, front right, front center, |
|          | LFE, back left, back right, side left, side right  |
+----------+----------------------------------------------------+
| 0b1000   | 2 channels: left, right; stored as left/side       |
|          | stereo                                             |
+----------+----------------------------------------------------+
| 0b1001   | 2 channels: left, right; stored as right/side      |
|          | stereo                                             |
+----------+----------------------------------------------------+
| 0b1010   | 2 channels: left, right; stored as mid/side stereo |
+----------+----------------------------------------------------+
| 0b1011 - | Reserved                                           |
| 0b1111   |                                                    |
+----------+----------------------------------------------------+
Table 16
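The following Python sketch shows how a decoder might interpret the channels bits. It is not normative. The function name `decode_channel_bits` and the mode labels are hypothetical. It also assumes that each value from 0b0000 through 0b0111 codes (value + 1) channels without stereo decorrelation, which matches the 6-, 7-, and 8-channel rows above.

```python
from __future__ import annotations

# Hypothetical labels for the stereo decorrelation modes of Table 16.
_STEREO_MODES = {0b1000: "left/side", 0b1001: "right/side", 0b1010: "mid/side"}


def decode_channel_bits(value: int) -> tuple[int, str]:
    """Return (number of channels, decorrelation mode) for the channels bits."""
    if not 0 <= value <= 0b1111:
        raise ValueError("the channels bits are a 4-bit value")
    if value <= 0b0111:
        # Assumption: 0b0000-0b0111 code value + 1 channels, no decorrelation.
        return value + 1, "independent"
    if value in _STEREO_MODES:
        # All three stereo decorrelation modes carry exactly 2 channels.
        return 2, _STEREO_MODES[value]
    raise ValueError(f"reserved channels bits value {value:#06b}")
```

Even with a WAVEFORMATEXTENSIBLE_CHANNEL_MASK present, the channel count returned here is the number of subframes to decode, because the channels bits always hold the actual number of coded channels.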
9.1.4. Bit Depth Bits
The next 3 bits (bits 5, 6, and 7 of the fourth byte of each frame)
contain the bit depth of the audio according to the following table.
The next bit is reserved and MUST be zero.
+=======+========================================================+
| Value | Bit Depth                                              |
+=======+========================================================+
| 0b000 | bit depth only stored in the streaminfo metadata block |
+-------+--------------------------------------------------------+
| 0b001 | 8 bits per sample                                      |
+-------+--------------------------------------------------------+
| 0b010 | 12 bits per sample                                     |
+-------+--------------------------------------------------------+
| 0b011 | Reserved                                               |
+-------+--------------------------------------------------------+
| 0b100 | 16 bits per sample                                     |
+-------+--------------------------------------------------------+
| 0b101 | 20 bits per sample                                     |
+-------+--------------------------------------------------------+
| 0b110 | 24 bits per sample                                     |
+-------+--------------------------------------------------------+
| 0b111 | 32 bits per sample                                     |
+-------+--------------------------------------------------------+
Table 17
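The sketch below takes the channels bits, the bit depth bits, and the reserved bit from the fourth header byte. It assumes bits are counted from the most-significant bit as bit 1. Under that reading, the bit depth bits are `(byte >> 1) & 0b111` and the reserved bit is the least-significant bit. `split_fourth_byte` and `BIT_DEPTHS` are hypothetical names. `None` stands for "bit depth only stored in the streaminfo metadata block".

```python
from __future__ import annotations

# Table 17; None means the bit depth is only stored in streaminfo.
BIT_DEPTHS = {0b000: None, 0b001: 8, 0b010: 12, 0b100: 16,
              0b101: 20, 0b110: 24, 0b111: 32}


def split_fourth_byte(byte: int) -> tuple[int, int | None]:
    """Return (channels bits, bit depth or None) from the fourth header byte."""
    channels_bits = byte >> 4           # bits 1-4: channels bits
    depth_code = (byte >> 1) & 0b111    # bits 5-7: bit depth bits
    if byte & 0b1:                      # bit 8: reserved, MUST be zero
        raise ValueError("reserved bit after the bit depth bits is set")
    if depth_code not in BIT_DEPTHS:    # 0b011 is reserved
        raise ValueError("reserved bit depth value 0b011")
    return channels_bits, BIT_DEPTHS[depth_code]
```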
9.1.5. Coded Number
Following the reserved bit (starting at the fifth byte of the frame)
is either a sample or a frame number, which will be referred to as
the coded number. When dealing with variable block size streams, the
sample number of the first sample in the frame is encoded. When the
file contains a fixed block size stream, the frame number is encoded.
See Section 9.1 on the blocking strategy bit, which signals whether a
stream is a fixed block size stream or a variable block size stream.
See also Appendix B.1.
The coded number is stored in a variable-length code similar to UTF-8
as defined in [RFC3629] but extended to a maximum of 36 bits
unencoded or 7 bytes encoded.
When a frame number is encoded, the value MUST NOT be larger than
what fits in 31 bits unencoded or 6 bytes encoded. Please note that,
as most general-purpose UTF-8 encoders and decoders follow [RFC3629],
they will not be able to handle these extended codes. Furthermore,
while UTF-8 is specifically used to encode characters, FLAC uses it
to encode numbers instead. To encode or decode a coded number, follow
the procedures in Section 3 of [RFC3629], but instead of using a
character number, use a frame or sample number. In addition, use the
extended table below instead of the table in Section 3 of [RFC3629].
+============================+=====================================+
| Number Range (Hexadecimal) | Octet Sequence (Binary)             |
+============================+=====================================+
| 0000 0000 0000 -           | 0xxxxxxx                            |
| 0000 0000 007F             |                                     |
+----------------------------+-------------------------------------+
| 0000 0000 0080 -           | 110xxxxx 10xxxxxx                   |
| 0000 0000 07FF             |                                     |
+----------------------------+-------------------------------------+
| 0000 0000 0800 -           | 1110xxxx 10xxxxxx 10xxxxxx          |
| 0000 0000 FFFF             |                                     |
+----------------------------+-------------------------------------+
+----------------------------+-------------------------------------+
Table 18
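The following Python sketch decodes this extended scheme. It is illustrative only, and `decode_coded_number` is a hypothetical name. For brevity it does not reject overlong encodings or enforce the 31-bit limit on frame numbers. Those checks are left to the caller.

```python
from __future__ import annotations


def decode_coded_number(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a coded number at data[pos]; return (value, bytes consumed)."""
    first = data[pos]
    if first < 0x80:                      # 0xxxxxxx: one byte, 7 payload bits
        return first, 1
    # The number of leading 1 bits in the first byte is the total length.
    length = 0
    mask = 0x80
    while first & mask:
        length += 1
        mask >>= 1
    if length == 1 or length > 7:         # 10xxxxxx or 0xFF cannot start a code
        raise ValueError("invalid first byte of coded number")
    value = first & (mask - 1)            # payload after the 0 bit (none for 7 bytes)
    for i in range(1, length):
        byte = data[pos + i]
        if (byte & 0xC0) != 0x80:         # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte in coded number")
        value = (value << 6) | (byte & 0x3F)
    return value, length
```

The returned length lets a header parser advance to the uncommon block size, uncommon sample rate, or CRC that follows.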
If the coded number is a frame number, it MUST be equal to the number
of frames preceding the current frame. If the coded number is a
sample number, it MUST be equal to the number of samples preceding
the current frame. In a stream where these requirements are not met,
seeking is not (reliably) possible.
For example, for a frame that belongs to a variable block size stream
and has exactly 51 billion (0xBDFD63E00) samples preceding it, the
coded number is constructed as follows:
Octets 1-5
0b11111110 0b10101111 0b10011111 0b10110101 0b10100011
               ^^^^^^     ^^^^^^     ^^^^^^     ^^^^^^
               |          |          |          Bits 18-13
               |          |          Bits 24-19
               |          Bits 30-25
               Bits 36-31
Octets 6-7
0b10111000 0b10000000
    ^^^^^^     ^^^^^^
    |          Bits 6-1
    Bits 12-7
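The example above can be reproduced with a small encoder. In the sketch below, `encode_coded_number` is a hypothetical name. An n-byte code carries 5 * n + 1 payload bits, which gives the 31-bit limit for 6 bytes and the 36-bit limit for 7 bytes stated earlier.

```python
def encode_coded_number(value: int) -> bytes:
    """Encode value as a coded number (UTF-8-like, extended to 36 bits)."""
    if not 0 <= value < 1 << 36:
        raise ValueError("a coded number holds at most 36 bits")
    if value < 0x80:
        return bytes([value])
    # An n-byte code carries 5 * n + 1 payload bits: 7 - n in the first
    # byte plus 6 in each of the n - 1 continuation bytes.
    length = 2
    while value >= 1 << (5 * length + 1):
        length += 1
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))  # 10xxxxxx, lowest 6-bit group first
        value >>= 6
    lead = (0xFF << (8 - length)) & 0xFF    # `length` leading 1 bits, then a 0 bit
    return bytes([lead | value] + tail[::-1])
```

For 51 billion, this yields the seven octets shown above, starting with 0b11111110. That first octet carries no payload bits.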
A decoder that relies on the coded number during seeking could be
vulnerable to buffer overflows or getting stuck in an infinite loop
if it seeks in a stream where the coded numbers are not strictly
increasing or are otherwise not valid. See also Section 11.
9.1.6. Uncommon Block Size
If the block size bits defined earlier in this section are 0b0110 or
0b0111 (uncommon block size minus 1 stored), the block size minus 1
follows the coded number as an 8-bit (0b0110) or a 16-bit (0b0111)
unsigned number coded big-endian. A value of 65535 (corresponding to
a block size of 65536) is forbidden and MUST NOT be used, because
such a block size cannot be represented in the streaminfo metadata
block. A value from 0 up to (and including) 14, which corresponds to
a block size from 1 to 15, is only valid for the last frame in a
stream and MUST NOT be used for any other frame. See also
Section 8.2.
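A minimal Python sketch of reading this field follows. It assumes that 0b0110 selects the 8-bit form and 0b0111 the 16-bit form. It also assumes that `pos` is the offset of the first byte after the coded number. `read_uncommon_block_size` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations


def read_uncommon_block_size(block_size_bits: int, data: bytes, pos: int,
                             is_last_frame: bool) -> tuple[int, int]:
    """Return (block size, position after the field)."""
    if block_size_bits == 0b0110:
        width = 1                        # assumed: 8-bit field
    elif block_size_bits == 0b0111:
        width = 2                        # assumed: 16-bit field
    else:
        raise ValueError("these block size bits store no uncommon block size")
    if pos + width > len(data):
        raise ValueError("frame header is truncated")
    stored = int.from_bytes(data[pos:pos + width], "big")  # block size minus 1
    if stored == 65535:
        raise ValueError("block size 65536 is forbidden")
    block_size = stored + 1
    if block_size < 16 and not is_last_frame:
        raise ValueError("block sizes 1 to 15 are only valid in the last frame")
    return block_size, pos + width
```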
9.1.7. Uncommon Sample Rate
If the sample rate bits are 0b1100, 0b1101, or 0b1110 (uncommon
sample rate stored), the sample rate follows the uncommon block size
(or the coded number if no uncommon block size is stored) as an 8-bit
(0b1100) or a 16-bit (0b1101 and 0b1110) unsigned number coded
big-endian.
The sample rate MUST NOT be 0 when the subframe contains audio. A
sample rate of 0 MAY be used when non-audio is represented. See
Section 8.2 for details.
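The sketch below combines Table 15 with the rules above to obtain the sample rate in Hz. The name `resolve_sample_rate` is hypothetical. `None` stands for "sample rate only stored in the streaminfo metadata block". `pos` is assumed to point just past the coded number and any uncommon block size.

```python
from __future__ import annotations

# Table 15 codes that need no extra header bytes (rates in Hz);
# None means the sample rate is only stored in streaminfo.
COMMON_RATES = {
    0b0000: None, 0b0001: 88200, 0b0010: 176400, 0b0011: 192000,
    0b0100: 8000, 0b0101: 16000, 0b0110: 22050, 0b0111: 24000,
    0b1000: 32000, 0b1001: 44100, 0b1010: 48000, 0b1011: 96000,
}


def resolve_sample_rate(code: int, data: bytes, pos: int) -> tuple[int | None, int]:
    """Return (sample rate in Hz or None, position after any stored rate)."""
    if code in COMMON_RATES:
        return COMMON_RATES[code], pos
    if code == 0b1100:                          # 8-bit value in kHz
        return data[pos] * 1000, pos + 1
    if code in (0b1101, 0b1110):                # 16-bit value, big-endian
        value = int.from_bytes(data[pos:pos + 2], "big")
        return (value if code == 0b1101 else value * 10), pos + 2  # Hz or tens of Hz
    raise ValueError(f"invalid sample rate bits {code:#06b}")
```

For example, 44.1 kHz can be stored as 0xAC44 with code 0b1101 or as 0x113A (4410) with code 0b1110.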
9.1.8. Frame Header CRC
Finally, an 8-bit CRC follows the last field stored before it: the
coded number, the uncommon block size, or the uncommon sample rate,
depending on whether the latter two are present. This CRC is
initialized with 0 and has the polynomial x^8 + x^2 + x^1 + x^0. It
covers the whole frame header before the CRC, including the sync
code.
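The CRC above can be computed bit by bit as sketched below. The sketch assumes the conventional reading of this definition: polynomial 0x07, initial value 0, bits processed most-significant first, no reflection, and no final XOR. A practical decoder would more likely use a 256-entry lookup table. The loop here only makes the definition concrete, and `crc8` is a hypothetical name.

```python
def crc8(data: bytes) -> int:
    """CRC-8, polynomial x^8 + x^2 + x^1 + x^0 (0x07), initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:                      # top bit set: shift and reduce
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc
```

A decoder would compare `crc8(header_bytes)` with the stored CRC byte. Here `header_bytes` is every header byte from the sync code up to, but not including, the CRC. For this CRC variant, `crc8(b"123456789")` is 0xF4. Appending the CRC byte to the covered data makes `crc8` return 0.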
9.2. Subframes
Following the frame header are a number of subframes equal to the
number of audio channels. Note that subframes contain a bitstream
that does not necessarily have to be a whole number of bytes, so only
the first subframe is guaranteed to start at a byte boundary.
9.2.1. Subframe Header
Each subframe starts with a header. The first bit of the header MUST
be 0, followed by 6 bits that describe which subframe type is used
according to the following table, where v is the value of the 6 bits
as an unsigned number.
+=====================+===========================================+
| Value               | Subframe Type                             |
+=====================+===========================================+
| 0b000000            | Constant subframe                         |
+---------------------+-------------------------------------------+
| 0b000001            | Verbatim subframe                         |
+---------------------+-------------------------------------------+
| 0b000010 - 0b000111 | Reserved                                  |
+---------------------+-------------------------------------------+
| 0b001000 - 0b001100 | Subframe with a fixed predictor of order  |
|                     | v-8, i.e., 0, 1, 2, 3, or 4               |
+---------------------+-------------------------------------------+
| 0b001101 - 0b011111 | Reserved                                  |
+---------------------+-------------------------------------------+
| 0b100000 - 0b111111 | Subframe with a linear predictor of order |
|                     | v-31, i.e., 1 through 32 (inclusive)      |
+---------------------+-------------------------------------------+
Table 19
Following the subframe type bits is a bit that flags whether the
subframe uses any wasted bits (see Section 9.2.2). If it is 0, the
subframe doesn't use any wasted bits, and the subframe header is
complete. If it is 1, the subframe does use wasted bits, and the
number of wasted bits used follows, unary coded.
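The header layout can be illustrated with a small Python sketch. It reads the zero bit, the 6 subframe type bits, and the wasted bits flag. `BitReader` and `read_subframe_header` are hypothetical helpers, and bits are read most-significant first. The unary form of the wasted bits count (k - 1 zero bits followed by a 1 bit) is taken from Section 9.2.2.

```python
from __future__ import annotations


class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper, not a library API)."""

    def __init__(self, data: bytes, bit_pos: int = 0) -> None:
        self.data = data
        self.bit_pos = bit_pos

    def read_bit(self) -> int:
        byte = self.data[self.bit_pos // 8]
        shift = 7 - self.bit_pos % 8            # most-significant bit first
        self.bit_pos += 1
        return (byte >> shift) & 1

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_subframe_header(reader: BitReader) -> tuple[str, int, int]:
    """Return (subframe type, predictor order, wasted bits k)."""
    if reader.read_bit() != 0:
        raise ValueError("the first bit of a subframe header must be 0")
    v = reader.read_bits(6)
    if v == 0b000000:
        kind, order = "constant", 0
    elif v == 0b000001:
        kind, order = "verbatim", 0
    elif 0b001000 <= v <= 0b001100:
        kind, order = "fixed", v - 8
    elif v >= 0b100000:
        kind, order = "lpc", v - 31
    else:
        raise ValueError(f"reserved subframe type {v:#08b}")
    k = 0
    if reader.read_bit():                        # wasted bits flag is 1
        k = 1
        while reader.read_bit() == 0:            # k - 1 zeros, ended by a 1
            k += 1
    return kind, order, k
```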
9.2.2. Wasted Bits per Sample
Most uncompressed audio file formats can only store audio samples
with a bit depth that is an integer number of bytes. Samples in
which the bit depth is not an integer number of bytes are usually
stored in such formats by padding them with least-significant zero
bits to a bit depth that is an integer number of bytes. For example,
shifting a 14-bit sample left by 2 pads it to a 16-bit sample, which
then has two zero least-significant bits. In this specification,
these least-significant zero bits are referred to as wasted bits per
sample or simply wasted bits. They are wasted in the sense that they
contain no information but are stored anyway.
The FLAC format can optionally take advantage of these wasted bits by
signaling their presence and coding the subframe without them. To do
this, the wasted bits per sample flag in a subframe header is set to
1, and the number of wasted bits per sample (k) minus 1 follows the
flag in a unary encoding. For example, if k is 3, 0b001 follows.
If k = 0, the wasted bits per sample flag is 0, and no unary-coded k
follows. In this document, if a subframe header signals a certain
number of wasted bits, it is said to "use" these wasted bits.
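On the encoder side, the flag and the unary-coded k - 1 can be produced as sketched below. The helper name `wasted_bits_field` is hypothetical. Bits are returned as a string of '0' and '1' characters only for readability.

```python
def wasted_bits_field(k: int) -> str:
    """Return the wasted bits flag followed by the unary-coded k - 1."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return "0"                       # flag 0, nothing follows
    return "1" + "0" * (k - 1) + "1"     # flag 1, then k - 1 zeros ended by a 1
```

For k = 3, this returns "1001": the flag bit followed by the 0b001 from the example above.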
If a subframe uses wasted bits (i.e., k is not equal to 0), samples
are coded ignoring k least-significant bits. For example, if a frame
not employing stereo decorrelation specifies a sample size of 16 bits
per sample in the frame header and k of a subframe is 3, samples in
the subframe are coded as 13 bits per sample. For more details, see
Section 9.2.3 on how the bit depth of a subframe is calculated. A
decoder MUST add k least-significant zero bits by shifting left
(padding) after decoding a subframe sample. If the frame has
left/side, right/side, or mid/side stereo, a decoder MUST perform
padding on the subframes before restoring the channels to left and
right.
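The required order of operations can be shown with a left/side frame. The sketch assumes, per Section 4.2, that the second subframe holds side = left - right. `restore_left_side` is a hypothetical helper. Padding each subframe by its own k before computing right = left - side keeps the result correct even when the two subframes use different numbers of wasted bits.

```python
from __future__ import annotations


def restore_left_side(left_sub: list[int], side_sub: list[int],
                      k_left: int, k_side: int) -> tuple[list[int], list[int]]:
    """Return (left, right) from decoded left/side subframe samples."""
    # Step 1: padding, i.e., shift each subframe left by its own wasted bits.
    left = [s << k_left for s in left_sub]
    side = [s << k_side for s in side_sub]
    # Step 2: only then undo the decorrelation (assumes side = left - right).
    right = [l - s for l, s in zip(left, side)]
    return left, right
```

Consider left = [8, 16] and right = [4, 4], which gives side = [4, 12]. An encoder could use k = 3 for the left subframe and k = 2 for the side subframe, storing [1, 2] and [1, 3]. Restoring before padding would compute 1 - 1 and 2 - 3 instead, which is wrong.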
The number of wasted bits per sample MUST be such that the resulting
number of bits per sample (whose calculation is explained in
Section 9.2.3) is larger than zero.
Besides audio files that have a certain number of wasted bits for the
whole file, audio files exist in which the number of wasted bits
varies. There are DVD-Audio discs in which blocks of samples have
had their least-significant bits selectively zeroed to slightly
improve the compression of their otherwise lossless Meridian Lossless
Packing codec; see [MLP]. There are also audio processors like
lossyWAV (see [lossyWAV]) that zero a number of least-significant
bits for a block of samples, increasing the compression in a
non-lossless way. Because of this, the number of wasted bits k MAY
change between frames and MAY differ between subframes. If the
number of wasted bits changes halfway through a subframe (e.g., the
first part has 2 wasted bits and the second part has 4 wasted bits),
the subframe uses the lowest number of wasted bits; otherwise,
non-zero bits would be discarded, and the process would not be
lossless.
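An encoder can find the number of wasted bits shared by all samples of a block by OR-ing the samples together and counting trailing zero bits. This automatically yields the lowest count when it varies within the block. The sketch below is illustrative, and `wasted_bits` is a hypothetical name. It gives an all-zero block k = 0, since such a block would more likely be coded as a constant subframe. It also caps k so that at least 1 bit per sample remains.

```python
def wasted_bits(samples: list[int], bits_per_sample: int) -> int:
    """Largest k such that every sample has k least-significant zero bits."""
    combined = 0
    for s in samples:
        combined |= s             # a bit is set here if it is set in any sample
    if combined == 0:
        return 0                  # all-zero block: choice made by this sketch
    k = (combined & -combined).bit_length() - 1   # count of trailing zero bits
    return min(k, bits_per_sample - 1)
```

Python integers behave as two's complement under `|` and `&`, so negative samples are handled without special cases.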
9.2.3. Constant Subframe
In a constant subframe, only a single sample is stored. This sample
is stored as an integer number coded big-endian, signed two's
complement. The number of bits used to store this sample depends on
the bit depth of the current subframe. The bit depth of a subframe
is equal to the bit depth as coded in the frame header (see
Section 9.1.4) minus the number of used wasted bits coded in the
subframe header (see Section 9.2.2). If a subframe is a side
subframe (see Section 4.2), the bit depth of that subframe is
increased by 1 bit.
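The bit depth rule above, together with reading the stored sample as signed two's complement, can be sketched as follows. The helper names are hypothetical. `raw` is assumed to be the stored sample already read as an unsigned field of the subframe's bit depth.

```python
from __future__ import annotations


def subframe_bit_depth(frame_bit_depth: int, wasted: int, is_side: bool) -> int:
    """Bit depth used to store samples in a subframe."""
    depth = frame_bit_depth - wasted + (1 if is_side else 0)
    if depth <= 0:
        raise ValueError("wasted bits leave no bits per sample")
    return depth


def sign_extend(raw: int, bits: int) -> int:
    """Interpret an unsigned bits-wide value as signed two's complement."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def decode_constant_subframe(raw: int, frame_bit_depth: int, wasted: int,
                             is_side: bool, block_size: int) -> list[int]:
    """Expand a constant subframe and restore its wasted bits."""
    depth = subframe_bit_depth(frame_bit_depth, wasted, is_side)
    sample = sign_extend(raw, depth) << wasted   # padding with k zero bits
    return [sample] * block_size
```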
9.2.4. Verbatim Subframe
A verbatim subframe stores all samples unencoded in sequential order.
See Section 9.2.3 on how a sample is stored unencoded. The number of
samples that need to be stored in a subframe is given by the block
size in the frame header.
9.2.5. Fixed Predictor Subframe
Five different fixed predictors are defined in the following table,
one for each prediction order 0 through 4. The table also contains a
derivation that explains the rationale for choosing these fixed
predictors.
+=======+==================================+======================+
| Order | Prediction                       | Derivation           |
+=======+==================================+======================+
| 0     | 0                                | N/A                  |
+-------+----------------------------------+----------------------+
| 1     | a(n-1)                           | N/A                  |
+-------+----------------------------------+----------------------+
| 2     | 2 * a(n-1) - a(n-2)              | a(n-1) + a'(n-1)     |
+-------+----------------------------------+----------------------+
| 3     | 3 * a(n-1) - 3 * a(n-2) + a(n-3) | a(n-1) + a'(n-1) +   |
|       |                                  | a''(n-1)             |
+-------+----------------------------------+----------------------+
| 4     | 4 * a(n-1) - 6 * a(n-2) + 4 *    | a(n-1) + a'(n-1) +   |
|       | a(n-3) - a(n-4)                  | a''(n-1) + a'''(n-1) |
+-------+----------------------------------+----------------------+
Table 20
Where:
* n is the number of the sample being predicted.
* a(n) is the sample being predicted.
* a(n-1) is the sample before the one being predicted.
* a'(n-1) is the difference between the previous sample and the
  sample before that, i.e., a(n-1) - a(n-2). This is the closest
  available first-order discrete derivative.
* a''(n-1) is a'(n-1) - a'(n-2), or the closest available
  second-order discrete derivative.
* a'''(n-1) is a''(n-1) - a''(n-2), or the closest available
  third-order discrete derivative.
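The equivalence of the Prediction and Derivation columns of Table 20 can be checked numerically. The sketch below computes each prediction two ways: from the closed-form coefficients, and by summing backward differences as in the Derivation column. Both function names are hypothetical. A consequence of the derivation is that, for orders 1 through 4, a fixed predictor of order p predicts exactly any sequence that is a polynomial of degree less than p.

```python
from __future__ import annotations

# Closed-form coefficients of a(n-1), a(n-2), ... from the Prediction column.
FIXED_COEFFS = {0: (), 1: (1,), 2: (2, -1), 3: (3, -3, 1), 4: (4, -6, 4, -1)}


def predict_closed_form(order: int, history: list[int]) -> int:
    """Prediction column: history[-1] is a(n-1), history[-2] is a(n-2), ..."""
    return sum(c * history[-1 - i] for i, c in enumerate(FIXED_COEFFS[order]))


def predict_by_differences(order: int, history: list[int]) -> int:
    """Derivation column: a(n-1) + a'(n-1) + a''(n-1) + ... (order terms)."""
    if order not in FIXED_COEFFS or len(history) < order:
        raise ValueError("need an order of 0-4 and at least `order` samples")
    diffs = list(history[len(history) - order:])   # a(n-order) .. a(n-1)
    total = 0
    for _ in range(order):
        total += diffs[-1]                           # newest value of this difference
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return total
```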
As a predictor makes use of samples preceding the sample that is
predicted, it can only be used when enough samples are known. As
each subframe in FLAC is coded completely independently, the first
few samples in each subframe cannot be predicted. Therefore, a
number of so-called warm-up samples equal to the predictor order is
stored. These are stored unencoded, bypassing the predictor and
residual coding stages. See Section 9.2.3 on how samples are stored
+==========+===========================================+
| s(n)     | Unencoded warm-up samples (n = subframe's |
|          | bits per sample * predictor order).       |
+----------+-------------------------------------------+
| Coded    | Coded residual as defined in              |
| residual | Section 9.2.7                             |
+----------+-------------------------------------------+
Table 21
Because fixed predictors are specified, they do not have to be
stored. The fixed predictor order, which is stored in the subframe
header, specifies which predictor is used.
To encode a signal with a fixed predictor, each sample has the
corresponding prediction subtracted and sent to the residual coder.
To decode a signal with a fixed predictor, the residual is decoded,
and then the prediction can be added for each sample. This means
that decoding is necessarily a sequential process within a subframe,
as for each sample, enough fully decoded previous samples are needed
to calculate the prediction.
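The encode/decode symmetry described above can be illustrated with a
minimal Python sketch. It assumes the customary FLAC fixed predictors
of orders 0 to 4 (prediction 0, a(n-1), 2a(n-1) - a(n-2),
3a(n-1) - 3a(n-2) + a(n-3), and 4a(n-1) - 6a(n-2) + 4a(n-3) - a(n-4)),
which follow from the discrete derivatives listed earlier. The
function names are hypothetical, and the residual is kept as plain
integers rather than being passed on to a residual coder.

```python
# Coefficients applied to a(n-1), a(n-2), ... for each fixed predictor
# order (assumed: the usual FLAC fixed predictors of orders 0 to 4).
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_prediction(samples, n, order):
    """Predict samples[n] from the `order` samples directly before it."""
    return sum(c * samples[n - 1 - i]
               for i, c in enumerate(FIXED_COEFFS[order]))


def fixed_encode(samples, order):
    """Return (warm-up samples, residual) for a fixed predictor."""
    warmup = samples[:order]
    residual = [samples[n] - fixed_prediction(samples, n, order)
                for n in range(order, len(samples))]
    return warmup, residual


def fixed_decode(warmup, residual, order):
    """Rebuild samples one by one; each prediction needs decoded samples."""
    samples = list(warmup)
    for r in residual:
        samples.append(r + fixed_prediction(samples, len(samples), order))
    return samples
```

For example, `fixed_encode([10, 12, 14, 16, 18], 2)` yields warm-up
samples `[10, 12]` and residual `[0, 0, 0]`, because a linear ramp is
predicted exactly by the second-order predictor.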
For fixed predictor order 0, the prediction is always 0; thus, each
residual sample is equal to its corresponding input or decoded
sample. The difference between a fixed predictor with order 0 and a
verbatim subframe is that a verbatim subframe stores all samples
unencoded while a fixed predictor with order 0 has all its samples
processed by the residual coder.
The first-order fixed predictor is comparable to how differential
pulse-code modulation (DPCM) encoding works, as the resulting
residual sample is the difference between the corresponding sample
and the sample before it. The higher-order fixed predictors can be
understood as polynomials fitted to the previous samples.
9.2.6. Linear Predictor Subframe
Whereas fixed predictors are well suited for simple signals, using a
(non-fixed) linear predictor on more complex signals can improve
compression by making the residual samples even smaller. There is a
certain trade-off, however, as storing the predictor coefficients
takes up space as well.
In the FLAC format, a predictor is defined by up to 32 predictor
coefficients and a shift. To form a prediction, each coefficient is
multiplied by its corresponding past sample, the results are summed,
and this sum is then shifted. To encode a signal with a linear
predictor, each sample has the corresponding prediction subtracted
and sent to the residual coder. To decode a signal with a linear
predictor, the residual is decoded, and then the prediction can be
added for each sample. This means that decoding MUST be a sequential
process within a subframe, as enough decoded samples are needed to
calculate the prediction for each sample.
The table below defines how a linear predictor subframe appears in
the bitstream.
+==========+==========================================+
| Data     | Description                              |
+==========+==========================================+
| s(n)     | Unencoded warm-up samples (n =           |
|          | subframe's bits per sample * LPC order). |
+----------+------------------------------------------+
| u(4)     | (Predictor coefficient precision in      |
|          | bits)-1 (Note: 0b1111 is forbidden).     |
+----------+------------------------------------------+
| s(5)     | Prediction right shift needed in bits.   |
+----------+------------------------------------------+
| s(n)     | Predictor coefficients (n = predictor    |
|          | coefficient precision * LPC order).      |
+----------+------------------------------------------+
| Coded    | Coded residual as defined in             |
| residual | Section 9.2.7.                           |
+----------+------------------------------------------+
Table 22
See Section 9.2.3 on how the warm-up samples are stored unencoded.
The predictor coefficients are stored as an integer number coded
big-endian, signed two's complement, where the number of bits needed
for each coefficient is defined by the predictor coefficient
precision. While the prediction right shift is signed two's
complement, this number MUST NOT be negative; see Appendix B.4 for an
explanation of why this is.
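The fields that follow the warm-up samples in Table 22 can be parsed
as sketched below. For readability, the sketch takes the bitstream as
a Python string of '0' and '1' characters rather than real bytes, and
the function names are hypothetical. It rejects the forbidden
precision code 0b1111 and a negative prediction right shift.

```python
def to_signed(value, bits):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    if bits and value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def parse_lpc_params(bits, order):
    """Parse precision, shift, and coefficients from a '0'/'1' string.

    `bits` is assumed to start right after the warm-up samples.
    Returns (precision, shift, coefficients, number of bits consumed).
    """
    pos = 0

    def read(width):
        nonlocal pos
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    raw_precision = read(4)                 # u(4): precision minus 1
    if raw_precision == 0b1111:
        raise ValueError("precision code 0b1111 is forbidden")
    precision = raw_precision + 1
    shift = to_signed(read(5), 5)           # s(5): prediction right shift
    if shift < 0:
        raise ValueError("prediction right shift must not be negative")
    # s(precision) per coefficient; the first one belongs to a(n-1).
    coeffs = [to_signed(read(precision), precision) for _ in range(order)]
    return precision, shift, coeffs, pos
```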
Please note that the order in which the predictor coefficients appear
in the bitstream corresponds to which *past* sample they belong to.
In other words, the order of the predictor coefficients is opposite
to the chronological order of the samples. So, the first predictor
coefficient has to be multiplied with the sample directly before the
sample that is being predicted, the second predictor coefficient has
to be multiplied with the sample before that, etc.
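The coefficient ordering rule above, together with the
multiply-sum-shift prediction described earlier in this section, can
be sketched in Python as follows. This is an illustration, not a
reference decoder: the shift is assumed to be an arithmetic right
shift (Python's >> on negative integers rounds toward negative
infinity), and because Python integers are unbounded, the sketch
ignores the intermediate-width concerns a C implementation has to
handle. Function names are hypothetical.

```python
def lpc_predict(samples, n, coeffs, shift):
    """Prediction for samples[n]; coeffs[0] belongs to samples[n - 1]."""
    total = sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
    return total >> shift  # arithmetic right shift (assumed)


def lpc_encode(samples, coeffs, shift):
    """Residual for all samples after the len(coeffs) warm-up samples."""
    order = len(coeffs)
    return [samples[n] - lpc_predict(samples, n, coeffs, shift)
            for n in range(order, len(samples))]


def lpc_decode(warmup, residual, coeffs, shift):
    """Restore samples sequentially from warm-up samples and residual."""
    samples = list(warmup)
    for r in residual:
        samples.append(r + lpc_predict(samples, len(samples), coeffs, shift))
    return samples
```

With coefficients [3, -1] and shift 1, the prediction for the sample
after 10, 20 is (3 * 20 - 1 * 10) >> 1 = 25: the first coefficient
multiplies the most recent sample.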
9.2.7. Coded Residual
The first two bits in a coded residual indicate which coding method
is used. See the table below.
+=============+=============================================+
| Value       | Description                                 |
+=============+=============================================+
| 0b00        | Partitioned Rice code with 4-bit parameters |
+-------------+---------------------------------------------+
| 0b01        | Partitioned Rice code with 5-bit parameters |
+-------------+---------------------------------------------+
| 0b10 - 0b11 | Reserved                                    |
+-------------+---------------------------------------------+
Table 23
Both defined coding methods work the same way but differ in the
number of bits used for Rice parameters. The 4 bits that directly
follow the coding method bits form the partition order, which is an
unsigned number. The rest of the coded residual consists of
2^(partition order) partitions. For example, if the 4 bits are
0b1000, the partition order is 8, and the residual is split up into
2^8 = 256 partitions.
Each partition contains a certain number of residual samples. The
number of residual samples in the first partition is equal to (block
size >> partition order) - predictor order, i.e., the block size
divided by the number of partitions minus the predictor order. In
all other partitions, the number of residual samples is equal to
(block size >> partition order).
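The partition sizes described above can be computed with a short
Python helper (hypothetical name). It simply applies the two
formulas, subtracting the predictor order from the first partition
only.

```python
def partition_sample_counts(block_size, partition_order, predictor_order):
    """Residual samples in each of the 2**partition_order partitions."""
    per_partition = block_size >> partition_order
    counts = [per_partition] * (1 << partition_order)
    # The warm-up samples have no residual, so only the first
    # partition is shortened by the predictor order.
    counts[0] -= predictor_order
    return counts
```

With a block size of 4096, partition order 8, and predictor order 4,
this yields 256 partitions: the first holds 12 residual samples and
each of the others holds 16, for a total of 4092, i.e., one residual
sample per sample that is not a warm-up sample.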
The partition order MUST be such that the block size is evenly
divisible by the number of partitions. This means, for example, that
only partition order 0 is allowed for all odd block sizes. The
partition order also MUST be such that the (block size >> partition
order) is larger than the predictor order. This means, for example,
that with a block size of 4096 and a predictor order of 4, the
partition order cannot be larger than 9.
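Both MUST rules above can be checked mechanically. The following
Python sketch (hypothetical helper names) tests a partition order
against them and searches for the largest permitted order; the search
stops at 15 because the partition order is a 4-bit field.

```python
def partition_order_is_valid(block_size, partition_order, predictor_order):
    """Apply both constraints on the partition order."""
    evenly_divisible = block_size % (1 << partition_order) == 0
    long_enough = (block_size >> partition_order) > predictor_order
    return evenly_divisible and long_enough


def max_partition_order(block_size, predictor_order):
    """Largest valid partition order (0..15), or None if none is valid."""
    best = None
    for order in range(16):  # 4-bit field
        if partition_order_is_valid(block_size, order, predictor_order):
            best = order
    return best
```

For instance, `max_partition_order(4096, 4)` returns 9, matching the
example above, and any odd block size only admits partition order 0.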
Each partition starts with a parameter. If the coded residual of a
subframe is one with 4-bit Rice parameters (see Table 23), the first
4 bits of each partition are either a Rice parameter or an escape
code. These 4 bits indicate an escape code if they are 0b1111;
otherwise, they contain the Rice parameter as an unsigned number. If
the coded residual of the current subframe is one with 5-bit Rice
parameters, the first 5 bits of each partition indicate an escape
code if they are 0b11111; otherwise, they contain the Rice parameter
as an unsigned number as well.
Escaped Partition
If an escape code was used, the partition does not contain a
variable-length Rice-coded residual; rather, it contains a
fixed-length unencoded residual. Directly following the escape code
are 5 bits containing the number of bits with which each residual
sample is stored, as an unsigned number. The residual samples
themselves are stored signed two's complement. For example, when a
partition is escaped and each residual sample is stored with 3 bits,
the number -1 is represented as 0b111.
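Reading an escaped partition can be sketched as follows, again
treating the bitstream as a Python string of '0' and '1' characters
for readability. The helper names are hypothetical, and the caller is
assumed to already know the number of residual samples in the
partition, computed as described earlier in this section.

```python
def read_unsigned(bits, pos, width):
    """Read `width` bits from a '0'/'1' string; return (value, new pos)."""
    if width == 0:
        return 0, pos
    return int(bits[pos:pos + width], 2), pos + width


def read_escaped_partition(bits, param_bits, sample_count):
    """Decode a partition that starts with an escape code.

    `param_bits` is 4 or 5 depending on the coding method.
    Returns (residual samples, number of bits consumed).
    """
    escape_code = (1 << param_bits) - 1  # 0b1111 or 0b11111
    param, pos = read_unsigned(bits, 0, param_bits)
    if param != escape_code:
        raise ValueError("partition is Rice coded, not escaped")
    width, pos = read_unsigned(bits, pos, 5)  # bits per residual sample
    residual = []
    for _ in range(sample_count):
        raw, pos = read_unsigned(bits, pos, width)
        if width and raw & (1 << (width - 1)):  # two's complement sign
            raw -= 1 << width
        residual.append(raw)
    return residual, pos
```

With 4-bit parameters, the bits 1111 00011 111 001 100 decode to the
residual samples -1, 1, and -4; a width of 0 yields all-zero samples
without consuming further bits.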
Note that it is possible that the number of bits with which each
sample is stored is 0, which means that all residual samples in that
partition have a value of 0 and that no bits are used to store the
samples. In that case, the partition contains nothing except the
escape code and 0b00000.
Rice Code
If a Rice parameter was provided for a certain partition, that
partition contains a Rice-coded residual. The residual samples,
which are signed numbers, are represented by unsigned numbers in the
Rice code. For zero and positive numbers, the representation is the
number doubled. For negative numbers, the representation is the
number multiplied by -2, minus 1. This representation of signed
numbers is also known as zigzag encoding. The zigzag-encoded
residual is called the folded residual.
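The folding (zigzag) mapping and its inverse fit in a few lines of
Python. This is an illustrative sketch with hypothetical names; zero
is treated like a positive number, so it folds to 0.

```python
def fold(residual):
    """Zigzag-map a signed residual to the unsigned Rice-coded value."""
    return 2 * residual if residual >= 0 else -2 * residual - 1


def unfold(folded):
    """Inverse of fold(): even values halve, odd values halve and flip."""
    return folded >> 1 if folded % 2 == 0 else ~(folded >> 1)
```

Residuals 0, -1, 1, -2, 2 fold to 0, 1, 2, 3, 4. In `unfold`, the ~
operator (bitwise NOT) performs the "flip all bits" step applied to
odd folded values.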
Each folded residual sample is then split into two parts, a
most-significant part and a least-significant part. The Rice
parameter at the start of each partition determines where that split
lies: it is the number of bits in the least-significant part. Each
residual sample is then stored by coding the most-significant part
as unary, followed by the least-significant part as binary.
For example, take a partition with Rice parameter 3 containing a
folded residual sample with 38 as its value, which is 0b100110 in
binary. The most-significant part is 0b100 (4) and is stored in
unary form as 0b00001. The least-significant part is 0b110 (6) and
is stored as is. The Rice code word is thus 0b00001110. The Rice
code words for all residual samples in a partition are stored
consecutively.
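The split into a unary and a binary part can be sketched in Python as
shown below. The function name is hypothetical and the output is a
string of '0'/'1' characters for readability; with Rice parameter 3
and folded value 38, it reproduces the code word 0b00001110 from the
example above.

```python
def rice_encode(folded, k):
    """Rice code word for one folded residual value, as a '0'/'1' string."""
    msp = folded >> k                # most-significant part, coded unary
    lsp = folded & ((1 << k) - 1)    # least-significant part, k bits
    lsp_bits = format(lsp, f"0{k}b") if k else ""
    # Unary: msp zero bits terminated by a one bit, then the binary part.
    return "0" * msp + "1" + lsp_bits
```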
To decode a Rice code word, zero bits must be counted until
encountering a one bit, after which a number of bits given by the
Rice parameter must be read. The count of zero bits is shifted left
by the Rice parameter (i.e., multiplied by 2 raised to the power of
the Rice parameter) and bitwise ORed with (i.e., added to) the read
value. This is the folded residual value. An even folded residual
value is shifted right 1 bit (i.e., divided by 2) to get the
(unfolded) residual value. An odd folded residual value is shifted
right 1 bit and then has all bits flipped (equivalently, has 1 added
and is then divided by -2) to get the (unfolded) residual value,
subject to negative numbers being signed two's complement on the
decoding machine.
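The decoding procedure above translates into the following Python
sketch, which reads consecutive code words from a string of '0'/'1'
characters. Names are hypothetical; a real decoder would read from a
bit buffer and guard against malformed input, which this sketch does
not do.

```python
def rice_decode(bits, pos, k):
    """Decode one Rice code word starting at bits[pos].

    Returns (unfolded residual, position after the code word).
    """
    zeros = 0
    while bits[pos] == "0":          # count zero bits up to the one bit
        zeros += 1
        pos += 1
    pos += 1                         # skip the terminating one bit
    low = int(bits[pos:pos + k], 2) if k else 0
    pos += k
    folded = (zeros << k) | low
    # Even: shift right 1 bit; odd: shift right 1 bit and flip all bits.
    residual = folded >> 1 if folded % 2 == 0 else ~(folded >> 1)
    return residual, pos


def rice_decode_partition(bits, k, count):
    """Decode `count` consecutive code words sharing Rice parameter k."""
    pos, out = 0, []
    for _ in range(count):
        value, pos = rice_decode(bits, pos, k)
        out.append(value)
    return out
```

For instance, decoding 0b00001110 with Rice parameter 3 gives the
folded value 38 and thus the residual 19.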
Appendix D shows decoding of a complete coded residual.
Residual Sample Value Limit
All residual sample values MUST be representable in the range offered
by a 32-bit integer, signed one's complement. Equivalently, all
residual sample values MUST fall in the range offered by a 32-bit
integer signed two's complement, excluding the most negative possible
value of that range. This means residual sample values MUST NOT have
an absolute value equal to, or larger than, 2^31. A FLAC encoder
MUST make sure of this. If a FLAC encoder is, for a certain
subframe, unable to find a suitable predictor for which all residual
samples fall within said range, it MUST default to writing a
verbatim subframe. Appendix A explains in which circumstances
residual samples are already implicitly representable in said range,
so that an additional check is not needed.
The reason for this limit is to ensure that decoders can use 32-bit
integers when processing residuals, simplifying decoding. The most
negative value of a 32-bit signed two's complement integer is
specifically excluded so that decoders do not need special handling
for that value: it cannot be negated within a 32-bit signed integer,
and most library routines calculating an absolute value have
undefined behavior for that value.
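An encoder-side check for this limit might look like the Python
sketch below. Both function names are hypothetical; how an encoder
picks its predictors is outside the scope of the sketch, which only
shows the range test and the mandated fallback to a verbatim
subframe.

```python
LIMIT = 2 ** 31  # residual magnitude must stay strictly below 2^31


def residual_in_range(residual):
    """True if every value fits a 32-bit signed int and is not -2^31."""
    return all(-LIMIT < r < LIMIT for r in residual)


def choose_subframe_type(residual):
    """Hypothetical encoder decision for one candidate predictor."""
    return "predicted" if residual_in_range(residual) else "verbatim"
```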
9.3. Frame Footer
Following the last subframe is the frame footer. If the last
subframe is not byte aligned (i.e., the number of bits required to
store all subframes put together is not divisible by 8), zero bits
are added until byte alignment is reached. Following this is a
16-bit CRC, initialized with 0, with the polynomial
x^16 + x^15 + x^2 + x^0. This CRC covers the whole frame, excluding
the 16-bit CRC but including the sync code.
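The frame footer CRC can be computed bit by bit as in the following
Python sketch (hypothetical function name). The polynomial
x^16 + x^15 + x^2 + x^0 corresponds to 0x8005 with the x^16 term
implicit; processing each byte most-significant bit first with no
final XOR is an assumption of this sketch (it matches the common
"CRC-16/UMTS" parameterization). Practical implementations typically
replace the inner loop with a lookup table.

```python
def flac_crc16(data: bytes) -> int:
    """CRC-16, polynomial 0x8005, initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                 # bring the next byte into the top
        for _ in range(8):
            if crc & 0x8000:             # top bit set: shift and reduce
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The CRC is computed over the frame bytes from the sync code up to,
but not including, the two CRC bytes. Because there is no final XOR,
running the same function over the frame with its CRC appended
(most-significant byte first, assumed here) yields 0, which a decoder
can use as a quick integrity check.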
10. Container Mappings
The FLAC format can be used without any container, as it already
provides for the most basic features normally associated with a
container. However, the functionality this basic container provides
is rather limited, and for more advanced features (such as combining
FLAC audio with video), it needs to be encapsulated by a more capable
container. This presents a problem: because of these container
features, the FLAC format mixes data that belongs to the encoded data
(like block size and sample rate) with data that belongs to the
container (like checksum and timecode). The choice was made to
encapsulate FLAC frames as they are, which means some data will be
duplicated and potentially deviating between the FLAC frames and the
encapsulating container.
As FLAC frames are completely independent of each other, container
format features handling dependencies do not need to be used. For
example, all FLAC frames embedded in Matroska are marked as keyframes
when they are stored in a SimpleBlock, and tracks in an MP4 file
containing only FLAC frames do not need a sync sample box.
10.1. Ogg Mapping
The Ogg container format is defined in [RFC3533]. The first packet
of a logical bitstream carrying FLAC data is structured according to
the following table.
+=========+=========================================================+
| Data    | Description                                             |
+=========+=========================================================+
| 5       | Bytes 0x7F 0x46 0x4C 0x41 0x43 (as also defined by      |
| bytes   | [RFC5334]).                                             |
+---------+---------------------------------------------------------+
| 2       | Version number of the FLAC-in-Ogg mapping. These bytes  |
| bytes   | are 0x01 0x00, meaning version 1.0 of the mapping.      |
+---------+---------------------------------------------------------+
| 2       | Number of header packets (excluding the first header    |
| bytes   | packet) as an unsigned number coded big-endian.         |
+---------+---------------------------------------------------------+
| 4       | The fLaC signature.                                     |
| bytes   |                                                         |
+---------+---------------------------------------------------------+
| 4       | A metadata block header for the streaminfo block.       |
| bytes   |                                                         |
+---------+---------------------------------------------------------+
| 34      | A streaminfo metadata block.                            |
| bytes   |                                                         |
+---------+---------------------------------------------------------+
Table 24
The number of header packets MAY be 0, which means the number of
packets that follow is unknown. This first packet MUST NOT share an
Ogg page with any other packets. This means the first page of a
logical stream of FLAC-in-Ogg is always 79 bytes.
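The first packet described in Table 24 can be assembled as in the
Python sketch below (hypothetical function names). It assumes the
native FLAC metadata block header layout: a last-metadata-block flag
in the top bit of the first byte, block type 0 for streaminfo in the
remaining 7 bits, and a 24-bit big-endian length. The second helper
shows where the figure of 79 bytes comes from: the 51-byte packet
plus a 27-byte Ogg page header and a 1-byte segment table, as defined
in [RFC3533].

```python
STREAMINFO_TYPE = 0  # metadata block type of streaminfo (assumed)


def ogg_flac_first_packet(streaminfo: bytes, header_packets: int,
                          streaminfo_is_last: bool = False) -> bytes:
    """Assemble the first packet of a FLAC-in-Ogg logical bitstream."""
    if len(streaminfo) != 34:
        raise ValueError("a streaminfo metadata block is 34 bytes long")
    flag = 0x80 if streaminfo_is_last else 0x00
    block_header = (bytes([flag | STREAMINFO_TYPE])
                    + len(streaminfo).to_bytes(3, "big"))
    return (
        b"\x7fFLAC"                          # 0x7F 0x46 0x4C 0x41 0x43
        + b"\x01\x00"                        # mapping version 1.0
        + header_packets.to_bytes(2, "big")  # header packets that follow
        + b"fLaC"                            # native FLAC signature
        + block_header
        + streaminfo
    )


def ogg_page_size(packet_len: int) -> int:
    """Bytes in an Ogg page carrying exactly one complete packet."""
    lacing_values = packet_len // 255 + 1  # last lacing value is < 255
    return 27 + lacing_values + packet_len
```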
Following the first packet are one or more header packets, each of
which contains a single metadata block. The first of these packets
SHOULD be a Vorbis comment metadata block for historic reasons. This
is contrary to unencapsulated FLAC streams, where the order of
metadata blocks is not important except for the streaminfo block and
where a Vorbis comment metadata block is optional.
Following the header packets are audio packets. Each audio packet
contains a single FLAC frame. The first audio packet MUST start on a
new Ogg page, i.e., the last metadata block MUST finish its page
before any audio packets are encapsulated.
The granule position of all pages containing header packets MUST be
0. For pages containing audio packets, the granule position is the
number of the last sample contained in the last completed packet in
the frame. The sample numbering considers interchannel samples. If
a page contains no packet end (e.g., when it only contains the start
of a large packet that continues on the next page), then the granule
position is set to the maximum value possible, i.e., 0xFF 0xFF 0xFF
0xFF 0xFF 0xFF 0xFF 0xFF.
The granule position of the first audio data page with a completed
packet MAY be larger than the number of samples contained in packets
that complete on that page. In other words, the apparent sample
number of the first sample in the stream following from the granule
position and the audio data MAY be larger than 0. This allows, for
example, a server to cast a live stream to several clients that
joined at different moments without rewriting the granule position
for each client.
If an audio stream is encoded where audio properties (sample rate,
number of channels, or bit depth) change at some point in the stream,
this should be dealt with by finishing encoding of the current Ogg
stream and starting a new Ogg stream, concatenated to the previous
one. This is called chaining in Ogg. See the Ogg specification
[RFC3533] for details.
10.2. Matroska Mapping
The Matroska container format is defined in [RFC9559]. The codec ID
(EBML path \Segment\Tracks\TrackEntry\CodecID) assigned to signal
tracks carrying FLAC data is A_FLAC in ASCII. All FLAC data before
the first audio frame (i.e., the fLaC ASCII signature and all
metadata blocks) is stored as CodecPrivate data (EBML path
\Segment\Tracks\TrackEntry\CodecPrivate).
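The CodecPrivate contents can be located by walking the metadata block headers that follow the fLaC signature until the block with the last-metadata-block flag set has been consumed. The Python sketch below assumes the complete native FLAC stream is held in memory and that each metadata block header is one byte (a 1-bit last-metadata-block flag and a 7-bit block type) followed by a 24-bit big-endian length. The function name flac_codec_private is hypothetical and not part of any muxer's API.

```python
def flac_codec_private(data: bytes) -> bytes:
    """Return the bytes that precede the first audio frame: the fLaC
    signature followed by every metadata block (header and body)."""
    if data[:4] != b"fLaC":
        raise ValueError("missing fLaC signature")
    pos = 4
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block header")
        is_last = data[pos] & 0x80  # last-metadata-block flag (top bit)
        # The 7-bit block type (data[pos] & 0x7F) is not needed here.
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24 bits
        pos += 4 + length
        if pos > len(data):
            raise ValueError("metadata block extends past end of data")
        if is_last:
            return data[:pos]
```

Because every length is checked against the available data before it is used, a truncated stream is rejected rather than producing a CodecPrivate element that runs into audio data.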
Each FLAC frame (including all of its subframes) is treated as a
single frame in the context of Matroska.
If an audio stream is encoded where audio properties (sample rate,
number of channels, or bit depth) change at some point in the stream,
this should be dealt with by finishing the current Matroska segment
and starting a new one with the new properties.
10.3. ISO Base Media File Format (MP4) Mapping
The full encapsulation definition of FLAC audio in MP4 files was
deemed too extensive to include in this document. A definition
document can be found at [FLAC-in-MP4-specification].
11. Security Considerations
Like any other codec (such as [RFC6716]), FLAC should not be used
with insecure ciphers or cipher modes that are vulnerable to known
plaintext attacks. Some of the header bits, as well as the padding,
are easily predictable.
Implementations of the FLAC codec need to take appropriate security
considerations into account. Section 2.1 of [RFC4732] provides
general information on DoS attacks on end systems and describes some
mitigation strategies. Areas of concern specific to FLAC follow.
It is extremely important for the decoder to be robust against
malformed payloads. Payloads that do not conform to this
specification MUST NOT cause the decoder to overrun its allocated
memory or take an excessive amount of resources to decode. An
overrun in allocated memory could lead to arbitrary code execution by
an attacker. The same applies to the encoder, even though problems
with encoders are typically rarer. Malformed audio streams MUST NOT
cause the encoder to misbehave because this would allow an attacker
to attack transcoding gateways.
As with all compression algorithms, both encoding and decoding can
produce an output much larger than the input. For decoding, the most
extreme possible case of this is a frame with eight constant
subframes of block size 65535 and coding for 32-bit PCM. This frame
is only 49 bytes in size but codes for more than 2 megabytes of
uncompressed PCM data. For encoding, it is possible to have an even
larger size increase, although such behavior is generally considered
faulty. This happens if the encoder chooses a Rice parameter that
does not fit with the residual that has to be encoded. In such a
case, very long unary-coded symbols can appear (in the most extreme
case, more than 4 gigabytes per sample). Decoder and encoder
implementors are advised to take precautions to prevent excessive
resource utilization in such cases.
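One such precaution in a decoder is to bound the length of a unary run while reading a Rice-coded residual. The Python sketch below assumes FLAC's Rice code layout: a quotient written as that many 0 bits terminated by a 1 bit, then a k-bit remainder, with the resulting unsigned value folded back to a signed residual (even values non-negative, odd values negative). The default quotient limit assumes the decoder rejects folded values that do not fit in 32 bits; that bound is this sketch's assumption, and BitReader and read_rice_signed are illustrative names only.

```python
class BitReader:
    """Reads bits MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read_bit(self) -> int:
        index = self.pos >> 3
        if index >= len(self.data):
            raise EOFError("ran out of input")
        bit = (self.data[index] >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_rice_signed(reader: BitReader, k: int, max_quotient=None) -> int:
    """Decode one Rice-coded residual with parameter k, refusing
    unary runs longer than max_quotient."""
    if max_quotient is None:
        # Assumption of this sketch: folded values must fit in 32 bits,
        # so the quotient can never exceed (2**32 - 1) >> k.
        max_quotient = 0xFFFFFFFF >> k
    quotient = 0
    while reader.read_bit() == 0:  # unary part: 0 bits ended by a 1 bit
        quotient += 1
        if quotient > max_quotient:
            raise ValueError("unary run exceeds limit")
    folded = (quotient << k) | reader.read_bits(k)
    # Undo the folding: even -> non-negative, odd -> negative.
    return (folded >> 1) ^ -(folded & 1)
```

The cap bounds the work per residual, but with a small k it still permits billions of loop iterations, so a production decoder would likely also count zero bits a whole byte or word at a time.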
Where metadata is handled, implementors are advised to either
thoroughly test the handling of extreme cases or impose reasonable
limits in addition to the limits of this specification. For example,
a single Vorbis comment metadata block can contain millions of valid
fields. Such numbers are unlikely to be reached except in a
potentially malicious file. Likewise, the media type and description
of a picture metadata block can be millions of characters long,
despite there being no reasonable use of such contents. One possible
use case for very long character strings is in lyrics, which can be
stored in Vorbis comment metadata block fields.
Various kinds of metadata blocks contain length fields or field
counts. While reading a block following these lengths or counts, a
decoder MUST make sure higher-level lengths or counts (most
importantly, the length field of the metadata block itself) are not
exceeded. Because some of these length fields code string lengths,
for which memory must be allocated, parsers MUST first verify that a
block is valid before allocating memory based on its contents, except
when explicitly instructed to salvage data from a malformed file.
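As an illustration of checking each length against the enclosing metadata block before acting on it, the Python sketch below parses the body of a Vorbis comment metadata block: a vendor string, a field count, then length-prefixed fields, with all lengths being 32-bit little-endian. The max_fields cap is a hypothetical application-chosen limit of the kind recommended above, not a value taken from this specification. In a language with manual memory management, the same checks are what keep a forged length from driving a huge allocation.

```python
def parse_vorbis_comment(block: bytes, max_fields: int = 100_000):
    """Parse a Vorbis comment metadata block body (without the 4-byte
    metadata block header). Returns (vendor, list of fields)."""
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        # Never read past the length of the enclosing metadata block.
        if n > len(block) - pos:
            raise ValueError("length field exceeds metadata block length")
        chunk = block[pos:pos + n]
        pos += n
        return chunk

    def take_u32() -> int:
        return int.from_bytes(take(4), "little")

    vendor = take(take_u32()).decode("utf-8")
    count = take_u32()
    # Every field needs at least a 4-byte length, so a count that cannot
    # fit in the remaining bytes is rejected before anything is built.
    if count > (len(block) - pos) // 4 or count > max_fields:
        raise ValueError("implausible field count")
    fields = [take(take_u32()).decode("utf-8") for _ in range(count)]
    return vendor, fields
```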
Metadata blocks can also contain references, e.g., the picture
metadata block can contain a URI. When following a URI, the security
considerations of [RFC3986] apply. Applications MUST obtain explicit
user approval to retrieve resources via remote protocols. Following
external URIs introduces a tracking risk from on-path observers and
the operator of the service hosting the URI. Likewise, choosing a
scheme that is not protected (as https is) could also introduce
integrity attacks by an on-path observer. A malicious operator of
the service hosting the URI can return arbitrary content that the
parser will read. Also, such retrievals can be used in a DDoS attack
when the URI points to a potential victim. Therefore, applications
need to ask user approval for each retrieval individually, take extra
precautions when parsing retrieved data, and cache retrieved
resources. Applications MUST obtain explicit user approval to
retrieve local resources not located in the same directory as the
FLAC file being processed. Since relative URIs are permitted,
applications MUST guard against directory traversal attacks and guard
against a violation of a same-origin policy if such a policy is being
enforced.
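A minimal Python sketch of the approval rule, assuming URIs are taken verbatim from a picture metadata block. Remote schemes always require approval. A local target requires approval unless, after percent-decoding and resolving ".", ".." and symbolic links, it lies directly in the directory of the FLAC file. The function name is hypothetical, and the sketch covers neither the prompt itself, nor caching, nor same-origin checks.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def requires_user_approval(flac_path: str, uri: str) -> bool:
    """Return True if following `uri` (found in the FLAC file at
    `flac_path`) needs explicit user approval."""
    parts = urlsplit(uri)
    if parts.scheme and parts.scheme.lower() != "file":
        return True  # any remote protocol
    if parts.netloc not in ("", "localhost"):
        return True  # file URI or network-path reference to another host
    base_dir = Path(flac_path).resolve().parent
    # Percent-decode, then resolve "..", "." and symlinks so that
    # traversal tricks such as "%2e%2e/secret" are caught.
    target = (base_dir / unquote(parts.path)).resolve()
    return target.parent != base_dir
```

Resolving the path before comparing directories, rather than inspecting the URI string for "..", is what defeats encoded or indirect traversal attempts.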
Seeking in a FLAC stream that is not in a container relies on the
coded number in frame headers and optionally a seektable metadata
block. Parsers MUST employ thorough checks on whether a found coded
number or seekpoint is at all possible, e.g., whether it is within
bounds and not directly contradicting any other coded number or
seekpoint that the seeking process relies on. Without these checks,
seeking might get stuck in an infinite loop when numbers in frames
are non-consecutive or otherwise not valid, which could be used in
DoS attacks.
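The plausibility checks described above can be sketched for seekpoints in Python. The sketch assumes the seekpoint layout of the seektable metadata block: the sample number of the first sample in the target frame, a byte offset from the first byte of the first frame header, and the number of samples in the target frame. It also assumes that a sample number of 0xFFFFFFFFFFFFFFFF marks a placeholder. Any point that is out of bounds or contradicts an earlier accepted point is discarded so that a seek never relies on it. SeekPoint and usable_seekpoints are hypothetical names, and total_samples is assumed to come from the streaminfo block, where 0 means unknown.

```python
from typing import List, NamedTuple

PLACEHOLDER = 0xFFFFFFFFFFFFFFFF  # sample number of a placeholder point


class SeekPoint(NamedTuple):
    sample: int   # sample number of the first sample in the target frame
    offset: int   # bytes from the first byte of the first frame header
    samples: int  # number of samples in the target frame


def usable_seekpoints(points: List[SeekPoint], total_samples: int,
                      audio_length: int) -> List[SeekPoint]:
    """Keep only seekpoints that are in bounds and consistent with the
    points accepted before them; total_samples == 0 means unknown."""
    usable: List[SeekPoint] = []
    for p in points:
        if p.sample == PLACEHOLDER:
            continue
        if total_samples and p.sample >= total_samples:
            continue  # beyond the end of the stream
        if p.offset >= audio_length:
            continue  # beyond the end of the audio data
        if usable and (p.sample <= usable[-1].sample
                       or p.offset <= usable[-1].offset):
            continue  # contradicts an earlier point: not ascending
        usable.append(p)
    return usable
```

Because only strictly ascending points survive, a search that bisects between them always narrows its interval and cannot cycle.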
Implementors are advised to employ fuzz testing combined with
different sanitizers on FLAC decoders to find security problems.
Ignoring the results of CRC checks improves the efficiency of decoder
fuzz testing.
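A fuzzing build can make the CRC comparison switchable while normal builds still verify it. The Python sketch below assumes the frame-header CRC-8 (polynomial x^8 + x^2 + x^1 + x^0) and the frame-footer CRC-16 (polynomial x^16 + x^15 + x^2 + x^0), both initialized to 0 and computed MSB-first. The ignore_crc switch and the header_ok helper are hypothetical.

```python
def crc8(data: bytes) -> int:
    """CRC-8, polynomial 0x07, initial value 0, MSB-first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def crc16(data: bytes) -> int:
    """CRC-16, polynomial 0x8005, initial value 0, MSB-first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc


def header_ok(header_without_crc: bytes, stored_crc: int,
              ignore_crc: bool = False) -> bool:
    """Accept a frame header; a fuzzing harness passes ignore_crc=True so
    that mutated inputs reach the deeper decoding code."""
    if ignore_crc:
        return True
    return crc8(header_without_crc) == stored_crc
```

Without the switch, almost every mutated header fails its CRC and the fuzzer spends its time exercising only the resynchronization path.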
See [FLAC-decoder-testbench] for a non-exhaustive list of FLAC files
with extreme configurations that lead to crashes or reboots on some
known implementations. Besides providing a starting point for
security testing, this set of files can also be used to test
conformance with this specification.
FLAC files may contain executable code, although the FLAC format is
not designed for it and it is uncommon. One use case where FLAC is
occasionally used to store executable code is when compressing images
of mixed-mode CDs, which contain both audio and non-audio data, the
non-audio portion of which can contain executable code. In that
case, the executable code is stored as if it were audio and is
potentially obscured. Of course, it is also possible to store
executable code as metadata, for example, as a Vorbis comment with
the help of a binary-to-text encoding or directly in an application
metadata block. Applications MUST NOT execute code contained in FLAC
files or present parts of FLAC files as executable code to the user,
except when an application has that explicit purpose, e.g.,
applications reading FLAC files as disc images and presenting them as
a virtual disc drive.
12. IANA Considerations
Per this document, IANA has registered one new media type
("audio/flac") and created a new IANA registry, as described in the
subsections below.
12.1. Media Type Registration
IANA has registered the "audio/flac" media type as follows. This
media type is applicable for FLAC audio that is not packaged in a
container as described in Section 10. FLAC audio packaged in such a
container will take on the media type of that container, for example,
"audio/ogg" when packaged in an Ogg container or "video/mp4" when
packaged in an MP4 container alongside a video track.
Type name: audio
Subtype name: flac
Required parameters: N/A
Optional parameters: N/A
Encoding considerations: as per RFC 9639
Security considerations: See the security considerations in
Section 11 of RFC 9639.
Interoperability considerations: See the descriptions of past format
changes in Appendix B of RFC 9639.
Published specification: RFC 9639
Applications that use this media type: ffmpeg, apache, firefox
Fragment identifier considerations: N/A
Additional information:
   Deprecated alias names for this type: audio/x-flac
   Magic number(s): fLaC
   File extension(s): flac
   Macintosh file type code(s): N/A
   Uniform Type Identifier: org.xiph.flac conforms to public.audio
   Windows Clipboard Format Name: audio/flac
Person & email address to contact for further information: IETF
CELLAR Working Group (cellar@ietf.org)
Intended usage: COMMON
Restrictions on usage: N/A
Author: IETF CELLAR Working Group
Change controller: Internet Engineering Task Force (iesg@ietf.org)
12.2. FLAC Application Metadata Block IDs Registry
IANA has created a new registry called the "FLAC Application Metadata
Block IDs" registry. The values correspond to the 32-bit identifier
described in Section 8.4.
To register a new Application ID in this registry, one needs an
Application ID, a description, an optional reference to a document
describing the Application ID, and a Change Controller (IETF or email
of registrant). The Application IDs are allocated according to the
"First Come First Served" policy [RFC8126] so that there is no
impediment to registering any Application IDs the FLAC community
encounters, especially if they were used in audio files but were not
registered when the audio files were encoded. An Application ID can
be any 32-bit value but is often composed of 4 ASCII characters that
are human-readable.
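Because an Application ID is a 32-bit value that is often four ASCII characters, converting between the two forms is a big-endian byte reinterpretation, as the ATCH/0x41544348 registry entry shows. The helper names below are hypothetical. ascii_rendition returns None unless all four bytes are printable ASCII, which is stricter than the registry's "ASCII Rendition" column (for example, 0x773634C0 is listed there as "w64").

```python
def application_id_from_ascii(text: str) -> int:
    """Pack a 4-character ASCII rendition into a 32-bit Application ID."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("an Application ID is exactly 4 bytes")
    return int.from_bytes(raw, "big")


def ascii_rendition(app_id: int):
    """Return the 4-character rendition, or None if any byte is not
    printable ASCII."""
    raw = app_id.to_bytes(4, "big")
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None
```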
The initial contents of the "FLAC Application Metadata Block IDs"
registry are shown in the table below. These initial values were
taken from the registration page at xiph.org (see
[ID-registration-page]), which is no longer being maintained as it
has been replaced by this registry.
+===========+==========+===========+===================+==========+
|Application|ASCII     |Description|Reference          |Change    |
|ID         |Rendition |           |                   |Controller|
|           |(If       |           |                   |          |
|           |Available)|           |                   |          |
+===========+==========+===========+===================+==========+
|0x41544348 |ATCH      |FlacFile   |[FlacFile], RFC    |IETF      |
|           |          |           |9639               |          |
+-----------+----------+-----------+-------------------+----------+
|0x42534F4C |BSOL      |beSolo     |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x42554753 |BUGS      |Bugs Player|RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x43756573 |Cues      |GoldWave   |RFC 9639           |IETF      |
|           |          |cue points |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x46696361 |Fica      |CUE        |RFC 9639           |IETF      |
|           |          |Splitter   |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x46746F6C |Ftol      |flac-tools |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x4D4F5442 |MOTB      |MOTB       |RFC 9639           |IETF      |
|           |          |MetaCzar   |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x4D505345 |MPSE      |MP3 Stream |RFC 9639           |IETF      |
|           |          |Editor     |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x4D754D4C |MuML      |MusicML:   |RFC 9639           |IETF      |
|           |          |Music      |                   |          |
|           |          |Metadata   |                   |          |
|           |          |Language   |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x52494646 |RIFF      |Sound      |RFC 9639           |IETF      |
|           |          |Devices    |                   |          |
|           |          |RIFF chunk |                   |          |
|           |          |storage    |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x5346464C |SFFL      |Sound Font |RFC 9639           |IETF      |
|           |          |FLAC       |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x534F4E59 |SONY      |Sony       |RFC 9639           |IETF      |
|           |          |Creative   |                   |          |
|           |          |Software   |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x5351455A |SQEZ      |flacsqueeze|RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x54745776 |TtWv      |TwistedWave|RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x55495453 |UITS      |UITS       |RFC 9639           |IETF      |
|           |          |Embedding  |                   |          |
|           |          |tools      |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x61696666 |aiff      |FLAC AIFF  |[Foreign-metadata],|IETF      |
|           |          |chunk      |RFC 9639           |          |
|           |          |storage    |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x696D6167 |imag      |flac-image |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x7065656D |peem      |Parseable  |RFC 9639           |IETF      |
|           |          |Embedded   |                   |          |
|           |          |Extensible |                   |          |
|           |          |Metadata   |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x71667374 |qfst      |QFLAC      |RFC 9639           |IETF      |
|           |          |Studio     |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x72696666 |riff      |FLAC RIFF  |[Foreign-metadata],|IETF      |
|           |          |chunk      |RFC 9639           |          |
|           |          |storage    |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x74756E65 |tune      |TagTuner   |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x773634C0 |w64       |FLAC Wave64|[Foreign-metadata],|IETF      |
|           |          |chunk      |RFC 9639           |          |
|           |          |storage    |                   |          |
+-----------+----------+-----------+-------------------+----------+
|0x78626174 |xbat      |XBAT       |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
|0x786D6364 |xmcd      |xmcd       |RFC 9639           |IETF      |
+-----------+----------+-----------+-------------------+----------+
Table 25
13. References
13.1. Normative References
[ISRC-handbook]
International ISRC Registration Authority, "International
Standard Recording Code (ISRC) Handbook", 4th edition,
2021, <https://www.ifpi.org/isrc_handbook/>.
[RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
DOI 10.17487/RFC1321, April 1992,
<https://www.rfc-editor.org/info/rfc1321>.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
DOI 10.17487/RFC2046, November 1996,
<https://www.rfc-editor.org/info/rfc2046>.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, DOI 10.17487/RFC3986, January 2005,
<https://www.rfc-editor.org/info/rfc3986>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC9559] Lhomme, S., Bunkus, M., and D. Rice, "Matroska Media
Container Format Specification", RFC 9559,
DOI 10.17487/RFC9559, September 2024,
<https://www.rfc-editor.org/info/rfc9559>.
13.2. Informative References
[Durbin] Durbin, J., "The Fitting of Time-Series Models", Revue de
l'Institut International de Statistique / Review of the
International Statistical Institute, vol. 28, no. 3, pp.
233-244, DOI 10.2307/1401322, 1960,
<https://www.jstor.org/stable/1401322>.
[FIR] Wikipedia, "Finite impulse response", August 2024,
<https://en.wikipedia.org/w/
[FLAC-decoder-testbench]
"The Free Lossless Audio Codec (FLAC) test files", commit
aa7b0c6, August 2023,
<https://github.com/ietf-wg-cellar/flac-test-files>.
[FLAC-in-MP4-specification]
"Encapsulation of FLAC in ISO Base Media File Format",
commit 78d85dd, July 2022,
<https://github.com/xiph/flac/blob/master/doc/isoflac.txt>.
[FLAC-specification-github]
"The Free Lossless Audio Codec (FLAC) Specification",
<https://github.com/ietf-wg-cellar/flac-specification>.
[FLAC-wiki-interoperability]
"Interoperability considerations", commit 58a06d6,
<https://github.com/ietf-wg-cellar/flac-specification/wiki/Interoperability-considerations>.
[FlacFile] "FlacFile", Wayback Machine archive, October 2007,
<https://web.archive.org/web/20071023070305/http://firestuff.org:80/flacfile/>.
[Foreign-metadata]
"Specification of foreign metadata storage in FLAC",
commit 72787c3, November 2023,
<https://github.com/xiph/flac/blob/master/doc/foreign_metadata_storage.md>.
[ID-registration-page]
Xiph.Org, "ID registry", <https://xiph.org/flac/id.html>.
[ID3v2] Nilsson, M., "ID3 tag version 2.4.0 - Native Frames",
Wayback Machine archive, November 2000,
<https://web.archive.org/web/20220903174949/https://id3.org/id3v2.4.0-frames>.
[IEC.60908.1999]
International Electrotechnical Commission, "Audio
recording - Compact disc digital audio system",
IEC 60908:1999-02, 1999,
[LinearPrediction]
Wikipedia, "Linear prediction", August 2023,
<https://en.wikipedia.org/w/
[Lossless-Compression]
Hans, M. and R. W. Schafer, "Lossless compression of
digital audio", IEEE Signal Processing Magazine, vol. 18,
no. 4, pp. 21-32, DOI 10.1109/79.939834, July 2001,
[lossyWAV] Hydrogenaudio Knowledgebase, "lossyWAV", July 2021,
[MLP] Gerzon, M. A., Craven, P. G., Stuart, J. R., Law, M. J.,
and R. J. Wilson, "The MLP Lossless Compression System",
Audio Engineering Society Conference: 17th International
Conference: High-Quality Audio Coding, September 1999,
<https://www.aes.org/e-lib/online/browse.cfm?elib=8082>.
[MusicBrainz]
MusicBrainz, "Tags & Variables", MusicBrainz Picard v2.10
documentation,
<https://picard-docs.musicbrainz.org/en/variables/variables.html>.
[RFC4732] Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
Denial-of-Service Considerations", RFC 4732,
DOI 10.17487/RFC4732, December 2006,
<https://www.rfc-editor.org/info/rfc4732>.
[RFC5334] Goncalves, I., Pfeiffer, S., and C. Montgomery, "Ogg Media
Types", RFC 5334, DOI 10.17487/RFC5334, September 2008,
<https://www.rfc-editor.org/info/rfc5334>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
September 2012, <https://www.rfc-editor.org/info/rfc6716>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>.
[Rice] Rice, R. F. and J. R. Plaunt, "Adaptive Variable-Length
Coding for Efficient Compression of Spacecraft Television
Data", IEEE Transactions on Communication Technology, vol.
19, no. 6, pp. 889-897, DOI 10.1109/TCOM.1971.1090789,
December 1971,
<https://ieeexplore.ieee.org/document/1090789>.
[Robinson-TR156]
Robinson, T., "SHORTEN: Simple lossless and near-lossless
waveform compression", Cambridge University Engineering
Department Technical Report CUED/F-INFENG/TR.156, December
1994, <https://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/
[Shannon] Shannon, C. E., "Communication in the Presence of Noise",
Proceedings of the IRE, vol. 37, no. 1, pp. 10-21,
DOI 10.1109/JRPROC.1949.232969, January 1949,
<https://ieeexplore.ieee.org/document/1697831>.
[VarLengthCode]
Wikipedia, "Variable-length code", April 2024,
<https://en.wikipedia.org/w/index.php?title=Variable-
[Vorbis] Xiph.Org, "Ogg Vorbis I format specification: comment
field and header specification",
<https://xiph.org/vorbis/doc/v-comment.html>.
Appendix A. Numerical Considerations
In order to maintain lossless behavior, all arithmetic used in
encoding and decoding sample values must be done with integer data
types to eliminate the possibility of introducing rounding errors
associated with floating-point arithmetic. Use of floating-point
representations in analysis (e.g., finding a good predictor or Rice
parameter) is not a concern as long as the process of using the found
predictor and Rice parameter to encode audio samples is implemented
with only integer math.
Furthermore, the possibility of integer overflow can be eliminated by
using data types that are large enough. Choosing a 64-bit signed
data type for all arithmetic involving sample values eliminates the
possibility of overflow, but smaller data types are usually chosen
for increased performance, especially in embedded devices. This
appendix provides guidelines for choosing the appropriate data type
for each step of encoding and decoding FLAC files.
In this appendix, signed data types are signed two's complement.
A.1. Determining the Necessary Data Type Size
To find the smallest data type size that is guaranteed not to
overflow for a certain sequence of arithmetic operations, the
combination of values producing the largest possible result should be
considered.
For example, if two 16-bit signed integers are added, the largest
possible result forms if both values are the largest number that can
be represented with a 16-bit signed integer. To store the result, a
signed integer data type with at least 17 bits is needed. Similarly,
when adding 4 of these values, 18 bits are needed; when adding 8, 19
bits are needed, etc. In general, the number of bits necessary when
adding numbers together is increased by the log base 2 of the number
of values rounded up to the nearest integer. So, when adding 18
unknown values stored in 8-bit signed integers, we need a signed
integer data type of at least 13 bits to store the result, as the log
base 2 of 18 rounded up is 5.
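The rule for sums can be written down directly. The following C sketch is illustrative only; the helper names ceil_log2() and sum_bits() are hypothetical and not part of any FLAC API.

```c
#include <assert.h>

/* Hypothetical helper: smallest k such that 2^k >= count,
 * i.e., ceil(log2(count)). */
static unsigned ceil_log2(unsigned count)
{
    assert(count >= 1 && count <= (1u << 31));
    unsigned k = 0;
    while ((1u << k) < count)
        k++;
    return k;
}

/* Hypothetical helper: signed bits needed to hold the sum of `count`
 * values that each fit in a signed integer of `bits` bits. */
static unsigned sum_bits(unsigned bits, unsigned count)
{
    return bits + ceil_log2(count);
}
```

For instance, sum_bits(16, 2) is 17 and sum_bits(8, 18) is 13, matching the examples above.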
When multiplying two numbers, the number of bits needed for the
result is the size of the first number plus the size of the second
number. For example, if a 16-bit signed integer is multiplied by
another 16-bit signed integer, the result needs at least 32 bits to
be stored without overflowing. To show this in practice, the signed
value with the largest magnitude that can be stored in 4 bits is -8.
(-8)*(-8) is 64, which needs at least 8 bits (signed) to store.
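A small C sketch can confirm the multiplication rule by computing the minimal two's complement width of an actual product. The function names are hypothetical; int64_t is used so that the products in the examples above cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Signed bits needed for the product of two signed integers of
 * bits_a and bits_b bits (e.g., 16 x 16 -> 32). */
static unsigned product_bits(unsigned bits_a, unsigned bits_b)
{
    assert(bits_a >= 1 && bits_b >= 1);
    return bits_a + bits_b;
}

/* Smallest two's complement width that can represent v exactly. */
static unsigned signed_bits_needed(int64_t v)
{
    unsigned bits = 1; /* a 1-bit signed integer holds -1 and 0 */
    while (bits < 64) {
        int64_t min = -((int64_t)1 << (bits - 1));
        int64_t max = ((int64_t)1 << (bits - 1)) - 1;
        if (v >= min && v <= max)
            break;
        bits++;
    }
    return bits;
}
```

Here, signed_bits_needed(-8) is 4 and signed_bits_needed(64) is 8, so product_bits(4, 4) is exactly reached. Likewise, (int64_t)INT16_MIN * INT16_MIN equals 2^30, which needs the full 32 bits.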
A.2. Stereo Decorrelation
When stereo decorrelation is used, the side channel will have one
extra bit of bit depth; see Section 4.2.
This means that while 16-bit signed integers have sufficient range to
store samples from a fully decoded FLAC frame with a bit depth of 16
bits, the decoding of a side subframe in such a file will need a data
type with at least 17 bits to store decoded subframe samples before
undoing stereo decorrelation.
Most FLAC decoders store decoded (subframe) samples as 32-bit values,
which is sufficient for files with bit depths up to (and including)
31 bits.
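The following C sketch illustrates this for left/side coding. It assumes, per Section 4.2, that the side channel is the left sample minus the right sample and that the right channel is recovered as left minus side. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encoder: side = left - right, computed in 32 bits because the
 * difference of two 16-bit samples can need 17 bits. */
static int32_t side_from_left_right(int16_t left, int16_t right)
{
    return (int32_t)left - (int32_t)right;
}

/* Decoder (left/side coding): the decoded side subframe sample is kept
 * in 32 bits; right = left - side restores a 16-bit sample. */
static int16_t right_from_left_side(int32_t left, int32_t side)
{
    int32_t right = left - side;
    assert(right >= INT16_MIN && right <= INT16_MAX);
    return (int16_t)right;
}
```

With left = 32767 and right = -32768, the side sample is 65535, which does not fit an int16_t but fits an int32_t; right_from_left_side(32767, 65535) then restores -32768.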
A.3. Prediction
A prediction (which is used to calculate the residual on encoding or
added to the residual to calculate the sample value on decoding) is
formed by multiplying and summing preceding sample values. In order
to eliminate the possibility of integer overflow, the combination of
preceding sample values and predictor coefficients producing the
largest possible value should be considered.
To determine the size of the data type needed to calculate either a
residual sample (on encoding) or an audio sample value (on decoding)
in a fixed predictor subframe, the maximum possible value for these
is calculated as described in Appendix A.1 and in the following
table. For example, if a frame codes for 16-bit audio and has some
form of stereo decorrelation, the subframe coding for the side
channel would need 16+1+3 bits if a third-order fixed predictor is
used.
+=======+==============================+===============+=======+
| Order | Calculation of Residual      | Sample Values | Extra |
|       |                              | Summed        | Bits  |
+=======+==============================+===============+=======+
| 0     | a(n)                         | 1             | 0     |
+-------+------------------------------+---------------+-------+
| 1     | a(n) - a(n-1)                | 2             | 1     |
+-------+------------------------------+---------------+-------+
| 2     | a(n) - 2 * a(n-1) + a(n-2)   | 4             | 2     |
+-------+------------------------------+---------------+-------+
| 3     | a(n) - 3 * a(n-1) + 3 *      | 8             | 3     |
|       | a(n-2) - a(n-3)              |               |       |
+-------+------------------------------+---------------+-------+
| 4     | a(n) - 4 * a(n-1) + 6 *      | 16            | 4     |
|       | a(n-2) - 4 * a(n-3) + a(n-4) |               |       |
+-------+------------------------------+---------------+-------+
Table 26
Where:
* n is the number of the sample being predicted.
* a(n) is the sample being predicted.
* a(n-1) is the sample before the one being predicted, a(n-2) is the
  sample before that, etc.
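Table 26 translates directly into code. In this C sketch, the hypothetical function fixed_residual() evaluates the residual column with 64-bit intermediates, assuming the input samples fit in int32_t, and the hypothetical fixed_residual_bits() applies the "Extra Bits" column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Residual of the fixed predictor of the given order at index n,
 * following Table 26. Samples are assumed to fit in int32_t; the sums
 * are formed in int64_t, leaving ample room for the extra bits. */
static int64_t fixed_residual(const int32_t *a, size_t n, unsigned order)
{
    assert(order <= 4 && n >= order);
    switch (order) {
    case 0:
        return a[n];
    case 1:
        return (int64_t)a[n] - a[n - 1];
    case 2:
        return (int64_t)a[n] - 2 * (int64_t)a[n - 1] + a[n - 2];
    case 3:
        return (int64_t)a[n] - 3 * (int64_t)a[n - 1]
               + 3 * (int64_t)a[n - 2] - a[n - 3];
    default:
        return (int64_t)a[n] - 4 * (int64_t)a[n - 1]
               + 6 * (int64_t)a[n - 2] - 4 * (int64_t)a[n - 3] + a[n - 4];
    }
}

/* Bits needed for a fixed predictor residual: the subframe bit depth
 * plus one extra bit per predictor order. */
static unsigned fixed_residual_bits(unsigned subframe_bits, unsigned order)
{
    return subframe_bits + order;
}
```

For 16-bit audio with stereo decorrelation, fixed_residual_bits(16 + 1, 3) gives 20, the 16+1+3 bits from the example above. The worst case for an order-3 predictor on 16-bit samples alternates between the extremes: with w = {-32768, 32767, -32768, 32767}, fixed_residual(w, 3, 3) is 262140, which fits in 19 bits.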
For subframes with a linear predictor, the calculation is a little
more complicated. Each prediction is the sum of several
multiplications, each of which multiplies a sample value by a
predictor coefficient. The extra bits needed can be calculated by
adding the predictor coefficient precision (in bits) to the bit depth
of the audio samples. To account for the summing of these
multiplications, the log base 2 of the predictor order rounded up is
least (24 + 1) + 15 + ceil(log2(12)) = 44 bits. As another example,
for a side-channel subframe of 16-bit audio, a predictor order of 8,
and a predictor coefficient precision of 12 bits, the minimum
required size of the used signed integer data type is (16 + 1) + 12 +
ceil(log2(8)) = 32 bits.
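The formula used in these examples can be sketched in C as follows. The names are hypothetical, and the upper limit of 32 on the predictor order reflects FLAC's maximum LPC order.

```c
#include <assert.h>

/* Hypothetical helper: ceil(log2(x)) for x >= 1. */
static unsigned ceil_log2(unsigned x)
{
    unsigned k = 0;
    while ((1u << k) < x)
        k++;
    return k;
}

/* Signed bits needed to accumulate an LPC prediction:
 * subframe bit depth + coefficient precision + ceil(log2(order)). */
static unsigned lpc_prediction_bits(unsigned subframe_bits,
                                    unsigned coef_precision,
                                    unsigned order)
{
    assert(order >= 1 && order <= 32);
    return subframe_bits + coef_precision + ceil_log2(order);
}
```

lpc_prediction_bits(24 + 1, 15, 12) returns 44 and lpc_prediction_bits(16 + 1, 12, 8) returns 32, the two results above. A result above 32 indicates that the prediction should be accumulated in a 64-bit type such as int64_t.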
A.4. Residual
As stated in Section 9.2.7, an encoder must make sure residual
samples are representable by a 32-bit integer, signed two's
complement, excluding the most negative value. As in the previous
section, it is possible to calculate when residual samples already
implicitly fit and when an additional check is needed. This implicit
fit is achieved when residuals would fit a theoretical 31-bit signed
integer, as that satisfies both of the mentioned criteria: its range
of -2^30 to 2^30 - 1 lies within the 32-bit range and excludes -2^31.
When this implicit fit is not achieved, all residual values must be
calculated and checked individually.
For the residual of a fixed predictor, the maximum residual sample
size was already calculated in the previous section. However, for a
linear predictor, the prediction is shifted right by a certain
amount. The number of bits needed for the residual is the number of
bits calculated in the previous section, reduced by the prediction
right shift, and increased by one bit to account for the subtraction
of the prediction from the current sample on encoding.
Taking the last example of the previous section, where 32 bits were
needed for the prediction, the required data type size for the
residual samples in case of a right shift of 10 bits would be 32 - 10
+ 1 = 23 bits, which means it is not necessary to perform the
aforementioned check.
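The residual width and the implicit-fit test can be sketched in C as follows; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed bits needed for an LPC residual: prediction bits, minus the
 * prediction right shift, plus one bit for the subtraction. */
static unsigned lpc_residual_bits(unsigned prediction_bits, unsigned shift)
{
    assert(shift < prediction_bits);
    return prediction_bits - shift + 1;
}

/* Residuals that fit a 31-bit signed integer are automatically within
 * the allowed range (32-bit, excluding the most negative value). */
static bool residual_fits_implicitly(unsigned residual_bits)
{
    return residual_bits <= 31;
}
```

lpc_residual_bits(32, 10) is 23, so residual_fits_implicitly() holds and no per-sample check is needed, as in the example above.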
As another example, when encoding 32-bit PCM with fixed predictors,
all predictor orders must be checked. While the zero-order fixed
predictor is guaranteed to have residual samples that fit a 32-bit
signed integer, it might produce a residual sample value that is the
most negative representable value of that 32-bit signed integer.
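When the implicit fit does not hold, each residual has to be checked. A minimal C sketch of that check follows, assuming residuals are first computed in a 64-bit type so that out-of-range values are not lost to overflow; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A residual sample is allowed if it fits a 32-bit signed integer and
 * is not the most negative value, INT32_MIN. */
static bool residual_in_range(int64_t residual)
{
    return residual > INT32_MIN && residual <= INT32_MAX;
}

/* Check every residual of a subframe individually. */
static bool residuals_in_range(const int64_t *residuals, size_t count)
{
    assert(residuals != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (!residual_in_range(residuals[i]))
            return false;
    return true;
}
```

For 32-bit PCM, a sample of INT32_MIN coded with the zero-order fixed predictor yields a residual of INT32_MIN, which residual_in_range() rejects; an encoder would then need another predictor or, for example, a verbatim subframe.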
Note that on decoding, while the residual sample values are limited
to the aforementioned range, the predictions are not. This means
that while the decoding of the residual samples can happen fully in
32-bit signed integers, decoders must be sure to execute the addition
of each residual sample to its accompanying prediction with a signed
integer data type that is wide enough, as with encoding.
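On decoding, the addition can be sketched in C as below, assuming a subframe bit depth of at most 33 bits (32-bit audio plus the side-channel bit). The name reconstruct_sample() is hypothetical, and the asserts stand in for the error handling a real decoder would perform on a corrupt stream.

```c
#include <assert.h>
#include <stdint.h>

/* The residual arrives as a 32-bit value, but the prediction may be
 * wider, so the addition is done in 64 bits. The result is checked
 * against the subframe bit depth. */
static int64_t reconstruct_sample(int64_t prediction, int32_t residual,
                                  unsigned subframe_bits)
{
    assert(subframe_bits >= 1 && subframe_bits <= 33);
    int64_t sample = prediction + residual;
    int64_t limit = (int64_t)1 << (subframe_bits - 1);
    assert(sample >= -limit && sample < limit);
    return sample;
}
```

A prediction of INT32_MAX + 5 plus a residual of -10 gives INT32_MAX - 5, a valid 33-bit side sample even though the prediction itself does not fit 32 bits.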
A.5. Rice Coding
When folding (i.e., zigzag encoding) the residual sample values, no
extra bits are needed when the absolute value of each residual sample
is first stored in an unsigned data type of the size used in the
last step, then doubled, and then has one subtracted if the residual
sample was negative. However, many implementations choose to require
one extra bit of data type size so zigzag encoding can happen in one
step without a cast instead of the procedure described in the
previous sentence.
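Both folding approaches are shown in this C sketch. The hypothetical fold_residual_u32() follows the no-extra-bit procedure, which relies on residuals excluding the most negative 32-bit value; the hypothetical fold_residual_wide() is the one-step variant, using int64_t because C has no 33-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* No extra bits: magnitude in uint32_t, doubled, minus one for negative
 * residuals. Valid for the range allowed by Section 9.2.7. */
static uint32_t fold_residual_u32(int32_t residual)
{
    assert(residual != INT32_MIN);
    uint32_t magnitude = residual < 0
        ? (uint32_t)(0u - (uint32_t)residual)
        : (uint32_t)residual;
    return residual < 0 ? 2u * magnitude - 1u : 2u * magnitude;
}

/* One step with a wider signed type: 2 * r cannot overflow in 64 bits. */
static int64_t fold_residual_wide(int32_t residual)
{
    int64_t r = residual;
    return r >= 0 ? 2 * r : -2 * r - 1;
}
```

Both map 0, -1, 1, -2, 2 to 0, 1, 2, 3, 4, and for INT32_MAX both return 4294967294.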
Appendix B. Past Format Changes
This informational appendix documents the changes made to the FLAC
format over the years. This information might be of use when
encountering FLAC files that were made with software following the
format as it was before the changes documented in this appendix.
The FLAC format was first specified in December 2000, and the
bitstream format was considered frozen with the release of FLAC 1.0
(the reference encoder/decoder) in July 2001. Only changes made
since this first stable release are considered in this appendix.
Changes made to the FLAC streamable subset definition (see Section 7)
are not considered.
B.1. Addition of Blocking Strategy Bit
Perhaps the largest backwards-incompatible change to the
specification was published in July 2007. Before this change,
variable block size streams were not explicitly marked as such by a
flag bit in the frame header. A decoder had two ways to detect a
variable block size stream: by comparing the minimum and maximum
block sizes in the STREAMINFO metadata block (which are equal for a
fixed block size stream) or, if a decoder did not receive a
STREAMINFO metadata block, by detecting a change of block size during
a stream, a change that, in theory, might never occur. As the meaning
of the coded number in the frame header depends on whether or not a
stream has a variable block size, this presented a problem: the
meaning of the coded number could not be reliably determined. To fix
this problem, one of the reserved bits was changed to be used as a
blocking strategy bit. See also Section 9.1.
Along with the addition of a new flag, the meaning of the block size
bits (see Section 9.1.1) was subtly changed. Initially, block size
bit patterns 0b0001-0b0101 and 0b1000-0b1111 could only be used for
fixed block size streams, while 0b0110 and 0b0111 could be used for
both fixed block size and variable block size streams. With this
change, these restrictions were lifted, and patterns 0b0001-0b1111
are now used for both variable block size and fixed block size
streams.
B.2. Restriction of Encoded Residual Samples
Another change to the specification was deemed necessary during
standardization by the CELLAR Working Group of the IETF. As
specified in Section 9.2.7, a limit is imposed on residual samples.
This limit was not specified prior to the IETF standardization
effort. However, as far as was known to the working group, no FLAC
encoder at that time produced FLAC files containing residual samples
exceeding this limit. This is mostly because it is very unlikely to
encounter residual samples exceeding this limit when encoding 24-bit
PCM, and encoding of PCM with higher bit depths was not yet
implemented in any known encoder. In fact, these FLAC encoders would
produce corrupt files upon being triggered to produce such residual
samples, and it is unlikely any non-experimental encoder would ever
do so, even when presented with crafted material. Therefore, it was
not expected that existing implementations would be rendered
non-compliant by this change.
B.3. Addition of 5-Bit Rice Parameters
One significant addition to the format was the residual coding method
using 5-bit Rice parameters. Prior to publication of this addition
in July 2007, a partitioned Rice code with 4-bit Rice parameters was
the only residual coding method specified. The range offered by this
coding method proved too small when encoding 24-bit PCM; therefore, a
second residual coding method was specified that was identical to the
first, but with 5-bit Rice parameters.
B.4. Restriction of LPC Shift to Non-negative Values
As stated in Section 9.2.6, the predictor right shift is a signed
two's complement number, which MUST NOT be negative. This is because
shifting a number to the right by a negative amount is undefined
behavior in the C programming language standard. The intended
behavior was that a positive number would be a right shift and a
negative number would be a left shift. The FLAC reference encoder
was changed in 2007 to not generate LPC subframes with a negative
predictor right shift, as it turned out that the use of such
subframes would only very rarely provide any benefit and the decoders
that were already widely in use at that point were not able to handle
such subframes.
Appendix C. Interoperability Considerations
As documented in Appendix B, there have been some changes and
additions to the FLAC format. Additionally, implementation of
certain features of the FLAC format took many years, meaning early
decoder implementations could not be tested against files with these
features. Finally, many lower-quality FLAC decoders implement only
the features required for playback of the most common FLAC files.
This appendix provides some considerations for encoder
implementations aiming to create highly compatible files. As this
topic is one that might change after this document is published,
consult [FLAC-wiki-interoperability] for more up-to-date information.
C.1. Features outside of the Streamable Subset
As described in Section 7, FLAC specifies a subset of its
capabilities as the FLAC streamable subset. Certain decoders may
choose to only decode FLAC files conforming to the limitations
imposed by the streamable subset. Therefore, maximum compatibility
with decoders is achieved when the limitations of the FLAC streamable
subset are followed when creating FLAC files.
C.2. Variable Block Size
Because it is often difficult to find the optimal arrangement of
block sizes for maximum compression, most encoders choose to create
files with a fixed block size. Because of this, the handling of
variable block size streams receives minimal use in many decoder
implementations, and decoding such streams can reveal bugs or reveal
that implementations do not decode them at all. Furthermore, as
explained in Appendix B.1, there have been some changes to the way
variable block size streams are encoded. Because of this, maximum
compatibility with decoders is achieved when FLAC files are created
using fixed block size streams.
C.3. 5-Bit Rice Parameter
As the addition of the 5-bit Rice parameter (described in
Appendix B.3) occurred quite a few years after the FLAC format was
first introduced, some early decoders might not be able to decode
files containing such Rice parameters. The introduction of this was
specifically aimed at improving compression of 24-bit PCM audio, and
compression of 16-bit PCM audio only rarely benefits from using 5-bit
Rice parameters. Therefore, maximum compatibility with decoders is
achieved when FLAC files containing audio with a bit depth of 16 bits
or less are created without any use of 5-bit Rice parameters.
C.4. Rice Escape Code
Escaped Rice partitions are seldom used, as it turned out their use
provides only a very small compression improvement. As many encoders
do not use these by default or are not capable of producing them at
all, it is likely that many decoder implementations are not able to
decode them correctly. Therefore, maximum compatibility with
decoders is achieved when FLAC files are created without any use of
escaped Rice partitions.
C.5. Uncommon Block Size
For unknown reasons, some decoders have chosen to support only common
block sizes for all but the last block of a stream. Therefore,
maximum compatibility with decoders is achieved when creating FLAC
files using common block sizes, as listed in Section 9.1.1, for all
but the last block of a stream.
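A check of this recommendation might look like the following Python sketch. The helper name is hypothetical. The set of common block sizes is the one with a dedicated code in the frame header (Section 9.1.1): 192, 576, 1152, 2304, 4608, and the powers of two from 256 to 32768.

```python
# Block sizes with a dedicated frame header code (Section 9.1.1):
# 192, 144 * 2**n for n = 2..5, and 2**n for n = 8..15.
COMMON_BLOCK_SIZES = frozenset(
    [192] + [144 * 2**n for n in range(2, 6)] + [2**n for n in range(8, 16)]
)

def uses_common_block_sizes(block_sizes):
    """True if every block except the last one has a common block size."""
    return all(size in COMMON_BLOCK_SIZES for size in block_sizes[:-1])

print(uses_common_block_sizes([4096, 4096, 1000]))  # True: only the last block is uncommon
print(uses_common_block_sizes([4096, 1000, 4096]))  # False
```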
C.6. Uncommon Bit Depth
Most audio is stored in bit depths that are a whole number of bytes,
e.g., 8, 16, or 24 bits. However, there is audio with other bit
depths. A few examples:
* DVD-Audio can store 20-bit PCM audio.
* DAT and DV can store 12-bit PCM audio.
* NICAM-728 samples at 14 bits, which are companded to 10 bits.
* 8-bit µ-law can be losslessly converted to 14-bit (linear) PCM.
* 8-bit A-law can be losslessly converted to 13-bit (linear) PCM.
The FLAC format can contain these bit depths directly, but because
they are uncommon, some decoders are not able to process the
resulting files correctly. It is possible to store these formats in
a FLAC file with a more common bit depth without sacrificing
compression by padding each sample with zero bits to a bit depth that
is a whole byte. The FLAC format can efficiently compress these
wasted bits. See Section 9.2.2 for details.
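The padding step and the matching wasted-bits detection can be sketched in Python as below. Both functions are illustrative assumptions, not reference encoder code. `pad_to_whole_bytes` left-shifts each sample to the next whole-byte bit depth. `count_wasted_bits` finds the number of trailing zero bits shared by all samples of a block, which is what an encoder can signal as wasted bits in a subframe header.

```python
def pad_to_whole_bytes(samples, bit_depth):
    """Left-shift samples so that they fill the next whole-byte bit depth.

    Returns (padded_samples, padded_bit_depth). For example, 20-bit samples
    become 24-bit samples whose 4 least significant bits are zero.
    """
    padded_depth = (bit_depth + 7) // 8 * 8
    shift = padded_depth - bit_depth
    return [s << shift for s in samples], padded_depth

def count_wasted_bits(samples):
    """Number of trailing zero bits shared by every sample of a block."""
    combined = 0
    for s in samples:
        combined |= s          # Python ints behave as infinite two's complement
    if combined == 0:
        return 0               # all-zero block: report no wasted bits
    return (combined & -combined).bit_length() - 1

padded, depth = pad_to_whole_bytes([1, -1, 5], 20)
print(padded, depth, count_wasted_bits(padded))  # [16, -16, 80] 24 4
```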
Therefore, maximum compatibility with decoders is achieved when FLAC
files are created by padding samples of such audio with zero bits to
the bit depth that is the next whole number of bytes.
In cases where the original signal is already padded, this operation
cannot be reversed losslessly without knowing the original bit depth.
To leave no ambiguity, the original bit depth needs to be stored, for
example, in a Vorbis comment field, by storing the header of the
original file, or in a description of the file. The choice of a
suitable method is left to the implementor.
Besides audio with a "non-whole byte" bit depth, some decoder
implementations have chosen to only accept FLAC files coding for PCM
audio with a bit depth of 16 bits. Many implementations support bit
depths up to 24 bits, but no higher. Consult
[FLAC-wiki-interoperability] for more up-to-date information.
C.7. Multi-Channel Audio and Uncommon Sample Rates
Many FLAC audio players are unable to render multi-channel audio or
audio with an uncommon sample rate. While this is not a concern
specific to the FLAC format, it is of note when requiring maximum
compatibility with decoders. Unlike the previously mentioned
interoperability considerations, this is one where compatibility
cannot be improved without sacrificing the lossless nature of the
FLAC format.
From a non-exhaustive inquiry, it seems that a non-negligible number
of players, especially hardware players, do not support audio with 3
or more channels or sample rates other than those considered common;
see Section 9.1.2.
For those players that do support and are able to render
multi-channel audio, many do not parse and use the
WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag (see Section 8.6.2). This is
also an interoperability consideration where compatibility cannot be
improved without sacrificing the lossless nature of the FLAC format.
C.8. Changing Audio Properties Mid-Stream
Each FLAC frame header stores the audio sample rate, number of bits
per sample, and number of channels independently of the streaminfo
metadata block and other frame headers. This was done to permit
multicasting of FLAC files, but it also allows these properties to
change mid-stream. However, many FLAC decoders do not handle such
changes, as few other formats are capable of holding such streams and
changing playback properties during playback is often not possible
without interrupting playback. Also, as explained in Section 9,
using this feature of FLAC results in various practical problems.
such a stream correctly. Therefore, maximum compatibility with
decoders is achieved when FLAC files are created with a single set of
audio properties, in which the properties coded in the streaminfo
metadata block (see Section 8.2) and the properties coded in all
frame headers (see Section 9.1) are the same. This can be achieved
by splitting up an input stream with changing audio properties at the
points where these properties change into separate streams or files.
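Splitting at property changes can be sketched as grouping consecutive frames by their audio properties. In this Python sketch, frames are represented as hypothetical dictionaries with `sample_rate`, `bit_depth`, and `channels` keys. A real encoder would instead track the properties of the input it is about to encode.

```python
from itertools import groupby

def split_on_property_changes(frames):
    """Group consecutive frames that share sample rate, bit depth, and channels.

    Each returned group can be written as a separate stream whose streaminfo
    block matches every one of its frame headers.
    """
    def properties(frame):
        return (frame["sample_rate"], frame["bit_depth"], frame["channels"])
    return [list(group) for _, group in groupby(frames, key=properties)]

frames = [
    {"sample_rate": 44100, "bit_depth": 16, "channels": 2},
    {"sample_rate": 44100, "bit_depth": 16, "channels": 2},
    {"sample_rate": 48000, "bit_depth": 16, "channels": 2},
]
print([len(group) for group in split_on_property_changes(frames)])  # [2, 1]
```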
Appendix D. Examples
This informational appendix contains short examples of FLAC files
that are decoded step by step. These examples provide a more
engaging way to understand the FLAC format than the formal
specification. The text explaining these examples assumes the reader
has at least cursorily read the specification and that the reader
refers to the specification for explanation of the terminology used.
These examples mostly focus on the layout of several metadata blocks,
subframe types, and the implications of certain aspects (e.g., wasted
bits and stereo decorrelation) on this layout.
The examples feature files generated by various FLAC encoders. These
are presented in hexadecimal or binary format, followed by tables and
text referring to various features by their starting bit positions in
these representations. Each starting position (shortened to "start"
in the tables) is a hexadecimal byte position and a start bit within
that byte, separated by a plus sign. Counts for these start at zero.
For example, a feature starting at the 3rd bit of the 17th byte is
referred to as starting at 0x10+2. The files that are explored in
these examples can be found at [FLAC-specification-github].
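The start notation maps directly onto a bit offset, so it can drive a decoder-side bit reader. The Python sketch below uses illustrative helper names. It converts a start such as 0x10+2 into an absolute bit offset and reads a field most significant bit first. As a demonstration, it reads the 5-bit bit depth field at 0x14+7 from the first 22 bytes of example file 1.

```python
def parse_start(start):
    """Convert a start position such as '0x10+2' into an absolute bit offset."""
    byte_pos, bit_pos = start.split("+")
    return int(byte_pos, 16) * 8 + int(bit_pos)

def read_bits(data, start, length):
    """Read `length` bits, most significant bit first, from position `start`."""
    offset = parse_start(start)
    value = 0
    for i in range(offset, offset + length):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit
    return value

# First 22 bytes of example file 1 (signature, block header, part of streaminfo).
head = bytes.fromhex("664c6143 80000022 10001000 00000f 00000f 0ac442f0")
print(parse_start("0x10+2"))         # 130: 3rd bit of the 17th byte
print(read_bits(head, "0x14+7", 5))  # 15, i.e., a bit depth of 16
```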
All data in this appendix has been thoroughly verified. However, as
this appendix is informational, if any information here conflicts
with statements in the formal specification, the latter takes
precedence.
D.1. Decoding Example 1
This very short example FLAC file codes for PCM audio that has two
channels, each containing one sample. The focus of this example is
on the essential parts of a FLAC file.
D.1.1. Example File 1 in Hexadecimal Representation
00000000: 664c 6143 8000 0022 1000 1000  fLaC..."....
0000000c: 0000 0f00 000f 0ac4 42f0 0000  ........B...
00000018: 0001 3e84 b418 07dc 6903 0758  ..>.....i..X
00000024: 6a3d ad1a 2e0f fff8 6918 0000  j=......i...
00000030: bf03 58fd 0312 8baa 9a         ..X......
D.1.2. Example File 1 in Binary Representation
00000000: 01100110 01001100 01100001 01000011  fLaC
00000004: 10000000 00000000 00000000 00100010  ..."
00000008: 00010000 00000000 00010000 00000000  ....
0000000c: 00000000 00000000 00001111 00000000  ....
00000010: 00000000 00001111 00001010 11000100  ....
00000014: 01000010 11110000 00000000 00000000  B...
00000018: 00000000 00000001 00111110 10000100  ..>.
0000001c: 10110100 00011000 00000111 11011100  ....
00000020: 01101001 00000011 00000111 01011000  i..X
00000024: 01101010 00111101 10101101 00011010  j=..
00000028: 00101110 00001111 11111111 11111000  ....
0000002c: 01101001 00011000 00000000 00000000  i...
00000030: 10111111 00000011 01011000 11111101  ..X.
00000034: 00000011 00010010 10001011 10101010  ....
00000038: 10011010
D.1.3. Signature and Streaminfo
The first 4 bytes of the file contain the fLaC file signature.
Directly following it is a metadata block. The signature and the
first metadata block header are broken down in the following table.
+========+=========+============+===========================+
| Start  | Length  | Contents   | Description               |
+========+=========+============+===========================+
| 0x00+0 | 4 bytes | 0x664C6143 | fLaC                      |
+--------+---------+------------+---------------------------+
| 0x04+0 | 1 bit   | 0b1        | Last metadata block       |
+--------+---------+------------+---------------------------+
| 0x04+1 | 7 bits  | 0b0000000  | Streaminfo metadata block |
+--------+---------+------------+---------------------------+
| 0x05+0 | 3 bytes | 0x000022   | Length of 34 bytes        |
+--------+---------+------------+---------------------------+
Table 27
As the header indicates that this is the last metadata block, the
position of the first audio frame can now be calculated as the
position of the first byte after the metadata block header + the
length of the block, i.e., 8+34 = 42 or 0x2a. As can be seen, 0x2a
indeed contains the frame sync code for fixed block size streams,
0xfff8.
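The calculation above can be reproduced with a short Python sketch. `parse_metadata_block_header` is an illustrative name. It decodes the 1-bit last-block flag, the 7-bit block type, and the 24-bit big-endian block length.

```python
def parse_metadata_block_header(data, pos):
    """Decode the 4-byte metadata block header that starts at byte `pos`."""
    is_last = bool(data[pos] & 0x80)      # 1 bit: last metadata block flag
    block_type = data[pos] & 0x7F         # 7 bits: block type (0 = streaminfo)
    length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit length
    return is_last, block_type, length

data = bytes.fromhex("664c6143 80000022")  # signature and first block header
is_last, block_type, length = parse_metadata_block_header(data, 4)
# The block body begins right after the header, at 4 + 4 = 8.
first_frame = 8 + length if is_last else None
print(is_last, block_type, length, hex(first_frame))  # True 0 34 0x2a
```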
The streaminfo metadata block contents are broken down in the
following table.
+========+==========+====================+==========================+
| Start  | Length   | Contents           | Description              |
+========+==========+====================+==========================+
| 0x08+0 | 2 bytes  | 0x1000             | Min. block size 4096     |
+--------+----------+--------------------+--------------------------+
| 0x0a+0 | 2 bytes  | 0x1000             | Max. block size 4096     |
+--------+----------+--------------------+--------------------------+
| 0x0c+0 | 3 bytes  | 0x00000f           | Min. frame size 15 bytes |
+--------+----------+--------------------+--------------------------+
| 0x0f+0 | 3 bytes  | 0x00000f           | Max. frame size 15 bytes |
+--------+----------+--------------------+--------------------------+
| 0x12+0 | 20 bits  | 0x0ac4, 0b0100     | Sample rate 44100 hertz  |
+--------+----------+--------------------+--------------------------+
| 0x14+4 | 3 bits   | 0b001              | 2 channels               |
+--------+----------+--------------------+--------------------------+
| 0x14+7 | 5 bits   | 0b01111            | Sample bit depth 16      |
+--------+----------+--------------------+--------------------------+
| 0x15+4 | 36 bits  | 0b0000, 0x00000001 | Total no. of samples 1   |
+--------+----------+--------------------+--------------------------+
| 0x1a+0 | 16 bytes | (...)              | MD5 checksum             |
+--------+----------+--------------------+--------------------------+
Table 28
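The fields in Table 28 can be decoded with the Python sketch below. `parse_streaminfo` is an illustrative name. It assumes the 34-byte block body has already been sliced out of the file (bytes 0x08 to 0x29 here). The 20-bit sample rate, the 3-bit channel count minus one, the 5-bit bit depth minus one, and the 36-bit sample count are unpacked together from one 64-bit big-endian integer.

```python
def parse_streaminfo(body):
    """Decode the 34-byte body of a streaminfo metadata block."""
    if len(body) != 34:
        raise ValueError("streaminfo body must be 34 bytes long")
    # Bytes 10..17 pack sample rate (20 bits), channels - 1 (3 bits),
    # bit depth - 1 (5 bits), and total number of samples (36 bits).
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_block_size": int.from_bytes(body[0:2], "big"),
        "max_block_size": int.from_bytes(body[2:4], "big"),
        "min_frame_size": int.from_bytes(body[4:7], "big"),
        "max_frame_size": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x07) + 1,
        "bit_depth": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & 0xFFFFFFFFF,
        "md5": body[18:34].hex(),
    }

# Streaminfo body of example file 1 (bytes 0x08 to 0x29).
body = bytes.fromhex(
    "1000 1000 00000f 00000f 0ac442f0 00000001"
    " 3e84b418 07dc6903 07586a3d ad1a2e0f"
)
info = parse_streaminfo(body)
print(info["sample_rate"], info["channels"], info["bit_depth"], info["total_samples"])
# 44100 2 16 1
```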
The minimum and maximum block sizes are both 4096. This was
apparently the block size the encoder planned to use, but as only 1
interchannel sample was provided, no frames with 4096 samples are
actually present in this file.
Note that anywhere a number of samples is mentioned (block size,
total number of samples, sample rate), interchannel samples are
meant.
The MD5 checksum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758
6a3d ad1a 2e0f. This will be validated after decoding the samples.
D.1.4. Audio Frames
The frame header starts at position 0x2a and is broken down in the
following table.
+========+=========+=================+===================+
| Start  | Length  | Contents        | Description       |
+========+=========+=================+===================+
| 0x2a+0 | 15 bits | 0xff, 0b1111100 | frame sync        |
+--------+---------+-----------------+-------------------+
| 0x2b+7 | 1 bit   | 0b0             | blocking strategy |
+--------+---------+-----------------+-------------------+
| 0x2c+0 | 4 bits  | 0b0110          | 8-bit block size  |
|        |         |                 | further down      |
+--------+---------+-----------------+-------------------+
| 0x2c+4 | 4 bits  | 0b1001          | sample rate 44.1  |
|        |         |                 | kHz               |
+--------+---------+-----------------+-------------------+
| 0x2d+0 | 4 bits  | 0b0001          | stereo, no        |
|        |         |                 | decorrelation     |
+--------+---------+-----------------+-------------------+
| 0x2d+4 | 3 bits  | 0b100           | bit depth 16 bits |
+--------+---------+-----------------+-------------------+
| 0x2d+7 | 1 bit   | 0b0             | mandatory 0 bit   |
+--------+---------+-----------------+-------------------+
| 0x2e+0 | 1 byte  | 0x00            | frame number 0    |
+--------+---------+-----------------+-------------------+
| 0x2f+0 | 1 byte  | 0x00            | block size 1      |
+--------+---------+-----------------+-------------------+
| 0x30+0 | 1 byte  | 0xbf            | frame header CRC  |
+--------+---------+-----------------+-------------------+
Table 29
As the stream is a fixed block size stream, the number at 0x2e
contains a frame number. Because the value is smaller than 128, only
1 byte is used for the encoding.
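The header fields in Table 29 and the header CRC can be checked with the following Python sketch. It is deliberately narrow: `parse_frame_header` is an illustrative name, and it assumes the layout of this particular header (8-bit uncommon block size field and a 1-byte coded number). A full decoder also has to handle the 16-bit block size field, uncommon sample rate fields, and multi-byte coded numbers. The CRC-8 uses the polynomial x^8 + x^2 + x^1 + x^0, is initialized with 0, and covers every header byte before the CRC itself.

```python
def crc8(data):
    """CRC-8, polynomial x^8 + x^2 + x^1 + x^0 (0x07), initialized with 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def parse_frame_header(h):
    """Decode a frame header that, like the one at 0x2a, uses the uncommon
    8-bit block size (code 0b0110) and a 1-byte coded number."""
    if h[0] != 0xFF or (h[1] & 0xFE) != 0xF8:
        raise ValueError("frame sync code not found")
    if h[2] >> 4 != 0b0110 or h[4] & 0x80:
        raise ValueError("this sketch only handles the layout of example 1")
    return {
        "blocking_strategy": h[1] & 0x01,        # 0 = fixed block size
        "sample_rate_code": h[2] & 0x0F,         # 0b1001 = 44.1 kHz
        "channel_code": h[3] >> 4,               # 0b0001 = stereo, independent
        "bit_depth_code": (h[3] >> 1) & 0x07,    # 0b100 = 16 bits
        "coded_number": h[4],                    # frame number, 1-byte form
        "block_size": h[5] + 1,                  # 8-bit field stores size - 1
        "crc_ok": crc8(h[:6]) == h[6],
    }

header = bytes.fromhex("fff8 6918 0000 bf")  # bytes 0x2a to 0x30
print(parse_frame_header(header))
```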
At byte 0x31, the first subframe starts, which is broken down in the
following table.
+========+=========+================+=========================+
| Start  | Length  | Contents       | Description             |
+========+=========+================+=========================+
| 0x31+0 | 1 bit   | 0b0            | mandatory 0 bit         |
+--------+---------+----------------+-------------------------+
| 0x31+1 | 6 bits  | 0b000001       | verbatim subframe       |
+--------+---------+----------------+-------------------------+
| 0x31+7 | 1 bit   | 0b1            | wasted bits used        |
+--------+---------+----------------+-------------------------+
| 0x32+0 | 2 bits  | 0b01           | 2 wasted bits used      |
+--------+---------+----------------+-------------------------+
| 0x32+2 | 14 bits | 0b011000, 0xfd | 14-bit unencoded sample |
+--------+---------+----------------+-------------------------+
Table 30
As the wasted bits flag is 1 in this subframe, a unary-coded number
follows, which holds the number of wasted bits minus 1. Starting at
0x32, we see 0b01, which unary codes for 1, meaning that this
subframe uses 2 wasted bits.
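The unary decoding step is small enough to show directly. In this Python sketch, `wasted_bits_from_unary` is an illustrative helper, and the bits following the wasted bits flag are assumed to be available as a string of '0' and '1' characters.

```python
def wasted_bits_from_unary(bits):
    """Decode the unary-coded field that follows a set wasted bits flag.

    The field is (k - 1) zero bits followed by a one bit, where k is the
    number of wasted bits, so k also equals the number of bits consumed.
    """
    return bits.index("1") + 1

print(wasted_bits_from_unary("01011000"))  # byte 0x32: 2 wasted bits
print(wasted_bits_from_unary("0001"))      # first nibble of 0x35: 4 wasted bits
```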
As this is a verbatim subframe, the subframe only contains unencoded
sample values. With a block size of 1, it contains only a single
sample. The bit depth of the audio is 16 bits, but as the subframe
header signals the use of 2 wasted bits, only 14 bits are stored. As
no stereo decorrelation is used, a bit depth increase for the side
channel is not applicable. So, the next 14 bits (starting at
position 0x32+2) contain the unencoded sample coded big-endian,
signed two's complement. The value reads 0b011000 11111101, or 6397.
This value needs to be shifted left by 2 bits to account for the
wasted bits. The value is then 0b011000 11111101 00, or 25588.
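The interpretation of the stored value (sign first, then shift) can be written as a small Python helper. `decode_verbatim_sample` is an illustrative name, and the raw value is assumed to have already been read from the bitstream as an unsigned integer.

```python
def decode_verbatim_sample(raw, stored_bits, wasted_bits):
    """Turn a raw stored field into a sample value.

    raw: the unsigned integer read from `stored_bits` bits of the bitstream.
    The field is signed two's complement; the wasted bits are restored by
    shifting left.
    """
    if raw & (1 << (stored_bits - 1)):   # sign bit set: negative value
        raw -= 1 << stored_bits
    return raw << wasted_bits

print(decode_verbatim_sample(0b01100011111101, 14, 2))  # 25588
```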
The second subframe starts at 0x34 and is broken down in the
following table.
+========+=========+==============+=========================+
| Start  | Length  | Contents     | Description             |
+========+=========+==============+=========================+
| 0x34+0 | 1 bit   | 0b0          | mandatory 0 bit         |
+--------+---------+--------------+-------------------------+
| 0x34+1 | 6 bits  | 0b000001     | verbatim subframe       |
+--------+---------+--------------+-------------------------+
| 0x34+7 | 1 bit   | 0b1          | wasted bits used        |
+--------+---------+--------------+-------------------------+
| 0x35+0 | 4 bits  | 0b0001       | 4 wasted bits used      |
+--------+---------+--------------+-------------------------+
| 0x35+4 | 12 bits | 0b0010, 0x8b | 12-bit unencoded sample |
+--------+---------+--------------+-------------------------+
Table 31
The wasted bits flag is also one, but the unary-coded number that
follows it is 4 bits long, indicating the use of 4 wasted bits. This
means the sample is stored in 12 bits. The sample value is 0b0010
10001011, or 651. This value now has to be shifted left by 4 bits,
i.e., 0b0010 10001011 0000, or 10416.
At this point, we would undo stereo decorrelation if that was
applicable.
As the last subframe ends byte-aligned, no padding bits follow it.
The next 2 bytes, starting at 0x37, contain the frame CRC. As this
is the only frame in the file, the file ends with the CRC.
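Reading both subframes in sequence is a convenient way to confirm the byte positions used above. In the Python sketch below, `BitReader` and `decode_verbatim_subframe` are illustrative helpers. They handle only verbatim subframes of independently coded channels, so no side-channel bit depth increase applies. The frame bytes are copied from the dump. After the second subframe, the reader sits at the end of byte 0x36, so the 2-byte frame CRC occupies 0x37 and 0x38.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data, bit_pos=0):
        self.data = data
        self.pos = bit_pos

    def read(self, n):
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def decode_verbatim_subframe(reader, bit_depth, block_size):
    """Decode a verbatim subframe of an independently coded channel."""
    if reader.read(1) != 0:
        raise ValueError("mandatory 0 bit is not 0")
    if reader.read(6) != 0b000001:
        raise ValueError("this sketch only handles verbatim subframes")
    wasted = 0
    if reader.read(1):                  # wasted bits flag
        wasted = 1
        while reader.read(1) == 0:      # unary-coded (wasted - 1)
            wasted += 1
    stored = bit_depth - wasted
    samples = []
    for _ in range(block_size):
        raw = reader.read(stored)
        if raw >> (stored - 1):         # sign bit set: negative value
            raw -= 1 << stored
        samples.append(raw << wasted)
    return samples

FRAME_START = 0x2a
frame = bytes.fromhex("fff8 6918 0000 bf03 58fd 0312 8baa 9a")  # bytes 0x2a..0x38
reader = BitReader(frame, bit_pos=(0x31 - FRAME_START) * 8)
ch0 = decode_verbatim_subframe(reader, 16, 1)
ch1 = decode_verbatim_subframe(reader, 16, 1)
crc_start = FRAME_START + (reader.pos + 7) // 8   # round up to a byte boundary
print(ch0, ch1, hex(crc_start))  # [25588] [10416] 0x37
```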
To validate the MD5 checksum, we line up the samples interleaved,
byte-aligned, little-endian, signed two's complement. The first
sample, with value 25588, translates to 0xf463, and the second
sample, with value 10416, translates to 0xb028. When computing the
MD5 checksum with 0xf463b028 as input, we get the MD5 checksum found
in the header, so decoding was lossless.
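The checksum step can be reproduced with Python's standard `hashlib` module. `pcm_md5` is an illustrative helper. It assumes the samples are already interleaved and pads each one to a whole number of bytes, as described above. For example file 1, it should print the checksum stored in the streaminfo block.

```python
import hashlib

def pcm_md5(interleaved_samples, bit_depth):
    """MD5 of interleaved samples stored as little-endian, signed two's
    complement, each padded to a whole number of bytes."""
    width = (bit_depth + 7) // 8
    pcm = b"".join(s.to_bytes(width, "little", signed=True)
                   for s in interleaved_samples)
    return hashlib.md5(pcm).hexdigest()

print((25588).to_bytes(2, "little", signed=True).hex())  # f463
print(pcm_md5([25588, 10416], 16))
```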
D.2. Decoding Example 2
This FLAC file is larger than the first example, but still contains
very little audio. The focus of this example is on decoding a
subframe with a fixed predictor and a coded residual, but it also
contains a very short seektable, a Vorbis comment metadata block, and
a padding metadata block.
D.2.1. Example File 2 in Hexadecimal Representation
00000000: 664c 6143 0000 0022 0010 0010  fLaC..."....
0000000c: 0000 1700 0044 0ac4 42f0 0000  .....D..B...
00000018: 0013 d5b0 5649 75e9 8b8d 8b93  ....VIu.....
00000024: 0422 757b 8103 0300 0012 0000  ."u{........
00000030: 0000 0000 0000 0000 0000 0000  ............
0000003c: 0000 0010 0400 003a 2000 0000  .......: ...
00000048: 7265 6665 7265 6e63 6520 6c69  reference li
00000054: 6246 4c41 4320 312e 332e 3320  bFLAC 1.3.3
00000060: 3230 3139 3038 3034 0100 0000  20190804....
0000006c: 0e00 0000 5449 544c 453d d7a9  ....TITLE=..
00000078: d79c d795 d79d 8100 0006 0000  ............
00000084: 0000 0000 fff8 6998 000f 9912  ......i.....
00000090: 0867 0162 3d14 4299 8f5d f70d  .g.b=.B..]..
0000009c: 6fe0 0c17 caeb 2100 0ee7 a77a  o.....!....z
000000a8: 24a1 590c 1217 b603 097b 784f  $.Y......{xO
000000b4: aa9a 33d2 85e0 70ad 5b1b 4851  ..3...p.[.HQ
000000c0: b401 0d99 d2cd 1a68 f1e6 b810  .......h....
000000cc: fff8 6918 0102 a402 c382 c40b  ..i.........
000000d8: c14a 03ee 48dd 03b6 7c13 30    .J..H...|.0
D.2.2. Example File 2 in Binary Representation (Only Audio Frames)
00000088: 11111111 11111000 01101001 10011000 ..i.
0000008c: 00000000 00001111 10011001 00010010 ....
00000090: 00001000 01100111 00000001 01100010 .g.b
00000094: 00111101 00010100 01000010 10011001 =.B.
00000098: 10001111 01011101 11110111 00001101 .]..
0000009c: 01101111 11100000 00001100 00010111 o...
000000a0: 11001010 11101011 00100001 00000000 ..!.
000000a4: 00001110 11100111 10100111 01111010 ...z
000000a8: 00100100 10100001 01011001 00001100 $.Y.
000000ac: 00010010 00010111 10110110 00000011 ....
000000b0: 00001001 01111011 01111000 01001111 .{xO
000000b4: 10101010 10011010 00110011 11010010 ..3.
000000b8: 10000101 11100000 01110000 10101101 ..p.
000000bc: 01011011 00011011 01001000 01010001 [.HQ
000000c0: 10110100 00000001 00001101 10011001 ....
000000c4: 11010010 11001101 00011010 01101000 ...h
000000c8: 11110001 11100110 10111000 00010000 ....
000000cc: 11111111 11111000 01101001 00011000 ..i.
000000d0: 00000001 00000010 10100100 00000010 ....
000000d4: 11000011 10000010 11000100 00001011 ....
000000d8: 11000001 01001010 00000011 11101110 .J..
000000dc: 01001000 11011101 00000011 10110110 H...
000000e0: 01111100 00010011 00110000 |.0
D.2.3. Streaminfo Metadata Block
Most of the streaminfo block, including its header, is the same as in
example 1, so only the parts that differ are listed in the following
table.
+========+=========+============+=============================+
| Start  | Length  | Contents   | Description                 |
+========+=========+============+=============================+
| 0x04+0 | 1 bit   | 0b0        | Not the last metadata block |
+--------+---------+------------+-----------------------------+
| 0x08+0 | 2 bytes | 0x0010     | Min. block size 16          |
+--------+---------+------------+-----------------------------+
| 0x0a+0 | 2 bytes | 0x0010     | Max. block size 16          |
+--------+---------+------------+-----------------------------+
| 0x0c+0 | 3 bytes | 0x000017   | Min. frame size 23 bytes    |
+--------+---------+------------+-----------------------------+
| 0x0f+0 | 3 bytes | 0x000044   | Max. frame size 68 bytes    |
+--------+---------+------------+-----------------------------+
| 0x15+4 | 36 bits | 0b0000,    | Total no. of samples 19     |
|        |         | 0x00000013 |                             |
+--------+---------+------------+-----------------------------+
| 0x1a   | 16      | (...)      | MD5 checksum                |
|        | bytes   |            |                             |
+--------+---------+------------+-----------------------------+
Table 32
This time, the minimum and maximum block sizes are reflected in the
file: there is one block of 16 samples, and the last block (which has
3 samples) is not considered for the minimum block size. The MD5
checksum is 0xd5b0 5649 75e9 8b8d 8b93 0422 757b 8103. This will be
verified at the end of this example.
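As an illustration, the following Python sketch decodes the
streaminfo fields of Table 32 from the 34-byte block body (file
offsets 0x08 to 0x29, copied from the hexadecimal representation of
example file 2). The function name and dictionary keys are
hypothetical; the sketch assumes the 4-byte metadata block header has
already been consumed and performs no validation beyond a length
check.

```python
# Streaminfo body of example file 2 (file offsets 0x08-0x29).
STREAMINFO_BODY = bytes.fromhex(
    "0010" "0010"                         # min./max. block size
    "000017" "000044"                     # min./max. frame size
    "0ac442f000000013"                    # rate, channels, bit depth, samples
    "d5b0564975e98b8d8b930422757b8103"    # MD5 checksum
)


def parse_streaminfo(body: bytes) -> dict:
    """Decode a 34-byte streaminfo body; all fields are big-endian."""
    if len(body) != 34:
        raise ValueError("streaminfo body must be 34 bytes")
    # Sample rate (20 bits), channels - 1 (3 bits), bit depth - 1 (5 bits)
    # and the total number of samples (36 bits) share one 64-bit field.
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_block_size": int.from_bytes(body[0:2], "big"),
        "max_block_size": int.from_bytes(body[2:4], "big"),
        "min_frame_size": int.from_bytes(body[4:7], "big"),
        "max_frame_size": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        "md5": body[18:34].hex(),
    }


info = parse_streaminfo(STREAMINFO_BODY)
print(info["min_block_size"], info["max_frame_size"], info["total_samples"])
# prints: 16 68 19
```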
D.2.4. Seektable
The seektable metadata block holds only one entry. It is not really
useful here, as it points to the first frame, but it is enough for
this example. The seektable metadata block is broken down in the
following table.
+========+========+====================+================+
| Start  | Length | Contents           | Description    |
+========+========+====================+================+
| 0x2a+0 | 1 bit  | 0b0                | Not the last   |
|        |        |                    | metadata block |
+--------+--------+--------------------+----------------+
| 0x2a+1 | 7 bits | 0b0000011          | Seektable      |
|        |        |                    | metadata block |
+--------+--------+--------------------+----------------+
| 0x2b+0 | 3      | 0x000012           | Length 18      |
|        | bytes  |                    | bytes          |
+--------+--------+--------------------+----------------+
| 0x2e+0 | 8      | 0x0000000000000000 | Seekpoint to   |
|        | bytes  |                    | sample 0       |
+--------+--------+--------------------+----------------+
| 0x36+0 | 8      | 0x0000000000000000 | Seekpoint to   |
|        | bytes  |                    | offset 0       |
+--------+--------+--------------------+----------------+
| 0x3e+0 | 2      | 0x0010             | Seekpoint to   |
|        | bytes  |                    | block size 16  |
+--------+--------+--------------------+----------------+
Table 33
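A minimal Python sketch of how a decoder might split a seektable body
into seekpoints follows. Each seekpoint is 18 bytes: a 64-bit sample
number, a 64-bit byte offset (relative to the first byte of the first
frame header), and a 16-bit number of samples, all big-endian. The
names are hypothetical, and placeholder seekpoints are not treated
specially.

```python
import struct

# Seektable body of example file 2 (file offsets 0x2e-0x3f).
SEEKTABLE_BODY = bytes.fromhex("0000000000000000" "0000000000000000" "0010")

# Sample number, byte offset, number of samples; big-endian, no padding.
SEEKPOINT = struct.Struct(">QQH")


def parse_seektable(body: bytes) -> list:
    """Split a seektable body into (sample number, offset, samples) tuples."""
    if len(body) % SEEKPOINT.size:
        raise ValueError("seektable length must be a multiple of 18 bytes")
    return [SEEKPOINT.unpack_from(body, pos)
            for pos in range(0, len(body), SEEKPOINT.size)]


print(parse_seektable(SEEKTABLE_BODY))  # [(0, 0, 16)]
```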
D.2.5. Vorbis Comment
The Vorbis comment metadata block contains the vendor string and a
single comment. It is broken down in the following table.
+========+==========+============+===============================+
| Start  | Length   | Contents   | Description                   |
+========+==========+============+===============================+
| 0x40+0 | 1 bit    | 0b0        | Not the last metadata block   |
+--------+----------+------------+-------------------------------+
| 0x40+1 | 7 bits   | 0b0000100  | Vorbis comment metadata block |
+--------+----------+------------+-------------------------------+
| 0x41+0 | 3 bytes  | 0x00003a   | Length 58 bytes               |
+--------+----------+------------+-------------------------------+
| 0x44+0 | 4 bytes  | 0x20000000 | Vendor string length 32 bytes |
+--------+----------+------------+-------------------------------+
| 0x48+0 | 32 bytes | (...)      | Vendor string                 |
+--------+----------+------------+-------------------------------+
| 0x68+0 | 4 bytes  | 0x01000000 | Number of fields 1            |
+--------+----------+------------+-------------------------------+
| 0x6c+0 | 4 bytes  | 0x0e000000 | Field length 14 bytes         |
+--------+----------+------------+-------------------------------+
| 0x70+0 | 14 bytes | (...)      | Field contents                |
+--------+----------+------------+-------------------------------+
Table 34
The vendor string is "reference libFLAC 1.3.3 20190804", and the
contents of the only field are "TITLE=שלום". The Vorbis comment field
is 14 bytes but only 10 characters in size, because it contains four
characters that each take 2 bytes in UTF-8. Note that, unlike the
other metadata blocks, the Vorbis comment block stores its length
fields little-endian; this is why the vendor string length of 32
appears as 0x20000000.
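The following Python sketch parses the Vorbis comment body of Table 34
(file offsets 0x44 to 0x7d) and compares the byte count and the
character count of the single field. The names are hypothetical; the
sketch assumes well-formed input and decodes every string as UTF-8.

```python
import struct

# Vorbis comment body of example file 2 (file offsets 0x44-0x7d).
VORBIS_BODY = (
    bytes.fromhex("20000000")                   # vendor string length
    + b"reference libFLAC 1.3.3 20190804"       # vendor string
    + bytes.fromhex("01000000" "0e000000")      # field count, field length
    + bytes.fromhex("5449544c453dd7a9d79cd795d79d")  # TITLE=... in UTF-8
)


def parse_vorbis_comment(body: bytes):
    """Return (vendor, fields); all length fields are little-endian u32."""
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    pos = 4
    vendor = body[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", body, pos)
    pos += 4
    fields = []
    for _ in range(count):
        (field_len,) = struct.unpack_from("<I", body, pos)
        pos += 4
        fields.append(body[pos:pos + field_len].decode("utf-8"))
        pos += field_len
    return vendor, fields


vendor, fields = parse_vorbis_comment(VORBIS_BODY)
# Byte length vs. character length of the only field.
print(vendor, len(fields[0].encode("utf-8")), len(fields[0]))
```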
+--------+---------+----------------+------------------------+
| 0x7e+1 | 7 bits  | 0b0000001      | Padding metadata block |
+--------+---------+----------------+------------------------+
| 0x7f+0 | 3 bytes | 0x000006       | Length 6 bytes         |
+--------+---------+----------------+------------------------+
| 0x82+0 | 6 bytes | 0x000000000000 | Padding bytes          |
+--------+---------+----------------+------------------------+
Table 35
D.2.7. First Audio Frame
The frame header starts at position 0x88 and is broken down in the
following table.
+========+=========+=================+===================+
| Start  | Length  | Contents        | Description       |
+========+=========+=================+===================+
| 0x88+0 | 15 bits | 0xff, 0b1111100 | frame sync        |
+--------+---------+-----------------+-------------------+
| 0x89+7 | 1 bit   | 0b0             | blocking strategy |
+--------+---------+-----------------+-------------------+
| 0x8a+0 | 4 bits  | 0b0110          | 8-bit block size  |
|        |         |                 | further down      |
+--------+---------+-----------------+-------------------+
| 0x8a+4 | 4 bits  | 0b1001          | sample rate 44.1  |
|        |         |                 | kHz               |
+--------+---------+-----------------+-------------------+
| 0x8b+0 | 4 bits  | 0b1001          | side-right stereo |
+--------+---------+-----------------+-------------------+
| 0x8b+4 | 3 bits  | 0b100           | bit depth 16 bits |
+--------+---------+-----------------+-------------------+
| 0x8b+7 | 1 bit   | 0b0             | mandatory 0 bit   |
+--------+---------+-----------------+-------------------+
| 0x8c+0 | 1 byte  | 0x00            | frame number 0    |
+--------+---------+-----------------+-------------------+
| 0x8d+0 | 1 byte  | 0x0f            | block size 16     |
+--------+---------+-----------------+-------------------+
| 0x8e+0 | 1 byte  | 0x99            | frame header CRC  |
+--------+---------+-----------------+-------------------+
Table 36
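To show how the fields of the frame header map onto bits, here is a
Python sketch that decodes the first frame header (0x88 to 0x8e) and,
for comparison, the second one (0xcc to 0xd2). It is deliberately
incomplete: only a 1-byte coded frame number and the block size codes
0b0110 and 0b0111 are handled, and the code tables cover only the
fixed sample rate, bit depth, and channel codes. The function and
table names are hypothetical.

```python
# Frame header bytes of example file 2 (0x88-0x8e and 0xcc-0xd2).
FRAME1_HEADER = bytes.fromhex("fff86998000f99")
FRAME2_HEADER = bytes.fromhex("fff869180102a4")

# Subset of the frame header code tables used by this sketch.
SAMPLE_RATES = {1: 88200, 2: 176400, 3: 192000, 4: 8000, 5: 16000, 6: 22050,
                7: 24000, 8: 32000, 9: 44100, 10: 48000, 11: 96000}
BIT_DEPTHS = {1: 8, 2: 12, 4: 16, 5: 20, 6: 24, 7: 32}
STEREO_MODES = {8: "left/side", 9: "side/right", 10: "mid/side"}


def parse_frame_header(hdr: bytes) -> dict:
    """Decode a frame header; only codes used in this example are handled."""
    if hdr[0] != 0xFF or (hdr[1] & 0xFE) != 0xF8:
        raise ValueError("missing frame sync code")
    bs_code, sr_code = hdr[2] >> 4, hdr[2] & 0x0F
    ch_code, bd_code = hdr[3] >> 4, (hdr[3] >> 1) & 0x07
    if sr_code not in SAMPLE_RATES or bd_code not in BIT_DEPTHS or ch_code > 10:
        raise NotImplementedError("field code not handled by this sketch")
    if hdr[3] & 0x01:
        raise ValueError("mandatory 0 bit is set")
    if hdr[4] >= 0x80:
        raise NotImplementedError("multi-byte coded numbers not handled")
    pos = 5  # first byte after the 1-byte coded frame number
    if bs_code == 0b0110:    # 8-bit (block size - 1) follows
        block_size = hdr[pos] + 1
        pos += 1
    elif bs_code == 0b0111:  # 16-bit (block size - 1) follows
        block_size = int.from_bytes(hdr[pos:pos + 2], "big") + 1
        pos += 2
    else:
        raise NotImplementedError("only block size codes 0b0110/0b0111 handled")
    return {
        "variable_blocksize": bool(hdr[1] & 0x01),
        "frame_number": hdr[4],
        "block_size": block_size,
        "sample_rate": SAMPLE_RATES[sr_code],
        "channels": 2 if ch_code >= 8 else ch_code + 1,
        "stereo_mode": STEREO_MODES.get(ch_code, "independent"),
        "bits_per_sample": BIT_DEPTHS[bd_code],
        "crc8": hdr[pos],
    }


print(parse_frame_header(FRAME1_HEADER)["block_size"])  # 16
```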
The first subframe starts at byte 0x8f and is broken down in the
following table, excluding the coded residual. As this subframe codes
for a side channel, its bit depth is 1 bit higher than the 16 bits of
the stream, i.e., 17 bits. This is most clearly visible in the
unencoded warm-up sample.
+========+=========+=============+===========================+
| Start  | Length  | Contents    | Description               |
+========+=========+=============+===========================+
| 0x8f+0 | 1 bit   | 0b0         | mandatory 0 bit           |
+--------+---------+-------------+---------------------------+
| 0x8f+1 | 6 bits  | 0b001001    | fixed subframe, 1st order |
+--------+---------+-------------+---------------------------+
| 0x8f+7 | 1 bit   | 0b0         | no wasted bits used       |
+--------+---------+-------------+---------------------------+
| 0x90+0 | 17 bits | 0x0867, 0b0 | unencoded warm-up sample  |
+--------+---------+-------------+---------------------------+
Table 37
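The subframe header and the 17-bit warm-up sample can be extracted
with plain shifts and masks. The following Python sketch (hypothetical
names) does this for bytes 0x8f to 0x92 and interprets the warm-up
sample as a 17-bit two's complement number, matching the side
channel's bit depth. It assumes the fixed-predictor subframe types
are 0b001000 to 0b001100, with the predictor order in the low 3 bits.

```python
# Subframe header and warm-up sample of example file 2 (0x8f-0x92).
SUBFRAME_START = bytes.fromhex("12086701")


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


word = int.from_bytes(SUBFRAME_START, "big")  # 32 bits, MSB first
header = word >> 24                           # byte 0x8f
zero_bit = header >> 7                        # mandatory 0 bit
subframe_type = (header >> 1) & 0x3F          # 6-bit subframe type
wasted_flag = header & 0x01                   # wasted bits flag
fixed_order = subframe_type & 0x07 if (subframe_type >> 3) == 0b001 else None
# The 17 bits following the header byte hold the warm-up sample.
warm_up = sign_extend((word >> 7) & 0x1FFFF, 17)

print(zero_bit, fixed_order, wasted_flag, warm_up)  # 0 1 0 4302
```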
The coded residual is broken down in the following table. All
quotients are unary coded (a quotient q is stored as q 0 bits
followed by a 1 bit), and all remainders are stored unencoded with a
number of bits specified by the Rice parameter.
+========+========+=================+=================+
| Start  | Length | Contents        | Description     |
+========+========+=================+=================+
| 0x92+1 | 2 bits | 0b00            | Rice code with  |
|        |        |                 | 4-bit parameter |
+--------+--------+-----------------+-----------------+
| 0x92+3 | 4 bits | 0b0000          | Partition order |
|        |        |                 | 0               |
+--------+--------+-----------------+-----------------+
| 0xaa+5 | 11     | 0b00100001100   | Remainder 268   |
|        | bits   |                 |                 |
+--------+--------+-----------------+-----------------+
Table 38
At this point, the decoder knows it has finished decoding the coded
residual: the block size is 16, and 1 warm-up sample plus 15 residual
samples make up those 16 samples. Each residual sample is calculated
by combining the quotient and remainder (quotient * 2^11 + remainder,
as the Rice parameter is 11) and then undoing the zigzag encoding.
For example, the value of the first zigzag-encoded residual sample is
3 * 2^11 + 244 = 6388. As this is an even number, the zigzag encoding
is undone by dividing by 2, so the residual sample value is 3194. An
odd value n is decoded as -(n + 1) / 2; for example, 2593 becomes
-1297. This is done for all residual samples in the next table.
+==========+===========+================+=======================+
| Quotient | Remainder | Zigzag Encoded | Residual Sample Value |
+==========+===========+================+=======================+
| 3        | 244       | 6388           | 3194                  |
+----------+-----------+----------------+-----------------------+
| 1        | 545       | 2593           | -1297                 |
+----------+-----------+----------------+-----------------------+
| 1        | 408       | 2456           | 1228                  |
+----------+-----------+----------------+-----------------------+
| 0        | 1885      | 1885           | -943                  |
+----------+-----------+----------------+-----------------------+
| 0        | 1904      | 1904           | 952                   |
+----------+-----------+----------------+-----------------------+
| 0        | 1391      | 1391           | -696                  |
+----------+-----------+----------------+-----------------------+
| 0        | 1536      | 1536           | 768                   |
+----------+-----------+----------------+-----------------------+
| 0        | 1047      | 1047           | -524                  |
+----------+-----------+----------------+-----------------------+
| 0        | 1198      | 1198           | 599                   |
+----------+-----------+----------------+-----------------------+
| 0        | 801       | 801            | -401                  |
+----------+-----------+----------------+-----------------------+
| 12       | 1767      | 26343          | -13172                |
+----------+-----------+----------------+-----------------------+
| 0        | 631       | 631            | -316                  |
+----------+-----------+----------------+-----------------------+
| 0        | 548       | 548            | 274                   |
+----------+-----------+----------------+-----------------------+
| 0        | 533       | 533            | -267                  |
+----------+-----------+----------------+-----------------------+
| 0        | 268       | 268            | 134                   |
+----------+-----------+----------------+-----------------------+
Table 39
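The coded residual of Table 38 can be decoded with a bit reader that
reads the most significant bit first. The Python sketch below is a
minimal, hypothetical implementation: it supports only the case used
in this example (coding method 0b00, partition order 0, and no
escaped partition), uses bytes 0x92 to 0xab from the hexadecimal
representation, and starts reading at bit 1 of byte 0x92.

```python
# Bytes 0x92-0xab of example file 2, from the hexadecimal representation.
DATA = bytes.fromhex("01623d1442998f5df70d" "6fe00c17caeb21000ee7a77a" "24a1590c")


class BitReader:
    """Reads bits most significant bit first, from a given bit offset."""

    def __init__(self, data: bytes, bit_pos: int = 0):
        self.data = data
        self.pos = bit_pos

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def read_unary(self) -> int:
        """Count 0 bits until the terminating 1 bit."""
        count = 0
        while self.read_bits(1) == 0:
            count += 1
        return count


def zigzag_decode(value: int) -> int:
    """Even values map to value / 2, odd values to -(value + 1) / 2."""
    return value >> 1 if (value & 1) == 0 else -((value + 1) >> 1)


def decode_residual(reader: BitReader, block_size: int, order: int):
    """Decode a Rice-coded residual with a single partition (sketch only)."""
    if reader.read_bits(2) != 0b00:
        raise NotImplementedError("only the 4-bit Rice parameter method")
    if reader.read_bits(4) != 0:
        raise NotImplementedError("only partition order 0")
    param = reader.read_bits(4)
    if param == 0b1111:
        raise NotImplementedError("escaped partitions are not handled")
    residuals = []
    for _ in range(block_size - order):
        quotient = reader.read_unary()
        remainder = reader.read_bits(param)
        residuals.append(zigzag_decode((quotient << param) | remainder))
    return param, residuals


reader = BitReader(DATA, bit_pos=1)  # the residual starts at 0x92+1
param, residuals = decode_residual(reader, block_size=16, order=1)
print(param, residuals[:4])  # 11 [3194, -1297, 1228, -943]
```

After decoding, the reader sits exactly at the start of byte 0xac,
where the second subframe begins.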
It can be calculated that using a Rice code is, in this case, more
efficient than storing the values unencoded. The Rice code (counting
the 2-bit coding method but excluding the partition order and
parameter) is 199 bits in length: 2 bits for the coding method, 32
bits for the unary-coded quotients (which sum to 17, plus a
terminating 1 bit for each of the 15 samples), and 15 * 11 = 165 bits
for the remainders. The largest residual value (-13172) would need 15
bits to be stored unencoded, so storing all 15 samples with 15 bits
results in a sequence with a length of 225 bits.
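The size comparison above is simple arithmetic and can be checked
with the following Python sketch, which takes the quotients and
residual values from Table 39. The helper names are hypothetical. The
quotients and remainders alone take 197 bits; adding the 2-bit coding
method field gives 199 bits.

```python
# Quotients and residual values of the first subframe, from Table 39.
QUOTIENTS = [3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0]
RESIDUALS = [3194, -1297, 1228, -943, 952, -696, 768, -524,
             599, -401, -13172, -316, 274, -267, 134]
RICE_PARAM = 11


def rice_bits(quotients, param):
    """Bits for unary quotients (q 0 bits plus a 1 bit) and remainders."""
    return sum(q + 1 for q in quotients) + len(quotients) * param


def twos_complement_width(value):
    """Smallest two's complement width that can hold value."""
    return (value if value >= 0 else ~value).bit_length() + 1


rice = rice_bits(QUOTIENTS, RICE_PARAM)
width = max(twos_complement_width(r) for r in RESIDUALS)
print(rice, rice + 2, width, width * len(RESIDUALS))  # 197 199 15 225
```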
The next step is using the predictor and the residuals to restore the
sample values. As this subframe uses a fixed predictor with order 1,
the residual value is added to the value of the previous sample.
+===========+==============+
| Residual  | Sample Value |
+===========+==============+
| (warm-up) | 4302         |
+-----------+--------------+
| 3194      | 7496         |
+-----------+--------------+
| -1297     | 6199         |
+-----------+--------------+
| 1228      | 7427         |
+-----------+--------------+
| -943      | 6484         |
+-----------+--------------+
| 952       | 7436         |
+-----------+--------------+
| -696      | 6740         |
+-----------+--------------+
| 768       | 7508         |
+-----------+--------------+
| -524      | 6984         |
+-----------+--------------+
| 599       | 7583         |
+-----------+--------------+
| -401      | 7182         |
+-----------+--------------+
| -13172    | -5990        |
+-----------+--------------+
| -316      | -6306        |
+-----------+--------------+
| 274       | -6032        |
+-----------+--------------+
| -267      | -6299        |
+-----------+--------------+
| 134       | -6165        |
+-----------+--------------+
Table 40
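Undoing a fixed predictor is a running sum of predictions and
residuals. The Python sketch below (hypothetical names) restores the
first subframe from its warm-up sample and the residuals of Table 39.
For completeness, it also lists the coefficients of the other fixed
predictor orders (0 to 4), applied to the previous samples s[n-1],
s[n-2], and so on.

```python
# Fixed predictor coefficients for s[n-1], s[n-2], ... per predictor order.
FIXED_COEFFS = {0: (), 1: (1,), 2: (2, -1), 3: (3, -3, 1), 4: (4, -6, 4, -1)}


def restore_fixed(warm_up, residuals, order):
    """Undo a fixed predictor: each sample is its prediction plus residual."""
    if len(warm_up) != order:
        raise ValueError("a fixed predictor of order n needs n warm-up samples")
    coeffs = FIXED_COEFFS[order]
    samples = list(warm_up)
    for residual in residuals:
        prediction = sum(c * samples[-1 - i] for i, c in enumerate(coeffs))
        samples.append(prediction + residual)
    return samples


RESIDUALS = [3194, -1297, 1228, -943, 952, -696, 768, -524,
             599, -401, -13172, -316, 274, -267, 134]
side = restore_fixed([4302], RESIDUALS, order=1)
print(side[:5], side[-2:])  # [4302, 7496, 6199, 7427, 6484] [-6299, -6165]
```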
With this, the decoding of the first subframe is complete. The
decoding of the second subframe is very similar, as it also uses a
fixed predictor of order 1, so it is left as an exercise for the
reader; its results are listed in the next table. The next step is
undoing the stereo decorrelation, which is also shown in that table.
As the stereo decorrelation is side-right, the samples in the right
channel come directly from the second subframe, while the samples in
the left channel are found by adding the values of both subframes for
each sample.
+============+============+========+=======+
| Subframe 1 | Subframe 2 | Left   | Right |
+============+============+========+=======+
| 4302       | 6070       | 10372  | 6070  |
+------------+------------+--------+-------+
| -6299      | -8896      | -15195 | -8896 |
+------------+------------+--------+-------+
| -6165      | -8653      | -14818 | -8653 |
+------------+------------+--------+-------+
Table 41
As the second subframe ends byte-aligned, no padding bits follow it.
Finally, the last 2 bytes of the frame contain the frame CRC.
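The following Python sketch undoes stereo decorrelation for the FLAC
stereo modes and applies it to the rows of Table 41 shown above. The
mode names are hypothetical strings; the mid-side branch assumes the
usual reconstruction, in which the bit lost by halving the mid
channel is recovered from the lowest bit of the side channel.

```python
def undo_decorrelation(sub1, sub2, mode):
    """Return (left, right) sample lists from the two decoded subframes."""
    if mode == "independent":
        return list(sub1), list(sub2)
    if mode == "left/side":    # subframe 1 = left, subframe 2 = side
        return list(sub1), [l - s for l, s in zip(sub1, sub2)]
    if mode == "side/right":   # subframe 1 = side, subframe 2 = right
        return [s + r for s, r in zip(sub1, sub2)], list(sub2)
    if mode == "mid/side":     # subframe 1 = mid, subframe 2 = side
        left, right = [], []
        for mid, side in zip(sub1, sub2):
            mid = (mid << 1) | (side & 1)  # restore bit lost by halving
            left.append((mid + side) >> 1)
            right.append((mid - side) >> 1)
        return left, right
    raise ValueError(f"unknown stereo mode {mode!r}")


# First and last two rows of Table 41.
left, right = undo_decorrelation([4302, -6299, -6165],
                                 [6070, -8896, -8653], "side/right")
print(left, right)  # [10372, -15195, -14818] [6070, -8896, -8653]
```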
D.2.8. Second Audio Frame
The second audio frame is very similar to the frame decoded in the
first example, but this time, 3 samples (not 1) are present.
The frame header starts at position 0xcc and is broken down in the
following table.
+========+=========+=================+===================+
| Start  | Length  | Contents        | Description       |
+========+=========+=================+===================+
| 0xcc+0 | 15 bits | 0xff, 0b1111100 | frame sync        |
+--------+---------+-----------------+-------------------+
| 0xcd+7 | 1 bit   | 0b0             | blocking strategy |
+--------+---------+-----------------+-------------------+
| 0xce+0 | 4 bits  | 0b0110          | 8-bit block size  |
|        |         |                 | further down      |
+--------+---------+-----------------+-------------------+
| 0xce+4 | 4 bits  | 0b1001          | sample rate 44.1  |
|        |         |                 | kHz               |
+--------+---------+-----------------+-------------------+
| 0xcf+0 | 4 bits  | 0b0001          | stereo, no        |
|        |         |                 | decorrelation     |
+--------+---------+-----------------+-------------------+
| 0xcf+4 | 3 bits  | 0b100           | bit depth 16 bits |
+--------+---------+-----------------+-------------------+
| 0xcf+7 | 1 bit   | 0b0             | mandatory 0 bit   |
+--------+---------+-----------------+-------------------+
| 0xd0+0 | 1 byte  | 0x01            | frame number 1    |
+--------+---------+-----------------+-------------------+
| 0xd1+0 | 1 byte  | 0x02            | block size 3      |
+--------+---------+-----------------+-------------------+
| 0xd2+0 | 1 byte  | 0xa4            | frame header CRC  |
+--------+---------+-----------------+-------------------+
Table 42
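The frame header CRC can be checked with a bitwise CRC-8. The Python
sketch below assumes the FLAC frame header CRC parameters (polynomial
x^8 + x^2 + x^1 + x^0, initial value 0, no reflection, no final XOR)
and runs over all header bytes preceding the CRC byte, for both frame
headers of this example. A table-driven version would be faster; the
bitwise loop is shown for clarity, and the function name is
hypothetical.

```python
def crc8(data: bytes) -> int:
    """CRC-8 with polynomial 0x07, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Frame header bytes up to (not including) the CRC byte.
FRAME1 = bytes.fromhex("fff86998000f")   # 0x88-0x8d, CRC at 0x8e
FRAME2 = bytes.fromhex("fff869180102")   # 0xcc-0xd1, CRC at 0xd2
print(hex(crc8(FRAME1)), hex(crc8(FRAME2)))  # 0x99 0xa4
```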
The first subframe starts at 0xd3+0 and is broken down in the
following table.
+========+=========+==========+=========================+
| Start  | Length  | Contents | Description             |
+========+=========+==========+=========================+
| 0xd3+0 | 1 bit   | 0b0      | mandatory 0 bit         |
+--------+---------+----------+-------------------------+
| 0xd3+1 | 6 bits  | 0b000001 | verbatim subframe       |
+--------+---------+----------+-------------------------+
| 0xd3+7 | 1 bit   | 0b0      | no wasted bits used     |
+--------+---------+----------+-------------------------+
| 0xd4+0 | 16 bits | 0xc382   | 16-bit unencoded sample |
+--------+---------+----------+-------------------------+
| 0xd6+0 | 16 bits | 0xc40b   | 16-bit unencoded sample |
+--------+---------+----------+-------------------------+
| 0xd8+0 | 16 bits | 0xc14a   | 16-bit unencoded sample |
+--------+---------+----------+-------------------------+
Table 43
The second subframe starts at 0xda+0 and is broken down in the
following table.
+========+=========+===================+=========================+
| Start  | Length  | Contents          | Description             |
+========+=========+===================+=========================+
| 0xda+0 | 1 bit   | 0b0               | mandatory 0 bit         |
+--------+---------+-------------------+-------------------------+
| 0xda+1 | 6 bits  | 0b000001          | verbatim subframe       |
+--------+---------+-------------------+-------------------------+
| 0xda+7 | 1 bit   | 0b1               | wasted bits used        |
+--------+---------+-------------------+-------------------------+
| 0xdb+0 | 1 bit   | 0b1               | 1 wasted bit used       |
+--------+---------+-------------------+-------------------------+
| 0xdb+1 | 15 bits | 0b110111001001000 | 15-bit unencoded sample |
+--------+---------+-------------------+-------------------------+
| 0xdd+0 | 15 bits | 0b110111010000001 | 15-bit unencoded sample |
+--------+---------+-------------------+-------------------------+
| 0xde+7 | 15 bits | 0b110110110011111 | 15-bit unencoded sample |
+--------+---------+-------------------+-------------------------+
Table 44
As this subframe uses wasted bits, the 15-bit unencoded samples need
to be shifted left by 1 bit. For example, sample 1 is stored as
-4536 and becomes -9072 after shifting left 1 bit.
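The sign extension and shift described above can be sketched in Python as follows. This is an illustrative sketch, not part of the format definition. The helper name restore_wasted_bits is hypothetical, and the sketch assumes the raw field has already been read from the bitstream as an unsigned integer.

```python
def restore_wasted_bits(raw: int, stored_bits: int, wasted_bits: int) -> int:
    """Sign-extend a raw two's complement field of `stored_bits` bits and
    shift it left by `wasted_bits` (hypothetical helper for illustration)."""
    sign_bit = 1 << (stored_bits - 1)
    # Subtracting the weight of the sign bit turns the unsigned field into
    # a signed value, e.g. 0b110111001001000 becomes -4536.
    value = (raw & (sign_bit - 1)) - (raw & sign_bit)
    return value << wasted_bits


# Sample 1 of the second subframe: 15 stored bits, 1 wasted bit.
print(restore_wasted_bits(0b110111001001000, 15, 0))  # -4536
print(restore_wasted_bits(0b110111001001000, 15, 1))  # -9072
```

Python integers shift negative values arithmetically, so no extra masking is needed after the shift.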
As the last subframe does not end on byte alignment, 2 padding bits
are added before the 2-byte frame CRC follows at 0xe1+0.
D.2.9. MD5 Checksum Verification
All samples in the file have been decoded, and we can now verify the
MD5 checksum. All sample values must be interleaved and stored as
signed, little-endian integers. The result of this follows in groups
of 12 samples (i.e., 6 interchannel samples) per line.
0x8428 B617 7946 3129 5E3A 2722 D445 D128 0B3D B723 EB45 DF28
0x723F 1E25 9D46 4929 B841 7026 5747 B829 8F43 8127 AEC7 14DF
0x9FC4 41DD 54C7 E4DE A5C4 40DD 1EC6 33DE 82C3 90DC 0BC4 02DD
0x4AC1 3EDB
The MD5 checksum of this is indeed the same as the one found in the
streaminfo metadata block.
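The construction of the MD5 input can be sketched in Python as shown below. The sketch rests on several assumptions. The helper names md5_input and stream_md5 are hypothetical. Samples are assumed to be already decoded into one list per channel. Each sample is written with its bit depth rounded up to whole bytes, which gives 2 bytes per sample for the 16-bit stereo audio of this example.

```python
import hashlib


def md5_input(channels: list[list[int]], bits_per_sample: int) -> bytes:
    """Interleave per-channel samples and pack each one as a signed
    little-endian integer of ceil(bits_per_sample / 8) bytes."""
    width = (bits_per_sample + 7) // 8
    out = bytearray()
    for interchannel_sample in zip(*channels):
        for sample in interchannel_sample:
            out += sample.to_bytes(width, "little", signed=True)
    return bytes(out)


def stream_md5(channels: list[list[int]], bits_per_sample: int) -> str:
    """Hex MD5 digest of the interleaved sample bytes."""
    return hashlib.md5(md5_input(channels, bits_per_sample)).hexdigest()


# Last interchannel sample of this example: left -16054, right -9410.
print(md5_input([[-16054], [-9410]], 16).hex())  # 4ac13edb
```

The sketch only reproduces the byte layout listed above. To complete the check, compare the result of stream_md5 with the 16 bytes stored in the streaminfo block.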
D.3. Decoding Example 3
This example is once again a very short FLAC file. The focus of this
example is on decoding a subframe with a linear predictor and a coded
residual with more than one partition.
D.3.1. Example File 3 in Hexadecimal Representation
00000000: 664c 6143 8000 0022 1000 1000 fLaC..."....
0000000c: 0000 1f00 001f 07d0 0070 0000 .........p..
00000018: 0018 f8f9 e396 f5cb cfc6 dc80 ............
00000024: 7f99 7790 6b32 fff8 6802 0017 ..w.k2..h...
00000030: e944 004f 6f31 3d10 47d2 27cb .D.Oo1=.G.'.
0000003c: 6d09 0831 452b dc28 2222 8057 m..1E+.("".W
00000048: a3 .
D.3.2. Example File 3 in Binary Representation (Only Audio Frame)
0000002a: 11111111 11111000 01101000 00000010 ..h.
0000002e: 00000000 00010111 11101001 01000100 ...D
00000032: 00000000 01001111 01101111 00110001 .Oo1
00000036: 00111101 00010000 01000111 11010010 =.G.
0000003a: 00100111 11001011 01101101 00001001 '.m.
0000003e: 00001000 00110001 01000101 00101011 .1E+
00000042: 11011100 00101000 00100010 00100010 .(""
00000046: 10000000 01010111 10100011 .W.
D.3.3. Streaminfo Metadata Block
Most of the streaminfo metadata block, including its header, is the
same as in example 1, so only parts that are different are listed in
the following table.
+========+==========+====================+==========================+
| Start  | Length   | Contents           | Description              |
+========+==========+====================+==========================+
| 0x0c+0 | 3 bytes  | 0x00001f           | Min. frame size 31 bytes |
+--------+----------+--------------------+--------------------------+
| 0x0f+0 | 3 bytes  | 0x00001f           | Max. frame size 31 bytes |
+--------+----------+--------------------+--------------------------+
| 0x12+0 | 20 bits  | 0x07d0, 0x0000     | Sample rate 32000 hertz  |
+--------+----------+--------------------+--------------------------+
| 0x14+4 | 3 bits   | 0b000              | 1 channel                |
+--------+----------+--------------------+--------------------------+
| 0x14+7 | 5 bits   | 0b00111            | Sample bit depth 8 bits  |
+--------+----------+--------------------+--------------------------+
| 0x15+4 | 36 bits  | 0b0000, 0x00000018 | Total no. of samples 24  |
+--------+----------+--------------------+--------------------------+
| 0x1a   | 16 bytes | (...)              | MD5 checksum             |
+--------+----------+--------------------+--------------------------+
Table 45
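For reference, the whole 34-byte streaminfo body can be decoded with a few shifts. This body is bytes 0x08 through 0x29 of the file, i.e., everything after the 4-byte block header. The Python sketch below is illustrative only, and the function name and dictionary keys are hypothetical. The 20/3/5/36-bit layout of the packed fields follows the Length column of the table above. The minimum and maximum block sizes are the two 16-bit fields that precede the frame sizes.

```python
def parse_streaminfo(body: bytes) -> dict:
    """Decode a 34-byte streaminfo metadata block body (header excluded)."""
    if len(body) != 34:
        raise ValueError("streaminfo body must be 34 bytes")
    # 64 bits: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bit depth - 1), 36-bit total number of samples
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_block_size": int.from_bytes(body[0:2], "big"),
        "max_block_size": int.from_bytes(body[2:4], "big"),
        "min_frame_size": int.from_bytes(body[4:7], "big"),
        "max_frame_size": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        "md5": body[18:34].hex(),
    }


# Bytes 0x08 to 0x29 of example file 3, copied from the hexadecimal dump.
body = bytes.fromhex(
    "1000 1000 0000 1f00 001f 07d0 0070 0000"
    " 0018 f8f9 e396 f5cb cfc6 dc80 7f99 7790 6b32"
)
info = parse_streaminfo(body)
print(info["sample_rate"], info["channels"], info["bits_per_sample"],
      info["total_samples"])  # 32000 1 8 24
```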
D.3.4. Audio Frame
The frame header starts at position 0x2a and is broken down in the
following table.
+========+=========+=================+===================+
| Start  | Length  | Contents        | Description       |
+========+=========+=================+===================+
| 0x2a+0 | 15 bits | 0xff, 0b1111100 | Frame sync        |
+--------+---------+-----------------+-------------------+
| 0x2b+7 | 1 bit   | 0b0             | blocking strategy |
+--------+---------+-----------------+-------------------+
| 0x2c+0 | 4 bits  | 0b0110          | 8-bit block size  |
|        |         |                 | further down      |
+--------+---------+-----------------+-------------------+
| 0x2c+4 | 4 bits  | 0b1000          | Sample rate 32    |
|        |         |                 | kHz               |
+--------+---------+-----------------+-------------------+
| 0x2d+0 | 4 bits  | 0b0000          | Mono audio (1     |
|        |         |                 | channel)          |
+--------+---------+-----------------+-------------------+
| 0x2d+4 | 3 bits  | 0b001           | Bit depth 8 bits  |
+--------+---------+-----------------+-------------------+
| 0x2d+7 | 1 bit   | 0b0             | Mandatory 0 bit   |
+--------+---------+-----------------+-------------------+
| 0x2e+0 | 1 byte  | 0x00            | Frame number 0    |
+--------+---------+-----------------+-------------------+
| 0x2f+0 | 1 byte  | 0x17            | Block size 24     |
+--------+---------+-----------------+-------------------+
| 0x30+0 | 1 byte  | 0xe9            | Frame header CRC  |
+--------+---------+-----------------+-------------------+
Table 46
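The fields of this header can be decoded with a short Python sketch such as the one below. This includes the CRC-8 check over bytes 0x2a to 0x2f. The sketch is deliberately partial, and the function names are hypothetical. It handles only the single-byte frame number and the 8-bit block size code used in this example. The sample rate and bit depth are returned as raw codes: per the table above, 0b1000 means 32 kHz and 0b001 means 8 bits. The CRC uses the polynomial x^8 + x^2 + x^1 + x^0 with an initial value of 0.

```python
def crc8(data: bytes) -> int:
    """CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def parse_frame_header(buf: bytes) -> dict:
    """Partial frame header decoder covering the codes used in this example."""
    if buf[0] != 0xFF or buf[1] >> 1 != 0b1111100:
        raise ValueError("frame sync not found")
    block_size_code, sample_rate_code = buf[2] >> 4, buf[2] & 0x0F
    channel_code, bit_depth_code = buf[3] >> 4, (buf[3] >> 1) & 0x7
    if buf[3] & 1 or channel_code > 0b1010:
        raise ValueError("reserved bit or channel code")
    if buf[4] & 0x80:
        raise NotImplementedError("multi-byte coded frame numbers")
    if block_size_code != 0b0110 or sample_rate_code in (0b1100, 0b1101, 0b1110):
        raise NotImplementedError("block size or sample rate code not covered")
    block_size = buf[5] + 1  # the 8-bit field stores the block size minus 1
    if crc8(buf[:6]) != buf[6]:
        raise ValueError("frame header CRC mismatch")
    return {
        "blocking_strategy": buf[1] & 1,
        "block_size": block_size,
        "sample_rate_code": sample_rate_code,
        "channels": channel_code + 1 if channel_code < 8 else 2,
        "bit_depth_code": bit_depth_code,
        "frame_number": buf[4],
    }


header = bytes.fromhex("fff868020017e9")  # bytes 0x2a to 0x30
print(parse_frame_header(header)["block_size"])  # 24
print(hex(crc8(header[:6])))  # 0xe9
```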
The first and only subframe starts at byte 0x31. It is broken down
in the following table, without the coded residual.
+========+========+==========+=====================+
| Start  | Length | Contents | Description         |
+========+========+==========+=====================+
| 0x31+0 | 1 bit  | 0b0      | Mandatory 0 bit     |
+--------+--------+----------+---------------------+
| 0x31+1 | 6 bits | 0b100010 | Linear prediction   |
|        |        |          | subframe, 3rd order |
+--------+--------+----------+---------------------+
| 0x31+7 | 1 bit  | 0b0      | No wasted bits used |
+--------+--------+----------+---------------------+
| 0x32+0 | 8 bits | 0x00     | Unencoded warm-up   |
|        |        |          | sample 0            |
+--------+--------+----------+---------------------+
| 0x33+0 | 8 bits | 0x4f     | Unencoded warm-up   |
|        | bits   |          |                                      |
+--------+--------+----------+--------------------------------------+
| 0x42+7 | 4 bits | 0b0001   | Rice parameter 1                     |
+--------+--------+----------+--------------------------------------+
| 0x43+3 | 23     | (...)    | Residual partition 4                 |
|        | bits   |          |                                      |
+--------+--------+----------+--------------------------------------+
Table 48
The frame ends with 6 padding bits and a 2-byte frame CRC.
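As an illustration of Rice decoding, consider the 23 bits of residual partition 4. In the binary representation above, they run from 0x43+3 up to and including 0x46+1. They can be decoded with the Rice parameter 1 that is read from 0x42+7. In the Python sketch below, the function and variable names are hypothetical, and escape codes are not handled. The count of 6 residuals is an assumption here, derived from a block size of 24 split into 4 partitions.

```python
def read_rice_partition(bits: str, count: int, k: int) -> tuple[list[int], int]:
    """Decode `count` Rice-coded residuals with parameter `k` from a string
    of '0'/'1' characters. Returns the signed residuals and bits consumed."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "0":  # unary quotient: zeros ended by a 1 bit
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 1 bit
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        folded = (quotient << k) | remainder
        # Even folded values map to 0, 1, 2, ... and odd ones to -1, -2, ...
        values.append(folded >> 1 if folded % 2 == 0 else -((folded + 1) >> 1))
    return values, pos


# Bytes 0x42 to 0x46 of example file 3 as a bit string.
bitstring = "".join(f"{b:08b}" for b in bytes.fromhex("dc28222280"))
k = int(bitstring[7:11], 2)         # 0x42+7, 4 bits: Rice parameter
partition4 = bitstring[11:11 + 23]  # 0x43+3, 23 bits
print(k, read_rice_partition(partition4, 6, k))  # 1 ([1, 4, 2, 2, 2, 0], 23)
```

All 23 bits are consumed. The last three values (2, 2 and 0) are the residuals in the final three rows of Table 49.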
To decode this subframe, 21 predictions have to be calculated and
added to their corresponding residuals. This is a sequential
process: as each prediction uses previous samples, it is not possible
to start this decoding halfway through a subframe or decode a
subframe with parallel threads.
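The sequential nature of this process is visible in a minimal Python sketch of the reconstruction. The function name restore_lpc is hypothetical. The coefficients 7, -6 and 2 and the shift of 2 are those of this subframe. The first coefficient is applied to the most recent sample, as in the calculation for row 4 of Table 49. The sketch uses Python's >> operator because it rounds toward negative infinity, which matches values in that table such as -26 >> 2 = -7.

```python
def restore_lpc(warmup: list[int], coefs: list[int], shift: int,
                residuals: list[int]) -> list[int]:
    """Rebuild a subframe from its warm-up samples and residuals.
    coefs[0] multiplies the most recent sample, coefs[1] the one before it."""
    samples = list(warmup)
    for residual in residuals:
        # Each prediction depends on samples decoded in earlier iterations,
        # so this loop cannot be split across threads.
        acc = sum(c * samples[-1 - j] for j, c in enumerate(coefs))
        samples.append((acc >> shift) + residual)  # >> floors: -26 >> 2 == -7
    return samples


print(restore_lpc([0, 79, 111], [7, -6, 2], 2, [3, -1]))  # [0, 79, 111, 78, 8]
```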
The following table breaks down the calculation for each sample. For
example, the predictor value without shift in row 4 is found by
applying the predictor coefficients to the three warm-up samples:
7*111 - 6*79 + 2*0 = 303. This value is then shifted right by 2 bits:
303 >> 2 = 75. Then, the decoded residual sample is added: 75 + 3 = 78.
+===========+=====================+===========+==============+
| Residual  | Predictor w/o Shift | Predictor | Sample Value |
+===========+=====================+===========+==============+
| (warm-up) | N/A                 | N/A       | 0            |
+-----------+---------------------+-----------+--------------+
| (warm-up) | N/A                 | N/A       | 79           |
+-----------+---------------------+-----------+--------------+
| (warm-up) | N/A                 | N/A       | 111          |
+-----------+---------------------+-----------+--------------+
| 3         | 303                 | 75        | 78           |
+-----------+---------------------+-----------+--------------+
| -1        | 38                  | 9         | 8            |
+-----------+---------------------+-----------+--------------+
| 2         | -24                 | -6        | -4           |
+-----------+---------------------+-----------+--------------+
| 2         | -26                 | -7        | -5           |
+-----------+---------------------+-----------+--------------+
| 0         | 1                   | 0         | 0            |
+-----------+---------------------+-----------+--------------+
Table 49
By lining up all these samples, we get the following input for the
MD5 checksum calculation process:
0x004F 6F4E 08C3 A6BC F32A 4335 0DE5 D2DA F40E 1813 06FC FB00
This indeed results in the MD5 checksum found in the streaminfo
metadata block.
FLAC owes much to the many people who have advanced the audio
compression field so freely. For instance:
* A. J. Robinson: He worked on Shorten, and his paper (see
[Robinson-TR156]) is a good starting point on some of the basic
methods used by FLAC. FLAC trivially extends and improves the
fixed predictors, LPC coefficient quantization, and Rice coding
used in Shorten.
* S. W. Golomb and Robert F. Rice: Their universal codes are used by
FLAC's entropy coder. See [Rice].
* N. Levinson and J. Durbin: The FLAC reference encoder uses an
algorithm developed and refined by them for determining the LPC
coefficients from the autocorrelation coefficients. See
* Claude Shannon: See [Shannon].
The FLAC format, the FLAC reference implementation, and the initial
draft version of this document were originally developed by Josh
Coalson. While many others have contributed since, this original
effort is deeply appreciated.
Authors' Addresses
Martijn van Beurden
Netherlands
Email: mvanb1@gmail.com
Andrew Weaver
Email: theandrewjw@gmail.com