SOUND FILE FORMATS
==================

Covered in this document are:
* RIFF (.WAV file format)
* The .VOC file format (Creative Inc.)
* The .MOD file format
* Logarithmic Delta Compression
* Notes on sampling frequencies



RIFF (.WAV file format)
-----------------------
* Covered in Microsoft Windows Multimedia SDK, Microsoft Windows 3.1 SDK, IBM
  OS/2 redbooks and Microsoft Developer Network CD (search for ADPCM). The RIFF
  subset that describes sampled audio is compatible between Microsoft Windows
  3.1 and OS/2 2.x.
* Many sampling formats fit in the standard (i.e. the standard allows these
  formats), but only a few are "encouraged". The only format that is supported
  by all drivers and programs is PCM. 8-bit PCM is unsigned (0..255, 128 is
  silence), 16-bit PCM is signed (-32768..32767, 0 is silence). The reason for
  this discrepancy is probably the way hardware devices were implemented.
* In practice, many applications only support a fixed subset of the flexible
  RIFF format. Because few applications parse the flexible "chunked" header
  (similar to that of TIFF) and most assume a fixed header instead, it is
  best to adhere to that fixed layout. This also means that extensions (such
  as compression) will not be handled well by most existing software.
* Although any sample rate can be specified in the header of the RIFF file,
  only 11025 Hz, 22050 Hz and 44100 Hz are supported by many drivers. See also
  the notes on sampling frequencies below.
* When seen as a fixed header, the header of the .WAV file format is:

  // assumes a 16-bit "int" and a 32-bit "long"; all values are little-endian
  struct WAVEFMT {
    char signature[4];  // must contain 'RIFF'
    long RIFFsize;      // size of file (in bytes) minus 8
    char type[4];       // must contain 'WAVE'
    char fmtchunk[4];   // must contain 'fmt ' (including blank)
    long fmtsize;       // size of format chunk, must be 16
    int  format;        // normally 1 (PCM)
    int  channels;      // number of channels, 1=mono, 2=stereo
    long samplerate;    // sampling frequency: 11025, 22050 or 44100
    long average_bps;   // average bytes per second; samplerate * align
    int  align;         // bytes per sample frame; channels * bitspersample/8
    int  bitspersample; // should be 8 or 16
    char datchunk[4];   // must contain 'data'
    long samples;       // size of the sample data in bytes
  };
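
Because the sizes of "int" and "long" differ between compilers, it can be
safer to assemble the fixed header byte by byte than to write the struct
directly. The following is a minimal sketch of that approach, assuming
uncompressed 8-bit or 16-bit PCM. The function name wav_write_header and its
helpers are illustrative only; they are not part of any SDK.

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xFFFF));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Fill the 44-byte fixed header. data_bytes is the size of the sample
 * data that follows the header. (Hypothetical helper.) */
void wav_write_header(uint8_t hdr[44], uint16_t channels, uint32_t samplerate,
                      uint16_t bitspersample, uint32_t data_bytes)
{
    /* bytes per sample frame (one sample for every channel) */
    uint16_t align = (uint16_t)(channels * (bitspersample / 8));

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);       /* file size minus 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                   /* size of format chunk */
    put_le16(hdr + 20, 1);                    /* format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, samplerate);
    put_le32(hdr + 28, samplerate * align);   /* average bytes per second */
    put_le16(hdr + 32, align);
    put_le16(hdr + 34, bitspersample);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

For 16-bit stereo at 22050 Hz, this sketch stores a frame size of 4 bytes and
88200 bytes per second.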



The .VOC file format (Creative Inc.)
------------------------------------
Creative .VOC files start with a header with the following format:

offset  size    description
---------------------------------------------------------------------
00h     14h     Contains the string "Creative Voice File" plus an EOF byte.
14h     2       The file offset to the sample data. This value usually is
                001Ah.
16h     2       Version number. The major version is in the high byte, the
                minor version in the low byte.
18h     2       Validity check. This word contains the complement (NOT
                operation) of the version word at offset 16h, plus 1234h
                (truncated to 16 bits).
1Ah     ...     Start of the sample data.

All 16-bit and 24-bit values are stored in Little Endian (Intel format).
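
The header checks in the table above can be sketched as follows. This is
only an illustration; the function name voc_check_header is hypothetical.
For version 1.10 (010Ah), the validity word works out to 1129h.

```c
#include <stdint.h>
#include <string.h>

/* Read a 16-bit little-endian value. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the file offset of the first data block, or 0 if the 26-byte
 * header is not a valid .VOC header. (Hypothetical helper.) */
uint16_t voc_check_header(const uint8_t hdr[26])
{
    uint16_t data_offset, version, check;

    /* 19 characters of text plus the EOF byte (1Ah) */
    if (memcmp(hdr, "Creative Voice File\x1A", 20) != 0)
        return 0;

    data_offset = get_le16(hdr + 0x14);
    version     = get_le16(hdr + 0x16);
    check       = get_le16(hdr + 0x18);

    /* complement of the version plus 1234h, truncated to 16 bits */
    if (check != (uint16_t)(~version + 0x1234))
        return 0;

    return data_offset;
}
```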

Audio data is split in blocks (often there is only one data block in the file).
Blocks start with a four byte header. The first byte of this block header is
the "block type". The other three bytes give the length of the block excluding
the header. The terminator block is an exception: only the "block type" byte
is stored (the "data length" bytes are absent).
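
A block header, including the one-byte terminator exception, could be read
along these lines. The struct and function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded block header (hypothetical type). */
struct voc_block {
    uint8_t  type;      /* block type */
    uint32_t length;    /* number of data bytes after the header */
};

/* Parse the block header at the start of buf, where avail is the number
 * of bytes available. Returns the number of header bytes consumed (1 for
 * a terminator block, 4 otherwise), or 0 if the buffer is too short. */
size_t voc_read_block_header(const uint8_t *buf, size_t avail,
                             struct voc_block *blk)
{
    if (avail < 1)
        return 0;
    blk->type = buf[0];
    blk->length = 0;
    if (blk->type == 0)         /* terminator: no length bytes stored */
        return 1;
    if (avail < 4)
        return 0;
    /* 24-bit length, little-endian */
    blk->length = (uint32_t)buf[1]
                | ((uint32_t)buf[2] << 8)
                | ((uint32_t)buf[3] << 16);
    return 4;
}
```

To walk through a file, a reader would skip the returned header size plus
blk->length bytes to reach the next block.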

type  description
---------------------------------------------------------------------
0     Terminator
      No extra data. According to the specification, this block should be
      present in the "in memory" representation of the voice file, but it
      is absent in most .VOC files.

1     Voice data
      The first two bytes of extra data give the playback speed and the
      compression mode. The rest of the extra data are the encoded samples.
        byte: playback speed = 256 - 1000000/sample_frequency
        byte: compression
                0 = none (8 bits unsigned PCM samples)
                1 = 4-bits packed (two 4-bit samples per byte)
                2 = 2.67 bits packed (two 3-bit samples and one 2-bit sample)
                3 = 2 bits packed (four 2-bit samples per byte)

2     Voice cont.
      This block contains only samples, no data on playback speed or
      compression. Therefore, a "Voice data" block must have preceded this
      block.

3     Silence
      Three bytes extra data:
        word: silence period in samples minus one (i.e. for the exact period
              you must add 1 to the value found here).
        byte: playback speed, see block #1 ("Voice Data").

4     Mark
      Two bytes extra data:
        word: mark value.
      This block updates an internal status variable in the driver. This
      variable can be queried by the application software, for example to
      synchronize the sound with animation.

5     Text
      The extra data of this block stores a zero-terminated string. This
      block is only for additional information of the application software,
      it is ignored by the driver.

6     Repeat start
      Two bytes of extra data:
        word: the repeat count. The voice data is played once more than
              indicated in this count.
      All voice data blocks between this block and a block #7 ("Repeat End")
      are played count+1 times. Repeats cannot be nested.

7     Repeat end
      No extra data.

8     Extra info
      Must be followed by block type 1 and supersedes the playback information
      in that block.
      Four bytes with extra data:
        2 bytes: playback speed = 65536 - 256000000/sample_frequency
        1 byte:  compression, see block type 1
        1 byte:  mode, 0 = mono, 1 = stereo
      In case of stereo samples, the compression must be 0 (none) and the
      frequency calculated from the first 2 bytes must be halved.
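
The two playback speed formulas (block types 1 and 8) can be inverted to
recover the sample rate. The sketch below uses integer arithmetic with
truncating division, which is an assumption; the function names are
illustrative. It also shows why a one-byte speed cannot represent every
rate exactly: a request for 11025 Hz yields the speed byte 166, which plays
back at 11111 Hz.

```c
#include <stdint.h>

/* Sample rate for the one-byte playback speed of block types 1 and 3. */
uint32_t voc_rate_from_tc8(uint8_t tc)
{
    return (uint32_t)(1000000UL / (uint32_t)(256 - tc));
}

/* One-byte playback speed for a requested sample rate.
 * Assumes 3906 <= rate <= 1000000 so that the result fits in a byte. */
uint8_t voc_tc8_from_rate(uint32_t rate)
{
    return (uint8_t)(256 - 1000000UL / rate);
}

/* Per-channel sample rate for the two-byte playback speed of block
 * type 8; mode is 0 for mono and 1 for stereo. */
uint32_t voc_rate_from_tc16(uint16_t tc, uint8_t mode)
{
    uint32_t rate = (uint32_t)(256000000UL / (65536UL - tc));

    return mode ? rate / 2 : rate;   /* stereo: halve the frequency */
}
```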

Through its block structure, .VOC files support "silence compression". Most
.VOC players expect only a single "voice data" block however. This is also the
way most .VOC files are: a header, optionally an "extra info" block for stereo
playback and a "voice data" block.

The more powerful compression scheme used by .VOC files is an ADPCM variant,
called "packing". In fact, it is not ADPCM, because it stores the samples
instead of the differences between the samples. The data can be packed into
2-bit, 2.67-bit or 4-bit samples (2.67 bit, or 2.5 bit as it is sometimes
referred to, means that three samples are packed into one byte; two 3-bit codes
and one 2-bit code). Only the format of 4-bit packing is covered here.

For 4-bit packing, two samples are stored in one byte (so one sample per
nibble). Each nibble has one sign bit (bit 3) and three magnitude bits (0..2).
The value in the magnitude bits (in the range 0..7) is multiplied with a "step"
value that ranges from 1 to 10. The result is a signed PCM value between -70
and 70.

Depending on the value of the magnitude bits, the step value is incremented or
decremented. It is incremented if the magnitude is 5 or above and it is
decremented if the magnitude is 0. Valid values for the step value are 1, 2.5,
5 and 10.



The .MOD file format
--------------------
Modules contain 4-channel music. The Commodore Amiga, for which the .MOD file
format was developed, has four sound channels with independent sample rates
(no mixing occurs). Two of these channels are played on the left speaker, two
on the right one.

A "song" contains information about what note to play at each moment. It does
not contain the actual sound data. A "module" is a song with the sampled sound
data concatenated. This sampled data is stored in "instruments". Each
instrument is previously recorded and stored as a ".SAM" file.

A song contains multiple "patterns". Patterns are played back in order. Each
pattern contains 64 "notes" for all four channels. The notes are played back
sequentially as well. Each note contains information about the sample to play
and extra information. A note takes 4 bytes, a pattern thus takes 1024 bytes.
At the normal tempo, about 8 notes are played per second.

There is no standard sample rate for the samples in the modules. Often, the
sound data is sampled at a rate called C-3. This rate is 16,574 Hz on an Amiga
PAL machine, or 16,727 Hz on an Amiga NTSC machine. The instrument must play
a C-3 note of 911 Hz to be in tune. The sample rate C-2 (8,287 Hz for PAL) is
apparently even more popular (the instrument must play a C-2, 456 Hz). For
percussion, the sample rate sometimes is A-3 (about 28 kHz). (The Amiga timers
are connected to the video circuit. The origins of the .MOD file format lie in
Europe, so much of the timing is based on PAL.) The different sampling rates
for PAL and NTSC versions of the Amiga can give tuning errors between samples.
Software based "finetuning" tries to solve this as much as possible.

The original Amiga SoundTracker song format from 1988 by Karsten Obarski has
been expanded by several competing products, especially Noisetracker by
Mahoney and Kaktus. Some of these have incompatible file formats. This document
describes the file format of the original SoundTracker and of the ProTracker
extensions. Most .MOD files adhere to these standards.

offset  size    description
---------------------------------------------------------------------
00h     14h     Song name. Padded with zero bytes if less than 20 characters.
                If the song name is 20 characters, there is no terminating
                zero byte.

14h     16h     Name (instrument name) for sample 1. Padded with zero bytes.
2ah     2       Length for sample 1 stored as the number of 16-bit words.
                Multiply by 2 to get the length in bytes. All 16-bit values in
                the .MOD file format are encoded in big-endian (Motorola)
                format.
2ch     1       Finetune value. Trackers that support finetuning use a fixed
                relationship between notes (displayed on the screen) and pitch
                (note frequency). These trackers have a problem when combining
                instrument samples that are slightly out of tune with other
                instruments. Finetuning allows the use of a different table
                for playback than the table that is used for displaying notes.
                value:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
                finetune: 0 +1 +2 +3 +4 +5 +6 +7 -8 -7 -6 -5 -4 -3 -2 -1
                The pitches of the instrument should be multiplied with
                2^(-finetune/12/8); 1/8 of a semitone, 12 semitones per octave.
2dh     1       Volume in the range 0..40h (or 0..64). Although volume should
                be logarithmic, dB=20*log10(volume/64), most trackers and
                modules assume a linear scale between 64 (full volume) and 0
                (silence).
2eh     2       Repeat point for sample 1, stored as a word offset from the
                start of the sample. Multiply by 2 to get the offset in bytes.
30h     2       Repeat Length for sample 1, stored as the number of words in
                the loop. Multiply by 2 to get the length in bytes.

32h     1a4h    Information for the 14 or 30 other samples starts here. It is
                the same as the information for sample 1. The original
                SoundTracker had a maximum of 15 samples (instruments). The
                newer Noisetracker has a maximum of 31 samples.
                32h     1eh  Information for sample 2
                50h     1eh  Information for sample 3
                .
                .
                19ah    1eh  Information for sample 14
                1b8h    1eh  Information for sample 15
                .
                .
                37ah    1eh  Information for sample 30
                398h    1eh  Information for sample 31
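
One way to read a 30-byte sample record is sketched below. The struct and
function names are hypothetical. Masking the finetune byte to its low four
bits and clamping the volume to 64 are defensive assumptions, not something
the format description requires.

```c
#include <stdint.h>
#include <string.h>

/* Decoded sample (instrument) record; hypothetical type. */
struct mod_sample {
    char     name[23];        /* 22 bytes plus an added terminator */
    uint32_t length;          /* in bytes */
    int      finetune;        /* -8..+7 */
    uint8_t  volume;          /* 0..64 */
    uint32_t repeat_start;    /* in bytes */
    uint32_t repeat_length;   /* in bytes */
};

/* Read a 16-bit big-endian (Motorola) value. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* rec points at the 30 bytes of one record, e.g. offset 14h for sample 1. */
void mod_read_sample(const uint8_t rec[30], struct mod_sample *s)
{
    int ft = rec[24] & 0x0F;            /* assumption: ignore upper bits */

    memcpy(s->name, rec, 22);
    s->name[22] = '\0';
    s->length        = 2UL * get_be16(rec + 22);   /* words to bytes */
    s->finetune      = (ft < 8) ? ft : ft - 16;    /* 8..15 mean -8..-1 */
    s->volume        = (rec[25] > 64) ? 64 : rec[25];
    s->repeat_start  = 2UL * get_be16(rec + 26);
    s->repeat_length = 2UL * get_be16(rec + 28);
}
```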

The offsets below are valid for .MOD files with 15 samples. For "M.K."
files with 31 samples, see the next table.
---------------------------------------------------------------------
1d6h    1       Number of song positions in the song (the number of entries in
                the table at offset 1d8h), in the range 1..128.
1d7h    1       Free (for the original version of the SoundTracker), but
                usually set to 127 (for unknown reasons).
1d8h    80h     Song positions (0 to 127), each song position holds a value
                in the range 0..63 that indicates the pattern to play at that
                position.
258h    400h    First pattern (all other patterns follow). You must determine
                how many different patterns there are by looking up the maximum
                value in the "song positions" table at offset 1d8h. Since
                a pattern can be repeated, you cannot use the value at offset
                1d6h.

The offsets below are valid for .MOD files with 31 samples.
---------------------------------------------------------------------
3b6h    1       Number of song positions in the song (the number of entries in
                the table at offset 3b8h), in the range 1..128.
3b7h    1       Mostly set to 127 (but ignored by most trackers).
3b8h    80h     Song positions (0 to 127), each song position holds a value
                in the range 0..63 that indicates the pattern to play at that
                position.
438h    4       The four characters "M.K.". This signature is used to check
                whether the file is a 15-samples .MOD file or a 31-samples
                file. If the file is smaller than 43ch bytes or if it doesn't
                have the "M.K." signature on this position, the file must be
                assumed a 15-samples file.
                Even from newer files, this signature may have been altered,
                sometimes by newer trackers (with slightly different file
                formats) and sometimes to make "ripping" (extracting sound
                tracks from games) more difficult. Alternative signatures with
                the same file format are "M&K!" and "FLT4".
43ch    400h    First pattern (all other patterns follow). You must determine
                how many different patterns there are by looking up the maximum
                value in the "song positions" table at offset 3b8h. Since
                a pattern can be repeated, you cannot use the value at offset
                3b6h.

In a module file, the samples are stored right after the pattern data. Use the
sample info structures (from offset 14h) to calculate the sample start and
end addresses.
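
The two header layouts and the pattern count can be combined into one
routine that locates the sample data. This is a sketch under two
assumptions: all 128 song position entries are scanned (not only the used
ones), and only the three signatures mentioned above are recognized. All
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File layout derived from the header; hypothetical type. */
struct mod_layout {
    int    num_samples;      /* 15 or 31 */
    int    num_patterns;     /* highest pattern number used, plus one */
    size_t pattern_offset;   /* 258h or 43ch */
    size_t sample_offset;    /* start of the sample data */
};

/* Returns 0 on success, -1 if the file is too short to hold the header. */
int mod_get_layout(const uint8_t *file, size_t size, struct mod_layout *lay)
{
    static const char *const sigs[] = { "M.K.", "M&K!", "FLT4" };
    size_t table, i;
    int max_pattern = 0;

    /* 31 samples only if the file is long enough and has a signature */
    lay->num_samples = 15;
    if (size >= 0x43C) {
        for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
            if (memcmp(file + 0x438, sigs[i], 4) == 0)
                lay->num_samples = 31;
    }

    table = (lay->num_samples == 31) ? 0x3B8 : 0x1D8;
    lay->pattern_offset = table + 0x80 + ((lay->num_samples == 31) ? 4 : 0);
    if (size < lay->pattern_offset)
        return -1;

    /* the highest pattern number in the song positions table */
    for (i = 0; i < 0x80; i++)
        if (file[table + i] > max_pattern)
            max_pattern = file[table + i];

    lay->num_patterns = max_pattern + 1;
    lay->sample_offset = lay->pattern_offset
                       + (size_t)lay->num_patterns * 0x400;
    return 0;
}
```

The data for sample 1 then starts at sample_offset, and each following
sample starts where the previous one ends, using the lengths from the
sample info structures.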

Each note is stored as 4 bytes, and all four notes at each slot position in
the pattern are stored after each other.

00 -  chan1  chan2  chan3  chan4
01 -  chan1  chan2  chan3  chan4
02 -  chan1  chan2  chan3  chan4
etc.

Contents for each note:

 _____byte 1_____   byte2_    _____byte 3_____       byte4_
/                \ /      \  /                \     /      \
0000          0000-00000000  0000          0000  -  00000000

Upper four    12 bits for    Lower four    Effect   Effect
bits of sam-  note pitch.    bits of sam-  code.    parameter.
ple number.                  ple number.
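
The bit layout above can be unpacked as in the following sketch (the struct
and function names are illustrative):

```c
#include <stdint.h>

/* One decoded note; hypothetical type. */
struct mod_note {
    uint8_t  sample;   /* sample number, 0 = keep the previous sample */
    uint16_t pitch;    /* 12-bit pitch value, 0 = no new note */
    uint8_t  effect;   /* effect code, 0..15 */
    uint8_t  param;    /* effect parameter */
};

void mod_decode_note(const uint8_t b[4], struct mod_note *n)
{
    /* upper nibble of byte 1 and upper nibble of byte 3 */
    n->sample = (uint8_t)((b[0] & 0xF0) | (b[2] >> 4));
    /* lower nibble of byte 1 followed by all of byte 2 */
    n->pitch  = (uint16_t)(((b[0] & 0x0F) << 8) | b[1]);
    n->effect = (uint8_t)(b[2] & 0x0F);
    n->param  = b[3];
}
```

For example, the bytes 10h D6h 2Ch 40h decode to sample 12h, pitch 214,
effect C and parameter 40h.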

Pitch table for Tuning 0 (i.e. no finetuning)
  C-0 to B-0 :1712,1616,1525,1440,1359,1283,1211,1143,1078,1018, 961, 907
  C-1 to B-1 : 856, 808, 762, 720, 678, 640, 604, 570, 538, 508, 480, 453
  C-2 to B-2 : 428, 404, 381, 360, 339, 320, 302, 285, 269, 254, 240, 226
  C-3 to B-3 : 214, 202, 190, 180, 170, 160, 151, 143, 135, 127, 120, 113
  C-4 to B-4 : 107, 101,  95,  90,  85,  80,  76,  72,  68,  64,  60,  57

If the sample value for a note is 0, use the previous sample for that channel.
Similarly, if the note pitch is 0, the previous note continues to play (the
note is *not* restarted; to restart a note, the pitch must be set or the
"restart" effect must be given). The same also applies to effects.

To determine what note to show, scan through the table until you find the same
pitch as the one stored in bytes 1-2 of each note. Use the index to look up in
a notenames table. In modules that support finetuning, the note pitch will
always match one of the entries in the table. Otherwise, the pitch in bytes 1-2
may differ from the note table for one of several reasons:
- sampling frequencies differ between PAL and NTSC versions of the Amiga
- samples for different instruments may have been sampled at different
  frequencies
- samples for different instruments may be out of tune in relation to each
  other
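
Because the stored pitch may not match the table exactly, a lookup can pick
the closest entry instead of requiring an exact match. The sketch below does
that with a simple linear distance, which is an assumption (a logarithmic
distance would be more accurate musically). The sharp names such as "C#" and
the function name are also illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Pitch table for tuning 0, octaves 0 to 4. */
static const int mod_pitch[60] = {
    1712,1616,1525,1440,1359,1283,1211,1143,1078,1018, 961, 907,
     856, 808, 762, 720, 678, 640, 604, 570, 538, 508, 480, 453,
     428, 404, 381, 360, 339, 320, 302, 285, 269, 254, 240, 226,
     214, 202, 190, 180, 170, 160, 151, 143, 135, 127, 120, 113,
     107, 101,  95,  90,  85,  80,  76,  72,  68,  64,  60,  57
};

static const char *const mod_names[12] = {
    "C-", "C#", "D-", "D#", "E-", "F-", "F#", "G-", "G#", "A-", "A#", "B-"
};

/* Store the name of the note closest to pitch in out (e.g. "C-3") and
 * return its index in the table (0..59). */
int mod_note_name(int pitch, char out[4])
{
    int i, best = 0;

    for (i = 1; i < 60; i++)
        if (abs(mod_pitch[i] - pitch) < abs(mod_pitch[best] - pitch))
            best = i;
    snprintf(out, 4, "%s%d", mod_names[best % 12], best / 12);
    return best;
}
```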

The ProTracker and many other trackers only support notes between C-1 and B-3.
Some modules use lower and higher notes, however, so the table has been
expanded by two octaves. Also, several entries in the table are not rounded
correctly. Since ProTracker uses exactly these values, we chose to reproduce
the table as is.

The pitch for the note actually determines the time between outputting samples.
For the Amiga, the value of the pitch is the number of clock ticks that a 3.5
MHz timer must count down between outputting two samples.
The exact formulas are:

              3546894.6
SampleRate = -----------        (For a PAL machine)
                Pitch

              3579545.3
SampleRate = -----------        (For a NTSC machine)
                Pitch

Thus, the pitch 214 (for note C-3) corresponds to 16,574 Hz, using the PAL
formula. Each second 16,574 sampled data bytes for the instrument are pushed
through the D/A converter. (Actually, with most samplers the D/A conversion
frequency is fixed and real-time resampling takes place.)
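
In code, the two formulas reduce to a single division. The second function
is a hedged illustration of the resampling remark: when the output device
runs at a fixed rate, the player advances through the sample data by a
fractional step per output sample. Both function names are made up.

```c
#define AMIGA_CLOCK_PAL   3546894.6
#define AMIGA_CLOCK_NTSC  3579545.3

/* Sample rate in Hz for a pitch value; ntsc is nonzero for NTSC timing. */
double mod_sample_rate(int pitch, int ntsc)
{
    return (ntsc ? AMIGA_CLOCK_NTSC : AMIGA_CLOCK_PAL) / pitch;
}

/* Number of source samples to advance per output sample when mixing at
 * a fixed output rate (in Hz). */
double mod_resample_step(int pitch, int ntsc, double output_rate)
{
    return mod_sample_rate(pitch, ntsc) / output_rate;
}
```

With pitch 214 on PAL timing and a 22050 Hz output device, the step is
about 0.75 source samples per output sample.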

If finetuning is supported, the pitch stored in the note is always one of the
values in the "pitch table for tuning 0" above. The correct pitch for the note
comes from one of 15 other tables, according to the finetune value of the
instrument.
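
When the finetune tables are not at hand, a finetuned pitch can be
approximated with the factor 2^(-finetune/12/8) given in the description of
the finetune byte. The sketch below applies that factor by repeated
multiplication with a rounded constant for 2^(1/96). Treat the result as an
approximation: the tables used by ProTracker contain rounding differences,
so individual entries may be off by one.

```c
/* 2^(1/96): one finetune step is 1/8 of a semitone (rounded constant). */
#define FINETUNE_STEP 1.0072464122

/* Approximate pitch value for a finetune setting in the range -8..+7.
 * (Hypothetical helper; not a replacement for the real tables.) */
int mod_finetune_pitch(int pitch, int finetune)
{
    double p = pitch;
    int i;

    for (i = 0; i < finetune; i++)   /* positive: higher note, lower value */
        p /= FINETUNE_STEP;
    for (i = 0; i > finetune; i--)   /* negative: lower note, higher value */
        p *= FINETUNE_STEP;
    return (int)(p + 0.5);           /* round to nearest */
}
```

A finetune of -8 is a full semitone down: pitch 856 (C-1) becomes 907, which
is the table entry for B-0.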

Protracker V2.3A/3.01 Effect Commands
----------------------------------------------------------------------------
0 + Normal play or arpeggio             0xy : x-first halfnote add, y-second
1 - Slide up                            1xx : up speed
2 - Slide down                          2xx : down speed
3 + Tone portamento (slide to note)     3xx : up/down speed
4 - Vibrato (frequency trembles)        4xy : x-speed,   y-depth
5 - Continue effect 3 + volume slide    5xy : x-upspeed, y-downspeed
6 - Continue effect 4 + volume slide    6xy : x-upspeed, y-downspeed
7 - Tremolo (volume trembles)           7xy : x-speed,   y-depth
8 - not used
9 - Set sample offset                   9xx : offset (23 -> 2300)
A - Volume slide                        Axy : x-upspeed, y-downspeed
B - Position jump                       Bxx : song position
C - Set volume                          Cxx : volume, 00-40
D - Pattern break                       Dxx : break position in next pattern
E - Extended commands
  E0- Set filter                        E0x : 0=filter on, 1=filter off
  E1- Fine slide up                     E1x : value
  E2- Fine slide down                   E2x : value
  E3+ Glissando control                 E3x : 0=off, 1=on (use with effect 3)
  E4- Set vibrato waveform              E4x : 0=sine, 1=ramp down, 2=square
  E5- Set finetune value                E5x : value (0..15)
  E6- Jump to loop                      E6x : 0=set loop start, >0 play x times
  E7- Set Tremolo Waveform              E7x : 0=sine, 1=ramp down, 2=square
  E8- not used
  E9- Retrig note                       E9x : retrig from note + x vblanks
  EA- Fine volume slide up              EAx : add x to volume
  EB- Fine volume slide down            EBx : subtract x from volume
  EC- Note cut                          ECx : cut from note + x vblanks
  ED- Note delay                        EDx : delay note x vblanks
  EE- Pattern delay                     EEx : delay pattern x notes
  EF- Invert loop                       EFx : speed
F - Set speed or tempo                  Fxx : speed (00-1F) / tempo (20-FF)
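
For dispatching, it can be convenient to fold the extended commands into a
single command number and to split the parameter into its x and y nibbles.
The sketch below is one possible arrangement; the struct, the E0h..EFh
numbering of extended commands and the function names are choices made for
this example.

```c
#include <stdint.h>

/* A split effect; hypothetical type. */
struct mod_effect {
    uint8_t command;   /* 00h..0Fh, or E0h..EFh for extended commands */
    uint8_t x;         /* high nibble of the parameter (0 if extended) */
    uint8_t y;         /* low nibble of the parameter */
};

void mod_split_effect(uint8_t effect, uint8_t param, struct mod_effect *e)
{
    if ((effect & 0x0F) == 0x0E) {
        /* extended: the high nibble of the parameter selects the command */
        e->command = (uint8_t)(0xE0 | (param >> 4));
        e->x = 0;
    } else {
        e->command = (uint8_t)(effect & 0x0F);
        e->x = (uint8_t)(param >> 4);
    }
    e->y = (uint8_t)(param & 0x0F);
}

/* For effect F: parameters 00-1F set the speed, 20-FF set the tempo. */
int mod_is_tempo(uint8_t param)
{
    return param >= 0x20;
}
```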

In a module, all the samples are stored right after the pattern data. To
determine where a sample starts and stops, you use the sample information
structures in the beginning of the file (from offset 14h). The data for a
sample normally starts with two zero bytes, because some trackers loop over
the first two bytes of a sample when the channel is not currently playing
(i.e. the channel plays "silence").



Logarithmic Delta Compression
-----------------------------
Logarithmic Delta Compression is a *lossy* compression scheme. It uses a first
order predictor (more on this later) to get a probable value for the next
sample value. Then it reads the next sample value and calculates the difference
(delta) between the real and predicted values for the sample. Finally, it
stores the base-2 logarithm of the calculated difference. The
logarithmic difference can be stored in half as many bits as the original
sample.

The LDC algorithm was developed as a compromise between a fast and simple
algorithm (so you can expand and compress on the fly while reading or writing
compressed samples from or to disk), and complex algorithms that achieve high
compression ratios with low quality loss.

A popular "compression" method is to resample data with fewer bits per sample
(6 or 4 bits for a signal that was originally 8-bit PCM). When restoring the
sound, the data is rescaled to the original number of bits. This method gives a
severe quality loss when you get below 7 bits.

A better approach is to store the differences between the samples. The
differences will often be close to 0. Storing the differences in 4 bits (where
the original signal is 8 bits) without quality degradation is possible on most
voice recordings. For music, you will often have to scale the differences,
because the difference values may exceed what you can store in 4 bits. For
example, the difference between two samples is +24. With four bits you can
encode the differences -8..+7, so 24 is out of reach. But by dividing all
differences by 4, the difference becomes +6, which you can encode. When playing
the sound back, you multiply the +6 value by 4 again to get the original
difference of +24.

This approach is called DPCM (Differential Pulse Code Modulation). The choice
scaling factor (4 in the above case) is essential for good compression. To
choose the best scaling factor, you should analyze the complete sound data.
This means that you can only start compressing after all sound has been
recorded. You cannot record with compression in one pass. As you may have
noticed, DPCM is a lossy compression scheme.
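
A minimal sketch of DPCM with a fixed scaling factor follows. Several
details are assumptions, since the text does not fix them: the first sample
is coded relative to 128, the division truncates toward zero, one code is
stored per byte (not packed two to a byte), and the encoder tracks the
decoder's output so that errors do not accumulate. The function names are
illustrative.

```c
#include <stddef.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Encode n unsigned 8-bit samples into n codes in the range -8..+7.
 * scale must be greater than zero. */
void dpcm_encode(const uint8_t *in, size_t n, int scale, int8_t *out)
{
    int prev = 128;     /* what the decoder will have reconstructed */
    size_t i;

    for (i = 0; i < n; i++) {
        int code = clamp_int((in[i] - prev) / scale, -8, 7);

        out[i] = (int8_t)code;
        prev = clamp_int(prev + code * scale, 0, 255);
    }
}

/* Decode n codes back into unsigned 8-bit samples. */
void dpcm_decode(const int8_t *in, size_t n, int scale, uint8_t *out)
{
    int prev = 128;
    size_t i;

    for (i = 0; i < n; i++) {
        prev = clamp_int(prev + in[i] * scale, 0, 255);
        out[i] = (uint8_t)prev;
    }
}
```

With a scaling factor of 4 and the samples 152, 150, 100, this sketch
produces the codes +6, 0, -8 and decodes them to 152, 152, 120. The small
step of -2 is lost and the large step of -52 is clipped, which shows where
the loss comes from.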

There is one other thing that may improve quality with DPCM: the predictor. In
the DPCM described above, we took the difference between the current sample and
the next one. This is technically a *0th order* predictor; we predict that the
next sample will be equal to the current one and store the difference. If you
have a better predictor, the differences with the actual samples will be nearer
zero and your scaling factor will be lower. State of the art predictors often
use a digital low-pass filter.

An improved technique, ADPCM (Adaptive Differential Pulse Code Modulation),
chooses the best predictor and scaling factor for each short section of
samples. The exact technique differs for each manufacturer. Since ADPCM works
on few samples at a time, it can also compress samples on the fly. But because
ADPCM often requires fairly complex computations (optimal predictor parameters
and scaling factor must be calculated for every few samples), it usually
requires extra hardware.

LDC is able to decode 11 kBytes of 4-bit sound samples (so data sampled at 22
kHz) per second and still have time left to play the sound (and do other things
as well). All this on relatively slow machines (12 MHz PC-ATs) without extra
hardware. Like ADPCM, it is a single pass algorithm.

The logarithmic encoding is based on the observation that with a good
predictor, most differences will be near zero. These differences need to be
encoded with the highest precision. High differences do not occur often, so
a coding error on these high differences is less severe.

The level 0 LDC algorithm (LDC0 for short) presented here is meant for 8-bit
PCM data in which each sample is an unsigned value between 0 and 255 and a
value of 128 is the "zero level". The algorithm is easily adapted for signed
sample formats or for 16-bit data. Two's complement arithmetic is assumed and
overflow is to be ignored. Thus, adding 255 and subtracting 1 from a sample
gives the same result. The difference between the sample value and the
predicted value is encoded with the following table:

Table 0
=======

  t_{0,i} = trunc(2**(i-1))          for 0<=i<=7
            2**8 - 2**(15-i)         for 8<=i<=15

  difference  encoding          difference  encoding
  --------------------------------------------------
   +0         0000              +128 (-128) 1000
   +1         0001              +192 (-64)  1001
   +2         0010              +224 (-32)  1010
   +4         0011              +240 (-16)  1011
   +8         0100              +248 (-8)   1100
   +16        0101              +252 (-4)   1101
   +32        0110              +254 (-2)   1110
   +64        0111              +255 (-1)   1111
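
The table can also be generated instead of typed in. The sketch below
reproduces the sixteen values listed above, stored modulo 256 so that the
negative differences appear as 128..255. The function name is illustrative.

```c
#include <stdint.h>

/* Fill table[0..15] with the LDC0 differences, modulo 256. */
void ldc0_make_table(uint8_t table[16])
{
    int i;

    table[0] = 0;
    for (i = 1; i <= 7; i++)
        table[i] = (uint8_t)(1 << (i - 1));           /* +1, +2 .. +64 */
    for (i = 8; i <= 15; i++)
        table[i] = (uint8_t)(256 - (1 << (15 - i)));  /* -128, -64 .. -1 */
}
```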

The algorithm uses an unweighted first order predictor. Calculation of the
predicted next sample takes one shift and one subtraction. Previous versions
of LDC also defined a 0th order predictor, but the first order predictor
outperforms it in all my measurements on real sampled audio and it is almost as
easy to calculate.

p_i = 2*p_{i-1} - p_{i-2}
      Unweighted first order predictor (p_{-1} = p_{-2} = 128). If the
      predicted value is less than 0 or greater than 255, it is set to 0 or
      255. This reduces approximation errors.
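
Putting the table and the predictor together, an LDC0 coder could look like
the sketch below. It rests on a particular reading of the text: the
predictor is fed with the two most recently reconstructed samples, the
decoded sample is the prediction plus the table value modulo 256, and the
encoder simply tries all sixteen codes and keeps the one that reconstructs
closest to the input. How two codes are packed into a byte is not covered
here, so one code per call is used. All names are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

static const uint8_t ldc0_table[16] = {
    0, 1, 2, 4, 8, 16, 32, 64, 128, 192, 224, 240, 248, 252, 254, 255
};

/* Predictor state: the two most recently reconstructed samples. */
struct ldc0_state { int s1, s2; };

void ldc0_init(struct ldc0_state *st)
{
    st->s1 = st->s2 = 128;
}

/* Unweighted first order prediction, limited to 0..255. */
static int ldc0_predict(const struct ldc0_state *st)
{
    int p = 2 * st->s1 - st->s2;

    return p < 0 ? 0 : (p > 255 ? 255 : p);
}

/* Decode one 4-bit code into a sample and update the state. */
uint8_t ldc0_decode(struct ldc0_state *st, unsigned code)
{
    /* the cast discards overflow, as the text prescribes */
    uint8_t s = (uint8_t)(ldc0_predict(st) + ldc0_table[code & 15]);

    st->s2 = st->s1;
    st->s1 = s;
    return s;
}

/* Encode one sample: search for the code with the smallest error. */
unsigned ldc0_encode(struct ldc0_state *st, uint8_t sample)
{
    int pred = ldc0_predict(st);
    unsigned code, best = 0;
    int best_err = 256;

    for (code = 0; code < 16; code++) {
        int recon = (uint8_t)(pred + ldc0_table[code]);
        int err = abs(recon - (int)sample);

        if (err < best_err) {
            best_err = err;
            best = code;
        }
    }
    ldc0_decode(st, best);      /* keep the state in step with the decoder */
    return best;
}
```

Starting from the initial state, the samples 130, 131, 120 encode to the
codes 2, 15, 11 in this sketch, and those codes decode to 130, 131, 116.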

The level one algorithm of LDC (LDC1) starts with the table described above,
but may switch to one of seven other tables. What table is selected is based
on the encoded 4 bit value. There is no pre-analysis of the optimal table, as
in several ADPCM schemes, so LDC1 can compress data in real time. With most
adaptive schemes, dynamic response is not very good, since the compressor must
search for the optimal parameters. LDC improves on this, because of its use of
logarithmic tables.

It is difficult to quantify the loss of quality in the compression. Two general
values give an impression of the quality of a compression scheme. The first is
the Root Mean Square error (RMS) between the compressed audio file and the
original audio file. The second follows if you view the encoding error as the
addition of a "noise" signal (like quantization errors lead to quantization
noise): you can then calculate the Signal to Noise Ratio (SNR).

Values are given as "RMS/SNR". For good quality, the RMS value should be low
and the SNR value should be high.

                animal          music
                (hahn.wav)      (halleluj.wav)
LDC0            5.53/24.3       2.37/54.5
MS ADPCM        6.22/21.6       2.74/47.1

                voice           voice           voice           voice
                (patience.wav)  (pinch1.wav)    (irrita.wav)    (name.wav)
LDC0            0.63/199.0      1.29/98.7       4.04/32.5       0.93/138.4
MS ADPCM        0.95/132.3      1.27/100.3      3.64/35.9       0.93/139.0

LDC0 compares well to the far more complex ADPCM algorithm from Microsoft. In
some cases it performs better (patience.wav) and in other cases Microsoft's
ADPCM wins.


Notes on sampling frequencies
-----------------------------
Some sampling rates are more popular than others, for various reasons. Some
recording hardware is restricted to (approximations of) some of these rates,
some playback hardware has direct support for some.

8 kHz           Exactly 8,000 Hz is a telephony standard that goes together
                with U-LAW (and also A-LAW) encoding. Some systems use an
                approximation of 8 kHz; in particular, the NeXT workstation
                uses 8,012.8210513 Hz, apparently the rate used by Telco
                CODECS. Several CompuPhase Sound drivers are optimized for
                8,192 Hz.

11 kHz          Either 11,025 Hz (a quarter of the CD sampling rate, and also
                a Microsoft/IBM multi media standard), 11,127 Hz (half the Mac
                sampling rate), or 11,111 Hz (Sound Blaster's approximation of
                11 kHz). This is probably the most popular sampling frequency.

18.9 kHz        CD-ROM/XA standard.

22 kHz          Either 22,050 Hz (half the CD sampling rate, and a Microsoft/
                IBM multi media standard), the Macintosh rate of precisely
                22,254.545454545454 Hz (the horizontal scan rate of the
                original Macintosh), or 22,222 Hz for the Sound Blaster.

32,000 Hz       Used in digital radio, NICAM (Nearly-Instantaneous Companded
                Audio Multiplex [IBA/BREMA/BBC]) and other TV work, at least in
                the UK; also long play DAT and Japanese HDTV.

37.8 kHz        CD-ROM/XA standard for higher quality.

44,056 Hz       This weird rate is used by professional audio equipment to fit
                an integral number of samples in a video frame.

44,100 Hz       The CD sampling rate. This is also the highest defined
                frequency in the multi media standards for Microsoft Windows
                and IBM OS/2. DAT players recording digitally from CD also use
                this rate.

48,000 Hz       The DAT (Digital Audio Tape) sampling rate for domestic use.

While professional musicians disagree, most people don't have a problem if
recorded sound is played at a slightly different rate, say, 1-2%. On the other
hand, if sound must be synchronized with other activity (animation), even the
smallest difference in sampling rate can frustrate the buffering scheme used.

