ofSoundBuffer
Buffer for audio samples and associated metadata.
ofSoundBuffer stores audio as an array of interleaved floating point samples, with a given sample rate for playback.
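For example, a minimal sketch of creating and filling a buffer might look like this (assuming the allocate(frames, channels), setSampleRate() and operator[] members; check the exact signatures against your openFrameworks version):

#include "ofSoundBuffer.h"
#include <cmath>

// Sketch: allocate 1 second of stereo audio at 44.1kHz and fill it with
// a 440Hz sine tone, writing the same sample to both channels.
ofSoundBuffer makeTone(){
    ofSoundBuffer buffer;
    buffer.allocate(44100, 2); // 44100 frames, 2 channels = 88200 samples
    buffer.setSampleRate(44100);
    const float twoPi = 6.2831853f;
    for(size_t frame = 0; frame < buffer.getNumFrames(); frame++){
        float sample = std::sin(twoPi * 440.0f * frame / 44100.0f);
        for(size_t channel = 0; channel < buffer.getNumChannels(); channel++){
            // interleaved: all channels of one frame sit next to each other
            buffer[frame * buffer.getNumChannels() + channel] = sample;
        }
    }
    return buffer;
}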
How sound recording works
Physically speaking, what we call sound is simply changes in air pressure perceived by a listener. These changes in sound pressure are converted by a microphone into changes in voltage that can be recorded, making a sound recording. A sound recording is therefore a recording of changes in air pressure at a particular point in space (ie where the microphone was positioned). When it is played back through a speaker, the speaker reproduces the same pattern of changes in air pressure as were recorded by the microphone, but this time at a different point in space (ie where the speaker is positioned).
In digital audio these changes in air pressure are recorded as a set of discrete numbers (samples), each number representing a snapshot of the air pressure at a particular point in time. For high quality audio there are typically 44,100 snapshots recorded every second. This is called the sample rate and is expressed in Hz (44100Hz) or kHz (44.1kHz).
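For example, a 3 second mono recording at 44.1kHz contains 3 × 44,100 = 132,300 samples.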
Because humans have two ears, rather than one, sound is usually recorded in stereo. The simplest stereo sound recording is two channels of sound recorded by two microphones at two different points in space. More channels can also be recorded (eg with 5.1 surround sound systems or Ambisonics).
Frames, channels and samples
Data in an ofSoundBuffer is stored interleaved as an array of floats. Interleaved audio is analogous to how different color channels are stored side by side in an ofImage or ofPixels object.
The functions and function arguments in ofSoundBuffer that deal with this interleaved data are based on 3 key terms:
channels refers to the number of individual streams of interleaved audio samples in the buffer. A mono recording has 1 channel; a stereo recording has 2 channels.
samples refers to the actual raw data. One sample is a single floating point number between -1 and 1, which represents a snapshot of sound pressure at a single moment in time. A 0.1 second long buffer at 44100Hz contains 4410 samples if it has 1 channel, 8820 samples if it has 2 channels, 13230 samples if it has 3 channels, and so on.
frames refers to the number of multi-channel sets of interleaved sample data there are in the buffer. A 0.1 second long buffer at 44100Hz always has 4410 frames, regardless of how many channels it has. To get the number of samples in a buffer you multiply the number of channels by the number of frames.
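As a quick sketch of that arithmetic (assuming the getNumFrames(), getNumChannels(), size() and getSampleRate() members behave as described above):

ofSoundBuffer buffer;
buffer.allocate(4410, 2); // 0.1 seconds of stereo at 44.1kHz
buffer.setSampleRate(44100);

size_t frames   = buffer.getNumFrames();   // 4410
size_t channels = buffer.getNumChannels(); // 2
size_t samples  = buffer.size();           // 8820, ie frames * channels
float seconds   = frames / (float) buffer.getSampleRate(); // 0.1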
If I have an ofSoundBuffer with 8 frames of mono (1 channel) audio, the underlying array contains 8 samples (ie it is 8 floats long), and the samples are arranged like this:
L L L L L L L L
where L represents a single sample.
If I have an ofSoundBuffer with 8 frames of stereo (2 channel) audio, then the underlying array contains 16 samples (getNumFrames()*getNumChannels(), ie 8*2) arranged in an interleaved pattern:
L R L R L R L R L R L R L R L R
where L represents a single sample for the left channel, and R represents a single sample for the right channel. Grouping the frames together for clarity:
LR LR LR LR LR LR LR LR
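In code, the interleaved index of a sample is frame * numChannels + channel. A short sketch of reading one stereo frame (ofSoundBuffer's getSample(frame, channel) wraps the same index math; verify the signature against your openFrameworks version):

size_t frame = 3; // the 4th LR pair in the buffer
float left  = buffer[frame * buffer.getNumChannels() + 0];
float right = buffer[frame * buffer.getNumChannels() + 1];

// equivalent, letting the buffer do the index math:
float sameLeft  = buffer.getSample(frame, 0);
float sameRight = buffer.getSample(frame, 1);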
If I have an ofSoundBuffer with 8 frames of 5.1 surround (6 channel) audio, then the underlying array of floats contains 48 samples (getNumFrames()*getNumChannels(), ie 8*6) and is usually arranged in an interleaved pattern like this:
L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe L C R Ls Rs Lfe
where L represents a single sample for the left channel, C for centre, R for right, Ls for left surround, Rs for right surround and Lfe for the subwoofer. Grouping the frames together for clarity:
LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe LCRLsRsLfe
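The same index math extends to any channel count. As an illustrative sketch (the extractChannel helper below is hypothetical, not part of the openFrameworks API; ofSoundBuffer also provides getChannel() for copying one channel into another buffer):

#include <vector>

// Hypothetical helper: copy one channel of an interleaved buffer into a
// flat vector of mono samples.
std::vector<float> extractChannel(const ofSoundBuffer & buffer, size_t channel){
    std::vector<float> mono(buffer.getNumFrames());
    for(size_t frame = 0; frame < buffer.getNumFrames(); frame++){
        mono[frame] = buffer[frame * buffer.getNumChannels() + channel];
    }
    return mono;
}

// eg pull the Lfe (subwoofer) channel out of a 6 channel buffer:
// std::vector<float> lfe = extractChannel(surroundBuffer, 5);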