Digital audio

Sounds are pressure waves in some material (usually air). A sound is created by a disturbance that generates a pressure wave. That pressure wave travels to your ear, and your ear sends signals to your brain that you interpret as sound.

Different pressure waves lead to different sounds. When you strike a tuning fork, it produces a pressure wave that is a sine wave of some frequency, and we hear that as a pure tone. High-frequency sine waves lead to high-pitched sounds; low-frequency sine waves lead to low-pitched sounds. Remarkably, all periodic signals can be represented as the sum of multiple sine waves of various frequencies, amplitudes, and phases (take EECS 216 to learn more).

Digital audio refers to samples of these pressure waves. We can store the "shape" of a wave by sampling it at a high frequency. Later, we can reproduce the pressure wave by sending these samples to a speaker, which vibrates according to the sampled value and recreates an approximation of the original pressure wave. Here's a good tutorial on digital audio.

An E100 program generates sound by sending samples to the speaker controller, which forwards those samples to the DE2 line out. For example, if you send samples that approximate a 440 Hz sine wave, then the speaker controller will produce that sine wave on the DE2 line out connector.

Each sample sent to the speaker controller is a 32-bit, signed value in the range [-2^31, 2^31 - 1]. A value represents the magnitude of the wave at an instant in time. The speaker controller produces sound at an 8 kHz sampling rate, so the time interval between samples is 0.125 ms. For example, the following samples represent a sine wave with amplitude 1024 and frequency 440 Hz. There are 19 samples before the samples start to repeat, so one period of these samples covers a time span of 19 x 0.125 ms = 2.375 ms (which is approximately the period of a 440 Hz sine wave, 1/440 s or about 2.27 ms).
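As a rough sketch of how samples like these could be computed, the Python snippet below evaluates a 440 Hz sine wave with amplitude 1024 at the 8 kHz sampling rate. The function name sine_samples and its parameters are illustrative and are not part of the E100 tools. The values it prints are not guaranteed to match the samples given in this section, since those depend on the rounding and starting phase that were used.

```python
import math

SAMPLE_RATE_HZ = 8000  # speaker controller's sampling rate, from the text


def sine_samples(frequency_hz, amplitude, count):
    """Return `count` integer samples of a sine wave that starts at phase 0."""
    return [
        round(amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE_HZ))
        for n in range(count)
    ]


samples = sine_samples(440, 1024, 19)
print(samples)

# Number of samples in one exact period of 440 Hz at 8 kHz (about 18.18).
print(SAMPLE_RATE_HZ / 440)
```

Because an exact 440 Hz period spans about 18.18 samples rather than a whole number, looping a 19-sample table plays a tone of roughly 8000 / 19, or about 421 Hz, which is slightly flat of concert A.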

The samples you send determine what sound gets produced by the speakers. Increasing the amplitude of the samples increases the volume. Increasing the frequency of the samples (i.e. shortening the period) raises the pitch.

A sine wave of frequency 440 Hz corresponds to the note known as concert A; this is the note that an oboist plays at the start of an orchestra concert, and it sounds like this. (Note: on Linux and Windows, Chrome's built-in media player can't play this file. Try saving the file and playing it with another program.)

In Western music, there are 12 notes in an octave: A, A#, B, C, C#, D, D#, E, F, F#, G, G#. After that, the names of the notes start repeating. The frequency of each note is approximately 1.059463094 times the frequency of the note below it (1.059463094 is approximately 2^(1/12)). For example, the frequency of the A# directly above concert A is about 466.2 Hz. Because there are 12 notes in an octave and (2^(1/12))^12 = 2, the note that is one octave above concert A (which is also called A) has double the frequency of concert A, or 880 Hz. Of course, you may use a different scale than that of Western music.
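The relationship between notes can be checked with a few lines of Python. This is only a sketch: the function name note_frequency and the convention of counting semitones up from concert A are assumptions made for illustration.

```python
CONCERT_A_HZ = 440.0


def note_frequency(semitones_from_a):
    """Frequency of the note `semitones_from_a` semitones above concert A.

    Negative values give notes below concert A.
    """
    return CONCERT_A_HZ * 2 ** (semitones_from_a / 12)


print(round(note_frequency(1), 1))   # A# directly above concert A: 466.2
print(round(note_frequency(12), 1))  # A one octave above concert A: 880.0
```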

Extracting sound samples from WAV files

You may find it useful to extract sound samples from normal sound files. You can use the sox program on Linux to create a file with sound samples but no WAV format headers. For example, run the following command to extract sound samples from a WAV file file1.wav into a file file2.raw:

sox file1.wav -4 -s -c 1 -r 8000 file2.raw
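(-4 means each sample is four bytes; -s means samples are encoded as signed, two's complement integers; -c 1 means one channel (mono); -r 8000 means the sampling rate is 8000 Hz. Newer versions of sox may reject -4 and -s; there, the equivalent options are -b 32 and -e signed-integer.) The .raw file will be a binary file filled with 32-bit signed integers, one per sample, with no header.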
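To inspect the extracted samples, something like the following Python sketch could be used. It assumes the .raw file stores samples in little-endian byte order, which is what sox typically writes on an x86 machine; on a big-endian machine the "<" in the format string would need to be ">". The function names decode_samples and read_raw_samples are illustrative only.

```python
import struct


def decode_samples(data):
    """Decode raw bytes into a list of 32-bit signed integers.

    Assumes little-endian byte order (an assumption, see the text above).
    Any trailing bytes that do not form a full 4-byte sample are ignored.
    """
    count = len(data) // 4
    return list(struct.unpack(f"<{count}i", data[: count * 4]))


def read_raw_samples(path):
    """Read all samples from a headerless raw file such as file2.raw."""
    with open(path, "rb") as raw_file:
        return decode_samples(raw_file.read())
```

For example, read_raw_samples("file2.raw")[:10] would return the first ten samples of the file created by the sox command above.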