Background Compact Disc
Part 2 - Music by Numbers (April 1987) 


This series of articles is aimed at non-technical readers who nevertheless would like to learn a little of the background to Compact Disc, how CD players work, how to choose and use a CD player, and what exciting new developments the CD medium still has in store. The articles have been prompted by many letters from GRAMOPHONE readers. These make it quite clear that, whilst most of the first converts to the CD format were already knowledgeable collectors of LP vinyl records or prerecorded musicassettes, who had decided to add this third 'music carrier' to their existing audio systems, there were many complete beginners attracted by the new CD medium for whom the whole hi-fi mystique was like learning a new language.



Part 2 - Music by numbers

In Part 1 I set the scene for this six-part series of articles on Compact Disc, by presenting a kind of chronology of how CD has evolved to date and comparing some of the relative advantages and disadvantages of CDs, LPs and cassettes. The CD medium scores high marks under various headings such as tonal fidelity, low noise and distortion, pitch accuracy and robustness because of the combination of its two newest elements: (a) the use of digital encoding, and (b) the use of optical (laser-beam) scanning. I propose to discuss the digital aspects this month and return to the optical topics in future instalments.

The analogue chain
The common starting point for much of the music we listen to at home from records, tape, radio or television is a recording session. Figure 1 shows the typical situation in which an orchestra or other group of musicians play for our future enjoyment. The sounds they make set the air in the hall or studio into sympathetic vibration: fast vibrations for high-pitched sound; slow vibrations for bass notes; high-amplitude vibrations for loud sounds; tiny vibrations for quiet notes, etc. All this sound energy travels out from each individual source at the speed of sound (about 760 miles per hour, or 340 metres per second) in the form of a so-called pressure wave. Thus at any point in the studio the air pressure is set into a state of continuous oscillation between values above and below the normal atmospheric pressure. The three main characteristics of the sound wave are the frequency (number of up and down cycles per second, or Hertz), the amplitude (magnitude of the up and down swings) and the phase (point reached in the cycle at the instant of observation).

Figure 2 shows the type of pressure variation which would be plotted against time if we could position a very fast-reading barometer in the studio. The waveform shown here is typical of a flute playing quite loudly, and is seen to be repetitive every 2 milliseconds. This means that the fundamental pitch of the note sounding corresponds to a frequency of 500Hz (500 cycles per second) or approximately the note B above middle C. The irregularities in the wave shape indicate that harmonic frequencies (multiples of 500Hz) are also present, contributing to the recognisable flute timbre. In fact, the flute waveform is about the simplest we come across in music, being surpassed only by a tuning fork which radiates only its fundamental frequency—as a pure sine wave. Clearly the waveform at any point in the studio when a full orchestra is playing is extremely complex, and is further compounded by the numerous reflected waves bouncing from the walls, floor and ceiling.

Complex or not, the human ear has an incredible ability to analyse and make sense of this stream of pressure variations—and a modern microphone can respond to the pressure wave and convert it into a virtually identical (i.e. 'analogue') electrical voltage waveform (looking exactly like my Fig. 2 but with the vertical scale changed to read "Voltage" instead of "Air Pressure"). As Fig. 1 indicates, this analogue signal is then passed through a mixing console, where it may be combined with the signals from other microphones and amplified (i.e. boosted in scale but with the analogue waveform preserved intact). It is then passed (in the conventional analogue chain for LP production) to an analogue tape recorder which again preserves the waveform but this time in terms of a magnetic pattern on a tape. At some later date, perhaps after editing or further processing, the tape will be replayed in a tape-to-disc transfer stage, and the waveform will now be stored as a physical pattern in the grooves of a disc, as shown in the micro-photograph last month.

Fig. 1. Analogue and digital programme chains for the production and reproduction of both LPs and CDs.

Fig. 2. Variation of air pressure with time (showing typical flute waveform at 500Hz, i.e. 500 cycles per second). Axes: air pressure above and below normal, against time from 0 to 10 milliseconds.

Our precious analogue waveform is clearly vulnerable to attack from dust, physical damage and electrical or magnetic interference at all stages in this chain of events; the miracle is that LP records can sound as good as they do, at least for their first few playings.

The digital chain

Since around 1972, as I mentioned in Part I, the recording industry has been changing over from analogue recorders to digital ones which do not store the signals in analogue form but as a series of numbers encoded as on/off pulses. Thus, as shown in the lower half of Fig. 1, the signals from the mixing console are passed through an A/D (analogue to digital) converter, of which more later, to a digital recorder. For LP manufacture, the digital tape must be replayed through a D/A converter to return the signals to the required analogue form for vinyl disc processing.

Ideally, however, the signals should be kept in the more robust noise-free digital form over as much of the chain as possible. For this reason, at the studio end of the chain there is a move towards all-digital mixing consoles and even research into the possibilities of digital microphones. At our receiving end, of course, the Compact Disc meets this need to preserve signals in digital form (see bottom line in Fig. 1) right up to the moment when we play them back, and the D/A converter built into our CD player produces the necessary analogue version to feed to our loudspeakers or headphones.

A/D conversion
Put simply, the process of converting an analogue waveform into digital form involves electronically sampling the waveform at regular intervals (see Fig. 3), noting the signal level at each sampled instant against a fixed scale of values (the process known as 'quantization') and storing these individual values as a stream of encoded pulses. The sampling rate is dictated by a quartz oscillator or 'clock', and for best results should be as high as possible; certainly it must be at least twice the highest audio frequency that we wish to reproduce. Thus, for high-quality recording up to 20kHz, the sampling frequency must be at least 40kHz. Professional recorders mainly use 48kHz, or 44.1kHz which is the sampling frequency chosen for the Compact Disc. (The BBC were pioneers in digital sound processing and since the late 1960s their VHF/FM radio distribution network has been digitized using a 32kHz sampling frequency, sufficient for the restricted radio bandwidth of 15kHz.)

The quantization process is clearly subject to errors, since there can be only a limited number of quantization levels trying to represent an infinite number of possible analogue levels, so that a degree of approximation has to take place. This quantization error or noise can be reduced by increasing the number of reference levels available, but this can greatly increase the cost of the electronic components.
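
As a rough illustration of the sampling and quantization steps just described, the following Python sketch samples a 500Hz sine wave and rounds each sample to the nearest level on a 3-bit, 0 to 7 Volt scale. The function name, the 8kHz sampling rate and the test tone are assumptions chosen purely for illustration; they do not describe any real converter.

```python
import math

def sample_and_quantize(signal, sample_rate_hz, n_samples, bits, full_scale_volts):
    """Sample signal(t) at regular intervals and quantize each sample.

    Returns the sampled voltages, the integer level chosen for each one,
    and the size of one quantization step in volts.
    """
    levels = 2 ** bits                        # e.g. 3 bits -> 8 levels
    step = full_scale_volts / (levels - 1)    # spacing between adjacent levels
    samples, codes = [], []
    for k in range(n_samples):
        t = k / sample_rate_hz                # instant of the k-th sample
        # Keep the signal inside the scale (anything outside would clip).
        v = min(max(signal(t), 0.0), full_scale_volts)
        samples.append(v)
        codes.append(round(v / step))         # nearest level on the fixed scale
    return samples, codes, step

def tone(t):
    """Hypothetical test signal: a 500Hz sine wave swinging between 0 and 7 volts."""
    return 3.5 + 3.5 * math.sin(2 * math.pi * 500 * t)

samples, codes, step = sample_and_quantize(tone, 8000, 16, 3, 7.0)
words = [format(c, "03b") for c in codes]                     # 3-bit binary words
errors = [abs(c * step - v) for c, v in zip(codes, samples)]  # quantization error

print(words)
print("largest quantization error (volts):", max(errors))
```

With a step of one Volt, no sample can be more than half a Volt away from its chosen level; adding bits shrinks the step and hence the error.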

Binary numbers
To explain this, I must break my own rule of keeping this series of articles as non-technical as possible by introducing a little arithmetic. An essential ingredient in the modern electronics miracle of microprocessor chips which have brought us pocket calculators, digital watches, computers, space travel and now digital audio, has been the use of binary numbers. Table 1 shows the numbers in a binary (two unit) scale and their equivalent numbers in the decimal scale with which we are all more familiar. The binary scale allows any number to be represented by a block or 'word' using only two symbols or states: 1 and 0. The word-length in Table 1 is four binary digits ('bits') which just copes with the first 16 (i.e. 2^4) numbers on our decimal scale (from 0 to 15). For simplicity, the quantization illustrated in Fig. 3 is only 3-bit which is enough to represent each Volt on the scale of voltages (from 0 to 7V) by a different binary number (from 000 to 111). Of course more bits are needed for larger or more complex numbers and in general a word-length of n bits provides 2^n values, as shown in Table 2.

More bits needed

Hence, going back to our problem with quantization noise, we find that each additional bit in the chosen word-length halves the quantization noise, giving a 6dB improvement in signal-to-noise ratio per bit. The 16-bit word-length used in the Compact Disc format therefore provides a very acceptable 96dB S/N ratio but needs 65,536 quantization levels to do so. The chips needed are at about the limit of what might be called 'affordability' in equipment aimed at the consumer market at present.
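
The relationship between word-length, number of levels and signal-to-noise ratio can be checked with a few lines of Python. This is only a sketch: the function names are invented for illustration, and the S/N figure uses the simple rule that each doubling of the number of levels is worth 20 x log10(2), or about 6dB.

```python
import math

def to_binary_word(value, bits):
    """Write a non-negative whole number as a fixed-length binary word."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value does not fit in the chosen word-length")
    return format(value, "0{}b".format(bits))

def quantization_levels(bits):
    """Number of distinct values an n-bit word can represent (2 to the power n)."""
    return 2 ** bits

def signal_to_noise_db(bits):
    """Approximate S/N ratio, at roughly 6dB for each bit in the word."""
    return 20 * math.log10(quantization_levels(bits))

print(to_binary_word(5, 4))                       # decimal 5 as a 4-bit word
for n in (3, 4, 14, 16):
    print(n, quantization_levels(n), round(signal_to_noise_db(n)))
```

For 16 bits this gives 65,536 levels and a figure that rounds to 96dB, in line with the values quoted above.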

Since the recording bandwidth needed is at least equal to the sampling frequency multiplied by the word-length in bits, the result of all this heavy arithmetic is that CD processing for two-channel stereo requires a bandwidth of 1.4112MHz (44.1kHz x 16 x 2).
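
The multiplication behind this figure is easily verified. The short Python sketch below takes the CD sampling frequency of 44.1kHz, the 16-bit word-length and two stereo channels; the variable names are my own.

```python
SAMPLING_FREQUENCY_HZ = 44_100   # CD sampling frequency
WORD_LENGTH_BITS = 16            # bits per sample
CHANNELS = 2                     # left and right

# Audio bits generated every second, before any error-correction
# or control codes are added.
audio_bits_per_second = SAMPLING_FREQUENCY_HZ * WORD_LENGTH_BITS * CHANNELS

print(audio_bits_per_second)                   # bits per second
print(audio_bits_per_second / 1_000_000)       # the same in millions
```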

Fig. 3. Sequence of events in the digitization of an analogue waveform (top) and its subsequent conversion from digital form back to analogue (courtesy Philips).

The story does not quite end here. This audio bit stream is not recorded directly on to tape. A powerful error correction code is applied to ensure unimpaired playback even in circumstances where dirt or damage wipes out up to 4,000 bits of data (a CD track length of 2.5mm). The code can even conceal errors affecting up to 12,300 bits by interpolation. To be honest, the CD medium could not exist without such error correction and concealment encoding to allow for a wide tolerance in disc and player manufacture as well as the inevitable scratches and marks caused by disc handling. Other digital codes have also to be added to take care of synchronization, tracking, search and find, random track selection and visual displays. In addition a process known as EFM (Eight-to-Fourteen Modulation) is employed partly to suppress undesirable subsonic frequencies which might disturb the player servo mechanism.
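
The idea of concealment by interpolation can be sketched very simply: where a sample is known to be unreliable, the player substitutes a value drawn along a straight line between the nearest good samples on either side. The Python below is a deliberately simplified illustration under that assumption; it is not the error-correction code actually used on CD, and the function name and sample values are hypothetical.

```python
def conceal_by_interpolation(samples, bad_indices):
    """Replace samples flagged as bad with values interpolated linearly
    between the nearest good neighbours on each side."""
    bad = set(bad_indices)
    repaired = list(samples)
    for i in sorted(bad):
        # Search outwards for the nearest reliable sample on each side.
        left = i - 1
        while left >= 0 and left in bad:
            left -= 1
        right = i + 1
        while right < len(samples) and right in bad:
            right += 1
        if left >= 0 and right < len(samples):
            fraction = (i - left) / (right - left)
            repaired[i] = round(samples[left] + fraction * (samples[right] - samples[left]))
        elif left >= 0:
            repaired[i] = samples[left]      # nothing good to the right: hold last value
        elif right < len(samples):
            repaired[i] = samples[right]     # nothing good to the left: use next value
    return repaired

damaged = [0, 10, 999, 30, 40]               # the third sample has been corrupted
print(conceal_by_interpolation(damaged, [2]))
```

The repaired value is only a guess, which is why concealment is a fall-back for errors too extensive to be corrected exactly.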

The final result of all this data storage requirement is that the CD medium must have the ability to handle some 4.32 million bits per second. Or looked at another way, 74 minutes of stereo music (the maximum CD capacity, reputedly chosen to get Beethoven's Choral Symphony on one disc) amounts to about 19,000 Mbits, of which some 5,000 Mbits are audio data. At least this confirms the relatively huge CD storage capacity compared, for example, with the ordinary floppy disc (4 Mbits), and shows why CD has a considerable future potential for data storage and such exciting developments as CD-ROM, CD-I and CD-V, as we shall see in due course. Just to throw in one last piece of useless information, the constant scanning speed of 1.25 metres per second used in CD means that the track on a 74-minute disc is 5.5 kilometres (3.4 miles) long.
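
These totals follow from simple multiplication by the playing time. The sketch below assumes a rate of 4.32 million bits per second and a scanning speed of 1.25 metres per second, the values that reproduce the totals of roughly 19,000 Mbits and 5.5 kilometres; the variable names are my own.

```python
BIT_RATE_PER_SECOND = 4.32e6       # assumed total rate including all added codes
SCANNING_SPEED_M_PER_S = 1.25      # assumed constant scanning speed
PLAYING_TIME_S = 74 * 60           # 74 minutes in seconds
METRES_PER_MILE = 1609.344

total_mbits = BIT_RATE_PER_SECOND * PLAYING_TIME_S / 1e6
track_km = SCANNING_SPEED_M_PER_S * PLAYING_TIME_S / 1000
track_miles = track_km * 1000 / METRES_PER_MILE

print("total data on disc (Mbits):", round(total_mbits))
print("track length (km):", track_km)
print("track length (miles):", round(track_miles, 1))
```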

Filters
Certain restrictions must be applied to the audio signals. At the recording stage, for example, it is vital that the amplitude (peak level) of the music signals sent to the A/D converter does not exceed the limits of the quantization scale (e.g. 7 Volts in Fig. 3). And there is a similar restriction in terms of the frequency spread, which must not exceed the nominal audio bandwidth on which the choice of sampling frequency was based, e.g. 20kHz for the CD sampling frequency of 44.1kHz. Exceeding the maximum signal level will result in heavy clipping distortion or break-up, so the recording engineer must control his microphone signals with care. Exceeding the frequency bandwidth results in the generation of spurious frequencies, again a serious form of distortion.

This latter effect is known as 'aliasing' and can be understood by referring to Fig. 4a. This shows that the sampling process, which consists in essence of modulating the audio signal on to a stream of short pulses at the sampling frequency, produces an extended (theoretically infinite) frequency spectrum. It comprises a series of integral multiples of the sampling frequency, together with sidebands corresponding to the audio signal and extending to ±20kHz. The desired signal is confined to the baseband 0-20kHz. If the music signals were allowed to include anything above 20kHz, the sidebands would overlap to produce unwanted interference signals called 'alias' frequencies. An 'anti-aliasing' filter must therefore be introduced just before the A/D converter which passes all frequencies below 20kHz but rapidly attenuates higher frequencies, down to say -50dB by 24kHz.

Fig. 4. Basic principle of four-times oversampling: (a) infinite series of harmonics generated in the normal sampling process; (b) effect of oversampling and digital filter; (c) final HF removal with hold and analogue filtering. Axes: amplitude against frequency in kHz, with multiples of the sampling frequency at 44.1, 88.2, 132.3, 176.4 and 220.5kHz.

Table 1. Decimal and binary scales compared.

Decimal   Powers of 2   Binary
0                       0000
1         2^0           0001
2         2^1           0010
3                       0011
4         2^2           0100
5                       0101
6                       0110
7                       0111
8         2^3           1000
9                       1001
10                      1010
11                      1011
12                      1100
13                      1101
14                      1110
15        (2^4 - 1)     1111

Table 2. Range of values for different word lengths.

Number of bits (n)   Range of values (2^n)   S/N ratio (dB)
1                    2                       6
2                    4                       12
3                    8                       18
4                    16                      24
5                    32                      30
6                    64                      36
7                    128                     42
8                    256                     48
9                    512                     54
10                   1,024                   60
11                   2,048                   66
12                   4,096                   72
13                   8,192                   78
14                   16,384                  84
15                   32,768                  90
16                   65,536                  96
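
To put a number on the aliasing effect, the Python sketch below works out where a tone reappears after sampling if it is allowed to exceed half the sampling frequency. The function name and the 25kHz example tone are assumptions for illustration only.

```python
def alias_frequency(tone_hz, sampling_hz):
    """Frequency at which a sampled tone appears in the band from zero
    to half the sampling frequency."""
    folded = tone_hz % sampling_hz       # position relative to the nearest lower multiple
    if folded > sampling_hz / 2:
        # Above half the sampling frequency the tone is mirrored back down.
        return sampling_hz - folded
    return folded

print(alias_frequency(10_000, 44_100))   # within the audio band: unchanged
print(alias_frequency(25_000, 44_100))   # above half the sampling rate: folded down
```

A 25kHz component, itself inaudible, would thus turn up as a spurious tone at 19.1kHz inside the audio band, which is why it has to be filtered out before sampling.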
A similar problem arises at the playback stage, when the spreadeagled spectrum of Fig. 4a with supersonic 'images' emerges from the CD player's D/A converter. Although these very high image frequencies are in themselves inaudible, they could overload the amplifier, damage the loudspeakers and cause severe intermodulation distortion. So once again a low-pass 'anti-imaging' filter is needed. Sufficiently steep filtering can be produced by traditional analogue filters (graphically referred to as 'brickwall' filters) and indeed this is the method adopted in the majority of Japanese CD players. However, such analogue filters are complex and bulky and very careful design is needed or else the side-effects of ringing and time delay (phase non-linearity) can become audible.

Oversampling
From their earliest designs, Philips (followed by some other companies such as Marantz, B&O and Revox) have adopted a different approach using so-called four-times oversampling filters. These behave as if sampling at 176.4kHz (four times 44.1kHz) and produce a modified image spectrum as shown in Fig. 4b. It is now much easier to remove the unwanted high frequencies using a special digital filter, and to follow the D/A converter with a mild analogue filter which introduces very little phase non-linearity (see Fig. 4c). Although the chip used by Philips in these early four-times oversampling players was only 14-bit, the technique easily matched the 96dB dynamic range performance of 16-bit CD players. To the 84dB contributed by the 14-bit converter can be added 6dB due to the fact that the inherent quantization noise is spread over the whole 176.4kHz sampled bandwidth (88.2kHz) whereas only that portion of the noise falling within the 22kHz band is relevant. A further 7dB improvement is achieved by a noise-shaping circuit. As a rule, oversampling brings benefits in terms of phase linearity, lower distortion and fewer temperature rise problems.
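
The dynamic range arithmetic for a 14-bit converter with four-times oversampling can be set out as a short Python sketch. It assumes the 6dB-per-bit rule, a noise bandwidth four times wider than the audio band, and the quoted 7dB from noise shaping; the function name is my own.

```python
import math

DB_PER_BIT = 6.0    # rule of thumb: each bit adds about 6dB

def oversampled_dynamic_range_db(bits, oversampling_factor, noise_shaping_db):
    """Estimate dynamic range as converter contribution, plus the gain from
    spreading quantization noise over a wider band, plus noise shaping."""
    converter_db = DB_PER_BIT * bits
    # Only 1/oversampling_factor of the noise power falls in the audio band.
    spreading_db = 10 * math.log10(oversampling_factor)
    return converter_db + spreading_db + noise_shaping_db

spreading_gain = 10 * math.log10(4)
total = oversampled_dynamic_range_db(bits=14, oversampling_factor=4, noise_shaping_db=7.0)

print(round(spreading_gain, 2))   # gain from four-times oversampling, in dB
print(round(total))               # estimated overall dynamic range, in dB
```

The total comes to about 97dB, slightly better than the 96dB expected of a straightforward 16-bit converter.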

Note, however, that there is no single unique way of optimizing the CD specification. Quite a number of Japanese manufacturers have adopted two-times oversampling (at 88.2kHz) with a 16-bit converter, giving excellent results. And the latest Philips CD players have moved a stage further with new dual 16-bit chips plus four-times oversampling, giving a claimed 18-bit performance. Philips' figures for typical results from their own earlier players (with the new 16-bit results in brackets) are as follows: amplitude linearity ±0.3dB (±0.01dB); phase linearity ±0.5° (±0.2°); S/N ratio 102dB (103dB); total distortion plus noise -89dB (-92dB); channel separation 96dB (100dB).

Of course these evolutionary upgradings of player specifications cannot go beyond the limits of the fixed CD standard itself, but they can bring the standard ever closer.
