http://theghostinthemp3.com/theghostinthemp3.html
The Ghost in the MP3
---------------------------------------------------------------------
download uncompressed audio:
rpm7.bandcamp.com/album/the-ghost-in-the-mp3
---------------------------------------------------------------------
I. Overview
The MPEG-1 or MPEG-2 Layer III standard, more commonly referred to as
MP3, has become a nearly ubiquitous digital audio file format. First
published in 1993, this codec implements a lossy compression
algorithm based on a perceptual model of human hearing. Listening
tests, primarily designed by and for Western European men, and using
the music they liked, were used to refine the encoder. These tests
determined which sounds were perceptually important and which could
be erased or altered, ostensibly without being noticed. What are
these lost sounds? Are they sounds which human ears cannot hear in
their original context due to universal perceptual limitations, or are
they simply encoding detritus? It is commonly accepted that MP3s
create audible artifacts such as pre-echo, but what does the music
which this codec deletes sound like? In the work presented here,
techniques are considered and developed to recover these lost sounds,
the ghosts in the MP3, and reformulate these sounds as art.
II. MP3 Compression
The MP3 standard, designed in the early 1990s by the Moving Picture
Experts Group, has become an interesting object of critique in
contemporary technology studies (Sterne, 2006). That a standard which
subtly reduces the audible quality of sound files has remained in
place, despite massively increased bandwidth and storage capacity, is
impressive, and highlights the foresight (and fortune) of the
format's creators.
Regardless, the MP3 is not always the most appropriate format for a
given task, and a critical evaluation of the technology and its
limitations is warranted. Many listeners today listen exclusively to
MP3 files, even in settings where the gains from a higher fidelity
format would be clearly perceptible. This lossy compression codec has
thus come to dominate unanticipated listening spaces.
Despite its highly touted performance in listening tests, the MP3
compression codec does generate audible artifacts and remove
perceptible sonic information, especially when implemented at low bit
rates.
For example, white, pink, and brown noise, when compressed to the
lowest possible MP3 bit rate, sounds very different from the original
random signal.
Example 1. White, Pink, and Brown Noise - Uncompressed (spectrogram)
Example 2. White, Pink, and Brown Noise - Lowest Possible Bit Rate
MP3 (8kbps) (spectrogram)
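A minimal sketch of how test signals like these could be generated, assuming NumPy is available. The function name colored_noise and its parameters are hypothetical and are not taken from the project's own code; the noise is shaped in the frequency domain so that its power falls off as 1/f**exponent.

```python
import numpy as np

def colored_noise(n_samples, exponent, seed=0):
    """Return noise whose power spectrum falls off as 1/f**exponent.

    exponent=0 gives white noise, 1 gives pink, 2 gives brown.
    (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    # Start from white noise and reshape its spectrum.
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    # Power ~ 1/f**exponent means amplitude ~ 1/f**(exponent/2).
    spectrum /= freqs ** (exponent / 2.0)
    noise = np.fft.irfft(spectrum, n=n_samples)
    # Normalize to a peak of 1.0 so it can be written to a WAV file.
    return noise / np.max(np.abs(noise))

white = colored_noise(2 ** 14, 0)
pink = colored_noise(2 ** 14, 1)
brown = colored_noise(2 ** 14, 2)
```

Each array could then be written to an uncompressed WAV file and passed through the encoder at the lowest bit rate for comparison.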
In comparison, low-frequency sine tones sound quite good when encoded
as a 320kbps MP3.
Still, some material has been left behind which, upon examination, is
quite interesting.
Example 3. Low Sine Tone Chords - Uncompressed
Example 4. Low Sine Tone Chords - 320kbps MP3
Example 5. Low Sine Tone Chords - 320kbps MP3 "Ghost"
III. Finding the Ghost
Using the Bregman, pyo, and pydub libraries, along with the LAME MP3
encoder, I begin with an uncompressed WAV file and save it as an MP3
file, at 128kbps in this example, a bit rate at which the codec
performs quite well. I chose 128kbps for these examples because that
was the "high-quality" bit rate used in the original MP3 development
listening tests. In the music I've made (moDernisT, etc.) using this
process, I've used 320kbps MP3s.
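As a rough illustration of the encoding step, the sketch below drives the LAME command-line tool directly rather than going through pydub. It assumes the lame executable is installed and on the PATH; the helper names and file names are hypothetical. Decoding the MP3 back to WAV gives a file that can be compared with the original.

```python
import subprocess

def lame_encode_cmd(wav_path, mp3_path, kbps=128):
    """Build a LAME command line for constant-bit-rate encoding."""
    return ["lame", "--cbr", "-b", str(kbps), wav_path, mp3_path]

def lame_decode_cmd(mp3_path, wav_path):
    """Build a LAME command line that decodes an MP3 back to WAV."""
    return ["lame", "--decode", mp3_path, wav_path]

def round_trip(wav_in, mp3_path, wav_out, kbps=128):
    """Encode wav_in to MP3, then decode it again for comparison.

    Requires the `lame` executable; raises CalledProcessError on failure.
    """
    subprocess.run(lame_encode_cmd(wav_in, mp3_path, kbps), check=True)
    subprocess.run(lame_decode_cmd(mp3_path, wav_out), check=True)

# Hypothetical file names, shown for illustration only:
# round_trip("verse1.wav", "verse1_128.mp3", "verse1_128_decoded.wav")
```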
Example 6. Tom's Diner - Verse 1 - Uncompressed
Example 7. Tom's Diner - Verse 1 - 128kbps MP3 Example
I then analyze and compare the two files and take the difference
between them.
Example 8. Tom's Diner - Verse 1 - 128kbps MP3 "Ghost" Example
Where the two files are the same or similar, the information in the
original audio has been largely preserved in the MP3. However,
corresponding time-frequency bins which differ significantly between
the two files betray spots where information has been altered or
deleted. Different extraction techniques are possible, each leading
to slightly different output.
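One way to picture this comparison is a per-bin difference of magnitude spectrograms. The sketch below is an assumption about the general approach, written with plain NumPy rather than Bregman; the function names, FFT size, and hop size are hypothetical. It also assumes the decoded MP3 has already been time-aligned with the original, since encoders typically add a short delay.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Short-time Fourier transform with a Hann window.

    Returns a complex array of shape (frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def ghost_magnitude(original, decoded, n_fft=1024, hop=256):
    """Magnitude present in the original but missing from the decoded MP3.

    Bins where the MP3 is as loud as (or louder than) the original
    are set to zero, so only lost energy remains.
    """
    n = min(len(original), len(decoded))
    mag_o = np.abs(stft(original[:n], n_fft, hop))
    mag_d = np.abs(stft(decoded[:n], n_fft, hop))
    return np.maximum(mag_o - mag_d, 0.0)
```

Clipping at zero keeps only what was removed; keeping the negative values as well would also expose energy the encoder added.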
Example 9. Tom's Diner - Verse 1 - "Ghost" extracted via masking
Example
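Masking can be sketched as a binary decision per time-frequency bin: keep the original's bin wherever a large enough share of its magnitude went missing, and silence everything else. This is an illustrative assumption, not necessarily the mask used for the example above; the function name mask_ghost and the ratio threshold are hypothetical, and the inputs are complex STFT arrays of equal shape.

```python
import numpy as np

def mask_ghost(spec_original, spec_decoded, ratio=0.5):
    """Keep original bins whose magnitude dropped by more than `ratio`.

    spec_original, spec_decoded: complex STFT arrays of the same shape.
    Returns a complex array holding the original's values where the
    mask is on and zeros elsewhere.
    """
    mag_o = np.abs(spec_original)
    mag_d = np.abs(spec_decoded)
    # A bin counts as "lost" when the missing magnitude exceeds the
    # chosen fraction of what the original contained.
    lost = (mag_o - mag_d) > ratio * mag_o
    return np.where(lost, spec_original, 0.0)
```

Because the surviving bins are copied from the original spectrogram, they carry the original phase, which sidesteps phase estimation for this variant.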
Different ways of handling phase estimation also lead to slightly
different results.
Examples 10 & 11. Tom's Diner - Verse 1 - 128kbps MP3 "Ghost" Example
with & without phase estimation
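A magnitude-only ghost needs a phase before it can be turned back into audio. The simplest option, sketched below as an assumption rather than as the method behind these examples, is to skip estimation and borrow the phase of the original file's STFT. Iterative estimation, for instance the Griffin-Lim algorithm, is the usual alternative and is not shown here. The function name borrow_phase is hypothetical.

```python
import numpy as np

def borrow_phase(ghost_mag, spec_original):
    """Pair ghost magnitudes with the phase of the original STFT.

    ghost_mag: non-negative real array of ghost magnitudes.
    spec_original: complex STFT of the uncompressed file, same shape.
    Returns a complex spectrogram ready for an inverse STFT.
    """
    return ghost_mag * np.exp(1j * np.angle(spec_original))
```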
IV. Artistic Overview and Background
As previously stated, the MP3 codec was refined using listening tests
designed by European audio engineers and featuring the music they
chose. In a sense, each of these songs acts as a resonant filter for
every file encoded in the MP3 format. Tom's Diner by Suzanne Vega,
Fast Car by Tracy Chapman, a Haydn trumpet concerto... these songs
carved out the space of sounds that could be successfully encoded as
MP3s. To that end, these songs represent a kind of best-case
scenario for an MP3 encoding. If anything can be encoded well by this
format, it should be these files. And yet these files do leave a
residue behind when encoded to MP3. Exploring these sounds helps to
define a boundary case for MP3 salvaging.
V. moDernisT
As a preliminary foray into codec ghost composition, I am creating a
series of pieces based on the songs used in the original MP3
listening tests. Today, I'd like to briefly discuss my treatment of
Tom's Diner. After compressing the original audio to a 320kbps MP3, I
begin by analyzing the song structure, interpreting the music and
text, and I then attempt to arrange the most interesting recovered
material via this framework.
As a case study of the techniques I've used, I'd like to discuss two
verses in detail. Verse one finds the narrator in a bustling diner,
making observations about her environment. The focus of this text is
external to its author, as opposed to later verses which exist in a
more subjective, internal space. Using different settings to harvest
the lost material, I was able to isolate both clear, pitched content
and more ephemeral transient signals.
Using the Python library headspace, and a reverb model of a small
diner, I began to construct a virtual 3-D space. Beginning by
fragmenting and scrambling the more transient material, I applied
head-related transfer functions to simulate the background
conversation one might hear in a diner. Tracking the amplitude of the
original melody in the verse, I applied a loose amplitude envelope to
these signals. Thus, a remnant of the original vocal line comes
through in its amplitude contour.
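The envelope step can be illustrated with a simple RMS follower: measure the loudness contour of a guide signal (here, the vocal line) and scale another signal by it. This is a sketch under assumptions, not the code used in the piece; the function names and the window length are hypothetical, and both signals are assumed to be at least `frame` samples long.

```python
import numpy as np

def rms_envelope(signal, frame=1024):
    """Sliding-window RMS of `signal`, same length as the input.

    Assumes len(signal) >= frame.
    """
    power = np.convolve(signal ** 2, np.ones(frame) / frame, mode="same")
    return np.sqrt(np.maximum(power, 0.0))

def apply_envelope(texture, guide, frame=1024):
    """Scale `texture` by the normalized loudness contour of `guide`."""
    n = min(len(texture), len(guide))
    env = rms_envelope(guide[:n], frame)
    peak = env.max()
    if peak > 0:
        env = env / peak  # normalize so the loudest point is 1.0
    return texture[:n] * env
```

A longer window gives a looser envelope, so the background material follows the broad rise and fall of the melody rather than every syllable.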
Having constructed this background, prominent pitches from the
original melody appear and disappear, located variously in this
virtual space. These ephemeral sounds hint at a familiar melody,
playing with aural memory and imagination, a flickering apparition
hovering at the border of consciousness.
Example 12. moDernisT - Verse 1
Verse 5 finds the narrator in a noticeably different psychological
state. Instead of buoyantly attending to the activity of the room,
she is lost in thought, remembering.
Example 13. Tom's Diner - Verse 5
Accordingly, I have given this material more space. It is less
fragmented, the constant background conversation has receded, and the
virtual space has drawn closer; it feels more internal than external.
Key phrases and snippets of the melody emerge more clearly, and then
the outro arrives, once again obscuring the familiar melody, but
hinting at its former presence. We hear mostly transients, but
internally we might fill in the rest.
Example 14. moDernisT - Verse 5
VI. Future
Moving forward, I am planning a series of related compositions,
constructed first from the other songs involved in the listening
tests, but then probing the space of MP3 compression in different
ways, attempting to highlight even more explicitly the filtering
effect of this codec. The songs used in developing the MP3 codec are
notable for what they are not: they are not music from other
cultures, not hip-hop or dance music, nothing with prominent low
frequencies, nothing particularly noisy, no outright aggressive
sounds, nothing lo-fi. Rather, these sounds have been broadly
institutionally accepted and conform to accepted standards of
production and recording technique. As MP3s have invaded more and
more contemporary listening spaces, the class of privileged sounds
which the format inadvertently creates has become more apparent.
Originally developed for suboptimal listening environments, MP3s are
now heard everywhere: at home, streaming in stores and public spaces,
over high-fidelity car stereo systems. This format has become a
curator for these spaces: allowing in a great deal of wonderful
sound, yes, but to the exclusion of a vast territory in the available
sonic terrain. Composing with these sounds and injecting them back
into contemporary listening spaces is one possible act of resistance,
one available mode of cultural critique.
VII. Conclusion
In conclusion, composing with MP3 files is an attempt to derive
interesting material from the sounds that have been rejected from
many of our contemporary listening spaces. MP3 ghosts and artifacts
are difficult to predict and provide externally generated material to
react to and work with as a composer, while not limiting the freedom
of the artist to arrange, alter, and interpret these sounds.
Investigating a particular format for its aesthetic possibilities is
inspired by musics built around previous technologies, "tape music"
for example. I see "format music" as a contemporary analogue of these
practices. Each of you reading this is involved with technology in
your own way. I have found it extremely fruitful to question and
explore the limitations of those technologies with which I find
myself intertwined. I hope you have also found interest in
questioning these limitations with me.
Special thanks to:
Tara Rodgers, Aden Evens, Larry Polansky, Michael Casey, Matthew
Burtner, and many others
for their help in conceiving and realizing this project
Read the full "Ghost in the MP3" Conference Paper from the
2014 Sound and Music Computing / International Computer Music
Conference here
Works Referenced:
Brandenburg, Karlheinz. "MP3 and AAC Explained." In Audio Engineering
Society Conference: 17th International Conference: High-Quality Audio
Coding, 1999.
Brandenburg, Karlheinz, and Gerhard Stoll. "ISO/MPEG-1 Audio: A
Generic Standard for Coding of High-quality Digital Audio." Journal
of the Audio Engineering Society 42, no. 10 (1994): 780-792.
Cascone, Kim. "The Aesthetics of Failure: 'Post-digital' Tendencies in
Contemporary Computer Music." Computer Music Journal 24, no. 4
(2000): 12-18.
Demers, Joanna. Listening Through the Noise: The Aesthetics of
Experimental Electronic Music. Oxford University Press, USA, 2010.
Evens, Aden. Sound Ideas: Music, Machines and Experiences.
Minneapolis: University of Minnesota Press, 2005.
Manovich, Lev. The Language of New Media. MIT Press, 2001.
Miller, Vincent. Understanding Digital Culture. Sage Publications,
2011.
Oswald, John. "Plunderphonics, or Audio Piracy as a Compositional
Prerogative." In Wired Society Electro-Acoustic Conference, 1985.
Pras, Amandine, Rachel Zimmerman, Daniel Levitin, and Catherine
Guastavino. "Subjective Evaluation of Mp3 Compression for Different
Musical Genres." In Audio Engineering Society Convention 127, 2009.
Sterne, Jonathan. MP3: The Meaning of a Format. Duke University Press
Books, 2012.
Sterne, Jonathan. "The Mp3 as Cultural Artifact." New Media & Society
8, no. 5 (2006): 825-842.