When computers listen to music, what do they hear?

A new generation of scholars is turning music into data—and uncovering truths beyond human ears.

By Leon Neyfakh, 7/8/2012

Soon after the release of the first iPhone five years ago, an astonishing new ritual began to be performed in cafes and restaurants across the country. It centered on an app called Shazam. When the phone was held up to a radio, Shazam would almost instantly identify whatever song happened to be on, causing any iPhone skeptics in the vicinity to gulp in bewilderment and awe.

There was something unspeakably impressive about a machine that could listen to a snippet of a random hit from 1981, pick out its melody and beat, and somehow cross-reference them against a database that seemed to contain the totality of all recorded music. Seeing it happen for the first time was revelatory. By translating a song into a string of numbers, and identifying what made it different from every other song ever written, Shazam forced us to confront the fact that a computer could hear and process music in a way that we humans simply can’t.

That insight is at the heart of a new kind of thinking about music—one built on the idea that by taking massive numbers of songs, symphonies, and sonatas, turning them into cold, hard data, and analyzing them with computers, we can learn things about music that would have previously been impossible to uncover. Using advanced statistical tools and massive collections of data, a growing number of scholars—as well as some music fans—are taking the melodies, rhythms, and harmonies that make up the music we all love, crunching them en masse, and generating previously inaccessible new findings about how music works, why we like it, and how individual musicians have fit into mankind’s long march from Bach to the Beatles to Bieber.

Computational musicology, as the relatively young field is known within academic circles, has already produced a range of findings that were out of reach before the power of data-crunching was brought to bear on music. Douglas Mason, a doctoral student at Harvard, has analyzed scores of Beatles songs and come up with a new way to understand what Bob Dylan called their “outrageous” use of guitar chords. Michael Cuthbert, an associate professor at MIT, has studied music from the time of the bubonic plague, and discovered that during one of civilization’s darkest hours, surprisingly, music became much happier, as people sought to escape the misery of life.

Meanwhile, Glenn Schellenberg, a psychologist at the University of Toronto at Mississauga who specializes in music cognition, and Christian von Scheve of the Free University of Berlin looked at the composition of 1,000 Top 40 songs from the last 50 years and found that over time, pop has become more “sad-sounding” and “emotionally ambiguous.”

“You get a bird’s eye view of something where the details are so fascinating—where the individual pieces are so engrossing—that it’s very hard for us to see, or in this case hear, the big picture...of context, of history, of what else is going on,” said Cuthbert. “Computers are dispassionate. They can let us hear things across pieces in a way that we can’t by even the closest study of an individual piece.”

As more of the world’s concertos, folk songs, hymns, and number one hits are converted into data and analyzed, it’s turning out that listening is only one of the things we can do to try to understand the music we love. And it confronts us with a kind of irony: Only by transforming it into something that doesn’t look like music can we hope to hear all of its hidden notes.


Perhaps the most straightforward way to analyze large swaths of music at once is to focus on lyrics. In the last few years, multiple studies have sought to use lyrics as clues to how American music, and the country itself, have changed over the course of the past several decades. In 2009, a pair of psychologists examined a dataset of Billboard hits released between 1955 and 2003 and looked for patterns involving subject matter and sentence length. They found that during tough economic times, musicians tended to use longer sentences in their songs, and referred more often to the future. In 2011, a paper published in the journal Psychology of Aesthetics, Creativity, and the Arts examined popular songs from 1980 to 2007, and found an increase in the use of the pronoun “I” and a decrease in the use of “we.”
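
To make the pronoun finding concrete, here is a minimal sketch of how such a count might be run on a single lyric. It is not the method used in the studies above: the word lists and the function name `pronoun_rates` are assumptions chosen for illustration, and a real study would need a far more careful treatment of language.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; the published studies may group pronouns differently.
SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(lyrics: str) -> dict:
    """Return first-person singular and plural pronouns as a share of all words."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    # Keep only the part before an apostrophe so "I'm" counts as "i" and "we're" as "we".
    words = [t.split("'")[0] for t in tokens]
    words = [w for w in words if w]
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    singular = sum(counts[w] for w in SINGULAR)
    plural = sum(counts[w] for w in PLURAL)
    return {"singular": singular / total, "plural": plural / total}


if __name__ == "__main__":
    print(pronoun_rates("I love my car and we love our town"))
```

Run over every hit from a given year, rates like these could be averaged and plotted to show whether "I" is gaining on "we."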

Such studies fall neatly within the tradition of the digital humanities, a burgeoning field centered around breaking down works of literature to their basic components and analyzing them for things like word usage, syntax, and plot structure. But when it comes to music, lyrics are only the beginning. In many cases, much more is communicated through the texture and sound of the music itself: the tune, the beat, the chord progressions, the tempo, and so on. It’s often these attributes, more than lyrics, that imbue a piece of music with the power to communicate a mood, hijack our emotions, invade our consciousness, or make us dance. And while most of us do all right when it comes to describing how a song makes us feel, we tend to fail miserably when asked to explain what it is about how it sounds—what it does, musically—that makes us feel that way.

Computer-assisted analysis can help bridge that gap—not by directly explaining our reactions, but by generating precise, technical descriptions of what is happening in a particular piece, putting it in a blender along with thousands of others, and allowing us to compare them to, say, other works by the same artist, or pieces of music from a different era. It’s no surprise that computers are well suited to this task, since math has been entwined with music theory from the beginning. The ancient Greeks understood that musical pitches have clear mathematical relationships to one another, and most music is built on time signatures, chords, and melodic structures that can be represented with numbers. In that sense, turning music into data comes relatively naturally.
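
As a rough illustration of how naturally pitches become numbers, the sketch below converts note names into MIDI note numbers (a common convention in which middle C, written C4, is 60) and then into frequencies. It assumes modern equal temperament with A4 tuned to 440 hertz, which is not the tuning the ancient Greeks worked with; the function names are hypothetical.

```python
# Semitone offsets of the natural notes above C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(name: str) -> int:
    """Convert a name such as 'A4' or 'Eb3' to a MIDI note number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    # Each '#' raises the pitch by a semitone, each 'b' lowers it by one.
    while rest and rest[0] in "#b":
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental


def frequency(midi: int, a4: float = 440.0) -> float:
    """Equal-tempered frequency in hertz, with A4 (MIDI 69) tuned to a4."""
    return a4 * 2 ** ((midi - 69) / 12)


if __name__ == "__main__":
    octave_ratio = frequency(midi_number("A5")) / frequency(midi_number("A4"))
    fifth_ratio = frequency(midi_number("G4")) / frequency(midi_number("C4"))
    print(octave_ratio, fifth_ratio)
```

An octave comes out as a frequency ratio of 2 to 1, and a perfect fifth lands very close to the 3 to 2 ratio associated with Pythagorean tuning.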

Even so, it’s not easy, and researchers are constantly working on getting better at it. The most straightforward method—and the most arduous—involves painstakingly converting sheet music, note by note, into a series of numbers that can then be interpreted and statistically analyzed with a computer. This kind of work has its roots in the 1960s, when two separate research teams used an elaborate mechanism involving thousands of punch cards to encode large collections of folk songs from around the world and then looked for patterns in performance style and composition structure—in effect, turning even the subjective parts of music into data. The ethnomusicologist Alan Lomax led one of these endeavors, which he called “Cantometrics”; his team sought to compare vocal styles favored by different cultures, picking out 37 “style factors” such as breathiness and rasp. Berkeley scholar Bertrand Bronson, meanwhile, wanted to know whether folk songs from a given region or era had anything in common in terms of melodic phrasing or rhythm. In an essay entitled “All This for a Song?” Bronson described his project as an attempt to overcome the vastness of the available data by turning it over to “electronic robots, clothed in power and magic, which speak with sibylline utterance in our day and which can answer the hardest questions in the twinkling of an eye.”

Today, the “electronic robots” of most computational musicologists live right on their computers, and consist of specialized software programs such as Humdrum or Music21, which provide scholars with the tools they need to analyze sheet music they’ve translated into data. And these systems are generating new information within music history that would have been difficult to gain by ear alone.

David Huron, a professor at Ohio State University who created Humdrum in the 1980s, has used his program to study the rise of syncopated rhythms, which in American pop music are associated with African-American influence. With a colleague, he showed that between 1890 and 1940, the number of syncopated beats in the average measure nearly doubled, and that during the 1910s, songwriters experimented with different kinds of syncopation that had never been tried before. “The 1910s had always seemed like a sedate time in popular music, but in fact there was this huge blossoming of kinds of syncopation,” Huron said. “It was this exploratory period, when musicians were trying out this newfangled thing called syncopation. No scholar had ever identified it. But that’s what came out of the data.”
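
What counting syncopations per measure might look like in code can be sketched under some strong simplifying assumptions. The version below is not Huron's definition or Humdrum's: it assumes 4/4 time divided into eight eighth-note slots, assigns each slot a hypothetical metric weight, and calls a note syncopated when it begins on a weaker slot and is held through a stronger one.

```python
# Hypothetical metric weights for the eight eighth-note slots of a 4/4 measure:
# the downbeat is strongest, beat 3 next, then beats 2 and 4, then the off-beats.
WEIGHTS = [4, 1, 2, 1, 3, 1, 2, 1]


def is_syncopated(onset: int, duration: int) -> bool:
    """True if a note starting at slot `onset` is held across a stronger slot.

    Both values are in eighth-note units. A note that runs past the end of the
    measure is checked against the slots of the following measure.
    """
    start_weight = WEIGHTS[onset % 8]
    held_slots = range(onset + 1, onset + duration)
    return any(WEIGHTS[slot % 8] > start_weight for slot in held_slots)


def syncopations_per_measure(notes: list) -> int:
    """Count syncopated notes in one measure of (onset, duration) pairs."""
    return sum(is_syncopated(onset, duration) for onset, duration in notes)


if __name__ == "__main__":
    straight = [(0, 2), (2, 2), (4, 2), (6, 2)]          # four quarter notes on the beat
    offbeat = [(0, 1), (1, 2), (3, 2), (5, 2), (7, 1)]   # quarter notes pushed off the beat
    print(syncopations_per_measure(straight), syncopations_per_measure(offbeat))
```

Averaging a count like this over every measure in a decade's worth of songs gives the kind of trend line the study describes.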

Using Music21, which was designed by Michael Cuthbert and his MIT colleague Christopher Ariza, Harvard physics doctoral student Douglas Mason analyzed Beatles songs, running more than 100 of them under the microscope and discovering that the majority of them were built around one—and only one—highly unexpected chord. “You expect C to appear in a song in the key of C, but you wouldn’t expect a chord that almost never appears in a song in C, like E flat, because it’s really out of key. But the Beatles did stuff like that all the time,” said Mason. “They’ll have a song in major but they’ll bring in a chord from the minor key, and that chord will act as an anchor for the whole song.”
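
Mason's example of an E flat chord turning up in the key of C can be checked with a few lines of arithmetic. The sketch below is not his analysis or part of Music21; it is a simplified illustration that looks only at a chord's root note, ignores whether the chord itself is major or minor, and uses a hypothetical function name.

```python
# Pitch classes: semitones above C, with common enharmonic spellings.
PITCH_CLASSES = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)   # scale degrees of a major key
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)   # scale degrees of the natural minor key


def classify_root(chord_root: str, key: str) -> str:
    """Say whether a chord root belongs to a major key, its parallel minor, or neither."""
    degree = (PITCH_CLASSES[chord_root] - PITCH_CLASSES[key]) % 12
    if degree in MAJOR_STEPS:
        return "in key"
    if degree in MINOR_STEPS:
        return "borrowed from parallel minor"
    return "outside both"


if __name__ == "__main__":
    for root in ["C", "G", "Eb", "F#"]:
        print(root, "->", classify_root(root, "C"))
```

In a song in C major, the roots C and G are reported as in key, while E flat is flagged as borrowed from C minor, the kind of chord Mason describes.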


Up to now, progress in computational musicology has been slow in part because of how time-consuming it has been to turn musical notation into computer data by hand. It can take scholars years of laborious research to create a complete database, even if they’re studying something relatively narrow, like the life’s work of a particular composer. To ask truly ambitious questions—to extract surprising insights not just about music itself, but about the culture that produced it—scholars need to be able to analyze thousands, even millions of songs at once. This is becoming possible thanks to advances in audio recognition software, which is starting to allow researchers to essentially feed a recording into a computer and have it take notes on what it hears.

One of the groups working on making computers better at listening is an academic organization called SALAMI, which helps digital musicologists and programmers test new analytical tools, and holds a contest every year to see who can come up with the most sensitive audio software. Another is Somerville-based Echo Nest, which formed in 2005 as an outgrowth of a project at the MIT Media Lab. It has since built a business out of helping companies like MTV and Clear Channel build music recommendation tools and online radio stations. At Echo Nest, computer engineers develop algorithms that can take any mp3 file and read the raw signal for so-called “psycho-acoustic attributes” that emerge from a song’s dominant melody, tempo, rhythm, and harmonic structure.

As excited as digital musicologists are to have this high-altitude approach within reach, they tend to feel that more traditional colleagues disapprove of their replacing careful, sensitive listening with statistics. “I think arts and humanities scholars, especially in the postmodern age, don’t like to talk in big sweeping generalities,” said Huron. “We like to emphasize the individual artist, and focus on what’s unique about a particular work of art. And we take a kind of pride in being the most sensitive, the most observant. I think for many scholars, numerical methods are antithetical to the spirit of the humanities.”

“We’re not really here to replace musicologists—I want to stress that, because our old school musicologists get upset by this,” said J. Stephen Downie, a professor at the University of Illinois who serves as a principal investigator on SALAMI. “But we can change the kinds of questions they can answer. No musicologist could ever listen to 20,000 hours of music. We can get a machine to do that, and find connections that they would miss.”

For those of us outside the academy, perhaps the best way to appreciate the potential of turning large quantities of music into data currently comes in the form of recommendation engines like Pandora, as well as quirky online apps that we can use to learn new things about our favorite songs. Peachnote, a project of Munich-based computer scientist Vladimir Viro, makes use of publicly available archives of sheet music to allow users to input a basic melodic phrase or chord progression and see how it has been used throughout musical history. (Put in the iconic notes that open Scott Joplin’s 1902 song “The Entertainer,” and Peachnote will tell you that the phrase experienced a gradual rise in popularity throughout the 1800s.) A Web app developed by Echo Nest’s Paul Lamere, “Looking for the Slow Build,” allows you to plot songs that gradually get louder and build toward a climax (“Stairway to Heaven”) against ones that stay steady (“California Gurls”). Another Lamere creation, “In Search of the Click Track,” reveals which of your favorite songs were recorded with the aid of a metronome or a drum machine.
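
One simple way to tell a slow build from a steady song, sketched here as an assumption rather than as a description of how Lamere's app works, is to take a series of loudness readings over the course of a track and fit a straight line to them. A clearly positive slope suggests a song that keeps getting louder. The function name and the sample readings are made up for illustration.

```python
def loudness_slope(loudness: list) -> float:
    """Least-squares slope of loudness readings taken at equal time intervals."""
    n = len(loudness)
    if n < 2:
        raise ValueError("need at least two loudness readings")
    mean_x = (n - 1) / 2          # mean of the indices 0..n-1
    mean_y = sum(loudness) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(loudness))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


if __name__ == "__main__":
    building = [-30.0, -26.0, -21.0, -15.0, -9.0]   # made-up readings in decibels
    steady = [-8.0, -8.5, -8.0, -7.5, -8.0]
    print(loudness_slope(building), loudness_slope(steady))
```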

But while such toys capture the magic many of us experienced when we first saw Shazam in action, the real payoff from crunching huge amounts of music data will go beyond the music itself—and instead will tell us something about ourselves. Consider the most notable feature of Spotify, the popular music streaming service, which informs you what everyone in your social network is listening to, in real time. What makes such information so thrilling is that we think of music as a mirror: To love a song is to identify with it; to love an artist is to declare that his or her view of the world resonates with our own. And just as we can learn a lot about our friends by scrolling through their mp3 libraries, so too could someone analyze, say, the last 30 years of recorded music and tell a new kind of story about our culture as a whole. And while it’s not clear we really want to know what the massive success of LMFAO says about us, there’s always the possibility the data will reveal that it was a symptom of powerful historical forces far beyond our control. Try proving that just by listening to “Party Rock Anthem” over and over.

Leon Neyfakh is the staff writer for Ideas.

© Copyright 2012 The New York Times Company