Artificial Intelligence and Bioacoustics


What do you get when you cross ecology, acoustics, and artificial intelligence?


A wholesale revolution in wildlife science

By Warren Cornwall

At a moment when artificial intelligence is at the forefront of cultural consciousness, a similar but much less noticed revolution is underway among a small group of scientists. As daily headlines document the uncanny ability of programs such as ChatGPT to solve complex puzzles and carry on conversations with humans, AI is also being used to decipher what other animals are doing based on the squawks, groans and clicks that fill our natural world.

Fifteen years ago, the field of bioacoustics was the domain of a small coterie of ecologists and biologists toting bulky equipment into the wild. Now, some of the same tools powering scary-smart chatbots are being paired with cheap, durable audio recorders to put our fingers on the pulse of the wild. Scientists who once relied on a smattering of handheld microphones are wiring entire forest systems, a bugging operation that would turn the CIA green with envy.

This is a transformation underpinned by numbers—enormous numbers. The Sierra Nevada mountains, California’s craggy granite spine, are now home to 2,000 tree-mounted audio recorders vacuuming up 1 million hours of recordings every year. Computers capable of billions of calculations a second are teasing insights into how wildfires reshape bird populations, how oil spills influence whale numbers, and how coral reefs recover from destructive fishing.

As Google and other tech heavy-hitters join the effort, there also is talk of using listening stations paired with computers to slow tide-powered turbines or ocean-going freighters when whales swim nearby. And some researchers are exploring whether AI could even open the door to communication between species.

“Four years ago we would not have been able to do what we are doing now,” says Holger Klinck, Director of Cornell’s Center for Conservation Bioacoustics.


A rockhopper marine passive acoustic recording unit being assembled by Chris Tessaglia-Hymes. © K. Lisa Yang Center for Conservation Bioacoustics


Swift Terrestrial Passive Acoustic Recording Unit. © K. Lisa Yang Center for Conservation Bioacoustics

• • •


To get a handle on the unfolding of this breakneck revolution, consider the intertwined career paths of two young Cornell scientists, Connor Wood and Stefan Kahl.

Wood’s embrace of bioacoustics started as an act of desperation. In 2016, he was a first-year ecology PhD student at the University of Wisconsin. His advisor had won a grant from the state of California to track invasive barred owls in the northern Sierra Nevada mountains.

The 28-year-old had plenty of experience with the gritty side of wildlife research. As a field technician he had hunted junco bird nests in South Dakota’s Black Hills, monitored martens in northern California, and attached GPS trackers to coyotes in the Nevada desert. For a master’s degree he spent two summers on the Appalachian Trail in Maine lugging backpacks loaded with 50 pounds of metal rodent traps.

But the logistics of counting barred owls overwhelmed him. If he did it the traditional way, he and a team of technicians would spend summer nights driving rugged logging roads. They would stop periodically in the dark to trigger a response from any nearby barred owls by mimicking their haunting 9-note call, often characterized as someone moaning “Who cooks for you? Who cooks for you-all?” He would need to crisscross an area the size of Delaware in a single summer.

“I could not make the numbers work,” recalled Wood. “It just wasn’t going to happen.”

Wood’s advisor at the University of Wisconsin, Zach Peery, mentioned hearing of people experimenting with audio recorders to monitor northern spotted owls. Could something like that take the place of a sleep-deprived human crew roaming the California mountains and hooting?

His PhD on the line, Wood found his way to Cornell’s bioacoustics center, a pioneer in using audio recorders to track birds. Despite reservations (Did owls hoot very often when people weren’t around?), Wood saw few alternatives. “I basically dove in.”

When he first started strapping audio recorders half the size of a shoebox to trees in 2017, computer programs could already help decipher thousands of hours of audio recordings. But it was a grueling process with frustrating limitations.

So instead of roaming the backwoods, Wood became a desk jockey, staring at a computer screen for hours at a time. The computer program he was using could scan through audio recordings hunting for barred owl calls. But Wood needed to give the computer a “picture” of what that call looked like, to ensure it didn’t ignore owls, or mistakenly count other birds.

It worked something like this: Imagine a person has never seen a bird before. You want them to go into the woods and count how many barred owls they see. You need to give them a description that’s specific enough that they won’t mistakenly think a barred owl is anything with wings. They might wind up counting butterflies, too. On the other hand, you don’t want them to be so specific that they think a barred owl needs to look exactly like a particular owl in a photo. They might ignore owls that are bigger or smaller or a different shade of brown.

To get a computer to do this with audio files, Wood pored over graphic representations of barred owl calls known as spectrograms: lines that scroll across the screen, wavering up and down depending on how high or low the pitch is. By looking at multiple samples, he created a generic visual profile of the “Who cooks for you?” call.
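The idea of a generic profile that is neither too loose nor too strict can be sketched in a few lines of Python. This is only an illustration, not the software Wood used: the pitch values are invented, the function names are hypothetical, and a real detector would compare full spectrograms rather than a single line of pitches. The sketch averages a few example calls into a template, slides it along a recording, and flags any stretch whose shape correlates strongly with the template.

```python
import math

def average_template(examples):
    """Average several equal-length pitch contours into one generic profile."""
    n = len(examples[0])
    return [sum(ex[i] for ex in examples) / len(examples) for i in range(n)]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def find_calls(recording, template, threshold=0.98):
    """Return start positions where the recording's shape matches the template."""
    width = len(template)
    hits = []
    for start in range(len(recording) - width + 1):
        window = recording[start:start + width]
        if correlation(window, template) >= threshold:
            hits.append(start)
    return hits

# Invented pitch contours (Hz per time frame), not real owl data.
examples = [
    [400, 420, 400, 380, 300],
    [410, 430, 410, 390, 310],
    [390, 410, 390, 370, 290],
]
template = average_template(examples)

# A toy recording: quiet background, then a call pitched slightly higher
# than any of the examples, starting at position 3.
recording = [200, 200, 200, 450, 470, 450, 430, 350, 200, 200]

strict_hits = find_calls(recording, template)                 # only the real call
loose_hits = find_calls(recording, template, threshold=0.7)   # also flags near misses
print(strict_hits, loose_hits)
```

With the strict threshold the sketch flags only the true call, even though it is pitched higher than the examples. Loosening the threshold also flags the overlapping stretches on either side, the acoustic equivalent of counting butterflies.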

The system proved 95% accurate at knowing when an owl was in audio samples from the woods. Using the program, he saw that between 2017 and 2018, barred owl numbers had more than doubled as they migrated southward from the Pacific Northwest. The territorial birds showed a preference for old-growth forests, also home to rare California spotted owls.

The discovery set off alarm bells with wildlife managers, prompting an experiment to see if systematically shooting barred owls could help spotted owls. Between 2019 and 2020 they killed 76 owls. Wood’s audio recorders showed a steep decline in barred owls, and a return of spotted owls to woods previously colonized by their bigger relatives.

But the data crunching was exhausting. And that was just for two species. Once again, Wood was running up against the limitations of his tools. “This was not a scalable solution. We could not get any more species out of this. I’m not even sure we could scale this up to get more of the Sierra Nevada,” he said.


Connor Wood (left) and Stefan Kahl (right).
K. Lisa Yang Center for Conservation Bioacoustics

While Wood was attaching recorders to trees, Stefan Kahl was hunched over his own computer in Germany.

In 2016, the Cornell University Ornithology Lab and Google’s research arm hosted an annual contest to solve a puzzle related to audio recordings of birds. That year, a team of computer scientists from ETH Zurich, Switzerland’s premier technology university, crushed the other participants by using a kind of computer program called a deep neural network. They were the first in the competition to ever use one. The next year, every entrant used a neural network.

Kahl was one of the 2017 contestants. At Chemnitz University of Technology, he had worked for years on computer vision, but at the time he couldn’t tell a single bird by its call. Nevertheless, the computing challenge appealed to him. “I was also interested in having a social impact with the stuff I was doing,” he said.

While the internal workings of the neural networks that Kahl used can be complex and abstract, many resemble a more sophisticated version of how Wood got a computer program to recognize barred owls. To teach a common type of neural network to identify photos with cats, for example, it’s fed reams of photos, some with cats and others with dogs, toasters, cars, people, whatever. It’s already known which photos contain cats. The program runs a series of calculations on the raw data until it spits out a final answer—whether a certain picture is probably a cat. Then it checks the results against the correct answers. The program goes through the process again and again and again, each time tweaking the calculations to make the predictions more accurate. It’s not uncommon for a neural network to take millions of samples of training data and zip through it hundreds of times in search of greater accuracy.
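That guess, check, and tweak cycle can be shown with a toy example. The Python sketch below is an assumption-laden stand-in, not Kahl’s program: it trains a single artificial neuron rather than a deep network, and each “photo” is reduced to two invented numbers. The loop is the same in spirit, though: make a prediction, compare it with the known answer, nudge the calculation, and repeat.

```python
import math

def predict(weights, bias, features):
    """Return the model's estimated probability that the example is a cat."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=500, learning_rate=0.5):
    """Repeatedly guess, compare with the known answers, and adjust the weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):  # go through the whole data set again and again
        for features, label in zip(samples, labels):
            # How far off was the guess from the correct answer?
            error = predict(weights, bias, features) - label
            # Tweak each number in the calculation to shrink that error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Invented two-number summaries of each "photo"; label 1 = cat, 0 = not a cat.
samples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
           [0.1, 0.2], [0.2, 0.1], [0.3, 0.05]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(samples, labels)
correct = sum((predict(weights, bias, f) > 0.5) == bool(y)
              for f, y in zip(samples, labels))
accuracy = correct / len(samples)
print(accuracy)
```

A real network has millions of adjustable numbers instead of three, and learns from far messier data, but the loop at its heart looks much like this one.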

Like the other teams, Kahl’s group used a convolutional neural network, or CNN, a kind of program particularly adept at working with visual images. Google’s photo search engine, facial recognition software, and self-driving cars all rely on this kind of network. In the case of sound, these networks work with spectrograms.

Kahl’s program placed second that year. It could correctly identify approximately 60% of 1,500 different bird species from more than 36,000 recordings. More importantly, his tool helped give birth to a plan to monitor the sounds of entire ecosystems on a scale never done before.

Both relative newcomers to bioacoustics, Kahl and Wood converged at a pivotal moment. Kahl was looking for a way to take his program beyond a competition. Wood was chafing at the limitations of existing computer programs. In 2020, just as the COVID-19 pandemic swept the globe, Klinck brought the two together as postdoctoral researchers at Cornell. “Holger is like the puppet master,” quipped Wood.

The two started work on a project expanding Wood’s California research to a Sierra-wide network of microphones. Kahl’s computer program, dubbed BirdNET, enabled them to track the sounds of more than a hundred bird species in California, offering the possibility of using the panoply of birds to detect changes in the ecosystem.


Dr. Kait Frasier, Co-Chief Scientist, and Bruce Thayre, Marine Technician.
Scripps Whale Acoustics Laboratory.

Birds are an ideal subject for artificial intelligence tuned to sound. They are abundant and noisy, and birders often know a bird by its call before they see it. Ornithologists also have troves of audio recordings, which computer scientists can use to train neural networks.

But whale biologists are close on birders’ acoustic heels. Around the time neural networks were emerging on the bird scene in the mid-2010s, an oceanographer named Kait Frasier was discovering their potential to make sense of whales’ echolocation clicks.

Frasier was working at the University of California, San Diego’s Scripps Institution of Oceanography when the Deepwater Horizon oil spill hit the Gulf of Mexico. While oil was still billowing from the broken wellhead, Scripps scientists dropped powerful underwater recorders into nearby areas in a mad dash to understand how the oil (134 million gallons in all) was affecting the 20 species of toothed whales that swim in the gulf.

But researchers soon confronted a problem. Many of the whales’ echolocation clicks sounded so similar it was hard to tell one species from another. So Frasier, whose parents are both software engineers, turned to a computer. Her approach was similar to Kahl’s, with a crucial difference: unlike Kahl with his bird recordings, Frasier didn’t have reams of whale calls tied to a particular species with which to train the computer. Instead, she relied on the computer to pluck its own patterns from the cacophony of more than 50 million underwater clicks. Human experts could then evaluate the sound patterns to see if they could be tied to a particular whale species. The computer can’t do it all. “At the end of the day you have to look at the data and figure out whether you can believe it,” said Frasier.
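Letting a computer find its own groupings in unlabeled data is known as clustering. The article does not say which method Frasier used, so the Python sketch below substitutes a textbook technique called k-means, and the click measurements (a peak frequency and a gap between clicks) are invented for illustration. The program is never told which clicks belong together; it only tries to find clumps.

```python
def kmeans(points, centers, iterations=10):
    """Group unlabeled points around starting centers (plain k-means)."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            # Assign each point to whichever center is closest.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Move each center to the average of its group (unchanged if empty).
        centers = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Invented click measurements: [peak frequency in kHz, seconds between clicks].
clicks = [
    [30.0, 0.20], [31.0, 0.25], [29.5, 0.22],
    [60.0, 0.50], [62.0, 0.55], [61.0, 0.48],
]

# Start from two of the clicks as rough guesses for the group centers.
centers, groups = kmeans(clicks, centers=[clicks[0], clicks[3]])
print(len(groups[0]), len(groups[1]))
```

The output is two unnamed clumps of clicks. Deciding whether a clump corresponds to a real species is still the job of the human experts.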

Today, her computer program can consistently pick out the calls of seven of the whale species. She’s compiling the data into a look at how populations around the spill changed over the first ten years.

In the long run, Frasier imagines the potential for something far more ambitious—an acoustic system circling the world’s oceans. Think of an aquatic, super-sized version of Wood’s Sierra Nevada project.

Until recently, another player in the field was Matt Harvey, a software engineer at Google’s AI branch who came to whale acoustics from a very different angle. Prior to leading a small team at Google working on bioacoustics, Harvey had been working at YouTube. He was honing programs that could screen advertisements on the network to make sure they were paired with well-suited videos. In 2018 he saw a posting for another job at Google. Scientists at the U.S. National Oceanic and Atmospheric Administration (NOAA) needed help scanning underwater audio for humpback whale sounds. “Whales,” said Harvey, “are more fun than ads.”

The recordings came from 13 microphones tethered to the seafloor in the Pacific Ocean, including several around Hawaii. Previous attempts to use computers had struggled because, unlike the calls of many other whales, humpback calls vary enormously. Using a neural network, Harvey and the NOAA scientists built a system that could successfully detect between 80% and 95% of the humpback calls.

For another project, Harvey worked with Canadian fisheries managers to detect the calls of endangered orcas near Vancouver Island, triggering an alert to fisheries officials, who can then caution nearby ships to slow. The system got an emergency test in August 2022 when a fishing boat laden with more than 2,500 gallons of oil and diesel sank in nearby U.S. waters. Canadian officials used AI-powered microphones to track the whereabouts of orcas, in case they needed to steer the whales away from the oil spill. Harvey, however, is no longer working on the project, having moved on to other work at Google last year.

There is even the tantalizing prospect of whale and computer scientists teaming up to see if AI and neural networks might enable people to understand what sperm whales are saying, and perhaps “speak” to them. The Project CETI team includes several winners of the MacArthur Genius Award and researchers from Harvard, Massachusetts Institute of Technology, Oxford University, and the University of California, Berkeley, among others. They are building audio and video recorders that can amass enough sperm whale calls to train computers. “We’re trying to train a computer to be a baby sperm whale and use communication systems like its family,” said David Gruber, a renowned marine biologist at the City University of New York’s Baruch College and a leader of the project.

• • •


The results of AI-powered bioacoustics can seem like sorcery, even to the computer scientists building them.

But that magic also comes with a dose of reality. Connor Wood’s experience is reminiscent of many encounters with computers: supposed labor-saving devices that create their own avalanche of work. Twice his computer program crashed at a key moment, and another time a power outage interrupted his research.

He also doesn’t entirely trust BirdNET. So he has a sampling of the recordings double-checked by a professional birding guide with an encyclopedic knowledge of bird calls. “I like having a human in the loop,” said Wood.

But the guiding business meant the birder was gone for weeks at a time. The call checking stagnated. Such are the downsides of having a human in the loop.

And then there is the more subtle question of what we miss if we entrust computers to do the listening for us. What happens to the wonder inspired by a humpback whale melody, the joy of birdsong at sunrise, the awe at the wall of insect sound that fills a jungle?

For Wood, the ascendance of AI means he spends most of his time at a computer managing data. That’s a bargain he’s willing to make, both because of the power of the new tools and because it gives him more time with his young family. Still, in the spring of 2022, he and four others skied into the high Sierras to install recorders to listen for the mating croaks of endangered Yosemite toads. The U.S. Forest Service hopes the same tools Wood uses with birds can be turned to amphibians.

On a return trip, he saw a small pond brimming with tadpoles, and watched a raven stalking nearby. He would never have thought the birds might be feeding on tadpoles unless he had seen it. “I think that it’s important to maintain a connection to the place,” said Wood, as he stood in the backyard of his home in Ithaca, New York. “It’s important in a not very tangible way.”

For Kahl, technology has been a doorway to the natural world. When he was a child, his grandfather, a nature enthusiast, would sit in the kitchen listening to recordings of bird songs. Yet, Kahl could never conjure the name of a bird when he and his grandfather heard one call outside.

Today, thanks to his work at Cornell, Kahl is following in his grandfather’s footsteps, smartphone in hand. When he hears a bird while out on a walk, he uses his phone to record the song with a BirdNET app. Within seconds, a neural network has scanned its catalog of more than 3,000 species to find a match. Kahl is just one of more than 2 million people using the app. “I was just not paying attention to my surroundings. I would hear birds but I just didn’t care,” Kahl recalled of his childhood. “Now I do.”



Warren Cornwall is an environmental, science and outdoor recreation journalist whose stories have appeared in The New York Times Magazine, The New York Times, Science, Slate, The Boston Globe’s Globe Magazine, Outside Magazine online, National Geographic News, and The Seattle Times.
Top illustration by Sjoerd van Leeuwen