January 31, 2020, by Drew Harwell - The Washington Post
In 2018, some stations on the online iHeartRadio service began testing a music-mixing AI system built by the startup Super Hi-Fi, which says it can “understand music nuances with the same depth as a human DJ.” The system can transition in real time between songs by layering in music, sound effects, voice-over snippets, and ads, delivering the style of smooth, seamless playback that has long been the human DJ’s trade.
The Los Angeles-based Super Hi-Fi, whose clients also include the streaming fitness service Peloton, says its “computational music presentation” AI can help erase the seconds-long gaps between songs that can lead to “a loss of energy, lack of continuity, and disquieting sterility.” Super Hi-Fi patents filed last year reduce the art of mixing music to a diagram of algorithmic tasks: The “Magic Stitch” system, as Super Hi-Fi calls it, assesses songs’ “rhythmic profile, chordal, melodic content, harmonic and amplitude over time”; calculates ways to blend them “in concordance with the salient temporal moments”; then interjects other elements to create a “more engaging overall consumption experience.”
The system is trained to process different stylistic touches – “a ‘wild and crazy’ classical piece will differ greatly from a ‘wild and crazy’ punk rock anthem,” one patent states – as well as each track’s energy level and mood, from “bouncing off the walls” to “dirge.” Sentiment is also catalogued, the company said, “so as not to put a super-positive announcer over a super-negative piece of content,” like a grim piece of breaking news.
In a demonstration for a Post reporter, Super Hi-Fi co-founder Zack Zalon showed the system transitioning from Nicki Minaj’s “Anaconda” to Kanye West’s “Stronger”: One song wove cleanly into the other through an automated mix of booming sound effects, background music, interview sound bites, and station-branding shout-outs (“Super Hi-Fi: Recommended by God”). The smooth transition might have taken a DJ a few minutes to prepare; the computer completed it in a matter of seconds. (“3,526 Calculations Performed,” the system declared afterward.)
Much of the initial training for these delicate transitions comes from humans, who prerecord voice-overs, select songs, edit audio clips, and classify music by genre, style, and mood. Zalon said the machine-learning system has been further refined by iHeartMedia’s human DJs, who have helped identify clumsy transitions and room for future improvements.
“To have radio DJs across the country that really care about song transitions and are listening to find everything wrong, that was awesome,” Zalon said. “It gave us hundreds of the world’s best ears. … They almost unwittingly became kind of like our QA [quality assurance] team.” (It was unclear whether any of those DJs were among the recent layoffs.)
The system won’t trigger massive job cuts and could lead to new opportunities, Zalon argues, because humans will still need to create and ready the audio snippets from which the AI can select. But he expects that, in a few years, computer-generated voices could automatically read off the news, tee up interviews, and introduce songs, potentially supplanting humans even more. The software performed 315 million musical transitions for listeners in January alone.
The system is now used only for iHeartMedia’s digital stations, but some industry observers expect that software like it could eventually reshape local over-the-air broadcasts, too. The company’s chief product officer, Chris Williams, said last year in an interview with the industry news site RadioWorld that “virtual DJs” that could seamlessly interweave chatter, music, and ads were “absolutely” coming, and “something we are always thinking about.”