Emergent Rhythm — Real-time AI Generative Live Set (AI DJ Project #3)


Emergent Rhythm — Real-time AI Generative Live Set (digest)

“Emergent Rhythm” is an audio-visual DJ performance using real-time AI audio generation models. Artist/DJ Tokui manipulates multiple models on stage to spontaneously generate rhythms and melodies. He then combines and mixes the generated audio loops to create musical developments.

This work is the third installment of his AI DJ Project, a series of attempts to explore the future of AI-based music production and live DJ performance. The project began as a back-to-back DJ session with the artist and an AI DJ system, then evolved into a live performance using various AI symbolic (MIDI) music generation models on stage. In Emergent Rhythm, the artist tries to employ AI audio synthesis models in real-time and faces unprecedented challenges. Everything heard during this performance is purely AI-generated sound (no synthesizers or drum machines in the general sense).

We adapted GAN (Generative Adversarial Networks) models, a default AI-image synthesis model, for audio synthesis. StyleGAN models trained on spectrograms of various sound materials generate spectrograms, and vocoder GAN models convert them into audio files. The process is faster than real-time. As the title Emergent Rhythm suggests, we focus on the musical and visual “rhythms” and recurring patterns that emerge in the interaction between multiple AI models and the artist. The accompanying visuals feature not only the periodicity over time from past to future but also the common patterns across multiple scales ranging from the extreme large-scale of the universe to the extreme small-scale of cell and atomic structures. More than one million images were generated using the Stable Diffusion model for the visuals.

Aligning with the visual theme, we extracted audio loops from various natural and man-made environmental sounds as well as music and used them as training data for audio generation. In addition, the artist employs real-time timbre transfer effects using Neutone, an AI audio plug-in he helped develop and release for other artists. The AI plugin converts incoming audio into various singing voices, such as Japanese Buddhist chants. This highlights the diversity and commonality within the human cultural heritage.

From a DJ session, in which existing songs are selected and mixed, to a live performance that generates songs spontaneously and develops them in response to the audience’s reactions: In this performance, “Emergent Rhythm,” the human DJ is expected to become an AJ, or “AI Jockey,” rather than a “Disk Jockey”, taming and riding the AI-generated audio stream in real-time. With the unique morphing sounds created by AI and the new degrees of freedom that AI music generation allows, the AI Jockey will offer audiences a unique, even otherworldly sonic experience.

Technical Detail

Poster for NeurIPS 2023