Philly Holmes is a multidisciplinary sonic and visual artist. As a composer, they have written numerous full-length scores for theatre productions across Ireland. They also bring a forward-thinking graphic style to their instrumental and vocal scores, pushing the boundaries between play and performance and combining new and familiar elements to produce scores that trust their performers to do their best work when given space to improvise.
They also have extensive experience as a music producer and engineer, bringing technical excellence to all aspects of their practice – from the club dancefloor to the ambient listening room.
They hold a degree in Music Technology with a minor in Composition from Trinity College Dublin, graduating with first-class honours, and have just completed an MSc in Sound Design at ECA.
Their work ultimately explores queerness, the sublime and ideas of spirituality through technology. They also engage closely with the vanguard of music technology, working with new techniques as a means to expand their creative practice. Not concerned with novelty or 'newness' for its own sake, they find ways to integrate these techniques into sustainable, long-term creative methodologies, engaging with tools over extended periods of time to create compelling sonic outcomes.
In the past decade, deep learning, machine learning and artificial intelligence techniques have emerged in the audio industry, producing a broad spectrum of sonic outcomes and unlocking potential for complex, abstract sound processing. These tools range from complex audio source separation to straightforward MIDI generation. In actuality, modern deep learning ‘Artificial Intelligence’ (AI) tools are simply the newest technological entry in a lineage of algorithmic techniques for musical composition; algorithms and formal rulesets have been a crucial part of the compositional process for centuries.
The development of modern computing technology takes algorithmic composition to extremes of abstraction. The application of complex, highly abstracted processes to musical data yields unique, often unpredictable results that exist outside the scope of straightforward human capability or imagination – the neural network black box.
This project takes a practice-led approach to these tools and engages with deep learning as an extension of creative workflows, seeking to explore the nature of existing AI audio tools with a focus on plug-and-play, easily available audio processing techniques. Through a set of experiments and case studies, various tools are assessed for their usefulness in creative workflows. These are presented as audio pieces, performance recordings, compositions and work created in Unity. Key works in the field are also highlighted, showcasing important projects at the vanguard of AI audio.
Below are key excerpts and outcomes from the project. The full text delves deeper into the tools themselves, the ethics and ramifications of AI audio, and the future of creative sonic outcomes.
First Voices is a piece for tape and two performers, composed at short notice using AI tools for a PlusMinus Ensemble performance in early July 2022. It was the first piece for this project to be finished using machine-learning tools.
The piece explores themes of language learning and new life as an AI learns to find its voice for the first time, comparing the process of ‘training’ an AI on a dataset to learning to speak and articulate thought for the first time. The tape component was created by processing clips of the earliest known recordings, from the 1860s, with Holly Plus. Holly Plus produces an uncanny-sounding vocal tone that lends a compelling tonality to the degraded, noisy early recordings, creating musicality out of relatively atonal samples. This tool sounds computerised; it’s a warped and uncanny soundalike of Holly Herndon’s singing voice, both recognisable and extremely alien. It’s a singular sonic quality, one that demonstrates the compelling quirks and idiosyncrasies of even the most cutting-edge digital tools. Snippets of the processed track were looped, layered and pitched using various digital techniques and arranged into two sections to support the narrative of the vocal component. Finally, the piece was recorded to tape and slowed down slightly to restore an analogue quality to the digital sounds of Holly Plus.
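As a rough sketch of that loop, layer and pitch stage, the snippet manipulation could look something like the following Python, using librosa and soundfile; the filename, loop count and pitch intervals here are illustrative assumptions, not the settings used in the piece:

```python
import numpy as np
import librosa
import soundfile as sf

# Hypothetical filename standing in for a Holly Plus-processed snippet.
snippet, sr = librosa.load("holly_plus_snippet.wav", sr=None)

# Loop: repeat the snippet end to end.
loop = np.tile(snippet, 4)

# Pitch: create copies a fifth up and an octave down (illustrative intervals).
fifth_up = librosa.effects.pitch_shift(loop, sr=sr, n_steps=7)
octave_down = librosa.effects.pitch_shift(loop, sr=sr, n_steps=-12)

# Layer: mix the original loop with its pitched copies, then normalise.
layered = loop + 0.6 * fifth_up + 0.6 * octave_down
layered /= np.max(np.abs(layered))

sf.write("layered_texture.wav", layered, sr)
```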
The performed vocal component of the piece features two vocalists performing three movements. In the first movement, each performer reads lists of randomised phonemes, reciting them in call and response and sometimes speaking over each other. In the second movement, the performers recite lists of words of increasing length. These lists were populated with random combinations of common words, but as the words increase in length, meaning emerges from the randomness. This emergent quality evokes the idea of the AI characters learning to speak and form meaning. In the final movement, the AI characters discuss themes of autonomy, identity and sentience, a sudden but evocative contrast with the nascence of the previous movements.
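For illustration, a text-generation step like the one behind those first two movements might be sketched in a few lines of Python; the phoneme inventory and word pool below are placeholders, not the lists used in the score:

```python
import random

# Placeholder phoneme inventory; the score used its own phoneme lists.
PHONEMES = ["a", "e", "i", "o", "u", "ba", "da", "ka", "ma", "na", "sha", "ta"]

# Placeholder pool of common words; the score drew on its own word list.
COMMON_WORDS = ["light", "voice", "new", "sound", "speak", "learn", "first", "hear"]

def phoneme_list(length, rng=random):
    """Movement one: a list of randomised phonemes for call and response."""
    return [rng.choice(PHONEMES) for _ in range(length)]

def word_lists(max_length, rng=random):
    """Movement two: word lists of increasing length; meaning emerges as they grow."""
    return [[rng.choice(COMMON_WORDS) for _ in range(n)] for n in range(1, max_length + 1)]

if __name__ == "__main__":
    print(" ".join(phoneme_list(8)))
    for words in word_lists(5):
        print(" ".join(words))
```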
The tape track strives to make a connection between those earliest recordings and the sounds of new audio processing AIs, which have a similar low-fidelity quality. The uncanny similarity of sounds produced in these two eras serves as a metaphor for the youth of AI tools, while the final movement draws loose inspiration from recent headlines discussing an allegedly sentient Google chatbot. The machine learning-generated audio we’re hearing today is just as young as those earliest recordings.
The score is presented as text-only, white text on a black background. The preface discusses digital nativity and the ways in which we perceive information differently depending on the medium: paper or screen. Presenting it as a ‘graphic’ score with a high degree of improvisation helped to encourage players to perform and embody the voice of the still-growing AI found in the piece. This was designed to reflect the fact that a machine-learning algorithm is only as good as its dataset – the better the performance, the better the AI.
This piece was also presented as a work in Unity, taking a performance about the differences between the digital and the physical into the digital world to explore the same questions in a different context. In this digital presentation, two AI entities are represented by floating, glowing polygons. These shapes are sound-responsive. The piece takes place in a continuous set of identical rooms in a three-by-three grid, allowing users to navigate the repeated, disorienting space.
An Algorithmic Intervention into Field Recordings is a set of works that seeks to transform field recordings using a variety of AI tools. The concept emerged while improvising with Google Magenta DDSP and trying to push the tool to its limits. A series of field recordings was made: one in Haymarket, one by the Water of Leith in Dean Village and one at a rest stop on the French A7 motorway. The field recordings were first cleaned up in Ableton Live 11, deconstructed using Spleeter and then morphed using layers of DDSP plug-in processing. Deep listening reveals a call-and-response quality to these pieces, where the DDSP plug-ins latch onto a particularly tonal element in a recording and transform it into a tonal gesture. It starts to feel like an AI algorithm listening and responding to these recordings, highlighting certain elements and downplaying others based on the capabilities of the technology. The Spleeter separations similarly reveal digital artefacts in the recordings and accentuate recording imperfections. Both machine-learning processes unearth abstract sound qualities that manual processing could not. The results are chaotic and imperfect, but the digital transformation of serene soundscapes into digital sonic clusters is compelling. The algorithm is intervening in these recordings of nature and producing something new and previously unheard.
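For reference, the Spleeter deconstruction step comes down to a few lines of Python via its published API; the filename below is a stand-in for one of the field recordings, and the subsequent DDSP morphing happened separately via the plug-ins:

```python
from spleeter.separator import Separator

# Load Spleeter's pre-trained 2-stem model (vocals / accompaniment);
# 4- and 5-stem models are also available.
separator = Separator("spleeter:2stems")

# Hypothetical filename standing in for one of the field recordings.
# Writes the separated stems as audio files into the given directory.
separator.separate_to_file("field_recording_haymarket.wav", "stems/")
```

Run on non-musical material like a field recording, the model's forced vocal/accompaniment split is precisely what surfaces the digital artefacts and imperfections described above.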
‘Is Everything Gonna Be Ok?’ was a work-in-progress performance of Nani Porenta’s research into game-engine interactivity in live musical performance. A collection of sound from this project was presented as the supporting act for this showcase. The piece took the form of a performance-lecture, with narration discussing the nature of AI creative tools, some of which was processed via Holly Plus. The lecture component explored issues of AI and labour and the ethical use of creative tools, and it briefly described the processes involved in creating some of the work discussed throughout chapter 2. It’s important to note that the narration was casual in tone and made some relatively sweeping generalisations about the nature of AI creative tools. While not necessarily academically rigorous, it takes a broad view of the nature of AI tools and concludes with a call to action for creative practitioners to experiment with machine-learning tools in their creative practice as a new frontier of creative experimentation.
This piece contains excerpts of other pieces developed for the project and, as such, serves as a summation of the project’s creative work.