The most exciting work I’ve been involved with on the Sound Understanding
team has been the development of
Tacotron, an end-to-end speech synthesis system that maps
characters directly to waveform. The initial system took
verbalized characters as input and produced a log-magnitude mel spectrogram,
which we then synthesized to a waveform via standard signal processing methods
(Griffin-Lim phase reconstruction and an inverse Short-time Fourier
Transform). In Tacotron 2, we replaced this hand-designed
synthesis step with a neural vocoder, initially based on WaveNet.
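The classical spectrogram-inversion step mentioned above can be sketched in a few lines. This is a generic illustration of Griffin-Lim using SciPy, not the actual Tacotron pipeline; the parameter choices (`n_iter`, `nperseg`) are arbitrary, and a real system would invert the mel scaling back to a linear-frequency magnitude first.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=32, nperseg=256):
    """Estimate a waveform from an STFT magnitude by iterating
    between the time and frequency domains, keeping the known
    magnitude and refining the phase estimate each round."""
    rng = np.random.default_rng(0)
    # Start from a random phase guess.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Inverse STFT with the current phase estimate...
        _, waveform = istft(magnitude * phase, nperseg=nperseg)
        # ...then re-analyze and keep only the new phase.
        _, _, spec = stft(waveform, nperseg=nperseg)
        phase = np.exp(1j * np.angle(spec))
    _, waveform = istft(magnitude * phase, nperseg=nperseg)
    return waveform
```

Each iteration enforces the target magnitude while the repeated analysis/synthesis round trips pull the phase toward something consistent with it, which is why more iterations generally yield less artifact-laden audio.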
One line of research on my team is direct-to-waveform acoustic models, skipping
the intermediate spectrogram representation. In the
Wave-Tacotron paper, we published a model that does just that.
Check out our publications page for the full list of
research that my team has co-authored with the Google Brain, DeepMind, and
Google Speech teams.
In 2015, I joined the Sound Understanding team within Google
Perception. We focus on building systems that can both
analyze and synthesize sound. Being able to work on my hobby (sound and digital
signal processing) as my full-time job has been a dream come true. We operate as
a hybrid research team, which means we both publish our work
and deploy it to improve Alphabet’s products and services.
I’ve had the opportunity to work on some neat tasks and projects during my time
on the team, but speech
synthesis has been what I’ve
spent the most time working on.
In 2011, I took over as Mixxx’s lead developer. Together with a handful of
core developers and hundreds of contributors, we produce the
best free DJ software available. You can check out our code on
As the lead developer, I am involved in nearly every major change to
Mixxx. Some of the major projects I’ve worked on heavily over the years include:
OpenGL waveforms (my GSoC project)
SQLite-based music library
arbitrary numbers of decks (previously Mixxx was hard-coded to 2)
worker-thread based architecture for decoding audio
key detection and pitch shifting
dynamic / resizable UI (rewrite of the original skin system)
modular effects system (combined plugin-based and native effects)
non-constant beatgrids (allows mixing tracks that change tempo)
master sync (persistent syncing of decks)
concurrent library scanner
internationalization / translation support
tools for making evidence-based changes (performance metrics)
When I joined the project, you couldn’t run Mixxx for more than a few minutes
without encountering crashes. That’s what happens when you have thousands of
lines of C++ written by three distinct teams of people over the course of a decade
with almost no documentation!
My biggest contribution to Mixxx has been restructuring the codebase to prevent
common problems that lead to segfaults (e.g. reducing mutable shared state
across threads, separating realtime callback code from the rest of Mixxx, and
adopting modular, conservative coding practices). Stability is a feature!
I enjoy working on Mixxx because it combines my love of electronic music,
software engineering and product design.
During IAP 2007, Rob Gens, Zack Anderson, and I worked on building
our very own Bellagio Fountain at MIT as part of the East Campus annual Bad
Ideas competition. We weren’t able to finish it in time, and the project sat
around for a year until the 2008 Bad Ideas competition, where we finally
finished it.
Building it was a lot of fun and hard work, and we ended up getting help from
other putzen. Check out the videos and get the source code here: