The most exciting work I’ve been involved with on the Sound Understanding
team has been the development of
Tacotron, an end-to-end speech synthesis system that maps
characters directly to waveform. The initial system took
verbalized characters as input and produced a log-magnitude mel spectrogram,
which we then synthesized to a waveform via standard signal processing methods
(Griffin-Lim phase reconstruction and an inverse Short-time Fourier
Transform). In Tacotron 2, we replaced this hand-designed
synthesis step with a neural vocoder, initially based on WaveNet.
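The classical spectrogram-inversion step mentioned above can be sketched in a few lines. This is a generic illustration of Griffin-Lim using SciPy, not the actual Tacotron pipeline; the parameter choices (`n_iter`, `nperseg`) are arbitrary, and a real system would invert the mel scaling back to a linear-frequency magnitude first.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=32, nperseg=256):
    """Estimate a waveform from an STFT magnitude by iterating
    between the time and frequency domains, keeping the known
    magnitude and refining the phase estimate each round."""
    rng = np.random.default_rng(0)
    # Start from a random phase guess.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Inverse STFT with the current phase estimate...
        _, waveform = istft(magnitude * phase, nperseg=nperseg)
        # ...then re-analyze and keep only the new phase.
        _, _, spec = stft(waveform, nperseg=nperseg)
        phase = np.exp(1j * np.angle(spec))
    _, waveform = istft(magnitude * phase, nperseg=nperseg)
    return waveform
```

Each iteration enforces the target magnitude while the repeated analysis/synthesis round trips pull the phase toward something consistent with it, which is why more iterations generally yield less artifact-laden audio.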
One line of research on my team is direct-to-waveform acoustic models, skipping
the intermediate spectrogram representation. In the
Wave-Tacotron paper, we published a model that does just that.
Check out our publications page for the full list of
research that my team has co-authored with the Google Brain, DeepMind, and
Google Speech teams.
In 2015, I joined the Sound Understanding team within Google
Perception. We focus on building systems that can both
analyze and synthesize sound. Being able to work on my hobby (sound and digital
signal processing) as my full-time job has been a dream come true. We operate as
a hybrid research team, which means we both publish our work
and deploy it to improve Alphabet’s products and services.
I’ve had the opportunity to work on some neat tasks and projects during my time
on the team, but speech
synthesis has been what I’ve
spent the most time working on.
In 2011, I took over as Mixxx’s lead developer. Together with a handful of
core developers and hundreds of contributors, we produce the
best free DJ software available. You can check out our code on
As the lead developer, I am involved in nearly every major change to
Mixxx. Some of the major projects I’ve worked on heavily over the years include:
OpenGL waveforms (my GSoC project)
SQLite-based music library
arbitrary numbers of decks (previously Mixxx was hard-coded to 2)
worker-thread based architecture for decoding audio
key detection and pitch shifting
dynamic / resizable UI (rewrite of the original skin system)
modular effects system (combined plugin-based and native effects)
non-constant beatgrids (allows mixing tracks that change tempo)
master sync (persistent syncing of decks)
concurrent library scanner
internationalization / translation support
tools for making evidence-based changes (performance metrics)
When I joined the project, you couldn’t run Mixxx for more than a few minutes
without encountering crashes. That’s what happens when you have thousands of
lines of C++ written by three distinct teams of people over the course of a decade
with almost no documentation!
My biggest contribution to Mixxx has been restructuring the codebase to prevent
common problems that lead to segfaults (e.g. reducing mutable shared state
across threads, separating realtime callback code from the rest of Mixxx, and
adopting modular, conservative coding practices). Stability is a feature!
I enjoy working on Mixxx because it combines my love of electronic music,
software engineering and product design.
During IAP 2007, Rob Gens, Zack Anderson, and I worked on building
our very own Bellagio Fountain at MIT as part of the East Campus annual Bad
Ideas competition. We weren’t able to finish it in time, and the project sat
around for a year until the 2008 Bad Ideas competition, where we finally
finished it.
Building it was a lot of fun and hard work, and we ended up getting help from
other putzen. Check out the videos and get the source code here: