Last year, I wrote a blog post reflecting on the year 2020. Re-reading what I had written then was surprisingly insightful, particularly because I could see how life had changed in some ways and remained unchanged in others. I decided to continue the tradition this year in the hopes of presenting my year-later self with the same joy and delight of reading a memoir of similar kind.
2021 was, in some ways, very similar to 2020. Despite the development and proliferation of vaccines, COVID-19 raged on, morphing into a new variant every few months. Masks and social distancing are now deeply embedded into our daily lives. Although booster shots and pill-type medications might change the dynamics of the pandemic, I personally think COVID is here to stay, at least for the foreseeable future.
After being discharged from the army in March of 2021, I spent roughly 6 months working as an intern at Neosapience, a Korean startup specializing in voice-over services and metaverse characters. This was also when I left ReRent, a hospitality startup that I was fortunate enough to have worked for since the summer of 2020. ReRent immensely helped me learn and grow as a software developer, versed in
git and GitHub, general web development, and Django, which has since become my favorite Python backend framework. It is also where I met valuable teammates, some of whom I met in person at Yale.
The transition from ReRent to Neosapience was a lot more than just a change of jobs. At Neosapience, I worked on machine learning research–an art of its own entirely different from backend web development. Specifically, I was tasked with the job of developing a singing voice synthesis model that, given lyrics and melodies, could “sing.” I still remember the frustration I felt when I was first trying to reproduce a reference paper I was provided as a baseline. There were parts of the paper that were ambiguous. The fact that it was a GAN-based model certainly did not help. I reached out to the authors in the hopes of gaining clarity, but received no response. Although I extrapolated parts of the model and trained it for a few days, the model only produced barely audible mumbles that could not be farther from the act of singing. I learned that ML was hard.
Thankfully, I was fortunate enough to have had more experienced co-workers as mentors who provided valuable pieces of advice. One of them suggested that I design a model of my own instead of blindly trying to reproduce the paper. As a demo of sorts, he showed me that a simple CNN model could sing better than the GAN I was trying to reproduce, with just a few minutes of training. Inspired by his progress, I began designing my own modules to experiment with a host of different architectures: CNNs, RNNs, transformers, and combinations thereof. I also explored various famous CNN architectures, such as InceptionNet and ResNeXT in search of inspiration and ideas.
Unexpectedly, the biggest success came from a very experimental model that was a direct adaptation of MLP-Mixer, an architecture composed entirely of multi-layer perceptrons, or
nn.Linear layers in PyTorch. This was a paper I presented during one of our weekly paper-reading meetings. Although the quality of results produced by the final model still contained audible artifacts, nonetheless we saw novelty in the fact that it was the first voice synthesis model exclusively composed of linear layers. This project culminated in my first ever publication MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis in IEEE Machine Learning for Signal Processing workshop, now available on IEEE Xplore. By the end of my internship, I felt a lot more comfortable with various ML concepts and their implementations. This was also when I was involved with Hugging Face’s Flax/JAX community week event where my teammates and I developed KoCLIP, as well as BigScience, a huge project by Hugging Face to reproduce a GPT-3-sized language model.
I came back to Yale with the explicit intent of majoring in Computer Science and Mathematics. While this was not a trivial decision, it was very clear and obvious to me that this was the academic path I wanted to pursue. I took CPSC 223, which is Yale’s signature data structures course taught in… barebones C.
free are probably the functions I used the most this year, perhaps with the exception of
printfs I used for lazy debugging. On top of CS classes, I also continued my involvement with ML in a few ways. For one thing, I co-authored my second paper, EdiTTS: Score-based Editing for Controllable Text-to-Speech, with a co-worker at Neosapience. This was the first project in which I used Amazon Mechanical Turk for MOS measurements. I’m still waiting on the final decision from a conference to which I submitted this paper, but I’m happy about how it came out regardless.
More importantly, I was extremely fortunate to be given the opportunity to work as a software engineering intern at Hugging Face. This was an unbelievable achievement for me that I knew I did not deserve. As a self-taught newcomer and student to the field of ML, I only dreamed about working at Hugging Face when I was first learning about transformers. I still have not produced much output at HF largely due to the fact that my internship was part-time and very low time commitment-wise, but I’m still excited for the month of January, which is when I will be dedicating myself full time to Hugging Face and BigScience. I would also like to express gratitude to the engineer at Hugging Face who referred me to this position, and whom I now consider a mentor, Stas Bekman.
This semester was perhaps the hardest one yet at Yale. All the classes I took either required a lot of effort or time commitment. Admittedly to fulfill my distribution requirement, I went out my ways and took HIST 271: European Intellectual History since Nietzsche, where I learned a ton about philosophy, from the Enlightenment all the way up to post-Modernism. I also enrolled in ASTR 110: Planets and Stars, which I frankly took for an easy science credit, only to realize that weekly problem sets took up more time than I had anticipated. MATH 241: Probability Theory was easy at first, but ramped up quite quickly at the end of the semester, to the point that I was floundering about during finals week. Nonetheless, I’m glad that the semester is over, and that I came out of it feeling more learned and knowledgable than I was five months ago.
2021 was surely a roller coaster ride. It was surely a fruitful one, but it is also a miracle how it turned out the way it did. With experience, memories, and gratefulness at heart, I cannot wait to see what 2022 has in store.