LoRA
I recently completed another summer internship at Meta (formerly Facebook). I was surprised to learn that one of the intern friends I met was an avid read...
Update: The code was modified with further optimizations. In particular, instead of checking the trie on every DFS call, we update the trie pointer along...
Note: This blog post was completed as part of Yale’s CPSC 482: Current Topics in Applied Machine Learning.
Last year, I wrote a blog post reflecting on the year 2020. Re-reading what I had written then was surprisingly insightful, particularly because I could see ...
Recently, I’ve heard a lot about score-based networks. In this post, I will attempt to provide a high-level overview of what scores are and how the concept o...
In this post, we will take a look at Flow models, which I’ve been obsessed with while reading papers like Glow-TTS and VITS. This post is heavily based on th...
In this short post, we will take a look at variational lower bound, also referred to as the evidence lower bound or ELBO for short. While I have referenced E...
It has been a while since I last posted on this blog. Admittedly, a lot has happened in my life: I have been discharged from the Republic of Korea Army, rece...
In this post, we will take a look at Nyström approximation, a technique that I came across in Nyströmformer: A Nyström-based Algorithm for Approximating Self...
In this post, we will take a look at relative positional encoding, as introduced in Shaw et al. (2018) and refined by Huang et al. (2018). This is a topic I me...
These days, I’ve found myself absorbed in the world of memory-efficient transformer architectures. Transformer models require $O(n^2)$ runtime and memory due...
A few days ago, I came across a simple yet nonetheless interesting paper, titled “NumerSense: Probing Numerical Commonsense Knowledge of Pre-Trained Language...
These days, I’m exploring the field of natural language generation, using auto-regressive models such as GPT-2. HuggingFace transformers offers a host of pre...
In the previous post, we took a look at how to extract keywords from a block of text using transformer models like BERT. In that blog post, you might recall ...
I’ve been interested in blog post auto-tagging and classification for some time. Recently, I was able to fine-tune RoBERTa to develop a decent multi-label, m...
When GPT-3 was released, people were amazed by its ability to generate coherent, natural-sounding text. In fact, it wasn’t just text; it could generate JavaS...
In today’s post, we will take a break from deep learning and turn our attention to the topic of rejection sampling. We’ve discussed the topic of sampling som...
Today, we are finally going to take a look at transformers, the mother of most, if not all, current state-of-the-art NLP models. Back in the day, RNNs used to...
Attention took the NLP community by storm a few years ago when it was first announced. I’ve personally heard about attention many times, but never had the ch...
Today’s article was inspired by a question that came up on a Korean mathematics Facebook group I’m part of. The gist of the question could probably be transl...
For the past couple of months or so, I’ve been spending time looking into transformers and BERT. Transformers are state-of-the-art NLP models that are now re...
In today’s post, we will take a look at adversarial attacks. Adversarial attacks have become an active field of research in the deep learning community, for ...
2020 was unlike any other. The COVID pandemic fundamentally transformed our ways of life. Masks became the norm; classes were taught on Zoom; social distancing...
In the previous post, we took a look at how to implement a basic sequence-to-sequence model in PyTorch. Today, we will be implementing a small improvement to...
For a very long time, I’ve been fascinated by sequence-to-sequence models. Give the model a photo as input, it spits out a caption to go along with it; give ...
In today’s post, we will take a look at neural style transfer, or NST for short. NST is something that I first came across about a year ago when reading Fran...
In this blog post, we will be revisiting GANs, or generative adversarial networks. This isn’t the first time we’ve seen GANs on this blog: we’ve implemented GAN...
While mindlessly browsing through Math Stack Exchange, I stumbled across an interesting classic:
In today’s post, we’ll take a look at the Inception model, otherwise known as GoogLeNet. I actually wrote the code for this notebook in October 😱 but wa...
In today’s post, we will be taking a quick look at the VGG model and how to implement one using PyTorch. This is going to be a short post since the VGG archi...
In this post, we’ll take a look at RNNs, or recurrent neural networks, and attempt to implement parts of them from scratch in PyTorch. Yes, it’s not entirel...
These past few weeks, I’ve been powering through PyTorch notebooks and tutorials, mostly because I enjoyed the PyTorch API so much and found so many of it us...
This is a very quick post in which I familiarize myself with basic tensor operations in PyTorch while also documenting and clarifying details that initially ...
This post is based on this article on Medium, titled “Matplotlib + Seaborn + Pandas: An Ideal Amalgamation for Statistical Data Visualization.” This article ...
Recently, I joined the Language, Information, and Learning at Yale lab, led by Professor Dragomir Radev. Although I’m still in what I would consider to be th...
I’ve always been a fan of TensorFlow, specifically tf.keras, for its simplicity and ease of use in implementing algorithms and building models. Today, I deci...
Recently, I fortuitously came across an interesting blog post on the multi-armed bandit problem, or MAB for short. I say fortuitous because the contents of t...
For the past month and a half, I’ve been working as a backend developer for ReRent, a Yale SOM-based hospitality startup. Working alongside motivated, inspir...
We’ve discussed Gaussians a few times on this blog. In particular, we recently explored Gaussian process regression, which is a post I personally really enjo...
Maintaining momentum in writing and self-learning has admittedly been difficult these past few weeks since I started my internship. Normally, I would writ...
In today’s post, we will finally start building the auto-tagger model that I wanted to create for my blog. As you may have noticed, every blog post is class...
Docker was one of those things that I always wanted to learn, but never got into. Part of the reason was that it seemed distant and even somewhat unnecessary...
In a previous post, we discussed how we can use tf-idf vectorization to encode documents into vectors. While probing more into this topic and getting a taste ...
A few days ago, a video popped up in my YouTube suggestions. We all know how disturbingly powerful the YouTube recommendation algorithm is: more than 90 perc...
Although I’ve been able to automate some portion of the blog workflow, there’s always been a challenging part that I wanted to further automate myself using ...
In this post, we will explore Gaussian processes in the context of regression. This is a topic I meant to study for a long time, yet was never able to due ...
The traveling salesman problem (TSP) is a famous problem in computer science. The problem might be summarized as follows: imagine you are a salesperson who n...
In the last post, we revisited the Riemann Zeta function, which we had briefly introduced in another previous post on Euler’s take on the famous Basel proble...
The other day, I came across an interesting article by Chris Henson on the relationship between the Riemann Zeta function and prime numbers. After encounteri...
In this post, we will be taking a look at two very simple yet popular search algorithms, namely breadth-first search and depth-first search. To give you...
Recently, I ran into an interesting video on YouTube on numerical methods (at this point, I can’t help but wonder if YouTube can read my mind, but I digre...
In this post, we will explore Gibbs sampling, a Markov chain Monte Carlo algorithm used for sampling from probability distributions, somewhat similar to the ...
A reflection on my first open-source contribution sprint
I’ve stumbled across the name “Apache Spark” on the internet so many times, yet I never had the chance to really get to know what it was. For one thing, it s...
In this post, we will revisit the topic of recurrent neural networks, or RNNs. Although we have used RNNs before in a previous post on character-based text p...
In today’s post, we will explore ways to build machine learning pipelines with Scikit-learn. A pipeline might sound like a big word, but it’s just a way of c...
In a previous post, we took a look at Fisher’s information matrix. Today, we will be taking a break from the R frenzy and continuing our exploration of this to...
These past few days, I’ve been writing posts on R while reading Hadley Wickham’s R for Data Science. R is no Python, but I’m definitely starting to see what ...
In this post, we will continue our journey down the R road to take a deeper dive into data frames. R is great for data analysis and wrangling when it comes to...
Recently, I was compelled by my own curiosity to study SQL, a language I have heard about quite a lot but never had a chance to study. At first, SQL sounded ...
A few days ago, I saw that a friend had posted an Instagram story looking for partners to study R with. I jumped at the opportunity without hesitation; based on my...
So I’ve been spending some time this past week or so picking up a new language: C. C is considered by many to be one of the most basic and fundamental of all...
Before I begin, I must say that this video by Brian Storey at Olin College is the most intuitive explanation of the Leibniz rule I have seen so far. Granted,...
In this post, we will continue our journey with the R programming language. In the last post, we explored some basic plotting functions and how to use them t...
It’s been a while since we last took a look at the R programming language. While I don’t see R becoming my main programming language (I’ll always be a Python...
Fisher’s information is an interesting concept that connects many of the dots that we have explored so far: maximum likelihood estimation, gradient, Jacobian...
Expectation is a core concept in statistics, and it is no surprise that any student interested in probability and statistics may have seen some expression li...
It’s about time that we go back to the old themes again. When I first started this blog, I briefly dabbled in real analysis via Euler, with a particular focu...
Principal component analysis is one of those techniques that I’ve always heard about somewhere, but didn’t have a chance to really dive into. PCA would come ...
Taylor series is used in countless areas of mathematics and sciences. It is a handy little tool in the mathematician’s arsenal that allows us to decompose any...
Generative Adversarial Networks refer to a family of generative models that seek to discover the underlying distribution behind a certain data generating pro...
These days, I’ve been spending some time trying to read published research papers on neural networks to gain a more solid understanding of the math behind de...
Programming is difficult but fun. Or maybe it’s the other way around. Either way, any developer would know that external libraries are something that makes p...
These past few days, I’ve been taking a hiatus from the spree of neural networks and machine learning to explore an entirely separate realm of technology: we...
Generative models are fascinating. It is no wonder that GANs, or Generative Adversarial Networks, are considered by many to be where the future lies for deep learni...
In a previous post, we took a look at autoencoders, a type of neural network that receives some data as input, encodes them into a latent representation, and...
In today’s post, we will take yet another look at an interesting application of a neural network: autoencoders. There are many types of autoencoders, but the...
You might remember back in the old days when autocomplete was just terrible. The suggestions provided by autocomplete would be useless if not downright stupi...
Neural networks are powerful models that can be used to identify complex hidden patterns in data. There are many types of neural networks, two of which we ha...
Welcome back to another episode of the “From Scratch” series on this blog, where we explore various machine learning algorithms by hand-coding them from scratch....
Recently, a friend recommended to me the book Deep Learning with Python by François Chollet. As an eager learner just starting to fiddle with the Keras API, I de...
Disclaimer: I was not sponsored by the developers of Typora to write this post, although that would have been great.
In a previous post, we briefly explored the notion of maximum a posteriori and how it relates to maximum likelihood estimation. Specifically, we derived a ge...
Normal, binomial, exponential, gamma, beta, Poisson… These are just some of the many probability distributions that show up on just about any statistics text...
In today’s post, we will take a look at Bayesian linear regression. Both Bayes and linear regression should be familiar names, as we have dealt with these tw...
Welcome to part three of the “from scratch” series where we implement machine learning models from the ground up. The model we will implement today, called t...
Lately, I have been on a DataCamp spree after unlocking a two-month free unlimited trial through Microsoft’s Visual Studio Dev Essentials program. If you hav...
This is an experimental Jupyter notebook written using IRkernel. The purpose of this notebook is threefold: first, to document my progress with self-learnin...
As a novice who started learning Python just three months ago, I was clueless about what virtual environments were. All I knew was that Anaconda was pur...
Finally, here is the post that was promised ages ago: an introduction to Markov chain Monte Carlo, or MCMC for short. It took a while for me to understand ...
This tutorial is a continuation of the “from scratch” series we started last time with the blog post demonstrating the implementation of a simple k-nearest n...
In a previous post on likelihood, we explored the concept of maximum likelihood estimation, a technique used to optimize parameters of a distribution. In tod...
These days, machine learning and deep neural networks are exploding in importance. These fields are so popular that, unless you’re a caveman, you have proba...
The other day, my friend and I were talking about our mutual friend Jeremy. “He’s an oddball,” my friend Sean remarked, to which I agreed. Out of nowhere, Je...
The word “moment” has many meanings. Most commonly, it connotes a slice of time. In the realm of physics, moment refers to the rotational tendency of some ob...
If there is one thing that the field of statistics wouldn’t be complete without, it’s probably normal distributions, otherwise referred to as “the bell curve...
I’ve been using a music streaming service for the past few weeks, and it’s been a great experience so far. I usually listen to some soothing new age piano o...
If there is one thing I recall most vividly from my high school chemistry class, it is how to use Excel to draw basic plots. In the eyes of a naive freshm...
I have been putting off blog posting lately, largely because I was preoccupied with learning new languages I decided to pick up on a whim. Although ...
“I think that’s very unlikely.” “No, you’re probably right.”
In the last post, I tested out the functionality of Jupyter Notebook, a platform that I am just starting to get acquainted with. I’m pleased with how that ex...
So far on this blog, we have looked at the mathematics behind distributions, most notably binomial, Poisson, and Gamma, with a little bit of exponential. These ...
So far on this blog, all posts were written using Markdown. Markdown is very easy and learnable even for novices like me, but an issue I had was the inconven...
The more I continue my journey down the rabbit hole of mathematics, the more often I stumble across one name: Leonhard Euler. Nearly every concept that I lea...
In a previous post, we looked at the Poisson distribution as a way of modeling the probability of some event’s occurrence within a specified time frame. Spec...
At the Yongsan Provost Marshal Office, I receive a wide variety of calls during my shift. Some of them are part of routine communications, such as gate chec...
At a glance, Euler’s identity is a confusing, mind-boggling mishmash of numbers that somehow miraculously package themselves into a neat, simple form:
In a previous post, we briefly explored the notion of Markov chains and their application to Google’s PageRank algorithm. Today, we will attempt to understan...
Apple officially announced the new 16-inch MacBook Pro. This product has been a long-awaited release for many tech enthusiasts, particularly given the negativ...
Google is the most popular search engine in the world. It is so popular that the word “Google” has been added to the Oxford English Dictionary as a proper ve...
So here goes my first post!