P.S. If you want to read more of my writing, subscribe to my Substack.

Five years of GPT progress --- Amii talk

May 29, 2023

I recently gave a talk at Amii about the history of GPT models.

Deriving the DALL-E lower bound

April 05, 2023

Five years of GPT progress

March 27, 2023

How is LLaMa.cpp possible?

March 16, 2023

If you want to read more of my writing, I have a Substack. Articles will be posted simultaneously to both places.

A step towards self-improving LLMs

March 07, 2023

There's a Substack version of this post, if you prefer that over my amateurish artisan HTML.

Papers I've read this week (March 4th, 2023)

March 04, 2023

I’m going to try to write a weekly summary of the most interesting papers I’ve read that week. I’d love to hear what papers you’ve been reading, whether you agree or disagree with my conclusions for each paper, and any suggestions for what I should read next!

The Sigmoid: a metaphor for technological progress

March 02, 2023

I regularly reference the “s-curve”, or sigmoid, as a metaphor for progress. Here, I explain what I mean, so that I can just link to this post.

Large language models aren't trained enough.

February 27, 2023

A pure Python (well, Numpy) implementation of back-propagation

January 29, 2023

I realized over the weekend that, unfortunately, I didn't know how back-propagation actually works (I just relied on JAX to do it for me).
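A minimal sketch of the idea, in plain Python rather than NumPy so it runs with only the standard library (this is not the post's code). It assumes a hypothetical network with one scalar input, a single sigmoid hidden layer, a linear output, and squared-error loss; `init_params`, `loss_and_grads`, and the other helper names are illustrative. The backward pass applies the chain rule layer by layer, and a finite-difference check confirms the hand-derived gradients.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(hidden=4, seed=0):
    """Hypothetical 1-input, `hidden`-unit, 1-output network; every entry is a list."""
    rng = random.Random(seed)
    return {
        "W1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],  # input -> hidden weights
        "b1": [0.0] * hidden,                                # hidden biases
        "w2": [rng.gauss(0.0, 1.0) for _ in range(hidden)],  # hidden -> output weights
        "b2": [0.0],                                         # output bias
    }


def forward(p, x):
    """Forward pass: sigmoid hidden layer, linear output."""
    h = [sigmoid(w * x + b) for w, b in zip(p["W1"], p["b1"])]
    y_hat = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"][0]
    return y_hat, h


def loss_and_grads(p, x, y):
    """Loss 0.5 * (y_hat - y)^2 and its gradients, computed with the chain rule."""
    y_hat, h = forward(p, x)
    loss = 0.5 * (y_hat - y) ** 2
    d_yhat = y_hat - y                                      # dL/dy_hat
    d_h = [d_yhat * w for w in p["w2"]]                     # dL/dh_j
    d_z = [dh * hj * (1.0 - hj) for dh, hj in zip(d_h, h)]  # sigmoid'(z) = h * (1 - h)
    grads = {
        "W1": [dz * x for dz in d_z],
        "b1": d_z,
        "w2": [d_yhat * hj for hj in h],
        "b2": [d_yhat],
    }
    return loss, grads


def numerical_grads(p, x, y, eps=1e-6):
    """Central finite differences, used only to check the hand-derived gradients."""
    grads = {}
    for k, values in p.items():
        grads[k] = []
        for j in range(len(values)):
            plus = {name: v[:] for name, v in p.items()}
            minus = {name: v[:] for name, v in p.items()}
            plus[k][j] += eps
            minus[k][j] -= eps
            diff = loss_and_grads(plus, x, y)[0] - loss_and_grads(minus, x, y)[0]
            grads[k].append(diff / (2 * eps))
    return grads


def sgd_step(p, grads, lr=0.1):
    """Plain gradient descent update on every parameter."""
    return {k: [v - lr * g for v, g in zip(p[k], grads[k])] for k in p}


# Check the gradients on one example, then take 50 gradient steps on it.
params = init_params()
loss0, analytic = loss_and_grads(params, 0.5, 1.0)
numeric = numerical_grads(params, 0.5, 1.0)
for _ in range(50):
    _, g = loss_and_grads(params, 0.5, 1.0)
    params = sgd_step(params, g)
loss50, _ = loss_and_grads(params, 0.5, 1.0)
```

If the analytic and numerical gradients agree to several decimal places, the backward pass is almost certainly right. That check is worth running before trusting any hand-written back-propagation.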

Pointer Networks

September 20, 2017

Link to paper [arXiv], [code].

Do deep networks generalise or just memorise?

July 04, 2017

There's a brilliant paper out of Google Brain [1] which claims that DNNs just memorise the training data, and a response [2] which argues that they don't.

Outrageously Large Neural Networks: The sparsely-gated Mixture-of-Experts layer

July 01, 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

June 20, 2017

Random Search for Hyper-Parameter Optimization

March 01, 2017

Useful Bash One-liners

January 20, 2017

I have a file in my home folder that contains Bash one-liners I use regularly (I'm a huge nerd, naturally). I found most of them elsewhere online; I wrote very few of them from scratch.

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

January 03, 2017

Larry Ellison on consulting costs

December 06, 2016

I'm currently reading Softwar, a book about Oracle's rise. The book is brilliant, and it describes Larry Ellison's sales process at length. One passage, describing a meeting Larry had, explains far more about enterprise sales than it should:

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

November 10, 2016

Conditional image synthesis with auxiliary classifier GANs

November 08, 2016

Generative Adversarial Imitation Learning

November 08, 2016

Minimal example of how to do model selection in Python

October 26, 2016

I've had a few people ask me how to do model selection correctly. Here's a minimal example with sklearn in Python.
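The core pattern, sketched here without sklearn so it runs with only the standard library (this is not the post's code): hold out a test set, use k-fold cross-validation on the remaining data to pick a hyperparameter, then evaluate the chosen model exactly once on the test set. The toy dataset, the 1-D k-nearest-neighbours classifier, and the candidate values of k are all hypothetical choices for illustration. In sklearn, the same steps map onto `train_test_split` and `GridSearchCV` from `sklearn.model_selection`.

```python
import random


def make_data(n, seed):
    """Toy 1-D data: label is 1 when x > 0.5, with 10% of labels flipped as noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote of the k nearest training points (use odd k to avoid ties)."""
    neighbours = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def k_fold_score(data, k_neighbours, n_folds=5):
    """Mean validation accuracy over n_folds train/validation splits."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        val = folds[i]
        train = [pt for j, f in enumerate(folds) if j != i for pt in f]
        scores.append(accuracy(train, val, k_neighbours))
    return sum(scores) / n_folds


def select_model(data, candidates=(1, 3, 5, 9, 15), test_fraction=0.2, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Choose k using cross-validation on the training data only.
    cv_scores = {k: k_fold_score(train, k) for k in candidates}
    best_k = max(cv_scores, key=cv_scores.get)
    # "Refit" on all training data (k-NN just memorises it) and touch the test set once.
    test_acc = accuracy(train, test, best_k)
    return best_k, cv_scores, test_acc


data = make_data(200, seed=42)
best_k, cv_scores, test_acc = select_model(data)
```

The design choice that matters is that the test set plays no part in choosing `best_k`; if it did, `test_acc` would be an optimistic estimate of performance on new data.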

Representation Learning: A Review and New Perspectives

October 20, 2016

Full Resolution Image Compression with Recurrent Neural Networks

October 19, 2016

Generative Adversarial Networks and Actor-Critic methods

October 19, 2016

Using simulated data to train robots

October 18, 2016

Safe and Efficient Off-Policy Reinforcement Learning

October 18, 2016

XGBoost: A scalable tree boosting system

September 20, 2016

Excellent description of how hashtables work

August 15, 2015

I'm working through the Algorithm Design Manual to improve the efficiency of my coding.
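A minimal sketch of one common hash table design, separate chaining: each bucket holds a list of (key, value) pairs, and the table doubles its bucket count once the load factor passes a threshold. The class name and the 0.75 threshold are my own illustrative choices, not taken from the Algorithm Design Manual or the linked description; Python's built-in `hash()` stands in for the hash function.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket is a list of (key, value) pairs."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load  # resize when size / capacity exceeds this

    def _bucket(self, key):
        # hash() maps the key to an int; modulo picks one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self, new_capacity):
        # Every key must be re-hashed, because its bucket depends on the capacity.
        old_items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


table = ChainedHashTable()
for i in range(100):
    table.put(f"key{i}", i)
table.put("key7", "seven")  # overwriting does not change the size
table.remove("key0")
```

Keeping the load factor bounded keeps the chains short on average, which is what gives expected O(1) lookups; the occasional O(n) resize is amortised across many inserts.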

Full example for using JSONcpp on Unix

September 06, 2014

I've been trying to parse JSON files with C++, and I've found a distinct lack of full examples showing how to do so. In particular, I struggled to find the right commands to actually compile the code. For future reference (and to help any beginners out), here's a full example of how to use JSONcpp in your code (N.B. all of the following is meant to be entered in your terminal).

ARIMA, ARMA, what's the difference?

April 21, 2014

I'm working through TSA, and I noticed that some of my classmates are struggling to understand the difference between an ARIMA process, an AR process, and an MA process, not to mention the seasonal versions of each.
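A minimal sketch of how the processes relate, assuming zero-mean Gaussian noise and the plus-sign convention for MA terms (some textbooks write MA coefficients with a minus sign, so check which convention TSA uses). `simulate_arma`, `integrate`, `difference`, and `simulate_arima` are hypothetical helper names. An AR(p) process is an ARMA(p, q) process with no MA terms, an MA(q) process is one with no AR terms, and an ARIMA(p, d, q) process is a series whose d-th difference is ARMA(p, q).

```python
import random


def simulate_arma(phi, theta, n, seed=0, sigma=1.0):
    """X_t = sum_i phi[i] * X_{t-1-i} + e_t + sum_j theta[j] * e_{t-1-j}.

    phi=[] gives a pure MA(q) process; theta=[] gives a pure AR(p) process.
    """
    rng = random.Random(seed)
    burn_in = 200  # discard early values so the start-up transient dies out
    x, e = [], []
    for t in range(n + burn_in):
        e_t = rng.gauss(0.0, sigma)
        ar = sum(p * x[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(th * e[t - 1 - j] for j, th in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + e_t + ma)
        e.append(e_t)
    return x[burn_in:]


def difference(series, d):
    """Apply the first-difference operator X_t - X_{t-1}, d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def integrate(series, d):
    """Undo d rounds of differencing by cumulative summation (starting from 0)."""
    for _ in range(d):
        total, out = 0.0, []
        for v in series:
            total += v
            out.append(total)
        series = out
    return series


def simulate_arima(phi, d, theta, n, seed=0):
    """ARIMA(p, d, q): integrate an ARMA(p, q) series d times."""
    return integrate(simulate_arma(phi, theta, n, seed), d)


ar1 = simulate_arma([0.7], [], 500)               # AR(1)
ma1 = simulate_arma([], [0.4], 500)               # MA(1)
arma11 = simulate_arma([0.7], [0.4], 500)         # ARMA(1, 1)
arima111 = simulate_arima([0.7], 1, [0.4], 500)   # ARIMA(1, 1, 1)
```

Differencing `arima111` once recovers the ARMA(1, 1) series it was built from, which is the whole "I" in ARIMA. Seasonal versions apply the same AR, MA, and differencing operators at a seasonal lag s (e.g. s = 12 for monthly data); I've left them out to keep the sketch short.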

Solving Partial Autocorrelation Functions

March 03, 2014

I've been studying time series through TSA. The book presents a structured approach to time series analysis, and covers the material fairly well; I was impressed with the description of what a partial autocorrelation function (PACF) is, as the book explained it more intuitively than the lecture notes did. I did find the description of how to actually solve for the PACF a bit confusing, so I wrote my own explanation.
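One standard way to solve for the PACF is the Durbin-Levinson recursion, which solves the Yule-Walker equations for order 1, 2, 3, ... in turn; the lag-k partial autocorrelation is the last coefficient, $\phi_{kk}$, of the order-k fit. This is a sketch of that recursion, not necessarily the method from the post or the book, and `pacf_from_acf` and `sample_acf` are hypothetical names. It uses

$$\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_j}, \qquad \phi_{k,j} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j}.$$

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (rho_0 is 1 by construction)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: returns [phi_11, phi_22, ..., phi_{max_lag,max_lag}].

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    prev = []  # prev[j - 1] holds phi_{k-1, j} for j = 1..k-1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one.
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# AR(1) with phi = 0.6 has rho_k = 0.6**k, so its PACF is 0.6 at lag 1 and 0 afterwards.
ar1_pacf = pacf_from_acf([0.6 ** k for k in range(6)], 5)
# MA(1) with theta = 0.5 has rho_1 = 0.5 / (1 + 0.5**2) = 0.4 and rho_k = 0 for k > 1.
ma1_pacf = pacf_from_acf([1.0, 0.4, 0.0, 0.0], 3)
```

The two usage lines show the textbook signatures: an AR(p) PACF cuts off after lag p, while an MA(1) PACF decays instead (at lag 2 it equals $-\rho_1^2 / (1 - \rho_1^2)$).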