Emergent abilities and grokking: Fundamental, Mirage, or both?

One of the lessons we have seen in language modeling is the power of scale. The original GPT paper of Radford et al. noted that at some point during training, the model “acquired” the ability to do sentiment analysis of a sentence X by predicting whether it is more likely to be followed by “very …

Replica Method for the Machine Learning Theorist: Part 2 of 2

Blake Bordelon, Haozhe Shan, Abdul Canatar, Boaz Barak, Cengiz Pehlevan. See part 1 of this series, and the pdf version of both parts. See also all seminar posts. In the previous post we described the replica method and outlined the analysis per this figure: Specifically, we reduced the task of evaluating the expectation …

Replica Method for the Machine Learning Theorist: Part 1 of 2

Blake Bordelon, Haozhe Shan, Abdul Canatar, Boaz Barak, Cengiz Pehlevan. [Boaz's note: Blake and Haozhe were students in the ML theory seminar this spring; in that seminar we touched on the replica method in the lecture on inference and statistical physics, but here Blake and Haozhe (with a little help from the rest of us) …

Towards a Theory of Generalization in Reinforcement Learning: guest lecture by Sham Kakade

Scribe notes by Hamza Chaudhry and Zhaolin Ren. Previous post: Natural Language Processing (guest lecture by Sasha Rush). Next post: TBD. See also all seminar posts and course webpage, as well as the video of the lecture. Lecture slides: original form: main / bandit analysis; annotated: main / bandit analysis. Sham Kakade is a professor in the …

Natural Language Processing (guest lecture by Sasha Rush)

Scribe notes by Benjamin Basseri and Richard Xu. Previous post: Inference and statistical physics. Next post: TBD. See also all seminar posts and course webpage. Alexander (Sasha) Rush is a professor at Cornell working in Deep Learning / NLP. He applies machine learning to problems of text generation, summarizing long documents, and interactions between …

Inference and statistical physics

Scribe notes by Franklyn Wang. Previous post: Robustness in train and test time. Next post: Natural Language Processing (guest lecture by Sasha Rush). See also all seminar posts and course webpage. Lecture slides (pdf) - lecture slides (PowerPoint with animation and annotation) - video. Digression: Frequentism vs Bayesianism. Before getting started, we'll discuss the difference …