Ouail Kitouni

Ph.D. Student

Massachusetts Institute of Technology

About

I am interested in the science of deep learning. Recently, I’ve been very excited about topics like reasoning, multi-modal foundation models, and safe and scalable deep learning. During my time at Microsoft Research, I developed a generative model for knowledge bases as a step towards knowledge-augmented LLMs, with the aim of improving interpretability and limiting hallucination. At FAIR, I worked on new pre-training objectives to make LLMs more data-efficient (learn more with less) and improve their knowledge storage and planning capabilities.

Interests

Science of Deep Learning
(Mechanistic) Interpretability
Reasoning in Foundation Models
Safety and Robustness

Education

Interdisciplinary Ph.D. in Physics and Statistics, 2019 - Present
Massachusetts Institute of Technology
BSc in Physics and Mathematics, 2019
University of Rochester

Experience

Research Scientist Intern

Meta AI

Jan 2024 – May 2024 NYC, NY

Research Intern

Microsoft Research

May 2023 – Aug 2023 Cambridge, UK

Machine Learning Researcher Intern

NASA/SETI Frontier Development Lab

May 2022 – Aug 2022 Mountain View, CA

Featured Publications

Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

June 2024

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Training transformers to predict “any-to-any” rather than just the next token solves the reversal curse and can improve planning capabilities.

ArXiv

Ouail Kitouni, Niklas Nolte, James Hensman, Bhaskar Mitra

September 2023 ICML2024

DiSK: Diffusion Model for Structured Knowledge

DiSK is a generative framework for structured (dictionary-like) data that can handle various data types, from numbers to complex hierarchical types. This model excels in tasks like populating missing data and is especially proficient at predicting numerical values. Its potential extends to augmenting language models for better information retrieval and knowledge manipulation.

ArXiv

Ouail Kitouni, Niklas Nolte, Mike Williams

August 2022 ICLR2023

Robust and Provably Monotonic Networks

We develop a novel neural architecture with an exact bound on its Lipschitz constant. The model can be made monotonic in any subset of its features. This inductive bias is especially important for fairness and interpretability considerations.

PDF Code Project Poster NeurIPS Abstract

Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric Michaud, Mike Williams, Max Tegmark

June 2022 NeurIPS2022

Towards Understanding Grokking: An Effective Theory of Representation Learning

This study investigates grokking, a generalization phenomenon first observed in transformer models trained on arithmetic data, using microscopic and macroscopic analyses. It reveals four learning phases and a “Goldilocks zone” for optimal representation learning, and emphasizes the value of physics-inspired tools in understanding deep learning.

PDF