I am a fourth-year computer science PhD student at UT Austin, advised by Prof. Swarat Chaudhuri. My research focuses on building machine learning frameworks that generate code with human-like efficiency. I am also interested in efficient adaptation methods for Large Language Models (LLMs), particularly for code generation tasks.
Before joining UT Austin, I was a master's student in computer science at the University of Toronto, advised by Prof. Jimmy Ba, where I worked on efficient learning algorithms for deep neural networks. I also completed my undergraduate studies at the University of Toronto, working with Prof. Roger Grosse on stochastic deep neural networks.
Yeming Wen & Swarat Chaudhuri
International Conference on Learning Representations (ICLR), 2024 (Oral, top 1.2%)
FLoRA: a batched version of Low-Rank Adaptation (LoRA) for foundation models that enables efficient, personalized adaptation to diverse tasks, with demonstrated effectiveness on multilingual code generation.
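As a rough illustration of the batching idea (my own sketch, not the paper's implementation; the function name, shapes, and NumPy choice are assumptions for exposition), the point is that each example in a batch can be routed through its own low-rank adapter while the frozen base weight is applied only once:

```python
# Minimal sketch of applying a *different* LoRA adapter to each example
# in one batched forward pass. Illustrative only; not the FLoRA codebase.
import numpy as np

def batched_lora_linear(x, W, A, B):
    """x: (batch, d_in); W: (d_in, d_out) frozen, shared base weight;
    A: (batch, d_in, r), B: (batch, r, d_out) per-example adapters."""
    base = x @ W                                  # shared computation for the batch
    # Per-example low-rank update, without materializing a full
    # (d_in, d_out) weight delta for every adapter.
    delta = np.einsum('bi,bir,bro->bo', x, A, B)
    return base + delta

# Toy usage: 4 examples, each with its own rank-8 adapter.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))
W = rng.normal(size=(32, 64))
A = rng.normal(size=(4, 32, 8)) * 0.01
B = np.zeros((4, 8, 64))                          # LoRA-style init: B starts at zero
print(batched_lora_linear(x, W, A, B).shape)      # (4, 64)
```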
Yeming Wen & Swarat Chaudhuri
Advances in Neural Information Processing Systems (NeurIPS), 2024
Synthesize-Partition-Adapt (SPA): using synthetic data and influence functions to create multiple model adaptations, enabling language models to generate diverse responses while maintaining their quality.
Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri & Alex Polozov
Instruction Tuning and Instruction Following Workshop at NeurIPS, 2023
We enhance large language models' ability to generate code from natural language prompts with explicit input-output specifications, leveraging synthetic data and execution-derived feedback.
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin & Swarat Chaudhuri
Conference on Language Modeling (COLM), 2024
COPRA: an in-context learning agent for theorem-proving in Lean and Coq, leveraging GPT-4 for tactic proposals within a stateful search, outperforming both few-shot GPT-4 and finetuned models on benchmarks.
Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov & Charles Sutton
Association for Computational Linguistics (ACL), 2023
Arcade: a benchmark featuring challenging data analysis problems in Jupyter notebooks [Code].
Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran & Balaji Lakshminarayanan
Journal of Machine Learning Research (JMLR), 2023
Spectral-normalized Neural Gaussian Process (SNGP) improves uncertainty estimation by making a single network distance-aware, without the high cost of ensembles and Bayesian neural networks.
Rohan Mukherjee, Yeming Wen, Dipak Chaudhari, Thomas Reps, Swarat Chaudhuri & Chris Jermaine
Advances in Neural Information Processing Systems (NeurIPS), 2021 (Spotlight)
By conditioning on semantic attributes computed on AST nodes by the compiler, our model generates better code snippets, such as Java method bodies, given their surrounding context.
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan & Dustin Tran
International Conference on Learning Representations (ICLR), 2021
CAMixup: by adjusting data augmentation according to model calibration, we can exploit both ensemble marginalization and augmentation-induced invariances.
Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan & Dustin Tran
International Conference on Machine Learning (ICML), 2020
Improved BatchEnsemble with mixture posteriors, Cauchy priors, and a rank-1 parameterization.
Yeming Wen, Dustin Tran & Jimmy Ba
International Conference on Learning Representations (ICLR), 2020
Bayesian Deep Learning Workshop at NeurIPS, 2019
How to ensemble deep neural networks efficiently in both computation and memory.
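To make the memory argument concrete, here is a minimal sketch (my own illustration with assumed names and shapes, not the paper's code) of the rank-1 trick: each ensemble member owns only two small vectors, while the full weight matrix is shared:

```python
# Minimal sketch of a BatchEnsemble-style linear layer: member i uses
# W * outer(s_i, r_i), but only the rank-1 factors are stored per member.
import numpy as np

def batch_ensemble_linear(x, W, s, r):
    """x: (d_in,) one input; W: (d_in, d_out) shared weight;
    s: (members, d_in), r: (members, d_out) per-member factors."""
    # ((x * s_i) @ W) * r_i equals x @ (W * outer(s_i, r_i)) for each member,
    # so all members run in one vectorized pass over a single weight matrix.
    return ((x * s) @ W) * r

rng = np.random.default_rng(0)
x = rng.normal(size=(32,))
W = rng.normal(size=(32, 10))
s = rng.choice([-1.0, 1.0], size=(4, 32))    # 4 members, random-sign factors
r = rng.choice([-1.0, 1.0], size=(4, 10))
logits = batch_ensemble_linear(x, W, s, r)   # (4, 10): one prediction per member
prediction = logits.mean(axis=0)             # average over the ensemble
```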
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan & Jimmy Ba
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
How to add noise with the appropriate covariance structure to gradients so that large-batch training generalizes better without longer training.
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran & Roger Grosse
International Conference on Learning Representations (ICLR), 2018
How to efficiently obtain pseudo-independent weight perturbations across a mini-batch, for evolution strategies and variational BNNs, by implementing them as activation perturbations in the style of dropout.
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal & Dustin Tran
arXiv, 2020
High-quality implementations of standard and state-of-the-art methods on a variety of tasks [Code].
Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel & Jimmy Ba
arXiv, 2019
Benchmarking several commonly used model-based reinforcement learning algorithms [Code].
Student Researcher, Google
2022.09 - 2023.05, Remote
Host: Alex Polozov
Ph.D. Resident, Google X, the moonshot factory
2022.05 - 2022.08, Mountain View
Host: Alex Polozov
Research Intern, Google Brain (now Google DeepMind)
2020.02 - 2020.09, Mountain View
Host: Dustin Tran
Student Researcher, Google Brain (now Google DeepMind)
2019.08 - 2019.12, Toronto
Host: Dustin Tran