I am a fourth-year computer science PhD student at UT Austin, advised by Prof. Swarat Chaudhuri. My research focuses on building machine learning frameworks that generate code with human-like efficiency. I am also interested in efficient adaptation methods for Large Language Models (LLMs), particularly for code generation tasks.
Before joining UT Austin, I was a master's student in computer science at the University of Toronto, advised by Prof. Jimmy Ba, where I worked on efficient learning algorithms for deep neural networks. I also completed my undergraduate studies at the University of Toronto, working with Prof. Roger Grosse on stochastic deep neural networks.
Yeming Wen & Swarat Chaudhuri
International Conference on Learning Representations (ICLR), 2024 (Oral, top 1.2%)
FLoRA: a batched version of Low-Rank Adaptation (LoRA) for foundation models that enables efficient, personalized adaptation to diverse tasks, with demonstrated effectiveness on multilingual code generation.
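As a rough illustration of the batching idea (my own sketch, not the paper's implementation; the function name, shapes, and NumPy choice are assumptions for exposition), the point is that each example in a batch can be routed through its own low-rank adapter while the frozen base weight is applied only once:

```python
# Minimal sketch of applying a *different* LoRA adapter to each example
# in one batched forward pass. Illustrative only; not the FLoRA codebase.
import numpy as np

def batched_lora_linear(x, W, A, B):
    """x: (batch, d_in); W: (d_in, d_out) frozen, shared base weight;
    A: (batch, d_in, r), B: (batch, r, d_out) per-example adapters."""
    base = x @ W                                  # shared computation for the batch
    # Per-example low-rank update, without materializing a full
    # (d_in, d_out) weight delta for every adapter.
    delta = np.einsum('bi,bir,bro->bo', x, A, B)
    return base + delta

# Toy usage: 4 examples, each with its own rank-8 adapter.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))
W = rng.normal(size=(32, 64))
A = rng.normal(size=(4, 32, 8)) * 0.01
B = np.zeros((4, 8, 64))                          # LoRA-style init: B starts at zero
print(batched_lora_linear(x, W, A, B).shape)      # (4, 64)
```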
Yeming Wen & Swarat Chaudhuri
Advances in Neural Information Processing Systems (NeurIPS), 2024
Synthesize-Partition-Adapt (SPA): using synthetic data and influence functions to create multiple model adaptations, enabling language models to generate diverse responses while maintaining their quality.
Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri & Alex Polozov
Instruction Tuning and Instruction Following Workshop at NeurIPS, 2023
We enhance large language models' ability to generate code from natural language prompts with explicit input-output specifications, leveraging synthetic data and execution-derived feedback.
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin & Swarat Chaudhuri
Conference on Language Modeling (COLM), 2024
COPRA: an in-context learning agent for theorem-proving in Lean and Coq, leveraging GPT-4 for tactic proposals within a stateful search, outperforming both few-shot GPT-4 and finetuned models on benchmarks.
Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov & Charles Sutton
Association for Computational Linguistics (ACL), 2023
Arcade: a benchmark featuring challenging data analysis problems in Jupyter notebooks [Code].
Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran & Balaji Lakshminarayanan
Journal of Machine Learning Research (JMLR), 2023
Spectral-normalized Neural Gaussian Process (SNGP) improves uncertainty estimation by making a single network distance-aware, without the high cost of ensembles and Bayesian neural networks.
Rohan Mukherjee, Yeming Wen, Dipak Chaudhari, Thomas Reps, Swarat Chaudhuri & Chris Jermaine
Advances in Neural Information Processing Systems (NeurIPS), 2021 (Spotlight)
By conditioning on semantic attributes computed on AST nodes by the compiler, our model generates better code snippets, such as Java method bodies, given their surrounding context.
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan & Dustin Tran
International Conference on Learning Representations (ICLR), 2021
CAMixup: by adjusting data augmentation according to model calibration, we can exploit both ensemble marginalization and augmentation-induced invariances.
Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan & Dustin Tran
International Conference on Machine Learning (ICML), 2020
Improved BatchEnsemble with mixture posteriors, Cauchy priors, and a rank-1 parameterization.
Yeming Wen, Dustin Tran & Jimmy Ba
International Conference on Learning Representations (ICLR), 2020
Bayesian Deep Learning Workshop at NeurIPS, 2019
How to ensemble deep neural networks efficiently in both computation and memory.
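To make the memory argument concrete, here is a minimal sketch (my own illustration with assumed names and shapes, not the paper's code) of the rank-1 trick: each ensemble member owns only two small vectors, while the full weight matrix is shared:

```python
# Minimal sketch of a BatchEnsemble-style linear layer: member i uses
# W * outer(s_i, r_i), but only the rank-1 factors are stored per member.
import numpy as np

def batch_ensemble_linear(x, W, s, r):
    """x: (d_in,) one input; W: (d_in, d_out) shared weight;
    s: (members, d_in), r: (members, d_out) per-member factors."""
    # ((x * s_i) @ W) * r_i equals x @ (W * outer(s_i, r_i)) for each member,
    # so all members run in one vectorized pass over a single weight matrix.
    return ((x * s) @ W) * r

rng = np.random.default_rng(0)
x = rng.normal(size=(32,))
W = rng.normal(size=(32, 10))
s = rng.choice([-1.0, 1.0], size=(4, 32))    # 4 members, random-sign factors
r = rng.choice([-1.0, 1.0], size=(4, 10))
logits = batch_ensemble_linear(x, W, s, r)   # (4, 10): one prediction per member
prediction = logits.mean(axis=0)             # average over the ensemble
```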
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan & Jimmy Ba
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
How to add noise with the appropriate covariance structure to gradients so that large-batch training generalizes better without longer training.
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran & Roger Grosse
International Conference on Learning Representations (ICLR), 2018
How to efficiently obtain pseudo-independent weight perturbations across a mini-batch, for evolution strategies and variational BNNs, by implementing them as activation perturbations in the style of dropout.
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal & Dustin Tran
arXiv, 2020
High-quality implementations of standard and state-of-the-art methods on a variety of tasks [Code].
Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel & Jimmy Ba
arXiv, 2019
Benchmarking several commonly used model-based reinforcement learning algorithms [Code].
Student Researcher, Google
2022.09 - 2023.05, Remote
Host: Alex Polozov
Ph.D. Resident, Google X, the moonshot factory
2022.05 - 2022.08, Mountain View
Host: Alex Polozov
Research Intern, Google Brain (now Google DeepMind)
2020.02 - 2020.09, Mountain View
Host: Dustin Tran
Student Researcher, Google Brain (now Google DeepMind)
2019.08 - 2019.12, Toronto
Host: Dustin Tran