Research

Post-training Large Language Models for Diverse High-Quality Responses

Yilei Chen, Souradip Chakraborty, Lorenz Wolf, Ioannis Paschalidis, Aldo Pacchiano

arXiv

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra*, Aldo Pacchiano*, Ta-Chung Chi, Ge Gao

NeurIPS 2025

Language Model Personalization via Reward Factorization

Idan Shenfeld*, Felix Faltings*, Pulkit Agrawal, Aldo Pacchiano

COLM 2025

Contextual Bandits with Stage-wise Constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

JMLR 2025

Learning to Explore: An In-Context Learning Approach for Pure Exploration

Alessio Russo*, Ryan Welch*, Aldo Pacchiano

arXiv

Multiple-policy Evaluation via Density Estimation

Yilei Chen, Aldo Pacchiano, Ioannis Ch. Paschalidis

ICML 2025

On the Hardness of Bandit Learning

Nataly Brukhim*, Aldo Pacchiano*, Miroslav Dudik, Robert Schapire

COLT 2025

Active Preference Optimization for Sample Efficient RLHF

Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury

ECML-PKDD 2025

Pure Exploration with Feedback Graphs

Alessio Russo, Yichen Song, Aldo Pacchiano

AISTATS 2025

A Theoretical Framework for Partially-Observed Reward States in RLHF

Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari

ICLR 2025

Provable Interactive Learning with Hindsight Instruction Feedback

Dipendra Misra*, Aldo Pacchiano*, Robert E Schapire

ICML 2024

Data-Driven Regret Balancing for Online Model Selection in Bandits

Aldo Pacchiano, Christoph Dann, Claudio Gentile

AISTATS 2024

A Unified Model and Dimension for Interactive Estimation

Nataly Brukhim, Aldo Pacchiano, Miroslav Dudik, Robert Schapire

NeurIPS 2023

Anytime Model Selection in Linear Bandits

Parnian Kassraie, Aldo Pacchiano, Nicolas Emmenegger, Andreas Krause

NeurIPS 2023

Experiment Planning with Function Approximation

Aldo Pacchiano, Jonathan Lee, Emma Brunskill

NeurIPS 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

NeurIPS 2023