Research
Broadly, I am interested in improving the scalability of RL algorithms via
paradigms like offline RL, unsupervised RL, and RL pre-training. My long-horizon goal is to develop
large RL models that can learn in a fully unsupervised manner from the vast
collections of unlabeled data on the Internet.
My current focus is on using offline goal-conditioned RL as a path toward this goal.
|
Dual Goal Representations
Seohong Park*,
Deepinder Mann*,
Sergey Levine
Preprint
paper
/
code
/
co-author blog post
/
thread
To combat exogenous noise in the goal-conditioned RL setting, we introduce dual goal
representations, which represent a goal state purely through its relation to other states
rather than through its raw observation.
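As a rough illustration of the idea (the anchor-state scheme and all names below are my own,
not the paper's exact formulation): a goal can be embedded as the vector of learned temporal
distances from a set of reference states, so two goals that relate to the rest of the state
space in the same way receive the same representation, regardless of exogenous noise in their
raw observations.

    import numpy as np

    def dual_goal_representation(goal, anchor_states, dist_fn):
        # Represent the goal purely in relation to other states: the vector
        # of estimated temporal distances d(s, goal) from each anchor state.
        return np.array([dist_fn(s, goal) for s in anchor_states])

    # Toy usage with a stand-in for a learned distance estimate d(s, g).
    anchors = [np.zeros(4), np.ones(4)]
    dist = lambda s, g: float(np.linalg.norm(s - g))
    z = dual_goal_representation(np.full(4, 0.5), anchors, dist)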
|
Horizon Reduction Makes RL Scalable
Seohong Park,
Kevin Frans,
Deepinder Mann,
Benjamin Eysenbach,
Aviral Kumar,
Sergey Levine
NeurIPS 2025 (Spotlight)
paper
/
code
/
blog post
/
thread
We empirically show that the poor scalability of TD learning is rooted in the so-called "curse of
horizon" and propose practical horizon-reduction techniques to alleviate this problem.
|
NLP-MM
code
A simple Markov model for text processing and natural language generation. Packaged as a Discord
bot, it trains online on every message it receives.
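The core mechanism, sketched below in simplified form (the real bot's interface differs): an
order-1 Markov chain over words whose transition counts are updated online from each incoming
message, with generation done by sampling a walk through the chain.

    import random
    from collections import defaultdict

    class MarkovTextModel:
        def __init__(self):
            # word -> list of observed next words (duplicates encode counts)
            self.transitions = defaultdict(list)

        def train(self, message):
            # Online update: fold one incoming message into the chain.
            words = message.split()
            for cur, nxt in zip(words, words[1:]):
                self.transitions[cur].append(nxt)

        def generate(self, seed, max_len=20):
            # Sample a continuation by walking the learned chain from `seed`.
            out, cur = [seed], seed
            for _ in range(max_len - 1):
                choices = self.transitions.get(cur)
                if not choices:
                    break
                cur = random.choice(choices)
                out.append(cur)
            return " ".join(out)

    model = MarkovTextModel()
    model.train("the cat sat on the mat")
    print(model.generate("the"))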
|
[WIP] PartMatcher
repo 1
/
repo 2
An end-to-end system for PC build generation and component analytics. It tracks live prices,
scrapes benchmark data, and automates build generation with a statistical model.
|