TAI AAI #08 - Offline Reinforcement Learning
AI
AWS Startup Loft | Tokyo
Startup
In-person
English
200 - Intermediate, 300 - Advanced, 400 - Expert
Event Date and Time:
2025/03/13 (Thu) 19:00 - 21:00 (18:30 Doors Open)
Event Host:
Amazon Web Services, Inc.
Event Details:
This Tokyo AI (TAI) Advanced AI (AAI) group session will feature speakers on the topic of Offline Reinforcement Learning.
Our Community
Tokyo AI (TAI) is a community composed of people based in Tokyo and working with, studying, or investing in AI. We are engineers, product managers, entrepreneurs, academics, and investors intending to build a strong "AI core" in Tokyo. Find out more in our overview: https://bit.ly/tai_overview
Agenda:
18:30 Doors open
19:00-20:30 Speakers
- Introduction to Offline Reinforcement Learning - Takuma Seno
Offline reinforcement learning (RL) is a paradigm in which an RL agent is optimized exclusively on static datasets, requiring no online interaction with an environment. This paradigm enables new RL applications that were previously difficult to realize with online RL. In this talk, I will cover popular offline RL algorithms and software tools for practitioners.
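The core idea of the offline setting described above can be illustrated with a minimal fitted Q-iteration sketch: learn a value function purely from a fixed batch of transitions, with no further environment interaction. This is a generic toy example on a hypothetical 5-state chain MDP, not code from the talk or from any specific library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset: transitions (s, a, r, s') collected beforehand by
# some behavior policy in a 5-state chain MDP (hypothetical example).
n_states, n_actions = 5, 2
N = 5000
S = rng.integers(0, n_states, N)
A = rng.integers(0, n_actions, N)
# Action 1 moves right, action 0 moves left; reward 1 for landing on the
# rightmost state.
S2 = np.where(A == 1, np.minimum(S + 1, n_states - 1), np.maximum(S - 1, 0))
R = (S2 == n_states - 1).astype(float)

# Fitted Q-iteration: repeatedly regress Q toward the Bellman target using
# only the static dataset -- no environment interaction ever happens.
gamma = 0.9
Q = np.zeros((n_states, n_actions))
for _ in range(100):
    target = R + gamma * Q[S2].max(axis=1)
    # Tabular "regression": average the targets per (s, a) pair.
    Q_new = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    np.add.at(Q_new, (S, A), target)
    np.add.at(counts, (S, A), 1.0)
    Q = np.where(counts > 0, Q_new / np.maximum(counts, 1), Q)

# Greedy policy extracted from the offline data: always move right.
policy = Q.argmax(axis=1)
```

In practice the tabular regression step is replaced by a neural network, and naive fitted Q-iteration overestimates values for actions missing from the dataset, which is the distribution-shift problem that dedicated offline RL algorithms address.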
- Offline Reinforcement Learning from Datasets with Nonstationarity - Johannes Ackermann
In offline RL we aim to learn a policy from a dataset that we have previously collected. The environment is usually assumed to be unchanging throughout data collection; however, this assumption won't hold when collecting a dataset in practice over a longer timeframe. We thus address a problem setting in which, while the dataset is being collected, the transition and reward functions gradually change between episodes but stay constant within each episode. We show that existing methods fail in this setting and propose a method based on contrastive predictive coding that addresses the shortcomings of previous methods. RLC 2024, https://arxiv.org/abs/2405.14114
- Studying Sample Efficiency in Deep RL Through Better Evaluation Methods and Data Pruning - Shivakanth Sujit
Reinforcement learning (RL) has shown great promise, with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous number of environment interactions, which can be infeasible when such interactions are expensive, as in robotics. In this talk I will present previous work on sample efficiency, both through the lens of better evaluation methods and from an algorithmic perspective. For the former, I present an approach for evaluating offline RL methods as a function of data rather than the traditional method of basing it on compute or gradient steps. This approach reveals insights into the data efficiency of current offline methods and the robustness of algorithms to distribution changes in the dataset, while also answering how much we actually learn from existing benchmarks. Next, I will discuss modifications to existing algorithms that can improve their sample efficiency. Off-policy RL methods use an experience replay buffer to store and sample data the agent has observed; however, simply assigning equal importance to each sample is a naive strategy. This work proposes a method to prioritize samples based on the loss reduction potential of a point, i.e., how much we can learn from a sample.
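The prioritization idea above can be sketched generically: instead of sampling the replay buffer uniformly, draw minibatches with probability proportional to a per-sample learning signal, and reweight to correct the sampling bias. This sketch uses the absolute TD error as a stand-in for the loss reduction potential (in the spirit of prioritized experience replay); it is an illustration, not the specific method from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical replay buffer of 1000 transitions, each with a current
# TD error measuring how far the agent's prediction is off.
buffer_size, batch_size = 1000, 64
td_errors = np.abs(rng.normal(size=buffer_size))

# Sampling probability proportional to the learning signal; a small
# constant keeps every sample reachable.
priorities = td_errors + 1e-6
probs = priorities / priorities.sum()

# Draw a minibatch favoring samples we can still learn from.
batch = rng.choice(buffer_size, size=batch_size, p=probs, replace=False)

# Importance-sampling weights correct the bias that non-uniform
# sampling introduces into the gradient estimate.
weights = 1.0 / (buffer_size * probs[batch])
weights /= weights.max()  # normalize for stable update magnitudes
```

After each gradient step, the priorities of the sampled transitions would be refreshed with their new errors, so the buffer continually refocuses on the samples with the most remaining learning potential.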
20:30-21:00 Networking and Food
21:30 Event ends / Building closes
Speakers:
- Takuma Seno (https://takuseno.github.io/)
Takuma Seno is a Senior Research Scientist at Sony AI, Tokyo Laboratory in Japan. He works on deep reinforcement learning research for Gran Turismo Sophy, an AI agent for the game Gran Turismo. He received a Ph.D. from Keio University, Japan. He is the author of an offline reinforcement learning library, d3rlpy. He has received several awards for his work, including the Mitou Super Creator 2020 award from the Information-technology Promotion Agency (IPA), Japan, and the outstanding paper award on applications of RL at the Reinforcement Learning Conference (RLC) 2024.
- Johannes Ackermann (https://johannesack.github.io)
Johannes Ackermann is a PhD student at the University of Tokyo, supervised by Professor Masashi Sugiyama. His research focuses on Reinforcement Learning with changing or complicated transition dynamics and reward functions.
- Shivakanth Sujit (https://shivakanthsujit.github.io/)
Shivakanth is a Senior Researcher at Araya. He received his M.Sc. in 2023 from Mila, Quebec. He is interested in deep reinforcement learning for robotics and LLMs. Before joining Mila he completed his undergraduate degree in Control Engineering at NIT Trichy, India; this background drives his research in combining insights from control theory and RL to build agents that can safely interact with the real world.