  • Optimizing Total Travel Time for Multi-Agent Traffic

    “How long will agents stay in the traffic jam?” is one of the questions that agents care about in autonomous driving. In real life, drivers can check road conditions ahead of or during their trips using Google Maps. However, a car may break down or crash in the middle of the road, blocking one or more lanes in a way that drivers passing by later do not expect. MetaDrive can simulate and visualize multi-agent driving behaviors under such complex road scenarios. Using reinforcement learning, we can model this traffic problem and search for better solutions.

  • Planning with Learned Actionable Object-centric World Model

    To achieve better planning in zero-shot generalization RL tasks, we propose to learn imagined actions over perceived objects in a learned world model. We first reproduced Veerapaneni et al.’s OP3 method [5], which learns a general object-centric world model. Then we implemented our proposed model design to enable imagined actions. We trained it on the block-stacking task that OP3 was evaluated on to confirm that our method still performs correctly. Our method learns meaningful representations that can reconstruct the original image, and the forward dynamics are correct. However, the imagined action does not work well yet, which we suspect is due to both a poorly fitted prior and the limitations of the dataset. We have been building a billiard-ball environment, but due to unexpected bugs and time constraints it is not finished. Once it is, we will be able to evaluate our imagined-action method much more intuitively.

  • Self-Play for Competitive Multi-Agent Gameplay

    Multi-agent reinforcement learning, in which multiple agents are trained in a single, interactive environment, has been shown to develop interesting and novel behaviors. Competitive self-play is a form of multi-agent RL in which multiple versions of a single agent play competitively against each other to improve performance iteratively. In our research, we focused on a self-play model of reinforcement learning for the Atari game Pong. We employed deep Q-learning with a dueling convolutional neural network architecture (sketched below) and experimented with a variety of training methods for our agents. We studied iterative improvement by training an agent against a static rule-based agent, freezing its training while we trained a new agent against it, and repeating this process for several training cycles. We also experimented with full self-play, in which two agents train against each other simultaneously while competing at Pong. Our research demonstrated that the full self-play model substantially outperformed the iterative training model, even after a sequence of five iterative training cycles. This demonstrates the unique learning properties of live self-play in a competitive multi-agent environment like Pong.
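
    Below is a minimal sketch of the dueling architecture, assuming stacked 84×84 grayscale Atari frames; the layer sizes here are illustrative, not necessarily our exact configuration.

    ```python
    import torch.nn as nn

    class DuelingDQN(nn.Module):
        """Dueling CNN for Pong: a shared convolutional trunk feeding separate
        state-value and advantage heads (sizes are illustrative)."""
        def __init__(self, n_actions, in_channels=4):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            self.value = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, 1))
            self.advantage = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, n_actions))

        def forward(self, x):  # x: (B, 4, 84, 84), assumed scaled to [0, 1]
            h = self.trunk(x)
            v, a = self.value(h), self.advantage(h)
            # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
            return v + a - a.mean(dim=1, keepdim=True)
    ```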

  • Explore Human-agent Interactions in Real-life Traffic

    In this project, we study the effects of cooperation among multiple agents in avoiding real-world traffic collisions. We consider the following two questions: (1) Is cooperation among agents helpful for reducing the collision rate? (2) Is it safe to keep human drivers involved in a multi-agent system?

  • Heterogeneous Multi-agent Reinforcement Learning

    In this blog, we focus on an algorithm called Embedded Multi-agent Actor–Critic (EMAC), which allows a team of heterogeneous agents to learn decentralized control policies for covering unknown environments. The approach handles real-world environmental factors such as turbulence, delayed communication, and agent loss, and is flexible to dynamic environment elements.

  • Off-policy Meta Reinforcement Learning in a Multi-Agent Competitive Environment

    In this project, we will investigate different applications of meta-RL. Previous investigations have centered on using meta-RL to help single agents quickly learn new, unseen tasks drawn from the same task distribution. Here, we explore how meta-RL-trained agents perform in a competitive multi-agent RL setting when playing against new, unseen opponents. We will specifically implement the PEARL (off-policy meta-RL) algorithm developed by Kate Rakelly et al. Instead of relying on LSTM recurrence in the policy network to memorize skills, PEARL decouples this from the policy by using a network to encode previous experiences into a latent context variable. This latent context variable conditions both the actor (policy) and critic (value) architectures, as sketched below. We will use Derkgym as the multi-agent gym environment to test the PEARL algorithm. We will first train the agent against an opponent that only performs random actions. After collecting this experience, we will expose the agent to new opponents with different skill sets. Using meta-RL, we anticipate that the agent will adapt to new opponents more quickly. The code and video presentation can be accessed using this link: https://drive.google.com/drive/folders/1JTQ2ycmXNRA2OSZ7f-fipg94HBnQCR0V?usp=sharing
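
    A minimal sketch of the PEARL-style context pathway, assuming a simple MLP encoder; PEARL's product-of-Gaussians aggregation is simplified to a mean here for brevity, and all sizes are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class ContextEncoder(nn.Module):
        """Encodes a batch of transitions (s, a, r, s') into a Gaussian posterior
        over the latent context z, in the spirit of PEARL."""
        def __init__(self, transition_dim, latent_dim, hidden=200):
            super().__init__()
            self.latent_dim = latent_dim
            self.net = nn.Sequential(
                nn.Linear(transition_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * latent_dim),  # per-transition mean and log-variance
            )

        def forward(self, transitions):  # transitions: (N, transition_dim)
            mu, logvar = self.net(transitions).split(self.latent_dim, dim=-1)
            # PEARL aggregates per-transition Gaussian factors via a product of
            # Gaussians; we simply average here to keep the sketch short.
            return torch.distributions.Normal(mu.mean(0), logvar.exp().mean(0).sqrt())

    # A sampled z conditions both networks by concatenation, e.g.:
    #   action = actor(torch.cat([state, z], dim=-1))
    #   q_val  = critic(torch.cat([state, action, z], dim=-1))
    ```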

  • Multi-Agent Traffic Learning based on MARL

    The project focuses on traffic simulation: having computer-controlled agents learn to reproduce the behaviors people exhibit when driving in different environments. We plan to reproduce a multi-agent reinforcement learning (MARL) algorithm called Coordinated Policy Optimization (CoPO) for this problem. For evaluation, we plan to deploy the model in a driving simulator called MetaDrive, covering five typical traffic environments, and analyze the behaviors of individual agents.

  • Reinforcement Learning for Football Player Training

    As the most popular sport on the planet, football draws millions of fans who enjoy watching Sergio Agüero, Raheem Sterling, and Kevin De Bruyne on the field. Football video games are less lively but still immensely popular, and we wonder whether AI agents could play them properly. Researchers want to explore AI agents’ ability to play in complex settings like football: the sport requires a balance of short-term control, learned concepts such as passing, and high-level strategy, which can be difficult to teach agents. We apply RL to train the agents and see whether we can achieve good performance.

  • 2 vs 2 Soccer Game

    In this project we investigate Multi-Agent Reinforcement Learning (MARL). Unlike vanilla RL tasks, MARL involves training multiple agents that learn and make decisions based on their interactions with both the environment and other agents. To illustrate the potential of MARL, we build a 2 vs 2 soccer game in Unity and apply different MARL algorithms to it. We then use the Elo score (updated as sketched below) to evaluate the performance of the multi-agent algorithms and compare them to single-agent algorithms running on multi-agent systems. Finally, we design a head-to-head experiment in which two agents trained by different algorithms play against each other, clearly demonstrating how their behavior varies.
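
    For reference, the standard Elo update our evaluation is based on; the K-factor of 32 is a conventional choice, not a fixed requirement.

    ```python
    def elo_update(rating_a, rating_b, score_a, k=32):
        """Standard Elo update after one match: `score_a` is 1 for a win by A,
        0.5 for a draw, 0 for a loss; `k` controls how fast ratings move."""
        expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
        new_a = rating_a + k * (score_a - expected_a)
        new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
        return new_a, new_b

    # Two agents starting at 1200, agent A wins:
    # elo_update(1200, 1200, 1.0)  ->  (1216.0, 1184.0)
    ```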

  • Multi-Agent Reinforcement Learning for Generating Optimal Market Liquidation Strategy

    We will investigate the effectiveness of Multi-Agent Reinforcement Learning (MARL) for obtaining optimal liquidation strategies in financial markets. Liquidation is a sequential decision-making problem in which an agent strategizes to sell a large number of assets within a given time period, taking into account the financial market environment. Since state changes in such an environment are driven not by one but by multiple such agents, we frame this as a multi-agent reinforcement learning problem.

  • Exploring Deep Transformer Q-Networks (DTQN)

    In recent years, transformers have been widely used and studied for a variety of supervised learning tasks, especially in natural language processing (NLP). To exploit their ability to capture long-range dependencies in sequences, along with their scalability, simplicity, and efficiency, transformers have been fused with reinforcement learning. Until the advent of DTQN, however, this was done primarily in an offline RL fashion, with previous methods reformulating RL tasks as sequence prediction. DTQN follows the traditional view of deep Q-learning and can use transformers in an online fashion, as sketched below. In this project, we aim to understand and dissect the inner workings of DTQN.
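
    To make the idea concrete, here is a hedged sketch of a DTQN-style network: a causally masked Transformer encoder over a window of recent observations, emitting Q-values at every position. Hyperparameters are illustrative, not those of the paper.

    ```python
    import torch
    import torch.nn as nn

    class TransformerQNet(nn.Module):
        """DTQN-style sketch: embed recent observations, apply a causally
        masked Transformer encoder, and predict Q-values per timestep."""
        def __init__(self, obs_dim, n_actions, d_model=128, n_layers=2, context_len=50):
            super().__init__()
            self.obs_embed = nn.Linear(obs_dim, d_model)
            self.pos_embed = nn.Embedding(context_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.q_head = nn.Linear(d_model, n_actions)

        def forward(self, obs_seq):  # obs_seq: (B, T, obs_dim), T <= context_len
            T = obs_seq.size(1)
            h = self.obs_embed(obs_seq) + self.pos_embed.weight[:T]
            causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
            h = self.encoder(h, mask=causal)  # no attention to future steps
            return self.q_head(h)             # act greedily on the last position
    ```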

  • Room rearrangement and indoor environment exploration

    We investigate the problem of room rearrangement, where an agent explores a room and then attempts to restore the objects in the room to their original state.

  • Exploring Multi-Agent Reinforcement Learning in Metadrive Simulation

    Autonomous driving has always been one of the most significant fields to which AI can be applied. The problem can be approached by having an agent gradually improve its policy from its experience in the environment, i.e., reinforcement learning. In one scenario, all vehicles are self-driven particles, making this a multi-agent system. This project researches MARL in this setting and explores the optimization of coordinated policies in a multi-agent environment.

  • Human-AI Shared Control via Policy Dissection

    In many complex tasks, an RL-trained policy may not solve the task efficiently or correctly, and training may cost too much time. Policy Dissection is a frequency-based method that can convert an RL-trained policy into a goal-conditioned policy. With this method, a human can interact with the AI to obtain better and more efficient results. In this project, we explore human-AI shared control and the implementation of Policy Dissection, and we will add new actions in the MetaDrive environment.

  • Exploring Generalizability in Autonomous Driving Simulation using MetaDrive

    One of the most significant challenges in artificial intelligence is autonomous vehicle control. The problem is usually modeled as a continuous stream of actions accompanied by feedback from the environment, with the goal of the “agent” being to gradually improve its control policy through experience and system feedback via reinforcement learning. A key facet of testing and improving autonomous driving is the use of simulators, which provide an environment to experiment with, train, and test RL algorithms on benchmarks of success, generalization, and safety, without requiring physical resources and environments. This project seeks to understand the functionalities of different autonomous driving simulators, experiment with multiple algorithms, and then study generalization by comparing training on a single generalized environment versus a cascade of environments, determining the configurations that maximize generalized return.

  • Meta Learning for Reinforcement Learning

    For our project, we plan to apply MAML to more difficult RL task distributions than those used in the original MAML paper. Specifically, we aim to investigate one- and few-shot learning for various locomotion tasks across different creature types.

  • Reinforcement Learning for Autonomous Driving in a Realistic Simulated Urban Environment

    Investigating the performance of various reinforcement learning strategies for self-driving vehicles.

  • Reinforcement Learning Implementation on Financial Portfolio Optimization

    We want to apply reinforcement learning to financial portfolio optimization, one of the most important topics in the asset management industry. We want to show the potential of reinforcement learning in this area and, ideally, demonstrate that our model can create a portfolio that substantially outperforms the main market indices.

  • Speed up Autonomous Driving with Safe Reinforcement Learning in Metadrive

    In MetaDrive, safety is the first priority. We want to explore how to speed up autonomous-driving agents with safe reinforcement learning and find the optimal speed in diverse driving scenarios.

  • Reinforcement Learning for Recommendation Systems

    While traditional recommendation methods have significantly improved user experience, they are often myopic and require a constant feedback loop with the user. In this implementation, we use offline reinforcement learning to build an agent that recommends music tracks to a user so as to minimize the skip rate. We leverage open-sourced Spotify data to build an environment using item embeddings, simulate rewards based on skips in the user session, and train an agent on this offline data of sequential user interactions. The underlying agent utilizes LIRD, an implementation of the Deep Deterministic Policy Gradient algorithm, to optimize recommendations (sketched below). The reward is received from user behavior when deployed online and simulated during training. The model outperforms the baseline model in minimizing the skip rate and generates diverse recommendations.
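
    A hedged sketch of the action-selection step in a DDPG-based recommender like LIRD: the actor outputs a continuous “proto-item” in embedding space, which is mapped to real tracks by similarity. The `actor` callable and the embedding matrix are placeholders for trained components, not a fixed interface.

    ```python
    import numpy as np

    def recommend(actor, user_state, item_embeddings, k=10):
        """Map the DDPG actor's continuous action to concrete tracks.
        `actor(user_state)` is assumed to return a vector in item-embedding
        space; we recommend the k most similar tracks by dot product."""
        proto_action = actor(user_state)         # continuous "ideal item" vector
        scores = item_embeddings @ proto_action  # similarity to every track
        return np.argsort(-scores)[:k]           # indices of the top-k tracks
    ```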

  • Ensemble RL - Apply Ensemble Methods on Reinforcement Models using MetaDrive

    Ensemble methods improve performance by combining multiple models instead of using a single one. They are widely used in many machine learning tasks, but there are few implementations in reinforcement learning. In this project, we apply ensemble methods in reinforcement learning to our autonomous driving task on the MetaDrive platform. We trained Proximal Policy Optimization (PPO), Twin Delayed DDPG (TD3), Generative Adversarial Imitation Learning (GAIL), and Soft Actor-Critic (SAC) models as baselines, and investigated different ensemble methods built on them (one simple scheme is sketched below). Overall, the ensembled model is slightly better than the individual models, and in some cases we gain much better results.
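
    As an example, one simple ensemble scheme can be sketched as a weighted average of the continuous actions proposed by the trained baselines; the `policies` interface here is an assumption, not our exact code.

    ```python
    import numpy as np

    def ensemble_action(policies, obs, weights=None):
        """Average the continuous actions proposed by several trained policies
        (e.g., PPO, TD3, GAIL, SAC). Each policy is assumed to be a callable
        returning a deterministic action for `obs`."""
        actions = np.stack([pi(obs) for pi in policies])  # (n_policies, action_dim)
        if weights is None:
            weights = np.full(len(policies), 1.0 / len(policies))
        return weights @ actions                          # weighted mean action
    ```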

  • Reinforcement Learning in Finance/Trading

    In this project, we aim to apply reinforcement learning techniques to real financial market data. To start, we will adapt the OpenBB Terminal environment and begin with stock market data. Using OpenBB, we are interested in analyzing real-world financial data and discovering how reinforcement learning can be used to predict the market. In the initial stage, we will reproduce the results from the contributors; afterwards, we will explore the system’s effectiveness on various data we gather.

  • Policy Gradient in Active Learning of Graph Neural Networks

    Neural networks have proven powerful and effective in many tasks, but training them usually requires huge amounts of data. In the supervised learning paradigm, labeling data involves professional domain knowledge and thus becomes expensive and sometimes impractical. To address this data shortage, active learning methods have been proposed that focus on efficiently labeling the most valuable data samples to reduce annotation cost. In this project, we investigate the application of policy gradient methods to active learning on Graph Neural Networks (GNNs). We mainly focus on the effect of better state representations and the effectiveness of different policy gradient backbones (a minimal selection-policy sketch follows).
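
    A minimal REINFORCE-style sketch of a node-selection policy for active learning. The reward here is assumed to be the validation-accuracy gain after labeling the chosen node; the exact state features and reward design vary across backbones.

    ```python
    import torch

    def select_node(policy, candidate_states):
        """Sample which unlabeled node to query, keeping its log-probability.
        `policy` scores candidate nodes; the state features are assumed inputs."""
        dist = torch.distributions.Categorical(logits=policy(candidate_states))
        node = dist.sample()
        return node.item(), dist.log_prob(node)

    def reinforce_update(log_prob, reward, optimizer, baseline=0.0):
        """REINFORCE: reinforce selections whose labeling improved validation
        accuracy (our assumed reward), relative to a baseline."""
        loss = -(reward - baseline) * log_prob
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```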

  • Autonomous driving based on CARLA

    In recent years, tech giants have set off a boom in the development of autonomous driving: Waymo began as the Google self-driving car project in 2009, and Tesla Motors announced the first version of Autopilot in 2014. Our team would likewise like to explore autonomous driving via reinforcement learning in the CARLA simulator.

  • Transformer in Reinforcement Learning

    Decision Transformers are a new type of machine learning model that combines transformers with reinforcement learning, opening up new avenues of research and application.

  • Design Our Model Based on What is Provided by CoPO

    The project consists of two parts. In the first part, we will reproduce the results of Independent Policy Optimization (IPO, implemented with PPO) [5], MFPO, CL, and CoPO, attempting some fine-tuning to explore possibly better results. In the second part, we will design, implement, train, and test our own models, building on what is provided by CoPO.

  • The Emergence of Roles in Multi-agent Traffic Learning

    We are interested in applying role-based multi-agent reinforcement learning (MARL) to self-driven particle systems. Specifically, we are inspired by role-based hierarchical MARL frameworks (Wang et al., 2020a; Wang et al., 2020b), which explicitly model agents as having different roles, with each role having its own policy. In this way, the policy search space is greatly reduced, since the joint action space over roles is much smaller than the joint action space over all agents. We hope to explore this idea further in the MetaDrive environment to study the emergence of roles in the traffic setting. In summary, our group hopes to apply various hierarchy-based MARL algorithms to autonomous driving scenarios and compare our results with existing methods such as (Peng et al., 2021).

  • RL-based Trajectory Planning for Dental Surgery

    With advances in robotic technologies, artificial intelligence (AI) and machine learning (ML) play a pivotal role in medical surgery: in medical imaging, prognosis/diagnosis, treatment assistance, and automation of repetitive surgical subtasks. In dental surgery specifically, however, reinforcement learning (RL) has not been actively applied, even though the domain has much in common with robot machining, where RL is widely used for fully automated trajectory generation. Therefore, by developing an RL-based tool trajectory planner, we aim to enable a robot arm to automatically perform a few subtasks in dental surgery, such as tooth preparation for a crown through removal of a cavity. We will use CoppeliaSim for the simulation environment, Python/PyTorch for RL, and MATLAB for robotics. The expected result of this project is an RL-based, collision-free, fully automated dental tool trajectory generation algorithm.

  • The Decision Transformer: A Conditional Sequential Model in Reinforcement Learning

    The development of Transformers in machine learning has made it possible to model high-dimensional distributions of semantic concepts at scale. However, their applications have mostly been limited to language generation and image generation. This article therefore introduces the Decision Transformer (DT) [42], an offline reinforcement learning (RL) method. The DT uses conditional sequence modelling, which allows it to leverage the simplicity and scalability of the Transformer: an autoregressive generative model is conditioned on the return, previous states, and actions (see the sketch below). This enables the DT to produce future actions that achieve the desired return. In comprehensive experiments on OpenAI Gym against state-of-the-art model-free offline RL baselines, the DT remains extremely competitive and outperforms the other models.
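
    A small sketch of the conditioning signal: the return-to-go is just the suffix sum of the reward sequence, and the model reads interleaved (return-to-go, state, action) tokens.

    ```python
    import numpy as np

    def returns_to_go(rewards):
        """Suffix sums of the reward sequence: the return-to-go tokens the
        Decision Transformer is conditioned on."""
        rtg = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running += rewards[t]
            rtg[t] = running
        return rtg

    # Each timestep contributes interleaved tokens
    #   (R_1, s_1, a_1, ..., R_t, s_t)
    # and the model autoregressively predicts a_t. At evaluation time, R_1 is
    # set to the desired target return and decremented by each observed reward.
    ```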
