  • Optimizing Total Travel Time for Multi-Agent Traffic

    “How long will agents stay in the traffic jam?” is one of the questions that agents care about in autonomous driving. In real life, drivers can check road conditions ahead of or during their trips using Google Maps. However, a car may break down or crash in the middle of the road, blocking one or more lanes in a way that drivers passing by later do not expect. MetaDrive can simulate and visualize multi-agent driving behaviors under such complex road scenarios. Using reinforcement learning, we can model this traffic problem and search for better solutions.

  • Planning with Learned Actionable Object-centric World Model

    To achieve better planning in zero-shot generalization RL tasks, we propose to learn imagined actions over perceived objects in a learned world model. We first reproduced Veerapaneni et al.’s OP3 method [5], which learns a general object-centric world model. Then we implemented our proposed model design to enable imagined actions. We trained it on the block-stacking task that OP3 was evaluated on to confirm that our method still performs correctly. Our method learns meaningful representations that can reconstruct the original image, and the forward dynamics are correct. However, the imagined action does not work well yet, which we suspect is due to both a poorly fitted prior and the limitations of the dataset. We have been building a billiard-ball environment, but due to unexpected bugs and time constraints it is not finished. Once it is, we will be able to evaluate our imagined-action method much more intuitively.

  • Self-Play for Competitive Multi-Agent Gameplay

    Multi-agent reinforcement learning, in which multiple agents are trained in a single, interactive environment, has been shown to develop interesting and novel behaviors. Competitive self-play is a form of multi-agent RL in which multiple versions of a single agent play competitively against each other to improve performance iteratively. In our research, we focused on a self-play model of reinforcement learning for the Atari game Pong. We employed deep Q-learning with a dueling convolutional neural network architecture (sketched below) and experimented with a variety of training methods for our agents. We studied iterative improvement by training an agent against a static rule-based agent, freezing its training while we trained a new agent against it, and repeating this process for several training cycles. We also experimented with full self-play, in which two agents train against each other simultaneously while competing at Pong. Our research demonstrated that the full self-play model substantially outperformed the iterative training model, even after a sequence of five iterative training cycles. This demonstrates the unique learning properties of live self-play in a competitive multi-agent environment like Pong.
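
    Below is a minimal sketch of the dueling architecture, assuming stacked 84×84 grayscale Atari frames; the layer sizes here are illustrative, not necessarily our exact configuration.

    ```python
    import torch.nn as nn

    class DuelingDQN(nn.Module):
        """Dueling CNN for Pong: a shared convolutional trunk feeding separate
        state-value and advantage heads (sizes are illustrative)."""
        def __init__(self, n_actions, in_channels=4):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )
            self.value = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, 1))
            self.advantage = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, n_actions))

        def forward(self, x):  # x: (B, 4, 84, 84), assumed scaled to [0, 1]
            h = self.trunk(x)
            v, a = self.value(h), self.advantage(h)
            # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
            return v + a - a.mean(dim=1, keepdim=True)
    ```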

  • Explore Human-agent Interactions in Real-life Traffic

    In this project, we study the effects of cooperation among multiple agents in avoiding real-world traffic collisions. We consider the following two questions: (1) Is cooperation among agents helpful for reducing the collision rate? (2) Is it safe to keep human drivers involved in a multi-agent system?

  • Heterogeneous Multi-agent Reinforcement Learning

    In this blog, we focus on an algorithm called Embedded Multi-agent Actor–Critic (EMAC), which allows a team of heterogeneous agents to learn decentralized control policies for covering unknown environments. The approach handles real-world environmental factors such as turbulence, delayed communication, and agent loss, and is flexible to dynamic environment elements.

  • Off-policy Meta Reinforcement Learning in a Multi-Agent Competitive Environment

    In this project, we will investigate different applications of meta-RL. Previous investigations have centered on using meta-RL to help single agents quickly learn new, unseen tasks drawn from the same task distribution. Here, we explore how meta-RL-trained agents perform in a competitive multi-agent RL setting when playing against new, unseen opponents. We will specifically implement the PEARL (off-policy meta-RL) algorithm developed by Kate Rakelly et al. Instead of relying on LSTM recurrence in the policy network to memorize skills, PEARL decouples this from the policy by using a network to encode previous experiences into a latent context variable. This latent context variable conditions both the actor (policy) and critic (value) architectures, as sketched below. We will use Derkgym as the multi-agent gym environment to test the PEARL algorithm. We will first train the agent against an opponent that only performs random actions. After collecting this experience, we will expose the agent to new opponents with different skill sets. Using meta-RL, we anticipate that the agent will adapt to new opponents more quickly. The code and video presentation can be accessed using this link: https://drive.google.com/drive/folders/1JTQ2ycmXNRA2OSZ7f-fipg94HBnQCR0V?usp=sharing
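
    A minimal sketch of the PEARL-style context pathway, assuming a simple MLP encoder; PEARL's product-of-Gaussians aggregation is simplified to a mean here for brevity, and all sizes are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class ContextEncoder(nn.Module):
        """Encodes a batch of transitions (s, a, r, s') into a Gaussian posterior
        over the latent context z, in the spirit of PEARL."""
        def __init__(self, transition_dim, latent_dim, hidden=200):
            super().__init__()
            self.latent_dim = latent_dim
            self.net = nn.Sequential(
                nn.Linear(transition_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * latent_dim),  # per-transition mean and log-variance
            )

        def forward(self, transitions):  # transitions: (N, transition_dim)
            mu, logvar = self.net(transitions).split(self.latent_dim, dim=-1)
            # PEARL aggregates per-transition Gaussian factors via a product of
            # Gaussians; we simply average here to keep the sketch short.
            return torch.distributions.Normal(mu.mean(0), logvar.exp().mean(0).sqrt())

    # A sampled z conditions both networks by concatenation, e.g.:
    #   action = actor(torch.cat([state, z], dim=-1))
    #   q_val  = critic(torch.cat([state, action, z], dim=-1))
    ```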

  • Multi-Agent Traffic Learning based on MARL

    The project focuses on traffic simulation: having computer-controlled agents learn to reproduce the behaviors people exhibit when driving in different environments. We plan to reproduce a multi-agent reinforcement learning (MARL) algorithm called Coordinated Policy Optimization (CoPO) for this problem. For evaluation, we plan to deploy the model in a driving simulator called MetaDrive, covering five typical traffic environments, and analyze the behaviors of individual agents.

  • Reinforcement Learning for Football Player Training

    As the most popular sport on the planet, football draws millions of fans who enjoy watching Sergio Agüero, Raheem Sterling, and Kevin De Bruyne on the field. Football video games are less lively but still immensely popular, and we wonder whether AI agents could play them properly. Researchers want to explore AI agents’ ability to play in complex settings like football: the sport requires a balance of short-term control, learned concepts such as passing, and high-level strategy, which can be difficult to teach agents. We apply RL to train the agents and see whether we can achieve good performance.

  • 2 vs 2 Soccer Game

    In this project we investigate Multi-Agent Reinforcement Learning (MARL). Unlike vanilla RL tasks, MARL involves training multiple agents that learn and make decisions based on their interactions with both the environment and other agents. To illustrate the potential of MARL, we build a 2 vs 2 soccer game in Unity and apply different MARL algorithms to it. We then use the Elo score (updated as sketched below) to evaluate the performance of the multi-agent algorithms and compare them to single-agent algorithms running on multi-agent systems. Finally, we design a head-to-head experiment in which two agents trained by different algorithms play against each other, clearly demonstrating how their behavior varies.
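
    For reference, the standard Elo update our evaluation is based on; the K-factor of 32 is a conventional choice, not a fixed requirement.

    ```python
    def elo_update(rating_a, rating_b, score_a, k=32):
        """Standard Elo update after one match: `score_a` is 1 for a win by A,
        0.5 for a draw, 0 for a loss; `k` controls how fast ratings move."""
        expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
        new_a = rating_a + k * (score_a - expected_a)
        new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
        return new_a, new_b

    # Two agents starting at 1200, agent A wins:
    # elo_update(1200, 1200, 1.0)  ->  (1216.0, 1184.0)
    ```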

  • Multi-Agent Reinforcement Learning for Generating Optimal Market Liquidation Strategy

    We will investigate the effectiveness of Multi-Agent Reinforcement Learning (MARL) for obtaining optimal liquidation strategies in financial markets. Liquidation is a sequential decision-making problem in which an agent strategizes to sell a large number of assets within a given time period, taking into account the financial market environment. Since state changes in such an environment are driven not by one but by multiple such agents, we frame this as a multi-agent reinforcement learning problem.

  • Exploring Deep Transformer Q-Networks (DTQN)

    In recent years, transformers have been widely used and studied for a variety of supervised learning tasks, especially in natural language processing (NLP). To exploit their ability to capture long-range dependencies in sequences, along with their scalability, simplicity, and efficiency, transformers have been fused with reinforcement learning. Until the advent of DTQN, however, this was done primarily in an offline RL fashion, with previous methods reformulating RL tasks as sequence prediction. DTQN follows the traditional view of deep Q-learning and can use transformers in an online fashion, as sketched below. In this project, we aim to understand and dissect the inner workings of DTQN.
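
    To make the idea concrete, here is a hedged sketch of a DTQN-style network: a causally masked Transformer encoder over a window of recent observations, emitting Q-values at every position. Hyperparameters are illustrative, not those of the paper.

    ```python
    import torch
    import torch.nn as nn

    class TransformerQNet(nn.Module):
        """DTQN-style sketch: embed recent observations, apply a causally
        masked Transformer encoder, and predict Q-values per timestep."""
        def __init__(self, obs_dim, n_actions, d_model=128, n_layers=2, context_len=50):
            super().__init__()
            self.obs_embed = nn.Linear(obs_dim, d_model)
            self.pos_embed = nn.Embedding(context_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.q_head = nn.Linear(d_model, n_actions)

        def forward(self, obs_seq):  # obs_seq: (B, T, obs_dim), T <= context_len
            T = obs_seq.size(1)
            h = self.obs_embed(obs_seq) + self.pos_embed.weight[:T]
            causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
            h = self.encoder(h, mask=causal)  # no attention to future steps
            return self.q_head(h)             # act greedily on the last position
    ```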

  • Room rearrangement and indoor environment exploration

    We investigate the problem of room rearrangement, where an agent explores a room and then attempts to restore the objects in the room to their original state.

  • Exploring Multi-Agent Reinforcement Learning in Metadrive Simulation

    Autonomous driving has always been one of the most significant fields to which AI can be applied. The problem can be approached by having an agent gradually improve its policy from its experience in the environment, i.e., reinforcement learning. In one scenario, all vehicles are self-driven particles, making this a multi-agent system. This project researches MARL in this setting and explores the optimization of coordinated policies in a multi-agent environment.

  • Human-AI Shared Control via Policy Dissection

    In many complex tasks, an RL-trained policy may not solve the task efficiently or correctly, and training may cost too much time. Policy Dissection is a frequency-based method that can convert an RL-trained policy into a goal-conditioned policy. With this method, a human can interact with the AI to obtain better and more efficient results. In this project, we explore human-AI shared control and the implementation of Policy Dissection, and we will add new actions in the MetaDrive environment.

  • Exploring Generalizability in Autonomous Driving Simulation using MetaDrive

    One of the most significant challenges in artificial intelligence is autonomous vehicle control. The problem is usually modeled as a continuous stream of actions accompanied by feedback from the environment, with the goal of the “agent” being to gradually improve its control policy through experience and system feedback via reinforcement learning. A key facet of testing and improving autonomous driving is the use of simulators, which provide an environment to experiment with, train, and test RL algorithms on benchmarks of success, generalization, and safety, without requiring physical resources and environments. This project seeks to understand the functionalities of different autonomous driving simulators, experiment with multiple algorithms, and then study generalization by comparing training on a single generalized environment versus a cascade of environments, determining the configurations that maximize generalized return.

  • Meta Learning for Reinforcement Learning

    For our project, we plan to apply MAML to more difficult RL task distributions than those used in the original MAML paper. Specifically, we aim to investigate one- and few-shot learning for various locomotion tasks across different creature types.

  • Reinforcement Learning for Autonomous Driving in a Realistic Simulated Urban Environment

    Investigating the performance of various reinforcement learning strategies for self-driving vehicles.

  • Reinforcement Learning Implementation on Financial Portfolio Optimization

    We want to apply reinforcement learning to financial portfolio optimization, one of the most important topics in the asset management industry. We want to show the potential of reinforcement learning in this area and, ideally, demonstrate that our model can create a portfolio that substantially outperforms the main market indices.

  • Speed up Autonomous Driving with Safe Reinforcement Learning in Metadrive

    In MetaDrive, safety is the first priority. We want to explore how to speed up autonomous-driving agents with safe reinforcement learning and find the optimal speed in diverse driving scenarios.

  • Reinforcement Learning for Recommendation Systems

    While traditional recommendation methods have significantly improved user experience, they are often myopic and require a constant feedback loop with the user. In this implementation, we use offline reinforcement learning to build an agent that recommends music tracks to a user so as to minimize the skip rate. We leverage open-sourced Spotify data to build an environment using item embeddings, simulate rewards based on skips in the user session, and train an agent on this offline data of sequential user interactions. The underlying agent utilizes LIRD, an implementation of the Deep Deterministic Policy Gradient algorithm, to optimize recommendations (sketched below). The reward is received from user behavior when deployed online and simulated during training. The model outperforms the baseline model in minimizing the skip rate and generates diverse recommendations.
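
    A hedged sketch of the action-selection step in a DDPG-based recommender like LIRD: the actor outputs a continuous “proto-item” in embedding space, which is mapped to real tracks by similarity. The `actor` callable and the embedding matrix are placeholders for trained components, not a fixed interface.

    ```python
    import numpy as np

    def recommend(actor, user_state, item_embeddings, k=10):
        """Map the DDPG actor's continuous action to concrete tracks.
        `actor(user_state)` is assumed to return a vector in item-embedding
        space; we recommend the k most similar tracks by dot product."""
        proto_action = actor(user_state)         # continuous "ideal item" vector
        scores = item_embeddings @ proto_action  # similarity to every track
        return np.argsort(-scores)[:k]           # indices of the top-k tracks
    ```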

  • Ensemble RL - Apply Ensemble Methods on Reinforcement Models using MetaDrive

    Ensemble methods improve performance by combining multiple models instead of using a single one. They are widely used in many machine learning tasks, but there are few implementations in reinforcement learning. In this project, we apply ensemble methods in reinforcement learning to our autonomous driving task on the MetaDrive platform. We trained Proximal Policy Optimization (PPO), Twin Delayed DDPG (TD3), Generative Adversarial Imitation Learning (GAIL), and Soft Actor-Critic (SAC) models as baselines, and investigated different ensemble methods built on them (one simple scheme is sketched below). Overall, the ensembled model is slightly better than the individual models, and in some cases we gain much better results.
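
    As an example, one simple ensemble scheme can be sketched as a weighted average of the continuous actions proposed by the trained baselines; the `policies` interface here is an assumption, not our exact code.

    ```python
    import numpy as np

    def ensemble_action(policies, obs, weights=None):
        """Average the continuous actions proposed by several trained policies
        (e.g., PPO, TD3, GAIL, SAC). Each policy is assumed to be a callable
        returning a deterministic action for `obs`."""
        actions = np.stack([pi(obs) for pi in policies])  # (n_policies, action_dim)
        if weights is None:
            weights = np.full(len(policies), 1.0 / len(policies))
        return weights @ actions                          # weighted mean action
    ```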

  • Reinforcement Learning in Finance/Trading

    In this project, we aim to apply reinforcement learning techniques to real financial market data. To start, we will adapt the OpenBB Terminal environment and begin with stock market data. Using OpenBB, we are interested in analyzing real-world financial data and discovering how reinforcement learning can be used to predict the market. In the initial stage, we will reproduce the results from the contributors; afterwards, we will explore the system’s effectiveness on various data we gather.

  • Policy Gradient in Active Learning of Graph Neural Networks

    Neural networks have proven powerful and effective in many tasks, but training them usually requires huge amounts of data. In the supervised learning paradigm, labeling data involves professional domain knowledge and thus becomes expensive and sometimes impractical. To address this data shortage, active learning methods have been proposed that focus on efficiently labeling the most valuable data samples to reduce annotation cost. In this project, we investigate the application of policy gradient methods to active learning on Graph Neural Networks (GNNs). We mainly focus on the effect of better state representations and the effectiveness of different policy gradient backbones (a minimal selection-policy sketch follows).
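
    A minimal REINFORCE-style sketch of a node-selection policy for active learning. The reward here is assumed to be the validation-accuracy gain after labeling the chosen node; the exact state features and reward design vary across backbones.

    ```python
    import torch

    def select_node(policy, candidate_states):
        """Sample which unlabeled node to query, keeping its log-probability.
        `policy` scores candidate nodes; the state features are assumed inputs."""
        dist = torch.distributions.Categorical(logits=policy(candidate_states))
        node = dist.sample()
        return node.item(), dist.log_prob(node)

    def reinforce_update(log_prob, reward, optimizer, baseline=0.0):
        """REINFORCE: reinforce selections whose labeling improved validation
        accuracy (our assumed reward), relative to a baseline."""
        loss = -(reward - baseline) * log_prob
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```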

  • Autonomous driving based on CARLA

    In recent years, tech giants have set off a boom in the development of autonomous driving: Waymo began as the Google self-driving car project in 2009, and Tesla Motors announced the first version of Autopilot in 2014. Our team would likewise like to explore autonomous driving via reinforcement learning in the CARLA simulator.

  • Transformer in Reinforcement Learning

    Decision Transformers are a new type of machine learning model that combines transformers with reinforcement learning, opening up new avenues of research and application.

  • Design Our Model Based on What is Provided by CoPO

    The project consists of two parts. In the first part, we will reproduce the results of Independent Policy Optimization (IPO, implemented with PPO) [5], MFPO, CL, and CoPO, attempting some fine-tuning to explore possibly better results. In the second part, we will design, implement, train, and test our own models, building on what is provided by CoPO.

  • The Emergence of Roles in Multi-agent Traffic Learning

    We are interested in applying role-based multi-agent reinforcement learning (MARL) to self-driven particle systems. Specifically, we are inspired by role-based hierarchical MARL frameworks (Wang et al., 2020a; Wang et al., 2020b), which explicitly model agents as having different roles, with each role having its own policy. In this way, the policy search space is greatly reduced, since the joint action space over roles is much smaller than the joint action space over all agents. We hope to explore this idea further in the MetaDrive environment to study the emergence of roles in the traffic setting. In summary, our group hopes to apply various hierarchy-based MARL algorithms to autonomous driving scenarios and compare our results with existing methods such as (Peng et al., 2021).

  • RL-based Trajectory Planning for Dental Surgery

    With advances in robotic technologies, artificial intelligence (AI) and machine learning (ML) play a pivotal role in medical surgery: in medical imaging, prognosis/diagnosis, treatment assistance, and automation of repetitive surgical subtasks. In dental surgery specifically, however, reinforcement learning (RL) has not been actively applied, even though the domain has much in common with robot machining, where RL is widely used for fully automated trajectory generation. Therefore, by developing an RL-based tool trajectory planner, we aim to enable a robot arm to automatically perform a few subtasks in dental surgery, such as tooth preparation for a crown through removal of a cavity. We will use CoppeliaSim for the simulation environment, Python/PyTorch for RL, and MATLAB for robotics. The expected result of this project is an RL-based, collision-free, fully automated dental tool trajectory generation algorithm.

  • The Decision Transformer: A Conditional Sequential Model in Reinforcement Learning

    The development of Transformers in machine learning has made it possible to model high-dimensional distributions of semantic concepts at scale. However, their applications have mostly been limited to language generation and image generation. This article therefore introduces the Decision Transformer (DT) [42], an offline reinforcement learning (RL) method. The DT uses conditional sequence modelling, which allows it to leverage the simplicity and scalability of the Transformer: an autoregressive generative model is conditioned on the return, previous states, and actions (see the sketch below). This enables the DT to produce future actions that achieve the desired return. In comprehensive experiments on OpenAI Gym against state-of-the-art model-free offline RL baselines, the DT remains extremely competitive and outperforms the other models.
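
    A small sketch of the conditioning signal: the return-to-go is just the suffix sum of the reward sequence, and the model reads interleaved (return-to-go, state, action) tokens.

    ```python
    import numpy as np

    def returns_to_go(rewards):
        """Suffix sums of the reward sequence: the return-to-go tokens the
        Decision Transformer is conditioned on."""
        rtg = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running += rewards[t]
            rtg[t] = running
        return rtg

    # Each timestep contributes interleaved tokens
    #   (R_1, s_1, a_1, ..., R_t, s_t)
    # and the model autoregressively predicts a_t. At evaluation time, R_1 is
    # set to the desired target return and decremented by each observed reward.
    ```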
