[HN Gopher] The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
___________________________________________________________________
The Surprising Effectiveness of PPO in Cooperative Multi-Agent
Games
Author : jonbaer
Score : 48 points
Date : 2021-07-14 15:13 UTC (7 hours ago)
(HTM) web link (bair.berkeley.edu)
(TXT) w3m dump (bair.berkeley.edu)
| isaacimagine wrote:
| PPO is awesome, but so is GPT-style reward-trajectory prediction!
| http://arxiv.org/pdf/2106.01345v1.
|
| As an RL hobbyist, I'd love to see some sort of hybrid approach.
| Thoughts?
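For context, the arXiv link in the comment above is the Decision
Transformer paper, which is what "GPT-style reward-trajectory
prediction" refers to: RL is cast as sequence modeling, with a
transformer trained to predict actions conditioned on past states,
actions, and the return-to-go (the sum of future rewards). A minimal
sketch of the return-to-go signal such a model conditions on; the
function name and the discounting knob are illustrative, not taken
from the paper:

    import numpy as np

    def returns_to_go(rewards, gamma=1.0):
        # Sum of future rewards from each timestep onward; the
        # Decision Transformer conditions on this undiscounted
        # (gamma = 1) quantity instead of learning a value function.
        rtg = np.zeros(len(rewards), dtype=float)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            rtg[t] = running
        return rtg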
| bsder wrote:
| Are these things amazingly effective or are they simply
| demonstrating that Starcraft/DOTA aren't as difficult as we
| thought?
| seabird wrote:
| Neither. Once knowledgeable people get a read on these types of
| things, they can usually handle them. The OpenAI Dota 2 "team"
| was open for the public to play -- it was certainly very good,
| but multiple teams beat it, sometimes even multiple times in a
| row. It was great at cheesy stuff like superhuman Force Staff
| plays that humans could never reliably pull off, but it could be
| beaten through macro pressure.
| ddoran wrote:
| PPO = Proximal Policy Optimization
|
| [https://openai.com/blog/openai-baselines-ppo/]
| jdlyga wrote:
| Thank you!
| Robotbeat wrote:
| Indeed. I looked for the definition in the whole webpage but
| couldn't find it. Even Googling initially failed.
| https://arxiv.org/abs/1707.06347
| throwaway81523 wrote:
| Yeah, did the same, then looked at the linked article. Its
| abstract: Proximal Policy Optimization
| (PPO) is a popular on-policy reinforcement learning
| algorithm but is significantly less utilized than off-
| policy learning algorithms in multi-agent settings. This is
| often due to the belief that on-policy methods are
| significantly less sample efficient than their off-policy
| counterparts in multi-agent problems. In this work, we
| investigate Multi-Agent PPO (MAPPO), a variant of PPO which
| is specialized for multi-agent settings. Using a 1-GPU
| desktop, we show that MAPPO achieves surprisingly strong
| performance in three popular multi-agent testbeds: the
| particle-world environments, the Starcraft multi-agent
| challenge, and the Hanabi challenge, with minimal
| hyperparameter tuning and without any domain-specific
| algorithmic modifications or architectures. In the majority
| of environments, we find that compared to off-policy
| baselines, MAPPO achieves strong results while exhibiting
| comparable sample efficiency. Finally, through ablation
| studies, we present the implementation and algorithmic
| factors which are most influential to MAPPO's practical
| performance.
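For readers wondering what PPO itself optimizes: its core piece is a
clipped surrogate objective that keeps each policy update close to
the policy that collected the data, which is what "proximal" refers
to. A minimal NumPy sketch of that term, not of the MAPPO variant
from the paper; the function name is illustrative and 0.2 is just
the commonly used clipping value:

    import numpy as np

    def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed
        # from log-probabilities for numerical stability.
        ratio = np.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Taking the elementwise minimum means clipping only removes
        # the incentive to push the ratio outside [1-eps, 1+eps];
        # the objective below is maximized (negate it for a loss).
        return np.mean(np.minimum(unclipped, clipped))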
| cratermoon wrote:
| https://jonathan-hui.medium.com/rl-proximal-policy-optimizat...
___________________________________________________________________
(page generated 2021-07-14 23:01 UTC)