https://spectrum.ieee.org/200-years-ago-faraday-invented-the-electric-motor

200 Years Ago, Faraday Invented the Electric Motor
After Faraday published his results, his mentor accused him of plagiarism
Allison Marsh | 27 Aug 2021 | 7 min read

Michael Faraday created this model of his electric motor in 1822, a year after his discovery. Royal Institution of Great Britain/Science Source
In 1820, the Danish physicist Hans Christian Ørsted threw electromagnetic theory into a state of confusion. Natural philosophers of the day believed that electricity and magnetism were two distinct phenomena, but Ørsted suggested that the flow of electricity through a wire created a magnetic field around it. The French physicist André-Marie Ampère saw a demonstration of Ørsted's experiment in which an electric current deflected a magnetic needle, and he then developed a mathematical theory to explain the relationship. English scientist Michael Faraday soon entered the fray when Richard Phillips, editor of the Annals of Philosophy, asked him to write a historical account of electromagnetism, a field that was only about two years old and clearly in a state of flux.

Faraday was an interesting choice for this task, as Nancy Forbes and Basil Mahon recount in their 2014 book Faraday, Maxwell, and the Electromagnetic Field. Born in 1791, he received only a barebones education at a church school in his village of Newington, Surrey (now part of South London). At the age of 14 he was apprenticed to a bookbinder. He read many of the books he bound and continued to look for opportunities to learn more.

In a fateful turn of events, just as Faraday's apprenticeship was coming to an end in 1812, one of the bookbinder's clients offered Faraday a ticket to Humphry Davy's farewell lecture series at the Royal Institution of Great Britain. Davy, just 13 years older than Faraday, had already made a name for himself as a chemist. He had discovered sodium, potassium, and several compounds, and invented the miner's safety lamp. Plus he was a charismatic speaker. Faraday took detailed notes of the lectures and sent a copy to Davy with a request for employment. When a position opened as a chemistry assistant at the Royal Institution, Davy hired Faraday.

After Faraday [left] failed to acknowledge his mentor, Humphry Davy [right], in an 1821 paper on the electric motor, Davy accused him of plagiarism. Left: Ullstein Bild/Getty Images; Right: Bettmann/Getty Images

Davy mentored Faraday and taught him the principles of chemistry. Faraday had an insatiable curiosity, and his reputation at the Royal Institution grew. But when Phillips asked Faraday to write the review article for the Annals, he had only dabbled in electromagnetism and was a bit daunted by Ampère's mathematics. At heart, Faraday was an experimentalist, so in order to write a thorough account, he re-created Ørsted's experiments and tried to follow Ampère's reasoning. His "Historical Sketch of Electro-Magnetism," published anonymously in the Annals, described the state of the field, the current research questions and experimental apparatus, the theoretical developments, and the major players. (For a good summary of Faraday's article, see Aaron D. Cobb's "Michael Faraday's 'Historical Sketch of Electro-Magnetism' and the Theory-Dependence of Experimentation," in the December 2009 issue of Philosophy of Science.)

While reconstructing Ørsted's experiments, Faraday was not entirely convinced that electricity acted like a fluid, running through wires just as water runs through pipes. Instead, he thought of electricity as vibrations resulting from tension between conducting materials. These thoughts kept him experimenting.
"Very satisfactory," he wrote in his notebook. On 3 September 1821, Faraday observed the circular rotation of a wire as it was attracted and repelled by magnetic poles. He sketched in his notebook a clockwise rotation around the south pole of the magnet, and the reverse around the north pole. "Very satisfactory," he wrote in his entry on the day's experiment, "but make more sensible apparatus." The next day, he got it right. He took a deep glass vessel, secured a magnet upright in it with some wax, and then filled the vessel with mercury until the magnetic pole was just above the surface. He floated a stiff wire in the mercury and connected the apparatus to a battery. When a current ran through the circuit, it generated a circular magnetic field around the wire. As the current in the wire interacted with the permanent magnet fixed to the bottom of the dish, the wire rotated clockwise. On the other side of the apparatus, the wire was fixed and the magnet was allowed to move freely, which it did in a circle around the wire. For a helpful animation of Faraday's apparatus, see this tutorial created by the National High Magnetic Field Laboratory. And if you'd like to build your own Faraday motor, this video will walk you through it: Although a great proof of concept, Faraday's device was not exactly useful, except as a parlor trick. Soon, people were snatching up pocket-size motors as novelty gifts. Although Faraday's original motor no longer exists, one that he built the following year does; it's in the collections of the Royal Institution and pictured at top. This simple-looking contraption is the earliest example of an electric motor, the first device to turn electrical energy into mechanical motion. The fallout from Faraday's invention Faraday knew the power of quick publication, and in less than a month he wrote an article, "On Some New Electromagnetic Motions and the Theory of Electromagnetism," which was published in the next issue of the Quarterly Journal of Science, Literature, and the Arts. Unfortunately, Faraday did not appreciate the necessity of fully acknowledging others' contributions to the discovery. Within a week of publication, Humphry Davy dealt his mentee a devastating blow by accusing Faraday of plagiarism. Davy had a notoriously sensitive ego. He was also upset that Faraday failed to adequately credit his friend William Hyde Wollaston, who had been studying the problem of rotary motion with currents and magnets for more than a year. Faraday mentions both men in his article, as well as Ampere, Orsted, and some others. But he doesn't credit anyone as a collaborator, influencer, or codiscoverer. Faraday didn't work directly with Davy and Wollaston on their experiments, but he did overhear a conversation between them and understood the direction of their work. Plus it was (and still is) a common practice to credit your adviser in early publications. When Faraday's reputation began to eclipse that of his mentor's, Faraday made several missteps while navigating the cutthroat, time-sensitive world of academic publishing. Faraday fought to clear his name against the charge of plagiarism and mostly succeeded, although his relationship with Davy remained strained. When Faraday was elected a fellow of the Royal Society in 1824, the sole dissenting vote was cast by the society's president, Humphry Davy. Faraday avoided working in the field of electromagnetism for the next few years. 
Whether that was his own choice or a choice thrust upon him by Davy's assigning him time-consuming duties within the Royal Institution is an open question. One of Faraday's assignments was to salvage the finances of the Royal Institution, which he did by reinvigorating the lecture series and introducing a popular Christmas lecture. Then in 1825 the Royal Society asked him to lead the Committee for the Improvement of Glass for Optical Purposes, an attempt to revive the British glass industry, which had lost ground to French and German lens makers. This was tedious, bureaucratic work that Faraday undertook as a patriotic duty, but the drudgery and relentless failures took a mental toll.

Faraday's experiments of 1831 yield the transformer and the dynamo

In 1831, two years after Davy's death and after the completion of Faraday's work on the glass committee, he returned to experimenting with electricity, by way of acoustics. He teamed up with Charles Wheatstone to study sound vibrations. Faraday was particularly interested in how sound vibrations could be seen when a violin bow is pulled across a metal plate lightly covered with sand, creating distinct patterns known as Chladni figures. The phenomenon can be seen in the YouTube video "Resonance Experiment! (Full Version - With Tones)."

Faraday looked at nonlinear standing waves that form on liquid surfaces, which are now known as Faraday waves or Faraday ripples. He published his research, "On a peculiar class of acoustical figures; and on certain forms assumed by groups of particles upon vibrating elastic surfaces," in the Royal Society's Philosophical Transactions.

Still convinced that electricity was somehow vibratory, Faraday wondered if electric current passing through a conductor could induce a current in an adjacent conductor. This led him to one of his most famous inventions and experiments: the induction ring. On 29 August 1831, Faraday detailed in his notebook his experiment with a specially prepared iron ring. He wrapped one side of the ring with three lengths of insulated copper wire, each about 24 feet (7 meters) long. The other side, he wrapped with about 60 feet (18 meters) of insulated copper wire. (Although he only describes the assembled ring, it likely took him many days to wrap the wire. Modern experimenters who built a replica spent 10 days on it.) He then began charging one side of the ring and looking at the effects on a magnetic needle a short distance away. To his delight, he was able to induce an electric current from one set of wires to the other, thus creating the first electric transformer.

Faraday's 29 August 1831 notebook entry describes his experiment with a wire-bound iron induction ring--the first electric transformer. Hulton Archive/Getty Images

Faraday continued experimenting into the fall of 1831, this time with a permanent magnet. He discovered that he could produce a constant current by rotating a copper disk between the two poles of a permanent magnet. This was the first dynamo, and the direct ancestor of truly useful electric motors.

Two hundred years after the discovery of the electric motor, Michael Faraday is rightfully remembered for all of his work in electromagnetism, as well as his skills as a chemist, lecturer, and experimentalist.
But Faraday's complex relationship with Davy also speaks to the challenges of mentoring (and being mentored), publishing, and holding (or not) personal grudges. It is sometimes said that Faraday was Davy's greatest discovery, which is a little unfair to Davy, a worthy scientist in his own right. When Faraday's reputation began to eclipse that of his mentor, Faraday made several missteps while navigating the cutthroat, time-sensitive world of academic publishing. But he continued to do his job--and do it well--creating lasting contributions to the Royal Institution. A decade after his first breakthrough in electromagnetism, he surpassed himself with another. Not bad for a self-taught man with a shaky grasp of mathematics.

Part of a continuing series looking at photographs of historical artifacts that embrace the boundless potential of technology. An abridged version of this article appears in the September 2021 print issue as "The Electric Motor at 200."

Allison Marsh is a professor at the University of South Carolina and codirector of the university's Ann Johnson Institute for Science, Technology & Society. She combines her interests in engineering, history, and museum objects to write the Past Forward column, which tells the story of technology through historical artifacts.

Greedy AI Agents Learn to Cooperate
How to overcome reinforcement learning's inherent selfishness
Somdeb Majumdar | 02 Sep 2021 | 11 min read

Small white figures collectively form a brain shape on a pink and orange background. Getty Images

Imagine you're sitting at a casino's poker table. Someone has explained the basic rules to you, but you've never played before and don't know even the simplest strategies. While this might sound like the setup for an anxiety dream, it's also a fair analogy for the beginning of a training session for a certain kind of artificial intelligence (AI) program.
If an AI system were confronted with such a situation, it would commence taking random actions within the parameters of the rules--if playing five-card draw, for example, it wouldn't ask for seven cards. When, by dumb luck, it won a hand, it would take careful note of the actions that led to that reward. If it played the game for long enough, perhaps playing millions of hands, it could devise a good strategy for winning.

This type of training is known as reinforcement learning (RL), and it's one of the most exciting areas of machine learning today. RL can be used to teach agents, be they pieces of software or physical robots, how to act to achieve certain goals. And it has been responsible for some of the most impressive triumphs by AI in recent years, such as AlphaGo's win at the board game of Go in a match against a top-ranked human professional.

RL differs from another approach called supervised learning, in which systems are trained using an existing labeled dataset. To continue the poker example: In a supervised-learning regimen, the AI player would ingest data about millions of hands. Each data point would be labeled to describe how good or bad an action is for a given state of the game. This would allow the player to take good actions when it sees game states similar to those in the training data. This isn't a very practical way to train on such sequential decision-making problems, though, since building a dataset with a massive number of game states and actions is intractable.

In contrast, RL offers a more effective way of training by allowing the player to interact with the world during the training. You don't need a labeled dataset for RL, which is a big advantage when dealing with real-world applications that don't come with heaps of carefully curated observations. What's more, RL agents can learn strategies that enable them to act even in an uncertain and changing environment, taking their best guesses at the appropriate action when confronted with a new situation.

One typical critique of RL is that it's inefficient--that it's just a glorified trial-and-error process that succeeds because of the brute-force computing power put to it. But my research group at Intel's AI Lab has devised efficient techniques that can leverage RL for practical breakthroughs. We've been working on RL agents that can quickly figure out extremely complex tasks and that can work together in teams, putting the group's overall objective ahead of their own individual goals. We're planning soon to test our methods in robots and other autonomous systems to bring these achievements into the real world.

In RL, we assume that the agent operates within some sort of dynamic environment and that it can at least partially observe the state of that environment. For example, an autonomous vehicle could sense the raw pixel values from an on-board camera, or it could take in more processed data like the location of pedestrians, cars, and lane markings. The environment must also reinforce the agent's actions with certain kinds of feedback--whether an autonomous vehicle reaches its destination without incident or crashes into a wall, say. This feedback signal is typically called the reward. In modern RL, the agents are typically deep neural networks, ones that map input observations to output actions.
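To make that loop concrete, here is a minimal sketch of the interaction just described: an agent observes a state, picks an action, and the environment answers with the next state and a reward. The environment, the reward values, and the random policy are toy placeholders invented for illustration, not anything from the systems discussed in this article.

```python
import random

class ToyEnvironment:
    """A stand-in environment: states are integers 0..5, and reaching 5 ends an episode."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 or +1; the reward arrives only at the goal (sparse feedback)
        self.state = max(0, self.state + action)
        done = self.state == 5
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state):
    # An untrained agent simply acts at random within the rules
    return random.choice([-1, +1])

env = ToyEnvironment()
experiences = []                      # (state, action, next_state, reward) tuples
for episode in range(100):
    state, done, steps = env.reset(), False, 0
    while not done and steps < 1000:
        action = random_policy(state)
        next_state, reward, done = env.step(action)
        experiences.append((state, action, next_state, reward))
        state, steps = next_state, steps + 1
```

The list of transitions collected this way is exactly the kind of record that the replay buffer described next is meant to store.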
A common procedure is for an RL agent to begin by taking a bunch of random actions and logging the feedback signal for each, storing all of these in what's known as a replay buffer--essentially, the agent's memory. Over time, the agent creates a large dataset of experiences that are in the form of a state, an action, the next state, and any resulting reward.

Using this data, the agent trains itself and comes up with a policy, or a way of acting in the environment, that will maximize its total reward. Its policy should get better over time as it learns, but the agent doesn't know whether its policy is optimal at any given point. So it has to make a decision: Should it keep choosing actions based on its current policy or deviate from it and explore new possibilities? If it chooses the former, it will never improve. Most RL agents therefore have an important mandate to sometimes ignore their current best policy in favor of trying new things. How often agents go "off-policy" is an extra parameter of the training system. Often, the exploration rate is kept high in the beginning of training and lowered as the agent gains experience. Whether we're dealing with an AI poker player, an autonomous vehicle, or a virtual stock trader, this tension between exploitation of an existing policy and exploration of alternatives is fundamental to RL.

The challenges get even greater when an agent is acting in an environment with sparse rewards. In this situation, the environment provides a feedback signal very rarely--perhaps only at the end of a long multi-step task. So most of the agent's actions produce no helpful feedback. For example, our hypothetical AI poker player would only get a positive reward if it won a hand, not if it had a good hand but was narrowly beaten by another player. The sparser the rewards, the more difficult the problem.

To test RL agents' abilities in such tricky situations, many researchers have made use of a benchmark created by OpenAI called MuJoCo Humanoid. Here, researchers must train a computer model of a 3D humanoid figure to walk for a fixed amount of time without falling. While walking sounds simple enough, it's an incredibly difficult task for an RL system to master. The RL agent's observations include the angles of all the humanoid's joints, each of which has three degrees of freedom. With such a complex array of possible states, a policy of random actions is almost guaranteed to fail. It's incredibly rare that the humanoid stays on its feet and takes enough successful steps to achieve a reward.

We came up with a novel solution, which we call CERL: Collaborative Evolutionary Reinforcement Learning. Our paper about it demonstrated that the challenge at hand can be broken down into two kinds of components: smaller problems for which the system can get some immediate feedback and the larger optimization problem that needs to be solved over a longer time span. We argued that for each of those smaller problems, we could make faster progress with a population of agents that jointly explore and share experiences. For our hypothetical AI poker player, this would be the equivalent of suddenly spawning many avatars and having them all play hands simultaneously to collectively come up with a strategy.
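What "jointly explore and share experiences" might look like in code is sketched below. The class and variable names are hypothetical, not the actual CERL implementation: a small population of learners all write to, and sample from, a single replay buffer, so each learner trains on experience gathered by the whole group.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """One memory bank contributed to, and drawn from, by every learner."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

class Learner:
    """One member of the population, pursuing its own small objective."""
    def __init__(self, name, shared_buffer):
        self.name = name
        self.buffer = shared_buffer            # every learner shares this same object

    def explore(self):
        # Placeholder for one environment interaction; a real learner would run
        # its policy here and record (state, action, next_state, reward).
        transition = (random.random(), random.choice([-1, 1]), random.random(), 0.0)
        self.buffer.add(transition)

    def train_step(self, batch_size=32):
        batch = self.buffer.sample(batch_size)
        # Update this learner's own policy using experiences gathered by everyone.
        return len(batch)

shared = SharedReplayBuffer()
learners = [Learner(f"learner-{i}", shared) for i in range(4)]
for _ in range(1000):
    for learner in learners:
        learner.explore()
        learner.train_step()
```

In CERL, as described next, each learner in such a population focuses on its own small objective while benefiting from the exploration done by all the others.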
For the MuJoCo Humanoid challenge, we had many learners working on smaller problems such as not falling over, raising a foot, and so forth. The learners received immediate feedback as they tried to achieve these small goals. Each learner thus became an expert in its own skill area, skills that could contribute to the overall goal of sustained walking--although each learner had no chance of attaining that larger objective itself.

In our approach to the MuJoCo Humanoid challenge, a number of "learners" worked on discrete skills, which an "actor" later put together into a complete walking strategy. Intel AI

In a standard RL process, each agent has its own replay buffer, the memory bank it uses to learn what actions are good or bad. But in our design, we allowed all learners simultaneously to contribute to and draw from a single buffer. This meant that each learner could access the experiences of all the others, helping its own exploration and making it significantly more efficient at its own task. That's because, even while they were tackling discrete problems, they were all learning the same rules of basic physics.

A second set of agents, which we called actors, aimed to synthesize all the small movements to achieve the larger objective of sustained walking. Because these agents rarely came close enough to this objective to register a reward, we didn't use RL here. Instead, we employed what's known as a genetic algorithm, a procedure that mimics biological evolution by natural selection. Genetic algorithms, which are a subtype of evolutionary algorithms, start with a population of possible solutions to a problem and use a fitness function to gradually evolve toward the optimal solution. In each "generation" we initialized a set of actors, each with a different strategy for carrying out the walking task. We then ranked them by performance, retained the top performing ones, and threw away the rest. The next generation of actors were the "offspring" of the survivors and inherited their policies, though we varied these policies via both mutation (random changes in a single parent's policy) and crossover (mixing two parents' policies).

Our system outperformed other baselines--Neuroevolution, TD3, and ERL--on the complex MuJoCo Humanoid task. Intel AI

On its own, evolutionary search is known to be extremely slow and inefficient, since it requires a lot of inputs to come up with a good solution. But it's also renowned for its completeness--if a solution exists, it will be found eventually. Our goal was to make use of this completeness while boosting search speed by exploiting fast RL methods. Our RL learners quickly provided reasonably good but sub-optimal solutions, which we inserted into the evolutionary search population to guide our actors towards better solutions. Our hybrid system speedily arrived at an optimal policy that enabled the MuJoCo Humanoid to go for a stroll, and greatly outperformed other algorithms at that time.

While a sparsity of rewards makes RL hard enough, it becomes even more complicated when a task requires several agents to work cooperatively to achieve a common goal. For example, in a benchmark involving simulated Mars rovers, two rovers have to work together to find multiple targets in the shortest amount of time.
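Before getting to how the rovers were trained, here is a compact sketch of the evolutionary outer loop just described, which the multi-agent approach below reuses for whole teams. Everything in it is illustrative: the flat parameter vectors, the fitness function, and the way RL-trained policies are injected are stand-ins for the real rollout-based scoring and network weights.

```python
import random

POLICY_SIZE = 8      # toy policy: a flat vector of parameters standing in for network weights

def random_policy():
    return [random.uniform(-1, 1) for _ in range(POLICY_SIZE)]

def fitness(policy):
    # Stand-in score; a real fitness would come from rolling the policy out
    # in the environment (e.g., how long the humanoid walked).
    return -sum(p * p for p in policy)

def mutate(policy, rate=0.1):
    # Random changes to a single parent's policy
    return [p + random.gauss(0, 0.1) if random.random() < rate else p for p in policy]

def crossover(parent_a, parent_b):
    # Mix two parents' policies, parameter by parameter
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def next_generation(population, injected_rl_policies, keep=4):
    # Inject the RL learners' current policies, rank everyone by fitness,
    # keep the best, and breed offspring from the survivors.
    candidates = population + injected_rl_policies
    survivors = sorted(candidates, key=fitness, reverse=True)[:keep]
    offspring = [
        mutate(crossover(random.choice(survivors), random.choice(survivors)))
        for _ in range(len(population) - keep)
    ]
    return survivors + offspring

population = [random_policy() for _ in range(12)]
for generation in range(50):
    rl_policies = [random_policy() for _ in range(2)]   # placeholders for learner policies
    population = next_generation(population, rl_policies)
```

The key property being exploited is that evolution never needs a per-step reward; it only ranks complete policies, which is why it can cope with a sparse global objective that RL alone struggles with.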
For this task, we needed to train the individual rovers not only on skills like navigation, but also on cooperative strategies that would allow a pair of rovers to achieve a joint objective, even without communicating directly.

Here, the global objective is for the team as a whole to visit the largest number of targets. To achieve that objective, each rover needs to learn how to navigate quickly to a target and also needs to learn how to strategize with its partner. At first, the rovers explore the landscape randomly, using LIDAR sensors to scan for targets. Over a given time interval, one rover might well stumble across a target, so we say that the local objective of navigating to a target has dense rewards. The global objective is achieved only if both rovers find targets, which is a much sparser reward signal.

Imagine that both rovers have a certain target in view. Rover 1 has just enough fuel to reach the target, but can go no farther. In this scenario, the best team strategy is for Rover 1 to go to that visible destination and for Rover 2 to sacrifice its local objective--minimizing its time to a target--and to head out in search of other targets.

In a benchmark involving simulated rovers, the agents have to work together to achieve the overall goal. Intel AI

This problem can be made tougher still by adding another requirement. Imagine that the teams are larger and that several rovers must reach a target simultaneously for it to count. This condition represents situations like search and rescue, where multiple agents might be needed to complete a task, such as lifting a heavy beam. If fewer than the required number of rovers reach a target, they receive no reward at all. The rovers therefore have to learn the skills needed to find a target and must also learn to link up with others and visit targets together to achieve the team's global objective. What's more, at the outset, the rovers on a team don't know how many rovers must visit a target together--they get that information only when they are successful.

To tackle this difficult multi-agent task, we extended our CERL framework. We presented our new technique, which we call Multiagent Evolutionary Reinforcement Learning (MERL), at the 2020 International Conference on Machine Learning. We again broke down the problem into two parts. Each rover used RL to master a local objective, such as reducing its distance to a target. But that success didn't help with the larger problems of forming alliances and maximizing the total number of targets visited.

Again, we solved the global problem with evolutionary search. This time, we were working with teams, so we essentially made many copies of an entire rover team. Across those teams, all the Rover 1s shared a single replay buffer, as did all the Rover 2s, and so forth. We deliberately separated the replay buffers by rovers because it allowed each to focus on its own local learning. (We've run similar experiments with a simulated soccer team, where this approach enables goalies, strikers, and other players to learn different skills.) Because each target was counted only when enough rovers reached it, the rovers were required to work together.
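A rough sketch of that arrangement, with invented names and placeholder reward logic rather than the real MERL code: many copies of a team act in parallel, every rover in a given role shares one replay buffer for its dense local objective, and the sparse team score is left to the evolutionary search.

```python
import random
from collections import defaultdict, deque

NUM_TEAMS, ROVERS_PER_TEAM = 8, 2

# One replay buffer per rover role: every "Rover 1" across all team copies shares
# buffers["rover-1"], every "Rover 2" shares buffers["rover-2"], and so on.
buffers = defaultdict(lambda: deque(maxlen=100_000))

def local_step(team_id, role):
    # Placeholder for one rover interaction: a dense local reward, such as
    # the change in distance to the nearest target.
    transition = (random.random(), random.randrange(4), random.random(), -random.random())
    buffers[role].append(transition)

def team_score(team_id):
    # Placeholder for the sparse global objective: the number of targets
    # visited by enough rovers at once. Evolution, not RL, optimizes this.
    return random.randint(0, 4)

for generation in range(10):
    for team_id in range(NUM_TEAMS):
        for rover in range(ROVERS_PER_TEAM):
            local_step(team_id, role=f"rover-{rover + 1}")
    scores = {team_id: team_score(team_id) for team_id in range(NUM_TEAMS)}
    survivors = sorted(scores, key=scores.get, reverse=True)[:NUM_TEAMS // 2]
    # The surviving teams' policies would be mutated and crossed over to form the
    # next generation, while each role keeps learning from its shared buffer.
```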
Just as in CERL, locally optimized policies were injected into the evolutionary search, which could try out the best policies from the Rover 1s, Rover 2s, and so on. Evolution only needed to deal with the larger team strategy.

We compared the performance of MERL with that of another state-of-the-art system for multi-agent RL, the MADDPG algorithm from the University of California, Berkeley. First we tested our virtual robots on the simpler rover problem where only one rover must reach a target. We found that MERL reached more targets than MADDPG, and also saw interesting team behavior emerge in MERL.

In our system [right], the red rover sacrifices its local objective to help the team. In this example, only one rover must reach a target for it to be counted. Intel AI

In one example, Rovers 1 and 2 both set out towards the same target, but Rover 1 changed course mid-way and headed for a different target. That made sense: If both rovers reached the target, they wouldn't score additional points. So Rover 1 scrapped its local objective and instead took the longer route to a different target--for the greater good of the team.

In this example, three rovers must reach a target for it to be counted. Our system [right] can handle this challenging task, but others failed at it. Intel AI

When three rovers had to reach a target simultaneously, MADDPG completely failed and MERL's emergent team formation was even more obvious, a trend we increasingly observed as the required number of rovers mounted. We checked our work using several different multi-agent benchmarks. In each case, the two-part optimization of MERL substantially outperformed existing state-of-the-art algorithms.

At Intel's AI Lab, we're also exploring how communication could help multi-agent systems optimize performance. In particular, we're investigating whether agents on a team that are in communication with one another can form a language of sorts. To give an example from the rover simulation: If we allowed each rover a limited bandwidth to communicate with the others, what kind of messages would it transmit? And would the rovers jointly come up with codewords for certain actions? This experiment could give us insight into how language develops to achieve a common goal.

Autonomous systems of many forms are now becoming part of everyday life. While your Roomba is unlikely to do much damage, even if it went haywire, a robotic truck driving down the highway erratically could kill people. So we need to ensure that any agent that's been trained via RL will operate safely in the real world. How to do that, though, isn't particularly clear. We're exploring ways to define a common safety benchmark for various RL algorithms and a common framework which can be used to train RL agents to operate safely, regardless of the application. This is easier said than done, because an abstracted concept of safety is hard to define, and a task-specific definition of safety is hard to scale across tasks.
It's important to figure out now how to get such systems to act safely, because we believe that RL systems have a big role to play in society. Today's AI excels at perception tasks such as object and speech recognition, but it's ill-suited to taking actions. For robots, self-driving cars, and other such autonomous systems, RL training will enable them to learn how to act in an environment with changing and unexpected conditions.

In one ongoing test of our theories, we're using RL combined with search algorithms to teach robots how to develop successful trajectories with minimal interaction with the real world. This technique could allow robots to try out new actions without the risk of damaging themselves in the process. We're now applying knowledge acquired in this way to an actual bipedal robot at Oregon State University.

Finally, in a leap from robotic systems to systems design, we have applied the same approach to improving various aspects of software and hardware systems. In a recent paper, we demonstrated that an RL agent can learn how to efficiently perform memory management on a hardware accelerator. Our approach, Evolutionary Graph RL, was able to almost double the speed of execution on hardware compared to the native compiler, simply by efficiently allocating chunks of data to various memory components. This accomplishment and other recent works by the research community show that RL is moving from solving games to solving problems in real life.