[HN Gopher] Scaling Up Reinforcement Learning for Traffic Smoothing
       ___________________________________________________________________
        
       Scaling Up Reinforcement Learning for Traffic Smoothing
        
       Author : saeedesmaili
       Score  : 69 points
       Date   : 2025-04-02 12:41 UTC (3 days ago)
        
 (HTM) web link (bair.berkeley.edu)
 (TXT) w3m dump (bair.berkeley.edu)
        
       | efavdb wrote:
       | Pretty interesting. I'm surprised that the throughput drops with
       | traffic, especially all the way to zero in the first plot.
       | 
       | Definitely frustrating to drive through these and great point
       | that it's bad for efficiency.
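A classical textbook model shows why flow can fall all the way to zero. The Greenshields model assumes speed drops linearly with density, so flow q = v_f * k * (1 - k / k_j) peaks at half the jam density and returns to zero when cars are packed bumper to bumper. This is a minimal sketch under that assumption, not the model behind the article's plot. The free-flow speed (80 km/h), jam density (120 veh/km) and the helper name `greenshields_flow` are illustrative.

```python
def greenshields_flow(density_veh_per_km: float,
                      free_speed_kmh: float = 80.0,
                      jam_density_veh_per_km: float = 120.0) -> float:
    """Flow (vehicles/hour/lane) under the Greenshields linear speed-density model.

    Hypothetical helper; default parameters are illustrative, not measured.
    """
    if not 0.0 <= density_veh_per_km <= jam_density_veh_per_km:
        raise ValueError("density must lie between 0 and the jam density")
    # Speed falls linearly from the free-flow speed to zero at jam density.
    speed_kmh = free_speed_kmh * (1.0 - density_veh_per_km / jam_density_veh_per_km)
    # Flow = density * speed.
    return density_veh_per_km * speed_kmh


if __name__ == "__main__":
    for k in (0, 30, 60, 90, 120):
        print(f"{k:>3} veh/km -> {greenshields_flow(k):6.0f} veh/h")
```

Past the peak, every extra car lowers the flow. A jammed road therefore carries fewer vehicles per hour than a moderately busy one.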
        
       | nn3 wrote:
       | Does this really need reinforcement learning? It seems like
       | something that a classical controller should be able to do.
        
         | evinitsky wrote:
         | One of the authors here. It's a somewhat nuanced answer. In
         | principle, I think a classical controller would have been fine
         | here and if you read the paper (might be in one of the other
         | papers) we do benchmark a bunch of them. But what's really nice
         | about RL is what it does to the workflow. We can add a sensor,
         | drop a sensor, change the dynamics of the system, and have a
         | functional controller the next day. It trades compute for
         | control engineer time. On a secondary, smaller point: the
         | dynamics of the cruise control cars are an unpleasant switched
         | system, and there's a lot of partial observability. We never
         | fully sense the traffic state, we didn't even have direct
         | measurements of the distance to the car in front, and the
         | individual car control decisions are coupled to macroscopic
         | effects on the system, i.e., since all the cars have the same
         | policy, their decisions actually affect the traffic flow. So
         | it's not a trivial control design problem at all.
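For comparison, here is one classical baseline of the kind nn3 has in mind: a constant-time-headway gap controller. It tracks a desired gap of s0 + h * v and damps the speed difference to the car ahead. This is a minimal sketch with made-up gains and limits, not one of the controllers benchmarked in the paper. The class and parameter names are hypothetical. Note that it takes the gap and the lead car's speed as direct inputs. Per the author, the deployed cars did not have those, which is where the partial observability bites.

```python
from dataclasses import dataclass


@dataclass
class CTHController:
    """Constant-time-headway gap controller (hypothetical classical baseline).

    Gains and limits are illustrative, not tuned values from the BAIR work.
    """
    headway_s: float = 1.5         # desired time gap to the car ahead
    standstill_gap_m: float = 5.0  # desired gap when stopped
    k_gap: float = 0.2             # 1/s^2, weight on gap error
    k_speed: float = 0.6           # 1/s, weight on relative speed
    a_min: float = -3.0            # m/s^2, comfort braking limit
    a_max: float = 1.5             # m/s^2, comfort acceleration limit

    def accel(self, gap_m: float, own_speed: float, lead_speed: float) -> float:
        """Commanded acceleration; assumes gap and lead speed are measured directly."""
        desired_gap = self.standstill_gap_m + self.headway_s * own_speed
        a = self.k_gap * (gap_m - desired_gap) + self.k_speed * (lead_speed - own_speed)
        # Saturate to comfortable limits.
        return max(self.a_min, min(self.a_max, a))


if __name__ == "__main__":
    c = CTHController()
    print(c.accel(gap_m=35.0, own_speed=20.0, lead_speed=20.0))  # on target
    print(c.accel(gap_m=10.0, own_speed=20.0, lead_speed=20.0))  # far too close
```

A controller like this is easy to reason about. Retuning it every time a sensor is added or dropped is the control-engineer time the author describes trading for compute.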
        
       | tonetegeatinst wrote:
       | Are the cars used ICE? I would think an electric vehicle would be
       | better for the environment, and less susceptible to fluctuations
       | in gas prices.
       | 
       | If you used EVs, you'd also have a fleet of high-density energy
       | storage when not in use.
        
         | evinitsky wrote:
         | These are mostly Nissan Rogues, since that's the thing we could
         | get 100 of.
        
       | pornel wrote:
       | It's nice that this worked without need for communication between
       | cars.
       | 
       | This should be a built-in feature of adaptive cruise control in
       | regular cars.
        
         | evinitsky wrote:
         | We're trying to convince folks that this should be the case!
        
       | taberiand wrote:
       | There's a certain satisfaction in anticipating these stop and go
       | waves while driving, and timing it so that you catch the tail
       | just as it starts moving again - the goal being to use the brake
       | as little as possible and ideally only need to adjust
       | acceleration. I don't really get why people feel the need to
       | repeatedly accelerate up and then slam on the brakes, when
       | leaving reasonable gaps makes everything go smoother.
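The "catch the tail as it starts moving" trick can be put in numbers. Assume, for simplicity, that the stopped queue's tail stays put at distance d and starts moving after T seconds. A uniform slow-down that arrives exactly then must satisfy d = (v0 + v_end) / 2 * T. The sketch below solves for the arrival speed and the deceleration. The function name `coast_plan` is hypothetical. Real wave tails drift upstream, so treat this as a rough illustration only.

```python
def coast_plan(speed_ms: float, dist_to_tail_m: float,
               time_until_moving_s: float) -> tuple[float, float]:
    """Return (arrival_speed_ms, deceleration_ms2) for a uniform slow-down that
    reaches a stopped queue's tail exactly when it starts moving.

    Hypothetical helper. Assumes the tail stays at dist_to_tail_m until
    time_until_moving_s.
    """
    if time_until_moving_s <= 0:
        raise ValueError("time_until_moving_s must be positive")
    # Uniform deceleration: distance = average speed * time,
    # so d = (v0 + v_end) / 2 * T  =>  v_end = 2 d / T - v0.
    arrival_speed = 2.0 * dist_to_tail_m / time_until_moving_s - speed_ms
    if arrival_speed < 0:
        # Even slowing to a stop exactly at time T would reach the tail too early.
        raise ValueError("tail is too close to reach it while still rolling")
    # A negative value here would mean speeding up rather than slowing down.
    deceleration = (speed_ms - arrival_speed) / time_until_moving_s
    return arrival_speed, deceleration
```

For example, at 30 m/s with the tail 300 m ahead and 15 s until it moves, `coast_plan(30.0, 300.0, 15.0)` gives an arrival speed of 10 m/s. The required deceleration is about 1.3 m/s², with no stop at all.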
        
         | gorpy7 wrote:
         | depends on the distance between stoplights. but, in general,
         | this will reduce throughput: if you're the lead car and you
         | accelerate slowly when it turns green, then the 10th car may not
         | get through the light. and if you slow down early and gently,
         | that'll ripple backward. slowing down gently also doesn't let
         | the cars pack densely quickly, and here again you can't get
         | enough cars into the space between lights, especially over
         | freeways/bridges. sometimes you'll see zipper merges just to
         | combat this low density as cars accelerate.
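gorpy7's point about the lead car can be checked with a back-of-envelope queue-discharge model. The sketch makes several assumptions. Every queued car starts moving a fixed lag after the one ahead. Each car accelerates at the lead car's rate up to a speed cap. The green lasts a fixed time. The spacing, lag and speed cap are illustrative values, and `cars_through_green` and `time_to_cover` are hypothetical helpers.

```python
import math


def time_to_cover(dist_m: float, accel_ms2: float, v_max_ms: float) -> float:
    """Time to travel dist_m from rest, accelerating at accel_ms2 up to v_max_ms."""
    ramp_dist = v_max_ms**2 / (2.0 * accel_ms2)  # distance needed to reach v_max
    if dist_m <= ramp_dist:
        return math.sqrt(2.0 * dist_m / accel_ms2)
    return v_max_ms / accel_ms2 + (dist_m - ramp_dist) / v_max_ms


def cars_through_green(green_s: float, accel_ms2: float, n_cars: int = 20,
                       spacing_m: float = 7.5, startup_lag_s: float = 1.0,
                       v_max_ms: float = 15.0) -> int:
    """Count queued cars whose front crosses the stop line before the light turns red.

    Assumption: the whole platoon accelerates no faster than its lead car.
    """
    count = 0
    for i in range(n_cars):
        start_s = i * startup_lag_s   # car i starts i lags after the lead car
        dist_m = i * spacing_m        # distance from car i's front to the stop line
        if start_s + time_to_cover(dist_m, accel_ms2, v_max_ms) <= green_s:
            count += 1
    return count
```

With these numbers, a 20 s green clears 12 cars at 2.5 m/s² but only 8 at 0.8 m/s². Sluggish acceleration at the front does cost throughput at a light.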
        
           | mitthrowaway2 wrote:
           | Yeah it's not a technique intended for gridlock city traffic
           | where you need cars to squeeze through a light and then pack
           | together. It's very good for some other scenarios though. I
           | think the sorts of people who put enough thought into driving
           | to delete traffic waves are also aware enough to know when
           | it's not appropriate to use that technique.
        
           | taberiand wrote:
           | I'm talking about highway driving. At traffic lights the
           | behaviour I see is people stacking up in one lane when
           | there's a zipper merge across the intersection - I don't get
           | this either, but it's good for me because there's plenty of
           | room to skip past that line and merge in while the stack
           | dawdles across the road, the inchworm-style traffic movement
            | leaving plenty of space between each car.
        
         | TeMPOraL wrote:
         | > _I don't really get why people feel the need to repeatedly
         | accelerate up and then slam on the brakes, when leaving
         | reasonable gaps makes everything go smoother._
         | 
         | My feeling observing some drivers: because they feel like if
         | they leave a gap of more than 110% of a width of an average car
         | (which is way below reasonable, not to mention _safe_), some
         | idiot will immediately slot themselves into that gap. Which
         | shouldn't even matter to them, but somehow they'd rather not
         | leave a gap than risk someone else taking it.
        
       | gsf_emergency_2 wrote:
       | Describing human intelligences as self-drivers on the highway of
       | progress, one may build the following dictionary with the help of
       | OP's graph:
       | 
       |     Traffic density -> urban pop. density
       |     Traffic flow -> rate of new ideas productively implemented
       |     Partial observability -> democracy*
       |     Reinforcement learning -> augmenting individual expression
       |     Oxygen/lithium -> money
       | 
       | *Incorporating further insights from OP:
       | https://news.ycombinator.com/item?id=43589398
        
         | gsf_emergency_2 wrote:
         | Today my mind (system 2) almost feigned surprise*:
         | 
         | The intersection of RL & policy (/politics) is a blind spot of
         | HN*
         | 
         | *(Having factored out emotion+humour)
        
       | peepeepoopoo117 wrote:
       | It's so refreshing to see real solutions to transportation
       | problems instead of pie in the sky "burn it all down and start
       | from scratch" thinking.
        
         | evinitsky wrote:
         | (co-author here). US transit systems have state dependence. I
         | suspect most of us are transit advocates, but it seems clear
         | that minimizing present harms is good.
        
       | Coneylake wrote:
       | How do lane changes affect this work?
        
         | mitthrowaway2 wrote:
         | I'm not the author, but I do a similar driving pattern when I
         | encounter traffic waves. Lane changes actually end up making
         | little difference. Even when I leave a lot of space ahead of me
         | as a buffer to close a gap, only the very most aggressive
         | drivers tend to enter it for the purpose of gaining ground,
         | because it doesn't help them much at all -- the cars at the
         | leading end of that open gap are usually stopped. However,
         | leaving these gaps does help increase fluidity for those
         | drivers who need to change lanes in order to, e.g., exit the
         | highway, which in turn reduces traffic.
        
       | jgord wrote:
       | Great exposition on that page, with the animations clearly
       | explaining the phenomenon (yet not getting in the way of reading).
       | 
       | I expect to see many small startups using RL to solve real-world
       | B2B problems of this flavor that were previously too hard to
       | tackle.
        
       | schobi wrote:
       | This is an interesting idea, but I'm sceptical of the advantages.
       | 
       | "There is more CO2" is valid for cars burning fuel. But as soon
       | as you recuperate energy (even in a hybrid), you might only have
       | a fraction of the losses left. Adjusting the speed more
       | aggressively is possible, without braking, with little loss. I
       | totally agree that stop-and-go is annoying, but looking into the
       | future, CO2 should not be a reason for this in the vehicles of
       | 5-10 years from now, when the research can be rolled out.
       | 
       | Is "slamming the brakes" still happening? Around here you have
       | dynamic speed limit signs on the highway. In high traffic
       | everybody then goes a little slower, but smoothly.
       | 
       | I suspect that if a road is loaded beyond max throughput, this
       | method will also fail, even harder. Let me explain: I remember a
       | graph from communications theory. With improving error-correcting
       | codes in transmission, you can get a clean signal for even worse
       | channel conditions. But once it fails, you will not have a signal
       | anymore. The better the code, the steeper the cutoff. Whereas
       | without it, in FM radio, the degradation in user experience is
       | gradual.
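The "steeper cutoff" shows up even with the simplest error-correcting code, a repetition code decoded by majority vote. It is chosen here only for illustration; practical codes are far stronger. Sending each bit n times helps a lot on a clean channel. Once the per-copy error rate passes 1/2, though, it does worse than no coding, and larger n sharpens that transition. The helper name `majority_vote_error` is hypothetical.

```python
from math import comb


def majority_vote_error(p: float, n: int) -> float:
    """Probability that a majority vote over n noisy copies of one bit is wrong.

    p is the chance each copy is flipped (independently); n = 1 means no coding.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number so the vote cannot tie")
    # The vote is wrong when more than half of the n copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    print("   p     n=1     n=5    n=25")
    for p in (0.1, 0.3, 0.45, 0.55, 0.7):
        row = [majority_vote_error(p, n) for n in (1, 5, 25)]
        print(f"{p:4.2f}  " + "  ".join(f"{e:.4f}" for e in row))
```

Reading across a row, more redundancy pushes the error toward 0 below the threshold and toward 1 above it. The code performs better in good conditions and fails harder in bad ones.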
       | 
       | So the analogy goes like this: I would expect that you could
       | possibly load the road with 10% more vehicles. But if one day
       | you have 15% more, the blockage will be even worse than before.
       | It could be worth simulating throughput for various loading
       | situations.
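Here is a toy version of the simulation schobi suggests. A single point-queue bottleneck is fed at constant demand. As an assumption, once a queue forms, the discharge rate falls by a fixed fraction (the effect usually called capacity drop). The model does not represent the RL smoothing at all. It only shows how one might sweep the load and look for a cliff. The capacity, the 10% drop and the name `simulate_bottleneck` are illustrative choices.

```python
def simulate_bottleneck(demand_vph: float, capacity_vph: float = 2000.0,
                        capacity_drop: float = 0.10, hours: float = 2.0,
                        dt_h: float = 1.0 / 60.0) -> tuple[float, float]:
    """Point-queue bottleneck. Returns (vehicles_served, final_queue_vehicles).

    Hypothetical model. Assumption: while a queue exists, the bottleneck
    discharges at capacity_vph * (1 - capacity_drop) instead of full capacity.
    """
    queue = 0.0
    served = 0.0
    for _ in range(round(hours / dt_h)):
        arrivals = demand_vph * dt_h
        rate_vph = capacity_vph * (1.0 - capacity_drop) if queue > 0 else capacity_vph
        available = queue + arrivals
        departures = min(available, rate_vph * dt_h)
        queue = available - departures
        served += departures
    return served, queue


if __name__ == "__main__":
    for load in (0.90, 1.00, 1.05, 1.10, 1.15):
        served, queue = simulate_bottleneck(load * 2000.0)
        print(f"demand {load:.0%} of capacity: served {served:6.0f}, queue left {queue:5.0f}")
```

Below capacity, the queue stays empty. Just above capacity, the assumed drop makes the queue grow much faster than the overload alone would suggest. That is the kind of nonlinearity worth checking in a real traffic simulator.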
        
       | jwlit wrote:
       | Apologies to people who've already seen this (it's pretty old
       | and comes round fairly frequently on HN), but for those
       | previously unaware, http://trafficwaves.org/ is one of the best
       | sites digging into the phenomenon from an "educated layman's"
       | perspective.
        
       ___________________________________________________________________
       (page generated 2025-04-05 23:02 UTC)