[HN Gopher] Why some DVLA digital services don't work at night
       ___________________________________________________________________
        
       Why some DVLA digital services don't work at night
        
       Author : edent
       Score  : 74 points
       Date   : 2025-01-12 20:20 UTC (4 days ago)
        
 (HTM) web link (dafyddvaughan.uk)
        
       | pestatije wrote:
       | DVLA - Driver & Vehicle Licensing Agency
       | 
       | plus, since I'm already posting a comment: it's because there is no
       | batch window to process transactions
        
       | ForHackernews wrote:
       | Unpopular opinion, but I think many systems would benefit from a
       | regular "downtime window". Not everything needs to be 24/7 high
       | availability.
       | 
       | Maybe not every night, but if you get users accustomed to the
       | idea that you're offline for 12 hours every Sunday morning, they
       | will not be angry when you need to be offline for 12 hours on a
       | Sunday morning to do maintenance.
       | 
       | The stock market closes, more things should close. We are paying
       | too high of a price for 99.999% uptime when 99.9% is plenty for
       | most applications.
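
The availability figures in the comment above can be turned into
concrete downtime with a little arithmetic. This is a minimal sketch,
not anything from the article: the function names are made up for
illustration, and a year is assumed to be 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (an assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_from_weekly_downtime(hours_per_week: float) -> float:
    """Availability (in percent) implied by a fixed weekly outage."""
    return 100 * (1 - hours_per_week / (7 * 24))


for target in (99.9, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% allows {hours:.2f} h/year ({hours * 60:.1f} min)")

weekly = availability_from_weekly_downtime(12)
print(f"12 h offline every week is about {weekly:.1f}% availability")
```

Under these assumptions 99.9% allows roughly 8.8 hours of downtime a
year and 99.999% roughly 5 minutes, while a 12-hour window every
Sunday works out to about 92.9%, which is well below either target.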
        
         | kragen wrote:
         | Basically this happens because the DVLA and the stock market
         | don't have any competition. Customers in a competitive market
         | won't be angry when you need to be offline for 12 hours every
         | Sunday morning; they'll just switch to your competitor some
         | Sunday, because the competitor is providing them something they
         | value that you don't provide.
        
           | ForHackernews wrote:
           | Maybe they should regulate Sunday trading hours, or unionized
           | sysadmins should negotiate the end of on-call hours.
           | 
           | The red queen's race that you describe for ever-greater
           | scale, ever-greater availability is an example of the tragedy
           | of the commons. Think how much money and many human minds
           | have been wasted trying to squeeze out that last .0001% of
           | "zero downtime" when they could have been creating something
           | new.
           | 
           | "Keep doing the same thing, but more of it, harder" is a
           | recipe for a barren world of monoculture.
        
             | lifeoflejf wrote:
             | Bergen county NJ has blue laws that make it so non-grocery
             | stores must be closed on Sundays. Maybe there's some value
             | in structuring a time where everybody is off?
             | 
             | Just like at work the only time I really get off is when
             | all of my customers are off. It's nice when the industry
             | sorta shuts off for a week or so around Christmas.
        
             | kragen wrote:
             | Something like that might plausibly be correct, though
             | you've exaggerated it to a level where it's clearly false.
             | 
             | If we steelman it to its most defensible essence, I think
             | what you're saying is that the cost of the human effort
             | needed to provide these higher uptimes exceeds the consumer
             | benefit (the value of being able to buy a camera on
             | Saturday), say. You could imagine, for example, that each
             | incremental improvement in uptime wins over a proportion of
             | the customer base providing a value that vastly exceeds its
             | cost -- but only until your competitors improve their own
             | offering to match, so all the surplus from all this uptime
             | improvement ultimately goes to the consumers, not the
             | producers.
             | 
             | There are two related holes in this idea.
             | 
             | The first is that producing consumer surplus is _what the
             | economy is for_ , in a moral sense. The reason producing
             | goods and services is a good thing to do is so that someone
             | will benefit from using them! So if all the effort that
             | sysadmins make goes into making services better for users,
             | that's a _good_ thing, not a bad thing.
             | 
             | The second is that nothing is stopping a new entrant from
             | offering a new, low-cost service that isn't as reliable. If
             | the cost of providing all that extra reliability (bundled
             | into the incumbents' pricing scheme) is higher than the
             | actual benefit to users, the users will switch to the
             | lower-cost, less-reliable service. This has happened many
             | times, in fact: less-reliable minicomputers stole business
             | from mainframes, less-reliable VoIP stole business from ATM
             | and SONET and SDH, all kinds of less-reliable plastic goods
             | have stolen business from all-metal versions, and now solar
             | panels are stealing business from coal power plants even
             | though solar panel "uptime" is like 30%.
             | 
             | So the particular market dynamics we're talking about
             | actually sensitively optimize the amount of effort given to
             | uptime to the economic optimum. There do exist lots of
             | market failures, but the particular dynamic we're
             | discussing is the opposite extreme from something like a
             | dollar auction.
        
             | abigail95 wrote:
             | Who is trying to achieve zero downtime? Facebook has
             | degraded service regularly; it's just close enough to 99.9%
             | that nobody cares.
             | 
             | If loading my messages times out I just move onto something
             | else and go back a few minutes later.
             | 
             | Surely they have metrics measuring that and don't think
             | it's worth the engineering effort to improve it.
        
               | kragen wrote:
               | One of the interesting things that came out of Google's
               | "SRE" system is that they deliberately add outages if
               | they don't have enough. They learned years ago that if
               | you build a service that promises 99% uptime and deliver
               | 99.99% uptime, other people in the company will come to
               | depend on that 99.99% uptime unintentionally. So they
               | chaos-monkey it to ensure that the inevitable failures
               | aren't catastrophic.
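
The gap between promised and delivered uptime that the comment above
describes can be expressed as an unused downtime budget. This is only
a sketch of the arithmetic, not Google's actual tooling: the function
name and the 30-day period are assumptions made for illustration.

```python
def unused_error_budget_minutes(slo: float, measured: float,
                                period_minutes: float) -> float:
    """Downtime the SLO permits over the period, minus downtime seen."""
    allowed = (1 - slo) * period_minutes
    used = (1 - measured) * period_minutes
    return max(0.0, allowed - used)


THIRTY_DAYS = 30 * 24 * 60  # 43,200 minutes (assumed reporting period)

spare = unused_error_budget_minutes(slo=0.99, measured=0.9999,
                                    period_minutes=THIRTY_DAYS)
print(f"{spare:.1f} minutes of promised-but-unused downtime")
```

With a 99% promise and 99.99% delivered, almost all of the roughly 432
permitted minutes go unused. That unused slice is what dependants
quietly start to rely on, and what a deliberate outage would spend.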
        
           | ajnin wrote:
           | The stock markets definitely have competition. For instance
           | Frankfurt, London, Paris or Amsterdam very much compete with
           | each other to offer desirable conditions for investors, and
           | companies will move their trading from one to another if it
           | is in their interest. I think the fact they close at night is
           | a self-preservation mechanism: traders would go insane if
           | they had to worry about their positions 24/7.
        
             | kragen wrote:
             | There's a very strong network effect, and most stocks are
             | only listed on a single stock exchange, so in most contexts
             | the competition is very minimal.
        
         | OJFord wrote:
         | It only really works where the audience is already limited in
         | country/timezone though. Sure, a global service could just
         | stagger the downtime around the world, but (unless you've
         | already partitioned the infrastructure equivalently) then you're
         | just running 24/7 with arbitrary geofencing downtime on top.
        
         | jmwilson wrote:
         | Who works Sunday morning then?
         | 
         | The maintenance window will morph into a do-big-risky-changes
         | window, which means everybody in engineering will have to be
         | on-call. Many years ago, when I newly joined a FAANG, I asked,
         | "shouldn't I run this migration after hours when load is low?"
         | and the response was firm, "No, you'll run it when people are
         | around to fix things". It may not always be the answer, but in
         | general, I want to do maintenance when people are present and
         | willing to respond, not nights and weekends when they're
         | somewhere else and can't be found.
        
         | crazygringo wrote:
         | > _Not everything needs to be 24/7 high availability._
         | 
         | If it makes you more money to be available 24/7 then why
         | _wouldn't_ you?
         | 
         | > _Maybe not every night, but if you get users accustomed to
         | the idea that you're offline for 12 hours every Sunday
         | morning_
         | 
         | Then I would use a competitor that was online, period.
         | 
         | Imagine if Sunday morning is the only time you have to complete a
         | certain school assignment, but Wikipedia is offline? Or you
         | need to send messages to a few folks that they need to see by
         | the evening, but the platform won't come online until 3pm,
         | which means you'll need to interrupt your afternoon family time
         | instead?
         | 
         | Maybe things closing works fine for your needs and your
         | schedule. But it sure won't for everyone else. Having services
         | that are reliable is one of the things that distinguishes
         | developed countries from developing ones.
        
           | corint wrote:
           | > If it makes you more money to be available 24/7 then why
           | wouldn't you?
           | 
           | Agreed, but for a government service where you update your
           | license, or tell them about selling a car or something,
           | there's no real 'more' money. Being closed at 3am doesn't
           | lose the opportunity in the way that it would if you were
           | selling widgets. It instead forces the would-be users at 3am
           | to wait until the morning.
        
       | rozab wrote:
       | I've often run into this when using DVLA services and spluttered
       | with indignation. But at the end of the day, these services are
       | fantastically usable (during the daytime) and I appreciate Dafydd
       | pushing to just get them out there!
       | 
       | I got my license in 2015 so never in my life have I had the
       | apparently ubiquitous American experience of queuing at the DMV
       | and filling in paper forms. (is this still real? or limited to
       | stand-up comedy?)
        
         | nsxwolf wrote:
         | The queues have been mostly replaced with "take a number"
         | systems where you can sit down and wait... with your...
         | papers... that you had to fill out first...
        
           | fn-mote wrote:
           | > The queues have been mostly replaced with "take a number"
           | systems where you can sit down and wait...
           | 
           | My recent experience was: sign up online and get a 30 min
           | window (9:00-9:30 say). Queue everyone for that 30 minute
           | window outside the building. At exactly 9:30, enter and go
           | through the usual queues inside. The advantage is that
           | getting through those queues now takes 30 minutes or less
           | because their length is limited. Presumably we/they traded
           | volume of processing for certainty of time spent in the
           | queue. A very familiar tradeoff for a computer scientist.
        
         | AlotOfReading wrote:
         | Queuing at the DMV and filling out paperwork is very much a
         | real thing that still happens. It's a pretty different
         | experience in every state though.
        
           | ChocolateGod wrote:
           | Can it not be done online like in the UK?
        
             | neckro23 wrote:
             | Usually, but it depends on the state. Remember, America
             | isn't a country, it's 50 countries in a trenchcoat.
             | 
             | It's often a mishmash of services too. I was told in-person
             | at the DMV that I couldn't renew my registration since I'm
             | not the registered owner of my car. So I just went to a DMV
             | kiosk at the local grocery store and did it there without a
             | hassle.
        
         | snakeyjake wrote:
         | My US state, one of the ones NOT living in the past, does
         | almost everything online.
         | 
         | The only times you have to come in are:
         | 
         | 1. for your first license, either as a newly-licensed driver or
         | an out-of-state driver who recently moved
         | 
         | 2. if you were bad and broke the law or otherwise had your
         | license cancelled/revoked/suspended
         | 
         | Even those people have to call or go online to make an
         | appointment.
         | 
         | All other tasks from getting/returning plates to requesting a
         | duplicate title can be done online, through drop boxes, or by
         | mail.
         | 
         | I have been to the DMV three times since 1995: once to turn my
         | out-of-state license into an in-state one, once to turn that
         | driver's license into a REAL ID-compliant one, and once to have
         | my fingerprints taken for a concealed carry permit.
        
       | mike_hearn wrote:
       | tl;dr same reason other services go offline at night: concurrency
       | is hard and many computations aren't thread safe, so need to run
       | serially against stable snapshots of the data. If you don't have
       | a database that can provide that efficiently you have no choice
       | but to stop the flow of inbound transactions entirely.
       | 
       | Sounds like Dafydd did the right thing in pushing them to deliver
       | some value now and not try to rebuild everything right away. A
       | common mistake I've seen some people make is assuming that
       | overnight batch jobs that have to shut down the service are some
       | side effect of using mainframes, and any new system that uses
       | newer tech won't have that problem.
       | 
       | In reality getting rid of those kinds of batch jobs is often a
       | hard engineering project that requires a redesign of the
       | algorithms or changes to business processes. A classic example is
       | in banking where the ordering of these jobs can change real world
       | outcomes (e.g. are interest payments made first and then cheques
       | processed, or vice-versa?).
       | 
       | In other cases it's often easier for users to understand a system
       | that shuts down overnight. If the rule is "things submitted by
       | 9pm will be processed by the next day" then it's easy to explain.
       | If the rule is "you can submit at any time and it _might_ be
       | processed by the next day", depending on whether or not it
       | happens to intersect the snapshot taken at the start of that
       | particular batch job, then that can be more frustrating than
       | helpful.
       | 
       | Sometimes the jobs are batch just because of mainframe
       | limitations and not for any other reason; those can be made
       | incremental more easily if you can get off the mainframe platform
       | to begin with. But that requires rewriting huge amounts of code,
       | hence the popularity of emulators and code transpilers.
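
The banking example in the comment above (interest first or cheques
first) is easy to show with numbers. The figures and function names
below are invented purely for illustration and are not from any real
banking system.

```python
from decimal import Decimal


def apply_interest(balance: Decimal, rate: Decimal) -> Decimal:
    """Credit interest on the current balance."""
    return balance + balance * rate


def process_cheque(balance: Decimal, amount: Decimal) -> Decimal:
    """Debit an outgoing cheque from the balance."""
    return balance - amount


opening = Decimal("1000.00")  # hypothetical opening balance
rate = Decimal("0.01")        # hypothetical 1% interest for the period
cheque = Decimal("500.00")    # hypothetical outgoing cheque

interest_first = process_cheque(apply_interest(opening, rate), cheque)
cheques_first = apply_interest(process_cheque(opening, cheque), rate)

print(f"interest, then cheques: {interest_first:.2f}")
print(f"cheques, then interest: {cheques_first:.2f}")
```

The same two jobs over the same data leave 510.00 in one order and
505.00 in the other, so the job order is part of the business rules
and cannot be changed freely when making the jobs incremental.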
        
         | abigail95 wrote:
         | Do you know why the downtime window hasn't been decreasing over
         | time as it gets deployed onto faster hardware over the years?
         | 
         | Nobody would care or notice if this thing had 99.5%
         | availability and went read only for a few minutes per day.
        
           | mike_hearn wrote:
           | It doesn't get deployed onto faster hardware. Mainframes
           | haven't really got faster.
        
             | ndriscoll wrote:
             | Mainframes have absolutely gotten faster. They're basically
             | small supercomputers.
        
             | abigail95 wrote:
             | It must be. Maintaining the original hardware would be more
             | expensive than upgrading to compatible but faster systems.
        
               | mike_hearn wrote:
               | What compatible systems? Mainframes are maintained in
               | more or less their original state by teams from IBM. They
               | are designed to be single machines that scale vertically
               | and never shut down, every component can be hot-swapped
               | including CPUs but IBM charge a lot for CPU capacity if I
               | recall correctly. Given that nighttime doesn't get
               | shorter, the DVLA probably don't see much reason to pay a
               | lot more for a slightly smaller window.
               | 
               | And mainframes from the 80s are slow. It sounds like
               | they're running on the original.
        
               | ndriscoll wrote:
               | Newer mainframes are still faster than older mainframes,
               | and can have hundreds of cores and 10s of TB of RAM. A
               | big part of IBM's draw is that they make modern systems
               | that will continue to run your software forever with no
               | modifications. I had an older guy there tell me a story
               | about them changing a default in some ISPF panel, and
               | customers complained enough that they had to change it
               | back. Their storage systems have a virtualization layer
               | for old programs that send commands to move the heads of
               | a drive that hasn't been manufactured for 55 years or
               | whatever and translate that to use storage backed by a
               | modern RAID with normal disks. The engineers in the
               | mainframe groups know who their customer base is and what
               | they want.
               | 
               | It's unlikely that they're literally using 40 year old
               | hardware since the replacement parts for that would be a
               | nightmare to find and almost certainly more expensive
               | than a compatible new machine.
        
             | throw16180339 wrote:
             | You're mistaken about this. IBM's z-series had 5GHz CPUs
             | well over a decade ago and they haven't gotten any slower.
        
           | pjc50 wrote:
           | Maybe it isn't running on faster hardware? These systems are
           | often horrifyingly outdated.
        
             | pwg wrote:
             | Or maybe it is running on faster hardware, but the UK
             | budget office decided not to pay IBM's fees required to
             | make use of the extra speed, so it has been "throttled" to
             | run at the same speed that it ran on the old hardware.
        
         | ndriscoll wrote:
         | Getting rid of batch jobs shouldn't be a goal; batch processing
         | is generally more efficient as things get amortized, caches get
         | better hit ratios, etc.
         | 
         | What software engineers should understand is there's no reason
         | a batch can't take 3 ms to process and run every 20 ms. "Batch"
         | and "real-time" aren't antonyms. In a language/framework with
         | promises and thread-safe queues it's easy to turn a real time
         | API into a batch one, possibly giving an order of magnitude
         | increase in throughput.
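
The "promises and thread-safe queues" idea in the comment above can be
sketched as follows. This is one possible shape under stated
assumptions, not a definitive implementation: MicroBatcher, double_all
and the 20 ms wait are hypothetical names and values, and the batch
function is assumed to return one result per item, in order.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MicroBatcher:
    """Hypothetical helper: accepts single items, processes them in batches."""

    def __init__(self, batch_fn, max_batch=100, max_wait_s=0.02):
        self._batch_fn = batch_fn      # takes a list, returns an equal-length list
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s  # how long to wait for a batch to fill
        self._queue = queue.Queue()    # thread-safe queue of (item, future)
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Per-item API: returns a Future resolved when its batch has run."""
        future = Future()
        self._queue.put((item, future))
        return future

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until the first item arrives
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            try:
                results = self._batch_fn([item for item, _ in batch])
            except Exception as exc:
                # A failed batch fails every caller waiting on it.
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                future.set_result(result)


def double_all(items):
    # Stand-in for one expensive call whose cost is amortized over a batch.
    return [item * 2 for item in items]


batcher = MicroBatcher(double_all)
futures = [batcher.submit(n) for n in range(5)]
print([f.result(timeout=1) for f in futures])
```

Callers still see a per-item interface, while the worker thread
decides how many items go into each call. Whether that yields the
throughput gain mentioned above depends on how much fixed cost the
real batch function amortizes.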
        
           | mike_hearn wrote:
           | Batch size is usually fixed by the business problem in these
           | scenarios. I doubt you can process them in 3 ms if the job
           | requires reading in every driving license in the country and
           | doing some work on them, for instance.
        
             | ndriscoll wrote:
             | This particular thing might be difficult to change because
             | it's 50 year old COBOL or whatever, but my point was more
             | that I've encountered pushes from architects to "eliminate
             | batches" and it makes no sense. It just means that now I
             | have to re-batch things in my code. The correct way to
             | think about it is that you want smaller, more frequent
             | batches.
             | 
             | Do they really need to do work on all records every night?
             | Probably not. Most people aren't changing their license or
             | vehicle info most days. So the problem is that somewhere
             | they're (conceptually) doing a table scan instead of using
             | an index. That might still be hard to fix, but at least
             | identify the correct problem. Otherwise as you say moving
             | to different tech won't fix it.
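
The "table scan instead of using an index" point above can be
illustrated with a toy example. Everything here is hypothetical: the
licences table, its columns and the dates are invented, and nothing is
known about how the real DVLA data is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE licences ("
    " id INTEGER PRIMARY KEY, holder TEXT, changed_on TEXT)"
)
# The index lets the nightly job find changed rows without reading them all.
conn.execute("CREATE INDEX idx_licences_changed_on ON licences (changed_on)")

conn.executemany(
    "INSERT INTO licences (holder, changed_on) VALUES (?, ?)",
    [(f"driver-{n}", "2024-01-01") for n in range(10_000)],
)
# Only two records change on the day in question.
conn.execute("UPDATE licences SET changed_on = '2025-01-12' WHERE id IN (7, 42)")


def changed_since(connection, last_run_date):
    """Return ids of records modified after the previous batch run."""
    rows = connection.execute(
        "SELECT id FROM licences WHERE changed_on > ? ORDER BY id",
        (last_run_date,),
    )
    return [row[0] for row in rows]


print(changed_since(conn, "2025-01-11"))
```

The batch then touches the 2 changed records rather than all 10,000.
As the comment says, retrofitting this onto an old system may still be
hard, since it requires the system to record reliably what changed.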
        
       | abigail95 wrote:
       | Something is missing here, why do batch jobs take 13 hours? If
       | this thing was started on an old mainframe why isn't the downtime
       | just 5 minutes at 3:39 AM?
       | 
       | Exactly how much data is getting processed?
       | 
       | Edit: Why does rebuilding take a decade or more? This is not a
       | complex system. It doesn't need to solve any novel engineering
       | challenges to operate efficiently. Article does not give much
       | insight into why this particular task couldn't be fixed in 3
       | months.
        
         | shermantanktop wrote:
         | It's funny to me that I would never ask those questions. I've
         | specialized in legacy rehab projects (among other things) and
         | there seems to be no upper bound on how bad things can be or
         | how many annoying reasons there are for why we can't "just fix
         | it." Those "just" questions--which I ask too--end up being
         | hopelessly naive. The answers will crush your soul if you let
         | them, so you can't let them, and you should always assume
         | things are worse than you think.
         | 
         | TFA is spot on - the way to make progress is to cut problems up
         | and deliver value. The unfortunate consequence is that badness
         | gets more and more concentrated into the systems that nobody
         | can touch, sort of like the evolution of a star into an
         | eventual black hole.
        
           | abigail95 wrote:
           | I made a lot of money moving mid-size enterprises from legacy
           | ERP systems to custom in-house ones.
           | 
           | The DVLA dataset and the computations that are run on it can
           | be studied and replicated in 3 months by a competent team.
           | From there it can be improved.
           | 
           | There is no way that this system requires 13 hours of
           | downtime. If it required two hours - even if the code was
           | generated through automation it can be reverse engineered and
           | optimized.
           | 
           | It is absolute rubbish that this thing is still unavailable
           | outside of 8am-7pm.
           | 
           | I maintain my position that it could be replaced in 3 months.
           | 
           | I got my start in this business when I was in university and
           | they told us our online learning software was going offline
           | for 3 days for an upgrade. Those are the gatekeepers and low
           | achievers we fight against. Think bigger.
        
             | arccy wrote:
             | it's a gov agency, they don't quite pay enough for a
             | motivated competent team....
        
             | monkey_monkey wrote:
             | > The DVLA dataset and the computations that are run on it
             | can be studied and replicated in 3 months by a competent
             | team. From there it can be improved.
             | 
             | Such an HN comment. Made me lol. Think funnier!
        
         | that_guy_iain wrote:
         | > Edit: Why does rebuilding take a decade or more? This is not
         | a complex system. It doesn't need to solve any novel
         | engineering challenges to operate efficiently. Article does not
         | give much insight into why this particular task couldn't be
         | fixed in 3 months.
         | 
         | You do know the UK government has been cutting all their
         | budgets to the bone for about 10 years? That means everywhere
         | is pretty much understaffed.
         | 
         | And how do you know it's not a complex system? I would think
         | that a system like that would be somewhat complex. It's not
         | just driving licenses but a whole bunch of other things that
         | are handled by the DVLA.
        
           | abigail95 wrote:
           | The system may or may not be complex but the data it has to
           | store and transform is not. Because it handles driver's licenses.
           | licenses. A function that has been done on pen and paper and
           | filing cabinets.
           | 
           | Study the data, study the operations, reduce complexity.
           | 
           | Since you imply you know more about UK budgets than I do -
           | how much is the DVLA budgeted for IT operations like this and
           | how much more would you give them to expect this problem
           | solved?
           | 
           | I can argue real numbers but vibes about bone dry budgets I
           | cannot.
        
             | that_guy_iain wrote:
             | > The system may or may not be complex but the data it has
             | to store and transform is not. Because it handles driver's
             | licenses. A function that has been done on pen and paper
             | and filing cabinets.
             | 
             | It handles more than just driving licenses... The DVLA do
             | more than just driving licenses.
             | 
             | > Since you imply you know more about UK budgets than I do
             | - how much is the DVLA budgeted for IT operations like this
             | and how much more would you give them to expect this
             | problem solved?
             | 
             | It's not budgeted anything for this as far as I know. I
             | believe it's handled by the Government Digital Service, which
             | handles lots of the digital services for various
             | departments. The budget for all of GDS is about 90 million
             | most of which isn't for .gov.uk. A rewrite of that size I
             | would expect to cost about 50-60 million in total but take
             | several years.
        
             | ellen364 wrote:
             | Are you suggesting that a process once done using pen and
             | paper can't possibly be complicated?
             | 
             | I have no insight into the DVLA, but the idea that no paper
             | process could ever be complicated is really funny. The UK
             | enjoyed/loathed centuries of bureaucracy before computers
             | were invented. At one point getting a divorce required an
             | Act of Parliament specifically naming the unhappy couple!
             | Being restricted to pen and paper hardly inhibited the
             | human ability to create complex systems.
        
         | ajnin wrote:
         | The batch jobs don't take 13 hours. They're just scheduled to
         | run some time at night when the old offices used to be closed
         | and the jobs could be run with some expectations regarding data
         | stability over the period. There are probably many jobs
         | scheduled to run at 1AM then 2AM, etc, all depending on the
         | previous to be finished so there is some large delay to ensure
         | that a job does not start before the previous one is finished.
         | 
         | As to your "not a complex system" remark, when a system is
         | built up over 60 years, piling up new rules to implement new
         | legislation and needs over time, you tend to end up with a
         | tangled mess of services all interdependent that are very
         | difficult to replace piece-wise with a new shiny
         | architecturally pure one. This is closer to a distributed
         | monolith than a microservices architecture. In my experience
         | you can't rebuild such a thing "in 3 months". People who
         | believe that are those that don't realize the complexity and
         | the extraordinary amount of specifics, special cases, that are
         | baked into the system, and any attempt to just rebuild from
         | scratch in a few months hits that wall and ends up taking
         | years.
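
The scheduling guess in the comment above (jobs pinned to 1AM, 2AM and
so on, with slack between them) can be compared with running each job
as soon as the previous one finishes. The durations below are invented
for illustration; the real job list and timings are not known.

```python
def fixed_slot_window(durations_min, slot_min=60):
    """Job i starts at i * slot_min; return minutes until the last job ends."""
    if any(d > slot_min for d in durations_min):
        raise ValueError("a job overruns its slot and would overlap the next")
    return (len(durations_min) - 1) * slot_min + durations_min[-1]


def chained_window(durations_min):
    """Each job starts the moment the previous one finishes."""
    return sum(durations_min)


jobs = [20, 35, 10]  # hypothetical job durations in minutes

print(f"fixed hourly slots: {fixed_slot_window(jobs)} min")
print(f"chained on completion: {chained_window(jobs)} min")
```

With these made-up numbers the fixed schedule holds the window open
for 130 minutes to do 65 minutes of work. The slack is what protects
against a job starting before its predecessor has finished, which is
why it tends to be generous.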
        
           | abigail95 wrote:
           | The code will be spaghettified and hideous. The queries will
           | be nonsense.
           | 
           | That doesn't change the fact that the ultimate goal of the
           | system is to manage driver's licenses.
           | 
           | > In my experience you can't rebuild such a thing "in 3
           | months".
           | 
           | Me and my team rebuilt the core stack for the central bank of
           | a developing country. In 3 months. The tech started in the
           | 70s just like this. Think bigger.
        
             | mattmanser wrote:
             | Yeah, I always raise an eyebrow at attitudes like that too.
             | 
             | I've also reimplemented or gradually replaced several out-
             | of-date systems. Albeit on a smaller scale.
             | 
             | In my experience, when you start picking the programs apart
             | you find 90% of the code is redundant or boilerplate. Much
             | of it isn't even called from anywhere, abandoned code, and
             | can be deleted en masse. A lot of programmers don't clean
             | code up "just in case" and then no-one else deletes it.
             | 
             | They can also often be vastly simplified because
             | programmers back then didn't have the patterns and
             | knowledge to write concisely.
             | 
             | I often find myself simplifying the original code first,
             | which gets rid of 50% of it. Then I can see what the code
             | actually does and rewrite it which gets rid of the other
             | 40%.
             | 
             | On the other hand, many programmers don't have the
             | patience, stubbornness or skill to do this kind of work.
             | 
             | And the ability to get through the major panic you have
             | when you're half way through and wondering if you were mad
             | to even start.
        
               | patrickmay wrote:
               | > And the ability to get through the major panic you have
               | when you're half way through and wondering if you were
               | mad to even start.
               | 
               | I feel seen, thank you.
        
           | PaulAJ wrote:
           | Anyone who doesn't understand what's so difficult should read
           | this:
           | 
           | https://wiki.c2.com/?WhyIsPayrollHard
           | 
           | It's from a different domain, but it gives you a flavour of
           | the headaches you encounter. These systems always look simple
           | from the outside, but once you get inside you find endless
           | reams of interrelated and arbitrary business rules that have
           | accumulated. There is probably no complete specification
           | (unless you count the accumulated legal, regulatory and
           | procedural history of the DVLA), and the old code will have
           | little or no accurate documentation (if you are lucky there
           | will be comments).
        
             | stego-tech wrote:
             | Basically this. The people running the show would
             | desperately like to make it simpler, but ultimately it's
             | left overly complicated due to priorities from past
             | leadership well above our paygrade.
             | 
             | The right solution is always to just rip off the bandaid
             | and do it again by hand in a new language or platform, and
             | to eliminate useless complexity while doing so.
             | Unfortunately no leader would ever do this because the
             | Board and/or Shareholders would crucify them for not
             | outsourcing it to McKinsey first and using the fancy-pants
             | automation tool their report recommended.
        
               | pwagland wrote:
               | Well, that, and any organization that has gotten
               | itself into this situation tends to have a very strong
               | risk aversion principle. Which means they _can't_ approve
               | something like this organisationally since there is
               | simply too much risk embedded, and someone has to accept
               | that.
        
           | Reubend wrote:
           | > In my experience you can't rebuild such a thing "in 3
           | months". People who believe that are those that don't realize
           | the complexity and the extraordinary amount of specifics,
           | special cases, that are baked into the system, and any
           | attempt to just rebuild from scratch in a few months hits
           | that wall and ends up taking years.
           | 
           | Rebuilding a legacy system doesn't require you to support
           | every single edge case that the older system did. It's okay
           | to start off with some minor limitations and gradually add
           | functionality to account for those edge cases.
           | 
           | Furthermore, you've got a huge advantage when remaking
           | something: you can see all the edge cases from the start, and
           | make an ideal design for that, rather than bolting on things
           | as you go (which is done in the case of many of these legacy
           | systems, where functionality was added over time with dirty
           | code in lieu of refactoring).
        
             | jarofgreen wrote:
             | > Rebuilding a legacy system doesn't require you to support
             | every single edge case that the older system did.
             | 
             | Depends on context.
             | 
             | This isn't some social media fun site where you can live
             | with some rough edges; in this context "edge case" may be
             | someone with a health condition who is still entitled to a
             | driver's license; or it could be someone who normally could
             | get one but due to a health condition really shouldn't be
             | allowed one!
        
         | firefoxd wrote:
         | Our systems took 8 hours to back up. Then it grew to 12 hours
         | [0]. The system was a side project by an intern fresh out of
         | college. Over the years, it grew into a crucial piece of software
         | the company relied on. I joined over 10 years later and was able
         | to bring it down to a few minutes.
         | 
         | [0]: https://news.ycombinator.com/item?id=38456429
        
         | jdietrich wrote:
         | Per their own data, the DVLA are responsible for the records of
         | 52 million drivers and 46 million vehicles. Those records are
         | immensely complex, because they reflect decades of accumulated
         | legislation, regulation and practice. Every edge case has an
         | edge case.
         | 
         | There's someone, somewhere in the bowels of the DVLA who
         | understands the rules for drivers with visual field defects who
         | use a bioptic device. There's someone who knows which date code
         | applies to a vehicle that has been built with a brand new kit
         | chassis but an old engine and drive train. There's someone who
         | understands the special rates of tax that apply to goods
         | vehicles that are solely used by showmen, or are based on
         | certain offshore islands. God help any outsider who has to
         | condense all of that institutional knowledge into a working
         | piece of software.
         | 
         | Government does not have a good track record of ground-up
         | refactors of complex IT systems. The British government in
         | particular does not have a good track record. Considering all
         | that, the fact that most interactions with DVLA can be done
         | entirely online is borderline miraculous.
         | 
         | https://assets.publishing.service.gov.uk/media/675ad406fd753...
        
       | delta_p_delta_x wrote:
       | Some DVLA services don't work in the day, too. Case in point, the
       | 'get a share code' service:
       | https://www.viewdrivingrecord.service.gov.uk/driving-record/...
        
       | glonq wrote:
       | This sounds a bit familiar. I used to work at a medium-sized
       | company whose systems were based on COBOL code and Unisys
       | mini/mainframe hardware from the 80's. We even had a person
       | employed as a "tape ape"; thankfully not me. Throughout the next
       | decade or two they tried various 4GL-generated facades and bolt-
       | ons but could never escape from that COBOL core. Eventually I
       | think they migrated the software to some kind of big box that
       | emulated the Unisys environment but was slightly more civilized.
       | I have no idea whether they ever eradicated all the COBOL though.
        
       | arjie wrote:
       | While these explanations are plausible, certain other things I've
       | encountered make me believe that deeper reasons underlie even
       | these reasons. When I lived in the UK in 2017 as a foreigner, all
       | applications for a driving licence as a foreigner on a T2-ICT
       | visa had to be sent over for a couple of weeks and you had to
       | include your passport and Biometric Residence Permit and
       | everything. By comparison, I was able to get my driving licence
       | at the California DMV pretty easily even as a foreigner and my
       | passport and so on were photocopied and not retained. This
       | drastic difference in service ability between the DVLA and a
       | notoriously disliked American government service leads me to
       | believe that the proximal technical causes for this are
       | downstream from organizational choices for how to deliver
       | service.
        
         | robertlagrant wrote:
         | > downstream from organizational choices for how to deliver
         | service
         | 
         | 100000%. They're a monopoly service you must interact with or
         | get fined and (eventually) locked up. They have zero incentive
         | to do a particularly good job. Some orgs in this situation are
         | just well run and do a good job, but there's no competitive
         | pressure for them to do so.
        
           | Y_Y wrote:
           | And there are pressures other than competition, and some
           | people just want to do a particularly good job just because it's
           | their job.
        
       | IOT_Apprentice wrote:
       | This seems weird to me. The number of records is minuscule
       | compared to internet scale tech.
       | 
       | The data model for this sounds like it would be simple. Exactly
       | how many use cases are there to be implemented?
       | 
       | Build this with modern tech on HA Linux backends. Eliminate the
       | batch job nonsense.
       | 
       | This could be written up as a project for bootcamps or even a
       | YouTube series.
       | 
       | I suspect some internal politics about moving forward and
       | clinging to old methods is at play.
       | 
       | Perhaps someone could build an open source platform if the
       | requirements were made public.
        
         | dhosek wrote:
         | The thing is that a lot of internet scale stuff tends to be
         | non-critical. It's not a big deal if 1% of users don't see a
         | post to a social network site. It'll show up later, maybe, or
         | never, but nobody will care.
         | 
         | On the other hand, with transactions like banking or licensing
         | or health insurance, it's absolutely essential that we
         | definitely maintain ACID compliance for every single
         | transaction, which is something that many "internet-scale" data
         | solutions do not and often cannot promise. I have a vague
         | recollection of some of the data issues at a large health
         | insurance company where I worked a couple years ago that made
         | it really clear why there would be an overnight period where
         | the system would be offline--it was essential to make sure that
         | systems could be brought to a consistent state. It also became
         | clear why enrolling someone in a new plan was not simply a
         | matter of adding a record to a database somewhere.
         | 
         | Not to mention that I suspect that data such as bank
         | transaction records or health insurance claims probably rival
         | "internet scale" for being real big data operations.
        
           | mh- wrote:
           | The reason that these "internet scale" solutions are
           | challenging to operate is _because_ of their latency and
           | availability targets.
           | 
           | If you threw into the requirements "can go down nightly, for
           | hours, for writes AND reads", they could absolutely provide
           | the transactional guarantees you're looking for.
        
       | neuroelectron wrote:
       | I'm sure the upgrade would have been trivial for a competent
       | expert to do but instead they outsourced it to a big software
       | firm and surprise, it went over-budget. Seriously, what could
       | this database be doing that's so complicated?
        
       | simonbarker87 wrote:
       | Excellent bit of pragmatism and as a user of this service I'm
       | happy with the trade off.
       | 
       | People wondering why it's not a simple switch and "there must be
       | something else going on here" have clearly never worked with
       | layers of legacy systems where the data actually matters. Sure
       | it's fixable and it's a shame it hasn't been but don't assume
       | there aren't very good reasons why it's not a quick fix.
       | 
       | The gov.uk team have moved mountains over the past decade;
       | members of it have earned the right to be believed when they say
       | "it's not simple".
        
         | NVHacker wrote:
         | Having legacy data and systems for a few years is a challenge.
         | Still having them after decades is incompetence.
        
           | simonbarker87 wrote:
           | Yes I'd imagine the reason it still hasn't been fixed after
           | nearly a decade is management/politics etc. But it taking
           | more than just 6 months will be technical. As a result it's a
           | job that falls into the area of being canned because it's
           | taking too long even though no one said it would be quick.
        
       ___________________________________________________________________
       (page generated 2025-01-16 23:01 UTC)