[HN Gopher] How we migrated Gov.uk notify to AWS elastic contain...
       ___________________________________________________________________
        
       How we migrated Gov.uk notify to AWS elastic container service
        
       Author : dmdmdmdm
       Score  : 48 points
       Date   : 2024-08-15 07:40 UTC (15 hours ago)
        
 (HTM) web link (gds.blog.gov.uk)
 (TXT) w3m dump (gds.blog.gov.uk)
        
       | Rinzler89 wrote:
       | I love the UK Gov for being so transparent and self-sufficient on
       | their digital infrastructure compared to my developed EU country
       | where they usually just outsource it to some major consultancy,
       | who's in bed with the politicians, which then farms it out to the
       | cheapest bidder in Eastern Europe for 10x the amount of money it
       | would cost to make it themselves and getting 10x worse quality,
       | just so that taxpayer money gets funneled into the politically
       | friendly business pockets. Privatize the profits, socialize the
       | losses/externalities.
       | 
       | I remember there was a heated debate last year about greed-
       | flation in the country as people blamed the large retailers for
       | simultaneously jacking up prices in sync leading to much higher
       | prices on the same goods compared to neighboring Germany and the
       | government said _"well, we could build an online price
       | comparison system to track prices and then check the validity of
       | these claims, but oh shucks it's probably gonna take us a few
       | years and double digit million euros..."_, and then in response
       | some guy builds it in a weekend and posts it on GitHub for free,
       | showing how corrupt, clueless and scummy closed source government
       | funded digital projects are.
        
         | dijit wrote:
         | Given that hyperscaler cloud providers can be in the 5x-11x
         | cost increase territory; and AWS/Azure are _definitely_ guilty
         | of lobbying governments.... is your comment sarcastic?
         | 
         | EDIT: downvotes? What did I say that's untrue? Are so many
         | people really employed by these hyperscalers that they go
         | around downvoting things against them? Calm down- you'll have a
         | job.
        
           | _joel wrote:
           | There used to be several other UK-based public cloud providers
           | that gov.uk used through G-Cloud.
        
         | _joel wrote:
         | Oh, we can still outsource like the best of them, just look up
         | the NHS Track and Trace app to see how badly we can do it (or
         | line the pockets of certain people, whichever you prefer).
        
           | edent wrote:
           | Hello. I worked on that app. You are wrong.
           | 
           | Firstly, "Track and Trace" is what the Post Office do.
           | Perhaps you're thinking of "Test and Trace"?
           | 
           | Secondly, the UK Government hasn't had app development skills
           | in-house for a long time - see
           | https://gds.blog.gov.uk/2013/03/12/were-not-appy-not-appy-
           | at... - so there was little choice but to use an external
           | provider.
           | 
           | Thirdly, the initial version of the app was built by an
           | external team who were already engaged with DHSC. They had
           | won a competitive tender (which was published) but, as I'm
           | sure you can understand, there wasn't time to run a new one
           | for the Contact Tracing app.
           | 
           | Fourthly, if you have evidence that the development of the
           | app - which was done quickly, with all source code and design
           | documents published as open source, and which saved lives
           | (https://www.ox.ac.uk/news/2023-02-22-nhs-covid-19-app-
           | saved-...) - was somehow corrupt, I'm sure we'd all like to
           | see it.
           | 
           | Fifthly, if you're about to say "£37bn" - have a read of
           | this https://fullfact.org/health/NHS-test-and-trace-
           | app-37-billio...
        
             | seagullriffic wrote:
             | Did this app have anything to do with the huge QR codes
             | which encoded far too much information?
             | https://www.revk.uk/2020/09/how-not-to-qr-nhs-c19-app.html
        
               | edent wrote:
               | Yes - another bit that I worked on (albeit tangentially).
               | 
               | The QR code stuff was an interesting one. There was a
               | worry that people would generate fraudulent codes - hence
               | the weird (in my opinion) signing requirements.
               | 
               | Similarly, with a URL there was a risk that people would
               | open the page and think that was all they needed to do.
               | Hence a code designed to be read by a specific app.
               | 
               | I _think_ (and you'll have to forgive my slightly hazy
               | memory of a difficult time) that it was based on the same
               | code New Zealand were using for their check-in service.
        
               | seagullriffic wrote:
               | Interesting! That article I linked had lingered in my
               | memory for a while, so good to hear a response to it!
        
             | _joel wrote:
             | I stand corrected, thanks!
        
           | pjc50 wrote:
           | The big scandal was actually PPE
           | https://www.theguardian.com/uk-news/2023/dec/17/how-the-
           | mich...
        
         | akimbostrawman wrote:
         | A foreign hosting provider like AWS is the opposite of
         | self-sufficient.
        
           | fleischhauf wrote:
           | while I agree, I think being able to do this by themselves is
           | already way way ahead of German administration capabilities.
           | So you need to see this from a positive angle. I for one am
           | jealous.
        
             | croes wrote:
             | If they try, they get sacked by lobbyists.
        
         | seagullriffic wrote:
         | Unfortunately this very application, Gov.uk Notify, is
         | currently being used by Councils to send emails to residents
         | directing them to an outsourced company's website,
         | https://www.householdresponse.com, to input sensitive details
         | about where they live.
         | 
         | The emails are phishy to the extreme and there's no indication
         | or way to verify that it's an official service. See for example
         | https://www.bleepingcomputer.com/news/security/uk-gov-keeps-...
         | 
         | While some parts of Gov.uk are done well, there are still
         | terrible practices everywhere due to cheapness and ignorance
         | and presumably because the Gov UK people can't do everything,
         | unfortunately, even though it would be cheaper and better if
         | they did.
        
       | b800h wrote:
       | Great article. One question -
       | 
       | This is a high-throughput service, so I'm interested in whether
       | Python is necessarily the right choice. It could be that it's a
       | minor concern and the latency is all elsewhere in the
       | architecture anyway. I'd be interested in opinions on here.
        
         | JimDabell wrote:
         | This is a notification service. They sent 2.6M emails, 2.8M
         | text messages, and 60K letters yesterday [0]. That's about 30
         | emails per second, 32 SMS per second, and less than one letter
         | per second. That's not nothing, but it doesn't need crazy
         | processing efficiency either. Most of the work will be I/O
         | bound, just messages sitting in a queue waiting for the
         | receiving service to be able to accept them. Python is fine for
         | this. You don't need email to be sent ASAP; in fact a lot of
         | work goes into making sure you don't send high volume too
         | quickly in case you look like spam.
         | 
         | [0]
         | https://www.notifications.service.gov.uk/features/performanc...
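The per-second figures quoted above follow from dividing each daily total by the 86,400 seconds in a day. A minimal sketch of that arithmetic, using only the rounded daily totals given in the comment (the names `daily_volumes` and `per_second` are illustrative and not taken from GOV.UK Notify):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily totals as quoted in the comment above (rounded figures).
daily_volumes = {
    "emails": 2_600_000,
    "text messages": 2_800_000,
    "letters": 60_000,
}


def per_second(daily_total: int) -> float:
    """Average rate per second, assuming volume is spread evenly over the day."""
    return daily_total / SECONDS_PER_DAY


rates = {channel: per_second(total) for channel, total in daily_volumes.items()}

for channel, rate in rates.items():
    print(f"{channel}: {rate:.1f} per second")
```

These are averages only. Real traffic is unlikely to be spread evenly over 24 hours, so peak rates would be higher than the numbers this prints.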
        
           | ustad wrote:
           | Thanks for the extra info. And I agree those numbers are
           | nothing to get excited about.
        
         | ustad wrote:
         | Why do you think it's great?
         | 
         | And how do you know it's high-throughput?
        
         | matthewmacleod wrote:
         | The throughput is described as "thousands of requests per
         | minute" - with modern hardware that's likely not something
         | you'd even have to think twice about. It would probably run
         | happily on a laptop!
        
           | b800h wrote:
           | Fair point.
        
       | Angostura wrote:
       | Background on the decommissioning of the GOV.UK Platform as a
       | Service (PaaS): https://gds.blog.gov.uk/2022/07/12/why-weve-
       | decided-to-decom...
        
         | djoldman wrote:
         | Anyone know if there is any way to find out what PaaS cost the
         | government?
        
           | Neil44 wrote:
           | FOI request?
        
       | vindex10 wrote:
       | I'm wondering how usual it is to host the infrastructure of
       | national services with a foreign cloud provider?
        
         | EwanToo wrote:
         | It's pretty common, all the biggest clouds are USA or China
         | owned.
         | 
         | In the UK government services go through information security
         | classification to determine what level of security is needed,
         | with the most confidential stuff still being self-hosted.
         | 
         | I assume most countries operate that way.
        
           | b800h wrote:
           | There was a UK-based cloud provider. Unfortunately it
           | collapsed, leading to a lot of costly replatforming.
           | 
           | https://www.civilserviceworld.com/news/article/cabinet-
           | offic...
        
             | everfrustrated wrote:
             | Probably couldn't afford their new VMWare bill
        
         | BartjeD wrote:
         | In the Netherlands, critical infrastructure is required to be
         | hosted in government cloud data centers.
         | 
         | An exception is possible if, after a risk assessment and a
         | determination that no state secrets may be exposed, a
         | government body decides to use a commercial cloud provider.
         | 
         | The private cloud providers list is then filtered by whether or
         | not their country of origin / incorporation, or effective
         | control, has an effective cyber-control program it runs against
         | the Netherlands or against Dutch interests. This arguably
         | includes corporate espionage programs.
        
           | vindex10 wrote:
           | https://www.notifications.service.gov.uk/features/who-can-
           | us...
           | 
           | * central government departments
           | 
           | * local authorities
           | 
           | * the armed forces
           | 
           | * the NHS
           | 
           | * the emergency services
           | 
           | * GP surgeries
           | 
           | * state-funded schools
           | 
           | looks quite critical to me
        
             | BartjeD wrote:
             | Yes, personally I don't think it's a good idea to host
             | these things with the US companies. As a citizen I prefer
             | it's in my own country, unless it's really not critical or
             | interesting information / services.
             | 
             | The UK made a different choice.
        
               | patrakov wrote:
               | Is the gov.uk website infrastructure compliant with their
               | own Cyber Essentials requirements? I very much doubt it,
               | as the anti-malware requirements applicable to cloud
               | providers that are not using Windows or MacOS ([1],
               | section 5, subsection "Requirements", option "Application
               | allow listing" on page numbered 14 in the corner) are not
               | implementable as worded. Using Azure instead of AWS could
               | have helped here.
               | 
               | [1] https://www.ncsc.gov.uk/files/Cyber-Essentials-
               | Requirements-...
        
             | djtango wrote:
             | > GOV.UK Notify makes it easy for public sector service
             | teams to send emails, text messages and letters.
             | 
             | Doesn't seem that critical to me. Important, but doesn't
             | pass the sniff test of "is this a matter of national
             | security" that would justify self-hosting ultimately
             | slowing down development and making it more expensive and
             | in effect less feature-rich for taxpayers
             | 
             | EDIT the API docs suggest this is used for sending formal
             | Notifications en-masse rather than mission-critical comms
        
             | scaryclam wrote:
             | The kinds of messages that get sent via email or text are
             | usually pretty unimportant. Important things tend to be
             | sent via letter or a phone call.
             | 
             | It's not likely to be anything critical.
        
               | jamessb wrote:
               | Gov.Uk Notify does support sending letters (as well as
               | email and SMS/text messages):
               | https://www.notifications.service.gov.uk/using-notify
        
               | vindex10 wrote:
               | In Norway, I received my residence permit by email, and I
               | mean the official document.
               | 
               | I stressed a bit when after a year I was trying to find
               | the paper letter, until I eventually realized xD
        
             | PontifexMinimus wrote:
             | You should also add GCHQ, MI5 and MI6 who all use AWS to
             | host top secret material (https://archive.md/n2cNB).
             | 
             | As an IT professional, I would question whether that makes
             | sense.
             | 
             | But what do I know? I'm sure the people who run the country
             | -- people of the calibre of Liz Truss, no less -- know what
             | they're doing!
        
           | nonrandomstring wrote:
           | Storage and processing location is a big, big trust issue on
           | the world stage. There are all sorts of wobbly notions of
           | alignment. And no doubt lots of leverage going on behind.
           | 
           | If you made a democratic poll and asked people, "would you
           | like national data stored in your own country or elsewhere?"
           | there would be no ambiguity in the answer. And that would not
           | be an "uninformed" poll, since matters of public trust should
           | direct policy and not technics and economics.
           | 
           | Of course there are good reasons for outsourcing, like
           | geographical diversity, but those raise new and, I think,
           | separate questions like "Who would you trust with our
           | backups?". That nuance of examination seems to be missing in
           | the UK at present.
        
             | pjc50 wrote:
             | > "would you like national data stored in your own country
             | or elsewhere?"
             | 
             | And if you ask the question "how much more would you pay to
             | host UK data in the UK with UK owned providers only", you
             | get the answer £0. So it doesn't happen.
        
               | nonrandomstring wrote:
               | Yes. I mean it's a fair objection to that question as is.
               | Many people expect technology to happen magically and for
               | free. When it comes to critical infrastructure like
               | roads, reservoirs and the army, nobody asks "how much
               | would I pay?", because people elected a government to
               | make those decisions and raise taxes appropriately.
               | Ironically one big missing source of income is fair tax
               | on overseas tech. Although we have a body that recognises
               | digital as critical national infrastructure [0], some
               | people in London haven't got the memo yet.
               | 
               | [0] https://www.ncsc.gov.uk/news/ncsc-warns-of-emerging-
               | threat-t...
        
         | 46Bit wrote:
         | It's pretty normal for ordinary government workloads in the UK,
         | or at least it was at GDS. Using niche suppliers who cater to
         | government paranoia is expensive, and they're usually much less
         | mature than hyperscaler platforms. It's also open for debate
         | whether those niche, inflexible suppliers result in a genuinely
         | more hardened target or not.
        
         | pjc50 wrote:
         | You have to understand that buying computers comes out of the
         | capital budget, and is several times more expensive than just
         | leasing them for this year; and that hiring staff runs into
         | severe civil service pay issues. Once "buy some computers and
         | hire staff to manage them" has been ruled out by politics,
         | buying hosting on the open market becomes the remaining
         | reasonable choice, and nobody got fired for choosing AWS.
        
           | vidarh wrote:
           | You can lease or even rent the servers without paying cloud
           | prices, and there's a wide range of companies providing
           | devops services on contract. So really, the main reason is
           | your last clause - AWS is "safe" even though you might as
           | well set cash on fire.
        
             | pjc50 wrote:
             | But then you have to run two competitive tenders, one for
             | the servers and one for the contract devops. How much does
             | that cost and how long does it take?
             | 
             | https://www.fgould.com/uk-europe/articles/cutting-the-
             | cost-o...
        
               | vidarh wrote:
               | Plenty of companies would happily offer you a package for
               | both.
        
         | benrutter wrote:
         | Usual: very
         | 
         | Good: Not so much
         | 
         | Unfortunately, cloud provision isn't very competitive and is
         | very US/China centric.
         | 
         | I was at a talk recently about how one of the UK's major
         | infrastructure providers was building their architecture, and
         | I was pretty freaked out by the level of vendor lock-in.
         | 
         | Would love to see more governments viewing this as the security
         | risk it is, but I'm not holding my breath.
        
       | dangsux wrote:
       | I know a couple of people close to this. They work alongside a
       | load of CV-Driven-Development offshore Capgemini employees. They
       | had to be mentored by junior members of the team for how to even
       | connect to infrastructure.
       | 
       | Bear in mind a mid grade engineer (low grade in the real world)
       | is a £800-1000/day line item. Capgem tell their employees to lie
       | about their capabilities to get "bums on seats" for public sector
       | contracts. Their own employees are only on 45k-60k for a senior
       | engineer.
       | 
       | So much wastage of public funds.
       | 
       | I do wonder why they chose AWS for this when DDaT is primarily
       | Azure.
        
       | testplzignore wrote:
       | > random.randint(0, 100)
       | 
       | That 0 should be a 1. As written, I think 1 out of 101 requests
       | would go to the new target when percent was set to 0.
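To make the off-by-one concrete: Python's `random.randint(a, b)` includes both endpoints, so `randint(0, 100)` has 101 possible values. The sketch below assumes the article compares the draw against the configured percentage with `<=`. That comparison is an assumption made here so the numbers match the comment, and `routed_fraction` is a hypothetical helper, not code from the article. It enumerates the outcomes exactly instead of sampling.

```python
from fractions import Fraction


def routed_fraction(percent: int, low: int, high: int = 100) -> Fraction:
    """Exact share of draws from randint(low, high) that satisfy draw <= percent.

    Assumes (hypothetically) that a request goes to the new target when the
    random draw is less than or equal to the configured percentage.
    """
    outcomes = range(low, high + 1)  # randint includes both endpoints
    hits = sum(1 for draw in outcomes if draw <= percent)
    return Fraction(hits, len(outcomes))


# With randint(0, 100), a setting of 0 percent still routes some traffic.
print(routed_fraction(0, low=0))    # 1/101
# With randint(1, 100), 0 percent routes nothing and 100 percent routes everything.
print(routed_fraction(0, low=1))    # 0
print(routed_fraction(100, low=1))  # 1
```

Under that assumed comparison, starting the range at 1 makes the setting behave as a true percentage: `percent` out of 100 equally likely draws are routed to the new target.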
        
       | VoodooJuJu wrote:
       | How is interesting - but why?
        
         | petepete wrote:
         | It used to run on GOV.UK PaaS, which was decommissioned last
         | year; all government services had to find new homes - mostly
         | in Azure or AWS.
         | 
         | https://gds.blog.gov.uk/2022/07/12/why-weve-decided-to-decom...
        
           | nonrandomstring wrote:
           | How would you rebuild a secure national "cloud" service if
           | given the mandate and the money? (come and talk to us on
           | cybershow if you'd like!)
        
             | pjc50 wrote:
             | > if given the mandate and the money?
             | 
             | Well, those are the difficult bits. If you also add in
             | "control over staff pay scales", which is the other thing
             | needed to make it work, it becomes a relatively simple job.
             | Five to six years for planning permission and we could get
             | started quickly after that.
             | 
             | https://www.itpro.com/infrastructure/data-
             | centres/permission...
        
             | petepete wrote:
             | Thank you. I am but a tiny cog in the huge machine and
             | wouldn't know where to start when it came to rebuilding a
             | national cloud.
             | 
             | edit: read the "Like Click, but with the safety catch taken
             | off - Bang!" review and subscribed immediately
        
               | nonrandomstring wrote:
               | Thanks. No worries. British "national cloud" will be lots
               | of shed clouds wired up together. Or we'll create a storm
               | in a teacup. :)
        
       | ustad wrote:
       | I'm not impressed.
       | 
       | There is no mention of how much that cost and how much traffic
       | the system handles. What about recurring costs?
       | 
       | Moreover, on mobile, the persistent bottom left link to the
       | popover (I think the cookie consent banner) does not behave correctly
       | when you have already scrolled down and leaves a border when
       | activated.
        
       | dtech wrote:
       | I do wonder why they decided to tie themselves heavily to AWS
       | tech over using cloud-agnostic alternatives. You'd think for a
       | government the latter has higher value than for private business,
       | and even there it's a consideration.
        
         | szszrk wrote:
         | Notice that each major cloud vendor has dedicated gov
         | regions...
         | 
         | So I guess the tie is larger than it seems at first sight.
        
         | politelemon wrote:
         | ECS isn't exactly tying, because ultimately it's still Docker
         | containers, so moving out wouldn't be a tricky prospect. A
         | cloud agnostic solution though would likely mean k8s and bring
         | with it much more complexity and overhead (and is also a form
         | of lock in).
        
           | aquaticsunset wrote:
           | I half agree with you. We just went through an ECS to EKS
           | migration, and we're still incredibly dependent on AWS. The
           | hard part isn't the container orchestration system or even
           | containerizing your workload - it's all the other crap you
           | need to develop and maintain around it. Your databases,
           | networking stack, MQ brokers, secrets managers, and
           | everything else are still stuck to whatever cloud provider
           | you're using.
           | 
           | EKS really isn't much harder to build out than ECS - but it
           | doesn't set you up to be much more cloud agnostic.
        
       | marcus0x62 wrote:
       | 3 months from now: how we spent hundreds of hours optimizing our
       | AWS bill to save 10%
       | 
       | 6 months from now: top ten reasons why it isn't a problem AWS
       | costs us twice as much as self hosting.
       | 
       | 12 months from now: how we saved 75% by migrating our app back to
       | our "legacy" data center.
        
         | nimbius wrote:
         | facts.
         | 
         | anyone intentionally migrating infrastructure to the cloud in
         | 2024 hasn't seen the bill, or is spending the taxpayer's pound.
         | 
         | "Gov.uk notify" isn't a critical business service. it doesn't
         | need multi-az or multi region failover. running a docker
         | container itself isn't a feat of achievement.
         | 
         | you could save money over the longterm by running a pair of
         | pizzaboxes in Cardiff and Edenborough running orchestrator.
         | hell, you could arguably run notify in a pensioner's basement
         | off a pi powered by a solar panel.
        
           | TeaDude wrote:
           | I'm really disappointed that "Edenborough" isn't some
           | hitherto unknown place with a similar name to the one I'm
           | thinking of.
        
         | nerdjon wrote:
         | We see this argument anytime a cloud migration comes up and
         | it's a severe simplification.
         | 
         | Having your own datacenter requires someone to manage that
         | hardware. There are costs for the space you are in.
         | 
         | But you also have to have the hardware for your peak
         | utilization. If you are either very bursty or have significant
         | times of little to no activity you should be able to scale up
         | and down.
         | 
         | Yes a simple 1:1 is likely going to cost more (depending on
         | your scale, if your Cloud bill is under $200 not sure you could
         | really do much better) but then you're not utilizing the cloud to
         | its advantage.
         | 
         | That is before going into the savings from things like spot
         | instances for processes that don't have to be real time.
         | 
         | Yes this doesn't take into account what else they might be
         | running to share resources with. But not every organization is
         | going to be running a bunch of different workloads that can
         | fully utilize their hardware at different times
        
           | marcus0x62 wrote:
           | > We see this argument anytime a cloud migration comes up and
           | it's a severe simplification.
           | 
           | Here's a less simplified example:
           | https://world.hey.com/dhh/the-big-cloud-exit-faq-20274010
        
             | nerdjon wrote:
             | Not every company operates at that scale.
             | 
             | And if you click their first article about making the
             | decision, they even acknowledged what I said about the
             | ability to scale up and down being a major feature of the
             | cloud and then said that does not apply to them anymore but
             | it also was a big advantage for them at one point.
             | 
             | They are an example where moving to their own hardware made
             | sense but for many companies it doesn't.
             | 
             | For me I need to be able to spin up several hundred high
             | power GPU instances for a few hours and then it's quiet for
             | a couple of weeks.
             | 
             | The couple thousand dollars (if that thanks to spot) it
             | costs to run that workload is far better than the cost of
             | that same hardware for machines that would not be doing
             | anything most of the time.
             | 
             | So yes you are still oversimplifying the situation
             | ignoring that there are real reasons that a company would
             | use the cloud, which your example even references.
        
               | marcus0x62 wrote:
               | Here's an example at a much smaller scale.
               | https://idlewords.com/talks/website_obesity.htm#heavyclouds
               | 
               | Search for "Let me give you a concrete example" if you
               | don't want to read the whole thing.
               | 
               | If you want, I can go ahead and find a "medium" sized
               | example for when you respond back that this example was
               | too small.
               | 
               | > For me I need to be able to spin up several hundred
               | high power GPU instances for a few hours and then it's
               | quiet for a couple of weeks.
               | 
               | That's great. For you and your application. Nobody is
               | saying there is NO application where the cloud makes
               | sense. I certainly didn't say that. But, there are many,
               | many, applications where IaaS/PaaS ends up being much
               | more expensive than on prem and where the flexibility of
               | the cloud is, if not completely irrelevant, just not
               | worth the extra cost.
               | 
               | I'm guessing the scale-out needs of the UK's notification
               | app are going to fall squarely in the category of "way
               | more expensive to run in the cloud," but, hey, who knows?
               | Maybe they sign their notifications in some blockchain
               | ledger on every second Tuesday from 8 - 9:17 AM and need
               | to rent some GPUs occasionally.
        
               | nerdjon wrote:
               | The reason I responded is your original post gave the
               | impression that you think any workload in the cloud will
               | be cheaper on your own hardware.
               | 
               | Which is not true.
               | 
               | I am reluctant to try to make any assumptions about the
               | workflow here since I would assume they had run the
               | numbers to estimate what their cost would be. Possibly
               | looking at their utilization.
               | 
               | The nature of what they built does seem to be like it
               | would be a burst application. But the details on that are
               | not here and are just assumptions on both of our parts.
               | 
               | Admittedly I responded to you because of this response to
               | your post also:
               | 
               | > anyone intentionally migrating infrastructure to the
               | cloud in 2024 hasn't seen the bill, or is spending the
               | taxpayer's pound.
               | 
               | Which again is a simplification of the situation. It's a
               | blanket statement that seems more anti cloud than
               | reality.
               | 
               | Yes there are workloads that make sense to run on your
               | own hardware, but many, many exist that don't make sense
               | either.
        
       | Fluorescence wrote:
       | This is their published reason for shutting down their PaaS:
       | 
       | https://gds.blog.gov.uk/2022/07/12/why-weve-decided-to-decom...
       | 
       | > GOV.UK PaaS has not seen the rapid and continued growth that
       | we've seen with some of our other platform products, and is now
       | at a point where we either invest heavily in some significant
       | technical architecture changes, or we make the difficult decision
       | to sunset the product. We have decided to do the latter
       | 
       | Not terribly convincing. Not seeing "rapid and continued growth"?
       | Feels like the junk words you say as a ceremony for people who
       | sign-off any crime if justified by "growth".
       | 
       | I think I'd need to see transparency on Amazon lobbying and
       | revolving doors with GDS, let's see... oh, look who "advised"
       | them on this move:
       | 
       | https://www.civilserviceworld.com/professions/article/amazon...
       | 
       | > GDS retained Amazon UK boss Doug Gurr as an adviser.
       | 
       | Which gets worse...
       | 
       | > Gurr would have a hand in choosing the government chief digital
       | officer
       | 
       | Seems beyond satire that the Head of Amazon UK has a role in
       | picking the civil servant that makes hosting choices.
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:01 UTC)