[HN Gopher] How we migrated Gov.uk notify to AWS elastic contain...
___________________________________________________________________
How we migrated Gov.uk notify to AWS elastic container service
Author : dmdmdmdm
Score : 48 points
Date : 2024-08-15 07:40 UTC (15 hours ago)
(HTM) web link (gds.blog.gov.uk)
(TXT) w3m dump (gds.blog.gov.uk)
| Rinzler89 wrote:
| I love the UK Gov for being so transparent and self-sufficient on
| their digital infrastructure compared to my developed EU country
| where they usually just outsource it to some major consultancy,
| who's in bed with the politicians, which then farms it out to the
| cheapest bidder in Eastern Europe for 10x the amount of money it
| would cost to make it themselves and getting 10x worse quality,
| just so that taxpayer money gets funneled into the politically
| friendly business pockets. Privatize the profits, socialize the
| losses/externalities.
|
| I remember there was a heated debate last year about greedflation
| in the country as people blamed the large retailers for
| simultaneously jacking up prices in sync leading to much higher
| prices on the same goods compared to neighboring Germany and the
| government said _"well, we could build an online price
| comparison system to track prices and then check the validity of
| these claims, but oh shucks it's probably gonna take us a few
| years and double digit million euros..."_, and then in response
| some guy builds it in a weekend and posts it on GitHub for free,
| showing how corrupt, clueless and scummy closed source government
| funded digital projects are.
| dijit wrote:
| Given that hyperscaler cloud providers can be in the 5x-11x
| cost increase territory; and AWS/Azure are _definitely_ guilty
| of lobbying governments.... is your comment sarcastic?
|
| EDIT: downvotes? What did I say that's untrue? Are so many
| people really employed by these hyperscalers that they go
| around downvoting things against them? Calm down- you'll have a
| job.
| _joel wrote:
| There used to be several other UK-based public cloud providers
| that gov.uk used through G-Cloud.
| _joel wrote:
| Oh, we can still outsource like the best of them, just look up
| the NHS Track and Trace app to see how badly we can do it (or
| line the pockets of certain people, whichever you prefer).
| edent wrote:
| Hello. I worked on that app. You are wrong.
|
| Firstly, "Track and Trace" is what the Post Office do.
| Perhaps you're thinking of "Test and Trace"?
|
| Secondly, the UK Government hasn't had app development skills
| in-house for a long time - see
| https://gds.blog.gov.uk/2013/03/12/were-not-appy-not-appy-
| at... - so there was little choice but to use an external
| provider.
|
| Thirdly, the initial version of the app was built by an
| external team who were already engaged with DHSC. They had
| won a competitive tender (which was published) but, as I'm
| sure you can understand, there wasn't time to run a new one
| for the Contact Tracing app.
|
| Fourthly, if you have evidence that the development of the
| app - which was done quickly, with all source code and design
| documents published as open source, and which saved lives
| (https://www.ox.ac.uk/news/2023-02-22-nhs-covid-19-app-
| saved-...) - was somehow corrupt, I'm sure we'd all like to
| see it.
|
| Fifthly, if you're about to say "£37bn" - have a read of
| this https://fullfact.org/health/NHS-test-and-trace-
| app-37-billio...
| seagullriffic wrote:
| Did this app have anything to do with the huge QR codes
| which encoded far too much information?
| https://www.revk.uk/2020/09/how-not-to-qr-nhs-c19-app.html
| edent wrote:
| Yes - another bit that I worked on (albeit tangentially).
|
| The QR code stuff was an interesting one. There was a
| worry that people would generate fraudulent codes - hence
| the weird (in my opinion) signing requirements.
|
| Similarly, with a URL there was a risk that people would
| open the page and think that was all they needed to do.
| Hence a code designed to be read by a specific app.
|
| I _think_ (and you'll have to forgive my slightly hazy
| memory of a difficult time) that it was based on the same
| code New Zealand were using for their check-in service.
| seagullriffic wrote:
| Interesting! That article I linked had lingered in my
| memory for a while, so good to hear a response to it!
| _joel wrote:
| I stand corrected, thanks!
| pjc50 wrote:
| The big scandal was actually PPE
| https://www.theguardian.com/uk-news/2023/dec/17/how-the-
| mich...
| akimbostrawman wrote:
| A foreign hosting provider like AWS is the opposite of
| self-sufficient
| fleischhauf wrote:
| while I agree, I think being able to do this by themselves is
| already way way ahead of German administration capabilities.
| So you need to see this from a positive angle. I for one am
| jealous.
| croes wrote:
| If they try, they get sacked by lobbyists.
| seagullriffic wrote:
| Unfortunately this very application, Gov.uk Notify, is
| currently being used by Councils to send emails to residents
| directing them to an outsourced company's website,
| https://www.householdresponse.com, to input sensitive details
| about where they live.
|
| The emails are phishy to the extreme and there's no indication
| or way to verify that it's an official service. See for example
| https://www.bleepingcomputer.com/news/security/uk-gov-keeps-...
|
| While some parts of Gov.uk are done well, there are still
| terrible practices everywhere due to cheapness and ignorance
| and presumably because the Gov UK people can't do everything,
| unfortunately, even though it would be cheaper and better if
| they did.
| b800h wrote:
| Great article. One question -
|
| This is a high-throughput service, so I'm interested in whether
| Python is necessarily the right choice. It could be that it's a
| minor concern and the latency is all elsewhere in the
| architecture anyway. I'd be interested in opinions on here.
| JimDabell wrote:
| This is a notification service. They sent 2.6M emails, 2.8M
| text messages, and 60K letters yesterday [0]. That's about 30
| emails per second, 32 SMS per second, and less than one letter
| per second. That's not nothing, but it doesn't need crazy
| processing efficiency either. Most of the work will be I/O
| bound, just messages sitting in a queue waiting for the
| receiving service to be able to accept them. Python is fine for
| this. You don't need email to be sent ASAP; in fact a lot of
| work goes into making sure you don't send high volume too
| quickly in case you look like spam.
|
| [0]
| https://www.notifications.service.gov.uk/features/performanc...
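The per-second figures above follow from dividing each daily total by the number of seconds in a day. A minimal sketch of that arithmetic, using only the daily volumes quoted in the comment (the names `daily_volumes` and `per_second` are illustrative, not from Notify's code):

```python
# Average send rate implied by the daily volumes quoted in the comment.
# These are day-long averages; peak rates would be higher.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures as quoted in the comment (rounded daily totals).
daily_volumes = {
    "emails": 2_600_000,
    "text messages": 2_800_000,
    "letters": 60_000,
}


def per_second(daily_count):
    """Return the average number of messages per second over one day."""
    return daily_count / SECONDS_PER_DAY


if __name__ == "__main__":
    for channel, count in daily_volumes.items():
        print(f"{channel}: {per_second(count):.1f} per second on average")
```

Because this is an average over 24 hours, it says nothing about bursts; a real capacity estimate would need the peak-hour distribution, which the thread does not give.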
| ustad wrote:
| Thanks for the extra info. And I agree those numbers are
| nothing to get excited about.
| ustad wrote:
| Why do you think it's great?
|
| And how do you know it's high-throughput?
| matthewmacleod wrote:
| The throughput is described as "thousands of requests per
| minute" - with modern hardware that's likely not something
| you'd even have to think twice about. It would probably run
| happily on a laptop!
| b800h wrote:
| Fair point.
| Angostura wrote:
| Background on the decommissioning of the Gov.uk Platform as a
| Service (PaaS) https://gds.blog.gov.uk/2022/07/12/why-weve-
| decided-to-decom...
| djoldman wrote:
| Anyone know if there is any way to find out what PaaS cost the
| government?
| Neil44 wrote:
| FOI request?
| vindex10 wrote:
| I'm wondering how usual it is to host the infrastructure of
| national services with a foreign cloud provider?
| EwanToo wrote:
| It's pretty common, all the biggest clouds are USA or China
| owned.
|
| In the UK government services go through information security
| classification to determine what level of security is needed,
| with the most confidential stuff still being self-hosted.
|
| I assume most countries operate that way.
| b800h wrote:
| There was a UK-based cloud provider. Unfortunately it
| collapsed, leading to a lot of costly replatforming.
|
| https://www.civilserviceworld.com/news/article/cabinet-
| offic...
| everfrustrated wrote:
| Probably couldn't afford their new VMWare bill
| BartjeD wrote:
| In the Netherlands, critical infrastructure is required to be
| hosted in government cloud data centers.
|
| An exception is possible if, after a risk assessment and a
| determination that no state secrets may be exposed, a
| government body decides to use a commercial cloud provider.
|
| The private cloud providers list is then filtered by whether or
| not their country of origin / incorporation, or effective
| control, has an effective cyber-control program it runs against
| the Netherlands or against Dutch interests. This arguably
| includes corporate espionage programs.
| vindex10 wrote:
| https://www.notifications.service.gov.uk/features/who-can-
| us...
|
| * central government departments
|
| * local authorities
|
| * the armed forces
|
| * the NHS
|
| * the emergency services
|
| * GP surgeries
|
| * state-funded schools
|
| looks quite critical to me
| BartjeD wrote:
| Yes, personally I don't think it's a good idea to host
| these things with the US companies. As a citizen I prefer
| it's in my own country, unless it's really not critical or
| interesting information / services.
|
| The UK made a different choice.
| patrakov wrote:
| Is the gov.uk website infrastructure compliant with their
| own Cyber Essentials requirements? I very much doubt it,
| as the anti-malware requirements applicable to cloud
| providers that are not using Windows or MacOS ([1],
| section 5, subsection "Requirements", option "Application
| allow listing" on page numbered 14 in the corner) are not
| implementable as worded. Using Azure instead of AWS could
| have helped here.
|
| [1] https://www.ncsc.gov.uk/files/Cyber-Essentials-
| Requirements-...
| djtango wrote:
| > GOV.UK Notify makes it easy for public sector service
| teams to send emails, text messages and letters.
|
| Doesn't seem that critical to me. Important, but doesn't
| pass the sniff test of "is this a matter of national
| security" that would justify self-hosting ultimately
| slowing down development and making it more expensive and
| in effect less feature-rich for taxpayers
|
| EDIT the API docs suggest this is used for sending formal
| Notifications en-masse rather than mission-critical comms
| scaryclam wrote:
| The kinds of messages that get sent via email or text are
| usually pretty unimportant. Important things tend to be
| sent via letter or a phone call.
|
| It's not likely to be anything critical.
| jamessb wrote:
| Gov.Uk Notify does support sending letters (as well as
| email and SMS/text messages):
| https://www.notifications.service.gov.uk/using-notify
| vindex10 wrote:
| In Norway, I received my residence permit by email, and I
| mean the official document.
|
| I stressed a bit when after a year I was trying to find
| the paper letter, until I eventually realized xD
| PontifexMinimus wrote:
| You should also add GCHQ, MI5 and MI6 who all use AWS to
| host top secret material (https://archive.md/n2cNB).
|
| As an IT professional, I would question whether that makes
| sense.
|
| But what do I know? I'm sure the people who run the country
| -- people of the calibre of Liz Truss, no less -- know what
| they're doing!
| nonrandomstring wrote:
| Storage and processing location is a big, big trust issue on
| the world stage. There are all sorts of wobbly notions of
| alignment. And no doubt lots of leverage going on behind.
|
| If you made a democratic poll and asked people, "would you
| like national data stored in your own country or elsewhere?"
| there would be no ambiguity in the answer. And that would not
| be an "uninformed" poll, since matters of public trust should
| direct policy and not technics and economics.
|
| Of course there are good reasons for outsourcing, like
| geographical diversity, but those raise new and, I think,
| separate questions like "Who would you trust with our
| backups?". That nuance of examination seems to be missing in
| the UK at present.
| pjc50 wrote:
| > "would you like national data stored in your own country
| or elsewhere?"
|
| And if you ask the question "how much more would you pay to
| host UK data in the UK with UK owned providers only", you
| get the answer £0. So it doesn't happen.
| nonrandomstring wrote:
| Yes. I mean it's a fair objection to that question as is.
| Many people expect technology to happen magically and for
| free. When it comes to critical infrastructure like
| roads, reservoirs and the army, nobody asks "how much
| would I pay?", because people elected a government to
| make those decisions and raise taxes appropriately.
| Ironically one big missing source of income is fair tax
| on overseas tech. Although we have a body that recognises
| digital as critical national infrastructure [0], some
| people in London haven't got the memo yet.
|
| [0] https://www.ncsc.gov.uk/news/ncsc-warns-of-emerging-
| threat-t...
| 46Bit wrote:
| It's pretty normal for ordinary government workloads in the UK,
| or at least it was at GDS. Using niche suppliers who cater to
| government paranoia is expensive, and they're usually much less
| mature than hyperscaler platforms. It's also open for debate
| whether those niche, inflexible suppliers result in a genuinely
| more hardened target or not.
| pjc50 wrote:
| You have to understand that buying computers comes out of the
| capital budget, and is several times more expensive than just
| leasing them for this year; and that hiring staff runs into
| severe civil service pay issues. Once "buy some computers and
| hire staff to manage them" has been ruled out by politics,
| buying hosting on the open market becomes the remaining
| reasonable choice, and nobody got fired for choosing AWS.
| vidarh wrote:
| You can lease or even rent the servers without paying cloud
| prices, and there's a wide range of companies providing
| devops services on contract. So really, the main reason is
| your last clause - AWS is "safe" even though you might as
| well set cash on fire.
| pjc50 wrote:
| But then you have to run two competitive tenders, one for
| the servers and one for the contract devops. How much does
| that cost and how long does it take?
|
| https://www.fgould.com/uk-europe/articles/cutting-the-
| cost-o...
| vidarh wrote:
| Plenty of companies would happily offer you a package for
| both.
| benrutter wrote:
| Usual: very
|
| Good: Not so much
|
| Unfortunately, cloud provision isn't very competitive and is
| very US/China centric.
|
| I was at a talk recently around how one of the UK's major
| infrastructure providers was building their architecture, and
| I was pretty freaked by the level of vendor lock-in.
|
| Would love to see more governments viewing this as the security
| risk it is, but I'm not holding my breath.
| dangsux wrote:
| I know a couple of people close to this. They work alongside a
| load of CV-Driven-Development offshore Capgemini employees. They
| had to be mentored by junior members of the team for how to even
| connect to infrastructure.
|
| Bear in mind a mid grade engineer (low grade in the real world)
| is a £800-1000/day line item. Capgem tell their employees to lie
| about their capabilities to get "bums on seats" for public sector
| contracts. Their own employees are only on 45k-60k for a senior
| engineer.
|
| So much wastage of public funds.
|
| I do wonder why they chose AWS for this when DDaT is primarily
| Azure.
| testplzignore wrote:
| > random.randint(0, 100)
|
| That 0 should be a 1. As written, I think 1 out of 101 requests
| would go to the new target when percent was set to 0.
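The off-by-one is easy to check, because `random.randint(a, b)` includes both endpoints, so `randint(0, 100)` has 101 possible values. The sketch below assumes the article's check has the shape `random.randint(0, 100) <= percent`; the exact comparison is not quoted in this thread, so that shape and the function names are assumptions for illustration only.

```python
import random


def routes_to_new_as_written(percent, rng=random):
    """Assumed shape of the original check: draws from 0..100 inclusive."""
    return rng.randint(0, 100) <= percent


def routes_to_new_fixed(percent, rng=random):
    """Suggested fix: draw from 1..100 so percent=0 never matches."""
    return rng.randint(1, 100) <= percent


def exact_share(low, high, percent):
    """Exact fraction of equally likely draws in low..high that are <= percent."""
    values = range(low, high + 1)
    hits = sum(1 for value in values if value <= percent)
    return hits / len(values)


if __name__ == "__main__":
    # Compare the share of traffic sent to the new target at percent=0.
    print("as written:", exact_share(0, 100, 0))
    print("fixed:     ", exact_share(1, 100, 0))
```

Under that assumption, the 0-based draw sends 1 in 101 requests to the new target even at 0 percent, while the 1-based draw sends none at 0 percent and all of them at 100 percent.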
| VoodooJuJu wrote:
| How is interesting - but why?
| petepete wrote:
| It used to run on GOV.UK PaaS, which was decommissioned last year;
| all government services had to find new homes - mostly in Azure
| or AWS.
|
| https://gds.blog.gov.uk/2022/07/12/why-weve-decided-to-decom...
| nonrandomstring wrote:
| How would you rebuild a secure national "cloud" service if
| given the mandate and the money? (come and talk to us on
| cybershow if you'd like!)
| pjc50 wrote:
| > if given the mandate and the money?
|
| Well, those are the difficult bits. If you also add in
| "control over staff pay scales", which is the other thing
| needed to make it work, it becomes a relatively simple job.
| Five to six years for planning permission and we could get
| started quickly after that.
|
| https://www.itpro.com/infrastructure/data-
| centres/permission...
| petepete wrote:
| Thank you. I am but a tiny cog in the huge machine and
| wouldn't know where to start when it came to rebuilding a
| national cloud.
|
| edit: read the "Like Click, but with the safety catch taken
| off - Bang!" review and subscribed immediately
| nonrandomstring wrote:
| Thanks. No worries. British "national cloud" will be lots
| of shed clouds wired up together. Or we'll create a storm
| in a teacup. :)
| ustad wrote:
| I'm not impressed.
|
| There is no mention of how much that cost and how much traffic
| the system handles. What about recurring costs?
|
| Moreover, on mobile, the persistent bottom left link to the
| popover (I think the cookie consent banner) does not behave correctly
| when you have already scrolled down and leaves a border when
| activated.
| dtech wrote:
| I do wonder why they decided to tie themselves heavily to AWS
| tech over using cloud-agnostic alternatives. You'd think for a
| government the latter has higher value than for private business,
| and even there it's a consideration.
| szszrk wrote:
| Notice that each major cloud vendor has dedicated gov
| regions...
|
| So I guess the tie is larger than it seems at first sight.
| politelemon wrote:
| ECS isn't exactly tying, because ultimately it's still Docker
| containers, so moving out wouldn't be a tricky prospect. A
| cloud agnostic solution though would likely mean k8s and bring
| with it much more complexity and overhead (and is also a form
| of lock in).
| aquaticsunset wrote:
| I half agree with you. We just went through an ECS to EKS
| migration, and we're still incredibly dependent on AWS. The
| hard part isn't the container orchestration system or even
| containerizing your workload - it's all the other crap you
| need to develop and maintain around it. Your databases,
| networking stack, MQ brokers, secrets managers, and
| everything else are still stuck to whatever cloud provider
| you're using.
|
| EKS really isn't much harder to build out than ECS - but it
| doesn't set you up to be much more cloud agnostic.
| marcus0x62 wrote:
| 3 months from now: how we spent hundreds of hours optimizing our
| AWS bill to save 10%
|
| 6 months from now: top ten reasons why it isn't a problem AWS
| costs us twice as much as self hosting.
|
| 12 months from now: how we saved 75% by migrating our app back to
| our "legacy" data center.
| nimbius wrote:
| facts.
|
| anyone intentionally migrating infrastructure to the cloud in
| 2024 hasn't seen the bill, or is spending the taxpayers pound.
|
| "Gov.uk notify" isn't a critical business service. it doesn't
| need multi-az or multi region failover. running a docker
| container itself isn't a feat of achievement.
|
| you could save money over the long term by running a pair of
| pizzaboxes in Cardiff and Edenborough running orchestrator.
| hell, you could arguably run notify in a pensioner's basement
| off a pi powered by a solar panel.
| TeaDude wrote:
| I'm really disappointed that "Edenborough" isn't some
| hitherto unknown place with a similar name to the one I'm
| thinking of.
| nerdjon wrote:
| We see this argument anytime a cloud migration comes up and
| it's a severe simplification.
|
| Having your own datacenter requires someone to manage that
| hardware. There are costs for the space you are in.
|
| But you also have to have the hardware for your peak
| utilization. If you are either very bursty or have significant
| times of little to no activity you should be able to scale up
| and down.
|
| Yes a simple 1:1 is likely going to cost more (depending on
| your scale, if your Cloud bill is under $200 not sure you could
| really do much better) but then you're not utilizing the cloud to
| its advantage.
|
| That is before going into the savings with things like spot
| instances for processes that don't have to be real time.
|
| Yes this doesn't take into account what else they might be
| running to share resources with. But not every organization is
| going to be running a bunch of different workloads that can
| fully utilize their hardware at different times
| marcus0x62 wrote:
| > We see this argument anytime a cloud migration comes up and
| it's a severe simplification.
|
| Here's a less simplified example:
| https://world.hey.com/dhh/the-big-cloud-exit-faq-20274010
| nerdjon wrote:
| Not every company operates at that scale.
|
| And if you click their first article about making the
| decision, they even acknowledged what I said about the
| ability to scale up and down being a major feature of the
| cloud and then said that does not apply to them anymore but
| it also was a big advantage for them at one point.
|
| They are an example where moving to their own hardware made
| sense but for many companies it doesn't.
|
| For me I need to be able to spin up several hundred high
| power GPU instances for a few hours and then it's quiet for
| a couple of weeks.
|
| The couple thousand dollars (if that thanks to spot) it
| costs to run that workload is far better than the cost of
| that same hardware for machines that would not be doing
| anything most of the time.
|
| So yes, you are still oversimplifying the situation,
| ignoring that there are real reasons that a company would
| use the cloud, which your example even references.
| marcus0x62 wrote:
| Here's an example at a much smaller scale.
| https://idlewords.com/talks/website_obesity.htm#heavyclouds
|
| Search for "Let me give you a concrete example" if you
| don't want to read the whole thing.
|
| If you want, I can go ahead and find a "medium" sized
| example for when you respond back that this example was
| too small.
|
| > For me I need to be able to spin up several hundred
| high power GPU instances for a few hours and then it's
| quiet for a couple of weeks.
|
| That's great. For you and your application. Nobody is
| saying there is NO application where the cloud makes
| sense. I certainly didn't say that. But, there are many,
| many, applications where IaaS/PaaS ends up being much
| more expensive than on prem and where the flexibility of
| the cloud is, if not completely irrelevant, just not
| worth the extra cost.
|
| I'm guessing the scale-out needs of the UK's notification
| app are going to fall squarely in the category of "way
| more expensive to run in the cloud," but, hey, who knows?
| Maybe they sign their notifications in some blockchain
| ledger on every second Tuesday from 8 - 9:17 AM and need
| to rent some GPUs occasionally.
| nerdjon wrote:
| The reason I responded is your original post gave the
| impression that you think any workload in the cloud will
| be cheaper on your own hardware.
|
| Which is not true.
|
| I am reluctant to try to make any assumptions about the
| workflow here since I would assume they had run the
| numbers to estimate what their cost would be. Possibly
| looking at their utilization.
|
| The nature of what they built does seem to be like it
| would be a burst application. But the details on that are
| not here and are just assumptions on both of our parts.
|
| Admittedly I responded to you because of this response to
| your post also:
|
| > anyone intentionally migrating infrastructure to the
| cloud in 2024 hasn't seen the bill, or is spending the
| taxpayers pound.
|
| Which again is a simplification of the situation. It's a
| blanket statement that seems more anti cloud than
| reality.
|
| Yes there are workloads that make sense to run on your
| own hardware, but many, many exist that don't make sense
| either.
| Fluorescence wrote:
| This is their published reason for shutting down their PaaS:
|
| https://gds.blog.gov.uk/2022/07/12/why-weve-decided-to-decom...
|
| > GOV.UK PaaS has not seen the rapid and continued growth that
| we've seen with some of our other platform products, and is now
| at a point where we either invest heavily in some significant
| technical architecture changes, or we make the difficult decision
| to sunset the product. We have decided to do the latter
|
| Not terribly convincing. Not seeing "rapid and continued growth"?
| Feels like the junk words you say as a ceremony for people who
| sign-off any crime if justified by "growth".
|
| I think I'd need to see transparency on Amazon lobbying and
| revolving doors with GDS, let's see... oh, look who "advised"
| them on this move:
|
| https://www.civilserviceworld.com/professions/article/amazon...
|
| > GDS retained Amazon UK boss Doug Gurr as an adviser.
|
| Which gets worse...
|
| > Gurr would have a hand in choosing the government chief digital
| officer
|
| Seems beyond satire that the Head of Amazon UK has a role in
| picking the civil servant that makes hosting choices.
___________________________________________________________________
(page generated 2024-08-15 23:01 UTC)