[HN Gopher] Running servers and services well is not trivial (2018)
___________________________________________________________________
Running servers and services well is not trivial (2018)
Author : rognjen
Score : 102 points
Date : 2022-03-26 11:37 UTC (11 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| goodpoint wrote:
| Sooner or later the pendulum will swing back, hopefully, and
| we'll have a new generation of secure and easy-to-maintain local
| services.
| nijave wrote:
| Article leaves out cost of migration. If you're trying to get
| better uptime, you're likely going to incur a downtime migrating
| everything over so you've already taken a few step back uptime
| wise unless the platform you're moving off of is already
| horrible.
|
| The article touches on it, but there are also compliance concerns
| (access control is mentioned; also retention policies, DR, and the
| ability to redact/remove improperly added data).
|
| They briefly mention monitoring, which, like auth, can be non-
| trivial for a well-built system. Something like email alerting is
| easy until the entire system is down and the alerts don't go out
| anymore (so you need an external monitoring system).
|
| I also don't see labor cost mentioned very often. How many hours
| will it take to support?
| johnklos wrote:
| "I'm new to systems administration, and this thing is hard for me
| and for many others, so we should all stop doing it."
|
| Does that about sum it up?
|
| Why are there even hackers? People should stop doing tricky
| things because others might find those things difficult. We're
| making them uncomfortable and should stop.
| wintermutestwin wrote:
| Isn't the real answer: this thing is unnecessarily hard, so
| let's get together and fix it?
|
| One clear thing that needs fixing: Linux desperately needs
| definitive how-tos for every common thing an admin/user might
| need to do, posted and maintained on a distro-specific/owned
| site.
|
| As it is now, when I want to learn how to do {thing1}, I have
| to sift through a complex maze of stack questions, blog posts,
| youtube videos, etc. Many are ancient, don't really apply to
| the distro I'm on, fail to mention that there are other ways of
| doing it (some of which might be better for a given situation),
| etc.
|
| Then, when I finally settle on a how-to, it fails to work, and
| I burn hours troubleshooting and tweaking. Eventually I get it
| to work, but when I reflect on what it took to get there, I
| can't really follow the breadcrumbs of my frustrated efforts
| well enough to document it for posterity.
|
| Eventually, I run into problems with {thing2} and I realize
| that the fastest way to troubleshoot is to wipe the box and
| start clean, but I can't because recreating {thing1} is a
| multi-hour task.
| erulabs wrote:
| It's just a matter of tooling. Do you think AWS logs into the
| servers that back your service and cares for them individually (as
| the article imagines caring for your own server)?
|
| Not even in the least - they treat them like cattle. If data-
| center-scale tools were easier to use for the average engineer,
| the gap between managed and self-hosted would _start_ to close.
| Obviously, software, tooling, process, scale: there are plenty of
| huge challenges to making self-hosting viable. Personally, I see
| it as far more doable and less radical than trying to distribute
| the internet in any other fashion (blockchain or otherwise).
| azornathogron wrote:
| I agree that tools can and should be improved, but it's not
| _just_ a matter of tooling.
|
| If you're running 10,000 machines then you divide your
| management costs across them and treat them as cattle and end
| up spending (not real numbers, obviously) 0.02 person-days-per-
| month-per-machine or whatever on managing them. But that
| doesn't mean that with the same tooling you could run just one
| machine with just 0.02 person-days-per-month, because a lot of
| the benefit you're getting from scale is the ability to make
| one decision and do _something_ to all 10,000 machines at once,
| and it's the decision that takes time and effort.
| WJW wrote:
| I agree that it's a tooling issue, but I don't agree that the
| gap between managed and self-hosted will ever start to close
| again. IMO, modern IT tooling is having an industrial
| revolution moment where economies of scale really start to push
| out small-scale operators. Back in the day every village would
| have their own blacksmith, but as steel mills scaled up fewer
| and fewer of them were needed. These days no hobbyist
| metalworker can ever hope to keep up with the accuracy and
| speed of professionals running multi-million-dollar CNC
| machines. Similarly, big tech companies will keep developing
| tools that provide a total cost of ownership several orders of
| magnitude lower than what hobbyists achieve, but that need to
| be operated by entire
| teams of people. The complexity of Kubernetes is only the
| start.
|
| No consumer refines their own petroleum or makes their own
| steel these days, except maybe as a hobby. Similarly I don't
| think the market share of consumers who host their own email,
| git servers or payment infrastructure will ever rise again.
| spicyusername wrote:
| > Similarly I don't think the market share of consumers who
| host their own email, git servers or payment infrastructure
| will ever rise again.
|
| Every IT shop I've ever interacted with has a sizable on-prem
| footprint. And an even larger self-managed footprint if we
| include things that are run in the cloud but are managed by
| the IT shop itself (e.g. Red Hat OpenShift on AWS, HashiCorp
| Vault in Azure, GitLab in Alibaba, etc.). So I think we've yet
| to see the initial demise of that paradigm.
|
| Even if the trend is that most enterprises are moving towards
| SaaS and PaaS products, I think we still have a long way to
| go until the majority of IT infrastructure is managed by a
| third-party.
| cassepipe wrote:
| Fossil has a built-in web interface and does not require a
| central server:
|
| "Fossil does not require a central server. Data sharing and
| synchronization can be entirely peer-to-peer. Fossil uses
| conflict-free replicated data types to ensure that (in the limit)
| all participating peers see the same content."
|
| ...but it is still easy to set up on a central server:
|
| https://fossil-scm.org/home/doc/trunk/www/server/whyuseaserv...
| https://fossil-scm.org/home/doc/trunk/www/webui.wiki
| https://fossil-scm.org/home/doc/trunk/www/server/
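|
| Standing up the hosted web UI is only a couple of commands,
| e.g. (repository URL and file name made up):
|
|     $ fossil clone https://example.org/project project.fossil
|     $ fossil server project.fossil --port 8080
|
| ("fossil ui" does the same thing bound to localhost, for local
| browsing.)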
| mountainriver wrote:
| This is why Kubernetes is complex; people who shout about that
| haven't internalized articles like this
| a-dub wrote:
| they forgot "document everything you did so that your replacement
| can recreate everything exactly as it was from instructions and
| backups"
| sokoloff wrote:
| That's one of the appeals of infrastructure as code. For my own
| hobby work (where I might put down a project for months between
| serious work sessions), I often _am that replacement_. Some
| projects I have structured or documented well enough to pick
| right back up inside of 30 minutes; others are more
| archeological in nature.
| cersa8 wrote:
| I wish there was a statically typed version of Ansible. There
| is Pulumi but it's mostly tied to cloud API's and not Linux
| system administration (setting up HAProxy, NGINX, PostgreSQL,
| users)
| edude03 wrote:
| Me too, though nix (https://nixos.org) and eventually
| nickel (https://nickel-lang.org) get you very close today.
| baq wrote:
| Not exactly that, but definitely take a look:
| https://dhall-lang.org/
| nijave wrote:
| I wish Ansible just had better data handling. Adding
| variable assignment plays and jamming things in Jinja is
| pretty clunky (if you want to, say, pull some data from a
| REST API and loop over a subset of the info)
|
| You could always make an Ansible module, but then there's the
| overhead of managing/installing that.
|
| Data wrangling can also make idempotent playbooks a bit
| clunky. You get into this 2-4 play "run check, reshape
| results, conditionally run play" pattern.
| pid-1 wrote:
| Most software is hostile to IaC, unfortunately.
|
| Even cloud providers, who have APIs for everything, will fuck
| up making services that can be deployed as code in a sane way.
| sokoloff wrote:
| If I'm being honest with myself, more of my side projects
| have a bin/publish script than are actually IaC.
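|
| A typical one is only a few lines. A sketch (host and paths
| made up):
|
|     #!/bin/sh
|     # bin/publish: build, sync, restart - the whole "deploy"
|     set -eu
|     make build
|     rsync -az build/ deploy@app.example.com:/srv/app/
|     ssh deploy@app.example.com 'systemctl restart app'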
| maccard wrote:
| bin/publish.sh is better than remembering whether the
| service is on kubernetes, containers or a raw ec2 box,
| and "do I need to restart the service after update or
| just wait"
| voiper1 wrote:
| Not just your replacement!
|
| If I have to check or change something or recreate it even 6
| months later, I often have no clue what the heck I did.
| throwaway984393 wrote:
| "If it's not written down it doesn't exist"
| annoyingnoob wrote:
| I've been running infrastructure for a long time. I feel like all
| of the work is just part of my job, things I do regularly. You
| could replace git with some other service/software and each has
| its own requirements and considerations. I would not call my work
| trivial but running git and lots of other things well is
| completely reasonable.
| throwmeariver1 wrote:
| The problem is not with the services themselves; it's more
| about the infrastructure, backups, etc. If you run the services
| yourself there is not a single point of failure as you would
| have with a SaaS.
| xyzzy123 wrote:
| Also team considerations. When you add an important moving
| part (all your devs will complain bitterly when it breaks and
| their work will be significantly impacted) you need at least
| 2 people who know how to keep it running _at a minimum_, and
| more would be better.
| krnlpnc wrote:
| Writing good code is not trivial either, so while at it just buy
| a SAAS.
| speedgoose wrote:
| Buying SaaS is not trivial either.
| maccard wrote:
| I do most of the service work for a team of 25 - half
| developers and half non-technical product people (art/design).
| SaaS is substantially easier than writing it yourself or even
| running it yourself.
|
| Github costs $4/user/month, sentry costs $25/month, and 2
| digitalocean k8s clusters cost $20/month. For $175/month I can
| have a development environment with basically 0 maintenance
| for a team of 25 people, including monitoring and alerting for
| my app.
|
| Compared to running a local gitlab instance, deploying
| OpenTelemetry and running my own k8s cluster in aws, it's a
| complete no-brainer to buy SaaS.
| speedgoose wrote:
| Yes, almost everyone does that. But outside of the common
| SaaS services, every SaaS company with a "contact us" price
| that I contacted came back with a "fuck you" offer.
| maccard wrote:
| I've found the exact opposite. I work in games and many
| of the vendors we deal with don't publicly post their
| prices, but I've found their pricing fair and
| competitive. It's not "race to the bottom $4/user/month"
| but it's usually a fair quote.
| speedgoose wrote:
| That's nice. I work in the data and cloud industry and
| everyone thinks their customers are MoneyBagsInc, which
| is very annoying.
| simfree wrote:
| Same experience here with fiber and telecom services.
| They quote crazy prices for services I already have from
| their competitors at the lower published rates.
|
| These potential vendors have zero or near-zero additional
| cost to activate service (e.g. the OptiTap is only a few
| tens of feet from where an ONT would be installed), yet the
| sales reps call me every few months asking when I want to
| light up service and aren't ready to even talk price or
| SLA, despite knowing what we pay their competitor.
| orasiscore wrote:
| Nothing is trivial
| paxys wrote:
| It is trivial if you have the budget
| kqr wrote:
| This is one thing I feel intuitively but have trouble arguing. When
| a solution requires running a server, I mentally count it as much
| more expensive than if it's done locally and synchronously.
|
| I've noticed not everyone shares this bias, and I'm wondering if
| I'm unnecessarily conservative or other people are
| underestimating maintenance costs.
| MereInterest wrote:
| By "requires running a server", do you mean that it requires
| you personally to run a server, or that it requires the seller
| of the solution to run a server? Asking because those are two
| very different costs to me. For the former, it means that there
| is time and money investment in running the server, but that my
| continued use is only limited by my willingness to keep the
| server up. For the latter, it means that the solution may be
| end-of-lifed at any time, and I have no control over when that
| happens.
| jeffalyanak wrote:
| While there's definitely always a cost to consider when running
| a server, I think a good portion of that cost can be reduced
| with the right tooling and expertise.
|
| That's not to say that an experienced team can run an infinite
| number of arbitrary services without cost or anything like
| that, though. There may be a select few situations where the
| cost of deployment and maintenance is negligible, but that's
| going to be the exception rather than the rule.
| [deleted]
| rhizome wrote:
| > _I 'm wondering if I'm unnecessarily conservative or other
| people are underestimating maintenance costs._
|
| Check out some of the HN posts where people talk about how much
| companies spend on AWS and calculate how many sysadmin/devops
| salaries those bills could pay for (it is commonly >1).
| Probably even easier would be to find a company that's about
| 10-15 years old and see how much their tech spend declined when
| they switched to the cloud. ;)
| pid-1 wrote:
| "just run your own git server" comments always make me scratch
| my head.
|
| IMO folks managing services for personal use vastly
| underestimate how much harder everything can get in an actual
| business environment.
| marginalia_nu wrote:
| A git server is just a computer with ssh access, though. Git
| itself is designed in a way where it doesn't even need a
| server.
|
| If you keep it simple, it stays simple.
| jjtheblunt wrote:
| So what's running on machine X so that it can function as a
| remote for git pull etc?
| TacticalCoder wrote:
| > So what's running on machine X so that it can function
| as a remote for git pull etc?
|
| SSH and Git.
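|
| Literally, something like this (host and path made up):
|
|     $ ssh git.example.com 'git init --bare /srv/git/project.git'
|     $ git remote add origin git.example.com:/srv/git/project.git
|     $ git push -u origin main
|
| Optionally give the account git-shell as its login shell so it
| can't be used for anything but git.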
| nijave wrote:
| Sure, but the article isn't talking about a "git" server.
| It's talking about replacing Github with Gitlab, which is a
| fully integrated developer workflow tool with code review
| functionality. They use "Git server" a lot in the article,
| but the opening and closing both specifically mention a
| Github replacement.
| marginalia_nu wrote:
| Right, but git itself has no concept of code reviews. If
| that's what you want, maybe you shouldn't be looking for a
| git server, but for some mechanism for code reviews.
| jeltz wrote:
| Yes? That is presumably exactly what they are looking
| for. They just worded it poorly.
|
| The reason people use GitHub and Gitlab is usually not
| because they want a git server. For that there are much
| better tools like gitolite.
| maccard wrote:
| A postgres server is just a computer with 5432 open instead
| of 22.
|
| > If you keep it simple, it stays simple.
|
| Things that work well for one person in isolation don't
| work at scale or for teams. How do you handle
| authentication for your git server? What about backups?
| Manage disk space? Updates? That's before you get to the
| point of dealing with workflows and integrations, or "it's
| slow when 10 people clone at the same time"
| hedora wrote:
| You need to have port 22 open to manage the postgres
| server. Postgres also (probably) has a worse track record
| with remotely exploitable bugs than openssh.
|
| Answering your other questions:
|
| Authentication: ssh keys
|
| Backups: the same way you back up the rest of the machine
| (s3 snapshots of ebs?), or run a second server at a
| different site with a cron job that runs "git fetch
| --all" or whatever.
|
| Manage disk space: It's not the '90s anymore. How are you
| running a 1TB machine out of disk space with a git repo?
|
| Updates: Enable unattended updates in whatever distro you
| are running. If you are running a separate backup server,
| pick more than one upstream operating system (redhat,
| Debian, arch, BSD), so a botched update won't break both.
|
| Git hooks work fine for workflows and integrations.
|
| Is it really slow when 10 people clone at once? How is that
| even possible on modern hardware with hundreds of GB of RAM
| and dozens of cores?
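|
| Concretely, the backup half is about three lines (host and
| paths made up):
|
|     # one-time, on the backup box
|     $ git clone --mirror git.example.com:/srv/git/project.git \
|           /srv/backup/project.git
|     # cron entry: refresh the mirror every half hour
|     */30 * * * * git --git-dir=/srv/backup/project.git fetch --all --prune
|
| and on a Debian-family distro the updates half is "apt-get
| install unattended-upgrades".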
| marginalia_nu wrote:
| Every cloned git repository is a backup. That's like the
| entire point of git. It's decentralized version control.
| This is why the notion of a "git server" is a bit of an
| oxymoron.
|
| Most of the things you're struggling to solve are
| effectively preventing it from actually working as
| intended.
| Hackbraten wrote:
| But what would your disaster recovery process look like?
| marginalia_nu wrote:
| $ git clone ...
| RealStickman_ wrote:
| You don't want to ask your users for their local backup
|
| Edit: To expand on my comment a bit.
|
| 1. You will have to check with every user when they last
| pulled their repos and/or made any local change and
| wanted to push it.
|
| 2. While your git server is offline and you're figuring out
| which version is the most up to date, your users can't do
| any work with git
|
| 3. You just lost all your issues, pull requests, wiki
| articles and more that isn't stored in git
|
| 4. Making backups is your job as a systems administrator
| and you just failed spectacularly
| maccard wrote:
| > Every cloned git repository is a backup
|
| Is it? If I checksum the .git folder on my workstation
| and my co-worker's workstation, they're going to come back
| different. There are no guarantees that I haven't rebased
| main, or that I have all of the branches that were stored
| on the remote. If something catastrophic happens to our
| main remote, which one of our versions do we restore to?
|
| > It's decentralized version control.
|
| Just because git is decentralised doesn't mean that it
| can only be used in a decentralised way. How many teams
| are pushing/pulling like a p2p network, and deploying to
| servers/clients from their workstations and verifying
| that the commit hash of their local repository matches
| what's deployed? A vanishingly small number of people.
|
| > Most of the things you're struggling to solve are
| effectively preventing it from actually working as
| intended
|
| If everyone is using it wrong, the tool is wrong. There
| are billion dollar companies out there that are based on
| a centralised git service, which proves that people can
| (and do) use tools in the way that makes sense, not
| necessarily as they were designed. Personally I'm glad I
| don't have to share patches over mailing lists with my
| coworkers, but you do you.
| senko wrote:
| I believe the core of your argument is that GitHub and
| GitLab provide more than just git DVCS. I don't think
| anyone argues with that.
|
| However, this core argument is obscured by a very
| emotional rejection of what the parent is saying - that
| you don't _always_ need these additional things, and that
| you can (sometimes? often?) keep things simple. I think
| that's an interesting point to discuss.
|
| > If something catastrophic happens to our main remote,
| which one of our versions do we restore to?
|
| Dunno, talk it through? I hope you have a good enough
| relationship with your coworker that you can discuss your
| work with them.
|
| > Just because git is decentralised, doesn't mean that it
| can only be used in a decentralised way
|
| The OP not only did not say git can _only_ be used in a
| decentralised way; they actually mentioned a git server -
| i.e. a central point.
|
| > There are billion dollar companies out there that are
| based on a centralised git service, which proves that
| people can (and do) use tools in the way that makes
| sense, not necessarily as they were designed.
|
| Nobody argued otherwise. _But_, it is also true that
| there are billion-dollar companies out there that use an
| internal git service. How do I know that? Both GitHub and
| GitLab sell on-premises versions to those types of
| companies :)
|
| > Personally I'm glad I don't have to share patches over
| mailing lists with my coworkers, but you do you.
|
| Rationally, this argument is so off it can only be the result
| of an emotional outburst. OP never mentioned sharing
| patches over mailing lists, and has in fact stated that
| it's easy to host git server.
|
| I understand and respect your argument and agree GitHub,
| GitLab and others provide a valuable service. But geez,
| chill out, man. https://xkcd.com/386/
| nemetroid wrote:
| > There's no guarantees that I haven't rebased main,
|
| You may have rebased your local main branch, but that
| doesn't affect your origin/main reference.
|
| > or that I have all of the branches that were stored on
| the remote.
|
| Every time you pull or fetch, you get all the branches
| stored on the remote. Of course, you're not going to have
| any branches that were added after the last time you
| communicated with the remote.
|
| > If something catastrophic happens to our main remote,
| which one of our versions do we restore to?
|
| The origin/main that's the most recent.
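|
| A sketch of that recovery, with "backup" standing in for the
| rebuilt remote: compare ages across the surviving clones with
|
|     $ git log -1 --format='%ci %H' origin/main
|
| then have whoever is newest restore the branches with
|
|     $ git push backup 'refs/remotes/origin/*:refs/heads/*'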
| nijave wrote:
| I don't think it's necessarily just "personal use". The
| complexity quickly scales with the company size. Replacing
| Github with Gitlab might be a week-long project at a small
| company but could easily turn into a multi-year project at a
| large one.
|
| It's not just a technical problem, either. Bigger companies
| tend to have more expertise-oriented teams (security,
| compliance, developer tooling, operations, internal
| infrastructure) which tends to make decisions more difficult
| than when a single person or team can do it themselves.
| vlunkr wrote:
| > Replacing Github with Gitlab might be a week-long project
| at a small company
|
| A week-long project initially. Now you have to install
| updates, set up and maintain secure access, reboot or
| troubleshoot when it dies, etc. Installing things is the
| easy part.
| netizen-936824 wrote:
| What kind of company doesn't have at least one sysadmin?
| I feel like that's kind of a critical position for
| maintaining systems.
| buffet_overflow wrote:
| The problem is when they have more services than
| sysadmins. While the sysadmins are busy upgrading the git
| server, the logging infrastructure suffers. They pivot to
| work on that, now the CI/CD server is down/slow/randomly
| breaking. But the sysadmin that knew the ins and outs of
| it left last quarter so the new sysadmins don't want to
| touch it. Oh, and management doesn't prioritize any of
| this stuff, so actually jumping two versions on the git
| server is a much bigger, more fragile ordeal now than it
| was a month ago.
| CameronNemo wrote:
| Exactly.
|
| Our team of 6 sysadmins manages:
|
| - DNS appliances, storage appliances, NTP appliances,
|
| - hypervisors, Dev/stage/prod k8s clusters, some other
| k8s clusters
|
| - dev/prod Elasticsearch/Logstash/Kibana clusters
|
| - internal GitLab, Jira, Confluence, nautobot, OpenDCIM,
| a deprecated Twiki
|
| - several internal custom apps
|
| - Probably more I am forgetting.
|
| Nothing gets patched consistently. Everything is
| neglected to a certain degree.
| rhizome wrote:
| I'm not going to die on this hill, but that seems like a
| lot of complexity for a company with in-house skills,
| maybe even the worst of both (in-house vs cloud/managed)
| worlds.
| icedchai wrote:
| You'd be surprised. Most dedicated sysadmin and DBA
| positions were done away with as part of the "devops"
| movement, especially as everything moved to the cloud and
| there were no longer physical systems to maintain.
| Developers are smart, they can just do that work too,
| right? It's all just typing. /s
| coward123 wrote:
| Loads of non-profits and small businesses that have
| surprisingly demanding tech needs but can't afford /
| don't understand / can't manage / have only a part-time
| need, etc.
| netizen-936824 wrote:
| It's literally essential to have solid tech these days. I
| don't understand how businesses think they can operate
| while skimping on or even skipping a halfway decently
| funded IT dept.
| WJW wrote:
| It's not even close to essential, since by far the
| majority of businesses are running without anything close
| to what people on HN would consider adequate. It only
| becomes a problem if too many of your competitors have
| good tech and even then only if your industry depends on
| (software) tech. Companies like bakeries and building
| contractors can run fine with shitty IT and even if their
| competitors do IT better, no customer is going to drive
| further to get bread from someone who has proper
| backups instead of from the closest shop.
| coward123 wrote:
| No kidding. I would have preferred to migrate the client
| to GitHub, but they were convinced that wasn't acceptable
| - had to run their own GitLab. Turned into a mess of
| managing Gitlab updates and BS rather than working on the
| product.
| [deleted]
| gurjeet wrote:
| I wrote a few browser-based small apps at q.ht [1], in the
| hopes that once written and tested, I will never have to
| maintain that code or infrastructure.
|
| While most of those tchotchke apps are still functioning as
| designed, the PDF generator (which uses pdf.js) in the Life-
| in-Weeks app [2] has somehow broken. It doesn't generate the
| PDFs like it used to.
|
| Despite so much care and effort put into making these
| decisions to avoid having to maintain/upkeep the software,
| the utopia remains elusive.
|
| [1]: http://q.ht (served via Github Pages, see
| https://github.com/gurjeet/q.ht)
|
| [2]: http://q.ht/life-in-weeks-on-one-sheet/
| jasode wrote:
| _> When a solution requires running a server, I mentally count
| it as much more expensive than if it's done locally and
| synchronously._
|
| Are you comparing server vs local desktop software?
|
| The author's article is actually comparing server-SaaS vs
| server-on-premise (or server-self-managed-cloud-vm-container).
| kqr wrote:
| I'm comparing anything that requires uptime monitoring to
| something that does not, in essence.
___________________________________________________________________
(page generated 2022-03-26 23:01 UTC)