[HN Gopher] Running servers and services well is not trivial (2018)
       ___________________________________________________________________
        
       Running servers and services well is not trivial (2018)
        
       Author : rognjen
       Score  : 102 points
       Date   : 2022-03-26 11:37 UTC (11 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | goodpoint wrote:
        | Sooner or later the pendulum will swing back, hopefully, and
        | we'll have a new generation of secure and easy-to-maintain
        | local services.
        
       | nijave wrote:
        | The article leaves out the cost of migration. If you're trying
        | to get better uptime, you're likely going to incur downtime
        | migrating everything over, so you've already taken a few steps
        | back uptime-wise unless the platform you're moving off of is
        | already horrible.
       | 
       | The article touches on it, but there's also compliance concerns
       | (access control is mentioned, also retention policies, DR,
       | ability to redact/remove improperly added data)
       | 
        | They briefly mention monitoring, but, like auth, it can be
        | non-trivial for a well-built system. Something like email is
        | easy until the entire system is down and the alerts don't go
        | out anymore (so you need an external monitoring system).
       | 
       | I also don't see labor cost mentioned very often. How many hours
       | will it take to support?
        
       | johnklos wrote:
       | "I'm new to systems administration, and this thing is hard for me
       | and for many others, so we should all stop doing it."
       | 
       | Does that about sum it up?
       | 
       | Why are there even hackers? People should stop doing tricky
       | things because others might find those things difficult. We're
       | making them uncomfortable and should stop.
        
         | wintermutestwin wrote:
          | Isn't the real answer: this thing is unnecessarily hard, so
          | let's get together and fix it?
         | 
          | One clear thing that needs fixing: Linux desperately needs
          | definitive how-tos for every common thing an admin/user might
          | need to do, posted and maintained on a distro-specific/owned
          | site.
         | 
          | As it is now, when I want to learn how to do {thing1}, I
          | have to sift through a complex maze of Stack Overflow
          | questions, blog posts, youtube videos, etc. Many are ancient,
          | don't really apply to the distro I'm on, fail to mention that
          | there are other ways of doing it (some of which might be
          | better for a given situation), etc.
         | 
          | Then, when I finally settle on a howto, it fails to work and
          | I burn hours troubleshooting and tweaking. Eventually, I get
          | it to work, but when I reflect on what it took to get there,
          | I can't really follow the breadcrumbs of my frustrated
          | efforts well enough to document it for posterity.
         | 
         | Eventually, I run into problems with {thing2} and I realize
         | that the fastest way to troubleshoot is to wipe the box and
         | start clean, but I can't because recreating {thing1} is a
         | multi-hour task.
        
       | erulabs wrote:
        | It's just a matter of tooling. Do you think AWS logs into the
        | servers that back your service and cares for them individually
        | (as the article imagines caring for your own server)?
        | 
        | Not in the least - they treat them like cattle. If
        | data-center-scale tools were easier to use for the average
        | engineer, the gap between managed and self-hosted would
        | _start_ to close.
       | Obviously, software, tooling, process, scale: there are plenty of
       | huge challenges to making self-hosting viable. Personally, I see
       | it as far more doable and less radical than trying to distribute
       | the internet in any other fashion (blockchain or otherwise).
        
         | azornathogron wrote:
         | I agree that tools can and should be improved, but it's not
         | _just_ a matter of tooling.
         | 
         | If you're running 10,000 machines then you divide your
         | management costs across them and treat them as cattle and end
         | up spending (not real numbers, obviously) 0.02 person-days-per-
         | month-per-machine or whatever on managing them. But that
         | doesn't mean that with the same tooling you could run just one
         | machine with just 0.02 person-days-per-month, because a lot of
         | the benefit you're getting from scale is the ability to make
          | one decision and do _something_ to all 10,000 machines at
          | once, and it's the decision that takes time and effort.
        
         | WJW wrote:
         | I agree that it's a tooling issue, but I don't agree that the
         | gap between managed and self-hosted will ever start to close
          | again. IMO, modern IT tooling is having an industrial
          | revolution moment where economies of scale really start to
          | push out small-scale operators. Back in the day every village
          | had its own blacksmith, but as steel mills scaled up, fewer
          | and fewer were needed. These days no hobbyist metalworker can
          | hope to keep up with the accuracy and speed of professionals
          | running multi-million-dollar CNC machines. Similarly, big
          | tech companies will keep developing tools that provide a
          | total cost of ownership several orders of magnitude cheaper
          | than what hobbyists achieve, but that need to be operated by
          | entire teams of people. The complexity of Kubernetes is only
          | the start.
         | 
         | No consumer refines their own petroleum or makes their own
         | steel these days, except maybe as a hobby. Similarly I don't
         | think the market share of consumers who host their own email,
         | git servers or payment infrastructure will ever rise again.
        
           | spicyusername wrote:
           | > Similarly I don't think the market share of consumers who
           | host their own email, git servers or payment infrastructure
           | will ever rise again.
           | 
           | Every IT shop I've ever interacted with has a sizable on-prem
            | footprint. And an even larger self-managed footprint if we
            | include things that are run in the cloud but managed by the
            | IT shop itself (e.g. Red Hat OpenShift on AWS, HashiCorp
            | Vault in Azure, GitLab in Alibaba, etc.). So I think we've
            | yet to see the initial demise of that paradigm.
           | 
           | Even if the trend is that most Enterprises are moving towards
           | SaaS and PaaS products, I think we still have a long way to
           | go until the majority of IT infrastructure is managed by a
           | third-party.
        
       | cassepipe wrote:
        | Fossil has a built-in web interface and does not require a
        | central server:
       | 
       | "Fossil does not require a central server. Data sharing and
       | synchronization can be entirely peer-to-peer. Fossil uses
       | conflict-free replicated data types to ensure that (in the limit)
       | all participating peers see the same content. "
       | 
        | ...but it is still easy to set up on a central server:
       | 
       | https://fossil-scm.org/home/doc/trunk/www/server/whyuseaserv...
       | https://fossil-scm.org/home/doc/trunk/www/webui.wiki
       | https://fossil-scm.org/home/doc/trunk/www/server/
        
       | mountainriver wrote:
        | This is why Kubernetes is complex; people who shout that it
        | needn't be haven't internalized articles like this
        
       | a-dub wrote:
       | they forgot "document everything you did so that your replacement
       | can recreate everything exactly as it was from instructions and
       | backups"
        
         | sokoloff wrote:
          | That's one of the appeals of infrastructure as code. For my
          | own hobby work (where I might put down a project for months
          | between serious work sessions), I often _am that
          | replacement_. Some projects I have well enough structured or
          | documented to pick right back up inside of 30 minutes;
          | others are more archeological in nature.
        
           | cersa8 wrote:
            | I wish there were a statically typed version of Ansible.
            | There is Pulumi, but it's mostly tied to cloud APIs and not
            | Linux system administration (setting up HAProxy, NGINX,
            | PostgreSQL, users)
        
             | edude03 wrote:
             | Me too, though nix (https://nixos.org) and eventually
             | nickel (https://nickel-lang.org) get you very close today.
        
             | baq wrote:
             | Not exactly that, but definitely take a look:
             | https://dhall-lang.org/
        
             | nijave wrote:
              | I wish Ansible just had better data handling. Adding
              | variable-assignment plays and jamming things into Jinja
              | is pretty clunky (if you want to, say, pull some data
              | from a REST API and loop over a subset of the info)
              | 
              | You could always write an Ansible module, but then
              | there's the overhead of managing/installing it
              | 
              | Data wrangling can also make idempotent playbooks a bit
              | clunky. You get into this 2-4 play "run check, reshape
              | results, conditionally run play" pattern
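A minimal sketch of the pattern described in that comment. The endpoint URL and the `role`/`name` fields are hypothetical, made up purely for illustration:

```yaml
# Sketch of the "run check, reshape results, conditionally run play"
# pattern. The URL and field names are hypothetical.
- name: Pull data from a REST API and act on a subset
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Run check (fetch the data)
      ansible.builtin.uri:
        url: https://example.com/api/users
        return_content: true
      register: api_result

    - name: Reshape results (the Jinja-heavy part)
      ansible.builtin.set_fact:
        admins: "{{ api_result.json | selectattr('role', 'equalto', 'admin') | list }}"

    - name: Conditionally run play (loop over the subset)
      ansible.builtin.debug:
        msg: "Would provision an account for {{ item.name }}"
      loop: "{{ admins }}"
      when: admins | length > 0
```

The `set_fact` step is exactly the "reshape" play the comment calls clunky: all the data logic lives in one long Jinja expression.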
        
           | pid-1 wrote:
           | Most software is hostile to IaC, unfortunately.
           | 
            | Even cloud providers, who have APIs for everything, will
            | fuck up making services that can be deployed as code in a
            | sane way.
        
             | sokoloff wrote:
             | If I'm being honest with myself, more of my side projects
             | have a bin/publish script than are actually IaC.
        
               | maccard wrote:
               | bin/publish.sh is better than remembering whether the
               | service is on kubernetes, containers or a raw ec2 box,
               | and "do I need to restart the service after update or
               | just wait"
        
         | voiper1 wrote:
         | Not just your replacement!
         | 
         | If I have to check or change something or recreate it even 6
         | months later I often have no clue what the heck I did.
        
         | throwaway984393 wrote:
         | "If it's not written down it doesn't exist"
        
       | annoyingnoob wrote:
       | I've been running infrastructure for a long time. I feel like all
       | of the work is just part of my job, things I do regularly. You
       | could replace git with some other service/software and each has
       | its own requirements and considerations. I would not call my work
       | trivial but running git and lots of other things well is
       | completely reasonable.
        
         | throwmeariver1 wrote:
          | The problem is not with the services themselves; it's more
          | about the infrastructure, backups, etc. If you run the
          | services yourself, there is no single point of failure as
          | you would have with a SaaS.
        
           | xyzzy123 wrote:
           | Also team considerations. When you add an important moving
           | part (all your devs will complain bitterly when it breaks and
           | their work will be significantly impacted) you need at least
          | 2 people who know how to keep it running _at a minimum_, and
           | more would be better.
        
       | krnlpnc wrote:
        | Writing good code is not trivial either, so while you're at
        | it, just buy a SaaS.
        
         | speedgoose wrote:
         | Buying SaaS is not trivial either.
        
           | maccard wrote:
            | I do most of the service work for a team of 25 - half
            | developers and half non-technical product people
            | (art/design). SaaS is substantially easier than writing it
            | yourself or even running it yourself.
            | 
            | Github costs $4/user/month, Sentry costs $25/month, and 2
            | DigitalOcean k8s clusters cost $20/month. For $175/month I
            | can have a development environment with basically 0
            | maintenance for a team of 25 people, including monitoring
            | and alerting for my app.
            | 
            | Compared to running a local GitLab instance, deploying
            | OpenTelemetry and running my own k8s cluster in AWS, it's
            | a complete no-brainer to buy SaaS.
        
             | speedgoose wrote:
              | Yes, almost everyone does that. But outside of the common
             | SaaS services, all SaaS companies with a "contact us" price
             | I contacted came back with a "fuck you" offer.
        
               | maccard wrote:
               | I've found the exact opposite. I work in games and many
               | of the vendors we deal with don't publicly post their
               | prices, but I've found their pricing fair and
               | competitive. It's not "race to the bottom $4/user/month"
               | but it's usually a fair quote.
        
               | speedgoose wrote:
               | That's nice. I work in the data and cloud industry and
               | everyone thinks their customers are MoneyBagsInc, which
               | is very annoying.
        
               | simfree wrote:
               | Same experience here with fiber and telecom services.
               | They quote crazy prices for services I already have from
               | their competitors at the lower published rates.
               | 
                | These potential vendors have zero or near-zero
                | additional cost to activate service (e.g. the OptiTap
                | is only a few tens of feet from where an ONT would be
                | installed), yet the sales reps call me every few months
                | asking when I want to light up service and aren't
                | ready to even talk price or SLA, despite knowing what
                | we pay their competitor.
        
           | orasiscore wrote:
           | Nothing is trivial
        
           | paxys wrote:
           | It is trivial if you have the budget
        
       | kqr wrote:
        | This is one thing I feel intuitively but have trouble arguing
        | for. When a solution requires running a server, I mentally
        | count it as much more expensive than if it's done locally and
        | synchronously.
       | 
       | I've noticed not everyone shares this bias, and I'm wondering if
       | I'm unnecessarily conservative or other people are
       | underestimating maintenance costs.
        
         | MereInterest wrote:
         | By "requires running a server", do you mean that it requires
         | you personally to run a server, or that it requires the seller
         | of the solution to run a server? Asking because those are two
         | very different costs to me. For the former, it means that there
         | is time and money investment in running the server, but that my
         | continued use is only limited by my willingness to keep the
         | server up. For the latter, it means that the solution may be
         | end-of-lifed at any time, and I have no control over when that
         | happens.
        
         | jeffalyanak wrote:
          | While there's definitely always a cost to consider when
          | running a server, I think a good portion of that cost can be
          | reduced with the right tooling and expertise.
          | 
          | That's not to say that an experienced team can run an
          | infinite number of arbitrary services without cost or
          | anything like that, though. There may be a select few
          | situations where the cost of deployment and maintenance is
          | negligible, but that's going to be the exception rather than
          | the rule.
        
         | [deleted]
        
         | rhizome wrote:
          | > _I'm wondering if I'm unnecessarily conservative or other
          | people are underestimating maintenance costs._
         | 
         | Check out some of the HN posts where people talk about how much
         | companies spend on AWS and calculate how many sysadmin/devop
         | salaries those bills could pay for (it is commonly >1).
         | Probably even easier would be to find a company that's about
         | 10-15 years old and see how much their tech spend declined when
         | they switched to the cloud. ;)
        
         | pid-1 wrote:
         | "just run your own git server" comments always make me scratch
         | my head.
         | 
         | IMO folks managing services for personal use vastly
         | underestimate how much harder everything can get in an actual
         | business environment.
        
           | marginalia_nu wrote:
           | A git server is just a computer with ssh access, though. Git
           | itself is designed in a way where it doesn't even need a
           | server.
           | 
           | If you keep it simple, it stays simple.
        
             | jjtheblunt wrote:
             | So what's running on machine X so that it can function as a
             | remote for git pull etc?
        
               | TacticalCoder wrote:
               | > So what's running on machine X so that it can function
               | as a remote for git pull etc?
               | 
               | SSH and Git.
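A concrete sketch of that claim, using local temp directories in place of a real host; on an actual server the remote URL would look like `user@host:/srv/git/project.git` (hypothetical path), but git treats both forms the same way:

```shell
set -e
# A "git server" is nothing more than a bare repository reachable
# over SSH. Local temp paths stand in for the SSH URL here.
REPO="$(mktemp -d)/project.git"
git init --bare "$REPO"

# Any clone is a complete copy of the history:
WORK="$(mktemp -d)/work"
git clone "$REPO" "$WORK"
git -C "$WORK" -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -m "first commit"

# Pushing back to the "server" needs nothing but the remote URL:
git -C "$WORK" push origin HEAD
```

With SSH in front, authentication and transport encryption come from sshd; git itself adds no daemon, database, or web UI.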
        
             | nijave wrote:
             | Sure but the article isn't talking about a "git" server.
             | It's talking about replacing Github with Gitlab which is a
             | fully integrated developer workflow tool with code review
             | functionality. They use "Git server" a lot in the article,
             | but the opening and closing both specifically mention a
             | Github replacement.
        
               | marginalia_nu wrote:
               | Right, but git itself has no concept of code reviews, if
               | that's what you want, maybe you shouldn't be looking for
               | a git server, but some mechanism for code reviews.
        
               | jeltz wrote:
               | Yes? That is presumably exactly what they are looking
               | for. They just worded it poorly.
               | 
               | The reason people use GitHub and Gitlab is usually not
               | because they want a git server. For that there are much
               | better tools like gitolite.
        
             | maccard wrote:
             | A postgres server is just a computer with 5432 open instead
             | of 22.
             | 
             | > If you keep it simple, it stays simple.
             | 
             | Things that work well for one person in isolation don't
             | work at scale or for teams. How do you handle
             | authentication for your git server? What about backups?
             | Manage disk space? Updates? That's before you get to the
             | point of dealing with workflows and integrations, or "it's
             | slow when 10 people clone at the same time"
        
               | hedora wrote:
               | You need to have port 22 open to manage the postgres
               | server. Postgres also (probably) has a worse track record
               | with remotely exploitable bugs than openssh.
               | 
               | Answering your other questions:
               | 
               | Authentication: ssh keys
               | 
                | Backups: the same way you back up the rest of the
                | machine (S3 snapshots of EBS?), or run a second server
                | at a different site with a cron job that runs "git
                | fetch --all" or whatever.
               | 
               | Manage disk space: It's not the 90's anymore. How are you
               | running a 1TB machine out of disk space with a git repo?
               | 
               | Updates: Enable unattended updates in whatever distro you
               | are running. If you are running a separate backup server,
               | pick more than one upstream operating system (redhat,
               | Debian, arch, BSD), so a botched update won't break both.
               | 
               | Git hooks work fine for workflows and integrations.
               | 
                | Is it really slow when 10 people clone at once? How is
                | that even possible on modern hardware with hundreds of
                | GB of RAM and dozens of cores?
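One way to sketch the second-server backup scheme mentioned above, here using a mirror clone refreshed by `git remote update` rather than the plain `git fetch --all` the comment suggests; temp directories stand in for the two machines, and all paths and the cron schedule are hypothetical:

```shell
set -e
# "Primary" server: a bare repo, seeded with one commit via a
# working clone.
PRIMARY="$(mktemp -d)/primary.git"
git init --bare "$PRIMARY"
WORK="$(mktemp -d)/work"
git clone "$PRIMARY" "$WORK"
git -C "$WORK" -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -m "work on primary"
git -C "$WORK" push origin HEAD

# "Second server at a different site": a mirror clone holds every
# ref, not just the checked-out branch...
MIRROR="$(mktemp -d)/mirror.git"
git clone --mirror "$PRIMARY" "$MIRROR"

# ...kept current from cron, e.g. hourly:
#   0 * * * *  git -C /srv/mirror/primary.git remote update --prune
git -C "$MIRROR" remote update --prune
```

A mirror clone (unlike an ordinary clone) also copies tags and any non-branch refs, which is what you want from a backup.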
        
               | marginalia_nu wrote:
               | Every cloned git repository is a backup. That's like the
               | entire point of git. It's decentralized version control.
               | This is why the notion of a "git server" is a bit of an
               | oxymoron.
               | 
               | Most of the things you're struggling to solve are
               | effectively preventing it from actually working as
               | intended.
        
               | Hackbraten wrote:
               | But what would your disaster recovery process look like?
        
               | marginalia_nu wrote:
               | $ git clone ...
        
               | RealStickman_ wrote:
               | You don't want to ask your users for their local backup
               | 
               | Edit: To expand on my comment a bit.
               | 
               | 1. You will have to check with every user when they last
               | pulled their repos and/or made any local change and
               | wanted to push it.
               | 
                | 2. While your git server is offline and you're
                | figuring out which version is the most up to date,
                | your users can't do any work with git
               | 
               | 3. You just lost all your issues, pull requests, wiki
               | articles and more that isn't stored in git
               | 
               | 4. Making backups is your job as a systems administrator
               | and you just failed spectacularly
        
               | maccard wrote:
               | > Every cloned git repository is a backup
               | 
               | Is it? If I checksum the .git folder on my workstation
               | and my co-workers workstation they're going to come back
               | different. There's no guarantees that I haven't rebased
               | main, or that I have all of the branches that were stored
               | on the remote. If something catastrophic happens to our
               | main remote, which one of our versions do we restore to?
               | 
               | > It's decentralized version control.
               | 
               | Just because git is decentralised, doesn't mean that it
               | can only be used in a decentralised way. How many teams
               | are pushing/pulling like a p2p network, and deploying to
               | servers/clients from their workstations and verifying
               | that the commit hash of their local repository matches
               | what's deployed? A vanishingly small number of people.
               | 
               | > Most of the things you're struggling to solve are
               | effectively preventing it from actually working as
               | intended
               | 
               | If everyone is using it wrong, the tool is wrong. There
               | are billion dollar companies out there that are based on
               | a centralised git service, which proves that people can
               | (and do) use tools in the way that makes sense, not
               | necessarily as they were designed. Personally I'm glad I
               | don't have to share patches over mailing lists with my
               | coworkers, but you do you.
        
               | senko wrote:
               | I believe the core of your argument is that GitHub and
               | GitLab provide more than just git DVCS. I don't think
               | anyone argues with that.
               | 
               | However, this core argument is obscured by a very
               | emotional rejection of what the parent is saying - that
               | you don't _always_ need these additional things, and that
                | you can (sometimes? often?) keep things simple. I
                | think that's an interesting point to discuss.
               | 
               | > If something catastrophic happens to our main remote,
               | which one of our versions do we restore to?
               | 
               | Dunno, talk it through? I hope you have a good enough
               | relationship with your coworker that you can discuss your
               | work with them.
               | 
               | > Just because git is decentralised, doesn't mean that it
               | can only be used in a decentralised way
               | 
                | The OP not only did not say git can _only_ be used in
                | a decentralised way, they actually mentioned a git
                | server - i.e. a central point.
               | 
               | > There are billion dollar companies out there that are
               | based on a centralised git service, which proves that
               | people can (and do) use tools in the way that makes
               | sense, not necessarily as they were designed.
               | 
                | Nobody argued otherwise. _But_, it is also true that
               | there are billion-dollar companies out there that use an
               | internal git service. How do I know that? Both GitHub and
               | GitLab sell on-premises versions to those types of
               | companies :)
               | 
               | > Personally I'm glad I don't have to share patches over
               | mailing lists with my coworkers, but you do you.
               | 
                | Rationally, this argument is so off it can only be the
                | result of an emotional outburst. OP never mentioned
                | sharing patches over mailing lists, and has in fact
                | stated that it's easy to host a git server.
               | 
                | I understand and respect your argument and agree that
                | GitHub, GitLab and others provide a valuable service.
                | But geez, chill out, man. https://xkcd.com/386/
        
               | nemetroid wrote:
               | > There's no guarantees that I haven't rebased main,
               | 
               | You may have rebased your local main branch, but that
               | doesn't affect your origin/main reference.
               | 
               | > or that I have all of the branches that were stored on
               | the remote.
               | 
                | Every time you pull or fetch, you get all the branches
               | stored on the remote. Of course, you're not going to have
               | any branches that were added after the last time you
               | communicated with the remote.
               | 
               | > If something catastrophic happens to our main remote,
               | which one of our versions do we restore to?
               | 
               | The origin/main that's the most recent.
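That point is easy to check. A small sketch, with temp directories standing in for a real remote, and `commit --amend` standing in for a rebase (both rewrite local history the same way):

```shell
set -e
# Set up a remote and a clone with one pushed commit.
ORIGIN="$(mktemp -d)/origin.git"
git init --bare "$ORIGIN"
WORK="$(mktemp -d)/work"
git clone "$ORIGIN" "$WORK"
g() { git -C "$WORK" -c user.name=dev -c user.email=dev@example.com "$@"; }
g commit --allow-empty -m "base"
g push origin HEAD

# Record where the remote-tracking ref points.
BRANCH="$(g rev-parse --abbrev-ref HEAD)"
BEFORE="$(g rev-parse "origin/$BRANCH")"

# Rewrite local history (what a rebase does, commit by commit).
g commit --amend --allow-empty -m "rewritten"

# Local head moved, but origin/<branch> did not.
[ "$(g rev-parse HEAD)" != "$BEFORE" ]
[ "$(g rev-parse "origin/$BRANCH")" = "$BEFORE" ]
```

The remote-tracking ref only moves when you fetch, pull, or successfully push, which is why a rewritten local branch doesn't corrupt anyone's picture of the remote.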
        
           | nijave wrote:
           | I don't think it's necessarily just "personal use". The
           | complexity quickly scales with the company size. Replacing
           | Github with Gitlab might be a week-long project at a small
           | company but could easily turn into a multi-year project at a
           | large one.
           | 
           | It's not just a technical problem, either. Bigger companies
           | tend to have more expertise-oriented teams (security,
           | compliance, developer tooling, operations, internal
           | infrastructure) which tends to make decisions more difficult
           | than when a single person or team can do it themselves.
        
             | vlunkr wrote:
             | > Replacing Github with Gitlab might be a week-long project
             | at a small company
             | 
             | A week-long project initially. Now you have to install
             | updates, set up and maintain secure access, reboot or
             | troubleshoot when it dies, etc. Installing things is the
             | easy part.
        
               | netizen-936824 wrote:
               | What kind of a company doesn't have at least one
               | sysadmin, I feel like that's kind of a critical position
               | for maintaining systems?
        
               | buffet_overflow wrote:
               | The problem is when they have more services than
               | sysadmins. While the sysadmins are busy upgrading the git
               | server, the logging infrastructure suffers. They pivot to
               | work on that, now the CI/CD server is down/slow/randomly
               | breaking. But the sysadmin that knew the ins and outs of
               | it left last quarter so the new sysadmins don't want to
               | touch it. Oh, and management doesn't prioritize any of
                | this stuff, so actually jumping two versions on the
                | git server is a much bigger, more fragile ordeal now
                | than it was a month ago.
        
               | CameronNemo wrote:
               | Exactly.
               | 
               | Our team of 6 sysadmins manages:
               | 
               | - DNS appliances, storage appliances, NTP appliances,
               | 
               | - hypervisors, Dev/stage/prod k8s clusters, some other
               | k8s clusters
               | 
               | - dev/prod Elasticsearch/Logstash/Kibana clusters
               | 
               | - internal GitLab, Jira, Confluence, nautobot, OpenDCIM,
               | a deprecated Twiki
               | 
               | - several internal custom apps
               | 
               | - Probably more I am forgetting.
               | 
               | Nothing gets patched consistently. Everything is
               | neglected to a certain degree.
        
               | rhizome wrote:
               | I'm not going to die on this hill, but that seems like a
               | lot of complexity for a company with in-house skills,
               | maybe even the worst of both (in-house vs cloud/managed)
               | worlds.
        
               | icedchai wrote:
               | You'd be surprised. Most dedicated sysadmins and DBA
               | positions were done away with as part of the "devops"
               | movement, especially as everything moved to the cloud and
               | there were no longer physical systems to maintain.
               | Developers are smart, they can just do that work too,
               | right? It's all just typing. /s
        
               | coward123 wrote:
               | Loads of non-profits and small businesses that have
               | surprisingly demanding tech needs but can't afford /
               | don't understand / can't manage / have only a part-time
               | need, etc.
        
               | netizen-936824 wrote:
                | It's literally essential to have solid tech these
                | days. I don't understand how businesses think they can
                | operate while skimping on or even skipping a halfway
                | decently funded IT dept
        
               | WJW wrote:
               | It's not even close to essential, since by far the
               | majority of businesses are running without anything close
               | to what people on HN would consider adequate. It only
               | becomes a problem if too many of your competitors have
               | good tech and even then only if your industry depends on
                | (software) tech. Companies like bakeries and building
                | contractors can run fine with shitty IT, and even if
                | their competitors do IT better, no customer is going
                | to drive further to get bread from someone who has
                | proper backups instead of from the closest shop.
        
               | coward123 wrote:
               | No kidding. I would have preferred to migrate the client
               | to GitHub, but they were convinced that wasn't acceptable
               | - had to run their own GitLab. Turned into a mess of
               | managing Gitlab updates and BS rather than working on the
               | product.
        
           | [deleted]
        
         | gurjeet wrote:
          | I wrote a few browser-based small apps at q.ht [1], in the
          | hope that once written and tested, I would never have to
          | maintain that code or infrastructure.
          | 
          | While most of those tchotchke apps are still functioning as
          | designed, the PDF generator (which uses pdf.js) in the
          | Life-in-Weeks app [2] has somehow broken. It doesn't
          | generate PDFs like it used to.
          | 
          | Despite so much care and effort put into these decisions,
          | made to avoid having to maintain/upkeep the software, the
          | utopia remains elusive.
         | 
         | [1]: http://q.ht (served via Github Pages, see
         | https://github.com/gurjeet/q.ht)
         | 
         | [2]: http://q.ht/life-in-weeks-on-one-sheet/
        
         | jasode wrote:
         | _> When a solution requires running a server, I mentally count
         | it as much more expensive than if it's done locally and
         | synchronously._
         | 
         | Are you comparing server vs local desktop software?
         | 
         | The author's article is actually comparing server-SaaS vs
         | server-on-premise (or server-self-managed-cloud-vm-container).
        
           | kqr wrote:
           | I'm comparing anything that requires uptime monitoring to
           | something that does not, in essence.
        
       ___________________________________________________________________
       (page generated 2022-03-26 23:01 UTC)