[HN Gopher] LinkedIn shelved plan to migrate to Microsoft Azure ...
       ___________________________________________________________________
        
       LinkedIn shelved plan to migrate to Microsoft Azure cloud
        
       Author : helsinkiandrew
       Score  : 13 points
       Date   : 2023-12-14 14:04 UTC (1 days ago)
        
 (HTM) web link (www.cnbc.com)
 (TXT) w3m dump (www.cnbc.com)
        
       | helsinkiandrew wrote:
       | 4 years ago: "LinkedIn is moving to Microsoft's Azure public
       | cloud three years after $27 billion acquisition"
       | 
       | https://www.cnbc.com/2019/07/23/linkedin-is-moving-to-micros...
        
       | RoyTyrell wrote:
       | My company has invested in moving to Azure except where we need
       | to stay on Google. Apparently MS gave us a package on all of
       | their products if we use Azure and it was enough to sway the
       | execs.
       | 
       | We were then given the directive that everyone at my level would
       | need to get some certifications so we could properly use Azure,
       | assist the architects and more jr devs. It's a good idea but my
       | god the training is so poorly executed. I want to like Azure but
       | it also seems like an uncoordinated mess.
       | 
       | Maybe I'm just a grumpy dev. Anyone else have a better and more
       | positive perspective? Who has good training for certs such as the
       | Data Engineer or AI Engineer?
        
       | green-eclipse wrote:
       | Microsoft bought Hotmail back in 1997. Hotmail was powered by
       | Unix servers until 2004, despite MS's best efforts to transition
       | to their own Wintel-powered backend [0]. These things take time.
       | 
       | [0] https://news.softpedia.com/news/Windows-Live-Hotmail-Was-
       | Pow...
        
       | yellow_lead wrote:
       | > LinkedIn was having a hard time taking advantage of the cloud
       | provider's software. Sources told CNBC that issues arose when
       | LinkedIn attempted to lift and shift its existing software tools
       | to Azure rather than refactor them to run on the cloud provider's
       | ready made tools.
       | 
       | I think I need this translated back into tech-speak.
        
         | asylteltine wrote:
         | That is tech speak. They tried to redeploy existing
         | architecture into azure and it failed
        
           | ghaff wrote:
           | The headline notwithstanding, this doesn't seem like anything
           | particularly Azure specific. They'd likely have had many of
           | the same issues trying to mostly lift and shift to any of the
           | big public cloud providers.
        
         | eitally wrote:
         | ELI5: For any sufficiently complex enterprise system (e.g.
         | LinkedIn, or Google), any plain vanilla architecture is
         | infeasible for lift & shift. Moreover, the vanilla services may
         | not comply with internal security requirements, or play nicely
         | with internal CI/CD tools, or internal databases / data
         | structures / data processing pipelines / analytics.
        
           | mecsred wrote:
           | No five year old is going to understand a single sentence of
           | that.
        
             | calvin wrote:
             | ELI40YO Engineer
        
               | that_guy_iain wrote:
               | Still need it dumbed down a bit /s
        
             | maronato wrote:
             | You spend a week building a castle with legos, and suddenly
             | your mom asks if you can change some parts to use the new
             | <lego competitor>. You can try to make the old and new
             | parts fit together, but it isn't going to be easy most of
             | the time, and you can't be certain that the lego competitor
             | will have the same pieces or do the same things as your
             | lego version. By the time you are done redoing those parts,
             | you'll end up having to recreate large portions of your
             | castle to make everything work together again, and even
             | then you might miss something important that breaks a
             | functionality of your castle.
        
             | mypetocean wrote:
             | Maybe in this case "ELI5" stands for "Explain it like I
             | have 5 yrs experience in software development but gave up
             | in year two"?
        
             | eitally wrote:
             | Arguably, if the five year old was reading hacker news,
             | they might. :) Point taken, though, but honestly this
             | doesn't seem like the place to simplify things quite to a
             | 5-year-old level.
        
           | asmor wrote:
           | That's all correct in theory, but in my experience these
           | things still happen and are usually outgrowths of bad
           | engineering culture / shadow IT / not wanting to be reliant
           | on your cloud infra / platform team (often for irrational
           | reasons, sometimes not). They get built with entire teams
           | taking responsibility on paper, but then before you know it,
           | nobody from that team still works at the company or on that
           | team. Usually these systems are also GDPR nightmares if they
           | contain user data, because these people don't understand when
           | you tell them they need to have a plan for deleting user
           | data. They don't even consider it a legal barrier, they think
           | you're putting stones in their way.
           | 
           | I've been on enough Cloud Archeology expeditions into the
           | land of VMs where nobody knows what they do, it might as well
           | be my job title now.
        
           | _rutinerad wrote:
           | My five-year-old loves her complex enterprise system.
        
         | sqeaky wrote:
         | Is that supposed to make it seem better?
         | 
         | If refactoring is too hard for a Microsoft owned company what
         | am I to think about my tech stack?
        
           | bostik wrote:
           | Beyond ludicrously small systems, refactoring of _live 24/7
           | production systems_ is never easy.
           | 
           | Reality has a surprising amount of detail, and any non-
           | trivial, customer-facing system will have accumulated weird
           | code paths to account for obscure but nonetheless expensive
           | edge cases. A codebase built across >20 years, scaled to
           | support millions of concurrent users is going to be
           | absolutely filled to the brim with weird things.
           | 
           | When you add the need for live migrations with zero downtime,
           | done every few years to account for next order of magnitude
           | loads, you end up with a proper Frankenstein's monster. It's
           | not called "rebuilding an airplane while flying" for a lark.
           | 
           | Every round includes a long, complex engineering effort of
           | incremental live migration. With parallel read/write patterns
           | between old and new systems, and all their annoying semantic
           | differences. And then, to add insult to injury, while your
           | core team was going through the months-long process of
           | migrating _one_ essential service, half a dozen upstream
           | teams have independently realised they can depend on some
           | weird side effects of the intermediate state and embedded its
           | assumptions to a Critical Business Process[tm] responsible
           | for a decent fraction of your company's monthly revenue.
           | Breaking their implicit workflow will make your entire
           | company go from black to red, so your core team is now
           | saddled with supporting the known-broken assumptions.
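           | 
           | A minimal sketch of that parallel read/write pattern, assuming
           | two dict-like stores. `DualWriteStore` and its fields are
           | hypothetical names for illustration, not anything LinkedIn or
           | Azure ships:

```python
class DualWriteStore:
    """Writes go to both stores; reads are served by the old store and
    shadow-compared against the new one so semantic drift becomes visible."""

    def __init__(self, old, new):
        self.old = old          # current source of truth (hypothetical dict-like store)
        self.new = new          # migration target (hypothetical dict-like store)
        self.mismatches = []    # keys whose values disagreed on a read

    def write(self, key, value):
        self.old[key] = value   # the old system stays authoritative
        self.new[key] = value   # mirror every write to the new system

    def read(self, key):
        old_value = self.old.get(key)
        if self.new.get(key) != old_value:
            self.mismatches.append(key)  # record the drift, do not fail the request
        return old_value


old, new = {"user:1": "alice"}, {}   # "user:1" predates the migration
store = DualWriteStore(old, new)
store.write("user:2", "bob")
values = [store.read("user:1"), store.read("user:2")]
print(values, store.mismatches)      # ['alice', 'bob'] ['user:1']
```

           | The mismatch list is where un-backfilled data and semantic
           | differences would tend to show up before any cutover.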
           | 
           | Then you get to add wildly differing latency profiles to the
           | mix. While you were running on your own hardware, the worst-
           | case latency was rack-to-rack. Implicit assumptions on
           | massive but essential workloads may depend, unknowingly, on
           | call-to-call latencies that only rarely exceed 100
           | microseconds. In a cross-AZ cloud setting you may suddenly
           | have a p90 floor of 0.2ms. A _lot_ of software can break in
           | unexpected ways when things are consistently just a _little
           | bit_ too slow.
           | 
           | Welcome to the wonderful world of distributed systems and
           | cloud migrations. At some point the scars will heal.
           | Allegedly.
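           | 
           | A rough sketch of the latency point, assuming a workload that
           | makes its calls strictly one after another. The call count and
           | the timeout are hypothetical; only the 100 microsecond and
           | 0.2ms per-call figures come from the text above:

```python
def total_latency_us(sequential_calls, per_call_us):
    """Total time for calls made one after another, in microseconds."""
    return sequential_calls * per_call_us


CALLS = 5_000           # hypothetical number of sequential calls per job
TIMEOUT_US = 750_000    # hypothetical 0.75 s deadline baked into the caller

on_prem_us = total_latency_us(CALLS, 100)   # rack-to-rack: ~100 microseconds per call
cross_az_us = total_latency_us(CALLS, 200)  # cross-AZ: ~0.2 ms per call

# Doubling the per-call latency doubles the total and blows the deadline.
print(on_prem_us <= TIMEOUT_US, cross_az_us <= TIMEOUT_US)  # True False
```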
        
         | Raed667 wrote:
         | I think this is an awkward way of saying they tried to add an
         | abstraction on top of their AWS dependencies so that their
         | services would work on Azure without a refactor.
        
           | bbarnett wrote:
           | They don't use AWS, but primarily baremetal.
        
             | bbarnett wrote:
             | Also, this reminds me of the time Microsoft bought Hotmail,
             | and couldn't port it to WinNT. They had to leave it on its
             | BSD variant for a long time, NT couldn't handle it.
        
               | cplusplusfellow wrote:
               | I'm surprised it can today.
        
               | Scoundreller wrote:
               | Arstechnica forum discussion on the topic in 2001:
               | 
               | https://arstechnica.com/civis/threads/i-thought-hotmail-
               | was-...
        
         | wharvle wrote:
         | "Lift and shift" is a term for when you move to "the cloud" but
         | really just replace your physical servers with clones in cloud
         | VMs. It's a relatively cheap (in terms of effort) way to get on
         | "the cloud" but gains you basically zero of the benefits. The
         | term's in wide use, talk to anyone involved with cloud-anything
         | and they'll be familiar with it.
         | 
         | I'm not sure what else needs to be translated? Nothing, I
         | think?
        
           | mmcgaha wrote:
           | Lift and shift is a sales term to make it sound like the
           | internal team is trying to over-complicate the migration. The
           | sales guy will normally phrase it as "just lift and shift."
        
             | neilv wrote:
             | I love it. I could totally believe the etymology starts as
             | slick sales persuasion trying to downplay the
             | implementation difficulty of something that's being sold.
             | 
             | And then people also pick it up for non-persuasion, because
             | it also sounds like a catchy name for an engineering
             | approach we already had.
             | 
             | Of course it can still be used for persuasion for a while,
             | but will grow baggage over time, as efforts linked to the
             | term don't play out that way.
        
           | neilv wrote:
           | Thanks for the explanation, but no need to imply that someone
           | is out of the loop if they didn't know it.
           | 
           | The term didn't sound familiar to me (though the concept
           | was), and the term might not have been familiar to some
           | others.
           | 
           | People might not want to contradict an assertion because of
           | language like "The term's in wide use, talk to anyone
           | involved with cloud-anything and they'll be familiar with it.
           | [...] I'm not sure what else needs to be translated? Nothing,
           | I think?"
        
             | wharvle wrote:
             | > People might not want to contradict an assertion because
             | of language like "The term's in wide use, talk to anyone
             | involved with cloud-anything and they'll be familiar with
             | it. [...] I'm not sure what else needs to be translated?
             | Nothing, I think?"
             | 
             | > > LinkedIn was having a hard time taking advantage of the
             | cloud provider's software. Sources told CNBC that issues
             | arose when LinkedIn attempted to lift and shift its
             | existing software tools to Azure rather than refactor them
             | to run on the cloud provider's ready made tools.
             | 
             | The only other terms I can see that are jargon are "cloud
             | provider" and "refactor", and those are already technical
             | (more or less) so don't need to be translated into
             | technical language.
             | 
             | As for the other bit, I just meant that it's a widely-used
             | term so one may continue to encounter it in these contexts.
             | It truly is ubiquitous in discussion of and around
             | "enterprise transformations" to the cloud, and among cloud
             | practitioners more generally, so anyone connected to that
             | space will know what it means. It's also _kinda_ already a
             | technical term, in that developer/devops and SRE sorts
             | throw it around and do mean a specific thing by it, which
             | doesn't need to be translated for other technical folks in
             | that area.
        
               | neilv wrote:
               | "Ten Thousand" https://xkcd.com/1053/
               | 
               | The original person might've instead asked for an
               | explanation in a way that didn't come across as
               | criticizing the article.
               | 
               | But probably best not to insist that everyone should
               | already know the term; just explain it.
        
               | wharvle wrote:
               | Yeah, you're probably right. Feedback received.
        
         | vergessenmir wrote:
         | Lift and shift is a cloud migration strategy which involves
         | moving your applications to the cloud with little to no
         | modification. For example, you have an application running on a
         | server in your data-centre, you then deploy a VM in the cloud
         | with a similar spec and install the application.
         | 
         | It's usually done to avoid the engineering cost of making the
         | services more cloud native. What tends to happen a lot is that
         | after a considerable portion of the migration is completed, the
         | cost of the lift-and-shift effort starts to overtake the
         | savings, and the projected costs dwarf the future savings.
         | 
         | I suspect this is what happened with Linkedin.
        
           | Agingcoder wrote:
           | Which savings? It's never been obvious to me that cloud was
           | cheaper if you're a large company
        
             | lobsterthief wrote:
             | It's easier to scale cloud infrastructure.
        
               | jvm___ wrote:
               | Even if you never need to scale it's cheaper to not have
               | to physically maintain your own data center. If all the
               | broken server, building power, building internet, access
               | control, real estate costs... are all handled in the
               | cloud there's savings there as well.
        
               | NegativeK wrote:
               | But those costs don't go away -- the cloud provider is
               | going to charge you for them, along with a premium for
               | profit?
               | 
               | I'm used to organizations moving out of the cloud when
               | they realize that it's more expensive if you don't have
               | very peaky load demands.
        
               | marcolussetti wrote:
               | But that's somewhat negated if you lift and shift,
               | because your application is not designed to leverage that
               | capability in that way.
        
             | campbel wrote:
             | Compute is at a premium, but you can shift opex/capex
             | around which might be more suitable. It can also be cheaper
             | in headcount since you need fewer operators and less
             | expertise in datacenter operations.
        
               | adolph wrote:
               | > you need fewer operators and less expertise in
               | datacenter operations
               | 
               | Because you are paying someone else for them.
               | 
               | This is considered rational because those operators are
               | presumably more productive in a pool of people using
               | similar skills to support many customers rather than just
               | one. It is similar to hiring a cleaning service rather
               | than employing individual cleaners in a department of
               | cleaning because cleaning things is not a core competency
               | of business.
               | 
               | It might be less rational if some amount of compute is
               | part of the core competency of the business. Since
               | "software is eating the world," compute is a core
               | competency of all businesses except for the ones that
               | don't realize it yet.
        
               | wharvle wrote:
               | > It can also be cheaper in headcount since you need
               | fewer operators and less expertise in datacenter
               | operations.
               | 
               | I've not _really_ seen this work out well. I think it
               | might be true for simple set-ups, letting a tiny
               | developer team also handle infra and support without
               | going nuts doing it, if they set it up that way from the
               | beginning, but more-complex setups always seem to have so
               | damn many sharp edges and moving pieces that support ends
               | up looking similar to what a far more DIY approach (short
               | of building one's own datacenter outright) would, in
               | terms of time lost to it.
               | 
               | ... and so does downtime, for that matter.
        
             | asmor wrote:
             | It's at least more predictable. You don't pay for staff
             | with datacenter skills (sort of in short supply) and you
             | don't need to make large investments early on to build the
             | datacenter and you don't have a huge headache if you need
             | to scale up or down operations.
        
             | hibikir wrote:
             | It really depends on workloads. Imagine you need massive
             | spikes of compute for, say, flash sales, or people watching
             | the Super Bowl on your streaming service. Buying all that
             | hardware for just the spikes might not make sense vs just
             | scaling up VMs in a cloud provider and scaling them down.
             | 
             | In the real world, for baseline load, the big advantage for
             | many large companies isn't price, but the massive lack of
             | alacrity of many inhouse ops teams. If it takes me 3+
             | months to provision compute for the simplest, lowest demand
             | services (as is the custom in many large companies full of
             | red tape and arguments about who bears costs), letting
             | teams just spin up anything they want and get billed
             | directly is often a winner, even if it's more expensive.
             | Having entire teams waste months before they can put
             | something in prod is a very different kind of expense in
             | itself.
        
             | maccard wrote:
             | The simplest example is if you have on-prem hardware, you
             | need to have capacity for your peak load. In a lift and
             | shift, you would replace your fleet of 96 core xeons with a
             | fleet of 96 core xeons in AWS.
             | 
             | The cloud native approach would be to modify your app so
             | that it can be scaled up and down so you keep a few
             | machines always running, and scale up and down with your
             | traffic so you only run at capacity when you need it.
        
               | wredue wrote:
               | This doesn't demonstrate anything about the savings.
               | 
               | Anecdotally, when my previous company was looking at
               | costs, cloud unequivocally came out significantly more
               | expensive, and that wasn't even a large company (only
               | 2,000 or so employees).
               | 
               | I will grant that we did not have globalization problems
               | to solve (but I'd also wager that lots of businesses
               | prematurely "what if" this scenario anyway).
        
               | maccard wrote:
               | > This doesn't demonstrate anything about the savings.
               | 
               | If you need 4 CPUs for your peak load for 4 hours per
               | day, and only 1 of them for the other 20 hours a day, you
               | can save by scaling down to 1 CPU for about 83% of the day.
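               | 
               | That arithmetic can be sketched as below, assuming a flat
               | price per CPU-hour and ignoring any cloud premium over
               | on-prem hardware. The function name is hypothetical:

```python
def cpu_hours(schedule):
    """Sum CPU-hours for one day from (cpus, hours) pairs."""
    return sum(cpus * hours for cpus, hours in schedule)


always_peak = cpu_hours([(4, 24)])          # provisioned for peak all day
autoscaled = cpu_hours([(4, 4), (1, 20)])   # 4 CPUs for 4 h, 1 CPU for 20 h
saving = 1 - autoscaled / always_peak

print(always_peak, autoscaled, saving)      # 96 36 0.625
```

               | Whether that 62.5% cut in CPU-hours is a net saving still
               | depends on how much more each cloud CPU-hour costs.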
        
           | bee_rider wrote:
           | Although, it must be unusual, right? This is not one company
           | porting their service to the cloud, this is Microsoft porting
           | their LinkedIn service from whatever servers came along with
           | LinkedIn, to their own servers, on which they also run a
           | cloud business.
           | 
           | Which... isn't to say anything about which way we should
           | expect that to swing things. But it seems quite unusual, as
           | most companies have not been bought by a cloud provider.
           | Yet...
        
             | random42 wrote:
             | > this is Microsoft porting their LinkedIn service from
             | whatever servers came along with LinkedIn, to their own
             | servers
             | 
             | Nope, LinkedIn executes completely independently.
        
           | hinkley wrote:
           | If your architecture is chatty enough, you will be sharding
           | things so that most traffic stays in one rack, room, or data
           | center.
           | 
           | If you treat us-west-1 as a single data center, you may find
           | you are spending a lot on traffic between AZs.
           | 
           | A lift and shift might treat us-west-1 like a single data
           | center. A more sophisticated strategy might treat it as
           | three.
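           | 
           | A back-of-the-envelope sketch of why that matters, assuming
           | callees are picked uniformly at random across AZs. The traffic
           | volume and the 90% AZ-local share are hypothetical numbers:

```python
from fractions import Fraction


def cross_az_fraction(num_azs):
    """Share of calls leaving the caller's AZ under uniform random placement."""
    return 1 - Fraction(1, num_azs)


def cross_az_gb(total_gb, num_azs, az_local_share=Fraction(0)):
    """GB crossing AZs when `az_local_share` of traffic is pinned to the caller's AZ."""
    return total_gb * (1 - az_local_share) * cross_az_fraction(num_azs)


naive = cross_az_gb(900, 3)                       # region treated as one data center
sharded = cross_az_gb(900, 3, Fraction(9, 10))    # 90% of traffic kept AZ-local

print(naive, sharded)                             # 600 60
```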
        
         | mdeeks wrote:
         | There is no such thing as "lift and shift". It is something
         | Azure account reps like to say to make it sound like moving is
         | easy. It sounds like you're picking up some boxes from one side
         | of the room and moving them to the other. When in reality
         | you're rewriting your infra code mostly from scratch.
         | 
         | When we were acquired by MSFT we had the same project. We had
         | to move from AWS to Azure. I made them all stop saying "lift
         | and shift" because in reality it is "throw away all of your
         | provisioning code and rewrite it using Azure primitives which
         | don't work the same way as AWS ones".
         | 
         | It is more akin to writing an iOS app to work on Android.
        
           | axus wrote:
           | I'm gonna bet that many Azure customers had no such thing as
           | "provisioning code".
        
           | asmor wrote:
           | To be fair, AWS also used the exact term when we moved a
           | project out of a tiny expensive to operate (through lack of
           | scale) datacenter that only hadn't been retired because we
           | had a 30+ year old COBOL app suite on a z system.
        
           | arielcostas wrote:
           | But lift and shift is not that, is it? It's having
           | applications running directly on OSs (without
           | containerisation or separation of dependencies like the
           | database or physical disks) and moving it to "the cloud" to
           | be run on a VM in the same fashion.
           | 
           | I mean, if you're already with AWS using their services
           | (besides EC2 for hosting) such as RDS or S3; moving to Azure
           | SQL (or DB for MySQL or whatever) and Blob Storage is not
           | just lift-and-shift anymore, since you are actually changing
           | from a cloud provider to a different one.
           | 
           | AFAIK an actual migration to the cloud would involve
           | rewriting some parts of the application to be cloud-native,
           | such as using Service Bus for queues instead of a local
           | Redis/RabbitMQ instance, using GCS instead of local disks,
           | and using RDS instead of hosting your own single MySQL
           | server.
        
             | oasisbob wrote:
             | There's no formal definition of "lift and shift", certainly
             | nothing that would dictate specific virtualization
             | strategies.
             | 
             | I've always read it as being roughly analogous to "like for
             | like," and dependent on the specific circumstances and
             | status quo.
        
           | oasisbob wrote:
           | "Lift and shift" isn't just an Azure-specific phrase. Many
           | people use it pejoratively, and point to it as an anti-
           | pattern, and something to avoid.
           | 
           | Similar terminology is "forklift"... been hearing that one
           | for well over a decade.
           | 
           | Migrations are oftentimes an opportunity to revisit scaling,
           | configuration, build and deployment pipelines, platform
           | primitives, etc. Every migration I've been involved in has a
           | (probably necessary) tension between getting the job done
           | efficiently, while not repeating all the mistakes of the
           | past.
        
             | hinkley wrote:
             | "Lift and shift" came into the conversation once we started
             | talking about how we were paying too much for AWS. The
             | obvious stuff was things like less bin packing, and
             | bandwidth for third party services, like telemetry
             | dashboards.
             | 
             | And it's not just the service fees. I blanch to think of
             | the opportunity costs we accrued by focusing for that long
             | on infrastructure to the exclusion of new product and
             | features. It's truly breathtaking.
             | 
             | And then there's the burnout, and the ruffled feathers.
        
               | oasisbob wrote:
               | I've become convinced that most migrations are absolute
               | losers in terms of opportunity costs.
               | 
               | Even if done skillfully with valid rationale, they don't
               | show any value until you come out the other side
               | successfully.
        
               | hinkley wrote:
               | Definitely. We migrated to a new telemetry vendor and I'm
               | pretty sure it'll take 10 years for us to recoup the cost
               | savings in man power and opportunity cost.
               | 
               | They were worried the old vendor might go under. My own
               | track record with predicting company failures is pretty
               | bad, so I suspect they'll still be around ten years from
               | now.
        
         | ben_jones wrote:
         | It could mean multiple things. My guess is they used vendor
         | specific services that don't translate as well as the basic
         | building blocks like vanilla S3/EC2
        
       | upon_drumhead wrote:
       | I wonder how the GitHub to Azure migration is going
        
         | redrove wrote:
         | I have it on good authority they're trying a lift and shift too
         | and it's not going well, at least as of ~9mo ago.
        
       | duxup wrote:
       | Any move to any cloud is going to depend on the environment
       | you're coming from. I've been in on decisions not to use X, Y, Z
       | ... doesn't mean there was anything wrong with them, we just
       | weren't ready for that yet or had different priorities or the
       | ever present weird deal-breaker issue / requirement.
        
         | brodouevencode wrote:
         | Exactly. The cost to retool for the cloud is not insignificant.
        
         | konschubert wrote:
         | You have to
         | 
         | 1) Be comfortable routing traffic between on-prem and cloud
         | over the internet or at least over a VPN, and
         | 
         | 2) avoid the temptation to build your own platform (Terraform
         | templates are a liability, not an asset!) and
         | 
         | 3) Move tiny stuff first and bigger stuff later.
         | 
         | It's amazing how many companies fail at that.
        
           | bobthepanda wrote:
           | Doing things the right way and learning from before usually
           | requires a certain degree of humility, and more often than
           | not the person leading these projects is either required to
           | be, or is deluding themselves to be a hotshot who can succeed
           | in a bold new way.
        
             | konschubert wrote:
             | You mean somebody who posts 3 bullet points on the internet
             | and claims that they solve everything ? ;)
        
               | bobthepanda wrote:
               | I mean, we invented the term Promotion Oriented
               | Architecture for a reason.
               | 
               | It's like politics; the best person for the job is a
               | sufficiently experienced person who does not want it.
        
       | mfer wrote:
       | This doesn't appear to be about Microsoft's cloud but rather
       | Public Cloud.
       | 
       | The whole migration of LinkedIn from their own data centers to
       | the public cloud (Microsoft's) isn't going well.
       | 
       | It appears they are still going to operate on-premise for many
       | things. Some things are moving or have moved to the public cloud.
       | 
       | Isn't this more a shot at the public cloud for all the things
       | than to any specific one?
        
         | that_guy_iain wrote:
         | I don't see anything that points to it being a general public
         | cloud issue. And instead they talk about Azure software
         | specifically as something that they couldn't take advantage of,
         | no?
        
           | mlhpdx wrote:
           | I would not assume that it is a specific Azure problem from
           | that statement. Many, many teams struggle to take advantage
           | of cloud infrastructure because of habits and knowledge
           | retained for operating the existing systems.
           | 
           | It's possible that, given what they have, it's simply best to
           | keep it on premise - at least to some degree. That would likely
           | not be true with a successful re-architecture, but not
           | everyone is up for that.
        
             | mfer wrote:
             | It may not be about the teams. For example, when you
             | control the data center you can do certain things around
             | performance and scale you can't do in a public cloud.
             | 
             | There are so many unknowns about how things are set up that
             | it's hard to know.
        
         | carimura wrote:
         | Yes I came away with the same thing. It's The Register's modus
         | operandi to use cheeky clickbait titles.
        
       | femiagbabiaka wrote:
       | Most cloud migration projects at large companies fail. It usually
       | takes 3 or 4 tries at least before all the necessary lessons are
       | learned.
        
       | oooyay wrote:
       | Maybe I'm a bit contrarian on this one but once I saw data
       | center, Azure, and the phrase "lift and shift" it filled in a lot
       | of context for me. I spent a lot of my early to mid career
       | participating in these strategies. They don't work. VM images
       | are almost always different in some way, there's something one
       | vendor provides that another doesn't - in general there are enough
       | minute details that add up to make a series of mini-mountains in
       | terms of blockers.
        
         | jeffbee wrote:
         | Yep, there are always differences. Just one thing I stumbled
         | into recently: one of our program images that has long
         | worked fine in AWS can't start in Azure because something their
         | hypervisor does to the virtual address layout conflicts with
         | the way that we remap .text to a huge page. It is both trivia
         | and a showstopper.
        
         | pphysch wrote:
         | Yeah, there is a vast gulf between "it works for us" and "every
         | dependency was implemented strictly according to open standards
         | and is therefore seamlessly portable". See also the joke of
         | migrating between "SQL" databases.
        
       | lgkk wrote:
       | How do you move that much data over to another cloud provider?
       | 
       | Without losing data or disrupting the customer?
       | 
       | Or do the databases just stay in the data center and not migrate?
        
         | wharvle wrote:
         | Live replicas (perhaps initialized with a cold backup,
         | initially, if the dataset's _really_ huge), carving off parts
         | of it for separate migration if that's at all feasible, and
         | some expensive folks doing a lot of butt-clenching-worthy
         | activity for an hour or two (unless it goes very poorly...) for
         | the final cut-over, some evening.
        
         | ThomasMoll wrote:
         | We (when I worked at LinkedIn) did it with ETL clusters, which we
         | had already built out for moving data between datacenters
         | nightly. They would mirror an HDFS cluster, then run batch jobs
         | to transfer either directly to the outbound cluster or to
         | another ETL cluster in another DC.
         | 
         | We used one of our ETL clusters to ship data to MSFT for
         | various LinkedIn integrations, like seeing LinkedIn profile
         | information in Outlook or Office products.
        
       | lumost wrote:
       | It's incredibly difficult for a mature software business to
       | justify infrastructure and tooling investments. This is why we
       | think that startups are a haven for modern tooling and the
       | largest legacy firms are ... well ... difficult.
       | 
       | The last 15 years possibly broke this rule by virtue of low
       | interest rates, enabling the justification of large internal
       | teams focused on modernization efforts which sometimes went as
       | far as moving the state of computing forward.
       | 
       | I wouldn't be surprised to see legacy enterprises return to form
       | now that interest rates are 7%
        
       | ksec wrote:
       | I wonder what sort of scale LinkedIn operates at in terms of
       | server count.
       | 
       | And GitHub, also under Microsoft, seems to be doing fine on-prem
       | as well. Why force LinkedIn to use Azure?
        
         | wredue wrote:
         | If I had to guess, there are hordes of businesses out there
         | that maintain operations on prem, and a large lift like this is
         | great for the resume.
         | 
         | Of course, I could also be entirely wrong, but I also am not
         | going to pretend that IT resume padding then jumping ship and
         | leaving a shart of an architecture behind doesn't happen all
         | the time in this industry.
        
         | astockwell wrote:
         | $$$
        
         | rdoherty wrote:
         | When I was there it was in the low hundreds of thousands.
         | Probably more now, as user base growth was still in double-digit
         | percentages per year.
        
           | ksec wrote:
           | >When I was there it was in the low _hundreds of thousands_.
           | 
           | Blows my mind every time I see these kinds of numbers.
        
       | thenewwazoo wrote:
       | I'm obviously not going to comment on anything internal, and I'm
       | obviously speaking for myself and not the company, but it's worth
       | bearing in mind that this migration was not from "on-prem" in the
       | traditional sense. LinkedIn has its own internal cloud, complete
       | with all the abstractions you'd expect from a public cloud
       | provider, except developed contemporaneously with all the _rest_
       | of the "clouds" everyone is familiar with. It was designed for,
       | and is tightly coupled to, LinkedIn's particular view on how to
       | build a flexible infrastructure (for an extreme example, using
       | Rest.Li[1], which includes client-side load balancing).
       | 
       | There was no attempt to "lift-and-shift" anything. There are
       | technologies that overlap and technologies that conflict and
       | technologies that complement one another. As with any huge
       | layered stack, you have to figure out which from the "LinkedIn"
       | column marry well with those in the "Azure" column.
       | 
       | I personally appreciate LI management's ability to be clear-eyed
       | about whether the ROI was there.
       | 
       | [1] https://linkedin.github.io/rest.li/
        
         | dfxm12 wrote:
         | Yeah, based on my own experience with AWS and Azure (that has
         | nothing to do with LinkedIn), my immediate reaction to the
         | headline was, "well, you can be keen on Azure, but "stuck" on
         | AWS for a myriad of other reasons". Reading the article pretty
         | much confirmed it.
        
         | foobarian wrote:
         | Oof, I'm twitching just reading that because we're in exactly
         | the same boat. The problem with the ROI is that any kind of
         | not-self-run cloud is guaranteed to be more expensive in direct
         | costs. This has been shown time and time again for any
         | reasonably large enterprise. However, there is a long list of
         | things that are hard to express in money that support a cloud
         | move, mostly to do with keeping up with modern tech, hiring,
         | DR, better resiliency, etc. and so the decision can be quite
         | dependent on the particular execs in the chain of command and
         | their subjective values.
        
           | sargun wrote:
           | This is based on the assumption that Azure has modern tech,
           | hires well, and has better DR and resiliency than LinkedIn's "cloud"
           | for LinkedIn's needs. There's a bit of a problem around
           | incentives here, where Azure is built to sell to Azure's
           | customer base, whereas LinkedIn has evolved their own stack
           | over the years.
           | 
           | The questions become:
           | 
           | 1. Does it make sense to dump our special features in the
           | stack, or move them to a higher level in the stack?
           | 2. Does Azure have comparable capabilities for the LinkedIn
           | stack?
           | 3. Is LinkedIn worth it to Azure to sell to?
           | 
           | ---
           | 
           | Oftentimes, "at scale", you can support custom solutions
           | outside of cloud providers that are purpose-built, and often
           | more resilient and efficient than what the cloud providers offer.
           | 
           | AWS has taken a very interesting approach of building an
           | incredibly wide set of solutions to support every customer
           | under the sun, and their approach to being "customer
           | obsessed" leads to them building super niche solutions if the
           | deal is worth it.
           | 
           | I'm not sure how Google and Azure handle these engagements.
        
           | ljm wrote:
           | It's not really 'the cloud' as much as it's a managed
           | mainframe you allocate resources from. Only it's actually
           | quite expensive to allocate resources but it becomes more
           | palatable with a monthly bill compared to setting up on-prem.
           | 
           | Costs more money but easier on the cash flow.
        
       | tjpnz wrote:
       | Sounds like they would've faced a similar set of issues moving to
       | AWS or GCP.
        
       | miguelazo wrote:
       | There are plenty of issues with Azure, but LinkedIn is hardly at
       | the vanguard of innovation. And that was still the case before
       | Microsoft vastly overpaid for it.
        
         | joshhart wrote:
         | I left LinkedIn 1.5 years ago. I was there 12 years. I saw the
         | revenue & profitability growth that occurred post acquisition.
         | I am very very confident LinkedIn would be worth north of $100B
         | on public markets today and Microsoft made the acquisition for
         | $26B. You might argue that in the subsequent 6 years post
         | acquisition that wasn't enough growth and they should have
         | bought back shares instead but it was completely a debt
         | financed acquisition and very high ROI for Microsoft.
        
       | joshhart wrote:
       | This was cancelled over a year ago, which the article notes, so it
       | is old news. It was clear the effort would have needed a very
       | significant push that would have required a large halt in product
       | development and management wasn't willing to stomach it due to
       | high growth in 2020/2021. Which made sense. But LinkedIn revenue
       | growth has heavily slowed with the pullback in tech hiring and
       | they had the space to do it and consider it optimization time.
       | 
       | Also as part of Blueshift the plan was to do batch processing
       | first but LinkedIn had a cultural belief in colocation of batch
       | compute & storage, which is against the disaggregated storage
       | paradigm we see now. IMO this led to some dragging of feet.
       | 
       | Source: Worked at LinkedIn 12 years, am a director at Databricks
       | now.
        
         | ThomasMoll wrote:
         | Not only that but the Hadoop team literally had the guy who
         | wrote the original HDFS whitepaper. Moving a service with that
         | much in-house expertise first never made sense. I worked on one
         | of the original Azure PoCs for Hadoop, even before Blueshift
         | and it was immediately clear that we operated at a scale that
         | Azure couldn't handle at the time. Our biggest cluster had over
         | 500PB and in total we had over an exabyte as of 2021 [1]. It was
         | exorbitantly expensive to run a similar setup on VMs, and at
         | the scale that we had I think it would have taken over 4,000 -
         | 5,000 separate Azure Data Lake namespaces to support one of our
         | R&D clusters. I believe most of this "make the biggest cluster
         | you can" mentality was a holdover from the Yahoo! days.
         | 
         | [1] https://engineering.linkedin.com/blog/2021/the-exabyte-
         | club-...
        
       | hasty_pudding wrote:
       | As someone who worked at LI.
       | 
       | They spent years and god knows how many millions TRYING to move
       | to Azure with the Blueshift project... before pulling the plug.
       | They hired armies of contractors.
       | 
       | They didn't stop by choice.
       | 
       | They stopped because their tech stack is a giant over engineered
       | unmovable turd.
        
         | bbkane wrote:
         | As a current employee, there's things I don't like, but the
         | infrastructure is more custom than bad (far better than my last
         | job)
        
       ___________________________________________________________________
       (page generated 2023-12-15 23:01 UTC)