[HN Gopher] Cloudera taken private for $5.3B, acquires Datacoral...
___________________________________________________________________
Cloudera taken private for $5.3B, acquires Datacoral and Cazena
Author : swyx
Score : 124 points
Date : 2021-06-01 15:12 UTC (7 hours ago)
(HTM) web link (blog.cloudera.com)
(TXT) w3m dump (blog.cloudera.com)
| cosmodisk wrote:
| To all Cloudera's clients: pack your stuff and try to get an
| alternative vendor. Private Equity will not bring anything
| valuable to this business except higher prices, more aggressive
| sales and poor customer service.
| bostonsre wrote:
| After the merger with Hortonworks, they already went extremely
| aggressive and I didn't think it was possible for them to have
| an even dimmer future...
| lars_francke wrote:
| If you're looking for an alternative vendor: We've started a
| new company (called Stackable) to build a distribution for all
| these open source "data" tools (e.g. Apache Kafka, Apache NiFi,
| Apache Spark, ...).
|
| I'm a committer for Apache HBase, Apache Hive myself and I've
| been in this space for 13 years now. Yes, the hype is over but
| there are tons of companies using this stuff in production and
| tons of companies choosing it for new projects.
|
| We're trying to tackle the biggest pain points our customers
| had: Lack of flexibility (i.e. locked into specific versions
| for ages), CDH/HDP not built on Infrastructure as Code
| principles, Security is hard to do, ...
|
| The three-sentence (buzzword heavy) technical summary: Our
| distro uses the Kubernetes control plane but we've developed a
| custom Kubelet that runs software using systemd as its backend
| as well as a bunch of operators (all in Rust...). This allows
| us to leverage the best of both worlds and also allows hybrid
| scenarios (part in containers, part on "bare metal"). We've
| replaced Ranger/Sentry with OpenPolicyAgent.
|
| If anyone's interested feel free to reach out to me (E-Mail in
| profile / https://www.stackable.de/en ).
| vasco wrote:
| Your cookie pop-up is in German even on the English website.
| lars_francke wrote:
| Thanks for the heads-up, I'll forward the feedback.
| thirsteh wrote:
| Serious question: Has PE M&A ever led to an improved product?
|
| (Maybe that's just a stupid rather than a serious question)
| markus_zhang wrote:
| Limit that to KKR and you are going to see many "good"
| examples...
| cavejay wrote:
| Dynatrace is a monitoring solution and company that recently
| went public again after being initially taken private by a
| PE.
|
| I believe their offering significantly improved during that
| period.
|
| (I was a Professional Services employee for a few years)
| dimitrios1 wrote:
| Dell? Silver Lake played a big role in that IIRC.
| skeeter2020 wrote:
| Dell's approach was actually more like the classic "taking
| a company private again", where you use public equity
| markets to grow big but keep control, then take it private
| at terms that don't really reward shareholders for the
| massive growth. This looks like the modern variety of PE
| capturing predictable revenues from a large, mature client
| base that can pay their fund the expected returns for the
| next 5-7 years. It's boring as hell and never means (a) a
| better product, or (b) a bigger pay-off for employees.
| pinewurst wrote:
| Silver Lake was only a source for money, not "management
| expertise" on that deal.
| bombcar wrote:
| You could make an argument that Berkshire Hathaway is the
| largest M&A firm ever. But it's not really private equity
| (though it's not really public, either).
| missedthecue wrote:
| Hilton hotels in my opinion got much better after the
| takeover
| htrp wrote:
| This 100%
| x86_64Ubuntu wrote:
| As seen with ExtJS/Sencha and TravisCI, I'm sure the story will
| be the same: chill for about 6 months to 1 year, then lay off
| all the engineers, lay off support while shipping it overseas,
| and hope that long-tail subscriptions and "trapped"
| organizations continue to pay the fees.
| chatmasta wrote:
| The data industry continues to hype this idea of "multi-cloud,"
| but then the "modern data stack" is centralized around a single
| warehouse and nobody sees any irony in that.
|
| The big bet we're making at Splitgraph [0] is that the next
| wave of data engineering will take a more decentralized, "data
| mesh" type approach to enterprise architecture. "Data gravity"
| really does exist - it's expensive to move, in terms of both
| cost and operational complexity. And with increasing
| specialization of analytical databases, a single source of
| truth will become unrealistic. So instead of bringing the data
| to the query, why not bring the query to the data? All we need
| for that is a set of read only credentials. And yes, it should
| also be easy to warehouse your data, but it doesn't need to be
| the default.
|
| Cloudera mentions they bought DataCoral to help with data
| integration and connectors. They've correctly identified the
| problem - data sprawl and fragmentation will inevitably grow -
| but I'm not sure they have the right solution.
|
| Data integration is important, but it's a moving target, which
| is why it calls for a collaborative open source solution. This
| is why so many new startups, like AirByte most recently, are
| coalescing around the Singer taps that Stitch left behind after
| its acquisition by Talend.
|
| We also support using Singer taps to ingest data into versioned
| Splitgraph images [1], so we're excited to see more
| collaboration on maintenance of taps. For us it's a useful
| feature, but it should be just that -- a feature. Is there
| really a need to replicate _all_ of your data before you can
| even query it? Or would you rather experiment by directly
| querying its source?
|
| [0] https://www.splitgraph.com
|
| [1] unreleased and undocumented atm, but it does work. We're
| hiring, especially on the frontend if you want to help build
| the web UI. See profile.
| halflings wrote:
| This comment seems unrelated to the original story or the
| comment you're responding to, and mostly reads like an ad.
| bastardoperator wrote:
| Exactly this, don't get squeezed.
| flakiness wrote:
| For a reference:
|
| Hadoop on Google Trends peaked in 2015:
| https://trends.google.com/trends/explore?date=all&geo=US&q=h...
|
| "Hadoop is dead" seems to be a popular topic in the past few years.
| https://www.google.com/search?q=hadoop+is+dead
| [deleted]
| asperous wrote:
| Good points of evidence, however you can't read too much into
| the google trends. A lot of technologies peak when they are new
| and then fall to a steady state.
|
| For example "computers" peaked pre-2004:
| https://trends.google.com/trends/explore?date=all&geo=US&q=c...
|
| Javascript:
| https://trends.google.com/trends/explore?date=all&geo=US&q=j...
|
| Machine learning:
| https://trends.google.com/trends/explore?date=all&geo=US&q=m...
| threeseed wrote:
| In this case it's accurate. Hadoop is largely dead.
|
| YARN, Hive, HDFS, MapReduce have been replaced by Kubernetes,
| Snowflake, S3, Spark.
| Fordec wrote:
| And even that will continue to change.
|
| Kubernetes is overused right now; it has its place but it's
| not nearly universally the right tool for the job.
|
| Snowflake will eventually fall to something else due to its
| poor economics.
|
| S3 and Spark though I anticipate to be around for a good
| few years and if they lose out it will be to imitators or
| evolutionary equivalents.
| peterthehacker wrote:
| > Snowflake will eventually fall to something else due to
| its poor economics.
|
| Can you elaborate on this point? What's wrong with their
| model?
| DebtDeflation wrote:
| Yeah, very few companies are running Hadoop clusters on premises
| these days the way many were at least trying to 5 years ago.
| wardb wrote:
| Worked at Hortonworks and post merger Cloudera. Was interesting
| to see how market demand changed over the years and how the
| company worked on re-inventing themselves. Databricks and
| Snowflake seemed to understand SaaS earlier and better though.
| Still have some great friends working at Cloudera, and hope this
| will indeed accelerate a great next phase.
| swyx wrote:
| for the uninitiated, do you mind diffing what Hortonworks used
| to do and what post merger Cloudera is now focused on?
| macksd wrote:
| Worked at Cloudera pre- and post-merger. I thought of on-
| premises CDH clusters (and similarly HDP clusters) as trying
| to be the majority of your data infrastructure, but open so
| that it can integrate with other stuff. It's not just about
| having big data, but one place to store all of that data
| regardless of schema: massive database tables, logs, etc. all
| on shared hardware. AND frameworks to process it in different
| ways in place: SQL queries, Spark jobs, Search, etc. Data
| gravity was very important to the business model.
|
| As more people moved to the cloud, Hadoop-style storage was
| extremely expensive (naively moving your Hadoop cluster to 3x
| replication on EBS volumes would result in a nasty case of
| sticker shock) so the data would move to S3 / ADLS / GCP. And
| now you've lost your data gravity.
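A rough sketch of the arithmetic behind that sticker shock. Only the 3x replication factor comes from the comment above; the per-GB prices and the 500 TB data size are hypothetical placeholders, not figures from any provider's price list.

```python
def monthly_storage_cost(logical_tb, price_per_gb_month, replication=1):
    """Monthly cost of storing `logical_tb` terabytes of data.

    `replication` is the number of full copies the operator pays for
    directly: 3 for HDFS-style block replication on block volumes,
    1 for an object store whose redundancy is folded into its price.
    """
    raw_gb = logical_tb * 1000 * replication
    return raw_gb * price_per_gb_month


# Hypothetical prices, chosen only to show the shape of the comparison.
BLOCK_VOLUME_PRICE = 0.10  # assumed $/GB-month for block storage
OBJECT_STORE_PRICE = 0.02  # assumed $/GB-month for object storage

hdfs_on_block = monthly_storage_cost(500, BLOCK_VOLUME_PRICE, replication=3)
object_store = monthly_storage_cost(500, OBJECT_STORE_PRICE)
ratio = hdfs_on_block / object_store

print(f"replicated block storage: ${hdfs_on_block:,.0f}/month")
print(f"object storage:           ${object_store:,.0f}/month")
print(f"ratio:                    {ratio:.1f}x")
```

Under these assumed prices the replication factor alone triples the bill before any per-GB price difference is counted, which is the effect the comment describes.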
|
| Post-merger Cloudera focused less on on-premises clusters and
| tried to offer those same diverse workloads as a multi-cloud
| SaaS, with more focus on elasticity. This is hard because (a)
| there's a massive amount of surface area if you want
| enterprise customers to bring their own accounts, run all
| these managed open-source services in those accounts, and be
| multi-cloud, and (b) you're just competing more directly with
| the cloud vendors on their own turf, as a customer, partner,
| and competitor all at once.
| threeseed wrote:
| Would add that HDFS was a particular nightmare to manage.
|
| You had to worry about the size of files since the NameNode
| would be overloaded. Being a Java app running on the older
| JVMs it would do a full GC under heavy load and cause
| failovers. And it was impossible to get data in/out from
| outside the cluster using third party tools.
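A minimal sketch of why file size mattered so much to the NameNode, which keeps the whole namespace in heap. It assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per file or block entry and a 128 MiB block size; both are assumptions for illustration, and real usage varies by version and configuration.

```python
BYTES_PER_OBJECT = 150            # assumed rule-of-thumb heap cost per entry
DEFAULT_BLOCK = 128 * 1024**2     # assumed HDFS block size (128 MiB)


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def namenode_heap_bytes(total_bytes, avg_file_bytes, block_bytes=DEFAULT_BLOCK):
    """Estimate NameNode heap needed to track `total_bytes` of data
    stored as files of `avg_file_bytes` each (one entry per file plus
    one entry per block of that file)."""
    files = ceil_div(total_bytes, avg_file_bytes)
    blocks_per_file = max(1, ceil_div(avg_file_bytes, block_bytes))
    return files * (1 + blocks_per_file) * BYTES_PER_OBJECT


TOTAL = 100 * 1024**4  # 100 TiB of data in both scenarios

small_files = namenode_heap_bytes(TOTAL, 1 * 1024**2)   # 1 MiB files
large_files = namenode_heap_bytes(TOTAL, 1 * 1024**3)   # 1 GiB files

print(f"1 MiB files: {small_files / 1024**3:.1f} GiB of heap")
print(f"1 GiB files: {large_files / 1024**2:.1f} MiB of heap")
```

The same amount of data needs orders of magnitude more NameNode memory when it arrives as many small files, which is also when long garbage-collection pauses become likely.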
|
| I remember many companies seeing S3 and just being in shock
| that it was so cheap, limitless and that someone else was
| going to manage it all.
| bpodgursky wrote:
| It's interesting, because I think HDFS (and NameNodes in
| particular) were impressively engineered for a use-case
| which didn't quite materialize -- ie, very fast metadata
| queries (they are still much faster than S3 API calls).
| Turns out that cheap, simple, and massively scalable
| object storage is just far far more important in
| practice.
|
| I think there are still a couple use-cases where HDFS
| dominates S3 (I think some HBase workloads?). But yeah, I
| scaled up and maintained a 2000+ node Hadoop cluster for
| years, and I would never choose it over object storage if
| given any plausible alternative.
| macksd wrote:
| This is actually a topic I love to talk about because I
| spent a lot of my time on S3A and the cloud FileSystem
| implementations. Fast metadata queries were actually a
| huge deal for query planning, and of course with
| performance there were a lot of potential surprises on
| S3. HBase was (unsurprisingly) heavily dependent on
| semantics that HDFS has but that are hard to get right on
| object storage, and required a couple of layers to be
| able to work properly on S3 (and even then - write-ahead
| logs were still on a small HDFS cluster last I heard). My
| biggest complaint about S3 was always eventual
| consistency (for which Hadoop developed a work-around;
| it originally employed a lot of worst practices on S3 and
| suffered from eventual consistency A LOT), but now that S3
| has much better consistency guarantees, I agree: it's
| incredibly hard to beat something that cheap.
| macksd wrote:
| Yeah I would have loved to see HDFS get really scalable
| metadata management. I remember hearing about LinkedIn's
| intentions to really do some significant work there at
| the last community event I attended, but from their blog
| post this week it doesn't sound like that's happened
| since the read-from-standby work [1].
|
| Kerberos (quite popular on big enterprise clusters) is
| really what makes it hard to get data in / out IMO. I see
| generic Hadoop connectors in A LOT of third party tools.
|
| [1] https://engineering.linkedin.com/blog/2021/the-exabyte-club-...
| threeseed wrote:
| Before the cloud took off Hortonworks and Cloudera owned the
| Big Data market.
|
| They both offered a Hadoop distribution but had different
| strengths e.g. Hortonworks had fine grained access control,
| Cloudera had a better SQL product with Impala.
|
| Then AWS came along and built their own which was
| significantly cheaper and more flexible as you could easily
| scale your cluster up/down. And so companies moved to it as
| they gradually moved to the cloud.
|
| The Hortonworks/Cloudera response to this threat was to put
| away their differences and merge together.
|
| Over time Big Data has evolved from being Hadoop centric to
| being much more ML/AI focused i.e. not just manipulating and
| querying the data but doing something interesting with it.
| And AWS, Azure, GCP have really jumped in with a whole suite
| of products that are tightly integrated with the rest of
| their cloud offerings. And it's a large part of what
| differentiates their offerings so they compete very hard.
|
| So Cloudera has no choice but to do things that cloud
| providers won't or can't do: (1) focus on non-cloud or multi-
| cloud and (2) offer a much more integrated and cohesive
| solution.
|
| But having spent 10+ years in this space and deployed many
| Hadoop clusters I can tell you that Cloudera is going to
| struggle. Companies that I never thought would move to the
| cloud e.g. banks are figuring out the security and regulatory
| challenges and eagerly moving across. And so it's going to be
| Cloudera versus Amazon/Google/Microsoft, which is an
| impossible fight.
| wardb wrote:
| Competing with Amazon/Google/Microsoft on their own cloud
| is...ehm...good luck with that indeed. I believe they
| should have partnered with them early on (real partnership,
| like a premier offering, not the rubber stamp / marketplace
| partnership).
| manigandham wrote:
| It can work, as that's exactly what Snowflake has done
| and it's one of the fastest-growing SaaS companies today.
|
| A good product is more valuable than a partnership.
| bostonsre wrote:
| AWS EMR is still pretty pricey compared to free
| ambari/cloudera running on ec2. Although, there is a lot of
| time and effort that needed to be put into automation that
| uses those ambari/cloudera hadoop management layers. After
| they merged, they got really aggressive and made moves that
| effectively killed each of the free versions. They
| definitely put another nail in the coffin of hadoop. Spark
| on kubernetes is pretty gorgeous and has been a successful
| route out of pricey hadoop infrastructure for my company.
| swyx wrote:
| thank you! i have no history in this space so can't ask
| followups except to observe that the tendency of ML/AI to
| reward the "big gets bigger" phenomenon is exemplified
| here. I don't feel too great about that but also don't have
| ideas for a better system.
| ralph84 wrote:
| The elephant (no pun intended) in the room continues to be most
| data in the enterprise isn't big data. You don't need mapreduce
| when your data set fits in RAM.
| threeseed wrote:
| This is simply nonsense. Big data has never been just about
| MapReduce.
|
| It has always revolved around the concept of a data lake, with
| data stored as objects, a series of data engineering pipelines
| moving data around and a query engine on top. And in almost
| every enterprise company this is the high level architecture
| you see today.
|
| And this model only continues to grow in popularity as the use
| of siloed SaaS products drives data sprawl and the need for
| tools like Spark, Fivetran etc to move it all back to a
| centralised data lake for analysis.
| kfk wrote:
| Sort of. The issue is that coupling data with SQL databases doesn't
| play well with even basic inferential statistics, so you end up
| replicating data based on analytics use cases. Having Parquet
| files in a cheap storage system that can be queried with SQL,
| Python, R, etc. is very convenient. What I don't get is why so
| little investment is going into making data lakes safer and
| easier to govern with proper access controls.
| peterthehacker wrote:
| Yes! And if it doesn't fit in RAM, then most businesses will
| only need an OLAP data warehouse, like Snowflake or Redshift.
| threeseed wrote:
| And if you want to join data from different SaaS and internal
| systems e.g. Google Analytics and a Pega decisioning system.
|
| Are you going to spend months upfront carefully modelling the
| data in order to ingest it, making sure to handle schema and
| DQ issues etc., all to support one use case that only needs a
| handful of fields?
|
| No. Which is why data lakes exist. Because it's cost
| effective. You simply dump the data and ask the Engineer or
| Data Scientist building the use case to do the heavy lifting
| rather than a centralised data team.
| peterthehacker wrote:
| There are integration companies that solve this specific
| use-case. I've used Fivetran [0] and highly recommend it.
| They will extract-load data from your SaaS to your
| warehouse and your data scientists can run SQL against the
| tables. Their most popular warehouses are Redshift and
| Snowflake. So you can still use a centralized data
| warehouse without dedicating internal resources to the
| integrations.
|
| [0] https://fivetran.com/
| dgudkov wrote:
| >After all, we invented the whole idea of Big Data.
|
| Really?
| andyxor wrote:
| AWS, GCP and Azure ate Cloudera's lunch.
| bombcar wrote:
| A huge portion of the growth in cloud was the relatively
| cheaper aspect for a VPS vs places that sold physical server
| rackspace. Once the foothold was in there the value-add of
| services on top became a huge revenue generator.
| erhserhdfd wrote:
| This is an interesting move. Many companies are still operating
| on-prem, hybrid cloud or they need to have a multicloud strategy
| (to operate in certain geographies or to avoid vendor lock-in).
| If Cloudera can get to the point where they have a big data
| infrastructure that is competitive with the native offerings from
| GCP, AWS, etc but easily supports on-prem, hybrid-cloud and
| multi-cloud, I would imagine there would be many ready corporate
| customers.
|
| Going private could give them the opportunity to make significant
| RnD investments away from the quarterly demands of a public
| company. On the other hand, the private equity backers could
| enact a bunch of cost-cutting, gut the company, cause its top
| employees to leave and load the company up with debt. It will be
| interesting to see how this shakes out.
| streetcat1 wrote:
| Well. Not.
|
| PE is usually about discipline, i.e. cash-based discipline. So
| PE changes the debt/equity ratio from 30/70 to 70/30, thus
| forcing the company to be much more cash efficient, i.e. to take
| LESS risk.
|
| To sum up, I do not envision more R & D.
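A small worked example of the 30/70 versus 70/30 point. The company value, interest rate, and operating cash flow below are hypothetical round numbers picked for illustration; only the two debt shares come from the comment above.

```python
def annual_interest(company_value, debt_share, interest_rate):
    """Yearly interest owed when `debt_share` of the value is financed by debt."""
    return company_value * debt_share * interest_rate


def cash_left_for_investment(operating_cash_flow, company_value,
                             debt_share, interest_rate):
    """Cash remaining after interest, i.e. what could still fund R&D."""
    interest = annual_interest(company_value, debt_share, interest_rate)
    return operating_cash_flow - interest


# Hypothetical figures, in millions of dollars.
VALUE = 1000.0       # assumed company value
RATE = 0.06          # assumed interest rate on the debt
CASH_FLOW = 100.0    # assumed yearly operating cash flow

before = cash_left_for_investment(CASH_FLOW, VALUE, 0.30, RATE)
after = cash_left_for_investment(CASH_FLOW, VALUE, 0.70, RATE)

print(f"30/70 debt/equity: {before:.0f}M left after interest")
print(f"70/30 debt/equity: {after:.0f}M left after interest")
```

With the same business and the same cash flow, the higher debt share leaves noticeably less cash after interest, which is the mechanism behind the "cash discipline" argument.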
| markus_zhang wrote:
| I think the problem with PE is that 1) they need a quick return,
| and 2) they don't necessarily understand the details of the tech
| (although they probably have access to some people who do,
| or claim to). So usually they just cut costs and make a profit
| from that.
| qeternity wrote:
| I hear this argument a lot but with all due respect, simply
| understanding the tech is what got them into this issue.
| They didn't understand the commercial side.
|
| PE has some scars, but they don't typically take over
| healthy, well run companies. They take over mismanaged
| companies, and refocus on the pursuit of profit that you
| seem to take issue with, ultimately saving the company from
| bankruptcy.
| jmspring wrote:
| Cloudera / Hortonworks merger -> President steps down ->
| Cloudera gets taken private. Interesting trajectory over the last
| few years.
| ternaryoperator wrote:
| Agreed. It's astonishing, to me at least, that a company that
| was on the vanguard of both the cloud and big data was so
| incapable of figuring out a way to make real money off either
| one. Their sale today, despite the buyer premium, is well below
| their IPO price.
| Traster wrote:
| Is this surprising? I wouldn't be surprised if the value in
| cloud accretes almost exclusively to the big cloud companies.
| Exit strategies for these kinds of enterprises should
| primarily be "get acquired by X cloud provider".
| StreamBright wrote:
| They just proved the hard way that there is no one-size-fits-all
| data infrastructure based on Hadoop that makes sense
| financially. Most of the value comes from deep understanding
| of data access patterns and having the right solution.
| jmspring wrote:
| One thing to note, and this was seen with the Skype purchase,
| typically contracts as part of a private equity deal limit
| stock issued to lower level employees but also generally have
| an aggressive clawback clause.
|
| I'm too lazy to find the best reference, but this is one re:
| Skype - https://www.businessinsider.com/skype-scandal-silver-lake-20...
| t0mas88 wrote:
| The Skype one sounds like it wasn't such a weird deal.
| Those that got fired were rumoured to not get their options
| but in reality they did. Only someone that decided to quit
| voluntarily 13 months in didn't get an equity deal on the
| exit that happened later. And this wasn't a secret clause
| somewhere, those were the terms he had from the beginning.
| sgt101 wrote:
| they got outspent.
|
| they raised a bunch, built a market... then Amazon and Google
| came for them. Impala was great - I was a big advocate, but
| then I worked on BQ for a bit, and now Omni.... Cloudera
| cannot compete.
|
| The only stumble I can identify is that they didn't support
| Spark. They backed the Hadoop side too hard and left Spark
| open to Databricks. They should have signed those guys before
| any other investors got in (told Matei and co "go get offers,
| we will add 30%").
| jayparth wrote:
| WDYM? They were the Hadoop company. You can't just become
| the Spark company, the philosophies of the products are
| very different. This comment is pretty silly.
|
| The "only" stumble I can identify is that they're selling a
| last-generation solution and most companies see Hadoop as
| tech debt nowadays. Which is to say, it's a systemic issue
| with their entire product, not a tiny mistake. This is like
| Mesos vs Kubernetes. One got squashed.
| jsjsbdkj wrote:
| Spark's initial path to success was "a faster way to
| process your data in HDFS". Cloudera was selling users
| Spark before Databricks was even founded. The idea was
| that Hadoop was an ecosystem of tools for processing data
| built on commodity storage and compute hardware, for when
| your data was too big and expensive to transfer to the
| cloud.
|
| Over time it became increasingly popular to use cloud
| storage instead of running HDFS. This really destroyed
| Cloudera's moat, because there was no operational
| overhead to putting your data in S3 or GCS. You just
| needed to run some stateless compute, and if you fucked
| up it didn't matter. Nowadays your "data lake" is a bunch
| of files in commodity storage someone else runs.
| markus_zhang wrote:
| Clayton, Dubilier & Rice (CD&R) and KKR.
|
| When I see the name of KKR mentioned...
| kolbe wrote:
| I don't even know what they do anymore. They used to make money
| by just raiding their portfolio companies' pensions, but now
| that corporations don't offer private pensions, I just don't
| know. Maybe they're actually improving the companies?
| zx2391 wrote:
| I was under the impression that private equity was mainly a
| vehicle for the rich to get richer. Serious question: Do
| companies have no choice but to go this route or why are they
| doing this?
| hogFeast wrote:
| Not really. PE is mainly a vehicle for smart Wall Streeters to
| fleece credulous and incompetent pension fund managers.
|
| But the latest boom in PE is a function of: higher than usual
| levels of credulity around private vs public performance (but
| PE funds have lower volatility???), and the wave of free money
| coming out of the Fed. In 2010, lots of PE funds got bailed out
| after making unbelievably bad bets with no economic rationale
| (the worst being Blackstone Real Estate, they should have gone
| bust, they now manage $250bn in RE).
|
| Tech companies are hot right now because of Vista Equity's
| numbers. Tech companies appear nominally attractive in cash
| flow terms because so many tech companies pay staff non-cash.
| So you can acquire at an unreasonable multiple, load up with
| debt, and then leave employees holding your bags if it goes
| wrong...self-evidently though, no actual value is being created
| here beyond playing the capital cycle.
|
| Most of the growth in PE is correlated to the growth in
| investors (both in the funds, and financing) who don't
| understand what they are doing. If you look at Europe, private
| equity activity has exploded higher with money-printing from
| the ECB and the growth in direct lending/leveraged loan markets
| (these have gone from 20bn to 150bn in five years or so...where
| it was in 2007, and lots of very unsophisticated private debt
| funds with poor incentives). Shadow banking all over again, no-
| one knows where the money is coming from, no-one knows where it
| is going.
| erhserhdfd wrote:
| Many struggling technology companies will go private to allow
| them to substantially invest in RnD. This allows them to make
| those expenditures without having the overhead and distractions
| of having to be a public company where investors are expecting
| short-term returns. Further, these private equity companies may
| then roll up several related businesses in an attempt to create
| synergies through sales or product.
|
| Ultimate Software is a recent example of this:
| https://www.forbes.com/sites/antoinegara/2019/03/01/an-insid...
| bpodgursky wrote:
| Yeah, I think there's a pretty big difference between tech
| companies going private (often to double-down R&D investment)
| and mature consumer brands going private (which is usually a
| way to extract profits and milk the carcass dry, see: Toys R
| Us, Olive Garden).
| skeeter2020 wrote:
| this +100. You see the latter in mature enterprise software
| a lot more these days and it is not fun to be a part of it.
| You get to watch software development transition from a
| profit generator to a cost center, which changes the game
| completely.
| jeffbee wrote:
| The people with the authority to make the deal get paid huge
| sums for pulling the trigger. After that it doesn't matter to
| them that the company gets digested from the inside out.
| api wrote:
| Stock markets only look a quarter or two in the future. It's
| hard to do anything long-term like invest in R&D unless your
| numbers are so spectacular you can slip it in there. Very few
| public companies have numbers _that_ good.
| flakiness wrote:
| Dell did this in 2013 and seems to be doing fine.
| https://www.dell.com/learn/us/en/uscorp1/secure/acq-dell-sil...
|
| This changed my impression of going-private moves, although in
| Dell's case the buyer was more like a founder with help from
| an investor.
| htrp wrote:
| Dell was a special case .... going private allowed them to
| avoid shooting themselves in the foot like HP
| gostsamo wrote:
| I'm not an expert so below is only my personal take on the
| matter.
|
| Being private allows for less accountability to people who
| might not be acquainted with the business, want short-term
| profits, or do not share the owners' long-term vision for the
| company. All of these allow for more freedom in movements and
| limit the damages to those who understand the risks. On the
| other side, the company has a more limited pool of liquidity
| sources and each of the investors might have a stronger voice in
| shareholder meetings.
|
| As a result, the public investor might not share in the profits of
| the business, but will not incur losses because of it
| either. Therefore, private equity is just another business
| organization model which has its pros and cons.
___________________________________________________________________
(page generated 2021-06-01 23:01 UTC)