[HN Gopher] Building data infrastructure that will last
___________________________________________________________________
Building data infrastructure that will last
Author : andyjohnson0
Score : 114 points
Date : 2024-08-11 10:24 UTC (12 hours ago)
(HTM) web link (seattledataguy.substack.com)
(TXT) w3m dump (seattledataguy.substack.com)
| kerkeslager wrote:
| I work in a different domain (full stack development), but I
| think the principle here applies broadly.
|
| I tend to favor tools that have been around for a long time. A
| lot of the sites I have built have deployment scripts written in
| bash with dependencies on apt packages and repos, git to pull
| application code, and rsync to copy over files. It would probably
| be okay to update to zsh at this point. ;) I'm constantly shocked
| by the complexity of deployment infrastructures when I get into
| new projects: I've spent plenty of time working with Docker and
| Kubernetes and I have yet to see a case where these simplified
| things. As a rule, I don't throw out existing infrastructure, but
| if I'm doing greenfield development I never introduce containers
| --they simply don't do anything that can't be done more
| explicitly in a few lines of Bash.
|
| One of the sites I still maintain has been running for 15 years.
| I ported it from Fedora (not my choice) to Debian about 8 years
| ago, and the only thing that changed in the deployment scripts
| was the package manager. I switched to DigitalOcean 5 years ago
| and the deployment script didn't change, period. The deployment
| script is 81 lines of bash. git blame shows 64 of those lines are
| from the original commit of the file. The changes are primarily
| to add new packages and to change the firewall to UFW.
|
| And critically: I didn't write this script. This was written by a
| guy before me who just happened to have a similar deploy
| philosophy to me. That's a much better maintainability story than
| having to hire someone with your specific deploy tools on their
| resume.
| tazu wrote:
| For infrastructure, I try to keep the Lindy effect [1] in mind.
|
| [1]: https://en.wikipedia.org/wiki/Lindy_effect
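| One way to put numbers on the Lindy effect, under an assumption
| the link itself doesn't require: if a tool's total lifetime is
| Pareto-distributed with tail index alpha > 1, its expected
| remaining lifetime after surviving t years is t / (alpha - 1),
| i.e. proportional to its current age. A minimal sketch; the
| ages and alpha below are made up:

```python
def lindy_expected_remaining(age_years: float, alpha: float) -> float:
    """Expected remaining lifetime E[T - t | T > t] of a Pareto-distributed
    lifetime T with tail index alpha, evaluated at age t (t at or above the
    Pareto scale parameter). This is a modelling assumption, not a law."""
    if alpha <= 1:
        # For alpha <= 1 the Pareto mean (and this expectation) is infinite.
        raise ValueError("alpha must be greater than 1")
    if age_years <= 0:
        raise ValueError("age_years must be positive")
    # Conditional on surviving to t, T / t is Pareto(scale=1, alpha) with
    # mean alpha / (alpha - 1); subtracting t leaves t / (alpha - 1).
    return age_years / (alpha - 1)


# Hypothetical ages with alpha = 2: the older tool is expected to outlast the newer one.
for name, age in [("old_tool", 30.0), ("new_tool", 3.0)]:
    print(name, lindy_expected_remaining(age, alpha=2.0))
```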
| kerkeslager wrote:
| Exactly! Thanks, I didn't know there was a name for that.
| banku_brougham wrote:
| This is a compelling testimonial. Someone recently posted a
| 'git ops' package here that was equally simple and Bash-based.
| I'm interested to try, but I'm wondering what the rationale will
| be for avoiding GitHub Actions.
|
| If you could share the script or some snippets, I would be
| grateful.
|
| One difficulty for me is that my datastore and reporting package
| is a Python application using Prefect, and I manage dependencies
| with Poetry.
|
| The 'poetry install' phase isn't always hands-off, and my script
| fails.
| ambicapter wrote:
| > but I'm wondering what the rationale will be for avoiding
| GitHub Actions.
|
| Having used them, I would guess a lack of testability and
| slightly unhelpful flow-control rules. The former you'll find
| in any managed CI solution.
| kerkeslager wrote:
| > I'm interested to try, but I'm wondering what the rationale
| will be for avoiding GitHub Actions.
|
| I don't avoid them. GitHub Actions are simple enough, and they
| easily run the scripts I'm talking about.
|
| > If you could share the script or some snippets, I would be
| grateful.
|
| Maybe I'll pull something together and post it if I get the
| chance this week.
|
| > One difficulty for me is that my datastore and reporting
| package is a Python application using Prefect, and I manage
| dependencies with Poetry.
|
| > The 'poetry install' phase isn't always hands-off, and my
| script fails.
|
| I don't use Poetry; I use PyPI directly with pip, so I'm not
| sure I can help you there. That sounds like a problem to debug.
| But notably, it's probably easier to debug than similar
| problems with a containerized system (which happen).
| kkfx wrote:
| Hmm... Sorry, but... this seems more like propaganda for
| proprietary cloud solutions than a personal statement, and the
| conclusion "do not do it yourself" tends to be regularly
| contradicted by the facts...
|
| Choosing third-party, well-known FLOSS infra and open formats is
| one thing; not developing your own infra in-house with such tools
| is another.
| weego wrote:
| _How do you ensure the data infrastructure you're building
| doesn't get replaced as soon as you leave in the future?_
|
| If this is a core conceit of the thinking then my answer is who
| cares?
|
| Why do you want to try to influence a situation you're not even
| involved in?
|
| Taking it back to the best lesson I was ever given in software
| engineering: "don't code for every future".
|
| Do what you're asked to do, and don't get caught up in projecting
| your own biases onto a "solid base" for a future whose concerns
| you can't know.
| layer8 wrote:
| You should be interested in building something that others
| don't want to replace as soon as you leave. That doesn't
| require predicting the future.
|
| It should be obvious that people not caring what comes after
| them is not a good thing.
| ipaddr wrote:
| No matter what you build someone will come along later and
| try to rewrite. If it is built too well with too many future
| cases in mind it will be too complex. If you write something
| simple and basic someone will try to add their complexity. If
| you write it in one language, someone will try to use something
| different. The same goes for frameworks.
|
| Write for your current requirements, not some future state.
| People will say your work was subpar or overkill regardless,
| because you are not around to defend your decisions and putting
| you down raises them up.
|
| Things you learn after 25 years.
| Closi wrote:
| Then let someone come along and try to rewrite and improve
| - but if your solution is so flimsy it forces a rewrite,
| it's just poorly made to start with.
| d_sem wrote:
| Some data retention requirements are mandated by law and it
| is necessary to develop robust systems that can stand the
| test of time. I've seen 15 and 25 year retention periods
| for data in safety related applications.
|
| Things my interns learned in the first month as part of new
| hire training.
|
| My quip above is to illustrate that in a dynamic and
| complex field it's important we don't over-index on
| experience.
| dataflow wrote:
| > No matter what you build someone will come along later
| and try to rewrite.
|
| > If you write something simple and basic someone will try
| to add their complexity.
|
| Note that extending != rewriting.
| jvans wrote:
| They mean rewriting. People love rewriting stuff they
| didn't write
| jvans wrote:
| The desire to rewrite something is the single biggest red
| flag for me that someone has questionable technical
| decision-making skills. Yes, there can be good reasons for
| it, but my priors shift dramatically once I hear someone
| suggest it.
| bborud wrote:
| You can view the question as a proxy for "how do you provide
| value for money?".
|
| If you build something that then gets replaced a few years
| later, maybe you did something wrong. Ideally you make
| something that evolves, or even better, that acts as a
| foundation others can build on. If you get a lot of assumptions
| right and the implementation doesn't get in the way of what
| people do - or better yet, meaningfully enables them to get
| work done, you've succeeded.
|
| Here are some things I've observed in the wild.
|
| Data infrastructure projects often fail, not because the
| technology doesn't work, but because the solution does not
| enable _organizations_ to work with them. I've seen many
| companies invest millions in solutions that eventually turned
| out to be useless because they failed to help make data and
| results accessible to complex organizations with lots of
| internal boundaries.
|
| Too much too soon and too complex. You try to address every
| possible need from the start and in order to make the feature
| list as long and impressive as possible, you introduce lots and
| lots of systems that are expensive and complex. Then to use the
| system, you unload a huge burden onto the users. They have to
| learn all of these systems and spend lots of time and money
| training people and adapting their systems so they can
| interoperate with the rest.
|
| I've helped a few companies design their data infrastructure. I
| usually follow an extremely minimalist approach. Here's how I
| start.
|
| 1) your long-term data store is flat files,
| 2) you make real-time data available over streaming protocols,
| 3) by default everyone (inside the company) has access - access
|    limitations have to be justified,
| 4) you document formats and share code that is used to
|    interpret, transform and process data so the consumer can
|    interpret the data,
| 5) you give people access to resources where they can spin up
|    databases and run stuff.
|
| Data producers and consumers decide how they want to create and
| process data. You focus on the interface where they exchange
| data.
|
| (I left security as an exercise to the reader because a) it
| depends and b) how to secure these kinds of systems is an even
| longer post)
|
| Points 1 and 2 are sufficient to bootstrap databases and
| analytic systems at any time, including systems that receive
| live data. That makes it possible to support both systems that
| are supposed to be up permanently and systems that perhaps only
| load the data, do some processing and then get nuked. Point 5
| provides the resources to do so.
|
| 3 usually meets with resistance in some types of organizations,
| but is critical. I've seen companies invest millions in "data
| lakes" and whatnot ... and then piss away the value because
| only 2-3 people have access to the data and they ain't sharing.
| You need executive management to empower someone to put their
| foot down. (One way to make people share data is to use
| budgets. If you don't share data, your department pays for its
| storage. If it is shared, it is paid for by a central budget.)
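| The budget rule in the parenthesis above is simple enough to
| state as code. A minimal sketch; the dataset records, department
| names and flat per-GB price are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    owner_dept: str
    storage_gb: float
    shared: bool  # readable by everyone inside the company?


def allocate_storage_cost(datasets: list[Dataset], price_per_gb: float) -> dict[str, float]:
    """Bill shared datasets to a central budget and unshared ones to their owners."""
    totals: dict[str, float] = {}
    for ds in datasets:
        payer = "central" if ds.shared else ds.owner_dept
        totals[payer] = totals.get(payer, 0.0) + ds.storage_gb * price_per_gb
    return totals


datasets = [
    Dataset("orders", owner_dept="sales", storage_gb=100, shared=True),
    Dataset("campaign_leads", owner_dept="marketing", storage_gb=50, shared=False),
]
print(allocate_storage_cost(datasets, price_per_gb=0.5))
```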
|
| Point 4 requires you to also educate people a bit on data
| exchange. For instance, in many areas there exist exchange
| standards, but these are not necessarily very good. If you find
| yourself in a situation where you spend a lot of effort
| expressing the data in format X and then spend a lot of effort
| interpreting the data at the other end, you are wasting your
| time. Come up with something simpler. Not all standards are
| worth using. And not everything is worth standardizing - don't
| lose sight of actual goals.
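| As a sketch of point 4 and "come up with something simpler": a
| plain CSV format documented next to the data, plus a small parser
| the producer publishes so every consumer interprets the data the
| same way. The format, field names and the naive-means-UTC rule
| are hypothetical:

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One sensor reading.

    Documented exchange format (hypothetical): UTF-8 CSV with the header
    `sensor_id,ts,value`. `ts` is ISO 8601; timestamps without an offset
    are UTC. `value` is a decimal number.
    """

    sensor_id: str
    ts: datetime
    value: float


def parse_readings(text: str) -> list[Reading]:
    """Shared parser: consumers import this instead of re-implementing it."""
    readings = []
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.fromisoformat(row["ts"])
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # the format says naive means UTC
        readings.append(Reading(row["sensor_id"], ts, float(row["value"])))
    return readings


sample = "sensor_id,ts,value\ns1,2024-08-11T10:24:00,3.5\ns2,2024-08-11T12:24:00+02:00,4.0\n"
for r in parse_readings(sample):
    print(r.sensor_id, r.ts.astimezone(timezone.utc).isoformat(), r.value)
```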
|
| Point 5 is where you grow new core services. Producers and
| consumers get to pick their own technologies and do whatever
| they want. When they can show that they've built something that
| makes life easier for other parts of the organization, you can
| consider moving it to the "core" but this only happens when
| something has shown that it works and improves productivity
| across internal boundaries.
| liveoneggs wrote:
| Resume-building job-hoppers are annoying and self absorbed,
| yes. The idiots who enable them are even worse.
| Culonavirus wrote:
| > If this is a core conceit of the thinking then my answer is
| who cares?
|
| Yep. At the end of the day, it's very simple:
|
| People working for a company are not ants or bees. A company is
| not a hive and people are not going to put down their own
| interests to serve the hive. We are a bunch of cooperating, but
| ultimately independent agents, who act in their own interest.
|
| It is up to the business owner to keep their employee activity
| in check. Does that mean giving them work to do? Checking on
| the progress of their tasks? Checking on their methodology and
| software stack sustainability? Making sure there are no single
| points of failure for the business? Making sure the "IT know-
| how" of the business is preserved when a person leaves? ALL OF
| THE ABOVE!
|
| When a business owner can't do these periodic checks
| themselves, they're free to hire someone that will do this for
| them.
|
| But the idea that individual developers should care about what
| happens to the business after they leave is just preposterous.
|
| Also, the entire "resume driven development" thing is absurd.
| This has always happened in software development. People care a
| lot about what their resume will look like in 5 years. It's
| perfectly normal and the business benefits too ("we use modern
| tools, come work for us"). It doesn't mean the business should
| allow needless "shiny new thing" syndrome to thrive, but you
| should be careful not to stomp out innovation, or you might find
| yourself unable to hire talented devs because no one wants to
| work on your shitty "php with jquery" web app.
| vladms wrote:
| > But the idea that individual developers should care about
| what happens to the business after they leave is just
| preposterous.
|
| It's not about caring after you leave. It's about caring
| enough, while you are there, to do useful things for the
| company. Sure, you can act like a consultant (demand very
| specific requirements and not try to understand or put things
| in perspective), but as an employer, these are the first people
| I will let go, because they bring less value than someone who
| "cares" (again, while they are there, not after they've left).
| neilv wrote:
| Yes. Put another way, this school of thought concerns
| professionalism while you're there, when you already know
| that what you do will still have effects after you're gone.
|
| A different school of thought is that a job is about
| showing up and doing some interpretation of what your
| manager tells you to do. This might not be very aligned,
| and much of the org chart might not be very aligned, so the
| priority tends to be appearances. Manager told you to make
| a Web site that does X, so you try to make a Web site that
| arguably does X. You don't tell the manager all the factors
| that in a better organization they should care about, and
| you maybe don't do a particularly good job of the site you
| do make, and you definitely don't base all your
| implementation decisions on company needs rather than
| your own resume and political capital. But you're satisfied
| that you arguably did what you were told to do, and that's
| the transaction.
|
| The latter school of thought is very common, and I think
| it's not really due to individual ICs. Rather, usually the
| organization is actually pushing people towards that
| thinking, because the org chart and practices are also full
| of that kind of thinking. A more conscientious professional
| would blow a gasket, due to the "preposterous" situation of
| a company of individual irresponsible mercenary behavior
| and collective dysfunction like that.
|
| I naturally subscribe to the true alignment school of
| thought, and that's one of the appeals of being a startup
| founder: I can apply my experience (and, admittedly, just
| as much theories/guesses) towards building a company and
| team where things are aligned better. It's also one of the
| reasons I dread some aspects of founding, because I know
| that, no matter how good I am about hiring and onboarding
| into the aligned culture, we'll sometimes have to deal with
| very mis-aligned (even bad-faith) people from
| partners/customers/investors. Not only is that unpleasant,
| but there's the risk of infection.
| mvkel wrote:
| SaaS was born from comments like this. Paying to keep the
| lights on, effectively ensuring that an employee quitting won't
| undermine the entire operation.
| OutOfHere wrote:
| SaaS companies simply cannot be trusted to not leak customer
| data. They always will leak it to hackers. This is different
| from major clouds and self-hosted services which have
| different sets of security considerations. Snowflake
| validated this assertion this year with a major data leak.
|
| Also, with SaaS, you pay 5-20x for everything. For example,
| you can self-host Airflow in a $20 USD/month VM, but any
| managed Airflow service is going to cost astronomically more.
| fuzzfactor wrote:
| A lot of the article relates to a key person dependency issue.
|
| >Sure, they have built data infrastructure that works and
| solved the businesses current problems. They maintain it, and
| no one asks questions.
|
| Probably all the "budget" has allowance for is current needs.
| Some of the key engineers may not even be paid very fairly
| considering the true magnitude of business problems being
| overcome _currently_. You can't really expect them to prepare
| for a longer future than they have already been fully staffed
| for, especially succession.
|
| >Perhaps no one realizes that one of the team members has to
| wake up at 6 AM every morning to check and ensure all the
| reports and tables have been created.
|
| >That all works until the day they leave.
|
| If a talented engineer is regularly working overtime to get
| things going like infrastructure, or worse to keep things
| going, even worse to keep things from failing, then that
| engineer is definitely short two staff members. And has
| probably been short the entire time. Nothing less than an
| assistant engineer and a technical secretary if they want real
| documentation as they go along. Plus even more true investment
| if there's any need to make up for lost time.
|
| If infrastructure is important, something like this is
| absolutely pure executive failure from someone who's just not
| in the proper league.
|
| You can't paint a pretty picture of that, and it's reported to
| be very difficult to fix stupid.
|
| Some people just should not be accepted as executives in
| technical endeavors.
|
| It can be tough for lesser executives to accept a non-cutthroat
| non-business-ladder-climber as more of a "key person" than
| themselves, but it is far too often the case. Whether the
| bonehead executives realize it and shrewdly calculate how much
| more payroll would expand if there was to be better coverage,
| or are completely oblivious, as the article says about the
| overworked engineers:
|
| >They maintain it, and no one asks questions.
|
| The article is from a more expert data "repairman" who
| knows better than to rely on a single company as an employer if
| it's got dingbat executives.
|
| There are so many under-qualified executives to go around that
| he's got a lifetime of work ahead of him as a consultant fixing
| the lackadaisical way they let technical debt underlie a
| business to where it could topple unexpectedly.
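| The 6 AM manual check quoted above is the kind of chore that can
| be turned into a scheduled job that alerts someone, which also
| keeps it from leaving with the person. A minimal sketch, assuming
| you can get each table's last successful load time from somewhere
| (warehouse metadata or a load-log table); the table names and the
| 24-hour threshold are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(
    last_updated: dict[str, datetime],
    expected: set[str],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> list[tuple[str, str]]:
    """Return (table, problem) pairs for expected tables that are missing or too old.

    `last_updated` maps table name -> timezone-aware time of the last successful load.
    """
    problems = []
    for table in sorted(expected):
        loaded_at = last_updated.get(table)
        if loaded_at is None:
            problems.append((table, "missing"))
        elif now - loaded_at > max_age:
            problems.append((table, "stale"))
    return problems


now = datetime(2024, 8, 11, 6, 0, tzinfo=timezone.utc)
last_updated = {
    "daily_sales": now - timedelta(hours=2),
    "orders": now - timedelta(days=2),
}
for table, problem in find_stale_tables(last_updated, {"daily_sales", "orders", "inventory"}, now):
    # In practice this would page someone or post to a chat channel instead of printing.
    print(f"{table}: {problem}")
```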
| halfcat wrote:
| You should care because, the vast majority of the time, the
| person working with it after you will be you.
|
| But to your point, people think they want "flexibility" or some
| similar concept, and they end up adding immediate complexity
| that never pays off, or worse, they pick the wrong abstraction
| and have a mess that's hard to undo later.
|
| What they should be aiming for is _simplicity_. Instead of
| trying to predict the future, keep it as simple as possible to give
| that future person a chance of tackling the future needs when
| they arise.
| pmx wrote:
| > In one example I came in and found a data vault project that
| wasn't being used. It actually had pretty good documentation.
| However, the team had taken so long and hadn't fully completed
| the project which led to their dismissal.
|
| I feel like this is a major reason things don't get
| documentation. We don't get judged on it, nobody cares how good
| the docs are, but they DO care that we're shipping features.
| g4zj wrote:
| I think I write decent documentation, but my target audience is
| usually my future self. If I can rely on my own documentation
| to help quickly reacquaint myself with the project later on, I
| generally consider it sufficient.
| cmiles74 wrote:
| The most praise I've ever received for my documentation was
| from the developers forced to read it after I moved on to a
| position with another company. It's a little unfortunate I
| didn't hear more about it as I was writing it, but then again
| it's the kind of stuff other developers find useful.
|
| Still, in my opinion, spending the time writing documentation
| was a net positive: it didn't take me all that long to write up
| and it clearly made someone else's job easier, so much so that
| they mentioned it to me. And, of course, I've had to read my
| own docs more than once.
| devjab wrote:
| Should they care? I've come into organisations that had spent
| entire years' worth of man-hours on setting things up correctly
| so that they could potentially scale to millions of concurrent
| users. Organisations which would never reach more than 50,000
| concurrent users in their wildest dreams.
|
| On the flip side, I've seen some extreme cowboy hacker-man code
| run perfectly fine for its entire 10-year life cycle.
|
| Now, I don't think you should go completely cowboy, but I do
| think you should think about whether or not your "correctness"
| is getting in the way of your actual job. Which is typically as
| a service function where you're supposed to deliver business
| value at a rapid pace. Obviously it depends on what you do. If
| you work in medical software you're probably going to want to
| get things right, but just how much programming could be
| perfectly fine if it was just thrown together without any
| adherence to "correctness"? The theory tells you it'll cost you
| down the line, and in some cases it will. In my anecdotal
| experience it's not as often as we might like to think, and
| sometimes the cost is even worth it. In two decades I've only
| ever really seen two poorly built systems cost so much down the
| line that they would've been better off having been built
| better from the get-go. In both cases they couldn't have been
| built better upfront because the startups didn't have the
| people to do so.
| mritchie712 wrote:
| (disclaimer: I'm a founder in this space)
|
| > the project is either so incomplete or so lacking in a central
| design that the best thing to do is replace the old system
|
| I put a lot of blame here on "the modern data stack". Hundreds[0]
| of point-solution data tools and very few of them achieve
| "business outcomes" on their own. You need to stitch together 5
| of them to get dashboards, 7 of them to get real time analytics,
| etc.
|
| We're going to see more products that achieve an outcome end-to-
| end. A lot of companies just want a few dashboards that give a
| 360 degree view of their data. They want all their data in one
| spot, an easy way to access it, and don't want to spend a fortune on
| it. That's what we're focused on at Definite[1].
|
| We're built on the best open source data projects (e.g. DuckDB,
| Iceberg, Cube, etc.). If you decide to self-host, you can use
| the same components, but it's generally cheaper to use us than
| manage all this stuff yourself.
|
| 0 - https://mattturck.com/landscape/mad2024.pdf
|
| 1 - https://www.definite.app/
|
| 2 - https://youtu.be/7FAJLc3k2Fo
| tomrod wrote:
| I'm not a founder in this space like yourself, but I do a solid
| mix of consulting and building on the modern data stack for
| AI/ML and always appreciate a well constructed data stack.
| steveBK123 wrote:
| The "you need 5-7 different tools glued together to solve
| anything" is the CORE problem of the "modern data stack". It
| also ties very closely with Resume Driven Development.
|
| It leads to a lot of anti-patterns.
|
| For example, the 5-7 different tools are constantly changing,
| so after hiring some self-proclaimed expert, they end up
| reinventing the wheel by choosing a different combination of
| tools than they've used in the past, hitting various
| unexpected issues as they go.
|
| VERY rarely in this space do you see someone come in and go "I
| used these 5 tools in previous roles, they work great, and I'm
| going to build the best solution because I have done it
| before."
|
| These guys always think they need to reinvent the wheel, and
| then end up wrecking the car with some combination of v0.1
| untested FOSS, up&coming SaaS, and their own in-house DSL.
| kwillets wrote:
| OMG I'm that guy -- 3 straight Vertica roles with $B annual
| revenue.
|
| I did learn a lot from watching MDS people try to beat it (in
| the end I'm also looking for what should come next), but
| mostly it was confirming the article and RDD. What they
| didn't know about data warehousing they also didn't know
| about performance or price-performance or selecting tools or
| managing projects or vendors, so costs exploded.
|
| These folks were hilarious because they kept insisting that
| Vertica is not "modern", while it beat the pants off them
| with basic columnstore stuff.
| victor106 wrote:
| Definite - seems interesting.
|
| But what is the Definite warehouse? Is it built on open
| standards?
| mritchie712 wrote:
| Yes, our warehouse is built on DuckDB and Iceberg
| (https://iceberg.apache.org/). DuckDB is used as the query
| engine and storage for smaller or static data, and Iceberg is
| used to store larger / more frequently updated data (e.g. CDC
| from Postgres).
| cmiles74 wrote:
| In my opinion there are many tools and products in this space
| and they all seem somewhat confused in their target audience
| (is it marketed to management, analysts producing reports or
| developers supporting the analysts?) The boundaries between
| these projects are often fuzzy and they are often complicated
| (does it include a scripting language?). When you are starting
| from close to scratch with an application and its backing
| database and being asked to produce timely reports, I think
| these tools aren't the best place to start.
|
| My process has been to talk to stakeholders and sketch out
| reports they find useful, preferably getting a set of data
| together that many people find useful. Running these reports
| against a read-only replica of production data is typically not
| a big lift. If dashboards are required, write this data out to
| another database to back the dashboard, probably on some set
| schedule. It hasn't been long but now we have the bones of an
| ETL process that is already returning value.
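| A minimal sketch of that first step, with SQLite standing in for
| both the read-only replica and the database behind the
| dashboard; the `orders` table, its columns and the
| `daily_totals` report are hypothetical. It does a full refresh
| each run, which keeps it idempotent and is usually fine while
| reports are small:

```python
import sqlite3


def refresh_daily_totals(replica: sqlite3.Connection, reporting: sqlite3.Connection) -> int:
    """Recompute a small daily report from a (read-only) replica and write it
    to a separate database that backs a dashboard. Returns the row count."""
    rows = replica.execute(
        "SELECT date(created_at) AS day, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY day ORDER BY day"
    ).fetchall()
    with reporting:  # commits on success, rolls back on error
        reporting.execute(
            "CREATE TABLE IF NOT EXISTS daily_totals "
            "(day TEXT PRIMARY KEY, orders INTEGER, revenue REAL)"
        )
        # Full refresh: simple and idempotent while the report stays small.
        reporting.execute("DELETE FROM daily_totals")
        reporting.executemany("INSERT INTO daily_totals VALUES (?, ?, ?)", rows)
    return len(rows)


replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE orders (id INTEGER, created_at TEXT, amount REAL)")
replica.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-08-10 09:00:00", 10.0), (2, "2024-08-10 17:30:00", 5.0), (3, "2024-08-11 08:00:00", 2.5)],
)
reporting = sqlite3.connect(":memory:")
refresh_daily_totals(replica, reporting)
print(reporting.execute("SELECT * FROM daily_totals ORDER BY day").fetchall())
```

| Against a real replica you would connect with a read-only role
| (for an SQLite file, `sqlite3.connect("file:replica.db?mode=ro",
| uri=True)`) and run the refresh from cron or a scheduler.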
|
| At that point I think these tools start to look more
| compelling. Now we have a handle on the source data, what it
| looks like and any places where we need to do something tricky
| to connect the dots to get our data out. We know what the
| reports and dashboards look like.
|
| In short, we know what we need these tools to do and where they
| can help us.
| cletus wrote:
| All technical problems are organizational problems. Put another
| way: any technical problem is a symptom of an organizational
| problem.
|
| Even at Google, which has some truly amazing homegrown technical
| infrastructure, you see what I called Promotion Driven
| Development ("PDD"). I didn't see this when it came to core
| technical infrastructure (eg storage, networking) but at a higher
| level I saw many examples of something being replaced solely
| (ultimately) because somebody wanted to get promoted, and you
| don't get promoted for maintaining the thing. You get promoted
| for replacing the thing.
|
| The most egregious example was someone getting promoted to
| Principal Engineer (T8) for being the TL of something that was
| meant to replace existing core infrastructure before it had even
| shipped. In the end it didn't ship. The original thing is still
| there. But wait, "we learned a lot".
|
| So this happens because the organization rewards the new thing.
|
| So why is your data infrastructure being replaced? Probably
| because of an organizational failure and it'll have almost
| nothing to do with technical aspects of that infrastructure. This
| is true at least 90% of the time (IME).
|
| Data infrastructure is particularly bad for this because in any
| sufficiently large organization you will completely underestimate
| the impact of changing data dependencies for metrics, dashboards,
| monitoring, ML training and so on. Those things can be hard to
| find and map out and generally you only find them when they
| break. Sometimes they can break for _years_ before anyone notices
| even when the thing is used by live production systems.
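| One partial mitigation is to write down which assets read from
| which (by hand, from query logs, or from whatever lineage
| metadata your tools expose) and compute the blast radius before
| changing anything. A minimal sketch of that computation; the
| asset names are hypothetical, and the result is only as complete
| as the edges you managed to record:

```python
from collections import deque


def downstream_of(consumers: dict[str, list[str]], changed: str) -> set[str]:
    """Return every asset that directly or transitively reads from `changed`.

    `consumers` maps an asset to the assets that read it directly.
    Breadth-first search; the `seen` set also guards against cycles.
    """
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


consumers = {
    "raw.events": ["clean.sessions"],
    "clean.sessions": ["dash.weekly_active", "ml.churn_features"],
    "ml.churn_features": ["ml.churn_model", "monitoring.feature_drift"],
}
print(sorted(downstream_of(consumers, "raw.events")))
```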
| mkl95 wrote:
| > The problem I find is that many data teams get thrown into
| having to design data infrastructure with little experience
| actually setting up one in the past. Don't get me wrong, we all
| start with no experience. But it can be difficult to assess what
| all the nuances of different tooling and designs can be.
|
| I've been there. Companies can be cheap about training and I was
| given none before building some sophisticated data stuff that
| surprisingly worked, but probably could have been simpler.
|
| I got a much better job soon after, and hopefully my replacement
| got some training.
| devjab wrote:
| I think that one of the biggest issues is that a lot of
| training is flat out horrible. The author preaches simplicity
| and working directly with the business, but meanwhile you have
| entire teams of developers being taught something like Clean
| Architecture, SOLID, DRY and all sorts of fancy things that,
| when used poorly, lead to extreme over-abstractions.
|
| So even the most well meaning engineers with good training can
| go about building data structures which won't last very long
| once the key people leave. Simply because what they were taught
| doesn't work.
| moltar wrote:
| An easy to maintain stack from my experience that almost anyone
| can do:
|
| - S3 for storage
|
| - Glue catalog to describe / define source data shapes
|
| - Athena to query the above
|
| - dbt for business data modelling (has Athena and glue adapter)
|
| The only difficult part I always struggle with is getting
| partitioning right.
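| One common convention for the partitioning part is Hive-style
| `key=value` prefixes (here by date), which Athena can register as
| partition columns and use to skip data when a query filters on
| them. Day-level granularity, the `events/` prefix and the file
| name are assumptions; the right scheme depends on data volume
| and query patterns:

```python
from datetime import datetime, timezone


def partition_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build an S3 object key with Hive-style `name=value` date partitions.

    `event_time` must be timezone-aware; it is converted to UTC so that
    partition boundaries don't depend on the writer's local time zone.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return (
        f"{prefix.rstrip('/')}/year={t.year:04d}/month={t.month:02d}/"
        f"day={t.day:02d}/{filename}"
    )


print(partition_key("events/", datetime(2024, 8, 11, 10, 24, tzinfo=timezone.utc), "part-0000.parquet"))
```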
| hipadev23 wrote:
| Every time I look at S3/Glue/Athena, I can't help feeling
| like the Glue layer shouldn't be necessary and should instead
| just be part of Athena's DDL.
| ianburrell wrote:
| Athena is a query engine and can use multiple catalogs. It
| forwards DDL queries to the catalog. Glue is the default
| catalog.
| OutOfHere wrote:
| Is DBT really necessary? (serious question) If so, why? What
| would go wrong by skipping it?
| moltar wrote:
| No, not necessary at all. You can write queries and CTEs and
| create views in Athena/Glue by hand, if that's what you
| prefer.
| OutOfHere wrote:
| I mean what does DBT offer me here that makes it
| worthwhile?
| pphysch wrote:
| > Our really smart engineer working over time built amazing
| custom infrastructure
|
| > They quit and no one knows how it works
|
| Either the infrastructure wasn't "amazing" in the first place or
| clueless management is looking for a scapegoat.
|
| "Amazing" is an interesting word choice because a non-technical
| manager will be amazed by any blob of code. "Amazing" doesn't
| mean a straightforward, robust solution to a difficult problem.
| yobbo wrote:
| Yes, but this article seems to be talking to a business that has
| no competence in "data" outside of a handful of random engineers
| that have no stake in the business. The advice given amounts to
| "avoid bad things".
|
| Resume-driven engineering etc is the result of engineers with no
| stake in the business future. Any solution must involve
| incentives against "bad things" and in favour of "good things".
| fifilura wrote:
| Parquet+iceberg stored on s3 as a base. That is solid enough.
|
| After that comes various kinds of caches, maybe postgres for
| frontend? Or something streaming?
|
| But once everything is stored as files you gain the freedom to
| experiment or refactor.
| fsndz wrote:
| The problem is that consultants selling bullshit as expertise are
| more prevalent than honest consultants. And for these bullshit
| consultants, selling the most unnecessarily complex solution with
| all the trendy keywords and making beautiful slides is all that
| counts. And what is funny is that customers believe all those
| lies.
| Terr_ wrote:
| I've come to believe the opposite, promoting it as "Design for
| Deletion."
|
| I used to think I could make a wonderful work of art which
| everyone will appreciate for the ages, crafted so that every
| contingency is planned for, every need met... But nobody predicts
| future needs that well. Someday whatever I make is going to be
| That Stupid Thing to somebody, and they're going to be justified
| demolishing the whole mess, no matter how proud I may feel about
| it now.
|
| So instead, put effort into making it _easy to remove_. This
| often ends up reducing coupling, but--crucially--it's not the
| same as some enthusiastic young developer trying to decouple _all
| the things_ through a meta-configurable framework. Sometimes a
| tight coupling is better when it's easier to reason about.
|
| The question isn't whether You Ain't Gonna Need It, the question
| is whether when you _do_ need it will so much have changed that
| other design-aspects won't be valid anymore. It also means a
| level of trust towards (or helpless acceptance of) a future
| steward of your code.
| beardedetim wrote:
| Couldn't agree more with this sentiment. I call it "throw away
| code" and it's always seemed like the easiest to change in the
| future, and we all know everything is gonna change in the
| future.
___________________________________________________________________
(page generated 2024-08-11 23:00 UTC)