[HN Gopher] Building data infrastructure that will last
       ___________________________________________________________________
        
       Building data infrastructure that will last
        
       Author : andyjohnson0
       Score  : 114 points
       Date   : 2024-08-11 10:24 UTC (12 hours ago)
        
 (HTM) web link (seattledataguy.substack.com)
 (TXT) w3m dump (seattledataguy.substack.com)
        
       | kerkeslager wrote:
       | I work in a different domain (full stack development), but I
       | think the principle here applies broadly.
       | 
       | I tend to favor tools that have been around for a long time. A
       | lot of the sites I have built have deployment scripts written in
       | bash with dependencies on apt packages and repos, git to pull
       | application code, and rsync to copy over files. It would probably
       | be okay to update to zsh at this point. ;) I'm constantly shocked
       | by the complexity of deployment infrastructures when I get into
       | new projects: I've spent plenty of time working with Docker and
       | Kubernetes and I have yet to see a case where these simplified
       | things. As a rule, I don't throw out existing infrastructure, but
       | if I'm doing greenfield development I never introduce containers
       | --they simply don't do anything that can't be done more
       | explicitly in a few lines of Bash.
       | 
       | One of the sites I still maintain has been running for 15 years.
       | I ported it from Fedora (not my choice) to Debian about 8 years
       | ago, and the only thing that changed in the deployment scripts
       | was the package manager. I switched to DigitalOcean 5 years ago
       | and the deployment script didn't change, period. The deployment
       | script is 81 lines of bash. git blame shows 64 of those lines are
       | from the original commit of the file. The changes are primarily
       | to add new packages and to change the firewall to UFW.
       | 
       | And critically: I didn't write this script. This was written by a
       | guy before me who just happened to have a similar deploy
       | philosophy to me. That's a much better maintainability story than
       | having to hire someone with your specific deploy tools on their
       | resume.
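       | 
       | A minimal sketch of the same apt + git + ufw + rsync shape,
       | written in Python with subprocess. This is not the commenter's
       | script, which was never posted and is bash. It assumes a Debian
       | host reachable over ssh with sudo. The host, package list,
       | paths and firewall ports are hypothetical. It only prints the
       | commands unless dry_run=False.
```python
import shlex
import subprocess

# Hypothetical deploy target and settings (not from the thread).
HOST = "deploy@app.example.com"
PACKAGES = ["nginx", "python3-venv", "ufw"]
APP_DIR = "/srv/app"
STATIC_SRC = "static/"


def deploy_commands(host, packages, app_dir, static_src):
    """Return the deploy steps as argv lists, in the order they run."""

    def remote(cmd):
        # Run one shell command on the target host over ssh.
        return ["ssh", host, cmd]

    pkgs = " ".join(shlex.quote(p) for p in packages)
    app = shlex.quote(app_dir)
    return [
        # 1. System packages from the distro's repos.
        remote(f"sudo apt-get update && sudo apt-get install -y {pkgs}"),
        # 2. Application code via git.
        remote(f"cd {app} && git pull --ff-only"),
        # 3. Firewall: ssh, http, https.
        remote("sudo ufw allow 22/tcp && sudo ufw allow 80/tcp"
               " && sudo ufw allow 443/tcp && sudo ufw --force enable"),
        # 4. Copy files that don't live in the repo.
        ["rsync", "-az", "--delete", static_src, f"{host}:{app_dir}/static/"],
    ]


def deploy(dry_run=True):
    for argv in deploy_commands(HOST, PACKAGES, APP_DIR, STATIC_SRC):
        print(" ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            # check=True aborts at the first failing step, like `set -e` in bash.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    deploy(dry_run=True)
```
       | Each step stays an explicit command line. Moving to another
       | host or provider then mostly means changing HOST, which matches
       | the experience described above.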
        
         | tazu wrote:
         | For infrastructure, I try to keep the Lindy effect [1] in mind.
         | 
         | [1]: https://en.wikipedia.org/wiki/Lindy_effect
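         | 
         | One common way to formalize the Lindy effect is sketched
         | below. It is an illustrative model, not something the
         | commenters spell out. Assume lifetimes X follow a Pareto
         | distribution with minimum x_m and shape alpha > 1, and take
         | something that has already survived to age t >= x_m:
```latex
\begin{aligned}
P(X > x) &= \left(\frac{x_m}{x}\right)^{\alpha}, & x &\ge x_m \\
P(X > x \mid X > t) &= \left(\frac{t}{x}\right)^{\alpha}, & x &\ge t \ge x_m \\
\mathbb{E}[X \mid X > t] &= \frac{\alpha t}{\alpha - 1}
  \quad\Rightarrow\quad
  \mathbb{E}[X - t \mid X > t] = \frac{t}{\alpha - 1}
\end{aligned}
```
         | With alpha = 2, the expected remaining lifetime equals the age
         | so far. Under this model, a tool that has been in use for 15
         | years would be expected to last roughly 15 more.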
        
           | kerkeslager wrote:
           | Exactly! Thanks, I didn't know there was a name for that.
        
         | banku_brougham wrote:
         | This is a compelling testimonial. Someone recently posted a
         | 'git ops' package here that was equally simple and bash based.
         | I'm interested to try, but I'm wondering what the rationale will
         | be for avoiding GitHub Actions.
         | 
         | If you could share script or snippets I would be grateful.
         | 
         | One difficulty for me is that my datastore and reporting package
         | is a Python application using Prefect, and I manage dependencies
         | with Poetry.
         | 
         | The 'poetry install' phase isn't always hands-off and my script
         | fails.
        
           | ambicapter wrote:
           | > but I'm wondering what the rationale will be for avoiding
           | GitHub Actions.
           | 
           | Having used them, I would say lack of test-ability and
           | slightly unhelpful flow-control rules would be my guess. The
           | former you'll find in any managed CI solution.
        
           | kerkeslager wrote:
           | > I'm interested to try, but I'm wondering what the rationale
           | will be for avoiding GitHub Actions.
           | 
           | I don't avoid them. GitHub Actions are simple enough, and they
           | easily run the scripts I'm talking about.
           | 
           | > If you could share script or snippets I would be grateful.
           | 
           | Maybe I'll pull something together and post it if I get the
           | chance this week.
           | 
           | > One difficulty for me is that my datastore and reporting
           | package is a Python application using Prefect, and I manage
           | dependencies with Poetry.
           | 
           | > The 'poetry install' phase isn't always hands-off and my
           | script fails.
           | 
           | I don't use Poetry; I install from PyPI directly with pip, so
           | I'm not sure I can help you there. That sounds like a problem
           | to debug.
           | But notably, it's probably easier to debug than similar
           | problems with a containerized system (which happen).
        
       | kkfx wrote:
       | Hmm... Sorry, but this reads more like propaganda for proprietary
       | cloud solutions than a personal statement, and the conclusion
       | "do not do it yourself" tends to be regularly contradicted by
       | the facts...
       | 
       | Choosing well-known, third-party FLOSS infrastructure and open
       | formats is one thing; not building your own infrastructure
       | in-house with such tools is another.
        
       | weego wrote:
       | _How do you ensure the data infrastructure you're building
       | doesn't get replaced as soon as you leave in the future?_
       | 
       | If this is a core conceit of the thinking then my answer is who
       | cares?
       | 
       | Why do you want to try and influence a situation you're not even
       | involved in?
       | 
       | Taking it back to the best lesson I was ever given in software
       | engineering "don't code for every future".
       | 
       | Do what you're asked to do, and don't get caught up in projecting
       | your own biases into trying to make a "solid base" for the future
       | when you can't know the concerns of that future.
        
         | layer8 wrote:
         | You should be interested in building something that others
         | don't want to replace as soon as you leave. That doesn't
         | require predicting the future.
         | 
         | It should be obvious that people not caring what comes after
         | them is not a good thing.
        
           | ipaddr wrote:
           | No matter what you build someone will come along later and
           | try to rewrite. If it is built too well with too many future
           | cases in mind it will be too complex. If you write something
           | simple and basic someone will try to add their complexity. If
           | you write in one language someone will try to use something
           | different. Same goes for framework.
           | 
           | Write for your current requirements, not some future state,
           | because people will say your work was subpar or overkill
           | regardless: you are not around to defend your decisions, and
           | putting you down raises them up.
           | 
           | Things you learn after 25 years.
        
             | Closi wrote:
             | Then let someone come along and try to rewrite and improve
             | - but if your solution is so flimsy it forces a rewrite,
             | it's just poorly made to start with.
        
             | d_sem wrote:
             | Some data retention requirements are mandated by law and it
             | is necessary to develop robust systems that can stand the
             | test of time. I've seen 15 and 25 year retention periods
             | for data in safety related applications.
             | 
             | Things my interns learned in the first month as part of new
             | hire training.
             | 
             | My quip above is to illustrate that in a dynamic and
             | complex field it's important we don't over-index on
             | experience.
        
             | dataflow wrote:
             | > No matter what you build someone will come along later
             | and try to rewrite.
             | 
             | > If you write something simple and basic someone will try
             | to add their complexity.
             | 
             | Note that extending != rewriting.
        
               | jvans wrote:
               | They mean rewriting. People love rewriting stuff they
               | didn't write.
        
             | jvans wrote:
             | The desire to rewrite something is the single biggest red
             | flag for me that someone has questionable technical
             | decision-making skills. Yes, there can be good reasons for
             | it, but my priors shift dramatically once I hear someone
             | suggest it.
        
         | bborud wrote:
         | You can view the question as a proxy for "how do you provide
         | value for money?".
         | 
         | If you build something that then gets replaced a few years
         | later, maybe you did something wrong. Ideally you make
         | something that evolves, or even better, that acts as a
         | foundation others can build on. If you get a lot of assumptions
         | right and the implementation doesn't get in the way of what
         | people do - or better yet, meaningfully enables them to get
         | work done, you've succeeded.
         | 
         | Here are some things I've observed in the wild.
         | 
         | Data infrastructure projects often fail, not because the
         | technology doesn't work, but because the solution does not
         | enable _organizations_ to work with them. I've seen many
         | companies invest millions in solutions that eventually turned
         | out to be useless because they failed to help make data and
         | results accessible to complex organizations with lots of
         | internal boundaries.
         | 
         | Too much too soon and too complex. You try to address every
         | possible need from the start and in order to make the feature
         | list as long and impressive as possible, you introduce lots and
         | lots of systems that are expensive and complex. Then to use the
         | system, you unload a huge burden onto the users. They have to
         | learn all of these systems and spend lots of time and money
         | training people and adapting their systems so they can
         | interoperate with the rest.
         | 
         | I've helped a few companies design their data infrastructure. I
         | usually follow an extremely minimalist approach. Here's how I
         | start.
         | 
         | 1) Your long-term data store is flat files.
         | 2) You make real-time data available over streaming protocols.
         | 3) By default everyone (inside the company) has access; access
         | limitations have to be justified.
         | 4) You document formats and share the code that is used to
         | interpret, transform and process data, so the consumer can
         | interpret the data.
         | 5) You give people access to resources where they can spin up
         | databases and run stuff.
         | 
         | Data producers and consumers decide how they want to create and
         | process data. You focus on the interface where they exchange
         | data.
         | 
         | (I left security as an exercise to the reader because a) it
         | depends and b) how to secure these kinds of systems is an even
         | longer post)
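         | 
         | A minimal sketch of points 1 and 4 follows. It assumes JSON
         | Lines as the flat-file format; the comment doesn't name one.
         | The format is documented in one place, and producers and
         | consumers share the same writer and parser. The Reading record,
         | its fields and the "v" version field are hypothetical.
```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, TextIO

# Hypothetical documented format (v1), shared with every consumer:
# one JSON object per line (JSON Lines), UTF-8, append-only, with a "v" field.
FORMAT_VERSION = 1


@dataclass(frozen=True)
class Reading:
    """v1 fields: sensor_id (str), ts (ISO-8601 UTC string), value (float)."""
    sensor_id: str
    ts: str
    value: float


def write_readings(out: TextIO, readings: Iterable[Reading]) -> None:
    # Producers only append lines; existing lines are never rewritten.
    for r in readings:
        out.write(json.dumps({"v": FORMAT_VERSION, **asdict(r)}) + "\n")


def read_readings(src: TextIO) -> Iterator[Reading]:
    # Consumers import this parser instead of re-implementing the format.
    for line in src:
        if not line.strip():
            continue
        obj = json.loads(line)
        version = obj.pop("v", None)
        if version != FORMAT_VERSION:
            raise ValueError(f"unsupported format version: {version!r}")
        yield Reading(**obj)


if __name__ == "__main__":
    buf = io.StringIO()
    write_readings(buf, [Reading("s1", "2024-08-11T10:24:00Z", 1.5)])
    buf.seek(0)
    print(list(read_readings(buf)))
```
         | In practice the streams would be files opened with
         | open(path, "a", encoding="utf-8") for writing and
         | open(path, encoding="utf-8") for reading. Bumping
         | FORMAT_VERSION makes old parsers fail loudly instead of
         | silently misreading new data.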
         | 
         | Points 1 and 2 are sufficient to bootstrap databases and
         | analytic systems at any time, including systems that receive
         | live data. This makes it possible to support both systems that
         | are supposed to be up permanently and systems that perhaps only
         | load the data, do some processing and then get nuked. Point 5
         | provides the resources to do so.
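         | 
         | Bootstrapping a disposable analytic database from flat files
         | can be as small as the following sketch. It assumes JSON Lines
         | files with hypothetical sensor_id, ts and value fields, and it
         | uses in-memory SQLite as a stand-in for whatever database a
         | team actually spins up. A system that also needs live data
         | would subscribe to the stream after this replay; that part is
         | not shown.
```python
import io
import json
import sqlite3


def bootstrap_db(sources):
    """Load JSON Lines streams into a fresh in-memory SQLite database.

    Hypothetical record shape: {"sensor_id": str, "ts": str, "value": float}.
    Extra keys on a line (e.g. a format version) are ignored.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT, value REAL)")
    for src in sources:
        records = (json.loads(line) for line in src if line.strip())
        conn.executemany(
            "INSERT INTO readings VALUES (?, ?, ?)",
            ((r["sensor_id"], r["ts"], r["value"]) for r in records),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Two "flat files", here in-memory streams for the demo.
    day1 = io.StringIO('{"sensor_id": "s1", "ts": "2024-08-10T00:00:00Z", "value": 1.0}\n')
    day2 = io.StringIO('{"sensor_id": "s1", "ts": "2024-08-11T00:00:00Z", "value": 3.0}\n')
    db = bootstrap_db([day1, day2])
    print(db.execute("SELECT sensor_id, AVG(value) FROM readings GROUP BY sensor_id").fetchall())
    db.close()  # the disposable system gets nuked; the flat files remain
```
         | Closing the connection is the "get nuked" step. Nothing is
         | lost, because the flat files remain the source of truth.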
         | 
         | Point 3 usually meets with resistance in some types of organizations,
         | but is critical. I've seen companies invest millions in "data
         | lakes" and whatnot ... and then piss away the value because
         | only 2-3 people have access to the data and they ain't sharing.
         | You need executive management to empower someone to put their
         | foot down. (One way to make people share data is to use
         | budgets. If you don't share data, your department pays for its
         | storage. If it is shared, it is paid for by a central budget.)
         | 
         | Point 4 requires you to also educate people a bit on data
         | exchange. For instance, in many areas there exist exchange
         | standards, but these are not necessarily very good. If you find
         | yourself in a situation where you spend a lot of effort
         | expressing the data in format X and then spend a lot of effort
         | interpreting the data at the other end, you are wasting your
         | time. Come up with something simpler. Not all standards are
         | worth using. And not everything is worth standardizing - don't
         | lose sight of actual goals.
         | 
         | Point 5 is where you grow new core services. Producers and
         | consumers get to pick their own technologies and do whatever
         | they want. When they can show that they've built something that
         | makes life easier for other parts of the organization, you can
         | consider moving it to the "core" but this only happens when
         | something has shown that it works and improves productivity
         | across internal boundaries.
        
         | liveoneggs wrote:
         | Resume-building job-hoppers are annoying and self absorbed,
         | yes. The idiots who enable them are even worse.
        
         | Culonavirus wrote:
         | > If this is a core conceit of the thinking then my answer is
         | who cares?
         | 
         | Yep. At the end of the day, it's very simple:
         | 
         | People working for a company are not ants or bees. A company is
         | not a hive and people are not going to put down their own
         | interests to serve the hive. We are a bunch of cooperating, but
         | ultimately independent agents, who act in their own interest.
         | 
         | It is up to the business owner to keep their employee activity
         | in check. Does that mean giving them work to do? Checking on
         | the progress of their tasks? Checking on their methodology and
         | software stack sustainability? Making sure there are no single
         | points of failure for the business? Making sure the "IT know-
         | how" of the business is preserved when a person leaves? ALL OF
         | THE ABOVE!
         | 
         | When a business owner can't do these periodic checks
         | themselves, they're free to hire someone that will do this for
         | them.
         | 
         | But the idea that individual developers should care about what
         | happens to the business after they leave is just preposterous.
         | 
         | Also, the entire "resume driven development" thing is absurd.
         | This has always happened in software development. People care a
         | lot about what their resume will look like in 5 years. It's
         | perfectly normal and the business benefits too ("we use modern
         | tools, come work for us"). It doesn't mean the business should
         | allow needless "shiny new thing" syndrome to thrive, but you
         | should watch out to not stomp out innovation or you might find
         | yourself unable to hire talented devs because no one wants to
         | work on your shitty "php with jquery" web app.
        
           | vladms wrote:
           | > But the idea that individual developers should care about
           | what happens to the business after they leave is just
           | preposterous.
           | 
           | It's not about caring after you leave. It's about caring
           | enough, while you stay, to do useful things for the company.
           | Sure, you can act like a consultant (demand very specific
           | requirements and not try to understand or put things in
           | perspective), but as an employer these are the first people I
           | will let go, because they bring less value than someone who
           | "cares" (again, while they are there, not after they leave).
        
             | neilv wrote:
             | Yes. Put another way, this school of thought concerns
             | professionalism while you're there, when you already know
             | that what you do will still have effects after you're gone.
             | 
             | A different school of thought is that a job is about
             | showing up and doing some interpretation of what your
             | manager tells you to do. This might not be very aligned,
             | and much of the org chart might not be very aligned, so the
             | priority tends to be appearances. Manager told you to make
             | a Web site that does X, so you try to make a Web site that
             | arguably does X. You don't tell the manager all the factors
             | that in a better organization they should care about, and
             | you maybe don't do a particularly good job of the site you
             | do make, and you definitely don't base all your
             | implementation decisions on company needs rather than
             | your own resume and political capital. But you're satisfied
             | that you arguably did what you were told to do, and that's
             | the transaction.
             | 
             | The latter school of thought is very common, and I think
             | it's not really due to individual ICs. Rather, usually the
             | organization is actually pushing people towards that
             | thinking, because the org chart and practices are also full
             | of that kind of thinking. A more conscientious professional
             | would blow a gasket, due to the "preposterous" situation of
             | a company of individual irresponsible mercenary behavior
             | and collective dysfunction like that.
             | 
             | I naturally subscribe to the true alignment school of
             | thought, and that's one of the appeals of being a startup
             | founder: I can apply my experience (and, admittedly, just
             | as much theories/guesses) towards building a company and
             | team where things are aligned better. It's also one of the
             | reasons I dread some aspects of founding, because I know
             | that, no matter how good I am about hiring and onboarding
             | into the aligned culture, we'll sometimes have to deal with
             | very mis-aligned (even bad-faith) people from
             | partners/customers/investors. Not only is that unpleasant,
             | but there's the risk of infection.
        
         | mvkel wrote:
         | SaaS was born from comments like this. Paying to keep the
         | lights on, effectively ensuring that an employee quitting won't
         | undermine the entire operation.
        
           | OutOfHere wrote:
           | SaaS companies simply cannot be trusted to not leak customer
           | data. They always will leak it to hackers. This is different
           | from major clouds and self-hosted services which have
           | different sets of security considerations. Snowflake
           | validated this assertion this year with a major data leak.
           | 
           | Also, with SaaS, you pay 5-20x for everything. For example,
           | you can self-host Airflow in a $20 USD/month VM, but any
           | managed Airflow service is going to cost astronomically more.
        
         | fuzzfactor wrote:
         | A lot of the article relates to a key person dependency issue.
         | 
         | >Sure, they have built data infrastructure that works and
         | solved the businesses current problems. They maintain it, and
         | no one asks questions.
         | 
         | Probably all the "budget" has allowance for is current needs.
         | Some of the key engineers may not even be paid very fairly
         | considering the true magnitude of business problems being
         | overcome _currently_. You can't really expect them to prepare
         | for a longer future than they have already been fully staffed
         | for, especially succession.
         | 
         | >Perhaps no one realizes that one of the team members has to
         | wake up at 6 AM every morning to check and ensure all the
         | reports and tables have been created.
         | 
         | >That all works until the day they leave.
         | 
         | If a talented engineer is regularly working overtime to get
         | things going like infrastructure, or worse to keep things
         | going, even worse to keep things from failing, then that
         | engineer is definitely short two staff members. And has
         | probably been short the entire time. Nothing less than an
         | assistant engineer and a technical secretary if they want real
         | documentation as they go along. Plus even more true investment
         | if there's any need to make up for lost time.
         | 
         | If infrastructure is important, something like this is
         | absolutely pure executive failure from someone who's just not
         | in the proper league.
         | 
         | You can't paint a pretty picture of this, and it's reported to be
         | very difficult to fix stupid.
         | 
         | Some people just should not be accepted as executives in
         | technical endeavors.
         | 
         | It can be tough for lesser executives to accept a non-cutthroat
         | non-business-ladder-climber as more of a "key person" than
         | themselves, but it is far too often the case. Whether the
         | bonehead executives realize it and shrewdly calculate how much
         | more payroll would expand if there was to be better coverage,
         | or are completely oblivious, as the article says about the
         | overworked engineers:
         | 
         | >They maintain it, and no one asks questions.
         | 
         | The article is from a more expert data "repairman" who
         | knows better than to rely on a single company as an employer if
         | it's got dingbat executives.
         | 
         | There are so many under-qualified executives to go around that
         | he's got a lifetime of work ahead of him as a consultant fixing
         | the lackadaisical way they let technical debt underlie a
         | business to where it could topple unexpectedly.
        
         | halfcat wrote:
         | You should care because, the vast majority of the time, the
         | person working with it after you will be you.
         | 
         | But to your point, people think they want "flexibility" or some
         | similar concept, and they end up adding immediate complexity
         | that never pays off, or worse, they pick the wrong abstraction
         | and have a mess that's hard to undo later.
         | 
         | What they should be aiming for is _simplicity_. Instead of
         | trying to predict the future, keep it as simple as possible to give
         | that future person a chance of tackling the future needs when
         | they arise.
        
       | pmx wrote:
       | > In one example I came in and found a data vault project that
       | wasn't being used. It actually had pretty good documentation.
       | However, the team had taken so long and hadn't fully completed
       | the project which led to their dismissal.
       | 
       | I feel like this is a major reason things don't get
       | documentation. We don't get judged on it, nobody cares how good
       | the docs are, but they DO care that we're shipping features.
        
         | g4zj wrote:
         | I think I write decent documentation, but my target audience is
         | usually my future self. If I can rely on my own documentation
         | to help quickly reacquaint myself with the project later on, I
         | generally consider it sufficient.
        
         | cmiles74 wrote:
         | The most praise I've ever received for my documentation was
         | from the developers forced to read it after I moved on to a
         | position with another company. It's a little unfortunate I
         | didn't hear more about it as I was writing it, but then again
         | it's the kind of stuff other developers find useful.
         | 
         | Still, in my opinion, spending the time writing documentation
         | was a net positive: it didn't take me all that long to write up
         | and it clearly made someone else's job easier, so much so that
         | they mentioned it to me. And, of course, I've had to read my
         | own docs more than once.
        
         | devjab wrote:
         | Should they care? I've come into organisations that had spent
         | entire years' worth of man-hours on setting things up correctly
         | so that they could potentially scale to millions of concurrent
         | users. Organisations which would never reach more than 50,000
         | concurrent users in their wildest dreams.
         | 
         | On the flip side, I've seen some extreme cowboy hacker-man code
         | run perfectly fine for its entire 10-year life cycle.
         | 
         | Now, I don't think you should go completely cowboy, but I do
         | think you should think about whether or not your "correctness"
         | is getting in the way of your actual job. Which is typically as
         | a service function where you're supposed to deliver business
         | value at a rapid pace. Obviously it depends on what you do. If
         | you work in medical software you're probably going to want to
         | get things right, but just how much programming could be
         | perfectly fine if it was just thrown together without any
         | adherence to "correctness"? The theory tells you it'll cost you
         | down the line, and in some cases it will. In my anecdotal
         | experience it's not as often as we might like to think, and
         | sometimes the cost is even worth it. In two decades I've only
         | ever really seen two poorly built systems cost so much down the
         | line that they would've been better off having been built
         | better from the get-go. In both cases they couldn't have been
         | built better upfront because the startups didn't have the
         | people to do so.
        
       | mritchie712 wrote:
       | (disclaimer: I'm a founder in this space)
       | 
       | > the project is either so incomplete or so lacking in a central
       | design that the best thing to do is replace the old system
       | 
       | I put a lot of blame here on "the modern data stack". Hundreds[0]
       | of point-solution data tools and very few of them achieve
       | "business outcomes" on their own. You need to stitch together 5
       | of them to get dashboards, 7 of them to get real time analytics,
       | etc.
       | 
       | We're going to see more products that achieve an outcome end-to-
       | end. A lot of companies just want a few dashboards that give a
       | 360 degree view of their data. They want all their data in one
       | spot, an easy way to access it, and don't want to spend a fortune on
       | it. That's what we're focused on at Definite[1].
       | 
       | We're built on the best open source data projects (e.g. DuckDB,
       | Iceberg, Cube, etc.). If you decide to self-host, you can use
       | the same components, but it's generally cheaper to use us than
       | manage all this stuff yourself.
       | 
       | 0 - https://mattturck.com/landscape/mad2024.pdf
       | 
       | 1 - https://www.definite.app/
       | 
       | 2 - https://youtu.be/7FAJLc3k2Fo
        
         | tomrod wrote:
         | I'm not a founder in this space like yourself, but I do a solid
         | mix of consulting and building on the modern data stack for
         | AI/ML and always appreciate a well constructed data stack.
        
         | steveBK123 wrote:
         | The "you need 5-7 different tools glued together to solve
         | anything" is the CORE problem of the "modern data stack". It
         | also ties very closely with Resume Driven Development.
         | 
         | It leads to a lot of anti-patterns.
         | 
         | For example, the 5-7 different tools are constantly changing,
         | so after hiring some proclaimed expert, they end up
         | reinventing the wheel by choosing a different combination of
         | tools from the ones they've used in the past, hitting various
         | unexpected issues as they go.
         | 
         | VERY rarely in this space do you see someone come in and go "I
         | used these 5 tools in previous roles, they work great, and I'm
         | going to build the best solution because I have done it
         | before."
         | 
         | These guys always think they need to reinvent the wheel, and
         | then end up wrecking the car with some combination of v0.1
         | untested FOSS, up&coming SaaS, and their own in-house DSL.
        
           | kwillets wrote:
           | OMG I'm that guy -- 3 straight Vertica roles with $B annual
           | revenue.
           | 
           | I did learn a lot from watching MDS people try to beat it (in
           | the end I'm also looking for what should come next), but
           | mostly it was confirming the article and RDD. What they
           | didn't know about data warehousing they also didn't know
           | about performance or price-performance or selecting tools or
           | managing projects or vendors, so costs exploded.
           | 
           | These folks were hilarious because they kept insisting that
           | Vertica is not "modern", while it beat the pants off them
           | with basic columnstore stuff.
        
         | victor106 wrote:
         | Definite - seems interesting.
         | 
         | But what is the Definite warehouse? Is it built on open
         | standards?
        
           | mritchie712 wrote:
           | Yes, our warehouse is built on DuckDB and Iceberg
           | (https://iceberg.apache.org/). DuckDB is used as the query
           | engine and storage for smaller or static data, and Iceberg is
           | used to store larger / more frequently updated data (e.g. CDC
           | from Postgres).
        
         | cmiles74 wrote:
         | In my opinion there are many tools and products in this space
         | and they all seem somewhat confused in their target audience
         | (is it marketed to management, analysts producing reports or
         | developers supporting the analysts?) The boundaries between
         | these projects are often fuzzy and they are often complicated
         | (does it include a scripting language?). When you are starting
         | from close to scratch with an application and its backing
         | database and being asked to produce timely reports, I think
         | these tools aren't the best place to start.
         | 
         | My process has been to talk to stakeholders and sketch out
         | reports they find useful, preferably getting a set of data
         | together that many people find useful. Running these reports
         | against a read-only replica of production data is typically not
         | a big lift. If dashboards are required, write this data out to
         | another database to back the dashboard, probably on some set
         | schedule. It hasn't been long but now we have the bones of an
         | ETL process that is already returning value.
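         | 
         | A rough sketch of that first ETL step follows. It assumes
         | SQLite files stand in for both the read-only replica and the
         | dashboard database. The orders table, its columns and the
         | daily_orders output table are hypothetical.
```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical report: orders and revenue per day.
REPORT_SQL = """
    SELECT date(created_at) AS day, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY day
"""


def refresh_dashboard(replica_path, dashboard_path):
    """Run the report on the replica and replace the dashboard table.

    Returns the number of rows written.
    """
    # mode=ro: SQLite rejects any write through this connection.
    replica_uri = Path(replica_path).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(replica_uri, uri=True)
    try:
        rows = src.execute(REPORT_SQL).fetchall()
    finally:
        src.close()

    dst = sqlite3.connect(dashboard_path)
    try:
        dst.execute(
            "CREATE TABLE IF NOT EXISTS daily_orders "
            "(day TEXT PRIMARY KEY, orders INTEGER, revenue REAL)"
        )
        # The DELETE and INSERTs commit together when the with-block exits.
        with dst:
            dst.execute("DELETE FROM daily_orders")
            dst.executemany("INSERT INTO daily_orders VALUES (?, ?, ?)", rows)
    finally:
        dst.close()
    return len(rows)


if __name__ == "__main__":
    # Demo with a throwaway "replica" in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        replica, dashboard = Path(d) / "replica.db", Path(d) / "dashboard.db"
        con = sqlite3.connect(replica)
        con.execute("CREATE TABLE orders (id INTEGER, created_at TEXT, amount REAL)")
        con.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, "2024-08-10 09:00:00", 10.0), (2, "2024-08-11 08:15:00", 7.5)],
        )
        con.commit()
        con.close()
        print(refresh_dashboard(replica, dashboard))
```
         | Something like cron would call refresh_dashboard on the chosen
         | schedule. Because the old rows are deleted and the new ones
         | inserted in a single transaction, the dashboard never reads a
         | half-written result.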
         | 
         | At that point I think these tools start to look more
         | compelling. Now we have a handle on the source data, what it
         | looks like and any places where we need to do something tricky
         | to connect the dots to get our data out. We know what the
         | reports and dashboards look like.
         | 
         | In short, we know what we need these tools to do and where they
         | can help us.
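The "run reports against a read-only replica, then write the results to another database that backs the dashboard on a schedule" step can be very small. Below is a minimal sketch that assumes a hypothetical `orders` table and report. `sqlite3` stands in for both the replica and the dashboard database so the example runs anywhere. In practice the replica would be a Postgres or MySQL read replica reached through its own driver.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical report: order count and revenue per customer.
REPORT_SQL = """
    SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""


def refresh_dashboard(replica: sqlite3.Connection, dashboard: sqlite3.Connection) -> int:
    """Run the report on the replica and overwrite the dashboard table.

    Returns the number of rows written. Meant to be called on a schedule
    (cron, Airflow, etc.), not on every dashboard page load.
    """
    # Read step: the heavy aggregation runs on the replica, not the primary.
    rows: List[Tuple[int, int, float]] = replica.execute(REPORT_SQL).fetchall()

    dashboard.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue "
        "(customer_id INTEGER PRIMARY KEY, orders INTEGER, revenue REAL)"
    )
    # Write step: delete + insert inside one transaction (the connection's
    # context manager commits on success and rolls back on error), so other
    # readers never see a half-refreshed table.
    with dashboard:
        dashboard.execute("DELETE FROM customer_revenue")
        dashboard.executemany("INSERT INTO customer_revenue VALUES (?, ?, ?)", rows)
    return len(rows)
```

Overwriting the whole report table on each run is the simplest choice. It only becomes a problem once the report is too large to recompute on every schedule tick.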
        
       | cletus wrote:
       | All technical problems are organizational problems. Put another
       | way: any technical problem is a symptom of an organizational
       | problem.
       | 
       | Even at Google, which has some truly amazing homegrown technical
       | infrastructure, you see what I called Promotion Driven
       | Development ("PDD"). I didn't see this when it came to core
       | technical infrastructure (e.g. storage, networking), but at a higher
       | level I saw many examples of something being replaced solely
       | (ultimately) because somebody wanted to get promoted and you
       | don't get promoted for maintaining the thing. You get promoted
       | for replacing the thing.
       | 
       | The most egregious example was someone getting promoted to
       | Principal Engineer (T8) for being the TL of something that was
       | meant to replace existing core infrastructure before it had even
       | shipped. In the end it didn't ship. The original thing is still
       | there. But wait, "we learned a lot".
       | 
       | So this happens because the organization rewards the new thing.
       | 
       | So why is your data infrastructure being replaced? Probably
       | because of an organizational failure and it'll have almost
       | nothing to do with technical aspects of that infrastructure. This
       | is true at least 90% of the time (IME).
       | 
       | Data infrastructure is particularly bad for this because in any
       | sufficiently large organization you will completely underestimate
       | the impact of changing data dependencies for metrics, dashboards,
       | monitoring, ML training and so on. Those things can be hard to
       | find and map out and generally you only find them when they
       | break. Sometimes they can break for _years_ before anyone notices
       | even when the thing is used by live production systems.
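At its core, the dependency problem described above is a graph question: "if I change this table, what is downstream of it?" Below is a minimal sketch that assumes a hypothetical lineage map from each asset to the assets that read it. The traversal is the easy part. As the comment says, the hard part is that the map is usually incomplete, because consumers often reveal themselves only when they break.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage map: each asset -> the assets that read from it.
# In practice this would be harvested from query logs, dbt manifests or a
# catalog, and it is rarely complete.
CONSUMERS: Dict[str, List[str]] = {
    "raw.events": ["staging.sessions", "ml.training_set"],
    "staging.sessions": ["metrics.daily_active_users"],
    "metrics.daily_active_users": ["dashboard.exec_kpis", "alerts.dau_drop"],
    "ml.training_set": ["model.churn_v3"],
}


def downstream_of(asset: str, consumers: Dict[str, List[str]]) -> Set[str]:
    """Return every asset that directly or transitively depends on `asset`."""
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:  # breadth-first walk over the consumer edges
        current = queue.popleft()
        for child in consumers.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, `downstream_of("staging.sessions", CONSUMERS)` returns the daily-active-users metric, the exec dashboard and the alert built on it.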
        
       | mkl95 wrote:
       | > The problem I find is that many data teams get thrown into
       | having to design data infrastructure with little experience
       | actually setting up one in the past. Don't get me wrong, we all
       | start with no experience. But it can be difficult to assess what
       | all the nuances of different tooling and designs can be.
       | 
       | I've been there. Companies can be cheap about training and I was
       | given none before building some sophisticated data stuff that
       | surprisingly worked, but probably could have been simpler.
       | 
       | I got a much better job soon after, and hopefully my replacement
       | got some training.
        
         | devjab wrote:
         | I think that one of the biggest issues is that a lot of
         | training is flat-out horrible. The author preaches simplicity
         | and working directly with the business, but meanwhile you have
         | entire teams of developers being taught things like Clean
         | Architecture, SOLID, DRY and all sorts of fancy principles that,
         | when used poorly, lead to extreme over-abstraction.
         | 
         | So even the most well-meaning engineers with good training can
         | end up building data structures that won't last very long
         | once the key people leave, simply because what they were taught
         | doesn't work.
        
       | moltar wrote:
       | An easy-to-maintain stack, from my experience, that almost anyone
       | can set up:
       | 
       | - S3 for storage
       | 
       | - Glue catalog to describe / define source data shapes
       | 
       | - Athena to query the above
       | 
       | - dbt for business data modelling (has Athena and Glue adapters)
       | 
       | The only difficult part I always struggle with is getting
       | partitioning right.
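On the partitioning point: Athena and the Glue catalog understand Hive-style `key=value` prefixes. The main decisions are therefore which columns to partition on and how fine-grained to go. Too fine, and you get many small files and partitions to manage. Too coarse, and queries scan more data than they need. Below is a minimal sketch of a daily scheme keyed on UTC event time. The bucket name, table name and granularity are assumptions. New partitions still have to be registered in the catalog, for example with `MSCK REPAIR TABLE`, a Glue crawler, or Athena partition projection.

```python
from datetime import datetime, timezone


def partition_prefix(bucket: str, table: str, event_time: datetime) -> str:
    """Return the Hive-style S3 prefix (year=/month=/day=) for one record."""
    if event_time.tzinfo is None:
        # A naive timestamp would be interpreted in the writer's local zone,
        # so the same event could land in different partitions.
        raise ValueError("event_time must be timezone-aware")
    ts = event_time.astimezone(timezone.utc)
    # Zero-padding keeps prefixes lexically sortable; if the partition columns
    # are strings, queries must filter on the padded form (e.g. month = '08').
    return (
        f"s3://{bucket}/{table}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
    )
```

For example, an event at 2024-08-11 10:24 UTC for a hypothetical `events` table in `my-data-lake` maps to `s3://my-data-lake/events/year=2024/month=08/day=11/`.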
        
         | hipadev23 wrote:
         | Every time I look at S3/Glue/Athena I can't help feeling
         | that the Glue layer shouldn't be necessary and should instead
         | just be part of Athena's DDL.
        
           | ianburrell wrote:
           | Athena is a query engine and can use multiple catalogs. It
           | forwards DDL queries to the catalog. Glue is the default
           | catalog.
        
         | OutOfHere wrote:
         | Is DBT really necessary? (serious question) If so, why? What
         | would go wrong by skipping it?
        
           | moltar wrote:
           | No, not necessary at all. You can write queries and CTEs and
           | create views in Athena/Glue by hand, if that's what you
           | prefer.
        
             | OutOfHere wrote:
             | I mean what does DBT offer me here that makes it
             | worthwhile?
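One concrete answer to "what does dbt offer here?" is dependency ordering. Models refer to each other with `{{ ref('...') }}`, and dbt builds a dependency graph from those references so that models are built in the right order (alongside tests, docs and lineage). With hand-written Athena views, you track that order yourself. Below is a minimal sketch of only the ordering part. It uses hypothetical model names and a simplified regex in place of dbt's Jinja rendering, and it is not dbt's implementation.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical models: name -> SQL body using dbt-style ref() calls.
MODELS: Dict[str, str] = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select c.id, count(o.id) as n_orders "
        "from {{ ref('stg_customers') }} c "
        "left join {{ ref('stg_orders') }} o on o.customer_id = c.id "
        "group by c.id"
    ),
    "top_customers": "select * from {{ ref('customer_orders') }} where n_orders > 10",
}

# Matches {{ ref('name') }}; single-quoted refs only, for brevity.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def build_order(models: Dict[str, str]) -> List[str]:
    """Return model names ordered so every referenced model comes first."""
    # TopologicalSorter expects node -> set of predecessors (its dependencies).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())
```

If two models reference each other, `static_order()` raises `graphlib.CycleError`. dbt likewise refuses to run a project with that kind of circular reference.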
        
       | pphysch wrote:
       | > Our really smart engineer working over time built amazing
       | custom infrastructure
       | 
       | > They quit and no one knows how it works
       | 
       | Either the infrastructure wasn't "amazing" in the first place or
       | clueless management is looking for a scapegoat.
       | 
       | "Amazing" is an interesting word choice because a non-technical
       | manager will be amazed by any blob of code. "Amazing" doesn't
       | mean a straightforward, robust solution to a difficult problem.
        
       | yobbo wrote:
       | Yes, but this article seems to be addressing a business that has
       | no competence in "data" beyond a handful of random engineers
       | who have no stake in the business. The advice given amounts to
       | "avoid bad things".
       | 
       | Resume-driven engineering etc. is the result of engineers having no
       | stake in the business's future. Any solution must involve
       | incentives that discourage "bad things" and favour "good things".
        
       | fifilura wrote:
       | Parquet + Iceberg stored on S3 as a base. That is solid enough.
       | 
       | After that come various kinds of caches: maybe Postgres for the
       | frontend? Or something streaming?
       | 
       | But once everything is stored as files you gain the freedom to
       | experiment or refactor.
        
       | fsndz wrote:
       | The problem is consultants selling bullshit as expertise are more
       | prevalent than honest consultants. And for these bullshit
       | consultants, selling the most unnecessarily complex solution with
       | all the trendy keywords and making beautiful slides is all that
       | counts. And what is funny is that customers believe all those
       | lies.
        
       | Terr_ wrote:
       | I've come to believe the opposite, promoting it as "Design for
       | Deletion."
       | 
       | I used to think I could make a wonderful work of art that
       | everyone would appreciate for the ages, crafted so that every
       | contingency was planned for, every need met... But nobody predicts
       | future needs that well. Someday whatever I make is going to be
       | That Stupid Thing to somebody, and they're going to be justified in
       | demolishing the whole mess, no matter how proud I may feel about
       | it now.
       | 
       | So instead, put effort into making it _easy to remove_. This
       | often ends up reducing coupling, but--crucially--it's not the
       | same as some enthusiastic young developer trying to decouple _all
       | the things_ through a meta-configurable framework. Sometimes a
       | tight coupling is better when it's easier to reason about.
       | 
       | The question isn't whether You Ain't Gonna Need It; the question
       | is whether, by the time you _do_ need it, so much will have changed
       | that other design aspects won't be valid anymore. It also means a
       | level of trust towards (or helpless acceptance of) a future
       | steward of your code.
        
         | beardedetim wrote:
         | Couldn't agree more with this sentiment. I call it "throwaway
         | code", and it has always seemed the easiest to change in the
         | future, and we all know everything is gonna change in the
         | future.
        
       ___________________________________________________________________
       (page generated 2024-08-11 23:00 UTC)