hngopher.com

       [HN Gopher] Building a data team at a mid-stage startup
       ___________________________________________________________________
        
       Building a data team at a mid-stage startup
        
       Author : squarecog
       Score  : 543 points
       Date   : 2021-07-08 21:04 UTC (1 days ago)
        
 (HTM) web link (erikbern.com)
 (TXT) w3m dump (erikbern.com)
        
       | gumby wrote:
       | Great article. The confusion about what team does what is
       | priceless...yet so common!
       | 
       | To provide some sympathy for the folks already working there: you
       | always replace systems well _after_ you've overrun them.
       | 
       | When the ad hoc system works (consider that google spreadsheet at
       | a time when there were three support people and perhaps a dozen
       | customers) you're not going to decide to replace it with
       | something more complicated. Then you're busy growing so you just
       | keep the system going through sheer force of will. You only
       | replace it when the effort is unbearable; at that point you say,
       | frustratedly, "I wish we'd done this sooner."
        
       | cobertos wrote:
       | Part of me wonders what the long term of a transition like this
       | looks like. Would this company be able to keep its data
       | consumption healthy, or would it drive product changes that might
       | harm its users or lead to dark patterns?
        
       | civilized wrote:
       | Wow, a story where things start out a mess and end up a lot
       | better! Can we write one of these for society too?
        
       | roystonvassey wrote:
       | This is a perfect encapsulation of my career as a data-guy square
       | peg in a round hole, filled with jargon and misplaced
       | understanding of data in general.
       | 
       | Despite all that you read and hear about data science advancing,
       | you'll be surprised to see how poorly it is leveraged or, worse,
       | how billions of dollars are sought to implement the latest tool
       | that promises to change the world. Tech and data as we imagine
       | them at FAANG-type companies are far different from how they are
       | in older industries. It's not just systems that need upgrading;
       | company cultures do too, and that's never an easy or fast
       | process. I've been in the data analytics space for 16 years now
       | and I still feel, more often than not, that I'm part of the
       | minority working to demonstrate true data use cases.
        
       | jabagonuts wrote:
       | Really enjoyed this narrative, but what about the next phase?
       | Going from mid-stage to mature startup?
       | 
       | > Note that you took on a lot of "tech debt" earlier when you
       | started dumping the production database tables straight into the
       | data warehouse.
       | 
       | How do you manage expectations when the year-long honeymoon is
       | over, the business grows tremendously, and the centralized data
       | warehouse reaches a breaking point?
        
         | neighbour wrote:
         | Also thought this. Let's hope the author has a SQL in the works
         | as I am keen to hear more.
        
           | [deleted]
        
       | spicyramen wrote:
       | Can correlate, author is truly a genius. We had a company
       | mandate to be ML first, we went through a lot of phases and so
       | many conversations happened as described in this amazing piece.
       | Thanks Erik
        
       | simonw wrote:
       | "This is basically a (somewhat cynical) depiction of things that
       | may happen at a lot of companies early in the data maturity
       | stage"
       | 
       | I don't think this is very cynical at all! Feels pretty accurate
       | to me.
        
       | IMTDb wrote:
       | What would be the name of the position/profile of someone in
       | charge of building the data warehousing architecture/ETL
       | pipelines?
       | 
       | In my view, they need to make sure the warehouse model is a
       | correct representation of the business and that it can be
       | leveraged to answer basic or not-so-basic questions using SQL.
       | They also need to promote its usage internally by ensuring it is
       | accessible and easy to use, and guide other teams to a more
       | data-oriented mindset.
       | 
       | I feel that this is a specialised position not exactly similar to
       | a developer, but every time I look for "data scientist" I get
       | guys that want to do machine learning prediction models, which is
       | not exactly the same stuff either.
        
         | edmundsauto wrote:
         | This is what data engineers do, although that is also used to
         | describe data ops (maintaining clusters, running kafka, etc.)
        
         | herodoturtle wrote:
         | You pretty much described my job in a nutshell, and they call
         | me "the database guy".
        
         | sischoel wrote:
         | What about "data engineer"? There seem to be a lot of jobs for
         | that title nowadays.
        
           | skrtskrt wrote:
           | Yeah we would call this Data Engineer (likely Senior level or
           | up for someone that has had experience building multiple data
           | warehouses) plus the DevOps/SRE work required to stitch all
           | the architecture together
        
         | sjg007 wrote:
         | The bigger issue is adaptability: can you migrate schemas while
         | preserving older clients? Typically that's done by providing a
         | decent middleware. SQL views are one way, APIs are another,
         | etc.
         | 
         | All of that while improving performance.
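
A minimal sketch of the "SQL views as middleware" idea from the comment above, using Python's built-in sqlite3 module. The table and column names (customers, customers_v2, name, full_name) are hypothetical and chosen only for illustration; a real warehouse would use its own engine and naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New schema: the column older clients knew as "name" is now "full_name",
# and a "region" column has been added. (Hypothetical example tables.)
conn.execute(
    "CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, full_name TEXT, region TEXT)"
)
conn.execute("INSERT INTO customers_v2 VALUES (1, 'Ada Lovelace', 'EU')")

# Compatibility view: exposes the old table name and old column names
# on top of the new table, so existing queries do not need to change.
conn.execute("CREATE VIEW customers AS SELECT id, full_name AS name FROM customers_v2")

# An older client's query keeps working unchanged.
legacy_rows = conn.execute("SELECT id, name FROM customers").fetchall()
print(legacy_rows)
```

The view costs nothing to keep around while clients migrate, and it can be dropped once nothing reads the old names any more.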
        
         | teej wrote:
         | A new role has arisen in the last few years that captures much
         | of this responsibility - Analytics Engineer.
         | 
         | This article by Claire Carroll describes the role and
         | motivation for it:
         | https://www.getdbt.com/what-is-analytics-engineering/
        
         | tmp_anon_22 wrote:
         | Most common would be a DevOps or SRE on an observability team.
        
         | pram wrote:
         | I've done this for the past 6 years and my title was "Big Data
         | Infrastructure Engineer" but I don't think there's any
         | consistency at companies from what I've seen
        
         | Orou wrote:
         | I would also vote for "data engineer" (it's my current job
         | title).
         | 
         | You very likely don't want a data scientist to be doing a data
         | engineer's job (and they probably don't want to be doing it
         | themselves!). While there are similarities, data engineering
         | tends to be a lot closer to software development than data
         | science. If you're advertising for a data scientist role, don't
         | expect them to be happy if 80% of their job is writing ETL
         | scripts and cleaning datasets.
         | 
         | I think the reason there has been a flattening in data
         | scientist job growth more recently is that lots of companies
         | hired data scientists to build cool ML applications but had no
         | infrastructure in place to support advanced data analysis.
         | These companies didn't realize they needed to walk before they
         | could run, and that what they really wanted was data analysts
         | and engineers to build the foundation for a strong data science
         | function.
         | 
         | Tools like dbt have been great for advancing an ELT approach to
         | managing data pipelines, where modeling for BI tools, business
         | users, and data scientists alike can all happen in the
         | warehouse and ensure consistency in data usage across the
         | company.
        
           | dijksterhuis wrote:
           | Seconded.
           | 
           | I was a bit sad to not see any mention of a data engineer
           | anywhere in the article.
           | 
           | Like, if you gave me access to all the prod tables and the
           | warehouse I'd be having a whale of a time and (hopefully)
           | delivering enough business value to automate some of the more
           | regular "English to SQL" translations.
           | 
           | > You very likely don't want a data scientist to be doing a
           | data engineer's job.
           | 
           | 100%. This is one of those things that would make
           | "disgruntled ML people" in the article want to leave.
        
           | ramraj07 wrote:
           | The one issue is that the gamut of experience and ability in
           | a data engineer (and the salaries) is extremely wide, far
           | wider than I've seen for any other role. Hiring a good DE is
           | so hard!
        
           | sails wrote:
           | IMO data engineer roles are further subset into:
           | 
           | 1. kafka / streaming oriented software engineering
           | 
           | 2. data warehouse and ETL/ELT development for analytics
        
             | dijksterhuis wrote:
             | A good data engineer understands and can work with both of
             | these.
             | 
             | They're both "data in, data out" mental models that are
             | part of the Lambda architecture which every data engineer
             | should at least know about [0].
             | 
             | But if you want a specialist streaming person to optimise
             | all the streaming pipelines, then sure hire a specialist.
             | 
             | [0]: https://en.m.wikipedia.org/wiki/Lambda_architecture
        
           | rickeydidio wrote:
           | This is spot on. As someone who has been looking for a data
           | analyst role, I've actually read quite a few DS reqs that
           | were geared more towards infrastructure and ETL. Then the
           | flip side with the DE reqs wanting NumPy and Pandas along
           | with the infrastructure and ETL. Weird, right?
        
         | hobs wrote:
         | I currently do that job as a Data Architect - kind of a
         | mouthful lol but it covers the gamut of understanding the
         | entire business as an abstract set of data flows, being
         | responsible for the ingest and outflows of data, the level of
         | quality in our overarching system, managing data engineers,
         | developers, business folks all accessing said data, at the end
         | of the day explaining what it all means to our clients and devs
         | via standard modeling stuff and more targeted things as needed.
        
           | edmundsauto wrote:
           | You mention that you manage data engineers. Where does your
           | role not overlap w/ a data eng?
        
             | hobs wrote:
             | In our team it's mostly a difference of business focus and
             | the overarching responsibility - most data engineers I work
             | with manage a major leg of the business and are responsible
             | for their domain but I am responsible for all of them.
             | 
             | I certainly spend time coding (especially because again,
             | small-medium startups can't afford anyone in the data space
             | who isn't able to heave ho) but much of it is translating
             | pretty vague stuff into market research/a proof of
             | concept/an initial design of what will bring value to the
             | business and scale alright and then often more people will
             | throw in.
             | 
             | That being said you can call me whatever you want, as long
             | as it's not late for dinner :)
        
         | mjirv wrote:
         | Analytics Engineer is a clear one for this, as teej said.
         | 
         | The title is strongly associated with the dbt community, so it
         | could imply you're using dbt for your data modeling (not
         | necessarily a bad thing, as it sounds like it would be a good
         | tool for your use case).
        
         | marcinzm wrote:
         | You're mixing up two different tasks as I see it:
         | 
         | * Building/defining the data infrastructure
         | 
         | * Building/defining the schemas
         | 
         | In a traditional ETL infrastructure they are jumbled together
         | but if you do ELT they are not. A data engineer can build the
         | infrastructure but the transformations can be handled better by
         | technical analysts. They're simply one view on the underlying
         | data so the risk is minimal. Analysts query the data day in and
         | day out so they know much better what they need than someone
         | who doesn't.
        
       | czep wrote:
       | This is so eerily familiar I swear I've had many of these exact
       | conversations word for word. The only way this doesn't turn into
       | a complete nightmare of a cluster is if the exec team "gets it".
       | If so, you just might stand a chance at building a data team that
       | gels with the rest of the org.
       | 
       | But if the exec team simply hired you for window-dressing, expect
       | to be treated like a scapegoat and a punching bag. Any mistakes
       | will be your fault. Any wins will be to the credit of the
       | business. The Director of Product will ask to "embed" dedicated
       | DS headcount and you won't have any real power to shape the
       | roadmap. If the exec team doesn't give you equal footing with
       | Product (or Marketing, Finance, and Eng for that matter) then
       | this will rapidly become a soul-sucking job. However, if E-team
       | does give you the authority to call Product's bullshit, and tell
       | Finance to stuff it, and not take direction from Eng leads, then
       | you actually might be able to accomplish something really cool.
        
         | WastingMyTime89 wrote:
         | > However, if E-team does give you the authority to call
         | Product's bullshit, and tell Finance to stuff it, and not take
         | direction from Eng leads, then you actually might be able to
         | accomplish something really cool.
         | 
         | So what's the business case for having a data team independent
         | of product, business and engineering?
         | 
         | Because as I see it the data team is a support function, not a
         | core part of the business. I'm sure it can be cool for you but
         | if you are at odds with all the people actually creating value,
         | what exactly do you bring to the table?
        
           | higeorge13 wrote:
           | Engineering builds some schema, creates and uses multiple
           | data stores, message queues, etc. Eventually the queries no
           | longer work properly as the company scales and gets more and
           | larger customers, and hundreds of other issues appear.
           | Doesn't engineering need a proper data engineering
           | team/DBA/you name it to handle those?
        
         | marcinzm wrote:
         | In my experience much of this is a question of trust, political
         | capital and soft power. Find out the problems that the key
         | players in the business are actually having that you can solve
         | and then solve them. Find out what the key KPIs are for the
         | business and make a plan to improve them and then have a plan
         | to publicize that improvement. And make sure to hire a team
         | that covers your weaknesses rather than exposes them. Don't
         | fight people if you can help it, either they're as competent as
         | you on average or you shouldn't have taken the job. Figure out
         | how to help them and what they need to work more efficiently
         | and then give it to them. Sure there's a ton of politics
         | involved in all of that but that's management in general.
        
         | nwsm wrote:
         | This was my only complaint about this great article. The CEO
         | was innately "data-driven" which opened a lot of doors.
         | 
         | OTOH, if the execs don't have this priority, no one gets hired
         | to lead and scale a data team and the story never starts.
        
         | PragmaticPulp wrote:
         | This applies to most specialties. Companies tend to have a few
         | teams that lead the charge and expect everyone else to follow.
         | Knowing which teams get the authority and which teams are along
         | for the ride at a company is important for knowing what your
         | job experience will look like.
         | 
         | > However, if E-team does give you the authority to call
         | Product's bullshit, and tell Finance to stuff it, and not take
         | direction from Eng leads
         | 
         | I know this was meant partially in jest, but if you reach the
         | point where you're at odds with all of the teams and
         | departments in the company you may get a lot done in the short
         | term, but long term it's going to be difficult if you don't
         | have some allies in each of those departments. Obviously no one
         | should roll over and take orders from other departments, but
         | sometimes it's necessary to do some give and take to build
         | rapport. It's a balance, not a war.
        
           | czep wrote:
           | Thanks for the tips! One mantra I've tried when starting at a
           | new job is "for the first 3 months say yes to everything, for
           | the next 3 months say no to everything." The idea is you
           | first immerse yourself in everything, to find out what works
           | and what doesn't. Then you dedicate time to fix the broken
           | processes so that hopefully when you hit 6 months your team
           | is better positioned to be more efficient. Obviously you
           | can't be too rigid, but it seemed to work for me when I had
           | buy in. Curious if you think that approach sounds good.
        
             | PragmaticPulp wrote:
             | Good advice as long as you don't take it too literally.
             | 
             | The most important thing is to work closely with your
             | manager on expectations. If someone from another department
             | comes to you with a proposal, an ask, or a directive, you
             | don't want to say yes without first consulting with your
             | manager. Depending on company politics, some managers might
             | try to rope new employees into doing work that isn't
             | actually part of their job description.
             | 
             | Discovering expectations and then proactively managing
             | those expectations is key in any role.
        
               | tharkun__ wrote:
               | Very good advice. I've also seen this from new ICs
               | (incidentally from one of our new data guys). I bet he
               | said yes but he shouldn't have.
               | 
               | New guy, knows nothing about the company and product yet
               | but was asked to "get KPI X by end of day". He obviously
               | has no idea how to get this done so goes to various
               | people and throws around the "VP XYZ wants this by end of
               | day, help me now or else!".
               | 
               | Needless to say I, as politely as I could, told him to
               | shut it, look at his data and what he could get from it
               | and stop interrupting dev with mid day, two days after
               | start of a sprint, requests to do his work for him (dude
               | I don't even have access to your data storage, don't know
               | what data you have or don't etc). And do it by end of
               | day. Sure.
               | 
               | The guy is burned for me now. He will have to do a LOT of
               | sucking up to dev now for his try at "do my job for me or
               | else"
        
       | ttz wrote:
       | > MBA types
       | 
       | I chuckled. Then cried, because at least his MBA types can use
       | SQL. My MBA types use Excel.
       | 
       | OT: Good article. Like and agree with the push for centralizing
       | data first, then building outwards so external teams can move
       | towards self-service.
        
         | herodoturtle wrote:
         | I'm an MBA type that studied math and computer science, and for
         | a living programs distributed database solutions.
         | 
         | I chuckled too.
        
         | munk-a wrote:
         | Building a good process into your company to receive a query,
         | execute it against a read-only database, and shovel the results
         | back to the user as a CSV file will pay dividends and is,
         | honestly, pretty trivial in most cases.
        
           | ttz wrote:
           | Funnily enough, this is what I did, except I built an app
           | where I write the queries as "pre-built" parameterized ones
           | (sanitized, of course).
           | 
           | People still do a bunch of stuff in Excel, though, and every
           | once in a while, it breaks, and I have to dig through the
           | mess. Excel is great when it's just for yourself and you can
           | manage it... it's a pain when others have to figure out
           | someone else's.
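
One way the "pre-built parameterized queries" approach above could look, as a sketch rather than a description of the commenter's actual app. The catalog PREBUILT_QUERIES, the helper run_prebuilt, and the orders table are all hypothetical names. Users choose a query by name and supply only values, which are bound as parameters and never spliced into the SQL text.

```python
import sqlite3

# Hypothetical catalog of vetted queries. Users never submit SQL themselves.
PREBUILT_QUERIES = {
    "orders_by_region": (
        "SELECT id, total FROM orders WHERE region = :region ORDER BY id"
    ),
}


def run_prebuilt(conn, name, params):
    """Look up a vetted query by name and run it with bound parameters."""
    sql = PREBUILT_QUERIES[name]  # KeyError for anything not in the catalog
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.5)],
)

eu_orders = run_prebuilt(conn, "orders_by_region", {"region": "EU"})

# The value is treated as a literal string, so this matches no region at all.
injection_attempt = run_prebuilt(
    conn, "orders_by_region", {"region": "EU' OR '1'='1"}
)
print(eu_orders, injection_attempt)
```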
        
           | jaggederest wrote:
           | Blazer is my go-to for this kind of thing:
           | 
           | https://github.com/ankane/blazer
           | 
           | Pretty easy to set up and share queries, dashboards, whatever
        
       | herodoturtle wrote:
       | For the last 15 years I've been building (what I consider to be)
       | accessible database solutions, for a bunch of different
       | industries.
       | 
       | This sentence from the article resonated with me:
       | 
       | > You're starting to lay the most basic foundation of what is
       | most critically needed: all the important data, in the same
       | place, easily queryable.
        
       | Artgor wrote:
       | When I started reading this article, I thought that it would be
       | a sad story about another startup failure. The blog post turned
       | out to be a fascinating success story. I really liked it.
       | 
       | But after I finished reading it, I realized that it is a sad
       | story if we look through the eyes of the data scientists on the
       | team. People were hired to do cool machine learning projects,
       | but it turned out there was no infrastructure for them. After
       | the new boss arrived, they had to work as analysts for months.
       | What is sadder: the new boss dangled a carrot before them
       | several times, but each time the carrot disappeared.
        
       | AtNightWeCode wrote:
       | I really enjoyed reading this. Very well written. At companies
       | I've worked at, teams can never read data from the DW, btw.
       | 
       | My experience with A/B tests is that they are way overrated.
       | 
       | On the poor data quality: say you work on a product like a call
       | center. Frontend developers think it is an excellent idea to
       | store all data in some doc DB blob. Then the business wants
       | stats about the number of calls based on users...
       | 
       | Be careful when putting tabular data into doc dbs.
        
       | tsrez wrote:
       | It's such an interesting and valuable article on building a data
       | team, esp. insightful for organisations starting out. Guess the
       | challenges in traditional/larger companies starting out a data
       | team might look slightly different.
        
       | correlator wrote:
       | Thank you for writing this. I personally just walked into a very
       | similar role and this rang really true. This article made me
       | realize how much more effort I need to put into the data culture
       | side of the role.
        
       | soumyadeb wrote:
       | Such a great read. Have been in this position in a large public
       | org. Over a year was spent just creating a catalog of all the
       | data the company has and figuring out how to pull it into a
       | data warehouse.
        
       | waynesonfire wrote:
       | TLDR, refine your thoughts.
        
         | oliv__ wrote:
         | Refine your mind
        
       | te_chris wrote:
       | This is a good write-up, but for the sort of insights they're
       | getting they're overstaffed and overpaying. A combination of a
       | cloud DW (BigQuery, e.g.), cloud ETL (Stitch, Fivetran) and dbt
       | for the T in ELT to build useful reporting tables, along with
       | some sort of SQL-based BI (Mode, in our case), could deliver the
       | same insights for a fraction of the price. Throw in a sub to Heap
       | or similar for ad-hoc product analytics as a cherry on top.
       | 
       | I concede, of course, that they're rescuing a bad situation, not
       | starting from scratch, but still.
        
       | mindvirus wrote:
       | This is a wonderful article, thank you for sharing. I really like
       | the narrative of bringing people with you on the journey, and
       | celebrating the small wins that lead to a good long term outcome.
        
       | plaidfuji wrote:
       | So many gems in this article...
       | 
       | > You notice a lot of the code starts with very complicated
       | preprocessing steps, where data has to be fetched from many
       | different systems. There appears to be several scripts that have
       | to be run manually in the right order to run some of these
       | things.
       | 
       | > "We need to focus on delivering business value as quickly as
       | possible", you say, but you add that "we might get back to the
       | machine learning stuff soon... let's see".
       | 
       | So so relatable. But the key insight is a really really key
       | insight.
       | 
       | > What I think makes most sense to push for is a centralization
       | of the reporting structure, but keeping the work management
       | decentralized. Why? Primarily because it creates a much tighter
       | feedback loop between data and decisions. If every question has
       | to go through a central bottleneck, transaction costs will be
       | high. On the other hand, you don't want to decentralize the
       | management. Strong data people want to report into a manager who
       | understands data, not into a business person.
       | 
       | I have the same role at a non-software company, and to me this is
       | nothing short of a complete reimagining of IT. It's not just,
       | "make sure everyone's computer works and help them install
       | software," it's, "build a model of the business, determine what
       | information flows and metrics are crucial to success, and build
       | an IT and analysis infrastructure around that model." The CIO
       | will soon be better thought of as the Chief Optimization Officer.
        
       | plank_time wrote:
       | This is probably the single best-written and most realistic
       | article I've read on HN ever and I've been on HN for a long long
       | time. It's so realistic I wonder if the author took it from his
       | diary or something. Everything about it is supersaturated with
       | authenticity and teaches better than any other article I've read.
       | Kudos to the author, and I would love to see this style of
       | article take off.
        
         | maileslin wrote:
         | Erik is a legend in the modern data world. Wrote Luigi and
         | built Spotify's first recommendation engine. He has the
         | ground-level experience to lean on.
        
         | alexpetralia wrote:
         | His post on Berkson's Paradox is excellent!
        
       | zippy5 wrote:
       | This was wonderfully written and if you're gonna start a data
       | team, this is how you do it. But I can see that I'm the only one
       | who thought it was crazy to start a data team in the first place.
       | 
       | This company makes 10M and spends 3M on the team and
       | infrastructure to make data a core competency?
       | 
       | A vast majority of the wins discussed were low-differentiation
       | web / mobile / supply chain analytics which they could have
       | gotten and set up with 3rd party software for an order of
       | magnitude cheaper.
       | 
       | I can only imagine what this hypothetical startup could have
       | learned if they spent that money actually talking to customers,
       | and running more experiments.
       | 
       | I've heard people talk about data as the new oil but for most
       | companies it's a lot closer to uranium. Hard to find people who
       | can handle / process it correctly, nontrivial
       | security/liabilities if PII is involved, expensive to store and
       | a generally underwhelming return on effort relative to the
       | anticipated utility.
       | 
       | My takeaway was that startups benefit tremendously from a data
       | advisor role to get the data competency, as well as the
       | educational and cultural benefits, but realistically the data
       | infrastructure and analytics at that scale should have been
       | bought, not built. Obviously there are a couple of exceptions,
       | such as regulatory reasons like HIPAA compliance, for which
       | building in-house can be the right choice if no vendor fits your
       | use case.
        
         | roenxi wrote:
         | Having _unique_ data is quite valuable. If your organisation
         | can make decisions based on signals that other people can't
         | detect then it can gain a decisive edge.
         | 
         | I do wonder at the anecdotes in this article though. In
         | businesses that I've seen, the data team is usually the biggest
         | impediment to a data-driven culture because they have databases
         | full of numbers and no real grasp of how that links to the
         | decision making process that makes the business money.
         | 
         | Beefing up the team doesn't help. In data, as in business more
         | generally, the important thing is to stop trying to guess what
         | job you're doing and spend a lot of time talking to customers
         | about what job they need done. If the data team is where that
         | work happens in a business then that can be helpful - but the
         | grunt work of SQL/reporting/basic analysis is almost never
         | where the value comes from.
        
         | chupchap wrote:
         | > it's a lot closer to uranium
         | 
         | Love this analogy!
        
         | lifeisstillgood wrote:
         | As someone who reaches for code if they need to blow their
         | nose, what is a 3rd party vendor going to supply that
         | "English-to-SQL translators" won't?
         | 
         | (I have not finished the article, but the idea that devs / data
         | scientists can be replaced by some vendors makes me wonder what
         | I have missed)
         | 
         | Edit: Also love the Uranium quote :-)
        
           | zippy5 wrote:
           | So my assumption is that for a given business model, like
           | e-commerce or a SaaS business, much of the highest value
           | analysis is fairly standardized and can be templated. For
           | example, breaking down conversion rate by weekly cohort is
           | something that can pretty easily be done in Google
           | Analytics.
           | 
           | The problem with English-to-SQL translators, or most coders
           | in general, is the assumptions we make, in particular about
           | the underlying data. For example, say we want to join two
           | tables, so we write a query to join on two columns and call
           | it correct, which it is from a logical or schema perspective.
           | However, null values, defaults like 0, many-to-one vs
           | one-to-one relationships, issues with instrumentation such as
           | networking timeouts or bot detection, etc. can all impact the
           | downstream metrics. My point is that when there are 500 lines
           | of SQL in a query such as those mentioned in the article,
           | there are a lot of ways to be mostly correct but cumulatively
           | wrong.
           | 
           | Like many popular enough open source tools, 3rd party vendors
           | get battle tested, issues get found before you hit them, and
           | they can justify devoting more resources to rigorously
           | ensuring correctness than the average analyst has the time or
           | energy to do, because their business depends on you trusting
           | the outputs.
           | 
           | I'm not saying you couldn't do all this yourself. But given
           | the sheer number of analytics tools that are reasonably
           | priced, you might have chosen to spend your time on something
           | more specialized like a recommendation system.
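
The join pitfalls described above (NULL keys and many-to-one relationships) can be shown with a minimal sketch. The `orders` and `sessions` tables and their rows are hypothetical, invented only for illustration; the example uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical toy data: three orders (one with a NULL user_id) and
# three sessions, two of which belong to the same user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
CREATE TABLE sessions (session_id INTEGER, user_id INTEGER);
INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0), (3, NULL, 25.0);
INSERT INTO sessions VALUES (1, 10), (2, 10), (3, 11);
""")

# Ground truth: total revenue straight from the orders table.
true_revenue = conn.execute(
    "SELECT SUM(amount) FROM orders"
).fetchone()[0]

# A schema-correct join that is still wrong for this metric:
# user 10 has two sessions, so order 1 is counted twice, and the
# order with a NULL user_id matches nothing and silently disappears.
joined_revenue = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN sessions s ON o.user_id = s.user_id
""").fetchone()[0]

print("revenue from orders:", true_revenue)
print("revenue after join: ", joined_revenue)
```

The two totals disagree even though the join condition is logically valid, which is the "mostly correct but cumulatively wrong" failure mode at small scale.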
        
             | lifeisstillgood wrote:
             | Can you point me at some of the vendors? I am missing a
             | chunk of knowledge, I suspect.
             | 
             | Or is this, for example, people taking Google Analytics
             | and producing analysis on top of that?
        
               | somberi wrote:
               | +1. @Zippy - May I ask for some of the vendors you refer
               | to, please?
               | 
               | Also love the Uranium analogy.
        
               | jiaweihli wrote:
               | Highly recommend Heap [1] - they have a neat approach
               | that doesn't require you to 'decide' which analytics you
               | want to track ahead of time.
               | 
               | Disclaimer: I was an early engineer at Heap.
               | 
               | [1] https://heap.io/
        
               | Dyac wrote:
               | Heap might be good but they are crazy expensive. We were
               | quoted something like a quarter million dollars. Good
               | luck getting that signed off, plus you still need quite
               | technical analysts to run the thing.
               | 
               | I've found https://contentsquare.com/ to be much better
               | received by juniors and seniors alike, and it's a
               | fraction of the cost of heap.
        
               | lifeisstillgood wrote:
               | Ah, so these do do web analytics on users - ok. That
               | makes much more sense.
        
               | jiaweihli wrote:
               | I don't know the specifics of what you were quoted, but a
               | quarter million dollars (guessing per year?) does strike
               | me as high.
               | 
               | Were you a later-stage startup by chance? The price point
               | for pre-Series-C startups should be much, much lower.
        
               | tomrod wrote:
               | That's odd. Why would you charge more for a post-series C
               | startup or enterprise versus a pre-series C?
        
               | jiaweihli wrote:
               | That's generally how pricing works for SaaS products -
               | most later stage customers have stricter or more
               | customized needs. Think support SLAs, SSO, ACLs for their
               | employees, etc.
        
               | grvdrm wrote:
               | +2 on that! would love to know about what you think is
               | worth investigating @zippy5
        
         | fouc wrote:
         | > spends 3M on the team and infrastructure
         | 
         | You're making a pretty big assumption on cost of team &
         | infrastructure there. This company could have 100+ people with
         | that kind of revenue (I've worked at a company this size
         | before). The data team is only about 6 people. The cost of the
         | data team & infrastructure is likely less than $1M
        
       | GlennS wrote:
       | I liked this article, but I have two questions:
       | 
       | 1. Is it definitely a good idea to build a separate data team,
       | rather than embedding people with analytics knowledge in feature
       | teams?
       | 
       | Is it possible to do the latter, but still end up with a
       | well-curated source-of-truth for your data?
       | 
       | 2. Is A/B testing and driving your business by metrics really a
       | good idea?
       | 
       | My (uninformed) impression is that data-driven is responsible for
       | rather a lot of rot:
       | 
       | - Extremely irritating websites.
       | 
       | - Businesses ignoring important things because they can't measure
       | them. (Financialisation, hand-in-hand with the MBA types the
       | author decries.)
        
         | alzaeem wrote:
         | I share the frustration with how many A/B testing driven
         | development processes end up. Leads to a very iterative process
         | with lots of small changes, rather than big bets. Also, trying
         | to get statistical significance from iterative changes when you
         | don't have a ton of data is problematic.
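
The small-sample problem mentioned above can be sketched with a two-proportion z-test. This is one common choice of test, assumed here for illustration and not something the comment specifies; the function name and the conversion counts are hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates.

    Uses the pooled two-proportion z-test with a normal approximation,
    which is itself only reasonable when the counts are not tiny.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / se
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (10% vs 15%), very different sample sizes.
p_small = two_proportion_p_value(10, 100, 15, 100)
p_large = two_proportion_p_value(1000, 10000, 1500, 10000)

print("p-value with 100 users per arm:   ", p_small)
print("p-value with 10,000 users per arm:", p_large)
```

With the same observed lift, the small experiment does not reach the conventional 0.05 threshold while the large one does, which is why small iterative changes on low traffic rarely produce a clear answer.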
        
           | iamacyborg wrote:
           | I think that's just down to a lot of folks who think ab
           | testing is the answer to every problem not necessarily having
           | a background in maths or stats. I see it all the time in
           | marketing teams where people are so conditioned to think of
           | testing as the default that they don't understand what
           | they're doing or why.
        
         | dijksterhuis wrote:
         | > Is it possible to do the latter, but still end up with a
         | well-curated source-of-truth for your data?
         | 
         | It's important to get the core centralised data infrastructure
         | up and running (even if it's dirty af) as that helps with the
         | bulk of the data work.
         | 
         | The oft quoted not completely true but kinda true statistic is
         | that 70% of data work is finding, cleaning and storing the
         | data. Analysis and modelling is the easy bit.
         | 
         | You _could_ do it the other way around. Hire some data people
         | in each team and get them to meet up every once in a while.
         | 
         | But I'd wager the central data stuff that makes _everyone's_
         | life easier will get pushed back behind the "urgent" team work
         | every time.
         | 
         | #ConwaysLaw
         | 
         | Edit: it's possible to do both btw. E.g. Have a bunch of
         | centralised data engineers that do the heavy lifting stuff.
         | With data scientist/analysts embedded in teams doing the fine
         | grained modelling stuff. It's not a binary choice (once things
         | are up and running).
         | 
         | > My (uninformed) impression is that data-driven is responsible
         | for rather a lot of rot.
         | 
         | I agree! I was talking to someone else (not a tech head) the
         | other week and realised why they hate tech so much... User
         | interfaces that just... Don't work.
         | 
         | Showed him a terminal cli and he went nuts over it.
         | 
         | Then again, we're two kinda weird ye olde "back in my day"
         | kinda people... So...
        
           | dgb23 wrote:
           | Interesting. I'm a bit of a hybrid, CLI/GUI user. There are
           | things that I find easier to do in a CLI (or with text in
           | general) and things where a GUI is more natural.
           | 
           | CLIs are finicky and force you to think in terms of text,
           | whether it is appropriate or not. GUIs can be more expressive
           | and haptic, but are typically very idiosyncratic and can get
           | in the way of things.
           | 
           | The data-driven approach to UI seems a bit crazy?
           | 
           | If I think about the problems of any UI, I think in terms of
           | communication, intent, learning, psychology and aesthetics.
           | All of those things are human to human or human to computer
           | related issues.
           | 
           | I think data-driven (as in statistical data derived from user
           | behavior) approaches are or can be useful in terms of "what"
           | to present, prioritize and so on. But much less so on "how",
           | because I think this should be based on experiences derived
           | from direct interaction and needs to be induced by
           | creativity.
           | 
           | And I mean creativity from both sides, the implementer _and_
           | the user. One thing that CLIs generally do better is to
           | provide composable tools within an adaptive and simple system
           | (pipes, text etc.), whereas it is hard to impossible to let
           | GUIs talk to each other and compose them to a user tailored
           | whole.
           | 
           | I think we should empower "non-technical" users with the
           | freedoms and sound principles we have come to enjoy
           | ourselves, instead of letting statistical data dominate their
           | experience.
        
       | oliv__ wrote:
       | No snark implied but what a great ad for the author!
       | 
       | This was very fun to read, and an interesting window into the
       | processes and inner workings of a startup that size.
        
       | neighbour wrote:
       | Excellent article. For me, the timing couldn't be better as I am
       | about to step into a role not too dissimilar to the one described
       | in the piece. It will be interesting to see if I run into many of
       | the situations the author describes.
        
       | div3rs3 wrote:
       | Done well (like here), The Goal-like storytelling is both
       | educational and interesting.
        
       | nerdponx wrote:
       | This is an incredibly valuable writeup. Great job.
        
       ___________________________________________________________________
       (page generated 2021-07-09 23:03 UTC)