hngopher.com

       [HN Gopher] Launch HN: Secoda (YC S21) - Searchable Company Data
       ___________________________________________________________________
        
       Launch HN: Secoda (YC S21) - Searchable Company Data
        
       Hey HN, we're Etai and Andrew, and together with our team, we're
       building Secoda (https://www.secoda.co). We do search (that
       actually works!) for your company data. The name Secoda stands for
       "Searchable Company Data".
        
       As a company grows, so does its data. Tables, metrics, queries, and
       dashboards often become isolated and are difficult to find. Even
       with great practices, organizations still struggle to get value out
       of their data - up to 73% of all enterprise data goes unused. One
       of the big contributors to this problem is that organizations
       create data silos by not documenting and centralizing their data
       knowledge in a single place where every employee can access
       information about data.
        
       Andrew and I experienced this problem firsthand at the last company
       we worked at. Andrew led the Product team and I led the Operations
       team, and we found that it was extremely difficult to find,
       understand and use data without looping in someone on the data team
       to help. The problem was that we only had 1 employee on the data
       team who supported over 100 employees asking questions about how to
       find and use company data, which meant that it would take around 2
       weeks to get an answer to any data request.
        
       Other data management tools focus on listing all data resources,
       regardless of their relevance or accuracy - you generally just get
       a list of what's available, but not in a form that's very
       meaningful. We adopted some of these tools in our last jobs but
       found that they created an overwhelming index of too many tables,
       dashboards and queries that weren't relevant to most employees.
       This meant that even after adopting a tool to solve the problem,
       most employees still couldn't use it to find, understand and use
       data.
        
       Our approach to solving this problem is to build Secoda as a tool
       that helps data teams curate metadata for less technical employees.
       Instead of listing every resource, data teams can use our tool to
       curate and document data for specific departments or roles. As a
       result, employees who are less familiar with data will not be
       overloaded by information that is irrelevant or too technical. Our
       goal is basically to be like Google search for in-company data. You
       enter what you need and you get back the relevant information. We
       integrate into databases, data warehouses, BI, and transformation
       tools and offer both on-prem and cloud-hosted deployments.
        
       Over the last six months, our team has been working closely with
       our early adopters to improve the product. Today, we're excited to
       share the launch of our self-service product with the HN community.
       You can now sign up to Secoda, connect your database or data
       warehouse and start using Secoda without a sales call. We offer a
       free 14-day trial (no credit card required). After the free trial,
       we charge per editor, per month. If you'd like, you can also take a
       look at this video of us setting up our Secoda workspace:
       https://www.loom.com/share/f41b317441554a36930b9cfe4c91a45f
        
       We're also hiring for a number of roles, which you can find here:
       https://www.workatastartup.com/companies/secoda
        
       We'd love to hear about your experiences with data discovery and
       any ideas/feedback/questions you might have about what we're
       building!
        
       Author : Etai
       Score  : 80 points
       Date   : 2021-10-29 13:24 UTC (9 hours ago)
        
       | dmolot wrote:
       | We always have a hard time figuring out how to define our source
       | of truth and which tables/graphs to use for measuring our KPIs.
       | Definitely going to reach out about using Secoda... would save us
       | a lot of number wrangling during our weekly syncs.
        
         | Etai wrote:
         | Looking forward to showing you around the product! :)
        
       | bnj wrote:
       | This would go a long way to addressing one of the key needs that
       | we've been planning around: a central library to manage all the
       | different documents and datasets that are accumulated by
       | different teams.
       | 
       | We've sketched out an initial solution which looks a lot like
       | Secoda, except focused on csv files, the concept being to check
       | csv data sets into the library, add metadata, and then define how
       | to bridge it into the central data store.
       | 
       | I'll dig further into the website, it looks like you've done a
       | lot of good work avoiding repeatedly addressing the same
       | questions!
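        
       The CSV library idea in the comment above (check a CSV data set
       in, then attach metadata) could be sketched roughly as follows.
       This is only an illustration: CatalogEntry, check_in_csv and the
       metadata keys are hypothetical names, not part of Secoda or of
       the commenter's actual design, and the "bridge into the central
       data store" step is left out.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical record describing one checked-in CSV data set."""
    name: str
    columns: list
    row_count: int
    metadata: dict = field(default_factory=dict)


def check_in_csv(name, csv_text, **metadata):
    """Read the header, count the data rows, and attach free-form metadata."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])          # first row is assumed to be the header
    row_count = sum(1 for _ in reader)  # remaining rows are data
    return CatalogEntry(name, header, row_count, dict(metadata))


# Example with made-up data and made-up metadata fields.
entry = check_in_csv(
    "weekly_signups",
    "week,signups\n2021-10-18,120\n2021-10-25,135\n",
    owner="growth",
    description="Signups per week",
)
```

       Keeping the metadata as a plain dict keeps the sketch short; a
       real library would probably want a fixed schema for owner,
       description and similar fields.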
        
         | Etai wrote:
         | Ideally, no team has to answer the same question twice once
         | they start using Secoda. In reality, there are times when a
         | question looks similar but is defined differently. We're trying
         | to suggest resources to people who ask questions so they become
         | more self-service. Similar to an Intercom knowledge hub, but for
         | data.
        
       | gregdoesit wrote:
       | A very cool approach. When I worked at Uber, there were internal
       | tools to query data sources. In practice, I found few people
       | outside data scientists knew how to query the data, or knew what
       | was available. I was an engineer and barely used this tool, until
       | a peer DS showed me how to do this. Even then I was overwhelmed
       | by the number of tables to join, knowing what data source
       | contained what data, or knowing which tables I had access to
       | query or join on.
       | 
       | A few questions:
       | 
       | 1. How do you go about permissions? This was a major question at
       | Uber (where permissions were put in place early enough).
       | Especially with GDPR and other regulations, you cannot have
       | anyone access any data.
       | 
       | 2. What about PII? Some data needs to be stored, but cannot be
       | viewed except by very, very few people and with a strong audit
       | trail. This is a more specialized case for #1.
       | 
       | 3. How do you see the tool "spread" the most within companies? I
       | would assume that easy sharing is how people learn about this,
       | then try it themselves... but would love to hear what you
       | actually see.
        
         | Etai wrote:
         | 1. How do you go about permissions?
         | 
         | We have pretty advanced RBAC in Secoda. You can make anyone a
         | viewer, guest, admin or editor in the workspace. Viewers and
         | Editors are only able to see the information. Secondly, we
         | allow you to create "groups" for different functions in the
         | organization (e.g. marketing, sales, etc.). You can choose to
         | share any resource with a specific user or group. This works
         | similarly to the RBAC that Notion uses, which means that only
         | the right people are seeing the right information in Secoda.
         | Lastly, we allow data teams to create "collections" of
         | information, which can be shared with specific groups or
         | specific users. Without sounding biased, I think this is where
         | Secoda excels as a product.
         | 
         | 2. What about PII? Some data needs to be stored, but cannot be
         | viewed except by very, very few people and with a strong audit
         | trail. This is a more specialized case for #1.
         | 
         | We can auto-tag PII at the table and column level. Any PII data
         | won't be viewable without permission from the admin.
         | 
         | 3. How do you see the tool "spread" the most within companies?
         | I would assume that easy sharing is how people learn about
         | this, then try it themselves... but would love to hear what you
         | actually see.
         | 
         | Usually the Slack integration is the best way to spread Secoda.
         | With our Slack integration, any employee can search for
         | information by typing /secoda in Slack. You can also push
         | information from Slack to Secoda and vice versa. This exposes
         | Secoda to new employees in the place they work.
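        
       The permission answers above (sharing a resource with specific
       users or groups, and hiding auto-tagged PII columns unless an
       admin has granted access) can be illustrated with a minimal
       sketch. Everything here is an assumption made for illustration:
       the class names, the pii_approved flag, and the name-based
       PII_PATTERN heuristic are hypothetical and say nothing about how
       Secoda actually implements RBAC or PII detection.

```python
import re
from dataclasses import dataclass, field

# Assumed heuristic: flag columns whose names contain these words.
# A real PII detector would need far more than name matching.
PII_PATTERN = re.compile(r"(email|phone|ssn|address|birth)", re.IGNORECASE)


def tag_pii_columns(columns):
    """Return the set of column names that look like PII."""
    return {c for c in columns if PII_PATTERN.search(c)}


@dataclass
class Resource:
    name: str
    columns: list
    shared_users: set = field(default_factory=set)
    shared_groups: set = field(default_factory=set)


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)
    pii_approved: bool = False  # stands in for "permission from the admin"


def can_see(user, resource):
    """A resource is visible if shared with the user or one of their groups."""
    return (user.name in resource.shared_users
            or bool(user.groups & resource.shared_groups))


def visible_columns(user, resource):
    """List the columns a user may view, dropping PII unless approved."""
    if not can_see(user, resource):
        return []
    pii = tag_pii_columns(resource.columns)
    return [c for c in resource.columns if user.pii_approved or c not in pii]


# Example with made-up users, groups and columns.
customers = Resource("customers", ["id", "email", "plan"],
                     shared_groups={"marketing"})
ana = User("ana", groups={"marketing"})
bo = User("bo", groups={"sales"})
dee = User("dee", groups={"marketing"}, pii_approved=True)
```

       In this sketch ana sees the non-PII columns, bo sees nothing
       because the table is not shared with the sales group, and dee
       sees every column.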
        
       | mritchie712 wrote:
       | How is this different than ThoughtSpot?
        
         | andrewmcewen wrote:
         | Secoda is different from ThoughtSpot in a few ways:
         | 
         | 1. It is likely that you'll need to set up a 1:1 meeting with a
         | ThoughtSpot expert to help get your company up and running on
         | the software. Our goal with Secoda is to be a self-service
         | platform that is designed for any company, large or small, to
         | get their data knowledge base set up in 5 minutes.
         | 
         | 2. ThoughtSpot's price point is typically in the six-figure
         | range, which is much higher than Secoda's, which starts at
         | $29/editor on the platform.
         | 
         | 3. ThoughtSpot's core focus is providing answers to data
         | questions through visualizations. Secoda takes a more
         | comprehensive approach to documenting data knowledge. In
         | addition to having visualizations that help answer questions,
         | we also provide a shared data dictionary for defining metrics,
         | as well as a catalog that can store tables, dashboards, jobs,
         | and many other data resources.
        
       | Grimm1 wrote:
       | Congrats on launching!
       | 
       | So how do you compare to a data catalog like DataHub?
       | https://datahubproject.io/
       | 
       | From the video you looked very similar to them as a metadata
       | consumer and they provide extensive API integrations so you can
       | add basically any set of metadata you want including Slack, Jira
       | etc. They're also offering a hosted version.
       | 
       | Their metadata is indexed into a tuneable ES cluster so you can
       | fiddle with relevance etc. to your heart's content.
       | 
       | What's your big differentiator?
        
         | andrewmcewen wrote:
         | Thank you! Secoda is different from DataHub in a few ways:
         | 
         | 1. If you're using the DataHub open source solution, it requires
         | a data engineer to get the platform up and running and
         | maintained, which can be fairly expensive depending on the
         | salary of the data engineer. Secoda has 15+ no-code
         | integrations that can be set up in 5 minutes and is a fully
         | managed solution. We are releasing a metadata API that will be
         | available before the end of the year, in case an organization
         | is using a product that we do not currently integrate with.
         | 
         | 2. Acryl (managed version of DataHub) is mainly focused on the
         | data catalog, which they do a great job on. However, they
         | don't provide the questions, dictionary, and visualization
         | components that we provide in addition to the catalog. These
         | additional components of the product add more context around
         | data knowledge, and are also focused on helping non-technical
         | users understand company data, whereas the data catalog is
         | focused more on helping technical data users understand company
         | data.
         | 
         | 3. Also, if you're using Acryl, you'll have to get in touch with
         | their team to get a demo of the product. For Secoda, you can
         | sign up at https://app.secoda.co and try out a free trial of the
         | product without having to talk with our team. We do offer demos
         | if people are interested though.
        
           | mythbuster wrote:
           | Hey Etai and team, Congrats on the launch! I'm so glad to see
           | several teams trying to tackle this hard problem of
           | complexity in the modern data stack.
           | 
           | I'm Shirshanka, the founder of the DataHub project,
           | occasional responder to HN threads and reachable at
           | https://slack.datahubproject.io :)
           | 
           | I wanted to respond to some of the text here since DataHub
           | and Acryl Data were directly mentioned.
           | 
           | 1. We've heard repeatedly from the community that DataHub
           | quickstart just works in 5 mins or less (besides a current
           | known issue with M1: thanks Apple!). Once people are able to
           | show value with the quickstart and the pre-packaged
           | connectors that connect to 20+ systems, they quickly move
           | towards a deployment model based on helm, that is open source
           | and maintained by the Acryl team. All of this requires no
           | code. Deploying DataHub using the provided helm charts is
           | also quite easy based on what we're hearing from the
           | community.
           | 
           | 2. Acryl Data is reimagining what a data catalog can do: data
           | discovery, data observability and federated data governance.
           | We believe that techniques like semantic knowledge graphs are
           | only useful and reliable if they are built on top of a live
           | and fresh operational metadata graph. Also we see ourselves
           | not just as an "end user tool", but as a central fabric
           | through which metadata is stored and transformed before
           | integrating with other tools. As a result we are intentionally
           | API-first and stream-first.
           | 
           | 3. We already offer the open source DataHub demo at
           | https://demo.datahubproject.io. People talk to Acryl Data
           | after they have already tried out the open source product and
           | they are looking for a managed version that has more to
           | offer.
        
       | applgo443 wrote:
       | Congrats on the launch, and good luck!
       | 
       | How is it different from Glean? https://www.glean.com
        
       | ramish94 wrote:
       | As you may already know, integrations are the heart and soul for
       | products like this. I'm assuming you're already being bombarded
       | by potential/current users asking "when will you have integration
       | X?".
       | 
       | What is your strategy to scale out & maintain integrations?
       | Speaking from experience, it's not something that is easy to
       | scale out unless you have a dedicated team whose job is to build
       | them out, or you have some third-party provider like CData
       | providing OOTB connectors for your product.
       | 
       | (On a side note, this looks fantastic. Are you hiring any product
       | folks, by any chance? I have significant experience tackling this
       | same problem).
        
         | Etai wrote:
         | You're spot on. One thing we're doing to deal with the long
         | tail of integrations is opening up an API to customers.
         | 
         | We're also considering open sourcing that part of the product,
         | but haven't made a firm decision on that yet. Would love to
         | chat if you're open to it. We're definitely looking for people
         | in product. Feel free to send me an email at etai@secoda.co
        
       | rememberlenny wrote:
       | Looks great and reminds me of Moma at Google.
        
       | mona_rakibe wrote:
       | Very nice! Congrats - pretty awesome tool
        
         | Etai wrote:
         | Thanks!
        
       | ccleve wrote:
       | Enterprise search is a hard problem, but this looks pretty slick.
       | 
       | What's your tech stack?
       | 
       | Did you create the integrations from scratch, or use something
       | like Zapier?
        
         | Etai wrote:
         | We use React, Flask, Neo4j, and Elasticsearch, and it's all
         | hosted on AWS. Integrations are built from scratch.
        
       | nchudleigh wrote:
       | We have exactly this problem (and have tried to solve it in
       | various ways) and launched Secoda internally at PartnerStack a
       | few days ago.
       | 
       | Excited to see it roll out in the org and build a solid data
       | knowledge base.
        
         | djbusby wrote:
         | Excited for your update after a few months of usage.
        
         | Etai wrote:
         | Wicked! Super excited to have you build the knowledge base at
         | PartnerStack with Secoda.
        
       | coderintherye wrote:
       | We've been using Secoda since the Meet the Batch thread. It's
       | come a long way already since then. Great product! Hard part is
       | still getting the team to use it, but we've been making inroads.
        
         | dang wrote:
         | That thread was at
         | https://news.ycombinator.com/item?id=28156461, for those
         | curious.
        
       | slotrans wrote:
       | Wait, is it curated, or is it search? To me "search" implies that
       | the tool _discovers_ my stuff and makes it searchable. If I have
       | to tell it about my stuff ("curation"), then that's just a
       | metadata catalog.
       | 
       | The reason the distinction matters is that if it's curation-
       | based, the onus is still on the data team to document all
       | relevant assets, which they could already do, and have already
       | demonstrated they don't want to.
       | 
       | Now, it could still be a _good_ metadata catalog! Most of what's
       | out there is bad. But if that's what you're shooting for,
       | pitching it as "search" will be confusing.
        
         | andrewmcewen wrote:
         | That's a great point. We think of Secoda as _both_ a search and
         | curation tool. The search portion of the tool is accomplished
         | through the no-code integrations. When you connect Snowflake,
         | for example, we extract metadata about the tables in Snowflake
         | such as the columns, descriptions, number of queries run,
         | people who are frequently using that table, etc. All that
         | information becomes searchable in the catalog. After a company
         | integrates all of their data sources, we see that teams
         | leverage the curation capabilities of the product. Editors can
         | add documentation through our documentation editor to provide
         | additional context about the data resources that are
         | discoverable in Secoda. In addition, teams can add shared
         | definitions of metrics, answer questions that are asked by data
         | consumers, and create ad hoc analyses that are also
         | discoverable.
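        
       The "search" half described above (extract table metadata from a
       source, then make it searchable) can be illustrated with a toy
       in-memory inverted index. This is a sketch under stated
       assumptions, not Secoda's implementation: elsewhere in the
       thread the team says they use Elasticsearch, and MetadataIndex,
       add_table and the ranking by query count are hypothetical
       choices made only to show the idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index over table names, columns and descriptions."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> table names
        self._query_counts = {}            # table name -> usage count

    def add_table(self, name, columns, description="", query_count=0):
        self._query_counts[name] = query_count
        text = " ".join([name, description, *columns])
        for token in tokenize(text):
            self._postings[token].add(name)

    def search(self, query):
        """Return tables matching every query token, most-queried first."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self._postings.get(t, set()) for t in tokens)
        )
        return sorted(matches, key=lambda n: (-self._query_counts[n], n))


# Example with made-up tables and usage counts.
index = MetadataIndex()
index.add_table("orders", ["order_id", "customer_id", "total"],
                "One row per customer order", query_count=420)
index.add_table("orders_archive", ["order_id", "total"],
                "Archived order rows", query_count=3)
index.add_table("customers", ["customer_id", "email"],
                "Customer accounts", query_count=90)
```

       Ranking by how often a table is queried is one simple way to
       surface the tables people actually use; curated documentation
       could be folded in by indexing it alongside the description.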
        
       ___________________________________________________________________
       (page generated 2021-10-29 23:00 UTC)