[HN Gopher] Launch HN: Credal.ai (YC W23) - Data Safety for Enterprise AI
___________________________________________________________________
Launch HN: Credal.ai (YC W23) - Data Safety for Enterprise AI
Hi Hacker News! We're Ravin and Jack, the founders of Credal.ai
(https://www.credal.ai/). We provide a Chat UI and APIs that
enforce PII redaction, audit logging, and data access controls for
companies that want to use LLMs with their corporate data from
Google Docs, Slack, or Confluence. There's a demo video here:
https://www.loom.com/share/2b5409fd64464dc9b5b6277f2be4e90f?....
One big thing enterprises and businesses are worried about with LLMs is "what's happening to my data?" The way we see it, there are three big security and privacy barriers companies need to solve:

1. Controlling what data goes to whom: the basic stuff is just putting controls in place around customer and employee PII, but it gets trickier when you also want controls around business secrets, so companies can ensure the Coca-Cola recipe doesn't accidentally leave the company.

2. Visibility: Enterprise IT wants to know exactly what data was shared, by whom, at what time, and what the model responded with (not to mention how much the request cost!). Each provider gives you a piece of the puzzle in their dashboard, but getting all this visibility per request from either of the main providers currently requires writing code yourself. (A sketch of the kind of audit record we mean follows this list.)

3. Access Controls: Enterprises have lots of documents that for whatever reason cannot be shared internally with everyone. So how do I make sure employees can use AI with this stuff, without compromising the sensitivity of the data?
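
To make the visibility point from #2 concrete, here's roughly the shape of a per-request audit record (field names are illustrative, not our actual schema):

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class AuditRecord:
        """One record per LLM request - illustrative fields only."""
        user: str               # who made the request
        model: str              # which model served it
        timestamp: datetime     # when it happened
        sources: list[str]      # which documents were shared as context
        redactions: int         # how many PII spans were masked
        response_text: str      # what the model responded with
        cost_usd: float         # what the request cost

    record = AuditRecord(
        user="jane@example.com",
        model="gpt-4",
        timestamp=datetime.now(timezone.utc),
        sources=["gdocs:doc-123", "confluence:page-9"],
        redactions=2,
        response_text="...",
        cost_usd=0.0421,
    )
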
Typically this pain is something that is felt most acutely by
Enterprise IT, but also of course by the developers and business
people who get told not to build the great stuff they can envision.
We think it's critical to solve these issues: the more visibility and control we can give Enterprise IT over how data is used, the more we can actually build on top of these APIs and start applying the awesome capabilities of the foundation models across every business problem.

You can easily grab data from sources like Google Docs via their APIs, but for production use cases you have to respect the permissions on each Google Doc, Confluence page, Slack channel, etc. This gets tricky when these systems combine permissions defined totally inside their product with permissions inherited from the company's SSO provider (often Okta or Azure AD). Respecting all these permissions becomes both hard and vital as the number of employees and tools accessing the data grows.

The current state of the art is to use a vector database like Pinecone, Milvus, or Chroma, integrate your internal data with those systems, and then, when a user asks a question, dynamically figure out which bits are relevant to the question and send those to the AI as part of the prompt. We handle all this automatically for you (using Milvus for now, which we host ourselves), including point-and-click connectors for your data (Google Docs/Sheets, Slack, Confluence, with many more coming soon). You can use that data through our UI already, and we're in the process of adding this search functionality to the API as well.
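
For the curious, here's a minimal sketch of what permission-aware retrieval looks like. This is illustrative only, not our implementation: embed() and allowed_doc_ids() are stand-ins for a real embedding model and for the ACLs pulled from the source systems and your SSO provider.

    import numpy as np

    # Stand-in for a real embedding model (e.g. OpenAI's
    # text-embedding-ada-002); deterministic so the sketch runs offline.
    def embed(text: str) -> np.ndarray:
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(384)
        return v / np.linalg.norm(v)

    DOCS = {
        "gdocs:doc-1": "Q3 revenue plan (finance only) ...",
        "slack:launch": "Public launch announcement draft ...",
    }
    DOC_VECS = {doc_id: embed(text) for doc_id, text in DOCS.items()}

    # Stand-in for ACLs inherited from Google/Confluence/Slack + SSO groups.
    def allowed_doc_ids(user: str) -> set[str]:
        return {"slack:launch"} if user == "intern" else set(DOCS)

    def retrieve(user: str, question: str, k: int = 3) -> list[str]:
        allowed = allowed_doc_ids(user)   # filter BEFORE similarity search
        q = embed(question)
        ranked = sorted(allowed, key=lambda d: float(DOC_VECS[d] @ q),
                        reverse=True)
        return [DOCS[d] for d in ranked[:k]]

    def build_prompt(user: str, question: str) -> str:
        context = "\n---\n".join(retrieve(user, question))
        return f"Answer using only this context:\n{context}\n\nQ: {question}"

    print(build_prompt("intern", "What's in the launch announcement?"))
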
There's other schlep work that devs would rather not worry about: building out request-level audit logs, staying on top of the rapidly changing API formats from these providers, implementing failover for when these heavily overburdened APIs go down, etc. We think individual devs should not have to do these themselves, but the foundation model providers are unlikely to provide consistent, customer-centric approaches for them.
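
A rough sketch of the failover piece (the provider calls are stubbed; a real version would wrap the actual OpenAI and Anthropic SDK calls, and emit the audit record too):

    import logging
    import time

    log = logging.getLogger("llm-gateway")

    # Stubs standing in for real SDK calls (e.g. openai.ChatCompletion.create
    # and Anthropic's completions endpoint).
    def call_primary(prompt: str) -> str:
        raise TimeoutError("primary provider overloaded")

    def call_fallback(prompt: str) -> str:
        return "answer from fallback provider"

    def complete(prompt: str, retries: int = 2, backoff: float = 1.0) -> str:
        for attempt in range(retries):
            try:
                return call_primary(prompt)
            except Exception as exc:
                log.warning("primary failed (attempt %d): %s", attempt + 1, exc)
                time.sleep(backoff * (2 ** attempt))
        # Primary is down: fail over instead of surfacing the outage.
        return call_fallback(prompt)

    print(complete("Summarize our Q3 plan"))
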
The PII detection piece is in some ways the easiest: there are a lot of good open source models for doing this, and companies using Azure OpenAI and AWS Bedrock seem less concerned with it anyway. We expect that the emphasis companies place on the redactions we provide may actually go down over time, while the emphasis on unified, consistent audit logging and data access controls will increase.

Right now we have three plans: a free tier (admittedly very limited, but intended to give you a feel for the product); a business plan starting at $500/month, which gets you access to the data integrations as well as the most powerful models like GPT-4 32k, Anthropic 100k, etc.; and an enterprise plan starting at $5,000/month, which is a scaled-up version of the business tier and lets you go on-prem (more details on each plan are on the website). You can try the free tier self-serve, but we haven't yet built fully self-service onboarding for the paid plans, so for now it's a "book a meeting" button, apologies! (It only takes 5 minutes, and if you want, we can fully onboard you in the meeting itself.)

When Jack and I started Credal, we actually set out to solve a different problem: an "AI Chief of Staff" that could read your documents and task trackers, and guide your strategic decision making. We knew that data security was going to be a critical problem for enterprises. Jack and I were both deep in the Enterprise Data Security + AI space before Credal, so we naturally took a security-first approach to building out our AI Chief of Staff. But when we started showing the product to customers, we learned pretty fast that the Chief of Staff features were at best nice-to-haves; the security features were what customers were actually excited by. So we stripped the product back to basics and built the thing our customers actually needed.

Since then we've signed a bunch of customers and thousands of users, which has been really exciting. Now that our product is concretely helping people at work, is SOC 2 Type 1 compliant, and is ready for anyone to just walk up and use, we're super excited to share it with the Hacker News community, which Jack and I have been avid readers of for a decade now. It's still a very early product (the private beta opened in March), but we can't wait to get your feedback and see how we can make it even better!
Author : r_thambapillai
Score : 71 points
Date : 2023-06-14 14:26 UTC (8 hours ago)
| rishsriv wrote:
| This is so sorely needed. I used the app after the PH launch and
| loved how easy the self-serve was!
|
| Do you have plans to let users define "types" of data that can be
| redacted (like monetary terms in a contract, code embedded in
| documents etc)? Also, any plans on making this an API that other
| developers could build on top of?
| r_thambapillai wrote:
| Great questions and thanks for trying the product!!
|
| Yup, so a few thoughts here - we're exploring using embeddings
| to let you describe what you want to hide, and then immediately
| see which of the data you've already synced (or which previous
| requests) would be caught by it.
|
| On the API side: yes, ABSOLUTELY! The API is already live and
| used intensely by some of our startup customers like Sourceful.
| The API docs for using the OpenAI models are here:
| https://credalai.notion.site/OpenAI-Drop-in-API-0ef7cfd18a7c...
|
| and the Anthropic models here:
| https://credalai.notion.site/Anthropic-Drop-In-API-ad298f6f7...
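|
| A usage sketch of the drop-in pattern (the base URL and key
| below are placeholders - the real values are in the docs above):
|
|     import openai
|
|     # Point the standard OpenAI client at the drop-in endpoint.
|     openai.api_base = "https://..."  # placeholder; see docs above
|     openai.api_key = "YOUR_CREDAL_API_KEY"  # placeholder
|
|     resp = openai.ChatCompletion.create(
|         model="gpt-4",
|         messages=[{"role": "user", "content": "Hello"}],
|     )
|     print(resp.choices[0].message.content)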
| mritchie712 wrote:
| Going thru the SOC2 process myself[0].
|
| As I expected, we're hearing from customers they won't use a
| product that passes the contents of their database tables into an
| AI model (although some AI products are doing this). So the
| problem Credal is solving makes sense. Have you considered
| building an open source Python package for solving just this bit
| of the problem?
|
| Any tips on the SOC2? Did you use something like Drata / Vanta?
|
| 0 - https://www.definite.app/
| debarshri wrote:
| Apologies for the shameless plug. I generally don't do this,
| but I just thought our product might be relevant for the use
| case you mentioned. We do not compete with Credal, but at
| Adaptive [1], we have been building a platform that helps with
| infrastructure access management and allows users to
| automatically generate and collect evidence, especially for CC5
| and CC6 (logical access). Vendor security questionnaires become
| easy to answer when we, as an organisation, use our product.
|
| We have seen that, in organisations adopting products that
| access schema and metadata from databases, compute
| infrastructure, etc., reproducibility and access auditability
| comfort customers. Your customers care about security incidents
| like unauthorised access, privilege abuse, accidental
| operations, insider threats, etc. on the vendor's side, which
| in my opinion are real threats.
|
| [1] http://adaptive.live
| r_thambapillai wrote:
| Thanks!! There are some fairly good open-source models for the
| core stuff (PII, SSNs, etc.) out there already (Presidio,
| spaCy), so folks that need an open-source option have one to
| start with. Detecting the more complex stuff can sometimes need
| a little iteration, but I could definitely imagine a world
| where we publish that in the future
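|
| For anyone who wants the open-source starting point, the
| Presidio quickstart is roughly:
|
|     # pip install presidio-analyzer presidio-anonymizer
|     # (plus a spaCy model: python -m spacy download en_core_web_lg)
|     from presidio_analyzer import AnalyzerEngine
|     from presidio_anonymizer import AnonymizerEngine
|
|     text = "My name is Jane Doe and my SSN is 078-05-1120."
|
|     # Detect PII spans (names, SSNs, phone numbers, etc.)
|     findings = AnalyzerEngine().analyze(text=text, language="en")
|
|     # Replace each detected span with a placeholder like <PERSON>
|     redacted = AnonymizerEngine().anonymize(
|         text=text, analyzer_results=findings)
|     print(redacted.text)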
|
| On SOC 2, we used Drata, and spoke to Vanta, Laika and a few
| others. The price Vanta initially quoted us was waaaay higher
| than the other two, and between Laika and Drata we went with
| Drata, mostly because there seemed to be more automation in
| Drata. In the end, the Drata live support was _incredible_, and
| it's hard to imagine how we would have gotten the certification
| so fast without it. We started our infra on DigitalOcean, so
| the most painful part of SOC 2 for us was the migration we did
| to AWS to take advantage of AWS's many security features. My
| main advice would be to make full use of the Drata live support
| (I'd guess Vanta has something similar), but maybe on a deeper
| level - when you're doing SOC 2, don't focus on the
| certification: focus on the policies and technology that
| actually make your company secure. In the end, that's what
| enterprises really care about, especially the ones that have
| given us 300-question-long questionnaires!
| mritchie712 wrote:
| Nice! How long did it take end-to-end to get the SOC2 Type 1?
| r_thambapillai wrote:
| Our AWS migration wound up taking about 4 weeks, getting
| all the policies in place took about 8 weeks (which
| overlapped with about 2 weeks of the migration), and then
| the audit itself was a couple weeks as well
| pseudonymouspat wrote:
| Go Ravin and Jack! We're not at sufficient scale to really get
| use out of this product, but we'd love to try it down the road.
| Are
| you using Foundry for data integration and ACLs?
| r_thambapillai wrote:
| Stay tuned!! Right now no, but Foundry is definitely under
| consideration
| jackfischer wrote:
| Thank you! We haven't gone down the Foundry route yet. We do
| have some smaller-scale apps and companies using Credal as
| their AI API or chat platform, respectively - we'd be
| interested to hear a bit about your use case and see if it's a
| match?
| pseudonymouspat wrote:
| Word- we're in the thick of it but I'll reach out once we're
| ready to start thinking through bringing in chat.
| aiappreciator wrote:
| This seems like a very promising product - an enterprise
| version of a ChatGPT interface is a large gap in the market.
|
| However, this part of your advertising sounds very dubious:
| "Credal can be deployed fully on-premise for Large Enterprises,
| including the large language models themselves"
|
| What do you mean the LLMs themselves? Open source I can
| understand, how are you going to move GPT-4 to on prem? OpenAI is
| not giving you the weights.
| r_thambapillai wrote:
| Thanks for your encouragement and that is a totally fair
| criticism - when we say that, we mean two things:
|
| 1. We support using Credal with your own, open source LLM,
| which can of course be fully on prem in every sense
|
| 2. We also support using Credal with your own Azure OpenAI
| instance. As you say, OpenAI aren't giving us the weights, but
| many of our customers have procured Azure OpenAI from Microsoft
| and then we point GPT-4 usage at their Azure instance, meaning
| that the data never goes to OpenAI at all.
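|
| Concretely, pointing GPT-4 usage at an Azure instance is just a
| client-side config change (resource and deployment names below
| are placeholders):
|
|     import openai
|
|     # Azure OpenAI via the openai Python SDK (v0.x, mid-2023)
|     openai.api_type = "azure"
|     openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
|     openai.api_version = "2023-05-15"
|     openai.api_key = "YOUR_AZURE_OPENAI_KEY"
|
|     resp = openai.ChatCompletion.create(
|         # "engine" is your Azure *deployment* name, not the model name
|         engine="gpt-4-deployment",
|         messages=[{"role": "user", "content": "Hello"}],
|     )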
|
| One of the things that's going to be really interesting to see
| moving forward is whether the open source models are going to
| be able to compete with the blistering pace and funding that
| the closed source ones - Bard, Claude, and GPT-X (and maybe
| Mistral?) - are going to be able to attract. For the sake of
| the industry, I really hope the open source models catch up,
| but given the amount of funding (and now, in OpenAI's case,
| revenue) the closed source models are generating, it's hard to
| see how that happens.
| alokjnv10 wrote:
| That looks like what I need for data privacy for my
| chat-with-PDF tool, Documind. https://documind.chat
| r_thambapillai wrote:
| Nice! Which AI model are you using for it? If you're using
| ChatGPT, you can actually use our ChatGPT API and get the PII
| redaction for free, with hopefully hardly any code changes
| javidlakha wrote:
| Congratulations on the launch! (I'm a beta user.) Audit trails,
| access controls and security certifications are big headaches
| when developing in regulated industries. Having these already set
| up has made it easier for us to experiment with and build on LLM
| APIs.
| jackfischer wrote:
| Many thanks! There is a lot of opportunity for LLMs lurking in
| regulated industries right now - glad to have given you a
| boost!
| deet wrote:
| Looks awesome and will make many enterprises feel more
| comfortable using AI.
|
| I suspect your intuition about moving emphasis from redaction to
| unified access control and audit logging over time is right.
|
| The "AI Chief of Staff" sounds interesting though -- can you
| share a bit more about what you showed to companies and received
| lukewarm response to?
| jackfischer wrote:
| The AI Chief of Staff had a few layers. The first was data
| integration of both productivity data (Slack, Notion, etc.) and
| "big data" lakes/warehouses. The former tells you what is
| getting done at a human level and the latter has the potential
| to tell you whether and how it is working. The second layer was
| modeling of your business strategy and including dependencies
| between concepts like projects and teams, which allows us to
| back out things like stakeholders and early warning recipients
| for any given progress or problems. The third was a
| presentation layer allowing humans to get a bird's-eye view of
| what's happening, including generating artifacts like meeting
| decks.
|
| Ultimately this 1) wasn't successfully solving an urgent enough
| problem for most businesses and 2) was too difficult to adopt.
|
| LLMs do break open opportunities in this space so I expect to
| see some more versions of this, perhaps on top of the Credal
| API!
| gaut wrote:
| This looks awesome. Congrats on the launch!
| r_thambapillai wrote:
| Thanks! :) It feels so surreal to be launching on Hacker News!
| When I was first discovering tech, the people launching
| YC-funded startups on HN seemed like wizened old gods to me.
| Now I
| laugh about it because obviously I'm still learning so much,
| even the basics, every day. I hope we get to inspire someone
| else the way the early YC cos inspired me
| kriro wrote:
| Congrats, I think this will be really successful and you've got
| a very early foot in the door.
|
| Do you consider self hosted LLMs a competitor of sorts? I suppose
| your premise is if a company uses Google Docs they will also
| likely never host internal LLMs, right?
| r_thambapillai wrote:
| Thanks so much! So about half of our enterprise customers use
| Credal in conjunction with either a self hosted LLM or Azure
| OpenAI (which you can debate, but most companies we've spoken
| to seem to treat their Azure OpenAI instance as equivalent to
| self hosted). In practice, you still need to:
|
| 1. Manage permissions: make sure the self-hosted LLM only
| reads from the documents, Slack channels, etc. that the end
| user should actually have access to
|
| 2. Generate an audit log of exactly who did what when
|
| So we actually see self hosted LLMs being a big part of how
| Credal is used! In the long term, we think Credal will actually
| become a tool for AI app developers to safely request access to
| data & embeddings from the enterprise on the fly, and make sure
| the data they get is appropriately controlled and the audit
| logs exist in a single place for the enterprise to see what
| data went to whom/when/why etc
___________________________________________________________________