[HN Gopher] Employees are feeding sensitive data to ChatGPT, rai...
___________________________________________________________________
Employees are feeding sensitive data to ChatGPT, raising security
fears
Author : taubek
Score : 280 points
Date : 2023-03-27 18:32 UTC (4 hours ago)
(HTM) web link (www.darkreading.com)
(TXT) w3m dump (www.darkreading.com)
| danielmarkbruce wrote:
| Seems like a temporary problem. Surely OpenAI will have a version
| which runs in a customer's public cloud VPC, orchestrated by
| OpenAI.
| VLM wrote:
| I believe there were FUD pieces like this when internet search
| engines were rolled out, and again when social media became
| popular. I suppose it's universal for new technologies.
|
| I had an interview a while ago at a place where, during the phone
| screen, "they can't talk about their tech stack in detail", so I
| looked on LinkedIn and figured out their entire tech stack before
| the onsite interview. Come on guys, according to LinkedIn, you
| have an entire department of people doing AWS with Terraform and
| Ansible, you don't have to pretend you can't say it in public.
| Mizoguchi wrote:
| "In one case, an executive cut and pasted the firm's 2023
| strategy document into ChatGPT and asked it to create a
| PowerPoint deck."
|
| There's really not much you can do here. This is a complete lack
| of very basic common sense. Having someone like this in your
| business, particularly at the executive level, is a liability
| regardless of ChatGPT.
| phineyes wrote:
| Let's not forget that we're also feeding all our code into
| OpenAI Codex.
| nodesocket wrote:
| Which I assume also means sensitive code that may be .gitignored
| but is still being pushed up to OpenAI, i.e. secrets, passwords,
| API keys.
| buzzscale wrote:
| This is one of the reasons Databricks created Dolly, a slim LLM
| that unlocks the magic of ChatGPT. A homegrown LLM that can tap
| into/query the datasets of all the data in an organization's Data
| Lakehouse will be hugely powerful.
|
| I am working with customers that are looking to train a homegrown
| LLM that they host and have blocked access to ChatGPT.
|
| https://www.datanami.com/2023/03/24/databricks-bucks-the-her...
|
| https://news.ycombinator.com/item?id=35288063
| ngngngng wrote:
| This reads like you had an LLM write an ad for you
| cloudking wrote:
| ChatGPT Business Edition seems pretty obvious and I'd be surprised
| if OpenAI isn't already working on it. Separate models for each
| customer, data silos and protection. The infra is already there
| on Azure.
| repler wrote:
| It actually is on Azure, exactly as you described.
|
| https://learn.microsoft.com/en-us/azure/cognitive-services/o...
| cloudking wrote:
| Yep, they just need to provide a business specific frontend
| chat UI.
| Rzor wrote:
| First thing that went through my mind as I read the headline was
| Zuckerberg's comments on people posting their info on Facebook.
| wantsanagent wrote:
| We saw these same fears with the release of Gmail. Why would you
| trust your _email_ to _Google?!!_ Aren't they going to train
| their spam filters on all your data? Aren't they going to sell
| it, or use it to sell you ads?
|
| Corporations constantly put their most sensitive data in 3rd
| party tools. The executive in the article was probably copying
| his company strategy from Google docs.
|
| Yes, there are good reasons for concern, but the power of the
| tool is simply too great to ignore.
|
| Banning these tools will go the same way as prohibition did in
| the US, people will simply ignore it until it becomes too absurd
| to maintain and too profitable to not participate in.
|
| Companies which are able to operate _without_ these fears will
| move faster, grow more quickly, and ultimately challenge the
| companies that are restricted from using them.
|
| Now I think the article _should_ be a wake-up call for OpenAI.
| Messaging around what is and what is not used for training could
| be improved. Corporate accounts for Chat with clearer privacy
| policies would be great, as would warnings that, yes, LLMs do
| memorize data and you should treat anything you put into a free
| product on the web as fair game for someone's training algorithm.
| Thorentis wrote:
| > too profitable to not participate in
|
| Sorry, but I really struggle to see how a non AI company will
| actually become more profitable simply by getting their
| employees to use ChatGPT. In fact, the more companies that use
| it, the more demand there will be for "human only" services.
| raincole wrote:
| If your company's code is all in repositories on GitHub (or
| Bitbucket, or any similar service), worrying about ChatGPT is
| quite silly.
|
| And on the other hand, if your company doesn't use GitHub etc.
| due to security concerns, it's a very good sign that you need
| to ban ChatGPT too.
| paxys wrote:
| Trusting Gmail with corporate communication _was_ a terrible
| idea (and explicitly illegal in a lot of industries), and
| companies didn't start to adopt it until Google released
| an enterprise version with table-stakes security features like
| no training on the data, no ad targeting, auditing, compliance
| holds and more.
|
| There's a huge difference between trusting a third party
| service with strict security and data privacy agreements in
| place vs one that can (legally) do whatever they want with your
| corporate data.
| ChatGTP wrote:
| This is vital for professional adoption. We cannot live in a
| world where basically all commercial information, all secrets
| are being submitted to one company.
| fnordpiglet wrote:
| Was?
| kccqzy wrote:
| Well it was, until Google Workspace (G Suite) came along
| and provided essentially an enterprise version of Gmail.
| fnordpiglet wrote:
| I still question the wisdom of giving data to the world's
| largest spyware company that makes its money by
| converting mass surveillance into dollars.
| jackson1442 wrote:
| I think this is different in that ChatGPT is expressly using
| your data as training data for a probabilistic model. This means:
|
| * Their contractors can (and do!) see your chat data to tune
| the model
|
| * If the model is trained on your confidential data, it may
| start returning this data to other users (as we've seen with
| Github Copilot regurgitating licensed software)
|
| * The site even _tells you_ not to put confidential data in for
| these reasons.
|
| Until OpenAI makes a version that you can stick on a server in
| your own datacenter, I wouldn't trust it with anything
| confidential.
| hgsgm wrote:
| Google had all the same problems, until it found a balance of
| functionality, security, and privacy.
|
| OpenAI just hasn't started to try adding privacy and security
| yet.
| waboremo wrote:
| Sticking it in your own datacenter doesn't really prevent any
| of these problems (except maybe #2); now your leaks are just
| internal, and because of the false sense of security you might
| wind up leaking far more confidential and specific
| information (i.e. an executive leaking to the rest of the team
| in advance that they are planning layoffs for noted reasons,
| whereas that executive might have used more vague terms when
| speaking to the public ChatGPT).
| mholm wrote:
| Sticking it in your own private datacenter would imply that
| you can opt in or out of using your data to train the next
| generation. ChatGPT does not dynamically train itself in
| realtime.
| waboremo wrote:
| The implication is that you would only bother with ChatGPT
| at all in order to train it on the relevant local data, which
| is the key value of ChatGPT beyond general public use.
| renewiltord wrote:
| Not that I don't expect them to do this, but how is it
| expressly said to be so?
|
| https://help.openai.com/en/articles/5722486-how-your-data-
| is...
|
| > _OpenAI does not use data submitted by customers via our
| API to train OpenAI models or improve OpenAI's service
| offering. In order to support the continuous improvement of
| our models, you can fill out this form to opt-in to share
| your data with us. Sharing your data with us not only helps
| our models become more accurate and better at solving your
| specific problem, it also helps improve their general
| capabilities and safety._
| avereveard wrote:
| Hehe, old ToS trick. Here it doesn't say "will never use"
| but says "does not use", and I wager somewhere below it will
| say that they can change the ToS at any time in the future,
| unilaterally.
| kristofferR wrote:
| Did you read the next paragraph?
|
| > When you use our non-API consumer services ChatGPT or
| DALL-E, we may use the data you provide us to improve our
| models.
| renewiltord wrote:
| I _definitely_ did not correctly read that. Thanks for
| the clarification. Totally misread the 'our API' bit!
|
| It's also in the FAQ:
| https://help.openai.com/en/articles/6783457-chatgpt-
| general-...
|
| > _Will you use my conversations for training?_
|
| > _Yes. Your conversations may be reviewed by our AI
| trainers to improve our systems._
| marcosdumay wrote:
| > Companies which are able to operate without these fears will
| move faster
|
| Or the fears are real and companies that operate without them
| will be exploited, or extinguished for annoying their
| customers.
| yellow_postit wrote:
| This cycle happens regularly, and it seems oftentimes the service
| provider wises up and charges for extra controls.
|
| Yammer pre-Microsoft and nowadays Blind -- lots of "insider"
| information seemingly posted.
|
| As usage goes up, the target size and the opportunity cost both
| go up.
| agloe_dreams wrote:
| Meanwhile over at Github Copilot...
|
| Hahahahahahaha
| dubbelboer wrote:
| If you're using Github already then Copilot isn't seeing
| anything new.
| agloe_dreams wrote:
| Correct, but that level of security is expected from GitHub
| proper; they have all sorts of independent security reviews
| for their partners. Does all of that exist for Copilot?
| grepfru_it wrote:
| Yes
| WhereIsTheTruth wrote:
| That's why privacy is important in the real world
|
| Imagine if everyone knew your inner secrets just by looking at
| you at the bar..
|
| Why is it different online? i have no idea.. well i kinda know
| but.. oh well.. we deserve it i guess
| bob1029 wrote:
| We published an internal policy for AI tools last week. The basic
| theme is: "We see the value too, but please don't copypasta our
| intellectual property until we get a chance to stand up something
| internal."
|
| We've granted some exceptions to the team responsible for
| determining _how_ to stand up something internal. Lots of
| shooting in the dark going on here, so I figured we would need
| some divulgence of our IP against public tools to gain traction.
| meghan_rain wrote:
| Inform us when you've figured out a way to host something with the
| quality of ChatGPT internally :-)
| visarga wrote:
| You can use ChatGPT inside Azure, like any other service.
| It's not the same one used by OpenAI, and there are different
| guarantees.
|
| > ChatGPT is now available in Azure OpenAI Service
|
| https://azure.microsoft.com/en-us/blog/chatgpt-is-now-
| availa...
| meghan_rain wrote:
| Sorry, but the whole point is to not use a closed-source
| third-party API with a dubious privacy policy run by a
| multinational surveillance-capitalism megacorporation.
| philsquared_ wrote:
| Just use the API? It deletes your data after 30 days...
| bob1029 wrote:
| Even if we had a 100% private ChatGPT instance, it wouldn't
| fully cover our internal use case.
|
| There is way more context to our business than can fit in
| 4/8/32k tokens. Even if we could fit the 32k token budget, it
| would be _very_ expensive to run like this 24/7. Fine-tuning
| a base model is the only practical/affordable path for us.
| cmelbye wrote:
| You can retrieve information on demand based on what the
| user is asking, like this:
| https://github.com/openai/chatgpt-retrieval-plugin
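|
| A rough sketch of that pattern, assuming the 2023-era openai
| Python client and a hypothetical search_docs() helper standing
| in for whatever vector store the plugin wires up (the helper
| and prompt wording are illustrative, not the plugin's actual
| code):
|
| import openai  # pip install openai; set openai.api_key first
|
| def search_docs(query: str, k: int = 3) -> list[str]:
|     """Hypothetical lookup against your own document store."""
|     raise NotImplementedError
|
| def answer(question: str) -> str:
|     # Fetch only the snippets relevant to this question, so the
|     # full corpus never leaves your infrastructure.
|     context = "\n\n".join(search_docs(question))
|     resp = openai.ChatCompletion.create(
|         model="gpt-3.5-turbo",
|         messages=[
|             {"role": "system",
|              "content": "Answer using only the provided context."},
|             {"role": "user",
|              "content": f"Context:\n{context}\n\nQuestion: {question}"},
|         ],
|     )
|     return resp["choices"][0]["message"]["content"]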
| knodi123 wrote:
| I can host llama's 7b model internally. It hallucinates more
| often than not and has a tendency to ramble, but dammit it's
| local and secure!
| walthamstow wrote:
| What's it like with code, documentation, regex, etc?
|
| That's all I use ChatGPT for. I don't need it to be able to
| write poetry.
| beiller wrote:
| I did not verify this regex; on a surface scan it
| seems OK:
|
| ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 --temp 0.7
| --top_k 40 --top_p 0.5 --repeat_last_n 256
| --repeat_penalty 1.17647 -n 1024 -p $'Here is a handy
| short form regex to validate an email address: '
|
| Here is a handy short form regex to validate an email
| address:
| ^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-z\.]{2,6})+$ The
| first character can be either uppercase or lower case.
| The second group must contain at least one letter and the
| third group may contain any number of characters (up to
| 5). The last part ensures that it ends with @ followed by
| two more letters separated by dots. If you want to make
| sure that your input string contains only valid
| characters for emails then use this regex instead:
| \A[\w.]*@[\w.]*\.\w{1,4}\z
| bob1029 wrote:
| This looks like good performance. We are keeping an open
| mind with regard to actually-open alternatives.
| [deleted]
| NoZebra120vClip wrote:
| This is nothing new at all. How many people have Grammarly
| plugins installed? They are advertising aggressively, so I'd
| think it is the new hotness. Don't tell me Grammarly is not
| hoovering up all of the Slack, Word, Docs, and Gmail data that
| everyone sends it, and holding on to it for some future purpose.
| see.
| sophiabits wrote:
| Grammarly is an OpSec nightmare that's somehow managed to slip
| under most people's radar. I know folks who won't use the Okta
| browser extension because of the extensive permissions it asks
| for, but will happily use Grammarly on everything.
|
| Last time I looked (maybe things have improved...?) Grammarly
| would automatically attach itself to any text box you
| interacted with, and immediately send all of your content to
| their servers for processing. How this software gets past IT
| departments is a mystery to me.
| belter wrote:
| 2017... "Grammarly Vulnerability Allows Attackers To See
| Sensitive Data of Their Customers" -
| https://www.invicti.com/blog/web-security/grammarly-vulnerab...
| brightball wrote:
| This was my first concern when it came to IDE plugins.
| bradgessler wrote:
| Apple will make a killing if they can deploy private LLMs to
| their M-series chips.
| jcims wrote:
| The wild thing is that you can just annotate the content with the
| associated organization information then ask GPT to tell you what
| it sees in the logs.
| tombert wrote:
| This is scary, but it doesn't surprise me even in the slightest.
| ChatGPT is useful for so many things that it's extremely tempting
| to convince yourself that you should trust it.
|
| For example, I was having some issues with my LTO-6 drive
| recently, and I had to finagle through a bunch of arcane server
| logs to diagnose it. I had the idea of simply copypasting the
| logs into ChatGPT and having it look at them, and it quickly
| summarized the logs and told me what things to look for. It
| didn't directly solve the problem, but it made the logs 100x more
| digestible and I was able to figure out my problem. It made a
| problem that probably would have taken 2-3 hours of Googling take
| about 20 minutes of finagling.
|
| I'm not doing anything terribly interesting or proprietary on my
| home server, so I didn't really have any reservations sharing
| dmesg logs with it, but obviously that might not be the case in a
| company. Server logs can often have a _ton_ of data that could be
| useful for a competitor (whether it should be there or not), and
| someone not paying attention to what they're pasting into
| ChatGPT could easily expose that data.
| johntiger1 wrote:
| It's alright, I just told it that I don't consent to my data
| being used. Checkmate openAI!
| reneberlin wrote:
| Just because your code is art, of course, which should be weighed
| in gold and copyrighted for the next 2000 years, living in a cold
| wallet. Did you get the memo that humans implemented all the
| faulty security that lasted the past decades by accident?
| balls187 wrote:
| And what percent of employees send sensitive information via
| unencrypted email?
| deltree7 wrote:
| Classic fearmongering article targeted to HN crowd.
|
| i) In this world, there are very few people whose private
| conversation is worth anything to anybody (celebrities,
| journalists -- so around 10,000 people)
|
| ii) A tiny tiny %age of information is truly secret (mostly
| private keys).
|
| iii) Business strategies are mostly a result of execution, not
| any 'trade secrets'. Meta will succeed because it has executed
| its metaverse strategy, not because they kept the metaverse
| strategy secret.
|
| People who take risks and don't care about irrelevant details
| (just as they took risks with internet shopping, cloud, SaaS)
| will win. Losers like the ones who thought AWS would steal their
| data will be left behind.
| nemo44x wrote:
| Looking forward to AI on chip not too far down the road. As long
| as we use APIs for the model itself we can't really use it for
| much.
| sigstoat wrote:
| When it first came out and my boss was beside himself about how
| cool it was, he was feeding it all of his emails with other
| businesses to have it clean them up. Boggled my mind.
| dannyobrien wrote:
| do those other businesses use gmail? does your company?
| SketchySeaBeast wrote:
| I think those are different models.
|
| Gmail has a vested interest in keeping any knowledge it
| gains about you secret - its competitive advantage is
| knowing more about you than anyone else does.
|
| ChatGPT's strength is its ability to clearly communicate the
| knowledge it has (including training data it gains from
| people it interacts with) to give you good responses.
| yawnxyz wrote:
| I think there's more fear of OpenAI leaking data than say,
| Airtable or Notion or Github or AWS/S3 or Cloudflare or Vercel or
| some other company that has gobs of a company's data. Microsoft
| also has gobs of data: anything on Office and Outlook is your
| company data -- but the fear that they'll leak (intentional or
| accidental) is somehow more contained.
|
| If we want to be intellectually honest with ourselves, we can
| either be fearful and have a plan to contain data from ALL of
| these companies, OR we can address the risk of data leaks through
| bugs as an equal threat. OpenAI uses Azure behind the scenes, so
| it'll be as solid (or not solid) as most other cloud-based tools
| IMO.
|
| As for your data training their models: OpenAI is mostly a
| Microsoft company now. Most companies use Microsoft for
| documentation, code, communications, etc. If Microsoft wanted to
| train on your data, they have all the corporate data in the
| world. They would (or already could!) train on it.
|
| If there's a fear that OpenAI will train their model on your data
| submitted through their silly textbox toy, but NOT through
| training on the troves of private corporate data, then that fear
| is unwarranted too.
|
| This is where OpenAI should just offer a "corporate" tier, charge
| more for it, make it HIPAA/SOC2/whatever compliant, and basically
| use that to assuage the fears of corporate customers.
| rrdharan wrote:
| 1. Azure has the worst security of the major cloud providers;
| multiple insanely terrible RCE and open readable DB exposures.
|
| 2. Azure infrastructure still likely has far better
| security/privacy by virtue of all their compliance, (HIPAA,
| FedRAMP, ISO certifications etc.) than whatever startup-move-
| fast-ignore-compliance crap OpenAI layers on top of it in their
| application layer.
| philwelch wrote:
| For most of those tools you can get your own self-hosted
| version if you're worried about your data.
| krisoft wrote:
| > I think there's more fear of OpenAI leaking data than say,
| Airtable or Notion or Github or AWS/S3 or Cloudflare or Vercel
| or some other company that has gobs of a company's data.
|
| There is zero fear. OpenAI openly writes that they are going to
| use ChatGPT chats for training, on the popup modal they show
| you when you load the page. That is not a fear, that is a
| promise that they will leak whatever you tell them.
|
| If I tell you "please give me a dollar, but I warn you that you
| will never see it again", would you describe your feeling about
| the transaction as "fearful that the loan won't be repaid"?
| satvikpendem wrote:
| I noticed this too, this is why I'm working on a startup that
| lets you train your data internally through a self-hosted OSS LLM
| product. It's going to start off with just reading emails for now
| until we can integrate with other products too, like
| spreadsheets, Airtable, databases, etc. Basically an OSS version
| of OpenAI ChatGPT plugins, I suppose. If you're interested,
| message me at the email in my profile.
| crop_rotation wrote:
| In my experience most corporate employees just take the path of
| least resistance. It is not uncommon for people to paste non-
| public data into websites just to do JSON formatting, and paste
| base64 strings to random websites just to decode them. So just
| telling people not to do something won't accomplish much. Most
| corporate employees also somehow think they know better than the
| policy.
|
| Any company that doesn't want to feed data into ChatGPT should
| proactively block both ChatGPT and any website serving as
| a wrapper over it.
| japhyr wrote:
| > and any website serving as a wrapper over it
|
| I agree this would be a good move, but it's going to be harder
| and harder to do that definitively.
| sophiabits wrote:
| Perhaps we need an "LLM Block" browser extension? :)
| itronitron wrote:
| A while back I got to hear about how the IT team running my
| then-employer's internal time reporting tool was sending all
| the usage data through Google Analytics and how neat that was
| for them to look at :\ .
|
| I shudder to think what they are doing now.
| guestbest wrote:
| This is a user-led data leak that ranks up there with Facebook
| and LinkedIn asking for email passwords to "look for your
| contacts to add".
| varunjain99 wrote:
| Not only do you have to worry about employees directly sharing
| data, but many companies are also just wrappers around GPT. Or
| they may use your data in the future to roll out new AI services.
|
| While this is not a new problem -- employees share sensitive data
| with Google all the time -- the data leakage will be more clear
| than ever. With ads-based tracking and Google search, the leakage
| was very indirect. With generative AI, it can literally
| regurgitate memorized documents.
|
| The security risk goes beyond data exfiltration. Folks are
| already trying to teach the AI incorrect information by spamming
| it with something like 2+2 = 5.
|
| Data exfiltration + incorrect data injection are super underrated
| risks to mass adoption of generative AI tech in the B2B world...
| cuuupid wrote:
| We block ChatGPT, as do most federal contractors. I think it's a
| horrible exploit waiting to happen:
|
| - there's no way they're manually scrubbing out sensitive data so
| it's bound to spill out from the training data when prompting the
| model
|
| - OpenAI is openly storing all this data they're collecting, to
| the extent that they've had several leaks now where people can
| see others' conversations and data. We are one step away, if it
| hasn't already happened, from an exploit of their systems (which
| likely weren't built with security as the top priority, as
| opposed to scale and performance) that could leak a monumental
| amount of data from users.
|
| In the most innocent case they could leak the personal info of
| naive users. But largely if Linkedin is any indication, the
| business world is filled with dopes who genuinely believe the AI
| is free thinking and better than their employees. For every org
| that restricts ChatGPT use, there are fifty others that don't,
| most of which have at least one of said dopes who are ready to
| upload confidential data at a moment's notice.
|
| Wouldn't even put it past military personnel to put S/TS
| information into it at this point. OpenAI should include more
| brazen warnings against providing this type of data if they want
| to keep up this facade of "we can't release it because ethics"
| because cybersecurity is a much more real liability than a
| supervised LM turning into terminator.
| probablynish wrote:
| This really depends on the cost/benefit tradeoff for the entity
| in question. If using ChatGPT makes you X% more productive
| (shipping faster / lowers labor costs / etc), but comes with Y%
| risk of data leakage, is that worth it in expectation or not? I
| would argue that there definitely exist companies for which
| it's worth the tradeoff.
|
| By the way, OpenAI says they won't use data submitted through
| its API for model training -
| https://techcrunch.com/2023/03/01/addressing-criticism-opena...
| eldaisfish wrote:
| The #1 problem with corporations saying things is that many
| things they say are not regulated or are taken on good faith.
| What happens when OpenAI is acquired and the rules change?
| These comments are often entirely worthless.
| eptcyka wrote:
| Risk of leakage? It is not a risk, it is a matter of time.
| osterbit2 wrote:
| To anyone who may be pasting code along the lines of 'convert
| this SQL table schema into a [pydantic model|JSON Schema]'
| where you're pasting in the text, just ask it instead to
| write you a [python|go|bash|...] function that _reads in a
| text file_ and 'converts a SQL table schema to output x' or
| whatever. Related/not-related--a great pandas docs replacement
| is another great+safe use-case.
|
| Point is, for a meaningful subset of high-value use-cases you
| don't _need_ to move your important private stuff across any
| trust boundaries, and it can still be pretty helpful...so
| just calling that out in case that's useful to anyone...
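|
| For what it's worth, here is roughly the kind of generic
| converter you might ask it to write instead of pasting a real
| schema. This is a simplified sketch (naive parsing, illustrative
| type mapping), not something the model actually produced:
|
| import re
| import sys
|
| SQL_TO_PY = {
|     "int": "int", "integer": "int", "bigint": "int",
|     "text": "str", "varchar": "str", "char": "str",
|     "boolean": "bool", "float": "float", "numeric": "float",
| }
|
| def schema_to_model(sql: str) -> str:
|     # Naive parse of a single CREATE TABLE statement; the schema
|     # file stays on your machine and never crosses a trust boundary.
|     table = re.search(r"create\s+table\s+(\w+)", sql, re.I).group(1)
|     body = sql[sql.index("(") + 1 : sql.rindex(")")]
|     lines = [f"class {table.title()}(BaseModel):"]
|     for col in body.split(","):
|         parts = col.split()
|         if len(parts) < 2 or parts[0].upper() in ("PRIMARY",
|                                                   "FOREIGN",
|                                                   "CONSTRAINT"):
|             continue
|         name, sql_type = parts[0], parts[1].lower().split("(")[0]
|         lines.append(f"    {name}: {SQL_TO_PY.get(sql_type, 'str')}")
|     return "\n".join(lines)
|
| if __name__ == "__main__":
|     with open(sys.argv[1]) as f:
|         print(schema_to_model(f.read()))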
| bsder wrote:
| Do you really think the people asking ChatGPT to write
| their code can make that abstraction?
|
| The fact that they can't do this is the whole reason they
| have to use ChatGPT.
| lukifer wrote:
| I've been doing this kind of thing pretty regularly for
| the past few weeks, even though I know how to do any of
| the tasks in question. It's usually still faster, even
| when taking the time to anonymize the details; and I
| don't paste anything I wouldn't put on a public gist
| (lots of "foo, bar", etc)
| q7xvh97o2pDhNrh wrote:
| I use it because it's 10-100x more interesting, fun, and
| fast as a way to program, instead of me having to
| personally hand-craft hundreds of lines of boilerplate
| API interaction code every time I want to get something
| done.
|
| Besides, it's not like it puts out _great_ code (or even
| always _working_ code), so I still have to read
| everything and debug it. And sometimes it writes code
| that is just fine _and_ fit for purpose _and_
| horrendously ugly, so I still have to scrap everything
| and do it myself.
|
| (And then sometimes I spend 10x as long doing _that,_
| because it turns out it's also just plain good fun to
| grow an aesthetic corner of the code just for the hell of
| it, too -- as long as I don't _have_ to.)
|
| And even after all that extra time is factored back in:
| it's _still_ way faster and more fun than the before-
| times. I'm actually enjoying building things again.
| funfunfunction wrote:
| People aren't using ChatGPT because they can't do it
| themselves, they're using it to save time.
| ChatPGT wrote:
| I've been doing that since day one. I can't believe people are
| pasting real data into these corporate black boxes.
| heyyyouu wrote:
| But that's the API, not the Chat input or Playground.
|
| Companies can use Azure OpenAI Services to get around this --
| there's data privacy, encryption, SLAs even. The problem is
| it's very hard to get access to (right now).
| nostromo wrote:
| Does blocking ever work? People are smart and usually just work
| around them.
| PeterisP wrote:
| It works in the sense that it does add an extra "reminder"
| and requires specific intent. I mean, in this scenario all
| the people already have been informed that they're absolutely
| not allowed to do things like that, but if someone has
| forgotten that, or simply is careless and just wants to "try
| something out" then if it's unblocked they might actually do
| it, but if they need to work around a restriction, that
| forces them to acknowledge that there _is_ a restriction and
| they shouldn't try to work around it even if they can.
| mnd999 wrote:
| The smart ones don't paste in all their private data.
|
| And yes, if bypassing the block is combined with
| disciplinary action, it does work. It's not worth getting
| fired over. This is likely what heavily regulated industries
| like financial services and defense are doing.
| tedunangst wrote:
| Blocks are effective reminders of policies.
| acomjean wrote:
| I remember someone trying to look up winning lottery
| numbers at work. The site came up "Blocked: Gambling". It
| was a little reminder that they're watching our web
| browsing at work..
| michaelteter wrote:
| Possibly I don't know how this all works, but I think if the
| host of a ChatGPT interface were willing to provide their own
| API key (and pay), they could then provide a "service" to
| others (and collect all input).
|
| In that case, you wouldn't know to block them until it was too
| late.
|
| Ultimately either you must watch/block all outgoing traffic, or
| you must train your people so thoroughly that they become
| suspicious of everything. Sadly, being paranoid is probably the
| most economical attitude these days if IP and company secrets
| have any value.
| dragonwriter wrote:
| > Possibly I don't know how this all works, but I think if
| the host of a ChatGPT interface were willing to provide their
| own API key (and pay), they could then provide a "service" to
| others (and collect all input).
|
| Well, GP was referring to blocking ChatGPT _as a federal
| contractor_. I suspect that as a federal contractor, they are
| also vetting other people that they share data with, not just
| blocking ChatGPT as a one-off thing. I mean, generic federal
| data isn't as tightly regulated as, say, HIPAA PHI (having
| spent quite a lot of time working for a place that handles
| both), but there _are_ externally-imposed rules and
| consequences, unlike simple internal-proprietary data.
| michaelteter wrote:
| But it really seems like a cat and mouse game. For example,
| a very determined bad actor could infiltrate some lesser
| approved government contractor and provide an additional
| interface/API which would invite such information leaking,
| and possibly nobody would notice for a long time.
| Jensson wrote:
| And then they could face death penalty for espionage if
| they leaked sensitive enough data. You would have to be
| really stupid to build such a service for government
| contractors unless you actually are a foreign spy.
| sebzim4500 wrote:
| At least then we would finally find out if it is
| constitutional to execute someone for espionage.
| yathrowaway01 wrote:
| So you block internet access for all employees? Cos anything
| you think is being pasted into ChatGPT is being pasted
| everywhere, whether it's Google, Slack, Chrome plugins, or public
| Wi-Fi.
| [deleted]
| m000 wrote:
| Does US intelligence have access to OpenAI data? Private
| organizations are one thing. But with all the dopes in
| government positions around the world, OpenAI logs would
| probably be a treasure trove for intelligence gathering.
| grepfru_it wrote:
| They are just one national security letter away from all US-
| held data.
| Ensorceled wrote:
| Let's also not discount that for every "dope" there is at least
| one "bad actor" who is willing to take the risk to get an edge
| in their workplace or appease their manager's demands. The
| warnings will only deter the first group.
| comment_ran wrote:
| If your competitors use ChatGPT to compete with you and they're
| 10x more productive than you, are you still willing to insist? If
| the productivity gain is 100x, will you?
| amelius wrote:
| Uh, there's no sign of that yet.
| chaxor wrote:
| [flagged]
| Ensorceled wrote:
| > Think about how long it's taken tools like pandas to
| reach the point that it is now. That entire package can
| be built to the level it is now in a couple of days.
|
| I don't think that is true at all. Do you have an example
| of a significant project being duplicated in days, or
| even months, with ANY of these tools?
|
| By significant, I mean something on the order of pandas
| which you claimed.
| msm_ wrote:
| And this is completely ignoring the fact that the real
| hard problem is the design. Spitting out boilerplate code is
| not. How pandas could be designed perfectly in one
| afternoon (and generated with GPT) is beyond my
| comprehension.
| brickteacup wrote:
| Cool, please provide a link to a library of similar size
| and complexity to pandas which was written using ChatGPT
| in the span of a few days. We'll be waiting.
| saulpw wrote:
| > Think about how long it's taken tools like pandas to
| reach the point that it is now. That entire package can
| be built to the level it is now in a couple of days.
|
| Let me hand you a mirror: You're absolutely and
| completely wrong
| pedrosorio wrote:
| > Think about how long it's taken tools like pandas to
| reach the point that it is now. That entire package can
| be built to the level it is now in a couple of days.
|
| I am having trouble parsing this statement. You're saying
| a person equipped with chatGPT trained on data prior to
| December 2007 (the month before the initial pandas
| release) could have put together the entire pandas
| library in a couple of days?
|
| That seems obviously wrong, starting with the fact that
| one would need to know "what" to build in the first
| place. If you're saying that chatGPT in 2023 can spit out
| pandas library source code when asked for it directly,
| that's obvious.
|
| Somewhere between the impossible statement and the
| obvious statement I made above, there must be something
| interesting that you were trying to claim. What was it?
| misnome wrote:
| > There is an immense amount of evidence of that
|
| Then it should be easy to provide some?
| brobdingnagians wrote:
| It might be just as likely that ChatGPT will cause a mistake
| like Knight Capital because no one bothered to thoroughly
| verify the AI's looks-good-but-deeply-flawed answer, and the
| two aren't mutually exclusive possibilities.
| causality0 wrote:
| Right. I've had ChatGPT completely fail at something as
| simple as writing a batch file to find and replace text in
| a text file.
| arcticfox wrote:
| Sure, but humans do that all the time as well
| remexre wrote:
| Humans are a lot better at "I don't know how to do this;
| hey Alice, can you look this over if you've got a sec and
| tell me if I'm making a noob mistake"
| bcrosby95 wrote:
| Security and privacy should be table stakes. Speaking for my
| country, we need privacy laws with teeth to punish bad
| actors for shitting people's private information wherever
| they want in the name of a dollar.
| tomatotomato37 wrote:
| This isn't an argument of ChatGPT vs nothing. This is an
| argument of "external" ChatGPT vs some other AI sitting on
| your own secured hardware, maybe even a branch of ChatGPT.
| hgsgm wrote:
| > some other AI sitting on your own secured hardware, maybe
| even a branch of ChatGPT.
|
| Where can I, a random employee, get that? I know how to get
| ChatGPT.
| ghaff wrote:
| You can't. So maybe you as a random employee should just
| do without whatever IT hasn't approved whether you agree
| or not.
| dougb5 wrote:
| Given that they use all the labor of the Internet without
| attribution, we should assume that they will use every additional
| drop of data we give to them for their own ends.
| ChatGTP wrote:
| This is what I hate about it.
| DoingIsLearning wrote:
| Funnily enough I wrote a cautionary comment on this just 2 days
| ago :
|
| https://news.ycombinator.com/item?id=35299695
| jedberg wrote:
| No one cares about security because there is no consequence for
| getting it wrong. Look at all the major breaches ever. And look
| specifically at the stock price of those companies. They took
| small short term hits at best.
|
| Worst case the CISO gets fired and then they all play musical
| chairs and end up in new roles.
|
| Heck, even Lastpass, ostensibly a _security company_ , doesn't
| seem particularly affected by their breach.
|
| My point is, especially with ChatGPT, where it can reasonably 10x
| your productivity, most people will be willing to take the risk.
| baxtr wrote:
| We went pretty quickly from:
|
| No way I'm giving Google any of my data! I will use 5 different
| browsers in incognito mode and never log in.
|
| To ->
|
| Sure I will login with my name and email and feed you as much of
| my most personal thoughts and data as I can dear ChatGPT!
| grammers wrote:
| Not quite. These are the same people that use only Chrome while
| being logged in to their Google account. Convenience wins.
| smt88 wrote:
| > _These are the same people that use only Chrome while being
| logged in to their Google account_
|
| This situation is much dumber than that. ChatGPT is very
| clear that you shouldn't give it private data and that
| anything you type into it can/will be used for training.
|
| Google is nowhere near that level of transparency.
| contravariant wrote:
| Any examples where those two were the same person?
|
| Because both types of people have always existed. Heck, lack of
| vigilance among the ancient Greeks is what put the Trojan in
| Trojan horse.
| croes wrote:
| I bet you'd find some on HN.
|
| Sometimes curiosity beats caution.
| smodo wrote:
| It was a lack of vigilance among the Trojans, actually. The
| Greeks did the burning and pillaging. So uh OpenAI is the
| Greeks, ChatGPT is the horse and Microsoft is king Menelaos
| or something. Achilles is dead, but he did make TempleOS.
| philsquared_ wrote:
| Google takes your data and sells it. Literally making your data
| available to the highest bidder. Is OpenAI doing that? If
| Google existed in its current form during the early internet it
| would be classified in the same category as BonziBuddy.
| Spyware. That is what Google is. So I can very reasonably
| understand why people would trust OpenAI with data they
| wouldn't trust Google with. OpenAI hasn't spit in the face of
| its users yet.
| SketchySeaBeast wrote:
| > Google takes your data and sells it. Literally making your
| data available to the highest bidder.
|
| But it doesn't, does it? It sells the fact that it knows
| everything about everyone and can get any ad to the perfect
| people for it. It's not going on the open market and telling
| people I regularly buy 12 lbs of marshmallow fluff and then
| use it in videos I keep on my google drive.
| noncoml wrote:
| > Google takes your data and sells it. Literally making your
| data available to the highest bidder.
|
| Even if they are not doing it now(?), what makes you think
| that they will not do so in the future? It's not like your
| data has an expiration date.
| philsquared_ wrote:
| Because they are completely different business models. If
| OpenAI decides to become an advertising behemoth then I
| would show concern. Right now they use your data for
| training (when they use it).
| Jensson wrote:
| OpenAI is selling others data in their model responses.
| Selling others data is their main business model.
|
| If it uses user data to train their models other users
| could ask "Show me the code for gmail spam filters", and
| if it was trained on engineers refactoring that spam
| filter in ChatGPT chances are it would give you the code.
| If that doesn't count as "selling user data" I don't know
| what is. They not only sell it, they nicely package and
| rewrite it to make it easy to avoid copyright claims!
| JohnFen wrote:
| OpenAI has already demonstrated that they're all in for
| maximizing profit. They may not be advertisers, but
| advertisers aren't the only sorts of companies that make
| bank by selling personal data.
|
| I see no reason to think OpenAI would leave that money on
| the table.
| dragonwriter wrote:
| > Google takes your data and sells it. Literally making your
| data available to the highest bidder.
|
| No, Google takes money to present ads to people of different
| demographics, and _uses_ your data to do that. It doesn't
| sell your data, which is, in fact, their competitive edge in
| ads - selling your data would be selling the cow when they'd
| prefer to sell the milk.
| jankeymeulen wrote:
| Not really. Even the most evil Google one can imagine would
| realise "your data" is the most valuable thing they possess,
| selling it would be bad for business. They're selling ads to
| the highest bidder who's looking for someone with a profile
| based on your data, but not your data itself.
| JohnFen wrote:
| True, but that's not actually any better. And it still
| counts as selling your data, just indirectly.
| JohnFen wrote:
| > I can very reasonably understand why people would trust
| OpenAI with data they wouldn't trust Google with. OpenAI
| hasn't spit in the face of its users yet.
|
| In other words, having been burnt once by touching a flame,
| the conclusion these people draw is that the problem was with
| _that particular_ flame and they 're fine with reaching for a
| different one?
| bcrosby95 wrote:
| My problem with Google is they'll ban me from gmail for
| something I do on youtube.
| interstice wrote:
| Could there be, e.g., a browser extension for scrubbing sensitive
| data from input on paste?
|
| I'm hoping OpenAI will implement something like this on their end
| soon, like data monitoring apps do (Sentry etc.), but if they
| don't, client-side is an option.
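|
| A minimal sketch of the kind of scrubbing such a tool might do,
| shown here as a standalone Python filter rather than an actual
| extension; the patterns are illustrative and nowhere near
| exhaustive:
|
| import re
|
| PATTERNS = {
|     "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
|     "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),
|     "OPENAI_KEY": re.compile(r"sk-[A-Za-z0-9]{20,}"),
|     "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
| }
|
| def scrub(text: str) -> str:
|     # Replace likely-sensitive substrings with placeholder tags
|     # before the text gets pasted anywhere external.
|     for label, pattern in PATTERNS.items():
|         text = pattern.sub(f"<{label}>", text)
|     return text
|
| if __name__ == "__main__":
|     sample = "Contact jane@corp.example, key AKIAABCDEFGHIJKLMNOP"
|     print(scrub(sample))  # Contact <EMAIL>, key <AWS_KEY>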
| MarkusWandel wrote:
| Wouldn't it be trivial to add a "read-only" mode to the LLM's
| operation, where it uses stored knowledge to answer queries but
| doesn't ingest new knowledge from those queries?
| johntiger1 wrote:
| Why isn't it read-only by default? It's not even connected to
| the internet.
| ben_w wrote:
| ChatGPT, and I think all the GPT LLMs, is _only_ accessible
| over the internet as far as I can tell.
|
| And the thumbs up/down are there on the chat interface
| because it's partly trained by reinforcement from human
| feedback.
| og_kalu wrote:
| Nope. LLMs don't use the internet for inference at all
| unless you give them access to a web search API or something
| like that. ChatGPT is just too massive to run on any local
| machine. But make no mistake, it does not require the
| internet.
| humanistbot wrote:
| From https://help.openai.com/en/articles/7039943-data-usage-
| for-c...:
|
| > You can request to opt out of having your content used to
| improve our services at any time by filling out this form (http
| s://docs.google.com/forms/d/1t2y-arKhcjlKc1I5ohl9Gb16t6S...).
| This opt out will apply on a going-forward basis only.
|
| It goes to a Google form, which is I guess better than them
| building their own survey platform from scratch that may have
| more vulnerabilities.
| eternalban wrote:
| > them building their own survey platform from scratch
|
| Funny. This Nobel Prize winner raises an interesting
| question:
|
| _If your AI is so great at coding, why is your software so
| buggy?_ : https://paulromer.net/openai-bug/
| meghan_rain wrote:
| I'd be worried this is also the "how to get banned from
| OpenAI in the near future" form. and if OpenAI retains a
| monopoly like Google does for search, you are basically
| screwed.
| visarga wrote:
| No, everyone will have GPT-4 level AI in 6-12 months.
| ben_w wrote:
| I'm old enough to remember when newspapers reported hackers
| being banned from using computers.
|
| And IP pirates banned from using the internet. Actually,
| that one I remember; I voted against my local MP after they
| passed a law to make that the norm.
|
| We don't yet have a social, let alone legal, norm for
| antisocial use of LLMs; even with social media, government
| rules are playing catch-up with terms of service, and
| Facebook is old enough that if it was human it could now
| vote.
|
| So, yes, likewise computers/internet/social media, being
| banned from an LLM _if it 's a monopoly_ is going to
| seriously harm people.
|
| But that is likely to be a big "if". The architecture and
| the core training data for GPT-3 aren't a secret, and
| that's already pretty impressive even if lesser than 3.5
| and 4.
| w_for_wumbo wrote:
| This is the issue with a tool so powerful, you can't just tell
| people not to use it, or to use it responsibly. Because there's
| too much incentive for them to use it. If it saves hours of a
| person's workday, and they're not seeing any of the harm caused
| from data leakage, there's no incentive for them to not use it.
|
| Which is why a private option is so critical. Not fighting
| against human nature means providing a way to use the tool
| safely.
| raincole wrote:
| > you can't just tell people not to use it
|
| Uh, why can't you tell people not to use it...? If security is
| that important for your company, of course you can tell your
| employees which tools to use.
|
| A fun fact: in many areas of TSMC, _smart phones_ are banned.
| No one says "you can't just tell people not to use smart
| phones."
| q845712 wrote:
| https://www.bleepingcomputer.com/news/technology/fitness-
| tra...
| ghaff wrote:
| 1.) It's not at all clear it's nearly as powerful as you think.
| Certainly in my domain--writing about various topics--it's not.
|
| 2.) Of course, you can tell people not to use it. Unlike people
| at SV companies apparently, people in government and government
| contractors accept restrictions like not having phones in
| secure labs _all the time_. Start firing or even prosecuting
| people and people will discover very quickly they don't really
| need some tool.
|
| And, yes, private versions of this sort of thing helps a lot.
| JohnFen wrote:
| There's a dev here who is using ChatGPT extensively in his
| work. The rest of the team is just waiting for him to get
| caught and fired. Sharing company data with unapproved external
| entities is very definitely a firing offense.
| mym1990 wrote:
| If you really care about your company's security, you should
| report it, otherwise you are just complicit.
| JohnFen wrote:
| I agree.
| yathrowaway01 wrote:
| Glad I work for a company where the CEO pays for everyone's
| ChatGPT Plus for the devs. If you think your code is special
| then you're wrong.
| kristianp wrote:
| Does chatgpt plus collect data for training, or does it
| have more privacy than the free offering?
| JohnFen wrote:
| That entirely depends on the code. It's not that the code
| is special, it's that the code can reveal things that are
| competitive advantages (future plans, etc.)
| msm_ wrote:
| But you created a throwaway account specifically to reply
| in this thread?
|
| Unless your company really has nothing to hide, it's easy
| to accidentally dump a company secret or an API key in a
| chat session. Of course if everyone is aware of this and
| constantly careful then you may be OK.
| yathrowaway01 wrote:
| That's because accounts get shadow banned all the time
| when people get upset when you point out hard truths.
|
| If you're copy pasting API keys or such into ANYTHING,
| you probably shouldn't be a programmer to begin with.
|
| It's like people who use root account key/secret
| credentials in their codebase. It's not AWS's fault you
| got a large bill or got hacked, its because you're dumb.
| dizhn wrote:
| I posted my OpenAI token into a GitHub issue today,
| thinking I'd just kill it right away, which I did, but
| there was already an email from OpenAI letting me know
| that my token had been noticed as public and had thus
| been revoked.
| bcrosby95 wrote:
| I regularly say shit that pisses people off here and I
| have never been shadow banned. It sounds like your "hard
| truths" are something other than just "hard truths",
| and/or you have a persecution complex.
| yathrowaway01 wrote:
| Your Karma is over 7000, if you get downvoted your stuff
| is still visible.
| jupp0r wrote:
| If your code has API keys in it, you have bigger problems
| than ChatGPT.
| croes wrote:
| You are still transferring your business data to an
| external entity, but on top of it you pay for it.
|
| And if you think that there is no special code then you're
| wrong.
| krono wrote:
| If the contract says the code is special, then the code is
| special.
| arcticfox wrote:
| If they're more productive by doing it, I think it's an equal
| chance said dev gets promoted.
| JohnFen wrote:
| He's not more productive, but even if he were, it wouldn't
| affect his getting fired.
|
| Productivity isn't everything.
| croes wrote:
| Why?
|
| Uploading code to ChatGPT can be done by trainees.
| robbywashere_ wrote:
| The real innovation of chatgpt will be how you go about getting
| permission to use it from your corpo middle management overlords.
| phn wrote:
| Employees have been feeding sensitive data to online translators
| in the same way.
|
| Not to say it isn't a problem, but it's not an AI/chatGPT
| specific one.
| twblalock wrote:
| If you think this is bad, imagine what JSONLint has seen over the
| years.
| nbzso wrote:
| OpenAI. The heist of the century. I am waiting for an A.I.-
| generated blockbuster in the near future.
| yieldcrv wrote:
| I was just watching Altman's interview from the Lex Fridman
| podcast a few days ago
|
| It really does feel like YC was a plot to fund the harvesting
| of it all with an OpenAI climax. Not a serious conclusion I have,
| its just funny to watch it unfold, as if nobody even cares
| about the optics.
| syntaxing wrote:
| I'm curious if anyone's employer has set up their own LLM. My
| employer has a couple of A100s sitting around which could easily
| host a couple of instances of 65B LLaMA or Alpaca. Convincing
| upper management to allow it is the hard part.
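|
| For anyone in the same spot, a minimal sketch of what serving a
| local model on in-house GPUs can look like, assuming the weights
| are already on disk and the transformers/accelerate libraries
| are installed; the model path and generation settings are
| placeholders, not a recommendation:
|
| from transformers import AutoModelForCausalLM, AutoTokenizer
|
| MODEL_PATH = "/models/llama-65b"  # hypothetical local path
|
| tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
| model = AutoModelForCausalLM.from_pretrained(
|     MODEL_PATH,
|     device_map="auto",   # shard layers across the available A100s
|     torch_dtype="auto",
| )
|
| prompt = "Summarize the following internal log excerpt:\n..."
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
| output = model.generate(**inputs, max_new_tokens=256)
| print(tokenizer.decode(output[0], skip_special_tokens=True))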
| tiborsaas wrote:
| Team up with someone from sales, marketing :)
___________________________________________________________________
(page generated 2023-03-27 23:00 UTC)