[HN Gopher] Employees are feeding sensitive data to ChatGPT, rai...
       ___________________________________________________________________
        
       Employees are feeding sensitive data to ChatGPT, raising security
       fears
        
       Author : taubek
       Score  : 280 points
       Date   : 2023-03-27 18:32 UTC (4 hours ago)
        
 (HTM) web link (www.darkreading.com)
 (TXT) w3m dump (www.darkreading.com)
        
       | danielmarkbruce wrote:
       | Seems like a temporary problem. Surely OpenAI will have a version
       | which runs in a customers public cloud VPC, orchestrated by
       | OpenAI.
        
       | VLM wrote:
       | I believe there were FUD pieces like this when internet search
       | engines were rolled out, and again when social media became
        | popular. I suppose it's universal for new technologies.
       | 
        | I had an interview a while ago at a place where, during the
        | phone screen, "they can't talk about their tech stack in
        | detail", so I looked on LinkedIn and figured out their entire
        | tech stack before the onsite interview. Come on guys, according
        | to LinkedIn, you
       | have an entire department of people doing AWS with Terraform and
       | Ansible, you don't have to pretend you can't say it in public.
        
       | Mizoguchi wrote:
       | "In one case, an executive cut and pasted the firm's 2023
       | strategy document into ChatGPT and asked it to create a
       | PowerPoint deck."
       | 
        | There's really not much you can do here. This is a complete lack
        | of very basic common sense. Having someone like this in your
       | business, particularly at the executive level, is a liability
       | regardless of ChatGPT.
        
       | phineyes wrote:
       | Let's not forget that we're also feeding in all our code into
       | OpenAI Codex.
        
         | nodesocket wrote:
          | Which I assume also means sensitive code that may be
          | .gitignore'd but is still being pushed up to OpenAI, i.e.
          | secrets, passwords, API keys.
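          | 
          | A minimal pre-paste check along those lines (just a sketch;
          | the patterns are illustrative and nowhere near exhaustive,
          | real scanners like gitleaks or trufflehog ship far larger
          | rule sets):
          | 
          | import re
          | 
          | # Illustrative secret patterns only.
          | SECRET_PATTERNS = [
          |     re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key id
          |     re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
          |     re.compile(r"(?i)(api_key|password|secret)\s*[=:]\s*\S+"),
          | ]
          | 
          | def looks_sensitive(text: str) -> bool:
          |     # Refuse to paste anything that trips a pattern.
          |     return any(p.search(text) for p in SECRET_PATTERNS)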
        
       | buzzscale wrote:
       | This is one of the reasons Databricks created Dolly, a slim LLM
       | that unlocks the magic of ChatGPT. A homegrown LLM that can tap
        | into/query the datasets of all the data in an organization's Data
       | Lakehouse will be hugely powerful.
       | 
       | I am working with customers that are looking to train a homegrown
       | LLM that they host and have blocked access to ChatGPT.
       | 
       | https://www.datanami.com/2023/03/24/databricks-bucks-the-her...
       | 
       | https://news.ycombinator.com/item?id=35288063
        
         | ngngngng wrote:
         | This reads like you had an LLM write an ad for you
        
       | cloudking wrote:
        | ChatGPT Business Edition seems pretty obvious and I'd be surprised
       | if OpenAI isn't already working on it. Separate models for each
       | customer, data silos and protection. The infra is already there
       | on Azure.
        
         | repler wrote:
         | It actually is on Azure, exactly as you described.
         | 
         | https://learn.microsoft.com/en-us/azure/cognitive-services/o...
        
           | cloudking wrote:
            | Yep, they just need to provide a business-specific frontend
            | chat UI.
        
       | Rzor wrote:
       | First thing that went through my mind as I read the headline was
        Zuckerberg's comments on people posting their info on Facebook.
        
       | wantsanagent wrote:
       | We saw these same fears with the release of Gmail. Why would you
        | trust your _email_ to _Google?!!_ Aren't they going to train
       | their spam filters on all your data? Aren't they going to sell
       | it, or use it to sell you ads?
       | 
       | Corporations constantly put their most sensitive data in 3rd
       | party tools. The executive in the article was probably copying
       | his company strategy from Google docs.
       | 
       | Yes, there are good reasons for concern, but the power of the
       | tool is simply too great to ignore.
       | 
       | Banning these tools will go the same way as prohibition did in
       | the US, people will simply ignore it until it becomes too absurd
       | to maintain and too profitable to not participate in.
       | 
       | Companies which are able to operate _without_ these fears will
        | move faster, grow more quickly, and ultimately challenge the
        | companies that stay restricted.
       | 
       | Now I think the article _should_ be a wake-up call for OpenAI.
       | Messaging around what is and what is not used for training could
       | be improved. Corporate accounts for Chat with clearer privacy
       | policies would be great and warnings that, yes, LLMs do memorize
       | data and you should treat anything you put into a free product on
        | the web as fair game for someone's training algorithm.
        
         | Thorentis wrote:
         | > too profitable to not participate in
         | 
          | Sorry, but I really struggle to see how a non-AI company will
         | actually become more profitable simply by getting their
         | employees to use ChatGPT. In fact, the more companies that use
         | it, the more demand there will be for "human only" services.
        
         | raincole wrote:
          | If your company's code is all in repositories on GitHub (or
          | Bitbucket, or any similar service), worrying about ChatGPT is
          | quite silly.
          | 
          | And on the other hand, if your company doesn't use GitHub etc.
          | due to security concerns, that's a very good sign that you
          | need to ban ChatGPT too.
        
         | paxys wrote:
          | Trusting Gmail with corporate communication _was_ a terrible
          | idea (and explicitly illegal in a lot of industries), and
          | companies didn't start to adopt it until Google released
         | an enterprise version with table-stakes security features like
         | no training on the data, no ad targeting, auditing, compliance
         | holds and more.
         | 
         | There's a huge difference between trusting a third party
         | service with strict security and data privacy agreements in
         | place vs one that can (legally) do whatever they want with your
         | corporate data.
        
           | ChatGTP wrote:
           | This is vital for professional adoption. We cannot live in a
           | world where basically all commercial information, all secrets
           | are being submitted to one company.
        
           | fnordpiglet wrote:
           | Was?
        
             | kccqzy wrote:
             | Well it was, until Google Workspace (G Suite) came along
             | and provided essentially an enterprise version of Gmail.
        
               | fnordpiglet wrote:
                | I still question the wisdom of giving data to the world's
                | largest spyware company that makes its money by
               | converting mass surveillance into dollars.
        
         | jackson1442 wrote:
         | I think this is different in that ChatGPT is expressly using
         | your data as training in a probabilistic model. This means:
         | 
         | * Their contractors can (and do!) see your chat data to tune
         | the model
         | 
         | * If the model is trained on your confidential data, it may
         | start returning this data to other users (as we've seen with
         | Github Copilot regurgitating licensed software)
         | 
         | * The site even _tells you_ not to put confidential data in for
         | these reasons.
         | 
         | Until OpenAI makes a version that you can stick on a server in
         | your own datacenter, I wouldn't trust it with anything
         | confidential.
        
           | hgsgm wrote:
           | Google had all the same problems, until it found a balance of
           | functionality, security, and privacy.
           | 
           | OpenAI just hasn't started to try adding privacy and security
           | yet.
        
           | waboremo wrote:
            | Sticking it in your own datacenter doesn't really prevent any
            | of these problems (except maybe #2); only now your leaks are
            | internal, and because of the false sense of security, you
            | might wind up leaking far more confidential and specific
            | information (i.e. an executive leaking to the rest of the
            | team in advance that they are planning layoffs for noted
            | reasons, whereas that executive might have used vaguer terms
            | when speaking to the public ChatGPT).
        
             | mholm wrote:
             | Sticking it in your own private datacenter would imply that
             | you can opt in or out of using your data to train the next
             | generation. ChatGPT does not dynamically train itself in
             | realtime.
        
               | waboremo wrote:
               | The implication is that you would bother with ChatGPT at
               | all to train it on the relevant local data, the key value
               | aspect to ChatGPT beyond general public use.
        
           | renewiltord wrote:
           | Not that I don't expect them to do this, but how is it
           | expressly said to be so?
           | 
           | https://help.openai.com/en/articles/5722486-how-your-data-
           | is...
           | 
           | > _OpenAI does not use data submitted by customers via our
           | API to train OpenAI models or improve OpenAI's service
           | offering. In order to support the continuous improvement of
           | our models, you can fill out this form to opt-in to share
           | your data with us. Sharing your data with us not only helps
           | our models become more accurate and better at solving your
           | specific problem, it also helps improve their general
           | capabilities and safety._
        
             | avereveard wrote:
             | Hehe old tos trick. Here it doesn't say "will never use"
             | but say "does not use" and I wager below or somewhere will
             | say that they can change the tos at any time in the future
             | unilaterally
        
             | kristofferR wrote:
             | Did you read the next paragraph?
             | 
             | > When you use our non-API consumer services ChatGPT or
             | DALL-E, we may use the data you provide us to improve our
             | models.
        
               | renewiltord wrote:
               | I _definitely_ did not correctly read that. Thanks for
                | the clarification. Totally misread the 'our API' bit!
               | 
               | It's also in the FAQ:
               | https://help.openai.com/en/articles/6783457-chatgpt-
               | general-...
               | 
               | > _Will you use my conversations for training?_
               | 
               | > _Yes. Your conversations may be reviewed by our AI
               | trainers to improve our systems._
        
         | marcosdumay wrote:
         | > Companies which are able to operate without these fears will
         | move faster
         | 
         | Or the fears are real and companies that operate without them
         | will be exploited, or extinguished for annoying their
         | customers.
        
       | yellow_postit wrote:
        | This cycle happens regularly, and it seems oftentimes the service
       | provider wises up and charges for extra controls.
       | 
       | Yammer pre-Microsoft and nowadays Blind -- lots of "insider"
       | information seemingly posted.
       | 
        | As usage goes up, the target size and the opportunity cost both
        | go up.
        
       | agloe_dreams wrote:
       | Meanwhile over at Github Copilot...
       | 
       | Hahahahahahaha
        
         | dubbelboer wrote:
          | If you're using GitHub already then Copilot isn't seeing
         | anything new.
        
           | agloe_dreams wrote:
           | Correct, but that level of security is expected from GitHub
           | proper, they have all sorts of independent security reviews
           | for their partners. Does all of that exist for Copilot?
        
             | grepfru_it wrote:
             | Yes
        
       | WhereIsTheTruth wrote:
       | That's why privacy is important in the real world
       | 
       | Imagine if everyone knew your inner secrets just by looking at
       | you at the bar..
       | 
        | Why is it different online? I have no idea.. well, I kinda know,
        | but.. oh well.. we deserve it, I guess
        
       | bob1029 wrote:
       | We published an internal policy for AI tools last week. The basic
       | theme is: "We see the value too, but please don't copypasta our
       | intellectual property until we get a chance to stand up something
       | internal."
       | 
       | We've granted some exceptions to the team responsible for
       | determining _how_ to stand up something internal. Lots of
        | shooting in the dark going on here, so I figured we would need to
        | allow some divulgence of our IP to public tools to gain traction.
        
         | meghan_rain wrote:
          | Let us know when you've figured out a way to host something
          | with the quality of ChatGPT internally :-)
        
           | visarga wrote:
            | You can use ChatGPT inside Azure, like any other service.
           | It's not the same one used by OpenAI, and there are different
           | guarantees.
           | 
           | > ChatGPT is now available in Azure OpenAI Service
           | 
           | https://azure.microsoft.com/en-us/blog/chatgpt-is-now-
           | availa...
        
             | meghan_rain wrote:
            | Sorry, but the whole point is to not use a closed-source
            | third-party API with a dubious privacy policy run by a
            | multinational surveillance-capitalism megacorporation.
        
           | philsquared_ wrote:
           | Just use the API? It deletes your data after 30 days...
        
           | bob1029 wrote:
           | Even if we had a 100% private ChatGPT instance, it wouldn't
           | fully cover our internal use case.
           | 
           | There is way more context to our business than can fit in
            | 4/8/32k tokens. Even if we could fit within the 32k token
            | budget, it would be _very_ expensive to run like this 24/7.
            | Fine-tuning
           | a base model is the only practical/affordable path for us.
        
             | cmelbye wrote:
             | You can retrieve information on demand based on what the
             | user is asking, like this:
             | https://github.com/openai/chatgpt-retrieval-plugin
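              | 
              | The general shape of that retrieve-then-prompt idea,
              | sketched with the OpenAI Python client of the time (the
              | toy in-memory store and the prompt format are my
              | assumptions, not the plugin's actual code):
              | 
              | import openai  # reads OPENAI_API_KEY from the env
              | 
              | def embed(text: str) -> list[float]:
              |     resp = openai.Embedding.create(
              |         model="text-embedding-ada-002", input=[text])
              |     return resp["data"][0]["embedding"]
              | 
              | def answer(query: str,
              |            docs: dict[str, list[float]]) -> str:
              |     # Dot product ~ cosine: ada-002 vectors are unit
              |     # length. docs maps text -> precomputed embedding.
              |     q = embed(query)
              |     dot = lambda a, b: sum(x * y for x, y in zip(a, b))
              |     context = max(docs, key=lambda d: dot(q, docs[d]))
              |     resp = openai.ChatCompletion.create(
              |         model="gpt-3.5-turbo",
              |         messages=[{"role": "user", "content":
              |                    f"Context:\n{context}\n\nQ: {query}"}])
              |     return resp["choices"][0]["message"]["content"]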
        
           | knodi123 wrote:
           | I can host llama's 7b model internally. It hallucinates more
           | often than not and has a tendency to ramble, but dammit it's
           | local and secure!
        
             | walthamstow wrote:
             | What's it like with code, documentation, regex, etc?
             | 
             | That's all I use ChatGPT for. I don't need it to be able to
             | write poetry.
        
               | beiller wrote:
                | I did not verify this regex; on a surface scan it
                | seems OK:
               | 
               | ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 --temp 0.7
               | --top_k 40 --top_p 0.5 --repeat_last_n 256
               | --repeat_penalty 1.17647 -n 1024 -p $'Here is a handy
               | short form regex to validate an email address: '
               | 
               | Here is a handy short form regex to validate an email
               | address:
                | ^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-z\.]{2,6})+$ The
               | first character can be either uppercase or lower case.
               | The second group must contain at least one letter and the
               | third group may contain any number of characters (up to
               | 5). The last part ensures that it ends with @ followed by
               | two more letters separated by dots. If you want to make
               | sure that your input string contains only valid
               | characters for emails then use this regex instead:
                | \A[\w.]*@[\w.]*\.\w{1,4}\z
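                | 
                | For what it's worth, a quick local spot-check of that
                | first pattern (a sketch, not a real test suite):
                | 
                | import re
                | 
                | pat = re.compile(
                |     r"^([a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-z\.]{2,6})+$")
                | 
                | for addr in ["user@example.com",
                |              "first.last+x@sub.domain.org", "nope"]:
                |     print(addr, "->", bool(pat.match(addr)))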
        
               | bob1029 wrote:
               | This looks like good performance. We are keeping an open
               | mind with regard to actually-open alternatives.
        
       | [deleted]
        
       | NoZebra120vClip wrote:
       | This is nothing new at all. How many people have Grammarly
       | plugins installed? They are advertising aggressively, so I'd
       | think it is the new hotness. Don't tell me Grammarly is not
       | hoovering up all of the Slack, Word, Docs, and Gmail data that
       | everyone sends it, and holding on for some future purpose. We'll
       | see.
        
         | sophiabits wrote:
         | Grammarly is an OpSec nightmare that's somehow managed to slip
         | under most people's radar. I know folks who won't use the Okta
         | browser extension because of the extensive permissions it asks
         | for, but will happily use Grammarly on everything.
         | 
         | Last time I looked (maybe things have improved...?) Grammarly
         | would automatically attach itself to any text box you
         | interacted with, and immediately send all of your content to
         | their servers for processing. How this software gets past IT
         | departments is a mystery to me.
        
         | belter wrote:
         | 2017... "Grammarly Vulnerability Allows Attackers To See
         | Sensitive Data of Their Customers" -
         | https://www.invicti.com/blog/web-security/grammarly-vulnerab...
        
       | brightball wrote:
       | This was my first concern when it came to IDE plugins.
        
       | bradgessler wrote:
       | Apple will make a killing if they can deploy private LLMs to
       | their M-series chips.
        
       | jcims wrote:
        | The wild thing is that you can just annotate the content with the
        | associated organization information, then ask GPT to tell you
        | what it sees in the logs.
        
       | tombert wrote:
       | This is scary, but it doesn't surprise me even in the slightest.
       | ChatGPT is useful for so many things that it's extremely tempting
       | to convince yourself that you should trust it.
       | 
       | For example, I was having some issues with my LTO-6 drive
       | recently, and I had to finagle through a bunch of arcane server
       | logs to diagnose it. I had the idea of simply copypasting the
       | logs into ChatGPT and having it look at them, and it quickly
       | summarized the logs and told me what things to look for. It
       | didn't directly solve the problem, but it made the logs 100x more
       | digestible and I was able to figure out my problem. It made a
       | problem that probably would have taken 2-3 hours of Googling take
       | about 20 minutes of finagling.
       | 
       | I'm not doing anything terribly interesting or proprietary on my
       | home server, so I didn't really have any reservations sharing
       | dmesg logs with it, but obviously that might not be the case in a
       | company. Server logs can often have a _ton_ of data that could be
       | useful for a competitor (whether it should be there or not), and
        | someone not paying attention to what they're pasting into
       | ChatGPT could easily expose that data.
        
       | johntiger1 wrote:
       | It's alright, I just told it that I don't consent to my data
       | being used. Checkmate openAI!
        
       | reneberlin wrote:
        | Just because your code is art, of course, which should be weighed
        | in gold and copyrighted for the next 2000 years, living in a cold
        | wallet. Did you get the memo that humans implemented all the
        | faulty security of the past decades by accident?
        
       | balls187 wrote:
       | And what percent of employees send sensitive information via
       | unencrypted email?
        
       | deltree7 wrote:
       | Classic fearmongering article targeted to HN crowd.
       | 
       | i) In this world, there are very few people whose private
       | conversation is worth anything to anybody (celebrities,
       | journalists -- so around 10,000 people)
       | 
        | ii) A tiny, tiny percentage of information is truly secret (mostly
       | private keys).
       | 
       | iii) Business strategies are mostly a result of execution, not
        | any 'trade secrets'. Meta will succeed because it has executed
        | its metaverse strategy, not because they kept the metaverse
       | strategy secret.
       | 
        | People who take risks and don't care about irrelevant details
        | (just as they took risks with internet shopping, cloud, SaaS)
        | will win. Losers like the ones who thought AWS would steal their
        | data will be left behind
        
       | nemo44x wrote:
       | Looking forward to AI on chip not too far down the road. As long
       | as we use APIs for the model itself we can't really use it for
       | much.
        
       | sigstoat wrote:
        | when it first came out and my boss was beside himself about how
       | cool it was, he was feeding it all of his emails with other
       | businesses to have it clean them up. boggled my mind.
        
         | dannyobrien wrote:
         | do those other businesses use gmail? does your company?
        
           | SketchySeaBeast wrote:
           | I think those are different models.
           | 
            | Gmail has a vested interest in keeping any knowledge it
            | gains about you secret - its competitive advantage is
           | knowing more about you than anyone else does.
           | 
           | ChatGPT's strength is its ability to clearly communicate the
           | knowledge it has (including training data it gains from
           | people it interacts with) to give you good responses.
        
       | yawnxyz wrote:
        | I think there's more fear of OpenAI leaking data than, say,
       | Airtable or Notion or Github or AWS/S3 or Cloudflare or Vercel or
       | some other company that has gobs of a company's data. Microsoft
       | also has gobs of data: anything on Office and Outlook is your
        | company data -- but the fear that they'll leak (intentionally or
        | accidentally) is somehow more contained.
       | 
       | If we want to be intellectually honest with ourselves, we can
       | either be fearful and have a plan to contain data from ALL of
        | these companies, OR we address the risk of data leaks through
       | bugs as an equal threat. OpenAI uses Azure behind the scenes, so
       | it'll be as solid (or not solid) as most other cloud-based tools
       | IMO.
       | 
        | As for your data training their models: OpenAI is mostly a
       | Microsoft company now. Most companies use Microsoft for
       | documentation, code, communications, etc. If Microsoft wanted to
       | train on your data, they have all the corporate data in the
       | world. They would (or already could!) train on it.
       | 
       | If there's a fear that OpenAI will train their model on your data
       | submitted through their silly textbox toy, but NOT through
       | training on the troves of private corporate data, then that fear
       | is unwarranted too.
       | 
        | This is where OpenAI should just add a "corporate" tier, charge
        | more for it, and make it HIPAA/SOC2/whatever compliant, basically
        | to assuage the fears of corporate customers.
        
         | rrdharan wrote:
         | 1. Azure has the worst security of the major cloud providers;
          | multiple insanely terrible RCEs and openly readable DB exposures.
         | 
         | 2. Azure infrastructure still likely has far better
         | security/privacy by virtue of all their compliance, (HIPAA,
         | FedRAMP, ISO certifications etc.) than whatever startup-move-
         | fast-ignore-compliance crap OpenAI layers on top of it in their
         | application layer.
        
         | philwelch wrote:
         | For most of those tools you can get your own self-hosted
         | version if you're worried about your data.
        
         | krisoft wrote:
         | > I think there's more fear of OpenAI leaking data than say,
         | Airtable or Notion or Github or AWS/S3 or Cloudflare or Vercel
         | or some other company that has gobs of a company's data.
         | 
          | There is zero fear. OpenAI openly writes that they are going
          | to use ChatGPT chats for training -- on the popup modal they
          | show you when you load the page. That is not a fear, that is a
          | promise that they will leak whatever you tell them.
         | 
          | If I tell you "please give me a dollar, but I warn you you will
          | never see it again", would you describe your feeling over the
          | transaction as "fearful that the loan won't be repaid"?
        
       | satvikpendem wrote:
       | I noticed this too, this is why I'm working on a startup that
       | lets you train your data internally through a self-hosted OSS LLM
       | product. It's going to start off with just reading emails for now
       | until we can integrate with other products too, like
       | spreadsheets, Airtable, databases, etc. Basically an OSS version
        | of OpenAI ChatGPT plugins, I suppose. If you're interested,
        | message me at my email in my profile.
        
       | crop_rotation wrote:
       | In my experience most corporate employees just take the path of
        | least resistance. It is not uncommon for people to paste non-
        | public data into websites just to do JSON formatting, and paste
       | base64 strings to random websites just to decode them. So just
       | telling people not to do something won't accomplish much. Most
       | corporate employees also somehow think they know better than the
       | policy.
       | 
        | Any company that doesn't want to feed data into ChatGPT needs to
        | proactively block both ChatGPT and any website serving as a
        | wrapper over it.
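        | 
        | A minimal sketch of that kind of block as a mitmproxy addon,
        | assuming employee traffic is already routed through such a
        | proxy (the host list is illustrative, and keeping it current
        | for wrapper sites is exactly the hard part):
        | 
        | from mitmproxy import http
        | 
        | BLOCKED = ("chat.openai.com", "api.openai.com")
        | 
        | class Blocker:
        |     def request(self, flow: http.HTTPFlow) -> None:
        |         # Short-circuit requests to blocked hosts with a 403.
        |         if flow.request.pretty_host.endswith(BLOCKED):
        |             flow.response = http.Response.make(
        |                 403, b"Blocked by policy",
        |                 {"Content-Type": "text/plain"})
        | 
        | addons = [Blocker()]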
        
         | japhyr wrote:
         | > and any website serving as a wrapper over it
         | 
         | I agree this would be a good move, but it's going to be harder
         | and harder to do that definitively.
        
           | sophiabits wrote:
           | Perhaps we need an "LLM Block" browser extension? :)
        
         | itronitron wrote:
         | A while back I got to hear about how the IT team running my
         | then-employer's internal time reporting tool was sending all
         | the usage data through Google Analytics and how neat that was
          | for them to look at :\
         | 
         | I shudder to think what they are doing now.
        
       | guestbest wrote:
       | This is a user led data leak that ranks up there with Facebook
       | and LinkedIn asking for email passwords to "look for your
       | contacts to add".
        
       | varunjain99 wrote:
       | Not only do you have to worry about employees directly sharing
       | data, but many companies are also just wrappers around GPT. Or
       | they may use your data in the future to roll out new AI services.
       | 
       | While this is not a new problem -- employees share sensitive data
       | with Google all the time -- the data leakage will be more clear
       | than ever. With ads-based tracking and Google search, the leakage
       | was very indirect. With generative AI, it can literally
       | regurgitate memorized documents.
       | 
       | The security risk goes beyond data exfiltration. Folks are
       | already trying to teach the AI incorrect information by spamming
       | it with something like 2+2 = 5.
       | 
       | Data exfiltration + incorrect data injection are super underrated
       | risks to mass adoption of generative AI tech in the B2B world...
        
       | cuuupid wrote:
       | We block ChatGPT, as do most federal contractors. I think it's a
       | horrible exploit waiting to happen:
       | 
        | - there's no way they're manually scrubbing out sensitive data,
        | so it's bound to spill out of the training data when prompting
        | the model
       | 
        | - OpenAI is openly storing all this data they're collecting, to
        | the extent that they've had several leaks now where people can
        | see others' conversations and data. We are one step away, if it
        | hasn't already happened, from an exploit of their systems (which
        | likely weren't built with security as the top priority, as
        | opposed to scale and performance) that could leak a monumental
        | amount of user data.
       | 
        | In the most innocent case they could leak the personal info of
        | naive users. But largely, if LinkedIn is any indication, the
        | business world is filled with dopes who genuinely believe the AI
        | is free-thinking and better than their employees. For every org
        | that restricts ChatGPT use, there are fifty others that don't,
        | most of which have at least one of said dopes who is ready to
        | upload confidential data at a moment's notice.
       | 
        | Wouldn't even put it past military personnel to put S/TS
        | information into it at this point. OpenAI should include more
       | brazen warnings against providing this type of data if they want
       | to keep up this facade of "we can't release it because ethics"
       | because cybersecurity is a much more real liability than a
       | supervised LM turning into terminator.
        
         | probablynish wrote:
         | This really depends on the cost/benefit tradeoff for the entity
         | in question. If using ChatGPT makes you X% more productive
         | (shipping faster / lowers labor costs / etc), but comes with Y%
         | risk of data leakage, is that worth it in expectation or not? I
         | would argue that there definitely exist companies for which
         | it's worth the tradeoff.
         | 
          | By the way, OpenAI says they won't use data submitted through
          | its API for model training -
         | https://techcrunch.com/2023/03/01/addressing-criticism-opena...
        
           | eldaisfish wrote:
           | the #1 problem with corporations saying things is that many
           | things they say are not regulated or are taken on good faith.
            | What happens when OpenAI is acquired and the rules change?
           | These comments are often entirely worthless.
        
           | eptcyka wrote:
           | Risk of leakage? It is not a risk, it is a matter of time.
        
           | osterbit2 wrote:
           | To anyone who may be pasting code along the lines of 'convert
           | this sql table schema into a [pydantic model|JSON Schema]'
           | where you're pasting in the text, just ask it instead to
           | write you a [python|go|bash|...] function that _reads in a
            | text file_ and 'converts an sql table schema to output x' or
            | whatever. Related/not-related: a great pandas docs
            | replacement is another great+safe use-case.
           | 
           | Point is, for a meaningful subset of high-value use-cases you
           | don't _need_ to move your important private stuff across any
           | trust boundaries, and it still can be pretty helpful...so
            | just calling that out in case that's useful to anyone...
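            | 
            | A minimal sketch of the kind of function you might ask it
            | to write instead, so the schema text never crosses the
            | trust boundary (the parsing is deliberately naive, just to
            | show the shape):
            | 
            | import re
            | import sys
            | 
            | def sql_to_json_schema(path: str) -> dict:
            |     # Naive: expects one "name TYPE" column per line.
            |     ints = {"INT", "INTEGER", "BIGINT", "SMALLINT"}
            |     props = {}
            |     for line in open(path):
            |         m = re.match(r"\s*(\w+)\s+(\w+)", line)
            |         if not m:
            |             continue
            |         name, sqltype = m.group(1), m.group(2).upper()
            |         if name.upper() in {"CREATE", "PRIMARY", "FOREIGN"}:
            |             continue
            |         kind = "integer" if sqltype in ints else "string"
            |         props[name] = {"type": kind}
            |     return {"type": "object", "properties": props}
            | 
            | if __name__ == "__main__":
            |     print(sql_to_json_schema(sys.argv[1]))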
        
             | bsder wrote:
             | Do you really think the people asking ChatGPT to write
             | their code can make that abstraction?
             | 
              | The fact that they can't do this is the whole reason they
              | have to use ChatGPT.
        
               | lukifer wrote:
               | I've been doing this kind of thing pretty regularly for
               | the past few weeks, even though I know how to do any of
               | the tasks in question. It's usually still faster, even
               | when taking the time to anonymize the details; and I
               | don't paste anything I wouldn't put on a public gist
               | (lots of "foo, bar", etc)
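                | 
                | That anonymizing step can be as simple as a reversible
                | placeholder map (the names below are made up):
                | 
                | # Swap real identifiers for placeholders before
                | # pasting; map them back in the model's reply.
                | ALIASES = {"acme_billing_db": "foo_db",
                |            "CustomerLedger": "bar_table"}
                | 
                | def scrub(text: str) -> str:
                |     for real, fake in ALIASES.items():
                |         text = text.replace(real, fake)
                |     return text
                | 
                | def restore(text: str) -> str:
                |     for real, fake in ALIASES.items():
                |         text = text.replace(fake, real)
                |     return text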
        
               | q7xvh97o2pDhNrh wrote:
               | I use it because it's 10-100x more interesting, fun, and
               | fast as a way to program, instead of me having to
               | personally hand-craft hundreds of lines of boilerplate
               | API interaction code every time I want to get something
               | done.
               | 
               | Besides, it's not like it puts out _great_ code (or even
               | always _working_ code), so I still have to read
               | everything and debug it. And sometimes it writes code
               | that is just fine _and_ fit for purpose _and_
               | horrendously ugly, so I still have to scrap everything
               | and do it myself.
               | 
               | (And then sometimes I spend 10x as long doing _that,_
                | because it turns out it's also just plain good fun to
               | grow an aesthetic corner of the code just for the hell of
               | it, too -- as long as I don't _have_ to.)
               | 
               | And even after all that extra time is factored back in:
               | it's _still_ way faster and more fun than the before-
                | times. I'm actually enjoying building things again.
        
               | funfunfunction wrote:
               | People aren't using ChatGPT because they can't do it
               | themselves, they're using it to save time.
        
             | ChatPGT wrote:
              | I've been doing that since day one. I can't believe people
              | are pasting real data into these corporate black boxes.
        
           | heyyyouu wrote:
           | But that's the API, not the Chat input or Playground.
           | 
           | Companies can use Azure OpenAI Services to get around this --
           | there's data privacy, encryption, SLAs even. The problem is
           | it's very hard to get access to (right now).
        
         | nostromo wrote:
         | Does blocking ever work? People are smart and usually just work
         | around them.
        
           | PeterisP wrote:
           | It works in the sense that it does add an extra "reminder"
           | and requires specific intent. I mean, in this scenario all
           | the people already have been informed that they're absolutely
           | not allowed to do things like that, but if someone has
           | forgotten that, or simply is careless and just wants to "try
           | something out" then if it's unblocked they might actually do
           | it, but if they need to work around a restriction, that
           | forces them to acknowledge that there _is_ a restriction and
            | they shouldn't try to work around it even if they can.
        
           | mnd999 wrote:
           | The smart ones don't paste in all their private data.
           | 
            | And yes, if bypassing the block is combined with
           | disciplinary action, it does work. It's not worth getting
           | fired over. This is likely what heavily regulated industries
           | like financial services and defense are doing.
        
           | tedunangst wrote:
           | Blocks are effective reminders of policies.
        
             | acomjean wrote:
             | I remember someone trying to look up winning lottery
             | numbers at work. The site came up "Blocked: Gambling". It
             | was a little reminder that they're watching our web
             | browsing at work..
        
         | michaelteter wrote:
         | Possibly I don't know how this all works, but I think if the
         | host of a ChatGPT interface were willing to provide their own
         | API key (and pay), they could then provide a "service" to
         | others (and collect all input).
         | 
         | In that case, you wouldn't know to block them until it was too
         | late.
         | 
         | Ultimately either you must watch/block all outgoing traffic, or
         | you must train your people so thoroughly that they become
         | suspicious of everything. Sadly, being paranoid is probably the
         | most economical attitude these days if IP and company secrets
         | have any value.
        
           | dragonwriter wrote:
           | > Possibly I don't know how this all works, but I think if
           | the host of a ChatGPT interface were willing to provide their
           | own API key (and pay), they could then provide a "service" to
           | others (and collect all input).
           | 
           | Well, GP was referring to blocking ChatGPT _as a federal
           | contractor_. I suspect that as a federal contractor, they are
           | also vetting other people that they share data with, not just
           | blocking ChatGPT as a one-off thing. I mean, generic federal
           | data isn't as tightly regulated as, say, HIPAA PHI (having
           | spent quite a lot of time working for a place that handles
           | both), but there _are_ externally-imposed rules and
           | consequences, unlike simple internal-proprietary data.
        
             | michaelteter wrote:
             | But it really seems like a cat and mouse game. For example,
             | a very determined bad actor could infiltrate some lesser
             | approved government contractor and provide an additional
             | interface/API which would invite such information leaking,
             | and possibly nobody would notice for a long time.
        
               | Jensson wrote:
                | And then they could face the death penalty for espionage if
               | they leaked sensitive enough data. You would have to be
               | really stupid to build such a service for government
               | contractors unless you actually are a foreign spy.
        
               | sebzim4500 wrote:
               | At least then we would finally find out if it is
                | constitutional to execute someone for espionage.
        
         | yathrowaway01 wrote:
         | So you block internet access for all employees? Cos anything
         | you think is being pasted into ChatGPT is being pasted
          | everywhere, whether it's Google, Slack, Chrome plugins, or
          | public wifi.
        
         | [deleted]
        
         | m000 wrote:
         | Does US intelligence have access to OpenAI data? Private
          | organizations are one thing. But with all the dopes in
         | government positions around the world, OpenAI logs would
         | probably be a treasure trove for intelligence gathering.
        
           | grepfru_it wrote:
           | They are just one national security letter away from all US-
           | held data.
        
         | Ensorceled wrote:
         | Let's also not discount that for every "dope" there is at least
         | one "bad actor" who is willing to take the risk to get an edge
          | in their workplace or appease their manager's demands. The
         | warnings will only deter the first group.
        
         | comment_ran wrote:
          | If your competitor uses ChatGPT to compete with you and they're
          | 10x more productive than you, are you still willing to insist?
          | If the productivity gain is 100x, will you?
        
           | amelius wrote:
           | Uh, there's no sign of that yet.
        
             | chaxor wrote:
             | [flagged]
        
               | Ensorceled wrote:
               | > Think about how long it's taken tools like pandas to
               | reach the point that it is now. That entire package can
               | be built to the level it is now in a couple of days.
               | 
               | I don't think that is true at all. Do you have an example
               | of a significant project being duplicated in days, or
               | even months, with ANY of these tools?
               | 
               | By significant, I mean something on the order of pandas
               | which you claimed.
        
               | msm_ wrote:
               | And this is completely ignoring the fact, that the real
               | hard problem is the design. Spitting boilerplate code is
               | not. How pandas could be designed perfectly in one
               | afternoon (and generated with GPT) is beyond my
               | comprehension.
        
               | brickteacup wrote:
               | Cool, please provide a link to a library of similar size
               | and complexity to pandas which was written using ChatGPT
               | in the span of a few days. We'll be waiting.
        
               | saulpw wrote:
               | > Think about how long it's taken tools like pandas to
               | reach the point that it is now. That entire package can
               | be built to the level it is now in a couple of days.
               | 
               | Let me hand you a mirror: You're absolutely and
               | completely wrong
        
               | pedrosorio wrote:
               | > Think about how long it's taken tools like pandas to
               | reach the point that it is now. That entire package can
               | be built to the level it is now in a couple of days.
               | 
               | I am having trouble parsing this statement. You're saying
                | a person equipped with ChatGPT trained on data prior to
               | December 2007 (the month before the initial pandas
               | release) could have put together the entire pandas
               | library in a couple of days?
               | 
               | That seems obviously wrong, starting with the fact that
               | one would need to know "what" to build in the first
                | place. If you're saying that ChatGPT in 2023 can spit out
               | pandas library source code when asked for it directly,
               | that's obvious.
               | 
               | Somewhere between the impossible statement and the
               | obvious statement I made above, there must be something
               | interesting that you were trying to claim. What was it?
        
               | misnome wrote:
               | > There is an immense amount of evidence of that
               | 
               | Then it should be easy to provide some?
        
           | brobdingnagians wrote:
            | It might be just as likely that ChatGPT will cause a mistake
           | like Knight Capital because no one bothered to thoroughly
           | verify the AI's looks-good-but-deeply-flawed answer, and the
           | two aren't mutually exclusive possibilities.
        
             | causality0 wrote:
             | Right. I've had ChatGPT completely fail at something as
             | simple as writing a batch file to find and replace text in
             | a text file.
        
               | arcticfox wrote:
               | Sure, but humans do that all the time as well
        
               | remexre wrote:
               | Humans are a lot better at "I don't know how to do this;
               | hey Alice, can you look this over if you've got a sec and
               | tell me if I'm making a noob mistake"
        
           | bcrosby95 wrote:
           | Security and privacy should be table stakes. Speaking for my
            | country, we need privacy laws with teeth to stop bad
            | actors from shitting people's private information wherever
           | they want in the name of a dollar.
        
           | tomatotomato37 wrote:
           | This isn't an argument of ChatGPT vs nothing. This is an
           | argument of "external" ChatGPT vs some other AI sitting on
           | your own secured hardware, maybe even a branch of ChatGPT.
        
             | hgsgm wrote:
             | > some other AI sitting on your own secured hardware, maybe
             | even a branch of ChatGPT.
             | 
             | Where can I, a random employee, get that? I know how to get
             | ChatGPT.
        
               | ghaff wrote:
                | You can't. So maybe you, as a random employee, should
                | just do without whatever IT hasn't approved, whether you
                | agree or not.
        
       | dougb5 wrote:
       | Given that they use all the labor of the Internet without
       | attribution, we should assume that they will use every additional
       | drop of data we give to them for their own ends.
        
         | ChatGTP wrote:
         | This is what I hate about it.
        
       | DoingIsLearning wrote:
       | Funnily enough I wrote a cautionary comment on this just 2 days
        | ago:
       | 
       | https://news.ycombinator.com/item?id=35299695
        
       | jedberg wrote:
       | No one cares about security because there is no consequence for
       | getting it wrong. Look at all the major breaches ever. And look
       | specifically at the stock price of those companies. They took
       | small short term hits at best.
       | 
       | Worst case the CISO gets fired and then they all play musical
       | chairs and end up in new roles.
       | 
        | Heck, even Lastpass, ostensibly a _security company_, doesn't
       | seem particularly affected by their breach.
       | 
       | My point is, especially with ChatGPT, where it can reasonably 10x
       | your productivity, most people will be willing to take the risk.
        
       | baxtr wrote:
       | We went pretty quickly from:
       | 
       | No way I'm giving Google any of my data! I will use 5 different
       | browsers in incognito mode and never log in.
       | 
       | To ->
       | 
       | Sure I will login with my name and email and feed you as much of
       | my most personal thoughts and data as I can dear ChatGPT!
        
         | grammers wrote:
         | Not quite. These are the same people that use only Chrome while
         | being logged in to their Google account. Convenience wins.
        
           | smt88 wrote:
           | > _These are the same people that use only Chrome while being
           | logged in to their Google account_
           | 
           | This situation is much dumber than that. ChatGPT is very
           | clear that you shouldn't give it private data and that
           | anything you type into it can/will be used for training.
           | 
           | Google is nowhere near that level of transparency.
        
         | contravariant wrote:
         | Any examples where those two were the same person?
         | 
         | Because both types of people have always existed. Heck, lack of
         | vigilance among the ancient Greeks is what put the Trojan in
         | Trojan horse.
        
           | croes wrote:
            | I bet you'd find some on HN.
            | 
            | Sometimes curiosity beats caution.
        
           | smodo wrote:
           | It was a lack of vigilance among the Trojans, actually. The
            | Greeks did the burning and pillaging. So uh OpenAI is the
           | Greeks, ChatGPT is the horse and Microsoft is king Menelaos
           | or something. Achilles is dead, but he did make TempleOS.
        
         | philsquared_ wrote:
         | Google takes your data and sells it. Literally making your data
         | available to the highest bidder. Is OpenAI doing that? If
         | Google existed in its current form during the early internet it
            | would be classified in the same category as BonziBuddy.
         | Spyware. That is what Google is. So I can very reasonably
         | understand why people would trust OpenAI with data they
         | wouldn't trust Google with. OpenAI hasn't spit in the face of
         | its users yet.
        
           | SketchySeaBeast wrote:
           | > Google takes your data and sells it. Literally making your
           | data available to the highest bidder.
           | 
           | But it doesn't, does it? It sells the fact that it knows
           | everything about everyone and can get any ad to the perfect
           | people for it. It's not going on the open market and telling
           | people I regularly buy 12 lbs of marshmallow fluff and then
           | use it in videos I keep on my google drive.
        
           | noncoml wrote:
           | > Google takes your data and sells it. Literally making your
           | data available to the highest bidder.
           | 
           | Even if they are not doing it now(?), what makes you think
           | that they will not do so in the future? It's not like your
           | data has an expiration date.
        
             | philsquared_ wrote:
             | Because they are completely different business models. If
             | OpenAI decides to become an advertising behemoth then I
             | would show concern. Right now they use your data for
             | training (when they use it).
        
               | Jensson wrote:
                | OpenAI is selling others' data in their model responses.
                | Selling others' data is their main business model.
               | 
               | If it uses user data to train their models other users
               | could ask "Show me the code for gmail spam filters", and
               | if it was trained on engineers refactoring that spam
               | filter in ChatGPT chances are it would give you the code.
               | If that doesn't count as "selling user data" I don't know
               | what is. They not only sell it, they nicely package and
               | rewrite it to make it easy to avoid copyright claims!
        
               | JohnFen wrote:
               | OpenAI has already demonstrated that they're all in for
               | maximizing profit. They may not be advertisers, but
               | advertisers aren't the only sorts of companies that make
               | bank by selling personal data.
               | 
               | I see no reason to think OpenAI would leave that money on
               | the table.
        
           | dragonwriter wrote:
           | > Google takes your data and sells it. Literally making your
           | data available to the highest bidder.
           | 
           | No, Google takes money to present ads to people of different
           | demographics, and _uses_ your data to do that. It doesn't
            | sell your data, which is, in fact, their competitive edge in
           | ads - selling your data would be selling the cow when they'd
           | prefer to sell the milk.
        
           | jankeymeulen wrote:
           | Not really. Even the most evil Google one can imagine would
           | realise "your data" is the most valuable thing they possess,
           | selling it would be bad for business. They're selling ads to
           | the highest bidder who's looking for someone with a profile
           | based on your data, but not your data itself.
        
             | JohnFen wrote:
             | True, but that's not actually any better. And it still
             | counts as selling your data, just indirectly.
        
           | JohnFen wrote:
           | > I can very reasonably understand why people would trust
           | OpenAI with data they wouldn't trust Google with. OpenAI
           | hasn't spit in the face of its users yet.
           | 
           | In other words, having been burnt once by touching a flame,
           | the conclusion these people draw is that the problem was with
            | _that particular_ flame and they're fine with reaching for a
           | different one?
        
         | bcrosby95 wrote:
         | My problem with Google is they'll ban me from gmail for
         | something I do on youtube.
        
       | interstice wrote:
        | Could there be, e.g., a browser extension for scrubbing
        | sensitive data from input on paste?
       | 
       | I'm hoping OpenAI will implement something like this on their end
        | soon, like data monitoring apps do (Sentry etc.), but if they
        | don't, client-side is an option.
        
       | MarkusWandel wrote:
       | Wouldn't it be trivial to add a "read-only" mode to the LLM's
       | operation, where it uses stored knowledge to answer queries but
       | doesn't ingest new knowledge from those queries?
        
         | johntiger1 wrote:
          | Why isn't it read-only by default? It's not even connected to
         | the internet
        
           | ben_w wrote:
           | ChatGPT, and I think all the GPT LLMs, is _only_ accessible
           | over the internet as far as I can tell.
           | 
           | And the thumbs up/down are there on the chat interface
           | because it's partly trained by reinforcement from human
           | feedback.
        
             | og_kalu wrote:
             | Nope. LLMs don't use the internet for inference at all
              | unless you give them access to a web search API or
              | something like that. ChatGPT is just too massive to run
              | on any local machine. But make no mistake, it does not
              | require the
             | internet.
        
         | humanistbot wrote:
         | From https://help.openai.com/en/articles/7039943-data-usage-
         | for-c...:
         | 
         | > You can request to opt out of having your content used to
          | improve our services at any time by filling out this form
          | (https://docs.google.com/forms/d/1t2y-arKhcjlKc1I5ohl9Gb16t6S...).
         | This opt out will apply on a going-forward basis only.
         | 
          | It goes to a Google form, which is, I guess, better than them
         | building their own survey platform from scratch that may have
         | more vulnerabilities.
        
           | eternalban wrote:
           | > them building their own survey platform from scratch
           | 
            | Funny. This Nobel Prize winner raises an interesting
           | question:
           | 
           |  _If your AI is so great at coding, why is your software so
           | buggy?_ : https://paulromer.net/openai-bug/
        
           | meghan_rain wrote:
           | I'd be worried this is also the "how to get banned from
            | OpenAI in the near future" form. And if OpenAI retains a
           | monopoly like Google does for search, you are basically
           | screwed.
        
             | visarga wrote:
             | No, everyone will have GPT-4 level AI in 6-12 months.
        
             | ben_w wrote:
             | I'm old enough to remember when newspapers reported hackers
             | being banned from using computers.
             | 
             | And IP pirates banned from using the internet. Actually,
              | that one I remember: I voted against my local MP after they
              | passed a law making that the norm.
             | 
             | We don't yet have a social, let alone legal, norm for
             | antisocial use of LLMs; even with social media, government
             | rules are playing catch-up with terms of service, and
              | Facebook is old enough that if it were human it could now
             | vote.
             | 
             | So, yes, likewise computers/internet/social media, being
              | banned from an LLM _if it's a monopoly_ is going to
             | seriously harm people.
             | 
              | But that is likely to be a big "if". The architecture and
              | the core training data for GPT-3 aren't a secret, and
             | that's already pretty impressive even if lesser than 3.5
             | and 4.
        
       | w_for_wumbo wrote:
       | This is the issue with a tool so powerful: you can't just tell
       | people not to use it, or to use it responsibly, because there's
       | too much incentive for them to use it. If it saves hours of a
       | person's workday, and they're not seeing any of the harm caused
       | by data leakage, there's no incentive for them not to use it.
       | 
       | Which is why a private option is so critical. Not fighting
       | against human nature means providing a way to use the tool
       | safely.
        
         | raincole wrote:
         | > you can't just tell people not to use it
         | 
         | Uh, why can't you tell people not to use it...? If security is
         | that important for your company, of course you can tell your
         | employees which tools to use.
         | 
         | A fun fact: in many areas of TSMC, _smart phones_ are banned.
         | No one says  "you can't just tell people not to use smart
         | phones."
        
           | q845712 wrote:
           | https://www.bleepingcomputer.com/news/technology/fitness-
           | tra...
        
         | ghaff wrote:
         | 1.) It's not at all clear it's nearly as powerful as you think.
         | Certainly in my domain--writing about various topics--it's not.
         | 
         | 2.) Of course you can tell people not to use it. Unlike people
         | at SV companies, apparently, people in government and at
         | government contractors accept restrictions like not having
         | phones in secure labs _all the time_. Start firing or even
         | prosecuting people and they will discover very quickly that
         | they don't really need some tool.
         | 
         | And, yes, private versions of this sort of thing help a lot.
        
         | JohnFen wrote:
         | There's a dev here who is using ChatGPT extensively in his
         | work. The rest of the team is just waiting for him to get
         | caught and fired. Sharing company data with unapproved external
         | entities is very definitely a firing offense.
        
           | mym1990 wrote:
           | If you really care about your company's security, you should
           | report it; otherwise you are just complicit.
        
             | JohnFen wrote:
             | I agree.
        
           | yathrowaway01 wrote:
           | Glad I work for a company where the CEO pays for ChatGPT
           | Plus for all the devs. If you think your code is special,
           | then you're wrong.
        
             | kristianp wrote:
             | Does ChatGPT Plus collect data for training, or does it
             | have more privacy than the free offering?
        
             | JohnFen wrote:
             | That entirely depends on the code. It's not that the code
             | is special, it's that the code can reveal things that are
             | competitive advantages (future plans, etc.)
        
             | msm_ wrote:
             | But you created a throwaway account specifically to reply
             | in this thread?
             | 
             | Unless your company really has nothing to hide, it's easy
             | to accidentally dump a company secret or an API key in a
             | chat session. Of course if everyone is aware of this and
             | constantly careful then you may be OK.
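             | 
             | A crude pre-paste check is easy to sketch, for what it's
             | worth (the regexes below are illustrative guesses, not a
             | complete list):
             | 
             |   import re
             | 
             |   # Rough patterns for common credential formats
             |   SECRET_PATTERNS = [
             |       re.compile(r"sk-[A-Za-z0-9]{20,}"),  # OpenAI-style keys
             |       re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key IDs
             |       re.compile(r"-----BEGIN .*PRIVATE KEY-----"),
             |   ]
             | 
             |   # True if the text looks like it contains a secret
             |   def looks_sensitive(text: str) -> bool:
             |       return any(p.search(text) for p in SECRET_PATTERNS)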
        
               | yathrowaway01 wrote:
               | That's because accounts get shadow banned all the time
               | when people get upset at you for pointing out hard
               | truths.
               | 
               | If you're copy-pasting API keys or such into ANYTHING,
               | you probably shouldn't be a programmer to begin with.
               | 
               | It's like people who use root account key/secret
               | credentials in their codebase. It's not AWS's fault you
               | got a large bill or got hacked; it's because you're
               | dumb.
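               | 
               | The boring fix is to keep secrets in the environment so
               | they never land in source control at all (a minimal
               | sketch; the variable name is just a common convention):
               | 
               |   import os
               | 
               |   # Raises KeyError at startup if the key isn't set,
               |   # instead of shipping the key inside the codebase
               |   api_key = os.environ["OPENAI_API_KEY"]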
        
               | dizhn wrote:
               | I posted my OpenAI token in a GitHub issue today,
               | thinking I'd just kill it right away. I did, but there
               | was already an email from OpenAI letting me know that
               | my token had been noticed in public and was thus
               | revoked.
        
               | bcrosby95 wrote:
               | I regularly say shit that pisses people off here and I
               | have never been shadow banned. It sounds like your "hard
               | truths" are something other than just "hard truths",
               | and/or you have a persecution complex.
        
               | yathrowaway01 wrote:
               | Your karma is over 7000; if you get downvoted, your
               | stuff is still visible.
        
               | jupp0r wrote:
               | If your code has API keys in it, you have bigger problems
               | than ChatGPT.
        
             | croes wrote:
             | You are still transferring your business data to an
             | external entity, and on top of that you're paying for it.
             | 
             | And if you think there is no special code, then you're
             | wrong.
        
             | krono wrote:
             | If the contract says the code is special, then the code is
             | special.
        
           | arcticfox wrote:
           | If they're more productive by doing it, I think there's an
           | equal chance said dev gets promoted.
        
             | JohnFen wrote:
             | He's not more productive, but even if he were, it wouldn't
             | affect his getting fired.
             | 
             | Productivity isn't everything.
        
             | croes wrote:
             | Why?
             | 
             | Uploading code to ChatGPT can be done by trainees.
        
       | robbywashere_ wrote:
       | The real innovation of ChatGPT will be how you go about getting
       | permission to use it from your corpo middle management overlords.
        
       | phn wrote:
       | Employees have been feeding sensitive data to online translators
       | in the same way.
       | 
       | Not to say it isn't a problem, but it's not an AI/ChatGPT-
       | specific one.
        
       | twblalock wrote:
       | If you think this is bad, imagine what JSONLint has seen over the
       | years.
        
       | nbzso wrote:
       | OpenAI. The heist of the century. I am waiting for an A.I.-
       | generated blockbuster in the near future.
        
         | yieldcrv wrote:
         | I was just watching Altman's interview from the Lex Fridman
         | podcast a few days ago.
         | 
         | It really does feel like YC was a plot to fund the harvesting
         | of it all, with an OpenAI climax. Not a serious conclusion I
         | hold; it's just funny to watch it unfold, as if nobody even
         | cares about the optics.
        
       | syntaxing wrote:
       | I'm curious if anyone's employer has set up their own LLM. My
       | employer has a couple of A100s sitting around which could easily
       | host a couple of instances of 65B LLaMA or Alpaca. Convincing
       | upper management to allow it is the hard part.
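       | 
       | For reference, loading an open model across a few GPUs is only
       | a handful of lines (a sketch; the weights path is a placeholder
       | and assumes the model is already downloaded locally):
       | 
       |   import torch
       |   from transformers import AutoModelForCausalLM, AutoTokenizer
       | 
       |   path = "/models/llama-65b"  # placeholder local path
       |   tok = AutoTokenizer.from_pretrained(path)
       |   model = AutoModelForCausalLM.from_pretrained(
       |       path,
       |       torch_dtype=torch.float16,  # halves memory vs fp32
       |       device_map="auto",  # shard layers across available GPUs
       |   )
       |   ids = tok("Summarize this design doc:", return_tensors="pt")
       |   out = model.generate(**ids.to(model.device), max_new_tokens=50)
       |   print(tok.decode(out[0], skip_special_tokens=True))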
        
         | tiborsaas wrote:
         | Team up with someone from sales or marketing :)
        
       ___________________________________________________________________
       (page generated 2023-03-27 23:00 UTC)