[HN Gopher] New AI classifier for indicating AI-written text
___________________________________________________________________
New AI classifier for indicating AI-written text
Author : davidbarker
Score : 236 points
Date : 2023-01-31 18:11 UTC (4 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| peter303 wrote:
| OpenAI archives every request and output text. Why not compare
| suspected A.I. text against this?
| nothrowaways wrote:
| Is this the new antivirus?
| minimaxir wrote:
| > Our classifier is not fully reliable. In our evaluations on a
| "challenge set" of English texts, our classifier correctly
| identifies 26% of AI-written text (true positives) as "likely AI-
| written," while incorrectly labeling human-written text as AI-
| written 9% of the time (false positives).
|
| That is an interesting mathematical description of "not fully
| reliable".
| thatcherthorn wrote:
| Similarly to deep fakes, it seems that creating tools to
| distinguish between human and AI generated data will cause more
| harm than good. Models that distinguish will never be perfect,
| and an actor that can fool such a model will be very effective.
| LanceJones wrote:
| What about introducing a new code of ethics that students sign?
| They agree to disclose the level of help (1 - 10) provided by GPT
| and the teacher/instructor/prof grades accordingly. Silly?
| Magi604 wrote:
| Just filter your text through Quillbot to get around "AI
| Detection".
|
| https://quillbot.com/
|
| Demonstration: https://youtu.be/gp64fukhBaU?t=197
|
| The arms race continues...
| GOONIMMUNE wrote:
| This seems like a sort of unwinnable arms race. Can't the people
| who work on generative text models use this classifier as a
| feedback mechanism so that it doesn't flag their output? I'm not
| an AI expert, but I believe this is even the core mechanism
| behind Generative Adversarial Networks.
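|
| Something like this loop, I mean - detector-guided sampling
| rather than a true GAN, and generate() / detector_score() here
| are hypothetical stand-ins:
|
|     def evade(generate, detector_score, prompt, n=16, cutoff=0.5):
|         """Keep the candidate the detector is least sure about.
|         In a real GAN the detector's score would be a training
|         signal for the generator, not just a selection filter."""
|         candidates = [generate(prompt) for _ in range(n)]
|         best = min(candidates, key=detector_score)
|         return best if detector_score(best) < cutoff else None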
| londons_explore wrote:
| Detectors can be a black box "pay $5 per detection" type
| service.
|
| That way, you can't fire thousands of texts at it to retrain
| your generative net.
|
| Plagiarism detectors in schools and universities work the same.
| In fact, some plagiarism detection companies now offer the same
| software to students to allow them to pay some money to pre-
| scan their classwork to see if it will be detected...
| Buttons840 wrote:
| Make a model to detect cheating. Market it as "a custom built
| and unique model to detect cheating; able to catch cheating
| that other models miss!" It's all 100% true. Market and
| profit.
| telotortium wrote:
| $5 is way too high of a price to use regularly. In any case,
| if it's only available to educational institutions, teachers
| and grad students are poor enough to sell access to it to
| people on the dark web for the right price.
| mritchie712 wrote:
| There's also always going to be more capital going towards
| building better generators than better detectors.
| standardly wrote:
| IMO we need a mass-adopted digital signature solution using
| biometric identifiers such that publishing an article, or even a
| comment, can be signed by and only by a biological human.
| khyryk wrote:
| Any ideas on how the "only by" would work? I'm not seeing a way
| around pasting generated text and signing it as one's own work.
| Proof of work solutions would have to have a high cost for
| anyone to care, otherwise there would be bots "proving" the
| work of writing an essay by generating it in phases like a
| human would edit drafts.
| standardly wrote:
| This is really the first time I've thought about this, so
| uh... No, no ideas.
|
| One's identity would need to be verified concurrently with the
| creation of a text. I am not really satisfied with the idea of a
| specialized word processor or input device that does biometric
| validation; I'd rather have a specific, standardized protocol. I
| wonder if this is already deemed impossible, or if someone is
| working on the problem.
| Someone1234 wrote:
| You want to ban privacy?
| standardly wrote:
| Proving you are a human and not a computer doesn't have to
| publicly reveal a single thing about you (other than the
| crypto signature). Think of it like an SSL cert for a person
| rather than a server. I'm purely spitballing man. It's a
| problem someone will eventually have to come up with a
| solution for and I think we already have the tools.
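|
| To be concrete about the signature half - a minimal sketch with
| the Python "cryptography" package (binding the key to a live
| human is the unsolved part I'm handwaving):
|
|     from cryptography.hazmat.primitives.asymmetric.ed25519 import (
|         Ed25519PrivateKey,
|     )
|
|     # Key issued to a verified human by some identity authority
|     private_key = Ed25519PrivateKey.generate()
|     public_key = private_key.public_key()
|
|     text = b"My allegedly human-written comment."
|     signature = private_key.sign(text)
|
|     # Anyone holding the public key can check authorship;
|     # verify() raises InvalidSignature if the text was altered.
|     public_key.verify(signature, text)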
| GMoromisato wrote:
| The problem is that one can't be trusted to sign their own
| work--otherwise they could sign AI-generated text. This
| only works if a trusted human signs your work after
| watching you generate it.
| standardly wrote:
| Easy. We just need an AI that serves as a public notary.
| Wait a minute..
|
| It really is an interesting problem to think about. The
| other commenter pointed out you could just sign an AI
| text - I see all the issues, but my gut feeling tells me
| there is an elegant solution somewhere.
| GMoromisato wrote:
| 1. This is an arms race. You can build a generative AI that
| avoids generating text caught by the classifier.
|
| 2. Maybe teachers will assign rare or even fictional topics that
| cannot be found in the AI training corpus. Maybe a teacher could
| use an AI to generate essay prompts that are hard for other AIs
| to write essays for.
|
| 3. Is this a problem long term? If an AI can generate an essay
| that's indistinguishable from a human-generated one, then why do
| we need to learn how to write essays? Maybe we should just learn
| how to write good prompts.
|
| See also: "Should calculators be banned in school?", "Do students
| need to learn cursive?", "Why should I learn Greek instead of
| just reading a translation of Homer?"
| w_for_wumbo wrote:
| So for me, I like to write my own ideas but use AI to reword them
| to be succinct and readable. I'm worried that usage would be
| flagged as AI text.
| BulgarianIdiot wrote:
| I wrote some text about the subjectivity of communication and the
| nature of natural language, and I kept it very neutral, formal
| and verbose. And it said "this text is likely AI".
|
| So, as was honestly predictable, people who rely on this tool
| being accurate will inflict a lot of pain on unsuspecting
| individuals who simply write the way GPT writes.
| gzer0 wrote:
| How good is this really?
|
| I input an article that was written directly by ChatGPT, and it
| came back as "The classifier considers the text to be unclear if
| it is AI-generated." This article was not edited, not put through
| any paraphrasers, or anything. Interesting.
|
| Furthermore, these efforts are quite futile. One can just go to
| one of numerous paraphrasers such as quillbot.com, run it through
| there, and then, for added obfuscation, run it through an
| entirely different paraphraser (Microsoft Word now has this
| capability natively, in the beta channels at least, btw).
|
| Yeah, for someone who has intentions of bypassing this, there
| will always be a way. It's a good effort, for sure. But I don't
| see this doing much in terms of truly distinguishing AI vs non-AI
| generated outputs.
| moneywoes wrote:
| 26% good
| [deleted]
| cjrd wrote:
| This is all predicated on existing conditions, where AI-written
| text hasn't influenced the way that humans write. As the years
| pass and these tools become a common way to at least "spot check"
| your own writing, I imagine that we will all begin to write in
| styles that are increasingly similar to AI-written text.
| felipelalli wrote:
| The irony here is that this tool can be used by the AI in the
| future for self-training, to become more and more like a human.
| antiterra wrote:
| Heck, you can use it as a manual adversarial output filter as
| it is right now.
| O__________O wrote:
| Related option that has benchmarks and a research paper; it
| appears they intend to release code & datasets too.
|
| DetectGPT: Zero-Shot Machine-Generated Text Detection
|
| - https://news.ycombinator.com/item?id=34557189
| ilaksh wrote:
| It can't possibly work reliably. It's going to be very
| challenging for honest kids because almost everyone is going to
| be cheating.
|
| The reality is that learning to think and write will be harder
| because of the ubiquity of text generation AI. This may be the
| last generation of kids where most are good at doing it on their
| own.
|
| On the other hand, at least a few will be able to use this as an
| instant feedback mechanism or personal tutor, so the potential
| for some carefully supervised students to learn faster is there.
|
| And it should increase the quality of writing overall if people
| start taking advantage of these tools. It's going to fairly
| quickly become somewhat like using a calculator.
|
| Actually it probably means that informal text will really stand
| out more.
|
| I am giving it the ability to do simple tasks given commands like
| !!create filename file content etc.
|
| It's actually now very important for kids to adapt quickly and
| learn how to take advantage of these tools if they are going to
| be able to find jobs or just adapt in general even if they don't
| have jobs. It actually is starting to look like everyone is
| either an entrepreneur or just unemployed.
|
| Learning about all the ways to use these tools and the ones
| coming up in the next few years could be quite critical for
| children's education.
|
| There are always going to be luddites of course. But it's looking
| like ChatGPT etc. are going to be the least of our problems. It
| is not hard to imagine that within twenty years or so, anyone
| without a high bandwidth connection to an advanced AI will be
| essentially irrelevant because their effective IQ will be less
| than half of those who are plugged in.
| logifail wrote:
| > it should increase the quality of writing overall if people
| start taking advantage of these tools
|
| Perversely, it might also dramatically decrease reading, if
| there's no incentive for anyone to need to properly understand
| anything.
|
| A pretty dire scenario :(
| LanceJones wrote:
| Feels like the battle between computer virus creators and anti-
| virus software all over again.
| ipnon wrote:
| ChatGPT is already quite effective at deceiving these models with
| simple prompts like, "write the output in a way that seems human
| and not AI generated, so as to bypass AI-written text detectors."
| gunshai wrote:
| Or educators could be forced to evolve around a new tool that
| _gasp_ requires a different measurement of skill, one that is
| much harder to fake.
|
| The obvious one that already exists... ORAL EXAMS.
| botplaysdice wrote:
| Is this the new Turing test? Who can verify the classifier
| itself?
| siliconc0w wrote:
| Horrible idea. You can't eliminate the false positives, and
| these are going to impact innocent students or be used to
| reinforce teacher biases.
| lumost wrote:
| I don't see why teachers don't use this as an opportunity to
| accelerate curriculum. Every student now has a cheap personal
| instructor. Why not raise the bar on difficulty and quality
| expectations for assignments?
| odipar wrote:
| As always it is the journey that matters (writing), not the
| outcome (the essay).
|
| For example, students could record their writing of an essay with
| a keylogger or something.
|
| Additionally - with the use of some advanced zero-knowledge
| algos or crypto timestamp provenance - it should be possible to
| prove that they have written the essay without revealing their
| recording.
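|
| A bare-bones version of the commit-now, reveal-only-if-challenged
| idea (plain hash commitments, not real zero-knowledge; details
| invented for illustration):
|
|     import hashlib, json, time
|
|     def commit(snapshot: bytes) -> str:
|         # Publish only the hash (e.g. to a timestamping
|         # service); keep the snapshot itself private.
|         return hashlib.sha256(snapshot).hexdigest()
|
|     log = []
|     for draft in (b"outline...", b"first draft...", b"final..."):
|         log.append({"t": time.time(), "h": commit(draft)})
|     print(json.dumps(log, indent=2))
|
|     # If accused later, reveal the snapshots; anyone can re-hash
|     # them and match against the published commitments.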
| ekanes wrote:
| Yes, sort of, though if the economic incentive was high enough,
| someone could connect the AI to input through the keyboard and
| "type" out the essay. At scale it would be cheap. You could
| record video of you typing, which would work for some time
| until at some point video fakes get advanced enough... sigh.
| barbazoo wrote:
| > Our classifier is not fully reliable. In our evaluations on a
| "challenge set" of English texts, our classifier correctly
| identifies 26% of AI-written text (true positives) as "likely AI-
| written," while incorrectly labeling human-written text as AI-
| written 9% of the time (false positives).
|
| > The classifier is very unreliable on short texts (below 1,000
| characters). Even longer texts are sometimes incorrectly labeled
| by the classifier.
| beefman wrote:
| We've lined up a fabulous type of gorilla...
| animanoir wrote:
| What's the point of launching this when they admit it doesn't
| work most of the time and adds to the confusion? We should just
| embrace the AI Chaos.
| dukeofdoom wrote:
| I've used ChatGPT to generate some code for me, and almost every
| time it was a learning experience. I saved a lot of time
| searching, and it just gave me what I was after. Observing how
| someone or something like AI solves a problem is a fast way to
| learn. I don't see a problem with this. Teachers can always just
| use in-person tests to check if a student mastered the concepts.
| Math teachers got over students using calculators for homework,
| and can check understanding just fine on tests. It used to be
| that students would solve homework problems by candlelight, with
| an abacus and lookup tables. Yet no one wants to mandate a return
| to that just because it made homework harder.
| gibsonf1 wrote:
| Wow, that is truly not a good classifier with success that low.
| gzer0 wrote:
| On a side note:
|
| My online MBA has switched from TurnItIn to this website:
| https://unicheck.com
|
| And the benefit of this is... incredible. It allows students to
| purchase X pages' worth of plagiarism checking, with full reports
| on your work beforehand.
|
| Not sure why this move was made, but it will be interesting to
| see once they integrate "possible AI detection" into UniCheck.
| e_i_pi_2 wrote:
| TurnItIn also lets students submit beforehand, last I heard -
| if people are going to use tools like this then students should
| also get full access to make sure they won't get flagged ahead
| of time.
|
| I had some professors where you could fully grade your own
| assignments before submitting, and those were the best courses
| I've ever taken - you're given all the knowledge you need to
| figure out what you know and what you don't.
| nerdponx wrote:
| Do we have GANs for text yet?
| supernova87a wrote:
| I was thinking that there have been swings of what is valued (or
| trusted) in education and testing (or voting, promoting) to prove
| that someone has the goods.
|
| At one time it was live oration skill, and then people thought,
| "maybe that disfavors people who are introverted or whose talent
| comes from thinking and writing".
|
| Then, at another time, it was thought, "well, you have to test,
| because sometimes time pressure - not being able to go away and
| think about something for as long as you like - produces
| something valuable".
|
| Yet another time: "let people who don't test well prove their
| value through effort on homework" - but now who knows whether
| they actually were the ones doing the work?
|
| I wonder what this development will produce?
| Logans_Run wrote:
| In my {semi-tongue-in-cheek} opinion - thus begins the origin of
| Arnie's Skynet.
|
| The {semi-cynical} part of my corporate soul screams 'oooh, what
| a great way to bootstrap your own ML/AI and have marketing
| trumpet it as "So good that it was trained on OpenAI data and
| Human(tm) Error Labelling!"'.
|
| The Futurist (Luddite???) in me shudders at the thought of two
| very powerful computer systems (models) working to outcompete
| each other in a way that turns out to be 'rather unfortunate',
| a.k.a. 'Oh shit! We should have thought about how we (the human
| race) can somehow be able to tell machine output vs. human
| output'. But that is a discussion I will leave to the lawyers and
| ethicists to thrash out a solution/definition that outputs a
| simple binary Y/N with a five-nines certainty.
|
| But meh - A) the above is a rather random comment, and B) time
| will tell; hopefully this and other similar efforts remain 100%
| Libre, as in 'free to all individuals forever and non-revocable'.
| omalleyt wrote:
| I bet you can utterly defeat this by adding one or two typos into
| the text
| Sol- wrote:
| Given the weak accuracy - which is of course understandable given
| the difficulty of the task - this mostly seems like a fig leaf
| that lets them pretend to do something about the potential
| problems of AI generated text becoming more and more pervasive.
|
| Probably one shouldn't fault them for trying, but the cat is out
| of the bag I think.
| [deleted]
| kiru_io wrote:
| It would be interesting to know how this compares against
| GPTZero [0].
|
| [0] https://gptzero.me/
| ulizzle wrote:
| Does anyone actually believe that this AI-generated writing is
| any good? The standards seem extremely low.
|
| Can it beat Tolkien or Asimov? No. Then what is even the point of
| all this propaganda?
| swatcoder wrote:
| It's not good by any means, but neither is most assigned,
| casual, or rote writing.
|
| 10,000 students are writing some crappy essay on the Great
| Depression every day, and ChatGPT has probably trained on a
| zillion of these. It's optimized to produce those mediocre
| essays really efficiently, and that's very disruptive to how
| teachers have been working with students for the last century
| or so. The internet (and fraternity filing cabinets) were
| already straining this kind of pedagogy, but ChatGPT breaks it
| wholesale.
| gunshai wrote:
| What I find interesting about your comment is that while it can
| produce that mediocre essay, it can also produce a much better
| one.
|
| How? Well, it's all about how you interact with it. But the
| majority of use, as you said, will be taking the first output
| given the input. What's amazing to me is learning to reject the
| output in favor of our own vision or conflicting ideas.
|
| If ChatGPT helps people get past blank-page syndrome and interact
| with their own ideas, seeing the limits of what is returned in
| contrast to what they think, that would be an incredibly useful
| tool for anyone trying to learn.
| ropintus wrote:
| It writes better than I do (I'm an ESL speaker), and that's
| reason enough for me to use it.
|
| It might not be better than Tolkien, but so what? 99.99% of
| people are also not better than Tolkien, and ChatGPT can add
| value to the lives of these people.
| sebzim4500 wrote:
| No offence, but I am confident that you also can't write better
| than Tolkien or Asimov.
|
| Does that mean you should delete all your comments and stop
| posting?
| dqpb wrote:
| This is such a comical point of view I can't tell if it's
| sarcasm or a genuine question.
|
| Yes, I would wager that ChatGPT can write better than at least
| 90% of living human beings.
| jfk13 wrote:
| And yet I'd rather hear what the living human beings have to
| say. They may write poorly, but at least they have actual
| thoughts and ideas -- no matter how misguided or bizarre --
| that they're trying to communicate.
| urbandw311er wrote:
| This all feels a little like OpenAI trying to get a head start on
| plausible deniability.
|
| A bit like Apple ensuring its consumer devices can't be hacked,
| to completely bypass any arguments about whether they
| should/shouldn't aid the state in providing a back door.
| mindcrime wrote:
| This is just going to lead to an arms-race like with CAPTCHA.
| Next project announcement: an AI text generator that can evade
| the AI-text-detector... and so on.
| WestCoastJustin wrote:
| Great growth hacking idea. Feed ChatGPT into this and test if it
| is getting detected. You'll increase usage of both products.
| keepquestioning wrote:
| [dead]
| dakiol wrote:
| Isn't this a poor business move from OpenAI? I mean, if they
| make it possible to distinguish (100%, in the future) between
| AI-written text and human-written text... then a big chunk of
| OpenAI's potential customers will not use ChatGPT and similar
| tools because "they are gonna be caught" (e.g., students,
| writers, social media writers, etc.)
| anhner wrote:
| 1. Create AI capable of writing almost human-level text and make
| it generally available.
|
| 2. Make said AI generate text in a way that makes it possible to
| detect that it was written by a machine.
|
| 3. Create another AI that detects text written by above AI
|
| <--- You are here
|
| 4. Put your detector service behind a paywall
|
| 5. Every time a competitor appears for your generator, change its
| steganography so that only your detector correctly classifies it
|
| 6. Profit
| gunshai wrote:
| Talk about a local maximum, yuck.
| michaericalribo wrote:
| I foresee a dystopian education outcome:
|
| 1. Classifiers like this are used to flag _possible_ AI-generated
| text
|
| 2. Non-technical users (teachers) treat this like a 100%
| certainty
|
| 3. Students pay the price.
|
| Especially with a true positive rate of only 26% and a false
| positive rate of 9%, this seems next to useless.
| saltysnowball wrote:
| This is already an issue. I'm a student in college right now,
| and even technical professors are operating with full confidence
| in systems like Turnitin, which try their hand at plagiarism
| detection (often with much higher false negative/false positive
| rates). The problem was even more prevalent in high school, where
| teachers would treat it as a 100% certainty. Thus, I think that
| OpenAI making an at least slightly better classification
| algorithm won't make the state of affairs any worse.
| Kiro wrote:
| Funny how everyone praised GPTZero, which has even worse rates,
| but starts being skeptical when it's OpenAI, the new bad guy.
| [deleted]
| dns_snek wrote:
| "Everyone" didn't. In fact, the 5 top comments in that
| thread[1] all called it useless or pointed out serious flaws.
|
| [1] https://news.ycombinator.com/item?id=34556681
| janalsncm wrote:
| I urge anyone with time to write to tech journalists explaining
| why this is so bad. Given previous coverage of GPTZero they
| don't seem to be asking the right questions.
| tremon wrote:
| I dare hope for a less dystopian outcome:
|
| - teachers will assign fewer mind-numbing essay homework
| assignments and focus more on oral interviews.
| screye wrote:
| Hilariously, this has already happened with music composition.
| Especially drumming.
|
| Since the advent of drum machines, a lot of younger players
| have started playing with the sort of precision that drum
| machines enable. eg: The complete absence of swing, and clean
| high-tempo blasts/rides.
|
| So you'd get accusations of drummers not being able to play
| their own songs, because traditional drummers think such
| technically complex and 'soulless' performances couldn't
| possibly be human. Only to then be proven wrong, when it turns
| out that younger players can in fact do it.
|
| The machine conditions man.
| TheRealPomax wrote:
| So, status quo then? This is already the case for educational
| software that's used to detect plagiarism. People get wrongly
| flagged, and then you'll have to plead your case.
|
| But the times software like this finds actual problems vastly
| outnumber the times it doesn't, and when your choice is between
| "passing kids/undergrads who cheat the system" and "the
| occasional arbitration", you go with the latter. Schools don't
| pay teachers anywhere _near_ enough to not use these tools.
| michaericalribo wrote:
| Given the published true and false positive rates, it's clear
| that the true positives do not "vastly outnumber" false
| positives.
| PeterisP wrote:
| Currently the false positive rate is _far_ lower. E.g. I get
| 500-ish submissions over a school year, so a 1% false positive
| rate would mean I'd falsely accuse 5 innocent students annually,
| which isn't acceptable at all - and a 9% FP rate is _so_ high
| that it's not even worth investigating; do you know of any grader
| who has the spare time to begin formal proceedings/extra
| reviews/investigations for 9% of their homework?
|
| For plagiarism suspicions, at least, the verification is simple
| and _quick_ (just take a look at the identified likely source;
| you can get a reasonable impression in minutes) - I can't even
| imagine what work would be required to properly verify ones
| flagged by this classifier.
| TheRealPomax wrote:
| > I can't even imagine what work would be required to
| properly verify ones flagged by this classifier.
|
| Yet.
| flatline wrote:
| At the same time the classifier is improving, the generative
| models are improving. It's a classic arms race, and this
| equilibrium is not likely to shift much either way. We are
| talking about models that approximate human behavior with a high
| degree of accuracy; I think the goal would be to make them
| indistinguishable in any meaningful way.
| notahacker wrote:
| > This is already the case for educational software that's
| used to detect plagiarism. People get wrongly flagged, and
| then you'll have to plead your case.
|
| How often is that the case, though? It's been a while since I've
| had to worry about it, but I thought plagiarism detection
| generally worked on the principle of looking for the majority of
| the content being literal matches with existing material out
| there, with only a few small edits - which, unlike using some
| "AIish" turns of phrase that a bot wrongly attributes to AI 9%
| of the time (and correctly attributes to AI with a not much
| better success rate), is pretty hard to do accidentally.
| i_have_an_idea wrote:
| A long time ago, when I was a student, I would run my papers
| through Turnitin before submitting. The tool would sometimes mark
| my (completely original) work as high as mid-20% similarity.
|
| As a result, I took out quotes and citations to appease it and
| avoid the hassle.
|
| I expect modern-day students will resort to similar measures.
| notahacker wrote:
| IIRC the marker got the same visualization you did, which
| highlighted that the similar bits were in fact quotes and
| citations!
|
| Maybe high school is a different matter, but I'm pretty sure
| even the most technophobic academic knows that jargon, terse
| definitions, and the odd citation overlapping with stuff other
| people have written will make a similarity of at least 10%
| pretty much inevitable - especially when the purpose of the
| exercise is to show you understand the core material well enough
| to cite, paraphrase, and compare it, not to generate novel
| academic insight or show you understood the field so well you
| didn't need to refer back to the source material. The people
| they were actually after were the ones who downloaded something
| off an essay bank, removed a couple of paragraphs, rewrote the
| intro to match the given title, and ended up with 80%+
| similarity.
| ren_engineer wrote:
| > false positive rate of 9%
|
| Bringing Roman decimation to the classroom via AI - this is the
| future.
| kmkemp wrote:
| Any solution here is just an arms race. The better AIs get at
| generating text, the more impossible the job of identifying
| whether an AI was responsible for writing a given text sample.
| e_i_pi_2 wrote:
| You could even just set up a GAN to make the AI better at not
| being detected as something written by an AI. I don't see a good
| general solution to this, but I also see it as a non-issue - if
| students have better tools, they should be able to use them,
| just like a calculator on a test. Calculators are allowed on
| tests because you still need to understand the concepts to put
| them to use.
| tshaddox wrote:
| It's almost as if you need to give exams in person and watch
| the students if you don't want them to cheat. This is
| fundamentally no different than cheating by writing notes on
| your hand in an exam or paying someone to write a take-home
| essay for you. It's cheaper than the latter, but that just
| means the lazy curriculum finally needs to be updated.
| dougmwne wrote:
| The cheating students who know how to use the classifier will
| be the big winners.
| cjbgkagh wrote:
| > false positive rate of 9%
|
| Yeah, that is useless. You couldn't punish based on that alone
| and students will quickly figure out to never confess.
| sometimeshuman wrote:
| Sorry for the tangent, but a surprising number of the general
| public doesn't know the meaning of percent [1]. So even if a
| teacher is told those percentages, many wouldn't know what to
| conclude.
|
| [1] Me, giving young adults who worked for me a commission rate,
| then asking: if your commission rate is 15% and you sell $100 of
| goods, what is your payment? Many failed to provide an answer.
| LarryMullins wrote:
| > _2. Non-technical users (teachers) treat this like a 100%
| certainty_
|
| This is the part that needs to be addressed the most. Teachers
| can't offload their critical reasoning to the computer. They
| should ask their students to write things in class and get a
| feeling for what those individual students are capable of. Then
| those that turn in essays written at 10x their normal writing
| level will be obvious, without the use of any automated cheat
| detectors.
|
| I was once accused of cheating by a computer; my friend and I
| both turned in assignments that used do-while loops, which the
| computer thought was so statistically unlikely that we surely
| must have worked together on the assignment. But the
| explanation was straightforward; I had been evangelizing the
| aesthetic virtue of do-while loops to anybody that would listen
| to me, and my friend had been persuaded. Thankfully the
| professor understood this once he compared the two submissions
| himself and realized we didn't even use the do-while loop in
| the same part of the program. There was almost no similarity
| between the two submissions besides the statistically unlikely
| but completely innocuous use of do-while loops. It's a good
| thing my professor used common sense instead of blindly
| trusting the computer.
| londons_explore wrote:
| > blindly trusting the computer.
|
| Professors blindly trust the computer not out of laziness,
| but to protect themselves from accusations of unfairness...
|
| "The work was detected as plagiarism, but the professor
| overrode it for the pretty girl in class, but not for me"
| mitchdoogle wrote:
| Seems like something like this should only be used as a
| first-level filter. If the writing doesn't pass, it warrants more
| investigation. If no proof of plagiarism is found, then there's
| nothing else to do, and the professor must pass the student.
| TchoBeer wrote:
| with a 26% true positive rate that seems flawed.
| asah wrote:
| seems like this is the future... 1. first day of class, write
| a N word essay and sign a release permitting this to be used
| to detect cheating. The essay topic is chosen at random.
|
| 2. digitize & feed to a learning model, which can then detect
| when YOU are cheating.
|
| upside: this also helps detect students who are getting help
| (e.g. parents)
|
| downside: arms race as students feed their cheat-essays
| (memorize their essays?) into AI-detection models that are
| similarly trained.
| feanaro wrote:
| There are also some countries that don't fetishize cheating
| this much so perhaps they will just continue not caring.
| kaibee wrote:
| The funniest implication here is that the student's writing
| skill isn't expected to improve.
| eh9 wrote:
| I was just asking my partner, who's a writer, if it would even
| be fair to train a model based on a student in _N_th grade if
| the whole point is to measure growth. Would there be enough
| "stylistic tokens" developed in a young person's writing style?
| AlexAndScripts wrote:
| Surely you could continuously add data about their latest
| essays to the model, meaning any gradual improvements
| would be factored in?
| ask_b123 wrote:
| Personally, I feel mildly embarrassed when reading my
| essays from years prior. And I probably still count as a
| 'young person'.
|
| That said, there's no need to consider changes in years
| when stylistic choices can change from one day to another
| depending on one's mood, recent thoughts, relationship
| with the teacher, etc.
|
| That's why I've always been a little confused about how
| some (philologists?) treat certain ancient texts as not
| being written by some authors due to the text's style, as
| if ancient people could not significantly deviate from
| their usual style.
| Aransentin wrote:
| > first day of class, write a N word essay
|
| Initially I thought you meant having the student write an
| essay about slurs, as the AI will refuse to output anything
| like that. Then I realized you meant "N" as in "Number of
| words".
|
| Still, that first idea might actually work; make the
| students write about hotwiring cars or something that's
| controversial enough for the AI to ban it but not
| controversial enough that anybody will actually care.
| JumpCrisscross wrote:
| > _first day of class, write a N word essay and sign a
| release permitting this to be used to detect cheating_
|
| Why once? Most students need writing skills more than they need
| half of the high-school curriculum.
| TheDudeMan wrote:
| You are asking teachers to be good at their job. But is
| teaching a merit-based profession?
| busyant wrote:
| I asked chatgpt to write an essay as if it were written by a
| mediocre 10th grader. It did a reasonably good job. It threw
| in a little bit of slang and wasn't particularly formal.
|
| Edit: I sometimes tell my students, "if you're going to cheat,
| don't give yourself a perfect score, especially if you've failed
| the first exam. It sets off alarm bells."
|
| But the students who struggle usually can't calibrate a non-
| suspicious performance.
|
| I guess the same applies here.
| Baeocystin wrote:
| You've touched upon a central issue that is not often
| addressed in these conversations. People who have
| difficulty comprehending and composing essays also struggle
| to work with repeated prompts in AI systems like ChatGPT to
| reach a solution. I've found in practice that when showing
| someone how prompting works, their understanding either
| clicks instantly, or they fail to grasp it at all. There
| appears to be very little in between.
| geph2021 wrote:
| > ask their students to write things in class and get a feeling
| > for what those individual students are capable of. Then those
| > that turn in essays written at 10x their normal writing level
| > will be obvious
|
| I think that's a flawed approach. Plenty of people simply
| don't perform or think well under imposed time-limited
| situations. I believe I can write close to 10x better with
| 10x the time. To be clear, I don't mean writing more, or a
| longer essay, given more time. Personally, the hardest part
| of writing is distilling your thoughts down to the most
| succinct, cogent and engaging text.
| deepspace wrote:
| > Plenty of people simply don't perform or think well under
| imposed time-limited situations
|
| From first-hand experience, the difference between poor
| stress-related performance and a total lack of knowledge is
| night and day.
|
| I have personally witnessed students who could not speak or
| understand the simplest English, and were unable to come up
| with two coherent sentences in a classroom situation, but
| turned in graduate level essays. The difference is
| blindingly obvious.
| giovannibonetti wrote:
| > I have personally witnessed students who could not
| speak or understand the simplest English, and were unable
| to come up with two coherent sentences in a classroom
| situation, but turned in graduate level essays. The
| difference is blindingly obvious.
|
| Maybe someone helped them with their homework?
| remexre wrote:
| Unless their in-class performance increases as well,
| isn't that help "probably cheating"? (That's the "moral
| benchmark" I'd use, at least; if your collaboration
| resulted in you genuinely learning the material, it's
| probably not cheating.)
| runarberg wrote:
| The point is for the teacher to get a sense of the student's
| style and capabilities. Even if your home essay is 10x better and
| 10x more concise than your in-class work, a good teacher who
| knows you--unlike an inference model--will be able to extrapolate
| and spot commonalities. A good teacher (who isn't overworked)
| will also talk to students and get a sense of their style and
| capabilities that way; this allows them to extrapolate even
| better than a computer could ever hope to.
| zopa wrote:
| Sure, but what about all the students with mediocre
| and/or overworked teachers? If our plan assumes the best-
| case scenario, we're going to have problems.
| runarberg wrote:
| Honestly, if we can't have nice things and we keep skimping on
| education, I'd rather we just accept the fact that some students
| will cheat than introduce another subpar technical solution to a
| societal problem.
| runarberg wrote:
| So the computer's evaluation model assumed that each
| student's learning is independent? That seems like a
| ludicrous assumption to put in a model like this, unless the
| model authors have never been in a class setting (which I
| doubt).
| munificent wrote:
| I think you're misunderstanding the primary purpose of
| essays.
|
| Teachers don't have the time to do deep critical reasoning
| about each student's essay. An essay is only partially an
| evaluation tool.
|
| The primary purpose of an essay is that the act of writing an
| essay teaches the student critical reasoning and structured
| thought. Essays would be an effective tool even if they
| weren't graded at all. Just writing them is most of the
| value. A big part of the reason they're graded at all is just
| to force students to actually write them.
|
| The main problem with AI generated essays isn't that teachers
| will lose out on the ability to evaluate their students. It's
| that students won't do the work and learn the skills they get
| from doing the work itself.
|
| It's like building a robot to do push ups for you. Not only
| does the teacher no longer know how many push ups you can do,
| you're no longer exercising your muscles.
| YeGoblynQueenne wrote:
| >> The primary purpose of an essay is that the act of
| writing an essay teaches the student critical reasoning and
| structured thought. Essays would be an effective tool even
| if they weren't graded at all. Just writing them is most of
| the value. A big part of the reason they're graded at all
| is just to force students to actually write them.
|
| That's our problem, I think. Education keeps failing to
| convince students of the need to be educated.
| [deleted]
| thelock85 wrote:
| For this exact reason, I feel like education systems and
| curriculum providers (teachers are just point of contact
| from a requirements perspective) should develop much more
| complex essay prompts and invite students to use AI tools
| in crafting their responses.
|
| Then it's less about the predetermined structure (5
| paragraphs) and limited set of acceptable reasoning
| (whatever is on the rubric), and more about using creative
| and critical thinking to form novel and interesting
| perspectives.
|
| I feel like this is what a lot of universities and
| companies currently claim they want from HS and college
| grads.
| desro wrote:
| This is what I'm doing as an instructor at some local
| colleges. A lot of the students are completely unaware of
| these tools, and I really want to make sure they have
| some sense of how things are changing (inasmuch as any of
| us can tell...)
|
| So I invite them to use chatGPT or whatever they like to
| help generate ideas, think things out, or learn more. The
| caveat is that they have to submit their chat transcript
| along with the final product; they have to show their
| work.
|
| I don't teach any high-stakes courses, so this won't work
| for everyone. But educators are deluded if they think
| anyone is served by pretending that (A) this
| doesn't/shouldn't exist, and that (B) this and its
| successors are going away.
|
| All of this stuff is going to change so much. It _might_
| be a bigger deal than the Internet. Time will tell.
| nonrandomstring wrote:
| A more likely outcome is that teachers will pay the price [1].
|
| [1] https://www.timeshighereducation.com/opinion/ai-will-
| replace...
|
| (turn off js to jump signup-wall)
| ibejoeb wrote:
| I think there is a more dystopian near future:
|
| 1. There will be commercial products to tune per-student
| writing models.
|
| 2. Those models will be used to evaluate progress and
| contribute directly to scores, grades, and rankings. They may
| also serve to detect collaboration.
|
| 3. The models persist indefinitely and will be sold to industry
| for all sorts of purposes, like hiring.
|
| 4. They will certainly be sold to the state for law enforcement
| and identity cataloging.
| e_i_pi_2 wrote:
| I can't remember the keyword to look it up, but there's a
| problem in statistics you run into with stuff like terrorism
| detection algorithms.
|
| If we have 300M people in the US and only 1k terrorists, then
| you need 99.9999% accuracy before you start getting more true
| positives than false positives. If you use this in a classroom
| where no one is actually using AI, you'll get only false
| positives, and in a class where usage is average, you'll still
| get more false positives than true ones - which makes the test
| do more harm than good unless it's just a reason to look into
| things more. And the teacher is presumably already reading the
| text, so if that doesn't help, then this surely won't.
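|
| The arithmetic, roughly (the 300M/1k figures are the assumed
| numbers above):
|
|     population, terrorists = 300_000_000, 1_000
|     accuracy = 0.999999  # per-person accuracy of the detector
|
|     true_pos = terrorists * accuracy                        # ~1,000
|     false_pos = (population - terrorists) * (1 - accuracy)  # ~300
|     print(true_pos, false_pos)
|
| Even at that accuracy, flags are only ~3:1 true to false; one
| decimal place fewer and the false positives dominate.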
| xmddmx wrote:
| It's the False Positive Paradox:
| https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positi...
| mitchdoogle wrote:
| 4. Parents sue schools
| 5. Admins eliminate all writing requirements
| kilgnad wrote:
| This isn't that dystopian. The dystopian outcome is when there's
| a classifier that rates the quality of text, and that classifier
| becomes indistinguishable from the AI-detection classifier,
| because AI-generated text is beginning to be superior to
| human-generated text.
| thewataccount wrote:
| Hopefully they just flag relevant sections. Essay/Plagiarism
| checkers already exist, although in my experience professors
| were reasonable.
|
| For example I had a paragraph or two get flagged as being very
| similar to another paper - but both papers were about a fairly
| niche topic (involving therapy animals) and we had both used
| the relevant quotes from the study conclusions from one of only
| a few decent sources at the time - so of course they were going
| to be very similar.
|
| Given that most essays are about roughly the same set of topics,
| and there are literally hundreds of thousands of students writing
| these, I wonder how many variations are even possible for humans
| to write, as I would expect us to converge on similar essays.
| michaericalribo wrote:
| Plagiarism is easier to verify, because you can directly
| compare with the plagiarized source material
| thewataccount wrote:
| Absolutely. I think it may have to end up more as a statistics
| thing with behaviour. For example:
|
| "Tom had a single paragraph flagged as possibly generated" vs
| "Every single paper Tom writes has paragraphs flagged"
|
| Basically we might have to move to treating statistical outliers
| as cheating. Now whether the tools/teachers will understand that
| and actually do it - we can only hope...
| amelius wrote:
| Solution: just write your texts with a bit less confidence than
| GPT-3 would.
| Verdex wrote:
| I wonder if I should help my kids setup a server + webcam +
| screen capture tool so they can document 100% of their essay
| writing experience. That way if they ever get hit with a false
| positive they can just respond with hundreds of hours of video
| evidence that shows them as the unique author of every essay
| they've ever written.
| anotherjesse wrote:
| You will certainly have a lot of training video to create an
| "essay writing video generator" ML product.
| causalmodels wrote:
| You could always teach them how to use git and have them
| commit frequently. Seems like it would be less intrusive than
| a webcam.
| Verdex wrote:
| Source control would certainly help establish a history of
| incrementally performed school work by _someone_, when viewed by
| a highly technical examiner, and when periodically stored
| someplace where a trusted 3rd party can confirm it wasn't all
| generated the night after a supposed false positive.
|
| However, hundreds of hours of video is compelling to non-
| technical audiences and even more importantly is a
| preponderance of evidence that's going to be particularly
| damning if played in front of a PTA meeting.
|
| With a git history it's going to come down to who can spin
| the better story. The video is the story and everyone
| recognizes it, so I expect fewer people would bother even
| challenging its authenticity.
| causalmodels wrote:
| I guess that's fair. I just personally don't think the
| additional gain is worth taking away your child's
| privacy.
| Verdex wrote:
| It's only taking away their privacy if they're falsely
| accused.
|
| And properly used you might not even have to relinquish
| privacy if falsely accused. A quick montage video demo
| and a promise to show the full hundreds of hours of video
| of "irrefutable" proof to embarrass the school district
| at the next PTA meeting might be sufficient to get the
| appropriate response.
| tshaddox wrote:
| You could still cheat quite easily and inexpensively with an
| earpiece, as long as you know how to write down what you
| hear.
| Verdex wrote:
| It's about building a narrative. Yeah, you could still
| cheat, but who would go through the effort of generating
| hundreds of hours of fake videos proving yourself innocent.
| For that amount of effort you might as well have done the
| work yourself.
|
| Of course there are some people who put insane amounts of
| effort into not doing "real" work. However, anyone trying
| to prove that your child is in that position is going to
| find themselves in an uphill battle.
|
| Which is the ultimate goal here. Make people realize that
| falsely accusing my children using dubious technology is
| going to be a lot more work than just giving up and leaving
| them alone.
| claytonjy wrote:
| Is there a longer-form paper on this yet? TPR (P(T|AI)) and FPR
| (P(T|H)) are useful, but what I really want is the probability
| that a piece flagged as AI-generated is indeed AI-generated,
| i.e. P(AI|T). Per Bayes' rule I'm missing P(AI), the portion of
| the challenge set that was produced by AI.
|
| If we assume the challenge set is evenly split 50-50, that means
| P(AI|T) = P(T|AI)P(AI)/P(T) =
| (0.26)(0.5)/((0.26)(0.5) + (0.09)(0.5)) ~ 74%
|
| So roughly a 3-in-4 chance of the flagged text actually being
| AI-generated at even odds - and much less at realistic base
| rates, where most text is human-written.
|
| They say the web app uses a confidence threshold to keep the FPR
| low, so maybe these numbers get a bit better, but this is very
| far from being usable as a detector anywhere it matters.
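|
| A quick sketch to see how the posterior moves with the base rate
| (the 26%/9% rates are from the announcement; the priors are my
| assumptions):
|
|     def posterior(tpr=0.26, fpr=0.09, p_ai=0.5):
|         # P(AI | flagged) via Bayes' rule
|         p_flag = tpr * p_ai + fpr * (1 - p_ai)
|         return tpr * p_ai / p_flag
|
|     for p_ai in (0.5, 0.25, 0.1):
|         print(p_ai, round(posterior(p_ai=p_ai), 3))
|     # 0.5 -> 0.743, 0.25 -> 0.491, 0.1 -> 0.243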
| TchoBeer wrote:
| >Per Bayes rule I'm missing P(AI), the portion of the
| challenger set that was produced by AI
|
| This will obviously depend on your circumstances.
| adamsmith143 wrote:
| Or we realize that essays aren't that important and technical
| skills will become more highly valued. Either way, ChatGPT
| can't do your exams for you so the truth will come out anyway.
| mitchdoogle wrote:
| Writing is very important for understanding a topic and for
| long-term recall. I still remember topics from papers I wrote 15
| years ago because I spent tens of hours researching, writing,
| and forming ideas about each topic.
|
| Instead of being overzealous about catching cheaters,
| teachers should learn to express the importance of writing
| and why it is done. Convince the students that they should do
| it to be a smarter person, not just to get a grade, and they
| will care more about doing it honestly.
| flandish wrote:
| In the same way deepfake video should not be allowed as
| evidence, thereby ensuring _no_ video is allowed... we can
| apply that to text as well.
|
| We're entering an uncanny valley before a period of "reset",
| with self-taught (to stay on subject here) people re-learning
| for the sake of learning.
|
| In 30 years we will be in an educational renaissance of people
| learning "like the old masters did in the 1900's."
| EGreg wrote:
| Nah. In 30 years it will be as useless to learn most subjects as
| it is right now to learn crocheting and knitting, or times
| tables, or using an abacus.
|
| People are wayyyy too optimistic, just like in the 1900s they
| thought people would have flying cars but not the Internet,
| or how Star Trek's android Data is so limited and lame.
|
| Bots will be doing most of the work AND have the best lines
| to say, AND make the best arguments in court etc.
|
| You don't even need to look to AI for that. The best
| algorithms are simply uploaded to all the bots and they are
| able to do 800 things, in superhuman ways, and have access to
| the internet for whatever extra info they need.
|
| When they swarm, they'll easily outcompete any group of
| humans. For example they can enter this HN thread and
| overwhelm it with arguments.
|
| No, the old masters were _needed_. Studying will not be. The
| Eloi and the Morlocks are closer to what we can expect.
| flandish wrote:
| As someone who's known how to crochet and knit since he was 6...
| I disagree.
| tokai wrote:
| Apparently knitwear is forecast to have a CAGR of 12% for the
| rest of the decade, with hand-knitted garments commanding the
| highest prices. It's definitely not the worst cottage industry
| one can choose.
| la64710 wrote:
| Exactly. IMHO it is irresponsible to release such a classifier
| with a title that touts the desired feature and doesn't spell
| out its limitations. At least precede the title with
| "experimental" or something.
| anonobviously wrote:
| This is extremely concerning.
|
| The co-author list on this includes Professor Scott Aaronson.
| Reading his blog Shtetl-Optimized and his
| [sad/unfortunate/debatable/correct?/factual?/biased?] views on
| adverse/collateral harm to Palestinian civilians makes me
| question whether this model would fully consider collateral
| damage and harm to innocent civilians, whomever that subgroup
| might be. What if his model works well except for some minority
| groups' languages, which might reflect OpenAI-speak? Does it
| matter if the model is 99.9% accurate if the 0.1% is always one
| particular minority group with a specific dialect or phrasing
| style? Who monitors it? Who guards these guards?
| jameshart wrote:
| We can't release the essay writing language model. Lazy
| children will use it to write their essays for them!
|
| We can't release the ai-generated text detection model. Lazy
| teachers will use it to falsely accuse children of cheating!
|
| The problem here appears to be _lazy people_.
|
| Can we train an AI to detect lazy people? I promise not to
| lazily rely on it without thinking.
| jupp0r wrote:
| This is worse than useless if you take the base rate fallacy
| into account.
| optimalsolver wrote:
| https://en.wikipedia.org/wiki/Red_Queen's_race
| dxbydt wrote:
| There was a merchant who said - Buy my sword! It will pierce
| through any shield !!
|
| So the gullible people bought the swords and soon the merchant
| ran out of swords to sell.
|
| So the merchant said - Buy my shield! They can defend against any
| sword !!
|
| Once again the gullible people rushed to buy the shields.
|
| But one curious onlooker asked - what happens when your sword
| meets your shield?
| causalmodels wrote:
| My younger brother and I both have fairly severe dyslexia. He's
| been applying to school and has been using ChatGPT to help him
| correct spelling and grammar mistakes rather than going to a
| person for help. It has been fairly incredible for him.
|
| I wonder if this tool would start flagging his work even though
| he is only using it as a fancy spell checker.
| meetingthrower wrote:
| Lol, just tried it against several 500-word chunks of text I had
| the old GPT-3 write for me, and it classified them as "unlikely
| AI written." Maybe because I had very specific prompts which
| included a lot of actual facts...?
| barbazoo wrote:
| > The classifier is very unreliable on short texts (below 1,000
| characters). Even longer texts are sometimes incorrectly
| labeled by the classifier.
| neonate wrote:
| How does https://openai-openai-detector.hf.space/ do on them?
| meetingthrower wrote:
| 99.4% real!!!
| groestl wrote:
| My new hobby, based on the responses I read from ChatGPT, is to
| get a "likely written by AI" rating from these classifiers.
|
| "However, this is just one example of a humorous summary and it
| is important to note that..." and so on and so on
| andrewmutz wrote:
| OpenAI should release a classifier that detects _their own_
| AI-generated text. They could do this easily by using
| steganography to hide some information in all the text they
| generate, and then building the classifier to look for it.
|
| Sure, it's less useful than a classifier that can detect any
| AI-generated text, but in the short term it would be a nice tool
| for contexts where AI-generated text can be abused (like the
| classroom).
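|
| A toy version of the detection side - a keyed "green list" over
| token bigrams, in the spirit of the sampling-bias watermarks
| that have been discussed publicly; every detail here is invented
| for illustration:
|
|     import hashlib
|
|     def is_green(prev: str, tok: str, key: str = "secret") -> bool:
|         # Pseudorandomly assign ~half of all (prev, tok) pairs
|         # to a keyed "green list" only the vendor can recompute.
|         h = hashlib.sha256(f"{key}|{prev}|{tok}".encode())
|         return h.digest()[0] % 2 == 0
|
|     def green_fraction(tokens):
|         # Watermarked generation upweights green tokens, so its
|         # text skews well above the ~0.5 human text would show.
|         hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
|         return hits / max(1, len(tokens) - 1)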
| ineedtocall wrote:
| Or they could just save/hash results and get rid of the
| classifier altogether.
| rcme wrote:
| Yea, they could provide a fingerprinting algorithm and a
| database of every fingerprint they've generated. However, it
| wouldn't help you identify false positives.
| sebzim4500 wrote:
| Scott Aaronson talks about something like that being done at
| OpenAI in this post
|
| https://scottaaronson.blog/?p=6823
| m3affan wrote:
| There is work on hidden signatures in generated text, invisible
| to humans. It's the only way to move forward.
| bnug wrote:
| I'd think people would migrate to just re-typing whatever was
| generated and changing some wording along the way to prevent
| detection.
| thewataccount wrote:
| The problem with this will be the method to detect the
| signature would reveal how to hide the signature though
| right?
|
| Obviously not an issue if everyone uses a single API for it -
| but if this ends up like Stable Diffusion were anyone can run
| it locally then I don't think it's possible no?
| brink wrote:
| I miss the 90's and the early 00's. Take me away from this AI
| hell.
| [deleted]
| shagie wrote:
| Musicians Wage War Against Evil Robots -
| https://www.smithsonianmag.com/history/musicians-wage-war-ag...
|
| From the March, 1931 issue of Modern Mechanix magazine:
|
| > The time is coming fast when the only living thing around a
| motion picture house will be the person who sells you your
| ticket. Everything else will be mechanical. Canned drama,
| canned music, canned vaudeville. We think the public will tire
| of mechanical music and will want the real thing. We are not
| against scientific development of any kind, but it must not
| come at the expense of art. We are not opposing industrial
| progress. We are not even opposing mechanical music except
| where it is used as a profiteering instrument for artistic
| debasement.
| sekai wrote:
| > Take me away from this AI hell
|
| People used to say that about electricity too, and cars, and
| planes, and computers. This is just the next step in the chain.
| tgv wrote:
| So your message is: bend over?
| GMoromisato wrote:
| There are only two choices:
|
| 1. Try to stop the world from changing.
|
| 2. Adapt to the changes (which requires changing the
| world). E.g., the dangers of electricity led to electrical
| codes and licensing for electricians.
| [deleted]
| anshumankmr wrote:
| What I would love to see in GPT-3 is some sort of confidence
| score that it could return, i.e. how sure the model is that what
| it returned is accurate and not gibberish. Could this classifier
| help with that? I am working on a requirement where we are using
| Elasticsearch to map a query to an article in a knowledge base,
| and then the plan is to send it to GPT-3 to help summarize the
| article.
|
| Since the Elasticsearch integration is still WIP, I made a POC
| that scrapes the knowledge base (with mixed results; much of the
| content is poorly organized, so the scraped content that would
| act as the prompt to the GPT-3 model wasn't all that good
| either) and then feeds it to GPT-3, but it couldn't always give
| the most accurate answers. The answers were sometimes spot on,
| or quite good, but other times not so much. I would say about
| 30% of the time it made sense. So I'd like a way to tell whether
| an answer is sensible or not, so that we can give an error
| response when GPT-3's response doesn't make sense.
|
| The reason we are doing this is that the client has a huge
| knowledge base, and mapping each question to an answer would be
| difficult for them.
| ilaksh wrote:
| OpenAI's text completion has an option to return the "log
| probability" of each token. That might apply. You can also turn
| down the temperature parameter, which reduces hallucinations to
| some degree.
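|
| For what it's worth, a minimal sketch of pulling those per-token
| log probabilities out of the completions API (the model name,
| prompt, and the averaging heuristic at the end are my own
| assumptions, not an official confidence score):
|
|     import openai  # openai-python 0.x, early 2023
|
|     openai.api_key = "sk-..."  # placeholder
|     resp = openai.Completion.create(
|         model="text-davinci-003",
|         prompt="Summarize this article: ...",
|         max_tokens=64,
|         temperature=0.2,  # lower temperature, fewer hallucinations
|         logprobs=1,       # include log probabilities per token
|     )
|     lps = resp["choices"][0]["logprobs"]["token_logprobs"]
|     lps = [lp for lp in lps if lp is not None]
|     # Crude confidence proxy: average per-token log probability.
|     avg_logprob = sum(lps) / len(lps)
|
| A very negative average would suggest the model was unsure of
| its own output, which you could map to your error response.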
| antirez wrote:
| Totally useless given that it's really inaccurate, and actively
| dangerous: in the case of false positives, people will be deemed
| not to have produced things they actually produced:
|
| https://twitter.com/antirez/status/1620494358947717120
| kypro wrote:
| The existence of this tool might actually do more damage if
| people use it with any level of confidence to check text content
| as important as exams. I understand why they felt the need to
| release something, but I think it would be better if this didn't
| exist.
|
| My guess is that it's very easily gamed. Something ChatGPT is
| very good at is producing text content in different styles, so
| if you're a student and you run your text through an AI detector
| you can always ask ChatGPT to rewrite it in a style which is
| more likely to pass detection.
|
| Finally, I wouldn't be surprised if this detector is mostly just
| detecting grammatical and spelling mistakes. It's obvious I'm a
| human given how awful I am at writing, but I wouldn't be
| surprised if a good writer who uses very good grammar, has good
| sentence structure, and whose writing looks a little bit too
| "perfect" might end up triggering the detector more often.
| tinglymintyfrsh wrote:
| Meta blocked access to staff from using DALLE2 or ChatGPT from
| their work Google accounts.
|
| Another possibility is AI generative "steganography", where
| rules exist to insert hidden meaning or hidden data.
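|
| For illustration, one crude non-generative version of hiding
| data in text is zero-width-character steganography; a minimal
| sketch (the bit encoding is my own toy convention):
|
|     # Zero-width characters render as nothing to a human reader.
|     ZWNJ, ZWJ = "\u200c", "\u200d"
|
|     def embed(text, bits):
|         # Append one invisible character per payload bit.
|         return text + "".join(ZWJ if b == "1" else ZWNJ for b in bits)
|
|     def extract(text):
|         # Recover the payload by scanning for the two characters.
|         return "".join("1" if c == ZWJ else "0"
|                        for c in text if c in (ZWNJ, ZWJ))
|
| A truly generative scheme would bias word choice itself, which
| would survive sanitizers that strip unusual Unicode.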
| shagie wrote:
| > Meta blocked access to staff from using DALLE2 or ChatGPT
| from their work Google accounts.
|
| I'm trying to reason this one out.
|
| Does Meta have work Google accounts? How would Meta block
| someone from using a work account to auth to some other
| service?
|
| Are people working at Meta signing into OpenAI with a Google
| account?
|
| (seriously, if it isn't work related - don't use a work
| account)
|
| Is Meta concerned about people uploading their code (or
| downloading code) from ChatGPT? What is their policy on
| Copilot?
|
| Why are Meta people using DALLE while at work?
| rafaelero wrote:
| A 26% true positive rate and a 9% false positive rate are just
| terrible. I don't see how this can be usable.
| yboris wrote:
| Quote:
|
| > In our evaluations on a "challenge set" of English texts
|
| I wonder if they mean "challenge" in the sense that these are
| some of the hardest-to-discern passages. Meaning that with
| average human writing / average type of text, the % is better.
| I'm unsure.
| [deleted]
| PUSH_AX wrote:
| Might be better to store outputs and implement a way to detect
| them within a larger piece of text. Think of it like a reverse
| Shazam, but for text.
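|
| A toy sketch of that idea - shingle hashing, as used in
| plagiarism detection. The 8-word shingle size and the overlap
| threshold are arbitrary assumptions:
|
|     import hashlib
|
|     def fingerprints(text, n=8):
|         # Hash every n-word window ("shingle") of the text.
|         words = text.lower().split()
|         return {hashlib.sha256(" ".join(words[i:i + n]).encode())
|                 .hexdigest()
|                 for i in range(len(words) - n + 1)}
|
|     index = set()  # filled with shingles of every model output
|
|     def likely_contains_output(doc, threshold=0.5):
|         # Flag a document if enough of its shingles match the index.
|         fp = fingerprints(doc)
|         return len(fp & index) / max(len(fp), 1) >= threshold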
| blueberrychpstx wrote:
| Doesn't this get us into a sort of perpetual motion machine with
| the back and forth being
|
| 1) generate paragraph of my essay
|
| 2) feed it into this classifier
|
| 3a) if AI -> make it sound more human
|
| 3b) if human -> $$$ Profit?
|
| Obviously it could be more fine-grained than this, and it's in
| general good to know, but I just love watching this game play
| out of... errr, how do we manage the fact that humans are
| becoming relatively less and less creative compared to their
| counterparts?
| dakiol wrote:
| The thing is, point 1 costs money (I imagine at some point
| ChatGPT will cost money), but point 2 will also cost money. So
| OpenAI will charge you double to generate AI-written text that
| is undetectable. A poor move. I could happily pay a lot for
| ChatGPT, but if they also commercialize a (more accurate)
| classifier then I won't use ChatGPT at all.
| bluefone wrote:
| What's the objective of tests in education? To test the human
| biological abilities of memorizing and analyzing? But in real
| life, these abilities are always augmented by tech. It is like
| testing your prospective employees on their running skills,
| though they don't really need to run to commute.
| karaterobot wrote:
| Given a set of determinations about whether a given source text
| was written by an AI or not, how do we know whether those
| classifications were made by an AI or a human? We need to train
| an AI classifier classifier, pronto!
| happytiger wrote:
| 9% false positives? That's a troubling level of falsies.
|
| The implications of using this tool are fun to think about
| though.
|
| If it had a very low level of false positives, but wasn't very
| good at identifying AI text, it would still be very useful.
|
| But false positive rates above very, very low levels will
| undermine any tool in this category.
| klabb3 wrote:
| Yeah, it's useless currently and will quickly become more
| useless, because people will scramble AI-generated text, mix in
| human edits, and people who use AI generators a lot will come to
| mimic their writing style. In short, the SNR will be abysmal
| outside of controlled environments.
|
| I'm pretty sure the smart people at OpenAI know this. I think
| this is a PR move signaling that they are "doing something",
| looking concerned, yet insisting that everything is under
| control. In reality, nobody can predict the societal rift that
| this will cause, so this corporate playbook messaging is
| dishonest in spirit and muddies the waters. This is bad, both
| for long-term trust in OpenAI and because muddied waters make it
| harder to have fruitful discussions about safeguards in
| commercial deployments of this tech.
|
| That said, they're incorrectly getting blamed for controlling
| the _use_ of this tech; they're no more than a prolific and
| representative champion of it. The cat is out of the bag, and
| they absolutely cannot stop this train, so they shouldn't be
| blamed for not trying.
| IncRnd wrote:
| I think the main takeaway from this is that both the AI
| classifier and the AI output come from the same company,
| openai.com.
|
| That likely explains the extremely low accuracy of the AI
| classifier. This is user training for ChatGPT output to be
| accepted as human-authored text.
| maxehmookau wrote:
| 26% seems awfully low for a tool of this importance. Granted they
| are upfront about it, but still, it doesn't seem immediately
| useful to release it to the public.
| nothrowaways wrote:
| SHA is all you need, lol.
|
| SHA every generated text and feed it into a Bloom filter... I
| guarantee much better than 27%.
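|
| A minimal sketch of what that might look like; the filter size
| and hash count are arbitrary assumptions, and note it only
| matches verbatim text:
|
|     import hashlib
|
|     class BloomFilter:
|         def __init__(self, size=8 * 10**7, hashes=4):
|             self.size, self.hashes = size, hashes
|             self.bits = bytearray(size // 8 + 1)
|
|         def _positions(self, text):
|             # Derive k bit positions from salted SHA-256 digests.
|             for i in range(self.hashes):
|                 h = hashlib.sha256(f"{i}:{text}".encode()).hexdigest()
|                 yield int(h, 16) % self.size
|
|         def add(self, text):
|             for p in self._positions(text):
|                 self.bits[p // 8] |= 1 << (p % 8)
|
|         def __contains__(self, text):
|             return all(self.bits[p // 8] & (1 << (p % 8))
|                        for p in self._positions(text))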
| klabb3 wrote:
| A cryptographic hash function on stochastically generated,
| variable-length natural language? That sounds... not very
| effective.
| TDiblik wrote:
| I mean, it depends. Is the number of possible answers per prompt
| known? If so, can we even realistically calculate the number of
| possible prompts? AFAIK ChatGPT answers even though there is a
| grammatical mistake in your sentence; does that affect the
| answers / is that considered a new prompt? OK, let's say you
| feed all N possible answers in and make a rainbow table of
| hashes. SHA output is basically random (not really, but let's
| not go there), so after I generate my text using AI (which would
| get flagged by your detection system) and change a few
| letters/words here and there, your whole SHA rainbow table
| becomes useless - right? I could be totally wrong, but I don't
| see SHA as a way to solve this problem, because of these
| complications :/
| oldstrangers wrote:
| Funny, I helped ChatGPT write a fake scientific article the other
| day for a project I made (https://solipsismwow.com/). Its
| result: The classifier considers the text to be unlikely AI-
| generated.
| knaik94 wrote:
| I fed it some of my old HN comments, comfortably longer than the
| 1,000-character minimum, and found that 9/10 times the
| classifier marked them as "unclear". This included a comment
| from 2020.
|
| I found that just repeating a sentence a few times causes it to
| classify something as "likely". Not only is this an unwinnable
| race, I know for a fact some teachers will treat anything above
| "unlikely" as meaning AI was used. At some point in the future,
| compute will be cheap enough that a lot of online content will
| be put through a similar classifier. I am curious how
| conservative the estimates are. This was a non-technical
| comment; I wonder if a more technical comment would be even more
| likely to be misclassified.
| bhouston wrote:
| I used ChatGPT to rewrite a number of paragraphs of my own
| writing earlier today. It rewrote them completely. I just pasted
| those into this detection tool, and for both it responded "The
| classifier considers the text to be unlikely AI-generated."
|
| So it seems it cannot detect AI-rewritten/augmented text, even
| text that ChatGPT itself generates.
| mitchdoogle wrote:
| Well OpenAI admits it is wrong most of the time, so your
| results are consistent with what is expected
| m3kw9 wrote:
| I mean, the way they could do it is to save all model outputs
| and then let users input text to match against; that would
| guarantee detection. You could make changes, but it'd still
| match a high percentage. Of course, a student can test it
| himself and keep changing the text until the match falls below
| a certain threshold.
|
| Also, ignore prompts that purposefully output pre-written text,
| in case people want to mess with the system.
| netsec_burn wrote:
| Couldn't it just use conversation history, where it already
| stores the responses, and search within that?
| hkalbasi wrote:
| OpenAI knows every text that is generated by ChatGPT, so
| couldn't it run a simple search algorithm instead of an AI model
| and achieve a way higher true positive rate?
| cloudking wrote:
| Let's assume this tool works better in the future and is used in
| education. What are the next steps for a teacher after
| identifying AI-written homework?
| macksd wrote:
| _Allegedly_ identifying AI-written homework.
| antihero wrote:
| Use AI to write a letter to their parents
| amelius wrote:
| Next step is to stop worrying about it, just as they did with
| automated spelling correction.
| jupp0r wrote:
| I can now go and incorporate this detector into the retraining
| pipeline for my evil language model or put it at the end of my
| architecture to emit only human-like results (as labeled by the
| detector). I don't see how detectors can win this cat and mouse
| game.
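|
| The inference-side version is only a few lines; generate() and
| detector_score() here are hypothetical stand-ins for the evil
| model and the detector:
|
|     def generate(prompt, attempt):    # stand-in for an LLM call
|         return f"draft {attempt} for: {prompt}"
|
|     def detector_score(text):         # stand-in: returns P(AI-written)
|         return 0.5
|
|     def humanlike_generate(prompt, threshold=0.1, max_tries=10):
|         draft = ""
|         for attempt in range(max_tries):
|             draft = generate(prompt, attempt)
|             if detector_score(draft) < threshold:
|                 break  # detector says "human enough"
|         return draft   # best effort after max_tries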
| bioemerl wrote:
| Now they get to monetize ChatGPT and this new classifier.
| Starting fires and providing the extinguishers, charging for both
| of them.
|
| All while pretending to be morally responsible in order to do it.
| sharemywin wrote:
| They did say big tech was starting to take over the role of
| government.
| titaniumtown wrote:
| What does this have to do with the government?
| urbandw311er wrote:
| I can see the point the parent comment is trying to make.
| The applications of this classifier include potentially
| arbitrating decisions relating to things like education
| (i.e., assessment of grades), which is a matter traditionally
| associated with the public sector.
| [deleted]
| dakiol wrote:
| No way. If I were a student trying to use ChatGPT in order to
| improve my writing, I would definitely not pay for it if I knew
| my teachers were using their AI classifier. I mean, what's the
| point? I don't think OpenAI will be able to reach that (big)
| chunk of potential customers who want to use ChatGPT to write
| essays, social media comments, etc. if OpenAI at the same time
| sells their classifier. It's just nuts.
| Balgair wrote:
| If you're in a STEM-y major, now is _the_ time to pick up an
| essay-heavy humanities degree. If you're in an essay-heavy
| humanities degree, now is the time to pick up a few more.
|
| Think of it like this: How much is your degree costing
| you/your family?
|
| On average, it's ~$150k.
|
| How much would an extra degree cost you? How about 80% of an
| extra degree? How about 20%? How about all those books and
| course materials? Those are in the $1000s already, per
| degree. (And yes, we've all heard of torrents.)
|
| What I'm saying is that ChatGPT can easily be seen as 'just
| another college cost'. And when it's 'for education', the
| justification for those costs gets a lot more flexible. I can
| see students shelling out ~$10,000 for something like ChatGPT
| that is specific to their major, will pass these
| classifiers, and gets you just ~25% of the way to your major
| (however that is defined). The cost 'for the masses' could
| easily be in the ~$1000s for a per-class subscription.
|
| With ~20M college students in the US, assuming even a 10%
| uptake rate, you're in the _billions_ of dollars of nearly
| pure profit (the overhead would be negligible).
|
| The money-making potential of something like ChatGPT is just
| too damn high. Too high for essays to ever go out of style, as
| the lobbying effect of companies like this will force
| colleges to keep the very essays they are making the money
| off of. Oh, and they'll sell the classifier to the colleges
| too. Arming both sides!
| eric-hu wrote:
| You make a well-reasoned argument here. At the same time,
| respectfully, you may be too intelligent to be the target
| audience for the student service. Can you see a college-age
| version of yourself paying $500 to have a college essay written
| for you today?
| caxco93 wrote:
| Could a newer language model use this to penalize output that
| fails the classifier during training?
| twayt wrote:
| It's a great way to swindle overzealous educators. The kind that
| do hugely unproductive things to students because they think
| it's in their best interests.
| gillesjacobs wrote:
| I found a great way to fool these detectors: piping output
| between generative models.
|
| 1. Generate text by prompting ChatGPT.
|
| 2. Rewrite / copyedit with Wordtune [1], InstaText [2] or Jasper
| [3].
|
| This fools GPTZero [4] consistently.
|
| Of course, soon these emotive, genre, or communication-style
| specialisations will be promptable by a single model too.
| Detectors will be integrated as adversarial agents in training.
| There is no stopping generative text tooling; better to adopt it
| and integrate it fully into education and work.
|
| 1. https://www.wordtune.com/
|
| 2. https://instatext.io/
|
| 3. https://www.jasper.ai/
|
| 4. https://gptzero.me/
| kriro wrote:
| I'd rather try to empower students to use ChatGPT as a tool or
| incorporate it into class work than worry about cheating. This
| is a pretty unique time for teachers to step up and give their
| students a nice edge in life by teaching them how to become
| early adopters of these kinds of things.
| discreteevent wrote:
| The purpose of writing an essay is to teach students how to
| think. Being able to prompt is a subset of being able to think.
| If you only teach them to prompt, you have taken away any edge
| they might have had. It's like those schools that think that
| getting more iPads will make the kids smarter.
| janalsncm wrote:
| I posted yesterday about how GPTZero was such a horrible idea,
| and now this nightmare. Detecting AI-generated text is
| _impossible_ without knowing what model was used. It could be
| more feasible for them to detect text written by their own models
| given that they know the logits. But OpenAI doesn't have a
| monopoly on large language models.
|
| However, the consequences for false positives are so dire that I
| would never want to create such a tool. It will be misused, and
| hiding behind "it's just information" is no excuse. You don't
| admit testimony unless it's reliable.
|
| At one time I thought that OpenAI was out to make the world a
| better place. But now it's clear to me that ethics is the last
| thing on their mind.
| brap wrote:
| Wouldn't better classifiers (discriminators) necessarily lead to
| better generators that can trick them?
| alphabetting wrote:
| Tools like this are promising and needed, but this one still
| needs work. I gave it two sets of 100% AI-generated text. It said
| possibly for one and very unlikely for the other. Very unlikely
| example here: https://i.imgur.com/XoFQuYE.png
| https://i.imgur.com/PwGzTBM.png
| dqpb wrote:
| The LLM watermark seems like a better approach.
|
| https://arxiv.org/abs/2301.10226
| mlsu wrote:
| This one is very cool. Steps are:
|
| - Generate a seed from LLM output token t0
|
| - Use the seed to partition the vocabulary into "red" and
| "green" lists
|
| - For token t1, only sample from the "green" list when producing
| the next token
|
| Repeat.
|
| Now, let's say you read a comment online and you want to see if
| it was written by a robot or not. It's 20 tokens long. For each
| token, you reconstruct the red list. If the author uses "red"
| words with ~50% probability, you can safely assume they are
| human. But if they use only green words, you can begin to
| assume they're a bot very quickly.
|
| For simplicity's sake, if you mark half of the tokens as "red"
| for each new token, correctly writing 20 tokens in a row that
| are on the "green" list is like flipping a coin and getting
| heads 20 times in a row -- vanishingly unlikely. This allows you
| to very robustly watermark even short passages. And if a human
| makes adversarial edits, they still have to fight that
| probability distribution; 19 heads and 1 tails is still
| vanishingly unlikely.
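|
| A minimal sketch of the red/green mechanics (a toy version; the
| actual paper softly biases logits rather than hard-excluding
| "red" tokens):
|
|     import hashlib
|     import random
|
|     def green_list(prev_token, vocab):
|         # Seed a PRNG from the previous token, then mark half
|         # the vocabulary "green" for the next position.
|         seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
|         rng = random.Random(seed)
|         shuffled = sorted(vocab)
|         rng.shuffle(shuffled)
|         return set(shuffled[:len(shuffled) // 2])
|
|     def green_fraction(tokens, vocab):
|         # Detection: recompute each position's list, count hits.
|         hits = sum(cur in green_list(prev, vocab)
|                    for prev, cur in zip(tokens, tokens[1:]))
|         return hits / max(len(tokens) - 1, 1)
|
| A watermarked generator samples each next token only from
| green_list(), so green_fraction() comes back near 1.0, while a
| human lands in the green half only about 50% of the time --
| 20 straight green tokens happen by chance with probability
| ~0.5^20, about one in a million.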
___________________________________________________________________
(page generated 2023-01-31 23:00 UTC)