hngopher.com

       [HN Gopher] GitHub accused of varying Copilot output to avoid co...
       ___________________________________________________________________
        
       GitHub accused of varying Copilot output to avoid copyright
       allegations
        
       Author : belter
       Score  : 105 points
       Date   : 2023-06-10 13:46 UTC (9 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | WalterBright wrote:
       | One of the specific complaints is:
       | 
       | https://devclass.com/2022/10/17/github-copilot-under-fire-as...
       | 
       | It's a 25 or so line function that looks like a pedestrian
       | implementation of a sparse matrix transpose algorithm. The author
       | should have patented it to protect it, not copyrighted it.
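
For readers who have not seen one, a textbook sparse matrix transpose can be sketched as below. This is a generic counting-sort transpose over the CSR format, not the function at issue in the complaint; the name `csr_transpose` and its parameters are made up for illustration.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix and return (indptr, indices, data) of the result.

    indptr[i]:indptr[i + 1] delimits the entries of row i; indices holds
    their column numbers and data holds their values.
    """
    nnz = indptr[n_rows]

    # Count how many entries fall in each column of the input.
    counts = [0] * n_cols
    for j in indices[:nnz]:
        counts[j] += 1

    # Prefix sums turn the counts into row pointers of the transpose.
    t_indptr = [0] * (n_cols + 1)
    for j in range(n_cols):
        t_indptr[j + 1] = t_indptr[j] + counts[j]

    # Scatter each entry into the next free slot of its destination row.
    next_slot = t_indptr[:-1]  # slicing makes a fresh list
    t_indices = [0] * nnz
    t_data = [0] * nnz
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            q = next_slot[j]
            t_indices[q] = i
            t_data[q] = data[p]
            next_slot[j] = q + 1
    return t_indptr, t_indices, t_data


# [[1, 0, 2],
#  [0, 3, 0]]  in CSR form
result = csr_transpose(2, 3, [0, 2, 3], [0, 2, 1], [1, 2, 3])
print(result)
```

The whole routine is a count, a prefix sum, and a scatter, which is roughly the scale of function the comment describes.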
        
       | belter wrote:
       | https://storage.courtlistener.com/recap/gov.uscourts.cand.40...
        
       | jimnotgym wrote:
       | Taking code off github, changing it a bit and passing it off as
       | one's own crosses a line. Now we really can't tell the AI from the
       | humans!
        
         | unkulunkulu wrote:
         | oh come on, which code? writing imports? or iterating over
         | collections? or am I underusing copilot? :)
         | 
         | I basically use it as stackoverflow on steroids. it is not even
         | close to gpt-4 in terms of reproducing some original idea I
         | could not find in a search engine
        
           | missingdays wrote:
           | Why would you ever write imports? IDEs autocomplete them for
           | you
        
             | unkulunkulu wrote:
             | Copilot understands some conventions when there's more than
             | one way. I used it extensively with react bootstrap, where I
             | decided to go with the recommended way of importing each
             | component, like import Tab from 'react-bootstrap/Tab'. It
             | also knows which components are used in the file.
        
             | sureglymop wrote:
             | But that's pretty much what copilot is... It's just
             | Intellisense 2.0 and I would say even only marginally more
             | useful. You can't even really instruct it except with some
             | comments which may not work.
        
       | rolph wrote:
       | [The judge overseeing the case has permitted the plaintiffs to
       | remain anonymous in court filings because of credible threats of
       | violence [PDF] directed at their attorney. The Register
       | understands that the plaintiffs are known to the defendants.]
       | 
       | https://storage.courtlistener.com/recap/gov.uscourts.cand.40...
        
         | formerly_proven wrote:
         | > go f** _g cry about github you f**_ g piece of s*t n**r, I
         | hope your throat gets cut open and every single family member
         | of you is burnt to death
         | 
         | Are github users gamers? Really puts the "git" into "github"
         | there.
        
           | arp242 wrote:
           | Some more here:
           | 
           | https://storage.courtlistener.com/recap/gov.uscourts.cand.40...
           | 
           | https://storage.courtlistener.com/recap/gov.uscourts.cand.40...
           | 
           | https://storage.courtlistener.com/recap/gov.uscourts.cand.40...
           | 
           | Friendly people.
           | 
           | I've received emails like that too over the years. What
           | hugely controversial thing do I do? I have a website where I
           | sometimes write about $stuff and I post on HN. Keeping the
           | basic info private is probably a good thing especially if
           | they're based in the US, because "SWATting" etc, but beyond
           | that it doesn't seem "credible" in the sense that it's very
           | likely someone will show up at their door with a gun.
           | 
           | Since the first two are redacted, I wonder if they sent them
           | with their real names.
        
             | z3t4 wrote:
             | It can be explained by the normal curve. The bigger your
             | audience is the weirder the outliers will be.
        
               | arp242 wrote:
               | Pretty much, yeah. There's about 26.8 million developers
               | in the world. Assuming 5 million read this story (not
               | everyone speaks English) and 1% of people are a bit
               | unhinged then you've got 50,000 unhinged people, and only
               | about 0.006% of those 50,000 (or 0.00006% of total) need
               | to be unhinged enough to actually shoot off an email.
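
The figures in this comment can be checked with exact fractions. The sketch below takes the 5 million readers, the 50,000 unhinged readers, and the 0.006% figure from the comment as given; the variable names are illustrative only. Note that 50,000 out of 5 million is 1% of readers.

```python
from fractions import Fraction

readers = 5_000_000                      # assumed readership, from the comment
unhinged = 50_000                        # "a bit unhinged" readers, from the comment
sender_rate = Fraction(6, 100_000)       # 0.006% expressed as a fraction

unhinged_share = Fraction(unhinged, readers)   # share of readers who are unhinged
senders = unhinged * sender_rate               # people who actually send an email
senders_share = senders / readers              # share of all readers who send one

print(f"unhinged share of readers: {float(unhinged_share):.2%}")
print(f"email senders: {senders}")
print(f"senders as share of readers: {float(senders_share) * 100:.5f}%")
```

So the estimate comes down to about three email senders among 5 million readers.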
        
           | web3-is-a-scam wrote:
           | Considering how large GitHub is (in the industry) it's like
           | asking "are Facebook users gamers?"
        
             | notjoemama wrote:
             | That's what struck me about it too. Isn't it the case in a
             | large enough population you can always find representation
             | of something you dislike or hate? I've seen lists of
             | "Republicans" (meaning anyone in, near, or related to a
             | Republican politician) showing those people being caught or
             | convicted of various moral, economic, and social "crimes".
             | Ok. But if I sat down and looked using the same criteria,
             | couldn't I just as easily create a long list for the
             | Democrat party? Having made that statement on Reddit, the
             | response I got was, "well, there are MORE republicans".
             | That struck me as odd too. Are you trying to say of the two
             | horrible things, one is worse, and so I have a moral
             | imperative to choose the less horrible one? I'm fairly sure
             | I get to abandon both in search of a better option. lol
        
               | bandyaboot wrote:
               | > I've seen lists of "Republicans" (meaning anyone in,
               | near, or related to a Republican politician) showing
               | those people being caught or convicted of various moral,
               | economic, and social "crimes". Ok.
               | 
               | I'm intrigued. I'd like to see the subset of the list
               | that are people who were _in_ Republican politicians.
        
           | indrora wrote:
           | An impressive number of 4chan's /g/ users are on github. Some
           | even actively contribute to Linux (though usually to Arch,
           | Gentoo, and now more and more Nix).
           | 
           | I wrote a paper during college that I should release some
           | time about when /g/ threw an absolute shitfit over Linus
           | going "so, I've been a kinda shit human being to people and
           | I'm going to step back and get some help", going as far as to
           | blame his daughter/"the woke mob"/multiple named core kernel
           | contributors for killing their god.
           | 
           | At one point, I attended a GitHub event that wasn't directly
           | sponsored by github but encouraged a lot of github users to
           | show up. While there I met several people who, outside the
           | venue, were talking animatedly about Terry Davis. Listening
           | in on the conversation revealed that they more or less just
           | approved of his extensive use of racist language and
           | epithets.
           | 
           | I haven't checked, but I would suspect that Linus' recent
           | "trans rights" by proxy post has caused at least one or two
           | aneurisms in the /g/ user group.
        
             | StrauXX wrote:
             | I would love to read that paper if you do decide to
             | publicise it! 4chan mob dynamics never fail to make
             | interesting (albeit often nasty) stories.
        
             | Dma54rhs wrote:
             | /g/ is pretty mainstream among the zoomers; they browse it
             | publicly. Also, 4chan is one of the most popular
             | websites on the internet, so it doesn't come as a
             | surprise.
        
               | edgyquant wrote:
               | A large number of people you meet, from all walks of
               | life, will admit that 4chan is a guilty pleasure. At
               | least I've met a ton and none of them were right leaning
               | to say the least.
        
               | waboremo wrote:
               | That would be my general theory as well, you're far more
               | likely to meet someone who is left leaning who admits to
               | having posted on 4chan (or still does) than you are
               | otherwise. Maybe it has to do with perceived biases, in
               | that a right leaning person/group is probably then going
               | to feel they are aligned with the seediest aspects of
               | 4chan, whether they actually do or not and their
               | perceived social impact/failings for using 4chan.
        
               | edgyquant wrote:
               | Yeah, moderate to right wingers would probably not admit it
               | except to people close to them, due to that perception.
               | But using the site you can tell there are a lot of very
               | intellectual liberals and fiscal conservatives. The
               | racist and sexist stuff is just their equivalent of the
               | dumb Reddit memes that encompass 80% of its content.
        
             | pxc wrote:
             | > An impressive number of 4chan's /g/ users are on github.
             | Some even actively contribute to Linux (though usually to
             | Arch, Gentoo, and now more and more Nix).
             | 
             | An aside about this from a moderately longtime Nix user and
             | very occasional Nixpkgs contributor:
             | 
             | I used to occasionally post about Nix on /g/ before
             | virtually anyone there knew what it was just to gauge
             | reactions, and boy were people shitty and dismissive about
             | it. It was all hot takes, broad strokes, and very little
             | curiosity about the technical details. And even though Nix
             | is 'cool' on /g/ now, all of those things are still true
             | about the way /g/ treats NixOS and other distros.
             | 
             | The interest that 90% of /g/ users have in Linux distros
             | like NixOS is as a bullshit status symbol, a token in some
             | consumerist identity game. The presence of that shallow,
             | status-obsessed, needlessly edgy type of person in the Nix
             | community is definitely more visible in the Nix(OS)
             | community now than it was a few years ago, but it still
             | sticks out like a sore thumb against the backdrop of
             | longtime Nix users and the culture they've evolved
             | together.
             | 
             | For that reason, I strongly recommend engaging with the Nix
             | community in community-owned channels, like
             | discourse.nixos.org or the community Matrix channels,
             | rather than message boards like 4chan or mainstream social
             | media platforms like Reddit. If you do that, you'll find
             | kinder, more knowledgeable people (and perhaps in some
             | cases, kinder more knowledgeable personas for the _same_
             | people).
             | 
             | If you're reading this and you've unfortunately encountered
             | Nix 'evangelists' with those shitty attitudes online,
             | please understand that those influences are external to the
             | community, and as far as most participants in the community
             | are concerned, quite unwelcome.
        
           | zer0tonin wrote:
           | No, but I assume a lot of AI bros are
        
             | obiefernandez wrote:
             | ffs, can we just not make this a thing?
        
               | faangsticle wrote:
               | Too late, the AI bros already did.
        
               | edgyquant wrote:
               | Too late, a demographic of people who could just barely
               | scrape together a script making REST requests are now
               | selling themselves as "AI specialists" or "prompt
               | engineers" to the corporate class. These are this cycle's
               | cryptobros, who were mostly not engineers but people
               | riding a hype wave.
               | 
               | The age of the AI bro is here, and as someone who has been
               | in the space for a while, genuinely interested in the
               | models and working with them from time to time, I'm
               | giving a lot of eye rolls in meetings when these people
               | start talking about the underlying tech.
        
               | hooomil wrote:
               | [dead]
        
               | foobarbazetc wrote:
               | "Prompt engineers"... sigh.
        
       | rkagerer wrote:
       | The plaintiffs were granted anonymity due to credible threats
       | against their attorney. Is there any mechanism other than
       | publication ban that ensures the protection? Can't someone just
       | attend the day of the hearing to see who the attorneys are?
       | 
       | EDIT: Apparently the lawyers are attending via Zoom.
        
         | coryrc wrote:
         | The plaintiffs, not the plaintiffs' lawyers.
        
       | JVillella wrote:
       | They say crypto is "regulatory arbitrage", I say this AI co-pilot
       | stuff is "copyright arbitrage".
       | 
       | Being a bit hand-wavy with it: It's akin to torrenting
       | music/movies. The torrented files are lossy compressed
       | representations of the original waveform from the music producer.
       | Limewire, or Pirate Bay, or whatever provide an interface to
       | retrieve them (download or stream). The model weights are a form
       | of lossy compression, and inference is like a document retrieval.
       | 
       | One may say, "it's like an employee working at company X, then
       | going to work at company Y, they retain their knowledge and
       | experience." I would say it's more like, employee going from X to
       | Y, but retaining audio and video recordings of all interactions
       | he had, notes, documents, and other proprietary info and bringing
       | it to company Y.
        
         | az226 wrote:
         | Yea but only if you get to download a few seconds of the movie
         | and not more.
        
         | mjburgess wrote:
         | I call it copyright laundering
        
         | soultrees wrote:
         | What would you say the basis of all knowledge you know is? You
         | are a collection of everything you have consumed and the stuff
         | you create is all influenced by that.
         | 
         | Personally this whole llm debate about copyright is quite
         | funny. As someone who very much has skin in the game(my art is
         | trained on midjourney.), and who runs in a circle of artists,
         | it's interesting to see people's egos come into play here. The
         | ones who are excited about these as tools are the ones who are
         | openly inspired and want to inspire however the ones who claim
         | copyright infringement seem to come off as insecure, almost
         | like they are afraid that this idea of theirs will be the last
         | great idea they have. There's already a separation happening in
         | the art world of people who are exploding in creative output vs
         | the people who are so defensive and cling to the old way of
         | doing things.
         | 
         | If I had my way, I'd see copyright laws abolished completely. A
         | complete free for all in innovation. And people who claim that
         | without patents and copyright there's no incentive to make
         | money seriously underestimate humans and their ego to
         | continually innovate.
        
           | saurik wrote:
           | > What would you say the basis of all knowledge you know is?
           | You are a collection of everything you have consumed and the
           | stuff you create is all influenced by that.
           | 
           | FWIW, humans certainly can infringe other people's copyrights
           | and can do so even if they aren't actively intending to do
           | so. There is some boundary across which you are no longer
           | just learning something and you are now copying, and it isn't
           | clear at all that these generative AI techniques are actively
           | considering the latter the way a human is required to.
           | 
           | But, sure: if you are against the idea of copyright entirely
           | then it is hard to consider the idea inconsistent, though I
           | would think a world without copyright would be a particularly
           | hard one for an artist to make money at all...
        
           | JVillella wrote:
           | >What would you say the basis of all knowledge you know is?
           | You are a collection of everything you have consumed and the
           | stuff you create is all influenced by that.
           | 
           | Surely you're not suggesting that there's no such thing as
           | "original work". The production of which may have very high
           | capital and labour costs - which if not protected from theft
           | - would remove the incentives of producing original work.
           | 
           | >As someone who very much has skin in the game(my art is
           | trained on midjourney)
           | 
           | I don't know your specific situation, but there's obviously
           | different scales of importance here. What if your art was
           | your sole source of income, and people were reproducing it
           | under their own name? or if you had a product where you
           | poured millions into developing some novel IP/methods, and
           | some employee brought it with them when they went to work at
           | your competitors?
        
             | WalterBright wrote:
             | Over here at the D Language Foundation, we _encourage_
             | people to download it for free and do whatever they want to
             | with it. It's all Boost licensed.
             | 
             | > some employee brought it with them when they went to work
             | at your competitors?
             | 
             | Other programming languages have copied lots of D features.
             | We at the DLF don't mind at all. Though often they copy
             | them and kinda miss the mark.
             | 
             | (Yes, we sometimes copy features from other languages, too,
             | and try to improve on them.)
        
           | zzzzzzzza wrote:
           | some things like drug discovery could probably be done with a
           | bounty system rather than intellectual property, and could
           | probably get much better results for a fraction of the cost
           | for maintaining the intellectual property component of the
           | court system
        
         | ldehaan wrote:
         | [dead]
        
       | blibble wrote:
       | so not only is it a shitty boilerplate generator, now it also
       | introduces deliberate random changes (i.e. bugs)
        
       | cmrdporcupine wrote:
       | Copilot is to license violations (esp of copyleft licenses) what
       | cryptocurrency mixers are for money laundering.
       | 
       | My employer (IMHO smartly) forbids use of LLMs in company IP and
       | company laptops, etc. Many others I'm sure are doing the same,
       | and many others will follow.
        
         | theRealMe wrote:
         | Nobody uses copilot intentionally to violate copyright law.
         | People do use crypto mixers intentionally to violate money
         | laundering laws.
        
           | SpicyLemonZest wrote:
           | Nobody affirmatively says "yes, my goal is to violate
           | copyright law, and Copilot is the best tool I've found". But
           | it doesn't seem impossible to me that the value of Copilot
           | comes partially from the fact that it can copy paste code
           | from copyrighted repositories in ways which would be illegal
           | for you or me to do. I'm not sure it's proven yet but I
           | wouldn't be shocked if it is in the future.
        
             | shagie wrote:
             | It provides the same value as someone who copies and pastes
             | code from Stack Overflow or any of the predecessors without
             | concerning themselves with the license.
             | 
             | I am certain that I can find code from Linux or gcc or
             | emacs on Stack Overflow that is under a GPL license and not
             | compatible with the CC license Stack Overflow uses... and
             | yet it's there. What's more, people will copy that code
             | into their own ignoring the CC license too.
             | 
             | How is that really any different than using Copilot if the
             | original license and attribution are something to respect.
             | 
             | Note that I _do_ think that the original license is
             | something to respect which is why for any of the code that
             | I write that has copyright that matters on it (toy program
             | for home? meh. Hobby project repo that I'm working on that
             | I'll publish? yep. Employer's code for work? absolutely.) I
             | either don't touch questionable sources or run a license
             | check on it when using it.
             | 
             | The key thing is that I don't consider the use of Copilot
             | to be any more controversial than copying from Stack
             | Overflow - which has been done by countless programmers for
             | a decade before Copilot existed and no one got up in arms
             | about it then.
        
         | fooster wrote:
         | Sorry your employer forbids the use of tooling that makes your
         | life better and reduces drudgery. Perhaps you should vote with
         | your feet and find a less Luddite employer.
        
           | reaperducer wrote:
           | _Sorry your employer forbids the use of tooling that makes
           | your life better and reduces drudgery. Perhaps you should
           | vote with your feet and find a less Luddite employer._
           | 
           | Does your company allow you to outsource your work to people
           | in a poorer nation for a fraction of the cost that you are
           | paid? Why not? Perhaps you should vote with your feet and
           | find a less Luddite employer.
        
             | Dylan16807 wrote:
             | If you have the skills for that, hell yes find an employer
             | that will let you do it, either explicitly or implicitly.
        
           | indrora wrote:
           | My company forbids the use of LLMs that aren't validated (and
           | we make one).
           | 
           | Our managers get emails if we make calls to known LLMs, and
           | there's guidance on locally running LLMs and using their
           | output ("it's okay for small things maybe, but be careful").
           | Why?
           | 
           | Because legal's job is to protect the company from legal
           | threats. Sometimes that means making some awkward choices,
           | like handwringing over the use of GPL licensed software in
           | publicly exposed example code (such as sample apps) purely
           | because some aspects of the GPL haven't been tested in
           | American courts, much less international ones.
           | 
           | So the use cases for LLMs there are mostly source-to-source
           | transformative ("Turn this function and documentation into
           | javadoc format please") or similar -- stuff where you can
           | show that the LLM isn't introducing anything that might maybe
           | possibly have any hint of externally licensed software.
        
             | renewiltord wrote:
             | Wild. I suppose it's good that people who like these
             | conditions can find employers like this and people like me
             | who don't can find employers not like this.
             | 
             | I could never countenance operating under these conditions.
        
         | bushbaba wrote:
         | Once the IP rules are figured out it'll open the door to a lot
         | of use cases. This reminds me more of p2p file sharing being
         | a precursor to paid streaming services.
        
       | taneq wrote:
       | Isn't "rewrite the example code in your own style" accepted best
       | practice for human coders, when working from an example that does
       | what you need?
       | 
       | I'm not sure what would be acceptable output for a code
       | generation tool if rewriting the examples isn't ok and
       | reimplementing something that performs the same function still
       | isn't ok. Are we automatically granting de-facto code patents on
       | all published code now?
        
         | jazzyjackson wrote:
         | "Isn't "rewrite the example code in your own style" accepted
         | best practice [...]?"
         | 
         | Why would it be? If a function performs the data transform I
         | need you better believe I'm copy pasting that sucker with a
         | hyperlink to where I found it
         | 
         | But then again, I'm not trying to win in court.
        
           | rolph wrote:
           | what would happen without that hyperlink? the overall issue
           | seems to be a lack of attribution to the originator.
        
             | patmcc wrote:
             | That depends a lot on the license - some require
             | attribution, some don't, some care not a bit (in that they
             | don't permit copying).
        
         | waboremo wrote:
         | I can't recall a single time that's been common advice given to
         | programmers. It's usually either don't reinvent the wheel
         | (therefore use the source while adhering to license), or come
         | up with your own solution.
         | 
         | Don't know how you would even write code in your own style. As
         | soon as you start altering it, the result is different. It's
         | more/less efficient.
        
           | williamcotton wrote:
           | I interpreted the comment you are responding to as "make sure
           | it uses the same style conventions as the rest of this file",
           | which is something that Copilot does very well!
        
           | njharman wrote:
           | Depending on the language there are a ton of style choices.
           | Style guides cover the trivial ones.
           | 
           | Non trivial include names, comments, logging, error checking,
           | structure, ordering of operations that aren't sequential.
        
             | waboremo wrote:
             | Yes, but all of those have an impact on the actual function
             | and performance of the proposed solution. By doing so, you
             | are changing the solution.
             | 
             | Look at FizzBuzz. If you were to set strict requirements on
             | performance (and allow for reiterative testing), the
             | results from different groups of people would be identical.
             | They would reach the same conclusion because that's how
             | code works, it's far more aligned to math than it is
             | creative writing.
             | 
             | So you cannot take an existing code solution and translate
             | it to your own style. You are altering the program, the
             | efficiency, and therefore the solution itself. Even when
             | you do something like changing 1 single variable name!
        
           | mistrial9 wrote:
           | this comment really hits hard for me -- it's like there is a
           | place to buy food where every menu item is clearly shown,
           | with a large color picture and a printed price.. and the
           | person talking has only ever purchased food in that way.. as
           | if there are no alternatives that "really exist"
           | 
           | there really are a lot of other scenarios that involve
           | writing software, to make software. It's not possible to list
           | them all.. the list changes while I type
        
         | l__l wrote:
         | The point here is that this isn't some example from a textbook
         | or even stack overflow, but licensed pieces of work with all
         | the legal complications that come with that. This is about the
         | potential use of this code in proprietary code (or code
         | otherwise incompatible with the original licenses), and I
         | really don't think anyone would say it is "accepted best
         | practice" to copy out someone else's work you find online,
         | licenses be damned, in a professional setting.
        
           | 542458 wrote:
           | > this isn't some example from a textbook or even stack
           | overflow, but licensed pieces of work with all the legal
           | complications that come with that
           | 
           | I understand why these might _feel_ different to you, but
           | textbooks and stack overflow are also proprietary, licensed
           | pieces of work. I don't see why there would be much of a
           | legal distinction.
        
           | salawat wrote:
           | No, you're missing the point.
           | 
           | There are two worlds.
           | 
           | In one, every time someone publishes code with a license
           | attached, they've taken a chunk out of the set of valid lines
           | of software capable of being permissibly written without
           | license encumbrance. This is the world the poster you are
           | replying to is imagining we're headed toward, and this case
           | basically does a fantastic job of laying a test
           | case/precedent for.
           | 
           | The other world, is one where everyone accepts all
           | programming code is math, and copyrighting things is like
           | erecting artificial barriers to facilitate information
           | asymmetry. I.e. trying to own 2 + 2. In this second
           | hypothetical world, we summarily reject IP as a thing.
           | 
           | The 2nd world is what I'd rather live in, as the first truly
           | feels more and more like hell to me. However, given the first
           | one is the world we're in, I'd like to see the mental
           | gymnastics employed to undermine Microsoft's original
           | software philosophy.
           | 
           | EDIT: Voir dire will be a hoot. Any wagers on how many
           | software people make it onto the jury if any?
        
             | harles wrote:
             | > In one, every time someone publishes code with a license
             | attached, they've taken a chunk out of the set of valid
             | lines of software capable of being permissibly written
             | without license encumbrance.
             | 
             | If this were true of copyright, we would've run out of
             | permissible novels a long time ago. There's plenty to
             | complain about with how software IP works, but copyright
             | seems pretty sane. The alternative of protecting IP via
             | trade secret is not a world I want to live in. That seems
             | bad for open source.
        
               | mitthrowaway2 wrote:
               | Code is a more restrictive space than prose. Prose has to
               | be grammatical and meaningful, but code has to compile
               | and efficiently serve a useful specification.
               | 
               | The central idea of programming languages is that the
               | grammar is very restrictive compared to natural
               | languages. It's quite likely that, with the exception of
               | variable names and whitespace, some function you wrote to
               | implement a circular buffer is coincidentally identical
               | to code that exists in Sony's or Lockheed Martin's
               | codebases.
               | 
               | Plus there's the birthday problem -- coincidences can
               | happen way more than you expect. And even with prose,
               | constraints like non-fiction can narrow things down
               | quickly. If everyone on HN had to write a three-sentence
               | summary of, say, how a bicycle works, there would
               | probably be coincidentally identical summaries.
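               | 
               | A rough sketch of the birthday-problem arithmetic, assuming
               | every author picks independently and uniformly from a fixed
               | set of possible texts (a simplification; real writing is far
               | from uniform, which makes collisions more likely, not less).
               | The function name and parameters are made up for
               | illustration.

```python
import math


def collision_probability(num_authors: int, num_possible_texts: int) -> float:
    """Chance that at least two authors produce the same text.

    Assumes each author picks one of num_possible_texts uniformly and
    independently, which is the classic birthday-problem setup.
    """
    if num_authors > num_possible_texts:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # log P(no collision) = sum over i of log(1 - i / N)
    log_no_collision = sum(
        math.log1p(-i / num_possible_texts) for i in range(num_authors)
    )
    # 1 - exp(x), computed as -expm1(x) to keep precision when x is tiny
    return -math.expm1(log_no_collision)


# Classic check: 23 people and 365 birthdays gives a bit over 50%.
print(round(collision_probability(23, 365), 3))
```

               | With 1,000 authors and a million equally likely texts, the
               | same function gives roughly a 39% chance of at least one
               | exact match.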
        
               | edgyquant wrote:
               | ReactOS actually got sued by Microsoft for stealing code
               | and one of their proofs was a piece of code (can't
               | remember exactly what it did) that basically matched the
               | same function in Windows code with a few things changed.
               | 
               | It was ASM code I think, and their defense was that there
               | was basically one way to write a function that does this.
        
               | moyix wrote:
               | I think you're misremembering here; as far as I know (and
               | as far as I can tell from searching just now) MS has
               | never sued ReactOS. There was a claim made back in 2006
               | on the mailing list that a portion of syscall.S was
               | copied, and this caused ReactOS to do their own audit:
               | 
               | https://en.wikipedia.org/wiki/ReactOS#Internal_audit
        
               | harles wrote:
               | Three sentence summaries probably wouldn't qualify for
               | copyright protection. The same should be true of code -
               | if we think the standard for copyright protection is too
               | low, we should raise the bar on complexity requirements,
               | not throw out copyright.
               | 
               | Even if a programming grammar is more restrictive,
               | there's some length where things become almost certainly
               | unique.
        
               | quesera wrote:
               | It raises an interesting question though.
               | 
               | Aside from obligatory syntactic bits, what is the most
               | common line of code across all software ever developed?
               | 
               | It'll probably be C or Java. HTML doesn't count.
               | 
               | And it's probably something boring like:
               | i++;
        
             | l__l wrote:
             | I don't think this dichotomy is at all fair. Just because
             | someone makes a piece of software public does not mean they
             | want it freely copied, and I think that can be a completely
             | reasonable stance to have. I'm struggling to make sense of
             | your argument unless you believe either:
             | 
             | - Code is not intellectual property; I don't see this as
             | easily defensible. It takes time, effort, and in some cases
             | seriously heavy resources to come up with some of the tech
             | companies rely on. Should all private companies rescind
             | copyright on literally everything their staff write?
             | 
             | - Intellectual property is a nonsense concept altogether;
             | in this case, I don't think you're ever going to get your
             | way in the court of public opinion.
        
               | williamcotton wrote:
               | This might help shed some light:
               | 
               | https://en.wikipedia.org/wiki/Idea%E2%80%93expression_dis
               | tin...
        
             | rolph wrote:
             | in many cases a snip;routine;proc...whatever you work with,
             | is rote procedure. such as device access. ie retrieving a
             | directory listing.
             | 
             | code that reverts to a conserved sequence of bytes
             | interchanged ,no functional variations.
             | 
             | code that is so common knowledge it has become street
             | graffiti, belongs in world 2
             | 
             | versus code that creates a functionality not available by
             | direct command, is innovative and should be attributed.
             | this sounds like what 1st world should be.
        
               | williamcotton wrote:
               | That's not actually how it works. Purely functional code,
               | such as code that is written in a certain way to achieve
               | maximum performance, is not deemed expressive and
               | therefore not covered by copyright. This code would be
               | covered by patent.
        
               | rolph wrote:
               | i think we are actually talking about the same thing.
               | 
               | in simple terms:
               | 
               | mov bax eax ; an obvious function; no IP
               | 
               | mov eax eax ; seems useless unless you know what
               | dereferencing is. probably IP
               | 
               | this is of course example not considering granularities
               | at level of patents on a language, or macro directives
        
         | rolph wrote:
         | proper attribution to the writer seems to be a big part of
         | this. there is also a suggestion that MS knows all about it but
         | passes the liability buck to the end user of copilot
         | suggestions.
         | 
         | [Lawyer and developer Matthew Butterick announced last month
         | that he'd teamed up with the Joseph Saveri Law Firm to
         | investigate Copilot. They wanted to know if and how the
         | software infringed upon the legal rights of coders by scraping
         | and emitting their work without proper attribution under
         | current open-source licenses.]
         | 
         | https://www.theregister.com/2022/11/07/in_brief_ai/
         | 
         | https://www.theregister.com/2022/10/19/github_copilot_copyri...
        
         | layer8 wrote:
         | Mitigating copyright issues by "rewriting in your own style"
         | arguably only applies to humans doing the rewriting as a
         | creative task, because copyright only applies to human creative
         | works.
        
       | ShamelessC wrote:
       | Eh, their argument is simply that they tuned temperature settings
       | to encourage the model to output slight variations on memorized
       | data. But this is kind of just one of many things you do with a
       | language model and certainly doesn't imply intent to avoid
       | copyright allegations.
       | 
       | Just implies they tuned it for user experience.
       | 
       | I was expecting there to be some discovery around them
       | deliberately fine tuning their model to output modifications if
       | and only if the code had a certain license.
        
         | kevingadd wrote:
         | What's the value of slight variations? Isn't it more likely
         | that the memorized data was already known to be good and
         | effective? It doesn't seem like a useful change unless your
         | goal is to avoid infringement. I don't see how randomly
         | permuting the suggestions improves UX.
        
           | moyix wrote:
           | The lowest temperature isn't always the one that results in
           | working code! This was shown in the original Codex paper:
           | 
           | > When evaluating pass@k, it is important to optimize
           | sampling temperature for the particular value of k. In Figure
           | 5, we plot pass@k against the number of samples k and the
           | sampling temperature. We find that higher temperatures are
           | optimal for larger k, because the resulting set of samples
           | has higher diversity, and the metric rewards only whether the
           | model generates any correct solution.
           | 
           | > In particular, for a 679M parameter model, the optimal
           | temperature for pass@1 is T* = 0.2 and the optimal
           | temperature for pass@100 is T* = 0.8. With these
           | temperatures, we find that pass@1 and pass@100 scale smoothly
           | as a function of model size (Figure 6).
           | 
           | So even with pass@1 (likelihood of getting the right answer
           | in 1 attempt) you don't use T=0, so there will be slight
           | variations in the output each time.
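           | 
           | For reference, a minimal sketch of how pass@k can be
           | estimated from n samples per problem of which c pass the
           | tests, using the combinatorial estimator 1 - C(n-c, k) /
           | C(n, k) that the Codex paper describes. The function name
           | and the example numbers are mine, not from the paper's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which are correct.

    Returns the probability that a random size-k subset of the n samples
    contains at least one correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill a subset of size k
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 200 samples for one problem, 30 of them pass.
print(pass_at_k(200, 30, 1))    # equals c / n for k = 1
print(pass_at_k(200, 30, 100))  # much closer to 1 for large k
```

           | For k = 1 this reduces to c / n, so the metric only rewards
           | diversity once k grows, which is why the best temperature
           | shifts upward with k.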
        
         | Brian_K_White wrote:
         | Why else bother with such an input? Are randomizations more
         | likely to be correct or more useful?
        
           | slashdev wrote:
           | I don't know much about AI, but I think one reason you might
           | do that is to learn which variations are preferred (which are
           | committed unmodified) so you can tune the model in the
           | future. I don't know if Github does that, but given they've
           | cited how often code from copilot is committed without
           | modification, I assume they are measuring it at least in some
           | cases.
        
             | Brian_K_White wrote:
             | makes sense
        
           | brookst wrote:
           | Huge topic, worth Googling. Short version is that too little
           | randomness limits the solution space, so retrying suboptimal
           | results yields the same problems.
        
           | 2gremlin181 wrote:
           | Ye olde Bias-Variance tradeoff
        
           | seanhunter wrote:
           | Generally the reason behind adding randomness to machine
           | learning is avoiding "local minima" in the search space of
           | the optimization function(s) used for training the model. If
           | your training produces a very smooth descent to an optimum it
           | can lead to the model converging on a solution that is not
           | globally the best. Adding some randomness helps to avoid
           | this.
           | 
           | Specifically for GPT models, the temperature parameter is
           | used to get outputs which are a bit more "creative" and less
           | deterministic.
           | https://help.promptitude.io/en/ai-providers/gpt-temperature
        
           | cubefox wrote:
           | Well, temperature 0 means the completion is always the most
           | "likely" (or "best", after fine-tuning) token, while
           | temperature 1 means to choose the next tokens stochastically
           | according to their probability (or "goodness" after fine-
           | tuning). Usually some temperature in between is chosen, like
           | 0.7. It's not _a priori_ clear to me which is the best way to
           | do it.
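           | 
           | A small sketch of what the temperature knob does to the
           | next-token distribution, assuming the usual formulation
           | where logits are divided by T before the softmax. The
           | logits and helper names are invented for illustration;
           | this is not how any particular product implements it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for T = 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index: greedy at T = 0, stochastic otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

           | Lower T piles probability onto the top-scoring token, and
           | T = 1 samples from the model's distribution unchanged.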
        
           | ianbutler wrote:
           | Potentially more correct, yes. It frees the model to choose
           | lower probability tokens to some degree, technically it
           | boosts their probabilities, which may be more correct
           | depending on the task.
           | 
           | There are also sampling schemes, top_p and top_k which can
           | each individually help choose tokens that are less probable
           | (but still highly probable) but more correct, and they are
           | often used together for the best effect.
           | 
           | And then there are various decoding methods like beam search
           | where choosing the most optimal beam may not mean the most
           | optimal individual token.
           | 
           | By default a simple greedy search is used which always
           | chooses the next highest probability token.
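           | 
           | A sketch of top_k and top_p (nucleus) filtering as they are
           | commonly described: both cut off the low-probability tail
           | and renormalize what is left before sampling. The function
           | names and the example distribution are hypothetical, and
           | real libraries usually work on logits rather than on
           | probabilities.

```python
def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1 again."""
    total = sum(probs)
    return [p / total for p in probs]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.9))  # the 0.05 tail token is dropped
```

           | Sampling then proceeds from the filtered distribution, so
           | the choice stays random but never lands on the long tail.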
        
           | golemotron wrote:
           | Yes.
        
           | GuB-42 wrote:
           | It is worthwhile with creative writing. For example if you
           | ask ChatGPT to write a short story, you want some
           | originality. Even when asking for an explanation it can be
           | useful as you may want to try different things for the
           | explanation that speaks to you the most.
           | 
           | But here we are talking about autocompleting code. I don't
           | think programmers want the autocompleter to be creative. They
           | want the exact same solution everyone uses, hopefully the
           | right one, with only minor changes so that it matches their
           | style and uses their own variable names. In my case, I am the
           | programmer, I decide what to do, I just want my autocompleter
           | to save me some keystrokes and copy-pasting boilerplate from
           | the web, the more it looks like existing code the better. I
           | have enough work fixing my own bugs, thank you.
           | 
           | Speaking of bugs, why does everyone talk about code
           | generation, which, I think, doesn't bring that much value?
           | Sure, it saves a few keystrokes and copy-pasting from
           | StackOverflow, but I don't feel like it is the thing
           | programmers spend most of the time doing. Dealing with bugs
           | is. By bugs, there are the big ones that have tickets and can
           | take days to analyze and fix, but also the ones that are just
           | a normal part of writing code, like simple typos that result
           | in compiler errors. I think that machine learning could be of
           | great help here.
           | 
           | Just a system that tells me "hey, look here, this is not what
           | I expected to see" would be of great help. Unexpected doesn't
           | mean there is a bug, but it is something worth paying
           | attention to. I know it has been done, but few people seem to
           | talk about it. Or maybe a classifier trained on bug fix
           | commits. If a piece of code looks like code that has been
           | changed in a bug fix commit, there is a good chance it is
           | also a bug. Have it integrated to the IDE, highlight the
           | suspicious part as I type, just as modern IDEs highlight
           | compilation errors in real time.
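           | 
           | A toy sketch of the "this is not what I expected to see"
           | idea, assuming a plain token-bigram count over known-good
           | lines stands in for the learned model. Everything here
           | (tokenizer, function names, the tiny corpus) is made up
           | for illustration; a real tool would use a proper language
           | model and a much larger corpus.

```python
import re
from collections import Counter

# Identifiers, integers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split a line of code into crude tokens, with a start marker."""
    return ["<s>"] + TOKEN.findall(line)


def train_bigrams(lines):
    """Count adjacent token pairs seen in known-good code."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def unexpected_pairs(line, counts):
    """Return token pairs in line that never occurred in the corpus."""
    tokens = tokenize(line)
    return [pair for pair in zip(tokens, tokens[1:]) if counts[pair] == 0]


corpus = [
    "for (i = 0; i < n; i++)",
    "for (j = 0; j < n; j++)",
]
counts = train_bigrams(corpus)
# The decrement is flagged as surprising; it may or may not be a bug.
print(unexpected_pairs("for (i = 0; i < n; i--)", counts))
```

           | As the comment says, unexpected doesn't mean wrong; the
           | output is only a hint about where to look.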
        
       | brookst wrote:
       | [flagged]
        
       | williamcotton wrote:
       | [flagged]
        
         | matkoniecz wrote:
         | > Downvoting
         | 
         | Presumably people downvoted it because it is really unclear
         | what exactly you are claiming.
         | 
         | Instead of "Everyone needs to first familiarize themselves
         | with" you could write a very simple summary of that and how it
         | relates to this case and your next claim that
         | 
         | > If you're under the impression that every line of code is
         | covered by copyright you are very mistaken.
         | 
         | Well, for example empty ones are really unlikely to be.
         | 
         | Ones that quote out-of-copyright works also will not be.
        
           | williamcotton wrote:
           | [flagged]
        
             | catiopatio wrote:
             | The downvotes probably have to do with the fact that:
             | 
             | (1) you led with a rude and mostly contentless comment,
             | and
             | 
             | (2) your follow-up is merely a dump of Wikipedia quotes,
             | instead of actually summarizing what you've been trying to
             | say.
        
       ___________________________________________________________________
       (page generated 2023-06-10 23:02 UTC)