[HN Gopher] Copilot regurgitating Quake code, including sweary c...
___________________________________________________________________
Copilot regurgitating Quake code, including sweary comments
Author : bencollier49
Score : 1063 points
Date : 2021-07-02 11:52 UTC (11 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
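For context, the code being regurgitated is Quake III Arena's famous fast inverse square root, which id Software released under the GPL-2.0, sweary comments included. Below is a lightly modernized transcription (a sketch only: `int32_t` and `memcpy` are swapped in for the original's `long` and pointer casts, which are unsafe on modern 64-bit platforms):

```c
#include <stdint.h>
#include <string.h>

/* Quake III Arena's Q_rsqrt: approximates 1/sqrt(number) via a bit-level
   hack plus one iteration of Newton's method. Original comments preserved. */
float Q_rsqrt(float number) {
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y = number;
    int32_t i;

    memcpy(&i, &y, sizeof(i));           /* evil floating point bit level hacking */
    i = 0x5f3759df - (i >> 1);           /* what the fuck? */
    memcpy(&y, &i, sizeof(y));
    y = y * (threehalfs - (x2 * y * y)); /* 1st iteration */

    return y;
}
```

One Newton iteration keeps the relative error under roughly 0.2%, which was accurate enough for lighting math at a fraction of the cost of a divide plus sqrt on late-90s hardware.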
| kklisura wrote:
| Phew! Our jobs are safe!
| unknownOrigin wrote:
| _snickers_
| CookieMon wrote:
| though our companies will one day be competing with product
| manufacturers in China who get to use it to its fullest
| crescentfresh wrote:
| Direct URL to the "gif" in that twitter post:
| https://video.twimg.com/tweet_video/E5R5lsfXoAQDRkE.mp4
|
| I could not figure out how to show it larger on the twitter UI. I
| don't have a twitter account so that may be the problem.
| Thomashuet wrote:
| It seems like a very sensible answer from copilot since the
| prompt includes "Q_" which makes it obvious that the programmer
| is specifically looking for the Quake version of this function.
|
| To me it doesn't show that copilot will regurgitate existing code
| when I don't want it to, just that if I ask it to copy some
| famous existing code for me it will oblige.
| cjaybo wrote:
| Apparently you haven't seen many of the demos that people are
| showing off? Because saying that this only occurs when the
| author is explicitly asking for copied code is blatantly false.
| Thomashuet wrote:
| No I haven't. If you think the other demos are more
| interesting please link to them. I'm just saying that this
| demo is biased and that we can't draw any conclusion from it.
| Actually the author has just confessed to optimizing it for
| entertainment in a sister comment. That doesn't mean that the
| claim is false but it doesn't show that it is true either.
| the_mitsuhiko wrote:
| I think you misunderstood my comment. The same code gets
| generated if you call the function `float fast_rsqrt` or
| `float fast_isqrt` for instance. I intentionally wanted it
| to be looking like `Q_rsqrt` so that people pick up on it
| quicker.
| heavyset_go wrote:
| Thanks for making the video in the OP.
|
| Do you have more examples like this that I can share with
| those who don't use Twitter, like a repo or blog post?
| OskarS wrote:
| You're not wrong, but the very idea that it will regurgitate
| copyrighted code _at all_ (and especially at this length, word
| for word), means that it will be totally unacceptable for many
| places. In fact, it is arguably not acceptable to use anywhere
| if you care deeply about copyright.
| advisedwang wrote:
| The claim for AI systems like this is that they have actually
| learned something and are generating code from scratch.
| Oftentimes the authors will claim regurgitation is simply not
| possible, and this example shows that's a lie.
|
| Many arguments on the benefits, legality and power of AI
| systems rely on this claim.
|
| To turn around now and say it's OK to regurgitate in the right
| setting is to move the goalposts.
| caconym_ wrote:
| > Oftentimes the authors will claim regurgitation is simply
| not possible
|
| Do the Copilot authors claim this?
|
| I get that you're suggesting that Copilot may benefit from
| absolute claims made by the authors of other, similar systems
| (or their proponents), but I also don't think it's reasonable
| to exclude nuance and the specifics of Copilot from ongoing
| discussions on that basis. The Copilot authors have publicly
| acknowledged the regurgitation problem, and by their account
| are working on solutions to it (e.g. attribution at
| suggestion-generation time) that don't involve sweeping it
| under the rug.
| rckstr wrote:
| They did! In the FAQ, which I can't find anymore, they said:
|
| >GitHub Copilot is a code synthesizer, not a search engine:
| the vast majority of the code that it suggests is uniquely
| generated and has never been seen before. We found that
| about 0.1% of the time, the suggestion may contain some
| snippets that are verbatim from the training set.
| caconym_ wrote:
| This actually seems like an explicit acknowledgement that
| regurgitation _is_ possible, and not remotely a claim
| that it is "simply not possible".
|
| It stands to reason that cases where people are
| intentionally trying to produce regurgitation will
| strongly overlap with the minority of cases where it
| actually happens. So I think we are probably suffering
| from some selection bias in discussions on HN and similar
| forums--that might be unavoidable, and it certainly
| stimulates some interesting discussion, but we should try
| to avoid misrepresenting the product as a whole and/or
| what its creators have said about it.
| sombremesa wrote:
| I think only Github's lawyers would interpret what GP
| posted the way you did. Looks like weasel wording to make
| such an interpretation possible, while making customers
| believe that code is more or less synthesized in
| realtime. "Snippets" makes one think one or two lines of
| code, not entire functions and classes.
| caconym_ wrote:
| I think that until somebody shows that Copilot is willing
| to copy _distinctive_ code fragments verbatim,
| unprompted, with a high occurrence rate, I'm not going
| to start accusing Github of building an engine to
| cynically exploit the IP rights of open source copyright
| holders for profit. I've seen no evidence of that, and in
| absence of evidence I prefer to remain neutral and open-
| minded.
|
| How would that work, anyway? Rare, distinctive code forms
| seem much more difficult for an ML thing to suggest with
| a high-ish confidence level, since there won't be much
| training data. The Quake thing makes sense because it's
| one of the most famous sections of code in the world, and
| probably exists in thousands of places in the public
| Github corpus.
|
| I'm emphasizing _distinctive_ because a lot of
| boilerplate takes up a lot of room, but still doesn't
| make a reasonable argument for copyright infringement
| when yours looks like somebody else's.
| sombremesa wrote:
| It looks like you're responding to the wrong comment. I
| don't recall alleging that Github is "building an engine
| to cynically exploit the IP rights of open source
| copyright holders for profit".
| caconym_ wrote:
| > I think only Github's lawyers would interpret what GP
| posted the way you did. Looks like weasel wording to make
| such an interpretation possible,
|
| So what are you suggesting here, except that Github is
| attempting a legal sleight-of-hand to hide real
| infringement?
|
| > while making customers believe that code is more or
| less synthesized in realtime.
|
| What are you suggesting here except that Github is
| (essentially) lying to customers, making them believe
| something that is substantially untrue?
|
| When I say "building an engine to cynically exploit the
| IP rights of open source copyright holders for profit", I
| am talking about a scenario in which they are sweeping
| legitimate IP concerns under the rug with bad faith legal
| weaselry and misrepresentation of how the product
| functions, etc., to chase profit. I do not see how that
| is substantially different from the implications of your
| comment, especially in the context of this subthread.
|
| Could you enlighten me as to how your intended meaning
| substantially differs from my interpretation? If you
| don't mean to accuse Github of malfeasance, we probably
| don't have much to discuss.
| ohazi wrote:
| Nat Friedman explicitly stated that it shouldn't
| regurgitate [0]:
|
| > It shouldn't do that, and we are taking steps to avoid
| reciting training data in the output
|
| He's being woefully naive. To put it bluntly, we don't know
| how to build a neural network that isn't capable of
| spitting out training data. The techniques he pointed to in
| other threads are academic experiments, and nobody seems to
| have a credible explanation for why we should believe that
| they work.
|
| [0] https://news.ycombinator.com/item?id=27677177
| caconym_ wrote:
| "Shouldn't" isn't the same as "doesn't".
|
| I'm not anything close to an ML expert, and I have no
| opinion on whether what they're aiming for is possible,
| but this document^[1] (linked in your linked comment)
| states explicitly that they are aware of the recitation
| issue and are taking steps to mitigate it. So, in the
| context of the comment I replied to, I think Github is
| very far from claiming that recitation is "simply not
| possible".
|
| ^[1] https://docs.github.com/en/github/copilot/research-
| recitatio...
| ohazi wrote:
| That kind of bullshit phrasing can only get you so far.
|
| It's like if some corporate PR department told you "we're
| aware of the halting problem, and are taking steps to
| mitigate it." You would rightly laugh them out of the
| room.
|
| It's not going to work, and the people making these
| statements either don't understand how much they don't
| understand, or are deluding themselves, or are actively
| lying to us.
|
| An honest answer would be something like "We are aware
| that this is a problem, and solving it is an active area
| of research for us, and for the machine learning
| community at large. While we believe that we will
| eventually be able to mitigate the problem to an
| acceptable degree, it is not yet known whether this
| category of problem can be fully solved."
| caconym_ wrote:
| You're using some pretty strong language here, but do you
| have any more substantive criticisms of the analysis they
| present at
| https://docs.github.com/en/github/copilot/research-
| recitatio... ? They seem to think the incidence of
| meaningful (i.e. substantively infringing) recitation is
| very low, and that their solution in those cases will be
| attribution rather than elimination.
|
| Again, I'm not an ML expert, but that sounds a lot more
| reasonable to me than announcing one's intention to solve
| the halting problem.
| ohazi wrote:
| The analysis you're citing is just that -- a statistical
| analysis. They had some people use the thing for a while, and
| concluded "Hey look, it doesn't seem to quote verbatim
| very often. Yay!" There is nothing in there that
| describes any sort of mitigation. The three sentences
| about an attribution search at the very end are
| aspirational at best, and are presented as "obvious" even
| though it's not at all clear that such a fuzzy search can
| be implemented reliably.
|
| I use the halting problem as an analogy because their
| naive attempts to address this problem feel a lot like
| naive attempts to get around the halting problem ("just
| do a quick search for anything that looks like a loop,"
| "just have a big list of valid programs," etc.). I can
| perform a similar analysis of programs that I run in my
| terminal and come to a similar "Hey look, most of them
| halt! Yay!" conclusion. I can spin a story about how most
| of the ones that don't halt are doing so intentionally
| because they're daemons.
|
| But this approach is inherently flawed. I can use a fuzz
| tester to come up with an infinite number of inputs that
| cause something as simple as 'ls' to run forever.
|
| Similarly, I can come up with an infinite number of
| adversarial inputs that attempt to make Copilot spit out
| training data. Some of them will work. Some of them will
| produce something that's close enough to training data to
| be a concern, but that their "attribution search" will
| fail to catch. That's the "open research question" that
| they need to solve.
|
| We _don't have_ a general solution to this problem yet,
| and we may never have one. They're trying to pass off a
| hand-wavey "we can implement some rules and it won't be a
| problem most of the time" solution as adequate. I don't
| see any reason to believe that it will be adequate. Every
| attempt I've seen at using logic to try and coax a
| machine learning model into not behaving pathologically
| around edge cases has fallen flat on its face.
| caconym_ wrote:
| > The analysis you're citing is just that -- a
| statistical analysis. They had some people use the thing
| for a while, and concluded "Hey look, it doesn't seem to
| quote verbatim very often. Yay!" There is nothing in
| there that describes any sort of mitigation.
|
| > The three sentences about an attribution search at the
| very end are aspirational at best, and are presented as
| "obvious" even though it's not at all clear that such a
| fuzzy search can be implemented reliably.
|
| I agree with all of this, though I do think that the
| attribution strategy they describe sounds a lot easier
| than solving the halting problem or entirely eliminating
| recitation in their model. Obviously, the proof will be
| in the pudding.
|
| Maybe you and others are reacting to them framing this as
| "research", as if they're trying to prove some
| fundamental property of their model rather than simply
| harden it against legally questionable behavior in a more
| practical sense. I think a statistical analysis is fine
| for the latter, assuming the sample is large enough.
| csande17 wrote:
| The biggest issue with that analysis is that their model
| is clearly very able to copy code and change the variable
| names; copying code and changing variable names is very
| clearly still "copying", and the analysis doesn't seem to
| include that in its definition of "recitation event".
| caconym_ wrote:
| I'd fully expect it to copy code and change variable
| names in a lot of cases--if it wants to achieve the goal
| of filling in boilerplate, how could it do anything else?
| That's pretty much the definition of boilerplate: it's
| largely the same every time you write it.
|
| What's less clear to me is that Copilot regularly does
| that sort of thing with code distinctive enough that it
| could reasonably be said to constitute copyright
| infringement. If somebody's actually shown that it does,
| I'd love to see that analysis.
| the_mitsuhiko wrote:
| I was able to trigger it without the Q prompt before. It just
| made for a nicer looking gif that way.
|
| I got it to produce more GPL code too, that one is just not
| entertaining.
| throwaway_egbs wrote:
| Welp, guess I'll be taking all my code off of GitHub now, lest it
| be copied verbatim while ignoring my licenses.
|
| (I'm no John Carmack, but still.)
| throwaway_egbs wrote:
| This reply from @AzureDevOps is bizarre: "We understand. However,
| the way to report this issues related to Windows 11 is through
| our Windows Insider even from another device. Thanks in advance."
|
| I think I'm gonna give "AI" a few more years.
|
| https://twitter.com/AzureDevOps/status/1411018079849619458
| bob1029 wrote:
| This is pretty clearly just a search engine with more parameters.
|
| I thought there was something more going on with copilot, but the
| fact that it is regurgitating arbitrary code comments tells me
| that there is zero semantic analysis going on with the actual
| code being pulled in.
| josefx wrote:
| They openly claim it is an AI. What about the state of AI
| currently in use made you think that there was any intelligence
| behind it?
| thegeomaster wrote:
| It is decidedly not "just a search engine with more
| parameters." Language models are just prone to repeating
| training examples verbatim when the prompt provides a strong
| signal. Arguably, in this case, it is the most correct
| continuation.
| saynay wrote:
| It's more that the model is so large it is capable of
| memorizing a lot. This can be seen in other language models
| like GPT-3 as well.
|
| Comments, I suspect, will be more likely to be memorized since
| the model would be trained to make syntactically correct
| outputs, and a comment will always be syntactically correct.
| That would mean there is nothing to 'punish' bad comments.
| username90 wrote:
| The model in this case is just a lossy compression of github,
| and you search that.
| sydthrowaway wrote:
| What causes this in a net? I'm guessing the RNN gets into a
| catastrophic state...
| salawat wrote:
| Neural nets aren't magic. You actually need quite a bit of
| complexity and modeling of interrelated problem spaces to get
| anything more than a childlike naivete or trauma savant-like
| mastery of one particular area with crippling deficiencies
| elsewhere.
| otabdeveloper4 wrote:
| > catastrophic state
|
| No, overfitting is the normal state for neural nets.
| captainmuon wrote:
| I would say overfitting - the net doesn't "understand" the code
| in any meaningful sense. It just finds fitting examples and
| jumbles them a bit.
|
| Understanding would mean to have an internal representation
| related to the intention of the user, the expected behavior,
| and say the AST of the code. My pessimistic interpretation of
| this and many other recent AI applications is that it is a
| "better markov chain".
| LeanderK wrote:
| a markov chain can have an internal representation related to
| the intention of the user. I guess this example just got
| copied a lot and is therefore included multiple times in the
| training data, forcing the network to memorise it. Neural
| networks always memorise things that appear too frequently.
| Memorized artifacts in an otherwise working neural network are
| usually seen as a "bug" (since the training allowed the
| network to cheat), not as proof that the network didn't
| generalise.
| michaelt wrote:
| This is the network working as designed.
|
| I mean, if you wrote an autocomplete system for written english
| and asked it to complete the sentence "O Romeo, Romeo" what
| would you expect to happen?
|
| You'd expect it to complete to "O Romeo, Romeo, wherefore art
| thou Romeo?" - a very famous quote.
|
| How else could you produce the single right output for that
| unique input, other than memorising and regurgitating?
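The analogy can be made literal with toy code (this is not how a language model actually works, just an illustration): the only way to produce the unique continuation of a memorized string is to look it up, whether in a table or smeared across learned weights.

```c
#include <stddef.h>
#include <string.h>

/* Maximally naive "autocomplete": locate the prompt verbatim in the
   training text and return a pointer to what follows it. Returns NULL
   if the prompt was never seen during "training". */
const char *complete(const char *training_text, const char *prompt) {
    const char *hit = strstr(training_text, prompt);
    return hit ? hit + strlen(prompt) : NULL;
}
```

Given a training text containing the Shakespeare line, `complete(text, "O Romeo, Romeo")` can only yield `", wherefore art thou Romeo?"` by having stored the quote: memorization and correct completion of famous inputs are the same operation here.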
| NumberCruncher wrote:
| > You'd expect it to complete to "O Romeo, Romeo, wherefore
| art thou Romeo?" - a very famous quote.
|
| What about completing it to "O Romeo, Romeo, brave Mercutio
| is dead", based on the context, as advertised?
| villgax wrote:
| That salesforce engineer is the average developer when prompting
| to write an About me section lol
| ronenlh wrote:
| There are a lot of good points made against Copilot. But I'm
| optimistic that it will improve with time. At worst it's an
| efficient code copy-pasting tool, but at best it could be the
| next level of abstraction.
| littlestymaar wrote:
| Copilot was only trained on public code right? Because that would
| be an absolute PR disaster if some proprietary code was leaked
| that way.
| ezekg wrote:
| I'm curious if you can find certain keywords that may leak
| private code...
| imnitishng wrote:
| Waiting for the access to test this out
| nadsat2021 wrote:
| I don't understand why this is an issue. And now I have a tool to
| do my coding, as well as my social media contributions:
| "'Sweary comments' is not grammatically correct English." ->
| GPT-2 -> "Sweary comments" is not grammatically
| correct English. The "sweary" meaning is used as slang by those
| in charge, especially the ones who get a free ride through the
| media, for the sake of a quick buck, or the sake of getting a
| better job with a promotion. The word is commonly used in a
| social context and comes from an actual "sweary" comment. The
| statement or phrase, "You're too shy!" is derived from this
| phrase: "I wish he were as outgoing as you."
|
| Am I right?
| PaulHoule wrote:
| This kind of thing devalues the work that programmers do.
|
| The head example they show is using a sentiment analysis API
| which is about the most useless use of technology there is.
| saiojd wrote:
| Based on all the negative comments so far, and based on this
| website's aptitude at predicting the viability of a product, it
| really seems like Copilot is bound to be a success.
| DantesKite wrote:
| Right?
|
| Even given all its initial problems, I don't see a world where
| people completely avoid using it.
| sydthrowaway wrote:
| We're gonna Dropbox this thing all the way to the top!
| stusmall wrote:
| No wireless. Less space than a nomad. Lame.
| saiojd wrote:
| Yeah. I get why people's initial reaction is to dislike it
| tbh. Honestly I doubt the utility will be huge for experts;
| most likely it will just alleviate having to remember
| how a certain language implements a specific concept.
| sktrdie wrote:
| I'm going to go against the flow here and say that worrying about
| this is similar to worrying about the license we give to snippets
| of code we copy-paste from other licensed code.
|
| The reality is that we never attribute the original source
| because we copy-paste it, change it up a bit, and make it our
| own. Literally everybody does this.
|
| I still care about licensing and proper attribution but the
| reality is that a snippet of code is not something so easy to
| attribute. Should we attribute all kinds of ideas, even the very
| small ones? How quickly is an idea copied, altered & reused? Can
| we attribute all the thoughts humans have?
| nadsat2021 wrote:
| "sweary comments"
|
| When I hear phrases like that, I worry more about human
| intelligence. "I'm a little tea pot, short and stout," said
| social media.
|
| I watched Kubrick's "A Clockwork Orange" again this week, after a
| certain amount of fearful anticipation.
|
| When Alex said "eggy weggie", it clicked. It's like Burgess time-
| traveled to 2021 to document our modern infantilization and
| antisocialization. He forgot to include the Internet, loss of
| humor, and emerging AI, but I guess he was overwhelmed by the
| enormity of the "baddiwad".
|
| Later on, droogs.
| Osiris wrote:
| I assumed it was trained on source code that was explicitly
| licensed with a permissive license. Are they training it using
| private unlicensed repos also?
| GhostVII wrote:
| Sounds like we need another tool called "Auditor" that scans your
| code to see if it violates copyright laws.
| omgwtfbbq wrote:
| The uproar over Copilot is kind of hilarious. Maybe it's SWEs
| realizing that they might not be as irreplaceable as they seem,
| but it's an awful lot of salty comments. If anything, I think
| Copilot is a really cool PoC and shows just how close we are
| getting to automating large portions of the code-writing process,
| which we should all welcome, as more cycles can be spent on
| architecture and system design.
| ethbr0 wrote:
| The irony is that we're whinging about a tool that generates code
| that will be difficult to understand _in the future_...
|
| ... and the example is mathematically- and floating-point-spec
| obtuse enough that it was incomprehensible at the time _it was
| written_. (As evidenced by id's own comments.)
| maliker wrote:
| Copilot transitions programmers from writing code to reading
| auto-generated code. And the feeling is that reading code is 10x
| harder than writing it? Seems like a rich source of problems.
|
| (However, I'm still definitely going to try this out once I get
| off the waitlist.)
| rgbrenner wrote:
| So this makes it official... this post[0] and the comments on the
| announcement[1] concerned about licensing issues were absolutely
| correct... and this product has the possibility of getting you
| sued if you use it.
|
| Unfortunately for GitHub, there's no turning back the clock.
| Even if they fix this, everyone that uses it has been put on
| notice that it copies code verbatim and enables copyright
| infringement.
|
| Worse, there's no way to know if the segment it's writing for you
| is copyrighted... and no way for you to comply with license
| requirements.
|
| Nice proof of concept... but who's going to touch this product
| now? It's a legal ticking time bomb.
|
| 0. https://news.ycombinator.com/item?id=27687450
|
| 1. https://news.ycombinator.com/item?id=27676266
| sktrdie wrote:
| If they get rid of licensed stuff it should be OK, no? I really
| want to use this, and it seems inevitable that we'll need it just
| as Google Translate needs all of the books + sites + comments
| it can get a hold of.
| ianhorn wrote:
| Unlicensed code just means "all rights reserved." You'd need
| to limit it to permissively licensed code and make sure you
| comply with their requirements.
| runeb wrote:
| How would they do that?
| oauea wrote:
| Read the LICENSE file in each repo.
| rovr138 wrote:
| What guarantees it's intact?
| [deleted]
| mmastrac wrote:
| Well... the whole training set is licensed, so you can't
| really get rid of it. I think that the technology they are
| using for this is just not ready.
| fragmede wrote:
| Just retrain the model using properly licensed code?
| ("just" is doing a ton of heavy lifting, but let's be real,
| that's not impossibly hard)
| [deleted]
| eCa wrote:
| Which licenses would it be ok that the training material is
| licensed under, though? If it produces verbatim enough copies
| of eg. MIT licensed material, then attribution is required.
| Similar with many other open source-friendly licenses.
|
| On the other hand, if only permissive licenses that also
| don't require attribution are used, well, then for a start,
| the available corpus is much smaller.
| eganist wrote:
| Adding to this:
|
| I run product security for a large enterprise, and I've already
| gotten the ball rolling on prohibiting copilot for all the
| reasons above.
|
| It's too big a risk. I'd be shocked if GitHub could remedy the
| negative impressions minted in the last day or so. Even with
| other compensating controls around open source management, this
| flies right under the radar with a C-130's worth of adverse
| consequences.
| fragmede wrote:
| Do you also block stack overflow and give guidance to never
| copy code from that website or elsewhere on the Internet? I'm
| legitimately curious - my org internally officially denounces
| the copying of stack overflow snippets. Thankfully for my
| role it's moot as I mostly work with an internal non-public
| language, for better or worse, and I have no idea how well
| that's followed elsewhere in the wider company.
| samtheprogram wrote:
| Anything posted to Stack Overflow has a specific (Creative
| Commons IIRC) license associated with it. The same is not
| true of GitHub Copilot, and in fact their FAQ doesn't
| specify a license at all, probably because they are
| technically unable to since it is trained on a wide variety
| of code from differing licenses (and code not written by a
| human is currently a grey area for copyright). The FAQ
| simply says to use it at your own risk.
| summerlight wrote:
| Google (and most other big techs, I guess?) also
| explicitly prohibit employees from using Stack Overflow
| code snippets.
| Noumenon72 wrote:
| I tried Googling this and couldn't find it. I also don't
| want to believe it because it seems like the world
| suddenly turned into an apocalyptic hellscape with no
| place for developers like me. Do you have a source?
| gunapologist99 wrote:
| Apples and oranges: Stack overflow snippets are explicitly
| granted under a permissive license, as long as you
| attribute.
|
| https://stackoverflow.com/help/licensing
|
| It appears that the code that copilot is using is created
| under a huge variety of licenses, making it risky.
|
| On the other hand, a small snippet in a function that is
| derived from many existing pieces of other code may fall
| under fair use, even if it is not under an open source
| license of some sort.
| rorykoehler wrote:
| It just seems bizarre that this wasn't flagged internally
| at Microsoft. They have tons of compliance staff.
| mustacheemperor wrote:
| Maybe we'll even get a sneak peek at Windows 11's source
| code. Time to start writing a Win32 API wrapper and see
| what the robot comes up with!
| snicker7 wrote:
| That's because Microsoft doesn't dare use this for
| production code (presumably).
|
| They are 100% okay with letting their competitors get
| into legal hot water.
| rorykoehler wrote:
| It's surely a bit of a liability grey area?
| ngcazz wrote:
| Could bet they baked in the legal fees and are taking a
| calculated risk
| comex wrote:
| Except that CC-BY-SA is not a permissive license; the SA
| part is a form of copyleft. It's just that nobody
| enforces it. From the text [1]:
|
| - "[I]f You Share Adapted Material You produce [..] The
| Adapter's License You apply must be a Creative Commons
| license with the same License Elements, this version or
| later, or a BY-SA Compatible License."
|
| - "Adapted Material means material [..] that is _derived
| from_ or based upon the Licensed Material" (emphasis
| added)
|
| - "Adapter's License means the license You apply to Your
| Copyright and Similar Rights in Your contributions to
| Adapted Material in accordance with the terms and
| conditions of this Public License."
|
| - "You may not offer or impose any additional or
| different terms or conditions on, or apply any Effective
| Technological Measures to, Adapted Material that restrict
| exercise of the rights granted under the Adapter's
| License You apply."
|
| A program that includes a code snippet is unquestionably
| a derived work in most cases. That means that if you
| include a Stack Overflow code snippet in your program,
| and fair use does not apply, then you have to license the
| _entire program_ under the CC-BY-SA. Alternately, you can
| license it under the GPLv3, because the license has a
| specific exemption allowing you to relicense under the
| GPLv3.
|
| For open source software under permissive licenses, it
| may actually be okay to consider the entire program as
| licensed under the CC-BY-SA, since permissive licenses
| are typically interpreted as allowing derived works to be
| licensed under different licenses; that's how GPL
| compatibility works. But you'd have to be careful you
| don't distribute the software in a way that applies any
| Effective Technological Measures, aka DRM. Such as via
| app stores, which often include DRM with no way for the
| app author to turn it off. (It may actually be better to
| relicense to the GPL, which 'only' prohibits adding
| additional terms and conditions, not the mere use of DRM.
| But people have claimed that the GPL also forbids app
| store distribution because the app store's terms and
| conditions count as additional restrictions.)
|
| For proprietary software where you do typically want to
| impose "different terms or conditions", this is a dead
| end.
|
| Note that copying extremely short snippets, or snippets
| which are essentially the only way to accomplish a task,
| may be considered fair use. But be careful; in Oracle v.
| Google, Google's accidental copying of 9 lines of utterly
| trivial code [2] was found to be neither fair use nor "de
| minimis", and thus infringing.
|
| Going back to Stack Overflow, these kinds of surprising
| results are why Creative Commons itself does not
| recommend using its licenses for code. But Stack Overflow
| does so anyway. Good thing nobody ever enforces the
| license!
|
| See also:
| https://opensource.stackexchange.com/questions/6777/can-
| i-us...
|
| [1] https://creativecommons.org/licenses/by-
| sa/4.0/legalcode
|
| [2] https://majadhondt.wordpress.com/2012/05/16/googles-9
| -lines/
| [deleted]
| wrs wrote:
| Yes. In a past life, after researching the situation, we
| had to find and remove all the code copied from Stack
| Overflow into our codebase. I can't fathom why SO won't
| fix the license.
|
| What makes it even worse is if you try to do the right
| thing by crediting SO (the BY part) you're putting a red
| flag in the code that you should have known you have to
| share your code (the SA part).
| aasasd wrote:
| In addition to other licensing gotchas, a ton of SO
| snippets are copied wholesale from elsewhere--docs or
| blog posts. So it's pretty likely that the poster can't
| license them in the first place because they never
| checked the source's license requirements.
| mediaman wrote:
| Who really copies stack overflow snippets verbatim? It's
| usually just easier to refer to it for help figuring out
| the right structure and then adapt it for your own needs.
| Usually it needs customization for your own application
| anyway (variables, class instances, etc).
| canadev wrote:
| Yeah! I've uh, ... never copied a bit of code into my
| repo verbatim, right?
|
| yeah right. I wish.
|
| (Not saying every dev does this)
| TillE wrote:
| I've copied plenty of Microsoft sample code verbatim,
| because the Win32 API sucks and their samples usually get
| the error handling right.
|
| But, I can't think of a single scenario where I've copied
| something from Stack Overflow. I'm searching for the idea
| of how to solve a problem, and typically the relevant
| code given is either too short to bother copying, or it's
| long and absolutely not consistent with how I want to
| write it.
| Noumenon72 wrote:
| "Too short to bother copying"? I copy single words of
| text to avoid typing and typos. I would never type out
| even a single line of code when I could paste and edit.
| blooalien wrote:
| I don't think I've _ever_ copied code directly from any
| of the Stack* sites. I generally read all the answers
| (and comments) and then use what I learn to write my own
| (hopefully better) code specific to my needs.
| corobo wrote:
| Yeah my experience has always been "ohhh that solution
| makes sense" then I go write it myself
|
| If nothing else this whole copilot thing is helping ease
| some chronic imposter syndrome
| bartread wrote:
| Ha! Well, I think a lot of people copy code from
| StackOverflow verbatim once at least - including me.
|
| Of course it turned out the code I'd blindly inserted
| into my project contained a number of bugs. In one or two
| cases, quite serious ones. This, even though it was the
| accepted answer.
|
| It was probably more effort to fix up the code I'd copy
| pasta'd than write it from scratch. Since then I've never
| copied and pasted from StackOverflow verbatim.
| baud147258 wrote:
| I think I did a few times, usually for languages that I
| wasn't going to spend too much time with (so no benefit
| in figuring out how to do it from the answers) and for
| specific tasks.
| jpswade wrote:
| Not only this, but a huge amount of publicly available code
| is truly terrible and should never really be used other than
| as a point of reference or guidance.
| Kiro wrote:
| No-one cares about this. People have no clue about licenses and
| just copy-paste whatever. If someone gets access to their code
| and sees all the violations, they're screwed anyway.
| jerf wrote:
| Ask your legal department about that. Sure, engineers don't
| care about licensing at all, but we are not the only players
| here.
| [deleted]
| [deleted]
| __MatrixMan__ wrote:
| Is it still a legal concern if I'm just coding because I want
| to solve a problem and I'm not trying to use it to do business?
| maclockard wrote:
| If you publish the code anywhere, potentially. You could be
| (unknowingly) violating the original license if the code was
| copied verbatim from another source.
|
| How much of a concern this is depends heavily on what the
| original source was.
| kevin_thibedeau wrote:
| Distributing binaries to third parties is enough to trigger
| a license violation. For internal corporate tools, it would
| be less of an issue as "distribution" hasn't happened.
| lolinder wrote:
| And the problem with copilot is that you have no way of
| knowing. If it changes even a little bit of the code, it's
| basically ungoogleable but still potentially in violation.
| saurik wrote:
| Yes: not all code on GitHub is licensed in a way that lets
| you use it _at all_. People focus on GPL as if that were the
| tough case; but, in addition to code (like mine) under AGPL
| (which you need to not use in a product that exposes similar
| functionality to end users) there is code that is merely
| published under "shared source" licenses (so you can look,
| but not touch) and even literally code that is stolen and
| leaked from the internals of companies--including
| Microsoft!... this code often gets taken down later, but it
| isn't always noticed and either way: it is now part of
| Copilot :/--that, if you use this mechanism, could end up in
| your codebase.
| eximius wrote:
| Seems like the liability should also be on _Copilot itself_ ,
| as a derivative work.
| fourseventy wrote:
| Ahh yes the infamous "evil floating point bit level hacking" code
| tyingq wrote:
| They have 4 hand picked examples on their homepage:
| https://copilot.github.com/
|
| One has the issue with form encoding:
| https://news.ycombinator.com/item?id=27697884
|
| The python example is using floats for currency, in an expense
| tracking context.
|
| The golang one uses a word ("value") for a field name that's been
| a reserved word since SQL-1999. It will work in popular open
| source SQL databases, but I believe it would bomb in some servers
| if not delimited...which it is not.
|
| The ruby one isn't outright terrible, but shows a very
| Americanized way to do street addresses that would probably
| become a problem later.
|
| And these are the hand picked examples. This product seems like
| it needs some more thought. Maybe a way to comment, flag, or
| otherwise call out bad output?
| xyzzy_plugh wrote:
| > The golang one uses a word ("value") for a field name that's
| been a reserved word since SQL-1999. It will work in popular
| open source SQL databases, but I believe it would bomb in some
| servers if not delimited...which it is not.
|
| In their defense they created the table with this column before
| invoking the autocomplete, so they sort of reap what they sow
| here.
|
| It could at least auto-quote the column names to remove the
| ambiguity, but it's not a compiler, is it.
| mempko wrote:
| These are great examples. I wrote about how this will propagate
| all sorts of bugs.
|
| But my argument was that it's good enough that developers may
| get complacent and not review the autocomplete closely
| enough. But
| maybe I'm wrong! Maybe it's not that good yet.
| shadowgovt wrote:
| Now that they have an AI that can be trained to replicate code,
| it looks like the next step is training it to replicate good
| code. That will be non-trivial, since step one is identifying
| good code and they may not have much big data signal to draw
| from for that.
|
| We know you can't use StackOverflow upvotes. However, they
| should have enough signal to identify what snippets of code
| have been most frequently copy-pasted from one project to
| another.
|
| Question is whether that serves as a good proxy for good code
| identification.
| slver wrote:
| > And these are the hand picked examples. This product seems
| like it needs some more thought.
|
| Everyone's self-preservation instincts kicking in to attack
| Copilot is kinda amusing to watch.
|
| Copilot is not supposed to produce excellent code. It's not
| even supposed to produce final code, period. It produces
| suggestions to speed you up, and it's on you to weed out stupid
| shit, which is INEVITABLE.
|
| As a side note, Excel also uses floats for currency, so best
| practice and real world have a huge gap in-between as usual.
| Supermancho wrote:
| > Everyone's self-preservation instincts kicking in to attack
| Copilot is kinda amusing to watch
|
| Nobody is threatened by this, assuredly. As with IDEs giving
| us autocomplete, duplication detection, etc this can only be
| helpful. There is an infinite amount of code to write for the
| foreseeable future, so it would be great if copilot had more
| utility.
| mkr-hn wrote:
| Have you met programmers? Even those who care about quality
| are often under a lot of pressure to produce. Things slip
| through. Before, it was verbatim copies from Stack Overflow.
| Now it'll be using Copilot code as-is.
| slver wrote:
| So, nothing new, is your point?
| mkr-hn wrote:
| Then why are you complaining? Unless something is new
| that warrants you getting mad about people getting mad at
| technology.
| saiojd wrote:
| Not the parent, but people really like to get riled up on
| the same topics, over and over again, which quickly
| monopolizes and derails all conversation. Facebook bad, UIs
| suck, etc. We can now add to the list, "AI will never
| reduce demand for software engineering".
| volta83 wrote:
| So how do you know if the code that Copilot regurgitates is
| almost a 1:1 verbatim copy of some GPL'ed code or not ?
|
| Because if you don't realize this, you might be introducing
| GPL'ed code into your proprietary code base, and that might
| end up forcing you to distribute all of the other code in
| that code base as GPL'ed code as well.
|
| Like, I get that Copilot is really cool, and that software
| engineers like to use the latest and bestest, but even if the
| code produced by Copilot is "functionally" correct, it might
| still be a catastrophic error to use it in your code base due
| to licenses.
|
| This issue looks solvable. Train 2 copilots, one using only
| BSD-like licensed software, and one using also GPL'ed code,
| and let users choose, and/or warn when the snippet has been
| "heavily inspired" by GPL'ed code.
|
| Or maybe just train an adversarial neural network to detect
| GPL'ed code, and use it to warn on snippets, or...
| the_rectifier wrote:
| You have the same issue with MIT because it requires
| attribution
| slver wrote:
| It's very easy: don't use copilot code verbatim, and you
| won't have GPL code verbatim.
| volta83 wrote:
| > It's very easy: don't use copilot
|
| Fixed that for you.
|
| Verbatim isn't the problem / solution. If you take a
| GPL'ed library and rename all symbols and variables, the
| output is still a GPL'ed library.
|
| Just seeing the output of GPL'ed code spat out by Copilot
| and writing different code "inspired" by it can result in
| GPL'ed code. That's why "clean room"s exist.
|
| Copilot is going to make for a very interesting law case
| to follow, because probably until somebody sues, and
| courts decide, nobody will have a definitive answer of
| whether it is safe to use or not.
| throw_2021-07 wrote:
| Stack Overflow content is licensed under CC-BY-SA. Terms
| [1]:
|
| * Attribution -- You must give appropriate credit,
| provide a link to the license, and indicate if changes
| were made. You may do so in any reasonable manner, but
| not in any way that suggests the licensor endorses you or
| your use.
|
| * ShareAlike -- If you remix, transform, or build upon
| the material, you must distribute your contributions
| under the same license as the original.
|
| In over a decade of software engineering, I've seen many
| reuses of Stack Overflow content, occasionally with links
| to underlying answers. All Stack Overflow content use
| I've seen would clearly fail the legal terms set out by
| the license.
|
| I suspect Copilot usage will similarly fail a stringent
| interpretation of underlying licenses, and will similarly
| face essentially no enforcement.
|
| [1] https://creativecommons.org/licenses/by-sa/4.0/
| guhayun wrote:
| The solution might be simpler than we think: just tell the
| algorithm
| didibus wrote:
| Doesn't this go beyond license and into copyright?
|
| The license lets you modify the program, but the copyright
| still enforces that you can't copy/paste code from it to
| your own project no?
| pydry wrote:
| It's true I probably wouldn't have laughed quite as loudly if
| there weren't a chorus of smug economists telling us that
| tools like this are gonna put me out of a job.
| slver wrote:
| Business types hate dealing with programmers, that's a
| fact. And these claims of "we'll replace programmers"
| happen with clockwork regularity.
|
| Ruby on Rails was advertised as so simple, startup founders
| who can't program were making their entire products in it
| in a few days, with zero experience. As if.
| j-pb wrote:
| If I want random garbage in my codebase that I have to fix
| anyway, I might as well hire an underpaid intern/junior.
|
| It's easier to write correct code than to fix buggy code. For
| the former you have to understand the problem, for the latter
| you have to understand the problem, and a slightly off
| interpretation of it.
| tyingq wrote:
| _" self-preservation"_
|
| My suggestion was a way to comment or flag, not to kill the
| product. These were particularly notable to me because
| someone hand-picked these 4 to be the front page examples of
| what a good product it was.
| saiojd wrote:
| I agree with you. This is basically similar to autocomplete
| on cellphone keyboard (useful because typing is hard on
| cellphone), but for programming (useful because what we type
| tends to involve more memorization than prose).
| tyingq wrote:
| >As a side note, Excel also uses floats for currency
|
| It's still problematic, but the defaults and handling there
| avoid some issues. So, for example:
|
| Excel: =1.03-.42 produces 0.61, by default, even if you
| expand out the digits very far.
|
| Python: 1.03-.42 produces 0.6100000000000001, by default.
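| A minimal Python sketch of the difference: the decimal module
| works in base 10, so it avoids the binary-float artifact shown
| above.

```python
from decimal import Decimal

# Binary floats cannot represent 1.03 or 0.42 exactly, so the
# subtraction exposes a tiny representation error.
print(1.03 - 0.42)                        # 0.6100000000000001

# Decimal stores base-10 digits, so the same subtraction is exact.
print(Decimal("1.03") - Decimal("0.42"))  # 0.61
```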
| slver wrote:
| Excel rounds doubles to 15 digits for display and
| comparison. The exact precision of doubles is something
| like 15.6 digits, those remaining 0.6 digits causing some
| of those examples floating (heh) around.
| okl wrote:
| That depends
| https://randomascii.wordpress.com/2012/03/08/float-
| precision...
| slver wrote:
| A lot of these edge cases are about theoretical concerns
| like "how many digits we need in decimal to represent an
| exact IEEE binary float".
|
| In practice a double is 15.6 digits precise, which Excel
| rounds to 15 to eliminate some weirdness.
|
| In their documentation they do cite their number type as
| 15 digit precision type. Ergo that's the semantic they've
| settled on.
| ssss11 wrote:
| "Maybe a way to comment, flag, or otherwise call out bad
| output?"
|
| A copilot for copilot? :)
| TeMPOraL wrote:
| - The Go one (averaging) is non-idiomatic, and has a nasty bug
| in it: https://news.ycombinator.com/item?id=27698287
|
| - The JavaScript one (memoization) is a bad implementation, it
| doesn't handle some argument types you'd expect it to handle:
| https://news.ycombinator.com/item?id=27698125
|
| You can tell a lot about what to expect, if there are so many
| bugs in the very examples used to market this product.
| gentleman11 wrote:
| > The python example is using floats for currency.
|
| Dumb question, but what is the proper way to handle currency?
| Custom number objects? Strings for any number of decimal
| places?
| spamizbad wrote:
| For Python, I prefer decimal.Decimal[1]. When you serialize,
| you can either convert it to a string (and then have your
| deserializer know the field type and automatically encode it
| back into a decimal) OR just agree all numeric values can
| only be ints or decimals. You can pass
| parse_float=decimal.Decimal to json.loads[2] to make this
| easier.
|
| My most obnoxious and spicy programming take is that ints and
| decimals should be built-in and floats should require
| imports. I understand why though: Decimal encoding isn't
| anywhere near as standardized as other numeric types like
| integers or floating-point numbers.
|
| [1] https://docs.python.org/3/library/decimal.html [2]
| https://docs.python.org/3/library/json.html
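| A short sketch of the parse_float approach described above; the
| field names are made up for illustration.

```python
import json
from decimal import Decimal

# Ask the JSON parser to build Decimals instead of floats.
doc = json.loads('{"price": 19.99, "qty": 3}', parse_float=Decimal)

total = doc["price"] * doc["qty"]   # Decimal('59.97'), exact

# Serialize by converting back to a string, as suggested above.
out = json.dumps({"total": str(total)})
```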
| dragonwriter wrote:
| > My most obnoxious and spicy programming take is that ints
| and decimals should be built-in and floats should require
| imports
|
| I don't care about making inexact numbers require imports,
| but the most natural literal formats should produce exact
| integers, decimals, and/or rationals.
| dangerbird2 wrote:
| Either a fixed-point decimal (i.e. an integer with the ones
| representing 1/100, 1/1000, etc. of a dollar), or a ratio type
| if you need arbitrary precision.
| Quekid5 wrote:
| > ratio type if you need arbitrary precision.
|
| This is the better default, so I'd ditch the qualifier,
| personally. At the very least when it comes to the
| persistent storage of monetary amounts. People often start
| out _thinking_ that they won't need arbitrary precision
| until that _one little requirement_ trickles into the
| backlog...
|
| Arbitrary precision rationals handle all the arithmetic
| you could reasonably want to do with monetary amounts, and
| they let you decide where to round _at display time_ (or
| when generating a final invoice or whatever), so there's
| no information loss.
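| A sketch in Python of the round-at-display-time idea, using
| fractions.Fraction; the display format and rounding rule here
| are assumptions.

```python
from fractions import Fraction

# Three items at exactly one third of a dollar each: the sum is
# exactly one dollar, with no rounding along the way.
unit_price = Fraction(1, 3)
total = 3 * unit_price            # Fraction(1, 1)

# Round only when rendering, e.g. to whole cents.
def display(amount: Fraction) -> str:
    cents = round(amount * 100)   # choose your rounding rule here
    return f"${cents / 100:.2f}"

print(display(total))             # $1.00
```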
| SmooL wrote:
| Yeah, you probably want to use some sort of decimal package
| for a configurable amount of precision, and then use strings
| when serializing/storing the values
| wodenokoto wrote:
| A lot of good answers, but they mostly relate to accounting
| types of problems (which granted, is what you need to do with
| currency data 99% of the time)
|
| I'd just add that if you are building a price prediction
| model, floats are probably what you need.
| tyingq wrote:
| The example code is the start of an expense tracking
| tool...
| stickfigure wrote:
| Create a Money class, or use one off the shelf. It should
| store the currency and the amount. There are a few popular
| ways of storing amounts (integer cents, fixed decimal) but it
| should not be exposed outside the Money class.
|
| There's plenty of good advice in this subthread for how to
| represent currency inside your Money abstraction, but
| whatever you do, keep it hidden. If you pass around numbers
| as currency values you will be in for a world of pain as your
| application grows.
| pizza234 wrote:
| This is a complex topic, mainly for two reasons: 1. it works
| on two layers (storage and code) 2. there is a context to
| take care of.
|
| [Modern] programming languages have decimal/rational data
| types, which (within limits) are exact. Where this is not
| possible, and/or it's undesirable for any reason, just use an
| int and scale it manually (e.g. 1.05 dollars = int 105).
|
| However, point 2 is very problematic and important to
| consider. How do you account for 3 items that cost 1/3$ each
| (e.g. if in a bundle)? What if they're sold separately? This really
| depends on the requirements.
|
| My 20 cents: if you start a project, start storing currency
| in an exact form. Once a project grows, correcting the FP
| error problem is a big PITA (assuming it's realistically
| possible).
| himinlomax wrote:
| > How do you account for 3 items that cost 1/3$ each (e.g. if
| bundle)?
|
| You never account for fractional discrete items, it makes
| no sense. A bundle is one product, and a split bundle is
| another. For products sold by weight or volume, it's
| usually handled with a unit price, and a fractional
| quantity. That way the continuous values can be rounded but
| money that is accounted for need not be.
| XorNot wrote:
| The problem is also stupid people and companies.
|
| My last job they wanted me to invoice them hours worked,
| which was some number like 7.6.
|
| This number plays badly when you run it through GST and
| other things - you get repeaters.
|
| So I looked up common practice here, even tried asking
| finance who just said "be exact", and eventually settled on
| that below 1 cent fractions I would round up to the nearest
| cent in my favour for each line item.
|
| First invoice I hand them, they manually tally up all the
| line items and hours, and complain it's over by 55 cents.
|
| So I change it to give rounded line items but straight
| multiplied to the total - and they complain it doesn't
| match.
|
| Finally I just print decimal exact numbers (which are
| occasionally huge) and they stop complaining - because
| excel is now happy the sums match when they keep second
| guessing my invoices.
|
| All of this of course was irrelevant - I still had to put
| hours into their payroll system as well (which they checked
| against) and my contract specifically stated what my day
| rate was to be in lieu of notice.
|
| So how should you do currency? Probably in whatever form
| that matches how finance are using excel, which does it
| wrong.
| hobs wrote:
| I wish this was untrue, but I have spent years hearing
| the words "why dont my reports match?" - no amount of
| logic, diagrams, explaining, the next quarter or instance
| - "why dont my reports match?"
|
| BECAUSE EXCEL SUCKS MY DUDE.
| zdragnar wrote:
| Well, they did say to be exact, and you handed them
| approximations, so...
| mokus wrote:
| The "exact" version they wanted was full of
| approximations too. They just didn't have enough
| numerical literacy to understand how to say how much
| approximation they are ok with.
|
| I guarantee nothing in anyone's time accounting system is
| measured to double-precision accuracy. Or at least, I've
| never quite figured out the knack myself for stopping
| work within a particular 6 picosecond window.
| XorNot wrote:
| Sure, but at the end of the day someone had to pay me an
| integer amount of cents. They wanted a total which was a
| normal dollar figure. But when you sum up 7.6 times
| whatever a whole lot, you _might_ get a nice round number
| or you might get a repeating decimal.
|
| What's notable is clearly no one had actually thought
| this through at a policy level - the answer was "excel
| goes brrrr" depending on how they want to add up and
| subtotal things.
| dralley wrote:
| >[Modern] programming languages have decimal/rational data
| types
|
| This caveat is kind of funny, in light of COBOL having
| support for decimal / fixed precision data types baked
| directly into the language.
|
| It's not a problem with "non-modern" languages, it's a
| problem with C and many of its successors. That's precisely
| why many "non-modern" languages have stuck around so long.
|
| https://medium.com/the-technical-archaeologist/is-cobol-
| hold...
|
| Additionally, mainframes are so strongly optimized for
| hardware-accelerated fixed point decimal computing that for
| a lot of financial calculations it can be legitimately
| difficult to match their performance with standard
| commercial hardware.
| caleb-allen wrote:
| It is quite simple to do the same in Julia
| adwn wrote:
| > _It's not a problem with "non-modern" languages, it's
| a problem with C and many of its successors._
|
| Not really. Any semi-decent modern language allows the
| creation of custom types which support the desired
| behavior and often some syntactic sugar (like operator
| overloading) to make their usage more natural. Take C++,
| for example, the archetypal "C successor": It's almost
| trivial to define a class which stores a fixed-precision
| number and overload the +, -, *, etc. operators to make
| it as convenient as a built-in type, and put it in
| library. In my book, this is vastly superior to making
| such a type a built-in, because you can never satisfy
| everyone's requirements.
| pjmlp wrote:
| It is also trivial to keep making C mistakes with a C++
| compiler, hence no matter how many ISO revisions it goes
| through, the lack of safety due to C copy-paste
| compatibility will never be fixed.
| adwn wrote:
| > _[...] no matter how many ISO revisions it will still
| have, lack of safety due to C copy-paste compatibility
| will never be fixed._
|
| Okay, no idea how that's relevant to "built-in decimal
| types" vs "library-defined decimal types", but if it
| makes you feel better, you can do the same in Rust or
| Python, two languages which are "modern" compared to
| COBOL, don't inherit C's flaws, and which enable defining
| custom number types/classes/whatever together with
| convenient operator overloading.
| pjmlp wrote:
| Rust I agree, Python not really as the language doesn't
| provide any way to keep invariants.
| adwn wrote:
| > _Python not really as the language doesn't provide any
| way to keep invariants_
|
| Again, how is that relevant? If there's no way to enforce
| an invariant in _custom data types_ , then there's also
| no way to enforce invariants in _code using built-in data
| types_.
| pjmlp wrote:
| It is surely relevant.
|
| Rust provides the mechanisms to enforce them, while in
| Python, like all dynamic languages, everything is up for
| grabs.
| adwn wrote:
| What I meant [1] was: In Python, invariants are enforced
| by conventions, not by the compiler. If that's not
| suitable for a given use case, then Python is _entirely_
| unsuited for that use case, regardless of whether it
| provides built-in decimal types or user-defined decimal
| types. That's why I said that your objection regarding
| invariant enforcement is irrelevant to this discussion.
|
| [1] (but was too lazy to write out)
| [deleted]
| kyrra wrote:
| To pile on, here's a copy/paste from when this was asked a
| few days ago:
|
| Googler, opinions are my own. Over in payments, we use micros
| regularly, as documented here:
| https://developers.google.com/standard-
| payments/reference/gl...
|
| GCP on the other hand has standardized on unit + nano. They
| use this for money and time. So unit would be 1 second or 1
| dollar, then the nano field allows more precision. You can
| see an example here with the unitPrice field:
| https://cloud.google.com/billing/v1/how-tos/catalog-
| api#gett...
|
| Copy/paste the GCP doc portion that is relevant here:
|
| > [UNITS] is the whole units of the amount. For example if
| currencyCode is "USD", then 1 unit is one US dollar.
|
| > [NANOS] is the number of nano (10^-9) units of the amount.
| The value must be between -999,999,999 and +999,999,999
| inclusive. If units is positive, nanos must be positive or
| zero. If units is zero, nanos can be positive, zero, or
| negative. If units is negative, nanos must be negative or
| zero. For example $-1.75 is represented as units=-1 and
| nanos=-750,000,000.
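| A hedged Python sketch of that unit + nanos convention; the
| function names are mine, not Google's.

```python
from decimal import Decimal

NANO = 10 ** 9

def to_units_nanos(amount: Decimal) -> tuple:
    """Split an amount into (units, nanos); both carry the sign."""
    total = int(amount.scaleb(9))          # amount * 10^9
    # Truncate toward zero so units and nanos share the amount's sign.
    units = total // NANO if total >= 0 else -((-total) // NANO)
    return units, total - units * NANO

def from_units_nanos(units: int, nanos: int) -> Decimal:
    return Decimal(units) + Decimal(nanos).scaleb(-9)

print(to_units_nanos(Decimal("-1.75")))    # (-1, -750000000)
```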
| ronnier wrote:
| In its base unit. So cents in USD. Which can be an int64
|
| Or if your language has something specific built in, use
| that.
| umanwizard wrote:
| Not necessarily. It depends on the application.
| ainar-g wrote:
| > Or if your language has something specific built in, use
| that.
|
| Unless your language is PostgreSQL's dialect of SQL,
| apparently. https://wiki.postgresql.org/wiki/Don%27t_Do_Thi
| s#Don.27t_use...
| pilif wrote:
| It has the same issue as the other suggestion in your
| parent comment: it can't deal with fractions of
| cents, which is an issue you will most likely run into
| before you run into floating point rounding issues.
| fredros wrote:
| Of course for databases you should use a decimal.
| tzs wrote:
| > In its base unit. So cents in USD. Which can be an int64.
|
| Note that if you use cents in the US so that everything is
| an integer then as long as you do not have to deal with
| amounts that are outside the range [-$90 trillion, $90
| trillion] (i.e. +-2^53 cents) you can also use double.
| Double can exactly represent all integer numbers of cents
| in that range.
|
| This may be faster than int64 on some systems, especially
| on systems that do not provide int64 either in hardware or
| in the language runtime so you'd have to do it yourself.
| marcosdumay wrote:
| Each country has a law or something similar that states how
| people should calculate over prices.
|
| The usual is to use decimal numbers with fixed precision (the
| actual precision varies from one country to another), and I
| don't know of any modern exception. But as late as the 90's
| there were non-decimal monetary systems around the world, so
| if you are saving any historic data, you may need something
| more complex.
| umanwizard wrote:
| Depends what you're doing. In fact it's not _always_ wrong to
| use floats for currency. For accounting you should probably
| use a fixed-precision decimal type.
| jacobsenscott wrote:
| If someone asks how to handle money the best answer is
| integers or fixed precision decimals. There may be a valid
| case for using floats, but if someone asks they shouldn't
| be using floats.
|
| Also I'm hard pressed to come up with a case where floats
| would work. Can you give an example?
| umanwizard wrote:
| > Can you give an example?
|
| The answer is the same as _any_ time you should use
| floats: where you don't care about answers being exact,
| either (1) because calculation speed is more important
| than exactness, or (2) because your inputs or
| computations involve uncertainty anyway, so it doesn't
| matter.
|
| This is more likely to be the case in, say, physics than
| it is in finance, but it's not impossible in the latter.
| For example, if you are a hedge fund and some model
| computes "the true price of this financial instrument is
| 214.55", you certainly want to buy if it's being sold for
| 200, and certainly don't if it's being sold for 250, but
| if it's being sold for 214.54, the correct interpretation
| is that _you aren't sure_.
|
| When people say "you should never use floats for
| currency", their error is in thinking that the only
| applications for currency are in accounting, billing, and
| so on. In those applications, one should indeed use a
| decimal type, because we do care about the rounding
| behavior exactly matching human customs.
| tyingq wrote:
| That's fair, though the example code I mentioned is the
| start of an expense tracker.
| umanwizard wrote:
| Fair enough -- in that case, you should definitely use
| either a decimal type or an integer.
| jacobsenscott wrote:
| Good answer. I've only ever worked on accounting style
| financial apps, so I didn't think of those types of
| cases.
| naniwaduni wrote:
| You can't use a generic decimal type in that case either!
| You need a special-purpose type that rounds exactly
| matching the conventions you're following. This is
| necessarily use-, culture-, and probably currency-
| specific.
| bidirectional wrote:
| Most things in front office use floats in my experience,
| e.g. derivative pricing, discounting, even compound
| interest. None of these things are going to be any better
| with integers or fixed-precision, but maybe harder to
| write and slower.
| stevesimmons wrote:
| Yes, the risk management/instrument pricing part in the
| "Front Office" uses floats, because the calculations
| involve compound interest and discount rates.
|
| And the downstream parts for trade confirmation ("Middle
| Office"), settlement and accounting ("Back Office") use
| fixed precision. Because they are fundamentally
| accounting, which involves adding things up and cross-
| checking totals.
|
| These two parts have a very clear boundary, with strictly
| defined rounding rules when the floating point
| risk/trading values get turned into fixed point
| accounting values.
| lordgilman wrote:
| Integer cents or an arbitrary precision decimal type.
| shagie wrote:
| Having worked on a POS system, the issue of using cents
| alone is if you've got something like "11% rebate" and you
| need to deal with fractional cents.
|
| The arbitrary precision decimal type should be the default
| answer for currency until it is shown that the requirements
| do not now, and will not at any time in the future, _ever_
| require fractional units of the smallest denomination.
|
| As an aside, this may be constrained by the systems that
| the data is persisted into too... the Buffett Overflow is a
| real thing ( https://news.ycombinator.com/item?id=27044044
| ).
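| A sketch of handling such a fractional-cent rebate with
| Python's decimal module; the 11% figure follows the comment
| above, and the price is made up.

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("19.99")
rebate = price * Decimal("0.11")   # Decimal('2.1989'): fractional cents

# Keep the exact intermediate value; round once, under an explicit rule.
payable = (price - rebate).quantize(Decimal("0.01"),
                                    rounding=ROUND_HALF_UP)
print(payable)                     # 17.79
```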
| _ZeD_ wrote:
| python has the `decimal` module in the stdlib
| tyingq wrote:
| There's no one answer, but decimal counts of the smallest
| unit that needs to be measured is common. Like pennies in the
| US, or maybe "number of 1/10 pennies" if there's things like
| gasoline tax.
| bpicolo wrote:
| You can use integers instead of decimal if you're using the
| smallest unit.
| bencollier49 wrote:
| Say what you like about COBOL, but it got this stuff right.
| bidirectional wrote:
| Every front office finance project I have ever worked on has
| used floating point, so take the dogma with a grain of salt.
| It depends entirely on the context.
| jhugo wrote:
| They probably just accumulate the rounding errors into an
| account and write it off periodically without even
| realising why it happens.
| bidirectional wrote:
| No, it's just that we're in the realm of predictions and
| modelling, not accounting. If you're constructing a curve
| to forecast 50 years of interest rates from a limited set
| of instruments, you're already accepting a margin of
| error orders of magnitude greater than the inaccuracies
| introduced by floating point.
|
| The models also use transcendental functions which cannot
| be accurately calculated with fixed point, rationals,
| integers etc.
| jhugo wrote:
| Makes sense; I wasn't aware of the meaning of "front
| office" as a term of art in finance.
| BruiseLee wrote:
| It's not like decimal or fixed point does not suffer from
| rounding errors either. In fact for many calculations,
| binary floating point gives more accurate answers.
|
| In accounting there are specific rules that require
| decimal system, so one must be very careful with the
| floating point if it is used.
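Both halves of that claim fit in a few lines: binary floats cannot represent 0.1 exactly, while decimal cannot represent 1/3 exactly; decimal just rounds in the base that accounting rules are written in.

```python
from decimal import Decimal, getcontext

# Binary floating point can't represent 0.1 exactly...
assert 0.1 + 0.2 != 0.3

# ...but decimal rounds 1/3 too; no finite representation is
# exact for every calculation.
getcontext().prec = 28
third = Decimal(1) / Decimal(3)
assert third * 3 != Decimal(1)

# Decimal does, however, match the base humans expect:
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```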
| dirkt wrote:
| And they all suffer from rounding error problems?
|
| I mean, fixed point and a specific type for currency (which
| also should include the denomination, while we are at it)
| are not rocket science. Spreadsheets get that right, at
| least.
| bidirectional wrote:
| Excel uses IEEE-754 floating point, so I don't get what
| you mean with the spreadsheet comment. It has formatting
| around this which rounds and adds currency symbols, but
| it's floating point you're working with.
|
| Rounding error doesn't matter on these types of financial
| applications. It's the less glamorous accounting work
| that has to bother with that.
|
| They're not rocket science, but they're unnecessary, and
| would still be off anyway. Try and calculate compound
| interest with your fixed point numbers.
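For what it's worth, the usual accounting-style answer to the compound interest challenge is period-by-period accrual with an explicit rounding rule, rather than closed-form exponentiation. A minimal Decimal sketch, with the rate and amounts invented for illustration:

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

def compound(principal: Decimal, rate: Decimal,
             periods: int) -> Decimal:
    """Accrue interest one period at a time, rounding the
    balance to whole cents after each period."""
    balance = principal
    for _ in range(periods):
        balance = (balance * (1 + rate)).quantize(
            CENT, ROUND_HALF_EVEN)
    return balance

# $1000.00 at 5% per period for 3 periods.
assert compound(Decimal("1000.00"), Decimal("0.05"), 3) \
    == Decimal("1157.62")
# The float closed form 1000 * 1.05**3 gives ~1157.625; which
# answer is "right" depends on the contract's rounding rule.
```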
| dragonwriter wrote:
| > Dumb question, but what is the proper way to handle
| currency?
|
| In python, for exact applications (not many kinds of
| modeling, where floats are probably right), decimal.Decimal
| is usually the right answer, but fractions.Fraction is
| sometimes more appropriate, and if you are using NumPy or
| tools dependent on it, using integers (representing decimals
| multiplied by the right power of 10 to get the minimum unit
| in the ones position) is probably better.
| trevor-e wrote:
| Someone already mentioned there's a `decimal` package in
| Python that's better suited for currency. Back when I was a
| Java developer we used this: https://docs.oracle.com/javase/7
| /docs/api/java/math/BigDecim...
| kkirsche wrote:
| The Decimal class is one way if you roll your own. py-moneyed
| seems to be a well maintained library though I haven't used
| it.
|
| Disclaimer: I only work with currency in hobby projects.
| thayne wrote:
| An integer of the smallest denomination. For example, cents
| for the American dollar. And you probably would want to wrap
| it in a custom type to simplify displaying it properly, and
| maybe handle different currencies. If your language has a
| fixed point type that might also be appropriate, but that's
| pretty rare, and wouldn't work for currencies that aren't
| decimal (like the old british pound system).
| biztos wrote:
| Do they still use fractional cents (or whatever) in
| finance?
|
| https://money.howstuffworks.com/personal-
| finance/financial-p...
| TchoBeer wrote:
| What if I'm calculating sales tax? Can't use an integer
| anymore.
| kyllo wrote:
| Yes, you can. There are algorithms for rounding up,
| rounding down, rounding to nearest, and banker's
| rounding, on the results of integer division. This is a
| solved problem.
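A sketch of two of those integer-only rounding modes; tax rates are expressed as rational pairs so everything stays in integers, and the 7.25% figure is just an example:

```python
def tax_floor(cents: int, num: int, den: int) -> int:
    """Tax on an integer amount of cents, rounded down."""
    return (cents * num) // den

def tax_half_up(cents: int, num: int, den: int) -> int:
    """Same, rounded to nearest with halves going up -- scaling
    by 2 keeps the tie comparison in pure integer math."""
    return (cents * num * 2 + den) // (2 * den)

# 7.25% tax on $10.99 (1099 cents): exact value is 79.6775 cents.
assert tax_floor(1099, 725, 10000) == 79
assert tax_half_up(1099, 725, 10000) == 80
```

Banker's rounding needs one extra parity check on ties but is equally expressible in integers.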
| eulers_secret wrote:
| I haven't seen anyone mention this issue for some reason, but
| in fetch_tweets.py:
| fetch_tweets_from_user(user_name): ...
| tweets = api.user_timeline(screen_name=user, count=200,
| include_rts=False)
|
| 'user' isn't defined, should be user_name, right? Side note,
| 'copilot' is a decent name for this (though copilots are
| usually very competent, moreso than this right now). You _must_
| check the suggestions carefully. Maybe it'll make folks better
| at code review, lol.
| rjknight wrote:
| > This product seems like it needs some more thought. Maybe a
| way to comment, flag, or otherwise call out bad output?
|
| Wait for your colleagues to use it, fix the bad code in the
| pull request, and wait for copilot to learn from the new
| training data you just provided!
| more_corn wrote:
| This is actually a good idea that is missing from nearly
| every machine learning product. How do you back propagate
| lessons from user interaction into future training of the
| model? It can be done, I can't think of a place I've seen it
| done though.
| adriancr wrote:
| It would be viewed as IP theft by most companies to upload
| private code to this for use by others
| dgb23 wrote:
| It would have to be in the same range of what is
| suggested, small patches and opt in.
|
| If snippets are a legal problem, then Copilot is
| problematic by default, since it suggests code that may
| or may not be sourced from free software.
| adriancr wrote:
| Even free software snippets have clauses like GPL or
| attribution.
|
| Putting GPL code in proprietary codebase would cause a
| company massive headaches...
|
| So I agree copilot is problematic by default: a liability for
| employers (lawsuits, forced open-sourcing) and an IP-lawsuit
| liability that will end up on employees' shoulders.
| TeMPOraL wrote:
| It's tricky, because once you start accepting user
| feedback, you need to _moderate_ it, or else someone will
| poison your model for fun and profit.
| giantg2 wrote:
| But what about all the bad training data provided too?
| amelius wrote:
| What are the statistics of Copilot based on a validation set?
| How often does it get code right?
|
| I want to see hard statistics, not 4 hand-picked examples.
| rubatuga wrote:
| Yeah. And like how would you even devise a metric? Like
| compile it down to assembly and see if it's similar logic?
| amelius wrote:
| Well, this is the question which the producers of the tool
| should answer.
|
| You can't just release a ML tool onto the public if you
| haven't validated it first.
| mnky9800n wrote:
| That's what I thought when I first started working in text
| generation too. It's highly annoying people pitch their
| successful models with hand picked examples. It's literally
| the opposite of STATISTICAL learning imo.
| foobiekr wrote:
| Copilot appears to be "give more efficiency leverage to the
| worst kind of coder."
| codyb wrote:
| Hmm... I mean, these all seem like mistakes I could make and
| I don't think I'm the "worst kind of coder".
|
| The currency one I learned a while back, but it's not like I
| intuited using integers by default.
|
| Value being a reserved keyword, I'm not sure I'd know that
| and I do Postgres work as part of my myriad duties at the
| startup I work at. Maybe I'd make that mistake in a
| migration, maybe I have already.
|
| In a way, is it much different than what we do now as
| engineers? I'm hard pressed to call it much of an engineering
| discipline considering most teams I work on barely do design
| reviews before they launch in to writing code, documentation
| and meeting minutes are generally an afterthought, and the
| code review process while decent isn't perfect either and
| often times relies on arcane knowledge derived over months
| and years of wrangling with particular <framework, project,
| technology>.
|
| It's pretty neat, presumably it'll learn as people correct
| it, and it'll get better over time. I mean it's not even
| version one.
|
| I get the concerns, but I think they're a bit overblown, and
| this'll be really useful for people who want to learn how to
| code. Sure they'll run into some bugs, but, I mean, they were
| going to do that anyways.
| voakbasda wrote:
| Is this any worse? Maybe not. Is it better? Absolutely not.
|
| This kind of tool will only further entrench the production
| of mediocre, bug-ridden code that plagues the world. As
| implemented, this will not be a solution; it is an express
| lane in the race to the bottom.
| pasquinelli wrote:
| it _is_ a race to the bottom, and people are trying to
| win. any skilled trade is being turned into an unskilled
| job. it might suck, the results might suck, but it's
| more profitable, and that's what matters.
| ticviking wrote:
| I'm not really sure that type of tool could really be
| anything else.
|
| How would a model become aware of all of the various edge
| cases that depend on which SQL database you use or
| differences in language versions over time?
| sbr464 wrote:
| Can it submit pull requests to itself with if/else boolean
| logic/hacks?
| gmadsen wrote:
| a large data set covering exactly what you just mentioned?
| TeMPOraL wrote:
| > _I'm not really sure that type of tool could really be
| anything else._
|
| It can't be, because they've chosen to use a deep learning
| approach. That makes it a dead end right from the start.
|
| > _How would a model become aware of all of the various
| edge cases that depend on which SQL database you use or
| differences in language versions over time?_
|
| A lot of things that we call "edge cases" are only a
| problem for humans. They're not "edge cases" from the point
| of view of the grammar / semantics of programming languages
| and libraries. The way a hypothetical, better Copilot could
| work, is by having directly encoded grammars and semantics
| metadata corresponding to popular languages and tools. It
| could generate code in a principled and introspectable way,
| by having a model of the computation it wants to express
| and encoding it in a target language.
|
| Of course, such hypothetical Copilot is a harder task -
| someone would have to come up with a structure for
| explicitly representing understanding of the abstract
| computation the user wants to happen, and then translate
| user input into that structure. That's a lot of drudgery,
| and from my vague understanding of the "classical" AI
| space, there might be a bunch of unsolved problems on the
| way.
|
| Real Copilot uses DNNs, because they let you ignore all
| that - you just keep shoving code at it, until the black-
| box model starts to give you mostly correct answers. The
| hard work is done automagically. It makes sense for some
| tasks, less for others - and I think code generation is one
| of those things where black-box DNNs are a bad idea.
| jhgb wrote:
| > The way a hypothetical, better Copilot could work, is
| by having directly encoded grammars and semantics
| metadata corresponding to popular languages and tools. It
| could generate code in a principled and introspectable way,
| by having a model of the computation it wants to express
| and encoding it in a target language.
|
| But that sounds like too much work, let's just throw a
| lot of data into an NN and see what comes out! /s
|
| > and introspectable
|
| Which most importantly means "debuggable", I assume. From
| what I get there doesn't seem to be any way to ad-hoc fix
| an NN's output.
| heroHACK17 wrote:
| This is my thought as well. I get the "make productive
| engineers even more productive" angle, but productive
| engineers' bottleneck isn't coding. Sure, coding up a
| boilerplate Go web server is tedious, but I have done it so
| many times that it takes me two seconds now.
|
| On the flip side, coding can be the bottleneck for the worst
| kind of coder. When I first started coding, coding was hard
| simply because I had very little reps and was just learning
| to understand how to code common solutions, data structures,
| libraries, etc. Fast forward a few years and, if I were still
| struggling to understand these concepts, Copilot is a
| lifeline.
| hamandcheese wrote:
| I'm gonna have to disagree - coding can and does take
| significant amounts of time even when I know exactly what
| problem I am solving.
|
| I admit that at many organizations there are so many other
| factors and bottlenecks, but it's not uncommon that I find
| myself 8+ hours deep into a coding task that I had expected
| would be much shorter.
|
| On the other hand, usually that's due to refactoring or
| otherwise not being satisfied with the quality of my
| initial solution, so copilot probably wouldn't help...
| captn3m0 wrote:
| I find it is reducing my research time by providing a decent
| starting solution space. Especially for boring stuff where
| you just need to google the signature of some standard
| library function.
| majormajor wrote:
| It takes what should be your method of last resort -
| copypaste - and makes it the first thing you try.
|
| All the steps in between - looking at the docstring for the
| function you're calling, googling for more general
| information, looking at and _deciding not to use_ not-
| applicable or poorly-written SO answers - get pushed aside.
| So instead of you having to convince yourself "yes, it's
| safe to copy-paste these lines from SO, they actually fit my
| problem" you're presented with magic and I think the burden
| for rejecting it is going to be higher once it's in your
| editor than when you're just reading it on a SO post or
| Github snippet.
|
| Even for a newcomer looking to learn, working on simple stuff
| that it has great completions for, it seems like it will
| sabotage your long-term growth, since it takes all the _why_
| and the reasoning out of it. Autocomplete for a function name
| isn't that relevant to gaining a deeper understanding.
| Knowing _why_ a certain block of code is passed in in a
| certain style, or needs to be written at all? Probably that
| is.
| majormajor wrote:
| Thinking about it more: there's a very small subset of
| problems that I think this is actually great for. And I do
| run into this somewhat often: relatively new libraries or
| frameworks that don't really care about thorough
| documentation so they only show you a few happy path
| snippets and nothing about how to do something more
| interesting, so you have to bridge the gap between "this
| one line in the doc obviously doesn't work for me, but I'd
| like to figure it out without reading all their source code
| from scratch..." - getting more example snippets barfed up
| onto my screen from other people who've figured it out
| before could be a sort of replacement for the library
| writers having provided documentation in the first place.
| But ... this is a somewhat insane way to work around a
| problem of shitty code documentation, and is still
| insufficient in a couple ways:
|
| * some poor bastard is going to have to be the first person
| to figure out how to do something, so that copilot itself
| can know
|
| * any non-code nuances around "oh, if you do that, your
| memory usage is going to explode" or "oh, by the way, if
| you do that, make sure you don't do your own threading"
| will still fail to be communicated.
| groby_b wrote:
| I've called it "The Fully Mechanized Stack Overflow Brigade"
| before, and everything that comes to light supports that
| assessment.
|
| On the upside, think of the consultancy fees you can charge
| to clean up those messes.
| bierjunge wrote:
| The Golang example would not even compile, because `sql` is not
| imported.
| IncRnd wrote:
| That's for the best. We don't want products that pretend to
| write code for us, while copying others' code without
| attribution and that may not even work.
| stevelosh wrote:
| The golang one also silently drops rows.Err() on the floor.
|
| https://golang.org/pkg/database/sql/#Rows
| jacurtis wrote:
| > The ruby one isn't outright terrible, but shows a very
| Americanized way to do street addresses that would probably
| become a problem later.
|
| As someone who has been coding up address storage and
| validation for the past week in my current job, that one really
| made me laugh. Mostly because it tries to simplify all the
| stuff I have been analyzing and mulling over for a week into a
| single auto-complete.
|
| Spoiler: The Github Copilot's solution simply won't work. It
| would barely work for Americanized addresses, but even then not
| be ideal. Of course trying to internationalize it, this thing
| isn't even close.
|
| I get what Copilot is trying to do. But at the same time I
| don't get it. Because from my experience, typing code is the
| fastest part of my job. I don't really have a problem typing. I
| spend most of my time thinking about the problem, how to solve
| it, and considering ramifications of my decisions before ever
| putting code in the IDE. So Copilot comes around and it
| autocompletes code for me. But I still have to read what it
| suggested, make edits to it, and consider whether this is solving
| the problem appropriately. I'm still doing everything I used to
| do, except it saved me from typing out a block of code
| initially. I still have to most likely rebuild, edit, or change
| the function somewhat. So it just saves me from typing that
| first pass. Well that's the easy part of the job.
|
| I have never had a manager come to me and ask why a project is
| taking so long where I could answer "it just takes so long to
| type out the code, i wish I had a copilot that could type it
| for me". That's why we call it software engineering and not
| coding. Coding is easy. Software engineering is hard. Github
| Copilot helps with coding, but doesn't help with Software
| Engineering.
| reaperducer wrote:
| _I spend most of my time thinking about the problem, how to
| solve it, and considering ramifications of my decisions
| before ever putting code in the IDE. So Copilot comes around
| and it autocompletes code for me. But I still have to read
| what it suggested, making edits to it, and consider if this
| is solving the problem appropriately._
|
| So, rather than helping people program better, all it's done
| is replace a bunch of the offshore cut-and-paste shops with
| "AI."
| neutronicus wrote:
| A lot of my job is thinking hard about how to do [X],
| incidentally needing to remember how to do [trivial thing Y]
| and looking it up.
|
| Like, I did it before, remember that it was trivial, I just
| forget the snippet and I have to break focus to look it up -
| often by scrolling through my own commit history to try and
| find the time I did [trivial thing Y] four months ago.
|
| I do kind of wish I could automate that. Skipping the actual
| typing of the snippet is sort of gravy on top of that.
| nitrogen wrote:
| _It would be nice if there were a way to automate the
| "remembering what that one function is called and what
| order the parameters are in" portion of my job._
|
| IME the best thing for this is looking at the method
| listing in the docs for the classes I'm using. E.g. for
| Ruby, it's usually looking at the methods in Enumerable,
| Enumerator, Array, or Hash. Or I'll drop a _binding.pry_
| into the function, run it, and then type _ls_ to see
| what's in scope.
| greyfox wrote:
| this sounds super interesting, is there a video or upload
| somewhere that i can watch this being performed in real
| time?
| nitrogen wrote:
| I very briefly show some of the interactivity of Ruby+Pry
| here: https://youtu.be/Gy7l_u5G928?t=805 (the overall
| code segment starts at
| https://www.youtube.com/watch?v=Gy7l_u5G928&t=626s)
|
| I'd be happy to hear about better demonstrations, and
| there's also Pry's website (https://pry.github.io/) where
| they link to some screencasts.
| shados wrote:
| Even in the 90s that was a solved problem in Visual Basic
| with autocomplete. That a lot of dev environments "lost"
| the ability to do it is mind boggling. With that said,
| doesn't Rubymine let you do that with autocomplete with
| the prompt giving you all the info you need? (I haven't
| done Ruby in a long time).
|
| Still, having to look up the doc or run the code to
| figure out how to type it is orders of magnitude slower
| than proper auto complete (be it old school Visual Studio
| style, or something like Copilot).
| nitrogen wrote:
| _orders of magnitude slower than proper auto complete_
|
| Having worked extensively with verbose but autocomplete-
| able languages like Java, compact dynamic languages like
| Ruby, and a variety of others including C, Scala, and
| Kotlin, I've come to the conclusion that, for me,
| autocomplete is a crutch and I develop deeper
| understanding and greater capabilities when I go to the
| docs. IDE+Java encourages sprawl, which just further
| cements the need for an IDE. Vim+Ruby+FZF+ripgrep+REPL
| encourages me to design code that can be navigated
| without an IDE, which ultimately results in cleaner
| designs.
|
| If there's _any_ lag whatsoever in the autocomplete, it
| breaks my flow state as well. I can maintain flow better
| when typing out code than when it just pops into being
| after some hundreds of milliseconds of delay. Plus, there's
| always the chance for serendipity when reading docs. The
| docs were written by the language creators for a reason.
| Every dev should be visiting them often.
| shados wrote:
| That's totally cool but the grandparent was talking about
| remembering shit they already knew. Not everyone has a
| fantastic memory, and remembering whether the arguments are
| A then B or B then A doesn't deepen your understanding of a
| language. Most of the time the autocomplete and the
| official doc use the exact same source anyway, formatted
| the same way, with the same info.
|
| But if it works for you, more power to you!
| lwhi wrote:
| >Because from my experience, typing code is the fastest part
| of my job. I don't really have a problem typing. I spend most
| of my time thinking about the problem, how to solve it, and
| considering ramifications of my decisions before ever putting
| code in the IDE
|
| So very true.
|
| [1] Understanding the problem > [2] thinking about all
| possible solutions > [3] working out which solution fits best
| > [4] working out which implementations are possible > [5]
| working out the most suitable implementation
|
| ... and finally, [6] implementing via code.
| ph0rque wrote:
| > I spend most of my time thinking about the problem, how to
| solve it...
|
| A few years ago, I got a small but painful cut on my
| fingertip. I thought I would have a hard time on the job as a
| dev. To my surprise, I realized I spend 90-95% of my time
| thinking, and only 5-10% of the time typing. It turned out to
| be almost a non-issue.
| shados wrote:
| > I don't really have a problem typing.
|
| I'm absolutely with you and want to upvote that part of the
| comment x100. Unfortunately it's often considered a fairly
| spicy opinion.
|
| Entire frameworks (Rails) are built around the idea of typing
| as little as possible. Others can't even be mentioned without
| the topic of boilerplate/keystroke count causing a flame war
| (Redux).
|
| A lot of engineers equate their value with the amount of
| lines they can pump out, so there's definitely a demand for
| tools like these.
|
| There's also some legitimate stuff. There's a lot of very
| silly thing I have to google every time I do because I have a
| bad memory. It saves the step of googling. In a way, it was
| the same debate around autocomplete at the very beginning,
| but pushed to the next level. Autocomplete turned out to be a
| very good thing (even though new languages and tools keep
| coming out without it).
| theshadowknows wrote:
| I never commit something that I can easily google (with a
| high quality solution) to memory
| [deleted]
| amluto wrote:
| As the owner of a fairly normal American address that is
| corrupted by the UPS address validation service, this
| is a good time to remind everyone: accept the address that
| your customer enters. If you offer a service to try to
| improve your customer's address, keep in mind that it's a
| value added service, it may be wrong, and you MUST test the
| flow in which your customer tells your service to accept the
| address as entered. And maybe even collect examples in which
| the address change is accepted to make sure it does something
| useful.
|
| Vendors have lost sales to me because they were too
| incompetent to allow me to ship things to my actual address.
| Oops.
|
| P.S. for the US, you need to offer at least two lines for the
| address part. And you need to accept really weird things that
| don't seem to parse at all. I know people with addresses that
| have a PO Box number and a PMB number _in the same address_.
| Lose one and your mail gets lost.
|
| P.P.S. If you offer discounted shipping using something like
| SurePost, make sure you let your customers pay a bit extra to
| use a real carrier. There are addresses that are USPS-only
| and there are addresses that work for all carriers except
| USPS (and SurePost, etc). Let your customer tell you how to
| ship to them. Do not second-guess your customer.
| JamesAdir wrote:
| Isn't address storage and validation a solved problem? Why is
| it so complicated?
| mgsouth wrote:
| Ex:
|
| 412 1/2 E E NE
|
| 412 1/2 A E
|
| 1E MAIN
|
| 1 E MAIN
|
| FOO & BAR
|
| 123 ADAM WEST RD
|
| 123 ADAM WEST
|
| 123 EAST WEST
| ezfe wrote:
| You are right that USPS maintains a database of canonical
| delivery points. However, it's inevitable that this database
| might not be correct or up to date.
|
| If you don't want to validate, then yes addresses are just
| a series of text fields. However, mapping them to that
| delivery point is where the problems arise.
| gameswithgo wrote:
| post process with some language aware heuristics maybe
| amelius wrote:
| Co-pilot fixes the wrong problem.
|
| It should be a tool capable of one-shot learning.
|
| I.e., I'm in the middle of a refactoring operation and have to do
| lots of repetitive work; the tool should help me by understanding
| what I'm trying to do after I give it 1 example.
| marcodiego wrote:
| Now, consider quake is GPL'ed. Any proprietary software using
| such code will have to bow to the license terms.
| anyonecancode wrote:
| I think copilot is solving the wrong problem. A future of
| programming where we're higher up the abstraction tree is
| absolutely something I want to see. I am taking advantage of that
| right now -- I'm a decently good programmer, in the sense that I
| can write useful, robust, reliable software, but I'm pretty high
| up the stack, working in languages like Java or even higher up
| the stack that free me from worrying about the fine details of
| memory allocation or the particular architecture of the hardware
| my code is running on.
|
| Copilot is NOT a shift up the abstraction tree. Over the last few
| years, though, I've realized the the concept of typing is. Typed
| programming is becoming more popular and prominent beyond just
| traditional "typed" languages -- see TypeScript in JS land,
| Sorbet in Ruby, type hinting in Python, etc. This is where I can
| see the future of programming being realized. An expressive type
| system lets you encode valid data and even valid logic so that
| the "building blocks" of your program are now bigger and more
| abstract and reliable. Declarative "parse don't validate"[1] is
| where we're eventually headed, IMO.
|
| An AI that can help us to both _create_ new, useful types, and
| then help us _choose_ the best type, would be super helpful. I
| believe that's beyond the current abilities of AI, but can
| imagine that in the future. And that would be amazing, as it
| would then truly be moving us up the abstraction tree in the same
| way that, for instance, garbage collection has done.
|
| [1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
| va...
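The "parse, don't validate" idea the comment above links to can be sketched in Python with a hypothetical `NonEmptyStr` wrapper: once a value has been constructed, the type itself is the proof that the check already happened, so downstream code needs no re-validation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NonEmptyStr:
    """A string proven non-empty at construction time."""
    value: str

    def __post_init__(self):
        if not self.value:
            raise ValueError("empty string")

def greet(name: NonEmptyStr) -> str:
    # No defensive check here: the type guarantees non-emptiness.
    return f"Hello, {name.value}!"

assert greet(NonEmptyStr("Ada")) == "Hello, Ada!"
try:
    NonEmptyStr("")
except ValueError:
    pass  # invalid data is rejected at the boundary, not deep inside
```

With a type checker, passing a bare `str` to `greet` is flagged statically, which is what makes the building blocks "bigger and more reliable" in the sense the comment describes.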
| shadowgovt wrote:
| A taller abstraction tree makes tradeoffs of specialization:
| the deeper the abstractions, the more one has to understand
| when the abstractions break or when one chooses to use them in
| novel ways.
|
| This is something I'm interested in regarding this approach...
| When it works as intended, it's basically shortening the loop
| in the dev's brain from idea to code-on-screen _without_ adding
| an abstraction layer that someone has to understand in the
| future to interpret the code. The result is lower density, so
| it might take longer to read... Except what we know about
| linguistics suggests there 's a balance between density and
| redundancy for interpreting information (i.e. the bottleneck
| may not be consuming characters, but fitting the consumed data
| into a usable mental model).
|
| I think the jury's out on whether something like this or the
| approach of dozens of DSLs and problem-domain-shifting
| abstractions will ultimately result in either more robust or
| more quickly-written code.
|
| But on the topic of types, I'm right there with you, and I
| think a copilot for a dense type forest (i.e. something that
| sees you writing a {name: string; address: string} struct and
| says "Do you want to use MailerInfo here?") would be pretty
| snazzy.
| krick wrote:
| Yeah, but generating tons of stupid verbose code that nobody
| will be able to read and understand is more fun. Also, your
| superiors will be sure you are a valuable worker if you write
| more code.
| mkl95 wrote:
| Copilot is one of the worst ideas that have made it to production
| in recent years. I predict it will be quite successful
| considering Microsoft's track record.
| dempsey wrote:
| I've always wondered this about the realistic photo generators.
| How do we know they're generating new faces and not just
| regurgitating ingested faces?
| antpls wrote:
| One has to admit, Copilot raises many questions regarding global
| code quality, reviewing processes and copyright. It's a marketing
| success.
| qayxc wrote:
| Honestly, I see this exact issue as the main accomplishment of
| Copilot. It shows that the black-box machines are to be
| considered harmful and are incompatible with the current
| intellectual property and privacy frameworks.
|
| This issue goes way beyond just code - imagine GPT-like systems
| being used in medical diagnosis, where results can suddenly
| depend on the date of the CT scan or the name of the
| patient, because the
| black-box simply regurgitates training data...
| [deleted]
| nightowl_games wrote:
| Almost feels like a developer cultural thing to hate on something
| like this. If you don't like it, don't use it. If you don't
| want your team using it, become senior and then set the rules.
|
| Kinda seems like maybe there's some level of insecurity at play
| here in the criticism. Like an "I coulda come up with that
| but it's a bad idea" type of hater philosophy.
| ChrisMarshallNY wrote:
| I've always assumed that we would eventually have a low-code, or
| no-code junior dev replacement, and was wondering if this was it.
| GH and MS actually have _[Ed. had?]_ some cred for this kind of
| thing.
|
| Nope. Game over. Play again?
| marcosdumay wrote:
| Most low-code and no-code platforms go for junior dev
| empowerment, and senior dev replacement. This one also seems to
| be aimed at empowering juniors, but looks like it missed the
| senior replacement by miles.
| [deleted]
| jgilias wrote:
| Copying GPLed code as your own and passing it under an MIT
| license is not too far fetched of a thing for a junior dev to
| do.
|
| Jokes aside, to have a proper junior dev replacement you need
| something that is able to learn and grow to eventually become a
| senior dev, an architect, or a CTO. That is the most important
| value of a junior dev. Not the ability to produce subpar code.
| ChrisMarshallNY wrote:
| Depends on who you ask.
|
| I think a lot of modern software development shops, these
| days, exist only to make their founder[s] as rich as
| possible, as quickly as possible.
|
| If they are willing to commit their entire future to a
| lowest-bid outsourcing shop, then I don't think they are too
| concerned about playing the long game.
|
| Also, the software development industry, as an aggregate, has
| established a pervasive culture, based around developers
| staying at companies for 18-month stints. I don't think many
| companies feel it's to their advantage to incubate people who
| will bail out, as soon as they feel they have greener
| pastures, elsewhere.
| abeppu wrote:
| I may be over-reading, but I think this kind of example not only
| demonstrates the pragmatic legal issues, but also the fundamental
| weaknesses of a solely text-oriented approach to suggesting code.
| It doesn't really seem to have a representation of the problem
| being solved, or the relationship between things it generates and
| such a goal. This is not surprising in a tool which claims to
| work at least a little for almost all languages (i.e. which isn't
| built around any firm concept of the language's semantics).
|
| I'd be much more excited by (and less unnerved by) a tool which
| brought program synthesis into our IDEs, with at least a partial
| description of intended behavior, especially if searching within
| larger program spaces could be improved with ML. E.g. here's an
| academic tool from last year which I would love to see
| productionized. https://www.youtube.com/watch?v=QF9KtSwtiQQ
| computerex wrote:
| I think it's pretty clear that program synthesis good enough to
| replace programmers requires AGI.
|
| This solely text based approach is simply "easy" to do, and
| that's why we see it. I think it's cool and results are
| intriguing but the approach is fundamentally weak and IMO
| breakthroughs are needed to truly solve the problem of program
| synthesis.
| whimsicalism wrote:
| > fundamental weaknesses of a solely text-oriented approach to
| suggesting code.
|
| I don't think it is clear that such "fundamental weaknesses"
| exist. A text-based approach can get you incredibly far.
| abeppu wrote:
| I mean, the cases where it tries to assign copyright to
| another person in a different year highlight that context
| other than the other text in the file is semantically
| extremely important, and not considered by this approach.
| Merely generating text which looks appropriate to the model
| given surrounding text is ... misguided?
|
| If you think about it, program synthesis is one of the few
| problems in which the system can have a perfectly faithful
| model of the problem domain's dynamics. It can run any
| candidate it generates. It can examine the program graph. It
| can look at what parts of the environment were changed. To
| leave all that on the table in favor of blurting out text
| that seems to go with other text is like the toddler who
| knows that "five" comes after "four", but who cannot yet
| point to the pile of four candies. You gotta know the
| referents, not just the symbols. No one wants a half-broken
| Chinese Room.
| whimsicalism wrote:
| > generating text which looks appropriate to the model
| given surrounding text is ... misguided?
|
| Agreed - it represents a failure to adequately
| model/understand the task, but I don't think it is a
| "fundamental weakness" of text-based 'Chinese room'
| approaches.
|
| > You gotta know the referents, not just the symbols. No
| one wants a half-broken Chinese Room.
|
| "Knowing the referents" is not at all clearly defined. It's
| totally possible that, under the constraint of optimizing
| for next-word prediction, the model could develop an
| understanding of what the referents are.
|
| You shouldn't underestimate the level of complex behavior
| emerging from a big enough system under optimization. After
| all, all the crazy stuff we do - coding, art, etc. is
| produced by a system under evolutionary optimization
| pressure to make more of itself.
| abeppu wrote:
| > "Knowing the referents" is not at all clearly defined.
| It's totally possible that, under the constraint of
| optimizing for next-word prediction, the model could
| develop an understanding of what the referents are.
|
| Well, in this case, it would have been good to understand
| that "V. Petkov" is a person unrelated to the project
| being written, and that "2015" is a year and not the one
| we're currently in. Sometimes the referent will be a
| method defined in an external library, which perhaps has
| a signature, and constraints about inputs, or properties
| which apply to return values.
|
| > You shouldn't underestimate the level of complex behavior
| emerging from a big enough system under optimization.
| After all, all the crazy stuff we do - coding, art, etc.
| is produced by a system under evolutionary optimization
| pressure to make more of itself.
|
| I think this can verge into a kind of magical thinking.
| Yes, humans also look like neural nets, and we might even
| be optimizing for something. But we learn to program (and
| we do our best job programming) by having a goal for
| program behavior, and we use interactive access to try to
| run something, get an error, set a break point, try
| again, etc. I challenge anyone to try to learn to "code"
| by never being given any specific tasks, never
| interacting with docs about the language, an interpreter,
| a compiler, etc, but merely to try to fill in the blank
| in paper code snippets. You might learn to fill in some
| blanks. I highly doubt you would learn to code.
|
| This is totally a case where the textual representation
| of programs is easier to get and train against, and that
| tail is being allowed to wag the dog to frame both the
| problem and the product.
|
| None of this is to say that high-bandwidth DNN approaches
| don't have a place here -- but I think we should be
| looking at language-specific models where the DNN
| receives information about context (including some
| partial description of behavior) and outputs of the DNN
| are something like the weights in a PCFG that is used in
| the program search.
| guhayun wrote:
| Copilot NEEDS to be trained on properly licensed code, so that
| it doesn't reproduce license-encumbered code
| _tom_ wrote:
| I'm reminded of the old saying:
|
| The best person to have on your team is a productive, high-
| quality coder.
|
| The worst is a productive, low-quality coder.
|
| Copilot looks like it would give us more of the latter.
| ezoe wrote:
| Similar story.
|
| He tried to write Quine in Ruby, ended up conjuring up a
| copyright claim comment and fake licensing term.
| https://twitter.com/mametter/status/1410459840309125121
| gok wrote:
| Quake and GitHub are both owned by Microsoft now, perhaps we can
| assume this is a relicensing?
| Jyaif wrote:
| Wow, Quake _is_ owned by Microsoft. This is mind blowing, and a
| little sad.
| josefx wrote:
| It belongs to id software -> ZeniMax -> Xbox Game Studios ->
| Microsoft.
| johndough wrote:
| Is it possible that Copilot just put Quake's source code into
| the public domain?
|
| From the Copilot FAQ:
|
| > Who owns the code GitHub Copilot helps me write? GitHub Copilot
| > is a tool, like a compiler or a pen. The suggestions GitHub
| > Copilot generates, and the code you write with its help, belong
| > to you, and you are responsible for it. We recommend that you
| > carefully test, review, and vet the code, as you would with any
| > code you write yourself.
|
| Copilot can probably recite most of Quake's source code and,
| according to the FAQ, the output of Copilot belongs to the
| user.
|
| I think a point where this argumentation might fail is that
| Quake's source code does not belong to Github directly, but
| instead both Github and Quake belong to Microsoft. However, I
| am not a lawyer, so I might be wrong.
| [deleted]
| sdevonoes wrote:
| Not a problem. Just don't use Copilot :)
| hi41 wrote:
| I saw the gif on Twitter. Sorry, I am not able to understand what
| is going on. Is copilot a character in the Quake game?
| dpassens wrote:
| Copilot seems to be an AI tool to generate code for you[0]. In
| the gif, it's copying code from Quake, which is GPLv2 or later.
| If copying GPLed code wasn't bad enough, it then adds an
| MIT-like license header.
|
| [0] https://copilot.github.com/
| yk wrote:
| So, what would happen if I train a neural network to recreate
| Disney movies?
| avaldes wrote:
| Isn't it already?
| oiu45hunegn wrote:
| This reminds me of an issue that came up when I was working with
| an intelligence agency, training machine translation.
|
| If you think about language in general, individual words aren't
| very sensitive. The word for bomb in any language is public
| knowledge. But when you start getting to jargony phrases, some
| might be unique to an organization. And if you're training your
| MT on translated documents surreptitiously intercepted from West
| Nordistan's nuclear program, and make your MT model public, the
| West Nordistanis might notice - "hey, this accurately translates
| our non-public documents that contain rather novel phrases ... I
| think someone's been listening to us!"
| dredmorbius wrote:
| Backstory?
|
| WTF is "Copilot"?
| gregsadetsky wrote:
| It's a new product launched by GitHub in association with
| OpenAI
|
| https://news.ycombinator.com/item?id=27676266
| loloquwowndueo wrote:
| "Your AI pair programmer" - auto-completes entire functions
| while you're coding. https://copilot.github.com/
| dredmorbius wrote:
| Thanks.
| klohto wrote:
| I'm really dumbfounded by the Copilot team decision to not
| exclude GPL licensed code.
|
| Why was this direction chosen? Is the inclusion of GPL really
| worth the risk and potential Google v. Oracle lawsuit? I'd like
| to know the reasoning.
| throwaway287391 wrote:
| Isn't it entirely possible that they _did_ exclude GPL licensed
| code, but somebody somewhere has violated copyright and copy-
| pasted that snippet into non-GPL-licensed code that they
| trained on?
|
| They could try to trace every single code snippet they train on
| to its "true source" and use the license for that, but that's
| not very well-defined, and is a lot harder, and it's never
| going to be 100%.
| another-dave wrote:
| Which raises another question: ideally Copilot wouldn't be
| trained on "somebody somewhere", but is that happening?
|
| To use the old trope -- if the majority of programmers can't
| implement Fizzbuzz, but they do have a Github profile, are
| they being included too?
|
| Hopefully there's some quality bar for the training set, i.e.
| some subset of "good" code (e.g. release candidate tags from
| fairly established OSS tools/frameworks in different
| languages) rather than any old code on the internet.
| ttt0 wrote:
| Nope. They did include GPL code.
|
| > Once, GitHub Copilot suggested starting an empty file with
| something it had even seen more than a whopping 700,000
| different times during training -- that was the GNU General
| Public License.
|
| https://docs.github.com/en/github/copilot/research-
| recitatio...
| pessimizer wrote:
| Looks like Copilot is smart enough to understand its own
| licensing situation. It should continue to suggest this for
| any empty file.
| goodpoint wrote:
| Apache / MIT / BSD all have restrictions, e.g. an attribution
| clause.
|
| Excluding GPL does not solve the problem.
| Anon1096 wrote:
| Why would excluding GPL'd code be enough to not violate
| licenses? I don't understand why people think MIT or other
| licenses are free-for-alls to take code as they wish. The MIT
| license includes an attribution clause. And, as the linked
| video shows, Copilot is more than happy to take its code and
| put your pet license and copyright notice on instead. Isn't
| that equally as infringing as stealing GPL code? The idea of
| mining GitHub for training data was doomed from the start
| copyright-wise, as there's so much code that's misattributed,
| wrongly-licensed, or unlicensed.
| NavinF wrote:
| Has anyone ever been sued IRL for using MIT/Apache/... code?
| Or are we stuck in imaginary land where this is something to
| be worried about?
|
| Btw the GPLv2 death penalty is rather unique and I don't
| think anyone will deny that including GPL code in proprietary
| code is a hell of a lot worse in every way (liability,
| ethically, etc) than including permissively licensed code and
| forgetting to attribute it.
| ghaff wrote:
| At some level though, this suggests that the only way to be
| safe if you're writing a program (outside of a Copilot
| context) is probably simply not to look at GitHub (or maybe
| Stack Overflow and other code sources) except for, perhaps,
| using properly attributed entire functions. If you take a
| couple lines of code and tweak it a bit are you now required
| to attach copyright attribution? IANAL, but I'm guessing not.
| aj3 wrote:
| Copilot is a tool. If you take Copilot's suggestions
| uncritically and push them to GitHub, that's on you.
| croes wrote:
| Yeah, because I always check the code of my programming
| partner for license violations.
|
| That's more Trainee than Copilot.
| aj3 wrote:
| If you use it as a programming partner it will simply
| autofill whatever you're writing line-by-line. You're not
| forced to use code completion at a whole-function level
| and it's not even the suggested use-case.
| GhostVII wrote:
| Sure, but if you have to audit every suggestion to see if it
| violates copyright law, that's not a particularly useful
| tool.
| aj3 wrote:
| Depends. If you find useful code on Github, Stack
| Overflow or anywhere else on the internet, you still need
| to check whether it is compatible with your licensing or
| not.
| TeMPOraL wrote:
| If you find useful code on Github or StackOverflow, you
| can check for the license directly there, or you can try
| to find where it was copied from, and look for a license
| there.
|
| Copilot isn't copying, it's regurgitating patterns from
| its training dataset. The result may be subject to a
| license you don't know about, but modified enough that
| you won't find the original source. The result can be a
| _blend_ of multiple snippets with varying licenses. And
| there's no way to extract attribution from Copilot - DNN
| models can give you an output for your input, they can't
| tell you which exact parts of the training dataset were
| used to generate that output.
| FemmeAndroid wrote:
| But Copilot won't accurately tell you if it's directly
| copying code, and if so what the license is. If it
| provides MIT licensed code that I then need to include,
| how do I know that? Do I need to search for each set of
| lines of code it provides on GitHub?
|
| When a person gets code from another source on the
| internet, they generally know where the code has come
| from.
| aj3 wrote:
| In a real-world scenario you wouldn't be mindlessly
| pressing Tab right after a linebreak and accepting the
| first suggestion that comes your way. While entertaining,
| nobody gets paid to do that.
|
| What you get paid for is to write your own code. When you
| write your own code, generally you think first and then
| type. Well, with Copilot you think first and then start
| typing a few symbols before seeing automatic suggestions.
| If they are right, you accept changes and if they happen
| to be similar to any other code out there, you deal with
| it exactly the same as if you typed those lines yourself.
| user-the-name wrote:
| But it is not the same as if you typed it yourself.
|
| If you happen to type code that is similar to copyright
| code, that is generally considered legally OK.
|
| If you copypaste copyrighted code, that is not legally
| OK.
|
| If you accept that same code from an autocomplete tool,
| that can easily be seen as equivalent to the latter case
| rather than the former.
| user-the-name wrote:
| Then name a usage of the tool that is legally sound. I cannot
| think of one.
| aj3 wrote:
| Code completion that can suggest the whole line instead
| of a single word (e.g. often it guesses function
| parameters and various math operations when you haven't
| even typed the function name yet).
| summerlight wrote:
| At least that will reduce the chance of license violation as
| well as make a good legal argument for any uncovered
| violations as "unintentional" incidents.
| [deleted]
| LeicaLatte wrote:
| Curious if Microsoft is training Co-Pilot on my private
| repositories.
| bencollier49 wrote:
| This does make me wonder if this is susceptible to the same form
| of trolling as that MS AI got. Commit a load of grossly offensive
| material to multiple repos, and wait for Copilot to start
| parroting it. I think they're going to need some human
| moderation.
| lawl wrote:
| Way better. It's susceptible to copyright trolling.
|
| Put up repos with snippets for things people might commonly
| write. Preferably use javascript so you can easily "prove" it.
| Write a crawler that crawls and parses JS files to search for
| matching stuff in the AST. Now go full patent troll, eh, I mean
| copyright troll.
| handrous wrote:
| 1) Write a project heavily using Copilot (hell, automate it
| and write thousands of them, why not?)
|
| 2) AGPL all that code.
|
| 3) Search for large chunks of code very similar to yours, but
| written after yours, licensed more liberally than AGPL.
| Ideally in libraries used by major companies.
|
| 4) Point the offenders to your repos and offer a "convenient"
| paid dual-license to make the offenders' code legal for
| closed-source use, so they don't have to open source their
| entire product.
|
| 5) Profit?
| armatav wrote:
| 6) Arms race with someone who trained an obfuscation
| version that goes through your AGPL code and tweaks it to
| not be in violation.
| SSLy wrote:
| I love living in cyberpunk already.
| gruez wrote:
| Offensive code is the least of my worries. What about
| vulnerable/exploitable code?
| tjpnz wrote:
| Given that code is easier to write than it is to read, this one
| is troubling.
|
| I certainly wouldn't want to be using this with languages
| like PHP (or even C for that matter) with all the decades of
| problematic code examples out there for the AI to learn from.
| macNchz wrote:
| This was my first thought when reading about Copilot...it
| feels almost certain that someone will try poisoning the
| training data.
|
| Hard to say how straightforward it'd be to get it to produce
| consistently vulnerable suggestions that make it into
| production code, but I imagine an attacker with some
| resources could fork a ton of popular projects and introduce
| subtle bugs. The sentiment analysis example on the Copilot
| landing page jumped out to me...it suggested a web API and
| wrote the code to send your text there. Step one towards
| exfiltrating secrets!
|
| Never mind the potential for plain old spam: won't it be fun
| when growth hackers have figured out how to game the system
| and Copilot is constantly suggesting using their crappy,
| expensive APIs for simple things!? Given the state of Google
| results these days, this feels like an inevitability.
| joe_the_user wrote:
| Targeted attacks to elicit output only in a given context
| are generally possible with AIs. And here, writing an
| implementation of a difficult and vulnerable process seems
| easy. Bad implementations of various hard things become
| common 'cause people cut and paste the code without looking
| closely since they don't understand it anyway.
|
| //Implement elliptic curve cryptography below
|
| //Sanitize input for SQL call below
|
| Etc.
| bencollier49 wrote:
| Yep, trivial to implement as an attack.
| guhayun wrote:
| Just ask it to prioritize safety
| littlestymaar wrote:
| 1- re-upload all the shell scripts you can find, after having
| inserted `rm -rf --no-preserve-root /` every other line
|
| 2- ...
|
| 3- profit
| raffraffraff wrote:
| Perhaps they think that any code that passed a review and got
| merged = human moderated
| NullPrefix wrote:
| Coding with Adolf?
| heavyset_go wrote:
| Jojo Rabbit except Adolf is in the cloud and not in a kid's
| imagination.
| mhh__ wrote:
| YC submission when?
| tyingq wrote:
| Just the MP4, since it's hard to read in the smaller size:
| https://video.twimg.com/tweet_video/E5R5lsfXoAQDRkE.mp4
| hawski wrote:
| Copilot may do more to move open source projects off GitHub than
| the news that Microsoft was the buyer did. Now you can host your
| code on GitHub to get your license violated, or be DMCA-ed in the
| long run, once your code becomes part of some big proprietary
| project. At least it makes me think about my choice of code
| hosting more than anything that happened before.
| woah wrote:
| It looks like the author of the linked tweet intended for it to
| reproduce the Quake code, by using the exact same function name
| and comment. Whatever the merits of CoPilot, in this case the
| human intended to write the Quake function into their file, and
| put the wrong license on it.
| king_magic wrote:
| Yep, Copilot is insanely poorly thought out. Astonishing they'd
| release something as half-baked as this.
| [deleted]
| toss1 wrote:
| License Laundering
|
| Like Money Laundering for cash equivalents, or Trust-Washing for
| disinformation, but for hijacking software IP.
|
| It might not be the intended use case, but that winds up being
| the practical result.
|
| (on a related note, it would make me want to run GPT-* output
| through plagiarism filters, but maybe they already do that before
| outputting?)
| rasz wrote:
| "0.1% of the time" indeed
| unknownOrigin wrote:
| I'm honestly kinda amazed this is as upvoted here as it is.
| Typically anything ML-related is upvoted to the top positions and
| any dissent harshly ridiculed. Anyways... it appears those who
| thought about this as if it was a glorified code search engine
| were close to being right.
| qayxc wrote:
| I still don't think it's _just_ a glorified code search engine.
|
| Context-sensitive data retrieval is undoubtedly a part of it,
| though, and the question is how big and relevant that part is,
| and what the consequences are.
|
| To me the biggest issue is that it's impossible to tell whether
| the suggestions are verbatim reproductions of training material
| and thus problematic.
|
| It goes to show that this tool and basically every tool relying
| on the same or similar technology must now be assumed to do
| this, and thus any code suggestion must be regarded as plagiarism
| until proven otherwise. As a consequence such tools are now
| off-limits for commercial or open source development...
| coolspot wrote:
| Time to write a GPL-licensed Win32- and Win64-compatible OS with
| the help of CoPilot...
| fencepost wrote:
| So what I'm reading here is that "Tay for code" is maybe going to
| need to be rethought and perhaps trained differently?
| 0-_-0 wrote:
| This is a very famous function [0] and likely appears multiple
| times in the training set (Google gives 40 hits for GitHub),
| which makes it more likely to be memorized by the network.
|
| [0]:
| https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
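For context, the function in question is short enough to sketch. As an illustrative port (not Copilot's output, and transliterated from the well-known C original), here is the fast inverse square root bit trick in Python, using `struct` to reinterpret float bits the way the C code does with pointer casts:

```python
import struct

def q_rsqrt(number: float) -> float:
    """Approximate 1/sqrt(x) via the famous Quake III bit trick."""
    x2 = number * 0.5
    # Reinterpret the 32-bit float's bits as an unsigned integer
    # (the C original's "evil floating point bit level hacking").
    i = struct.unpack('<I', struct.pack('<f', number))[0]
    i = 0x5F3759DF - (i >> 1)          # the famous magic constant step
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    y = y * (1.5 - x2 * y * y)         # one Newton-Raphson refinement
    return y
```

The single Newton-Raphson step brings the relative error to under roughly 0.2%, which is why the approximation was good enough for real-time lighting math in the '90s.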
| 0-_-0 wrote:
| It's worth keeping in mind that what a neural network like this
| (just like GPT3) is doing is generating the most probable
| continuation based on the training dataset. Not the _best_
| continuation (whatever that means), simply the most likely one.
| If the training dataset has mostly bad code, the most likely
| continuation is likely to be bad as well. I think this is still
| valuable, you just have to think before accepting a suggestion
| (just like you have to think before writing code from scratch
| or copying something from Stack Overflow).
| abecedarius wrote:
| > the most probable continuation based on the training
| dataset
|
| This is not wrong, but it's easy to misread it as implying
| little more than a glorified Markov model. If it's like
| https://www.gwern.net/GPT-3 then it's already significantly
| cleverer, and so you should expect to sometimes get the kind
| of less-blatant derivation that companies aim to avoid by using
| a cleanroom process or otherwise forbidding engineers from
| reading particular sources.
| dematz wrote:
| I have no idea how this or GPT3 works or how to evaluate
| them, but couldn't you argue that it's working as it should?
| You tell copilot to write a fast inverse square root, it
| gives you the super famous fast inverse square root. It'd be
| weird and bad if this _didn't_ happen.
|
| As far as licenses go, idk. Presumably it could delete
| associated comments and change variable names or otherwise
| obscure where it's taking code from. Maybe this part is
| shady.
| tirpen wrote:
| Maybe I could build a robot that goes out in the city and
| steals cars.
|
| As far as licenses go, idk. Presumably it could delete the
| number plate and repaint the car or otherwise obscure where
| it's taking the car from. Maybe this part is shady.
|
| Maybe.
| 0-_-0 wrote:
| > couldn't you argue that it's working as it should?
|
| Let's say that it's doing exactly what it was trained to
| do.
| bee_rider wrote:
| In particular, fast approximate inverse square root is an x86
| instruction, and not a super new one. I'd be surprised if it
| wasn't in every major instruction set.
|
| This is an interesting issue. I suspect training on datasets
| from places like Github would be likely to provide lots of
| "this is a neat idea I saw in a blog post about how they did
| things in the '90s" code.
| LeanderK wrote:
| I think the problem might be in the training data. Famous code
| examples are probably copied a lot and therefore appear multiple
| times in the training data, prompting the neural network to
| memorise it completely.
| avian wrote:
| Famous code examples are also much more likely to be noticed.
| For all I know, the thing might be spewing random GPL'd code
| from the long tail of GitHub all the time and nobody notices
| because it was written by some random guy and not John Carmack.
| mkl wrote:
| Carmack didn't write this:
| https://en.wikipedia.org/wiki/Fast_inverse_square_root
| LeanderK wrote:
| Well, it's pure speculation on my part what the root cause
| is, but I think OpenAI is already trying to ensure the
| network generalises. It's just common behaviour for neural
| networks to memorise frequent samples, so I think my guess is
| quite realistic. I don't think OpenAI would fail to notice large-
| scale memorisation in their model. But as long as they don't
| publish more details it's just guesswork.
|
| Just keep in mind that it's a statistical tool. You can't
| really formally prove that it won't memorise, but I think
| with enough work you can make it unlikely enough that it won't
| matter. It's their first iteration.
| visarga wrote:
| Hash 10-grams and make a bloom filter. It will not generate
| more than 10 GPLed tokens from a source.
| deckard1 wrote:
| Also the Pareto principle. 80% of code is shit that you _don't_
| want to copy. The vast majority of GitHub is awful hacks
| and insecure code that should not be touched with a ten foot
| pole.
| cblconfederate wrote:
| Is this function used verbatim in multiple projects? I know
| it's famous, but how often has anyone used an approximation of
| inverse sqrt instead of the readily available CPU instruction in
| the past 20 years?
| ocdtrekkie wrote:
| Probably an excellent reminder that both Google and Microsoft
| decided to use your private emails for a training set to create
| Smart Reply behavior that can "write emails for you", and they
| swore up and down there's no way that could ever leak private
| information.
|
| We need legislation banning companies from ingesting data into AI
| training sets without explicit permission.
| nebuke wrote:
| Makes me wonder if github are using private repos in their
| training data.
| ocdtrekkie wrote:
| GitHub clearly stated they only used publicly available repos
| in this project. However, as many people are rightfully
| pointing out, those projects might still be either closed
| source or copylefted, and if Copilot regurgitates chunks of
| those projects, people who use it may be subject to
| infringement lawsuits in the future.
| grawprog wrote:
| I'm not surprised to be honest. I've played around with AI
| dungeon, which also uses GPT-3. It regularly reproduces content
| directly from its training material, including even comments
| attached to the stories they trained the ai on.
| Tarucho wrote:
| Is Copilot aimed at programmers or at non-technical hiring
| managers?
|
| I mean, it fits right in with the narrative devaluing
| programming that has been going around for the last couple of
| years. To the "anyone can code" narrative we are adding "more
| so, if they have the AI-assisted Copilot"
| KoftaBob wrote:
| Seems like this is less of an "AI that intelligently generates
| code based on context given" and more of a "google search
| autocomplete for code".
| thinkingemote wrote:
| From the GPLv2 licensed code:
|
| https://github.com/id-Software/Quake-III-Arena/blob/master/c...
|
| Copilot repeats it almost word for word, including comments, and
| adds an MIT-like license at the top
| arksingrad wrote:
| I guess this confirms John Carmack to be an AI
| OskarS wrote:
| Apparently Carmack was not the original author, the origin I
| believe is SGI somewhere in the deep dark 90s.
| Haga wrote:
| It was an optimization for a fluid simulation originally.
| Fordec wrote:
| I get why some people were saying it made them a better
| programmer. Of course it did, it's copy-pasting Carmack code.
| Thomashuet wrote:
| Actually the indentation of the first comment and the lack of
| preprocessor show it's not copied from this code directly but
| from Wikipedia (https://en.wikipedia.org/wiki/Fast_inverse_squa
| re_root#Overv...) So it could be that the Quake source code is
| not part of the training set but the Wikipedia version is.
| SamBam wrote:
| While I strongly doubt they would use Wikipedia as a training
| set, has anyone done a search of GitHub code to see if other
| projects have copied-and-pasted that function from Wikipedia
| into their more-permissive codebases?
| edgyquant wrote:
| It's GPT though, and the GPT models were trained on data
| from Wikipedia
| ajayyy wrote:
| It is probably based off GPT-3 with a layer on top trained
| for programming specifically, like what is done with AI
| dungeon.
| an_opabinia wrote:
| Wait until people on the toxic orange site find out what
| has happened to AI Dungeon.
| SamBam wrote:
| I'm out of the loop.
| grawprog wrote:
| https://gitgud.io/AuroraPurgatio/aurorapurgatio
|
| https://www.reddit.com/user/non-taken-name
| zxzax wrote:
| I don't get it, that seems like standard fare for an
| R-rated movie? And then it seems like some complained
| because they decided to start editing it down to a PG-13
| movie?
| grawprog wrote:
| Essentially, from my understanding: there was a data leak
| they never commented on; they instituted a poorly made
| content filter without saying anything, and the filter
| frequently has false positives and negatives; someone
| discovered they had trained the game using content the filter
| was designed to block, meaning the AI itself would
| frequently output filter-triggering stuff; more people
| found out their private unpublished stories were being
| read by third parties after a job ad, and the stories were
| posted on 4Chan, where people recognized stories they had
| written that had triggered the filter; and then
| they started instituting no-warning bans.
|
| I might have missed something, but that's the gist of it.
| armatav wrote:
| It's pre-trained, partially, on Wikipedia. GPT-2 did this
| sort of thing all the time: it's native to the architecture to
| surface examples from the fine-tuning training set by
| default.
| bootlooped wrote:
| Almost 2000 results for one of the comment lines. I'm not
| going to read through those or check the licenses, but I
| think it's safe to say that block of code exists in many
| GitHub code bases, and it's likely many of those have
| permissive licenses. Given how famous it is (for a block of
| code) it's not unexpected.
|
| https://github.com/search?q=%22evil+floating+point+bit+leve
| l...
|
| A question that popped into my head is: if the machine sees
| the same exact block of code hundreds of times, does that
| suggest to it that it's more acceptable to regurgitate the
| entire thing verbatim? Not that this incident is totally
| 100% ok, but if it was doing this with code that existed in
| only a single repo that would be much more concerning.
| Animats wrote:
| _if the machine sees the same exact block of code
| hundreds of times, does that suggest to it that it 's
| more acceptable to regurgitate the entire thing
| verbatim?_
|
| From a copyright standpoint, quite possibly. This is
| called the "Scenes a faire" doctrine. If there are some
| things that have to be there in a roughly standard form
| to do a standard job, that applies.
|
| [1]
| https://en.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire
| nojito wrote:
| It's up to the end user to accept the suggestions.
| user-the-name wrote:
| And it is completely impossible for the user to do so.
|
| So, the tool is worthless if you want to use it legally.
| nojito wrote:
| Doubtful.
|
| You can be almost certain it's being widely used or will be
| widely used shortly.
|
| The conversations around copilot are eerily similar to the
| conversations around the first autocomplete tools
| gnulinux wrote:
| It's more like a writer using an autocomplete tool to
| write the first chapter to their novel.
| caconym_ wrote:
| As someone who gets paid to write code (nominally) and
| has also written a few novels, I don't agree with this
| characterization. From what I've seen of Copilot, it's
| more like having a text editor generate your next
| sentence or paragraph^[1]. The idea (as I see it) is that
| you might use it to generate some prose "boilerplate",
| e.g. environmental descriptions, and hack up the results
| until you're satisfied.
|
| It's content generation at a fragmentary level where each
| "copied" chunk does not form a substantive whole in the
| greater body of the new work. Even if you were training
| it on other authors' works rather than just your own, as
| long as it wasn't copying _distinctive_ sentences
| wholesale, I think there's a strong argument for it
| falling under fair use--if it's even detectable.
|
| On the other hand, if it regurgitated somebody else's
| paragraph wholesale, I don't think that would be fair
| use. Somewhere in-between is where it gets fuzzy, and
| really interesting; it's also where internet commenters
| seem to prefer flipping over the board and storming out
| convinced they're _right_ to exploring the issues with a
| curious and impartial mind. I see way too much unreasoned
| outrage and hyperbolic misrepresentation of the Copilot
| tool in these threads, and it's honestly kind of
| embarrassing.
|
| As far as this analogy goes, it's worth noting that the
| structure of a computer program doesn't map onto the
| structure of a piece of fiction (or any work of prose) in
| a straightforward way. Since so much of code _is_
| boilerplate, I would (speculatively, in the copyright law
| sense) actually give more leeway to Copilot in terms of
| absolute length of copied chunks than I would for a prose
| autocompleter. For instance, X program may be licensed
| under the GPL, but that doesn't mean X's copyright
| holder(s) can sue somebody else because their program
| happened to have an identical expression of some RPC
| boilerplate or whatever. It would be like me suing
| another author because their work included some of the
| same words that mine did.
|
| ^[1] At least one tool like this (using GPT-3) has been
| posted on HN. At this point in time I wouldn't use it,
| but I have to admit that it was sort of cool.
| user-the-name wrote:
| That does not seem like a response to what I just said?
|
| I said that it is impossible for the user to check that
| the code copilot gives is OK, license-wise, and
| therefore, they can not be sure that it is legally OK to
| include in any project.
| freshhawk wrote:
| And it's up to the end user to evaluate the tool that makes
| the suggestions.
| flatiron wrote:
| As someone who does code reviews, the thought that the
| developer didn't write the code they submitted to be merged
| would never cross my mind.
| croes wrote:
| Good luck checking every code line for license violations
| duckmysick wrote:
| SaaS idea: code linter, but for licenses.
| adrianN wrote:
| Extend Fossology: https://www.fossology.org/
| SahAssar wrote:
| That's one of blackduck's offerings:
| https://www.synopsys.com/software-integrity/open-source-
| soft...
|
| At a previous job we had an audit from them; it seemed
| not to be too accurate, but probably good enough for
| companies to cover their asses legally.
| dr_kiszonka wrote:
| There will be a VSCode extension for that.
| TheDong wrote:
| It's impossible to automate checking for code license
| violations.
|
| If you and I write the exact same 10 lines of code, we
| both have independent and valid copyrights to it. Unlike
| patents, independent derivation of the same code _is_ a
| defense for copyright.
|
| If I write 10 lines of code, publish it as GPL (but don't
| sign a CLA / am not assigning it to an employer), and
| then re-use it in an MIT codebase, I can do that because
| I retained copyright, and as the copyright holder I can
| offer the code under multiple incompatible licenses.
|
| There's no way for a machine to detect independent
| derivation vs copying, no way for the machine to know who
| the original copyright holder was in all cases, and
| whether I have permission from them to use it under
| another license (i.e. if I email the copyright holder and
| they say 'yeah, sure, use it under non-gpl', it suddenly
| becomes legal again)...
|
| It's not a problem computers can solve 100% correctly.
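The verbatim-copy half of the problem is at least mechanically detectable, even if provenance and licensing intent are not. A minimal sketch of line-based fingerprint matching (hypothetical, not how any particular product works):

```python
import hashlib

def fingerprints(code: str, n: int = 5) -> set:
    """Hash every run of n consecutive normalized source lines."""
    lines = [l.strip() for l in code.splitlines() if l.strip()]
    return {
        hashlib.sha256("\n".join(lines[i:i + n]).encode()).hexdigest()
        for i in range(len(lines) - n + 1)
    }

def verbatim_overlap(candidate: str, corpus: list) -> bool:
    """Flag candidate if any n-line window matches known code.
    A match proves textual identity only, not copying --
    independent derivation and dual licensing are invisible here."""
    probe = fingerprints(candidate)
    return any(probe & fingerprints(known) for known in corpus)
```

Which is exactly the point above: the tool can say "these lines also appear over there", but never who holds copyright or under what terms the match may be used.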
| croes wrote:
| Same trust issue
| atatatat wrote:
| It's people for your lawyers to blame, all the way down!
|
| /s
| croes wrote:
| It's the same problem as with self-driving cars: who gets
| sued? The company that provides the service/car, or the
| programmer/driver? I think the latter.
| rmorey wrote:
| This exact code is all over github, >1k hits
|
| https://github.com/search?q=%22i++%3D+%2A+%28+long+%2A+%29+%...
| ajklsdhfniuwehf wrote:
| that will make a great defense at a copyright court.
|
| "your honor, i would like to plead not guilty, on the basis
| that i just robbed that bank because i saw that everyone was
| robbing banks in the next city"
|
| ...on the other hand, that was the exact defense tried for
| the capitol rioters. So i don't know anything anymore.
| mattowen_uk wrote:
| With apologies to Martin Niemoller[1]:
|
| First the automation came for the farmers, and I did not
| speak out -- Because I was not a farmer.
|
| Then the automation came for the factory workers, and I
| did not speak out -- Because I was not a factory worker.
|
| Then the automation came for the accountants, and I did
| not speak out -- Because I was not an accountant.
|
| Then the automation came for me (a programmer) -- and
| there was no one left to speak for me.
|
| ---
|
| [1] https://en.wikipedia.org/wiki/First_they_came_...
| flatiron wrote:
| Honestly I've automated a large chunk of my day job. The trick
| is keeping it secret!
| mattowen_uk wrote:
| Well... wait until all the programmer salaries crash to
| minimum wage because Management believe that "CoPilot does
| most of the work anyway".
| Rooster61 wrote:
| Then wait for them to realize how brittle the code is when
| nobody is considering the context into which this code is
| being foisted. They'll TRIPLE our salaries! :D
| timdaub wrote:
| I felt the need to write an article about this whole situation
| too:
|
| "Built on Stolen Data":
| https://rugpullindex.com/blog#BuiltonStolenData
| GenerocUsername wrote:
| A closed beta with only a few previews out in the wild has bugs.
| Unbelievable.
|
| I cannot believe GitHub would do this.
| rozularen wrote:
| Guess someone had to try it
| mouzogu wrote:
| I can't even get intellisense to work correctly half the time.
| [deleted]
| swiley wrote:
| So I guess we can just copy around copyrighted source now? Great!
| Now we can share all the proprietary driver and DSP code from
| Qualcomm.
| supernintendo wrote:
| I wonder when someone will try to use the "it came from
| Copilot" defense to get away with stealing copyrighted code.
| louthy wrote:
| This is utterly damning. I have already instructed my team that
| Copilot can never be used for our projects. Compromising the
| product because of unknowable license demands isn't acceptable in
| the professional world of software engineering.
|
| But if we put the licensing to one side for a moment...
|
| 1/ Everything I've seen it generate so far is 'imperative hell'.
| It is practically a 'boilerplate generator'. That might be useful
| for pet projects, smaller code bases, or even unit-test writing.
| But large swathes of application code looking like the examples
| I've seen so far is hard to manage.
|
| 2/ The boilerplate is what bothers me the most (as someone who
| believes in the declarative approach to software engineering).
| The future for programming and programming languages should be an
| attempt to step up to a higher level of abstraction, that has
| been historically the way we step up to higher levels of
| productivity. As applications get larger and code-bases grow
| significantly we need abstraction, not more boilerplate.
|
| 3/ As someone who develops a functional framework for C# [1], I
| could see Copilot essentially side-lining my ideas and my
| approach to writing code in C#. Not just style, but choice of
| types, etc. I wonder if the fallout of Copilot's 'one true
| way' of generating code was ever considered? It appears to
| force a style that is at odds with many who are looking for more
| robust code. At worst it will homogenise code "people who wrote
| that, also wrote this" - stifling innovation and iterative
| improvements in the industry.
|
| 4/ Writing code is easy. Reading and understanding code written
| by another developer is hard. Will we spend most of our time as
| code-reviewers going forwards? Usually, you can ask the author
| what their intentions were, or why they think their approach is
| the correct one. Copilot (as far as I can tell) can't justify its
| decisions. So, beyond the simple boilerplate generation, will
| this destroy the art of programming? I can imagine many juniors
| using this as a crutch, and potentially never understanding the
| 'why'.
|
| I'm not against productivity tools per se; it's certainly a neat
| trick, and a very impressive feat of engineering in its own
| right. I am however dubious that this really adds value to
| professional code-bases, and actively may decrease code quality
| over time. Then there's the grey area of licensing, which I feel
| has been totally brushed to one side.
|
| [1] https://github.com/louthy/language-ext
| ethangk wrote:
| This is a little off topic, but your framework looks really
| interesting! How come you opted for building a functional
| framework in C#, vs using F#? I couldn't see anything in the
| README about what was specifically frustrating about F#? I ask
| because we're looking at introducing it at my company.
| louthy wrote:
| I cofounded a company in 2005, the primary product is a
| never-ending C# web-application project. As the code-base
| grew to many millions of lines of code I started to see the
| very real problems of software engineering in the OO
| paradigm, and had the _functional programming enlightenment
| moment_.
|
| We started building some services in F#, but still had a
| massive amount of C# - and so I wanted the inertia of my team
| to be in the direction of writing declarative code. There
| wasn't really anything (outside of LINQ) that did that, so I
| set about creating something.
|
| We don't write F# any more and find functional C# (along with
| the brilliant C# tooling) to be very effective for us
| (although we also now use PureScript and Haskell).
|
| I do have a stock wiki post on the repo for this though [1].
| You might not be surprised to hear it isn't the first time
| I've been asked this :)
|
| [1] https://github.com/louthy/language-ext/wiki/%22Why-
| don't-you...
| ethangk wrote:
| Ha, it's good to see I'm full of original thoughts.
|
| That post in the wiki sums it up perfectly, much
| appreciated!
| bostonsre wrote:
| I'm not sure we should throw the baby out with the bath water
| here due to the large blurbs it stubs in when it doesn't
| have a lot to go on in mostly empty files. It is a preview
| release. They are working on proper attribution of suggested
| code and explainability [1]. Having a stochastic parrot that
| types faster than I do would be useful in a lot of cases.
|
| Yes, better layers of abstraction could make us more productive
| in the future, but we're not there yet. By all means, don't
| accept the larger blurbs it proposes, but there is productivity
| to be gained in the smaller suggestions. If it correctly
| intuits the exact rest of the line that you were thinking of,
| it will save time and not make you lose understanding of the
| program.
|
| In some areas complete understanding and complete code
| ownership is required but in a lot of places, it's not. If it
| produces the work of a moderately skilled developer it would be
| sufficient. I don't remember all code I write as time passes.
| If it produces work that I would have produced, then I don't
| see how that's any different from work that was produced by my
| past self.
|
| It may feel offensive but a lot of the comments against it
| sound like rage against the machine/industrialization opponents
| and the arguments sound pretty similar to those made in the
| past by those that had their jobs automated away. I'm not sure
| we're all as unique snowflakes as we like to think we are.
| Sure, there will be some code that requires an absolute master
| that is outside the capabilities of this tool. But I'd guess
| there is a massive amount of code that doesn't need that
| mastery.
|
| [1] https://docs.github.com/en/github/copilot/research-
| recitatio...
| vlovich123 wrote:
| I think it depends on how you look at it.
|
| For small snippets that have likely been already written by
| someone else, this probably works great. For those though,
| the time savings is probably at most 5-10 min down to 1 or
| less. The challenge is that that's not where my time goes
| unless I'm working in an unfamiliar language.
|
| As someone who writes a lot of code quickly, I'm usually
| bottlenecked by reviews. For more complex changes I'm
| bottlenecked by understanding the problem and experimenting
| with solutions (and then reviews, domain-specific tests
| usually, fixing bugs etc). Writing code isn't like waiting
| for code to compile since I'm not actually ending up task
| switching that frequently.
|
| This does sound like a fantastic tool when I'm not familiar
| with the language although I wonder if it actually generates
| useful stuff that integrates well as the code gets larger (eg
| can I say "write an async function that allocates a record
| handle in the file with ownership that doesn't outlive the
| open file's lifetime"). I'm sure though that this is what a
| lot of people are overindexing on. For things like that I
| expect normal evolution of the product will work well. For
| things like "cool, understand your snippets but also weight
| my own codebase higher and understand the specifics of my
| codebase", I think there's a lot of groundbreaking research
| that would be required. That is what I see as a true
| productivity boost - I'd make this 100% required for anyone
| joining the codebase. The more mentorship can be offloaded,
| the lower the cost is to growing teams. OSS projects can more
| easily scale similarly.
| 0xbadcafebee wrote:
| > programming languages should be an attempt to step up to a
| higher level of abstraction
|
| Adding abstraction buries complexity. If all you do is keep
| adding more abstractions, you end up with an overcomplicated,
| inefficient mess. Which is part of why application sizes are so
| bloated today. People just keep adding layers, as long as they
| have room for more of them. Everything gets less efficient and
| definitely not better.
|
| The right way to design better is to iterate on a core design
| until it cannot be any simpler. All of the essential complexity
| of software systems today comes from 40 year old conventions.
| We need a redesign, not more layers.
|
| One example is version management. Most applications today
| _can_ implement versioned functions and keep multiple versions
| in an application, and track dependencies between external
| applications. Make a simple DAG of the versions and let apps
| call the versions they were designed against, or express what
| versions are compatible with what, internally. This would make
| applications infinitely backwards-compatible.
|
| The functionality exists right now in GNU Libc. You can
| literally do it today. But rather than do that, we stumble
| around replacing entire environments of specific versions of
| applications and dependencies, because we can't seem to move
| the entire industry forward to new ideas. Redesign is hard,
| adding layers is easy.
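The version-binding half of that idea can be sketched in a few lines; this is a toy illustration, not glibc's actual `.symver` symbol-versioning mechanism:

```python
class VersionedAPI:
    """Keep every published version of a function; callers bind
    to the version they were built against, so old callers keep
    working after new versions ship."""

    def __init__(self):
        self._versions = {}

    def publish(self, name, version, fn):
        self._versions[(name, version)] = fn

    def bind(self, name, version):
        # A caller links against an exact (name, version) pair.
        return self._versions[(name, version)]

api = VersionedAPI()
api.publish("parse", "1.0", lambda s: s.split(","))   # legacy behavior preserved
api.publish("parse", "2.0", lambda s: [t.strip() for t in s.split(",")])

old_app_parse = api.bind("parse", "1.0")  # built against 1.0, unaffected by 2.0
new_app_parse = api.bind("parse", "2.0")
```

glibc does this at link time with version scripts and `.symver` directives; the point is the same: publishing "2.0" never breaks a binary bound to "1.0".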
| louthy wrote:
| > Adding abstraction buries complexity. If all you do is keep
| adding more abstractions, you end up with an overcomplicated,
| inefficient mess. Which is part of why application sizes are
| so bloated today. People just keep adding layers, as long as
| they have room for more of them. Everything gets less
| efficient and definitely not better.
|
| Presumably you're writing code in binary then? This is a non-
| argument, because there's evidence that it's worked.
| Computers were first programmed with switches and punch
| cards, then tape, then assembly, then low level languages
| like C, then memory managed languages etc.
|
| Abstraction works when side-effects are controlled.
| Composition is what we're after, but we must compose the
| bigger bits from smaller bits that don't have surprises in.
| This works well in functional programming, a good example
| would be monadic composition: monads remove the boilerplate
| of dealing with asynchrony, value availability, list
| iteration, state management, environment management, etc.
| Languages that have first-class support for these tend to
| have significantly less boilerplate.
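The boilerplate-removal claim can be made concrete with an Option/Maybe-style bind, sketched here in Python (a hypothetical helper for illustration, not language-ext itself):

```python
def bind(value, fn):
    """Maybe-style bind: short-circuit on None so callers don't
    repeat 'if x is None: return None' at every step."""
    return None if value is None else fn(value)

# Without the abstraction, each step needs its own None check;
# with bind, the check lives in one place and the steps compose.
def lookup_age(users, name):
    user = users.get(name)
    profile = bind(user, lambda u: u.get("profile"))
    return bind(profile, lambda p: p.get("age"))

users = {"ada": {"profile": {"age": 36}}}
```

The absence-handling boilerplate is written once, in `bind`, rather than repeated at every call site.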
|
| The efficiency argument is off, too. Most software
| engineering teams would trade some efficiency for more
| reliable and bug free code. At some point (and I would argue
| we're way past it) programs become too complex for the human
| brain to comprehend, and that's where bugs come from. That's
| why we're overdue an abstraction lift.
|
| Tools like Copilot almost tacitly agree, because they're
| trying to provide a way of turning the abstract into the
| real, but then all you see is the real, not the abstract.
| Continuing the assault on our weak and feeble grey matter.
|
| I spent the early part of my career obsessing over
| performance on crippled architectures (Playstation 3D engine
| programmer). If I continued to write applications now like I
| did then, nothing would go out the door and my company
| wouldn't exist.
|
| Of course there are times when performance matters. But the
| vast majority of code needs to be correct first, not the most
| optimal it can be for the architecture.
| mdellavo wrote:
| Generating code has never been a problem for developers :)
|
| I'd be more interested in a tool that notices patterns and
| boilerplate. It could offer a chance for generalization,
| abstraction or use of a common pattern from the codebase. This
| is of course much harder.
| bohemian99 wrote:
| My question is would Copilot be useful if you could choose the
| codebase it would be drawing from? Almost as an internal
| company tool?
| freshhawk wrote:
| That would actually be potentially useful, it could do a kind
| of combination of autocompletion of internal libraries,
| automatic templates for common patterns and internal
| style/linting type tasks all in one. Certainly augmenting
| those other things.
|
| It would be interesting how much code you would need before
| it was useful (and how good does it have to be to be useful?
| Does even a small error rate cost so much that it erases
| other gains, because so many of the potential errors in usage
| of this type of tool are very subtle?)
| saurik wrote:
| If you find yourself copying code someone else in your
| organization wrote rather than abstracting it to a function
| in a shared library or building a more declarative framework
| to manage the problem, something horrible has happened.
| maccard wrote:
| Sometimes boilerplate is unavoidable. As an example, how do
| you send a GET request with libcurl in C with an
| authorization header? I can't tell you offhand, but I can
| tell you the file in my codebase that does have it, because
| I've duplicated the logic for two separate systems.
| saurik wrote:
| So you are saying you would rather every project in the
| world have at least one--if not, thanks to making it
| easier via Copilot, many--copies of this code rather than
| one shared library that provides a high-level abstraction
| for libcurl?... At least for your own code, how did you
| end up with two copies of duplicated logic rather than a
| shared library of functionality?
| maccard wrote:
| > So you are saying you would rather every project in the
| world have at least one--if not, thanks to making it
| easier via Copilot, many--copies of this code.
|
| Absolutely not, not at all. I'm suggesting that copying
| and pasting happens, particularly in the context of a
| single project.
|
| > At least for your own code, how did you end up with two
| copies of duplicated logic rather than a shared library
| of functionality?
|
| At what point is it worth introducing an abstraction
| rather than copying? Using my libcurl example, you can
| create an abstraction over the ~10 lines of
| initialization, but if you need to change it to a POST,
| then you're just implementing an abstraction over
| libcurl, which is just silly.
| saurik wrote:
| If you have 10 lines of repeated code with one line
| changed to make it GET vs POST, introducing an
| abstraction isn't "silly": it is simultaneously both
| ergonomic and advantageous, as if you ever need to add
| another line of code to that initialization--which
| totally happens, due to various security extensions you
| might want to make to what TLS settings you accept, or to
| tune performance parameters related to connection
| caching, or to add a header to every request (for any
| number of reasons from debugging to authentication)--you
| can do it in one place instead of umpteen number of
| places. And like... further to the point: I use libcurl
| as a _fallback_ on Linux, but if you want to correctly
| support the user's settings for proxy servers--which are
| sometimes needed for your requests to work at all--my
| code is abstracted so I can plug in _entirely different
| backends_ to libcurl, such as Apple's CFNetwork. You act
| like abstraction is somehow a bad thing or a complex
| thing, when it should absolutely take you less time to
| wrap duplicated code into a function than to duplicate
| it.
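The "one place instead of umpteen" point is cheap to realize; a hypothetical sketch in plain Python, standing in for the libcurl setup being discussed:

```python
def build_request(url, method="GET", headers=None, body=None):
    """Single choke point for every outgoing request: default
    headers, auth, TLS, or proxy settings would all be set here
    once, instead of in every copy-pasted call site."""
    merged = {"User-Agent": "myapp/1.0"}   # defaults live in one place
    merged.update(headers or {})
    return {"url": url, "method": method, "headers": merged, "body": body}

get_req = build_request("https://example.com/api",
                        headers={"Authorization": "Bearer t0k3n"})
post_req = build_request("https://example.com/api", method="POST", body=b"{}")
```

Switching GET to POST is a parameter, not a second copy of the initialization, and a new default header later is a one-line change.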
| tyingq wrote:
| That sounds interesting, though it still feels like it would
| need work. Like a way to annotate suggestions with comments,
| or flag them. Definitive licensing shown for each snippet. A
| way to mark deprecated code as deprecated to the training
| algorithm, etc.
| louthy wrote:
| It would certainly alleviate the license concerns. If it was
| possible to train it to a level (that produces effective
| output), then sure.
|
| As a thought experiment, I thought "what would happen if we
| trained it on our 15 million lines of product code + my
| language-ext project". It would almost certainly produce
| something that looks like 'us'.
|
| But:
|
| * It would also trip over a million or so lines of generated
| code
|
| * And the legacy OO code
|
| * It will 'see' some of the extreme optimisations I've had to
| build into language-ext to make it performant. Something like
| the internals of the CHAMP hash-map data-structure [1]. That
| code is hideously ugly, but it's done for a good reason. I
| wouldn't want to see optimised code parroted out upfront.
| Maybe it wouldn't pick up on it, because it hasn't got a
| consistent shape like the majority of the code? Who knows.
|
| Still, I'd be more willing to allow my team to use it if I
| could train it myself.
|
| [1] https://github.com/louthy/language-
| ext/blob/main/LanguageExt...
| carlmr wrote:
| > legacy OO code
|
| Aside from OO vs FP. A concern with that I'd have is that
| it would encourage and enforce idiosyncracies in large
| corporate codebases.
|
| If you've ever worked for a large corporation on their
| legacy code, you know you don't want any of that to be
| suggested to colleagues.
|
| This would enforce bad behaviors and make it even harder
| for fresh developers to argue against it.
| louthy wrote:
| > This would enforce bad behaviors and make it even
| harder for fresh developers to argue against it.
|
| I think this is a significant point. It maintains the
| status quo. We change our guidance to devs every other
| year or so. New language features become available, old
| ones die, etc. But we're not rewriting the entire code-
| base every time, we know if we hit old code, we refactor
| with the new guidance; but we don't do it for the sake of
| it, so there's plenty of code that I wouldn't want in a
| training set (even if I wrote it myself!)
| [deleted]
| goodpoint wrote:
| 5/ Boilerplate is easy to write but expensive to maintain in
| large quantities. Proper abstraction/templating requires
| careful thinking. Copilot encourages the first and discourages
| the second.
|
| 6/ Copilot learns from the past. It can only favor popularity
| and familiarity in code patterns over correctness and
| innovation.
| hallqv wrote:
| Neural net: "It's all in the training data, stupid."
| reader_mode wrote:
| >2/ The boilerplate is what bothers me the most (as someone who
| believes in the declarative approach to software engineering).
| The future for programming and programming languages should be
| an attempt to step up to a higher level of abstraction, that
| has been historically the way we step up to higher levels of
| productivity. As applications get larger and code-bases grow
| significantly we need abstraction, not more boilerplate.
|
| Just the other day someone on copilot threads was arguing that
| this kind of boilerplate optimizes for readability... It's like
| Java Stockholm syndrome and the old myth of easy to approach =
| easy to read (how long it took them to introduce var).
|
| I've always viewed code generators as a symptom of language
| limitations (which is why they were so popular in Java land)
| that lead to unmaintainable code, this seems like a fancier
| version of that - with all the same drawbacks.
| 3pt14159 wrote:
| I'm all for abstracting. I like Rails, for example. That
| said, it gets _truly_ difficult to add or change stuff at the
| more abstract layers. For example, adding recursive querying
| to an existing ORM is _tough_. And on the rare occasion that
| there is a bug in the abstract layer, debugging that from the
| normal application code is also tough.
|
| I understand why some corporations prefer dumb boilerplate
| everywhere for some applications. If there is an outage it's
| usually easy to fix quickly. Sometimes it's not, if it's an
| issue in the boilerplate (say, Feb 29 rolls around and all of
| the boilerplate assumed a 28 day month) that means a huge
| update all across the system, but that rarely happens in
| practice.
| reader_mode wrote:
| I would say ORM is tough with code gen or with
| metaprogramming because it maps two mismatched paradigms
| (OOP and relational) and tries to paper over the
| differences.
|
| I do agree on the debugging aspect - especially in dynamic
| languages - metaprogramming stack traces can be really hard
| to follow.
| slumdev wrote:
| Tools like xsd or T4 (in the .NET ecosystem) are great time-
| savers, but you would never consider directly modifying the
| code they generate. You would leave the generated code
| untouched (in case it ever needed to be generated again) and
| subclass it to make whatever changes you intend.
|
| I think Copilot is so unfortunate because it's not building
| abstractions and expecting you to override parts of them.
| It's acting as an army of monkeys banging out Shakespeare on
| a typewriter. And the code it generates is going to require
| an army to maintain.
| hu3 wrote:
| Linq2Db is a great example of T4 code generation that
| works. It creates partial classes from database schema.
| Together with C# I have strongly typed database access.
|
| https://github.com/linq2db/linq2db
| reader_mode wrote:
| Even there I feel like code generators are just a band aid
| around the fact that metaprogramming facilities suck. If
| you would never modify the generated code why generate in
| the first place. You could argue that stack traces are
| easier to follow but TBH generated code is rarely pretty in
| that regard as well.
|
| For example I think F# idea of type providers > code
| generators.
| ethbr0 wrote:
| Code generators = out of practice, out of mind
| mimixco wrote:
| Awesome summary and thanks for trying it for the rest of us!
|
| Copilot sounded terrible in the press release. The idea that a
| computer is going to pick the right code for you (from
| comments, no less) is really just completely nuts. The belief
| that it could be better than human-picked code is really way
| off.
|
| You bring up a really important point. When you use a tool like
| Copilot (or copypasta of any kind), you are introducing the
| _additional_ burden of understanding that other person 's code
| -- which is worse than trying to understand your own code or
| write something correct from scratch.
|
| I think you've hit the nail on the head. Stuff like Copilot
| makes programming worse and more difficult, not better and
| easier.
| res0nat0r wrote:
| Isn't the entire point of this to _suggest_ code you may use,
| not to just blindly accept as correct without thinking?
| petercooper wrote:
| While I accept most of the concerns, it's better than your
| comment suggests. I see some promise for it as a tool for
| reminding you of a technique or inspiring you to a different
| approach than you've seen before.
|
| For example, I wrote a comment along the lines of "Find the
| middle point of two 2D positions stored in x, y vectors" and
| it came up with two totally different approaches in Ruby -
| one of which I wouldn't have considered. I did some similar
| things with SQL, and some people might find huge value in it
| suggesting regexes, too, because so many devs forget the
| syntax and a reminder may be all it takes to get out of a
| jam.
|
| I'm getting old enough now to see where these sorts of
| prompts will be a game changer, especially when dabbling in
| languages I'm not very proficient in. For example, I barely
| know any Python, so I just created a simple list of numbers,
| wrote a "Sort the numbers into reverse order" comment, and it
| immediately gave me the right syntax that I'd otherwise have
| had to Google, taking much longer.
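For what it's worth, the Python one-liner being described is presumably something like:

```python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Sort the numbers into reverse order
numbers_desc = sorted(numbers, reverse=True)
```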
|
| Maybe to alleviate the concerns it could be sandboxed into a
| search engine or a separate app of its own rather than
| sitting constantly in my main editor - I would find that a
| fair compromise which would still provide value but require
| users to engage in more reflection as to what they're using
| (at least to a level that they would with using SO answers,
| say).
| rob74 wrote:
| Yeah, but... I mean, I guess we all agree that copying code
| from, let's say StackOverflow without checking if it really
| does what you want it to do is a bad thing? Now here we
| have a tool that basically automates that (except it's
| copying from GitHub, not StackOverflow), and that's
| supposed to be a good thing? Even if its AI is smarter, you
| would still have to check the code it suggests, and that
| can actually be harder than writing it yourself...
| ethbr0 wrote:
| The big boost, that I think parent is alluding to, is for
| rusty (not Rust!) languages in the toolbox, where you may
| not have the standard library and syntax loaded into your
| working memory.
|
| As a nudge, it's a great idea. As a substitute for
| vigilance, it's a terrible idea.
|
| I suspect that's why they named it Copilot instead of
| Autopilot, but it's unfortunately more likely to be used
| as the latter, humans being humans.
| [deleted]
| toss1 wrote:
| Right, so it might occasionally be useful as a search tool
| for divergent ideas of different approaches to a problem,
| and your suggestion to sandbox it in a separate area works
| for that.
|
| But that does not seem to be its advertised or configured
| purpose, sitting in your main editor.
| mimixco wrote:
| This is good stuff. As a search engine, it could very well
| be useful. As another poster pointed out, if some context
| or explanation were provided along with the source
| suggestions, its utility as a reference would really grow.
|
| I totally agree with you that prompted help is a big deal
| and just going to get bigger. We have developed a language
| for fact checking called MSL that works exactly this way in
| practice -- suggesting multiple options rather than just
| inserting things.
|
| One of the things that interests me about this thread is
| the whole topic of UI vs. AI and how much help really comes
| from giving the user options (and a good UI to discover
| them) vs how much is "AI" or really intelligence. I think
| the intelligence has to belong to the user, but a computer
| can certainly sift through a bunch of code to find a search
| engine result and, those results could be better than you
| get now from Google &Co.
| osmarks wrote:
| If they're using something like GPT-3 on the backend,
| which they probably are, it probably _can't_ provide any
| explanations or context (unless the output is memorized
| training data, like this); the output can be somewhat
| novel code not from any particular source, and while it
| might be possible to find relevant information on similar
| code, this would be a hard problem too.
|
| EDIT: they appear to be interested in making it look for
| similar code, see here:
| https://docs.github.com/en/github/copilot/research-
| recitatio...
| vmception wrote:
| Hm odd takes here.
|
| It's really weird for software engineers to judge something
| by its current state and not by its potential state.
|
| To me, it's clearly solvable by Copilot filtering the input
| code by each repository's license. It should train only on
| certain open source licenses, maybe even user-choosable ones,
| or code creators could optionally sublicense their code to
| Copilot in a very permissive way.
|
| Secondly, a way for the crowd to code review suggestions
| would be a start.
| gpm wrote:
| Practically every open source license requires attribution;
| if copilot has a licensing issue, training a model on only
| repositories with the same license won't fix it except for
| the extremely rare licenses which do not require
| attribution.
| vmception wrote:
| why not? it can just generate an attribution file or
| reminder
| gpm wrote:
| Because it's an opaque neural network on the backend, it
| doesn't know if or from whom it copied code.
| buu700 wrote:
| Could they handle this by generating a collective
| attribution file that covers every (permissively
| licensed) repository that Copilot learned from?
|
| Of course this would be massive, so from a practical
| consideration the attribution file that Copilot generates
| in the local repository would have to just link to the
| full file, but I don't think that would be an issue in
| and of itself.
| gpm wrote:
| Maybe? Might depend on the license, I doubt the courts
| would be amused.
|
| Almost certainly a link would not suffice, basically
| every license requires that the attribution be directly
| included with the modified material. Links can rot, can
| be inaccessible if you don't have internet access, can
| change out from underneath you, etc.
|
| (I am not a lawyer, btw)
| buu700 wrote:
| Makes sense. Maybe something like git-lfs/git-annex would
| be sufficient to address the linking issue, but it seems
| like the bigger concern is whether a court would accept
| this as valid attribution. In a sense it reminds me of
| the LavaBit stunt with the printed key.
| joeyh wrote:
| I think a judge could be persuaded that a list of every
| known human does not constitute a valid attribution of
| the actual author, even though their name is on the list.
| The purpose of an attribution is to acknowledge the
| creator of the work, and such a list fails at that.
| buu700 wrote:
| Makes sense. That's probably the best interpretation
| here. Any other decision would make attribution lists
| optional in general for all practical purposes.
| mimixco wrote:
| I've been in the business a long time and I just don't
| believe in generalized AI at all. Writing code requires
| general (not artificial) intelligence. All of these "code
| helping" tools break down quickly because they may be
| searching for and finding relevant code blocks (the
| "imperative hell" referred to by another commenter), but
| they don't understand the _context_ or the overall behavior
| and goals of the program.
|
| Writing to overall goals and debugging actual behavior are
| the real work of programmers. Coming up with syntax or
| algorithms is 3rd and 4th on the priority list because,
| let's face it, it's not that hard to find a reference for
| correct syntax or the overall recipe implied by an
| algorithm. Once you understand those, you can write the
| correct code for your project.
|
| I do think Copilot has potential as a search engine and
| reference tool -- if it can be presented that way. But the
| idea of a computer actually coming up with the right code
| in the full context of the program seems like fantasy.
| gpm wrote:
| If we're coming up with potential uses, I think they got
| the direction wrong.
|
| Don't tell me what to do, tell me what not to do. "this
| line doesn't look like something that belongs in a code
| base", "this looks like a line of code that will be
| changed before the PR is merged". Etc.
| mimixco wrote:
| _That_ would be fantastic! Imagine if it could catch
| common errors before you make them. So many things in
| loops and tests that we mess up all the time. My favorite
| is to confuse iterating through an array vs an object in
| JS. I'd love to have Gazoo step in and say, "Don't you
| mean _this_, David?"
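|
| For instance, a hypothetical Python analogue of that JS
| array-vs-object slip, of the kind such a checker could flag
| (example constructed here, not taken from any tool):

```python
# Iterating a dict directly yields its *keys*, which is easy
# to forget when you meant the values -- the same class of
# mix-up as iterating a JS object where you expected an array.
scores = {"alice": 3, "bob": 7}

keys_only = [k for k in scores]          # ["alice", "bob"]
values = [v for v in scores.values()]    # [3, 7] -- probably what was meant

print(keys_only)
print(values)
```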
| slumdev wrote:
| > It's really weird for software engineers to judge
| something by its current state and not by its potential
| state.
|
| No, we're not afraid of Copilot replacing us. The thought
| is ridiculous, anyway. If it actually worked, we would be
| enabled to work in higher abstractions. We'd end up in even
| higher demand because the output of a single engineer would
| be so great that even small businesses would be able to
| afford us.
|
| Yes, we are afraid of Copilot making the entire industry
| worse, the same way that "low-code" and "no-code" solutions
| have enabled generations of novices to produce volumes of
| garbage that we eventually have to clean up.
| vmception wrote:
| Sounds like projecting, because that's not what I was
| referring to.
|
| I'm saying Copilot can be better with very simple tweaks.
| [deleted]
| onion2k wrote:
| _Stuff like Copilot makes programming worse and more
| difficult, not better and easier._
|
| Copilot makes programming worse and more difficult if you're
| aiming for a specific set of coding values and style that
| Copilot doesn't generate (yet?). If Copilot generates the
| sort of code that you would write, and it does for _a lot_ of
| people, then it's definitely no worse (or better) than
| copying something from SO.
|
| The author of a declarative, functional C# framework likely
| has very different ideas to what code should be than some PHP
| developer just trying to do their day-to-day job. We
| shouldn't abandon tools like Copilot just because they don't
| work out at the more rigorous ends of the development
| spectrum.
| serf wrote:
| >If Copilot generates the sort of code that you would
| write, and it does for a lot of people, then it's
| definitely no worse (or better) than copying something from
| SO.
|
| Disagree.
|
| Most SO copy-paste must be integrated into your project --
| maybe it expects different inputs, maybe it expects or
| works with different variables -- whatever, it must be
| partially modified to work with the existing code-base that
| you're working with.
|
| Copilot does the integration tasks for you. When one might
| have had to read through the code from SO to understand it
| enough to integrate it, the person using Copilot need not
| even invest that much understanding.
|
| Because of these workflow differences, it seems to me as if
| Copilot enables an even more low-quality workflow than
| offered by copy-pasting from SO and patching together
| multiple code-styles and paradigms while hoping for the
| best; Copilot does that without even the wisdom that an SO
| user might have that 'this is a bad idea.'
| buu700 wrote:
| I'm not firmly for or against the concept of Copilot, but
| it's fascinating to me that it will introduce an entirely
| new class of bugs. Rather than specific mistakes in
| certain blocks of code and edge case errors in handling
| certain inputs, now we're going to have
| lazy/overworked/junior developers getting complacent and
| committing code they haven't reviewed that isn't even
| close to their intent. Like you could have a backend
| method that was supposed to run a database query, but
| instead it sends the content of an arbitrary variable in
| a POST request to a third-party API or invokes a shell to
| run `rm -rf /`.
| marcosdumay wrote:
| To me, the most interesting aspect is the new class of
| supply chain security vulnerabilities it will create. How
| people will act to exploit or protect1 against those will
| be very interesting.
|
| 1 - I don't expect "not using a tool that generates bad
| code" to be the top option.
| nightpool wrote:
| The arguments that the GP makes are not based on a specific
| style or value of coding. Instead, they're based on the
| simple truth that it is harder to understand code that
| somebody else wrote.
|
| In some cases the benefits of doing so outweigh the costs
| (such as using a stack overflow answer that's stood the
| test of time for something you don't know how to do), but
| with Copilot you don't even get the benefit of upvotes,
| human intent, or crowdsourced peer review.
| mimixco wrote:
| I don't think they work out past trivial applications. Any
| non trivial app requires an understanding of a much larger
| part of the codebase than a tool like Copilot is looking at
| at any one time.
|
| Copilot does not understand the code _in toto_ and is
| therefore really useless for debugging (70% of all coding)
| and probably useless for anything other than very simple
| parts of an app.
| onion2k wrote:
| _Any non trivial app requires an understanding of a much
| larger part of the codebase than a tool like Copilot is
| looking at at any one time._
|
| I don't think that's important. Copilot, at least as it's
| been demo'd so far judging by the examples, is to help
| you write small, standalone functions. It shouldn't need
| to know about the rest of the application. Just as the
| functions that you write yourself shouldn't need to know
| about the rest of the application either.
|
| If your functions need a broad understanding of the
| codebase as a whole how the heck do you write tests that
| don't fail the instant anything changes?
| mimixco wrote:
| The reality of code is that stuff breaks when connected
| to other stuff, as it eventually must be for real work to
| happen. There's no getting around that.
|
| Since that's where the work of programming is, debugging
| connected applications (not writing fresh, unencumbered
| code, a rare luxury), a tool that offers no help for that
| is, well, not much help.
| GlennS wrote:
| I'm inclined to agree with you, and actually I'm rather
| mistrustful of even basic autocomplete ever since a colleague
| caught me using it without even looking at the screen!
|
| But I wonder...
|
| Is this a difference of programmer culture?
|
| I think there are people who write successful computer programs
| for successful businesses without delving into the details.
| Without considering all the things that might go wrong. Without
| mapping the code they're writing to concepts.
|
| Lots of people.
|
| What would they do with this?
| louthy wrote:
| > What would they do with this?
|
| Not get a job working for me ;)
|
| More seriously, when I think back to when I was first
| learning programming - in the heady days of 1985 - I would
| often copy listings out of computing magazines, make a
| mistake whilst doing it, and then have no idea what was
| wrong. The only way was to check character by character. I
| didn't have the deeper understanding yet, and so I couldn't
| contribute to solving the problem in any real way.
|
| If they're at that level as a programmer, to the point where
| their code is being written for them and they don't really
| understand it, then they're going to make some serious
| mistakes eventually.
|
| If you want to step up as a dev, understanding is key.
| Programming is hard and gets harder as you step up and bite
| off bigger and more complex problems. If you're relying on
| the tools to write your code, then your job is one step away
| from being automated. That should be enough to light a fire
| under your ambition!
| biztos wrote:
| I also typed stuff in from magazines in the 80's, and my
| fast but imperfect typing really helped me learn
| programming: I often had to stop, go back to the first
| page, and actually _read_ the damned thing in order to make
| it work.
| code4you wrote:
| Great points. Really makes me question why so many developers
| were excited / worried about programming jobs being automated
| away by this technology. I really doubt that many jobs are
| going to be displaced by what is at best an improvement to
| autocomplete/intellisense and at worst an unreliable, copyright
| infringing boilerplate generator. Also agree with point #3 - I
| could see Copilot steering devs away from new code patterns
| toward whatever was most commonly seen in the existing
| codebases it was trained on. Doesn't seem good for innovation
| in that sense.
| influx wrote:
| I get why marketing calls machine learning "AI". I don't get why
| engineers would think this is.
|
| Dumb.
| squeaky-clean wrote:
| I still consider anything with more than 3 if-statements to be
| AI. We just need more sensible expectations about what AI can
| do haha.
| SEMW wrote:
| > I don't get why engineers would think this is.
|
| This claim that "AI" only means artificial general / human-
| equivalent intelligence completely ignores the long history of
| how that term has been used, by computer science researchers,
| for the last 70-odd years, to include everything from Shannon's
| maze-solving algorithms, to Prolog-y systems, to simple
| reinforcement learning, and so on.
|
| https://web.archive.org/web/20070103222615/http://www.engagi...
|
| It's true that there has been linguistic drift in the direction
| of the definition getting narrower (to the point where it's a
| joke that some people use 'AI' to mean whatever computers can't
| do _yet_). And you can have reasons to prefer your own very-
| narrow definition. But claiming that your own definition is the
| only valid one to the point that anyone using a wider
| definition (one that has a long etymological history, and which
| remains in widespread use) is "dumb" is... not how language
| works.
| influx wrote:
| It hasn't been AI the entire time. It's borderline fraud,
| tbh.
| konfusinomicon wrote:
| it's the marketing magic bullet. each person shot is entranced
| by its promises, and given unlimited ammo to spread its lies.
| few possess armor capable of stopping them
| yepthatsreality wrote:
| Co-pilot is just a lowest-common-denominator solution with
| flashy tabbing.
| axiosgunnar wrote:
| I hate to be the one that says this but I think it's true:
|
| "So you are an SWE and you take a break from work to go to
| Hackernews to complain that Github's Copilot, which is an AI-
| based solution meant to help SWEs, is utter shit and completely
| unusable.
|
| And then you go back to writing AI-based solutions for some other
| profession. Which is totally not shit or anything."
|
| Can anybody put this more elegantly?
| MajorBee wrote:
| A variation of the Gell-Mann Amnesia effect?
|
| "Briefly stated, the Gell-Mann Amnesia effect is as follows.
| You open the newspaper to an article on some subject you know
| well. In Murray's case, physics. In mine, show business. You
| read the article and see the journalist has absolutely no
| understanding of either the facts or the issues. Often, the
| article is so wrong it actually presents the story backward--
| reversing cause and effect. I call these the "wet streets cause
| rain" stories. Paper's full of them. In any case, you read with
| exasperation or amusement the multiple errors in a story, and
| then turn the page to national or international affairs, and
| read as if the rest of the newspaper was somehow more accurate
| about Palestine than the baloney you just read. You turn the
| page, and forget what you know."
|
| https://www.goodreads.com/quotes/65213-briefly-stated-the-ge...
| edgyquant wrote:
| I would do this with Reddit posts. I'd see the top comment
| under something I was familiar with and see it was full of
| holes or just incorrect but then I'd go to a post about
| something I didn't know all that well and take the top
| comment at face value.
| joe_the_user wrote:
| SWEs create AI based solutions to X 'cause people pay them.
| Entrepreneurs and investors are the one who actually think
| they're the answer to everything.
|
| Also, Copilot might (or might not) be useless or even interfere
| with real work. But it's probably low on the scale of awful
| things SWEs have helped create. The AI parole app is a thing
| that should haunt the nightmares of whoever created it, for
| example. Lots of AI apps may be useless but are probably
| also harmless, so building them might not be the worst thing.
| SamBam wrote:
| "'I never thought leopards would eat MY face,' sobs woman who
| voted for the Leopards Eating People's Faces Party."
| Hamuko wrote:
| > _And then you go back to writing AI-based solutions for some
| other profession._
|
| I don't know what you're talking about, I'm a webshit
| developer.
| [deleted]
| skinkestek wrote:
| You mean like the insanely annoying AIs that replaced Google
| search? The idiotic one that files Javascript books under "Law"
| in Amazon or the insulting one who runs Ad Sense and thinks my
| wife isn't good enough and I am stupid enough to leave her for
| some mail order bride?
| donkeybeer wrote:
| Javascript books under "Law" is hilarious
| drdaeman wrote:
| I'm in for JavaScript Penal Code. Make that unwarranted
| type-coercing operator use punishable by law.
| axiosgunnar wrote:
| Instead of prison you go to callback hell
| shadilay wrote:
| Maybe the Google AI is a polygamist and thinks you ought to
| have a 2nd wife?
| marcosdumay wrote:
| There are probably good ways to apply AI to software
| development (has anybody tried to build a linter already?). It
| is this product that is very bad.
|
| The same certainly applies to other tasks.
| thinkingemote wrote:
| The most common example of this would probably be complaining
| about advertising whilst working for a business that depends on
| advertising to survive.
|
| Ultimately it's a kind of Kafkaesque trap that modern living
| has us all in, to a greater or lesser extent.
| srcreigh wrote:
| That's a bit different. Advertising is like a race to the
| bottom, where everybody takes part in order to survive. You
| can take part while wishing it could somehow not be that way.
| Same with environmental issues.
|
| The GP comment by contrast is about hypocrisy. I personally
| found it funny that I didn't ever read about (or consider)
| copyright violations of deep learning until they tried to do
| it with code :-)
|
| Of course programmers would find the problem with AI as soon
| as it exploited _them_.
| [deleted]
| TeMPOraL wrote:
| Dunno. I go to HN because it's the one place where I can whine
| about AI being total bullshit, for the exact reasons as we're
| now complaining about wrt. Copilot.
| rpmisms wrote:
| I like tabnine. It's an autocomplete tool and doesn't pretend to
| be anything more.
| celeritascelery wrote:
| From the Copilot FAQ:
|
| > The technical preview includes filters to block offensive words
|
| And somehow their filters missed f*k? That doesn't give a lot of
| confidence in their ability to filter more nuanced text. Or maybe it
| only filters truly terrible offensive words like "master".
| spoonjim wrote:
| Blocks offensive words, but doesn't block carefully crafted
| malware.
| minimaxir wrote:
| In my testing of Copilot, the content filters only work on
| _input_ , not output.
|
| Attempting to generate text from code containing "genocide"
| just has Copilot refuse to run. But you can still coerce
| Copilot to return offensive output given certain innocuous
| prompts.
| aasasd wrote:
| Maybe Github just doesn't have many repos for controlling
| death factories and execution squads?
| Jackson__ wrote:
| Interesting how this continues to be an issue for GPT3 based
| projects.
|
| A similar thing is happening in AI Dungeon, where certain
| words and phrases are banned to the point of suspending a
| user's account if used a certain number of times, yet they
| will happily output them when it is generated by GPT3 itself,
| and then punish the user if they fail to remove the offending
| pieces of text before continuing.
| Closi wrote:
| Ahh, so it's the most pointless interpretation of the phrase
| "filters to block offensive words", where it is stopping the
| user from causing offense to the AI rather than the other way
| around.
| verdverm wrote:
| They probably don't want to repeat Microsoft's incident
| with Tay, though they seem to have created their own
| incident which dooms the product if it wasn't already
| derefr wrote:
| I believe the concept is to stop users from prompting the
| AI to generate offensive stuff specifically, and then
| publishing the so-generated stream of offensive stuff as
| negative PR for GitHub, in the same way the generated
| stream of offensive stuff coming from Microsoft's AI was a
| big PR disaster.
| bambax wrote:
| Maybe, but even if so, filtering the output would also
| prevent this.
| stingraycharles wrote:
| I suppose you're referring to the AI Twitter bot that
| initially was very lovely and that 4chan had turned into
| a nazi within a day. That was both very naive and
| hilarious.
|
| https://spectrum.ieee.org/tech-talk/artificial-
| intelligence/...
|
| The big difference in this case, however, is that this AI
| was constantly learning based on user input,
| which I do not think is the case for Copilot.
| raffraffraff wrote:
| Easily offended AI is exactly what the world needs
| GenerocUsername wrote:
| We have too many easily offended NPCs as it is.
| krick wrote:
| Lol, how does _that_ make any sense? I mean, all these word
| blacklists are always pretty stupid, but at least you can
| usually see the motivation behind them. But in this case I 'm
| not even sure what they tried to achieve, this is absolutely
| pointless.
| throwaway2037 wrote:
| Just to be clear for other readers: Are you being sarcastic
| about the last sentence that mentions the term 'master'? I hope
| not.
|
| As I understand, this movement (for lack of a better term!)
| started in the United States, which has a long and complicated
| history of slavery. In the last few years in my various jobs
| (all outside the United States), there has been a concerted
| effort to remove any instances of "master" and "slave" and
| replace them with terms like "primary" and "secondary".
|
| For co-workers not familiar with the history of slavery in the
| United States, there is always a pause, and then some confusion
| about the changes. After explaining the historical context, 99%
| of people reply: "Oh, I understand. Thank you to explain."
| andrewzah wrote:
| The word master has many usages. One specific context
| (master/slave) is inappropriate, but that doesn't mean every
| other context is unusable now.
|
| Github changing master->main was the epitome of virtue
| signaling. This literally does not affect black people at
| all, nor does it do -anything- to help with racial inequality
| in the US. It's actually quite patronizing and tone-deaf to
| think that instead of all the things -Microsoft- could be
| doing to help racial inequality, they're putting in as little
| effort as possible.
|
| Congrats on granting power over words to unreasonable people
| who ignore things like context in language and common sense.
| mdoms wrote:
| I don't work in USA and I don't intend to. Your history of
| slavery is none of my concern, especially when I'm just
| trying to do my work.
|
| The word 'master' is useful for me, and I don't believe for a
| nanosecond that anyone, American or not, is ACTUALLY offended
| by it. I believe that some people (mostly affluent white
| Americans) are searching for things that they think they
| SHOULD be offended by.
| slackfan wrote:
| And in my historical context, the power fists that your
| ideology used were used by a regime that murdered millions
| in the past 100 years.
| blindmute wrote:
| > After explaining the historical context, 99% of people
| reply: "Oh, I understand. Thank you to explain."
|
| A similar percentage then think to themselves, privately,
| "well that's pretty stupid."
| Isinlor wrote:
| Why is there no push-back against using the word "slave",
| which originates from the word "Slav" due to the enslavement
| of Slavic people?
|
| By analogy, you are basically using the word African to mean
| "a person in possession of someone else".
|
| https://www.etymonline.com/word/slave
|
| @edit The fact that people down vote this highlights that the
| whole issue is just virtue signaling.
| RicardoLuis0 wrote:
| while the word 'master' can indeed be used in the sense of
| "master and slave", its use in git is more akin to the use of
| 'master' in "master record", and doesn't refer to 'ownership'
| in any way
| [deleted]
| sseagull wrote:
| Everyone has a line of how much they are willing to change
| their language, though. There will always come a point where
| someone will think some change is "silly", even though the
| old term may have upset some people. And almost every term
| has some sort of baggage associated with it.
|
| There was a post going around somewhere of a college's
| earnest attempt to change some language (like avoiding "give
| it a shot" because of the association of "shot" with guns).
| Would renaming all the various things we call "triggers" be
| ok, so we don't upset victims of gun violence?
|
| So the master->main change was the line for some people, not
| others.
| andrewzah wrote:
| As a matter of principle I don't think we should be moving
| towards ignoring any and all contexts of words. Granting
| this power of word banning to random arbiters is quite
| crazy. In this case, master was moreso changed because it
| -could- be deemed offensive, not that it -actually- is
| offensive by itself. Not one person that I've spoken to
| about it has actually cared.
|
| Words having multiple usages is not really a novel concept.
| If we ban words based on them potentially being offensive,
| we'll end up with no words at all as people move onto using
| different words, and so forth.
|
| It is not silly to have pushback when someone wants to
| grant themselves power over language usage. Dropping usage
| of a word should have a strong, tenable argument and larger
| community support than 0.00000001% of people caring.
| LAC-Tech wrote:
| > Everyone has a line of how much they are willing to
| change their language, though.
|
| But that line is constantly moving though. People are
| forced to adapt, or they are ostracised socially and
| economically.
|
| If prestigious organisations, people and institutions
| decide "master/slave" is an immoral thing to say, I have no
| choice. Eventually I'll need to fall in line or my
| livelihood will be at risk.
| username90 wrote:
| > For co-workers not familiar with the history of slavery in
| the United States, there is always a pause, and then some
| confusion about the changes. After explaining the historical
| context, 99% of people reply: "Oh, I understand. Thank you to
| explain."
|
| Most people answer like this when they realize you are an
| unreasonable person who refuses to listen. Happens all the
| time, like "Oh, I understand (you are one of those). Thank
| you for explaining!", and remember that they need to stop
| using this word when working with you.
| bestcoder69 wrote:
| By rolling your eyes you accept my terms and conditions.
| LAC-Tech wrote:
| Imagine 'explaining' the historical context to someone
| from, say, Brazil.
| pydry wrote:
| Changing master to main was something Github did when they
| were taking heat for their contract with ICE. It was a nice
| bit of misdirection that cost them nothing, achieved nothing
| and garnered praise in some quarters.
|
| ICE, of course, runs an _actual_ concentration camp which has
| a slightly more troublesome history than the word master.
|
| Language policing is to racism what recycling is to global
| warming - an attempt to shift the focus away from elite
| responsibility for systemic issues to "personal
| responsibility" and forestall meaningful reform by placing
| emphasis on largely non-threatening symbolic gestures.
| samatman wrote:
| y'know it really seems like both purpose and outcome need
| to be closely examined here, if we're going to be
| emphasizing _actual_ next to concentration camps.
|
| what's the paradigm of a concentration camp? if we go
| straight for Auschwitz we'll get nowhere, how about the
| Boer concentration camps? Origin of the term after all.
|
| What was the purpose? To _concentrate_ the Boer population
| during a total war against them, so they couldn't supply
| and hide the belligerents.
|
| What was the outcome? Tens of thousands of preventable
| deaths, mostly from disease. Success in the war, from the
| British perspective.
|
| So, let me turn my spectacles to your example of, may I
| quote?
|
| > an _actual_ concentration camp
|
| Which appears to be a migrant detention center. To put it
| succinctly, migrants who enter the country without filling
| out paperwork, and get caught, end up in one of these
| places for months-to-years while USG figures out what to do
| with them.
|
| So a Boer concentration camp is filled by the British
| riding into a farmstead or town, kidnapping the women and
| children, and driving them out to a field and sticking them
| in a tent. A migrant detention center is filled with
| someone enters the United States without following the
| rules which govern that sort of behavior, and then, gets
| caught.
|
| Where is the war?
|
| Where is the excess death?
|
| Ah well. I'm out of time and patience to express my
| contempt for your abuse of language and disrespect for the
| real horrors which you cheapen with this kind of facile
| speech.
|
| Enjoy the 4th of July.
| bloomark wrote:
| Your vacuous argument about what is an _actual
| concentration camp_ is out of place. This wasn't a
| discussion about concentration camps, it was about
| github's attempted misdirection, and their facetious show
| of supporting inclusion, by eliminating the term
| "master".
|
| https://news.ycombinator.com/item?id=26487854
| pydry wrote:
| Is this an indirect way of saying that you support ICE?
|
| Coz if so I'd really rather hear it straight than
| indirectly via an attempt to police my language.
| SahAssar wrote:
| I get what you mean, but in a discussion about semantics it
| might be unhelpful to dilute the term "concentration camp",
| especially if prefixed with "actual" in italics. That is
| unless you actually mean that ICE camps serve the same
| purpose and are equivalent to nazi concentration camps.
| pydry wrote:
| The Nazis ran what would more accurately be termed
| extermination camps.
|
| Though what they did certainly bore a strong resemblance
| to the Boer War concentration camps/Manzanar, etc., whose
| purpose was to "concentrate" people into one place rather
| than industrially slaughter them.
| pmkiwi wrote:
| To be correct, both existed.
|
| A camp like Ravensbruck was a concentration camp (for
| women) while Auschwitz-Birkenau was both a concentration
| and extermination camp.
|
| https://upload.wikimedia.org/wikipedia/commons/b/be/WW2_H
| olo...
| SahAssar wrote:
| I don't know if I've ever heard anyone use the term
| "concentration camp" without qualifiers to refer to
| anything else than the nazi concentration camps (or
| something equivalent).
|
| Maybe it's just me, but I think it would have been more
| clear if you said internment camp if your intent was to
| refer to the broader context and not invoke a comparison
| to nazis.
| pydry wrote:
| Wikipedia redirects concentration camp to:
|
| https://en.wikipedia.org/wiki/Internment
|
| Where it also makes the point that the nazi camps were
| primarily extermination camps.
|
| Maybe take it up with them and get back to me if you feel
| truly passionate about this issue.
|
| >Maybe it's just me, but I think it would have been more
| clear
|
| Gosh, it's awfully ironic that this sentence would appear
| in a thread about how language policing is used as a
| distraction from _important_ issues.
|
| Is it more important to you how people _use_ the term
| concentration camp or the fact that ICE lock up children
| in internment/concentration/[insert favorite word here]
| camps?
| SahAssar wrote:
| > So, is it more important to you how people use the term
| concentration camp or the fact that ICE lock up children
| in internment/concentration/[insert favorite word here]
| camps?
|
| Well, that escalated quickly.
|
| I don't think I ever said anything for or against what
| ICE is doing, in fact I tried not to because the only
| thing I wanted to say was that when using the words
| "literally concentration camps" people might read that as
| "camps designed to kill people" since that is the way
| I've been taught it (in history classes) and heard it (in
| general use).
|
| I don't even live in the US so I have no say in this in a
| democratic sense. If I did I'd be against the way
| migrants are treated and want more humane treatment, but
| I don't think that should be relevant to what I said.
| pydry wrote:
| Your primary worry was that somebody _might_ read that
| sentence and believe that the US is gassing immigrants?
|
| Seems unlikely.
| SahAssar wrote:
| You seem to think I have some political motive, I don't.
| I just saw a comment that from my perspective and
| historical education seemed to equate two things that I
| regard as different and said that it might be helpful to
| not conflate those. It seems like you did not intend to
| conflate them and it is a difference in what you and I
| read into the term "actual concentration camp".
|
| From my perspective this conversation is as if someone
| said "working for XCompany is actual slavery" and I said
| "Perhaps don't use 'actual slavery' as a term for
| something that isn't that?"
| junon wrote:
| Historians themselves call what ICE is doing a
| concentration camp. So your experience is very much
| localized.
| hdhjebebeb wrote:
| It seems like a distinction without a difference; this
| article, for example, uses them interchangeably:
| https://www.commondreams.org/views/2019/06/21/brief-
| history-...
| dragonwriter wrote:
| Nazi "concentration camps" were not actual concentration
| camps (a thing which long predates the Nazi camps), they
| were extermination camps for which "concentration camp"
| was a minimizing euphemism.
|
| US WWII "internment" and "relocation" centers were actual
| concentration camps ("relocation center" was itself a
| euphemism, but "internment" referred to a formal legal
| distinction impacting treaty obligations.)
| SahAssar wrote:
| Sure, but I don't know if I've ever heard anyone use the
| term "concentration camp" without qualifiers to refer to
| anything else than the nazi concentration camps (or
| something equivalent).
|
| If someone says that something is "_literally_ a
| concentration camp" I think that most people will think
| of ovens and genocide.
|
| Perhaps it's a regional thing, but that is how I
| interpreted it.
| [deleted]
| sombremesa wrote:
| It's not so much a regional as a political thing. Want it
| to sound worse? Use concentration camp. Want it to sound
| better? Use internment camp (or in some cases, re-
| education facility).
| michael1999 wrote:
| Or "Reserve".
| dragonwriter wrote:
| Relevant to that, the US WWII internment camps were...
| placed on land taken from reservations (with compensation
| of disputed adequacy for its use).
| [deleted]
| bambax wrote:
| Why is this downvoted... It's simply the truth.
| kanzenryu2 wrote:
| There were only a handful of mass extermination camps.
| There were tens of thousands of concentration camps.
| https://encyclopedia.ushmm.org/content/en/article/nazi-
| camps....
| okamiueru wrote:
| Pretty sure they were being sarcastic. I also don't find your
| arguments persuasive in the slightest, and I find myself
| being skeptical of these recent moral outcries. I'm skeptical
| of its sincerity, and I don't buy it. "Master" has an
| etymological background far more diverse than the dichotomy to
| "slave". I can wholeheartedly say that I've not once thought
| to make that association. It's been a title for centuries.
| Master blacksmith, etc. (See
| https://en.wikipedia.org/wiki/Master for a list)
|
| Another example of what seems like a fake moral outcry is
| "blackface". And, I mean what it is being referred to now,
| and not the actual meaning. The racist ridicule by
| stereotyping ethnicity. That was "Blackface". Yet, for some
| reason, context doesn't matter anymore, and we end up
| removing episodes of Community because someone painted their
| face in a cosplay of a dark elf, precisely as commentary on
| this.
|
| There is significant systemic racism in the US that affects
| almost everything. In order to deal with it, the very first
| step is being able to properly identify racism. Context
| matters. Renaming "Master" branches is not
| progress. Ostracising a kid for dressing up as Michael
| Jackson isn't it.
|
| Whenever I see outrage over such things I cynically think
| that the person is probably white, and probably doing it for
| attention. One thing is for sure, it only serves to detract
| from the real issues.
| rorykoehler wrote:
| Check out the recent Marc Rebillet stream with Flying Lotus
| and Reggie Watts. They absolutely destroy the bs around the
| use of the word master. I think both FL and RW will be
| quite representative of how African Americans (and the rest
| of the world) feel about this.
| okamiueru wrote:
| Do you have a timestamp? As enjoyable as it is to
| listen to each of them, the stream was mostly music and
| almost two hours long.
| rorykoehler wrote:
| The next couple of minutes from here
| https://youtu.be/0J8G9qNT7gQ?t=3984
| greyfox wrote:
| Very interesting that this was posted, as I literally JUST
| watched an even MORE interesting YouTube upload about this very
| bit of code last weekend.
|
| Here's the very fun video if anyone wants to take a look:
|
| https://www.youtube.com/watch?v=p8u_k2LIZyo
| cblconfederate wrote:
| Clearly, swearing is the only right way to write that function
| stefan_ wrote:
| Even includes the commented out code. Clearly Copilot has gained
| a deep understanding of code and is not simply the slowest way to
| make a terrible, opaque search engine ever!
| mrfredward wrote:
| From the tweet it looks like an awesome search feature. Just
| type what you wanted to search for right inline and then it can
| drop the result in without you ever changing a window or moving
| a hand to the mouse.
|
| Problem is you don't know whose code you're stealing, which
| leads to all sorts of legal, security, and correctness issues.
| aj3 wrote:
| Does GitHub Copilot write perfect code?
|
| No. GitHub Copilot tries to understand your intent and to
| generate the best code it can, but the code it suggests may not
| always work, or even make sense. While we are working hard to
| make GitHub Copilot better, code suggested by GitHub Copilot
| should be carefully tested, reviewed, and vetted, like any
| other code. As the developer, you are always in charge.
|
| https://copilot.github.com/
|
| EDIT: the text above is a direct quote from the Copilot website
| danparsonson wrote:
| > ...may not always work, or even make sense...
|
| Naively, as someone who just heard of this - that sounds
| worse than useless. If you can't trust its output and have to
| verify every line it produces _and_ that the combination of
| those lines does what you wanted, surely it's quicker just
| to write the code yourself?
| aj3 wrote:
| Then write the code yourself. It's not like you're forced
| to use this demo.
| danparsonson wrote:
| Well, you're right. I was somehow expecting there might
| be a silver lining I'd missed but perhaps not.
| cjaybo wrote:
| Not exactly a confidence-inspiring reply from someone who
| just identified themselves as representing the project
| here!
| aj3 wrote:
| I don't work for Github (nor MS) and do not represent
| Copilot.
| vultour wrote:
| Just today I needed to quickly load a file into a string in
| golang. I haven't done that in a while, so I had to go look
| up what package and function to use for that. I'd love a
| tool that would immediately suggest a line saying
| `ioutil.ReadFile()` after defining the function. I would
| never accept a full-function suggestion from Copilot,
| similarly to how I never copy and paste code verbatim from
| StackOverflow. Using it as hints for what you might want to
| use next seems like a nice productivity boost.
| edgyquant wrote:
| It's quite literally stealing code from repos under a GPL
| license and suggesting it to people regardless of the
| license (if any) they're using. I do not see how this is legal.
| aj3 wrote:
| I disagree with this attitude. Many demos such as this one
| with Quake code are intentionally looking for (funny)
| outliers by bending the rules. But this is not how anyone
| would use the system in a real scenario (no one should
| select license by typing "// Copyright\t" and selecting
| whatever gets auto-completed), so it doesn't really
| demonstrate any new limits besides what you could
| reasonably expect anyway (and what's mentioned on the
| Copilot's landing page).
|
| Basically, in order to fall victim for this "code theft"
| (or any other "footguns" from Twitter threads) you'd need
| to be actively working against all the best practices and
| common sense. If you actually use it as a productivity tool
| (the way it is marketed) you'll remain in full control of
| your code.
| comodore_ wrote:
| Funny, the YouTube algo blessed me with an in-depth video
| (~1 year old) about this Quake function yesterday.
| gumby wrote:
| stack overflow at its automated finest.
|
| Or should we call it the Tesla of software?
| FlyingSnake wrote:
| The rate at which these bots implode implies something about
| the whole AI/ML zeitgeist.
| meling wrote:
| What I would love even more than Copilot helping me write code
| is a copilot that writes tests for the code I write.
| danuker wrote:
| They could train on solely MIT-licensed code, and dump ALL the
| copyright notices of code used for training into a file. Problem
| solved.
| Uehreka wrote:
| Plenty of people probably copy-paste GPL code with the comments
| and stick MIT on it. This kind of thing violates the GPL, but
| I'm pretty sure (IANAL) that such code is "fruit of the
| poisonous tree", and if you then copy it, you too can be held
| responsible. Sure, you might not get caught, but it's a rough
| situation if you do.
| rebolek wrote:
| Have you read MIT license? It explicitly says: The above
| copyright notice and this permission notice shall be included
| in all copies or substantial portions of the Software.
| dgellow wrote:
| Another fascinating one: an "About me" page generated by Copilot
| links to a real person's GitHub and Twitter accounts!
|
| https://twitter.com/kylpeacock/status/1410749018183933952
| bencollier49 wrote:
| That's bonkers. And the beauty of it is that now someone could
| realistically do a GDPR Erasure request on the Neural Net. I do
| hope that they're able to reverse data out.
| qayxc wrote:
| Since the information is encoded in model weights, I doubt
| that erasure is even possible. Only post-retrieval filtering
| would be an option.
|
| It only goes to show that opaque black-box models have
| no place in the industry. The networks leak information left
| and right, because it's way too easy to just crawl the web
| and throw terabytes of unfiltered data at the training
| process.
| ohazi wrote:
| I think the fact that there's no way to delete the data in
| question without throwing away the entire model is a
| feature...
|
| The strategic goal of a GDPR erasure request would be to
| force GitHub to nuke this thing from orbit.
| bencollier49 wrote:
| > Only post-retrieval filtering would be an option.
|
| And illegal, if the original information remains.
|
| I assume that there must be a process for altering the
| training data set and rerunning the entire thing.
| gmueckl wrote:
| The problem is that the information is in an opaque
| encoding that nobody can reverse engineer today. So it's
| impossible to prove that a certain subset of data has
| been removed from the model.
|
| Say, you have a model that repeats certain PII when
| prompted in a way that I figure out. I show you the
| prompt, you retrain the model to give a different, non-
| offensive answer. But now I go and alter the prompt and
| the same PII reappears. What now?
| computerex wrote:
| Yes, but the compute costs required for training are
| probably in the range of hundreds of thousands to millions
| of USD, not to mention potentially months of training time.
___________________________________________________________________
(page generated 2021-07-02 23:00 UTC)