[HN Gopher] Open Source Is Throwing AI Policymakers for a Loop
       ___________________________________________________________________
        
       Open Source Is Throwing AI Policymakers for a Loop
        
       Author : furcyd
       Score  : 63 points
       Date   : 2021-08-25 14:36 UTC (3 days ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | sandworm101 wrote:
       | I have a guy working for me who strongly believes that open
       | source is less secure because "hackers" have access to the
       | source code for finding weaknesses. He is wrong, but that
       | attitude is not rare in some policy circles. At first glance,
       | someone with little understanding of technology (politicians,
       | say) can easily be convinced of this idea.
        
       | qchris wrote:
       | Shameless plug: I wrote a pair of articles for The Gradient
       | that explore much of this area as well, called "Machine
       | Learning, Ethics, and Open Source Licensing"[1][2]. I'd like to
       | think those have a bit more depth and nuance.
       | 
       | Long story short, while I don't personally find this to be a
       | particularly well-written article, the author's not entirely
       | wrong. Open source machine learning software _and_ models, in
       | particular, are presenting a number of new challenges in a
       | variety of areas, including regulation, that many of our existing
       | systems are poorly equipped to grapple with.
       | 
       | [1] https://thegradient.pub/machine-learning-ethics-and-open-
       | sou...
       | 
       | [2] https://thegradient.pub/machine-learning-ethics-and-open-
       | sou...
        
       | darepublic wrote:
       | Yeah, I don't want any black-box model making decisions that
       | determine people's fate, whatever its biases may be.
        
       | ScoobleDoodle wrote:
       | > If an open-source algorithm is flawed, it is harder to undo the
       | damage than if the software came from one proprietary--and
       | accountable--company.
       | 
       | Both the companies implementing the system and those using it
       | need to be responsible and accountable, whether or not the
       | software is open source. In the article's example of AI being
       | used to prescribe sentencing, both the judicial district using
       | the AI and the vendor selling, implementing, and operating the
       | system need to be responsible and accountable.
       | 
       | The software being open source at least gives external eyes a
       | chance to evaluate a portion of the system (given that the
       | data set might be private). With proprietary software we have
       | to rely on the honesty of companies, voluntary transparency,
       | or any whistleblowers who may come forward. In the criminal
       | sentencing example, the judicial district might be incompetent
       | or biased, and the vendor could be incompetent or greedy and
       | fail to communicate system biases, since doing so might hurt
       | their brand, contract, or growth.
       | 
       | It seems very clear to me that openness and transparency are
       | key to combating bias or manipulation, and open source forces
       | at least a minimum of openness into this.
        
       | hectormalot wrote:
       | Terrible piece. It makes me wonder if this is a 'submarine' from
       | a proprietary AI provider that wants to push the narrative that
       | open source AI tools need stronger regulation vs proprietary AI
       | tools.
       | 
       | "The scary part is how easy open source is to use", and "a
       | mistake in open source is much more difficult to correct then an
       | algorithm from a - accountable - company." Shows how little the
       | author understands of this field. It won't be tensorflow that's
       | biased against minorities in resume selection, more likely the
       | data was biased.
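       | 
       | As a rough sketch of that point (synthetic data and
       | hypothetical feature names, not from the article), the same
       | library reproduces whatever the training labels encode; the
       | skew below comes entirely from the data, not the code:
       | 
       |     import numpy as np
       |     from sklearn.linear_model import LogisticRegression
       | 
       |     rng = np.random.default_rng(0)
       |     n = 10_000
       |     group = rng.integers(0, 2, n)  # stand-in protected attribute
       |     skill = rng.normal(size=n)     # the feature that should matter
       |     noise = rng.normal(0, 0.5, n)
       |     # Historical labels favour group 0 regardless of skill:
       |     hired = skill + 1.5 * (group == 0) + noise > 1.0
       | 
       |     X = np.column_stack([skill, group])
       |     model = LogisticRegression().fit(X, hired)
       |     for g in (0, 1):
       |         xs = np.column_stack([np.zeros(50), np.full(50, g)])
       |         # predicted hire probability at identical (average) skill
       |         print(g, model.predict_proba(xs)[:, 1].mean())
       | 
       | Swap in any other framework and you get the same disparity,
       | because it lives in the labels, not the library.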
        
         | phkahler wrote:
         | Yeah, they conflate the software part of AI with the trained
         | systems. They specifically mention introducing bias into open
         | source code.
        
         | jhgb wrote:
         | > "The scary part is how easy open source is to use"
         | 
         | That sounds like a great advertisement for FLOSS, really.
        
         | garrinm wrote:
         | In fact I see the opposite situation: a closed-source
         | algorithm will be protected as a trade secret, so
         | identifying a bias might be harder, and correcting it will
         | still require them to ship a new version.
         | 
         | Meanwhile, in open source there could be many users raising
         | issues if a bias is suspected, and many people working to
         | fix it regardless of other priorities. And then an update
         | will be pushed.
        
           | AlanYx wrote:
           | Under the proposed EU AI regulation, there's actually a
           | requirement for source code to be provided to the regulators
           | to review. Presumably it would be subject to an NDA to
           | address trade secrecy concerns.
        
             | mattlondon wrote:
             | It is the dataset used for training that is often at fault,
             | not the source code of the model.
             | 
             | I don't know if the EU regulations require access to the
             | data too, but that could be problematic in a practical
             | sense (e.g., do they deliver terabytes of raw data to
             | the regulators?), and there may be privacy issues with
             | companies sharing their data with regulators.
        
               | datastoat wrote:
               | > It is the dataset used for training that is often at
               | fault, not the source code of the model
               | 
               | This remark is absolutely spot on. This is why I don't
               | like the language around 'algorithmic bias' and
               | 'algorithmic accountability' -- it puts too much
               | attention on The Algorithm, which lets big tech companies
               | deflect us from scrutiny of their training data.
               | 
               | Your comment about the conflict with privacy is also spot
               | on. Here's a paper (shameless plug!) about it: _Show Us
               | the Data: Privacy, Explainability, and Why the Law
               | Can't Have Both_ [1].
               | 
               | What's interesting about the GDPR is that it's not purely
               | about what regulators do: it's legislation that grants
               | rights to people, rights that can be litigated in court.
               | If I as a GDPR data subject demand an explanation of an
               | ML decision about me, as I'm entitled to do, my lawyer
               | could argue "the source code isn't an adequate
               | explanation, we want to see the training data too". Then
               | it'd be up to the court to make a ruling about my
               | lawyer's request, not up to the regulator. The paper [1],
               | co-authored by a data scientist (myself) and a lawyer,
               | thinks through the privacy / explainability conflict as
               | it might play out _in court_.
               | 
               | [1] https://www.gwlr.org/show-us-the-data/
        
         | throwjd764glug wrote:
         | Yes, proprietary systems are _so_ accountable. It only took
         | "the prosecution and conviction of hundreds of [innocent] sub-
         | postmasters for alleged theft, false accounting and/or fraud,
         | resulting in imprisonment, loss of reputation and livelihood,
         | bankruptcy, divorce, and even suicide amongst those involved"
         | before the UK Post Office's proprietary Horizon system was
         | found to be the cause.
         | 
         | So far no-one responsible for the computer system or the
         | prosecution has faced any repercussions.
         | 
         | https://en.wikipedia.org/wiki/British_Post_Office_scandal
        
         | AlanYx wrote:
         | The critical element that's missing from the article--but which
         | likely motivated the article--is that the EU's new AI
         | regulation proposal _mandates compliance all along the supply
         | chain_ (for regulated applications of AI).
         | 
         | In practice, this will likely mean that using open source for
         | these AI applications will become a lot more difficult in a
         | legal sense, perhaps impossible, because there's no single
         | entity that can speak to compliance of open source components
         | (vetting each commit, etc.). Perhaps some orgs will emerge who
         | are willing to monitor and oversee open source software and
         | certify that it meets the requirements of the legislation, but
         | it's unclear how that would work.
         | 
         | I'm not endorsing this model, just explaining the concern. I
         | don't think the tech community really fully appreciates how
         | misguided a lot of the current approaches to AI regulation
         | among policymakers are right now. (Not just this, but
         | initiatives like the current CAHAI work towards a binding
         | international law instrument on AI.)
         | 
         | And yes, like the GDPR, the proposed EU AI regulation has an
         | extraterritorial effect. Under the proposed AI regulation, if
         | you're selling into the EU or to any EU customers but don't
         | have an establishment in the EU, you have to actually appoint a
         | representative to comply with these requirements.
        
           | adolph wrote:
           | Securing supply chain is an important effort even outside of
           | regulation. Here is a podcast about some recent open source
           | efforts:
           | 
           | Kubernetes Podcast 155: Software Supply Chain Security, with
           | Priya Wadhwa
           | 
           | https://kubernetespodcast.com/episode/155-software-supply-
           | ch...
        
           | riedel wrote:
           | The AI Act is really a strange effort to regulate
           | technology horizontally without even being able to define
           | its scope. Many who have read it believe it ends up
           | regulating all computer logic in critical infrastructure
           | (while making this definition wide as well). I have tried
           | to comment on this through various channels (local
           | government/associations), because it seems a really
           | strange "horizontal" regulation that could have huge side
           | effects. I hope more people will point out the problems of
           | the current proposal. I am really a proponent of both the
           | GDPR and AI ethics for the most part, but this one is
           | really going too far.
        
             | xg15 wrote:
             | Well, for me it's enough if it starts to regulate
             | whether I'm approved for credit or not, or whether I
             | should be considered a suspect in a crime.
             | 
             | I'm honestly worried that the greatest innovation in
             | computing of the last decade seems to be inventing programs
             | where literally no one understands how they work.
        
             | hectormalot wrote:
             | Current drafts have a much too wide definition of AI
             | (basically any predictive function) but a fairly narrow
             | scope of what high-risk applications are (listed in an
             | annex). For those applications in scope, the bar seems
             | extraordinarily high, e.g. that data sets should be
             | "free of errors".
             | 
             | I think things will improve before the final act. I have
             | contributed to some of the commentary towards the regulator
             | on this, and a lot of the flaws mentioned are now well
             | recognized.
             | 
             | Disclaimer: opinions above are my own.
        
       | TehCorwiz wrote:
       | "Open source math is throwing financial policymakers for a loop"
        
       | webmaven wrote:
       | All the comments saying "this misses the point, it isn't the
       | software, it's the data" are missing the point themselves:
       | trained models are being released and informally used as software
       | components.
       | 
       | The consequence of datasets being used for training (and
       | trained models being incorporated into software), typically
       | via a line of curl or wget in some setup code, is that the
       | dependency is left essentially undeclared.
       | 
       | Yes, any biases in those models are more likely to be uncovered
       | because of their public availability, and yes, the biases
       | uncovered will probably be due to biases in the data, but right
       | now we don't really have (or aren't using) processes, tooling, or
       | infrastructure to do the necessary versioning and tracing of data
       | or model dependencies.
       | 
       | In fact, when some researcher discovers a bias in a model and
       | publishes it, most often no new version of the model or the data
       | it is based on is released.
       | 
       | If by chance the biases _are_ addressed and new versions of a
       | particular model and dataset _are_ released, all the other models
       | based on the same data are left unaffected. And software that
       | incorporates the biased models is unaffected by their re-release
       | as well. Nothing will ever trigger those models to be marked as
       | biased, retrained, and re-released, and typically nothing marks
       | the dependent software as vulnerable to the bias either.
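       | 
       | As a sketch of a minimal mitigation (the URL and hash below
       | are hypothetical, not an existing artifact), fetching a model
       | pinned by checksum in setup code at least turns the anonymous
       | wget into a declared, versioned dependency:
       | 
       |     import hashlib
       |     import urllib.request
       | 
       |     MODEL_URL = "https://example.org/models/toxicity-v3.bin"
       |     MODEL_SHA256 = "..."  # hypothetical digest, recorded at release
       | 
       |     def fetch_model(path="model.bin"):
       |         urllib.request.urlretrieve(MODEL_URL, path)
       |         with open(path, "rb") as f:
       |             digest = hashlib.sha256(f.read()).hexdigest()
       |         if digest != MODEL_SHA256:
       |             raise RuntimeError("checksum mismatch: " + digest)
       |         return path
       | 
       | Bumping the recorded hash when a debiased model is re-released
       | is then a visible change that dependent software can be
       | audited for, rather than a silent swap behind the same URL.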
        
       | imgabe wrote:
       | Let's see, people who bothered to learn how to program might not
       | understand the problems with the software libraries they use.
       | 
       | Ok, maybe, but now we're supposed to believe that people who know
       | absolutely nothing about how computers work are in a better
       | position to understand their limitations and regulate them?
        
       | toploader wrote:
       | It always strikes me that the arguments for oversight on AI are
       | because people's conceptions of how-things-oughtta-be are
       | different than actual patterns in the wild.
       | 
       | There should be a name for that tension.
        
         | HPsquared wrote:
         | Cognitive dissonance?
        
         | pengstrom wrote:
         | The human condition?
        
       | hau wrote:
       | My summary of claims: Opensource AI is too dangerous and can't be
       | left as it is. It's scary that everyone can just create AI and we
       | should make laws regarding this glaring issue. Proprietary AI is
       | more accountable, transparent and easier to fix than opensource
       | AI. Opensource AI is more bias-prone by its nature.
       | 
       | No arguments presented. It's just that. Everyone can change
       | opensource AI and add biases. Unlike companies, who are
       | accountable and will easily fix it if something goes wrong. Tech
       | is too powerful for simpletons to wield, we should restrict it.
        
         | ReactiveJelly wrote:
         | > Everyone can change opensource AI and add biases.
         | 
         | I remember seeing this misconception about Linux probably 15
         | years ago.
         | 
         | "How can you trust it? Anyone can change it!"
         | 
         | It's not like Wikipedia, dad. There are people deciding what
         | ends up in the soup.
         | 
         | Wikipedia isn't even like Wikipedia - They enforce IP blocks,
         | most of the pages that are vulnerable to vandalism are
         | protected, and there are people watching pages they care about
         | so they'll be notified if they change.
        
         | akersten wrote:
         | Shit like this boils my blood. Policymakers should focus on
         | writing laws that actually help people instead of muddling
         | around with technology they don't understand. If anyone doesn't
         | understand just how ridiculous a proposal to regulate "AI" is,
         | pull back the veil a tiny bit on that summary to see what the
         | author is really saying:
         | 
         | > Opensource math is too dangerous and can't be left as it is.
         | It's scary that everyone can just create math and we should
         | make laws regarding this glaring issue. Proprietary math is
         | more accountable, transparent and easier to fix than opensource
         | math. Opensource math is more bias-prone by its nature.
         | 
         | Plainly nonsense and I hope this worm doesn't get anywhere near
         | a lawmaker's ear.
        
         | xg15 wrote:
         | Ok, then please explain to me who exactly will vet open-source
         | AI projects.
        
         | harles wrote:
         | I wonder how much of this has been pushed along by incumbents
         | in the space - they nearly always benefit from regulation as
         | smaller companies suffer the most.
        
       | sva_ wrote:
       | > "One of the scary parts of open-source AI is how intensely easy
       | it is to use," he says. "The barrier is so low... that almost
       | anyone who has a programming background can figure out how to do
       | it, even if they don't understand, really, what they're doing."
       | 
       | I don't see what's so scary about that, to be honest. Some random
       | person who is messing with ML and doesn't know what they're doing
       | probably isn't in a position to use this in a way that would be
       | harmful to people.
       | 
       | The example of discrimination given in the article would only
       | really apply to corporations and governments, which would
       | likely use proprietary datasets (and perhaps implementations).
       | So I don't understand why this piece takes aim at open source.
       | Seems almost contradictory.
        
         | jhgb wrote:
         | > even if they don't understand, really, what they're doing
         | 
         | ...which includes full-time ML people. So I'm not sure there's
         | a lot of difference.
        
       | civilized wrote:
       | Agree with the criticism of this piece so far, but what really
       | makes this piece laughable is the fact that _the vast majority of
       | proprietary AI is just open source AI that's been hidden behind
       | an organization's firewall._ So, this piece is arguing that the
       | problem with open source AI is that people can see it. When it's
       | hidden behind a firewall, it will magically become more
       | transparent and accountable.
       | 
       | Makes perfect sense, right?
       | 
       | And while we're at it, the piece stinks of casual misinformation
       | right from the first paragraph. AI isn't "curing disease";
       | that's just nonsense hype. Why would I listen to the author if
       | he can't get through his short introductory paragraph without
       | spouting nonsense?
       | 
       | Par for the course for IEEE, which has always come off to me as a
       | very self-aggrandizing, bullshitty organization.
        
       | mark_l_watson wrote:
       | I have to criticize this article for conflating the underlying
       | software for building machine learning models with the models
       | themselves. Open source contributors to PyTorch, TensorFlow,
       | etc. are unlikely to affect bias in data (and thus in trained
       | models). Can anyone here think of reasons why this is not so?
        
       | b0tzzzzzzman wrote:
       | Can we please make laws that stop people from calling
       | everything AI! We do not have AI. We have processes for
       | looking at data which scale very well; they are not
       | self-aware, and they are not making connections without being
       | told to do so.
        
       | jgalt212 wrote:
       | This is only a problem if you're Eric Holder and prefer to indict
       | corporations rather than the employees or officers of such
       | corporations.
       | 
       | https://www.goodreads.com/book/show/34397551-the-chickenshit...
        
       | simonw wrote:
       | This piece was baffling. It seems to be conflating the issue of
       | AI bias with the concept of open source software development.
       | 
       | "If an open-source algorithm is flawed, it is harder to undo the
       | damage than if the software came from one proprietary--and
       | accountable--company."
        
       | thomashop wrote:
       | What a strange article. There is almost no substance, and it
       | doesn't deliver any convincing argument for why policymakers
       | should focus on open source.
       | 
       | Intuitively I would say policy should focus more on
       | corporations, as open source is publicly auditable.
        
         | patmorgan23 wrote:
         | It's FUD, probably trying to influence policymakers and the
         | unknowledgeable public into supporting policies that favor
         | large established firms.
        
       | twelvechairs wrote:
       | The question not answered here is "why should policy care if AI
       | is open source or not?"
        
         | Ftuuky wrote:
         | Because they want to control it. Can't let the peasants obtain
         | capabilities once reserved only to the military and big
         | governmental agencies.
        
           | revolvingocelot wrote:
           | Don't forget the inherent value of being able to point to a
           | black box and say "that's why" when questioned about some
           | questionable process. The ability for anyone to vet the
           | source code is a weakness. It exposes attack surface for the
           | forces of Responsibility trying to peer into the black box.
           | 
           | "Computer says no" is an incredible force multiplier for
           | Authority, and anything that threatens the truthiness of the
           | black box's output will be mobilized against.
           | 
           | Unaccountable security-through-obscurity is actually a bonus
           | for Authority, too. Look at the whole Signal-Cellebrite
           | [0] thing: Moxie found all these vulnerabilities, some
           | going back years. What are the odds that an avatar of
           | Authority('s technically-minded underling) found one or
           | more of these Cellebrite exploits and used them as tools
           | against enemies, or to exonerate friends? "Make sure you
           | scan _this_ cellphone _first_." Would any of that be
           | possible if Cellebrite were legally mandated to be built
           | from remote, publicly-viewable source every time they
           | wanted to use it? Or, if you want to be less
           | conspiratorial, how many vulnerabilities would Moxie have
           | been able to find if the source had already been public
           | (and subject to pull requests) for years?
           | 
           | [0] https://signal.org/blog/cellebrite-vulnerabilities/
        
         | wilde wrote:
         | Right? This is good! It means they're focusing on outcomes
         | rather than prescribing implementations.
         | 
         | The article just linked to the actual text of the bill but this
         | summary seemed reasonable.
         | 
         | https://www.lawfareblog.com/artificial-intelligence-act-what...
        
         | Proven wrote:
         | Because there's no one to fine and coerce, bot.
        
         | keskival wrote:
         | Open source software can be inspected and verified against
         | whatever policy criteria; closed source software cannot.
        
           | mistrial9 wrote:
           | The data sets are the difference. Can they be "inspected"?
           | Can the models?
        
       ___________________________________________________________________
       (page generated 2021-08-28 23:02 UTC)