[HN Gopher] Open Source Is Throwing AI Policymakers for a Loop
___________________________________________________________________
Open Source Is Throwing AI Policymakers for a Loop
Author : furcyd
Score : 63 points
Date : 2021-08-25 14:36 UTC (3 days ago)
(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)
| sandworm101 wrote:
| I have a guy working for me who strongly believes that open
| source is less secure because "hackers" have access to source
| code for finding weaknesses. He is wrong, but that attitude is
| not rare in some policy circles. At first glance someone with
| little understanding of technology (politicians) can be easily
| convinced of this idea.
| qchris wrote:
| Shameless plug: I wrote a pair of articles for The Gradient that
| explore much of this area as well, called "Machine Learning,
| Ethics, and Open Source Licensing"[1][2]. I'd like to think those
| have a bit more depth and nuance.
|
| Long story short, while I don't personally find this to be a
| particularly well-written article, the author's not entirely
| wrong. Open source machine learning software _and_ models, in
| particular, are presenting a number of new challenges in a
| variety of areas, including regulation, that many of our existing
| systems are poorly equipped to grapple with.
|
| [1] https://thegradient.pub/machine-learning-ethics-and-open-
| sou...
|
| [2] https://thegradient.pub/machine-learning-ethics-and-open-
| sou...
| darepublic wrote:
| Yeah, I don't want any blackbox model making decisions that
| determine people's fate, whatever its biases may be.
| ScoobleDoodle wrote:
| > If an open-source algorithm is flawed, it is harder to undo the
| damage than if the software came from one proprietary--and
| accountable--company.
|
| Both the companies implementing the system and those using it
| need to be responsible and accountable, whether or not the
| software is open source. In their example of AI being used to
| prescribe sentencing, both the judicial district using the AI and
| the vendor selling, implementing, and operating the system need
| to be responsible and accountable.
|
| The software being open source at least gives a chance for
| external eyes to evaluate a portion of the system (given that the
| data set might be private). With proprietary software we have to
| rely on the honesty of companies, voluntary transparency, or any
| whistleblowers who may come forward. In the criminal sentencing
| example, the judicial district might be incompetent or biased,
| and the vendor could be incompetent or greedy and fail to
| communicate system biases since doing so may hurt their brand,
| contract, or growth.
|
| It seems very clear to me that openness and transparency are key
| to combating bias or manipulation. And open source forces a
| minimum of openness into the process.
| hectormalot wrote:
| Terrible piece. It makes me wonder if this is a 'submarine' from
| a proprietary AI provider that wants to push the narrative that
| open source AI tools need stronger regulation vs proprietary AI
| tools.
|
| "The scary part is how easy open source is to use", and "a
| mistake in open source is much more difficult to correct then an
| algorithm from a - accountable - company." Shows how little the
| author understands of this field. It won't be tensorflow that's
| biased against minorities in resume selection, more likely the
| data was biased.
| phkahler wrote:
| Yeah, they conflate the software part of AI with the trained
| systems. They specifically mention introducing bias into open
| source code.
| jhgb wrote:
| > "The scary part is how easy open source is to use"
|
| That sounds like a great advertisement for FLOSS, really.
| garrinm wrote:
| In fact I see the opposite situation: a closed source algorithm
| will be protected as a trade secret. So identifying a bias
| might be harder. And correcting it anyway will still require
| them to ship a new version.
|
| Meanwhile in open source there could be many users raising
| issues if a bias is suspected. Many people working to fix it
| regardless of other priorities. And then an update will be
| pushed.
| AlanYx wrote:
| Under the proposed EU AI regulation, there's actually a
| requirement for source code to be provided to the regulators
| to review. Presumably it would be subject to an NDA to
| address trade secrecy concerns.
| mattlondon wrote:
| It is the dataset used for training that is often at fault,
| not the source code of the model.
|
| I don't know if the EU regulations require access to the data
| too, but that could be problematic in a practical sense (e.g. do
| they deliver terabytes of raw data to the regulators?), and there
| may be privacy issues with companies sharing their data with
| regulators.
| datastoat wrote:
| > It is the dataset used for training that is often at
| fault, not the source code of the model
|
| This remark is absolutely spot on. This is why I don't
| like the language around 'algorithmic bias' and
| 'algorithmic accountability' -- it puts too much
| attention on The Algorithm, which lets big tech companies
| deflect us from scrutiny of their training data.
|
| Your comment about the conflict with privacy is also spot
| on. Here's a paper (shameless plug!) about it: _Show Us
| the Data: Privacy, Explainability, and Why the Law Can't
| Have Both_ [1].
|
| What's interesting about the GDPR is that it's not purely
| about what regulators do: it's legislation that grants
| rights to people, rights that can be litigated in court.
| If I as a GDPR data subject demand an explanation of an
| ML decision about me, as I'm entitled to do, my lawyer
| could argue "the source code isn't an adequate
| explanation, we want to see the training data too". Then
| it'd be up to the court to make a ruling about my
| lawyer's request, not up to the regulator. The paper [1],
| co-authored by a data scientist (myself) and a lawyer,
| thinks through the privacy / explainability conflict as
| it might play out _in court_.
|
| [1] https://www.gwlr.org/show-us-the-data/
| throwjd764glug wrote:
| Yes, proprietary systems are _so_ accountable. It only took
| "the prosecution and conviction of hundreds of [innocent] sub-
| postmasters for alleged theft, false accounting and/or fraud,
| resulting in imprisonment, loss of reputation and livelihood,
| bankruptcy, divorce, and even suicide amongst those involved"
| before the UK Post Office's proprietary Horizon system was
| found to be the cause.
|
| So far no-one responsible for the computer system or the
| prosecution has faced any repercussions.
|
| https://en.wikipedia.org/wiki/British_Post_Office_scandal
| AlanYx wrote:
| The critical element that's missing from the article--but which
| likely motivated the article--is that the EU's new AI
| regulation proposal _mandates compliance all along the supply
| chain_ (for regulated applications of AI).
|
| In practice, this will likely mean that using open source for
| these AI applications will become a lot more difficult in a
| legal sense, perhaps impossible, because there's no single
| entity that can speak to compliance of open source components
| (vetting each commit, etc.). Perhaps some orgs will emerge who
| are willing to monitor and oversee open source software and
| certify that it meets the requirements of the legislation, but
| it's unclear how that would work.
|
| I'm not endorsing this model, just explaining the concern. I
| don't think the tech community really fully appreciates how
| misguided a lot of the current approaches to AI regulation
| among policymakers are right now. (Not just this, but
| initiatives like the current CAHAI work towards a binding
| international law instrument on AI.)
|
| And yes, like the GDPR, the proposed EU AI regulation has an
| extraterritorial effect. Under the proposed AI regulation, if
| you're selling into the EU or to any EU customers but don't
| have an establishment in the EU, you have to actually appoint a
| representative to comply with these requirements.
| adolph wrote:
| Securing supply chain is an important effort even outside of
| regulation. Here is a podcast about some recent open source
| efforts:
|
| Kubernetes Podcast 155: Software Supply Chain Security, with
| Priya Wadhwa
|
| https://kubernetespodcast.com/episode/155-software-supply-
| ch...
| riedel wrote:
| The AI act is really a strange effort to regulate technology
| horizontally without even being able to define its scope.
| Many who have read it believe it ends up regulating all
| computer logic in critical infrastructure (while making this
| definition wide as well). I have tried to comment on this
| through various channels (local government/associations),
| because it seems a really strange "horizontal" regulation
| that could have huge side effects. I hope more people will
| point out the problems of the current proposal. I am really a
| proponent of both the GDPR and AI ethics for most parts but
| this one is really going too far.
| xg15 wrote:
| Well, for me it's enough if it starts to regulate whether
| I'm approved for this credit or not, or whether I should be
| considered suspect for a crime.
|
| I'm honestly worried that the greatest innovation in
| computing of the last decade seems to be inventing programs
| where literally no one understands how they work.
| hectormalot wrote:
| Current drafts have a much too wide definition of AI
| (basically any predictive function) but a fairly narrow
| scope of what high risk applications are (listed in an
| annex). For those applications in scope, the bar seems
| extraordinarily high, e.g. that data sets should be "free
| of errors".
|
| I think things will improve before the final act. I have
| contributed to some of the commentary towards the regulator
| on this, and a lot of the flaws mentioned are now well
| recognized.
|
| Disclaimer: opinions above are my own.
| TehCorwiz wrote:
| "Open source math is throwing financial policymakers for a loop"
| webmaven wrote:
| All the comments saying "this misses the point, it isn't the
| software, it's the data" are missing the point themselves:
| trained models are being released and informally used as software
| components.
|
| The consequence of datasets being used for training (and trained
| models being incorporated into software), typically via a line of
| curl or wget in some setup code, is that the dependency is left
| essentially undeclared.
|
| Yes, any biases in those models are more likely to be uncovered
| because of their public availability, and yes, the biases
| uncovered will probably be due to biases in the data, but right
| now we don't really have (or aren't using) processes, tooling, or
| infrastructure to do the necessary versioning and tracing of data
| or model dependencies.
|
| In fact, when some researcher discovers a bias in a model and
| publishes it, most often no new version of the model or the data
| it is based on is released.
|
| If by chance the biases _are_ addressed and new versions of a
| particular model and dataset _are_ released, all the other models
| based on the same data are left unaffected. And software that
| incorporates the biased models is unaffected by their re-release
| as well. Nothing will ever trigger those models to be marked as
| biased, retrained, and re-released, and typically nothing marks
| the dependent software as vulnerable to the bias either.
| imgabe wrote:
| Let's see, people who bothered to learn how to program might not
| understand the problems with the software libraries they use.
|
| Ok, maybe, but now we're supposed to believe that people who know
| absolutely nothing about how computers work are in a better
| position to understand their limitations and regulate them?
| toploader wrote:
| It always strikes me that the arguments for oversight on AI are
| because people's conceptions of how-things-oughtta-be are
| different than actual patterns in the wild.
|
| There should be a name for that tension.
| HPsquared wrote:
| Cognitive dissonance?
| pengstrom wrote:
| The human condition?
| hau wrote:
| My summary of claims: Opensource AI is too dangerous and can't be
| left as it is. It's scary that everyone can just create AI and we
| should make laws regarding this glaring issue. Proprietary AI is
| more accountable, transparent and easier to fix than opensource
| AI. Opensource AI is more bias-prone by its nature.
|
| No arguments presented. It's just that. Everyone can change
| opensource AI and add biases. Unlike companies, who are
| accountable and will easily fix it if something goes wrong. Tech
| is too powerful for simpletons to wield, we should restrict it.
| ReactiveJelly wrote:
| > Everyone can change opensource AI and add biases.
|
| I remember seeing this misconception about Linux probably 15
| years ago.
|
| "How can you trust it? Anyone can change it!"
|
| It's not like Wikipedia, dad. There are people deciding what
| ends up in the soup.
|
| Wikipedia isn't even like Wikipedia: they enforce IP blocks,
| most of the pages that are vulnerable to vandalism are
| protected, and there are people watching pages they care about
| so they'll be notified if anything changes.
| akersten wrote:
| Shit like this boils my blood. Policymakers should focus on
| writing laws that actually help people instead of muddling
| around with technology they don't understand. If anyone doesn't
| understand just how ridiculous a proposal to regulate "AI" is,
| pull back the veil a tiny bit on that summary to see what the
| author is really saying:
|
| > Opensource math is too dangerous and can't be left as it is.
| It's scary that everyone can just create math and we should
| make laws regarding this glaring issue. Proprietary math is
| more accountable, transparent and easier to fix than opensource
| math. Opensource math is more bias-prone by its nature.
|
| Plainly nonsense and I hope this worm doesn't get anywhere near
| a lawmaker's ear.
| xg15 wrote:
| Ok, then please explain to me who exactly will vet open-source
| AI projects.
| harles wrote:
| I wonder how much of this has been pushed along by incumbents
| in the space - they nearly always benefit from regulation as
| smaller companies suffer the most.
| sva_ wrote:
| > "One of the scary parts of open-source AI is how intensely easy
| it is to use," he says. "The barrier is so low... that almost
| anyone who has a programming background can figure out how to do
| it, even if they don't understand, really, what they're doing."
|
| I don't see what's so scary about that, to be honest. Some random
| person who is messing with ML and doesn't know what they're doing
| probably isn't in a position to use this in a way that would be
| harmful to people.
|
| The example of discrimination given in the article would only
| really apply to corporations and government, who would likely use
| proprietary datasets (and perhaps implementations). So I don't
| understand why this piece takes aim at open source. Seems almost
| contradictory.
| jhgb wrote:
| > even if they don't understand, really, what they're doing
|
| ...which includes full-time ML people. So I'm not sure there's
| a lot of difference.
| civilized wrote:
| Agree with the criticism of this piece so far, but what really
| makes this piece laughable is the fact that _the vast majority of
| proprietary AI is just open source AI that's been hidden behind
| an organization's firewall._ So, this piece is arguing that the
| problem with open source AI is that people can see it. When it's
| hidden behind a firewall, it will magically become more
| transparent and accountable.
|
| Makes perfect sense, right?
|
| And while we're at it, the piece stinks of casual misinformation
| right from the first paragraph. AI isn't "curing disease", this
| is just nonsense hype. Why would I listen to the author if he
| can't get through his short introductory paragraph without
| spouting nonsense?
|
| Par for the course for IEEE, which has always come off to me as a
| very self-aggrandizing, bullshitty organization.
| mark_l_watson wrote:
| I have to criticize this article for confusing the underlying
| software for building machine learning models with the models
| themselves. Open Source contributors to PyTorch, TensorFlow, etc.
| are unlikely to affect bias in data (and thus in trained models).
| Can anyone here think of reasons why this is not so?
| b0tzzzzzzman wrote:
| Can we please make laws that stop people from calling everything
| AI! We do not have AI. We have processes for looking at data
| which scale very well; they are not self-aware, and they are not
| making connections without being told to do so.
| jgalt212 wrote:
| This is only a problem if you're Eric Holder and prefer to indict
| corporations rather than the employees or officers of such
| corporations.
|
| https://www.goodreads.com/book/show/34397551-the-chickenshit...
| simonw wrote:
| This piece was baffling. It seems to be conflating the issue of
| AI bias with the concept of open source software development.
|
| "If an open-source algorithm is flawed, it is harder to undo the
| damage than if the software came from one proprietary--and
| accountable--company."
| thomashop wrote:
| What a strange article. There is almost no substance and it
| doesn't deliver any convincing arguments why policymakers should
| focus on open source.
|
| Intuitively I would say policy should focus more on
| corporations, as open source is publicly auditable.
| patmorgan23 wrote:
| It's FUD, probably trying to influence policymakers/the
| unknowledgeable public into supporting policies that favor
| large established firms.
| twelvechairs wrote:
| The question not answered here is "why should policy care if AI
| is open source or not?"
| Ftuuky wrote:
| Because they want to control it. Can't let the peasants obtain
| capabilities once reserved only to the military and big
| governmental agencies.
| revolvingocelot wrote:
| Don't forget the inherent value of being able to point to a
| black box and say "that's why" when questioned about some
| questionable process. The ability for anyone to vet the
| source code is a weakness. It exposes attack surface for the
| forces of Responsibility trying to peer into the black box.
|
| "Computer says no" is an incredible force multiplier for
| Authority, and anything that threatens the truthiness of the
| black box's output will be mobilized against.
|
| Unaccountable security-through-obscurity is actually a bonus
| for Authority, too. Look at the whole Signal-Cellebrite [0]
| thing: Moxie found all these vulnerabilities, some going back
| years. What are the odds that an avatar of Authority('s
| technically-minded underling) found one or more of these
| Cellebrite exploits and used them as tools against enemies, or
| to exonerate friends? "Make sure you scan _this_ cellphone
| _first_." Would any of that be possible if Cellebrite was
| legally mandated to be built from remote, publicly-viewable
| source every time they wanted to use it? Or, if you want to
| be less conspiratorial, how many vulnerabilities would Moxie
| have been able to find if the source had already been public
| (and subject to pull requests) for years?
|
| [0] https://signal.org/blog/cellebrite-vulnerabilities/
| wilde wrote:
| Right? This is good! It means they're focusing on outcomes
| rather than prescribing implementations.
|
| The article just linked to the actual text of the bill, but this
| summary seemed reasonable.
|
| https://www.lawfareblog.com/artificial-intelligence-act-what...
| Proven wrote:
| Because there's no one to fine and coerce, bot.
| keskival wrote:
| Open source software can be inspected and verified against
| whatever policy criteria, closed source software cannot.
| mistrial9 wrote:
| The data sets are the difference. Can they be "inspected"? Can
| the models?
___________________________________________________________________
(page generated 2021-08-28 23:02 UTC)