[HN Gopher] Moving beyond "algorithmic bias is a data problem"
___________________________________________________________________
Moving beyond "algorithmic bias is a data problem"
Author : mwexler
Score : 60 points
Date : 2021-08-19 12:41 UTC (1 day ago)
(HTM) web link (www.cell.com)
(TXT) w3m dump (www.cell.com)
| janto wrote:
| i.e. the ball is not under this cup.
| blackbear_ wrote:
| Nah, I don't buy this. Minimizing test error on a socially
| unbiased dataset will always give a socially unbiased model.
| Ergo, algorithmic bias does not exist. End of story.
| macleginn wrote:
| "A key takeaway is that our algorithms are not impartial." Of
| course they are, they just underperform on the data points from
| the long tail of the empirical distribution, as show basically
| all real-world examples from the paper. Underfitting on the
| samples from the long tale (aka blinding the model to ethically
| bad features) removes bias sensu strictu but increases variance,
| which leads to more errors, which is perceived as "bias" (did
| someone say catch 22?).
| Barrin92 wrote:
| > Of course they are
|
| there are plenty of cases where the 'of course' does not apply.
|
| In computer vision for example models generally have a bias
| towards scale invariance but struggle with rotational
| invariance. This is not a problem of 'bad data', but of the
| kind of features that the architecture is prone to extract and
| represent while it struggles with others. That's why there is
| an entire zoo of different ML architectures and systems,
| because there is no magical uniform, algorithm that performs
| equally well in every domain. CNN's excel at spatial data,
| while RNNs perform better on sequential data, and so on.
|
| Of course one can attempt to reframe this as a 'data problem'
| and argue that just means you need to input 100x more data of X
| than of Y, but that actually just shows that algorithm
| performance is not uniform, and the more productive thing would
| actually be to understand the strengths and weaknesses of the
| architecture.
| [deleted]
| macleginn wrote:
| How is this relevant for the notion of bias as discussed in
| the paper? Of course different approaches have different weak
| points.
| geofft wrote:
| The classic algorithm to solve the
| https://en.wikipedia.org/wiki/Stable_marriage_problem is a
| pretty straightforward and frankly introductory-level example
| of an algorithm which is not impartial (and not just because it
| ought to be called the stable heterosexual monogamous marriage
| problem).
|
| The algorithm outputs a matching between two sets A and B
| (e.g., bachelors and bachelorettes) where each member of set A
| has ranked each member of set B in preference order and vice
| versa. The output is called stable because there's no situation
| where any pair would prefer to be with each other rather than
| with their current partners, e.g., there's never a case where
| Alice prefers Bob to her current partner _and_ Bob prefers
| Alice to his current partner, so that the two are tempted to
| run away with each other.
|
| But a curious property of this algorithm is that the matching
| is always best for one group and always worst for the other. Of
| the possible answers to the problem, this algorithm gives an
| answer that is at least as good for set A as any other one, and
| at least as bad for set B as any other one. You can, of course,
| flip sets A and B when running the algorithm, and it will then
| be optimal for set B and pessimal for set A.
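|
| For concreteness, here is a minimal sketch of the classic
| (proposer-optimal) Gale-Shapley procedure; the function and
| variable names are mine, not anything from the article:
|
|   # Whichever side proposes (set A here) gets its best stable
|   # match; the reviewing side (set B) gets its worst.
|   def stable_match(a_prefs, b_prefs):
|       # a_prefs/b_prefs: dict mapping each member to a ranked
|       # list of the other side, best first.
|       b_rank = {b: {a: i for i, a in enumerate(ranking)}
|                 for b, ranking in b_prefs.items()}
|       next_pick = {a: 0 for a in a_prefs}
|       free = list(a_prefs)              # unmatched proposers
|       engaged = {}                      # b -> a
|       while free:
|           a = free.pop()
|           b = a_prefs[a][next_pick[a]]  # a's best untried option
|           next_pick[a] += 1
|           if b not in engaged:
|               engaged[b] = a
|           elif b_rank[b][a] < b_rank[b][engaged[b]]:
|               free.append(engaged[b])   # b trades up
|               engaged[b] = a
|           else:
|               free.append(a)            # b turns a down
|       return {a: b for b, a in engaged.items()}
|
| Calling it with the bachelors' rankings as a_prefs yields the
| bachelor-optimal matching; swapping the arguments yields the
| bachelorette-optimal one, so the caller cannot avoid picking a
| side.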
|
| This has absolutely nothing to do with ML or data fitting or
| anything. It's a deterministic, classical algorithm. And yet it
| is clearly partial, and you must choose when using it whether
| to be partial to set A or set B, to the men or to the women, to
| med school students or to their employers, etc. It would be
| silly to say that the algorithm is impartial and that it's
| solely the operator's choice to make it impartial - this
| particular algorithm forces the operator to make _some_ choice.
|
| I think in the end you're agreeing with the author, who is
| saying that you _must_ make a tradeoff of some sort, and the
| question is which tradeoff to choose. The author (and I) would
| describe that as the algorithm being partial, and the operator
| only being able to make some choice about what to do with that.
| macleginn wrote:
| "This has absolutely nothing to do with ML or data fitting or
| anything. It's a deterministic, classical algorithm." ---
| that's why I wouldn't use it as an example to analyse the
| question under discussion. If we disregard the name of the
| algorithm, it's just a non-commutative function: there are
| lots of those in CS and data analysis (e.g., KL divergence),
| and people have more or less learned how to deal with them.
|
| Generally, I do not agree with the author in what they call
| "bias". Sometimes algorithms are too noisy for particular
| data subsets; quite often they are over-confident in their
| predictions, thus exacerbating the differences between small
| and large sub-populations. These are all _technical_ issues
| that surely need to be taken into account when making
| decisions based on systems' outputs and when designing new
| systems, but the general idea of the ethical AI literature
| seems to be to recast these technical issues as ethical
| issues and to ask AI people to strive to eliminate those
| altogether. I think that this is misguided and very unlikely
| to work.
| twic wrote:
| This feels a bit like saying that subtraction is partial
| because it always treats its first argument as positive and
| its second as negative.
|
| It's important to be aware of the asymmetry in the Gale-
| Shapley algorithm (and I wasn't - thank you), and to not
| accidentally (or intentionally!) use it in an unfair way. But
| if someone does, it is they who have introduced the
| partiality, not the algorithm.
|
| PS It seems there are at least two algorithms for solving the
| stable marriage problem equitably, one given in this paper,
| and one it mentions given by Knuth and attributed to Selkow:
|
| https://epubs.siam.org/doi/10.1137/0216010
| mirker wrote:
| Have we even figured out what bias we care about? Race, gender,
| age, etc. are some potential problems, but is that it?
|
| These sorts of problems are often formulated theoretically
| ("suppose we want fairness with respect to variable Z"). It seems
| that half the battle is to figure out what to be fair with
| respect to. Often, the fairness variable in question isn't even a
| feature, but is implicitly in the data (e.g., race in human
| photos). Therefore, the fairness space is potentially infinite.
|
| For example, maybe life insurance models are biased toward those
| predisposed to developing cancer. Maybe ads target those
| suffering depression. You can continue partitioning the space in
| this way forever, and, therefore, it seems that the algorithms
| would be relatively straightforward if you could formalize the
| bias requirements.
|
| This is before you even consider the fairness variables
| _interacting_ (e.g., age and gender and race), which requires
| potentially normalizing across exponentially growing feature
| combinations.
| mulmen wrote:
| This is a philosophical gap that society in general needs to
| bridge.
|
| Data is truth. If we discover bias in data that tells us
| something about the world. I can't fix systemic racism with a
| SQL statement. But I can tell you where it happened.
|
| Society can decide what classes are protected and data can tell
| us when that happens. It is then up to the offending entity to
| change their behavior.
|
| In other words, bias in _data_ isn't unethical but bias in
| _action_ can be.
| Spivak wrote:
| I really don't think anyone is going to disagree with you that
| data is truth and actions are what matter, but you run into two
| problems.
|
| * If you draw any conclusion from the data it will reflect
| the biases in the data.
|
| * If you take any action from the data it will reflect the
| biases in the data.
|
| So you're going to be "correcting" at some point if you want
| to avoid that bias, and that correction is what really matters.
| mirker wrote:
| Right. And once you start doing corrections you are
| basically making another ML model to turn "what happened"
| into "what should have happened".
| whatshisface wrote:
| I feel that talking about SQL in a way misses this issue -
| when data analysis is done by human beings, human virtues can
| restrain the worst excesses of self-interest. In contrast, a
| model trained blindly to maximize some kind of metric can
| commit essentially any sin in pursuit of that goal.
| mulmen wrote:
| A model is just another form of analysis. A SQL statement
| is already built on levels of abstraction. An "AI" model is
| only one more level. At the bottom it's all ones and zeros.
| But no matter how high you make the tower there's always a
| human at the top.
| mirker wrote:
| That's not really true.
|
| Say you want to know if you should approve someone for
| insurance.
|
| SELECT AVG(profit) FROM customers WHERE feature = $user_feature;
|
| Say "feature" is education level and then suppose education
| level is correlated with race or gender. Then you are
| implicitly writing a query to filter by race or gender.
| Humans often don't consider these implicit correlations.
|
| An AI would do the same. When it's human face detection,
| then sure, humans would find the mistakes quickly. But SQL
| data is very easy to implicitly bias, and like I said, you
| can slice the pie millions of ways for various reasons.
|
| On the other hand, if we had a simple definition, then the
| database could raise an exception when the condition for
| the query failed.
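|
| A toy version of this, with made-up numbers, just to show how a
| rule that never mentions race can still split along it:
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   race = rng.integers(0, 2, 10_000)      # 0/1 protected group
|   educ = (rng.random(10_000) < 0.3 + 0.4 * race).astype(int)
|   approve = educ == 1                    # "WHERE education = 1"
|   print(approve[race == 0].mean())       # roughly 0.3
|   print(approve[race == 1].mean())       # roughly 0.7
|
| The approval rates differ by group even though race never
| appears in the decision rule.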
| salawat wrote:
| Ah... The ML scientist's "Metrics are harmless. People doing
| stupid things with metrics on the other...".
|
| Or if I'm being too subtle, this is the ML practitioner's "Guns
| don't kill people. People kill people."
|
| Which is absolutely right, I might add; but also a stop point and
| consideration that should be had before you go training something
| on sketchy data, and unleashing it on an unknowing populace.
| elcomet wrote:
| That's not what the article is talking about.
|
| They are saying that data biases are a problem but not the only
| one: other things can amplify biases such as the model's
| architecture. So when working to remove biases, fixing the data
| might not be enough.
| athrowaway3z wrote:
| What salawat is probably saying is:
|
| There would be less of an issue with bias (for society) if
| the expectation/sales pitch were different.
|
| As an analogy:
|
| Somebody sells the idea of "The Unbreakable Rope" and it
| breaks after you hang 10kg on it. A blog post "Moving beyond
| material" goes in depth on how to process material. This is a
| good thing for people producing rope. However, for a layman
| who still sees people assuming "The Unbreakable Rope" is
| attainable, the content is going to be a little
| underwhelming.
| malshe wrote:
| I admit that I still don't know a single example of _systematic_
| AI bias in the absence of any data bias. I really want to improve
| my understanding of this topic. This article doesn't help
| either. It starts off with why we should address this bias
| without giving an example of it. In fact, the author skirts this
| issue altogether by first stating: "Here, we start out with a
| deceptively simple question: how does model design contribute to
| algorithmic bias?" and then moves into why this is an important
| issue! If you said "we start" then let's start indeed. Maybe I am
| not used to this style of writing?
| bjornsing wrote:
| I have a really hard time following this kind of reasoning,
| partly because the word bias has very different meanings in
| social science and statistics. E.g. let's say I have a
| statistical model that takes among other variables a person's
| ethnicity and produces a credit score. If that model is
| statistically unbiased it would probably produce different credit
| scores depending on ethnicity. If we "fix" that by making the
| model insensitive to ethnicity, then it probably becomes
| statistically biased. Wouldn't it be better to talk about
| "fairness" or something?
| Spivak wrote:
| You're viewing "fixing" from the wrong lens. There are infinite
| bits of information about a person that could potentially be
| used in a model to calculate a credit score. At all times in
| every model you're choosing a subset of one's facets and can
| build statistically unbiased models based on the data you make
| available.
|
| But when you talk about actually fixing models like this,
| you're forced to correct the final result, not filter the
| data. Being blind to ethnicity doesn't work because one's
| ethnicity permeates (to different degrees, sure) every part of
| their lives. All the data is bad, everything is a statistically
| detectable proxy for ethnicity.
| jhgb wrote:
| But surely there's some kind of refinement of your inputs
| where all the paths converge to one result, much like the
| refinement of interval partitions in Riemann's integral
| converges to one value (for a certain class of functions).
| The bits of information may be infinite but there's some
| structure in them. I'm not sure that just because the optimal
| result (the one that uses all information) is something you
| don't like, the result is bad. Best thing you can say is that
| you're actually not predicting an existing credit score, but
| rather synthesizing some other indicator that is not an
| existing credit score.
| dekhn wrote:
| I'm still trying to understand "algorithmic bias is an algorithm
| problem". Does that mean, if I select L1, I'm training a network
| which is less accurate against examples that have rare features,
| because the weights on those features will be forced to 0?
|
| I just want to understand if that's the rough idea or I'm far
| from the point. If my thinking is approximately correct, then I
| have a series of further questions and comments, but I suspect
| I must be misunderstanding what shooker is saying.
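|
| A quick way to poke at the L1 intuition on synthetic data (this
| is my own sketch, not something from the article):
|
|   import numpy as np
|   from sklearn.linear_model import Lasso
|
|   rng = np.random.default_rng(0)
|   n = 5_000
|   common = rng.normal(size=n)
|   # a feature active on only ~1% of rows, but with a large
|   # true effect
|   rare = (rng.random(n) < 0.01) * rng.normal(size=n)
|   y = common + 5 * rare + rng.normal(scale=0.5, size=n)
|   model = Lasso(alpha=0.1).fit(np.c_[common, rare], y)
|   print(model.coef_)  # the rare weight is driven to ~0
|
| At least in this artificial setup, a strong enough L1 penalty
| zeroes out the rarely active feature before it touches the
| common one, which is the effect I'm asking about.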
| joiguru wrote:
| The basic idea is as follows.
|
| Let's say you are building an ML model to decide whether to
| give someone insurance or not. Let's also assume your past
| behavior had some bias (say against some group). An ML model
| trained on this past data will likely learn that bias.
|
| Part of the modern ML focus is then to understand what bias
| exists in the data, and how we can train models to use the data
| but somehow counteract that bias.
| commandlinefan wrote:
| > ML model trained on this past data will likely learn that
| bias
|
| That's the opposite of what the author is saying, though - or
| rather, she's saying that data bias exists, but the algorithm
| itself introduces bias that would be there even if the data
| itself were somehow totally fair, for some unspecified
| definition of "fair".
| ramoz wrote:
| A reference I like, based on your last point:
|
| https://www.frontiersin.org/articles/10.3389/fpsyg.2013.0050.
| ..
| dekhn wrote:
| what you just described is a previous bias being encoded in
| the data. It's not algorithmic bias, because it's not encoded
| in the structure of the algorithm. Sara addresses that (data
| re-weighting) but says that's not all.
|
| I honestly don't think it can be what you're describing, or
| else the debate is a very different one from the one Sara and
| others in the "algorithmic bias exists and it is distinct from
| data bias" camp are having.
| umvi wrote:
| How do you tell if something is biased or not? Seems like the
| current system is "if people cry foul because it seems
| unfair, then the model is biased" which doesn't seem
| scientifically rigorous.
|
| This seems like a hard problem. For example, say that you
| have an ML model that decides whether someone will be a good
| sports athlete or not purely based on biometrics (blood
| oxygen level, blood pressure, BMI, reflex time, etc.). If the
| model starts predicting black people will be better athletes
| at higher rates than white people, is the ML model biased? Or
| is the reality that black people have higher-than-average
| advantageous physical characteristics? How do you tell the
| difference between bias and reality?
| dekhn wrote:
| The bias would have to be determined by a board of experts
| who debate things based on facts, but is ultimately
| subjective and linked to the time and place of the culture.
|
| The ethics in AI folks, for the most part, seem to want
| models to predict what they would predict, based at least
| partly on subjective analysis of culture, not entirely
| based on scientific data.
|
| At least that's what I think I've concluded about
| algorithmic bias. It's one of the situations where I really
| want to understand what they're saying before I make too
| many criticisms and counterarguments.
| 6gvONxR4sf7o wrote:
| I've had the best luck explaining this in terms of causal
| inference.
|
| We all know that it's really easy to screw up or invent
| connections between things when you use observational data
| instead of a randomized controlled trial. Observational data
| contains weird connections that you often can't tease apart,
| whether it's because there are important aspects of the
| mechanisms that were unrecorded (missing features), selection
| bias (missing data), or because regularization forces your hand
| in interesting ways.
|
| Generally for statisticians, this leads to bias in parameter
| space. For ML practitioners, it's more interesting/useful to
| regard this as bias in function space.
|
| There are useful lessons from the causal inference world too.
| There is a whole field of tools to try to get unbiased
| parameters/functions from observational data. There are even
| models that can guarantee unbiased parameters/functions under
| pretty reasonable assumptions and reasonable conditions. There is
| a rapidly developing field around learning unbiased (estimates
| of) functions from less "perfect" data.
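|
| To make "tools to get unbiased estimates from observational
| data" concrete, here is a rough sketch of one standard one,
| inverse propensity weighting (my own illustration; the function
| name and the logistic propensity model are just assumptions):
|
|   import numpy as np
|   from sklearn.linear_model import LogisticRegression
|
|   def ipw_ate(X, treated, outcome):
|       # estimate P(treated | X), then reweight each arm so the
|       # two weighted samples look like the same population
|       p = LogisticRegression().fit(X, treated)
|       p = p.predict_proba(X)[:, 1]
|       w1 = treated / p
|       w0 = (1 - treated) / (1 - p)
|       return ((w1 * outcome).sum() / w1.sum()
|               - (w0 * outcome).sum() / w0.sum())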
| fouronnes3 wrote:
| This belief has always been extremely cringe in my opinion,
| because it somehow implies that dataset engineering is not an
| important part of machine learning. Ensuring you are feeding good
| data to your model is a critical part of machine learning
| practice, but some researchers seem to hand wave it away as
| merely an implementation detail not worthy of their attention.
| "It's a data problem" is used as an excuse to be discharged of
| the moral responsability of the output bias, as if you were
| ethically responsible for the model only.
| darawk wrote:
| I don't think "it's a data problem" is inherently used to hand-
| wave away the problem. I think it's used to locate the problem
| in the area where it's most readily addressed. Designing
| algorithms to debias your data is hard, over-sampling under-
| represented groups is easier. I do think you're right that
| sometimes people use it that way, but that doesn't mean we
| should make up false narratives about biased models, either. It
| just means we should work on de-biasing the data, and also
| developing algorithms to help mitigate the learning of features
| we don't want weighted.
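|
| For reference, the "easier" data-side fix can be as simple as
| naive random over-sampling (sketch with pandas; the column name
| is made up):
|
|   import pandas as pd
|
|   def oversample(df, group_col, seed=0):
|       # resample every group, with replacement, up to the size
|       # of the largest group, then shuffle
|       target = df[group_col].value_counts().max()
|       parts = [g.sample(target, replace=True, random_state=seed)
|                for _, g in df.groupby(group_col)]
|       return pd.concat(parts).sample(frac=1, random_state=seed)
|
|   # balanced = oversample(train_df, "group")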
| mjburgess wrote:
| You are right that there are no algorithms which capture
| semantics, and therefore none which can be constructed not to
| introduce bias.
|
| In that sense it is a "data solution", but it is _not_ a data
| problem.
|
| > de-biasing the data
|
| The issue isn't statistical bias. The issue is semantic:
| occurrence doesn't capture meaning. Negative terms and racial
| terms can co-occur more frequently in some text (e.g., a
| biography of MLK) without the remedy being to "rebalance them
| with positive terms".
| rualca wrote:
| > (...) as if you were ethically responsible for the model
| only.
|
| The model is the result and the outcome of the whole process.
| The model is supposed to be a reliable representation of
| reality that has acceptable accuracy within predetermined
| operational limits.
|
| If someone tries to generate models that interpolate and/or
| extrapolate data and the data it uses to seed the model is
| garbage then the model is garbage, and the person responsible
| for putting it together is the person responsible for the model
| failing to do its job.
|
| There is no separation between model and the data used to
| generate the model. Garbage in, garbage out. If a machine
| learning model uses garbage data then the model is garbage, and
| throwing machine learning buzzwords at the problem does not
| justify why garbage was sent in.
| mjburgess wrote:
| > The model is supposed to be a reliable representation of
| reality that has acceptable accuracy within predetermined
| operational limits.
|
| This is the problem. That is _NOT_ what ML models are
| supposed to be outside of newspaper articles and research
| grant proposals.
|
| ML models interpolate between co-occurrences; that is all
| they do. Representations of reality are not interpolations
| between statistical co-occurrences -- that is the furthest
| thing from what they are.
|
| Reality has a counter-factual (ie., modal) structure, causal
| structure, generative structure (etc.) and much more.
| rualca wrote:
| > This is the problem. That is NOT what ML models are
| supposed to be outside of newspaper articles and research
| grant proposals.
|
| No, not really. That is the very definition of a model, and
| the very reason why people bother with them. Stating
| otherwise requires a gross misunderstanding of the whole
| subject and problem domain.
|
| Also, it makes absolutely no difference if you know a model
| fails to output accurate results in specific scenarios. The
| key factor is whether you know the domain where the model
| does indeed provide acceptable results. That's why in some
| applications gravity is modelled as a constant acceleration
| of 9.8 m/s^2 pointing straight down, or materials are
| modelled as having a linear relationship between stress and
| strain.
| Those who use those models know that they are only valid if
| specific circumstances are met. This is something known
| forever in engineering, and made famous in statistical
| model circles by George Box with his quote of "all models
| are wrong".
|
| My take is that there is a kind of naive arrogance plaguing
| ML circles where throwing novel hightech approaches to old
| and basic modeling applications leads those in the field
| into believing that they can take a blank slate approach to
| modeling and ignore lessons learned throughout the
| centuries because they aren't needed anymore. And this is
| the end result.
| mjburgess wrote:
| The presumption of ML is that compressions of X are
| representations of Y. This is just false.
|
| The compression of X, i.e., f, isn't a representation of Y.
| It's an estimator of the value of Y within some domain of
| X,Y.
|
| For f to be a representation of Y, it needs to be able to
| stand in for Y (at least). And compressions of X cannot.
| They lack, e.g., the right counterfactual behaviour.
|
| I.e., a representation of a cat enables computing things
| about a cat in imaginative scenarios, e.g., in a video
| game simulation. Compressions of pixels of cats do not.
| rualca wrote:
| > The presumption of ML is that compressions of X are
| representations of Y. This is just false.
|
| The whole point of modeling is that without a doubt
| compressions of X are indeed representations of Y,
| because the whole reason X was picked was that it
| clearly represents Y with an acceptable degree of
| accuracy for the use in mind.
|
| If a proposed model fails to reproduce and predict the
| phenomenon with acceptable accuracy, it's scrapped in
| favour of those that can. Why are we discussing this?
|
| I frankly do not understand why we are wasting time
| explaining the very basics of what a model is and why
| they are created and how they are used. This is not up
| for discussion. Just ask yourself why people, especially
| in engineering and physics, bother with models.
| ad404b8a372f2b9 wrote:
| You cannot conflate dataset engineering, machine learning, and
| researchers as if they were a single discipline practiced by
| the same people. This is precisely where the disagreement stems
| from.
|
| Dataset engineering is an important field of research. It is
| also an important part of the modelling process. Neither of
| these things are included in model research. Researchers who
| design computer vision models and other novel mathematical
| models can be held ethically responsible for the societal
| results of the novel part of their research but if we agree
| that their field of research is valuable for society then they
| cannot be held responsible for the misuse of these models by
| people who skip or fail integral parts of the modelling
| process.
|
| You cannot ask a researcher to invent a new more performant
| computer-vision model, invent a way to correct for bias, and
| design a fair dataset all in one. That's three entirely
| different careers.
| MichaelGroves wrote:
| > _You cannot ask a researcher to invent a new more
| performant computer-vision model, invent a way to correct for
| bias, and design a fair dataset all in one. That's three
| entirely different careers._
|
| Might it not sometimes be reasonable to ask researchers in
| one domain to slow their roll, if researchers in a related
| domain are making slower progress? If I invent 10 new forms
| of nerve gas every year, while you invent 5 new antidotes to
| nerve gas every year, where does the problem lie? Solely with
| you, for not working fast enough? I don't think that's
| reasonable. Does the problem lie with me, for not inventing
| nerve gas antidotes? I don't think that's reasonable either,
| antidotes are not my specialty and I can't be expected to
| create them. But if I'm the one inventing new systems that
| become a problem for society because your domain of counter-
| research isn't keeping up, then I have some substantial share
| in that blame nonetheless.
| mirker wrote:
| Most will not pass up an opportunity to keep their job or
| get promoted, so I think the practical answer lies
| elsewhere in the research pipeline.
| elisbce wrote:
| Let's take the classic example of a "racist algorithm", image
| recognition using neural nets, and examine what's going on.
|
| Let's say we train the NN using an equal number of human faces
| of all races, and animal faces. Let's say the trained neural net
| made some mistakes, including a few cases where black faces are
| recognized as gorilla faces. And this doesn't happen at all or as
| likely with white faces. The results are horrible, right? And
| people immediately start to point fingers to the training data
| and the algorithms, stating the training data is racially biased
| and/or the training algorithms or even the use of neural nets are
| racially biased.
|
| But is it really so? It's known that in order to take a black
| face photo with the same degree of detail, the lighting
| conditions and camera settings need to be adjusted. This is an
| effect purely due to physics. In other words, it could well be by
| nature, that recognizing black faces is harder than pale faces
| under the normal camera and scene settings. This is why you have
| night mode on your phones. It is just harder to take clear photos
| when less light gets into the camera. And this requires the
| camera and photography settings to be adjusted.
|
| So, the unwanted results here are still due to the input data.
| But neither the input data, nor the algorithms contain any racial
| bias towards the black people. The results might be merely due to
| the _difference_ between dark faces and pale faces under the
| natural law of physics.
|
| These are unwanted results due to our social norms, but they are
| NOT racially biased or racist, because there is no such bias
| introduced or inherent during any part of the process.
|
| We could and should correct such unwanted results by introducing
| adjustments to the input data, like improving dark face
| photography and camera sensitivity. But we can't just label the
| input data, the algorithms and the people who designed these
| algorithms as "racist" or "racially-biased". There is zero racial
| bias that is man-made here. Race just coincides with this side
| effect of photography.
|
| Likewise, there will be cases where the reverse is true, like
| white faces get unwanted results instead of dark faces.
|
| So, while we work towards improving the data quality and the
| algorithms, we must stop this trend of labeling or calling people
| and algorithms racists.
| jbattle wrote:
| > In other words, it could well be by nature, that recognizing
| black faces is harder than pale faces under the normal camera
| and scene settings.
|
| This might be the crux. Why are pale-face-recognizing settings
| the "normal" settings? Why aren't the cameras designed and
| tuned to recognize darker skinned faces by default?
|
| Cameras are designed and tuned by people - this is not a matter
| of fundamental physics having a preference.
| dthul wrote:
| It's not as easy as just retuning camera settings. Due to
| physical limitations (at least with our current state of
| technology) camera sensors have a very limited dynamic range
| compared to the human eye. Increasing the exposure to better
| image darker surfaces will overexpose the rest of the image.
|
| We can be hopeful though that this will become less of an
| issue in the future due to camera technology advancements
| (like HDR exposure stacks).
| [deleted]
| SpicyLemonZest wrote:
| The source article seems to agree with you on this point, and
| does not call any person or algorithm "racist". I think they
| understand the term "bias" to mean simply "things we might want
| to introduce adjustments for".
| [deleted]
| MichaelGroves wrote:
| > _So, the unwanted results here are still due to the input
| data. But neither the input data, nor the algorithms contain
| any racial bias towards the black people. The results might be
| merely due to the difference between dark faces and pale faces
| under the natural law of physics. These are unwanted results
| due to our social norms, but they are NOT racially biased or
| racist, because there is no such bias introduced or inherent
| during any part of the process._
|
| In this, and the reasoning above it, I think you are correct.
| Assuming we are right about this, what is the next step? You
| say the next step is to improve data collection, e.g. by
| creating better cameras. That seems a fine proposal to me, I
| support that, but I think there is more that might _also_ be
| done. For instance, the use of ML models could be restricted or
| regulated in at least some contexts, until the problems with
| data collection are rectified. For instance, we could ban the
| police from using facial recognition models until the problems
| with data collection are solved. The bias is a side-effect of
| photography, not something _intrinsic_ to the facial
| recognition algorithm, but restricting the use of that
| algorithm might _nonetheless_ be a valid response to this
| circumstance.
| trhway wrote:
| the point of the article is that algorithm design choices affect
| results on the long-tail of the dataset. Well, what is in the
| long-tail of the dataset is data bias. So, are we going to try to
| introduce algorithm bias to correct for the data bias?
| darawk wrote:
| > A surprisingly sticky belief is that a machine learning model
| merely reflects existing algorithmic bias in the dataset and does
| not itself contribute to harm. Why, despite clear evidence to the
| contrary, does the myth of the impartial model still hold allure
| for so many within our research community? Algorithms are not
| impartial, and some design choices are better than others.
| Recognizing how model design impacts harm opens up new mitigation
| techniques that are less burdensome than comprehensive data
| collection.
|
| Ah yes the classic "why do people keep insisting on X despite
| consistent proof of not X", with...zero citations. Well, let's
| read the paper and see if they make the case.
|
| > Even if we are able to label sensitive attributes at scale such
| as gender and race, algorithms can still leverage proxy variables
| to reconstruct the forbidden label. Data collection of even a
| limited number of protected attributes can be onerous. For
| example, it is hard to align on a standard taxonomy--categories
| attributed to race or gender are frequently encoded in
| inconsistent ways across datasets [2]. Furthermore, procuring
| labels for these legally protected attributes is often perceived
| as intrusive leading to noisy or incomplete labels [3,4].
|
| > If we cannot guarantee we have fully addressed bias in data
| pipeline, the overall harm in a system is a product of the
| interactions between the data and our model design choices. Here,
| acknowledging the impact of model design bias can play an
| important role in curbing harm. Algorithms are not impartial, and
| some design choices are better than others. Recognizing how model
| design impacts harm opens up new mitigation techniques that are
| far less burdensome than comprehensive data collection.
|
| > We are well-versed in the connection between function choice
| and test-set accuracy because objective functions such as cross-
| entropy or mean squared error reflect our preference to optimize
| for high test-set accuracy. Standard loss functions do not
| explicitly encode preferences for other objectives we care about
| such as algorithmic bias, robustness, compactness, or privacy.
| However, just because these desiderata are not reflected does not
| mean they have ceased to exist. Turing award winner Donald Knuth
| said that computers "do exactly what they are told, no more and
| no less." A model can fulfill an objective in many ways, while
| still violating the spirit of said objective.
|
| > Model design choices made to maximize test-set accuracy do not
| hold static other properties we care about such as robustness and
| fairness. On the contrary, training a parametric model is akin to
| having a fixed amount of materials to build a house with. If we
| decide to use more bricks building a bigger living room, we force
| the redistribution of the number of bricks available for all
| other rooms. In the same vein, when we prioritize one objective,
| whether that be test-set accuracy or additional criteria such as
| compactness and privacy, we inevitably introduce new trade-offs.
|
| This is literally just saying "Well yes all the bias starts out
| in the data, and so a race/gender-neutral model will encode those
| features, but in principle you could design a model expressly to
| avoid that, and since that hasn't been done, the models are
| racist too". Which is...a certain kind of take. But even if we
| accept the reasoning, it is not at all at odds with the common
| understanding of their starting sentence: "A surprisingly sticky
| belief is that a machine learning model merely reflects existing
| algorithmic bias in the dataset and does not itself contribute to
| harm.".
|
| Yes, people commonly believe the "bias is in the data". Models
| are designed to be mirrors, so they reflect whatever bias is in
| their training data. Yes, it's true you can design a distorted
| mirror that does not reflect certain attributes. But that would
| be _inserting_ bias into a model to correct a bias that you found
| in the original data. It should be very clear that the root
| source of the bias here is the training data, not the model.
|
| We can and should build models that attempt to correct those
| biases, and we also can and should attempt to de-bias the data
| itself. But can we stop torturing the meaning of English
| sentences to support people's preferred narratives? The models
| are neutral. The data contains the bias. Convolutional neural
| nets do not inherently work better on white faces. Multi-layer
| perceptrons do not have beliefs about race and gender.
|
| One additional linguistic nitpick:
|
| > The belief that model design merely reflects algorithmic bias
| in the dataset can be partly ascribed to the difficulty of
| measuring interactions between all the variables we care about.
|
| There is no "algorithmic bias in the dataset". That is a
| contradiction in terms. The bias in the dataset is not
| "algorithmic". The algorithm is what processes the data. The
| dataset is just biased, it has not been biased algorithmically.
| mjburgess wrote:
| "Bias" here is being used in so many different ways. "Bias"
| here isn't _statistical bias_. It is prejudice.
|
| The "algorithm" doesn't capture the semantics of the data, and
| so introduces moral bias _regardless_ of any statistical bias
| within the dataset.
|
| Eg., consider training an NLP system on black rights
| literature. Negative words co-occur with racial terms -- the
| semantic association is one of "black people opposing hatred"
| _NOT_ "black people COMEWITH hatred".
| darawk wrote:
| Ya, the overloading of 'bias' is confusing.
|
| > The "algorithm" doesnt capture the semantics of the data,
| and so introduces moral bias regardless of any statistical
| bias within the dataset.
|
| > Eg., consider training an NLP system on black rights
| literature. Negative words co-occur with racial terms -- the
| semantic association is one of "black people opposing hatred"
| NOT "black people COMEWITH hatred".
|
| I agree that certain NLP models might learn an association
| like that, but that's a consequence of the pairing of model
| and data. The model itself does not encode a racial
| prejudice, it's simply poorly suited to the dataset in this
| context.
| mjburgess wrote:
| "Poorly suited" is the heart of the issue.
|
| Nothing here is morally biased: neither the algorithm or
| the data. But when people say the "data is biased" they are
| suggesting that _this_ is where the moral bias enters.
|
| Not at all. The moral bias _IS_ entering at the level of
| the algorithm. In this sense the algorithm is morally
| biased.
|
| Of course it is really the human operator who selects this
| algorithm which causes the issue -- but by saying "the data
| is biased" we are obscuring this reality.
| random314 wrote:
| That's a long winded way of stating that correlation
| coefficients are morally biased. It is kinda hilarious.
| mjburgess wrote:
| The moral bias enters when a person takes correlation
| coefficients to indicate meaning. It doesn't matter what
| value they have.
| random314 wrote:
| > The moral bias IS entering at the level of the
| algorithm. In this sense the algorithm is morally baised.
|
| > The moral bias enters when a person takes correlation
| coefficients to indicate meaning.
|
| Changing your stance within 2 replies is not a good look.
| wizzwizz4 wrote:
| We're humans. We do that even when we're generating those
| coefficients in our own brains; it's irresponsible to
| teach a computer how to find lots of correlations, show
| the results to humans, then try to say "but we didn't
| _mean_ to imply that the correlation was meaningful! ".
| djoldman wrote:
| > Nothing here is morally biased: neither the algorithm
| or the data.
|
| I think the issue here is the following progression:
|
| 1. Someone chooses and collects a subset of all data that
| exists.
|
| 2. They then choose an algorithm/generic-framework-of-a-
| model with the intent to apply it to the data chosen in
| step 1.
|
| 3. They then train the chosen algorithm from step 2,
| yielding a _model_.
|
| What @darawk is saying, I believe, is that the
| algorithm/generic-framework-of-a-model chosen in step 2
| is not morally/ethically biased/racist/etc. A human may
| consider the outputted model from step 3 to be so,
| however.
|
| If the above is true, then the source of moral/ethical
| bias is the data chosen in step 1.
|
| What _untrained_ model /algorithm is morally or ethically
| biased? I know of no model listed in scikit learn,
| tensorflow, scipy, etc. that has hardcoded any sense of
| anything about humans at all...
| seoaeu wrote:
| I don't see how that follows. Surely you could select
| hyper-parameters that cause a model to perform
| differently across different races, ethnicities, etc.
| Like, shouldn't you be able to get a model to perform
| poorly for minority groups simply by specifying a model
| size that's too small to learn the full data set, with
| the knowledge that most training samples won't be
| describing those groups?
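|
| A synthetic sketch of exactly that (data and numbers invented):
| the two groups follow different rules, and a model with too
| little capacity spends what it has on the majority group's
| rule.
|
|   import numpy as np
|   from sklearn.tree import DecisionTreeClassifier
|
|   rng = np.random.default_rng(0)
|   n_major, n_minor = 9_500, 500
|   X = rng.normal(size=(n_major + n_minor, 2))
|   group = np.r_[np.zeros(n_major), np.ones(n_minor)]
|   # majority label depends on x0, minority label on x1
|   y = np.where(group == 0, X[:, 0] > 0, X[:, 1] > 0).astype(int)
|   stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
|   acc = stump.predict(X) == y
|   print(acc[group == 0].mean())   # ~1.0 on the majority group
|   print(acc[group == 1].mean())   # ~0.5 on the minority group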
| djoldman wrote:
| Hrm, I think I understand what you're saying.
|
| Perhaps a better way to formulate it is: let's say we
| take a model and add a regularization term that
| penalizes/rewards some part of the data, which when
| trained makes for a morally/ethically biased model. I
| think you're exactly correct, the source of moral/ethical
| bias here is in the model.
|
| Good point.
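|
| For concreteness, a "regularization term that penalizes some
| part of the data" could look like the following (illustrative
| sketch only, not a recommendation of this particular penalty):
|
|   import torch
|   import torch.nn.functional as F
|
|   def loss_with_group_penalty(pred, target, group, lam=1.0):
|       # pred: predicted probabilities; target: 0/1 floats;
|       # group: 0/1 tensor marking the sensitive attribute
|       base = F.binary_cross_entropy(pred, target)
|       gap = pred[group == 0].mean() - pred[group == 1].mean()
|       return base + lam * gap ** 2
|
| Trained with such a loss, the resulting model's behaviour
| clearly depends on a choice made at the model level, not only
| on the data.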
| random314 wrote:
| I have wasted my time watching these long-winded, tortuous,
| and even hour-long videos that set out to explain to us dumb
| data scientists that algorithmic bias is a problem. And there
| isn't any algorithm or proof or equation in this garbage-laden
| soup of words. And if there is a good example of bias, it is
| always in the dataset in these long-winded diatribes.
|
| However, I must say they are quite consistent in their
| introductory paragraphs. They start by acting dumbfounded that
| people don't understand that algorithmic bias has already been
| demonstrated and proven to be a problem, so we should be
| talking about how to solve this problem instead of "pretending
| not to be aware". Except they won't directly link to the work,
| simply mention the author's name - typically Gebru. And we must
| Google, read the paper, find that it's garbage and waste our
| time.
|
| FWIW, I am a person of color.
| criddell wrote:
| Have you read Cathy O'Neil's book _Weapons of Math
| Destruction_? If so, what did you think of it?
| AlanYx wrote:
| Funny you mention that book. It's cited in a _huge_
| proportion of AI fairness papers, especially papers by
| academic authors who do not have a technical background,
| and I can 't figure out for the life of me why. The book
| itself is more polemic than academic.
|
| One of the heuristics I actually use when reviewing papers
| in this area is to check whether that particular book is
| cited without any references to page numbers, like a mantra
| rather than an academic citation. This seems to correlate
| fairly highly IMHO to low quality work with few, if any,
| interesting original contributions.
| darawk wrote:
| Totally agree. I found the total lack of equations, data,
| and experiments in the paper this thread is about
| extremely telling in this regard as well. Lots of vague
| assertions, zero rigor. I think all the points being made
| in this paper evaporate rather quickly when subjected to
| any sort of serious formalism.
| [deleted]
| concordDance wrote:
| > We can and should build models that attempt to correct those
| biases, and we also can and should attempt to de-bias the data
| itself.
|
| What if reality itself is the problem and the source of the
| "bias"? E.g. say that your ML algo gives less loans to African
| Americans, but it turns out that in reality African Americans
| pay back loans less often. Do you accept that some people will
| have a harder time getting a loan? Or do you instead work to
| equalise the loan rate across protected characteristics? Or
| maybe all characteristics?
| MontyCarloHall wrote:
| The real answer is that you work to address the underlying
| reasons that certain demographics are more likely to be
| delinquent on loans than others. Of course, this would take
| enormous amounts of effort and decades to happen, which is
| why nobody wants to hear the real solution to the problem,
| and instead just want a quick and easy superficial fix ("it
| must be the algorithm's fault").
| seoaeu wrote:
| It turns out we actually have laws that answer that question:
| you cannot determine the loan rate based on protected
| characteristics. In fact, that's what "protected" means in
| this context.
|
| Laundering the bias through an algorithm doesn't make it any
| more OK or more legal, it just makes it harder to detect the
| discrimination.
| commandlinefan wrote:
| That's the "disparate impact" line of reasoning: if
| _anything_ results in unequal outcomes, no matter how
| evenly-handed it was administered, it is racist and must be
| adjusted until the outcomes are "equitable". It shouldn't
| be hard (although apparently is for some people) to see
| what a disaster that line of thinking is.
| hackinthebochs wrote:
| This doesn't really address the core issue. There's a
| difference between determining loan rate based on race of
| the applicant, and using features that correlate with
| failed loan repayment that happen to be overrepresented among
| black people. While it is illegal to use race as an input
| to your algorithm, biases in outcomes will still be present
| given the bias in the relevant non-protected properties.
| Are ML algorithms required to normalize for race in the
| face of bias in the relevant variables?
| ultrablack wrote:
| Thank You Darawk.
|
| It is, as you say, inserting bias if you correct the data for
| biases you yourself don't agree with.
|
| As for bias on white faces vs. black, it's my understanding
| that it is just harder to recognize black faces because of
| lighting effects. Black faces need to be lit differently from
| white faces. See also:
| https://www.npr.org/sections/codeswitch/2014/04/16/303721251...
| wizzwizz4 wrote:
| > _As for bias on white faces vs. black, it's my understanding
| that it is just harder to recognize black faces because of
| lighting effects._
|
| If humans can manage it, then it's not harder a priori. It's
| harder _with respect to certain algorithms_ - i.e., the
| choice of algorithm is introducing a bias. (Though in
| reality, this is usually more a problem of training data
| issues.)
| kjkjadksj wrote:
| The human eye has more dynamic range than any camera sensor
| so you are going off a lot more data
| wizzwizz4 wrote:
| Humans can manage _when looking at a digital photograph_.
| robertlagrant wrote:
| > There is no "algorithmic bias in the dataset".
|
| Yes, this is where the article unveiled its non-ML
| underpinnings.
| losvedir wrote:
| I'm not sure if this is what the article is trying to get at,
| but here's a concern I have about models vs data: algorithm
| choice and hyperparameters.
|
| I'm no expert but I've trained a few models, and I think
| there's a fair bit of manual tuning that goes on to maximize
| accuracy.
|
| So in terms of the model vs data dichotomy, could we be
| optimizing for models based on biased data now, eg lack of
| labeled black faces in image recognition, and later once we fix
| the data, the model will still underperform on black faces?
|
| I did a simple transfer learning project based on a model that
| had trained on an enormous data set, and my understanding is
| that it worked because the lower layers had "figured out" how
| to look at edges and contrast and such. But could those lower
| primitive features be biased so that even with more, new,
| training data, the model won't work so well? (e.g. focusing on
| contrast, which might not work as well on darker faces).
|
| I think this all depends on a sort of "build a model once and
| retrain on new datasets" approach. Is that how it works in
| practice? Or is the model re-tuned, architected, etc each time
| there's a change to the training data? In that case since the
| model is effectively tied 1 to 1 to the data, I'm not sure it
| makes much sense to draw this distinction between model and
| data, right?
| zozbot234 wrote:
| > The models are neutral.
|
| Models are not neutral. All models encode inbuilt priors
| reflecting some inherent bias. In fact, absent that bias a
| model would have little to no generalizability beyond its
| training set!
| darawk wrote:
| > Models are not neutral. All models encode inbuilt priors
| reflecting some inherent bias. In fact, absent that bias a
| model would have little to no generalizability beyond its
| training set!
|
| Yes, but those structural priors have nothing to do with
| race, gender, or any other protected attribute.
| loopz wrote:
| The set of possible models may be regarded as infinite.
| However, selection of model may fail to account for biases
| and prejudice that may not even be present in the data at
| all. Indirectly, the bias might be from the researchers
| themselves, ignorance or some silly thing like chance. When
| talking about hypothetical models, flaws probably linger
| in any part of the chain. If they are not accounted for,
| you'd indeed expect biases, a need to clear the most
| obvious ones, and a need to adhere to laws and rules.
|
| First thing is to eradicate the poorly-defined word
| "racism", and find a more fitting term regarding the flaw
| in question: unfairness, discrimination, prejudice, bias,
| etc., and then make it concrete.
|
| Ie. instead of "structural racism", we could instead use
| the term "structural discrimination", to be more clear
| about what we're talking about.
|
| It is also more neutral to view these flaws as _bugs_. That
| only becomes more important as algorithms gain more power
| over people's lives.
|
| The sinister part of such algorithmic rules is the
| tolerance of a silent majority, benefitting unfairly from
| the outcomes.
| darawk wrote:
| So I think you and I agree on all the things you just
| said. My point is really just that, linguistically, I
| don't think it makes sense to describe the models as
| being "biased" or "discriminatory" here. Statistical
| learning models are designed intentionally to act like
| mirrors. They reflect the data they're trained on. And I
| don't think it's descriptively useful to describe a
| mirror as biased because you don't like your reflection.
| Even if, in some sense, you could design a curved mirror
| that generates the reflection you wanted. The mirror is
| just a mirror.
|
| Now, that being said, I think it is fair to talk about
| structural equity in the _use_ of models that produce
| outcomes we believe are discriminatory. If ML engineers
| at some company produce a model, and fail to check it for
| these issues, or do check it but fail to correct them, we
| can certainly describe _that_ behavior in negative terms,
| and shame them appropriately.
|
| At the end of the day, if we didn't live in a
| racist/sexist society, these ML models wouldn't produce
| discriminatory outcomes. And it is in that sense that the
| bias is "in the data". That being said, we may still
| choose to correct that bias at the model level, just like
| people fix cinematic issues in post-processing all the
| time.
| seoaeu wrote:
| > Yes, but those structural priors have nothing to do with
| race, gender, or any other protected attribute.
|
| This seems like wishful thinking. If fed a data set
| containing 'race' or 'gender' as one of the fields, most
| models have structural priors that will make them assume
| all correlations between race/gender and other fields are
| meaningful. Worse, just because an input data set doesn't
| have race or gender recorded doesn't mean that the model
| won't predict them, and then use the results of those
| predictions to bias its output
| darawk wrote:
| Don't conflate "meaningful" with "predictive". The
| attributes are indeed predictive. The intention of the
| model is to make accurate predictions optimally given the
| data it's fed. If you give it data that contains
| predictive correlations with race and gender, yes, it
| will learn them. It should be pretty clear that the root
| problem there though is the data, not the model.
| rovolo wrote:
| Expanding on this point, you have a choice of goals you want
| to model, and you have a choice of success criteria. Each
| model has outcomes which are factually neutral, but the
| choice of model and the way you use the results reflect a
| value system.
| geofft wrote:
| I think there's a reasonable insight in the article, though,
| that there are _two different_ kinds of potential data set
| bias.
|
| One potential bias is, say, that a data set of loans shows that
| in a certain city, people born in one neighborhood are more
| likely to pay back large loans without defaulting than people
| born in another neighborhood. An algorithm could, based on this
| data, conclude that it should not issue large loans to _anyone_
| from the second neighborhood, because it lacks data that those
| people are _usually but not always_ poorer, and thereby encode
| bias based on place of birth.
|
| But the paper is talking about a different kind of bias, say,
| that a data set of loans has thousands of data points of people
| born in the first neighborhood and tens of data points of
| people born in the second neighborhood. Even if you were to
| control for economic status (or perhaps explicitly control for
| things like ethnicity), an algorithm that performs well on the
| data set as a whole might perform poorly on the particular
| subset of people born in the second neighborhood, simply
| because it has less data. The algorithm might have an
| acceptable (to its human supervisors) false-positive rate
| rejecting loans to people born in the first neighborhood, but
| the exact same model might have a much higher false-positive
| rate to people born in the second neighborhood.
|
| That's different, and that effect could apply _even if people
| in the second neighborhood were just as good at paying back
| loans_, because you have fewer items in your sample and so
| there's more noise. That's what the section "Measuring complex
| trade-offs" is about.
| 3wolf wrote:
| Differences in lending rates between groups due to less data
| or confounding features is the motivating example in the oft-
| cited 'Equality of Opportunity in Supervised Learning'.
| Highly recommend it: https://arxiv.org/abs/1610.02413
| darawk wrote:
| > But the paper is talking about a different kind of bias,
| say, that a data set of loans has thousands of data points of
| people born in the first neighborhood and tens of data points
| of people born in the second neighborhood. Even if you were
| to control for economic status (or perhaps explicitly control
| for things like ethnicity), an algorithm that performs well
| on the data set as a whole might perform poorly on the
| particular subset of people born in the second neighborhood,
| simply because it has less data. The algorithm might have an
| acceptable (to its human supervisors) false-positive rate
| rejecting loans to people born in the first neighborhood, but
| the exact same model might have a much higher false-positive
| rate to people born in the second neighborhood.
|
| Right, the paper is talking about models that perform poorly
| on data clusters with low cardinality in the dataset. This is
| a problem, but it's an intrinsic problem. We simply don't
| have enough information about those clusters to make informed
| judgments.
|
| I think the paper is sort of assuming something is happening
| in which the model forgets information about the scarce
| groups to make room for even more information about the dense
| groups, but I don't think that really makes a lot of sense if
| you think it through. Most neural networks are information
| sparse, that is, they have lots of neuronal capacity to
| spare. They don't need to forget things to learn new things,
| and if they did, we could solve that problem by simply adding
| capacity, not by forcing the model to forget things it has
| learned about the dense groups.
|
| I accept that it is in principle possible for things to work
| the way they're describing, but I think there's very good
| reasons to believe that they don't, and I think it's pretty
| telling that this paper contains no math, and no supporting
| data or experiments to back up this model of statistical
| learning.
| zozbot234 wrote:
| Obligatory reminder that there's an inherent tradeoff between
| bias and variance in any ML architecture. Try to create a "highly
| regular" model that gives predictable, less noisy results even
| with low-quality data, you'll just end up strengthening some
| implied prior and introducing bias. Try to remove inbuilt bias,
| you'll just make the model more data-dependent, and noisier with
| any given amount of data. You just can't win.
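|
| For reference, the decomposition behind that tradeoff, for
| squared error at a point x (this is statistical bias, not the
| social sense), assuming y = f(x) + noise with variance
| \sigma^2:
|
|   E[(y - \hat f(x))^2]
|     = (E[\hat f(x)] - f(x))^2   (squared bias)
|       + Var[\hat f(x)]          (variance)
|       + \sigma^2                (irreducible noise)
|
| Constraining the model family shrinks the variance term but
| grows the bias term, and vice versa.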
| jstx1 wrote:
| Also important to point out that the word bias has a very
| specific meaning in the context of statistics and machine
| learning and it isn't just a synonym for discrimination or
| something of that sort -
| https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
| commandlinefan wrote:
| ... which seems not to be the definition the author of the
| linked article is using.
| rualca wrote:
| > Obligatory reminder that there's an inherent tradeoff between
| bias and variance in any ML architecture.
|
| It should also be stressed that in this context "bias" is akin
| to interpolating between data points where resolution in the
| training set is relatively low, which leads to a simpler model
| that does not exactly match the test set.
|
| What this implies is that this "bias" does not reflect a
| preference, let alone an intention to favour one group over
| another. It just means that some combination of the training
| set being too sparse or too low-dimensional, the choice of
| model being too inflexible, and the problem domain having too
| much uncertainty is at play at that resolution.
|
| Critics may single out the model, but the core of the problem
| actually lies in the data and the problem domain.
| narrator wrote:
| This reminds me of a previous crisis involving the intersection
| of far-left politics and technology after the Russian revolution.
| What to do about the problem of all scientific progress up to
| that point being the product of ideologically impure bourgeois
| capitalism? Stalin, being, for all his notorious faults, a
| technological pragmatist, had to actually get in the middle of it
| and straighten things out:
|
| "At one time there were 'Marxists' in our country who asserted
| that the railways left to us after the October Revolution were
| bourgeois railways, that it would be unseemly for us Marxists to
| use them, that they should be torn up and new, 'proletarian'
| railways built. For this they were nicknamed 'troglodytes'.
|
| It goes without saying that such a primitive-anarchist view of
| society, of classes, of language has nothing in common with
| Marxism. But it undoubtedly exists and continues to prevail in
| the minds of certain of our muddled comrades."
|
| - Stalin, Marxism and the Problem of Linguistics.[1]
|
| You could easily redo this for the 21st century:
|
| "At one time there were 'Social Justice Warriors' in our country
| who asserted that the data and algorithms left to us after the
| Critical Race Theory Revolution were systematically racist data
| and algorithms, that it would be unseemly for us Social Justice
| Warriors to use those data and algorithms, that those data and
| algorithms should be destroyed and new, 'anti-racist' data and
| algorithms be built. For this they were nicknamed the 'cancel
| mob'.
|
| It goes without saying that such a primitive-woke view of
| computing, of races, of machine learning has nothing in common
| with 'Social Justice Warrior Ideology'. But it undoubtedly exists
| and continues to prevail in the minds of certain of our muddled
| comrades."
|
| The Soviets almost failed to implement the atomic bomb because
| they had an ideological problem with relativity:
|
| "According to a story related by Mr. Holloway, Beria had asked
| Kurchatov shortly before the conference whether it was true that
| quantum mechanics and relativity theory were idealist and
| antimaterialist. Kurchatov reportedly replied that if relativity
| theory and quantum mechanics had to be rejected by Russian
| science, the atomic bomb would have to be rejected, too.
| According to another story in the book, Stalin phrased his
| decision to cancel this way: "Leave them [ the physicists ] in
| peace. We can always shoot them later." He could afford a
| charlatan like Lysenko in biology, but physics was another
| matter. Stalin relied on his physicists for the bomb -- and for
| Soviet status as a superpower. When his first atomic bomb was
| tested in August 1949, five months after the aborted conference,
| those scientists who would have been shot in the event of failure
| received the highest awards: Hero of Socialist Labor and so on,
| down the line."[2]
|
| The 21st century version:
|
| "Leave them [the data scientists] in peace. We can always cancel
| them later."
|
| I love my downvotes. Thank you :).
|
| [1]
| https://www.marxists.org/reference/archive/stalin/works/1950...
|
| [2]
| https://archive.nytimes.com/www.nytimes.com/books/98/12/06/s...
| ASpring wrote:
| I wrote about this exact topic a few years back: "Algorithmic
| Bias is Not Just Data Bias" (https://aaronlspringer.com/not-just-
| data-bias/).
|
| I think the author is generally correct but there is a lot of
| focus on algorithmic design and not on how we collectively decide
| what is fair and ethical for these algorithms to do. Right now it
| is totally up to the algorithm developer to articulate their
| version of "fair" and implement it however they see fit. I'm not
| convinced that is a responsibility that belongs to private
| corporations.
| fennecfoxen wrote:
| > I'm not convinced that is a responsibility that belongs to
| private corporations.
|
| Private corporations are, by and large, the entities which
| execute their business using these algorithms, which their
| employees write.
|
| They are already responsible for business decisions whether
| made using computers or otherwise. Indeed, who else would
| possibly manage such a thing? This is tantamount to saying that
| private corporations should have no business deciding how to
| execute their business -- definitely an opinion you can have,
| it's just that it's an incredibly statist-central-planning
| opinion, the end.
| bluesummers5651 wrote:
| One of the first papers I read in this area was very
| interesting in this regard (https://crim.sas.upenn.edu/sites/
| default/files/2017-1.0-Berk...). I think the challenge is
| that a business (e.g. COMPAS) can certainly take a position
| on what definition of algorithmic fairness they want to
| enforce, but the paper mentions six different definitions of
| fairness, which are impossible to satisfy simultaneously
| unless base rates are the same across all groups (the "data
| problem"). Even the measurement of these base rates itself
| can be biased, for example by over- or under-reporting of certain
| crimes. And even if you implement one definition, there's no
| guarantee that that is the kind of algorithmic fairness that
| the government/society/case law ends up interpreting as the
| formal mathematical instantiation of the written law.
| Moreover, this interpretation can change over time since
| laws, and for that matter, moral thinking, also change over
| time.
|
| I think the upshot to me is that businesses, whether it's one
| operating in criminal judicial risk assessment or advertising
| or whatever, don't really make obvious which definition (if
| any) of fairness that they are enforcing, and thus it becomes
| difficult to determine whether they are doing a good job at
| it.
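|
| To make the base-rate point concrete, a toy calculation (numbers
| invented for illustration): hold the error rates ("equalized
| odds") fixed across two groups and precision ("predictive
| parity") will generally diverge once base rates differ.
|
|     tpr, fpr = 0.8, 0.1            # identical for both groups
|     base_rates = {"group A": 0.5, "group B": 0.2}
|     for group, p in base_rates.items():
|         # precision (PPV) via Bayes' rule
|         ppv = tpr * p / (tpr * p + fpr * (1 - p))
|         print(f"{group}: base rate {p:.1f} -> precision {ppv:.2f}")
|     # group A: base rate 0.5 -> precision 0.89
|     # group B: base rate 0.2 -> precision 0.67
|
| Whichever definition a business picks, the others will generally
| fail, which is why it matters that the choice is rarely stated.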
| naasking wrote:
| > Indeed, who else would possibly manage such a thing? This
| is tantamount to saying that private corporations should have
| no business deciding how to execute their business
|
| No business is allowed to discriminate against protected
| groups. That's arguably a third-party standard for fairness,
| but I don't think this qualifies as central planning.
|
| I see no reason why other types of third-party standards
| would be impossible or infeasible for machine learning
| applications.
| ASpring wrote:
| Maybe I wasn't very clear, I don't think every single machine
| learning model should be subject to regulation.
|
| Rather I view it more along the lines of how the US currently
| regulates accessibility standards for the web or enforces
| mortgage non-discrimination in protected categories. The role
| of government here is to identify a class of tangible harms that
| can result from unfair models deployed in various contexts
| and to legislate in a way to ensure those harms are avoided.
| janto wrote:
| This field is a joke and should be mercilessly mocked. These
| manipulated models do not represent reality. I look forward to
| gaming the systems that are produced by these people.
| mjburgess wrote:
| Many here are in the throes of this "data mythology"...
|
| It is trivial for algorithms to introduce novel racism: consider
| processing texts analysing minority causes. These texts contain
| statistical associations between politically sensitive terms,
| negative terms (and so on). However they are not racist, are not
| born of any racist project, and most humans reading them would be
| moved to more positively regard racial terms. The meaning texts
| express consists in what they want their readers to do. It is an
| activity of coordination (writer/reader), and here it has no
| racist aspect.
|
| A machine, if it should have learned anything at all from this
| text, should learn to associate minority terms with virtuous
| political projects; it should have acquired an understanding of
| the many types of associative relation: needing, wanting,
| opposing, supporting, trusting, advocating...
|
| Yet a machine performing an action based on statistical
| associations will not learn these; and so, will act
| prejudicially. It will merely expect terms to co-occur, wholly
| unable to determine why. We expect charity workers to be greedy?
| No, we expect them to morally oppose greed. We expect movies to
| smell like candy? No, we expect them to be sold together. (And so
| on.)
|
| It is somewhat alarming that statistical associations between
| terms in historical texts are being seen as characteristic of
| human communication, meaning, belief, association, ethics,
| action. I think it has led here to a great moral oversight:
| that absent the ability to understand text, machines here are
| introducing novel prejudice where none existed prior.
| darawk wrote:
| I think your example is obscuring the issue here a little bit,
| because you didn't really describe a targeted learning problem:
|
| > It is trivial for algorithms to introduce novel racism:
| consider processing texts analysing minority causes. These texts
| contain statistical associations between politically sensitive
| terms, negative terms (and so on). However they are not
| racist, are not born of any racist project, and most humans
| reading them would be moved to more positively regard racial
| terms. The meaning texts express consists in what they want
| their readers to do. It is an activity of coordination
| (writer/reader), and here it has no racist aspect.
|
| Yes, negative words will _associate_ with black people in such
| literature. They will also likely associate with white people.
| That is, emotionally intense language will associate with race
| in literature related to racial justice issues. That is a
| perfectly valid inference. What you seem to be implying is that
| the model will learn the propositional statement "black people
| are bad", but your example is just about associations, not
| propositional assertions. Associations are not assignments of
| quality, they are just that: associations. Such a model would
| correctly learn such associations and not be in any sense
| biased, morally or otherwise, because the model is not making
| decisions or evaluations of people or moral objects.
|
| The notion of bias usually talked about in ML is in the context
| of either:
|
| 1. Making statistical _decisions_ (e.g. granting a loan, or
| predicting criminal recidivism)
|
| 2. Providing a service of some kind (e.g. facial recognition in
| a camera/video, ad targeting)
|
| Talking about bias in these cases brings the issue into focus,
| because there is a morally relevant objective function in these
| cases. And my point is that in these cases, standard untrained
| ML models are morally neutral. They come to the table with no
| preconceptions. In a society without racism, they will not
| learn racism. In a society without sexism, they will not learn
| sexism. They only reflect what we feed them.
|
| Is it reasonable for me to describe my mirror as biased if I
| don't like how I look? In theory, I could build another
| "mirror" that reflects me the way I want. It just wouldn't
| comport with how I actually look, and we would no longer call
| that object a mirror.
|
| I want to step back for a moment and say that I think we
| probably agree on the object-level facts here. I believe you
| can correct the morally biased ML-output problem at the model
| level or the data level equally well. I'm mostly objecting to
| the linguistic utility of locating the moral bias equally in
| each of them. I think that kind of relativism is just not a
| very useful way to describe things.
|
| I think for some reason this idea has formed that saying the
| bias is in X means we have to correct it in X, but I think it's
| that view that's mistaken. We can and should correct it
| wherever we feel is most efficient and effective.
| mattmcknight wrote:
| Sequence models go beyond co-occurrence.
| lvxferre wrote:
| >that absent the ability to understand text, machines here are
| introducing novel prejudice where none existed prior.
|
| Kinda off-topic, but what haunts me is that this is equally
| true for people. People who don't understand what they read are
| always introducing new prejudices, where none existed prior.
| davidgerard wrote:
| > they are not racist
|
| "having racist effects" counts as a perfectly valid usage of
| the word.
|
| This is a cheap sophistry:
|
| 1. It's only racist if it was intended to be.
|
| 2. You can't read minds.
|
| 3. Therefore, you can't say "this is racist."
|
| This isn't a useful usage of the term, except for attempting to
| deflect people from calling out racist effects.
| concordDance wrote:
| I really wish the word "racist" had a single meaning.
| MichaelGroves wrote:
| Any word with powerful meaning in one context will
| inevitably be repurposed in other contexts by people who
| want to borrow some of that power for their own uses.
|
| Here is an example: https://hn.algolia.com/?q=democratize
| a9h74j wrote:
| In general that won't happen, but one can do close readings
| of presumed valid arguments, and spot cases where
| equivocation (e.g. bringing in different meanings of a word
| within different premises) can invalidate a superficially
| good argument.
| user-the-name wrote:
| It is the norm, not the exception, for words to have
| multiple meanings.
| nine_k wrote:
| Would you call a virus that predominantly infects a
| particular ethnic group "racist"? Would you call alcohol or
| milk "racist" because Europeans, North Siberian and Alaskan
| peoples, and East-Asian peoples react differently to them?
|
| I'd say something like "race-biased" or "race-sensitive"
| would be a more proper term.
|
| You can't read minds, but you can read laws. Laws state the
| intent expressly, and also state the policy expressly. This
| is why "racist" can be very exactly applied to some laws
| (like those mandating segregation, different rights, etc,
| depending on the race). So, to my mind, using the word
| "racist" to denote intentional action makes enough sense.
| davidgerard wrote:
| I would call the thing we're actually talking about, a
| machine set up and administered by humans and applied by
| humans, something that involves human agency.
|
| "Racist" is a perfectly applicable word for its effects
| when applied by the humans applying it.
|
| Some unrelated hypothetical about things not involving
| human agency is irrelevant to the question, and is
| functionally just another attempted deflection.
| [deleted]
| bsanr2 wrote:
| I'm confused and slightly alarmed by the insistence on
| doing everything possible to scrub the prospect of human
| agency from matters that are affected by and that affect
| humans. If we see some social ill, shouldn't we be combing
| the systems that are related to their existence for the
| flaws that let those ills happen, instead of just
| pointing at the system and saying, "Well, a human didn't
| make that specific decision, so it must be objective and
| fair"?
|
| It's high-tech ventriloquism.
| SpicyLemonZest wrote:
| I think you're misunderstanding the concern. As seen in
| stories like the x-ray race detection thing
| (https://www.wired.com/story/these-algorithms-look-x-
| rays-det...), there's a lot of people in the AI field who
| believe that _any_ correlation with race in a model is a
| "bias" which should be corrected. If a radiology AI model
| stumbles across some unknown condition like sickle cell
| anemia with a strong racial correlation, I think we
| should learn about it, and I worry that under the kinds
| of practices the source article suggests it would just be
| bias adjusted away.
| bsanr2 wrote:
| You're misunderstanding the article. It's talking about a
| situation where an AI model replicated the unscientific
| classification of people by race. That means that, far
| from uncovering race-correlated health issues, it could
| perpetuate unscientific and damaging associations that
| could put people's health at risk: for example, throwing
| out disease diagnoses commonly associated with white
| people that show up for a black person. Of course, the
| central issue is that we wouldn't know _why_ the model
| came to any portion of its conclusions.
|
| Additionally, you're privileging possible advantages over
| concrete and known issues. That's the opposite of risk
| mitigation.
| wayoutthere wrote:
| But that's not race.
|
| Race is explicitly about using _visible_ differences to
| mark a group of people as a "lower" class. Race has nothing
| to do with genetics; it's a sociological invention. There
| are some _correlations_ between ancestral heritage and
| disease prevalence at the population level, but because
| genetic disease susceptibility is generally not visible
| to the naked eye, conditions cannot be "racist".
|
| Skin color was chosen because it made it easier to identify
| escaped slaves, as all previous attempts to enslave people
| were difficult when the escaped slaves could blend in to
| the population. But other things like "Jewishness" or red
| hair have also been racialized at various points through
| history.
|
| There's also the interesting phenomenon of how the definition
| of "white" keeps changing. At first it meant only English
| immigrants, but was later extended to all Protestant
| immigrants, and much later to Catholics (Irish and Italian
| minorities). It's absolutely a made-up distinction, and we
| just group people with dark skin together because it's a
| lazy shortcut.
| Supermancho wrote:
| > Race is explicitly about using visible differences to
| mark a group of people as a "lower" class
|
| A. That is not the usage by definition. That may be your
| interpretation, but that's not the common usage.
|
| B. That is not the common way to determine race (visible
| differences). Genetic markers are generally the indicator
| (eg Ancestry.com).
|
| C. That is not the sole reason (mark people as lower
| class). There are medical reasons to know the genetic
| ancestry of your forefathers. This is easily described as
| "race" and is only useful in a very general practical
| sense insofar as it narrows the possibilities for
| investigation into genetic consequences.
|
| If anyone can point out a good reason for using it for
| more, that isn't looking to justify other or past
| behavior, I'd be interested in hearing about it. I might
| have missed something because I really don't think about
| race too much.
| rovolo wrote:
| Race is correlated with and impacted by
| ancestry/genetics, but it is not how race is defined. For
| example, there is far more genetic diversity within
| Africa than outside of Africa. All of that diversity is
| mostly collapsed into "Black" in US racial classification
| though.
|
| So, questions for you, assuming you're in the US: how do
| you know whether people are Black? I am sure you at least
| know _of_ some individuals who are Black, so how do you
| know that they are Black?
| jhgb wrote:
| > For example, there is far more genetic diversity within
| Africa than outside of Africa. All of that diversity is
| mostly collapsed into "Black" in US racial classification
| though.
|
| That's actually a very nice example of how genetics
| _does_ matter for race because it shows the bottleneck of
| the small population that made it out of Africa.
| rovolo wrote:
| It shows how ancestry and race are somewhat related, but
| it doesn't show that genetics _determine_ race. You can
| pick groups within "Black" who are more closely related
| to a random "White" person than they are to a random
| person from some other "Black" group. You can also pick
| "White" groups who are more closely related to a "Black"
| group than they are to other "White" groups.
|
| You can group people based on genetic similarity, but the
| racial classification of these genetic clumps is
| _socially_ defined.
| jhgb wrote:
| > You can pick groups within "Black" who are more closely
| related to a random "White" person than they are to a
| random person from some other "Black" group.
|
| Sure, for an arbitrarily restricted definition of
| "subgroup", you can do that. Worst case of cherrypicking,
| you pick some two specific individuals who would satisfy
| your scenario (since a one-person group is also a group).
| Likewise you could claim that West Berlin was more
| similar to East Berlin because Frankfurt was much further
| away from both than they were to each other. Not quite
| sure how cherrypicking matters, though.
| jhgb wrote:
| > Race has nothing to do with genetics; it's a
| sociological invention.
|
| I'm pretty sure that modern genetics disagrees. You can
| predict your "sociological invention" from multiple
| genetic markers in like 99.9% of cases or something like
| that. It's hard to argue that "A has nothing to do with
| B" if B extremely successfully predicts A.
| IncRnd wrote:
| > Race is explicitly about using visible differences to
| mark a group of people as a "lower" class.
|
| That's not the definition of either Race or Racism but
| something you created as a definition. It's not something
| you should use to correct others.
| bsanr2 wrote:
| >Would you call a virus that predominantly infects a
| particular ethnic group "racist"?
|
| Depends. It would be wrong to call the virus itself racist,
| but it would likewise be wrong to focus on the
| mechanics of infection rather than the mechanics of
| transmission, because the latter is the determining factor
| in whether or not an outbreak occurs. Epidemics and
| pandemics are manufactured crises - that is, they are the
| result of human action. To have one affect one ethnic group
| more than another can conceivably be because of racist
| behavior. In this sense, _the aspect that is important to
| people_ (whether or not one is likely to be infected) can
| involve racism.
|
| >Would you call alcohol or milk "racist" because Europeans,
| North Siberian and Alaskan peoples, and East-Asian peoples
| react differently to them?
|
| Again, it's not wrong to relate racism to these things
| because the central issue - not necessarily whether certain
| people can digest certain nutrients, but why nutrients that
| are only advantageous to certain people are privileged in
| food policy - can be affected by racial bias.
|
| Reminder that racism is not necessarily based on intent;
| disparate impact also constitutes a reasonable rationale.
| [deleted]
| jhgb wrote:
| Computers are not sapient, and therefore can't be prejudiced
| (which would be a necessity for them to be racist) because
| prejudice is a feature of sapient entities. That would be
| like calling an elevator that had a malfunction and fell to
| the ground floor killing everyone inside "a murderer".
| dabbledash wrote:
| I think when the person you're responding to says "they are
| not racist" he is referring to the texts being analyzed, not
| the model. I.e, "your model can take texts about or against
| racism as inputs and from these texts form an association
| between negative sentiment and certain races."
| toxik wrote:
| I don't think you've fully understood how far "statistical
| associations", as your derisively call it, can get you in
| understanding text. Modern language models absolutely make both
| sentiment and semantic distinctions, and they would not
| complete a sentence like "The movie was like" with "candy"
| simply because the word candy exists in texts simultaneously
| with the word "movie". That model would be completely useless.
| sokoloff wrote:
| Then again, we have things like: https://www.google.com/searc
| h?q=when+did+neil+armstrong+set+...
| TchoBeer wrote:
| This isn't because Google's language model is stupid, but
| because it's trying to give you what it thinks you meant
| rather than give you precisely what you put in the search
| bar.
| kbenson wrote:
| The problem here is that unlike the "did you mean to
| search for" text that makes it obvious that they're
| showing you what they thought you were looking for in
| regular search results, they're not doing something like
| that here, and it's unclear if that's because their model
| is so loose that they don't actually have knowledge that
| this is what they're doing (i.e. it's baked in), or if
| it's just an oversight.
|
| The former is a problem because it promulgates incorrect
| information and lends credibility to mistakes and
| misconceptions, and if it's the latter, why wasn't it
| fixed long ago?
| rsfern wrote:
| I agree with your general point, but I think "the movie was
| like candy" could be a perfectly reasonable simile to make.
| Maybe the movie was flashy but had no substance?
|
| I'm not sure if I expect modern language models to work at
| that level of abstraction though? I guess you need a larger
| generated fragment to assess if the hypothetical sentence was
| statistical nonsense or part of some larger conceptual
| sequence.
| toxik wrote:
| I think if you read what I wrote more carefully, you'll see
| I never claimed that the model cannot generate "candy" as
| the successor word, but that it wouldn't do so simply out
| of having seen the words colocated. The relative sequence
| order matters to these models, and they do model grammar
| and semantic roles.
| rsfern wrote:
| Yeah, I agree with your assessment.
|
| I thought you had written that sentence as an example of
| some Markov chain non-sequitur, and I was just musing
| that it doesn't seem like a super unlikely sentence.
|
| "The movie was like ticket" maybe makes the distinction
| clearer?
| mjburgess wrote:
| > Modern language models absolutely make both sentiment and
| semantic distinctions
|
| They only appear to because of co-occurrence with pre-given
| sentiment terms.
|
| A conceptual relation between concept A and concept B isn't
| statistical. Eg., "Minorities" oppose "Racism" -- not because
| of any statistical occurrence of any of these terms, nor
| because of any statistical association _at all_.
|
| P(Minority | Racism, Oppose) and P(Minority | Racism, W) for
| all words W have no bearing on the truth of the proposition.
|
| It is true that in a large enough body of text, if we took the
| powerset of all words and computed relative frequencies
| (i.e., conditional probabilities on all possible co-
| occurrences) we would find that "Minorities" oppose "Racism"
| more than, e.g., "smell" it.
|
| But that fact isn't sufficient to make "semantic distinctions"
| -- because the proposition isn't true in virtue of that
| frequency.
|
| NLP systems have no means of generating text other than what
| is justified by those frequencies. This leads to trivial
| failures, such as facts in the world changing, which
| invalidates those historical frequency-relationships.
|
| But also to absolutely fundamental failures such as the text
| generated itself being meaningless: the system has nothing it
| wishes to express, because there is no world it is in to
| express anything about. All sentences are just justified by
| previous sentences, not eg., by that there is -- right now --
| a sunny day.
|
| When I say, "do you like the clothes I'm wearing?" i am not
| generating text justified by past frequencies. I am talking
| /about/ the clothes I am wearing, and that is what my words
| /mean/.
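|
| A toy sketch of what a pure co-occurrence model actually sees
| (the three-sentence corpus is invented for illustration):
|
|     from collections import Counter
|     from itertools import combinations
|
|     corpus = [
|         "minority groups oppose racism",
|         "charity workers oppose greed",
|         "movie theaters sell candy",
|     ]
|     pairs = Counter()
|     for sentence in corpus:
|         words = sentence.split()
|         # count unordered word pairs within each sentence
|         pairs.update(frozenset(p) for p in combinations(words, 2))
|
|     print(pairs[frozenset({"minority", "racism"})])  # 1
|     print(pairs[frozenset({"charity", "greed"})])    # 1
|     print(pairs[frozenset({"movie", "candy"})])      # 1
|
| Opposing, opposing, and selling-together all collapse into the
| same kind of count; nothing in the statistics records which
| relation held.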
| jmoss20 wrote:
| > But that fact isn't sufficient to make "semantic
| distinctions" -- because the proposition isn't true in
| virtue of that frequency.
|
| The trick here is that language models are (currently!)
| demonstrating you /can/ get most of the way to semantic
| distinctions just by analyzing symbol-level statistics.
| Whether you can get all the way is an open question.
|
| I agree with you that "movie theaters" don't "sell candy"
| /because/ of some statistical artifact in large bodies of
| text. Movie theaters sell candy because people want to eat
| candy when they watch movies (and are willing to pay for
| it, etc.).
|
| But this wraps back around: the statistical artifacts
| happen to exist in large bodies of text because it is true.
| So, with enough text, and the right kind of analysis, you
| can tease the semantics back out.
|
| The power in language models is not that they "understand"
| text "the right way", from first principles, with a
| symbolic language model. The power is that they don't have
| to, to get most of the way there. Perhaps they'll get all the
| way there! And if they do, what then? Are we so sure that
| we don't do the same thing?
| nonameiguess wrote:
| It's an open question, but I certainly suspect the reason
| humans are able to do this is we can synthesize knowledge
| from other sources and not have to rely solely upon
| learning from text. We've been to movie theaters and
| experienced buying candy there, which adds a great deal
| to our understanding of a sentence containing the
| associated words without needing to read a hundred
| million sentences about movies and candy and rely only
| upon statistical patterns in the text to understand it.
| cscurmudgeon wrote:
| Doesn't Google use language models heavily? But we still get
| this wrong behavior:
|
| https://twitter.com/xkcd/status/1333529967079120896?lang=en
| TchoBeer wrote:
| Search needs to be under-sensitive to return results people
| want. Often a search isn't for precisely the semantic
| information in the query, but for information generally
| about that topic (not to mention how queries are often not
| actually sentences, e.g. if I want to find a white bird
| with big wings that was eating a fish I might search "white
| bird wings fish". Maybe that's a bad example, I can't think
| of a better one off the top of my head but the point
| stands.)
| cscurmudgeon wrote:
| "Its a feature not a bug"
|
| You are claiming that this behavior is intentional. But
| Google's posts say otherwise.
|
| As outsiders, we can't know either way. Search engines
| were doing this loose matching in 2010 (I worked in one
| such team).
| 908B64B197 wrote:
| Again with the "Ethics in AI".
|
| You can't say it publicly but ethics is a trend. It's trendy
| right now to apply it to AI because there's a lot of academic
| funding going into AI so fields where there's no value production
| (like ethics) will slap "AI" on their grant proposal.
|
| The point of ethics is to shame people or practices you don't
| like via some kind of cancel culture or peer pressure [0]. Of
| course, it's impossible to do it in court as you must actually
| prove things. So you get a large enough mob instead.
|
| There's mostly no point in debating with the ethics crowd (you
| really can't do that since their papers typically won't be
| reproducible). Or acknowledging their existence really.
|
| [0] https://syncedreview.com/2020/06/30/yann-lecun-quits-
| twitter...
| dekhn wrote:
| Let's consider your comments applied to another ethical area:
| eugenics. That was a previous area where scientists did
| something which is now considered highly unethical (remove the
| sexual autonomy of people due to incorrect scientific
| judgements). Would you say the people who opposed eugenics were
| "shaming" the scientists, or helping "guide them to making
| better decisions?"
|
| I think there are many concerns around ethics in AI, but the
| majority of the players are not contributing (folks like
| Timnit, with the stochastic parrots paper, the sections on
| power consumption are completely technically wrong). Sara
| Hooker is a much better spokesperson for this, but I'm really
| struggling to see the point in her paper, beyond "we should
| identify existing data biases and make algorithmic changes to
| reduce them" which just sounds like "we need regularization
| that is the opposite of L1 and L2".
| AlbertCory wrote:
| Your first paragraph is called the Motte and Bailey fallacy
| [1]. You can't win your original argument, so you pick a
| different one that's easier, and then pretend that you won
| the original.
|
| [1] https://en.wikipedia.org/wiki/Motte-and-bailey_fallacy
| 908B64B197 wrote:
| Ironically, it's just what Timnit Gebru did.
|
| "I can't prove algorithms are unethical, therefore I'll
| make a point that the (clean) energy used to train them
| contributes to global warming (by assuming all the energy
| required came from jet fuel)". [0]
|
| And then ethics tries to claim it's a real science!
|
| [0] https://www.technologyreview.com/2020/12/04/1013294/goo
| gle-a...
| dekhn wrote:
| What is truly extraordinary about that paper is that it
| completely ignored the fact that ML is only a tiny
| contributor to overall power usage in computing, and also
| attempted to compare the training to intercontinental
| flights, which use the same amount of fuel empty or full
| (and the planes do fly nearly empty). All of it makes it
| look like she just wanted to attack ML, rather than make
| a good faith argument.
|
| The other part of the paper (the danger of trusting
| things that stochastically parrot well enough to exit the
| right side of the uncanny valley) is interesting.
| rhizome wrote:
| Since "I can't prove algorithms are unethical" implies a
| demand or requirement to prove a negative, I'm inclined
| to conclude that you're misrepresenting your source.
| claudiawerner wrote:
| That's a very uncharitable (and in my judgement inaccurate)
| characterization of GP's argument - the usage of the MaB
| fallacy as you've done neglects the principle of
| steelmanning your opponent's position, and shifts the
| discussion from a fact-finding mission to "winning an
| argument", both things that are anathema to good discourse.
| The article you linked expands on this criticism of the MaB
| fallacy in the 'Criticism' section.
|
| GP took advantage of an analogy to counter the claim that
| ethics is only about shaming and calling people out.
| Analogical reasoning is a powerful philosophical tool that
| can lead to important insights and discovering new
| questions. In this case, GP used the analogy where we apply
| ethics (relatively uncontroversial) in a scientific
| context. If the comparison is successful, it counters the
| claim that ethics is only a fashionable way of shaming
| people.
|
| Lastly, GP never claimed to 'win' the argument with the
| analogy (even if it were a motte-and-bailey). It's entirely
| possible that GP has more work to do after getting a
| response to the analogy.
| dekhn wrote:
| I didn't even make an argument; I just asked the author to
| reconsider their statements given that there is historical
| evidence that ethics is more than just shaming and cancel
| culture. It's more
| https://en.wikipedia.org/wiki/Argument_from_analogy and
| https://en.wikipedia.org/wiki/Appeal_to_emotion
| burnished wrote:
| Your comment is the "fallacy fallacy" where every statement
| is countered by the claim that it is in fact a fallacy.
|
| The person you responded to has a cogent reply and you
| haven't done anything to address the meat of the matter.
| 908B64B197 wrote:
| It's interesting you had to pick illegal practices to attempt
| to make a point, while my point was that ethics was used to
| shame practices when the legal system couldn't be used
| because of a lack of rigor.
| dekhn wrote:
| eugenics wasn't illegal at the time.
| https://www.nytimes.com/2014/10/14/science/haunted-files-
| the...
|
| When the Eugenics Record Office opened its doors in 1910,
| the founding scientists were considered progressives,
| intent on applying classic genetics to breeding better
| citizens. Funding poured in from the Rockefeller family and
| the Carnegie Institution. Charles Davenport, a prolific
| Harvard biologist, and his colleague, Harry H. Laughlin,
| led the charge.
|
| ...
|
| By the 1920s, the office had begun to influence the United
| States government. Laughlin testified before Congress,
| advocating forced sterilization and anti-immigration laws.
| Congress complied. The Immigration Act of 1924 effectively
| barred Eastern Europeans, Jews, Arabs and East Asians from
| entering the country. And, at the state level, thousands of
| people who were deemed unfit were sterilized.
| claudiawerner wrote:
| >while my point was that ethics was used to shame practices
| when the legal system couldn't be used because of a lack of
| rigor.
|
| I'm confused as to why that would be a bad thing. For
| instance, liars are shamed, even though it would be
| ridiculous (and harmful) to create laws against lying in
| all cases. At the same time, the law sometimes takes
| account of ethics (rather than harm) to make laws, for
| example, the illegality of corpse desecration is not about
| harm, but about ethics - what it says about our society if
| it were legal.
| 908B64B197 wrote:
| > For instance, liars are shamed, even though it would be
| ridiculous (and harmful) to create laws against lying in
| all cases.
|
| But lying under oath or false advertising are illegal.
|
| > for example, the illegality of corpse desecration is
| not about harm, but about ethics - what it says about our
| society if it were legal.
|
| That's morals, not ethics. And it's rooted in religions.
| claudiawerner wrote:
| >But lying under oath or false advertising are illegal.
|
| That's why I said "in all cases". Clearly, lying isn't
| _only_ immoral in the cases of lying under oath or false
| advertising. It's still immoral for me to say that a
| ball is "out" in tennis when I clearly saw that it's
| "in", even if doing so is not illegal in most cases.
|
| >That's morals, not ethics. And it's rooted in religions.
|
| The difference is immaterial. Even most ethical codes are
| rooted in religious thinking at some point in their
| formation. That doesn't make them invalid ethical codes.
| Ethical codes are generally constructed on the basis of
| morality, or in some cases on the basis of professional
| conduct - but even that isn't only practical.
| bobcostas55 wrote:
| There was nothing incorrect about their scientific judgments.
| Modern behavioral genetics have confirmed both the concerns
| and the potential effectiveness of the eugenicists. What
| changed was that, as a society, we no longer think the trade-
| off is worth it: we are now willing to sacrifice future
| generations to quiet down our conscience. How much of this is
| actually a considered choice, and how much of it is simply
| down to the fact that future people aren't around to
| complain...I'll leave it to you to decide.
| dekhn wrote:
| I think you're probably limiting yourself to the idea that
| "selection can be applied to human breeding" which I do
| agree seems to be scientifically possible. They were
| roughly correct about that (but far from being able to
| exploit that knowledge).
|
| The ethical concerns in eugenics weren't about that at all,
| though, they are about removing autonomy from people based
| on morally and scientifically questionable ideas (the
| eugenicists had many more scientific ideas which were just
| not supported by data, than ones that were).
| nickvincent wrote:
| It just isn't true that "The point of ethics is to shame people
| or practices" or that ethics has "no value production".
|
| In general, a primary factor reviewers in computing conferences
| are asked to consider is the degree to which a submission makes
| a "substantial contribution" to the community. What is or isn't
| a substantial contribution is subjective and entirely dependent
| on the prevailing ethical perspectives in various communities.
| Papers -- a key unit of academic progress (for better or worse)
| -- are entirely subject to concerns of ethics. Certainly, there
| are interesting arguments around how much time and paper space
| should be spent on speculating about negative impacts, and
| people are having those conversations.
|
| The fact that ethics is "trending" is because more researchers
| would like to tackle ethical challenges explicitly, instead of
| falling back on the default of a given community. For instance,
| here is a paper that quantitatively (with reproducibility!)
| analyzes the values in ML papers.
| https://arxiv.org/abs/2106.15590 This is one way to have a very
| empirically-grounded discussion of the topic.
|
| IMO, many researchers can and do debate topics of ethics in AI,
| and in doing so move the field forward (and increase likelihood
| that computing will have more positive impacts than negative
| ones).
| hellotomyrars wrote:
| The "point of ethics" isn't cancel culture and if you believe
| that you're either being willfully dishonest about what ethics
| is and it's entire history to make your own political statement
| or are actually ignorant of what ethics is, both in reality and
| historically.
|
| By all means if you want to make a political statement about
| how ethics is being applied (which would actually imply ethics
| being used as a smokescreen in this case) go ahead but we've
| been grappling with the concept and idea of ethics for
| thousands of years as a species.
| mgraczyk wrote:
| The linked article is arguing against a straw man that I don't
| think many ML engineers and researchers actually believe.
|
| Whether or not loss functions and model calibration are part of
| the "data" or the "algorithm" is just a question of semantics.
| Nobody who knows anything about ML would argue, as is incorrectly
| suggested by this article, that the choice of loss function plays
| no role in producing bias or inequity.
|
| The actual argument that most closely resembles the straw man in
| this article goes something like this.
|
| "The general project of using deep neural networks is extremely
| flexible and powerful. It is possible to build datasets and train
| deep neural networks so that the biases in those models are
| understood and acceptable. When existing models show bias, there
| are usually engineering solutions that can remove the bias within
| the paradigm of deep learning."
|
| Counter arguments to this reasonable claim are much more
| difficult to defend. The research community rewards
| "whistleblowing" much more strongly than finding solutions, which
| is generally a good thing. But a nasty side effect is that the
| majority of algorithm fairness research is pessimistic, despite
| huge improvements and widespread belief in industry that these
| problems are solvable using known techniques.
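|
| One such engineering lever, in miniature (a numpy-only sketch
| with invented labels and group tags, not a production recipe):
| reweight the loss so errors on the small group cannot be traded
| away cheaply.
|
|     import numpy as np
|
|     def group_weighted_log_loss(y_true, y_pred, group):
|         # weight each example inversely to its group's frequency
|         names, counts = np.unique(group, return_counts=True)
|         freq = dict(zip(names, counts / len(group)))
|         w = np.array([1.0 / freq[g] for g in group])
|         w /= w.mean()                 # keep the overall scale
|         eps = 1e-12
|         ll = (y_true * np.log(y_pred + eps)
|               + (1 - y_true) * np.log(1 - y_pred + eps))
|         return -(w * ll).mean()
|
|     y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
|     y_pred = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.3, 0.6, 0.1])
|     group = np.array(["A", "A", "A", "A", "A", "A", "B", "B"])
|     print(group_weighted_log_loss(y_true, y_pred, group))
|
| Whether this counts as fixing the "data" or the "algorithm" is,
| as the comment says, mostly a question of semantics.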
___________________________________________________________________
(page generated 2021-08-20 23:01 UTC)