         _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   A Developer Accidentally Found CSAM in AI Data. Google Banned Him for It
       
       
        josefritzishere wrote 3 hours 19 min ago:
        Oh gross
       
        burnt-resistor wrote 5 hours 9 min ago:
        Technofeudalism strikes again. MAANG can ban people at any time for
        anything without appeal, and sometimes at the whim of any nation state.
        Reversal is the rare exception, not the rule, and only happens
        occasionally due to public pressure.
       
        codedokode wrote 5 hours 36 min ago:
         Slightly unrelated, but I wonder: if a 17-year-old child sends her
         dirty photo to an 18-year-old guy she likes, who goes to jail? Just
         curious how the law works if there is no "abuse" element.
       
          Hizonner wrote 3 hours 41 min ago:
          Both of them, still, in some places, although she may get more
          lenient treatment because she's a juvenile. Other places have cleaned
          that up in various ways, although I think he's usually still at risk
          unless he actively turns her in to the Authorities(TM) for sending
          the picture.
          
          And there's a subset of crusaders (not all of them, admittedly) who
          will say, with a straight face, that there is abuse involved. To wit,
          she abused herself by creating and sending the image, and he abused
          her either by giving her the idea, or by looking at it.
       
            codedokode wrote 8 min ago:
            Obviously, the laws were originally intended to protect children
             from malicious adults, but nowadays, when every child has a phone
             with a camera, they are technically one tap away from committing a
             crime without even realizing it. Maybe we should do surprise
             phone inspections at schools, or maybe we should restrict the
             camera app to verified adults only.
       
          bigfatkitten wrote 3 hours 44 min ago:
          It comes down to prosecutorial discretion, and that can go either
          way.
          
          Prosecutors have broad discretion to proceed with a matter based on
          whether there is a reasonable prospect of securing a conviction,
          whether it’s in the public interest to do so and various other
          factors. They don’t generally bring a lot of rigour to these
          considerations.
       
          xethos wrote 5 hours 12 min ago:
          One for distribution (of her own image), one for possession. See
          sections 3 & 4
          
 (HTM)    [1]: https://laws-lois.justice.gc.ca/eng/acts/c-46/section-163.1....
       
          jfindper wrote 5 hours 31 min ago:
          Obviously this depends on the country, but many countries have
          so-called "Romeo and Juliet" laws which carve out specific exclusions
          for situations along these lines.
       
            stavros wrote 5 hours 16 min ago:
            Those tend to be about sex, not pornography, no?
       
              jfindper wrote 5 hours 12 min ago:
              I'm certainly no expert, but if my memory serves me correctly
              from cases that have hit the media, the carve outs are more
              broadly applicable than just to sex (e.g. intimate images between
              partners). But I could certainly be wrong!
              
              (I didn't really want to start looking up the exact details of
              this topic while at work, so just went from memory. At the very
              least, the terminology "Romeo & Juliet Law" should give the
              original commenter enough to base a search on)
       
        mflkgknr wrote 5 hours 52 min ago:
         Banning people for pointing out that the emperor has no clothes is
         what autocrats typically do, because the thing they hate most is
         being embarrassed.
       
        UberFly wrote 6 hours 22 min ago:
        Posting articles that are paywalled is worthless.
       
          stronglikedan wrote 3 hours 31 min ago:
          Finding the non-paywalled link in the comments is trivial.
       
        winchester6788 wrote 6 hours 37 min ago:
        Author of NudeNet here.
        
         I just scraped data from Reddit and other sources so I could build
         an NSFW classifier, and chose to open source the data and the model
         for the general good.
         
         Note that I was an engineer with one year of experience, working on
         this project alone in my free time, so it was basically impossible
         for me to review or clear out the few CSAM images among the 100,000+
         images in the dataset.
         
         Although, now I wonder if I should never have open sourced the data.
         It would have avoided a lot of these issues.
       
          qubex wrote 5 hours 15 min ago:
          I in no way want to underplay the seriousness of child sexual abuse,
          but as a naturist I find all this paranoia around nudity and “not
          safe for work” to be somewhere between hilarious and bewildering.
          Normal is what you grew up with I guess, and I come from an FKK
          family. What’s so shocking about a human being? All that stuff in
          public speaking about “imagine your audience is naked”. Yeah,
          fine: so what’s Plan B?
       
          markatlarge wrote 5 hours 57 min ago:
           I'm the developer who actually got banned because of this dataset. I
          used NudeNet offline to benchmark my on-device NSFW app Punge —
          nothing uploaded, nothing shared.
          
          Your dataset wasn’t the problem. The real problem is that
          independent developers have zero access to the tools needed to detect
          CSAM, while Big Tech keeps those capabilities to itself.
          
          Meanwhile, Google and other giants openly use massive datasets like
          LAION-5B — which also contained CSAM — without facing any
          consequences at all. Google even used early LAION data to train one
          of its own models. Nobody bans Google.
          But when I touched NudeNet for legitimate testing, Google deleted
          130,000+ files from my account, even though only ~700 images out of
          ~700,000 were actually problematic. That’s not safety — that’s
           a detection system wildly overfiring, with no independent oversight
          and no accountability.
          
          Big Tech designed a world where they alone have the scanning tools
          and the immunity when those tools fail. Everyone else gets punished
          for their mistakes.
           So yes — your dataset has done good. ANY dataset is subject to
           this. There need to be tools and a process for everyone.
          
          But let’s be honest about where the harm came from: a system rigged
          so only Big Tech can safely build or host datasets, while indie
          developers get wiped out by the exact same automated systems Big Tech
          exempts itself from.
       
            lynndotpy wrote 4 hours 48 min ago:
            Agreed entirely.
            
            I want to add some technical details, since this is a peeve I've
            also had for many years now:
            
            The standard for this is Microsoft's PhotoDNA, a paid and gatekept
            software-as-a-service which maintains a database of "perceptual
            hashes." (Unlike cryptographic hashes, these are robust against
            common modifications).
            
            It'd be very simple for Microsoft to release a small library which
            just wraps (1) the perceptual hash algorithm and provides (2) a
            bloom filter (or newer, similar structures, like an XOR filter) to
            allow developers to check set membership against it.
            
            There are some concerns that an individual perceptual hash can be
             reversed to create a legible image, so I wouldn't expect or want
            that hash database to be widely available. But you almost certainly
            can't do the same with something like a bloom filter.
            
            If Microsoft wanted to keep both the hash algorithm and even an XOR
            filter of the hash database proprietary, that's understandable. But
            then that's ok too, because we also have mature implementations of
            zero-knowledge set membership proofs.
            
            The only reason I could see is that security-by-obscurity might be
            a strategy that makes it infeasible for people to find adversarial
            ways to defeat the proprietary secret-sauce in their perceptual
             hash algorithm. But that means giving up opportunities to improve
            the algorithm, while excluding so many ways it could be useful to
            combat CSAM.
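             
             To make that concrete, here is a minimal sketch in Python of the
             "perceptual hash plus membership filter" idea. It is not
             PhotoDNA: the dHash function is a generic perceptual hash
             standing in for the proprietary algorithm, the filter would have
             to be populated by whoever actually holds the hash list, and
             flag_for_human_review is a hypothetical callback.
             
               # Toy sketch only: a dHash-style perceptual hash plus a tiny
               # Bloom filter. Stand-ins, not PhotoDNA and not a vetted tool.
               import hashlib
               from PIL import Image  # pip install pillow
               
               def dhash(path, size=8):
                   # Difference hash: tolerant of resizing and re-encoding,
                   # unlike a cryptographic hash.
                   img = Image.open(path).convert("L")
                   img = img.resize((size + 1, size))
                   px = list(img.getdata())
                   bits = 0
                   for row in range(size):
                       for col in range(size):
                           left = px[row * (size + 1) + col]
                           right = px[row * (size + 1) + col + 1]
                           bits = (bits << 1) | (left > right)
                   return bits  # 64-bit integer
               
               class BloomFilter:
                   # Membership-only structure: it stores no hashes that
                   # could be reversed back into anything image-like.
                   def __init__(self, num_bits=1 << 20, num_hashes=7):
                       self.num_bits = num_bits
                       self.num_hashes = num_hashes
                       self.bits = bytearray(num_bits // 8)
               
                   def _positions(self, item):
                       for i in range(self.num_hashes):
                           d = hashlib.sha256(f"{i}:{item}".encode()).digest()
                           yield int.from_bytes(d[:8], "big") % self.num_bits
               
                   def add(self, item):
                       for p in self._positions(item):
                           self.bits[p // 8] |= 1 << (p % 8)
               
                   def __contains__(self, item):
                       return all(self.bits[p // 8] & (1 << (p % 8))
                                  for p in self._positions(item))
               
               # The vendor ships only the populated filter; a developer can
               # then check local files without ever seeing the hash list:
               #
               #   if dhash("upload.jpg") in blocklist:
               #       flag_for_human_review("upload.jpg")  # hypothetical
             
             One caveat: a plain Bloom filter only answers exact membership,
             so in practice you would also want to check hashes within a small
             Hamming distance, which is where the real systems get more
             complicated.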
       
              johnea wrote 37 min ago:
               So, given your high technical acumen, why would you expose
               yourself to goggle's previously demonstrated willingness to
               delete your career's and your life's archive of communications?
              
              Stop using goggle!
              
              It's as simple, and as necessary, as that.
              
              No technically astute person should use ANY goggle services at
              this point...
       
                lynndotpy wrote 2 min ago:
                I'm not the person in the OP, I've been completely de-googled
                since 2019 and I agree that Google should not be trusted for
                anything important.
       
              markatlarge wrote 2 hours 56 min ago:
              I’m not a CSAM-detection expert, but after my suspension I
              ended up doing a lot of research into how these systems work and
              where they fail. And one important point: Google isn’t just
              using PhotoDNA-style perceptual hashing.
              
              They’re also running AI-based classifiers on Drive content, and
              that second layer is far more opaque and far more prone to false
              positives.
              
              That’s how you get situations like mine: ~700 problematic
              images in a ~700k-image dataset triggered Google to delete
              130,000+ completely unrelated files and shut down my entire
              developer ecosystem.
               Hash-matching is predictable. AI classification is not.
               
               And Google’s hybrid pipeline:
               
               - isn’t independently vetted
               - isn’t externally audited
               - isn’t reproducible
               - has no recourse when it’s wrong
              
              In practice, it’s a black box that can erase an innocent
              researcher or indie dev overnight. I wrote about this after
              experiencing it firsthand — how poisoned datasets + opaque AI
              detection create “weaponized false positives”: [1] I agree
              with the point above: if open, developer-accessible perceptual
              hashing tools existed — even via bloom filters or ZK membership
              proofs — this entire class of collateral damage wouldn’t
              happen.
              
              Instead, Big Tech keeps the detection tools proprietary while
              outsourcing the liability to everyone else. If their systems
               are wrong, we pay the cost — not them.
              
 (HTM)        [1]: https://medium.com/@russoatlarge_93541/weaponized-false-...
       
              bigfatkitten wrote 3 hours 50 min ago:
              PhotoDNA is not paid, but it is gatekept.
       
              Hizonner wrote 4 hours 21 min ago:
              > There are some concerns that an individual perceptual hash can
               be reversed to create a legible image,
              
              Yeah no. Those hashes aren't big enough to encode any real image,
              and definitely not an image that would actually be either
              "useful" to yer basic pedo, or recognizable as a particular
              person. Maybe they could produce something that a diffusion model
              could refine back into something resembling the original... if
              the model had already been trained on a ton of similar material.
              
              > If Microsoft wanted to keep both the hash algorithm and even an
              XOR filter of the hash database proprietary
              
              That algorithm leaked years ago. Third party code generates
              exactly the same hashes on the same input. There are
              open-literature publications on creating collisions (which can be
              totally innocent images). They have no actual secrets left.
       
                lynndotpy wrote 7 min ago:
                > > There are some concerns that an individual perceptual hash
                 can be reversed to create a legible image,
                
                > Yeah no.
                
                Well, kind of. Towards Data Science had an article on it that
                they've since removed: [1] And this newer paper: [2] They're
                not very good at all (it just uses a GAN over a recovered
                bitmask), but it's reasonable for Microsoft to worry that every
                bit in that hash might be useful. I wouldn't want to distribute
                 all those hashes on a hunch that they could never be used to
                recover images. I don't think any such thing would be possible,
                but that's just a hunch.
                
                That said, I can't speak on the latter claim without a source.
                My understanding is that PhotoDNA still has proprietary
                implementation details that aren't generally available.
                
 (HTM)          [1]: https://web.archive.org/web/20240219030503/https://tow...
 (HTM)          [2]: https://eprint.iacr.org/2024/1869.pdf
       
            petee wrote 5 hours 51 min ago:
            700 were csam, if I'm reading this right?
       
              rolph wrote 4 hours 55 min ago:
               700 CSAM images: even one is damning, but hundreds are often
               referred to as a cache or hoard, and normally anyone caught with
               that can wave bye-bye to their life.
               
               Google should be fully accountable for possession and
               distribution, perhaps even manufacturing.
       
              markatlarge wrote 5 hours 31 min ago:
              That is right:
              
 (HTM)        [1]: https://medium.com/@russoatlarge_93541/canadian-child-pr...
       
              wang_li wrote 5 hours 43 min ago:
              Perhaps these folks should work together to make patches to the
              dataset to remove the problematic images?
              
              E: But also make sure every image in the dataset is properly
              licensed. This would have eliminated this entirely from the get
              go. Playing fast and loose with the distribution rights to these
              images led to this problem.
       
        gillesjacobs wrote 7 hours 6 min ago:
        
        
 (HTM)  [1]: https://archive.ph/awvmJ
       
        amarcheschi wrote 7 hours 15 min ago:
         Just a few days ago I was doing a low-paid (well, not so low) AI
         classification task for a very big company - akin to Mechanical Turk
         ones - and was shown by the platform - involuntarily, since I guess
         they don't review images before showing them - an AI image depicting
         a naked man and a naked kid, though it was more Barbie-like than
         anything else. I didn't really enjoy the view, tbh. I contacted them
         but got no answer back.
       
          kennyloginz wrote 23 min ago:
          How can I find work like this?
       
          ipython wrote 6 hours 37 min ago:
          If the picture truly was of a child, the company is _required_ to
          report CSAM to NCMEC. It's taken very seriously. If they're not being
          responsive, escalate and report it yourself so you don't have legal
          problems.
          
          See [1] .
          
 (HTM)    [1]: https://report.cybertip.org/
       
            kotaKat wrote 5 hours 34 min ago:
            > It's taken very seriously
            
             Can confirm. The number of people I see in my local news getting
             arrested for possession that "... came from a cybertip escalated to
             NCMEC from " is... staggering. (And it's almost always Google Drive
             or GMail locally, but sometimes a curveball out there.)
       
            amarcheschi wrote 6 hours 26 min ago:
            Even if it's an Ai image? I will follow through contacting them
            directly rather than with the platform messaging system, then I'll
            see what to do if they don't answer
            
            Edit i read the informations given in the briefing before the task,
            and they say that there might be offensive content displayed. They
            say to tell them if it happens, but well I did and got no answer so
            weeeell, not so inclined to believe they care about it
       
              jfindper wrote 6 hours 25 min ago:
              >Even if it's an Ai image?
              
              This varies by country, but in many countries it doesn't matter
              if it is a drawing, AI, or a real image -- they are treated
              equally for the purposes of CSAM.
       
                amarcheschi wrote 6 hours 22 min ago:
                That's understandable
       
        giantg2 wrote 7 hours 19 min ago:
        This raises an interesting point. Do you need to train models using
        CSAM so that the model can self-enforce restrictions on CSAM? If so, I
        wonder what moral/ethical questions this brings up.
       
          boothby wrote 6 hours 23 min ago:
          I know what porn looks like.  I know what children look like.  I do
          not need to be shown child porn in order to recognize it if I saw it.
           I don't think there's an ethical dilemma here; there is no need if
          LLMs have the capabilities we're told to expect.
       
            Nevermark wrote 5 hours 30 min ago:
             That is a good point. Is the image highly sexual? Are there
            children in the image?
            
            Not a perfect CP detection system (might detect kids playing in a
            room with a rated R movie playing on a TV in the background), but
            it would be a good first attempt filter.
            
            Of course, if you upload a lot of files to Google Drive and run a
            sanity check like this on the files, it is too late to save you
            from Google.
            
            Avoiding putting anything with any risk potential on Google Drive
            seems like an important precaution regarding the growing tyranny of
             automated and irreversible judges & juries.
       
            jjk166 wrote 6 hours 11 min ago:
            AI doesn't know what either porn or children are. It finds
            correlations between aspects of inputs and the labels porn and
            children. Even if you did develop an advanced enough AI that could
            develop a good enough idea of what porn and children are, how would
            you ever verify that it is indeed capable of recognizing child porn
            without plugging in samples for it to flag?
       
              boothby wrote 5 hours 26 min ago:
               LLMs don't "know" anything. But as you say, they can identify
               correlations between content labeled "porn" and a target image,
               and between content labeled "children" and a target image. If a
               target image scores high in both, then it can flag child porn,
               all without being trained on CSAM.
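               
               As a rough illustration, something like the sketch below, where
               the two scoring functions are hypothetical stubs for whatever
               classifiers you actually trust, not real APIs:
               
                 def nsfw_score(image_path) -> float:
                     return 0.0  # placeholder for an NSFW classifier
                 
                 def minor_presence_score(image_path) -> float:
                     return 0.0  # placeholder for a subject-age classifier
                 
                 def needs_human_review(image_path, threshold=0.9) -> bool:
                     # Flag only when BOTH independent scores are high, and
                     # route it to a human reviewer, not an automated ban.
                     return (nsfw_score(image_path) > threshold and
                             minor_presence_score(image_path) > threshold)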
       
                jjk166 wrote 3 hours 16 min ago:
                But things correlated with porn != porn and things correlated
                with children != children. For example, in our training set, no
                porn contains children, so the presence of children would mean
                it's not porn. Likewise all images of children are clothed, so
                no clothes means it's not a child. You know it's ridiculous
                because you know things, the AI does not.
                
                Nevermind the importance of context, such as distinguishing a
                partially clothed child playing on a beach from a partially
                clothed child in a sexual situation.
       
              wang_li wrote 5 hours 38 min ago:
              So it is able to correlate an image as porn and also correlate an
              image as containing children. Seems like it should be able to
              apply an AND operation to this result and identify new images
              that are not part of the data set.
       
                jjk166 wrote 3 hours 6 min ago:
                No, it found elements in an image that it tends to find in
                images labelled porn in the training data. It finds elements in
                an image it tends to find in images labelled child in the
                training data. If the training data is not representative, then
                the statistical inference is meaningless. Images that are
                unlike any in the training set may not trigger either category
                if they are lacking the things the AI expects to find, which
                may be quite irrelevant to what humans care about.
       
                Nevermark wrote 5 hours 28 min ago:
                The AI doesn’t even need to apply the AND. Two AI queries.
                Then AND the results with one non-AI operation.
       
            cs02rm0 wrote 6 hours 20 min ago:
            They don't have your capabilities.
       
              mossTechnician wrote 6 hours 3 min ago:
              I've seen AI image generation models described as being able to
              combine multiple subjects into a novel (or novel enough) output
              e.g. "pineapple" and "skateboarding" becomes an image of a
              skateboarding pineapple. It doesn't seem like a reach to assume
              it can do what GP suggests.
       
          jsheard wrote 7 hours 15 min ago:
          It's a delicate subject but not an unprecedented one. Automatic
          detection of already known CSAM images (as opposed to heuristic
          detection of unknown images) has been around for much longer than AI,
          and for that service to exist someone has to handle the actual CSAM
          before it's reduced to a perceptual hash in a database.
          
          Maybe AI-based heuristic detection is more ethically/legally fraught
          since you'd have to stockpile CSAM to train on, rather than hashing
          then destroying your copy immediately after obtaining it.
       
            tcfhgj wrote 6 hours 45 min ago:
            > Maybe AI detection is more ethically fraught since you'd need to
            keep hold of the CSAM until the next training run,
            
            why?
            
            the damage is already done
       
              pseudalopex wrote 3 hours 8 min ago:
              Some victims feel this way. Some do not.
       
              tremon wrote 6 hours 20 min ago:
              Why would you think that? Every distribution, every view is
              adding damage, even if the original victim doesn't know (or even
              would rather not know) about it.
       
                tcfhgj wrote 5 hours 59 min ago:
                 I don't think that's how it works.
       
                jjk166 wrote 6 hours 7 min ago:
                I don't think AI training on a dataset counts as a view in this
                context. The concern is predators getting off on what they've
                done, not developing tools to stop them.
       
                  pseudalopex wrote 3 hours 9 min ago:
                  Debating what counts as a view is irrelevant. Some child
                  pornography subjects feel violated by any storage or use of
                  their images. Government officials store and use them
                  regardless.
       
        jsnell wrote 7 hours 27 min ago:
        As a small point of order, they did not get banned for "finding CSAM"
        like the outrage- and clickbait title claims. They got banned for
        uploading a data set containing child porn to Google Drive. They did
        not find it themselves, and them later reporting the data set to an
        appropriate organization is not why they got banned.
       
          markatlarge wrote 5 hours 33 min ago:
          I’m the person who got banned. And just to be clear: the only
          reason I have my account back is because 404 Media covered it. Nobody
          else would touch the story because it happened to a nobody. There are
          probably a lot of “nobodies” in this thread who might someday
          need a reporter like Emanuel Maiberg to actually step in. I’m
          grateful he did.
          
          The dataset had been online for six years. In my appeal I told Google
          exactly where the data came from — they ignored it. I was the one
          who reported it to C3P, and that’s why it finally came down. Even
          after Google flagged my Drive, the dataset stayed up for another two
          months.
          
          So this idea that Google “did a good thing” and 404 somehow did
          something wrong is just absurd.
          
          Google is abusing its monopoly in all kinds of ways, including
          quietly wiping out independent developers:
          
 (HTM)    [1]: https://medium.com/@russoatlarge_93541/déjà-vu-googles-usi...
       
          jfindper wrote 7 hours 12 min ago:
          >They got banned for uploading child porn to Google Drive
          
          They uploaded the full "widely-used" training dataset, which happened
          to include CSAM (child sexual abuse material).
          
          While the title of the article is not great, your wording here
          implies that they purposefully uploaded some independent CSAM
          pictures, which is not accurate.
       
            AdamJacobMuller wrote 6 hours 52 min ago:
            No but "They got banned for uploading child porn to Google Drive"
            is a correct framing and "google banned a developer for finding
            child porn" is incorrect.
            
            There is important additional context around it, of course, which
            mitigates (should remove) any criminal legal implications, and
            should also result in google unsuspending his account in a
            reasonable timeframe but what happened is also reasonable. Google
            does automated scans of all data uploaded to drive and caught CP
            images being uploaded (presumably via hashes from something like
            NCMEC?) and banned the user. Totally reasonable thing. Google
            should have an appeal process where a reasonable human can look at
            it and say "oh shit the guy just uploaded 100m AI training images
            and 7 of them were CP, he's not a pedo, unban him, ask him not to
            do it again and report this to someone."
            
             The headline frames it like the story was "A developer found CP
             in AI training data, and Google banned him in retaliation for
             reporting it." Totally disingenuous framing of the situation.
       
              jfindper wrote 6 hours 46 min ago:
              "There is important additional context around it, of course,"
              
              Indeed, which is why a comment that has infinitely more room to
              expand on the context should include that context when they are
              criticizing the title for being misleading.
              
              Both the title and the comment I replied to are misleading. One
              because of the framing, the other because of the deliberate
              exclusion of extremely important context.
              
              Imagine if someone accused you of "Uploading CSAM to Google
              Drive" without any other context. It's one of the most serious
              accusations possible! Adding like five extra words of context to
              make it clear that you are not a pedophile trafficking CSAM is
              not that much of an ask.
       
                jsnell wrote 6 hours 0 min ago:
                Fair enough. I'd already included the fact about it being a
                data set in the post once, which seemed clear enough especially
                when my actual point was that the author did not "find" the
                 CSAM, and by implication was not aware of it. But I have
                edited the message and added a repetition of it.
                
                I bet the journalists and editors working for 404 will not
                correct their intentionally misleading headline. Why hold a
                random forum post buried in the middle of a large thread to a
                 higher standard than the professionals writing headlines shown
                in 30-point font on the frontpage of their publication?
       
                  jfindper wrote 5 hours 58 min ago:
                  >Why hold a random forum post buried in the middle of a large
                   thread to a higher standard than the professionals writing
                  headlines shown in 30-point font on the frontpage of their
                  publication?
                  
                  How many times do I need to repeat that I agree the headline
                  is misleading? Yes, the article here has a shit title. You
                  already made that point, I have already agreed to that point.
                  
                  If I had an easy and direct line to the editor who came up
                  with the title, I would point that out to them. Unfortunately
                  they aren't on HN, that I'm aware, or I could also write a
                  comment to them similar to yours.
       
          jeffbee wrote 7 hours 25 min ago:
          Literally every headline that 404 media has published about subjects
          I understand first-hand has been false.
       
            amelius wrote 6 hours 39 min ago:
            Can we use AI to fix this?
            
            Make an LLM read the articles behind the links, and then rewrite
            the headlines (in a browser plugin for instance).
       
              add-sub-mul-div wrote 6 hours 25 min ago:
              HN already needlessly rewrites headlines with automation and it's
              more annoying to see automation go stupidly wrong than letting
              the original imperfect situation stand. Having outrage about
              headlines is a choice.
       
                amelius wrote 5 hours 51 min ago:
                I don't think HN's rewrite algorithm uses modern LLM
                techniques.
                
                Also, it could be optional. It probably should be, in fact.
       
                  shakna wrote 3 hours 53 min ago:
                  If you edit a title after posting, it will not be rewritten
                  again until a human at Y Combinator comes across it.
       
                  jeffbee wrote 5 hours 16 min ago:
                  My browser integrates an LLM, so I asked it to restate the
                  headline of this one, and it came up with "Developer
                  Suspended by Google After Uploading AI Dataset Containing
                  CSAM" which seems pretty even-handed. Of course, I would want
                  to dial the snark to 11. Many hacker stories can be headlined
                  "Developer discovers that C still sucks" etc.
       
        deltoidmaximus wrote 7 hours 31 min ago:
        Back when the first moat creation gambit for AI failed (that they were
        creating SkyNet so the government needs to block anyone else from
        working on SkyNet since only OpenAI can be trusted to control it not
        just any rando) they moved onto the safety angle with the same idea. I
        recall seeing an infographic that all the major players were signed
        onto some kind of safety pledge, Meta, OpenAI, Microsoft, etc.
        Basically they didn't want anyone else training on the whole world's
        data because only they could be trusted to not do nefarious things with
        it. The infographic had a statement about not training on CSAM and
        revenge porn and the like but the corpospeak it was worded in made it
        sound like they were promising not to do it anymore, not that they
        never did.
        
         I've tried to find this graphic again several times over the years
        but it's either been scrubbed from the internet or I just can't
        remember enough details to find it. Amusingly, it only just occurred to
        me that maybe I should ask ChatGPT to help me find it.
       
          jsheard wrote 7 hours 29 min ago:
          > The infographic had a statement about not training on CSAM and
          revenge porn and the like but the corpospeak it was worded in made it
          sound like they were promising not to do it anymore, not that they
          never did.
          
          We know they did, an earlier version of the LAION dataset was found
          to contain CSAM after everyone had already trained their image
          generation models on it.
          
 (HTM)    [1]: https://www.theverge.com/2023/12/20/24009418/generative-ai-i...
       
        bsowl wrote 7 hours 35 min ago:
        More like "A developer accidentally uploaded child porn to his Google
        Drive account and Google banned him for it".
       
          jkaplowitz wrote 7 hours 26 min ago:
          The penalties for unknowingly possessing or transmitting child porn
          are far too harsh, both in this case and in general (far beyond just
          Google's corporate policies).
          
          Again, to avoid misunderstandings, I said unknowingly - I'm not
          defending anything about people who knowingly possess or traffic in
          child porn, other than for the few appropriate purposes like
          reporting it to the proper authorities when discovered.
       
            burnt-resistor wrote 5 hours 0 min ago:
            That's the root problem with all mandated, invasive CSAM scanning.
            (Non-signature based) creates an unreasonable panopticon that leads
            to lifelong banishment by imprecise, evidence-free guessing. It
            also hyper-criminalizes every parent who accidentally takes a
            picture of their kid without being fully dressed. And what about
            DoS victims who are anonymously sent CSAM without their consent to
            get them banned for "possession"? While pedo is gross and evil no
            doubt, but extreme "think of the children" measures that sacrifice
            liberty and privacy create another evil that is different. Handing
            over total responsibility and ultimate decision-making for critical
            matters to a flawed algorithm is lazy, negligent, and immoral.
            There's no easy solution to any such process, except requiring
            human review should be the moral and ethical minimum standard
            before drastic measures (human in the loop (HITL)).
       
            jjk166 wrote 6 hours 22 min ago:
            The issue is that when you make ignorance a valid defense, the
            optimal strategy is to deliberately turn a blind eye, as it reduces
            your risk exposure. It further gives refuge for those who can
            convincingly feign ignorance.
            
            We should make tools readily available and user friendly so it is
            easier for people to detect CSAM that they have unintentionally
            interacted with. This both shields the innocent from being falsely
            accused, and makes it easier to stop bad actors as their activities
            are detected earlier.
       
              pixl97 wrote 3 hours 43 min ago:
               No, it should be law enforcement's job to determine intent, not
               a blanket "you're guilty." Treating this as pure actus reus,
               with no mens rea required, is a huge mess that makes it easy to
               frame people and to get in trouble with no guilty intent.
       
                jjk166 wrote 2 hours 42 min ago:
                Determining intent takes time, is often not possible, and
                encourages people to specifically avoid the work to check if
                something needs to be flagged. Not checking is at best
                negligent. Having everybody check and flag is the sensible
                option.
       
       
 (DIR) <- back to front page