[HN Gopher] US Government plans to develop AI that can unmask an...
       ___________________________________________________________________
        
       US Government plans to develop AI that can unmask anonymous writers
        
       Author : NickRandom
       Score  : 81 points
       Date   : 2022-09-30 13:59 UTC (9 hours ago)
        
 (HTM) web link (reclaimthenet.org)
 (TXT) w3m dump (reclaimthenet.org)
        
       | frozenlettuce wrote:
       | The thing is, AI is a good mask for a "backend process that you
       | don't need to explain how it works". Assuming that the US
       | government already has private conversations on multiple content
       | and messaging platforms, this AI will provide the perfect excuse
       | for connecting a blog post with a given id in a process.
        
         | philipkglass wrote:
         | My first cynical take was that this will be used for "hunch
         | laundering." There could be no indication that user A is an
         | alias for user B, other than someone's hunch, but getting a
         | computer to say that they match might be good enough to get a
         | warrant when someone's hunch wouldn't be. It would be similar
         | to having drug sniffing dogs affirm their handlers' feelings.
        
       | hoosieree wrote:
       | Creepy factor aside, a similar tool for attribution would be very
       | useful for content creators (or copyright holders) currently
       | worried about stable diffusion.
        
       | hilbert42 wrote:
       | In my case and I suppose for most HN posters there'd be little
       | point--for on demand Ycombinator would be compelled to hand over
       | email and IP addresses to Government--as I'd reckon in most
       | instances that'd be a much easier and faster way of obtaining
       | relevant information.
       | 
       | In an era when privacy has become hugely diminished under the
       | hands of both governments and corporate interests it raises the
       | question of what rights to anonymity anyone has in either a
       | public or private forum, and at present there's little if any
       | consensus on this which ought to signal that any such project is
       | premature.
       | 
       | Unlike yours truly--who usually speaks his mind irrespective of
       | whether he's known to his audience or does so anonymously--many
       | will not speak their minds out of fear of being ridiculed, or
       | humiliated, or exposed, or out of the risk of offending--risking
       | the breakup of a friendship, etc. Same goes for whistleblowers
       | whose public utterances, if not done anonymously, usually cost
       | them their jobs.
       | 
       | If people fear that their autonomy to act in an anonymous manner
       | has been removed then they're unlikely to act at all, silence
       | being the better part of discretion.
       | 
       | This would have huge negative repercussions for society, our
       | institutions and our governance--after all, the secret ballot is
       | one of the cornerstones of our democracies. If we're not careful
       | AI could undermine the ballot by unmasking what users think or
       | how they actually vote and it's not hard to see how this would
       | lead to coercion thence totalitarian government.
       | 
       | That said, in this world of widespread almost instant
       | communications, actors who intentionally act out of bad faith can
       | do widespread damage, especially so when they do so anonymously.
       | Knowing who they are would minimize the damage they are able to
       | cause.
       | 
       | Similarly, in a distantly-related post on HN a few days ago I
       | referred to the increasing loss of respect for our important
       | institutions and for the way we're being governed and how I
       | thought that faith could be restored. There, I suggested that as
       | a part of that process we need to unmask the hidden processes of
       | government and that this would also include the naming of those
       | who originate policy, law, etc.:
       | 
       |  _" If we're to restore any faith in our governance then this
       | protection [hiding originators of policy] must stop. Decisions
       | made by government employees must be open to public scrutiny,
       | similarly, the origins of government policy--laws, regulations
       | etc.--must be traceable back to its source (those who initiated
       | said policies).
       | 
       | Systems without accountability will always become corrupt."_
       | 
       | Thus, there's a real dichotomy at work here. For some things
       | anonymity is essential, at other times it's a curse. And from the
       | many recent instances of where the gnomes within government
       | haven't acted in our best interests then I'm damned sure that
       | putting AI to work here won't bode well for us either.
       | 
       | I've little doubt that the technology will be abused, and by
       | virtue of the fact it will automatically silence a large
       | proportion of the population who need to speak out and who should do
       | so anonymously in the interests of all. Even if they aren't
       | targeted directly just knowing that there are systems in place
       | that have the potential to expose them would be sufficient to
       | silence many--as AI analysis of their words could be used to
       | determine their identity at any future time (living with ongoing
       | stress from potential exposure of one's ID would likely be
       | unbearable for some).
       | 
       | Given past history and current bad behavior of governments in
       | these areas, I do not believe that it is possible to put such a
       | system in place that would gain the full confidence of all
       | players involved. It would have to have sufficient protections
       | locked in place to provide full public accountability as well as
       | having inbuilt mechanisms that would ensure the system could not
       | be abused by governments. At present, such conditions cannot be
       | realistically met--not by a long shot.
       | 
       | Before anyone or any entity could let AI loose on this project
       | and simultaneously state with all honesty that sufficient
       | protections were in place for the project to proceed with safety
       | would require many other prerequisite protections and 'safety
       | measures'--which currently do not exist--to be incorporated
       | (locked) into our governance. For instance, a whole raft of
       | definitions and concomitant laws pertaining to privacy are needed
       | --and that's just for starters.
       | 
       | No doubt this project will proceed without those prerequisite
       | protections, ipso facto, it will also be abused.
       | 
       |  _PS: note my quoted point about government policy etc. being
       | open to public scrutiny. Here such questions arise such as where
       | did this idea originate, what are the names of its instigators
       | and what are their motives for instigating this development--not
       | to mention others such as what are their qualifications,
       | experience, etc. (perhaps, given the enormous potential of this
       | AI application to damage society, we may even need to pose
       | questions concerning their political beliefs and allegiances).
       | 
       | It's no accident that this information is missing with this
       | announcement._
        
       | throwamon wrote:
       | Didn't they already use something like this to supposedly unmask
       | Satoshi Nakamoto?
        
         | Grimburger wrote:
         | Nakamoto Satoshi has not been unmasked despite the very short
         | list of people capable/interested in creating what he made.
         | 
         | Stylometric analysis did suggest a single person on that list.
         | The easier thing for governments to do at the time would have
         | been to just spin up a node in the first year and look at the
         | IP addresses.
         | 
         | He had no desire to become known back then and likely never
         | will. It's only more dangerous now compared to the threat
         | before of being locked up like the LibertyCoin guy (who just
         | got released a year ago).
         | 
         | NS is happy to stay in the shadows, nearly everyone respects
         | that decision especially in a world of crypto scams and ponzis.
         | Surprised they never linked the domain name purchase to him
         | though.
        
       | ortusdux wrote:
       | I wonder how things like Gmail's smart auto-complete would affect
       | these efforts.
        
       | ezekg wrote:
       | Don't talk bad about your government, folks. We're going to be
       | entering a new age of technological tyranny.
        
         | orangepurple wrote:
         | @ezekg, GovAI has detected that your post violates community
         | guidelines. Your COVID pass is RED for 72 hours to protect
         | yourself and others.
        
         | imglorp wrote:
         | The government appears more worried about managing dissent than
         | anything else, like what actual threats people are talking
         | about.
        
         | alexbiet wrote:
         | @ezekg, GovAI has detected unlawful talk posted from your
         | account. Your CBDC account is locked for 48 hours.
        
           | brippalcharrid wrote:
           | Further violations will lead to the balances of your close
           | friends and family being adjusted by -20%, and the balances
           | of acquaintances being adjusted by -5%. Help protect against
           | the threat of misinformation and safeguard your Balance for
           | up to 28 days by reporting anything that you think could lead
           | to harm. Remember, We're All In This Together.
        
             | Psychoshy_bc1q wrote:
             | bitcoin fixes this.
        
             | ezekg wrote:
             | > We're All In This Together.
             | 
             | Sent chills down my spine.
        
       | daniel-cussen wrote:
       | This is why I write my shit under my own legal name, even going
       | to notarize this account at some point. Yeah throwaway yeah.
       | Anonymous speech. Oh yeah darknet, Tor, cryptography, like yes
       | sometimes, but it's a game of cat and mouse, it's purely a
       | question of cost.
       | 
       | Furthermore I consider games like poker or Magic the Gathering
       | unplayable, that is the extent to which there is literally
       | absolutely no privacy.
       | 
       | But don't mind me, just got lobotomized is all.
        
       | xani_ wrote:
       | I'm sure it will not be used in a malicious way
        
       | blakesterz wrote:
       | There's a great book on this kind of thing, Author Unknown: On
       | the Trail of Anonymous, by Don Foster. This was written way back
       | in 2000.
       | 
       | People have been doing this for decades.
        
         | runjake wrote:
         | _> People have been doing this for decades._
         | 
         | The _key point_ here is that it's AI-driven and at scale -- in
         | other words, mass surveillance.
         | 
         | Personally, I see this as part of the US IC's mission, despite
         | the potential domestic detriment.
        
       | LinuxBender wrote:
       | Here [1] is a previous discussion on this as well.
       | 
       | [1] - https://news.ycombinator.com/item?id=33009545
        
       | Zigurd wrote:
       | Moreover, a stylometry analysis will reveal when I started to
       | limit myself to one "moreover" for every two or three chapters.
        
       | narrator wrote:
       | How about using this to find bot accounts?
        
       | michaelwww wrote:
       | I'm old so I've been planning on this for a while. I have a folder
       | that contains all my personal data: photos and videos, journal,
       | all my saved social media posts, all my emails and all my
       | anonymous handles leading to everything I've ever written online.
       | My thinking is an AI could create a reasonable facsimile of
       | myself that my descendants could have a conversation with. I
       | think it'd be better than an autobiography since Joyce Carol Oates
       | convinced me by something she tweeted that no one reads
       | autobiographies, not even close family, unless you are famous.
        
         | oneoff786 wrote:
         | If I had a bot that replicated my great great ancestor I'd
         | probably get bored quickly and then try to prod it into
           | revealing its deeply outdated and inappropriate social views
        
           | michaelwww wrote:
           | You're right, the novelty would wear off quickly, but it
           | doesn't hurt anything for me to organize a data set about
           | myself just in case
        
       | lwneal wrote:
       | The best protection against this type of de-anonymization is to
       | take measures now, while you still have time, to prevent it. It
       | is possible to change the style of one's writing by using a
       | language model which alters the original text in order to create
       | a new piece with a different style. For example, to translate
       | your text into the grandiose and flowing diction of a bygone era,
       | you might consider the project below.
       | 
       | [1] https://github.com/lwneal/victorianhackernews
        
         | boarnoah wrote:
         | There is a case to be made, not just for natural language but
         | code.
         | 
         | AFAIK there are quite a few examples from security labs where
         | malware authors aren't necessarily identified but at least
         | fingerprinted based on naming conventions, patterns they use
         | across multiple projects etc...
         | 
         | That sort of fingerprinting could expand to correlating
         | someone's anonymous software projects to other examples of code
         | elsewhere (ex: if they contribute to source available stuff).
         | 
         | re: the example project you mention specifically, it does feel
         | like using tools like that almost as a linter for natural
         | language would be a fingerprint in itself.
         | 
         | EDIT: As far as OPSEC goes, a fun tidbit. A friend of mine
         | identified a PR I submitted anonymously to them, simply because
         | of the style of PR comments I made.
        
         | doliveira wrote:
         | Aren't GANs all about creating both the generator and the
         | discriminator? Seems to me you can also build the "reverser"
         | quite easily.
        
         | sn41 wrote:
         | I find the examples given in the README to be quite tame for
         | Victorian English. Compare it with the ending lines of A Tale
         | of Two Cities:
         | 
         | "It is a far, far better thing that I do, than I have ever
         | done; it is a far, far better rest that I go to than I have
         | ever known.",
         | 
         | or this from Pride and Prejudice:
         | 
         | "However little known the feelings or views of such a man may
         | be on his first entering a neighbourhood, this truth is so well
         | fixed in the minds of the surrounding families, that he is
         | considered as the rightful property of some one or other of
         | their daughters."
        
         | MichaelCollins wrote:
         | Tools like this probably fool traditional stylometry, but what
         | about de-anonymization tools that find similar _ideas_, not
         | writing style? Perhaps most people have boring common ideas
         | they got from others, but the sort of people the US Government
         | is most interested in are likely quirkier than most.
        
       | Kukumber wrote:
       | They demonized China for doing things like that
       | 
       | Now they'll copy China
       | 
       | I find it very funny
        
         | xani_ wrote:
         | "How dare they do it before us"
         | 
         | USA next decade:
         | 
         | "Posting bad things on twitter reduces your credit score"
        
           | hardnose wrote:
           | Credit scores are not determined by the government. If a
           | credit ratings agency took that step, it would harm them
           | because Twitter posts are unlikely to represent a meaningful
           | variable when predicting someone's creditworthiness.
           | 
           | Don't confuse that with "social credit" systems, whereby
           | China prevents you from riding trains if you say something
           | naughty.
        
             | wahnfrieden wrote:
             | (citation needed - that's not actually implemented in china
             | at scale, though it's a convenient talking point in the
             | west)
        
               | egberts1 wrote:
               | It is called 社会信用体系 (Shehui Xinyong Tixi).
               | 
               | Try and keep up.
               | 
               | https://en.m.wikipedia.org/wiki/Social_Credit_System
        
               | wahnfrieden wrote:
               | Try and read.
               | 
               | Where exactly does it say it's ever progressed beyond
               | trials and announcements ie implemented at scale - oh it
               | doesn't
        
               | egberts1 wrote:
               | seek and ye shall find
               | 
               | https://nhglobalpartners.com/china-social-credit-system-
               | expl...
        
               | wahnfrieden wrote:
               | thank you. the most specific citation I could find in
               | your link was this, regarding the 80% rollout statistic:
               | 
               | >As of December 2020, more than 80 percent of all the
               | provinces, autonomous regions, and municipal cities had
               | issued or were preparing to issue local credit laws and
               | regulations.
        
               | egberts1 wrote:
               | I'm quite sure they are trying to automate this as well.
        
             | egberts1 wrote:
             | Or that VP of Apple quoting from a movie called "Arthur" at
             | an auto show resulting in him being fired.
             | 
             | It has begun right here in the United States and we are just
             | oblivious to this new dastardly form of social credit.
        
             | [deleted]
        
             | Kukumber wrote:
             | government, FANG, it's all the same, the same group of
             | lobby talking and coordinating with each other, hiring CIA
             | agents and bunch of 'friends'
             | 
             | https://mronline.org/2022/07/27/national-security-search-
             | eng...
             | 
             | that's why the US wants to ban TikTok asap, because they
             | don't want china to be able to do what they have been doing
             | for decades too
             | 
             | > it would harm them because Twitter posts are unlikely to
             | represent a meaningful variable when predicting someone's
             | creditworthiness.
             | 
             | people get fired and arrested already for posting stuff on
             | twitter, in both the US and Europe, so no, it's not just
             | a "twitter moderation" thing
             | 
             | Ask yourself why they are allowed to exist and still
             | operate despite being unable to grow and losing money for
             | years, talk about anti-competitive practices, unless it's
             | in reality a government body in disguise
        
               | hardnose wrote:
               | > government, FANG, it's all the same
               | 
               | If you believe that, then you also support efforts to
               | force big tech to respect freedom of political speech,
               | yes?
        
         | hardnose wrote:
         | Developing a technology that can do that doesn't infringe on
         | anyone's liberties.
         | 
         | Failing to develop that technology, leaving it to China or some
         | other authoritarian state to do, would be more likely to harm
         | liberties, wouldn't it?
         | 
         | Seems like you're just "damned if you do, damned if you
         | don't"-ing, no offense.
        
           | pessimizer wrote:
           | > Developing a technology that can do that doesn't infringe
           | on anyone's liberties.
           | 
           | Why are you making this up?
        
       | bitL wrote:
       | "Hey Joe, run that article of yours through the anonymizing AI
       | first!"
        
       | 1970-01-01 wrote:
       | Voynich manuscript ---> US Govt AI stylometry machine ---> 42
        
       | PointA2B wrote:
       | Content marketers in the digital marketing space commonly put
       | blog posts through "spinners" that take your text and modify it
       | through replacing words/phrases with similar equivalents. This
       | lets you take one article and turn it into 5-10+ unique ones,
       | even though they still discuss the same things. It would be a
       | shame if a service like this was marketed towards those
       | interested in privacy, it would probably break this entire
       | system...
        
         | ElementaryElk wrote:
         | I've found plenty of articles that seem to be run through these
         | spinners, but hand made corrections are likely to be necessary
         | (unless it can be automated with ML for example) as you can
         | almost always tell that something is odd based on word
         | choices that ignore context
        
           | PointA2B wrote:
           | And that's exactly what newer programs do, look at AppSumo and
           | it's practically all of them. The older gen simply used a
           | giant dictionary, then picked a random option from the list
           | of acceptable choices.
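
The older-generation approach described above (a big dictionary plus a random pick from the acceptable choices) can be sketched roughly as follows. This is a minimal illustration, not any real product's code: the `SYNONYMS` table, `spin`, and `variants` are hypothetical names, and the table is far smaller than anything a real spinner would use.

```python
import random

# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "article": ["post", "piece"],
}

def spin(text, rng):
    """Swap each known word for a random choice among itself and its synonyms.

    Words are matched after lowercasing and only when they carry no attached
    punctuation, so "article." is left alone in this sketch.
    """
    out = []
    for word in text.split():
        key = word.lower()
        options = SYNONYMS.get(key)
        if options:
            out.append(rng.choice([key] + options))
        else:
            out.append(word)
    return " ".join(out)

def variants(text, n, seed=0):
    """Produce n spun versions of the same text from one seeded generator."""
    rng = random.Random(seed)
    return [spin(text, rng) for _ in range(n)]

if __name__ == "__main__":
    for line in variants("a big article needs a quick edit", 3):
        print(line)
```

Because each word is replaced without looking at its neighbours, the output tends to show exactly the odd word choices the parent comment complains about.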
        
       | swayvil wrote:
       | What we need is a speaking style anonymizer. Like a language
       | translator except it translates your text into some kind of
       | stylistic uniformness.
       | 
       | We're flexible on that format. Anything legible and relatively
       | easy to translate into. Call it ANONSPEAK.
       | 
       | It will probably be, aesthetically, horrible.
        
         | hoosieree wrote:
         | There exists such a practice, and it is known as academic
         | publishing.
         | 
         | Every verb is done in passive voice, punctuation is added -
         | wherever possible - to make sentences appear more complex than
         | they need be, and of course there is an effervescent use of
         | sesquipedalian terms where shorter similes would otherwise
         | suffice.
        
         | __jambo wrote:
         | Seems very doable given the state of google translate.
         | 
         | Trouble is if you are a revolutionary leader of some kind you
         | are probably going to be saying new things that no-one else
         | talks about - which renders both anonspeak and the ai detection
         | kind of redundant.
         | 
         | I guess the application for this then is in the interim to stop
         | people or online groups becoming revolutionary by tracking and
         | deradicalising them with targeted manipulation.
        
       | dehrmann wrote:
       | _Plans?_ I assumed lots of people were already working on this.
       | There's already a lot of training data out there, and I suspect
       | most users can be identified by use of a handful of uncommon
       | trigrams and sentence stats. I know you can recognize things I've
       | written at work because they have real em dashes--people rarely
       | type with those.
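
As a rough illustration of the "uncommon trigrams and sentence stats" idea, the sketch below uses character trigrams and treats "uncommon" as "never seen in a set of background texts". Both choices are assumptions made for the example, and the function names are hypothetical.

```python
import re
from collections import Counter

def char_trigrams(text):
    """Count overlapping 3-character sequences after collapsing whitespace."""
    text = re.sub(r"\s+", " ", text.lower())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def distinctive_trigrams(author_text, background_texts, top=5):
    """Trigrams the author uses that never occur in the background texts,
    ordered by how often the author uses them (ties broken alphabetically)."""
    mine = char_trigrams(author_text)
    background = Counter()
    for other in background_texts:
        background.update(char_trigrams(other))
    rare = {gram: n for gram, n in mine.items() if background[gram] == 0}
    return sorted(rare, key=lambda gram: (-rare[gram], gram))[:top]

def sentence_stats(text):
    """Number of sentences and mean sentence length in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return {"sentences": len(lengths), "mean_words": mean}
```

A habit like typing a real em dash shows up directly: every trigram containing that character is absent from text written by people who type a plain hyphen instead.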
        
         | ben_w wrote:
         | I briefly considered making one. My idea was simple -- build a
         | Markov chain for each person plus the text with the unknown
         | author, do a dot product of the intersection of all the
         | chains, pick the author with the best match. Never got around
         | to it. Perhaps this weekend?
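
One possible reading of this idea, as a minimal sketch: each "chain" here is just a table of word-to-next-word transition counts, scaled to unit length so that longer samples do not win by volume. That scaling, and the names `chain`, `similarity`, and `best_match`, are assumptions added for the example rather than anything the commenter specified.

```python
import math
from collections import Counter

def chain(text):
    """Word-to-next-word transition counts, scaled to unit (L2) length."""
    words = text.lower().split()
    counts = Counter(zip(words, words[1:]))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {pair: c / norm for pair, c in counts.items()}

def similarity(a, b):
    """Dot product over the transitions that appear in both chains."""
    return sum(a[pair] * b[pair] for pair in a.keys() & b.keys())

def best_match(unknown_text, known_texts):
    """Return (best author name, all scores) for the unknown text."""
    target = chain(unknown_text)
    scores = {name: similarity(target, chain(text))
              for name, text in known_texts.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    known = {
        "alice": "the cat sat on the mat the cat ran",
        "bob": "stocks fell sharply as markets opened lower today",
    }
    print(best_match("the cat sat on the sofa", known))
```

With unit-length vectors the dot product is a cosine similarity, so scores fall between 0 and 1 and an author sharing no transitions with the unknown text scores exactly 0.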
        
         | MonkeyMalarky wrote:
         | Someone unintentionally did something similar with HN comments
         | to find users who sound most similar to you/each other and
         | people were finding their throwaway and alt accounts.
        
           | vmoore wrote:
           | Yes I recall that: 'Find Your Hacker News Doppelganger':
           | 
           | https://news.ycombinator.com/item?id=27568709
        
             | dhosek wrote:
             | Interesting. There was commentary about finding
             | anonymous/throwaway accounts, but on mine, I did not find
             | my anonymous account (although I use that very rarely). The
             | accounts that turned up seemed to be all real, and in a
             | couple cases I could guess what might have made the checker
             | match us (e.g., mentions of MFAs or Apple //e or similar
             | politics), but not all. I didn't notice any linguistic
             | similarities.
        
               | rkagerer wrote:
               | Yeah back then I felt mine didn't produce good matches
               | either.
        
             | ravenstine wrote:
             | Neat experiment, though it took me fewer than 30 seconds to
             | rule out my nearest "doppelganger". Too many patterns there
             | I've never used.
        
         | copperx wrote:
         | About 15 years ago, at JHU, I heard about an algorithm that
         | detected a writer's gender with more than 90% accuracy, and
         | the NLP professor considered that problem solved.
        
         | kps wrote:
         | I use em dashes -- though with space -- and also ellipses...
         | Also 2×4s (that are actually 1½×3½), where 2 [?] 4 above
         | −270 °C, and footnotes¹ and all that.
         | 
         | ¹ There.
        
         | [deleted]
        
         | wsinks wrote:
         | I forget that they're called em dashes -- but I also love to
         | use them to offset what I say from other people.
        
           | Cupertino95014 wrote:
           | They've broken more than one Python script of mine. That'll
           | make you declare everything UTF-8.
        
           | pessimizer wrote:
           | If you have a compose key, an en dash is compose + (-) (-) (.)
           | 
           | Or compose + (-) (-) (-) for an em dash, which is wider depending
           | on the font.
        
         | lajamerr wrote:
         | The next step after is to build an AI to convert your style of
         | writing to another person's style.
        
           | notadev wrote:
           | Running all writing through AI to re-write everything in a
           | different style while conveying the same information might
           | work. Create a way to apply it to all online writing and you
           | have something like a writing VPN.
        
           | klabb3 wrote:
           | Exactly. You barely need "AI" for this. Changing style enough
           | to put some sand in a sophisticated text identifier could be
           | as simple as introducing some spelling and grammar
           | variations. It could easily be commoditized (isn't grammarly
           | doing this already minus the privacy part?). Of all cat and
           | mouse games LE and intelligence are playing, for this one I'm
           | betting on the mouse.
        
         | badrabbit wrote:
         | NSA already does this. "Stylometry" I believe is the term.
           | Perhaps they used heuristics and algorithms so far? I would think
           | NLP was good enough for this years ago.
        
           | dehrmann wrote:
           | Back in 2015 you might have some linguists work with data
           | scientists to do feature engineering and use those as inputs
           | to an LR model. I suppose you can let a deep learning model
           | feature engineer for you, but either way, you'll get to some
           | of the same heuristics you're thinking of.
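
For a sense of what hand-built features might look like, here is a small sketch that turns a text into a fixed-length vector of function-word and punctuation rates. The word and punctuation lists are illustrative assumptions, and fitting the LR model on these vectors is left out.

```python
import re

# Assumed, deliberately tiny lists; real feature sets would be much larger.
FUNCTION_WORDS = ["the", "of", "and", "to", "but", "however", "which"]
PUNCTUATION = [",", ";", ":", "--"]

def feature_vector(text):
    """Per-word rates of each function word, then of each punctuation mark."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)  # avoid dividing by zero on empty input
    word_rates = [words.count(w) / total for w in FUNCTION_WORDS]
    punct_rates = [text.count(p) / total for p in PUNCTUATION]
    return word_rates + punct_rates

if __name__ == "__main__":
    print(feature_vector("The cat and the dog; however, the end."))
```

Each position in the vector always means the same thing, which is what lets a linear model learn one weight per habit.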
        
       | Zak wrote:
       | I built something like that more than a decade ago to identify
       | alter-egos in an online game from in-game chat. It was reasonably
       | successful and I thought about commercial applications for it,
       | but ultimately decided most things that could be used for are
       | creepy or evil.
       | 
       | I remember hearing DARPA was actively seeking research in the
       | field around that time. In principle, I'm not absolutely against
       | my software being part of the chain of events that leads to the
       | decision to kill someone, but I don't trust the US government
       | (or, realistically, anybody else) to independently verify an
       | identification made by such a system.
       | 
       | I'd be surprised if the three letter agencies aren't using
       | something at least as good as what I wrote by now.
        
         | copperx wrote:
         | How sophisticated was your system? It sounds like you used
         | cutting edge NLP techniques at the time.
        
           | Zak wrote:
           | I'd describe it as fairly simple. It was just a classifier
           | where each account name was a category: there was no fancy
           | NLP. It used a single feature type and an algorithm from a
           | well-known family. I don't want to say what either was lest I
           | further proliferate the technique.
           | 
           | I cross checked using statistically improbable words, which
           | helped confirm or exclude weak matches.
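
The classifier itself is deliberately not described above, so the sketch below covers only the cross-check, and only as one guess at what "statistically improbable words" could mean: words an account uses far more often than the corpus as a whole. The scoring formula and all names are assumptions, not the commenter's method.

```python
from collections import Counter

def improbable_words(account_text, corpus_counts, corpus_total, top=10):
    """Words whose rate in this account most exceeds their corpus-wide rate."""
    words = account_text.lower().split()
    counts = Counter(words)
    n = len(words) or 1

    def lift(word):
        # Add-one smoothing keeps the expected rate above zero.
        expected = (corpus_counts[word] + 1) / (corpus_total + 1)
        return (counts[word] / n) / expected

    ranked = sorted(counts, key=lambda w: (-lift(w), w))
    return set(ranked[:top])

def shared_improbable(a_text, b_text, all_texts, top=10):
    """Improbable words that two accounts have in common."""
    corpus_counts = Counter()
    for text in all_texts:
        corpus_counts.update(text.lower().split())
    total = sum(corpus_counts.values())
    return (improbable_words(a_text, corpus_counts, total, top)
            & improbable_words(b_text, corpus_counts, total, top))
```

A non-empty overlap would count in favour of a weak match and an empty one against it, which fits the confirm-or-exclude role described above.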
        
       | eftychis wrote:
       | https://news.ycombinator.com/item?id=33037319
       | 
       | https://news.ycombinator.com/item?id=33034918
       | 
       | Am I the only one seeing the irony and contradiction here? Not
       | the same people at all -- subgroups at best, but the Director of
       | National Intelligence is part of the administration. Perhaps I am
       | missing something -- feel free to comment -- I am curious what
       | everyone thinks.
        
       | causi wrote:
       | Wouldn't it be more accurate to say it links anonymous writings
       | together? If you don't have any writings under your real name
       | there's nothing it can do except indicate two pseudonyms belong
       | to the same person.
        
         | andy_ppp wrote:
         | You're assuming they don't have copies of every email you ever
         | sent...
        
       | vmoore wrote:
       | If I want to write anonymously, I cycle my text through Google
       | Translate multiple times and keep all the grammatical errors
       | intact. So, English > Italian, and then Italian > French, then
       | back to English.
       | 
       | I also pass it into Hemingway[0] first to make my text lean and
       | non-superfluous.
       | 
       | [0] https://hemingwayapp.com/
        
       | unethical_ban wrote:
       | So we'll have AI that can mask a writer's identity in about three
       | months.
        
         | lm28469 wrote:
         | Text to text AI, it should be relatively easy to do too
        
           | MengerSponge wrote:
           | Basically exists already, right? A lot of college kids are
           | using it for their writing assignments.
        
         | ezekg wrote:
         | It'll be AIs all the way down.
        
         | yamtaddle wrote:
         | But it'll erase all my anachronistic grammatical and style
         | preferences! Then what's even the point of writing?
         | 
         | (it's _rôle_ and _&c._, goddamnit)
        
         | DharmaPolice wrote:
         | If precision isn't super important you can already run your
         | text through machine translation into another language and then
         | back again. Spammy content sites already seem to do that to
         | avoid copyright detection.
        
       | glitchc wrote:
       | We can get around this by writing something and sending it
       | through GPT-3 for "style correction".
        
       ___________________________________________________________________
       (page generated 2022-09-30 23:01 UTC)