[HN Gopher] Launch HN: Reality Defender (YC W22) - Deepfake Detection Platform
       ___________________________________________________________________
        
       Launch HN: Reality Defender (YC W22) - Deepfake Detection Platform
        
       Hi HN, we're Ben, Gaurav and Ali from Reality Defender
       (https://www.realitydefender.ai). We help companies, governments,
       and journalists determine if media is real or fake, focusing on
       audio, video and image manipulation. Our API and web app provide
       real-time scanning, risk scoring, and PDF report cards.  Recent
       advancements in machine learning make it possible to create images,
       videos and audio of real people saying and doing things they never
       said or did. The recent spread of this technology has enabled
       anyone to create highly realistic deepfakes. Although some
       deepfakes are detectable to the eye by experienced observers who
       look closely, many people either don't have experience or are not
       always looking closely--and of course the technology is only
       continuing to improve. This marks a leap in the ability of bad
       actors to distort reality, jeopardizing financial transactions,
       personal and brand reputations, public opinion, and even national
       security.  We are a team with PhD and master's degrees in data
       science from Harvard, NYU, and UCLA. Between us, we have decades
       of experience at Goldman Sachs, Google, the CIA, the FDIC, the
       Dept. of Defense, and Harvard University Applied Research, working
       at the intersection of machine learning and cybersecurity. But our
       current work began with a
       rather unlikely project: we tried to duplicate Deepak Chopra. We
       were working with him to build a realistic deepfake that would
       allow users to have a real-time conversation with "Digital Deepak"
       from their iPhones. Creating the Deepak deepfake was surprisingly
       simple and the result was so alarmingly realistic that we
       immediately began looking for models that could help users tell a
       synthetic version from the real thing.  We did not find a reliable
       solution. Frustrated that we'd already spent a week on something we
       thought would take our coffee break, we doubled down and set out to
       build our own model that could detect manipulated media.  After
       investigating, we learned why a consistently accurate solution
       didn't exist. Companies (including Facebook and Microsoft) were
       trying to build their own silver-bullet, single-model detection
       methods--or, as we call it, "one model to rule them all." In our
       view, this approach will not work because adversaries and the
       underlying technologies are constantly evolving. For the same
       reason, no single model will ever solve virus or malware
       detection.  We believe that any serious solution to this problem
       requires a "multi-model'' approach that integrates the best
       deepfake detection algorithms into an aggregate "model of models."
       So we trained an ensemble of deep-learning detection models, each
       of which focuses on its own feature, and then combined the scores.
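       A rough sketch of the aggregation idea in Python (the detectors,
       weights, and scores below are illustrative placeholders, not our
       production models):

           from typing import Callable, Dict

           Detector = Callable[[bytes], float]   # media bytes -> P(fake)

           def ensemble_score(media: bytes,
                              detectors: Dict[str, Detector],
                              weights: Dict[str, float]) -> float:
               """Weighted average of per-model fake probabilities."""
               total = sum(weights[name] for name in detectors)
               combined = sum(weights[name] * detect(media)
                              for name, detect in detectors.items())
               return combined / total

           # Three hypothetical detectors, each focused on one feature.
           detectors = {
               "face_warp":       lambda m: 0.91,   # placeholder scores
               "audio_artifacts": lambda m: 0.40,
               "gan_fingerprint": lambda m: 0.72,
           }
           weights = {"face_warp": 1.0, "audio_artifacts": 0.5,
                      "gan_fingerprint": 1.5}
           print(ensemble_score(b"...", detectors, weights))  # ~0.73
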
       We challenged ourselves to build a scalable solution that
       integrates the best of our deepfake detection models with models
       from our collaborators (Microsoft, UC Berkeley, Harvard). We began
       with a web app proof of concept, and quickly received hundreds of
       requests for access from governments, companies, and researchers.
       Our first users turned to our platform for some deepfake scenarios
       ranging from bad to outright scary: Russian disinformation directed
       at Ukraine and the West; audio mimicking a bank executive
       requesting a wire transfer; video of Malaysia's government
       leadership behaving scandalously; pornography where participants
       make themselves appear younger; and dating profiles with
       AI-generated profile pics. All of these, needless to say, are
       completely fake!  As
       with computer viruses, deepfakes will continue evolving to
       circumvent current security measures. New deepfake detection
       techniques must be as iterative as the generation methods. Our
       solution not only accepts that, but embraces it. We quickly
       onboard, test, and tune third party models for integration into our
       model stack, where they can then be accessed via our web app and
       API. Our mission has attracted dozens of researchers who contribute
       their work for testing and tuning, and we've come up with an
       interesting business model for working together: when their models
       meet our baseline scores, we provide a revenue share for as long as
       they continue to perform on our platform. (If you're interested in
       participating, we'd love to hear from you!)  We have continued to
       scale our web app and launched an API that we are rolling out to
       pilot customers. Currently the most popular use cases are: KYC
       onboarding fraud detection and voice fraud detection (e.g., banks,
       marketplaces); and user-generated deepfake content moderation
       (e.g., social media, dating platforms, news and government
       organizations).
       We are currently testing a monthly subscription to scan a minimum
       of 250 media assets per month. We offer a 30-day pilot that
       converts into a monthly subscription. If you'd like to give it a
       try, go to www.realitydefender.ai, click "Request Trial Access" and
       mention HN in the comments field.  We're here to answer your
       questions and hear your ideas, and would love to discuss any
       interesting use cases. We'd also be thrilled to collaborate with
       anyone who wants to integrate our API or who is working, or would
       like to work, in this space. We look forward to your comments and
       conversation!
        
       Author : bpcrd
       Score  : 63 points
       Date   : 2022-03-22 13:49 UTC (9 hours ago)
        
       | Geee wrote:
       | We have solved this problem already for text with digital
       | signatures. There might be a way to digitally sign speech in
       | real-time and show the signature on a display device, which is
       | captured on video. It could be just a smartphone app or a
       | separate device. This way every video could be proven authentic,
       | even if they originate from unofficial sources.
        
       | nickfromseattle wrote:
       | Have you looked at identifying AI written content?
       | 
       | If you plan to release this, I will be your first customer.
        
         | bpcrd wrote:
         | Yes - Please contact us at ask@realitydefender.ai
        
       | emacs28 wrote:
       | I would like to coin this alternative name: Deeptect
        
       | sigil wrote:
       | I too wonder whether cryptographic signatures will be the long
       | run solution to deepfakes. Can you outline why you don't think
       | that will be the case?
       | 
       | Here's an alt solution to argue against:
       | 
       | 1. The necessary PKI gets bootstrapped by social media companies,
       | where deepfakes begin to seriously threaten their "all-in-on-
       | video" strategy, and simultaneously look like an opportunity to
       | layer on some extra blue-check-style verification.
       | 
       | 2. Example: you upload your first video to twitter, it sees the
       | AV streams are unsigned, generates a keypair for you, does the
       | signing, and adds the pubkey half to your twitter account. (All
       | of this could be done with no user input.)
       | 
       | 3. The AV streams have signatures on short sequences of frames.
       | Say every 24th video frame has a signature over the previous 24
       | frames embedded into the picture. Similarly, every second of
       | audio has a signature baked in.
       | 
       | 4. The signatures aren't metadata that can be easily thrown away.
       | They're watermarked into the picture and audio themselves in a
       | way that's ~invisible & ~inaudible, but also robust to
       | downsampling & compression. This is already technically possible.
       | 
       | 5. Since we're signing every past second of the AV streams, this
       | works for live video.
       | 
       | 6. Viewers on the platform see a green check on videos with valid
       | signatures; maybe they even see the creator's twitter handle if
       | this is a reshare.
       | 
       | 7. Like all social media innovations, the other major platforms
       | copy it within the next 6 - 12 months. POTUS uses it. People come
       | to expect it.
       | 
       | 8. Long run: the public comes to regard any unverified video
       | footage with suspicion.
       | 
       | Why won't this deepfake solution come to pass in the long run?
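       | 
       | A toy sketch of steps 3-5 in Python, assuming an Ed25519 keypair
       | from the "cryptography" package; the watermark embedding of step
       | 4 is stubbed out:
       | 
       |     from cryptography.hazmat.primitives.asymmetric import ed25519
       | 
       |     key = ed25519.Ed25519PrivateKey.generate()  # per-creator (step 2)
       |     public_key = key.public_key()               # published w/ account
       | 
       |     def sign_window(frames):
       |         """Signature over the previous 24 frames (step 3)."""
       |         return key.sign(b"".join(frames))
       | 
       |     def embed_watermark(frame, signature):
       |         """Stub: robust, ~invisible watermark goes here (step 4)."""
       |         return frame
       | 
       |     video_frames = [bytes([i]) * 16 for i in range(48)]  # stand-ins
       |     window, signed = [], []
       |     for frame in video_frames:
       |         window.append(frame)
       |         if len(window) == 24:                   # every 24th frame
       |             sig = sign_window(window)
       |             signed.append(embed_watermark(frame, sig))
       |             public_key.verify(sig, b"".join(window))  # viewer check
       |             window = []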
        
         | davidweatherall wrote:
         | (Not OP)
         | 
         | This would work for videos that people want to acknowledge as
         | their own, but they could just tweet out that they own it for
         | the same effect.
         | 
         | The issue this tackles is videos that people don't want to
         | claim ownership of, e.g. if a video emerged of <insert
         | politician here> kicking a child, the politician can't say "I
         | haven't signed it therefore it's not mine", instead we need
         | tools like the above to be able to say, "this is faked, do not
         | trust it".
        
           | sigil wrote:
           | Maybe in the interim we need deepfake detection tools, but
           | I'm asking about the long run. Suppose signed AV takes off as
           | described above. The public has been trained: "If the video
           | doesn't have The King's Seal, it's not from The King."
        
             | ChefboyOG wrote:
             | I think the point is that yes, that seems like a good
             | solution for verifying content that purports to be released
             | by a certain creator, but it doesn't solve the problem of
             | deep fakes for captured footage i.e. you can prove it isn't
             | a video that you created, but you can't prove it isn't a
             | video someone else took of you.
        
               | sigil wrote:
               | That makes sense. But if signed AV takes off, the video
               | someone else took of you and shared likely bears _their_
               | seal. And audiences decide how much they trust that
               | source - just like they look for a CNN / BBC / etc. logo
               | in the corner currently.
        
       | cphoover wrote:
       | I believe deep-fakes to be a serious threat to functioning
       | democracies around the world. I would love to be on the front-
       | lines fighting against this threat.
       | 
       | I have submitted my resume: https://cphoover.github.io/
       | 
       | Have you also considered either offering a browser plugin to
       | display contextual warnings attached to video elements? Or
       | thought of working with browser makers? The web/social media is
       | where a ton of fake media is propagated. It would be good if
       | large platforms (fb/twitter/yt) integrated with such service, but
       | also individuals should be able to protect themselves.
        
         | bpcrd wrote:
         | We are adding new roles to our careers page on
         | www.realitydefender.ai but feel free to reach out to
         | career@realitydefender.ai and we can discuss your interest!
         | 
         | We are working with a few partners (including Microsoft) who
         | are interested in integrating our solution. We are focused
         | right now on supporting large organizations (companies and
         | governments) that need to scan user-generated content at scale.
        
         | smt88 wrote:
         | I worry about the same thing.
         | 
         | I think "blockchain" (or even just a signed, immutable, public
         | database) is mostly a solution in search of a problem, but I do
         | think it may have an application here.
         | 
         | If you can hash a video when it's recorded and publish the hash
         | with a timestamp that can't be forged, you can at least prove
         | that this video existed at least as long ago as that stamp.
         | 
         | That allows you to invalidate any deepfakes produced on top of
         | that video that have later timestamps.
         | 
         | It's not perfect, but it might be one weapon against this
         | stuff.
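         | 
         | A minimal sketch of that registration step in Python (the file
         | name and the log you would publish to are hypothetical):
         | 
         |     import hashlib, time
         | 
         |     def register(video_path):
         |         """Hash the recording and pair it with a timestamp."""
         |         with open(video_path, "rb") as f:
         |             digest = hashlib.sha256(f.read()).hexdigest()
         |         return {"sha256": digest, "recorded_at": int(time.time())}
         | 
         |     record = register("clip.mp4")  # hypothetical recording
         |     # Publish `record` to an append-only public log; any deepfake
         |     # derived from this footage will carry a later timestamp.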
        
           | prometheus76 wrote:
           | In that scenario, where is the line between a deepfake and
           | satire?
        
             | smt88 wrote:
             | That's an irrelevant issue. The issue I'm talking about is
             | only whether a viewer can determine if a video has been
             | edited.
             | 
             | Once they know it's edited, they can decide for themselves
             | whether it's enjoyable satire or an attempt to deceive
             | them.
        
           | cphoover wrote:
           | You can have an immutable centralized/federated system
           | without using "crypto" or "proof-of-work"; there are
           | databases out there that provide immutable storage without
           | that overhead.
           | 
           | Providing API access to a signed immutable database makes
           | sense... but I'm not sure how much sense utilizing existing
           | popular cryptocurrencies would make (e.g. Bitcoin, Ethereum).
        
             | smt88 wrote:
             | You need cryptography for hashing and verifying the
             | integrity of any copy of the database.
             | 
             | You probably don't need proof of work, you're right.
        
       | huynhhacnguyen wrote:
       | First of all, congratulations on the launch! Your description
       | of the "model of models" and combining their scores is really
       | intriguing. Detecting deepfakes is an interesting topic on its
       | own, and apparently there are lots of use cases that I'm not
       | even aware of, partly due to my limited knowledge of this
       | subject. There are a few points I'm curious about (beware that
       | the following questions may be very silly, coming from someone
       | with little to no experience in the field):
       | 
       | - What do you use as input for the model? Does it use all the
       | pixels in all the frames in the input video? How about the
       | video's metadata (location, extension,...)?
       | 
       | - My biggest concern about fighting deepfakes is that they may
       | reach a point where the line between reality and fiction is
       | nonexistent. Namely, if a deepfake video of someone can be
       | created to look exactly like a real one that person could have
       | recorded, I imagine there would be no way to tell the deepfake
       | video from the authentic one (since there is no difference
       | between the two). Because of that, this looks like a losing
       | battle to me, but maybe I'm just too pessimistic. Do you feel
       | that this is a real problem? Do you believe it is such a long
       | shot that we shouldn't worry about it, or that even if things
       | reach that point, there would still be tools in our arsenal to
       | counter such technologies?
        
       | domidre wrote:
       | Hi, congratulations on your launch.
       | 
       | I would also like to ask three questions. Do you know how well
       | your model generalizes to video/audio deepfakes created by models
       | that are not within your training sets? And also have you
       | investigated whether your model can be used in a GAN setting to
       | improve a deepfake generator towards creating better fakes? Or
       | how robust your detectors are against adversarial attacks?
        
         | bpcrd wrote:
         | Great questions.
         | 
         | 1 - We include multiple models for GAN and non-GAN related
         | synthetic media.
         | 
         | 2 - Models are only as good as the training data, and most
         | training data breaks down in the real world because hackers
         | have access to this same open source training data. So we
         | create our own proprietary training data which we have
         | automated, and we continuously update it based upon emerging
         | deepfakes that we find in the wild.
         | 
         | 3 - We target 95% accuracy with all public and proprietary
         | training sets. And we continuously test and iterate both the
         | data sets and the models.
         | 
         | 4 - Our policies require a background check on all users to
         | filter out bad actors. We additionally have technology
         | safeguards in place to limit improper use.
        
       | ted0 wrote:
       | Congratulations on the launch. What is a good email to reach you
       | at?
        
         | bpcrd wrote:
         | Thank you! Please reach out to ask@realitydefender.ai :)
        
       | endisneigh wrote:
       | Imho this is a good product, but wouldn't it make more sense to
       | simply sign videos cryptographically?
       | 
       | Unfakeable and unbeatable.
       | 
       | So someone uploads a video, you sign it, they display the video.
       | If authenticity is in question, check the signature.
       | 
       | Deep fake detection is intractable imo. Use cryptography instead.
       | 
       | Hell if you want to be thorough sign each frame and create an
       | extension for YouTube and other providers to literally check to
       | see if a given frame or period was altered.
        
         | bpcrd wrote:
         | We're fascinated by the potential applications of crypto in
         | content provenance. In this example, a UGC video platform would
         | need a way to initially determine the content hasn't been
         | manipulated before it's signed, right? What about a live
         | scenario where a deepfake mimicking an exec calls a manager to
         | wire $10M (https://www.forbes.com/sites/thomasbrewster/2021/10/
         | 14/huge-...)
         | 
         | We totally recognize deepfake detection is a big & constantly
         | evolving challenge, but we don't see that as a reason to cede
         | the truth to bad actors :)
        
           | cphoover wrote:
           | > "We're fascinated by the potential applications of crypto
           | in content provenance. In"
           | 
           | @bpcrd what's the advantage of using a block-chain to store
           | video fingerprints, to determine provenance, over say a
           | highly performance-optimized immutable centralized or
           | federated system?
        
         | brap wrote:
         | Most videos people watch nowadays don't come from "official"
         | authorities, whose public keys are known and signatures can be
         | verified. When videos are recorded by random people from their
         | smartphones, what signature are you going to verify?
         | 
         | Even if smartphone manufacturers start integrating digital
         | signatures right into their cameras, you can use the smartphone
         | to re-record a pre-recorded fake video. And I'm sure that with
         | enough resources you can do something way more clever.
         | 
         | I really don't see how crypto can solve this problem. I don't
         | think AI can either (for reasons already mentioned in this
         | thread). It's something we'll have to learn to live with.
        
           | endisneigh wrote:
           | > Most videos people watch nowadays don't come from
           | "official" authorities
           | 
           | I doubt this is true. Most videos are likely disseminated
           | from a handful of companies.
           | 
           | > Even if smartphone manufacturers start integrating digital
           | signatures right into their cameras, you can use the
           | smartphone to re-record a pre-recorded fake video. And I'm
           | sure that with enough resources you can do something way more
           | clever.
           | 
           | Even if a fake video were re-recorded, it would have a
           | different signature than the "original". Problem solved.
           | 
           | Literally cryptography is the only solution
        
           | cphoover wrote:
           | " you can use the smartphone to re-record a pre-recorded fake
           | video."
           | 
           | I assume the intent is not to store solely a direct hash of
           | the original video, but also the original file which can be
           | fingerprinted, or the fingerprint itself that will be matched
           | if a later duplicate is uploaded.
           | 
           | Fingerprinting differs from hashing in that two
           | non-identical but similar files (e.g. an original video and
           | a deepfake of it) can have the same fingerprint, yet
           | differing checksums.
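           | 
           | A toy example of that distinction using an average hash in
           | Python (assumes Pillow; real video fingerprints are far more
           | robust):
           | 
           |     from PIL import Image
           | 
           |     def average_hash(path, size=8):
           |         """Similar images map to nearby 64-bit fingerprints."""
           |         img = Image.open(path).convert("L").resize((size, size))
           |         pixels = list(img.getdata())
           |         mean = sum(pixels) / len(pixels)
           |         bits = 0
           |         for p in pixels:
           |             bits = (bits << 1) | (p > mean)
           |         return bits
           | 
           |     def hamming(a, b):
           |         return bin(a ^ b).count("1")
           | 
           |     # hamming(average_hash("orig.png"), average_hash("reenc.png"))
           |     # stays small after re-encoding; SHA-256 digests would differ.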
        
       | candiddevmike wrote:
       | Are you concerned that your product will inadvertently improve
       | deepfakes? Suddenly you've given them a baseline that they need
       | to be better than, and hackers love challenges. I predict this
       | will turn into a constant arms race like AV or copyright
       | protection, and I don't think this will work in the long run.
       | 
       | IMO, KYC needs to go back to in person verification. Everything
       | you can do digitally can be faked or impersonated.
        
         | matchbox wrote:
         | good point
        
         | fxtentacle wrote:
         | I came here to write that. My friends and I have a
         | competition to get an (obviously CGI) picture of my living
         | room filled with gold coins past those "is that photo real"
         | platforms. So far, none has stood up to the test.
         | 
         | The worst one so far was TruePic, which even gave me a
         | certificate of authenticity and kept that certification PDF
         | online on their website until they heard about us openly
         | mocking them. They then blocked the photo, not because
         | someone noticed that it's fake, but because "User has
         | violated terms of service" - i.e., me mocking them.
        
         | bpcrd wrote:
         | In 2017 deepfakes were pretty crude; today the average person
         | can't tell a real face from a deepfake generated on a 5-year-old
         | iPhone. We expect the tech to continue moving in this
         | direction. So, similar to anti-virus, we're approaching this
         | problem with an iterative, multi-model solution that can evolve
         | with the threat.
        
           | candiddevmike wrote:
           | So the folks at the forefront of deep fake technology (i.e.
           | the attackers you're targeting) will slip through your
           | product because it lags behind the state of the art (like AV,
           | which you said is the approach you're following), while
           | innocent folks will be caught by it due to a new kafkaesque
           | version of "prove you're not a bot" since you focus on
           | reducing false negatives. Hopefully I can avoid companies
           | using your product.
        
             | btown wrote:
             | Retrospective antivirus-esque techniques are still useful,
             | though, as not every actor is a state-level actor, and even
             | then, forcing state-level actors to "burn" their state-of-
             | the-art exploits/models because previous exploits/models
             | are detected out-of-the-box slows down abuse by those
             | actors.
             | 
             | And realistically, since deepfake detection will inevitably
             | be more expensive than captchas or antivirus scanning, this
             | will be adopted by human-in-the-loop organizations for
             | critical processes where threat scoring or moderation is
             | already being applied.
             | 
             | That said - Reality Defender, _please_ train your system on
             | diverse human data sets, do not release models where
             | ethnicity or gender (including gender identity) are
             | nontrivially correlated with deepfake score, and have
             | processes in place from day 1 to allow users to report
             | suspected patterns of bias. The kafkaesque  "prove you're
             | not a bot" scenario envisioned by the parent poster is one
             | thing for holistic human-in-the-loop verification
             | processes, and another thing if it suppresses minority
             | voices and minority access to government services.
        
               | bpcrd wrote:
               | We agree. Dataset fidelity and bias are major concerns
               | for publicly available datasets. For this reason we are
               | working to develop programmatically created datasets
               | along with anti-bias testing and policies.
        
               | prometheus76 wrote:
               | "Bias" and "anti-bias" is a slippery snake that will bite
               | you as soon as it warms up to you.
        
             | calvinmorrison wrote:
             | Of course, because these companies are probably owned by
             | the same people in the end that develop the DeepFake
             | datasets, generating endless income from both sides.
             | 
             | It's like ADA Compliance lawsuits. I can't prove the
             | AccessaBe or other "ADA Compliance" web tooling are
             | generating these lawsuits, but their company would not
             | exist without them. Why wouldn't they want more lawsuits?
        
               | ChefboyOG wrote:
               | The majority of large, popular datasets in deep learning
               | are curated and hosted by academics:
               | 
               | https://paperswithcode.com/task/deepfake-detection#datasets
        
         | skeeter2020 wrote:
         | If I were them I'd be concerned that it DOESN'T move in this
         | direction. Not a lot of money to be made selling one-and-done
         | fake detection software, but a fortune available if they can
         | convince clients a perpetual subscription is a core
         | requirement.
        
         | dymk wrote:
         | Aren't you worried that in person KYC will lead to an
         | improvement in latex face masks? Sounds like an arms race.
        
           | [deleted]
        
       | eurasiantiger wrote:
       | So how much cash/clout are you guys raking in from nation states
       | wishing to disguise their homebrew deepfake tech?
        
       | atlasunshrugged wrote:
       | This is super cool, you should chat with the folks in Estonia who
       | had worked on developing Sentinel (https://thesentinel.ai/) which
       | had a similar premise but ended up pivoting. I advised them for a
       | bit, happy to chat too (email in bio).
       | 
       | Edit to include link to their website
        
       | version_five wrote:
       | This is definitely an interesting problem, and I can see a place
       | for it.
       | 
       | I want to comment (in ignorance because I don't know the
       | techniques you are using) that there is more to detecting fakes
       | or "misinformation" than just the digital attributes of the data.
       | Specifically, confirmation from multiple sources, reputation of
       | sources, and above all, consistency with a world model of how
       | people behave. For example, if there's a video of Biden backing
       | Putin, you could dismiss it as fake regardless of video
       | attributes.
       | 
       | I think (and have been criticized for saying so, why I don't
       | understand) that education and emphasizing critical thinking are
       | the biggest counters to fakes, not learning to spot them in
       | feature space. I believe that whatever you make, sanity checks
       | need to be a part of it, and not just blind flagging of true /
       | false based on the digital attributes.
       | 
       | Thanks!
        
       | jka wrote:
       | Hey - great and important idea.
       | 
       | Have you discussed / looked into sampling environmental radio
       | noise at various frequencies and locations and then interpolating
       | samples of them within the video and audio itself at recording-
       | time?
       | 
       | (ideally along with some kind of near-unfalsifiable timestamp
       | signals and/or device keys to confirm that "yes, this unique
       | device was here at this time and the proof is within the
       | pudding")
        
       | m00dy wrote:
       | May I ask who has worked at the CIA for decades and then became
       | a startup guy?
        
         | bpcrd wrote:
         | Kevin Zerrusen spent 20 years at the CIA and led US Cyber
         | Command for the Dept. of Defense.
        
       | marc__1 wrote:
       | Congratulations on the launch, team! This is a great need, and I
       | hope the team can deliver.
       | 
       | My questions are:
       | 
       | 1 - Do you have any research on false negatives and false
       | positives for your platform?
       | 
       | 2 - How do you build trust in your platform so that users will
       | use your results (and their users will trust it)? Fake news has
       | been so widespread and people continue to believe in it, so why
       | is that any different than with deep fake?
       | 
       | 3 - Why are you trying a consumption-type of pricing?
       | Cybersecurity typically charges per seat, and it would be very
       | hard for a malware provider to charge by 'malware detection'.
        
         | bpcrd wrote:
         | 1 - Each model looks for different deepfake signatures. By
         | design, the models do not always agree, which is the goal. We
         | are much more concerned with false negatives, and we target a
         | minimum of 95% accuracy for our model of detection models.
         | 
         | 2 - The challenge is educating users about results without
         | requiring a PhD. Our platform is targeted for use by junior
         | analysts in cybersecurity or trust and safety.
         | 
         | 3 - This is a good suggestion. We are exploring how we can
         | offer an unlimited plan that can cover our high compute costs
         | (we run our multiple models in real time).
        
           | candiddevmike wrote:
           | Why would you be more concerned about false negatives?
           | Wouldn't false positives erode trust and value in your
           | product, and considering the applications you're targeting,
           | possibly open you up to lawsuits if you start accusing
           | innocent people of being deepfakes (which, IMO, currently
           | seems unlikely)?
        
             | [deleted]
        
             | bpcrd wrote:
             | We provide a probabilistic percentage result that is used
             | by a trust and safety team to set limits (e.g., flag or
             | block content), so it is not a binary yes/no. We search for
             | specific deepfake signatures and we explain what our
             | results are identifying.
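             | 
             | For illustration only (the thresholds below are made up,
             | not defaults), the downstream policy looks something like:
             | 
             |     def triage(fake_probability):
             |         """Map our score to a customer-defined action."""
             |         if fake_probability >= 0.90:
             |             return "block"
             |         if fake_probability >= 0.60:
             |             return "flag_for_review"
             |         return "allow"
             | 
             |     print(triage(0.72))  # -> flag_for_review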
        
               | wanderer_ wrote:
               | So... that percentage sort of 'return type' allows the
               | people using your service to decide how aggressive they
               | want to be? Smart. It also could possibly turn your
               | service into more of a tool and less of something that
               | someone could blame incorrect results on.
        
       ___________________________________________________________________
       (page generated 2022-03-22 23:01 UTC)