[HN Gopher] Launch HN: Pinch (YC W25) - Video conferencing with ...
       ___________________________________________________________________
        
       Launch HN: Pinch (YC W25) - Video conferencing with immersive
       translation
        
       Hey HN! I'm Christian, and my co-founder Keyu and I are building
       Pinch (https://startpinch.com), a virtual conferencing platform
       with translation that mimics your voice and synchronizes your lips
       in real time to make you sound and appear as a native speaker in
       over 15 languages. Here's a demo: https://youtu.be/Cu7KlbZ3gjw,
       but you can also try it for free on our website.
        
       Over the last three years, Keyu and I worked at a company where we
       had to lead engineering and research teams across the U.S., China,
       India, and Europe. We felt the language barrier actively limiting
       our team's potential for collaboration and productivity. The
       existing tools we tried operated in low-bandwidth mediums (mostly
       text), which means they 1) are slower because they need to convert
       audio to text before translation, and 2) lose all information
       about _how_ something was said.
        
       At that point we knew there had to be a better way to connect
       across different languages and cultures, so we started building
       Pinch. Shortly after, we found out how challenging translation
       truly is. Balancing latency and accuracy for chunk-based audio
       translation, capturing inflection and tonality per statement,
       handling culturally specific phrasing, and making a seamless
       meeting experience are all unsolved problems we're taking on.
        
       So far, we've seen some really interesting use cases (many we
       hadn't considered!), from personal connections like a first
       conversation with foreign in-laws, to more business-oriented usage
       in sales and meetings with foreign clients.
        
       After a long time building conversational AI for video and audio,
       we're incredibly excited to see what these same technologies can
       unlock for human<>human communication.
        
       You can try a demo or create a meeting for free:
       https://startpinch.com
        
       All feedback is appreciated, and we'd love to know how we're doing
       on the overall meeting UX and translation accuracy for your
       language. Thanks all!
        
       Author : christiansafka
       Score  : 53 points
       Date   : 2025-02-04 17:10 UTC (5 hours ago)
        
       | bongwater_OS wrote:
       | Hey just a heads up the demo on your site is broken (for me).
       | English transcriptions are coming through fine but translations
       | aren't being spoken, despite the output video stuttering for a
       | moment at the time when it should.
        
         | christiansafka wrote:
         | We noticed that Swedish isn't currently working properly, but
         | we weren't able to replicate this with any other languages.
         | Please let us know if it's still having issues!
        
       | tpae wrote:
       | I really like the concept, but I don't understand why you guys
       | are building an entire video conferencing platform. That sounds
       | like years of work building the network and millions in VC funding.
       | It could be a standalone app that exports video to existing
       | conferencing services. I would pay good money for that.
        
         | michaelmior wrote:
         | By export I assume you mean as a virtual webcam? As a user, I
         | would definitely prefer that, so I could use any
         | videoconferencing app.
        
         | christiansafka wrote:
         | Thanks! We have a virtual camera on our roadmap as well, but by
         | building the conferencing platform end to end we can optimize
         | both latency and conversation UX to a much higher degree. We're
         | also lucky to be building this now and not five years ago -
         | there are some solid WebRTC infra companies and open source
         | projects to build on.
        
           | AznHisoka wrote:
           | Would companies switch their video conferencing solution to
           | yours, or do you envision them using both side by side?
        
             | christiansafka wrote:
             | We're hoping companies with international teams will switch
             | over fully (we have internally), but our initial goal is to
             | attract a subset of the market that has cross-lingual needs
             | and unblock them as much as possible from using it more.
        
           | elwillbo wrote:
           | Would love the virtual camera - will be on the lookout for
           | it to arrive
        
       | lefstathiou wrote:
       | Telemedicine!
        
       | elixirnogood wrote:
       | the demo doesn't mimic my voice unless I misunderstand 'mimic'
        
         | christiansafka wrote:
         | I didn't make this clear enough in the post, but we're still
         | working on voice cloning and inflection transfer. Voice cloning
         | is easier, but to support inflection transfer we have to
         | modality-align an LLM.
        
       | elixirnogood wrote:
       | Are you guys using LiveKit for WebRTC? If yes, are you using
       | LiveKit Agents as well?
        
         | christiansafka wrote:
         | Yes! LiveKit is great - and we are using LiveKit Agents but had
         | to override a few low-level library components for our use
         | case.
        
           | elixirnogood wrote:
           | Do you have any concerns around scaling? I like the LiveKit
           | stack, but if I'm not mistaken their agent architecture is
           | based on multiprocessing (one OS process per
           | 'session'/'conversation'), which doesn't sound very scalable.
           | Btw, great demo, this is a cool technical problem to solve.
           | I've spent a couple of months in this space (using a similar
           | stack) and know for a fact that it's not easy.
        
             | christiansafka wrote:
             | Thanks, there are certainly a lot of fun and challenging
             | problems to take on in the space. On scaling, the agent
             | architecture isn't limited to one machine, so you can also
             | autoscale your machines. It's essentially Python's Celery
             | if you've tried that. It gets more tricky when you require
             | GPUs though!
        
       | alloysmila wrote:
       | Just this morning I told myself I should build something like
       | this. I work in global supply chain and the language barriers are
       | an absolute mess.
        
         | christiansafka wrote:
         | It's hard, don't do it :D We have a few supply chain companies
         | trying us out though! Would love to hear more about your
         | experience.
        
       | skylerwiernik wrote:
       | Cool idea, but just watching your demo it looks like it doesn't
       | work. Is there any change in the video? The lip movements
       | certainly don't look synchronized, and audio often continues
       | after the person stops talking. It also doesn't do any audio
       | mimicking. It really doesn't look like it does anything that
       | Google Translate doesn't.
        
         | christiansafka wrote:
         | Appreciate the feedback. On the video side, we currently
         | synchronize it to play out with the translated audio (as often
         | as possible), matching when you started speaking to the moment
         | the translated audio starts. Mentioned in another comment but
         | we're still working on audio mimicking (voice clone then
         | inflection transfer). Our model does a lot that Google
         | Translate doesn't, even just around translation, such as taking
         | into account who you're talking to in the meeting and the
         | conversation context. + we have to do it much faster, so
         | smaller audio chunks at a time!
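
The synchronization described in the reply above (holding the video so it plays out from the moment the translated audio starts) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Pinch's implementation: `Frame`, `playout_delay`, and `shift_frames` are hypothetical names, and timestamps are assumed to be seconds on a shared clock.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float  # capture time in seconds (assumed shared clock)
    data: bytes       # encoded frame payload (placeholder)


def playout_delay(speech_start: float, translated_audio_start: float) -> float:
    """Seconds to hold the video back so it begins with the translated audio.

    Never negative: if the translation is somehow ready early, play at once.
    """
    return max(0.0, translated_audio_start - speech_start)


def shift_frames(frames: list[Frame], delay: float) -> list[Frame]:
    """Return copies of the frames with playout time pushed back by `delay`."""
    return [Frame(f.timestamp + delay, f.data) for f in frames]


# Hypothetical usage: speech began at t=10.0 s, translated audio starts at 11.5 s.
delay = playout_delay(10.0, 11.5)
shifted = shift_frames([Frame(10.0, b"a"), Frame(10.5, b"b")], delay)
```

A fixed per-utterance offset like this keeps the relative timing between frames intact; a real system would also need to handle overlapping utterances and variable translation latency.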
        
       | skeeter2020 wrote:
       | >> The system continuously learns and improves from usage while
       | maintaining privacy and security.
       | 
       | Are you training your own translation models? or using third-
       | party services?
       | 
       | >> Think real-time translation + natural expressions + perfect
       | lip sync.
       | 
       | not yet, based on the demo.
        
       | brap wrote:
       | Assuming the tech is solid, I think that if you had developed
       | this as a browser extension to work on top of Meet/Teams/etc, not
       | only would your dev time have been much shorter and adoption much
       | faster, but Google/Microsoft/etc would have probably bought you
       | out in the blink of an eye.
        
         | hassleblad23 wrote:
         | Getting bought out is not a bad option here.
        
           | brap wrote:
           | Yup that's what I'm saying
        
         | debarshri wrote:
         | I am not sure the dev time would be shorter, because Teams and
         | Meet have their own nuances, and you would be limited by the
         | tool itself in what you could do. Also, I don't think you would
         | go into every call with this plugin on.
         | 
         | This is very valuable where the communication barrier is high,
         | and it has specialized use cases in industries like supply
         | chain and outsourcing.
        
       | instagary wrote:
       | Congrats, really cool idea! You should add the demo video to your
       | website in addition to the interactive version.
        
       | Aspos wrote:
       | Impressive demo! Note that at 01:46 it says "You can speak
       | Korean" in Ukrainian lol.
        
       | FlamingMoe wrote:
       | I work with a lot of overseas developers who speak with thick
       | accents and sometimes it can be very difficult to understand them
       | or for them to understand me. I could definitely see this being a
       | more pleasant experience for everyone.
        
       | lolpanda wrote:
       | Great idea! The demo looks impressive. What are your thoughts on
       | real-time translated captioning compared to AI voice? I guess
       | it's still difficult to mimic nonverbal elements like laughter
       | and pauses.
        
         | christiansafka wrote:
         | Fantastic question. Our opinion on this is that the higher-
         | bandwidth we can make the communication, the more useful it
         | will be. The reason we've moved from IRC->VoIP->Video is
         | because of the efficiency of information transfer and
         | additionally the empathic element of face-to-face conversation.
         | 
         | From the technical side, speech-to-speech models have more
         | potential for accuracy (no explicit ASR, no audio->text
         | information loss). We have a few options for mimicking
         | nonverbal elements - we could decide when to naturally mix in
         | the original audio, or train our end-to-end model to handle
         | those nonverbal audio chunks. We'll be trying both but likely
         | the first option on the sooner side!
        
       | zachanderson wrote:
       | Ofc i fully associate "video conferencing" with a "Pinch".
       | 
       | Wake up you SV product manager dorks! Lazy effort in naming
       | things!
        
         | Tinkeringz wrote:
         | Slack, Zoom, Apple, etc. were really successful without that
         | mattering at all
        
         | motoxpro wrote:
         | What would you name it? Something that explains exactly what it
         | does in the name like Hulu, Apple, Snowflake, Oracle, Google,
         | Ford, or Bungie?
        
           | AznHisoka wrote:
           | Pinch is the worst of both possibilities. It doesn't describe
           | what it does, but it's also not a catchy memorable "brand-
           | like" name like Google either.
        
       | zachanderson wrote:
       | And fuck Christian and "Keyu".
        
       | davidz wrote:
       | It's really cool to see how you guys are using the voice AI stack
       | to overcome the language barrier.
       | 
       | (btw I work at LiveKit, so let me know if we could make Agents
       | easier to use for your use case.)
        
       | aloukissas wrote:
       | What does real-time mean here? Would it work e.g. in a live
       | stream?
        
         | christiansafka wrote:
         | Yes, right now we're ranging from 0.75-3 seconds for the
         | translation to start, and we're hoping to move the average time
         | lower with our next updates. There will always be some
         | limitation to how fast we can translate (different languages
         | have different sentence structures and phrasing), but for
         | livestreaming usually you'd have even a bit more wiggle room
         | for the latency.
         | 
         | Also in case you're interested in the logistics of using us for
         | livestreaming: If our current platform won't work for your use-
         | case and you need to use OBS + a virtual camera, it's on our
         | roadmap.
        
       | Jingyuan_Design wrote:
       | Kudos to the Pinch team for tackling such a challenging yet
       | crucial problem. Excited to see where this goes!
        
       | asimpleusecase wrote:
       | Please just keep a list of supported languages on the site. The
       | FAQ only gives a few and the bubble on your site that says more
       | than 29 languages was not clickable on my iPhone - I won't go any
       | further until I can see supported languages - bonus points if you
       | listed languages that are coming - in order of future
       | availability
        
       | nhod wrote:
       | Very cool guys. I am beyond excited about this -- I can see how
       | this could transform certain projects and relationships -- and it
       | is useless if it doesn't work in the language people need.
       | 
       | What languages do you actually support? The site says "20+" and
       | "over 20" and there's even a FAQ entry listing a handful of them,
       | but it doesn't list all of them. What is the thinking in not just
       | listing all of them?
        
       ___________________________________________________________________
       (page generated 2025-02-04 23:00 UTC)