[HN Gopher] "Please do not make it public" (Tencent's Sogou Inpu...
       ___________________________________________________________________
        
       "Please do not make it public" (Tencent's Sogou Input Method)
        
       Author : donohoe
       Score  : 66 points
       Date   : 2023-08-09 14:49 UTC (8 hours ago)
        
 (HTM) web link (citizenlab.ca)
 (TXT) w3m dump (citizenlab.ca)
        
       | myself248 wrote:
       | I must've missed the bit where they explained why _a keyboard_
       | would be sending anything at all _across the network_ in the
       | first place.
       | 
       | Anyone?
        
         | taldo wrote:
         | From TFA
         | 
         | > While alphabetic keyboards typically provide autocomplete
         | features for more expedient typing, predictive features in
         | Chinese input methods are more crucial when using input methods
         | such as pinyin where hundreds of characters might match an
         | inputted pinyin syllable. For longer strings of syllables, an
         | IME will commonly reach out over the network to a cloud-based
         | service for suggestions if suitable suggestions are not
         | available in the input method's local database.
         | 
         | Not saying whether they _should_, but it's pretty easy to
         | understand why they do it.
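The local-first, cloud-fallback flow taldo quotes can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: `suggest`, `local_db`, `cloud_lookup` and `fake_cloud` are hypothetical names, not Sogou's actual code, and the network call is stubbed out. It makes the privacy consequence concrete: any syllable string the local dictionary cannot complete is exactly what gets sent to the server.

```python
from typing import Callable, Dict, List


def suggest(
    pinyin: str,
    local_db: Dict[str, List[str]],
    cloud_lookup: Callable[[str], List[str]],
) -> List[str]:
    """Return candidates for a pinyin string, local dictionary first."""
    local = local_db.get(pinyin, [])
    if local:
        # Local hit: the typed syllables never leave the device.
        return local
    # Local miss: the hypothetical cloud service receives the raw input.
    return cloud_lookup(pinyin)


# Usage with a stubbed "network" call that records what would be sent.
sent: List[str] = []


def fake_cloud(pinyin: str) -> List[str]:
    sent.append(pinyin)
    return ["我想吃饭"]


db = {"nihao": ["你好"]}
print(suggest("nihao", db, fake_cloud))          # local hit, nothing recorded
print(suggest("woxiangchifan", db, fake_cloud))  # cloud fallback
print(sent)
```

The design point is that the fallback, not the keyboard itself, decides what leaves the device, which is why the encryption of that request matters so much.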
        
           | Nextgrid wrote:
           | It's impressive that users are ok with this. This is even
           | beyond the (now generally-accepted) analytics and ad
           | targeting, it's literally "we'll send all your keystrokes to
           | a remote server", a literal keylogger.
        
       | miki123211 wrote:
       | > In this report, we analyze Tencent's Sogou Input Method, the
       | most popular Chinese input method with over 455 million monthly
       | active users and versions of the app for multiple platforms,
       | including Windows, Android, and iOS. Sogou Input Method accounts
       | for 70% of Chinese input method users, with products by iFlytek
       | and Baidu taking second and third place, respectively.
       | 
       | This part is surprising to me. Are the Chinese input methods
       | provided directly by the operating systems not enough? I'm
       | surprised that Microsoft, Apple et al. provide such a sub-par
       | service in China that over 450 million people were bothered
       | enough to install a third-party keyboard.
        
         | djtango wrote:
         | Haven't used Sogou since like 2009, but back then it was not
         | only ace at contextual prediction but also had colloquialisms
         | built in.
        
         | lmm wrote:
         | > I'm surprised that Microsoft, Apple et al provide such a sub-
         | par service in China that over 450 million people were bothered
         | enough to install a third-party keyboard.
         | 
         | American companies, man. They really don't care. So much stuff
         | just breaks or does the bare minimum if you're using any kind
         | of IME, or if you're not using UTF-8, or even if you're not
         | using ASCII. There seems to just be a general cultural
         | incomprehension that there's any other way of writing on
         | computers. (It's not even limited to companies - there's a
         | whole bunch of Linux stuff with the same problem, e.g.
         | Snap/Flatpak just break everything and don't care.)
        
         | saurik wrote:
         | It is a lot better now, but the Chinese keyboard used to be so
         | not-as-good on iOS that Baidu's official help for people (and
         | even for a while an advertisement on their home page!)
         | suggested they jailbreak their phone to get a better keyboard.
        
         | dmoy wrote:
         | Phone-native pinyin->character input seems fine now? Google's
         | keyboard and the iPhone's keyboard for pinyin->characters work
         | fine, if you're using normal Mandarin.
         | 
         | Windows OS pinyin->character input.... I don't know if I've
         | ever seen someone use something other than Sogou lol, so
         | honestly I can't say how good or bad windows native is at this
         | now.
         | 
         | It's a hard problem, because of the 1:many fanout for any given
         | pinyin input. "bao" can mean like 40 different things - bread
         | thing, weak (?), hug, violence (?), treasured-one (think like
         | "my precious", like a pet name for a baby?), etc etc. Sogou had
         | a big leg up for a long time because it figured out the correct
         | words from context much better than other alternatives,
         | requiring fewer manual selections.
         | 
         | (Semi-related note: Google's voice->text still fails pretty
         | hard for regional Mandarin. For example, it really doesn't like
         | the hard Rs at the end of words in far northeastern Mandarin.
         | It can't seem to figure out that "baoERRRR" is actually just
         | "bao". That problem doesn't exist for pinyin->characters
         | though.)
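dmoy's point about one-to-many fan-out, and about context cutting down manual selections, can be illustrated with a toy bigram ranker. Everything below is invented for illustration: the candidate characters are plausible readings of the "bao" examples (包, 抱, 暴, 宝, 薄), the counts are made up, and `CANDIDATES`, `BIGRAMS` and `rank` are hypothetical names. Real IMEs use far larger language models; this only shows how the preceding character can reorder the candidate list.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate table: one pinyin syllable fans out to many characters.
CANDIDATES: Dict[str, List[str]] = {
    "bao": ["包", "抱", "暴", "宝", "薄"],
}

# Hypothetical bigram counts: (previous character, candidate) -> frequency.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("面", "包"): 50,  # 面包, "bread"
    ("拥", "抱"): 40,  # 拥抱, "hug"
    ("宝", "宝"): 30,  # 宝宝, "baby" / pet name
}


def rank(prev_char: str, syllable: str) -> List[str]:
    """Order candidates for `syllable` by how often they follow `prev_char`."""
    cands = CANDIDATES.get(syllable, [])
    # sorted() is stable, so candidates with no bigram evidence keep table order.
    return sorted(cands, key=lambda c: -BIGRAMS.get((prev_char, c), 0))


print(rank("面", "bao"))
print(rank("拥", "bao"))
```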
        
       | pm2222 wrote:
       | "These findings underscore the importance for software developers
       | in China to use well-supported encryption implementations such as
       | TLS instead of attempting to custom design their own."
       | 
       | So generally speaking, established standards are scrutinized
       | more and thus more trustworthy, right? I can think of all those
       | Wi-Fi encryption methods we've been through, and they are all
       | vulnerable, sooner or later.
        
         | JohnFen wrote:
         | > So generally speaking, established standards are scrutinized
         | more and thus more trustworthy, right?
         | 
         | Yes, in large part.
         | 
         | Also, implementing good cryptography requires specialist
         | mathematical skills on par with dev skills. It's very easy to
         | make a really trivial mistake such that it _looks_ like the
         | crypto is solid, when it's in fact very weak.
         | 
         | The ability to make a trivial mistake that's hard to spot,
         | combined with the high stakes involved, makes cryptography
         | something that's better left to the experts.
        
       | rdtsc wrote:
       | > "Please do not make it public" (Tencent's Sogou Input Method)
       | (citizenlab.ca)
       | 
       | Ok, so they didn't make it public and the development team fixed
       | the bugs.
       | 
       | Maybe I am missing some new trend where the headline in these
       | disclosures _has_ to come from the communication with the
       | company. Kind of like vulnerabilities need custom websites with
       | logos and cool made-up names?
       | 
       | > Even with the reported vulnerabilities now resolved, the Sogou
       | app relies on transmitting typed content to Sogou's servers as
       | part of its ordinary functionality.
       | 
       | Well besides the email firewall mess back and forth, shouldn't
       | that have been the main headline: "Everything you're typing on
       | your keyboard is being sent to China"?
        
       | LordShredda wrote:
       | Very responsible handling of a common cryptography failure.
       | What's more impressive is Tencent's developers' willingness to
       | cooperate despite the firewalls and communication issues. Also,
       | do not make your own crypto algorithm.
        
       | Waterluvian wrote:
       | "These findings underscore the importance for software developers
       | in China to use well-supported encryption implementations such as
       | TLS instead of attempting to custom design their own."
       | 
       | I'm very interested in better understanding this. Why do they
       | elect to do this? Is this just developer hubris, as found
       | everywhere? Does this relate to government regulation or control,
       | whether above or under the table?
        
         | paxys wrote:
         | The article says that they use both HTTP and HTTPS endpoints,
         | and the exchanges using HTTPS are secure (as expected). My
         | guess is they had to build their own encryption scheme paired
         | with plain HTTP for older devices or those that for some reason
         | weren't compatible with the latest TLS standards (which are a
         | _lot_ of them).
        
         | JohnFen wrote:
         | It's pretty common for devs who are inexperienced with
         | cryptography to succumb to the temptation to roll their own,
         | especially if they start studying cryptography algorithms.
         | 
         | It's always a mistake, though. This is something I had to cover
         | with younger devs quite a bit back when I worked for a company
         | that made heavy use of cryptography.
        
         | 2OEH8eoCRo0 wrote:
         | > Why do they elect to do this?
         | 
         | They could be _rightly_ suspicious of a western TLS
         | implementation but discovered the pitfall of writing their own.
         | Could have also been intentional.
        
           | manuelabeledo wrote:
           | > They could be rightly suspicious of a western TLS
           | implementation but discovered the pitfall of writing their
           | own. Could have also been intentional.
           | 
           | They could have deployed TLS with some cipher of Chinese
           | origin, not like Chinese companies haven't done this before
           | [0]
           | 
           | [0] https://ciphersuite.info/cs/TLS_SM4_GCM_SM3/
        
             | lucubratory wrote:
             | If there's a zero day that's been embedded in a protocol by
             | the NSA or actively used by the NSA, I normally wouldn't
             | expect it to come from the actual encryption process
             | itself. It would be something that choosing your own cipher
             | wouldn't fix, because it would be about compromising
             | security on the software level rather than the encryption
             | level. There's a very good reason the PRC won't allow
             | compromised Cisco routers; it wouldn't surprise me if there
             | was similar thinking here, justified or not.
        
         | hangonhn wrote:
         | Developer ignorance rather than hubris. People who don't really
         | know anything about cryptography have the naive and wrong
         | impression that encryption renders your secret completely safe
         | against anything and that only by getting the key or a major
         | cipher vulnerability would the plaintext be revealed. They
         | treat it like a black box because they don't know anything. In
         | recent years, some crypto libraries have added higher-level
         | recipes (e.g. Fernet) that are much safer and take care of
         | these issues for you, but it's still very possible to make
         | mistakes. I've seen engineers use a static IV for AES because
         | they didn't know how they would be able to search for the
         | encrypted data other than making the ciphertext the same for a
         | given key and plaintext. Basically they severely weakened it
         | because they didn't understand the purpose of a random IV.
         | Again, they thought key + plaintext -> encrypt = super secure.
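To make hangonhn's static-IV point concrete, here is a toy Python demonstration. Python's standard library has no AES, so this sketch substitutes a SHA-256-in-counter-mode keystream (hypothetical helpers `keystream`, `encrypt`, `xor`); it is an illustration of the failure mode, not a cipher anyone should deploy. It models stream-style modes such as CTR, where reusing an IV is catastrophic; with CBC, a static IV instead leaks which messages share identical leading blocks. Either way, the "searchable" property those engineers wanted is exactly the leak.

```python
import hashlib
import os


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """TOY keystream: SHA-256(key || iv || counter) blocks. Not AES, not for real use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same key/IV decrypts."""
    ks = keystream(key, iv, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = os.urandom(32)
static_iv = b"\x00" * 16

# Static IV: identical plaintexts produce identical ciphertexts ("searchable")...
c1 = encrypt(key, static_iv, b"password=hunter2")
c2 = encrypt(key, static_iv, b"password=hunter2")
assert c1 == c2

# ...and XORing two ciphertexts cancels the keystream, leaking plaintext XOR.
c3 = encrypt(key, static_iv, b"password=letmein")
assert xor(c1, c3) == xor(b"password=hunter2", b"password=letmein")

# A fresh random IV per message: the same plaintext no longer repeats.
c4 = encrypt(key, os.urandom(16), b"password=hunter2")
c5 = encrypt(key, os.urandom(16), b"password=hunter2")
assert c4 != c5
```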
        
         | newaccount74 wrote:
         | My experience with TLS is that it is not trivial to use.
         | 
         | Understanding how to use e.g. OpenSSL APIs correctly to ensure
         | that a connection is secure, the certificates are valid, etc.
         | is not trivial. The APIs are poorly documented, hard to use,
         | and many examples you can find are outdated (some OpenSSL APIs
         | return different numbers on success/failure depending on
         | version).
         | 
         | The platform native libraries are not much better. For example,
         | the SecTrust APIs on macOS / iOS are also poorly documented,
         | hard to use, and have bugs (e.g. some time ago they suddenly
         | started to reject valid certificates from Google Cloud for some
         | reason).
         | 
         | Also, your code is always a ticking time bomb, because TLS
         | algorithms are deprecated, certificates expire, etc. So you are
         | always at the risk of your client code to stop working at some
         | point.
         | 
         | So in my opinion, there are often good reasons not to use TLS.
         | But if you make a mistake, everyone will say "You should have
         | used TLS". I wonder what people say when they find a bug
         | despite you using standard crypto?
        
           | est31 wrote:
           | For the deprecated TLS algorithms, just use a bunch of
           | reverse proxies at the front using the latest Debian, CentOS,
           | or Ubuntu LTS, with mostly default settings.
           | 
           | For OpenSSL, app developers don't need it. There are OS
           | built-in libraries for making HTTP requests (which is what was
           | done here).
           | 
           | As for certificates, there are plenty of solutions allowing
           | for auto-renewal. It's very easy to set up using automation.
        
           | jsiepkes wrote:
           | > I wonder what people say when they find a bug despite you
           | using standard crypto?
           | 
           | Not using TLS doesn't automatically mean you need to "roll
           | your own crypto". They could have used a well-documented
           | library such as Google Tink[1] instead of doing their own
           | crypto.
           | 
           | [1] https://github.com/google/tink
        
           | manuelabeledo wrote:
           | It may not be trivial to use, but I fail to understand how a
           | solution to a very hard problem is better if tailored. For
           | example, Open/LibreSSL are widespread, have large communities
           | of both maintainers and developers, which necessarily
           | subjects them to continuous audits over time.
           | 
           | > Also, your code is always a ticking time bomb, because TLS
           | algorithms are deprecated, certificates expire, etc. So you
           | are always at the risk of your client code to stop working at
           | some point.
           | 
           | Certificate expiration should be handled as part of the
           | configuration management lifecycle. Same goes for TLS algos.
           | If you are hardcoding either of these, you are definitely
           | doing something wrong.
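manuelabeledo's suggestion to treat certificate expiry as part of configuration management could be backed by a monitoring check like this minimal Python sketch. `days_until_expiry` and `remote_days_left` are hypothetical helpers; `ssl.cert_time_to_seconds` and `SSLSocket.getpeercert()` are standard-library calls. The alert threshold (30 days, say) is an assumption left to whatever scheduler runs the check.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires - current) / 86400


def remote_days_left(host: str, port: int = 443) -> float:
    """Fetch the peer certificate over a verified TLS connection and report days left."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"])
```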
        
           | Nextgrid wrote:
           | It's still more trivial than rolling your own?
           | 
           | I would understand (not approve of it, but merely understand)
           | completely ignoring security/authentication - _that_ would
           | obviously be easier and avoid having to answer hard questions
           | and make hard decisions.
           | 
           | But here it seems like they've put even _more_ effort into
           | actually designing some custom encryption scheme based on
           | (wrongly-applied) cryptographic primitives, complete with a
           | custom request encapsulation format, etc. This is _more_ work
           | than just swapping your TCP channel for a TLS one plus
           | reasonably trivial auxiliary code to load/renew
           | certificates. In this case, since they're running it over HTTP,
           | it's even easier to just put a reverse proxy in front that
           | will add HTTPS on top.
        
       | olliej wrote:
       | I'm not a huge fan of the blog title - the clear intent of the
       | title is to make it sound like they didn't want any public
       | disclosure, but my reading is that the first response incorrectly
       | considered it low priority, and then after Tencent realized it
       | was a real issue they quickly said "whoops, please don't disclose
       | this as we need to fix it".
       | 
       | It seems like this could be in part mitigated by making sure
       | their server is not an oracle (though obviously fixing the
       | primitives is also important, but older/non-updatable clients
       | could exist).
       | 
       | I would guess the traffic is all over TLS on iOS due to "App
       | Transport Security" requiring HTTPS by default - it's not a huge
       | leap to turn it off, but it's controlled by the app's Info.plist,
       | so it's trivially indexable. Also probably more work than just
       | adding 's' to the protocol (at least from the PoV of the
       | individual dev working on the code).
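olliej's "not an oracle" remark refers to a server whose responses reveal whether a tampered ciphertext decrypted cleanly, which is what padding-oracle attacks exploit. Below is a minimal Python sketch of two mitigations, assuming a hypothetical `decrypt` callable and hypothetical `seal`/`handle` helpers. For clients that can be updated, an HMAC over the ciphertext (encrypt-then-MAC) is checked in constant time before anything is decrypted; on every failure path, including ones old clients can still trigger, the server returns the same generic response. Uniform responses alone do not remove timing differences, so this narrows the oracle rather than proving it gone.

```python
import hashlib
import hmac
from typing import Callable

TAG_LEN = 32              # HMAC-SHA256 output size in bytes
GENERIC_ERROR = b"error"  # hypothetical: one response for every failure


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC framing: ciphertext followed by HMAC-SHA256 over it."""
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def handle(mac_key: bytes, blob: bytes, decrypt: Callable[[bytes], bytes]) -> bytes:
    """Authenticate before decrypting, and fail identically on every error path."""
    if len(blob) < TAG_LEN:
        return GENERIC_ERROR
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison; a forged or tampered blob never reaches decrypt().
    if not hmac.compare_digest(tag, expected):
        return GENERIC_ERROR
    try:
        return decrypt(ciphertext)
    except ValueError:
        # Padding or parsing failures look exactly like MAC failures to the client.
        return GENERIC_ERROR
```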
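The "trivially indexable" point can be illustrated with Python's `plistlib`: given an app's Info.plist (extracting it from the app bundle is not shown), a scanner only has to look for Apple's documented App Transport Security keys. `ats_weakened` is a hypothetical helper, and it checks only `NSAllowsArbitraryLoads` and the per-domain `NSExceptionAllowsInsecureHTTPLoads`, not every ATS exception key.

```python
import plistlib


def ats_weakened(info_plist: bytes) -> bool:
    """True if an Info.plist (XML or binary) opts out of ATS for plain-HTTP loads."""
    info = plistlib.loads(info_plist)
    ats = info.get("NSAppTransportSecurity", {})
    # Global opt-out: arbitrary (including non-HTTPS) loads are allowed.
    if ats.get("NSAllowsArbitraryLoads", False):
        return True
    # Per-domain exceptions can also permit plain HTTP for specific hosts.
    domains = ats.get("NSExceptionDomains", {})
    return any(d.get("NSExceptionAllowsInsecureHTTPLoads", False) for d in domains.values())
```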
        
       | phyzome wrote:
       | Even though it's part of the original post's title, "please do
       | not make it public" is an extremely misleading quote.
        
         | capableweb wrote:
         | How is it misleading exactly?
         | 
         | > Vulnerability disclosed to IMETS@tencent.com.
         | 
         | > Vulnerability disclosed again via Tencent Security Response
         | Centre (TSRC) web portal.
         | 
         | > Tencent: "Thank you for your interest in Tencent security.
         | There is no low or low security risk for this issue. We look
         | forward to your next more exciting report."
         | 
         | > Tencent: "Sorry, my previous reply was wrong, we are dealing
         | with this vulnerability, please do not make it public, thank
         | you very much for your report."
         | 
         | > Tencent's initial rejection of our disclosure and subsequent
         | about-face served as inspiration for the title of this report.
         | 
         | It's a direct quote from a Tencent reply.
        
           | 015a wrote:
           | Because they said it essentially as soon as the vulnerability
           | was reported. That's an entirely reasonable thing to ask for:
           | don't make this public, we're working on it. And it's a
           | totally normal allowance from security researchers.
           | 
           | The title leads readers into thinking that they said this
           | in some other context. Example 1: They aren't working toward
           | fixing it, don't release this, let's just keep it hush-hush.
           | This isn't what happened. Example 2: They did fix it, but
           | they didn't want the researcher to publish details of the
           | problem after they fixed it. This also isn't what happened.
           | 
           | Assuming I understand the context correctly, it's absolutely
           | an inflammatory title that has no place in security
           | disclosure articles like this.
        
             | netsharc wrote:
             | Yeah, kinda disappointing that the CitizenLab folks are
             | exploiting the (I presume) non-mastery of subtle English of
             | the developers to create a "clickbait" title.
             | 
             | If they were English speakers they would've written
             | something along the lines of "We thank you that you
             | respected the vulnerability disclosure policy and notified
             | us. We expect you'll continue respecting the policy and not
             | publish this vulnerability before we resolve the issue and
             | after a period of time where the updated software has been
             | uploaded."
        
           | ysavir wrote:
           | When I read the title, my impression wasn't that it regarded
           | keeping a vulnerability private until fixed, but that there
           | was some functionality that Tencent didn't want people to
           | know about.
        
           | paxys wrote:
           | Just because it is a direct quote doesn't mean it can't be
           | misleading when shared without all the necessary context.
           | Tencent asked for it to not be made public _during the period
           | while they were actively fixing it_ and well within any
           | standard vulnerability disclosure deadline.
        
             | JohnFen wrote:
             | I agree. I don't see anything here that seems out of line.
        
       | pphysch wrote:
       | [flagged]
        
         | stefan_ wrote:
         | Yes, what could be wrong with some keyboard input addon that
         | sends every keypress to Tencent, and on top of that, in a
         | manner trivial for a passive eavesdropper to decode?
         | 
         | We used to call these things "keyloggers".
        
         | myself248 wrote:
         | Tencent initially misclassified the issue as not a security
         | risk. Shortly after, they reconsidered and asked the
         | researchers not to make it public.
        
       ___________________________________________________________________
       (page generated 2023-08-09 23:02 UTC)