[HN Gopher] "Please do not make it public" (Tencent's Sogou Inpu...
___________________________________________________________________
"Please do not make it public" (Tencent's Sogou Input Method)
Author : donohoe
Score : 66 points
Date : 2023-08-09 14:49 UTC (8 hours ago)
(HTM) web link (citizenlab.ca)
(TXT) w3m dump (citizenlab.ca)
| myself248 wrote:
| I must've missed the bit where they explained why _a keyboard_
| would be sending anything at all _across the network_ in the
| first place.
|
| Anyone?
| taldo wrote:
| From TFA
|
| > While alphabetic keyboards typically provide autocomplete
| features for more expedient typing, predictive features in
| Chinese input methods are more crucial when using input methods
| such as pinyin where hundreds of characters might match an
| inputted pinyin syllable. For longer strings of syllables, an
| IME will commonly reach out over the network to a cloud-based
| service for suggestions if suitable suggestions are not
| available in the input method's local database.
|
| Not saying whether they _should_ , but it's pretty easy to
| understand why they do it.
| Nextgrid wrote:
| It's impressive that users are ok with this. This is even
| beyond the (now generally-accepted) analytics and ad
| targeting, it's literally "we'll send all your keystrokes to
| a remote server", a literal keylogger.
| miki123211 wrote:
| > In this report, we analyze Tencent's Sogou Input Method, the
| most popular Chinese input method with over 455 million monthly
| active users and versions of the app for multiple platforms,
| including Windows, Android, and iOS. Sogou Input Method accounts
| for 70% of Chinese input method users, with products by iFlytek
| and Baidu taking second and third place, respectively.
|
| This part is surprising to me. Are the Chinese input methods
| provided directly by the operating systems not enough? I'm
| surprised that Microsoft, Apple et al provide such a sub-par
| service in China that over 450 million people were bothered
| enough to install a third-party keyboard.
| djtango wrote:
| Haven't used sogou since like 2009 but back then it not only
| was ace at contextual prediction but also had colloquialisms
| built in
| lmm wrote:
| > I'm surprised that Microsoft, Apple et al provide such a
| sub-par service in China that over 450 million people were
| bothered enough to install a third-party keyboard.
|
| American companies man. They really don't care. So much stuff
| just breaks or does the bare minimum if you're using any kind
| of IME, or if you're not using UTF8, or even if you're not
| using ASCII. There seems to just be a general cultural
| incomprehension that there's any other way of writing on
| computers. (It's not even limited to companies - there's a
| whole bunch of Linux stuff with the same problem, e.g.
| Snap/FlatPak just break everything and don't care)
| saurik wrote:
| It is a lot better now, but the Chinese keyboard used to be so
| not-as-good on iOS that Baidu's official help for people (and
| even for a while an advertisement on their home page!)
| suggested they jailbreak their phone to get a better keyboard.
| dmoy wrote:
| phone native pinyin->character input seems fine now? Google
| keyboard and iphone's keyboard for pinyin->characters works
| fine. If you're using normal mandarin.
|
| Windows OS pinyin->character input.... I don't know if I've
| ever seen someone use something other than Sogou lol, so
| honestly I can't say how good or bad windows native is at this
| now.
|
| It's a hard problem, because of the 1:many fanout for any given
| pinyin input. "bao" can mean like 40 different things - bread
| thing, weak (?), hug, violence (?), treasured-one (think like
| "my precious", like a pet name for a baby?), etc etc. Sogou had
| a big leg up for a long time because it figured out the correct
| words from context much better than other alternatives,
| requiring fewer manual selections.
|
| (semi-related note, google's voice->text still fails pretty
| hard for regional mandarin. For example it really doesn't like
| the hard Rs at the end of words in far northeastern mandarin.
| It can't seem to figure out that "baoERRRR" is actually just
| "bao". That problem doesn't exist for pinyin->characters
| though)
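As a toy illustration of the 1:many fanout described above, the sketch below maps one pinyin syllable to several candidate characters and reorders them using the previous character as context. The candidate table, the bigram counts, and the name `rank_candidates` are all made up for illustration. A real IME uses far larger dictionaries and models, and nothing here reflects how Sogou actually works.

```python
# Toy pinyin -> character ranking. All data below is illustrative only.

# A handful of characters that share the (toneless) pinyin "bao".
CANDIDATES = {
    "bao": ["包", "薄", "抱", "暴", "宝"],
}

# Hypothetical counts of how often a candidate follows a given character.
BIGRAM_COUNTS = {
    ("面", "包"): 50,  # 面包 "bread"
    ("拥", "抱"): 40,  # 拥抱 "hug"
    ("宝", "宝"): 30,  # 宝宝 "baby"
}


def rank_candidates(syllable: str, previous_char: str = "") -> list[str]:
    """Return candidates for `syllable`, most likely first given the context."""
    candidates = CANDIDATES.get(syllable, [])
    # sorted() is stable, so candidates with equal counts keep table order.
    return sorted(
        candidates,
        key=lambda ch: BIGRAM_COUNTS.get((previous_char, ch), 0),
        reverse=True,
    )
```

With no context the user has to pick from the whole list; with a good context model the intended character usually lands first, which is the "fewer manual selections" advantage mentioned above.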
| pm2222 wrote:
| "These findings underscore the importance for software developers
| in China to use well-supported encryption implementations such as
| TLS instead of attempting to custom design their own."
|
| So generally speaking established standards are scrutinized more
| and thus more trustworthy, right? I can think of all those WiFi
| encryption methods we've been through and they are all
| vulnerable, sooner or later.
| JohnFen wrote:
| > So generally speaking established standards are scrutinized
| more and thus more trustworthy, right?
|
| Yes, in large part.
|
| Also, implementing good cryptography requires specialist
| mathematical skills on par with dev skills. It's very easy to
| make a really trivial mistake such that it _looks_ like the
| crypto is solid, when it's in fact very weak.
|
| The ability to make a trivial mistake that's hard to spot,
| combined with the high stakes involved, makes cryptography
| something that's better left to the experts.
| rdtsc wrote:
| > "Please do not make it public" (Tencent's Sogou Input Method)
| (citizenlab.ca)
|
| Ok, so they didn't make it public and the development team fixed
| the bugs.
|
| Maybe I am missing some new trend where the headline in these
| disclosures _has_ to come from the communication with the
| company. Kind of like vulnerabilities need custom websites with
| logos and cool made up names?
|
| > Even with the reported vulnerabilities now resolved, the Sogou
| app relies on transmitting typed content to Sogou's servers as
| part of its ordinary functionality.
|
| Well besides the email firewall mess back and forth, shouldn't
| that have been the main headline: "Everything you're typing on
| your keyboard is being sent to China"?
| LordShredda wrote:
| Very responsible handling of a usual cryptography failure. What's
| more impressive is Tencent's developers' willingness to cooperate
| despite the firewalls and communication issues. Also, do not make
| your own crypto algorithm.
| Waterluvian wrote:
| "These findings underscore the importance for software developers
| in China to use well-supported encryption implementations such as
| TLS instead of attempting to custom design their own."
|
| I'm very interested in better understanding this. Why do they
| elect to do this? Is this just developer hubris, as found
| everywhere? Does this relate to government regulation or control,
| whether above or under the table?
| paxys wrote:
| The article says that they use both HTTP and HTTPS endpoints,
| and the exchanges using HTTPS are secure (as expected). My
| guess is they had to build their own encryption scheme paired
| with plain HTTP for older devices or those that for some reason
| weren't compatible with the latest TLS standards (which is a
| _lot_ of them).
| JohnFen wrote:
| It's pretty common for devs who are inexperienced with
| cryptography to succumb to the temptation to roll their own,
| especially if they start studying cryptography algorithms.
|
| It's always a mistake, though. This is something I had to cover
| with younger devs quite a bit back when I worked for a company
| that made heavy use of cryptography.
| 2OEH8eoCRo0 wrote:
| > Why do they elect to do this?
|
| They could be _rightly_ suspicious of a western TLS
| implementation but discovered the pitfall of writing their own.
| Could have also been intentional.
| manuelabeledo wrote:
| > They could be rightly suspicious of a western TLS
| implementation but discovered the pitfall of writing their
| own. Could have also been intentional.
|
| They could have deployed TLS with some cipher of Chinese
| origin, not like Chinese companies haven't done this before
| [0]
|
| [0] https://ciphersuite.info/cs/TLS_SM4_GCM_SM3/
| lucubratory wrote:
| If there's a zero day that's been embedded in a protocol by
| the NSA or actively used by the NSA, I normally wouldn't
| expect it to come from the actual encryption process
| itself. It would be something that choosing your own cipher
| wouldn't fix, because it would be about compromising
| security on the software level rather than the encryption
| level. There's a very good reason the PRC won't allow
| compromised Cisco routers, it wouldn't surprise me if there
| was similar thinking here, justified or not.
| hangonhn wrote:
| Developer ignorance rather than hubris. People who don't really
| know anything about cryptography have the naive and wrong
| impression that encryption renders your secret completely safe
| against anything and that only by getting the key or a major
| cipher vulnerability would the plaintext be revealed. They
| treat it like a black box because they don't know anything. In
| recent years, some of the crypto libraries have methods (e.g.
| Fernet) that are much safer and take care of these issues for
| you, but it's still very possible to make mistakes. I've seen
| engineers use a static IV for AES because they didn't know how
| they would be able to search for the encrypted data other than
| making the ciphertext the same for a given key and plaintext.
| Basically they severely weakened it because they didn't
| understand the purpose of a random IV. Again, they thought key
| + plaintext -> encrypt = super secure.
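A minimal sketch of the static-IV problem described above, using only the Python standard library. `toy_encrypt` is a stand-in stream cipher built from SHA-256, not AES, and exists only to make the behaviour visible. The `blind_index` helper (a keyed hash stored alongside the ciphertext) is one common way to keep equality lookups without fixing the IV. It is an assumption added here, not something the comment proposes, and the names and sample data are hypothetical.

```python
import hashlib
import hmac
import secrets


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """XOR plaintext with a SHA-256-derived keystream. NOT AES; demo only."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def blind_index(index_key: bytes, plaintext: bytes) -> bytes:
    """Keyed hash stored next to the ciphertext so equality lookups still work."""
    return hmac.new(index_key, plaintext, hashlib.sha256).digest()


key = secrets.token_bytes(32)
index_key = secrets.token_bytes(32)  # separate key, assumed for the lookup column

# Static IV: equal plaintexts give equal ciphertexts, so repeats are visible.
static_iv = bytes(16)
same_a = toy_encrypt(key, static_iv, b"alice@example.com")
same_b = toy_encrypt(key, static_iv, b"alice@example.com")

# Random IV per message: the same plaintext encrypts differently each time.
rand_a = toy_encrypt(key, secrets.token_bytes(16), b"alice@example.com")
rand_b = toy_encrypt(key, secrets.token_bytes(16), b"alice@example.com")

# Searching goes through the blind index instead of the ciphertext.
lookup_matches = blind_index(index_key, b"alice@example.com") == blind_index(
    index_key, b"alice@example.com"
)
```

The blind index still reveals which rows hold equal values, so it is a deliberate trade-off rather than a free fix, but the ciphertext itself keeps a fresh IV per message.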
| newaccount74 wrote:
| My experience with TLS is that it is not trivial to use.
|
| Understanding how to use e.g. OpenSSL APIs correctly to ensure
| that a connection is secure, the certificates are valid, etc.
| is not trivial. The APIs are poorly documented, hard to use,
| and many examples you can find are outdated (some OpenSSL APIs
| return different numbers on success/failure depending on
| version).
|
| The platform native libraries are not much better. For example,
| the SecTrust APIs on macOS / iOS are also poorly documented,
| hard to use, and have bugs (e.g. some time ago they suddenly
| started to reject valid certificates from Google cloud for some
| reason).
|
| Also, your code is always a ticking time bomb, because TLS
| algorithms are deprecated, certificates expire, etc. So you are
| always at the risk of your client code to stop working at some
| point.
|
| So in my opinion, there are often good reasons not to use TLS.
| But if you make a mistake, everyone will say "You should have
| used TLS". I wonder what people say when they find a bug
| despite you using standard crypto?
| est31 wrote:
| For the deprecated TLS algorithms, just use a bunch of
| reverse proxies at the front using the latest Debian, CentOS,
| or Ubuntu LTS, with mostly default settings.
|
| For OpenSSL, app developers don't need it. There are OS
| built-in libraries to do HTTP requests (which is what was done
| here).
|
| As for certificates, there are plenty of solutions allowing
| for auto-renewal. It's very easy to set up using automation.
| jsiepkes wrote:
| > I wonder what people say when they find a bug despite you
| using standard crypto?
|
| Not using TLS doesn't automatically mean you need to "roll
| your own crypto". They could have used a well documented
| library such as Google Tink[1] instead of doing their own
| crypto.
|
| [1] https://github.com/google/tink
| manuelabeledo wrote:
| It may not be trivial to use, but I fail to understand how a
| solution to a very hard problem is better if tailored. For
| example, Open/LibreSSL are widespread, have large communities
| of both maintainers and developers, which necessarily
| subjects them to continuous audits over time.
|
| > Also, your code is always a ticking time bomb, because TLS
| algorithms are deprecated, certificates expire, etc. So you
| are always at the risk of your client code to stop working at
| some point.
|
| Certificate expiration should be handled as part of the
| configuration management lifecycle. Same goes for TLS algos.
| If you are hardcoding either of these, you are definitely
| doing something wrong.
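As one small piece of the lifecycle handling mentioned above, a monitoring job can compute how many days a certificate has left and alert well before it lapses. The sketch below uses Python's standard `ssl.cert_time_to_seconds`; the function name `days_until_expiry` and the idea of feeding it the `notAfter` string from `getpeercert()` are illustrative assumptions, not a description of any particular deployment.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired).

    `not_after` uses the format found in ssl.SSLSocket.getpeercert()["notAfter"],
    for example "Jan  5 09:34:43 2018 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY
```

A renewal job could then trigger whenever the result drops below some threshold chosen by the operator, such as 30 days.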
| Nextgrid wrote:
| It's still more trivial than rolling your own?
|
| I would understand (not approve of it, but merely understand)
| completely ignoring security/authentication - _that_ would
| obviously be easier and avoid having to answer hard questions
| and make hard decisions.
|
| But here it seems like they've put even _more_ effort
| actually designing some custom encryption scheme based on
| (wrongly-applied) cryptographic primitives complete with
| custom request encapsulation format, etc. This is _more_ work
| than just swapping your TCP channel with a TLS one and
| reasonably trivial auxiliary code to load/renew
| certificates. In this case since they're running it over HTTP
| it's even easier to just put a reverse-proxy in front that
| will add HTTPS on top.
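To make "swapping your TCP channel with a TLS one" concrete, here is a rough sketch using Python's standard `ssl` module. The helper names `make_client_context` and `open_tls_connection` are hypothetical; the point is only that the platform defaults already verify the certificate chain and the hostname, so the client adds very little code of its own.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default client settings: certificate verification and hostname checks on."""
    return ssl.create_default_context()


def open_tls_connection(
    host: str, port: int = 443, timeout: float = 10.0
) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS, verifying the server certificate."""
    context = make_client_context()
    raw_sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname enables SNI and is what the hostname check compares against.
        return context.wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()
        raise
```

The returned socket is read and written like the plain one it replaces, so the rest of the client can stay as it was.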
| olliej wrote:
| I'm not a huge fan of the blog title - the clear intent of the
| title is to make it sound like they didn't want any public
| disclosure, but my reading is that the first response incorrectly
| considered it low priority, and then after Tencent realized it
| was a real issue they quickly said "whoops, please don't disclose
| this as we need to fix it".
|
| It seems like this could be in part mitigated by making sure
| their server is not an oracle (though obviously fixing the
| primitives is also important, but older/non-updatable clients
| could exist).
|
| I would guess the traffic is all over TLS on iOS due to "App
| Transport Security" requiring https by default - it's not a huge
| leap to turn it off, but it's controlled by the App's Info.plist
| so is trivially indexable. Also probably more work than just
| adding 's' to the protocol (at least from the PoV of the
| individual dev working on the code).
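On the point above about the server not acting as an oracle: a server-side mitigation that needs no client update is to answer every failure with one identical response, so the reply does not reveal which check failed. The sketch below is a generic pattern under that assumption. `handle_request`, the `decrypt` and `process` callables, and the status codes are hypothetical and are not taken from the report or from Sogou's protocol.

```python
from typing import Callable, Tuple

GENERIC_FAILURE = (400, b"bad request")


def handle_request(
    body: bytes,
    decrypt: Callable[[bytes], bytes],
    process: Callable[[bytes], bytes],
) -> Tuple[int, bytes]:
    """Return one identical response for every failure mode.

    A padding error, a parse error, and any other failure all look the same to
    the client, so the response itself does not reveal which check failed.
    """
    try:
        plaintext = decrypt(body)
        return 200, process(plaintext)
    except Exception:
        # Log details server-side if needed; never echo them to the client.
        return GENERIC_FAILURE
```

Uniform responses do not remove timing differences between failure paths, so this only narrows the oracle; authenticated encryption, or simply TLS, is still the actual fix.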
| phyzome wrote:
| Even though it's part of the original post's title, "please do
| not make it public" is an extremely misleading quote.
| capableweb wrote:
| How is it misleading exactly?
|
| > Vulnerability disclosed to IMETS@tencent.com.
|
| > Vulnerability disclosed again via Tencent Security Response
| Centre (TSRC) web portal.
|
| > Tencent: "Thank you for your interest in Tencent security.
| There is no low or low security risk for this issue. We look
| forward to your next more exciting report."
|
| > Tencent: "Sorry, my previous reply was wrong, we are dealing
| with this vulnerability, please do not make it public, thank
| you very much for your report."
|
| > Tencent's initial rejection of our disclosure and subsequent
| about-face served as inspiration for the title of this report.
|
| It's a direct quote from a Tencent reply.
| 015a wrote:
| Because they said it essentially as soon as the vulnerability
| was reported. That's an entirely reasonable thing to ask for:
| don't make this public, we're working on it. And it's a
| totally normal allowance from security researchers.
|
| The title induces readers into thinking that they said this
| in some other context. Example 1: They aren't working toward
| fixing it, don't release this, let's just keep it hush hush.
| This isn't what happened. Example 2: They did fix it, but
| they didn't want the researcher to publish details of the
| problem after they fixed it. This also isn't what happened.
|
| Assuming I understand the context correctly, it's absolutely
| an inflammatory title that has no place in security
| disclosure articles like this.
| netsharc wrote:
| Yeah, kinda disappointing that the CitizenLab folks are
| exploiting the (I presume) non-mastery of subtle English of
| the developers to create a "clickbait" title.
|
| If they were English speakers they would've written
| something along the lines of "We thank you that you
| respected the vulnerability disclosure policy and notified
| us. We expect you'll continue respecting the policy and not
| publish this vulnerability before we resolve the issue and
| after a period of time where the updated software has been
| uploaded."
| ysavir wrote:
| When I read the title, my impression wasn't that it regarded
| keeping a vulnerability private until fixed, but that there
| was some functionality that tencent didn't want people to
| know about.
| paxys wrote:
| Just because it is a direct quote doesn't mean it can't be
| misleading when shared without all the necessary context.
| Tencent asked for it to not be made public _during the period
| while they were actively fixing it_ and well within any
| standard vulnerability disclosure deadline.
| JohnFen wrote:
| I agree. I don't see anything here that seems out of line.
| pphysch wrote:
| [flagged]
| stefan_ wrote:
| Yes, what could be wrong with some keyboard input addon that
| sends every keypress to Tencent, and on top of that, in a
| manner trivial for a passive eavesdropper to decode?
|
| We used to call these things "keyloggers".
| myself248 wrote:
| Tencent initially misclassified the issue as not a security
| risk. Shortly after, they reconsidered and asked the
| researchers not to make it public.
___________________________________________________________________
(page generated 2023-08-09 23:02 UTC)