[HN Gopher] A bug that spoke Russian and crashed my app
       ___________________________________________________________________
        
       A bug that spoke Russian and crashed my app
        
       Author : zaldih
       Score  : 44 points
       Date   : 2024-05-05 12:01 UTC (1 days ago)
        
 (HTM) web link (imzaldih.com)
 (TXT) w3m dump (imzaldih.com)
        
       | gunapologist99 wrote:
       | > If the device is in Russian, Locale.getDefault() returns
       | "russkii", and it seems that SQLite doesn't like this for some
       | reason.
       | 
       | > I did more tests, and it seems that out of the over 100
       | languages Android supports, only Russian encoding broke SQLite.
       | Not even Chinese, which we typically think of as the most
       | complicated (and also I tested changing to this language much
       | earlier before trying the other).
       | 
       | Is this an issue?
        
         | Joker_vD wrote:
         | According to some people on the Internet [0][1], it is.
         | Apparently, Java's (Android's?) Locale.getLanguage() is broken
         | for Russian locale and instead of returning "ru" as the docs
         | promise [2], it returns "russkii".
         | 
         | [0] https://stackoverflow.com/questions/46916210/cannot-get-
         | writ...
         | 
         | [1] https://github.com/google/ExoPlayer/issues/8251
         | 
         | [2]
         | https://docs.oracle.com/javase/8/docs/api/java/util/Locale.h...
        
           | Arnavion wrote:
           | Except that TFA is using `getDisplayLanguage`
           | public void updateAppLanguage(Context context) {
           | String languageCode =
           | Locale.getDefault().getDisplayLanguage();             Locale
           | locale = new Locale(languageCode);
           | Locale.setDefault(locale);
           | 
           | ... which the JDK docs say:
           | 
           | >Returns a name for the locale's language that is appropriate
           | for display to the user.
           | 
           | So maybe `getLanguage` is broken or maybe it isn't, but it
           | seems wrong to use `getDisplayLanguage` for this purpose
           | regardless.
           | 
           | (What even is the point of doing `Locale.setDefault()` with
           | the same Locale that `Locale.getDefault()` returned? I don't
           | know anything about Android.)
        
             | david_allison wrote:
             | > (What even is the point of doing `Locale.setDefault()`
             | with the same Locale that `Locale.getDefault()` returned? I
             | don't know anything about Android.)
             | 
             | The code intends to trim the country, variant and any
             | extensions from the value returned from
             | `Locale.getDefault()`
             | 
             | Example: "Chinese (Traditional) - Taiwan" to "Chinese"
             | 
             | Pseudocode:
             | 
             | `zh_Hant_TW` -> `zh`
             | 
             | It also normalizes to fix some Android internal mapping
             | issues due to changes in ISO 639 (Hebrew can be `iw` or
             | `he[b]`)
        
               | ivan_gammel wrote:
               | Does it make sense though to do it this way? I18n
               | framework that loads resources should support fallback
               | internally in case there's no direct match.
        
               | david_allison wrote:
               | Limiting, but it looks like a valid solution. Android's
               | native i18n is a mess.
               | 
               | We went down the route of duplicating strings between
               | `/values-heb` and `/values-iw` to fix the platform
               | issues, even though the change from `iw` -> `heb` was
               | enacted in ISO 639:1989.[0]
               | 
               | OP's change is questionable in that it may block regional
               | dialects: `[zh-CN, zh-TW]; [pt-PT, pt-BR]` in case the
               | system default is set incorrectly, but it will suffice
               | for most Android apps which will realistically only be in
               | a small subset of 'easy' languages.
               | 
               | [0] https://xml.coverpages.org/iso639a.html (revised
               | 1989)
        
             | rhdunn wrote:
             | For language:
             | 
             | 1. `getLanguage` should return the 2-letter ISO 3166-1
             | language code, or an empty string if missing. -- e.g. en,
             | ru, fr, etc.
             | 
             | 2. `getISO3Language` should return the 3-letter ISO 3166-1
             | language code, or an empty string if missing. -- e.g. eng,
             | rus, fra, etc.
             | 
             | 1. `getDisplayLanguage` should return the language display
             | name in the current locale, or an empty string if missing.
             | -- e.g. English, Russian, French, etc.
        
               | DiggyJohnson wrote:
               | Am I correct in understanding that `getDisplayLanguage`
               | should return "Russian" if the current locale is US
               | English but will return "Russe" if set to French?
               | 
               | Btw you have a formatting error. The third numbered point
               | is listed as another '1.'.
        
               | rhdunn wrote:
               | Yes, it would return "Russe" when the current locale is
               | French. -- It is designed for displaying the locale to
               | the user on a UI, console output, etc.
        
           | ivan_gammel wrote:
           | I checked your links. Java and Android are not broken. SO and
           | Github examples just do it wrong, as can be seen in javadoc
           | [2]. You are not supposed to invoke constructor with
           | localized language name. It's a bit silly to assume that it
           | could work.
        
       | david_allison wrote:
       | This isn't fixed properly
       | 
       | The linked code is:                       String languageCode =
       | Locale.getDefault().getDisplayLanguage();             Locale
       | locale = new Locale(languageCode);
       | Locale.setDefault(locale);
       | 
       | But `getDisplayLanguage()` is defined as returning the display
       | name, not the language code:
       | 
       | > if the locale is en_US and the default DISPLAY locale is fr_FR,
       | getDisplayLanguage() will return "anglais"
       | 
       | https://developer.android.com/reference/java/util/Locale#get...()
       | 
       | ----
       | 
       | This then proceeds to create a locale with a corrupt language
       | code. The language parameter of `Locale` is not validated
       | (besides a null check), and using a corrupt Locale in
       | `Locale.setDefault()` will cause a number of bugs
        
         | canucker2016 wrote:
         | I agree. Locale.getLanguage() returns the language code.
         | 
         | The specific Locale() constructor used in the code is
         | documented as taking a parameter: "An ISO 639 alpha-2 or
         | alpha-3 language code, or a language subtag up to 8 characters
         | in length. See the Locale class description about valid
         | language values."
         | 
         | I'd guess the app code lucked out on the string returned from
         | getDisplayLanguage() getting mapped to the appropriate language
         | in all the other cases that the app supports.
         | 
         | Now if getLanguage() had been renamed to getLanguageCode() then
         | maybe a dev might have had the light bulb turn on, but random
         | string returned from an API passed to another API passes the
         | compile test and the app doesn't blow up, so we're good to
         | go...
         | 
         | Searching for the offending incorrect code on the web doesn't
         | return any matching "sample" code, so I'm not sure where the
         | app got the bug from.
        
           | marcellus23 wrote:
           | For me the light bulb here is the word `display`. The method
           | is called `getDisplayLanguage` because it's returning the
           | language name for display to the user. Even if I didn't
           | already know that, I'd be asking myself as I was typing the
           | code: "wait, what does 'display' mean here? Why would they
           | name it that? I'd better check the docs..."
        
           | ryandrake wrote:
           | Would stronger typing help here?
           | 
           | If I were to design those APIs, I'd maybe suggest a more
           | opaque type called "LanguageCode" that is NOT a String and
           | could not be returned from or passed as a String. Then
           | getLanguage() could return a LanguageCode type, and the
           | Locale() constructor could take the LanguageCode type as its
           | parameter, and then tooling could enforce that you can't mix
           | LanguageCode with String.
           | 
           | Then, calling Locale's constructor with the return value from
           | getDisplayLanguage() would produce an easy-to-find error.
        
       | zaldih wrote:
       | Thank you very much for your comments guys! I am definitely
       | learning a lot.
       | 
       | I was encouraged to write the post just in case someone happened
       | to have the same problem, hopefully ended up on the site and
       | could find some light here.
       | 
       | Now thanks to you I realise that I made the mistake of using
       | getDisplayLanguage() instead of getLanguage(). The former returns
       | the text of the language. The second one returns the universal
       | code of the language.
       | 
       | As I said at the end, I'm not an Android developer and I guess in
       | my haste and tiredness I settled for the first thing that seemed
       | to work without much thought to good practice or whether I was
       | actually getting the language code or not. A very silly bug but
       | as it worked most of the time for some reason, it was almost
       | impossible to find the first time. And of course, snipped was a
       | simplification of a more complex system with persistence handling
       | and so on, which made it not so obvious.
       | 
       | I certainly think all of us here won't make this mistake in the
       | future :)
        
       ___________________________________________________________________
       (page generated 2024-05-06 23:01 UTC)