[HN Gopher] A study of data collection by Android devices
       ___________________________________________________________________
        
       A study of data collection by Android devices
        
       Author : dede4metal
       Score  : 80 points
       Date   : 2021-10-15 09:37 UTC (13 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | tmsbrg wrote:
       | I was curious about LineageOS, so I checked and found:
       | 
       | "On all of the other handsets the Google Play Services and Google
       | Play store system apps send a considerable volume of data to
       | Google, the content of which is unclear, not publicly documented
       | and Google confirm there is no opt out from this data collection.
       | LineageOS collects no data beyond this data collected by Google
       | and so is perhaps the next most private choice after /e/OS."
       | 
       | So the problem is Play Services and the Play store. Note that
       | /e/OS uses MicroG[0] to replace Play Services so you can still
       | use the many Android apps that require it. It's a really cool
       | project and I think it's amazing how someone is like "Damn, I
       | need a Google account for this app? I'll just write my own
       | account manager to replace Google's!". Really that's the spirit
       | of open source, and I hope more people become empowered to be
       | able to solve problems that way in the future.
       | 
       | I think you can also use MicroG rather than OpenGapps on
       | LineageOS, though I haven't tried and haven't read anything about
       | it. Does anyone have some info about this setup?
       | 
       | [0] https://edevelopers-blog.medium.com/microg-what-you-need-
       | to-...
        
         | rashil2000 wrote:
         | You can find all the required info here -
         | 
         | https://lineage.microg.org
        
         | jqpabc123 wrote:
         | _Does anyone have some info about this setup?_
         | 
         | Yes, you can use MicroG with LineageOS. If this totally solved
         | the privacy issue, e/OS would be pointless.
         | 
         | GApps is a big upper level privacy problem that microG can
         | solve but this is not the only issue. By default, AOSP itself
         | also sends personally identifying info to Google servers
         | through low level system calls. The e/OS fork is needed in
         | order to remove this.
        
           | chuckee wrote:
           | > By default, AOSP itself also sends personally identifying
           | info to Google servers through low level system calls.
           | 
           | Can you explain more about this? What kind of information is
           | sent? And does LineageOS not disable this?
        
             | jqpabc123 wrote:
             | No, LineageOS does not disable this.
             | 
             | Personal opinion but the LineageOS project seems more
             | concerned with security than privacy. The idea of Google
             | hijacking your privacy for profit doesn't seem to really
             | bother them too much. Their web site and wiki repeatedly
             | address security but rarely privacy.
             | 
             | As for explaining further, I will defer to the document
             | below from the e foundation.
             | 
             | https://e.foundation/wp-content/uploads/2020/09/e-state-
             | of-d...
        
         | aaaxyz wrote:
         | By default LineageOS comes with neither Google Play Services
         | nor microg. The choice of what to install is left to the user.
         | 
         | MicroG is a bit more complicated to install since it requires
         | package signature spoofing, which official lineage builds don't
         | enable since it's seen as a security vulnerability.
        
       | 9387367 wrote:
       | Good discussion of /e/ here:
       | https://community.e.foundation/t/divestos-vs-e-os-security-a...
        
       | zibzab wrote:
       | My thoughts on this. TLDR version: some are probably innocent but
       | some are worse than presented here.
       | 
       | > Key findings from the study:
       | 
       | >
       | 
       | > With the exception of e/OS, all of the handset manufacturers
       | examined collect a list of all the apps installed on a handset.
       | This is potentially sensitive information since it can reveal
       | user interests, e.g., a mental health app, a Muslim prayer app, a
       | gay dating app, a Republican news app. There is no opt out from
       | this data collection.
       | 
       | This happens when looking for updates anyway
       | 
       | > The Xiaomi handset sends details of all the app screens viewed
       | by a user to Xiaomi, including when and how long each app is
       | used. This reveals, for example, the timing and duration of phone
       | calls. The effect is akin to the use of cookies to track people's
       | activity as they move between web pages. This data appears to be
       | sent outside Europe to Singapore.
       | 
       | Could this be "standard" app analytics? Not saying it's okay, but
       | this is the norm these days.
       | 
       | > On the Huawei handset the Swiftkey keyboard sends details of
       | app usage over time to Microsoft. This reveals, for example, when
       | a user is writing a text, using the search bar, searching for
       | contacts.
       | 
       | Your custom wordlist is on the cloud with them, so they can see
       | much more than that.
       | 
       | > Samsung, Xiaomi, Realme and Google collect long-lived device
       | identifiers, e.g., the hardware serial number, alongside user-
       | resettable advertising identifiers. This means that when a user
       | resets an advertising identifier the new identifier value can be
       | trivially re-linked back to the same device, potentially
       | undermining the use of user-resettable advertising identifiers.
       | 
       | This is probably a major GDPR issue.
       | 
       | Edit: could also be for guarantee reasons. The big question is if
       | this is ever used for advertising.
       | 
       | > Third-party system apps, e.g., from Google, Microsoft, LinkedIn
       | and Facebook, are pre-installed on most of the handsets and
       | silently collect data, with no opt out.
       | 
       | This is horrible!! I specifically avoid using any Facebook
       | services and we all know about their shadow profiles for users
       | who don't own a FB account.
       | 
       | (But what kind of data do the apps have access to? In theory they
       | are never used and have no privileges?)
       | 
       | > There may exist a data ecosystem where data collected from a
       | handset by different companies is shared/linked. Notably, the
       | privacy focused e/OS variant of Android was observed to transmit
       | essentially no data.
       | 
       | We need more openness here.
        
         | retSava wrote:
         | > but this is the norm these days
         | 
         | Norm as in normal, but I'd not say norm as in what most people
         | expect and would accept if they knew.
         | 
         | And norms change, let's not accept a "normal" as the de facto
         | "this is how it should be" but instead work towards a better
         | norm.
         | 
         | That's the hard part, the easy part of how to do this is left
         | as an exercise for the reader.
        
         | illwrks wrote:
         | Wow. As a Xiaomi phone user, and having not read the article
         | yet, does it mention if that level of tracking happens on
         | Global Rom (Android One) versions of their phones?
        
           | gpas wrote:
           | I loved my phone until I looked at nextdns logs and noticed
           | it's reaching out to Xiaomi owned servers roughly every 30
           | minutes. It's definitely tracking something. Now nextdns has
           | a dedicated Xiaomi block list so I should be ok, but who
           | knows?
           | 
           | My bad for not checking the lineageos compatibility list
           | before buying an unsupported device.
           | 
           | Global MIUI 12.5.3 on a note 8 pro.
        
         | rhn_mk1 wrote:
         | > This happens when looking for updates anyway
         | 
         | But it doesn't have to. My Debian system does not send a list
         | of installed packages in order to get updates. It queries the
         | list of available ones.
        
         | black3r wrote:
         | > This happens when looking for updates anyway
         | 
         | I would still expect to have an opt-out of "looking for updates".
         | Especially since this study suggests that this data is not
         | collected by Play Store (Google) only, but also by the device
         | manufacturer (possibly by their own store app which nobody
         | really uses)
        
       | srg0 wrote:
       | LWN is a nice site, but to save you a couple of clicks, this is
       | the original post by Trinity College Dublin:
       | 
       | https://www.tcd.ie/news_events/articles/study-reveals-scale-...
       | 
       | And this is the paper it talks about (PDF):
       | 
       | https://www.scss.tcd.ie/Doug.Leith/Android_privacy_report.pd...
       | 
       | "Key findings from the study:
       | 
       | - With the exception of e/OS, all of the handset manufacturers
       | examined collect a list of all the apps installed on a handset.
       | This is potentially sensitive information since it can reveal
       | user interests, e.g., a mental health app, a Muslim prayer app, a
       | gay dating app, a Republican news app. There is no opt out from
       | this data collection.
       | 
       | - The Xiaomi handset sends details of all the app screens viewed
       | by a user to Xiaomi, including when and how long each app is
       | used. This reveals, for example, the timing and duration of phone
       | calls. The effect is akin to the use of cookies to track people's
       | activity as they move between web pages. This data appears to be
       | sent outside Europe to Singapore.
       | 
       | - On the Huawei handset the Swiftkey keyboard sends details of
       | app usage over time to Microsoft. This reveals, for example, when
       | a user is writing a text, using the search bar, searching for
       | contacts.
       | 
       | - Samsung, Xiaomi, Realme and Google collect long-lived device
       | identifiers, e.g., the hardware serial number, alongside user-
       | resettable advertising identifiers. This means that when a user
       | resets an advertising identifier the new identifier value can be
       | trivially re-linked back to the same device, potentially
       | undermining the use of user-resettable advertising identifiers.
       | 
       | - Third-party system apps, e.g., from Google, Microsoft, LinkedIn
       | and Facebook, are pre-installed on most of the handsets and
       | silently collect data, with no opt out.
       | 
       | - There may exist a data ecosystem where data collected from a
       | handset by different companies is shared/linked. Notably, the
       | privacy focused e/OS variant of Android was observed to transmit
       | essentially no data."
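
The identifier finding quoted above (a long-lived hardware serial sent alongside a resettable advertising ID) can be illustrated with a minimal sketch. It assumes a collector that simply logs (serial, ad_id) pairs; the function name, the serial values and the ad IDs are all hypothetical and are not taken from the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def link_ad_ids(events: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group every advertising ID seen under the stable serial it arrived with."""
    by_serial: Dict[str, List[str]] = defaultdict(list)
    for serial, ad_id in events:
        # Record each ad ID once per device, in the order it was first seen.
        if ad_id not in by_serial[serial]:
            by_serial[serial].append(ad_id)
    return dict(by_serial)


# Hypothetical log: device SERIAL-A resets its ad ID from ad-1111 to ad-3333.
events = [
    ("SERIAL-A", "ad-1111"),
    ("SERIAL-B", "ad-2222"),
    ("SERIAL-A", "ad-3333"),
    ("SERIAL-A", "ad-1111"),
]
linked = link_ad_ids(events)
print(linked["SERIAL-A"])  # both ad IDs end up under the same device
```

Because the serial never changes, the reset produces a second ad ID under the same key rather than a fresh, unlinkable device, which is the re-linking risk the finding describes.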
        
         | nervuri wrote:
         | > - With the exception of e/OS, all of the handset
         | manufacturers examined collect a list of all the apps installed
         | on a handset.
         | 
         | /e/OS is no exception. I looked at the requests made by its
         | "Apps" app. Every time it checks for updates, it tells the
         | server what applications you have installed. These requests are
         | made with a User-Agent header revealing your device model,
         | build ID and Android version. Installed languages are also sent
         | via the Accept-Language header. And there is no option to
         | disable update checks; the closest you can get is to set the
         | interval to monthly.
         | 
         | Contrast that with F-Droid, which downloads the package index
         | in advance (like apt does), so it doesn't need to send the
         | server a list of installed apps in order to check for updates.
        
         | smoldesu wrote:
         | I am curious, do iPhones not send a list of opened apps (a la
         | MacOS) back to Apple periodically? I was under the impression
         | that most phone vendors would collect statistics like that.
        
       | dartharva wrote:
       | >We find that the Samsung, Xiaomi, Huawei and Realme Android
       | variants all transmit a substantial volume of data to the OS
       | developer (i.e. Samsung etc) and to third parties that have
       | pre-installed system apps (including Google, Microsoft, Heytap,
       | LinkedIn, Facebook).
       | 
       | Of course they do. That's the whole reason they're selling high-
       | capacity hardware for cheap, they more than make up for their
       | foregone profits from user data and third party partnerships.
       | 
       | That's why you should always flash a custom ROM whenever you buy
       | a "value for money" Android phone; never stay on the vendor's OS.
       | Thankfully, except Samsung and Huawei, most other Android device
       | manufacturers aren't actively working on locking down their
       | firmware against customization and appear tolerant as of yet. You
       | can even choose not to install Google services on your phone,
       | although it would make using it normally a hassle.
        
       | gigel82 wrote:
       | I wish someone did this for Windows. Couldn't find anything so
       | started setting up myself (using 2 VirtualBox VMs, internal
       | networking and mitmproxy).
       | 
       | I can see the data with that setup, but it's way too much to
       | parse by a (single) human. I'll collect just the URLs for now, to
       | at least update my PiHole config to block what isn't needed for
       | Windows Update.
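
One way the "collect just the URLs" step could look, as a sketch: reduce the captured URLs to unique hostnames, skip anything on an allow list, and print hosts-style lines for a DNS blocker. The URLs, the allow list and the function name below are hypothetical; which hosts Windows Update really needs is not stated in the thread.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def hosts_to_block(urls: Iterable[str], keep_suffixes: Iterable[str]) -> List[str]:
    """Return sorted unique hostnames from urls, skipping allow-listed domains."""
    keep = list(keep_suffixes)
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased, port stripped
        if host is None:
            continue
        # Keep a host if it equals an allow-listed domain or is a subdomain of one.
        if any(host == s or host.endswith("." + s) for s in keep):
            continue
        hosts.add(host)
    return sorted(hosts)


# Hypothetical URLs as they might appear in a proxy capture.
captured = [
    "https://update.example.com/v1/check",
    "https://telemetry.example.net/collect?id=1",
    "https://TELEMETRY.example.net/ping",
    "http://ads.example.org:8080/x",
]
blocked = hosts_to_block(captured, keep_suffixes=["update.example.com"])
for host in blocked:
    print(f"0.0.0.0 {host}")
```

Working at the hostname level discards paths and query strings, which keeps the list small but cannot separate wanted and unwanted traffic served from the same host.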
        
       | jccalhoun wrote:
       | So what is this data for? Because the ads I get are still nearly
       | completely irrelevant. Last week youtube showed me ads in Spanish
       | which I have zero knowledge of and ads for a company in an
       | entirely different state. If they can't use all this data to know
       | that I don't understand Spanish or what state I live in then what
       | good is all this data?
        
         | lopis wrote:
         | Like in all analytics, not all data collected is used. In fact,
         | probably 99% of data ever collected is never used.
         | Companies just want to preemptively collect it in case they
         | need it in the future.
        
           | marginalia_nu wrote:
           | A lot of this seems to be a mechanism to soothe anxieties
           | about whether you are on the right track when taking a risk.
           | Feels good to have a nice graph that points upward. In the
           | past you would have seen an oracle or an astrologer to get
           | reassurance.
           | 
           | People do it as well, they gather tons of statistics on
           | themselves like how many steps they've walked and how many
           | glasses of water they've had and how many hours they've
           | slept, which for the most part is completely non-actionable
           | information and your body is much better at telling you if
           | you are feeling well rested than your spreadsheet is. You get
           | a pretty graph for sure, but it ultimately doesn't say
           | anything you didn't already know.
           | 
           | I guess you could convince yourself this is science, but in
           | science the hypothesis precedes the experiment. If you gather
           | tons of arbitrary data and go digging for correlations, you
           | will find them, but anything interesting you find is most
           | likely going to be spurious relationships and other
           | statistical aberrations.
           | 
           | It's data dredging, not science.
        
         | zibzab wrote:
         | It could very much be that this data is not used for
         | advertising and is mostly just really horribly implemented dev
         | analytics.
         | 
         | Having read the report, I can't find any smoking guns about
         | _uses_ of the data. But we really don't know at this point.
        
           | jqpabc123 wrote:
           | _But we really don't know at this point._
           | 
           | We do know ... or at least we should.
           | 
           | Google has said it receives tens of thousands of "geofence"
           | and "keyword" warrants each year looking to identify anyone
           | within a certain geographic area at a particular time or
           | anyone who searched for a particular keyword.
           | 
           | There are 3 pertinent points here:
           | 
           | 1) Google can absolutely identify you personally; otherwise,
           | the warrants would be useless.
           | 
           | 2) The authorities are searching info from your phone without
           | probable cause (aka "fishing").
           | 
           | 3) Innocent people have been convicted for being in the wrong
           | place at the wrong time.
           | 
           | https://techcrunch.com/2021/08/19/google-geofence-warrants/
        
         | hulitu wrote:
         | It is not used only for ads. It is also sold to other companies
         | and 3 letter agencies.
        
       | greenyoda wrote:
       | Big discussion of the original source a few days ago:
       | https://news.ycombinator.com/item?id=28830328
        
       ___________________________________________________________________
       (page generated 2021-10-15 23:02 UTC)