Reprinted from TidBITS by permission; reuse governed by Creative Commons license BY-NC-ND 3.0. TidBITS has offered years of thoughtful commentary on Apple and Internet topics. For free email subscriptions and access to the entire TidBITS archive, visit http://www.tidbits.com/

Former Apple Engineer: Here's Why I Trust Apple's COVID-19 Notification Proposal

David Shayer

We all use apps. We know they capture information about us. But exactly how much information? I've worked as a software engineer at Apple and at a mid-sized tech company. I've seen the good and the bad. And my experience at Apple makes me far more comfortable with the system Apple and Google have proposed for COVID-19 exposure notification. Here's why.

Apple Respects User Privacy

When I worked on the Apple Watch, one of my assignments was to record how many times the Weather and Stocks apps were launched and report that back to Apple. Recording how many times each app is launched is simple. But reporting that data back to Apple is much more complex.

Apple emphasizes that its programmers should keep customer security and privacy in mind at all times. There are a few basic rules, the two most relevant of which are:

* Collect information only for a legitimate business purpose
* Don't collect more information than you need for that purpose

That second one could use a little expansion. If you're gathering general usage data (how often do people check the weather?), you can't accidentally collect something that could identify the user, like the city they're looking up.

I didn't realize how tightly Apple enforces these rules until I was assigned to record user data. Once I had recorded how many times the Weather and Stocks apps were launched, I set up Apple's internal framework for reporting data back to the company. My first revelation was that the framework strongly encouraged you to transmit back numbers, not strings (words). By not reporting strings, your code can't inadvertently record the user's name or email address. You're specifically warned not to record file paths, which can include the user's name (such as /Users/David/Documents/MySpreadsheet.numbers). You also aren't allowed to play tricks like encoding letters as numbers to send back strings (like A = 65, B = 66, etc.). There's a rough sketch of what that kind of interface looks like at the end of this section.

Next, I learned I couldn't check my code into Apple's source control system until the privacy review committee had inspected and approved it. This wasn't as daunting as it sounds. A few senior engineers wanted a written justification for the data I was recording and for the business purpose. They also reviewed my code to make sure I wasn't accidentally recording more than intended. Once I had been approved to use Apple's data reporting framework, I was allowed to check my code into the source control system. If I had tried to check my code into source control without approval, the build server would have refused to build it.

When the next beta build of watchOS came out, I could see on our reporting dashboard how many times the Weather and Stocks apps were launched each day, listed by OS version. But nothing more. Mission accomplished, privacy maintained.
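Apple's internal reporting framework isn't public, so what follows is only a rough, hypothetical sketch of the shape of a numbers-only interface like the one described above. Every name in it is invented; the point is simply that the API accepts a pre-approved metric and an integer count, and nothing else.

    import Foundation

    // Hypothetical sketch only; these names are invented, not Apple's internal API.
    // Because the interface accepts just a pre-approved metric and an integer count,
    // there is no way to sneak in a name, email address, or file path.
    enum ApprovedMetric: String {
        case weatherAppLaunch = "weather.app.launch"
        case stocksAppLaunch  = "stocks.app.launch"
    }

    struct MetricsReporter {
        func report(_ metric: ApprovedMetric, count: Int) {
            // A real system would queue this and upload it in aggregate,
            // tagged only with the OS version.
            print("queued \(metric.rawValue) = \(count)")
        }
    }

    let reporter = MetricsReporter()
    reporter.report(.weatherAppLaunch, count: 3)   // no user-supplied strings ever leave the device

Because the only string in the payload comes from a fixed enumeration, a reviewer can verify at a glance that no user data can leak through it.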
TechCo Largely Ignores User Privacy

I also wrote iPhone apps for a mid-size technology company that shall remain nameless. You've likely heard of it, though, and it has several thousand employees and several billion dollars in revenue. Call it TechCo, in part because its approach to user privacy is unfortunately all too common in the industry. It cared much less about user privacy than Apple.

The app I worked on recorded every user interaction and reported that data back to a central server. Every time you performed some action, the app captured what screen you were on and what button you tapped. There was no attempt to minimize the data being captured, nor to anonymize it. Every record sent back included the user's IP address, username, real name, language and region, timestamp, iPhone model, and lots more.

Keep in mind that this behavior was in no way malicious. The company's goal wasn't to surveil its users. Instead, the marketing department just wanted to know what features were most popular and how they were used. Most important, the marketers wanted to know where people fell out of the "funnel."

When you buy something online, the purchase process is called a funnel. First, you look at a product, say a pair of sneakers. You add the sneakers to your shopping cart and click the buy button. Then you enter your name, address, and credit card, and finally, you click Purchase. At every stage of the process, people fall out. They decide they don't really want to spend $100 on new sneakers, or their kids run in to show them something, or their spouse tells them that dinner is ready. Whatever the reason, they forget about the sneakers and never complete the purchase. It's called a funnel because it narrows like a funnel, with fewer people successfully progressing through each stage to the end.

Companies spend a lot of time figuring out why people fall out at each stage in the funnel. Reducing the number of stages reduces how many opportunities there are to fall out. For instance, remembering your name and address from a previous order and auto-filling it means you don't have to re-enter that information, which reduces the chance that you'll fall out of the process at that point. The ultimate reduction is Amazon's patented 1-Click ordering. Click a single button, and those sneakers are on their way to you.

TechCo's marketing department wanted more data on why people fell out of the funnel, which they would then use to tune the funnel and sell more product. Unfortunately, they never thought about user privacy as they collected this data. Most of the data wasn't collected by code that we wrote ourselves, but by third-party libraries we added to our app. Google Firebase is the most popular library for collecting user data, but there are dozens of others. We had a half-dozen of these libraries in our app. Even though they provided roughly similar features, each collected some unique piece of data that marketing wanted, so we had to add it.

The data was stored in a big database that was searchable by any engineer. This was useful for verifying our code was working as intended. I could launch our app, tap through a few screens, and look at my account in the database to make sure my actions were recorded correctly. However, the database hadn't been designed to compartmentalize access: everyone with any access could view all the information in it. I could just as easily look up the actions of any of our users. I could see their real names and IP addresses, when they logged on and off, what actions they took, and what products they paid for.

Some of the more senior engineers and I knew this was bad security, and we told TechCo management that it should be improved. Test data should be accessible to all engineers, but production user data shouldn't be. Real names and IP addresses should be stored in a separate secure database; the general database should key off non-identifying user IDs, as sketched below.
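Here's a minimal sketch of the separation we were proposing, with invented types standing in for TechCo's real schema: the broadly searchable store holds events keyed only by an opaque ID, and the mapping from that ID back to a real person lives in a separate, tightly restricted store.

    import Foundation

    // A minimal sketch of the proposed separation; these types are invented,
    // not TechCo's actual schema.

    // Broad-access analytics store: events carry only an opaque, random ID.
    struct AnalyticsEvent {
        let opaqueUserID: UUID     // random, not derived from any personal data
        let screen: String
        let action: String
        let timestamp: Date
    }

    // Restricted store: the only place the opaque ID maps back to a person.
    struct IdentityRecord {
        let opaqueUserID: UUID
        let realName: String
        let ipAddress: String
    }

    // An engineer debugging the funnel sees only events like this one; joining
    // them back to an IdentityRecord would require separately audited access.
    let event = AnalyticsEvent(opaqueUserID: UUID(),
                               screen: "Checkout",
                               action: "tapPurchase",
                               timestamp: Date())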
Data that's not needed for a specific business purpose shouldn't be collected at all. But marketing preferred the kitchen sink approach, hoovering up all available data. From a functional standpoint, the marketers weren't being entirely unreasonable, because that extra data allowed them to go back and answer questions about user patterns they hadn't thought of when we wrote the app. But just because something can be done doesn't mean it should be done. Our security complaints were ignored, and we eventually stopped complaining.

The app hadn't been released outside the US when I worked on it. It probably isn't legal under the European General Data Protection Regulation (also known as GDPR; see Geoff Duncan's article "Europe's General Data Protection Regulation Makes Privacy Global"[1], 2 May 2018). I presume it will be modified before TechCo releases it in Europe. The app also doesn't comply with the California Consumer Privacy Act (CCPA)[2], which aims to allow California residents to know what data is being collected and control its use in certain ways. So it may be changing in a big way to accommodate GDPR and CCPA soon.

Privacy Is Baked into the COVID-19 Exposure Notification Proposal

With those two stories in mind, consider the COVID-19 exposure notification technology proposed by Apple and Google. This proposal isn't about explicit contact tracing: it doesn't identify you or anyone with whom you came in contact. (My explanation below is based on published descriptions, such as Glenn Fleishman's article "Apple and Google Partner for Privacy-Preserving COVID-19 Contact Tracing and Notification"[3], 10 April 2020. Apple and Google have continued to tweak elements of the project; read that article's comments for major updates. Glenn has also received ongoing briefing information from the Apple/Google partnership, and he vetted this retelling.)

The current draft of the proposal has a very Apple privacy-aware feel. Participation in both recording and broadcasting information is opt-in, as is your choice to report if you receive a positive COVID-19 diagnosis.

Your phone doesn't broadcast any personal information about you. Instead, it creates a Bluetooth beacon with a unique ID that can't be traced back to you. The ID is derived from a random diagnosis encryption key that is generated fresh every 24 hours and stored only on your phone. Even that ID isn't trackable: it changes every 15 minutes, so it can't be used by itself to identify your phone. Only the last 14 keys (14 days' worth) are retained. Your phone records all identifiers it picks up from other phones in your vicinity, but not the location where it recorded them. The list of Bluetooth IDs you've encountered is stored on your phone, not sent to a central server. (Apple and Google confirmed recently that they won't approve any app that uses this contact-notification system and also records location[4].)

If at some point you test positive for COVID-19, you then use a public health authority app that can interact with Apple and Google's framework to report your diagnosis. You will likely have to enter a code or other information to validate the diagnosis, to keep the apps from being used for fake reports, which would cause unnecessary trouble and undermine confidence in the system. When the app confirms your diagnosis, it triggers your phone to upload as many as the last 14 days of daily encryption keys to the Apple- and Google-controlled servers, although fewer might be uploaded based on when exposure could have occurred.

If you have the service turned on, your phone constantly downloads any daily diagnosis keys that confirmed people's devices have posted. Your phone then performs cryptographic operations to see if it can match IDs derived from each key against any Bluetooth identifiers captured during the same period covered by the key. If so, you were in proximity and will receive a notification. (Proximity is a complicated question because of Bluetooth's range and the way devices that are far apart can measure as close together.) Even without an app installed, you will receive a message from the smartphone operating system; with an app, you receive more detailed instructions. The sketch below shows the overall shape of this key scheme and matching step.
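Here's a simplified sketch of that structure, assuming the behavior described above and in the published proposal. It substitutes a plain SHA-256 hash for the key-derivation functions the actual specification uses, and every name in it is invented, so read it as an illustration of the shape of the scheme rather than the real protocol.

    import Foundation
    import CryptoKit

    // Simplified illustration of the structure described above, not the real
    // Exposure Notification crypto. SHA-256 stands in for the spec's actual
    // key-derivation functions, and all names are invented.
    struct DailyKey {
        let dayNumber: Int      // days since some epoch
        let keyData: Data       // 16 random bytes, generated on-device

        init(dayNumber: Int) {
            self.dayNumber = dayNumber
            self.keyData = SymmetricKey(size: .bits128).withUnsafeBytes { Data($0) }
        }

        init(dayNumber: Int, keyData: Data) {   // e.g., a diagnosis key downloaded from the server
            self.dayNumber = dayNumber
            self.keyData = keyData
        }

        // One rolling identifier per 15-minute interval (96 per day). The beacon
        // broadcasts this value, which changes every interval and can't be tied
        // back to the phone without the daily key.
        func rollingID(interval: Int) -> Data {
            var input = keyData
            input.append(contentsOf: withUnsafeBytes(of: Int64(interval)) { Data($0) })
            return Data(SHA256.hash(data: input).prefix(16))
        }

        func allRollingIDs() -> [Data] {
            (0..<96).map { rollingID(interval: $0) }
        }
    }

    // Keep only the last 14 days of keys on the phone.
    func prune(_ keys: [DailyKey], today: Int) -> [DailyKey] {
        keys.filter { today - $0.dayNumber < 14 }
    }

    // Matching: given daily keys downloaded from the server (uploaded by people
    // who reported a positive diagnosis) and the beacon IDs this phone observed,
    // re-derive each key's rolling IDs and look for an overlap.
    func exposureDetected(diagnosisKeys: [DailyKey], observedBeacons: Set<Data>) -> Bool {
        for key in diagnosisKeys {
            for id in key.allRollingIDs() where observedBeacons.contains(id) {
                return true
            }
        }
        return false
    }

The privacy properties fall out of this structure: the beacon value changes every interval, the daily keys never leave the phone unless you report a diagnosis, and the matching happens entirely on the receiving phone.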
At no time does the server know anyone's name or location, just a set of randomly generated encryption keys. You don't even get the exact Bluetooth beacons, which might let someone identify you in public spaces. In fact, your phone never sends any data to the server unless you prove to the app that you tested positive for COVID-19. Even if a hacker or overzealous government agency were to take over the server, they couldn't identify the users. Because your phone dumps all keys over 14 days old, even cracking your phone would reveal little long-term information.

In reality, there would be more than one server, and the process is more complicated. This is a broad outline that shows how Apple and Google are building privacy in from the very beginning to avoid the kinds of mistakes made by TechCo.

Apple claims to respect user privacy, and my experience indicates that's true. I'm much more willing to trust a system developed by Apple than one created by any other company or government. It's not that another company or government would be trying to abuse user privacy; it's just that outside of Apple, too many organizations either lack the understanding of what it means to bake privacy in from the start or have competing interests that undermine efforts to do the right thing.

References

1. https://tidbits.com/2018/05/02/europes-general-data-protection-regulation-makes-privacy-global/
2. https://en.wikipedia.org/wiki/California_Consumer_Privacy_Act
3. https://tidbits.com/2020/04/10/apple-and-google-partner-for-privacy-preserving-covid-19-contact-tracing-and-notification/
4. https://www.reuters.com/article/us-health-coronavirus-usa-apps/apple-google-ban-use-of-location-tracking-in-contact-tracing-apps-idUSKBN22G28W