[HN Gopher] Data exfiltration in Keepa Price Tracker
       ___________________________________________________________________
        
       Data exfiltration in Keepa Price Tracker
        
       Author : taxyovio
       Score  : 57 points
       Date   : 2021-08-03 09:57 UTC (13 hours ago)
        
 (HTM) web link (palant.info)
 (TXT) w3m dump (palant.info)
        
       | mrsaint wrote:
       | And not sure if Amazon would agree to this as it essentially
       | threatens the privacy and integrity of their users.
       | Interestingly, Keepa is also an Amazon Affiliate, so they are in
       | a direct business relationship with Amazon.
        
         | patd wrote:
         | As far as I know, Keepa is not an Amazon affiliate. They used
         | to be and got kicked out like many similar tools around 5 years
         | ago.
         | 
         | They moved to the current model of providing an API for Amazon
         | data (which seems to use the extension's users to scrape data).
        
         | avipars wrote:
         | They actively warned about Honey Security Issues, but haven't
         | mentioned Keepa at all.
        
       | a254613e wrote:
       | I can't quite understand this article and its conclusion.
       | 
       | The article says: "[The extension] will collect information about
       | the products you look at and the ones you search for".
       | 
       | Yet, two sentences later it says "The company behind the
       | extension fails to comply with its legal obligations. The privacy
       | policy is misleading in claiming that no personal data is being
       | collected."
       | 
       | So which personal information is exactly included in the data
       | submitted to their servers about the products? Because in that
       | json example I don't see anything that would be even close to
       | personal information.
       | 
       | The remote scraping/execution abilities are not great, I'll give
       | it that. But the rest of it seems like an overblown conclusion and
       | interpretation of how it works.
        
         | Semaphor wrote:
         | I'd assume that "products you searched for", even if only
         | implicitly thanks to the results, is personal information. It
         | also is not mentioned in their privacy policy, which only
         | mentions sending on product pages.
        
         | palant wrote:
         | _Note_ : I am the author of the article above.
         | 
         | The history of all Amazon products you looked at or searched
         | for _is_ personal data, and it can tell a lot about you.
         | Whether it is also personal data in the legal sense is not
         | something I can say for sure. But it definitely has to be
         | properly covered in the privacy policy, for GDPR compliance at
         | the very least.
        
           | timdorr wrote:
           | But it is not personal data that would identify you (PII). If
           | someone was able to determine who I was based solely on my
           | browsing activity on Amazon, then they've already obtained my
           | personal information.
        
             | palant wrote:
             | No, it isn't PII in the legal sense, it doesn't allow
             | identifying you directly. Which doesn't mean that it cannot
             | be tied to your identity. Just one example: if you
             | regularly post to social media what you bought online, this
             | information could be correlated with the Keepa data to find
             | out which profile is likely yours and what else you looked
             | at.
             | 
             | But GDPR doesn't merely require you to disclose
             | collection of PII, but rather all data collected. There is
             | a good reason for that.
        
             | iamacyborg wrote:
             | PII is not a term that is used in the GDPR. The person
             | you're replying to is correct that your browsing data is
             | likely to count as personal data given that it's linked to
             | an individual.
        
       | NazakiAid wrote:
       | I use Keepa basic and it has saved me a ton of money. I always
       | just assumed it was scraping the prices from pages I visit, but I
       | didn't know it would automatically fetch Amazon pages in the
       | background. Might just sign out of Amazon, and use a separate
       | browser to purchase from it.
       | 
       | Either way, I have some thinking to do on if I should "keepa" it
       | or not (sorry really bad joke). Maybe I should purposely turn a
       | blind eye and just trust they aren't going to do anything evil
       | nor have some privacy risk due to how useful it is.
        
         | wheels wrote:
         | I use the Keepa website and never realized before this article
         | that they even have browser plugins. On the website you can set
         | up price alerts that go out via email or Telegram. That works
         | well enough for me.
        
           | NazakiAid wrote:
           | I would do that but it's very helpful to also see how often
           | the price changes and goes on sale to know if I am getting
           | "ripped off".
        
             | wheels wrote:
             | There's a price graph on their website showing the price
             | development over time.
        
         | SCNP wrote:
         | Isn't this always the trade-off? While I do appreciate useful
         | software, it gets tiring that it's almost always at the expense
         | of a little bit of privacy or tracking. Seems like the death of
         | a thousand cuts of our anonymity online. Although, I don't
         | really harbor illusions that we (at least Americans) haven't
         | been tracked since the invention of the credit card. I guess
         | I'm a little jaded at this point as there doesn't seem to be
         | anything I, personally, can do about it and I get a touch of
         | FOMO when I hear about the capabilities of the latest and
         | greatest apps. I understand that data collection is inherently
         | necessary for AI, I just don't like who's in charge of it and
         | making the innovations.
        
       | danpalmer wrote:
       | Wow, they've built a distributed Amazon listing scraping system -
       | essentially a botnet.
       | 
       | As someone who has done a lot of web scraping and had to route
       | around a lot of blocking (we have business contracts to allow
       | scraping, but they don't stop over-eager sysadmins), this feels
       | like a dream come true.
       | 
       | But I'd never actually want to use this for scraping and I'm not
       | sure any informed user would agree to use this.
        
         | voltagex_ wrote:
         | How do you get contracts to allow scraping? What kind of cost
         | are we talking about?
        
           | dewey wrote:
           | Some companies want you to list their products on your page
           | (usually with some kind of affiliate deal attached) but don't
           | have a tech team to implement a feed or an API. In that case
           | you end up in a situation where you have to scrape the data
           | yourself with permission.
        
       | dna_polymerase wrote:
       | Do you remember the time when this weird German startup that
       | publishes an Adblocker tried to start an "Acceptable Ads" program
       | and extort money from Google? Guess what their CTO is up to now.
       | 
       | Exactly. Showing the world the shady business of browser plugins.
        
       | bkor wrote:
       | From the Keepa addon settings:
       | 
       | > Allow the add-on to gather Amazon prices to improve our price
       | data
       | 
       | I thought it was common knowledge that Keepa uses the addon to
       | gather prices. Though with GDPR it probably needs to be more
       | explicitly said.
        
         | Semaphor wrote:
         | There is a difference between gathering prices and loading
         | extra URLs to gather those prices. From that text, I would not
         | assume they are using my computer as a part of a botnet.
        
           | bkor wrote:
           | I knew it was doing that as well (the distributed scraping
           | the article talks about). But I cannot figure out where I
           | read it. Maybe they used to have it somewhere on their site,
           | and now it's gone?
           | 
           | What is strange is that people asked for e.g. Amazon.nl support.
           | This isn't implemented as Keepa relies on Amazon (this is
           | their answer in the forums). But if they scrape, why do they
           | still need Amazon?
        
         | palant wrote:
         | _Note_ : I am the author of the article above.
         | 
         | Nice, I didn't find this setting and I explicitly went looking
         | for it. So the settings in the "price history" graph don't
         | merely apply to the way this graph is shown. Now I need to
         | figure out what this setting is doing. Because I didn't see any
         | conditions in the code which were tied to this setting.
        
           | palant wrote:
           | Found it. This is the optOut_crawl setting and its handling
           | is entirely on the server side. So presumably if this setting
           | is set, the server will no longer send the extension any
           | instructions to scrape Amazon pages in the background. Mind you,
           | it still _could_ but it probably won't.
           | 
           | Scraping data from pages you visit shouldn't be affected by
           | this.
        
       | avipars wrote:
       | thanks! Uninstalled today!
       | 
       | As well as Honey and Keepa
        
       | wilde wrote:
       | > Unless of course you don't consider the information collected
       | here personal.
       | 
       | I don't. The author even goes out of their way to point out that
       | these requests aren't generated by the user and so there's no
       | latent interest information there. I agree that they should cover
       | this behavior in the privacy policy explicitly, but there's a
       | tone of moral outrage in this piece that seems unearned.
        
         | palant wrote:
         | _Note_ : I am the author of this article.
         | 
         | I'm really unsure how you would come to this conclusion. Even
         | if you only read the summary at the beginning or only the
         | conclusions section at the end, you should notice that Keepa is
         | doing both. It will extract data from your Amazon visits
         | (personal information) and do its own scraping (merely wasting
         | your bandwidth if implemented correctly which I am unconvinced
         | of).
        
           | 45ure wrote:
           | Thanks for the article.
           | 
           | I use this extension (and the app) regularly, which activates
           | as soon as I visit Amazon in a container tab. In addition to
           | providing in-depth statistics, features like alerts via
           | Telegram have helped me hunt down bargains. I have noticed
           | the increase in network requests and bandwidth when the tab
           | is active, using basic tracking via Resource Monitor (W10).
           | However, I can easily block it via uMatrix/uBO, if required.
           | In this case, it is a trade-off, which can be justified.
           | 
           | Also, Tracker Control (Android) for Keepa app reports
           | blocking just two trackers _Google Crashlytics_ and _Google
           | Firebase Analytics_ -- so it is not as bad as other apps.
           | 
           | I have used CamelCamelCamel in the past, which was more
           | egregious and aggressive in tracking users, but don't know
           | how it fares today.
           | 
           | https://camelcamelcamel.com/
        
             | palant wrote:
             | Unfortunately, it isn't that easy. You cannot use other
             | extensions to block requests happening on the extension's
             | background page. Whatever tracking and scraping is going
             | on, you can probably disable part of it via the extension's
             | settings but otherwise there is nothing you can do.
        
           | wilde wrote:
           | Thanks for engaging here. Maybe my reading comprehension is
           | poor, but here's the full quote that I was objecting to. It
           | comes after a long pull quote where Keepa promises to not log
           | the requests that do contain latent interest behavior:
           | 
           | > This refers to some pieces of the Keepa functionality but
           | it once again completely omits the data collection outlined
           | here. It's reassuring to know that they don't log product
           | identifiers when showing product history, but they don't need
           | to if on another channel their extension sends far more
           | detailed data to the server. This makes the first sentence,
           | formatted as bold text, a clear lie. Unless of course you
           | don't consider the information collected here personal. I'm
           | not a lawyer, maybe in the legal sense it isn't.
           | 
           | When I was reading, I thought that "data collection outlined
           | here" referred to the scraping behavior you reverse
           | engineered, since the pull quote covered the user-generated
           | request. I agree that they should include the additional
           | scraping behavior here for clarity (we're arguing about it
           | after all). I disagree that it constitutes a "clear lie",
           | since I don't think that data is personal.
        
             | palant wrote:
             | "Data collection outlined here" refers to both mechanisms
             | covered by the article. The first one collects information
             | about the products you look at which clearly is personal
             | information. The automated scraping in the background is
             | less problematic from the privacy protection point of view,
             | at least when it is used in the intended way.
        
       | robk wrote:
       | i don't really care - i love the plugin too much to uninstall it.
       | it's saved me a killing.
        
       | dzink wrote:
       | If the additional Amazon pages are loaded on days when the user
       | hasn't browsed Amazon, or done once a day, that could be cookie
       | stuffing, explicitly prohibited by Amazon Affiliate terms. The
       | Amazon affiliate cookies last 24 hours, so triggering a session
       | when a user doesn't do it, might extend their affiliate window
       | and is not right at all.
        
         | liquorice wrote:
         | Keepa is a data company though, not an Amazon Affiliate, so
         | they shouldn't care about violating that policy
        
       ___________________________________________________________________
       (page generated 2021-08-03 23:02 UTC)