[HN Gopher] Data exfiltration in Keepa Price Tracker
___________________________________________________________________
Data exfiltration in Keepa Price Tracker
Author : taxyovio
Score : 57 points
Date : 2021-08-03 09:57 UTC (13 hours ago)
(HTM) web link (palant.info)
(TXT) w3m dump (palant.info)
| [deleted]
| mrsaint wrote:
| And not sure if Amazon would agree to this as it essentially
| threatens the privacy and integrity of their users.
| Interestingly, Keepa is also an Amazon Affiliate, so they are in
| a direct business relationship with Amazon.
| patd wrote:
| As far as I know, Keepa is not an Amazon affiliate. They used
| to be and got kicked out like many similar tools around 5 years
| ago.
|
| They moved to the current model of providing an API for Amazon
| data (which seems to use the extension's users to scrape data).
| avipars wrote:
| They actively warned about Honey Security Issues, but haven't
| mentioned Keepa at all.
| a254613e wrote:
| I can't quite understand this article and its conclusion.
|
| The article says: "[The extension] will collect information about
| the products you look at and the ones you search for".
|
| Yet, two sentences later it says "The company behind the
| extension fails to comply with its legal obligations. The privacy
| policy is misleading in claiming that no personal data is being
| collected."
|
| So what personal information exactly is included in the data
| submitted to their servers about the products? Because in that
| json example I don't see anything that would be even close to
| personal information.
|
| The remote scraping/execution abilities are not great, I'll give
| it that. But the rest of it seems like an overblown conclusion and
| interpretation of how it works.
| Semaphor wrote:
| I'd assume that "products you searched for", even if only
| implicitly thanks to the results, is personal information. It
| also is not mentioned in their privacy policy, which only
| mentions sending on product pages.
| [deleted]
| palant wrote:
| _Note_ : I am the author of the article above.
|
| The history of all Amazon products you looked at or searched
| for _is_ personal data, and it can tell a lot about you.
| Whether it is also personal data in the legal sense is not
| something I can say for sure. But it definitely has to be
| properly covered in the privacy policy, for GDPR compliance at
| the very least.
| timdorr wrote:
| But it is not personal data that would identify you (PII). If
| someone was able to determine who I was based solely on my
| browsing activity on Amazon, then they've already obtained my
| personal information.
| palant wrote:
| No, it isn't PII in the legal sense, it doesn't allow
| identifying you directly. Which doesn't mean that it cannot
| be tied to your identity. Just one example: if you
| regularly post to social media what you bought online, this
| information could be correlated with the Keepa data to find
| out which profile is likely yours and what else you looked
| at.
|
| But GDPR doesn't merely require you to disclose the collection
| of PII, but rather of all data collected. There is a good
| reason for that.
| iamacyborg wrote:
| PII is not a term that is used in the GDPR. The person
| you're replying to is correct that your browsing data is
| likely to count as personal data given that it's linked to
| an individual.
| NazakiAid wrote:
| I use Keepa basic and it has saved me a ton of money. I always
| just assumed it was scraping the prices from pages I visit, but I
| didn't know it would automatically fetch Amazon pages in the
| background. Might just sign out of Amazon, and use a separate
| browser to purchase from it.
|
| Either way, I have some thinking to do on whether I should "keepa"
| it or not (sorry, really bad joke). Given how useful it is, maybe I
| should purposely turn a blind eye and just trust that they aren't
| going to do anything evil or expose me to some privacy risk.
| wheels wrote:
| I use the Keepa website and never realized before this article
| that they even have browser plugins. On the website you can set
| up price alerts that go out via email or Telegram. That works
| well enough for me.
| NazakiAid wrote:
| I would do that but it's very helpful to also see how often
| the price changes and goes on sale to know if I am getting
| "ripped off".
| wheels wrote:
| There's a price graph on their website showing the price
| development over time.
| SCNP wrote:
| Isn't this always the trade-off? While I do appreciate useful
| software, it gets tiring that it's almost always at the expense
| of a little bit of privacy or tracking. Seems like death by a
| thousand cuts for our anonymity online. Although, I don't
| really harbor illusions that we (at least Americans) haven't
| been tracked since the invention of the credit card. I guess
| I'm a little jaded at this point as there doesn't seem to be
| anything I, personally, can do about it and I get a touch of
| FOMO when I hear about the capabilities of the latest and
| greatest apps. I understand that data collection is inherently
| necessary for AI, I just don't like who's in charge of it and
| making the innovations.
| danpalmer wrote:
| Wow, they've built a distributed Amazon listing scraping system -
| essentially a botnet.
|
| As someone who has done a lot of web scraping and had to route
| around a lot of blocking (we have business contracts to allow
| scraping, but they don't stop over-eager sysadmins), this feels
| like a dream come true.
|
| But I'd never actually want to use this for scraping and I'm not
| sure any informed user would agree to use this.
| voltagex_ wrote:
| How do you get contracts to allow scraping? What kind of cost
| are we talking about?
| dewey wrote:
| Some companies want you to list their products on your page
| (usually with some kind of affiliate deal attached) but don't
| have a tech team to implement a feed or an API. In that case
| you end up in a situation where you have to scrape the data
| yourself with permission.
| dna_polymerase wrote:
| Do you remember the time when this weird German startup that
| publishes an Adblocker tried to start an "Acceptable Ads" program
| and extort money from Google? Guess what their CTO is up to now.
|
| Exactly. Showing the world the shady business of browser plugins.
| bkor wrote:
| From the Keepa addon settings:
|
| > Allow the add-on to gather Amazon prices to improve our price
| data
|
| I thought it was common knowledge that Keepa uses the addon to
| gather prices. Though with GDPR it probably needs to be more
| explicitly said.
| Semaphor wrote:
| There is a difference between gathering prices and loading
| extra URLs to gather those prices. From that text, I would not
| assume they are using my computer as a part of a botnet.
| bkor wrote:
| I knew it was doing that as well (the distributed scraping
| the article talks about). But I cannot figure out where I
| read it. Maybe they used to have it somewhere on their site,
| and now it's gone?
|
| What is strange is that people asked for e.g. Amazon.nl support.
| This isn't implemented as Keepa relies on Amazon (this is
| their answer in the forums). But if they scrape, why do they
| still need Amazon?
| palant wrote:
| _Note_ : I am the author of the article above.
|
| Nice, I didn't find this setting and I explicitly went looking
| for it. So the settings in the "price history" graph don't
| merely apply to the way this graph is shown. Now I need to
| figure out what this setting is doing. Because I didn't see any
| conditions in the code which were tied to this setting.
| palant wrote:
| Found it. This is the optOut_crawl setting and its handling
| is entirely on the server side. So presumably if this setting
| is set, the server will no longer send the extension any
| instructions to scrape Amazon pages in the background. Mind you,
| it still _could_ but it probably won't.
|
| Scraping data from pages you visit shouldn't be affected by
| this.
| avipars wrote:
| thanks! Uninstalled today!
|
| As well as Honey and Keepa
| wilde wrote:
| > Unless of course you don't consider the information collected
| here personal.
|
| I don't. The author even goes out of their way to point out that
| these requests aren't generated by the user and so there's no
| latent interest information there. I agree that they should cover
| this behavior in the privacy policy explicitly, but there's a
| tone of moral outrage in this piece that seems unearned.
| palant wrote:
| _Note_ : I am the author of this article.
|
| I'm really unsure how you would come to this conclusion. Even
| if you only read the summary at the beginning or only the
| conclusions section at the end, you should notice that Keepa is
| doing both. It will extract data from your Amazon visits
| (personal information) and do its own scraping (merely wasting
| your bandwidth if implemented correctly, which I am not convinced
| it is).
| 45ure wrote:
| Thanks for the article.
|
| I use this extension (and the app) regularly, which activates
| as soon as I visit Amazon in a container tab. In addition to
| providing in-depth statistics, features like alerts via
| Telegram have helped me hunt down bargains. I have noticed
| the increase in network requests and bandwidth when the tab
| is active, using basic tracking via Resource Monitor (W10).
| However, I can easily block it via uMatrix/uBO, if required.
| In this case, it is a trade-off, which can be justified.
|
| Also, Tracker Control (Android) for the Keepa app reports
| blocking just two trackers, _Google Crashlytics_ and _Google
| Firebase Analytics_ -- so it is not as bad as other apps.
|
| I have used CamelCamelCamel in the past, which was more
| egregious and aggressive in tracking users, but don't know
| how it fares today.
|
| https://camelcamelcamel.com/
| palant wrote:
| Unfortunately, it isn't that easy. You cannot use other
| extensions to block requests happening on the extension's
| background page. Whatever tracking and scraping is going
| on, you can probably disable part of it via the extension's
| settings but otherwise there is nothing you can do.
| wilde wrote:
| Thanks for engaging here. Maybe my reading comprehension is
| poor, but here's the full quote that I was objecting to. It
| comes after a long pull quote where Keepa promises to not log
| the requests that do contain latent interest behavior:
|
| > This refers to some pieces of the Keepa functionality but
| it once again completely omits the data collection outlined
| here. It's reassuring to know that they don't log product
| identifiers when showing product history, but they don't need
| to if on another channel their extension sends far more
| detailed data to the server. This makes the first sentence,
| formatted as bold text, a clear lie. Unless of course you
| don't consider the information collected here personal. I'm
| not a lawyer, maybe in the legal sense it isn't.
|
| When I was reading, I thought that "data collection outlined
| here" referred to the scraping behavior you reverse
| engineered, since the pull quote covered the user-generated
| request. I agree that they should include the additional
| scraping behavior here for clarity (we're arguing about it
| after all). I disagree that it constitutes a "clear lie",
| since I don't think that data is personal.
| palant wrote:
| "Data collection outlined here" refers to both mechanisms
| covered by the article. The first one collects information
| about the products you look at which clearly is personal
| information. The automated scraping in the background is
| less problematic from the privacy protection point of view,
| at least when it is used in the intended way.
| robk wrote:
| i don't really care - i love the plugin too much to uninstall it.
| it's saved me a killing.
| dzink wrote:
| If the additional Amazon pages are loaded on days when the user
| hasn't browsed Amazon, or done once a day, that could be cookie
| stuffing, explicitly prohibited by Amazon Affiliate terms. The
| Amazon affiliate cookies last 24 hours, so triggering a session
| when a user doesn't do it might extend their affiliate window
| and is not right at all.
| liquorice wrote:
| Keepa is a data company though, not an Amazon Affiliate, so
| they shouldn't care about violating that policy.
___________________________________________________________________
(page generated 2021-08-03 23:02 UTC)