[HN Gopher] Tracking supermarket prices with Playwright
___________________________________________________________________
Tracking supermarket prices with Playwright
Author : sakisv
Score : 135 points
Date : 2024-08-06 17:52 UTC (5 hours ago)
(HTM) web link (www.sakisv.net)
(TXT) w3m dump (www.sakisv.net)
| lotsofpulp wrote:
| In the US, retail businesses are offering individualized and
| general coupons via the phone apps. I wonder if this pricing can
| be tracked, as it results in significant differences.
|
| For example, I recently purchased fruit and dairy at Safeway in
| the western US, and after I had everything I wanted, I searched
| each item in the Safeway app, and it had coupons I could apply
| for $1.50 to $5 off per item. The other week, my wife ran into the
| store to buy cream cheese. While she did that, I searched the
| item in the app, and "clipped" a $2.30 discount, so what would
| have been $5.30 to someone that didn't use the app was $3.
|
| I am looking at the receipt now, and it is showing I would have
| spent $70 total if I did not apply the app discounts, but with
| the app discounts, I spent $53.
|
| These price obfuscation tactics are seen in many businesses,
| making price tracking very difficult.
| mcoliver wrote:
| I wrote a Chrome extension to help with this. Clips all the
| coupons so you don't have to do individual searches. Has
| resulted in some wild surprise savings when shopping.
| www.throwlasso.com
| Larrikin wrote:
| This looks amazing. Do you have plans to support Firefox and
| other browsers?
| koolba wrote:
| Ha! I have the same thing as a bookmarklet for specific
| sites. It's fun to watch it render the clicks.
| ikesau wrote:
| Ah, I love this. Nice work!
|
| I really wish supermarkets were mandated to post this information
| whenever the price of a particular SKU updated.
|
| The tools that could be built with such information would do
| amazing things for consumers.
| sakisv wrote:
| Thanks!
|
| If Greece's case is anything to go by, I doubt they'd ever
| accept that as it may bring to light some... questionable
| practices.
|
| At some point I need to deduplicate the products and plot the
| prices across all 3 supermarkets on the same graph as I suspect
| it will show some interesting trends.
| project2501a wrote:
| fyi, I posted this on /r/greece
| sakisv wrote:
| Thanks!
| robotnikman wrote:
| As someone who actively works on these kinds of systems, it's a
| bit more complicated than that. The past few years we worked on
| migrating from some old system from the 80's designed for LAN
| use only, to a cloud based item catalogue system that finally
| allowed us the ability to easily make pricing info more
| available to consumers, such as through an app.
| xnx wrote:
| Scraping tools have become more powerful than ever, but bot
| restrictions have become equally strict. It's hard to scrape
| reliably under any circumstances, or even consistently without
| residential proxies.
| sakisv wrote:
| When I first started it there were a couple of instances where my
| IP was blocked - despite being a residential IP behind CGNAT.
|
| I then started randomising every aspect of the scraping process
| that I could: The order that I visited the links, the sleep
| duration between almost every action, etc.
|
| As long as they don't implement a strict fingerprinting
| technique, that seems to be enough for now.
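A minimal sketch of the randomisation described above: shuffle the order of the links and pause for a random interval before each visit. The function names, the delay bounds, and the `visit` callback are assumptions for illustration, not sakisv's actual code; in a real scraper `visit` would wrap something like a Playwright `page.goto(url)` call.

```python
import random
import time


def randomised_plan(urls, min_delay=2.0, max_delay=8.0, rng=None):
    """Return (url, delay_seconds) pairs in shuffled order with random pauses."""
    rng = rng or random.Random()
    shuffled = list(urls)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [(url, rng.uniform(min_delay, max_delay)) for url in shuffled]


def crawl(urls, visit, sleep=time.sleep, rng=None):
    """Visit every URL exactly once, sleeping a random interval before each."""
    for url, delay in randomised_plan(urls, rng=rng):
        sleep(delay)
        visit(url)  # hypothetical callback supplied by the caller
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting; it does nothing against fingerprinting, which the comment already flags as the limit of this approach.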
| nosecreek wrote:
| Very cool! I did something similar in Canada
| (https://grocerytracker.ca/)
| sakisv wrote:
| Oh nice!
|
| A thorny problem in my case is that the same item is named in 3
| different ways between the 3 supermarkets which makes it very
| hard and annoying to do a proper comparison.
|
| Did you have a similar problem?
| nosecreek wrote:
| Absolutely! It's made it difficult to implement some of the
| cross-retailer comparison features I would like to add. For
| my charts I've just manually selected some products, but I've
| also been trying to get a "good enough but not perfect"
| string comparison algorithm working.
| sakisv wrote:
| Ah yes.
|
| My approach so far has been to first extract the brand
| names (which are also not written the same way for some
| fcking reason!), update the strings, and then compare the
| remaining.
|
| If they have a high similarity (e.g. >95%) then they could
| be automatically merged, and then anything between 75%-95%
| can be reviewed manually.
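The approach above can be sketched roughly as follows: normalise the brand, compare what remains, then bucket the pair by similarity. This uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the thread does not say which metric sakisv uses, and the `BRAND_ALIASES` table and product names are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: every known spelling maps to one canonical brand.
BRAND_ALIASES = {"coca cola": "coca-cola", "coca-cola": "coca-cola"}


def split_brand(name):
    """Return (canonical_brand, rest_of_name); brand is None if unrecognised."""
    lowered = " ".join(name.lower().split())  # lowercase, collapse whitespace
    for alias, canonical in BRAND_ALIASES.items():
        if lowered.startswith(alias):
            return canonical, lowered[len(alias):].strip()
    return None, lowered


def classify(name_a, name_b, merge_at=0.95, review_at=0.75):
    """Bucket a pair of product names as 'merge', 'review' or 'distinct'."""
    brand_a, rest_a = split_brand(name_a)
    brand_b, rest_b = split_brand(name_b)
    if brand_a != brand_b:
        return "distinct"
    score = SequenceMatcher(None, rest_a, rest_b).ratio()
    if score > merge_at:
        return "merge"  # similar enough to merge automatically
    if score >= review_at:
        return "review"  # queue for a manual look
    return "distinct"
```

Note that pack sizes such as "330ml" and "500ml" differ by only a few characters, so near-identical names can still be different products; that is one reason the middle band is reviewed by hand.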
| victornomad wrote:
| I am not by any means an expert but maybe using some LLMs
| or a sentence transformer here could help to do the job?
| sakisv wrote:
| I gave it a very quick try with ChatGPT, but wasn't very
| impressed by the results.
|
| Granted it was around January, and things may have
| progressed...
|
| (But then again why take the easy approach when I can
| waste a few afternoons playing around with string
| comparisons)
| project2501a wrote:
| would maintaining a map of products product_x[supermarket]
| with 2-3 values work? I don't suspect that supermarkets
| would be very keen to change the name (but they might play
| other dirty games)
|
| I am thinking of doing the same thing for Linux packages in
| Debian and Fedora.
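One way to read the `product_x[supermarket]` idea is a hand-maintained table keyed by a canonical product ID, plus a reverse index for lookups. The IDs, store keys, and names below are all hypothetical.

```python
# Hypothetical canonical IDs and store-specific names, for illustration only.
PRODUCT_NAMES = {
    "butter-250g": {
        "store_a": "Butter 250g",
        "store_b": "BUTTER UNSALTED 250 GR",
        "store_c": "Unsalted butter (250g)",
    },
}


def build_reverse_index(product_names):
    """Map (store, store-specific name) -> canonical product ID."""
    return {
        (store, name): product_id
        for product_id, names in product_names.items()
        for store, name in names.items()
    }


def canonical_id(index, store, name):
    """Return the canonical ID, or None if the name is unmapped (e.g. renamed)."""
    return index.get((store, name))
```

A `None` result is the signal that a store has renamed or added a product and the table needs a manual update.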
| seszett wrote:
| I have built a similar system for myself, but since it's
| small scale I just have "groups" of similar items that I
| manually populate.
|
| I have the additional problem that I want to compare products
| across France and Belgium (Dutch-speaking side) so there is
| no hope at all to group products automatically. My manual
| system allows me to put together say 250g and 500g packaging
| of the same butter, or of two of the butters that I like to
| buy, so I can always see easily which one I should get (it's
| often the 250g that's cheaper by weight these days).
|
| Also the 42000 or so different packagings for Head and
| Shoulders shampoo. 250ml, 270ml, 285ml, 480ml, 500ml (285ml
| is usually cheapest). I'm pretty sure they do it on purpose
| so each store doesn't have to match price with the others
| because it's a "different product".
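A small sketch of the comparison within one manually populated group: normalise each pack to a price per kilogram and pick the lowest. The `Offer` type and the prices are invented for illustration and are not taken from seszett's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    label: str
    price: float  # shelf price in local currency
    grams: float  # net weight of the pack

    @property
    def price_per_kg(self):
        return self.price / self.grams * 1000


def cheapest_by_weight(group):
    """Return the offer with the lowest price per kilogram in a group."""
    return min(group, key=lambda offer: offer.price_per_kg)


# Made-up prices: here the smaller pack is cheaper by weight.
butter = [Offer("butter 250g", 2.40, 250), Offer("butter 500g", 5.20, 500)]
best = cheapest_by_weight(butter)
```

The same function works for volume-based products such as shampoo if the quantity field holds millilitres instead of grams.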
| odiroot wrote:
| Similar for Austria: https://heisse-preise.io
| snac wrote:
| Love your site! It was a great source of inspiration with the
| amount of data you collect.
|
| I did the same and made https://grocerygoose.ca/
|
| Published the API endpoints that I "discovered" to make the app
| https://github.com/snacsnoc/grocery-app (see HACKING.md)
|
| It's an unfortunate state of affairs when devs like us have to
| go to such great lengths to track the price of a commodity
| (food).
| haolez wrote:
| I heard that some e-commerce sites will not block scrapers, but
| poison the data shown to them (e.g. subtly wrong prices). Does
| anyone know more about this?
| barryrandall wrote:
| I never poisoned data, but I have implemented systems where
| clients who made requests too quickly got served data from a
| snapshot that only updated every 15 minutes.
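A rough sketch of that idea, under assumed parameters: clients within a request budget get live data, while clients over the budget get a cached snapshot that is refreshed at most every 15 minutes. The class name, the budget, and the sliding-window bookkeeping are assumptions; the comment does not describe how the original system was built.

```python
import time
from collections import defaultdict, deque


class SnapshotThrottle:
    def __init__(self, load_live, max_requests=10, window=60.0,
                 snapshot_ttl=900.0, clock=time.monotonic):
        self.load_live = load_live        # callable returning fresh data
        self.max_requests = max_requests  # allowed requests per window
        self.window = window              # sliding window, in seconds
        self.snapshot_ttl = snapshot_ttl  # 900 s = 15 minutes
        self.clock = clock
        self.requests = defaultdict(deque)  # client_id -> request timestamps
        self.snapshot = None
        self.snapshot_time = 0.0

    def get(self, client_id):
        now = self.clock()
        recent = self.requests[client_id]
        recent.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) <= self.max_requests:
            return self.load_live()
        # Over budget: serve the snapshot, refreshing it only when stale.
        if self.snapshot is None or now - self.snapshot_time >= self.snapshot_ttl:
            self.snapshot = self.load_live()
            self.snapshot_time = now
        return self.snapshot
```

From the scraper's side the responses still look valid, which is what makes this harder to notice than an outright block.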
| andrewla wrote:
| One problem that the author notes is that so much rendering is
| done client side via javascript.
|
| The flip side to this is that very often you find that the data
| populating the site is in a very simple JSON format to facilitate
| easy rendering, ironically making the scraping process a lot more
| reliable.
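To illustrate why a JSON payload is easier to work with than rendered HTML, here is a sketch that parses a made-up response body. The payload shape and field names are hypothetical; real sites use their own undocumented structures.

```python
import json

# Hypothetical response body; real endpoints and field names differ per site.
SAMPLE_BODY = (
    '{"products": ['
    '{"name": "Oat milk 1L", "price": "2.49"}, '
    '{"name": "Feta 400g", "price": "4.10"}]}'
)


def parse_products(body):
    """Extract (name, price) pairs from a JSON product listing."""
    payload = json.loads(body)
    return [(item["name"], float(item["price"])) for item in payload["products"]]
```

There are no selectors to break when the page layout changes, only keys that fail loudly with a `KeyError` if the payload shape changes.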
| sakisv wrote:
| Initially that's what I wanted to do, but the first supermarket
| I did is sending back HTML rendered on the server side, so I
| abandoned this approach for the sake of "consistency".
|
| Lately I've been thinking I should bite the bullet and Just Do It,
| but since it's working I'm a bit reluctant to touch it.
| andrewla wrote:
| For your purposes scraping the user-visible site probably
| makes the most sense since in the end, their users' eyes are
| the target.
|
| I am typically doing one-off scraping and for that, an
| undocumented but clean JSON api makes things so much easier,
| so I've grown to enjoy sites that are unnecessarily complex
| in their rendering.
| xyst wrote:
| Would be nice to have price transparency for goods. It would make
| prices much easier to track by store and region.
|
| For example, compare the price of oat milk at different zip codes
| and grocery stores. Additionally track "shrinkflation" (same
| price but smaller portion).
|
| On that note, it seems you are tracking price but are you also
| checking the cost per gram (or ounce)? Manufacturer or store
| could keep price the same but offer less to the consumer. Wonder
| if your tool would catch this.
| barbazoo wrote:
| Grocers not putting per unit prices on the label is a pet peeve
| of mine. I can't imagine any purpose not rooted in customer
| hostility.
| baronswindle wrote:
| In my experience, grocers always do include unit prices...at
| least in the USA. I've lived in Florida, Indiana, California,
| and New York, and in 35 years of life, I can't remember ever
| _not_ seeing the price per oz, per pound, per fl oz, etc.
| right next to the total price for food/drink and most home
| goods.
|
| There may be some exceptions, but I'm struggling to think of
| any except things where weight/volume aren't really relevant
| to the value -- e.g., a sponge.
| sakisv wrote:
| I do track the price per unit (kg, lt, etc) and I was a bit on
| the fence on whether I should show and graph that number
| instead of the price that someone would pay at the checkout,
| but I opted for the latter to keep it more "familiar" with the
| prices people see.
|
| Having said that, that's definitely something that I could add
| and it would show when the shrinkflation occurred if any.
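A sketch of how tracked per-unit data could flag shrinkflation: look for consecutive observations where the quantity drops while the shelf price does not. The history format and the numbers are invented for illustration and are not taken from sakisv's data.

```python
def unit_price(price, quantity):
    """Price per unit of quantity (e.g. per kg or per litre)."""
    return price / quantity


def find_shrinkflation(history):
    """history: list of (date, price, quantity) tuples sorted by date.

    Returns (date, old_unit_price, new_unit_price) for each step where the
    pack got smaller without the shelf price going down.
    """
    events = []
    for (_, p0, q0), (d1, p1, q1) in zip(history, history[1:]):
        if q1 < q0 and p1 >= p0:
            events.append((d1, unit_price(p0, q0), unit_price(p1, q1)))
    return events


# Made-up example: the pack shrinks from 0.5 kg to 0.45 kg at the same price.
history = [
    ("2024-01-01", 3.00, 0.50),
    ("2024-02-01", 3.00, 0.45),
    ("2024-03-01", 3.20, 0.45),
]
events = find_shrinkflation(history)
```

The ordinary price rise in March is not flagged because the quantity is unchanged; it would still show up on a per-unit price graph.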
| hk1337 wrote:
| I would be curious if there were a price difference between what
| is online and physically in the store.
| flir wrote:
| Next step: monitoring the updates to those e-ink shelf edge
| labels that are starting to crop up.
| moohaad wrote:
| Cloudflare Workers has a Browser Rendering API.
| antman wrote:
| Looks great. Perhaps comparisons over more than 30 days would be
| interesting. Or make the range customizable; that should be fast
| enough with a DuckDB backend.
| brikym wrote:
| I have been doing something similar for New Zealand since the
| start of the year with Playwright/Typescript dumping parquet
| files to cloud storage. I'm just collecting the data; I have not
| yet displayed it. Most of the work is getting around the reverse
| proxy services like Akamai and Cloudflare.
|
| At the time I wrote it I thought nobody else was doing it, but now
| I know of at least 3 startups doing the same in NZ. It seems the
| inflation really stoked a lot of innovation here. The patterns
| are about what you'd expect. Supermarkets are up to the usual
| tricks of arbitrarily making pricing as complicated as possible,
| using 'sawtooth' methods to segment time-poor people from poor
| people. Often they'll segment brand-loyal vs price-sensitive
| people: there might be 3 popular brands of chocolate
| and every week only one of them will be sold at a fair price.
| pikelet wrote:
| As a kiwi, are you able to share any of these (or your)
| projects? I'm quite interested.
| walterbell wrote:
| Those who order grocery delivery online would benefit from
| price comparisons, because they can order from multiple stores
| at the same time. In addition, there's only one marketplace
| that has all the prices from different stores.
| teruakohatu wrote:
| I think the fees they tack on for online orders would ruin
| ordering different products from different stores. It mostly
| makes sense with staples that don't perish.
|
| With fresh produce I find Pak n Save a lot more variable with
| quality, making online orders more risky despite the lower
| cost.
| teruakohatu wrote:
| I was planning on doing the same in NZ. I would be keen to chat
| to you about it (email in HN profile). I am a data scientist.
|
| Did you notice anything pre and post Whittakers price
| increase(s)? They must have a brilliant PR firm on retainer for
| every major news outlet to more or less push the line that
| increased prices are a good thing for the consumer. I noticed
| more aggressive "sales" more recently, but unsure if I am just
| paying more attention.
|
| My prediction is that they will decrease the size of the bars
| soon.
___________________________________________________________________
(page generated 2024-08-06 23:00 UTC)