[HN Gopher] Tracking supermarket prices with Playwright
       ___________________________________________________________________
        
       Tracking supermarket prices with Playwright
        
       Author : sakisv
       Score  : 135 points
       Date   : 2024-08-06 17:52 UTC (5 hours ago)
        
 (HTM) web link (www.sakisv.net)
 (TXT) w3m dump (www.sakisv.net)
        
       | lotsofpulp wrote:
       | In the US, retail businesses are offering individualized and
       | general coupons via the phone apps. I wonder if this pricing can
       | be tracked, as it results in significant differences.
       | 
       | For example, I recently purchased fruit and dairy at Safeway in
       | the western US, and after I had everything I wanted, I searched
       | each item in the Safeway app, and it had coupons I could apply
       | for $1.50 to $5 off per item. The other week, my wife ran into the
       | store to buy cream cheese. While she did that, I searched the
       | item in the app, and "clipped" a $2.30 discount, so what would
       | have been $5.30 to someone that didn't use the app was $3.
       | 
       | I am looking at the receipt now, and it is showing I would have
       | spent $70 total if I did not apply the app discounts, but with
       | the app discounts, I spent $53.
       | 
       | These price obfuscation tactics are seen in many businesses,
       | making price tracking very difficult.
        
         | mcoliver wrote:
         | I wrote a chrome extension to help with this. Clips all the
         | coupons so you don't have to do individual searches. Has
         | resulted in some wild surprise savings when shopping.
         | www.throwlasso.com
        
           | Larrikin wrote:
           | This looks amazing. Do you have plans to support Firefox and
           | other browsers?
        
           | koolba wrote:
           | Ha! I have the same thing as a bookmarklet for specific
           | sites. It's fun to watch it render the clicks.
        
       | ikesau wrote:
       | Ah, I love this. Nice work!
       | 
       | I really wish supermarkets were mandated to post this information
       | whenever the price of a particular SKU updated.
       | 
       | The tools that could be built with such information would do
       | amazing things for consumers.
        
         | sakisv wrote:
         | Thanks!
         | 
         | If Greece's case is anything to go by, I doubt they'd ever
         | accept that as it may bring to light some... questionable
         | practices.
         | 
         | At some point I need to deduplicate the products and plot the
         | prices across all 3 supermarkets on the same graph as I suspect
         | it will show some interesting trends.
        
           | project2501a wrote:
           | fyi, I posted this on /r/greece
        
             | sakisv wrote:
             | Thanks!
        
         | robotnikman wrote:
         | As someone who actively works on these kinds of systems, it's a
         | bit more complicated than that. The past few years we worked on
         | migrating from some old system from the 80's designed for LAN
         | use only, to a cloud based item catalogue system that finally
         | allowed us the ability to easily make pricing info more
         | available to consumers, such as through an app.
        
       | xnx wrote:
       | Scraping tools have become more powerful than ever, but bot
       | restrictions have become correspondingly stricter. It's hard to scrape
       | reliably under any circumstance, or even consistently without
       | residential proxies.
        
         | sakisv wrote:
         | When I first started it there were a couple of instances where my
         | IP was blocked - despite being a residential IP behind CGNAT.
         | 
         | I then started randomising every aspect of the scraping process
         | that I could: The order that I visited the links, the sleep
         | duration between almost every action, etc.
         | 
         | As long as they don't implement a strict fingerprinting
         | technique, that seems to be enough for now.
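
A minimal sketch of the randomisation described above: shuffle the order in which links are visited and sleep for a random duration between visits. The function name `visit_in_random_order`, the delay bounds, and the injectable `sleep` and `rng` parameters are illustrative assumptions, not the author's actual code.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def visit_in_random_order(
    urls: Iterable[str],
    visit: Callable[[str], None],
    min_delay: float = 2.0,   # assumed lower bound, in seconds
    max_delay: float = 8.0,   # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Visit every URL once, in shuffled order, pausing randomly in between."""
    rng = rng or random.Random()
    order = list(urls)
    rng.shuffle(order)  # a different crawl order on every run
    for i, url in enumerate(order):
        visit(url)
        if i < len(order) - 1:
            # Random pause so requests do not arrive at a fixed cadence.
            sleep(rng.uniform(min_delay, max_delay))
    return order
```

In a Playwright-based scraper, `visit` would be whatever callable loads the page and extracts the prices; passing `sleep` and `rng` in makes the behaviour easy to test without real waiting.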
        
       | nosecreek wrote:
       | Very cool! I did something similar in Canada
       | (https://grocerytracker.ca/)
        
         | sakisv wrote:
         | Oh nice!
         | 
         | A thorny problem in my case is that the same item is named in 3
         | different ways between the 3 supermarkets which makes it very
         | hard and annoying to do a proper comparison.
         | 
         | Did you have a similar problem?
        
           | nosecreek wrote:
           | Absolutely! It's made it difficult to implement some of the
           | cross-retailer comparison features I would like to add. For
           | my charts I've just manually selected some products, but I've
           | also been trying to get a "good enough but not perfect"
           | string comparison algorithm working.
        
             | sakisv wrote:
             | Ah yes.
             | 
             | My approach so far has been to first extract the brand
             | names (which are also not written the same way for some
             | fcking reason!), update the strings, and then compare the
             | remaining.
             | 
             | If they have a high similarity (e.g. >95%) then they could
             | be automatically merged, and then anything between 75%-95%
             | can be reviewed manually.
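
The approach above could be sketched roughly as follows, using `difflib.SequenceMatcher` from the Python standard library as the similarity measure. The choice of metric, the helper names, and the sample brand aliases are assumptions for illustration; only the 95% and 75% thresholds come from the comment.

```python
from difflib import SequenceMatcher
from typing import Iterable


def strip_brand(name: str, brand_aliases: Iterable[str]) -> str:
    """Lower-case the name and remove every known spelling of the brand."""
    cleaned = name.lower()
    # Remove longer aliases first so "acme co." is not left as " co.".
    for alias in sorted(brand_aliases, key=len, reverse=True):
        cleaned = cleaned.replace(alias.lower(), " ")
    return " ".join(cleaned.split())  # collapse repeated whitespace


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def triage(a: str, b: str, brand_aliases: Iterable[str],
           auto: float = 0.95, review: float = 0.75) -> str:
    """Classify a pair of product names as merge, review, or distinct."""
    aliases = list(brand_aliases)
    score = similarity(strip_brand(a, aliases), strip_brand(b, aliases))
    if score > auto:
        return "merge"      # similar enough to merge automatically
    if score >= review:
        return "review"     # borderline: queue for manual review
    return "distinct"
```

For example, `triage("ACME Whole Milk 1L", "Acme Co. whole milk 1l", ["acme", "acme co."])` compares the two names with the brand removed, so differences in how the brand is written do not affect the score.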
        
               | victornomad wrote:
               | I am not by any means an expert but maybe using some LLMs
               | or a sentence transformer here could help to do the job?
        
               | sakisv wrote:
               | I gave it a very quick try with chatgpt, but wasn't very
               | impressed by the results.
               | 
               | Granted it was around January, and things may have
               | progressed...
               | 
               | (But then again why take the easy approach when I can
               | waste a few afternoons playing around with string
               | comparisons)
        
             | project2501a wrote:
             | would maintaining a map of products product_x[supermarket]
             | with 2-3 values work? I don't suspect that supermarkets
             | would be very keen to change the name (but they might play
             | other dirty games)
             | 
             | I am thinking of doing the same thing for linux packages in
             | debian and fedora
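
One way to read the `product_x[supermarket]` idea is a hand-maintained map from a canonical product id to the name each supermarket uses, plus a reverse index for lookups while scraping. The store keys, product names, and function names below are hypothetical.

```python
from typing import Dict, Optional, Tuple

# canonical product id -> {supermarket: name as written by that supermarket}
PRODUCT_NAMES: Dict[str, Dict[str, str]] = {
    "whole-milk-1l": {
        "store_a": "Fresh Whole Milk 1L",
        "store_b": "Milk, whole, 1 lt",
        "store_c": "WHOLE MILK 1000ml",
    },
}


def build_reverse_index(
    product_names: Dict[str, Dict[str, str]],
) -> Dict[Tuple[str, str], str]:
    """Map (supermarket, store-specific name) back to the canonical id."""
    index: Dict[Tuple[str, str], str] = {}
    for product_id, by_store in product_names.items():
        for store, name in by_store.items():
            index[(store, name)] = product_id
    return index


def canonical_id(index: Dict[Tuple[str, str], str],
                 store: str, name: str) -> Optional[str]:
    """Return the canonical id, or None if the name has not been mapped yet."""
    return index.get((store, name))
```

Names that return `None` are the ones that still need to be matched, either by hand or by a string comparison pass.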
        
           | seszett wrote:
           | I have built a similar system for myself, but since it's
           | small scale I just have "groups" of similar items that I
           | manually populate.
           | 
           | I have the additional problem that I want to compare products
           | across France and Belgium (Dutch-speaking side) so there is
           | no hope at all to group products automatically. My manual
           | system allows me to put together say 250g and 500g packaging
           | of the same butter, or of two of the butters that I like to
           | buy, so I can always see easily which one I should get (it's
           | often the 250g that's cheaper by weight these days).
           | 
           | Also the 42000 or so different packagings for Head and
           | Shoulders shampoo. 250ml, 270ml, 285ml, 480ml, 500ml (285ml
           | is usually cheapest). I'm pretty sure they do it on purpose
           | so each store doesn't have to match price with the others
           | because it's a "different product".
        
         | odiroot wrote:
         | Similar for Austria: https://heisse-preise.io
        
         | snac wrote:
         | Love your site! It was a great source of inspiration with the
         | amount of data you collect.
         | 
         | I did the same and made https://grocerygoose.ca/
         | 
         | Published the API endpoints that I "discovered" to make the app
         | https://github.com/snacsnoc/grocery-app (see HACKING.md)
         | 
         | It's an unfortunate state of affairs when devs like us have to
         | go to such great lengths to track the price of a commodity
         | (food).
        
       | haolez wrote:
       | I heard that some e-commerce sites will not block scrapers, but
       | poison the data shown to them (e.g. subtly wrong prices). Does
       | anyone know more about this?
        
         | barryrandall wrote:
         | I never poisoned data, but I have implemented systems where
         | clients who made requests too quickly got served data from a
         | snapshot that only updated every 15 minutes.
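
A rough sketch of that kind of throttling, assuming a per-client minimum interval between requests and a shared snapshot that is refreshed lazily, at most once every 15 minutes. The class name, the 1-second threshold, and the injectable clock are assumptions; the comment does not describe the real implementation.

```python
import time
from typing import Any, Callable, Dict, Optional


class SnapshotThrottle:
    """Serve live data to normal clients and a stale snapshot to fast ones."""

    def __init__(self, fetch_live: Callable[[], Any],
                 min_interval: float = 1.0,       # assumed threshold, seconds
                 snapshot_ttl: float = 15 * 60,   # snapshot refresh period
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_live = fetch_live
        self._min_interval = min_interval
        self._snapshot_ttl = snapshot_ttl
        self._clock = clock
        self._last_seen: Dict[str, float] = {}
        self._snapshot: Any = None
        self._snapshot_at: Optional[float] = None

    def _get_snapshot(self, now: float) -> Any:
        # Refresh only when there is no snapshot yet or it has expired.
        expired = (self._snapshot_at is None
                   or now - self._snapshot_at >= self._snapshot_ttl)
        if expired:
            self._snapshot = self._fetch_live()
            self._snapshot_at = now
        return self._snapshot

    def handle(self, client_id: str) -> Any:
        now = self._clock()
        previous = self._last_seen.get(client_id)
        self._last_seen[client_id] = now
        too_fast = previous is not None and now - previous < self._min_interval
        return self._get_snapshot(now) if too_fast else self._fetch_live()
```

The client is never blocked or told anything is wrong; it simply receives data that may be up to `snapshot_ttl` seconds old.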
        
       | andrewla wrote:
       | One problem that the author notes is that so much rendering is
       | done client side via javascript.
       | 
       | The flip side to this is that very often you find that the data
       | populating the site is in a very simple JSON format to facilitate
       | easy rendering, ironically making the scraping process a lot more
       | reliable.
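
To illustrate why a JSON payload is easier to work with than rendered HTML, here is a small sketch that turns such a payload into price rows using only the standard library. The payload shape (a top-level `products` list with `sku`, `name`, and `price` fields) is invented for the example; real sites will differ.

```python
import json
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def extract_prices(payload: str) -> List[Row]:
    """Parse a JSON product listing into flat rows of sku, name, and price."""
    data = json.loads(payload)
    rows: List[Row] = []
    # "products", "sku", "name" and "price" are assumed field names.
    for item in data.get("products", []):
        rows.append({
            "sku": item["sku"],
            "name": item["name"],
            "price": float(item["price"]),  # prices often arrive as strings
        })
    return rows
```

There are no selectors to maintain here: as long as the field names stay the same, layout changes on the page do not break the extraction.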
        
         | sakisv wrote:
         | Initially that's what I wanted to do, but the first supermarket
         | I did is sending back HTML rendered on the server side, so I
         | abandoned this approach for the sake of "consistency".
         | 
         | Lately I've been thinking of biting the bullet and Just Doing It,
         | but since it's working I'm a bit reluctant to touch it.
        
           | andrewla wrote:
           | For your purposes scraping the user-visible site probably
           | makes the most sense since in the end, their users' eyes are
           | the target.
           | 
           | I am typically doing one-off scraping and for that, an
           | undocumented but clean JSON api makes things so much easier,
           | so I've grown to enjoy sites that are unnecessarily complex
           | in their rendering.
        
       | xyst wrote:
       | Would be nice to have price transparency for goods. It would
       | make prices like these much easier to track by store and
       | region.
       | 
       | For example, compare the price of oat milk at different zip codes
       | and grocery stores. Additionally track "shrinkflation" (same
       | price but smaller portion).
       | 
       | On that note, it seems you are tracking price but are you also
       | checking the cost per gram (or ounce)? Manufacturer or store
       | could keep price the same but offer less to the consumer. Wonder
       | if your tool would catch this.
        
         | barbazoo wrote:
         | Grocers not putting per unit prices on the label is a pet peeve
         | of mine. I can't imagine any purpose not rooted in customer
         | hostility.
        
           | baronswindle wrote:
           | In my experience, grocers always do include unit prices...at
           | least in the USA. I've lived in Florida, Indiana, California,
           | and New York, and in 35 years of life, I can't remember ever
           | _not_ seeing the price per oz, per pound, per fl oz, etc.
           | right next to the total price for food/drink and most home
           | goods.
           | 
           | There may be some exceptions, but I'm struggling to think of
           | any except things where weight/volume aren't really relevant
           | to the value -- e.g., a sponge.
        
         | sakisv wrote:
         | I do track the price per unit (kg, lt, etc) and I was a bit on
         | the fence on whether I should show and graph that number
         | instead of the price that someone would pay at the checkout,
         | but I opted for the latter to keep it more "familiar" with the
         | prices people see.
         | 
         | Having said that, that's definitely something that I could add
         | and it would show when the shrinkflation occurred, if any.
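
Tracking price per unit makes shrinkflation straightforward to flag. The sketch below assumes each observation records the shelf price and the package quantity in a common base unit (grams here), and treats "quantity went down while the price did not" as a shrinkflation event. The `Observation` type and that definition are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    date: str        # ISO date of the scrape
    price: float     # shelf price paid at the checkout
    quantity: float  # package size in a base unit, e.g. grams

    @property
    def unit_price(self) -> float:
        """Price per base unit (e.g. per gram)."""
        return self.price / self.quantity


def shrinkflation_events(history: List[Observation]) -> List[str]:
    """Return dates where the package shrank but the price did not drop."""
    events: List[str] = []
    for prev, cur in zip(history, history[1:]):
        if cur.quantity < prev.quantity and cur.price >= prev.price:
            events.append(cur.date)
    return events
```

Graphing `unit_price` alongside the checkout price would show the same effect visually: the shelf price stays flat while the per-unit line steps up.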
        
       | hk1337 wrote:
       | I would be curious if there were a price difference between what
       | is online and physically in the store.
        
         | flir wrote:
         | Next step: monitoring the updates to those e-ink shelf edge
         | labels that are starting to crop up.
        
       | moohaad wrote:
       | Cloudflare Worker has Browser Rendering API
        
       | antman wrote:
       | Looks great. Perhaps comparisons over more than 30 days would be
       | interesting. Or make the range customizable; that should be fast
       | enough with a duckdb backend.
        
       | brikym wrote:
       | I have been doing something similar for New Zealand since the
       | start of the year with Playwright/Typescript dumping parquet
       | files to cloud storage. I'm just collecting the data; I have not
       | yet displayed it. Most of the work is getting around the reverse
       | proxy services like Akamai and Cloudflare.
       | 
       | At the time I wrote it I thought nobody else was doing it, but now
       | I know of at least 3 start ups doing the same in NZ. It seems the
       | inflation really stoked a lot of innovation here. The
       | patterns are about what you'd expect. Supermarkets are up to the
       | usual tricks of arbitrarily making pricing as complicated as
       | possible using 'sawtooth' methods to segment time-poor people
       | from poor people. Often they'll segment on brand loyalty vs price
       | sensitive people; there might be 3 popular brands of chocolate
       | and every week only one of them will be sold at a fair price.
        
         | pikelet wrote:
         | As a kiwi, are you able to share any of these (or your)
         | projects? I'm quite interested.
        
         | walterbell wrote:
         | Those who order grocery delivery online would benefit from
         | price comparisons, because they can order from multiple stores
         | at the same time. In addition, there's only one marketplace
         | that has all the prices from different stores.
        
           | teruakohatu wrote:
           | I think the fees they tack on for online orders would ruin
           | ordering different products from different stores. It mostly
           | makes sense with staples that don't perish.
           | 
           | With fresh produce I find Pak n Save a lot more variable with
           | quality, making online orders more risky despite the lower
           | cost.
        
         | teruakohatu wrote:
         | I was planning on doing the same in NZ. I would be keen to chat
         | to you about it (email in HN profile). I am a data scientist
         | 
         | Did you notice anything pre and post Whittakers price
         | increase(s)? They must have a brilliant PR firm on retainer for
         | every major news outlet to more or less push the line that
         | increased prices are a good thing for the consumer. I noticed
         | more aggressive "sales" more recently, but unsure if I am just
         | paying more attention.
         | 
         | My prediction is that they will decrease the size of the bars
         | soon.
        
       ___________________________________________________________________
       (page generated 2024-08-06 23:00 UTC)