[HN Gopher] Show HN: Use cookies from Chrome (CDP) in cURL witho...
___________________________________________________________________
Show HN: Use cookies from Chrome (CDP) in cURL without copy pasting
Author : fipso
Score : 125 points
Date : 2023-04-01 11:03 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| juujian wrote:
| I have done a lot of scraping in the past. Cookies are a pain,
| this is a really elegant solution. Of course the biggest problem
| is that everything interesting is hidden away behind JavaScript
| these days and then you have to resort to Selenium and the whole
| thing just spirals out of control. But I'm looking forward to
| giving this a shot for non-JavaScript content in the future.
|
| edit: JavaScript not Java
| mkl wrote:
| Do you mean JavaScript? I have never run into content hidden by
| Java, but many pages load content dynamically using JavaScript.
|
| I have found it's quite easy to snoop on those JavaScript API
| requests using the Network tab of Chrome Devtools, then copy
| the network request as a curl command for bash scripts or as
| JavaScript for browser extensions.
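The approach mkl describes (find the API request in the Network tab, then replay it outside the browser) can be sketched in Python as well as with curl. This is a minimal sketch using only the standard library; the URL, the cookie name `sessionid`, and the helper `build_api_request` are hypothetical placeholders, not taken from any real site.

```python
import urllib.request


def build_api_request(url, cookies, extra_headers=None):
    """Build (but do not send) a GET request that carries the given cookies."""
    # A Cookie header is just "name=value" pairs joined by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {"Cookie": cookie_header, "Accept": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical endpoint and cookie, standing in for whatever DevTools showed.
req = build_api_request(
    "https://example.com/api/items?page=1",
    {"sessionid": "abc123"},
)

# To actually send it, you would call urllib.request.urlopen(req) and read
# the JSON body; that step is left out so the sketch runs without a network.
```

Building the request separately from sending it makes it easy to check the headers against what the browser sent before hitting the server.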
| tomashubelbauer wrote:
| > I have never run into content hidden by Java
|
| Tongue in cheek: You'd never know - servers running Java code
| generating HTML pages have probably conditionally not-
| rendered many pieces of HTML that you've never come across in
| your browsing :)
| ghqst wrote:
| Yeah, you can sometimes find the API or find data sent in
| JavaScript but not in prerendered HTML, which can save you
| the pain of headless scraping.
| juujian wrote:
| I do mean JavaScript. Not sure how many times I have made
| that mistake... And great advice, that sounds like a neat
| approach.
| bdcravens wrote:
| If you'd be standing up CDP to grab the cookies, you'd probably
| use Puppeteer or Playwright instead of Selenium.
| juujian wrote:
| Appreciate the recommendation. I just used whatever Python
| had to offer; Puppeteer looks promising though!
| bdcravens wrote:
| Using the tools at hand is often the best approach. That
| said, I've spent most of the last 13 years of my career
| automating browsers. For years, I used Selenium with a
| variety of libraries. After switching to
| Puppeteer/Playwright, I have zero interest in going back
| lol. Playwright actually has first-party Python support.
| (Puppeteer has a port called Pyppeteer, but it's no longer
| maintained and the author recommends using Playwright)
|
| https://playwright.dev/python/
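For anyone curious what grabbing cookies over CDP looks like with Playwright's Python API, here is a rough sketch. It assumes Chrome was started with `--remote-debugging-port=9222` and that the `playwright` package is installed; the helper names `cookies_over_cdp` and `cookie_header` are made up for illustration, and this is not how the linked project itself works.

```python
def cookie_header(cookies, domain_suffix=""):
    """Turn Playwright-style cookie dicts into a Cookie header value.

    The suffix match is a crude domain filter: "example.com" also matches
    "notexample.com", so tighten it for real use.
    """
    pairs = [
        f"{c['name']}={c['value']}"
        for c in cookies
        if c.get("domain", "").endswith(domain_suffix)
    ]
    return "; ".join(pairs)


def cookies_over_cdp(endpoint="http://localhost:9222"):
    """Attach to an already running Chrome over CDP and return its cookies.

    The endpoint is an assumption; it must match the debugging port Chrome
    was started with. Playwright is imported lazily so the rest of the
    sketch runs without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # The first context is the browser's default profile context.
        return browser.contexts[0].cookies()


# Stand-in data with the same keys Playwright returns, so no browser is needed.
sample = [
    {"name": "sid", "value": "abc", "domain": ".example.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": "www.example.com", "path": "/"},
    {"name": "track", "value": "xyz", "domain": ".ads.test", "path": "/"},
]
header = cookie_header(sample, "example.com")
```

The resulting string can be passed to curl with `-H "Cookie: ..."` or `-b "..."`.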
| rgrieselhuber wrote:
| I second Playwright, it's amazing.
| robertlagrant wrote:
| Third.
| berkle4455 wrote:
| JavaScript is delivered as text and sends text-based HTTP calls
| to the server to fetch more data. Why do you need Selenium?
| LelouBil wrote:
| If you don't want to reverse-engineer the JavaScript.
| rhd wrote:
| I once used Selenium to run JavaScript in the webpage to
| steal a few dynamic tokens required by the site's API to reuse
| in my more well-trodden python-requests workflow.
| totetsu wrote:
| There are Python libraries you can use that import cookies
| directly from wherever your browser stores them to use in
| Selenium projects.
| cookiengineer wrote:
| I've had kind of the same problem in the past. I built a Chrome
| extension that generates a cookie jar text file, because it turns
| out most relevant tracking or session cookies are on external
| domains or OAuth provider domains. [1]
|
| You just need to copy/paste the generated text content to a
| cookies.txt and you're set, so it worked for my workflow in the
| terminal.
|
| [1] https://github.com/cookiengineer/me-want-cookies
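A cookies.txt in the Netscape format works with curl (`curl -b cookies.txt https://example.com/`) and can also be read from Python's standard library. The sketch below assumes the file uses that format; the sample cookie and the helper `load_cookies_txt` are invented for illustration, and the extension's actual output may differ.

```python
import http.cookiejar
import os
import tempfile

# One tab-separated line per cookie:
# domain, include-subdomains flag, path, secure flag, expiry (Unix time), name, value
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tTRUE\t4102444800\tsessionid\tabc123\n"
)


def load_cookies_txt(path):
    """Load a Netscape-format cookies.txt into a CookieJar."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired cookies too; drop these flags to be stricter.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cookies.txt")
    with open(path, "w") as fh:
        fh.write(SAMPLE)
    jar = load_cookies_txt(path)
    loaded = {cookie.name: cookie.value for cookie in jar}
```

The jar can then be handed to `urllib.request.HTTPCookieProcessor` so requests pick up the cookies automatically.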
| 2h wrote:
| > Tired of copy pasting cURL commands from chrome to your
| terminal ?
|
| FYI for anyone that does this, MITM Proxy is usually a better
| option for this type of stuff. Not sure about Chrome, but
| especially with Firefox, you have no way of getting the full raw
| request on anything with a request body like POST. You have to
| Copy Request Headers, then Copy POST Data. With MITM Proxy or
| similar you can just get the full request at once. Also you can
| inject headers like X-Forwarded-For into all or specific
| requests.
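Assuming "MITM Proxy" here means the mitmproxy tool, the header injection mentioned above can be sketched as a small addon script. The host filter and the IP address (taken from a documentation range) are hypothetical values.

```python
# add_header.py: a minimal mitmproxy addon sketch.
# Hypothetical values; adjust to the site you are inspecting.
TARGET_HOST = "example.com"
FAKE_CLIENT_IP = "203.0.113.7"


def request(flow):
    """Called by mitmproxy for every client request passing through the proxy."""
    # Crude suffix match, so subdomains of TARGET_HOST are included as well.
    if flow.request.pretty_host.endswith(TARGET_HOST):
        flow.request.headers["X-Forwarded-For"] = FAKE_CLIENT_IP
```

It would be loaded with something like `mitmdump -s add_header.py`, with the browser configured to use the proxy. Dropping the host check applies the header to all requests.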
| folmar wrote:
| > you have no way of getting the full raw request on anything
| with a request body like POST
|
| In Firefox, right-click the request -> Copy Value -> As cURL.
| This gives everything and has worked with POST for a few years
| at least.
| thrdbndndn wrote:
| Feels like you can just read Chrome's cookies from the file (and
| filter out the ones you need by site, of course), so you don't
| need to bother running Chrome in debugging mode?
|
| Like https://github.com/borisbabic/browser_cookie3
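To illustrate what reading the file involves: Chrome keeps cookies in an SQLite database, so the metadata can be listed with the standard library. This is only a sketch under the assumption that the table is named `cookies` with `host_key`, `name`, `path`, and `expires_utc` columns (true of recent versions as far as I know, but the schema changes between releases). Values live in an encrypted column whose decryption is platform-specific, which is the part libraries like the one linked above handle. The demo runs against a stand-in database, not a real profile.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone

# Chrome timestamps count microseconds since 1601-01-01 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def list_cookies(db_path, host_suffix):
    """Return cookie metadata (not values) for hosts ending in host_suffix."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT host_key, name, path, expires_utc FROM cookies "
            "WHERE host_key LIKE ? ORDER BY name",
            ("%" + host_suffix,),
        ).fetchall()
    finally:
        conn.close()
    return [
        {
            "host": host,
            "name": name,
            "path": path,
            # 0 means a session cookie with no expiry.
            "expires": CHROME_EPOCH + timedelta(microseconds=exp) if exp else None,
        }
        for host, name, path, exp in rows
    ]


# Stand-in database with only the columns used above (not Chrome's full schema).
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "Cookies")
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE cookies (host_key TEXT, name TEXT, path TEXT, expires_utc INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cookies VALUES (?, ?, ?, ?)",
        [
            (".example.com", "sid", "/", 15746918400000000),
            (".example.com", "csrf", "/", 0),
            (".other.test", "track", "/", 0),
        ],
    )
    conn.commit()
    conn.close()
    found = list_cookies(db_path, "example.com")
```

When pointing this at a real profile, work on a copy of the file, since Chrome may hold a lock on it while running.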
| toomuchtodo wrote:
| yt-dlp does this also.
|
| https://news.ycombinator.com/item?id=28320666
| thrdbndndn wrote:
| Thanks for the link. I know yt-dlp does, but from your link I
| found another library
| (https://github.com/n8henrie/pycookiecheat) that can do that
| and it seems more popular than browser_cookie3.
| (browser_cookie3 worked totally fine last time I tried).
| fipso wrote:
| This is awesome. I did not know decrypting Chrome's password DB
| is still that easy.
| paulirish wrote:
| Cookies != Passwords..
|
| But anyway... You know this is also easily accessible within
| DevTools, yah? https://umaar.com/dev-tips/3-copy-as-curl/
| eurasiantiger wrote:
| One could argue that cookies need to be more securely
| stored than passwords, because they can allow an attacker
| to bypass passwords and all other authentication factors.
___________________________________________________________________
(page generated 2023-04-01 23:00 UTC)