Post 9qt9TyzoIMeV4SjZOC by reymarkus@niu.moe
(DIR) Post #9qt8mXzgoPQEsExwoq by reymarkus@niu.moe
2020-01-11T08:49:34Z
0 likes, 0 repeats
Really, Pixiv? Really?
(DIR) Post #9qt8mYDrxglTaDbGvA by reymarkus@niu.moe
2020-01-11T10:06:31Z
0 likes, 0 repeats
And TIL Pixiv also changed their site structure so that site scrapers will find it hard to scrape their site.
(DIR) Post #9qt8mYZ8gbmKeBYG4e by proxeus@iscute.moe
2020-01-11T10:09:13.903305Z
0 likes, 0 repeats
@reymarkus Every class starts with sc-. As long as this is true, you can dig through it easily. And even if it's not the same, I can scan the whole HTML structure and find specific patterns to determine the image.
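The class-prefix idea above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the scraper proxeus actually wrote; the names ScClassFinder and find_sc_elements are hypothetical, and it assumes the HTML has already been downloaded into a string.

```python
from html.parser import HTMLParser


class ScClassFinder(HTMLParser):
    """Collects every start tag that has at least one class beginning with 'sc-'."""

    def __init__(self):
        super().__init__()
        self.matches = []  # list of (tag_name, [class, ...]) tuples

    def handle_starttag(self, tag, attrs):
        # A class attribute may be missing or valueless, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if any(name.startswith("sc-") for name in classes):
            self.matches.append((tag, classes))


def find_sc_elements(html):
    """Return (tag, classes) for each element carrying an 'sc-' prefixed class."""
    finder = ScClassFinder()
    finder.feed(html)
    return finder.matches
```

Calling find_sc_elements(page_html) gives a list of candidate elements to inspect further; the generated suffix after sc- is not relied on, since it may differ between page builds.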
(DIR) Post #9qt8tLLCFpmDerGZI8 by reymarkus@niu.moe
2020-01-11T10:10:10Z
1 likes, 0 repeats
@proxeus Woah, thanks for the tip! 👍
(DIR) Post #9qt949B5ofnT7XgxYu by proxeus@iscute.moe
2020-01-11T10:12:27.749278Z
0 likes, 0 repeats
@reymarkus For instance, look for img tags that contain a URL to i.pximg.net. All the site scrapers that I wrote in Python do that.
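A rough sketch of that img-tag approach, again with only the standard library. PximgImageFinder and find_pximg_images are hypothetical names, and the sketch assumes the image URL sits in the src attribute of the img tag, which may not hold for lazily loaded images.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PximgImageFinder(HTMLParser):
    """Collects the src of every <img> tag whose host is i.pximg.net."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # Assumption: the URL lives in src; a missing src becomes "".
        src = dict(attrs).get("src") or ""
        if urlparse(src).hostname == "i.pximg.net":
            self.urls.append(src)


def find_pximg_images(html):
    """Return the i.pximg.net image URLs found in img tags, in document order."""
    finder = PximgImageFinder()
    finder.feed(html)
    return finder.urls
```

Matching on the parsed hostname rather than a plain substring avoids picking up unrelated URLs that merely mention i.pximg.net in a path or query string.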
(DIR) Post #9qt9TyzoIMeV4SjZOC by reymarkus@niu.moe
2020-01-11T10:16:20Z
1 likes, 0 repeats
@proxeus I just found something interesting right now. I cURLed a Pixiv page, and it seems they do not serve the site as-is; rather, they render it after the page is downloaded. It looks like Pixiv is already giving the direct links to their full-size art in the page metadata, found in the <meta name="preload-data"> tag. https://gitlab.com/snippets/1928999#L59
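Reading that metadata could look something like the sketch below. It is not the linked snippet; it assumes the tag carries its data as JSON in a content attribute, and it makes no assumption about the JSON layout, walking the whole structure for strings that mention i.pximg.net instead. PreloadDataFinder, collect_pximg_urls and preload_image_urls are hypothetical names.

```python
import json
from html.parser import HTMLParser


class PreloadDataFinder(HTMLParser):
    """Grabs the content attribute of the first <meta name="preload-data"> tag."""

    def __init__(self):
        super().__init__()
        self.payload = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.payload is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "preload-data":
            # Assumption: the JSON is stored in the content attribute.
            self.payload = attr_map.get("content")


def collect_pximg_urls(node, found=None):
    """Recursively gather every string in the JSON that mentions i.pximg.net."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_pximg_urls(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_pximg_urls(value, found)
    elif isinstance(node, str) and "i.pximg.net" in node:
        found.append(node)
    return found


def preload_image_urls(html):
    """Return image URLs found in the preload-data JSON, or [] if the tag is absent."""
    finder = PreloadDataFinder()
    finder.feed(html)
    if not finder.payload:
        return []
    return collect_pximg_urls(json.loads(finder.payload))
```

Walking the whole structure is slower than indexing a known key, but it keeps working if the keys inside the JSON get renamed or moved.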