Subj : Re: Learning screen scraping
To   : comp.programming
From : Rob Thorpe
Date : Thu Sep 15 2005 03:12 pm

programmernov...@yahoo.com wrote:
> Richard Heathfield wrote:
> > programmernovice@yahoo.com said:
> > >
> > > Richard Heathfield wrote:
> > >> programmernovice@yahoo.com said:
> > >>
> > >> > Thanks for your reply. The problem is that I have to repeat the
> > >> > operation hundreds of times for a given web site, so I need to find a
> > >> > way to automate it.
> > >>
> > >> Yes, but you automate the downloading. That's the point. It's much, much,
> > >> much, much, much simpler and quicker than screenscraping.
> > >
> > > I really appreciate your help. Now, since I'm new at this, can you
> > > guide me on how one goes about automating the downloading, resources,
> > > etc? I guess it means automatically dumping all the data into
> > > something like a spreadsheet and then manipulating it from there?
> >
> > There's an RFC on the HTTP 1.1 protocol: RFC 2616. You can get it from
> > http://www.w3.org/Protocols/rfc2616/rfc2616.html
> >
> > Simply connect to the Web server (use a TCP connection for this), and send
> > the necessary messages to get what you need from the page. There's a
> > library for the purpose, known as libcurl, which you may find helpful.
>
> OK, I'll look this up. Appreciate your help.

You could also call a separate program to dump the web-page to disk for
you, then work from there. GNU wget for example.
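To make the wget suggestion concrete, here is a minimal sketch in Python that loops over a list of URLs and shells out to GNU wget for each one. It assumes wget is installed and on the PATH; the `local_name` naming scheme and the example URLs are made up for illustration:

```python
import subprocess
from urllib.parse import urlparse

def local_name(url):
    """Derive a local filename from the URL's path (hypothetical scheme:
    slashes become underscores, empty paths become "index")."""
    path = urlparse(url).path.strip("/")
    return (path.replace("/", "_") or "index") + ".html"

def fetch_all(urls, fetch=None):
    """Download each URL to disk by invoking wget.

    `fetch` can be swapped in (e.g. for testing) to avoid real
    network access; by default each URL is saved with
    `wget -q -O <name> <url>`.
    """
    for url in urls:
        name = local_name(url)
        if fetch is None:
            subprocess.run(["wget", "-q", "-O", name, url], check=True)
        else:
            fetch(url, name)

# Hypothetical usage: fetch_all(["http://example.com/data/page1"])
```

Once the pages are on disk, the parsing step works against local files, so the hundreds of downloads are fully automated and repeatable.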