Subj : Re: Learning screen scraping
To   : comp.programming
From : Rob Thorpe
Date : Thu Sep 15 2005 03:12 pm

programmernov...@yahoo.com wrote:
> Richard Heathfield wrote:
> > programmernovice@yahoo.com said:
> > >
> > > Richard Heathfield wrote:
> > >> programmernovice@yahoo.com said:
> > >>
> > >> > Thanks for your reply. The problem is that I have to repeat the
> > >> > operation hundreds of times for a given web site, so I need to find a
> > >> > way to automate it.
> > >>
> > >> Yes, but you automate the downloading. That's the point. It's much, much,
> > >> much, much, much simpler and quicker than screenscraping.
> > >
> > > I really appreciate your help. Now, since I'm new at this, can you
> > > guide me on how one goes about automating the downloading, resources,
> > > etc? I guess it means automatically dumping all the data into
> > > something like a spreadsheet and then manipulating it from there?
> >
> > There's an RFC on the HTTP 1.1 protocol: RFC 2616. You can get it from
> > http://www.w3.org/Protocols/rfc2616/rfc2616.html
> >
> > Simply connect to the Web server (use a TCP connection for this), and send
> > the necessary messages to get what you need from the page. There's a
> > library for the purpose, known as libcurl, which you may find helpful.
>
> OK, I'll look this up. Appreciate your help.

You could also call a separate program to dump the web-page to disk for
you, then work from there. GNU wget for example.
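To make the wget suggestion concrete, here is a minimal sketch in Python that loops over a list of URLs and shells out to GNU wget for each one. It assumes wget is installed and on the PATH; the `local_name` naming scheme and the example URLs are made up for illustration:

```python
import subprocess
from urllib.parse import urlparse

def local_name(url):
    """Derive a local filename from the URL's path (hypothetical scheme:
    slashes become underscores, empty paths become "index")."""
    path = urlparse(url).path.strip("/")
    return (path.replace("/", "_") or "index") + ".html"

def fetch_all(urls, fetch=None):
    """Download each URL to disk by invoking wget.

    `fetch` can be swapped in (e.g. for testing) to avoid real
    network access; by default each URL is saved with
    `wget -q -O <name> <url>`.
    """
    for url in urls:
        name = local_name(url)
        if fetch is None:
            subprocess.run(["wget", "-q", "-O", name, url], check=True)
        else:
            fetch(url, name)

# Hypothetical usage: fetch_all(["http://example.com/data/page1"])
```

Once the pages are on disk, the parsing step works against local files, so the hundreds of downloads are fully automated and repeatable.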