From: jari.aalto@poboxes.com (Jari Aalto+mail.emacs)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: Announce: mywebget.pl v1999.0210 - Batch get updates from Http ftp dirs.
Followup-To: comp.lang.perl.modules
Date: 24 Feb 1999 16:50:03 GMT
Organization: University of Tampere
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <7b1ajr$sqn$1@play.inetarena.com>

Download http://www.perl.com/CPAN-local//scripts/

This is the first public release.

jari

NAME
    @(#) mywebget.pl - Perl Web URL retrieve program

SYNOPSIS
        mywebget.pl http://example.com/ [URL] ..
        mywebget.pl --file file-with-urls.txt
        mywebget.pl --verbose --overwrite http://example.com/
        mywebget.pl --verbose --overwrite --Output ~/dir/ http://example.com/

OPTIONS
  General options
    --create-paths
        Create paths that do not exist in `lcd:' directives. Normally any
        `lcd:' directive that fails to find the path interrupts the
        program. With this option the local directory is created as
        needed.

    --Firewall FIREWALL
        Use FIREWALL when accessing files via the ftp:// protocol.

    --file FILE
        Read URLs from FILE.
        The file can contain comments starting with #. The syntax is:

            # @(#) $HOME/.mywebget.default - Perl configuration file
            #
            # This is comment
            # Another comment

            file://absolute/dir/file-1.23.tar.gz
            lcd:$HOME/updates         # chdir here
            http://www.example.com/page.html
            http://www.example.com/page.html save:/dir/dir/page.html
            ftp://ftp.com/dir/file.txt save:xx-file.txt login:foo pass:passwd
            lcd:$HOME/download-kit
            ftp://ftp.com/dir/kit-1.1.tar.gz new:

        Possible keywords in the ftp:// line are:

        `lcd:DIRECTORY'
            Set the local download directory to DIRECTORY. Any
            environment variables are substituted in the path name. If
            this tag is found, it replaces the setting of --Output. If
            the path is not a directory, the program terminates with an
            error. See also --create-paths.

        `login:LOGIN-NAME'
            Ftp login name. The default value is "ftp".

        `new:'
            If this is found on the current line, the newest file will
            be retrieved. This variable is reset to the value of `--new'
            after the line has been processed.

        `pass:PASSWORD'
            Ftp password. The default value is the generic email address
            mail@some.com.

        `regexp:REGEXP'
            Get all files in the ftp directory matching REGEXP. The
            keyword `save:' is ignored.

        `save:LOCAL-FILE-NAME'
            Save the file under this name on the local disk.

    --new
        Get the newest file. If the filename does not end in .asp, .html
        or .htm, the URL is considered to point to some program or data
        file. When new releases are announced, the version number in the
        filename usually tells which is the current one, so getting a
        hardcoded file with:

            mywebget.pl -o -v http://example.com/dir/program-1.3.tar.gz

        is not usually a good choice. Adding the --new option to the
        command line causes a double pass: a) the whole
        http://example.com/dir/ is examined for all files; b) files
        approximately matching the filename program-1.3.tar.gz are
        examined and sorted, and the file with the latest version number
        is retrieved.

    --Output DIR
        Before retrieving any files, chdir to DIR.

    --overwrite
        Allow overwriting existing files when retrieving URLs.

    --prefix PREFIX
        Add PREFIX to all retrieved files.
    --Postfix POSTFIX -P POSTFIX
        Add POSTFIX to all retrieved files.

    --prefix-date -D
        Add an ISO 8601 "YYYY-MM-DD::" prefix to all retrieved files.
        This is added before a possible --prefix-www or --prefix.

    --prefix-www -W
        Usually the files are stored with the same names as the URL
        page, but if you retrieve files that have identical names you
        can store each page separately so that the file name is prefixed
        by the site name.

            http://example.com/page.html  --> example.com::page.html
            http://example2.com/page.html --> example2.com::page.html

  Miscellaneous options
    --debug -d LEVEL
        Turn on debug with a positive LEVEL number. Zero means no debug.

    --help -h
        Print help page.

    --Version -V
        Print program's version information.

README
    This small utility makes it possible to keep a list of URLs in a
    file and periodically retrieve those pages or files with a simple
    command. It is best suited for small batch jobs, e.g. downloading
    the most recent versions of software files.

    If you pass a URL that is already on disk, be sure to supply the
    option --overwrite to allow overwriting old files.

    If the URL ends with a slash, the directory listing on the remote
    machine is stored in a file named:

        !path!000root-file

    The content of this file can be either index.html or the directory
    listing, depending on whether the http or ftp protocol was used.

    While you can run this program from the command line to retrieve
    individual files, it has been designed to use a separate
    configuration file via the --file option. In that configuration file
    you can control the downloading with separate directives like
    `save:', which tells the program to save the file under a different
    name.

    The simplest way to retrieve the latest version of a kit from a
    remote site is:

        mywebget.pl --new --overwrite --verbose \
            http://www.example.com/kit-1.00.tar.gz

    Do not worry about the filename "kit-1.00.tar.gz". If there were a
    kit-3.08.tar.gz on the site, that one would be retrieved. The option
    --new instructs the program to find newer versions.

DESCRIPTION
    See README.
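
    The two-pass selection behind --new can be sketched roughly as
    follows. This is an illustrative Python sketch, not the script's
    own Perl code. The function name pick_newest, the sample directory
    listing and the comparison rule (numeric version fields compared
    left to right) are assumptions; the manual only states that files
    approximately matching the given name are sorted and the one with
    the latest version number is taken.

```python
import re

# One or more dot-separated numeric fields, e.g. "1.00" or "3.08".
VERSION = r"\d+(?:\.\d+)*"


def pick_newest(model, listing):
    """Return the name in `listing` that looks like `model` but carries
    the highest version number, or None if nothing matches.

    Assumption: the part of the name before the version has no digits.
    """
    # Split the model name into stem, version and extension,
    # e.g. "kit-1.00.tar.gz" -> "kit-", "1.00", ".tar.gz".
    parts = re.match(rf"^(\D*)({VERSION})(.*)$", model)
    if parts is None:
        return None
    stem, _, ext = parts.groups()

    # Same stem and extension, any version in between.
    pattern = re.compile(rf"^{re.escape(stem)}({VERSION}){re.escape(ext)}$")

    best, best_key = None, None
    for name in listing:
        found = pattern.match(name)
        if found is None:
            continue
        # Compare versions field by field as integers: "3.08" -> (3, 8).
        key = tuple(int(field) for field in found.group(1).split("."))
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best


if __name__ == "__main__":
    # Hypothetical directory listing obtained in the first pass.
    files = ["README", "kit-1.00.tar.gz", "kit-2.10.tar.gz",
             "kit-3.08.tar.gz", "other-9.9.tar.gz"]
    print(pick_newest("kit-1.00.tar.gz", files))  # prints kit-3.08.tar.gz
```

    Comparing the fields as integers rather than as strings keeps
    1.10 ahead of 1.3, which a plain alphabetical sort would get wrong.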
EXAMPLES
    Read a directory. It will be stored to YYYY-MM-DD::!dir!000root-file.
    Notice that you give the http directory and not the file name
    (short options: `-D -o -v'):

        mywebget.pl --prefix-date --overwrite --verbose http://www.example.com/dir/

    To overwrite a file and add a date prefix to the file name
    (`-D -o -v'):

        mywebget.pl --prefix-date --overwrite --verbose \
            http://www.example.com/file.pl

        --> YYYY-MM-DD::file.pl

    To add date and WWW site prefixes to the file names
    (`-D -W -o -v'):

        mywebget.pl --prefix-date --prefix-www --overwrite --verbose \
            http://www.example.com/file.pl

        --> YYYY-MM-DD::www.example.com::file.pl

ENVIRONMENT
    No environment settings.

SEE ALSO
    C program wget(1) http://www.ccp14.ac.uk/mirror/wget.htm and the old
    Perl 4 program webget(1) http://www.wg.omron.co.jp/~jfriedl/perl/

AVAILABILITY
    CPAN entry is http://www.perl.com/CPAN-local//scripts/

    Reach author at jari.aalto@poboxes.com or
    http://www.netforward.com/poboxes/?jari.aalto

SCRIPT CATEGORIES
    CPAN/Administrative

PREREQUISITES
    Modules `LWP::UserAgent' and `Net::FTP' are required.

COREQUISITES
    No optional CPAN modules needed.

OSNAMES
    `any'

VERSION
    $Id: mywebget.pl,v 1.12 1999/02/10 20:40:23 jaalto Exp $

AUTHOR
    Copyright (C) 1996-1999 Jari Aalto. All rights reserved. This
    program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself or in terms of the GNU General
    Public Licence v2 or later.
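
NOTES
    The local file names shown under EXAMPLES can be reproduced with a
    short Python sketch. It is an illustration of the naming rules as
    documented, not the script's own Perl code. The function name
    local_name and its parameters are hypothetical, and treating a URL
    with an empty path like one ending in a slash is an assumption.

```python
from datetime import date
from urllib.parse import urlsplit


def local_name(url, prefix_date=False, prefix_www=False, today=None):
    """Build the local file name for `url` following the documented
    --prefix-date and --prefix-www conventions."""
    parts = urlsplit(url)
    path = parts.path or "/"   # assumption: no path behaves like "/"

    if path.endswith("/"):
        # Directory URL: slashes become "!" and the listing goes to
        # "000root-file", e.g. "/dir/" -> "!dir!000root-file".
        name = path.replace("/", "!") + "000root-file"
    else:
        # File URL: keep only the last path component.
        name = path.rsplit("/", 1)[-1]

    if prefix_www:
        # Site prefix, e.g. "example.com::page.html".
        name = parts.netloc + "::" + name

    if prefix_date:
        # The date prefix goes in front of any other prefix.
        stamp = (today or date.today()).isoformat()
        name = stamp + "::" + name

    return name


if __name__ == "__main__":
    day = date(1999, 2, 10)
    print(local_name("http://www.example.com/dir/",
                     prefix_date=True, today=day))
    # prints 1999-02-10::!dir!000root-file
    print(local_name("http://www.example.com/file.pl",
                     prefix_date=True, prefix_www=True, today=day))
    # prints 1999-02-10::www.example.com::file.pl
```

    Passing the date in explicitly, as in the demo lines, keeps the
    result reproducible; without it the current day is used.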