Path: usenet.cise.ufl.edu!usenet.eel.ufl.edu!arclight.uoregon.edu!feed1.news.erols.com!howland.erols.net!news.sprintlink.net!news-peer.sprintlink.net!uunet!in2.uu.net!crusty.teleport.com!nntp0.teleport.com!usenet
From: sb@sdm.de (Steffen Beyer)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.misc
Subject: P.S.: ANNOUNCE gen_tree vers. 2.1
Followup-To: comp.lang.perl.misc
Date: 25 Oct 1996 00:57:05 GMT
Organization: sd&m GmbH & Co. KG Munich, Germany
Lines: 94
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <54p391$89j@nadine.teleport.com>
Reply-To: sb@sdm.de (Steffen Beyer)
NNTP-Posting-Host: gadget.cscaper.com
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Xref: usenet.cise.ufl.edu comp.lang.perl.announce:42 comp.lang.perl.misc:6330

Dear Perl and WWW hackers,

some people complained that I forgot to say in my recent announcement
("ANNOUNCE: gen_tree version 2.1") what the script actually does. Of course
they are right, and I apologize for that! Here is the missing part:

What does it do:
----------------

This script scans the tree (more precisely: the directed graph) of HTML
pages of a web site. (It's not always a tree because cycles and loops are
possible!)

It starts at the home page of that site (called the "root page" here) and
follows all hyperlinks in a breadth-first descent, so that the resulting
representation comes out in the expected order. (You can also scan just a
subtree of your web site if you want.)

Cycles and loops are detected by identifying each page uniquely through the
device and inode numbers of its corresponding file.

When the scan of the web site is complete, an HTML page is generated which
contains one hyperlink to each of the pages found. The tree structure of
the web site is reflected in this page by the indentation of these
hyperlinks. The text displayed in each hyperlink is extracted from the
<TITLE>...</TITLE> tags of the corresponding page.

Supported features:
-------------------

This script is capable of executing server side includes and of analyzing
server side image maps, so no important hyperlinks are missed. (Many home
pages consist of an image map and nothing else!) It is also able to analyze
CGI scripts simply by calling them and scanning their output. (Therefore,
no HTTP server is needed!)

While the web site is being scanned, a detailed log file is written. Most
of the time it's a good idea to read it, because it lets you discover flaws
in your web site that often go unnoticed otherwise!
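For the technically curious, here is a minimal sketch (plain Perl, and
explicitly NOT the actual gen_tree source) of the idea described above: a
breadth-first walk over local HTML files, with cycles broken by remembering
the device and inode numbers of each file already seen, and with the link
text taken from the <TITLE> tag. The root file name "index.html", the
restriction to plain relative links and the output format are simplifying
assumptions made only for this sketch:

#!/usr/bin/perl -w
# Sketch: breadth-first scan of local HTML files with loop detection
# via device/inode numbers. Illustration only, not the gen_tree code.

use strict;
use File::Basename qw(dirname);
use File::Spec;

my $root = shift || 'index.html';    # hypothetical root page

my %seen;                            # "dev:inode" => 1, breaks cycles/duplicates
my @queue = ([ $root, 0 ]);          # [ file, depth ] pairs, breadth first

while (@queue) {
    my ($file, $depth) = @{ shift @queue };

    my ($dev, $ino) = stat($file) or next;    # skip unreadable files
    next if $seen{"$dev:$ino"}++;             # already visited => loop or duplicate

    open(my $fh, '<', $file) or next;
    my $html = do { local $/; <$fh> };
    close $fh;

    # take the link text from the page's <TITLE> tag, fall back to the file name
    my ($title) = $html =~ m{<title>\s*(.*?)\s*</title>}is;
    $title = $file unless defined $title && length $title;

    # indentation reflects the page's depth in the (pseudo-)tree
    printf qq{%s<a href="%s">%s</a><br>\n}, '    ' x $depth, $file, $title;

    # follow plain relative href links to other local HTML files
    while ($html =~ m{<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)}ig) {
        my $link = $1;
        next if $link =~ m{^[a-zA-Z][\w+.-]*:};   # skip http:, ftp:, mailto:, ...
        next if $link =~ m{^/};                   # skip server-absolute links (sketch only)
        $link =~ s/[#?].*//s;                     # drop fragments and query strings
        next unless $link =~ /\.html?$/i;
        push @queue, [ File::Spec->rel2abs($link, dirname($file)), $depth + 1 ];
    }
}

The real script does considerably more (image maps, server side includes,
CGI output, configurable depth limits), but the "seen by device and inode"
trick is the part that keeps it from running in circles.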
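The CGI feature mentioned above works on the same principle: run the script
directly and scan its output like any other page. The little helper below
is again only a sketch and not gen_tree's actual code; the environment
variables set, the header stripping and the script name are assumptions
that happen to suffice for simple GET-style scripts:

use strict;

# Sketch only: execute a CGI script directly and return its HTML output,
# so it can be scanned like a static page (no HTTP server involved).
sub cgi_output {
    my ($script) = @_;                       # e.g. './guestbook.cgi' (hypothetical)
    local $ENV{REQUEST_METHOD} = 'GET';      # minimal CGI environment (assumption)
    local $ENV{QUERY_STRING}   = '';
    my $output = `$script`;                  # run it, capture its standard output
    $output =~ s/\A.*?\r?\n\r?\n//s;         # drop the CGI header block
    return $output;                          # what remains is the HTML to scan
}

print cgi_output('./guestbook.cgi');         # hypothetical script name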
"0" for hiding the subtree completely, "1" for showing the root (topmost) page of that subtree only, "2" for showing the root page and the pages it contains hyperlinks to, and so on. (The number limits the maximum depth of hyperlinks to follow) Thanks: ------- Special thanks to Michael Bruns for suggesting the finer tuning capability for excluding pages and subtrees realized in version 2.1! Yours, -- Steffen Beyer ________________________ C:\ONGRATLN.W95 _______________________ mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27 fax: +49 89 63812-150 | | | | 81737 Munich, Germany. .