Path: usenet.cise.ufl.edu!newsfeeds.nerdc.ufl.edu!zombie.ncsc.mil!newsgate.duke.edu!nntp-out.monmouth.com!newspeer.monmouth.com!newsfeed.corridex.com!nntp2.savvis.net!inetarena.com!not-for-mail
From: Ave Wrigley
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: ANNOUNCE: HTML::Summary 0.013
Followup-To: comp.lang.perl.modules
Date: 31 Mar 1999 13:23:45 GMT
Organization: Canon Research Centre Europe Ltd
Lines: 38
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <7dt7l1$l1v$1@play.inetarena.com>
NNTP-Posting-Host: halfdome.holdit.com
X-Disclaimer: The "Approved" header verifies header information for article transmission and does not imply approval of content.
Xref: usenet.cise.ufl.edu comp.lang.perl.announce:274 comp.lang.perl.modules:9911

HTML::Summary is a module to extract a summary from an HTML page,
somewhat like the summary that might be included in a META tag in the
page head. The interface allows you to specify a maximum length for the
summary generated. The summary is built using the location heuristic,
which scores each sentence according to its position and status within
the document. For example, headings, section titles and opening
paragraph sentences may be favoured over other textual content.

The distribution contains a number of other modules that HTML::Summary
uses; these are bundled with HTML::Summary because I am still open to
suggestions on the interface / namespace of these modules for this early
release. The other modules are:

    Text::Sentence - a module that splits text into constituent
    sentences.

    Lingua::JA::Jcode - a perl5 wrapper around Kazumasa Utashiro's
    jcode.pl library for detecting / converting Japanese multibyte
    character encodings.

    Lingua::JA::Jtruncate - a module for truncating Japanese text
    without breaking multibyte character encodings.

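To give a feel for the location heuristic described above, here is a rough sketch in Python. It is not the HTML::Summary code or its interface. The tag weights, the opening-paragraph bonus, the naive sentence splitter and the names BlockCollector, split_sentences and summarize are all assumptions made for illustration; the announcement gives no actual values. The idea is to score each sentence by the tag it sits in, take the best-scoring sentences that fit within the maximum length, and emit them in document order.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights: the announcement does not give actual values.
TAG_WEIGHT = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0, "p": 1.0}
FIRST_PARAGRAPH_BONUS = 1.0  # assumed boost for opening-paragraph sentences


class BlockCollector(HTMLParser):
    """Collect [tag, text] pairs for the block-level tags we score."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # list of [tag, text]
        self._current = None  # block currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHT:
            self._current = [tag, ""]
            self.blocks.append(self._current)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))
    return [p for p in parts if p]


def summarize(html, max_length):
    """Return a summary of at most max_length characters."""
    collector = BlockCollector()
    collector.feed(html)

    scored = []  # (score, document_order, sentence)
    seen_paragraph = False
    order = 0
    for tag, text in collector.blocks:
        bonus = 0.0
        if tag == "p" and not seen_paragraph:
            bonus = FIRST_PARAGRAPH_BONUS
            seen_paragraph = True
        for sentence in split_sentences(text):
            scored.append((TAG_WEIGHT[tag] + bonus, order, sentence))
            order += 1

    # Take sentences best-first while they fit, then restore document order.
    chosen, used = [], 0
    for _score, position, sentence in sorted(scored, key=lambda s: (-s[0], s[1])):
        cost = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_length:
            chosen.append((position, sentence))
            used += cost
    return " ".join(sentence for _, sentence in sorted(chosen))
```

Calling summarize(page_text, 200) on the text of a page returns at most 200 characters, drawn first from the title and headings and then from the opening paragraph. A real implementation would also need proper sentence splitting and multibyte-safe truncation, which is what the bundled modules listed above are for.
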
The HTML::Summary distribution is available through CPAN:

    ftp://ftp.perl.org/pub/CPAN/authors/id/A/AW/AWRIGLEY/HTML-Summary-0.013.readme
    ftp://ftp.perl.org/pub/CPAN/authors/id/A/AW/AWRIGLEY/HTML-Summary-0.013.tar.gz

I would be grateful for any comments or suggestions on any of these
modules.

Ave.

-- 
Ave Wrigley, mailto:wrigley@cre.canon.co.uk
Web Group, http://www.cre.canon.co.uk/
Canon Research Europe, tel: +44-1483-448844
Guildford GU2 5YJ, U.K. fax: +44-1483-448845