Path: usenet.cis.ufl.edu!usenet.eel.ufl.edu!tank.news.pipex.net!pipex!news.mathworks.com!newsfeed.internetmci.com!in1.uu.net!gasco!nntp.teleport.com!usenet
From: pfeifer@charly.informatik.uni-dortmund.de (Ulrich Pfeifer)
Newsgroups: comp.lang.perl.announce,comp.lang.perl.misc
Subject: Module Wais 2.1 available
Followup-To: comp.lang.perl.misc
Date: 13 Dec 1995 15:50:17 GMT
Organization: University of Dortmund, Germany
Lines: 319
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <4amsnq$60b@maureen.teleport.com>
Reply-To: pfeifer@charly.informatik.uni-dortmund.de
NNTP-Posting-Host: linda.teleport.com
X-Disclaimer: The "Approved" header verifies header information for article
    transmission and does not imply approval of content.
Xref: usenet.cis.ufl.edu comp.lang.perl.announce:205 comp.lang.perl.misc:14663

For all WWW-WAIS gateway implementors:

The Perl module Wais 2.1 is available real soon now at your favourite
CPAN site. I append the documentation for convenience. ... Yes there
is documentation now ;-)

Randal: Would you as the author of chat2 replace the 'do' by a '&'? It
would make the tests look prettier:

t/basic.............Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
t/dict..............Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
t/parallel..........Use of "do" to call subroutines is deprecated at /usr/local/ls6/perl5.001n/lib/perl5/chat2.pl line 265.
ok
All tests successful.
Files=3, Tests=17, 34 secs ( 1.83 cusr 0.68 csys = 2.52 cpu)

Currently I also do this in Wais.pm to avoid warnings :-(

    # make strict happy
    @we_know = ($chat::name, $chat::debug, $chat::aliases, $chat::family,
                $chat::nfound, $chat::thisbuf, $chat::thishost, $chat::timeleft);
    @we_know = ();

The module is tested with 5.001n and 5.002b1f.
--
@J = split //,"J!k Phau^eHeens%rarrot&\ncl t "; for(0..24){print $J[$_*7%($#J+1)]}

------------------------------------------------------------------------

NAME
    Wais - access to freeWAIS-sf libraries

SYNOPSIS
    use Wais;

DESCRIPTION
    The interface is divided into four major parts.

    SFgate 4.0
        For backward compatibility the functions used in SFgate up to
        version 4 are still present. Their use is deprecated and they
        are not documented here. These functions may not be supported
        in following versions of this module.

    Protocol
        XS functions that provide low-level access to the WAIS
        protocol. For example, generate_search_apdu() constructs a
        request message.

    SFgate 5.0
        Perl functions that implement high-level access to WAIS
        servers. For example, parallel searching is supported.

    dictionary
        A bunch of XS functions useful for inspecting local databases.

    We will start with the SFgate 5.0 functions.

USAGE
    The main high-level interfaces are the functions Wais::Search and
    Wais::Retrieve. Both return a reference to an object of the class
    Wais::Result.

  Wais::Search
    Arguments of Wais::Search are hash references, one for each
    database to search. The keys of the hashes should be:

    query
        The query to submit.

    database
        The database which should be searched.

    host
        host is optional. It defaults to 'localhost'.

    port
        port is optional. It defaults to 210.

    tag
        A tag by which individual results can be associated with a
        database/host/port triple. If omitted, it defaults to the
        database name.

    relevant
        If present, must be a reference to an array containing
        alternating document id's and types. Document id's must be of
        type Wais::Docid.

    Here is a complete example:

        $result = Wais::Search({'query'    => 'pfeifer',
                                'database' => $db1,
                                'host'     => 'ls6',
                                'relevant' => [$id, 'TEXT']},
                               {'query'    => 'pfeifer',
                                'database' => $db2});

    If host is 'localhost' and database.src exists, a local search is
    performed instead of connecting to a server. Wais::Search will open
    at most $Wais::maxnumfd connections in parallel.
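
    The defaults described above for the Wais::Search argument hashes
    can be illustrated with a small stand-alone sketch. Note that
    fill_defaults() is a hypothetical helper written for this
    illustration; it is not part of Wais.pm and does not call the
    module, and the database names 'db1' and 'db2' are placeholders.
    It only shows which keys a request would carry once the documented
    defaults for host, port and tag are applied.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Wais.pm: apply the documented
# defaults to one Wais::Search-style argument hash.
sub fill_defaults {
    my ($req) = @_;
    my %full = (
        host => 'localhost',    # documented default for host
        port => 210,            # documented default for port
        %$req,                  # keys supplied by the caller win
    );
    # The tag defaults to the database name when it is omitted.
    $full{tag} = $full{database} unless defined $full{tag};
    return \%full;
}

# Placeholder requests modelled on the example above.
my @requests = map { fill_defaults($_) } (
    { query => 'pfeifer', database => 'db1', host => 'ls6' },
    { query => 'pfeifer', database => 'db2' },
);

# Print each completed request, one per line, keys sorted.
for my $r (@requests) {
    print join(' ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

    Whether Wais::Search fills in its defaults in exactly this way is
    an assumption; the sketch only mirrors the documented behaviour.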

  Wais::Retrieve
    Wais::Retrieve should be called with named parameters (i.e. a
    hash). Valid parameters are database, host, port, docid, and type.

        $result = Wais::Retrieve('database' => $db,
                                 'docid'    => $id,
                                 'host'     => 'ls6',
                                 'type'     => 'TEXT');

    Defaults are the same as for Wais::Search. In addition, type
    defaults to 'TEXT'.

  Wais::Result
    The functions Wais::Search and Wais::Retrieve return references to
    objects blessed into Wais::Result. The following methods are
    available:

    diagnostics
        Returns an array of diagnostic messages. Each element (if any)
        is a reference to an array consisting of

        tag
            The tag of the corresponding search request, or 'document'
            if the request was a retrieve request.

        code
            The WAIS diagnostic code.

        message
            A textual diagnostic message.

    header
        Returns an array of WAIS document headers. Each element (if
        any) is a reference to an array consisting of

        tag
            The tag of the corresponding search request, or 'document'
            if the request was a retrieve request.

        score

        lines
            Length of the corresponding document in lines.

        length
            Length of the corresponding document in bytes.

        headline

        types
            A reference to an array of types valid for docid.

        docid
            A reference to the WAIS identifier blessed into
            Wais::Docid.

    text
        Returns the text fetched by Wais::Retrieve.

  Dictionary
    There are a couple of functions to inspect local databases. See
    the inspect script in the distribution. You need the Curses module
    to run it. Also adapt the directory settings in the top part.

    Wais::dictionary
            %frequency = Wais::dictionary($database);
            %frequency = Wais::dictionary($database, $field);
            %frequency = Wais::dictionary($database, 'foo*');
            %frequency = Wais::dictionary($database, $field, 'foo*');

        The function returns an array that alternates between the
        words in the global or field dictionary (restricted to those
        matching the prefix, if given) and the frequency of the
        preceding word. In a scalar context, the number of matching
        words is returned.

    Wais::list_offset
        The function takes the same arguments as Wais::dictionary.
        It returns the same array or, in scalar context, the same word
        count, with the word frequencies replaced by the offset of the
        posting list in the inverted file.

    Wais::postings
            %postings = Wais::postings($database, 'foo');
            %postings = Wais::postings($database, $field, 'foo');

        Returns an array containing alternating numeric document id's
        and a reference to an array whose first element is the
        internal weight of the word with respect to the document. The
        other elements are the word/character positions of the
        occurrences of the word in the document. If freeWAIS-sf is
        compiled with -DPROXIMITY, word positions are returned,
        otherwise character positions. In a scalar context the number
        of occurrences of the word is returned.

    Wais::headline
            $headline = Wais::headline($database, $docid);

        The function retrieves the headline (only the text!) of the
        document numbered $docid.

  Protocol
    Wais::generate_search_apdu
            $apdu = Wais::generate_search_apdu($query,$database);
            $relevant = [$id1, 'TEXT', $id2, 'HTML'];
            $apdu = Wais::generate_search_apdu($query,$database,$relevant);

        Document id's must be of type Wais::Docid as returned by
        Wais::Result::header or Wais::Search::header. $Wais::maxdoc
        may be set to modify the number of documents to retrieve.

    Wais::generate_retrieval_apdu
            $apdu = Wais::generate_retrieval_apdu($database, $docid, $type);
            $apdu = Wais::generate_retrieval_apdu($database, $docid, $type, $chunk);

        Request to send chunk number $chunk of the document whose id
        is $docid (must be of type Wais::Docid). $chunk defaults to 0.
        $Wais::CHARS_PER_PAGE may be set to influence the chunk size.

    Wais::local_answer
            $answer = Wais::local_answer($apdu);

        Answer the request by local search/retrieval. The message
        header is stripped from the result for convenience (see the
        code of Wais::Search and the documentation of
        Wais::Search::new below).

    Wais::Search::new
            $result = Wais::Search::new($message);

        Turn the result message into an object of type Wais::Search.
        The following methods are available: diagnostics, header, and
        text. The methods return much the same as those of
        Wais::Result; only the tags are missing.

        diagnostics
            Returns an array of references to [$code, $message].

        header
            Returns an array of references to [$score, $lines,
            $length, $headline, $types, $docid].

        text
            Returns the chunk of the document requested. For documents
            larger than $Wais::CHARS_PER_PAGE more than one request
            must be sent.

    Wais::Search::DESTROY
        The objects will be destroyed by Perl.

VARIABLES
    $Wais::version
        Generated by: sprintf(buf, "Wais %3.1f%d", VERSION, PATCHLEVEL);

    $Wais::errmsg
        Set to a verbose error message if something went wrong. Most
        functions return undef on failure after setting $Wais::errmsg.

    $Wais::maxdoc
        Maximum number of hits to return when searching. Defaults to
        40.

    $Wais::CHARS_PER_PAGE
        Maximum number of bytes to retrieve in a single retrieve
        request. Wais::Retrieve sends multiple requests if necessary
        to retrieve a document. CHARS_PER_PAGE defaults to 4096.

    $Wais::timeout
        Number of seconds to wait for an answer from remote servers.
        Defaults to 120.

    $Wais::maxnumfd
        Maximum number of file descriptors to use simultaneously in
        Wais::Search.

BUGS
    Wais::Search currently splits the request into groups of
    $Wais::maxnumfd requests. Since some requests of a group might be
    local and/or some might refer to the same host/port, a group may
    not use all $Wais::maxnumfd possible file descriptors. Therefore
    some performance may be lost when more than $Wais::maxnumfd
    requests are processed.

AUTHOR
    Ulrich Pfeifer

--
Ulrich         UNIVERSITAET-DORTMUND    telefax: 49 231 755 2405
///// Pfeifer  Lehrstuhl Informatik VI  voice:   49 231 755 3032
____UNI DO @RR D-44221 Dortmund         postbox: 50 05 00
\\*\\////      http://ls6-www.informatik.uni-dortmund.de/WhoIsWhoAtLS6.html
\\\\\// .