December 15, 1992

FYIFrance: Online French Fulltext, and InfoSci statistics!

edited by: Jack Kessler
kessler@well.sf.ca.us

I've asked around, and no one seems to have heard either of FRANTEXT or of its North American incarnation, ARTFL. 2500 major French language fulltexts online, armed with "absolute and relative term occurrence frequency" statistics for information scientists to ponder, would seem to be a pretty interesting and valuable resource for a lot of folks.

What follows, then, is my own translation of a fascinating forthcoming article on FRANTEXT, details on reaching both FRANTEXT and ARTFL, and information about a new fulltext book which may be of great interest to librarians, French students, information scientists, rare book scholars, and network hackers alike.

Joyeux Noe:l. *<:-)}

Jack Kessler
kessler@well.sf.ca.us

_________________________

"The Point of View of a FRANTEXT User: FRANTEXT at the Bibliothe`que Publique d'Information" (citation appears below)

by Jacques Lemarignier, BPI, Paris (with permission)

FRANTEXT has been in use for five years at the Bibliothe`que Publique d'Information. It has attracted a very diverse following, composed essentially of students, professors, researchers, and professionals from the worlds of cinema, publishing, and advertising. Great numbers of users have been able to pose questions of great variety -- in their complexity, their level of difficulty, the object of their search (doctoral or master's thesis, publication in an anthology, simple curiosity) -- and they have greatly appreciated the results obtained.

This impressive output can be explained by the remarkable flexibility of the software, which can adapt itself to such a variety of questions. It might be interesting to recount these many possibilities, taking the point of view of the user, who can translate a question into search terms and use all of the resources of FRANTEXT to surmount any difficulties found.
The choice of several examples of increasing complexity will help us comprehend what one can obtain from this database:

The search for a quotation is the simplest and fastest operation. If it is found in one of the 2500 texts of the corpus -- which is not always the case -- one can identify it in several seconds. In addition to amazing the user, one renders her a great service. A documentalist from Gallimard publishers, who is preparing an edition of the correspondence of Drieu La Rochelle, had looked in vain for a phrase cited without a reference in one of the letters:

   "Il tenait a` l'affu^t les douze ou quinze sens
   Qu'un faune peut braquer sur les plaisirs passants"

In an instant these two lines were retrieved on the screen of the terminal, with the citation to their source, Victor Hugo, _La le'gende des sie`cles_, "Le Satyre", and the page of the cited edition.

A film-maker was struck in her childhood by a poem which she wished to use for a scene in a film, but she could remember only a single verse with any accuracy, and she had forgotten entirely both the author's name and the title. With the same ease, FRANTEXT showed her that the verse was taken from one of the _Poe`mes barbares_ of Leconte de Lisle, "Les elfes".

FRANTEXT also can render appreciable and rapid service in illuminating the history of a word. It is commonly believed that "ordinateur" is a recent word, derived from the invention of the electronic calculator. But a search for this word on FRANTEXT shows us immediately an older use:

   "The insufficiency of all purely mechanical solutions is a new motive for us to resort to a pre-established arrangement. Why should we make such vain and ridiculous efforts to show ourselves as being the *ordinateur*? Is it not always necessary that the collection of secondary causes points at last to a resolution in the first cause, of which the sublime and consoling idea so entirely satisfies and completes the heart and the spirit?"
   --Charles Bonnet, _Contemplation de la Nature_, 1764, page 267, part seven.

Here the adjective "ordinateur" signifies the Supreme Being, reminding us of the "Dieu horloger" of Voltaire. It is this word, fallen into disuse, which has been taken up, two centuries later, to translate the English word "computer". The occurrences with the modern definition begin in the year 1960. In 1964 one already speaks of an "ordinateur e'lectronique" (_Histoire ge'ne'rale des sciences_, t.3, vol.2, 1964, page 108). Then the examples multiply, and the word becomes common.

Searches of the same type, for uncommon meanings or usage of words, often are of great interest. If one is interested in the concept of "Lumie`res", for example, which is used to characterize the entire 18th century, now called the "Sie`cle des Lumie`res", FRANTEXT shows us that its intensive use extends from 1750 to 1850, with no pause at 1799. The following table, of word-occurrence frequencies by chronological eras of twenty years, can be obtained in less than a minute:

____________________________________________________________

NOTE: relative frequencies are expressed in millionths

Total absolute frequency: 3226
Maximum frequency: 652, in the period 1780-1799

DIAGRAM OF ABSOLUTE FREQUENCIES
Scale: an asterisk represents an absolute frequency of 20 occurrences

              abs.    rel.
              freq.   freq.
1700-1719:      41      17    ***
1720-1739:      55       9    ***
1740-1759:     246      36    **************
1760-1779:     460      46    *************************
1780-1799:     652      86    **********************************
1800-1819:     517      93    ***************************
1820-1839:     479      43    *************************
1840-1859:     315      26    *****************
1860-1879:     278      24    ***************
1880-1899:     183      18    **********
____________________________________________________________

The absolute frequency represents the number of usages of the term "lumie`res" in the texts of the corpus, divided up by periods of 20 years; the relative frequency is the ratio between the number of usages in the texts considered and the total number of words in these same texts. Thus, in the period 1780-1799, "lumie`res" is used 652 times in the texts of the corpus of FRANTEXT: that is the absolute frequency. Dividing 652 by the total number of words of the texts for this period, one obtains 86 millionths: that is the relative frequency.

The figures and the diagram show us the immense increase in usage of the term "lumie`res" up to 1799, notably in the last 20 years of the 18th century, then the slow and steady diminution of its use. One clearly sees the influence of the term "lumie`res" during the French Revolution, and the persistence of this influence during the Second Empire and the Restoration.

Another table, obtained just as rapidly, can show the absolute and relative frequencies of the term for each author of the same period, except for those authors who didn't use the term at all. Between 1750 and 1800 the authors who made greatest use of the term are not those of whom one first would think: the Abbe' Pre'vost (144 usages), the Abbe' Barthe'le'my (138 usages), the Abbe' Ge'rard (104 usages), Bernardin de Saint-Pierre (101 usages), Diderot (89 usages), Condorcet (82 usages).
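The arithmetic behind the absolute and relative frequencies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and are not part of FRANTEXT or ARTFL, and the data are simply the figures from the period table for "lumie`res". Because the published relative frequencies are rounded to whole millionths, any corpus size recovered from them is approximate.

```python
# Figures copied from the period table for "lumie`res":
# (period, absolute frequency, relative frequency in millionths).
LUMIERES = [
    ("1700-1719", 41, 17),
    ("1720-1739", 55, 9),
    ("1740-1759", 246, 36),
    ("1760-1779", 460, 46),
    ("1780-1799", 652, 86),
    ("1800-1819", 517, 93),
    ("1820-1839", 479, 43),
    ("1840-1859", 315, 26),
    ("1860-1879", 278, 24),
    ("1880-1899", 183, 18),
]


def relative_frequency(occurrences, total_words):
    """Occurrences per million words: occurrences divided by total words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return occurrences / total_words * 1_000_000


def implied_total_words(occurrences, per_million):
    """Invert relative_frequency(): approximate size of the texts searched."""
    return round(occurrences * 1_000_000 / per_million)


def peak_period(rows):
    """Period with the highest absolute frequency."""
    return max(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for period, occurrences, per_million in LUMIERES:
        words = implied_total_words(occurrences, per_million)
        print(f"{period}: {occurrences:4d} uses, {per_million:3d} per million, "
              f"about {words:,} words of text")
    print("Total:", sum(row[1] for row in LUMIERES))
    print("Peak: ", peak_period(LUMIERES))
```

For 1780-1799, 652 occurrences at 86 millionths imply roughly 7.6 million words of text for that period. The peak of the relative frequency (93, in 1800-1819) falls one period later than the peak of the absolute frequency, which is why both columns are worth reading.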
These author counts show that, if the ideas of the "Lumie`res" were developed by philosophers who are famous today, they were taken into the language and popularised largely by writers who are less known or nearly forgotten today, but who had, in their time, a very great influence, and who played an essential role in the preparation of the Revolution. Between 1800 and 1850, when Madame de Stae:l and Chateaubriand come to the fore, with respectively 194 and 160 usages, the concept of "Lumie`res" becomes fixed in the popular imagination and takes on a mythical value.

Subject-searching is more complex, but FRANTEXT has such flexibility that the result is excellent. If one is searching within the works of a single author, or within a single work, it is enough to enumerate the terms which define the theme. Thus, in a study of the color "blanche" in _Madame Bovary_, one first searches automatically for all the words formed on the terms "blanc" and "pa^le"; to this list one may add "blafard", "livide", "ble^me", "cadave'reux", "neige", "lumie`re", "immacule'", "candeur". From the result appears an evolution from lively colors, often contrasting, through to pale and livid hues, paralleling the gradual fall of Emma into suicide and nothingness.

As shown by its 218 examples, the color "blanche" has great importance and symbolic value. At first it signifies health, propriety, elegance, beauty, purity. When Charles sees Emma for the first time, he first notices the whiteness of her fingernails, which surprises him; "her hand was not beautiful, in fact it was perhaps a bit pale"; as a counterpoint, her eyes are brown and appear to be black, her look clear and direct, "reaching you frankly, with a strong candour". The black here is placed into a natural opposition, as a sign of life, of force and of beauty. A few lines further, Charles gazes at "her neck which extended from her black collar" and "her hair, for which the two black ribbons seemed each to be of the same piece".
A bit later, after a warm spell which has melted the snow, Charles visits Emma: "her parasol, of gently reflective silk, blocking the sun, highlighted the white skin of her figure with the play of its reflections" (I,2). Love and happiness are associated with an intense and living light, primarily one that is pale and white.

During the early part of the marriage of Charles and Emma, one finds an entire range of colors, from black to white, in an atmosphere of happiness. Charles "was staring at the sunlight, passing between the bedcovers and her blonde hair"; her eyes, "black as shadow and deep blue as daylight, contained layers of color which, in deeper hues in the depths of her eyes, shone with enameled brightness on the surface" (I,9).

As soon as Emma breaks through the sterile dream of this imagined ideal world, the white and the black come to oppose each other in the usual way; thus, "Emma wished to live...like the ladies...who passed their days ...in watching the approach, from the depths of the country, of a soldier with a white plume in his hat, galloping on his black horse" (I,6). The world then becomes more pale, less real. White carries her away, the image of her dream: "She wandered, her desperate eyes upon the solitude of her life, searching for any white sail in the mists of her horizon" (I,9).

The colors follow the movements of the soul of Emma in the novel. The progression from dream, to boredom, to disgust with life, to despair, is expressed by the intrusion of pallor: the washed-out tint, the whitened day, wan shades, the pale sky, the livid river; finally the eyes of the corpse of Emma, "disappearing in a viscous pallor".

A minute examination of the 218 occurrences obtained, each reported with the text and page of the Belles Lettres edition, makes it possible to enrich these few remarks, to specify each nuance of color, and to place each one in its context.
Such a study can obtain an exhaustive list of examples in twenty minutes using FRANTEXT.

(Next: conclusion of Lemarignier's description of FRANTEXT, how FRANTEXT and ARTFL may be reached online, and a new book on online fulltext work in France.)

Jack Kessler
kessler@well.sf.ca.us

=========================

This is the conclusion of Jacques Lemarignier's description of FRANTEXT, a collection of 2500 French classical fulltexts, which may be reached online either in Europe or from ARTFL in North America. Access instructions appear below, as does a description of some of the very interesting work in online fulltext being done today in France.

_________________________

(Lemarignier on FRANTEXT, continued:)

If a subject search is attempted over the entire FRANTEXT corpus, serious problems arise because of the multiple meanings of certain terms. Such is the case for a question about "WC's", which might take one wandering through equally specialized terms such as "latrines" or "gogues"; but then, arriving at "cabinets", one would wander off into "ministerial cabinets" and related terms. One can, however, limit a search of the entire corpus to the novel, poetry and the theater, to discipline the search in this sense. The sorting of texts by literary genre often permits the limitation of word usage to the sense desired, by isolating a given semantic field.

In another case, however, the examples found serve to enrich the list of words which define a theme. This is what happened with the search for texts which might illustrate and clarify the perception which the French have of the Arabs. About thirty words gradually were found: "Arabe", "Turcs", "Islam", "islamisme", "Mahomet", "Maures", "Sarrasins", "musulman", "mahome'tanisme", "Coran", "Alcoran", "mosque'e", "he'gire", "houri", "calife", "be'douin", "se'rail", "pacha", "eunuque", "Avicenne", "Allah", "La Mecque", "Egypte", "Damas", "Maroc", "Tunisie", "Palmyre", "Constantinople", "Bagdad".
The terms were used in part or entirely according to the period, the authors or the texts which particularly interested the users. The results revealed extreme reactions with regard to the Arabs, very favorable or very hostile, almost never neutral, and citations such as the following:

   "I never rejoice for our victories over the Arabs...I love these people, rough, persistent, lively, the final type of primitive societies, who, halting at mid-day, lying in the shade, beneath the bellies of their camels, smoking their 'chibouk', scoff at our grand civilization which quivers in its own rages."
   Flaubert, _Correspondance_, August 6, 1846.

   "I demand in the name of humanity the destruction of the black stone, to throw the bits to the wind, the destruction of Mecca, and the desecration of the tomb of Mohammed. This is the way to demoralise fanaticism."
   Flaubert, _Correspondance_, March 1, 1878.

   "This evening, at the home of Daudet, Larroumet spoke up curiously for Morocco, which is the last refuge of old Islam and where torture has a ferocious quality surpassing that of the tortures of China."
   Goncourt, _Journal_, t.4, 1896 (November, 1895).

FRANTEXT cannot, certainly, respond to all needs. Its limits are those of its corpus, which encompasses 2500 French language texts, from the 16th century to our day, and which had no original purpose other than to offer an assortment of the French language in its different levels and over the course of its evolution. No citation may be found which is not from a text included in the corpus, and no subject search of a work is possible unless that work is part of the corpus. But experience shows that one rarely has to turn away these sorts of questions, which proves, one more time, that the texts were remarkably well-chosen.

This database is useful in a great number of situations, and it offers immense resources, which complement more traditional means of research easily and quickly. The results please those who try it.
This brief recounting of typical questions perhaps will give incentive, to both researchers and the merely curious, to experiment with the riches of this resource.

(original in French by:)
Jacques Lemarignier,
Bibliothe`que Publique d'Information
Centre Georges Pompidou,
19, rue Beaubourg, 75197 Paris Cedex 04

Jacques Lemarignier may be contacted via e-mail c/o Jacques Faule at faule@univ-rennes1.fr, or via fax to (Paris) 44-78-12-15.

_________________________

The above article will appear in its French original as follows:

Jacques Lemarignier, "Le point de vue d'un interrogateur sur FRANTEXT: FRANTEXT a` la Bibliothe`que Publique d'Information", in _Les banques de donne'es litte'raires, comparatistes et francophones_, edited by Alain Vuillemin, Limoges: Presses de l'Universite' de Limoges et du Limousin, January 1993 (forthcoming)

This book contains enough that is exciting and new -- there appears to be a great deal going on in France in online fulltext -- to make worthwhile the listing here of its table of contents (again, with permission):

"Avant-propos", by Jean Claude Vareille, Pre'sident de l'Universite' de Limoges
"L'informatique litte'raire: de quelques effets corollaires", by Jacques Fontanille, Doyen de la Faculte' des Lettres et Sciences Humaines de l'Universite' de Limoges

I. LES ENJEUX:

"La lecture assiste'e par ordinateur et la station de lecture de la bibliothe`que de France", by Jacques Virbel, CNRS, Institut de Recherche en Informatique de Toulouse, U.Paul-Sabatier (Toulouse I)
"Le re'seau 'litte'ratures francophones' de l'UREF et la recherche bibliographique", by Jean-Louis Joubert, Universite' de Paris-Nord (Paris XIII), Universite' des Re'seaux d'Expression Franc,aise, Coordonnateur du re'seau 'Litte'ratures francophones'
"Des banques de donne'es sur les e'tudes litte'raires francophones", by Claire Panijel, URFIST de Paris-Ecole des Chartes
"Te'le'informatique et litte'rature franc,aise", by Jacques Faule, Bibliothe`que Publique d'Information, Centre Georges Pompidou
"Banques de donne'es et recherche litte'raire: proble`mes et perspectives", by Claude Cazale-Be'rard, U.de Paris X - Nanterre
"La boite en valise ou le poste de travail du litte'raire", by Henri Behar, Universite' de la Sorbonne Nouvelle (Paris III)

II. LES DOMAINES

"Les litte'ratures d'expression franc,aise", by Jacques Chevrier, Universite' du Val-de-Marne (Paris XII)
"Le programme 'LIMAG' (litte'ratures maghre'bines)", by Charles Bonn, Universite' de Paris-Nord (Paris XIII)
"Aux sources de 'LIMAG': regard porte' sur la cre'ation d'une banque de donne'es", by Fe'riel Kachoukh, U.Paris-Nord (Paris XIII)
"Projet de cre'ation d'un lieu ressource dans le domaine de la litte'rature maghre'bine d'expression franc,aise", by Fe'riel Kachoukh, Universite' de Paris-Nord (Paris XIII)
"'LITAF': une banque de donne'es de litte'ratures africaines", by Virginie Coulon, Universite' de Bordeaux I
"'LITAF': petit manuel pratique", by Virginie Coulon, U.Bordeaux I
"La base de donne'es bibliographique 'langue et culture en Louisiane'", by Maguy Grassin, Universite' de Limoges
"Le point de vue d'un interrogateur: FRANTEXT a` la Bibliothe`que Publique d'Information", by Jacques Lemarignier, Bibliothe`que Publique d'Information, Centre Georges Pompidou
"Peut-on re'gler son compte a` la 'raison'?", by Etienne Brunet, Universite' de Nice
"Le vert de Saint-John Perse", by Eveline Caduc, U. de Nice
"ARIEL", by Pierre Brunel, Universite' de Paris-Sorbonne (Paris IV)
"L'aventure du projet 'ARIEL' ou la gene`se de la banque de donne'es comparatistes et francophones 'Ariel-litte'ral' de l'universite' de Paris-Sorbonne (Paris IV) 1981-1991", by Alain Vuillemin, Universite' de Limoges
"'SPIRIT': aide a` la constitution de bases de donne'es bibliographiques", by Fre'de'ric Foussier, INSTN-CEA-Universite' de Paris-Sud (Paris XI)
"Projet d'une banque de donne'es des 'exempla' me'die'vaux", by Marie-Anne Polo de Beaulieu, CNRS

III. LES PERSPECTIVES

"Re'alisation partage'e d'une e'dition de texte a` distance", by Fre'de'ric Foussier, INSTN-CEA-Universite' de Paris-Sud (Paris XI)
"'EL HADJ': une maquette de banque de donne'es litte'raires, e'ditoriale et bilingue, en litte'rature compare'e", by Alain Vuillemin, Universite' de Limoges
"Pour un syste`me de stylistique informatise'", by Bernard Gicquel, Universite' du Maine
"Bases de donne'es et ge'ne'ration de textes", by Jean-Pierre Balpe, Universite' de Paris VIII
"La banque de donne'es d'histoire litte'raire", by Michel Bernard, Universite' de la Sorbonne-Nouvelle (Paris III)
"La base de donne'es iconographiques des vide'odisques des manuscrits de la bibliothe`que Vaticane", by Je'ro^me Baschet, Ecole des Hautes Etudes en Sciences Sociales (Paris)

_________________________

FRANTEXT at the BPI, Paris

FRANTEXT may be consulted at the BPI library at the Centre Pompidou, Paris (first floor, Bureau 8 - literature), for a fee, from 1 to 5 pm. Responses are printed out.

FRANTEXT in Europe

Subscriptions are available from the Institut de la Langue Franc,aise, Tre'sor Ge'ne'ral des Langues et Parlers Franc,ais, (Centre National de la Recherche Scientifique), 52, boulevard de Magenta, 75010 Paris, telephone (Paris) 42-45-00-77.
FRANTEXT's own publicity lists 183 million word-occurrences, 2330 works, 3241 "treated texts", of which 20% are non-literary texts taken from 70 disciplines from the 19th and 20th centuries, 900 authors, 450 publishers, and 53 operating public-access sites in addition to the BPI, including sites throughout Europe and in Japan.

FRANTEXT in North America -- the ARTFL database and service

ARTFL is the "North American antenna" for Frantext, according to its director, Mark Olson (e-mail: mark@gide.uchicago.edu, telephone: 312-702-8488). It contains a copy of the FRANTEXT database, which it makes available via telnet (to artfl.uchicago.edu) to subscribers (US$ 500 per year -- 40 major campuses currently are subscribed) together with a special, improved interface, and e-mail, ftp, and offline photocopying services. Olson freely distributes extensive user documentation and a good bibliography on his ARTFL service and on the general FRANTEXT concept.

____________________

n.b. Does any of this have SGML markup? How friendly is it to ASCII? How easy is it to get to? (You can get to it on both Minitel and the Internet: that's pretty easy.) How inexpensive will it be? Will certain types of scholar prefer it to the printed book? Will certain types of reader?

I've neither answered nor frankly asked any of these questions yet of FRANTEXT. But it is interesting that it's there, and that it already is as accessible as it apparently is. It's not the only thing becoming available now in France in online fulltext, moreover, as the headings shown above from the book in which M. Lemarignier's article will appear indicate.

***

ISSN 1071 - 5916

end .