                     CELIA Archivist FAQ

Last update: fred.swartz@merit.edu 94-01-13

Contents:
   Introduction
   Who runs CELIA?
   Who can be a CELIA Archivist?
   Which languages are represented in CELIA?
   What is a "distributed" archive?
   What are the advantages of a distributed archive?
   What are the disadvantages of a distributed archive?
   Isn't transparently pointing to other resources intellectually
      dishonest?
   When should multiple copies of data files be made?
   Where on the Internet is CELIA?
   How do I get my server to have CELIA on it?
   How do the CELIA Archivists communicate?
   What is Veronica?
   What are the guidelines for naming files and directories?
   Comparison of Gopher and FTP

==================================================

Introduction
------------
This document answers some of the questions about the work of the
volunteer archivists who work on the CELIA (Computer Enhanced Language
Instruction Archives) distributed archive.

Who runs CELIA?
-----------------
The original discussion was initiated by Anthea Tillyer, who has been
the driving force behind the creation of TESL-L and its many offspring.
But the idea is to create a cooperative community which involves many
different people. Whether this can be run as a complete anarchy is
questionable, but it is equally doubtful that a rigid structure will
work here. It will be interesting to see how the structure evolves, but
the general rule will probably be that the people who do the work
should "own" it.

Who can be a CELIA Archivist?
-----------------------------
Anyone with some kind of network access and an interest! There are a
number of tasks that an Archivist could perform:

   o Run a Gopher/WWW/FTP server.
   o Actively search out materials to add to the archives.
   o Organize files that others have sent in.
   o Test submitted materials at least minimally.
   o Write brief descriptions for a descriptive database.
   o Answer questions that CELIA users have.
   o Run a mailing list for CELIA.

Once CELIA is in full operation, many of the files will probably be
submitted by CELIA users. At least this is the pattern for most
existing archives. At first, they are small and everything that is
added is searched out and added by the archivists themselves. As (if)
an archive grows, more and more of the files are submitted by the
archive users.

Which languages are represented in CELIA?
-----------------------------------------
The initial work on CELIA is being done by people interested in English
(EFL/ESL). There is probably more on-line information on English than
on any other language, so it's quite likely that it will continue to
dominate CELIA in terms of quantity. However, the intention is that
CELIA will serve all languages.

What is a "distributed" archive?
--------------------------------
CELIA is a distributed archive; not all files/resources are in the same
place. Historically, archives have not been arranged this way because
the older network tools like FTP could not handle this kind of
organization. With newer tools like Gopher and WWW (World Wide Web) it
is now possible to construct an archive which is geographically
distributed, but gives the appearance of a single image to the user.

[perhaps a discussion of how Gopher/WWW work should go here to explain
how this is done? -- Fred]

What are the advantages of a distributed archive?
----------------------------------------------------
A distributed archive has several advantages:

   o Build on the work of others.
     Perhaps the single greatest advantage of building CELIA as a
     distributed archive is that it allows CELIA to point to many other
     resources. For example, let's say someone on the Internet makes a
     large corpus of learner English available as part of their
     research project. If they put it on-line on a Gopher server, it
     can be integrated into CELIA with a simple pointer.
     There is no need for the CELIA archivists to copy the data over
     and constantly update it -- all updating that is done at that site
     is immediately available to any CELIA user.

   o Distributed load
     While CELIA won't be large enough at the beginning to put a large
     load on most of today's typical servers, it has the potential to
     become quite large and encompass many diverse resources. Getting
     the funding or administrative permission to run a single CELIA
     site could be very difficult and subject to funding whims. If a
     single site can encompass all of CELIA, then that would be good,
     but it would require considerable effort.

   o Distributed responsibility/credit
     By putting the parts of the archives at the sites which have an
     interest in them, the archives should be able to benefit from the
     knowledgeable attention they will get there.

What are the disadvantages of a distributed archive?
----------------------------------------------------
Maintaining a uniform appearance will be impossible, since different
sites will have different ideas about this. In exchange for being able
to build on the work of others, we have to accept their organizational
decisions. To the extent possible, we should encourage a uniform set of
naming standards.

Creating a comprehensive index/database of what's in CELIA is going to
be a challenge. Yet this will be one of the most essential things to do
right for CELIA to be popular.

Isn't transparently pointing to other resources intellectually dishonest?
-------------------------------------------------------------------------
It's easy to set up a Gopher server that really has nothing on it
except pointers to files on other Gopher servers. This can be done so
that it appears that the resources are on that machine when they are
really on another. The polite and sensible thing to do is to point to
these other resources, but label the entry with the site that is
providing the service. All CELIA entries that point elsewhere should
have such a label.
This not only gives credit to the site that is supplying the service,
but gives the user useful information. For example, if you added a
pointer to the top level in CELIA to your local server, you shouldn't
just use

   CELIA (Computer Enhanced Language Instruction Archives)

but instead use something like

   CELIA (Comp Enhanced Lang Instr Archives, Merit Network USA)

or

   CELIA (Comp Enhanced Lang Instr Archives, La Trobe Univ Australia)

When should multiple copies of data files be made?
-------------------------------------------------
While the essential strength of Gopher/WWW is that it can point
transparently to resources distributed throughout the Internet, there
may still be reasons to make duplicate copies of some of the files in
CELIA. There are two principal reasons: performance and reliability.

Slow performance is caused by having to traverse slow or numerous
elements in the network connection. An example of this might be
accessing a site with a very slow local link or making an
intercontinental connection. Reducing the need for these will make the
CELIA user happier and is a general network courtesy. However, this
replication should only be done for the most commonly transferred
items. It makes no sense to replicate things which are not accessed
frequently. Based on an informal look at logs from some other archives,
the most frequently accessed files (candidates for replication) are

   o the CELIA directory listings
   o index/readme files
   o new files (if we send out periodic announcements of new entries)

Initially, I wouldn't worry about it too much. Gopher, FTP, and WWW
servers can all produce logs, so it's possible to see what needs
replication. However, I do think some level of replication of the
directory listings themselves is a good idea.

Replication of files is also a good way to improve reliability. Many
sites/networks have a very good reliability record and do regular
backups.
But some do not, and it might make the providers, as well as the users,
more comfortable to arrange for another site to mirror their files.

Where on the Internet is CELIA?
-------------------------------
CELIA is distributed at many places on the Internet. The first two
"front doors" to CELIA will probably be at Merit Network in the USA and
La Trobe University in Australia. It is anticipated that many
people/sites/servers will become part of CELIA. Any site which wants to
have the top level of CELIA locally will also be welcome to do so.

[details as they are announced]

How do I get my server to have CELIA on it?
-------------------------------------------
You may wonder why you would want CELIA to appear on your local Gopher
server. You might want it there so that local users don't have to
figure out how to navigate around the Internet to find it.

Putting a link from your Gopher server to a CELIA server is very easy.
Show your Gopher administrator one of the descriptions given above on
where the main CELIA servers are and ask him/her to add it. This is the
quick and easy way. If you have a moderately good connection and the
main Gopher server is on the same continent, this may be all you would
want to do.

However, if you want all of the main hierarchy to also be duplicated on
your Gopher server (without copying all the data files), there's a way
to do that. [to be supplied since the details of this are just being
worked out]. This would be a good option for at least one site on each
continent since a lot of the CELIA traffic will just be moving through
directories. Note that this doesn't copy the actual data files. Whether
the data files are replicated is another issue.

How do the CELIA Archivists communicate?
----------------------------------------
A mailing list, CELIA-L@CUNYVM.CUNY.EDU, has been set up by Anthea
Tillyer for communication among the archivists.

What is Veronica?
-----------------
Veronica is a system that goes around Gopher space and collects the
names of everything it finds. It then makes this database available (in
Gopher of course!) for searching. So you can often find things on
Gopher servers using Veronica. I think it will work ok with CELIA
without any special effort. Veronica only comes to "harvest" Gopher
servers every couple of weeks, so it doesn't always have the latest
information.

What are the guidelines for naming files and directories?
---------------------------------------------------------
Names should be made up of the following characters:

   o lower-case alphabetics (a-z)
   o digits (0-9)
   o dashes ("-")
   o a period (".") preceding a suffix (at most one)

The name should end with a period and a suffix. For example,

   exercise.txt
   concord.exe
   this-is-nonsense.hqx

The restriction to the unaccented roman characters is a hardship on
many languages. It remains to be seen how this will be resolved --
there are efforts underway to expand the character sets that are usable
on the Internet, but for the time being it makes the files most
accessible to use only a subset of the ASCII characters.

The suffix is very useful for several reasons. The FTP user needs to
know whether to transfer the file in binary or text mode. The Gopher
user doesn't have to explicitly know this information because the
Gopher server figures it out. But the suffixes help the Gopher server
make a good decision.

The suffixes also contain information about the encoding method so the
user knows which decoding program to use. This information and the
programs to do the decoding should be available in the archives.

Finally, the suffix should often be sufficient to determine which
systems the program is for. In cases where it isn't sufficient, there
should be separate directories labeled by system.

Here's a sample of the kind of table we should build and put in the
user readme file.

   SUFFIX  SYS  B/T     DECODING PROGRAM
   .exe    dos  binary  none
   .zip    dos  binary  ?
   .txt    all  text    none
   .hqx    mac  text    stuffit-expander

There are also informational files and directories (readme, index,
help, ... files). It's usually most appropriate to have these at the
beginning of a directory listing. The easiest way to do this is to make
them sort to the beginning. With Gopher it's possible to explicitly
control the order, but that takes some additional work so it might not
always be worth the effort.

One way to make them sort first is to make the first character of the
file or directory name an upper-case alphabetic. On many systems the
upper-case letters sort before the lower-case letters so they will
appear first.

Another way (the one I prefer) is to prefix the name with digits. The
convention I've been using is to prefix these files with "00" (two zero
characters). This will almost certainly put them at the beginning (the
exception being IBM mainframes, which are the only(?) machines not to
use ASCII). The double zero makes them more noticeable and allows "01"
etc. if finer ordering is required.

============================================================

I've included the following text which is extracted from some earlier
e-mail exchanges. It's moderately disorganized, but I thought some of
it might be useful. Please send any questions/corrections to
fred.swartz@merit.edu

Comparison of Gopher and FTP
----------------------------

--- Why base CELIA on Gopher, not FTP, access?

The CELIA design assumes Gopher will be the principal access tool. This
doesn't rule out older access methods such as FTP, although FTP will
not be as convenient as Gopher. The Gopher structure will also be
exactly what is needed for the newer access protocols such as WWW
(World Wide Web). But the more kinds of access, the better. FTP access
is/will be provided by some and perhaps all of the CELIA sites.

FTP is a popular but older Internet tool for getting information. Here
is a comparison of Gopher and FTP.

   o distributed system
     Gopher allows files to be distributed across many systems
     transparently to the user. Even though Gopher files are on
     different systems, they appear to the user as though they are all
     in one place. FTP doesn't allow the files to be spread across
     systems. If files are on different systems, the user must initiate
     separate FTP sessions with each system.

   o efficiency
     Gopher typically makes more efficient use of the server resources
     than FTP.

   o more than just files
     In addition to getting files, Gopher also allows searching, telnet
     connections, connections to other Gophers, etc. FTP allows only
     file transfers.

   o more readable menu
     In Gopher the user sees a menu of items, each of which can be a
     one-line description. In FTP, the menu only shows a directory of
     file names, which are sometimes very cryptic. Gopher will show
     just directory/file names as a default; it takes some more effort
     on the part of the person who runs the server to supply longer
     descriptions.

   o multiple servers
     Multiple Gopher servers on different ports can easily be run on
     (some) machines. You might ask what the point of this is. It
     allows one to easily run a test Gopher server, for example. Or to
     run one which has a different load limit, by looking to see how
     loaded the machine is and refusing to accept new requests if the
     load is too high. This is very good if there's some service that
     has higher priority than another. Or to make a separate log file,
     etc. A machine can run only one FTP server (effectively). So
     something is either visible or not, and all files must be in the
     same hierarchy.

There are two things in favor of FTP:

   o more commonly available
     FTP is a more common tool, although that's rapidly changing. Most
     people agree that FTP will fade and newer, better tools like
     Gopher and WWW will flourish.

   o sending files to server
     FTP can be used by a user to "put" files into an archive. There is
     no similar mechanism in Gopher.
     FTP access provides one way for users to make contributions to
     CELIA. Another way is via e-mail (some archives only allow e-mail
     contributions). What we'll do here is up to the various
     archivists.

--- How hard is it to set up a Gopher server?

Setting up a basic Gopher server can be very easy. It is especially
easy to do this if you just let it show the natural file and directory
structure of the system.

In principle, you don't need a very big machine for a Gopher server --
you can run one quite well on a Mac, for example. For various reasons,
a desktop PC class machine is probably not the best place to run a
CELIA server (lack of adequate network connection, file space, backup,
round the clock monitoring, etc.), but the fact that it can be done
easily shows that it isn't too much to expect from your local system
administrator.

You could try to make use of a Gopher server that is already running on
your institution's Unix or VAX machine. I don't believe Gopher servers
are available for the big IBM mainframes. In any case, you'll probably
want your system administrator to set it up. He/she can set it up to
point to any directory you want, and then you can do what you want in
that directory.

--- What is a client-server model?

A traditional model for computing is the terminal-host relationship,
where the user runs a program on a personal computer that makes it look
like a terminal -- a machine that understands character streams that
are sent to it and can send character streams back. Typically this is
done in vt100 mode (the vt100 was a terminal made by DEC), which has
become the standard. This works fairly well for a lot of things, like
reading e-mail and sending textual commands to a mainframe, but it
doesn't work so well when it comes to more complex interactions. The
main failure of the terminal-host model is that it is based on a single
stream of readable ASCII characters.
The client-server model is quite different in philosophy, although it's
sometimes hard to draw a distinct line between the two. The trend in
computing is to move from a terminal-host model of interaction to
client-server.

...

--- How can I create a link to CELIA in my local Gopher server?

You need information such as the following:

Here are some instructions on how to get to CELIA with Gopher. This
will probably be enough, but I don't know exactly where you're starting
in Gopher space. If you're having trouble, you can either contact me or
ask someone there who has done a little Gophering -- they will
certainly have seen how to get to the geographical hierarchy. Here is a
typical way to get to it:

   Other Gopher and Information Servers
   North America
   USA
   michigan
   Merit Software Archives
   Macintosh Archive
   misc
   foreignlang
   CELIA

...

Do I have to do this each time?

Now this is a lot of navigating. Typically Gopher clients let you set
"bookmarks" so that once you find something, the client will remember
that item and you don't have to start from the "top" again.

The plan here is to make CELIA a sibling to Macintosh Archives, but not
until there's a little more in it and everyone is ready to release it.
It's buried where it is so that not too many people run across it by
accident.

Anyone with a Gopher server can put up a link to CELIA, so I would
expect it to appear on a lot of other servers once we announce it. And
others will also put up Gopher servers for CELIA with their own
information in them. I expect these will then be cross-linked so that
they appear roughly the same, regardless of where the information is.

An entirely different way to get to the archives is by directly
specifying the link information to your Gopher client. If you already
know how to do that you can use

   Name=CELIA
   Host=gopher.archive.merit.edu
   Port=7055
   Path=celia-gopher/.top

If you don't know what all of the above means, that's ok. Just use the
navigation outlined above to get to it.

--- What about FTP?

There are several possibilities for providing the files by both FTP and
Gopher:

(1) Both gopher and FTP service can be run on the same system, as many
    people do.

    Problems: the machine has to be powerful enough to provide service
    for both. This probably isn't much of a problem unless one of the
    services is heavily loaded (it looks like that might be the case on
    your FTP server?), causing a serious performance impact on the
    other service.

(2) The machines may use a distributed file system to share file space.
    In this case, files could be accessed by either system.

    Problems: shared file systems can reduce the reliability since a
    client machine, in addition to its own reliability issues, will
    often fail when the server machine is down. Of course, machines run
    well almost all the time so this probably doesn't mean much. I'm
    not sure which shared file systems are available on Ultrix, but you
    could ask your system administrator whether they use one or plan to
    do so.

(3) The files could be stored primarily on one machine and "mirrored"
    on the other. Your FTP machine is already mirroring other sites so
    the mechanism is in place for this.

    Problems: running mirror software (no problem here since it's
    already being done). The filespace cost is doubled.

The next options assume the files are on only one machine.

(4) The files could be stored only on the gopher machine. This makes
    them easy to connect seamlessly into gopherspace -- in other words,
    it would be invisible to the user that the CELIA archive was really
    distributed at many sites.

    Problems: unavailable to FTP.

(5) Only on the FTP machine. They would be available by FTP, but only
    by specifically going to that machine to FTP them. The CELIA
    archives would be distributed, but the user would have to know
    where various things were stored. This might not really be a very
    big problem if a good index is provided.
    Some gopher servers can act as FTP gateways (they'll do the FTP and
    then pass the file back to the client), and this is a possible way
    to integrate FTP archives into gopherspace. But there are certain
    abusive practices that this allows, so Merit isn't too keen on
    providing this service on their machine (although it isn't
    absolutely ruled out).

    Problems: users have to be aware of where things are. Not available
    to gopher clients except through gopher servers that will do the
    FTP.

Anyway, some of this information may be useful in discussing the
various options with your gopher expert.
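
--- Can the naming guidelines be checked automatically?

The naming guidelines given earlier in this FAQ are simple enough that
an archivist could check submitted file names with a few lines of
code. What follows is only a rough sketch in Python, not an agreed
CELIA tool. The function names are made up for illustration, and
restricting the suffix itself to letters and digits is an assumption
(the guidelines don't say). The suffix table copies the sample table
above and leaves out the decoding programs, since the entry for .zip is
still open.

```python
import re

# Pattern for the naming guidelines: lower-case letters, digits and
# dashes, then exactly one period and a suffix. Restricting the suffix
# to letters and digits is an assumption of this sketch.
NAME_PATTERN = re.compile(r"[a-z0-9-]+\.[a-z0-9]+")

# Sample suffix table from the guidelines: suffix -> (system, mode).
SUFFIX_TABLE = {
    "exe": ("dos", "binary"),
    "zip": ("dos", "binary"),
    "txt": ("all", "text"),
    "hqx": ("mac", "text"),
}


def is_valid_name(name):
    """Return True if name follows the character and suffix rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def transfer_mode(name):
    """Return 'binary' or 'text' for a known suffix, else None."""
    suffix = name.rsplit(".", 1)[-1]
    entry = SUFFIX_TABLE.get(suffix)
    return entry[1] if entry else None


def listing_order(names):
    """Sort names by character code, as a plain ASCII listing would."""
    return sorted(names)
```

With these definitions, is_valid_name("exercise.txt") is true while
is_valid_name("Exercise.TXT") is false, and transfer_mode tells an FTP
user whether a file with a known suffix should be fetched in binary or
text mode. Sorting by character code also shows why the "00" prefix
works: digits come before letters in ASCII, so a name like
00readme.txt lands at the top of the listing.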