CUTG: Codon Usage Tabulated from GenBank
README (Updated Feb 18 1998)

Originally developed by
IKEMURA Toshimichi (tikemura@lab.nig.ac.jp)
Professor, Division of Evolutionary Genetics
National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, JAPAN

Currently programmed and maintained by
NAKAMURA Yasukazu (ynakamu@kazusa.or.jp)
Researcher, Laboratory of Gene Structure 2
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, JAPAN

Codon usage in individual genes has been calculated from the nucleotide sequence data in the GenBank Genetic Sequence Database. The compilation of codon usage is synchronized with each major release of GenBank.

* SOURCE AND METHODS

Compiled from NCBI-GenBank Flat File Release 104.0 [15 Dec 1997]. The compiled sequence files are pri1 and pri2 (primate sequence entries), rod (rodent sequence entries), mam (other mammalian sequence entries), vrt (other vertebrate sequence entries), inv (invertebrate sequence entries), pln (plant sequence entries), bct (bacterial sequence entries), vrl (viral sequence entries) and phg (phage sequence entries). The following sequence files were not compiled: est (EST: expressed sequence tag sequence entries), pat (patent sequence entries), rna (structural RNA sequence entries), sts (STS: sequence tagged site sequence entries), syn (synthetic and chimeric sequence entries), una (unannotated sequence entries) and gss (genome survey sequence entries).

In selecting protein-coding sequences we relied on the FEATURES tables of GenBank. Only complete genes were used in the analysis. Codons containing an ambiguous base (such as N) were excluded from the compilation. In GenBank, a group of consecutive genes whose entire region had been sequenced was registered under one LOCUS name.
To distinguish the different genes belonging to a single LOCUS, the symbol # followed by a number is appended to the LOCUS name; the numbers give the order in which the CDSs are registered in the FEATURES table of GenBank.

* FILES

Files of the present database are available in this directory. Files named gb***.codon list the codon usage of each gene registered in the selected GenBank Flat Files. The LOCUS names given in GenBank are used to designate individual genes. Each LOCUS name is followed by fields of information, extracted from the FEATURES entry of each CDS, that define the open reading frame analyzed. The order of the codons in the table is the same as in the previous compilation (see the CODON_LABEL file).

To reveal the characteristics of codon usage across a wide range of organisms, as well as viruses and organelles, the frequency (per thousand) of each codon in each organism was calculated by summing the codon counts over all of its genes. Files named gb***.spsum list these summed codon counts for each species, as well as for viruses and organelles (see the SPSUM_LABEL file).

The files are distributed in two forms: a gzip-compressed tar archive and a set of flat files. CUTG.**.tar.gz (where ** is the number of the GenBank major release used in its construction) contains the two "LABEL" files and all of the "codon" and "spsum" files. Use "gunzip" and "tar" to extract the files from the archive. If "gunzip" and "tar" are not available on your local operating system, you can download each file as a flat text file from this directory.
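As a rough illustration of the counting rules described under SOURCE AND METHODS and FILES, here is a minimal Python sketch; it is not the program used to build CUTG. It assumes each complete CDS is supplied as a plain nucleotide string that is already in frame from its start codon. The function names (count_codons, species_sum, per_thousand) and the TCAG codon ordering are hypothetical choices for illustration; the column order actually used in the gb***.codon files is the one defined in CODON_LABEL. The per-thousand values are computed here against the total of all counted codons, stop codons included, which is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical codon ordering (T, C, A, G at each position). The order actually
# used in the gb***.codon files is defined by the CODON_LABEL file.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of one complete CDS.

    Codons containing any base other than A, C, G or T (e.g. N) are skipped,
    following the exclusion rule for ambiguous bases. A trailing partial
    codon is ignored.
    """
    seq = cds.upper()
    counts: Counter = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if all(base in "ACGT" for base in codon):
            counts[codon] += 1
    return counts


def species_sum(cds_list: Iterable[str]) -> Counter:
    """Sum codon counts over all genes of one species (cf. gb***.spsum)."""
    total: Counter = Counter()
    for cds in cds_list:
        total.update(count_codons(cds))
    return total


def per_thousand(counts: Counter) -> Dict[str, float]:
    """Frequency of each codon per 1,000 counted codons (assumed denominator)."""
    n = sum(counts.values())
    if n == 0:
        return {codon: 0.0 for codon in CODONS}
    return {codon: 1000.0 * counts[codon] / n for codon in CODONS}


if __name__ == "__main__":
    # Two toy CDSs; the NNN codon in the second one is excluded.
    genes = ["ATGAAATAA", "ATGNNNGCTTGA"]
    freq = per_thousand(species_sum(genes))
    print(round(freq["ATG"], 1))  # prints 333.3 (2 of 6 counted codons)
```

Summing raw counts per species before normalizing, rather than averaging per-gene frequencies, weights each gene by its length, which is what summing the numbers of codons used implies.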
* DISTRIBUTION

The complete database is available from the following three URLs:

1) DDBJ (DNA Data Bank of Japan, National Institute of Genetics, Mishima, Japan)
   ftp://ftp.nig.ac.jp/pub/db/codon/current/
2) DISC (DNA Information and Stock Center, National Institute of Agrobiological Resources, Tsukuba, Japan)
   ftp://ftp.dna.affrc.go.jp/pub/codon/current/
3) EBI (European Bioinformatics Institute, Cambridge, UK)
   ftp://ftp.ebi.ac.uk/pub/databases/cutg/

* RELATED SERVICE ON WWW

If you do not need the complete data but want codon usage tables for a small number of species, use the Codon Usage Database WWW service. You can display a codon usage table by searching with the Latin name of an organism or by clicking an anchor in the alphabetical lists.

http://www.dna.affrc.go.jp/~nakamura/codon.html

* CONTACTING THE MAINTAINER

Any requests or comments? Send an e-mail to ynakamu@kazusa.or.jp.

* ACKNOWLEDGMENT

We wish to thank Dr. Y. Ugawa at the DNA Information and Stock Center, National Institute of Agrobiological Resources, for help in constructing and distributing the database. This work was supported by a grant-in-aid for databases from the Ministry of Education, Science, Sports and Culture of Japan. Y.N. is supported by the Kazusa DNA Research Institute Foundation.

* PLEASE CITE

Nakamura, Y., Gojobori, T. and Ikemura, T. (1997) Codon usage tabulated from the international DNA sequence databases. Nucl. Acids Res. 25, 244-245.