Table 1 Descriptions of codon usage databases for either the generic class or the B strain of E. coli. Each entry gives the source of the genetic data, the total number of coding DNA sequences (CDS) extracted from the gene source(s), and the number of codons extracted from the genes used to construct each database.

From: Assessing optimal: inequalities in codon optimization algorithms

Author or database        E. coli strain  Gene source         # CDS       # codons
Sharp and Li [8]ᵃ         Generic         GenBank             27          6240
                                                              15          9223
                                                              57          25,010
                                                              58          22,612
Kazusa database [9]       Generic         GenBank             8087        2,330,943
                          B               GenBank             11          3771
HIVE-CUT database [10]    Generic         GenBank and RefSeq  68,262,063  20,219,118,236
                          B               GenBank and RefSeq  13,042      3,953,593
GtRNAdb [11, 12]          Generic         GenBank and RefSeq  5011        1,538,003
GenScriptᵇ                Proprietary     Undefined           Undefined   Undefined
Dong et al. [13]          W1485 (K12)     N/A                 Total RNA   Undefined
ᵃ The authors divided their dataset into four groups of genes showing "very high expression," "high levels of expression," "moderate codon bias," and "low codon bias," listed from top to bottom in that order.
ᵇ www.genscript.com
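The caption describes each database as built from codons extracted from coding sequences. As a rough illustration only (the construction procedures of the databases listed are not described here), a codon usage table can be assembled by reading each CDS in frame, tallying its codons, and expressing each count per 1,000 codons, the unit the Kazusa database reports. The function names `count_codons` and `per_thousand` are hypothetical, and the sketch assumes each input is an in-frame CDS starting at its first base; stop codons are counted, and incomplete trailing bases and codons with ambiguous bases are skipped.

```python
from collections import Counter


def count_codons(cds_list):
    """Tally codons across a list of coding sequences (hypothetical helper).

    Assumes each CDS is read in frame from its first base. RNA input (U) is
    converted to DNA (T); trailing bases that do not form a full codon are
    ignored, and codons containing non-ACGT characters (e.g. N) are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    return counts


def per_thousand(counts):
    """Convert raw codon counts to frequency per 1,000 codons."""
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in counts.items()}


# Two toy CDS (not real E. coli genes): 8 codons in total.
example = count_codons(["ATGAAAAAATAA", "ATGGCTGCCTGA"])
print(sum(example.values()))  # 8
print(per_thousand(example)["AAA"])  # 250.0
```

A real database would differ in how it selects genes (for example, Sharp and Li's split by expression level and codon bias) and how it filters incomplete or atypical sequences, so the counts in the table above should not be expected to follow from this sketch.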