Release notes
UniProtKB release 15.0 of 24-Mar-2009

Content

  Introduction
  UniProtKB/Swiss-Prot Protein Knowledgebase release statistics
  UniProtKB/TrEMBL Protein Database release statistics

  Submissions and Updates
  Download information
  Contact
  Citation

  Related documents: UniProtKB user manual, Recent changes, Forthcoming changes.

Introduction

Release 15.0 of the UniProt Knowledgebase is composed of the UniProtKB/Swiss-Prot Protein Knowledgebase release 57.0 and the UniProtKB/TrEMBL Protein Database release 40.0.

More information on these databases can be found in the user manual section "What is the UniProt Knowledgebase?".


UniProtKB/Swiss-Prot protein knowledgebase release 57.0 statistics

Release 57.0 of 24-Mar-09 of UniProtKB/Swiss-Prot contains 428'650 sequence entries, comprising 154'416'236 amino acids abstracted from 177'584 references.

The growth of the database is summarized below.

Release  Date   Number of entries  Number of amino acids
-------  -----  -----------------  ---------------------
    2.0  09/86              3'939                900'163
    3.0  11/86              4'160                969'641
    4.0  04/87              4'387              1'036'010
    5.0  09/87              5'205              1'327'683
    6.0  01/88              6'102              1'653'982
    7.0  04/88              6'821              1'885'771
    8.0  08/88              7'724              2'224'465
    9.0  11/88              8'702              2'498'140
   10.0  03/89             10'008              2'952'613
   11.0  07/89             10'856              3'265'966
   12.0  10/89             12'305              3'797'482
   13.0  01/90             13'837              4'347'336
   14.0  04/90             15'409              4'914'264
   15.0  08/90             16'941              5'486'399
   16.0  11/90             18'364              5'986'949
   17.0  02/91             20'024              6'524'504
   18.0  05/91             20'772              6'792'034
   19.0  08/91             21'795              7'173'785
   20.0  11/91             22'654              7'500'130
   21.0  03/92             23'742              7'866'596
   22.0  05/92             25'044              8'375'696
   23.0  08/92             26'706              9'011'391
   24.0  12/92             28'154              9'545'427
   25.0  04/93             29'955             10'214'020
   26.0  07/93             31'808             10'875'091
   27.0  10/93             33'329             11'484'420
   28.0  02/94             36'000             12'496'420
   29.0  06/94             38'303             13'464'008
   30.0  10/94             40'292             14'147'368
   31.0  02/95             43'470             15'335'248
   32.0  11/95             49'340             17'385'503
   33.0  02/96             52'205             18'531'384
   34.0  10/96             59'021             21'210'389
   35.0  11/97             69'113             25'083'768
   36.0  07/98             74'019             26'840'295
   37.0  12/98             77'977             28'268'293
   38.0  07/99             80'000             29'085'965
   39.0  05/00             86'593             31'411'114
   40.0  10/01            101'602             37'315'215
   41.0  02/03            122'564             44'986'459
   42.0  10/03            135'850             50'046'799
   43.0  03/04            146'720             54'093'154
   44.0  07/04            153'871             56'608'159
   45.0  10/04            163'235             59'631'787
   46.0  02/05            168'297             61'443'278
   47.0  05/05            181'577             65'746'672
   48.0  09/05            194'317             70'391'852
   49.0  02/06            207'132             75'438'310
   50.0  05/06            222'289             81'585'146
   51.0  10/06            241'242             88'541'632
   52.0  03/07            261'513             95'638'062
   53.0  05/07            269'293             98'902'758
   54.0  07/07            276'256            101'466'206
   55.0  02/08            356'194            127'836'513
   56.0  07/08            392'667            141'217'034
   57.0  03/09            428'650            154'416'236
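
As a rough illustration of how the growth figures can be read, the sketch below computes the change in entry count between two releases. The counts are copied from the last rows of the table; the dictionary and the function name `growth` are only illustrative and are not part of any UniProt tool.

```python
# Entry counts copied from the last four rows of the growth table.
entries = {
    "54.0": 276256,
    "55.0": 356194,
    "56.0": 392667,
    "57.0": 428650,
}


def growth(prev, curr):
    """Return (absolute increase, relative increase in percent)."""
    delta = entries[curr] - entries[prev]
    return delta, 100.0 * delta / entries[prev]


delta, pct = growth("56.0", "57.0")
print(f"56.0 -> 57.0: +{delta} entries ({pct:.1f}%)")
```

The result is the net change only; it does not separate newly added entries from removed ones.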

In rare cases, UniProtKB/Swiss-Prot entries are removed. Deleted entries are almost exclusively Open Reading Frames (ORFs) that have been wrongly predicted to code for proteins. When there is enough evidence that these hypothetical proteins are not real, we remove them from UniProtKB/Swiss-Prot. The document delac_sp.txt lists all accession numbers that were previously present in UniProtKB/Swiss-Prot but have since been deleted from the database.
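
A minimal sketch of checking an accession number against the deleted list follows. It assumes, without confirmation from this document, that delac_sp.txt holds one accession number per line surrounded by free-text header lines. The sample lines and the accessions P00000 and Q99999 are made-up placeholders, and `load_deleted` is a hypothetical helper name.

```python
import re

# Six-character UniProtKB accession pattern (matches e.g. P83570, A2ASS6).
ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9])$"
)


def load_deleted(lines):
    """Collect accession numbers from an iterable of text lines.

    Assumption: each deleted accession sits alone on its own line;
    header and blank lines are skipped because they do not match.
    """
    return {ln.strip() for ln in lines if ACCESSION.match(ln.strip())}


# Hypothetical excerpt standing in for the real file contents.
sample = ["Deleted accession numbers", "-----", "P00000", "Q99999", ""]
deleted = load_deleted(sample)
print("P00000" in deleted, "P83570" in deleted)
```

With a local copy of the file, the same function could be fed an open file handle, for example `load_deleted(open("delac_sp.txt"))`, provided the one-accession-per-line assumption holds.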


Status of the model organisms

We have selected a number of organisms that are the target of genome sequencing and/or mapping projects and for which we intend to:

Our efforts to annotate human sequence entries as completely as possible gave rise to the HPI project, and the bacterial model organisms became the focus of the HAMAP project. The current status of the model organisms not covered by these two projects is as follows:

Organism        Database cross-references  Index file    Number of sequences
--------------  -------------------------  ------------  -------------------
A.thaliana      TAIR                       arath.txt                   7'876
C.albicans      None yet                   calbican.txt                  767
C.elegans       Wormpep                    celegans.txt                3'218
D.discoideum    DictyBase                  dicty.txt                   3'557
D.melanogaster  FlyBase                    fly.txt                     2'904
M.musculus      MGD                        mgdtosp.txt                16'101
S.cerevisiae    SGD                        yeast.txt                   6'552
S.pombe         GeneDB_SPombe              pombe.txt                   4'752

UniProtKB/Swiss-Prot release statistics

         UniProtKB/Swiss-Prot protein knowledgebase release 57.0 statistics


1.  INTRODUCTION

Release 57.0 of 24-Mar-09 of UniProtKB/Swiss-Prot contains 428650 sequence entries,
comprising 154416236 amino acids abstracted from 177584 references. 

36053 sequences have been added since release 56.0, the sequence data of
2010 existing entries has been updated and the annotations of
368500 entries have been revised.

Number of fragments: 8328
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 27591 


Protein existence (PE):           entries     %

1: Evidence at protein level        63411   14.8%
2: Evidence at transcript level     64726   15.1%
3: Inferred from homology          285291   66.6%
4: Predicted                        13812    3.2%
5: Uncertain                         1410    0.3%
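
The percentages in the protein existence table can be reproduced from the entry counts. The sketch below is illustrative only; the counts are copied from the table and the variable names are not taken from any UniProt software.

```python
# Entry counts per protein existence (PE) level, copied from the table.
pe_counts = {
    "1: Evidence at protein level": 63411,
    "2: Evidence at transcript level": 64726,
    "3: Inferred from homology": 285291,
    "4: Predicted": 13812,
    "5: Uncertain": 1410,
}

total = sum(pe_counts.values())  # equals the number of entries in the release
pe_percent = {label: round(100 * n / total, 1) for label, n in pe_counts.items()}

for label, n in pe_counts.items():
    print(f"{label:<34}{n:>8}{pe_percent[label]:>7.1f}%")
```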



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/Swiss-Prot: 11669

   The first twenty species represent 103439 sequences:  24.1 % of the total
   number of entries.


   2.1 Table of the frequency of occurrence of species

        Species represented 1x: 5235
                            2x: 1703
                            3x:  854
                            4x:  556
                            5x:  413
                            6x:  319
                            7x:  228
                            8x:  194
                            9x:  172
                           10x:  101
                       11- 20x:  515
                       21- 50x:  364
                       51-100x:  217
                         >100x:  798
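
As a consistency check, the frequency classes above should add up to the total number of species given in section 2. The sketch below uses the counts from table 2.1; the derived share of species with a single entry is not stated in the release notes and is computed here only for illustration.

```python
# Number of species per frequency class, in the order listed in table 2.1
# (1x, 2x, ..., 10x, 11-20x, 21-50x, 51-100x, >100x).
species_by_frequency = [5235, 1703, 854, 556, 413, 319, 228,
                        194, 172, 101, 515, 364, 217, 798]

total_species = sum(species_by_frequency)
single_entry_share = 100 * species_by_frequency[0] / total_species

print(total_species)
print(f"{single_entry_share:.1f}% of species are represented by one entry")
```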


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1      20333  Homo sapiens (Human)
       2      16101  Mus musculus (Mouse)
       3       7876  Arabidopsis thaliana (Mouse-ear cress)
       4       7314  Rattus norvegicus (Rat)
       5       6552  Saccharomyces cerevisiae (Baker's yeast)
       6       5600  Bos taurus (Bovine)
       7       4752  Schizosaccharomyces pombe (Fission yeast)
       8       4342  Escherichia coli (strain K12)
       9       3600  Bacillus subtilis
      10       3557  Dictyostelium discoideum (Slime mold)
      11       3218  Caenorhabditis elegans
      12       2980  Xenopus laevis (African clawed frog)
      13       2904  Drosophila melanogaster (Fruit fly)
      14       2429  Danio rerio (Zebrafish) (Brachydanio rerio)
      15       2199  Pongo abelii (Sumatran orangutan)
      16       2104  Gallus gallus (Chicken)
      17       2044  Oryza sativa subsp. japonica (Rice)
      18       1979  Escherichia coli O157:H7
      19       1782  Methanocaldococcus jannaschii (Methanococcus jannaschii)
      20       1773  Haemophilus influenzae
      21       1736  Salmonella typhimurium
      22       1652  Escherichia coli O6
      23       1649  Shigella flexneri
      24       1462  Mycobacterium tuberculosis
      25       1343  Sus scrofa (Pig)
      26       1334  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      27       1323  Salmonella typhi
      28       1260  Pseudomonas aeruginosa
      29       1198  Mycobacterium bovis
      30       1140  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      31       1012  Synechocystis sp. (strain PCC 6803)
      32        989  Archaeoglobus fulgidus
      33        980  Yersinia pestis
      34        927  Vibrio cholerae
      35        909  Acanthamoeba polyphaga mimivirus (APMV)
      36        904  Salmonella paratyphi A
      37        898  Rhizobium meliloti (Sinorhizobium meliloti)
      38        896  Staphylococcus aureus (strain N315)
      39        896  Staphylococcus aureus (strain Mu50 / ATCC 700699)
      40        881  Oryctolagus cuniculus (Rabbit)
      41        869  Staphylococcus aureus (strain COL)
      42        867  Staphylococcus aureus (strain MW2)
      43        862  Staphylococcus aureus (strain MSSA476)
      44        859  Staphylococcus aureus (strain MRSA252)
      45        854  Salmonella choleraesuis
      46        846  Escherichia coli O6:K15:H31 (strain 536 / UPEC)
      47        844  Yersinia pseudotuberculosis
      48        842  Shigella sonnei (strain Ss046)
      49        795  Escherichia coli O9:H4 (strain HS)
      50        794  Shigella boydii serotype 4 (strain Sb227)
      51        784  Ashbya gossypii (Yeast) (Eremothecium gossypii)
      52        784  Escherichia coli O139:H28 (strain E24377A / ETEC)
      53        783  Escherichia coli (strain UTI89 / UPEC)
      54        782  Vibrio parahaemolyticus
      55        776  Shigella dysenteriae serotype 1 (strain Sd197)
      56        767  Candida albicans (Yeast)
      57        765  Pasteurella multocida
      58        764  Aquifex aeolicus
      59        760  Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
      60        758  Kluyveromyces lactis (Yeast) (Candida sphaerica)
      61        756  Canis familiaris (Dog)
      62        751  Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
      63        745  Neurospora crassa
      64        723  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
      65        722  Streptomyces coelicolor
      66        722  Staphylococcus epidermidis (strain ATCC 12228)
      67        719  Shigella flexneri serotype 5b (strain 8401)
      68        719  Vibrio vulnificus
      69        716  Photorhabdus luminescens subsp. laumondii
      70        715  Candida glabrata (Yeast) (Torulopsis glabrata)
      71        709  Bacillus halodurans
      72        703  Vibrio vulnificus (strain YJ016)
      73        694  Bacillus anthracis
      74        693  Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
      75        688  Yersinia pestis bv. Antiqua (strain Nepal516)
      76        687  Mycoplasma pneumoniae
      77        682  Yersinia pestis bv. Antiqua (strain Antiqua)
      78        677  Pan troglodytes (Chimpanzee)
      79        677  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
      80        671  Staphylococcus aureus (strain NCTC 8325)
      81        670  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
      82        669  Escherichia coli O1:K1 / APEC
      83        668  Anabaena sp. (strain PCC 7120)
      84        662  Enterobacter sp. (strain 638)
      85        660  Pseudomonas syringae pv. tomato
      86        655  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
      87        653  Pseudomonas putida (strain KT2440)
      88        652  Mycobacterium leprae
      89        637  Escherichia coli
      90        635  Yersinia pestis (strain Pestoides F)
      91        631  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
      92        631  Bradyrhizobium japonicum
      93        626  Staphylococcus aureus (strain USA300)
      94        620  Zea mays (Maize)
      95        615  Serratia proteamaculans (strain 568)
      96        614  Treponema pallidum
      97        613  Bacillus cereus (strain ATCC 14579 / DSM 31)
      98        603  Agrobacterium tumefaciens (strain C58 / ATCC 33970)
      99        602  Staphylococcus aureus (strain bovine RF122 / ET3-1)
     100        601  Shewanella oneidensis
     101        600  Methanobacterium thermoautotrophicum
     102        600  Ralstonia solanacearum (Pseudomonas solanacearum)
     103        591  Rhizobium loti (Mesorhizobium loti)
     104        590  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
     105        583  Listeria monocytogenes
     106        583  Rickettsia prowazekii
     107        579  Photobacterium profundum (Photobacterium sp. (strain SS9))
     108        579  Helicobacter pylori (Campylobacter pylori)
     109        576  Xanthomonas campestris pv. campestris
     110        575  Listeria innocua
     111        573  Lactococcus lactis subsp. lactis (Streptococcus lactis)
     112        573  Staphylococcus haemolyticus (strain JCSC1435)
     113        572  Buchnera aphidicola subsp. Acyrthosiphon pisum 
     114        570  Neisseria meningitidis serogroup B
     115        569  Emericella nidulans (Aspergillus nidulans)
     116        566  Enterobacter sakazakii (strain ATCC BAA-894)
     117        565  Staphylococcus saprophyticus subsp. saprophyticus 
     118        563  Yarrowia lipolytica (Candida lipolytica)
     119        562  Brucella melitensis
     120        562  Buchnera aphidicola subsp. Schizaphis graminum
     121        561  Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
     122        560  Helicobacter pylori J99 (Campylobacter pylori J99)
     123        559  Bacillus cereus (strain ATCC 10987)
     124        559  Brucella suis
     125        546  Neisseria meningitidis serogroup A
     126        540  Bacillus thuringiensis subsp. konkukian
     127        539  Xanthomonas axonopodis pv. citri (Citrus canker)
     128        536  Caulobacter crescentus (Caulobacter vibrioides)
     129        534  Clostridium acetobutylicum
     130        534  Pseudomonas syringae pv. syringae (strain B728a)
     131        531  Bacillus cereus (strain ZK / E33L)
     132        530  Oceanobacillus iheyensis
     133        529  Pseudomonas aeruginosa (strain UCBPP-PA14)
     134        526  Bacillus licheniformis (strain DSM 13 / ATCC 14580)
     135        525  Pseudomonas fluorescens (strain Pf0-1)
     136        524  Vibrio fischeri (strain ATCC 700601 / ES114)
     137        521  Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
     138        516  Listeria monocytogenes serotype 4b (strain F2365)
     139        512  Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
     140        510  Streptococcus pneumoniae
     141        510  Xylella fastidiosa
     142        508  Bordetella bronchiseptica (Alcaligenes bronchisepticus)
     143        507  Buchnera aphidicola subsp. Baizongia pistaciae
     144        502  Thermotoga maritima
     145        501  Xylella fastidiosa (strain Temecula1 / ATCC 700964)
     146        496  Chromobacterium violaceum
     147        493  Bordetella parapertussis
     148        493  Rickettsia conorii
     149        493  Sodalis glossinidius (strain morsitans)
     150        493  Bordetella pertussis
     151        492  Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
     152        491  Haemophilus ducreyi
     153        485  Brucella abortus
     154        483  Mycoplasma genitalium
     155        483  Deinococcus radiodurans
     156        480  Pseudomonas aeruginosa (strain PA7)
     157        479  Clostridium perfringens
     158        475  Corynebacterium glutamicum (Brevibacterium flavum)
     159        474  Pseudomonas entomophila (strain L48)
     160        473  Haemophilus influenzae (strain 86-028NP)
     161        472  Methanosarcina acetivorans
     162        472  Xanthomonas campestris pv. campestris (strain 8004)
     163        470  Geobacillus kaustophilus
     164        469  Streptomyces avermitilis
     165        469  Bacillus clausii (strain KSM-K16)
     166        468  Mannheimia succiniciproducens (strain MBEL55E)
     167        468  Burkholderia pseudomallei (Pseudomonas pseudomallei)
     168        463  Shewanella sp. (strain MR-7)
     169        462  Vibrio harveyi (strain ATCC BAA-1116 / BB120)
     170        460  Pyrococcus horikoshii
     171        460  Thermosynechococcus elongatus (strain BP-1)
     172        460  Shewanella sp. (strain MR-4)
     173        459  Staphylococcus aureus (strain Newman)
     174        458  Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
     175        457  Oryza sativa subsp. indica (Rice)
     176        456  Brucella abortus (strain 2308)
     177        456  Pyrococcus abyssi
     178        455  Enterococcus faecalis (Streptococcus faecalis)
     179        453  Methanosarcina mazei (Methanosarcina frisia)
     180        452  Halobacterium salinarium (Halobacterium halobium)
     181        448  Rickettsia felis (Rickettsia azadi)
     182        447  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
     183        447  Aspergillus fumigatus (Sartorya fumigata)
     184        446  Rhodopseudomonas palustris
     185        446  Lactobacillus plantarum
     186        445  Burkholderia mallei (Pseudomonas mallei)
     187        445  Anabaena variabilis (strain ATCC 29413 / PCC 7937)
     188        444  Pseudomonas putida (strain F1 / ATCC 700007)
     189        443  Burkholderia sp. (strain 383) (Burkholderia cepacia 
     190        443  Xanthomonas campestris pv. vesicatoria (strain 85-10)
     191        441  Streptococcus mutans
     192        441  Ovis aries (Sheep)
     193        440  Acinetobacter sp. (strain ADP1)
     194        440  Bacillus amyloliquefaciens (strain FZB42)
     195        439  Chlamydia trachomatis
     196        438  Thermoanaerobacter tengcongensis
     197        438  Staphylococcus aureus (strain Mu3 / ATCC 700698)
     198        437  Pyrococcus furiosus
     199        435  Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
     200        435  Shewanella frigidimarina (strain NCIMB 400)
     201        435  Rickettsia bellii (strain RML369-C)
     202        434  Pseudomonas putida (strain GB-1)
     203        434  Shewanella sp. (strain ANA-3)
     204        433  Streptococcus pyogenes serotype M6
     205        433  Nicotiana tabacum (Common tobacco)
     206        433  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
     207        430  Ralstonia eutropha  (Cupriavidus necator 
     208        427  Borrelia burgdorferi (Lyme disease spirochete)
     209        427  Methylococcus capsulatus
     210        427  Campylobacter jejuni
     211        426  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
     212        422  Shewanella baltica (strain OS185)
     213        422  Chlamydia pneumoniae (Chlamydophila pneumoniae)
     214        418  Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
     215        418  Gloeobacter violaceus
     216        418  Pseudoalteromonas haloplanktis (strain TAC 125)
     217        417  Hahella chejuensis (strain KCTC 2396)
     218        415  Streptococcus pyogenes serotype M1
     219        414  Mycobacterium paratuberculosis
     220        413  Pseudomonas mendocina (strain ymp)
     221        412  Chlamydia muridarum
     222        412  Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
     223        412  Sulfolobus solfataricus
     224        412  Burkholderia xenovorans (strain LB400)
     225        411  Staphylococcus aureus (strain JH1)
     226        411  Nitrosomonas europaea
     227        409  Streptococcus pyogenes serotype M18
     228        409  Rhizobium sp. (strain NGR234)
     229        409  Dechloromonas aromatica (strain RCB)
     230        408  Shewanella sp. (strain W3-18-1)
     231        408  Streptococcus pyogenes serotype M3
     232        408  Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
     233        407  Synechococcus sp. (strain ATCC 27144 / PCC 6301 / SAUG 1402/1) 
     234        407  Shewanella baltica (strain OS195)
     235        405  Staphylococcus aureus (strain JH9)
     236        405  Aeromonas salmonicida (strain A449)
     237        404  Rickettsia typhi
     238        404  Shewanella denitrificans (strain OS217 / ATCC BAA-1090 / DSM 15013)
     239        403  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
     240        401  Shewanella baltica (strain OS155 / ATCC BAA-1091)
     241        400  Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
     242        400  Chlorobium tepidum
     243        400  Idiomarina loihiensis
     244        400  Synechococcus sp. (strain WH8102)
     245        399  Haemophilus influenzae (strain PittEE)
     246        399  Burkholderia cenocepacia (strain AU 1054)
     247        397  Shewanella amazonensis (strain ATCC BAA-1098 / SB2B)
     248        397  Caenorhabditis briggsae
     249        396  Actinobacillus pleuropneumoniae serotype 5b (strain L20)
     250        396  Corynebacterium efficiens
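
The statement in section 2 that the first twenty species account for 24.1 % of all entries can be checked against the first twenty rows of table 2.2. A small sketch, with the counts copied from the table and illustrative variable names:

```python
# Entry counts of the twenty most represented species (table 2.2, rows 1-20).
top20 = [20333, 16101, 7876, 7314, 6552, 5600, 4752, 4342, 3600, 3557,
         3218, 2980, 2904, 2429, 2199, 2104, 2044, 1979, 1782, 1773]
TOTAL_ENTRIES = 428650  # entries in release 57.0

top20_total = sum(top20)
top20_share = 100 * top20_total / TOTAL_ENTRIES
print(f"{top20_total} sequences: {top20_share:.1f} % of the total")
```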


   
   2.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
    Archaea           15698 (  4%)
    Bacteria         249878 ( 58%)
    Eukaryota        150533 ( 35%)
    Viruses           12541 (  3%)


   Within Eukaryota:

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  20334 ( 14%)           (  5%)
     Other Mammalia         43931 ( 29%)           ( 10%)
     Other Vertebrata       14925 ( 10%)           (  3%)
     Viridiplantae          27014 ( 18%)           (  6%)
     Fungi                  23102 ( 15%)           (  5%)
     Insecta                 6145 (  4%)           (  1%)
     Nematoda                3869 (  3%)           (  1%)
     Other                  11213 (  7%)           (  3%)
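
The rounded percentages in section 2.3 follow directly from the sequence counts. The sketch below recomputes them, assuming rounding to the nearest whole percent; the dictionaries simply restate the two tables above.

```python
# Sequence counts per kingdom and per eukaryotic category (section 2.3).
kingdoms = {"Archaea": 15698, "Bacteria": 249878,
            "Eukaryota": 150533, "Viruses": 12541}
eukaryota = {"Human": 20334, "Other Mammalia": 43931,
             "Other Vertebrata": 14925, "Viridiplantae": 27014,
             "Fungi": 23102, "Insecta": 6145, "Nematoda": 3869,
             "Other": 11213}

total = sum(kingdoms.values())
kingdom_share = {k: round(100 * n / total) for k, n in kingdoms.items()}

# Each eukaryotic category as (% of Eukaryota, % of the complete database).
euk_total = sum(eukaryota.values())
euk_share = {k: (round(100 * n / euk_total), round(100 * n / total))
             for k, n in eukaryota.items()}

print(kingdom_share)
print(euk_share["Human"])
```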



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50    7410             1001-1100     3070
                 51- 100   31441             1101-1200     2119
                101- 150   44644             1201-1300     1666
                151- 200   45150             1301-1400     1581
                201- 250   45195             1401-1500     1289
                251- 300   39221             1501-1600      599
                301- 350   39049             1601-1700      472
                351- 400   34750             1701-1800      389
                401- 450   27771             1801-1900      364
                451- 500   22997             1901-2000      301
                501- 550   16055             2001-2100      184
                551- 600   11862             2101-2200      255
                601- 650   10145             2201-2300      263
                651- 700    7254             2301-2400      162
                701- 750    6070             2401-2500      118
                751- 800    4263             >2500          938
                801- 850    3652
                851- 900    4290
                901- 950    3123
                951-1000    2210


   The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.

   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.
   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.
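
The average length quoted above is consistent with dividing the total number of amino acids by the number of entries. The sketch below assumes the average is taken over all entries, fragments included; the release notes do not state this explicitly.

```python
# Totals for release 57.0, taken from the introduction.
total_amino_acids = 154416236
total_entries = 428650

average_length = total_amino_acids / total_entries
print(round(average_length))
```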


4.  JOURNAL CITATIONS

   Note: the following citation statistics reflect the number of distinct
         journal citations.

   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1975


   4.1 Table of the frequency of journal citations

        Journals cited 1x:  647
                       2x:  267
                       3x:  132
                       4x:  107
                       5x:   77
                       6x:   60
                       7x:   38
                       8x:   41
                       9x:   33
                      10x:   23
                  11- 20x:  151
                  21- 50x:  157
                  51-100x:   91
                    >100x:  151

   4.2  List of the most cited journals in UniProtKB/Swiss-Prot

   Nb    Citations   Journal name
   --    ---------   -------------------------------------------------------------
    1        16828   Journal of Biological Chemistry
    2         7853   Proceedings of the National Academy of Sciences of the U.S.A.
    3         4843   Journal of Bacteriology
    4         4434   Gene
    5         4294   Biochemical and Biophysical Research Communications
    6         4221   Nucleic Acids Research
    7         3817   FEBS Letters
    8         3625   Biochemistry
    9         3557   The EMBO Journal
   10         3205   Molecular and Cellular Biology
   11         3051   Nature
   12         3045   European Journal of Biochemistry
   13         2879   Biochimica et Biophysica Acta
   14         2828   Journal of Molecular Biology
   15         2489   Cell
   16         2457   Genomics
   17         2075   Biochemical Journal
   18         1957   Science
   19         1785   Journal of Virology
   20         1652   Molecular Microbiology
   21         1472   Journal of Cell Biology
   22         1453   Plant Molecular Biology
   23         1293   Molecular and General Genetics
   24         1269   Virology
   25         1247   Genes and Development
   26         1247   Nature Genetics
   27         1235   Human Molecular Genetics
   28         1177   Plant Physiology
   29         1142   The American Journal of Human Genetics
   30         1132   Journal of Biochemistry
   31         1129   Oncogene
   32         1034   Development
   33          972   Human Mutation
   34          935   Journal of Immunology
   35          911   Genetics
   36          909   Molecular Biology of the Cell
   37          836   Infection and Immunity
   38          833   Structure
   39          810   Journal of General Virology
   40          779   Archives of Biochemistry and Biophysics
   41          777   The Plant Cell
   42          734   Blood
   43          728   Yeast
   44          706   Microbiology
   45          696   Molecular Cell
   46          645   Developmental Biology
   47          641   The Plant Journal
   48          640   Journal of Cell Science
   49          624   FEMS Microbiology Letters
   50          618   Cancer Research
   51          580   Human Genetics
   52          574   Nature Structural Biology
   53          565   Current Biology
   54          553   Mechanisms of Development
   55          515   Current Genetics
   56          495   Acta Crystallographica, Section D
   57          494   Journal of Neuroscience
   58          490   Applied and Environmental Microbiology
   59          487   Protein Science
   60          481   Journal of Clinical Investigation
   61          472   Neuron
   62          464   Mammalian Genome
   63          461   Toxicon
   64          428   Immunogenetics
   65          422   The Journal of Experimental Medicine
   66          418   Molecular Endocrinology
   67          416   American Journal of Physiology
   68          411   Molecular and Biochemical Parasitology
   69          388   Journal of Neurochemistry
   70          368   Endocrinology
   71          367   Journal of Molecular Evolution
   72          359   DNA and Cell Biology
   73          358   The Journal of Clinical Endocrinology and Metabolism
   74          351   DNA Sequence
   75          342   Molecular Biology and Evolution
   76          328   Bioscience, Biotechnology, and Biochemistry
   77          324   Journal of Medical Genetics
   78          308   Proteins
   79          308   Brain Research. Molecular Brain Research
   80          287   Biological Chemistry Hoppe-Seyler
   81          273   Cytogenetics and Cell Genetics
   82          267   Comparative Biochemistry and Physiology
   83          266   Peptides
   84          265   Journal of Investigative Dermatology
   85          265   Antimicrobial Agents and Chemotherapy
   86          256   Plant and Cell Physiology
   87          250   Molecular Pharmacology
   88          248   Biology of Reproduction
   89          246   Nature Cell Biology
   90          246   Experimental Cell Research
   91          245   Journal of General Microbiology
   92          234   Genome Research
   93          221   Virus Research
   94          218   Neurology
   95          215   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
   96          208   Developmental Dynamics
   97          204   RNA
   98          201   DNA Research
   99          197   Molecular Plant-Microbe Interactions
  100          193   Biochimie
  101          192   European Journal of Immunology
  102          184   Annals of Neurology
  103          183   Tissue Antigens
  104          182   European Journal of Human Genetics
  105          181   Planta
  106          179   Developmental Cell
  107          173   Journal of Human Genetics
  108          172   Genes to Cells
  109          168   Immunity
  110          166   Molecular and Cellular Endocrinology
  111          161   Eukaryotic Cell
  112          161   Molecular Phylogenetics and Evolution
  113          160   Archives of Microbiology
  114          159   DNA
  115          158   American Journal of Medical Genetics
  116          157   The New England Journal of Medicine
  117          152   Hemoglobin
  118          150   Insect Biochemistry and Molecular Biology
  119          148   Bioorganicheskaia Khimiia
  120          147   Investigative Ophthalmology and Visual Science
  121          144   Molecular Reproduction and Development
  122          140   Diabetes
  123          138   Molecular Immunology
  124          138   Glycobiology
  125          135   Animal Genetics
  126          132   General and Comparative Endocrinology
  127          128   Molecular and Cellular Neuroscience
  128          128   International Journal of Cancer
  129          127   Clinical Genetics
  130          124   The FASEB Journal
  131          124   Archives of Virology
  132          123   EMBO Reports
  133          119   Agricultural and Biological Chemistry
  134          119   Molecular Genetics and Metabolism
  135          115   British Journal of Haematology
  136          114   Nature Structural and Molecular Biology
  137          113   Molecular Genetics and Genomics
  138          112   Journal of Cellular Biochemistry
  139          111   Journal of Protein Chemistry
  140          110   The FEBS Journal
  141          109   Biological Chemistry
  142          107   Thrombosis and Haemostasis
  143          107   Journal of Neuroscience Research
  144          107   Journal of the American Chemical Society
  145          106   American Journal of Medical Genetics. Part A
  146          105   Nature Immunology
  147          105   Neuroscience Letters
  148          105   Journal of Lipid Research
  149          104   Journal of Molecular Endocrinology
  150          103   Protein Expression and Purification


5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----

References (RL)                       781540                 1.82                                         
   Journal                            628701     333076      1.47       1                                 
   Submitted to EMBL/GenBank/DDBJ     141218     129587      0.33       2                                 
   Submitted to other databases         9617       8507      0.02       3                                 
   Book citation                         622        611     <0.01       4                                 
   Plant Gene Register                   556        544     <0.01       5                                 
   Thesis                                389        387     <0.01       6                                 
   Unpublished observations              288        284     <0.01       7                                 
   Patent                                143        141     <0.01       8                                 
   Worm Breeder's Gazette                  6          6     <0.01       9                                 
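
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the whole release (428650), not by the number of entries that contain such a line. The sketch below reproduces the column under that assumption; the "<0.01" rule for values that round to zero is inferred from the table, and `average_per_entry` is an illustrative name.

```python
TOTAL_ENTRIES = 428650  # entries in release 57.0


def average_per_entry(total_lines):
    """Format lines-per-entry the way the table appears to present it."""
    text = f"{total_lines / TOTAL_ENTRIES:.2f}"
    # Assumption: values that would print as 0.00 are shown as "<0.01".
    return "<0.01" if text == "0.00" else text


for name, total_lines in [("References (RL)", 781540),
                          ("Journal", 628701),
                          ("Submitted to EMBL/GenBank/DDBJ", 141218),
                          ("Book citation", 622)]:
    print(f"{name:<34}{total_lines:>8}{average_per_entry(total_lines):>8}")
```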

Total number of distinct authors cited in UniProtKB/Swiss-Prot: 271220
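
The "Average per entry" figures in the references table above appear to be the total number of lines divided by the 428'650 entries of release 57.0, and the final column appears to rank the subtypes by total number of lines. The sketch below reproduces both under that assumption. The names `SWISSPROT_ENTRIES`, `rl_totals` and `average_per_entry` are illustrative only, and the "<0.01" threshold is inferred from the table rather than documented.

```python
# Entry count of UniProtKB/Swiss-Prot release 57.0, as stated in these notes.
SWISSPROT_ENTRIES = 428_650

# Total number of RL lines per subtype, copied from the table above.
rl_totals = {
    "Journal": 628_701,
    "Submitted to EMBL/GenBank/DDBJ": 141_218,
    "Submitted to other databases": 9_617,
    "Book citation": 622,
    "Plant Gene Register": 556,
    "Thesis": 389,
    "Unpublished observations": 288,
    "Patent": 143,
    "Worm Breeder's Gazette": 6,
}


def average_per_entry(total_lines, entries=SWISSPROT_ENTRIES):
    """Format total/entries with two decimals (assumed convention)."""
    avg = total_lines / entries
    # Values that would round to 0.00 are shown as "<0.01" in the table.
    return "<0.01" if avg < 0.005 else f"{avg:.2f}"


# Rank 1 goes to the subtype with the largest total (assumed ranking rule).
ranked = sorted(rl_totals, key=rl_totals.get, reverse=True)

for rank, subtype in enumerate(ranked, start=1):
    total = rl_totals[subtype]
    print(f"{subtype:<32} {total:>7} {average_per_entry(total):>6} {rank:>3}")
```

The same function applied to the overall RL total (781540) gives 1.82, which matches the first row of the table.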

                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----
Comments (CC)                        1789693                 4.18                                         
   ALLERGEN                              452        452     <0.01      26                                 
   ALTERNATIVE PRODUCTS                17720      17720      0.04      12                                 
   BIOPHYSICOCHEMICAL PROPERTIES        2500       2500      0.01      22                                 
   BIOTECHNOLOGY                         241        239     <0.01      28                                 
   CATALYTIC ACTIVITY                 175352     160149      0.41       4                                 
   CAUTION                              6045       5925      0.01      19                                 
   COFACTOR                            75945      69750      0.18       7                                 
   DEVELOPMENTAL STAGE                  7930       7930      0.02      16                                 
   DISEASE                              4495       3090      0.01      20                                 
   DISRUPTION PHENOTYPE                 1609       1609     <0.01      23                                 
   DOMAIN                              26533      23452      0.06      11                                 
   ENZYME REGULATION                    6664       6664      0.02      18                                 
   FUNCTION                           310429     299159      0.72       2                                 
   INDUCTION                            9805       9805      0.02      15                                 
   INTERACTION                         11265      11265      0.03      14                                 
   MASS SPECTROMETRY                    3883       2946      0.01      21                                 
   MISCELLANEOUS                       27141      24903      0.06      10                                 
   PATHWAY                             98123      89621      0.23       6                                 
   PHARMACEUTICAL                         80         80     <0.01      29                                 
   POLYMORPHISM                          735        706     <0.01      24                                 
   PTM                                 31005      25377      0.07       8                                 
   RNA EDITING                           560        560     <0.01      25                                 
   SEQUENCE CAUTION                    11577      11577      0.03      13                                 
   SIMILARITY                         497292     404571      1.16       1                                 
   SUBCELLULAR LOCATION               249793     245311      0.58       3                                 
   SUBUNIT                            173827     173827      0.41       5                                 
   TISSUE SPECIFICITY                  30654      30654      0.07       9                                 
   TOXIC DOSE                            392        384     <0.01      27                                 
   WEB RESOURCE                         7646       6129      0.02      17                                 

Total number of comment topics: 29


                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank
------------------------------------  -------- ---------  ---------  ----
Features (FT)                        2723263                 6.35                                         
   ACT_SITE                           104207      62057      0.24      11                                 
   BINDING                            155163      49238      0.36       4                                 
   CA_BIND                              3566       1449      0.01      35                                 
   CARBOHYD                            89711      23184      0.21      12                                 
   CHAIN                              434909     424497      1.01       1                                 
   COILED                              16267      10833      0.04      26                                 
   COMPBIAS                            43223      23345      0.10      18                                 
   CONFLICT                           111609      38964      0.26       9                                 
   CROSSLNK                             4122       2734      0.01      34                                 
   DISULFID                            88560      23114      0.21      13                                 
   DNA_BIND                             9381       8704      0.02      31                                 
   DOMAIN                             126447      73205      0.29       6                                 
   HELIX                              112953      11607      0.26       8                                 
   INIT_MET                            12879      12879      0.03      27                                 
   LIPID                                9803       6321      0.02      29                                 
   METAL                              208340      52300      0.49       3                                 
   MOD_RES                            129042      42164      0.30       5                                 
   MOTIF                               28332      18294      0.07      22                                 
   MUTAGEN                             26290       6325      0.06      25                                 
   NON_CONS                             1569        627     <0.01      36                                 
   NON_STD                               340        266     <0.01      38                                 
   NON_TER                             11304       8588      0.03      28                                 
   NP_BIND                             82019      55056      0.19      14                                 
   PEPTIDE                              7852       4848      0.02      32                                 
   PROPEP                               9800       8166      0.02      30                                 
   REGION                              69832      39220      0.16      17                                 
   REPEAT                              81125      11967      0.19      15                                 
   SIGNAL                              30875      30865      0.07      20                                 
   SITE                                29264      17057      0.07      21                                 
   STRAND                             116537      10976      0.27       7                                 
   TOPO_DOM                           107079      21833      0.25      10                                 
   TRANSIT                              5932       5846      0.01      33                                 
   TRANSMEM                           292168      59681      0.68       2                                 
   TURN                                27890       9332      0.07      23                                 
   UNSURE                                946        300     <0.01      37                                 
   VAR_SEQ                             37208      15859      0.09      19                                 
   VARIANT                             70313      15203      0.16      16                                 
   ZN_FING                             26406      11048      0.06      24                                 

Total number of feature keys: 38



                                      Total    Number of  Average
   Line type / subtype                number   entries    per entry  Rank      Category
------------------------------------  -------- ---------  ---------  ----      -------------------------------------------
Cross-references (DR)                8910885                20.79                                                           
   2DBase-Ecoli                           84         84     <0.01     102      2D gel databases                             
   Aarhus/Ghent-2DPAGE                   126         96     <0.01      99      2D gel databases                             
   AGD                                   790        784     <0.01      77      Organism-specific databases                  
   ANU-2DPAGE                             23         23     <0.01     109      2D gel databases                             
   ArrayExpress                        54151      54151      0.13      30      Gene expression databases                    
   Bgee                                35505      35495      0.08      34      Gene expression databases                    
   BindingDB                             297        297     <0.01      92      Other                                        
   BioCyc                             157015     148907      0.37      14      Enzyme and pathway databases                 
   BRENDA                              65123      62330      0.15      26      Enzyme and pathway databases                 
   BuruList                              296        296     <0.01      93      Organism-specific databases                  
   CGD                                   514        512     <0.01      82      Organism-specific databases                  
   CleanEx                             30264      29611      0.07      37      Gene expression databases                    
   COMPLUYEAST-2DPAGE                     59         59     <0.01     104      2D gel databases                             
   Cornea-2DPAGE                          67         67     <0.01     103      2D gel databases                             
   CYGD                                 6628       6522      0.02      52      Organism-specific databases                  
   dictyBase                            3667       3557      0.01      65      Organism-specific databases                  
   DIP                                  9016       8966      0.02      47      Protein-protein interaction databases        
   DisProt                               397        394     <0.01      86      3D structure databases                       
   DOSAC-COBS-2DPAGE                     150        150     <0.01      98      2D gel databases                             
   DrugBank                             5316       1625      0.01      54      Other                                        
   EchoBASE                             4159       4124      0.01      61      Organism-specific databases                  
   ECO2DBASE                             351        299     <0.01      90      2D gel databases                             
   EcoGene                              4331       4328      0.01      60      Organism-specific databases                  
   EMBL                               733511     419465      1.71       3      Sequence databases                           
   Ensembl                             68473      66943      0.16      25      Genome annotation databases                  
   euHCVdb                                55         44     <0.01     105      Organism-specific databases                  
   FlyBase                              4415       4043      0.01      59      Organism-specific databases                  
   Gene3D                             194637     161088      0.45      13      Family and domain databases                  
   GeneCards                           21183      19899      0.05      38      Organism-specific databases                  
   GeneDB_Spombe                        4793       4749      0.01      56      Organism-specific databases                  
   GeneFarm                             2504       2483      0.01      70      Organism-specific databases                  
   GeneID                             381309     363101      0.89       7      Genome annotation databases                  
   GenomeReviews                      284894     266392      0.66       9      Genome annotation databases                  
   GermOnline                          41962      41352      0.10      33      Gene expression databases                    
   GlycoSuiteDB                          280        280     <0.01      94      PTM databases                                
   GO                                1730543     399299      4.04       1      Ontologies                                   
   Gramene                              3990       3990      0.01      62      Organism-specific databases                  
   H-InvDB                             11259       9565      0.03      46      Organism-specific databases                  
   HAMAP                              232695     232581      0.54      10      Family and domain databases                  
   HGNC                                19216      19059      0.04      40      Organism-specific databases                  
   HOGENOM                            204967     204967      0.48      12      Phylogenomic databases                       
   HOVERGEN                            76378      76378      0.18      24      Phylogenomic databases                       
   HPA                                  6200       4994      0.01      53      Organism-specific databases                  
   HSC-2DPAGE                             85         85     <0.01     101      2D gel databases                             
   HSSP                                84683      84683      0.20      23      3D structure databases                       
   IntAct                              20253      20251      0.05      39      Protein-protein interaction databases        
   InterPro                          1083300     399424      2.53       2      Family and domain databases                  
   IPI                                 85696      61732      0.20      22      Sequence databases                           
   KEGG                               355038     334366      0.83       8      Genome annotation databases                  
   LegioList                             725        723     <0.01      78      Organism-specific databases                  
   Leproma                               655        652     <0.01      81      Organism-specific databases                  
   LinkHub                             18287      18287      0.04      41      Other                                        
   ListiList                            1159       1151     <0.01      75      Organism-specific databases                  
   MaizeGDB                              469        464     <0.01      84      Organism-specific databases                  
   MEROPS                               7866       7604      0.02      49      Protein family/group databases               
   MGI                                 15977      15927      0.04      43      Organism-specific databases                  
   MIM                                 15492      12279      0.04      45      Organism-specific databases                  
   MypuList                              201        201     <0.01      97      Organism-specific databases                  
   NextBio                             48267      48265      0.11      32      Other                                        
   NMPDR                              122888     122860      0.29      16      Genome annotation databases                  
   OGP                                   378        378     <0.01      88      2D gel databases                             
   Orphanet                             3382       1995      0.01      67      Organism-specific databases                  
   PANTHER                            155906     143918      0.36      15      Family and domain databases                  
   Pathway_Interaction_DB               4568       1665      0.01      58      Enzyme and pathway databases                 
   PDB                                 56257      13928      0.13      28      3D structure databases                       
   PDBsum                              56248      13927      0.13      29      3D structure databases                       
   PeptideAtlas                         5167       5167      0.01      55      Proteomic databases                          
   PeroxiBase                            662        646     <0.01      80      Protein family/group databases               
   Pfam                               559239     391183      1.30       4      Family and domain databases                  
   PharmGKB                            15843      15831      0.04      44      Organism-specific databases                  
   PHCI-2DPAGE                           245        245     <0.01      96      2D gel databases                             
   PhosphoSite                         16726      16726      0.04      42      PTM databases                                
   PhosSite                              266        266     <0.01      95      PTM databases                                
   PhotoList                             716        716     <0.01      79      Organism-specific databases                  
   PIR                                113036     103221      0.26      19      Sequence databases                           
   PIRSF                               64427      64427      0.15      27      Family and domain databases                  
   PMMA-2DPAGE                            52         52     <0.01     106      2D gel databases                             
   PptaseDB                               34         34     <0.01     107      Protein family/group databases               
   PRIDE                               33839      33839      0.08      35      Proteomic databases                          
   PRINTS                             114793      98243      0.27      18      Family and domain databases                  
   ProDom                             114838     111821      0.27      17      Family and domain databases                  
   ProMEX                                431        431     <0.01      85      Proteomic databases                          
   PROSITE                            385455     243280      0.90       6      Family and domain databases                  
   PseudoCAP                            1199       1190     <0.01      73      Organism-specific databases                  
   Rat-heart-2DPAGE                       28         28     <0.01     108      2D gel databases                             
   Reactome                             4620       2749      0.01      57      Enzyme and pathway databases                 
   REBASE                                354        345     <0.01      89      Protein family/group databases               
   RefSeq                             396625     363318      0.93       5      Sequence databases                           
   REPRODUCTION-2DPAGE                  1030        942     <0.01      76      2D gel databases                             
   RGD                                  7194       7189      0.02      50      Organism-specific databases                  
   SagaList                              381        380     <0.01      87      Organism-specific databases                  
   SGD                                  6640       6537      0.02      51      Organism-specific databases                  
   Siena-2DPAGE                          102        102     <0.01     100      2D gel databases                             
   SMART                              111692      85039      0.26      20      Family and domain databases                  
   SMR                                 50798      50798      0.12      31      3D structure databases                       
   SubtiList                            3537       3535      0.01      66      Organism-specific databases                  
   SWISS-2DPAGE                         1182       1182     <0.01      74      2D gel databases                             
   TAIR                                 7957       7843      0.02      48      Organism-specific databases                  
   TCDB                                 3095       3060      0.01      69      Protein family/group databases               
   TIGR                                32672      31933      0.08      36      Genome annotation databases                  
   TIGRFAMs                           215422     200843      0.50      11      Family and domain databases                  
   TubercuList                          1490       1454     <0.01      72      Organism-specific databases                  
   UniGene                             85716      78769      0.20      21      Sequence databases                           
   VectorBase                            305        296     <0.01      91      Genome annotation databases                  
   World-2DPAGE                          501        501     <0.01      83      2D gel databases                             
   WormBase                             3670       3585      0.01      64      Organism-specific databases                  
   WormPep                              3933       3209      0.01      63      Organism-specific databases                  
   Xenbase                              3227       3160      0.01      68      Organism-specific databases                  
   ZFIN                                 2373       2357      0.01      71      Organism-specific databases                  

Total number of cross-referenced databases: 109
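
For readers who want to process the cross-reference table programmatically, the following is a minimal sketch of one way to parse its rows and total the lines per category. It assumes that columns are separated by runs of two or more spaces and that database and category names never contain such runs, which holds for the rows shown here. The helper `parse_row` and the five sample rows are illustrative, not part of any UniProt tool.

```python
import re
from collections import defaultdict

# A few rows copied from the table above (trailing padding removed).
sample_rows = """\
   PDB                                 56257      13928      0.13      28      3D structure databases
   PDBsum                              56248      13927      0.13      29      3D structure databases
   DIP                                  9016       8966      0.02      47      Protein-protein interaction databases
   IntAct                              20253      20251      0.05      39      Protein-protein interaction databases
   SWISS-2DPAGE                         1182       1182     <0.01      74      2D gel databases
"""


def parse_row(line):
    """Split one fixed-width row into its six columns."""
    name, total, entries, average, rank, category = re.split(r"\s{2,}", line.strip())
    return {
        "name": name,
        "total": int(total),
        "entries": int(entries),
        "average": average,  # kept as text because it may be "<0.01"
        "rank": int(rank),
        "category": category,
    }


rows = [parse_row(line) for line in sample_rows.splitlines() if line.strip()]

# Sum the DR line totals per database category.
totals_by_category = defaultdict(int)
for row in rows:
    totals_by_category[row["category"]] += row["total"]

for category, total in sorted(totals_by_category.items()):
    print(f"{category:<40} {total:>8}")
```

The summary row "Cross-references (DR)" has fewer columns and would need separate handling.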

6.  AMINO ACID COMPOSITION

   6.1  Composition in percent for the complete database

   Ala (A) 8.17   Gln (Q) 3.95   Leu (L) 9.67   Ser (S) 6.62
   Arg (R) 5.50   Glu (E) 6.74   Lys (K) 5.87   Thr (T) 5.34
   Asn (N) 4.07   Gly (G) 7.04   Met (M) 2.41   Trp (W) 1.09
   Asp (D) 5.42   His (H) 2.28   Phe (F) 3.88   Tyr (Y) 2.93
   Cys (C) 1.40   Ile (I) 5.94   Pro (P) 4.74   Val (V) 6.82

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00


   6.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
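
   The classification in 6.2 follows directly from the percentages in 6.1. As a small illustration, sorting the values in decreasing order reproduces the same sequence; the dictionary name `composition` is an arbitrary choice for this sketch.

```python
# Percent composition copied from section 6.1.
composition = {
    "Ala": 8.17, "Gln": 3.95, "Leu": 9.67, "Ser": 6.62,
    "Arg": 5.50, "Glu": 6.74, "Lys": 5.87, "Thr": 5.34,
    "Asn": 4.07, "Gly": 7.04, "Met": 2.41, "Trp": 1.09,
    "Asp": 5.42, "His": 2.28, "Phe": 3.88, "Tyr": 2.93,
    "Cys": 1.40, "Ile": 5.94, "Pro": 4.74, "Val": 6.82,
}

# Most frequent amino acid first.
by_frequency = sorted(composition, key=composition.get, reverse=True)
print(", ".join(by_frequency))
```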


7.  MISCELLANEOUS STATISTICS

4433 entries are encoded on a mitochondrion, and 3492 are encoded on a plasmid.

11919 entries are encoded on a plastid, 
of which 20 are encoded on apicoplasts, 
11406 on chloroplasts, 
39 on chromatophores,
145 on cyanelles, 
149 on non-photosynthetic plastids and 
199 on unspecified types of plastid.

Number of entries with at least one sequence correction: 64801




UniProtKB/TrEMBL protein database release 40.0 statistics


1.  INTRODUCTION

Release 40.0 of 24-Mar-2009 of UniProtKB/TrEMBL contains 7'537'442 sequence entries
comprising 2'459'135'421 amino acids.

1'700'878 sequences have been added since release 39; the sequence data of
24'829 existing entries has been updated, and the annotations of
4'218'268 entries have been revised. This represents an increase of 31%.



2.  AMINO ACID COMPOSITION

   2.1  Composition in percent for the complete database
   
   Ala (A) 8.54   Gln (Q) 3.93   Leu (L) 9.83   Ser (S) 6.84
   Arg (R) 5.54   Glu (E) 6.09   Lys (K) 5.22   Thr (T) 5.60
   Asn (N) 4.17   Gly (G) 7.05   Met (M) 2.42   Trp (W) 1.33
   Asp (D) 5.26   His (H) 2.22   Phe (F) 4.02   Tyr (Y) 3.02
   Cys (C) 1.36   Ile (I) 5.89   Pro (P) 4.84   Val (V) 6.65

   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.07


   2.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp


3.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 193405

   The twenty most represented species account for 1076730 sequences, i.e. 14.3% of the
   total number of entries.


   3.1 Table of the frequency of occurrence of species

        Species represented 1x:87309
                            2x:34896
                            3x:18055
                            4x:10752
                            5x: 6919
                            6x: 4612
                            7x: 3548
                            8x: 2634
                            9x: 2158
                           10x: 2513
                       11- 20x:11482
                       21- 50x: 4206
                       51-100x: 1591
                         >100x: 2730
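
   As a consistency check, the counts in table 3.1 can be added up and compared with the total number of species given at the start of this section. The sketch below does this and also derives the share of species represented by a single sequence; the variable names are illustrative.

```python
# Number of species per occurrence class, copied from table 3.1.
species_by_occurrence = {
    "1x": 87309, "2x": 34896, "3x": 18055, "4x": 10752, "5x": 6919,
    "6x": 4612, "7x": 3548, "8x": 2634, "9x": 2158, "10x": 2513,
    "11-20x": 11482, "21-50x": 4206, "51-100x": 1591, ">100x": 2730,
}

total_species = sum(species_by_occurrence.values())
singleton_share = species_by_occurrence["1x"] / total_species

print(total_species)  # 193405, the species total stated above
print(f"{singleton_share:.1%} of species occur only once")
```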


   3.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     262485  Human immunodeficiency virus 1
       2      95008  Oryza sativa subsp. japonica (Rice)
       3      67189  Homo sapiens (Human)
       4      54387  Vitis vinifera (Grape)
       5      50193  Branchiostoma floridae (Florida lancelet) (Amphioxus)
       6      50188  Trichomonas vaginalis G3
       7      46361  Hepatitis C virus
       8      44805  Mus musculus (Mouse)
       9      43980  Populus trichocarpa (Western balsam poplar) 
      10      43557  Arabidopsis thaliana (Mouse-ear cress)
      11      39850  Paramecium tetraurelia
      12      38756  Oryza sativa subsp. indica (Rice)
      13      34771  Physcomitrella patens subsp. patens
      14      33127  uncultured bacterium
      15      31220  Ricinus communis (Castor bean)
      16      30108  Zea mays (Maize)
      17      29407  Drosophila melanogaster (Fruit fly)
      18      28078  Tetraodon nigroviridis (Green puffer)
      19      26658  Hepatitis B virus (HBV)
      20      26602  Danio rerio (Zebrafish) (Brachydanio rerio)
      21      24830  Nematostella vectensis (Starlet sea anemone)
      22      21418  Caenorhabditis briggsae
      23      21089  Ixodes scapularis (Black-legged tick) (Deer tick)
      24      20639  Caenorhabditis elegans
      25      20525  Trypanosoma cruzi
      26      18820  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      27      17880  Laccaria bicolor (strain S238N-H82) (Bicoloured deceiver) 
      28      17513  Drosophila simulans (Fruit fly)
      29      16989  Drosophila yakuba (Fruit fly)
      30      16785  Drosophila persimilis (Fruit fly)
      31      16779  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      32      16685  Tetrahymena thermophila SB210
      33      16281  Drosophila sechellia (Fruit fly)
      34      16281  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      35      16064  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      36      15883  Phaeosphaeria nodorum (Septoria nodorum)
      37      15513  Drosophila willistoni (Fruit fly)
      38      15064  Drosophila ananassae (Fruit fly)
      39      15040  Drosophila erecta (Fruit fly)
      40      14781  Drosophila mojavensis (Fruit fly)
      41      14756  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
      42      14736  Drosophila virilis (Fruit fly)
      43      14724  Chlamydomonas reinhardtii
      44      14675  Plasmodium chabaudi
      45      14301  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
      46      14296  Anopheles gambiae (African malaria mosquito)
      47      13747  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
      48      13489  Coprinopsis cinerea (strain Okayama-7 / 130 / FGSC 9003) (Inky cap fungus) 
      49      13487  Aspergillus flavus NRRL3357
      50      12996  Talaromyces stipitatus ATCC 10500
      51      12810  Xenopus laevis (African clawed frog)
      52      12772  Penicillium chrysogenum Wisconsin 54-1255
      53      12737  Magnaporthe grisea (Rice blast fungus) (Pyricularia grisea)
      54      12057  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
      55      11927  Aspergillus oryzae
      56      11793  Plasmodium berghei
      57      11612  Thalassiosira pseudonana CCMP1335
      58      11574  Trichoplax adhaerens
      59      11562  Brugia malayi (Filarial nematode worm)
      60      11045  Escherichia coli
      61      10926  Hepatitis C virus subtype 1b
      62      10892  Chaetomium globosum (Soil fungus)
      63      10709  Podospora anserina
      64      10559  Ralstonia solanacearum (Pseudomonas solanacearum)
      65      10467  Dictyostelium discoideum (Slime mold)
      66      10427  Neurospora crassa
      67      10422  Penicillium marneffei ATCC 18224
      68      10336  Phaeodactylum tricornutum CCAP 1055/1
      69      10294  Coccidioides immitis
      70      10288  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      71      10238  Aspergillus terreus (strain NIH 2624)
      72      10230  Neosartorya fischeri (Aspergillus fischerianus)
      73       9892  Schistosoma japonicum (Blood fluke)
      74       9878  Aspergillus fumigatus (strain CEA10 / CBS 144.89 / FGSC A1163) 
      75       9813  Bos taurus (Bovine)
      76       9669  Cryptococcus neoformans (Filobasidiella neoformans)
      77       9665  Aspergillus fumigatus (Sartorya fumigata)
      78       9471  Trypanosoma brucei
      79       9416  Emericella nidulans (Aspergillus nidulans)
      80       9258  Monosiga brevicollis (Choanoflagellate)
      81       9192  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
      82       9190  Candida albicans (Yeast)
      83       9166  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
      84       9090  Rattus norvegicus (Rat)
      85       8983  Postia placenta Mad-698-R
      86       8954  Aspergillus clavatus
      87       8932  Porcine reproductive and respiratory syndrome virus (PRRSV)
      88       8915  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      89       8884  Helicobacter pylori (Campylobacter pylori)
      90       8809  Rhodococcus sp. (strain RHA1)
      91       8731  Escherichia coli (strain 55989 / EAEC)
      92       8607  Entamoeba dispar SAW760
      93       8523  Stigmatella aurantiaca DW4/3-1
      94       8437  Plesiocystis pacifica SIR-1
      95       8275  Plasmodium falciparum
      96       8253  Streptomyces sviceus ATCC 29083
      97       8249  Microscilla marina ATCC 23134
      98       8201  Microcoleus chthonoplastes PCC 7420
      99       8180  Burkholderia xenovorans (strain LB400)
     100       8129  Bradyrhizobium japonicum


  
   3.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          137049 (  2%)
    Bacteria        4165881 ( 55%)
    Eukaryota       2492847 ( 33%)
    Viruses          733065 ( 10%)
    Other              8599 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  67203 (  3%)           (  1%)
     Other Mammalia        144489 (  6%)           (  2%)
     Other Vertebrata      247964 ( 10%)           (  3%)
     Viridiplantae         608401 ( 24%)           (  8%)
     Fungi                 448052 ( 18%)           (  6%)
     Insecta               366292 ( 15%)           (  5%)
     Nematoda               59871 (  2%)           (  1%)
     Other                 550575 ( 22%)           (  7%)
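
   The percentages in both tables of section 3.3 can be recomputed from the sequence counts. The sketch below assumes that percentages are rounded to the nearest integer, that values below 1 are printed as "<1%", and that the "complete database" denominator is the sum of the five kingdom counts; these conventions are inferred from the tables, not documented.

```python
# Sequence counts copied from section 3.3.
kingdoms = {
    "Archaea": 137049,
    "Bacteria": 4165881,
    "Eukaryota": 2492847,
    "Viruses": 733065,
    "Other": 8599,
}
eukaryota = {
    "Human": 67203,
    "Other Mammalia": 144489,
    "Other Vertebrata": 247964,
    "Viridiplantae": 608401,
    "Fungi": 448052,
    "Insecta": 366292,
    "Nematoda": 59871,
    "Other": 550575,
}

# Assumed denominator: the sum of the kingdom counts.
database_total = sum(kingdoms.values())


def percent(part, whole):
    """Integer percentage, with '<1%' for values that round to zero."""
    pct = round(100 * part / whole)
    return "<1%" if pct < 1 else f"{pct}%"


for name, count in kingdoms.items():
    print(f"{name:<18} {count:>8} ({percent(count, database_total)})")

for name, count in eukaryota.items():
    print(f"{name:<18} {count:>8} "
          f"({percent(count, kingdoms['Eukaryota'])}) "
          f"({percent(count, database_total)})")
```

The eight Eukaryota categories add up exactly to the Eukaryota row of the kingdom table.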



4.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  148439             1001-1100    47501
                 51- 100  564448             1101-1200    33699
                101- 150  664698             1201-1300    23447
                151- 200  640557             1301-1400    16104
                201- 250  639142             1401-1500    12723
                251- 300  614727             1501-1600     9280
                301- 350  566716             1601-1700     7128
                351- 400  445777             1701-1800     5819
                401- 450  370905             1801-1900     4539
                451- 500  311596             1901-2000     3897
                501- 550  222942             2001-2100     3109
                551- 600  167193             2101-2200     3213
                601- 650  123928             2201-2300     2449
                651- 700   97281             2301-2400     1977
                701- 750   83493             2401-2500     1658
                751- 800   74339             >2500        15038
                801- 850   56358
                851- 900   50046
                901- 950   35680
                951-1000   28236


   The average sequence length in UniProtKB/TrEMBL is   326 amino acids.

   The shortest sequence is Q16047_HUMAN:     4 amino acids.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
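
The size classes in the table above are 50-residue bins up to 1000 residues, 100-residue bins up to 2500, and a single class for longer sequences. A minimal Python sketch of that binning follows. It assumes the sequence lengths are already available as integers and that fragments have been filtered out beforehand; `size_class` is a hypothetical helper, and the sample lengths are illustrative apart from the two extremes quoted above.

```python
from collections import Counter

def size_class(length):
    """Map a sequence length to the label of its size class (hypothetical helper)."""
    if length > 2500:
        return ">2500"
    # Bins are 50 residues wide up to 1000, then 100 residues wide.
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"

# Illustrative lengths only; 4 and 36805 are the extremes quoted above.
lengths = [4, 50, 51, 326, 1000, 1001, 2500, 36805]
histogram = Counter(size_class(n) for n in lengths)
for label, count in histogram.items():
    print(label, count)
```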



5.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    9885975              1.31
   Submitted to EMBL/GenBank/DDBJ  5435937   4559102    0.72
   Journal                         4342038   3784310    0.58
   Thesis                             7110      7054   <0.01
   Submitted to other databases       4628      4620   <0.01
   Book citation                      4517      4471   <0.01
   Other                             91745     90595    0.01

Comments (CC)                      5086365              0.67
   SIMILARITY                      1529419   1283560    0.20
   CAUTION                         1522553   1522553    0.20
   FUNCTION                         550977    487269    0.07
   CATALYTIC ACTIVITY               520201    431091    0.07
   SUBCELLULAR LOCATION             451790    420050    0.06
   SUBUNIT                          244362    219337    0.03
   COFACTOR                         159132    146980    0.02
   PATHWAY                           98881     95980    0.01
   MISCELLANEOUS                      5827      5827   <0.01
   INTERACTION                        2627      2627   <0.01
   DOMAIN                              596       596   <0.01

Features (FT)                      2925361              0.39
   NON_TER                         2416053   1437324    0.32
   CHAIN                            314678    246799    0.04
   SIGNAL                           194054    194054    0.03
   TRANSIT                             576       576   <0.01

Cross-references (DR)             71009542              9.42
   GO                             13450712   4329612    1.78
   InterPro                       12179422   5288931    1.62
   EMBL                            8471780   7530279    1.12
   Pfam                            6773292   5019113    0.90
   PROSITE                         3705440   2401590    0.49
   RefSeq                          3705117   3565145    0.49
   GeneID                          3690642   3558116    0.49
   KEGG                            2964305   2877187    0.39
   Gene3D                          2221743   1897018    0.29
   GenomeReviews                   2190231   2128466    0.29
   SMART                           1291968   1012465    0.17
   TIGRFAMs                        1203718   1100600    0.16
   PANTHER                         1141465   1081325    0.15
   PRINTS                          1132757    986972    0.15
   HOGENOM                         1046657   1046653    0.14
   NMPDR                            941154    941143    0.12
   BioCyc                           833412    811420    0.11
   ProDom                           669141    638944    0.09
   SMR                              490641    490505    0.07
   UniGene                          360267    332031    0.05
   PIRSF                            337379    337379    0.04
   HOVERGEN                         309523    309327    0.04
   HSSP                             259517    259229    0.03
   TIGR                             197613    190359    0.03
   FlyBase                          194222    192697    0.03
   IPI                              193089    193089    0.03
   PIR                              179433    146432    0.02
   Ensembl                          150065    144403    0.02
   ArrayExpress                      95180     95144    0.01
   Bgee                              80715     80670    0.01
   Gramene                           69538     69538    0.01
   PRIDE                             60147     60147    0.01
   euHCVdb                           55083     55082    0.01
   NextBio                           53147     53147    0.01
   MGI                               39407     39130    0.01
   VectorBase                        28981     28654   <0.01
   HGNC                              27771     27735   <0.01
   MEROPS                            25649     24990   <0.01
   ZFIN                              19621     19615   <0.01
   WormPep                           18815     18712   <0.01
   WormBase                          18806     18712   <0.01
   TAIR                              18615     18566   <0.01
   IntAct                            12617     12617   <0.01
   LinkHub                           11554     11554   <0.01
   Xenbase                           10331     10045   <0.01
   dictyBase                          9048      9047   <0.01
   CGD                                6852      6852   <0.01
   PDBsum                             5675      3203   <0.01
   PDB                                5675      3203   <0.01
   LegioList                          5178      5150   <0.01
   ListiList                          4656      4639   <0.01
   PseudoCAP                          4369      4366   <0.01
   PhotoList                          3964      3840   <0.01
   BuruList                           3944      3910   <0.01
   AGD                                3904      3904   <0.01
   RGD                                3684      3678   <0.01
   REBASE                             3674      3650   <0.01
   BRENDA                             2972      2902   <0.01
   TubercuList                        2500      2494   <0.01
   DIP                                2229      2224   <0.01
   PeroxiBase                         2093      2088   <0.01
   TCDB                               1977      1958   <0.01
   SagaList                           1713      1619   <0.01
   PhosphoSite                        1250      1250   <0.01
   Leproma                             952       951   <0.01
   MypuList                            581       577   <0.01
   ProMEX                              473       473   <0.01
   World-2DPAGE                        412       412   <0.01
   SGD                                 317       317   <0.01
   GeneDB_Spombe                       206       202   <0.01
   PeptideAtlas                        165       165   <0.01
   PHCI-2DPAGE                         102       102   <0.01
   PharmGKB                             89        89   <0.01
   Reactome                             68        64   <0.01
   ANU-2DPAGE                           58        58   <0.01
   SWISS-2DPAGE                         29        29   <0.01
   Pathway_Interaction_DB               16        13   <0.01
   CYGD                                 16        16   <0.01
   REPRODUCTION-2DPAGE                  13        13   <0.01
   PMMA-2DPAGE                           3         3   <0.01
   Siena-2DPAGE                          2         2   <0.01
   COMPLUYEAST-2DPAGE                    1         1   <0.01

Number of explicitly cross-referenced databases: 110
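
The "Average per entry" column appears to be the total number of lines divided by the number of entries in the database. The following minimal Python sketch checks this for the four top-level rows. It assumes that the entry total equals the sum of the kingdom counts in section 3.3; the names used are illustrative.

```python
# Assumed entry total: sum of the kingdom counts in section 3.3.
TOTAL_ENTRIES = 137049 + 4165881 + 2492847 + 733065 + 8599

# Total line counts copied from the table above.
line_totals = {
    "References (RL)": 9885975,
    "Comments (CC)": 5086365,
    "Features (FT)": 2925361,
    "Cross-references (DR)": 71009542,
}

def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Average number of lines per entry, rounded to two decimals."""
    return round(total_lines / entries, 2)

for line_type, total_lines in line_totals.items():
    print(f"{line_type:<24}{average_per_entry(total_lines):.2f}")
```

Under that assumption the computed values agree with the table (1.31, 0.67, 0.39 and 9.42).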


6.  MISCELLANEOUS STATISTICS

Total number of distinct authors cited in UniProtKB/TrEMBL: 271939

Total number of entries encoded on a Mitochondrion: 246125
Total number of entries encoded on a Plasmid: 121183
Total number of entries encoded on a Plastid: 7064
Total number of entries encoded on a Plastid; Apicoplast: 316
Total number of entries encoded on a Plastid; Chloroplast: 85864
Total number of entries encoded on a Plastid; Cyanelle: 7
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 419

Number of fragments: 1439360


Submissions and Updates

We welcome feedback from our users. We would especially appreciate your notifying us if you find that sequences belonging to your field of expertise are missing from the database. We also would like to be notified about annotations to be updated, if, for example, the function of a protein has been clarified or if new information about post-translational modifications has become available.

Submit new sequence data, updates and corrections at http://www.uniprot.org/help/submissions.

For all queries regarding submissions to UniProtKB, please contact:


Download information

Minor releases (every 3 weeks)

The latest UniProt Knowledgebase data is available in various formats (flatfile, XML or FASTA) at http://www.uniprot.org/downloads. The data is supplemented by a file containing the sequences of all additional alternative isoforms annotated in UniProtKB/Swiss-Prot. This data set is documented in the file ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic

Major releases

For users who wish to download the UniProt Knowledgebase only occasionally, we distribute the latest major release (updated 3 times per year) in flatfile format. Previous UniProtKB/Swiss-Prot and UniProtKB/TrEMBL releases are archived under ftp://ftp.uniprot.org/pub/databases/uniprot/previous_major_releases. The UniProt Knowledgebase major release is also available on DVD from the EBI.


Contact

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address:
WWW server: http://www.ebi.ac.uk/


Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4
Switzerland

Telephone: (+41 22) 379 50 50
Fax: (+41 22) 379 58 58
Electronic mail address:
WWW server: http://www.expasy.org/


Protein Information Resource (PIR)
Georgetown University Medical Center
3300 Whitehaven St., Suite 1200
Washington, DC 20008
United States of America

Telephone: (+1 202) 687 1039
Fax: (+1 202) 687 0057
Electronic mail address:
WWW server: http://pir.georgetown.edu

Citation

If you want to cite UniProt in a publication, please use the following reference:

The UniProt Consortium
"The Universal Protein Resource (UniProt) 2009"
Nucleic Acids Res. 37:D169-D174(2009) doi:10.1093/nar/gkn664