Notice: This page will be replaced with www.uniprot.org. Please send us your feedback!
UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

What's new?
Release 14.0 of 22-Jul-2008

Also read about forthcoming changes, the latest release statistics (Swiss-Prot, TrEMBL), Swiss-Prot headlines, and recent and forthcoming changes for the XML version of the UniProt Knowledgebase.

UniProtKB release 14.0 of 22-Jul-2008

Change of the protein description (DE line)

Until now, the UniProtKB description (DE) lines listed protein names in a computer-parsable format, but with minimal structure. In UniProtKB/Swiss-Prot the description starts with the recommended name of the protein, and additional alternative names are given in parentheses. In UniProtKB/TrEMBL the description is derived directly from the underlying nucleotide entry and its accuracy relies on the information provided by the submitter of the nucleotide entry, unless it has been improved by automatic annotation procedures.

Consistent nomenclature is indispensable for communication, literature searching and entry retrieval. The protein names provided in the description lines of UniProtKB/Swiss-Prot are widely used by life scientists and often propagated during the annotation of new genomic sequences. For these reasons we have structured the UniProtKB DE lines more explicitly. We have introduced three categories of protein names, as well as several subcategories:

  --------------  -----------------  -------------------------  ------------------------------------------
  Category field  Subcategory field  Cardinality                Description
  --------------  -----------------  -------------------------  ------------------------------------------
  RecName:                           1 in UniProtKB/Swiss-Prot  The name recommended by the UniProt
                                     0-1 in UniProtKB/TrEMBL    consortium.
                  Full=              1                          The full name.
                  Short=             0-n                        An abbreviation of the full name or an
                                                                acronym.
                  EC=                0-n                        An Enzyme Commission number.
  AltName:                           0-n                        A synonym of the recommended name.
                  Full=              0-1                        The full name.
                  Short=             0-n                        An abbreviation of the full name or an
                                                                acronym.
                  EC=                0-n                        An Enzyme Commission number.
  AltName:        Allergen=          0-1                        See allergen.txt.
  AltName:        Biotech=           0-1                        A name used in a biotechnological context.
  AltName:        CD_antigen=        0-n                        See cdlist.txt.
  AltName:        INN=               0-n                        The international nonproprietary name: A
                                                                generic name for a pharmaceutical
                                                                substance or active pharmaceutical
                                                                ingredient that is globally recognized
                                                                and is a public property.
  SubName:                           0 in UniProtKB/Swiss-Prot  A name provided by the submitter of the
                                     0-n in UniProtKB/TrEMBL    underlying nucleotide sequence.
                  Full=              1                          The full name.
                  EC=                0-n                        An Enzyme Commission number.

Each name is shown on a separate line; lines may therefore exceed 75 characters.

A block of DE lines may further contain multiple Includes: and/or Contains: sections and a separate field Flags: to indicate whether the protein sequence is a precursor or a fragment:

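  ---------  -----------  ---------------------------------------------------------
  Field      Cardinality  Value
  ---------  -----------  ---------------------------------------------------------
  Includes:  0-n          A block of protein names as described in the table above.
  Contains:  0-n          A block of protein names as described in the table above.
  Flags:     0-1          Precursor and/or Fragment or Fragments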

Examples:

P09919:

Previous format:

DE   Granulocyte colony-stimulating factor precursor (G-CSF) (Pluripoietin)
DE   (Filgrastim) (Lenograstim).

New format:

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: INN=Filgrastim;
DE   AltName: INN=Lenograstim;
DE   Flags: Precursor;
Q10743:

Previous format:

DE   ADAM 10 precursor (EC 3.4.24.81) (A disintegrin and metalloproteinase
DE   domain 10) (Mammalian disintegrin-metalloprotease) (Kuzbanian protein
DE   homolog) (CD156c antigen) (Fragment).

New format:

DE   RecName: Full=ADAM 10;
DE            EC=3.4.24.81;
DE   AltName: Full=A disintegrin and metalloproteinase domain 10;
DE   AltName: Full=Mammalian disintegrin-metalloprotease;
DE   AltName: Full=Kuzbanian protein homolog;
DE   AltName: CD_antigen=CD156c;
DE   Flags: Precursor; Fragment;
Q07908:

Previous format:

DE   Arginine biosynthesis bifunctional protein argJ [Includes: Glutamate
DE   N-acetyltransferase (EC 2.3.1.35) (Ornithine acetyltransferase)
DE   (Ornithine transacetylase) (OATase); Amino-acid acetyltransferase
DE   (EC 2.3.1.1) (N-acetylglutamate synthase) (AGS)] [Contains: Arginine
DE   biosynthesis bifunctional protein argJ alpha chain; Arginine
DE   biosynthesis bifunctional protein argJ beta chain].

New format:

DE   RecName: Full=Arginine biosynthesis bifunctional protein argJ;
DE   Includes:
DE     RecName: Full=Glutamate N-acetyltransferase;
DE              EC=2.3.1.35;
DE     AltName: Full=Ornithine acetyltransferase;
DE              Short=OATase;
DE     AltName: Full=Ornithine transacetylase;
DE   Includes:
DE     RecName: Full=Amino-acid acetyltransferase;
DE              EC=2.3.1.1;
DE     AltName: Full=N-acetylglutamate synthase;
DE              Short=AGS;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ alpha chain;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ beta chain;
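
The structured DE format described above lends itself to a small line-oriented parser. The following is a minimal sketch, not an official UniProt tool: the function name parse_de_lines and the shape of the returned dictionary are assumptions made for illustration. It assumes that each DE line carries exactly one field, as stated above, and it stores every field value in a list because several subcategories may occur more than once.

```python
import re

# Category fields that start a new protein name.
CATEGORY_RE = re.compile(r"(RecName|AltName|SubName):\s*(.*)")


def parse_de_lines(lines):
    """Parse new-format DE lines into a nested dictionary (illustrative layout).

    Returns {"names": [...], "includes": [...], "contains": [...], "flags": [...]}.
    Each name is a dict such as
    {"category": "RecName", "Full": ["..."], "Short": ["..."]}.
    """
    result = {"names": [], "includes": [], "contains": [], "flags": []}
    target = result["names"]  # list that receives the next name block
    current = None            # name dict that receives continuation fields
    for raw in lines:
        if not raw.startswith("DE"):
            continue
        text = raw[2:].strip()
        if text in ("Includes:", "Contains:"):
            # Start a new sub-block; following names belong to it.
            section = {"names": []}
            result[text[:-1].lower()].append(section)
            target = section["names"]
            current = None
            continue
        if text.startswith("Flags:"):
            flags = text[len("Flags:"):].split(";")
            result["flags"] = [f.strip() for f in flags if f.strip()]
            continue
        match = CATEGORY_RE.match(text)
        if match:
            current = {"category": match.group(1)}
            target.append(current)
            text = match.group(2)
        if current is None:
            continue
        # Remaining text looks like "Full=Some name;" or "EC=1.2.3.4;".
        key, _, value = text.partition("=")
        current.setdefault(key, []).append(value.rstrip(";").strip())
    return result
```

Applied to the P09919 example above, parse_de_lines returns one RecName with Full and Short values, three AltName entries, and the flag list ["Precursor"].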
Changes in the FASTA header line

The UniProtKB FASTA headers were unfortunately incompatible with the -o option of the NCBI's program formatdb. We have been working with the NCBI to remedy this and changes were required on both sides. The new version of formatdb now accepts a database code for UniProtKB/TrEMBL, and we have modified our UniProtKB FASTA headers accordingly. For consistency reasons, we also changed the FASTA headers of the other UniProt databases.

UniProtKB
>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName[ GN=GeneName]PE=ProteinExistence SV=SequenceVersion
Where:

Examples:

>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_RALEH Acetoin catabolism protein X OS=Ralstonia eutropha (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus PE=1 SV=1

>tr|A3SA23|A3SA23_9RHOB TonB dependent, hydroxamate-type ferrisiderophore, outer membrane receptor OS=Sulfitobacter sp. EE-36 GN=EE36_08023 PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN CDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens PE=2 SV=1
Alternative isoforms (this only applies to UniProtKB/Swiss-Prot):
>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName[ GN=GeneName]
Where: ProteinExistence and SequenceVersion do not apply to alternative isoforms (ProteinExistence is dependent on the number of cDNA sequences, which is not known for individual isoforms).

Example:

>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis GN=YWHAB
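
As an illustration of the UniProtKB header layout above, the following sketch splits a header into its fields with a regular expression. The function name and the dictionary keys are assumptions chosen for this example. GN is treated as optional, as in the format line, and PE and SV are also treated as optional so that isoform headers are accepted. The sketch further assumes that gene names contain no spaces.

```python
import re

# Lazy matches for the protein and organism names stop at the first
# " OS=", " GN=", " PE=" or " SV=" tag that allows the whole line to match.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<id>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)"
    r"(?:\s+GN=(?P<gene>\S+))?"
    r"(?:\s+PE=(?P<pe>\d+))?"
    r"(?:\s+SV=(?P<sv>\d+))?\s*$"
)


def parse_uniprotkb_header(header):
    """Return the fields of a UniProtKB FASTA header as a dict."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a UniProtKB FASTA header: %r" % header)
    fields = match.groupdict()
    # Convert the numeric fields when they are present.
    for key in ("pe", "sv"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields
```

For the isoform example above, the result has id "Q4R572-2" and gene "YWHAB", while pe and sv are None.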
UniRef
>UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember
Where:

Example:

>UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU
UniParc
>UniqueIdentifier status=Status
Where:

Example:

>UPI0000000005 status=active
UniMES
>UniqueIDentifier ProteinName OS=OrganismName[ Pep=SourcePeptideIdentifier]SV=SequenceVersion
Where:

Example:

>MES00000000005 Putative uncharacterized protein GOS_3018412 (Fragment) OS=marine metagenome Pep=JCVI_PEP_1096688850003 SV=1
Archived UniProtKB sequence versions
>db|UniqueIdentifier archived from Release ReleaseNumber ReleaseDate SV=SequenceVersion
Where:

Examples:

"pre-UniProt":
>sp|P05067 archived from Release 18.0 01-MAY-1991 SV=3
>tr|Q55167 archived from Release 17.0 01-JUN-2001 SV=1
"post-UniProt":
>sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3
>tr|A0RTJ8 archived from Release 11.0/36.0 29-MAY-2007 SV=1
New OG (OrGanelle) line value: Chromatophore

We have added Chromatophore to the list of valid plastid values in the OG line. The chromatophore is the photosynthetic inclusion found in Paulinella chromatophora, a photosynthetic thecate amoeba. It encodes and houses the machinery necessary for photosynthesis and CO2 fixation; it also has the genetic capacity to synthesize some amino acids, some fatty acids and a few cofactors. It is not yet clear whether the chromatophore derives from the same endosymbiotic event that is thought to have led to all other plastids. The chromatophore genome of P. chromatophora has been sequenced (PubMed:18356055) and been found to be just over 1 Mb, approximately 9 times larger than the average photosynthetic plastid and approximately 1/3 smaller than the smallest cyanobacterial genome.

Example:

OG   Plastid; Chromatophore.
Changes concerning cross-references (DR line)
BindingDB

Cross-references have been added to The Binding Database. BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug-targets with small, drug-like molecules.

The Binding Database is available at http://www.bindingdb.org/.

The format of the explicit link is:

Data bank identifier BindingDB
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P50613:
DR   BindingDB; P50613; -.

P68850:
DR   BindingDB; P68850; -.
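
Cross-reference lines of this kind share a simple layout: a data bank identifier, a primary identifier and further fields, separated by semicolons and closed by a period. The sketch below splits such a line; the function name and the returned keys are assumptions for illustration, and it assumes that identifiers themselves never contain a semicolon. A dash is mapped to None.

```python
def parse_dr_line(line):
    """Split a DR line into data bank, primary identifier and extra fields."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    body = line[5:].strip()
    # Drop only the final period; identifiers may contain periods themselves.
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 2:
        raise ValueError("DR line needs at least two fields: %r" % line)
    database, primary, *rest = fields
    return {
        "database": database,
        "primary": primary,
        # A dash means that the field is empty.
        "extra": [None if field == "-" else field for field in rest],
    }
```

For example, parse_dr_line("DR   BindingDB; P50613; -.") gives the database "BindingDB", the primary identifier "P50613" and a single empty extra field.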
UniProt decoy databases

The target-decoy search strategy, which has become widespread and is recommended in journal guidelines, consists of attaching a decoy database to a forward database and searching MS/MS spectra against this composite database. It is more stringent than a simple search and allows the false discovery rate to be estimated.
For this strategy to be efficient, the decoy database has to preserve the general composition of the target database while minimizing the peptide sequence overlap between the target and the decoy.
We developed a new algorithm that shuffles proteins and keeps re-shuffling each tryptic peptide until it no longer matches any peptide from the original database. This method ensures that no tryptic peptide is shared between the target and decoy databases.

Decoy versions of UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and UniRef100 can now be retrieved in FASTA format from our public FTP site.
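
The re-shuffling idea can be sketched as follows. This is not the algorithm used to build the UniProt decoy databases, whose details are not given here. It is a simplified illustration under stated assumptions: digestion cleaves after every K or R (no proline rule), the C-terminal K or R of each peptide stays in place, residues are shuffled only within a peptide, and shuffling stops after max_tries attempts. The names tryptic_peptides and shuffle_protein are hypothetical. Unlike the method described above, the sketch cannot guarantee zero overlap, so it reports the peptides it could not resolve (for example single-residue peptides).

```python
import random
import re


def tryptic_peptides(sequence):
    """Cleave after every K or R (simplified rule, proline is ignored)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)


def shuffle_protein(sequence, target_peptides, rng, max_tries=100):
    """Return (decoy_sequence, unresolved_peptides) for one protein.

    target_peptides is the set of all tryptic peptides of the target
    database; rng is a random.Random instance.
    """
    decoy, unresolved = [], []
    for peptide in tryptic_peptides(sequence):
        # Keep the cleavage residue at the end so peptide boundaries survive.
        if peptide[-1] in "KR":
            body, end = peptide[:-1], peptide[-1]
        else:
            body, end = peptide, ""
        candidate = peptide
        for _ in range(max_tries):
            residues = list(body)
            rng.shuffle(residues)
            candidate = "".join(residues) + end
            if candidate not in target_peptides:
                break
        else:
            # Every attempt still matched a target peptide.
            unresolved.append(candidate)
        decoy.append(candidate)
    return "".join(decoy), unresolved
```

Because residues only move within a peptide, the decoy has the same length and amino acid composition as the target protein, which matches the requirement that the decoy preserve the general composition of the target database.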

Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

New subcellular locations:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'CROSSLNK':

Terms for the feature key 'MOD_RES':

UniProtKB release 13.6 of 01-Jul-2008

New RX (Reference cross-reference) line value: AGRICOLA

The RX (Reference cross-reference) line is an optional line which is used to indicate cross-references to bibliographic databases. We have introduced cross-references to AGRICOLA, the National Agricultural Library's catalog of citations to agricultural literature. The valid bibliographic database names and their associated identifiers are now:

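  --------  ------------------------------------------
  Name      Identifier
  --------  ------------------------------------------
  MEDLINE   Eight-digit MEDLINE Unique Identifier (UI)
  PubMed    PubMed Unique Identifier (PMID)
  DOI       Digital Object Identifier (DOI)
  AGRICOLA  AGRICOLA Unique Identifier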

Example:

RX   AGRICOLA=IND20450567;
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.5 of 10-Jun-2008

Changes concerning cross-references (DR line)
HOGENOM

Cross-references have been added to the HOGENOM Database of Homologous Genes from Fully Sequenced Organisms. HOGENOM allows one to select sets of homologous genes among species, and to visualize multiple alignments and phylogenetic trees. It is also possible to search for orthologous genes in a wide range of taxa. HOGENOM is thus particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOGENOM gives an overall view of what is known about a particular gene family.

The HOGENOM Database of Homologous Genes from Fully Sequenced Organisms is available at http://pbil.univ-lyon1.fr/databases/hogenom.php.

The format of the explicit link is:

Data bank identifier HOGENOM
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P0A9I1:
DR   HOGENOM; P0A9I1; -.

P49642:
DR   HOGENOM; P49642; -.
HOVERGEN

Cross-references have been added to the HOVERGEN Database of Homologous Vertebrate Genes. HOVERGEN allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees. HOVERGEN is thus particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOVERGEN gives an overall view of what is known about a particular gene family.

The HOVERGEN Database of Homologous Vertebrate Genes is available at http://pbil.univ-lyon1.fr/databases/hovergen.php.

The format of the explicit link is:

Data bank identifier HOVERGEN
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P31946:
DR   HOVERGEN; P31946; -.

Q91ZB4:
DR   HOVERGEN; Q91ZB4; -.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.4 of 20-May-2008

Changes concerning cross-references (DR line)
CGD

Cross-references have been added to the Candida Genome Database. CGD is a resource for genomic sequence data and gene and protein information for Candida albicans. CGD is based on the Saccharomyces Genome Database and is funded by the National Institute of Dental and Craniofacial Research at the US National Institutes of Health.

The Candida Genome Database is available at http://www.candidagenome.org/.

The format of the explicit link is:

Data bank identifier CGD
Primary identifier The primary identifier consists of a CGD identifier.
Secondary identifier The secondary identifier consists of a gene name.
Examples
O74198:
DR   CGD; CAL0006397; ERG6.

Q59TD3:
DR   CGD; CAL0079252; MED8.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.3 of 29-Apr-2008

Changes concerning cross-references (DR line)
NMPDR

Cross-references have been added to the National Microbial Pathogen Data Resource. NMPDR is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context.

The National Microbial Pathogen Data Resource is available at http://www.nmpdr.org/.

The format of the explicit link is:

Data bank identifier NMPDR
Primary identifier The primary identifier consists of an NMPDR protein identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q88K84:
DR   NMPDR; fig|160488.1.peg.2385; -.

Q1QN15:
DR   NMPDR; fig|323097.3.peg.1480; -.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.2 of 08-Apr-2008

Release of a new document which lists all the secondary UniProtKB accession numbers together with their corresponding current primary accession number(s).

The document sec_ac.txt, available by ftp and on the Web site, lists all secondary accession numbers in UniProtKB (UniProtKB/Swiss-Prot and UniProtKB/TrEMBL), together with their corresponding current primary accession number(s).

Changes concerning cross-references (DR line)
HIV

Cross-references to the HIV database have been removed.

TRANSFAC

Cross-references to TRANSFAC have been removed.

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.1 of 18-Mar-2008

Changes concerning cross-references (DR line)
ProMEX

Cross-references have been added to the Protein Mass spectra EXtraction database. ProMEX is a mass spectral library consisting of tryptic peptide product ion spectra generated by liquid chromatography coupled to ion trap mass spectrometry (LC-ITMS) and was developed using samples derived from Arabidopsis thaliana and Medicago truncatula. The database serves as a reference and can be used for protein identification in uncharacterized samples. Protein identification by ProMEX is linked to other molecular levels of biological organization such as metabolite, pathway and transcript data. The database is further connected to annotation and classification services.

The Protein Mass spectra EXtraction database is available at http://promex.mpimp-golm.mpg.de/.

The format of the explicit link is:

Data bank identifier ProMEX
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O80448:
DR   ProMEX; O80448; -.
   
P49200:
DR   ProMEX; P49200; -.
   
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.0 of 26-Feb-2008

Change of the representation of non-standard amino acids (selenocysteine and pyrrolysine)

The non-standard amino acid selenocysteine was annotated with the feature key SE_CYS and represented by the one-letter code 'C' in the sequence. Pyrrolysines were annotated with the more generic feature key MOD_RES and represented by the one-letter code 'K' in the sequence. In order to annotate these and future non-standard amino acids in the same fashion, we replaced the feature key SE_CYS and the MOD_RES feature key used with the description Pyrrolysine with the new feature key NON_STD (non-standard) and the descriptions Selenocysteine and Pyrrolysine, as appropriate. At the same time, we changed the sequence to use the IUPAC/IUBMB recommended one-letter codes 'U' for selenocysteine and 'O' for pyrrolysine.

Previous annotation:

ID   BTHD_DROME              Reviewed;         249 AA.
..
FT   SE_CYS       37     37
..
     MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSCRVF RRRAEELHSA LRERGLQQLQ
                                            *
ID   MTBB1_METAC             Reviewed;         467 AA.
..
FT   MOD_RES     356    356       Pyrrolysine (Probable).
..
     RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIKIGVG
                                                                 *

New annotation:

ID   BTHD_DROME              Reviewed;         249 AA.
..
FT   NON_STD      37     37       Selenocysteine.
..
     MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSURVF RRRAEELHSA LRERGLQQLQ
                                            *
ID   MTBB1_METAC             Reviewed;         467 AA.
..
FT   NON_STD     356    356       Pyrrolysine (Probable).
..
     RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIOIGVG
                                                                 *
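
A consumer of the new annotation may want to verify that a NON_STD feature and the sequence agree. The sketch below checks that the residue at the annotated position uses the one-letter code given above ('U' for selenocysteine, 'O' for pyrrolysine). The function name check_non_std is an assumption for this example, and the sequence passed in must start at residue 1.

```python
# One-letter codes for the non-standard amino acids described above.
NON_STANDARD = {"Selenocysteine": "U", "Pyrrolysine": "O"}


def check_non_std(ft_line, sequence):
    """Return True if the NON_STD feature matches the sequence residues."""
    # Expected layout: FT   NON_STD   start   end   Description.
    parts = ft_line.split(None, 4)
    if len(parts) < 5 or parts[0] != "FT" or parts[1] != "NON_STD":
        raise ValueError("not a NON_STD feature line: %r" % ft_line)
    start, end = int(parts[2]), int(parts[3])
    # The first word of the description names the amino acid.
    name = parts[4].split()[0].rstrip(".")
    code = NON_STANDARD[name]
    # Sequence lines are printed in blocks of ten separated by spaces.
    residues = sequence.replace(" ", "")
    return all(residues[pos - 1] == code for pos in range(start, end + 1))
```

With the BTHD_DROME lines above, the check succeeds for the new sequence and fails for the previous one, where position 37 is still written as 'C'.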
Changes concerning cross-references (DR line)
PhosphoSite

Cross-references have been added to the Phosphorylation site database. PhosphoSite is an expert-curated knowledgebase of information focused on protein phosphorylation mainly in vertebrates. In addition to phosphorylation sites curated from the literature, large numbers of new unpublished sites discovered by MS/MS analyses are being added regularly.

The Phosphorylation site database is available at http://phosphosite.cellsignal.com/.

The format of the explicit link is:

Data bank identifier PhosphoSite
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P01266:
DR   PhosphoSite; P01266; -.
   
Q9JMH6:
DR   PhosphoSite; Q9JMH6; -.
   
2DBase-Ecoli

Cross-references have been added to the 2D-PAGE Database of Escherichia coli. The 2DBase-Ecoli database currently contains 12 gels with information on 1185 protein spots, in which 723 proteins were identified and annotated. Individual protein spots in the existing gels can be displayed, queried, analysed and compared in a tabular format based on various functional categories, enabling quick subsequent analysis.

The 2D-PAGE Database of Escherichia coli is available at http://2dbase.techfak.uni-bielefeld.de/.

The format of the explicit link is:

Data bank identifier 2DBase-Ecoli
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P02930:
DR   2DBase-Ecoli; P02930; -.
   
P04816:
DR   2DBase-Ecoli; P04816; -.
   
Changes concerning keywords (KW line)

New keywords:

New subcellular locations:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'MOD_RES':

UniProtKB release 12.8 of 05-Feb-2008

Changes concerning cross-references (DR line)
World-2DPAGE

Cross-references have been added to the public repository of 2D-gel data World-2DPAGE. All 2D gel data to be published in the journal Proteomics need to be available on the web. The World-2DPAGE repository hosts the data for resources that cannot build and maintain a web interface. There are currently two data sources submitted to World-2DPAGE, which are numbered consecutively:

The format of the explicit link is:

Data bank identifier World-2DPAGE
Primary identifier The primary identifier is a combination of the database name and the accession number (usually from UniProtKB) in this database. Both are concatenated by a ":".
Secondary identifier None; a dash '-' is stored in that field.
Examples
P61108:
DR   World-2DPAGE; 0002:P61108; -.
   
P77845:
DR   World-2DPAGE; 0001:P77845; -.
   
Cornea-2DPAGE, DOSAC-COBS-2DPAGE, HSC-2DPAGE, REPRODUCTION-2DPAGE, SWISS-2DPAGE

In cross-references to Cornea-2DPAGE, DOSAC-COBS-2DPAGE, HSC-2DPAGE, REPRODUCTION-2DPAGE and SWISS-2DPAGE, the secondary identifier used to be the species origin. The species information has become obsolete/redundant since UniProtKB/Swiss-Prot no longer contains entries describing the same protein from different species (see Release 6.7). We have therefore removed the species information from these secondary identifiers and replaced them by "-".

Examples:

Previous format:

DR   SWISS-2DPAGE; P04217; HUMAN.
DR   Cornea-2DPAGE; P04217; HUMAN.
DR   DOSAC-COBS-2DPAGE; P04217; HUMAN.
DR   REPRODUCTION-2DPAGE; P04217; HUMAN.

New format:

DR   SWISS-2DPAGE; P04217; -.
DR   Cornea-2DPAGE; P04217; -.
DR   DOSAC-COBS-2DPAGE; P04217; -.
DR   REPRODUCTION-2DPAGE; P04217; -.
Release of a new document which provides the classification of human and mouse protein kinases into subfamilies or subgroups.

The document pkinfam.txt, available by ftp and on the Web site, provides the classification of human and mouse protein kinases into subfamilies or subgroups, as developed by Gerard Manning. The classification from Diego Miranda-Saavedra has also been taken into account.

This document contains all the human and mouse protein kinase UniProtKB/Swiss-Prot entries, subdivided into 10 subfamilies or subgroups. Each gene name is followed by the corresponding human and/or mouse 'UniProtKB/Swiss-Prot entry name (UniProtKB/Swiss-Prot accession number)'.

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 12.7 of 15-Jan-2008

New clustered sequence sets for UniMES

The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data. We now provide UniMES clusters, i.e. clustered sets of sequences, at two resolutions: 100% (unimes_cluster100.fasta) and >90% (unimes_cluster90.fasta). In unimes_cluster100.fasta, identical sequences and subfragments from unimes.fasta are placed into a single cluster. The unimes_cluster90.fasta file is built by clustering the unimes_cluster100.fasta representative sequences (the longest sequence in a cluster) using the CD-HIT algorithm (Li W., Jaroszewski L., and Godzik A., Bioinformatics, 17: 282-283, 2001) such that each cluster is composed of sequences that have at least 90% sequence identity to the representative sequence. Only the representative sequences of the clusters are present in these files.
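
The 100% resolution can be illustrated with a small greedy procedure: process sequences from longest to shortest and attach each one to the first representative that contains it. This is only a sketch of the idea of grouping identical sequences and subfragments, not the procedure used to build unimes_cluster100.fasta, and the function name is hypothetical. It compares every sequence with every representative and is therefore suitable for small inputs only. The 90% step relies on CD-HIT and is not reproduced here.

```python
def cluster_identical_and_fragments(sequences):
    """Group identical sequences and subfragments.

    sequences maps an identifier to its sequence. The result maps the
    identifier of each representative (the longest sequence of a cluster)
    to the list of member identifiers, representative first.
    """
    # Longest first, ties broken by identifier, so that the first sequence
    # placed in a cluster is also its longest member.
    ordered = sorted(sequences.items(), key=lambda item: (-len(item[1]), item[0]))
    clusters = {}
    for seq_id, seq in ordered:
        for rep_id in clusters:
            # Identical sequences and subfragments are substrings of the
            # representative.
            if seq in sequences[rep_id]:
                clusters[rep_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]
    return clusters
```

Only the keys of the returned dictionary would be written to a representative-only file of the kind described above.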

UniMES is available in the subdirectory current_release/unimes of the UniProt ftp servers ftp.uniprot.org/pub/databases/uniprot, ftp.ebi.ac.uk/pub/databases/uniprot and ftp.expasy.org/databases/uniprot.

Changes concerning cross-references (DR line)
dictyBase

The DictyBase database was renamed to dictyBase. We changed the database name in the relevant cross-references (DR lines) accordingly.

Example:

DR   dictyBase; DDB0201569; manA.
PDBsum

Cross-references have been added to the PDBsum database. PDBsum provides an overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.

The PDBsum database is available at http://www.ebi.ac.uk/pdbsum.

The format of the explicit link is:

Data bank identifier PDBsum
Primary identifier The primary identifier consists of a PDB entry name.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q07540:
DR   PDBsum; 2FQL; -.
DR   PDBsum; 2GA5; -.
   
P78536:
DR   PDBsum; 1BKC; -.
DR   PDBsum; 1ZXC; -.
DR   PDBsum; 2A8H; -.
DR   PDBsum; 2DDF; -.
DR   PDBsum; 2FV5; -.
DR   PDBsum; 2FV9; -.
DR   PDBsum; 2I47; -.
   
VectorBase

Cross-references have been added to the Invertebrate Vectors of Human Pathogens database. VectorBase is a NIAID Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community.

The VectorBase database is available at http://www.vectorbase.org/index.php.

The format of the explicit link is:

Data bank identifier VectorBase
Primary identifier The primary identifier consists of a VectorBase Gene ID.
Secondary identifier The secondary identifier consists of a species name.
Examples
Q17KX3:
DR   VectorBase; AAEL001551; Aedes aegypti.
   
Q7PD39:
DR   VectorBase; AGAP005024; Anopheles gambiae.
DR   VectorBase; AGAP005025; Anopheles gambiae.
   
Release of new species-specific documents which list entries and their corresponding gene designations

There are 9 new documents for several Brucella, Rickettsia and Coxiella complete proteomes, listing all the UniProtKB/Swiss-Prot entries from these proteomes and their corresponding gene designations.

The documents contain, for each relevant UniProtKB/Swiss-Prot entry, the corresponding ordered locus name, entry name, accession number, sequence length and gene name(s).

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Changes concerning the controlled vocabulary of subcellular locations and membrane topologies and orientations (comment line (CC) topic SUBCELLULAR LOCATION)

New subcellular locations:

UniProtKB release 12.6 of 04-Dec-2007

Changes concerning keywords (KW line)

Deleted keyword:

UniProtKB release 12.5 of 13-Nov-2007

Format change in the ptmlist.txt document file

The ptmlist.txt document, which is available by ftp and on the Web site, describes the post-translational modifications (PTMs) that are annotated in UniProtKB/Swiss-Prot entries in the feature (FT) keys CROSSLNK, LIPID and MOD_RES. The document was in a format that is suitable for computer applications (e.g. ExPASy's proteomics tools) but which was not very human readable. The new file format should improve this.

Previous format:

N,N-dimethylproline  MOD_RES P  BB Nter C2H4  28.031300  28.06  in  e:6446,7586,33682  Methylation  FT=MOD_RES%20dimethylproline&wild=1  AA0066  MOD:00075

New format:

ID   N,N-dimethylproline
AC   PTM-0179
FT   MOD_RES
TG   Proline.
PA   Amino acid backbone.
PP   N-terminal.
CF   C2 H4
MM   28.031300
MA   28.06
LC   Intracellular localisation.
TR   Eukaryota; taxId:6446 (Sipunculus nudus), taxId:7586 (Echinodermata), taxId:33682 (Euglenozoa).
KW   Methylation.
DR   RESID:AA0066.
DR   MOD:00075.
//

With the following definitions of the line types:

  ---------  ---------------------------     ----------------------
  Line code  Content                         Occurrence in an entry
  ---------  ---------------------------     ----------------------
  ID         Identifier (FT description)     Once; starts a PTM entry.
  AC         Accession (PTM-xxxx)            Once.
  FT         Feature key                     Once.
  TG         Target                          Once; two targets separated
                                             by a dash in case of intrachain
                                             crosslinks.
  PA         Position of the modified        Optional, once.
             amino acid
  PP         Position of the modification    Optional, once.
             in the polypeptide
  CF         Correction formula              Optional, once.
  MM         Monoisotopic mass difference    Optional, once.
  MA         Average mass difference         Optional, once.
  LC         Cellular location               Optional, once; alternatives
                                             can be proposed.
  TR         Taxonomic range                 Optional, once or more.
  KW         Keyword                         Optional, once or more.
  DR         Cross-reference to PTM          Optional, once or more.
             databases
  //         Terminator                      Once; ends an entry.
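
Because every line of the new format starts with a two-letter line code and every entry ends with a // terminator, the file can be read with a few lines of code. The following is a minimal sketch; the function name parse_ptmlist is an assumption, and all values are kept as lists of strings since some line types may occur more than once.

```python
def parse_ptmlist(lines):
    """Parse ptmlist.txt-style lines into a list of entries.

    Each entry is a dict that maps a line code (ID, AC, FT, ...) to the
    list of values found for that code.
    """
    entries, current = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # Terminator: close the current entry.
            if current:
                entries.append(current)
            current = {}
            continue
        # Data lines look like "XX   value"; skip anything else.
        if len(line) < 5 or line[2:5].strip():
            continue
        code, value = line[:2], line[5:].strip()
        current.setdefault(code, []).append(value)
    return entries
```

For the N,N-dimethylproline entry above, the result contains one entry whose "AC" value is ["PTM-0179"] and whose "DR" value lists both cross-references.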
Changes concerning cross-references (DR line)
PDB

We added an additional field to the cross-reference (DR line) to the PDB database to show the resolution of structures that were determined by X-ray crystallography or electron microscopy.

For chain names we now use the remediated data from wwPDB; as a result, the chain names have changed for some entries.

Previous format:

DR   PDB; ENTRY_NAME; METHOD; CHAIN.

New format:

DR   PDB; ENTRY_NAME; METHOD; RESOLUTION; CHAIN.

Examples:

Q20728:
DR   PDB; 1LPL; X-ray; 1.77 A; A=135-229.
Q5HEB7:
DR   PDB; 2I8C; X-ray; 2.46 A; A/B=1-356.   

A dash indicates that we found no information about the resolution or that the field is not applicable (for NMR structures and theoretical models).

Examples:

P02768:
DR   PDB; 2ESG; X-ray; -; C=25-609.
P12872:
DR   PDB; 1LBJ; NMR; -; A=26-47.   
P0AC41:
DR   PDB; 2AD0; Model; -; A=1-588.  
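
The new five-field layout can be read as sketched below. The function name and the returned keys are assumptions for illustration. The sketch assumes that the resolution is a number followed by the unit "A", that chain names are separated by "/", and that several chain groups, if present, are separated by commas; only the single-group case appears in the examples above.

```python
def parse_pdb_dr(line):
    """Parse a new-format PDB cross-reference line into a dict."""
    if not line.startswith("DR   PDB;"):
        raise ValueError("not a PDB cross-reference line: %r" % line)
    body = line[5:].strip()
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields (format with RESOLUTION): %r" % line)
    _, entry, method, resolution, chains = fields
    chain_ranges = []
    for spec in chains.split(","):
        # "A/B=1-356" -> chain names A and B covering residues 1 to 356.
        names, _, span = spec.strip().partition("=")
        start, _, end = span.partition("-")
        chain_ranges.append((names.split("/"), int(start), int(end)))
    return {
        "entry": entry,
        "method": method,
        # A dash means unknown or not applicable.
        "resolution": None if resolution == "-" else float(resolution.split()[0]),
        "chains": chain_ranges,
    }
```

Lines in the previous four-field format are rejected with a ValueError, which makes it easy to detect files that have not yet been converted.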
CleanEx

Cross-references have been added to the CleanEx database of gene expression profiles. CleanEx is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons.

The CleanEx database is available at http://www.cleanex.isb-sib.ch/.

The format of the explicit link is:

Data bank identifier CleanEx
Primary identifier The primary identifier consists of a combination of a species code and a gene identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O08788:
DR   CleanEx; MM_DCTN1; -.    
   
P78358:
DR   CleanEx; HS_CTAG1A; -.
DR   CleanEx; HS_CTAG1B; -.
   
Changes concerning keywords (KW line)

Modified keywords:

UniProtKB release 12.4 of 23-Oct-2007

Release of a new document which lists the controlled vocabularies used in the comment line (CC) topic SUBCELLULAR LOCATION

The document subcell.txt, available by ftp and on the Web site, lists the controlled vocabularies used in the comment line (CC) topic SUBCELLULAR LOCATION, their definitions and further information such as synonyms or relevant GO terms in the following format:

  ---------  -------------------------------   ----------------------------------------------
  Line code  Content                           Occurrence in an entry
  ---------  -------------------------------   ----------------------------------------------
  ID         Identifier (location)             Once; starts an entry
  IT         Identifier (topology)             Once; starts a 'topology' entry
  IO         Identifier (orientation)          Once; starts an 'orientation' entry
  AC         Accession (SL-xxxx)               Once
  DE         Definition                        Once or more
  SY         Synonyms                          Optional; Once or more
  SL         Content of subc. loc. lines       Once
  HI         Hierarchy ('is-a')                Optional; Once or more
  HP         Hierarchy ('part-of')             Optional; Once or more
  KW         Associated keyword (accession)    Optional; Once or more
  GO         Gene ontology (GO) mapping        Optional; Once or more
  WW         Interesting links or references   Optional; Once or more
  //         Terminator                        Once; ends an entry
  

Example:

ID   Cyanelle.
AC   SL-0082
DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
DE   Cyanelles are surrounded by a double membrane and, in between, a
DE   peptidoglycan wall. Thylakoid membrane architecture and the presence
DE   of carboxysomes are cyanobacteria-like. Historically, the term
DE   cyanelle is derived from a classification as endosymbiotic
DE   cyanobacteria, and thus is not fully correct.
SY   Muroplast; Cyanoplast.
SL   Plastid, cyanelle.
HI   Plastid.
KW   KW-0194
GO   GO:0009842; cyanelle
//
  
Syntax modification of the comment line (CC) topic SUBCELLULAR LOCATION

We have structured the comment line topic SUBCELLULAR LOCATION in order to improve the consistency of annotation and to make its content parsable.

The new format of SUBCELLULAR LOCATION is:

CC   -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\.)+)?( Note=Free_text( Flag)?\.)?
  
Where:

Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+). Alternative values are separated by a pipe symbol (|).

Examples:

P32755:
CC   -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;
CC       Peripheral membrane protein. Golgi apparatus membrane; Peripheral
CC       membrane protein.
  
Q96QV1:
CC   -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein
CC       (By similarity). Secreted (By similarity). Note=The last 22 C-
CC       terminal amino acids may participate in cell membrane attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable).
  
P35670:
CC   -!- SUBCELLULAR LOCATION: Golgi apparatus, trans-Golgi network
CC       membrane; Multi-pass membrane protein (By similarity).
CC       Note=Predominantly found in the trans-Golgi network (TGN). Not
CC       redistributed to the plasma membrane in response to elevated
CC       copper levels.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm.
CC   -!- SUBCELLULAR LOCATION: WND/140 kDa: Mitochondrion.
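
The structured format can be taken apart with a few string operations. The Python sketch below is one possible reading of the format, not a reference implementation: it assumes that each comment block is passed in as a list of its CC lines, that a wrapped line ending in a hyphen continues without a space (as in 'C-' followed by 'terminal' above), and that location names do not themselves contain periods. The function names are hypothetical.

```python
import re
from typing import Dict, List

TOPIC = "SUBCELLULAR LOCATION:"


def unwrap_cc(lines: List[str]) -> str:
    """Join the wrapped CC lines of one comment block into a single string."""
    text = ""
    for line in lines:
        part = line[5:].strip()           # drop the 'CC   ' line code
        if part.startswith("-!-"):
            part = part[3:].strip()
        if not text or text.endswith("-"):
            text += part                  # heuristic: 'C-' + 'terminal'
        else:
            text += " " + part
    return text


def parse_subcellular_location(lines: List[str]) -> Dict[str, object]:
    text = unwrap_cc(lines)
    if not text.startswith(TOPIC):
        raise ValueError("not a SUBCELLULAR LOCATION comment")
    body = text[len(TOPIC):].strip()
    # Everything after 'Note=' is free text and may contain periods.
    body, _, note = body.partition("Note=")
    molecule = None
    m = re.match(r"([^.:]+):\s+", body)   # optional 'Molecule:' prefix
    if m:
        molecule, body = m.group(1), body[m.end():]
    locations = [loc.strip() for loc in body.split(".") if loc.strip()]
    return {"molecule": molecule, "locations": locations,
            "note": note.strip() or None}
```

Flags such as '(By similarity)' are left attached to the location they qualify; separating them would need a further step.
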
  
Modification of the EC (Enzyme Commission) number format

EC numbers are used to describe enzyme reactions and are based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The EC numbers and the reactions they describe are stored in the ENZYME and IntEnz databases.

In the UniProt Knowledgebase some enzymes are assigned so-called partial EC numbers where part of the numbers are replaced by dashes (e.g. EC 3.4.24.-). This happens in the following situations:

  1. The catalytic activity of the protein is not known exactly.
  2. The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list.

To distinguish these two meanings, we have started to use the letter 'n' with a preliminary number instead of a dash '-' for the latter case. The retrofit of those existing EC numbers of proteins in UniProtKB that catalyze a reaction that is known, but not yet included in the IUBMB EC list will be an ongoing process.

Examples:

The catalytic activity of the protein is not known exactly:

Q9VAC5:
DE   ADAM 17-like protease precursor (EC 3.4.24.-).

The protein catalyzes a reaction that is known, but not yet included in the IUBMB's EC list:

Q9ES52:
DE   Phosphatidylinositol-3,4,5-trisphosphate 5-phosphatase 1 (EC 3.1.3.n1)
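
The two cases can be told apart mechanically. The Python sketch below is an illustration only: it assumes that a preliminary number ('n' followed by digits) appears only in the fourth field, as in the example above, and the labels 'complete', 'partial' and 'preliminary' as well as the function names are hypothetical.

```python
import re
from typing import List

# Fields 2 and 3 may be a dash; field 4 may also be a preliminary 'n' number.
_EC = re.compile(r"^(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|n\d+|-)$")


def classify_ec(ec: str) -> str:
    """Return 'complete', 'partial' or 'preliminary' for an EC number."""
    m = _EC.match(ec)
    if m is None:
        raise ValueError(f"not an EC number: {ec!r}")
    parts = m.groups()
    if parts[3].startswith("n"):
        return "preliminary"   # reaction known, not yet in the IUBMB list
    if "-" in parts:
        return "partial"       # catalytic activity not known exactly
    return "complete"


def ec_numbers(de_line: str) -> List[str]:
    """Extract all '(EC x.x.x.x)' values from a description line."""
    return re.findall(r"\(EC ([\dn.\-]+)\)", de_line)
```
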
UniProtKB release 12.3 of 02-Oct-2007

Changes concerning the comment line (CC) topic MASS SPECTROMETRY

To be consistent with other comment line topics, we have changed the field tags of the topic MASS SPECTROMETRY. At the same time, we have extracted literature references into a new field, Source=, and replaced all molecule descriptions by isoform identifiers.

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=mass(; MW_ERR=error)?; METHOD=method; RANGE=ranges( (molecule))?; NOTE=(references|free_text (references)).
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=mass(; Mass_error=error)?; Method=method; Range=ranges( (IsoformID))?(; Note=free_text)?; Source=references;
  

Examples:

P61409:

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=3979.9; METHOD=Electrospray; RANGE=1-31;
   CC       NOTE=Ref.1, Ref.2.
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=3979.9; Method=Electrospray; Range=1-31;
   CC       Source=Ref.1, Ref.2;
  
P04653:

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=23638.14; MW_ERR=3.0; METHOD=Electrospray;
   CC       RANGE=16-214 (P04653-2; Allele A); NOTE=With eleven phosphate
   CC       groups (Ref.2).
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=23638.14; Mass_error=3.0; Method=Electrospray;
   CC       Range=16-214 (P04653-2); Note=Allele A, with 11 phosphate groups;
   CC       Source=PubMed:7601973;
  

Note that literature references of the form Ref.n are replaced by PubMed identifiers where this is possible.
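
Since every field of the new format is a `Tag=value;` pair, the topic can be read into a dictionary. The Python sketch below is an illustration under stated assumptions: the comment is passed in as a list of its CC lines, free text in `Note=` is assumed not to contain a semicolon, and the key `Isoform` as well as the function name are hypothetical additions.

```python
import re
from typing import Dict, List, Optional

TOPIC = "MASS SPECTROMETRY:"


def parse_mass_spectrometry(lines: List[str]) -> Dict[str, Optional[str]]:
    """Read the new-format MASS SPECTROMETRY topic into a dict of fields."""
    # Drop the 'CC   ' line code from each line and join the wrapped lines.
    text = " ".join(line.strip()[5:].strip() for line in lines)
    start = text.find(TOPIC)
    if start < 0:
        raise ValueError("not a MASS SPECTROMETRY comment")
    fields: Dict[str, Optional[str]] = {}
    for item in text[start + len(TOPIC):].split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    # Range may carry an isoform identifier in parentheses.
    m = re.fullmatch(r"(.+?)(?: \(([^)]+)\))?", fields.get("Range") or "")
    if m:
        fields["Range"], fields["Isoform"] = m.group(1), m.group(2)
    return fields
```
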

Changes concerning cross-references (DR line)
RefSeq

Cross-references have been added to the NCBI Reference Sequences database. The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products for taxonomically diverse organisms including eukaryotes, bacteria, and viruses. RefSeq is a baseline for medical, functional, and diversity studies; it provides a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses.

The RefSeq database is available at http://www.ncbi.nlm.nih.gov/RefSeq/.

The format of the explicit link is:

Data bank identifier   RefSeq
Primary identifier     The primary identifier consists of a RefSeq protein accession ID.
Secondary identifier   None; a dash '-' is stored in that field.
Examples
O34697:
      DR   RefSeq; NP_390916.1; -.    
     
Q8IN81:
      DR   RefSeq; NP_524397.2; -.
      DR   RefSeq; NP_732344.1; -.
      DR   RefSeq; NP_732345.1; -.
      DR   RefSeq; NP_732346.1; -.
      DR   RefSeq; NP_732347.1; -.
      DR   RefSeq; NP_732348.1; -.
      DR   RefSeq; NP_732349.1; -.
      DR   RefSeq; NP_732350.1; -.   
     
GeneID

Cross-references have been added to the Database of genes from NCBI RefSeq genomes. Entrez Gene is the NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

The GeneID database is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.

The format of the explicit link is:

Data bank identifier   GeneID
Primary identifier     The primary identifier consists of a GeneID accession ID.
Secondary identifier   None; a dash '-' is stored in that field.
Examples
P63272:
      DR   GeneID; 6827; -.  
     
P74750:
      DR   GeneID; 951978; -.
      DR   GeneID; 953863; -.   
     
Change in the name of the documentation file orysa.txt

We changed the name of the documentation file orysa.txt, which is an index of Oryza sativa subsp. japonica (rice) entries and their corresponding gene designations, to rice.txt.

UniProtKB release 12.2 of 11-Sep-2007

Changes concerning the comment line (CC) topic WEB RESOURCE

To be consistent with other comment line topics, we have changed the topic WEB RESOURCE from

CC   -!- WEB RESOURCE: NAME=resource_name(; NOTE=free_text)?; URL="url".
to
CC   -!- WEB RESOURCE: Name=resource_name(; Note=free_text)?; URL="url";
Format change in the dbxref.txt and jourlist.txt document files

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. The jourlist.txt file lists the titles and abbreviations of all journals cited in the Swiss-Prot section of the UniProt Knowledgebase. We have added a new field, AC, to assign a stable identifier to each record in these files.

Examples:

dbxref.txt

AC    : DB-0022
Abbrev: EMBL
Name  : EMBL nucleotide sequence database
Ref   : Nucleic Acids Res. 35:D16-D20(2007); PubMed=17148479; DOI=10.1093/nar/gkl913;
LinkTp: Explicit
Server: http://www.ebi.ac.uk/embl/
Db_URL: www.ebi.ac.uk/htbin/expasyfetch?%s
Cat   : Sequence databases

jourlist.txt

AC    : JN-1120
Abbrev: J. Mol. Biol.
Title : Journal of Molecular Biology
ISSN  : 0022-2836
e-ISSN: 1089-8638
CODEN : JMOBAK
Short : JMB
Publis: Elsevier Science
Server: http://www.elsevier.com/locate/issn/00222836
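
With the new AC field, the records of both files can be indexed by a stable key. The Python sketch below is an illustration only: it assumes that records are separated by blank lines and that any file header has been removed, neither of which is stated above, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, str]]:
    """Parse 'Field : value' records separated by blank lines."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():          # blank line closes the current record
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only: values such as URLs contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def index_by_ac(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map each stable identifier (AC) to its record."""
    return {r["AC"]: r for r in records if "AC" in r}
```
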
UniProtKB release 12.1 of 21-Aug-2007

Change of release cycle

We are changing our release cycle from 2 to 3 weeks, i.e. release 12.2 is going to be published on Sep 11th, 2007.

Changes concerning cross-references (DR line)
RZPD-ProtExp

Cross-references to the RZPD-ProtExp have been removed.

UniProtKB release 12.0 of 24-Jul-2007

Introduction of the new line type PE (Protein Existence)

Most protein sequences are derived from translations of gene predictions. Some of them exhibit strong sequence similarity to known proteins in closely related species. For other proteins there is experimental evidence, such as Edman sequencing, clear identification by mass spectrometry (MSI), X-ray or NMR structure, detection by antibodies, etc. To indicate these different levels of evidence for the existence of a protein, we have introduced the PE (Protein Existence) line.

Note that the PE line does not describe the accuracy or correctness of a sequence displayed in UniProtKB, but the evidence for the existence of a protein. It may happen that the protein sequence is not entirely accurate, especially for sequences derived from gene predictions from genomic sequences.

The format of the PE line is:

PE   Level: Evidence;
With the following values:

Example:

PE   1: Evidence at protein level;

The PE line appears between the DR and KW lines of UniProtKB entries.
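
As an illustration of the line format, the following Python sketch splits a PE line into its numeric level and its evidence text. The pattern and the function name `parse_pe` are assumptions based only on the format and the example above.

```python
import re
from typing import Tuple

_PE = re.compile(r"^PE\s+(\d+):\s*(.+);\s*$")


def parse_pe(line: str) -> Tuple[int, str]:
    """Return (level, evidence) for a line such as 'PE   1: Evidence ...;'."""
    m = _PE.match(line)
    if m is None:
        raise ValueError(f"not a PE line: {line!r}")
    return int(m.group(1)), m.group(2)
```
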

Modification of the RL (Reference Location) line for submissions

The format of the RL line for submissions is:

RL   Submitted (MMM-YYYY) to DatabaseName.

We have replaced the DatabaseName value Swiss-Prot by UniProtKB. The full list of valid DatabaseName values is now:

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keyword:

UniProtKB release 11.3 of 10-Jul-2007

Changes concerning cross-references (DR line)
PharmGKB

Cross-references have been added to the PharmGKB database. PharmGKB curates information that establishes knowledge about the relationships among drugs, diseases and genes, including their variations and gene products. It is a repository for genetic, genomic, molecular and cellular phenotype data and clinical information about people who have participated in pharmacogenomics research studies. The data includes, but is not limited to, clinical and basic pharmacokinetic and pharmacogenomic research in the cardiovascular, pulmonary, cancer, pathways, metabolic and transporter domains.

The PharmGKB database is available at http://www.pharmgkb.org/.

The format of the explicit link is:

Data bank identifier   PharmGKB
Primary identifier     The primary identifier consists of a PharmGKB accession ID.
Secondary identifier   None; a dash '-' is stored in that field.
Example
Q96S55:
      DR   PharmGKB; PA134982239; -.   
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 11.2 of 26-Jun-2007

Changes concerning keywords (KW line)

New keyword:

Modified keywords:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'CROSSLNK':

Terms for the feature key 'LIPID':

Terms for the feature key 'MOD_RES':

UniProtKB release 11.1 of 12-Jun-2007

Changes concerning cross-references (DR line)
PeptideAtlas

Cross-references have been added to the PeptideAtlas database. PeptideAtlas is a multi-organism, publicly accessible compendium of peptides that have been identified in a large set of tandem mass spectrometry proteomics experiments. All results of sequence searching have subsequently been processed through PeptideProphet to derive a probability of correct identification for all results in a uniform manner to ensure a high-quality database. All peptides have been mapped to Ensembl and can be viewed as custom tracks on the Ensembl Genome Browser.

The PeptideAtlas database is available at http://www.peptideatlas.org/.

The format of the explicit link is:

Data bank identifier   PeptideAtlas
Primary identifier     The primary identifier consists of a UniProtKB accession number.
Secondary identifier   None; a dash '-' is stored in that field.
Example
P08524:
   DR   PeptideAtlas; P08524; -.
   
Changes concerning cross-references (DR line)
DisProt

Cross-references have been added to the Database of Protein Disorder (DisProt). DisProt is a curated database that provides information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part. DisProt is a collaborative effort between the Center for Computational Biology and Bioinformatics at Indiana University School of Medicine and the Center for Information Science and Technology at Temple University.

The DisProt database is available at http://www.disprot.org/.

The format of the explicit link is:

Data bank identifier   DisProt
Primary identifier     The primary identifier consists of a DisProt accession number.
Secondary identifier   None; a dash '-' is stored in that field.
Example
P07293:
   DR   DisProt; DP00228; -.
   DR   DisProt; DP00440; -.
   
UniProtKB release 11.0 of 29-May-2007

New ftp directory for UniProt Metagenomic and Environmental Sequences (UniMES)

We are pleased to announce a new UniProt database. The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data. Currently the database contains only data from the Global Ocean Sampling Expedition (GOS). The environmental sample data contained within this database is not present in either the UniProt Knowledgebase or the UniProt Reference Clusters. UniMES is released in FASTA format and, to add further value, we have collaborated with the InterPro team to provide a file containing InterPro matches to UniMES.

UniMES is available in the new subdirectory current_release/unimes of the UniProt ftp servers ftp.uniprot.org/pub/databases/uniprot, ftp.ebi.ac.uk/pub/databases/uniprot and ftp.expasy.org/databases/uniprot.

New comment line (CC) topic SEQUENCE CAUTION

We have introduced the new CC line topic SEQUENCE CAUTION to describe protein sequence reports that differ from the sequence that is shown in UniProtKB due to conflicts that are not described in FT CONFLICT lines, such as frameshifts, erroneous gene model predictions, etc. This kind of information was previously reported in the CC line topic CAUTION together with other warnings that are unrelated to sequence conflicts.

The format of the SEQUENCE CAUTION topic is:

CC   -!- SEQUENCE CAUTION:
         Sequence=Sequence; Type=Type;[ Positions=Positions;][ Note=Note;]

Where:

These lines are not wrapped and their length may therefore exceed 75 characters.

Examples:

Q93W20:
Previous annotation:
CC   -!- CAUTION: Ref.2 (BAA97015) sequence differs from that shown due to
CC       erroneous gene model prediction. The predicted gene At5g49940 has
CC       been split into 2 genes: At5g49940 and At5g49945.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=BAA97015.1; Type=Erroneous gene model prediction; Note=The predicted gene At5g49940 has been split into 2 genes: At5g49940 and At5g49945;
Q83M39:
Previous annotation:
CC   -!- CAUTION: Ref.1 and Ref.2 sequences differ from that shown due to a
CC       stop codon at position 273 which was translated as Gln to extend
CC       the sequence.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAN42076.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln;
CC       Sequence=AAP15953.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln;
P17814:
Previous annotation:
CC   -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to
CC       a frameshift in position 496.
CC   -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to
CC       erroneous gene model prediction.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=CAA36850.1; Type=Erroneous gene model prediction;
CC       Sequence=CAA36850.1; Type=Frameshift; Positions=496;
P0A7B3:
Previous annotation:
CC   -!- CAUTION: Ref.4 (X07863) sequence differs from that shown due to
CC       several frameshifts.
CC   -!- CAUTION: Ref.5 (Y00357) sequence differs from that shown due to
CC       frameshifts in positions 204, 215 and 282.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=X07863; Type=Frameshift; Positions=Several;
CC       Sequence=Y00357; Type=Frameshift; Positions=204, 215, 282;
P27612:
Previous annotation:
CC   -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to
CC       frameshifts in positions 4, 32, and 42.
CC   -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to
CC       contaminating sequence.
CC   -!- CAUTION: Ref.3 sequence differs from that shown due to a
CC       frameshift in position 697.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAA39943.1; Type=Miscellaneous discrepancy; Note=Several frameshifts and contaminating sequence;
CC       Sequence=Ref.3; Type=Frameshift; Positions=697;
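
Because each item occupies one unwrapped line, the topic can be read line by line. The Python sketch below is an illustration only: the pattern follows the format given above, the free text in `Note=` is assumed to run to the final semicolon of the line, and the function name is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Sequence and Type are mandatory; Positions and Note are optional.
_ITEM = re.compile(
    r"Sequence=(?P<sequence>[^;]+); Type=(?P<type>[^;]+);"
    r"(?: Positions=(?P<positions>[^;]+);)?"
    r"(?: Note=(?P<note>.+);)?\s*$"
)


def parse_sequence_caution(lines: List[str]) -> List[Dict[str, Optional[str]]]:
    """Return one dict per 'Sequence=...' line; other lines are skipped."""
    items = []
    for line in lines:
        m = _ITEM.search(line)
        if m:
            items.append(m.groupdict())
    return items
```

The `positions` value is kept as text because it may hold a list such as '204, 215, 282' or a word such as 'Several'.
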
Multiple occurrence of comment line (CC) topic SUBCELLULAR LOCATION

From now on, the CC line topic SUBCELLULAR LOCATION may occur more than once per entry.

Changes concerning cross-references (DR line)
PseudoCAP

Cross-references have been added to the Pseudomonas aeruginosa Community Annotation Project database. This database provides genome annotation of P. aeruginosa strain PAO1 and of other Pseudomonas species, acting as a valuable comparative resource for P. aeruginosa research, as well as being useful for the larger Pseudomonas research community. Over the coming year this database will be further enhanced toward more focus on comparative analysis of P. aeruginosa isolates and more specific information about putative drug and vaccine targets.

The Pseudomonas aeruginosa Community Annotation Project database is available at http://www.pseudomonas.com/.

The format of the explicit link is:

Data bank identifier   PseudoCAP
Primary identifier     The primary identifier consists of the ordered locus name.
Secondary identifier   None; a dash '-' is stored in that field.
Example
Q9I576:
   DR   PseudoCAP; PA0865; -.
   
Orphanet

Cross-references have been added to the Orphanet database. This database is dedicated to information on rare diseases and orphan drugs. It aims to improve management and treatment of genetic, auto-immune or infectious rare diseases, rare cancers, or not yet classified rare diseases. ORPHANET offers services adapted to the needs of patients and their families, health professionals and researchers, support groups and industry.

The Orphanet database is available at http://www.orpha.net/consor/cgi-bin/home.php?Lng=GB.

The format of the explicit link is:

Data bank identifier   Orphanet
Primary identifier     The primary identifier consists of the Orpha unique disease identifier.
Secondary identifier   The secondary identifier consists of the name of the disease.
Example
P26439:
   DR   Orphanet; 418; Adrenal hyperplasia, congenital.
   DR   Orphanet; 3185; Stein-Leventhal syndrome.
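
Unlike most cross-references listed here, the Orphanet secondary identifier is free text that may contain commas. A generic reader for DR lines should therefore split on semicolons only. The Python sketch below is an illustration under that assumption; the function name `parse_dr` is hypothetical, and a dash is returned as `None`.

```python
from typing import List, Optional, Tuple


def parse_dr(line: str) -> Tuple[str, str, List[Optional[str]]]:
    """Return (database, primary_id, further_fields) for a DR line."""
    line = line.strip()
    if not line.startswith("DR "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip()
    if body.endswith("."):            # the line is terminated by a period
        body = body[:-1]
    # Split on ';' only, so commas inside a disease name are preserved.
    database, primary, *rest = [field.strip() for field in body.split(";")]
    further = [None if field == "-" else field for field in rest]
    return database, primary, further
```
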
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 10.4 of 01-May-2007

Changes concerning keywords (KW line)

Modified keyword:

UniProtKB release 10.2 of 03-Apr-2007

Changes concerning cross-references (DR line)
BuruList

Cross-references have been added to BuruList, the Mycobacterium ulcerans genome database. This database is dedicated to the analysis of the genome of Mycobacterium ulcerans, the Buruli ulcer bacillus. BuruList provides a complete dataset of DNA and protein sequences derived from the epidemic strain Agy99, linked to the relevant annotations and functional assignments. It allows one to easily browse through these data and retrieve information, using various criteria (gene names, location, keywords, etc.).

The Mycobacterium ulcerans genome database is available at http://genolist.pasteur.fr/BuruList/.

The format of the explicit link is:

Data bank identifier   BuruList
Primary identifier     The primary identifier consists of the ordered locus name.
Secondary identifier   None; a dash '-' is stored in that field.
Example
A0PW55:
   DR   BuruList; MUL_4631; -.
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 10.0 of 06-Mar-2007

Format change in the dbxref.txt document file

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. We have added a new optional field, "Ref". This field contains the database reference in the following format:

Ref   : Journal_abbrev Volume:First_page-Last_page(YYYY); [PubMed=Pubmed_identifier; ][DOI=Digital_object_identifier;]

Example:

Abbrev: PROSITE
Name  : PROSITE; a protein domain and family database
Ref   : Nucleic Acids Res. 34:D227-D230(2006); PubMed=16381852; DOI=10.1093/nar/gkj063;
LinkTp: Explicit
Server: http://www.expasy.org/prosite/
Db_URL: www.expasy.org/cgi-bin/get-prosite-raw.pl?%s
Cat   : Family and domain databases
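
The value of the Ref field can be broken into its components with one regular expression. The Python sketch below is an illustration only: the pattern is derived from the format line above, the group names are hypothetical, and it assumes that the journal abbreviation is everything before the last space that precedes 'Volume:'.

```python
import re
from typing import Dict, Optional

_REF = re.compile(
    r"^(?P<journal>.+) (?P<volume>\S+):(?P<first_page>\S+)-(?P<last_page>\S+)"
    r"\((?P<year>\d{4})\);"
    r"(?: PubMed=(?P<pubmed>\d+);)?"      # optional PubMed identifier
    r"(?: DOI=(?P<doi>\S+);)?\s*$"        # optional DOI
)


def parse_ref(value: str) -> Dict[str, Optional[str]]:
    """Split the value of a 'Ref' field into its named components."""
    m = _REF.match(value)
    if m is None:
        raise ValueError(f"unexpected Ref value: {value!r}")
    return m.groupdict()
```
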
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.7 of 20-Feb-2007

Changes concerning cross-references (DR line)
CYGD

Cross-references have been added to the MIPS Comprehensive Yeast Genome Database. This database aims to present information on the molecular structure and functional network of the entirely sequenced, well-studied model eukaryote, the budding yeast Saccharomyces cerevisiae. In addition the data of various projects on related yeasts are used for comparative analysis.

The CYGD is available at http://mips.gsf.de/genre/proj/yeast.

The format of the explicit links is:

Data bank identifier   CYGD
Primary identifier     The primary identifier consists of the ordered locus name.
Example
P35688:
   DR   CYGD; YDL240w; -.
   
New molecule type in the cross-references to EMBL

We added the value Viral_cRNA to the controlled vocabulary of the field MoleculeType of the cross-references to the EMBL nucleotide sequence database. The format of the DR EMBL line is:

DR   EMBL; AccessionNumber; ProteinID; StatusIdentifier; MoleculeType.

The controlled vocabulary of the field MoleculeType is:

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.6 of 06-Feb-2007

Changes concerning cross-references (DR line)
Cornea-2DPAGE

Cross-references have been added to the Human Cornea 2-DE database, a two-dimensional polyacrylamide gel electrophoresis federated database available at Aarhus University (Denmark).

The Cornea-2DPAGE is available at http://www.cornea-proteomics.com/.

The format for the explicit links is:

Data bank identifier   Cornea-2DPAGE
Primary identifier     The primary identifier consists of a UniProtKB accession number.
Secondary identifier   The secondary identifier consists of the organism common name.
Example
P31946:
   DR   Cornea-2DPAGE; P31946; HUMAN.
DOSAC-COBS-2DPAGE

Cross-references have been added to the DOSAC-COBS 2D Page, a two-dimensional polyacrylamide gel electrophoresis federated database available at the DOSAC and COBS genome and proteome laboratory (La Maddalena, Italy).

The DOSAC-COBS-2DPAGE is available at http://www.dosac.unipa.it/2d/.

The format for the explicit links is:

Data bank identifier   DOSAC-COBS-2DPAGE
Primary identifier     The primary identifier consists of a UniProtKB accession number.
Secondary identifier   The secondary identifier consists of the organism common name.
Example
P15531:
   DR   DOSAC-COBS-2DPAGE; P15531; HUMAN.
REPRODUCTION-2DPAGE

Cross-references have been added to the REPRODUCTION-2DPAGE, a two-dimensional polyacrylamide gel electrophoresis database available at the laboratory of Reproductive Medicine, Nanjing Medical University, P. R. China.

The REPRODUCTION-2DPAGE is available at http://reprod.njmu.edu.cn/cgi-bin/2d/2d.cgi.

The format for the explicit links is:

Data bank identifier   REPRODUCTION-2DPAGE
Primary identifier     The primary identifier consists of a UniProtKB accession number.
Secondary identifier   The secondary identifier consists of the organism common name.
Example
P32119:
   DR   REPRODUCTION-2DPAGE; P32119; HUMAN.
UniProtKB release 9.5 of 23-Jan-2007

Changes in the usage of the feature key INIT_MET

The feature key INIT_MET indicates that there is experimental evidence that the initiator methionine has been cleaved off. In the past, the initiator methionine was not included in the sequence of a UniProtKB entry in such a case and the INIT_MET sequence coordinates were therefore 0.

Example:

FT   INIT_MET      0      0
FT   CHAIN         1    104       Cytochrome c.
FT                                /FTId=PRO_0000108218.
..
SQ   SEQUENCE   104 AA;  11618 MW;  D47C9B513DF1C5C2 CRC64;
     GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG
     EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE
//

We have added back the initiator methionine to such protein sequences and changed the sequence coordinates of the feature key INIT_MET accordingly to 1.

Example:

FT   INIT_MET      1      1
FT   CHAIN         2    105       Cytochrome c.
FT                                /FTId=PRO_0000108218.
..
SQ   SEQUENCE   105 AA;  11749 MW;  8EE9689E0102506B CRC64;
     MGDVEKGKKI FIMKCSQCHT VEKGGKHKTG PNLHGLFGRK TGQAPGYSYT AANKNKGIIW
     GEDTLMEYLE NPKKYIPGTK MIFVGIKKKE ERADLIAYLK KATNE
//
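
The effect of this change on an entry can be pictured as prepending one residue and shifting every other feature by one position. The Python sketch below only illustrates that arithmetic; it is not the procedure used to update the entries, it represents features as hypothetical `(key, start, end)` tuples, and it does not recompute the molecular weight or the CRC64 checksum.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int]


def restore_initiator_met(sequence: str,
                          features: List[Feature]) -> Tuple[str, List[Feature]]:
    """Add the initiator methionine back and shift feature coordinates."""
    new_sequence = "M" + sequence
    shifted: List[Feature] = []
    for key, start, end in features:
        if key == "INIT_MET":
            shifted.append((key, 1, 1))               # was 0..0
        else:
            shifted.append((key, start + 1, end + 1))  # e.g. 1..104 -> 2..105
    return new_sequence, shifted
```

For the cytochrome c example above, the 104-residue sequence becomes 105 residues long and the CHAIN feature moves from 1..104 to 2..105.
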
UniProtKB release 9.4 of 09-Jan-2007

Changes concerning cross-references (DR line)
MaizeGDB

We changed the Data bank identifier for the Maize Genetics and Genomics Database MaizeGDB from MaizeDB to MaizeGDB.

Example:

DR   MaizeDB; 58111; -.

has changed to

DR   MaizeGDB; 58111; -.
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.3 of 12-Dec-2006

Release of a new document presenting our Protein Spotlight articles and cited UniProtKB/Swiss-Prot entries

The document protspot.txt, available by ftp and on the Web site, lists the Protein Spotlight articles and cited UniProtKB/Swiss-Prot entries.

This document contains, for each Protein Spotlight article, the corresponding entries cited in that article. Protein Spotlight (ISSN 1424-4721) is a monthly review written by the Swiss-Prot team of the Swiss Institute of Bioinformatics. Spotlight articles describe a specific protein or family of proteins in an informal tone. Protein Spotlight is available at: http://www.expasy.org/spotlight/.

Changes concerning cross-references (DR line)
DIP

Cross-references have been added to the Database of Interacting Proteins (DIP). The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database were curated both manually by expert curators and automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.

The DIP is available at http://dip.doe-mbi.ucla.edu/.

The format for the explicit links is:

Data bank identifier   DIP
Primary identifier     The primary identifier consists of the DIP accession number.
Secondary identifier   None; a dash '-' is stored in that field.
Examples
Q9W1K5:
   DR   DIP; DIP:19601N; -.
P41597:
   DR   DIP; DIP:5833N; -.
   DR   DIP; DIP:5839N; -.