UniProt Knowledgebase
Swiss-Prot Protein Knowledgebase
TrEMBL Protein Database

What's new?
Release 15.4 of 16-Jun-2009

Also read about forthcoming changes, the latest release statistics (Swiss-Prot, TrEMBL), Swiss-Prot headlines, and recent and forthcoming changes for the XML version of the UniProt Knowledgebase.

UniProtKB release 15.4 of 16-Jun-2009

Changes concerning cross-references (DR line)
LinkHub

Cross-references to LinkHub have been removed.

Changes concerning keywords (KW line)

New keywords:

Changes concerning the controlled vocabulary for Subcellular locations

New subcellular location:

Modified subcellular locations:

UniProtKB release 15.3 of 26-May-2009

Changes concerning cross-references (DR line)
PMAP-CutDB

Cross-references have been added to the CutDB - Proteolytic event database. PMAP-CutDB is one of the first systematic efforts to build an easily accessible collection of documented proteolytic events for natural proteins in vivo or in vitro. A CutDB entry is defined by a unique combination of these three attributes: protease, protein substrate and cleavage site.

PMAP-CutDB is available at http://www.proteolysis.org/.

The format of the explicit link is:

Data bank identifier PMAP-CutDB
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P02760:
DR   PMAP-CutDB; P02760; -.


Q02383:
DR   PMAP-CutDB; Q02383; -.
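The DR lines shown in these notes share one layout: a data bank identifier, a primary identifier and a secondary identifier, separated by "; " and closed by a period. The following is a minimal Python sketch of a reader for such lines, not an official parser. The names `CrossRef` and `parse_dr_line` are hypothetical, and the sketch assumes that no identifier contains the separator "; ".

```python
from typing import List, NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str             # data bank identifier, e.g. "PMAP-CutDB"
    primary: str              # primary identifier
    secondary: Optional[str]  # None when the field holds a dash
    extra: List[str]          # any further fields (e.g. in DR EMBL lines)


def parse_dr_line(line: str) -> CrossRef:
    """Split one DR line into its data bank name and identifiers."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].strip()
    # Drop only the terminating period; identifiers may contain periods.
    if body.endswith("."):
        body = body[:-1]
    # Assumption: fields never contain the separator "; " themselves.
    fields = [field.strip() for field in body.split("; ")]
    if len(fields) < 2:
        raise ValueError(f"too few fields in DR line: {line!r}")
    secondary = fields[2] if len(fields) > 2 else None
    # A lone dash marks an empty secondary identifier.
    if secondary == "-":
        secondary = None
    return CrossRef(fields[0], fields[1], secondary, fields[3:])
```

For example, `parse_dr_line("DR   PMAP-CutDB; P02760; -.")` yields the database `PMAP-CutDB`, the primary identifier `P02760` and no secondary identifier.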

Removal of the ec2dtosp.txt document file.

The document ec2dtosp.txt, which listed the Escherichia coli Gene-protein database (ECO2DBASE) entries cross-referenced in UniProtKB/Swiss-Prot, has been removed.

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 15.2 of 05-May-2009

Changes concerning cross-references (DR line)
OMA

Cross-references have been added to the OMA project. The OMA project is a massive cross-comparison of complete genomes to identify the evolutionary relation between any pair of proteins. The main features of OMA are the large number of genomes from all kingdoms of life, the strict verification of orthology assignments and the determination of the phylogenetic relationship between any two proteins.

OMA is available at http://www.omabrowser.org/.

The format of the explicit link is:

Data bank identifier OMA
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier The secondary identifier consists of an OMA group fingerprint.
Examples
P39899:
DR   OMA; P39899; YANTHIA.


Q9Y6C2:
DR   OMA; Q9Y6C2; EGLENKP.

Changes concerning keywords (KW line)

New keywords:

Changes concerning the controlled vocabulary for Subcellular locations

New subcellular location:

New subcellular topology:

UniProtKB release 15.1 of 14-Apr-2009

Changes concerning cross-references (DR line)
CAZy

Cross-references have been added to the Carbohydrate-Active enZymes database CAZy. CAZy describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds.

CAZy is available at http://www.cazy.org/.

The format of the explicit link is:

Data bank identifier CAZy
Primary identifier The primary identifier consists of a CAZy family number.
Secondary identifier The secondary identifier consists of a CAZy family name.
Examples
P30590:
DR   CAZy; GT2; Glycosyltransferase Family 2.


P32775:
DR   CAZy; CBM48; Carbohydrate-Binding Module Family 48.
DR   CAZy; GH13; Glycoside Hydrolase Family 13.

Changes concerning keywords (KW line)

New keywords:

Changes concerning the controlled vocabulary for Subcellular locations

New subcellular location:

UniProtKB release 15.0 of 24-Mar-2009

Change of OG (OrGanelle) line value 'Chromatophore' to 'Organellar chromatophore'

After discussion with experts in the field and in consultation with the Gene Ontology we have changed the OG line describing proteins encoded by the chromatophore of Paulinella chromatophora from

OG   Plastid; Chromatophore.

to

OG   Plastid; Organellar chromatophore.
Changes concerning cross-references (DR line)
StyGene

Cross-references to StyGene have been removed.

Removal of the salty.txt document file.

The document salty.txt, which listed Salmonella typhimurium strain LT2 entries, gene names and cross-references to StyGene, has been removed.

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Changes concerning the controlled vocabulary for Subcellular locations

New subcellular location:

Modified subcellular locations:

UniProtKB release 14.9 of 03-Mar-2009

Changes concerning cross-references (DR line)
TCDB

Cross-references have been added to the Transport Classification Database TCDB. TCDB details a comprehensive IUBMB-approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, but additionally incorporates phylogenetic information.

TCDB is available at http://www.tcdb.org/.

The format of the explicit link is:

Data bank identifier TCDB
Primary identifier The primary identifier consists of a Transporter Classification number (TC#).
Secondary identifier The secondary identifier consists of a Transporter Classification family name.
Examples
P0A903:
DR   TCDB; 1.B.33.1.3; outer membrane protein insertion porin (OmpIP) family.

P0AC02:
DR   TCDB; 1.B.33.1.3; outer membrane protein insertion porin (OmpIP) family.

O60840:
DR   TCDB; 1.A.1.11.11; voltage-gated ion channel (VIC) superfamily.
DR   TCDB; 1.A.1.11.15; voltage-gated ion channel (VIC) superfamily.

Pathway_Interaction_DB

Cross-references have been added to the Pathway Interaction Database Pathway_Interaction_DB. The Pathway Interaction Database is a highly-structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

Pathway_Interaction_DB is available at http://pid.nci.nih.gov/.

The format of the explicit link is:

Data bank identifier Pathway_Interaction_DB
Primary identifier The primary identifier consists of a short pathway name.
Secondary identifier The secondary identifier consists of a full pathway name.
Examples
O00422:
DR   Pathway_Interaction_DB; hdac_classi_pathway; Signaling events mediated by HDAC Class I.
DR   Pathway_Interaction_DB; hedgehog_glipathway; Hedgehog signaling events mediated by Gli proteins.
DR   Pathway_Interaction_DB; smad2_3nuclearpathway; Regulation of nuclear SMAD2/3 signaling.
DR   Pathway_Interaction_DB; telomerasepathway; Regulation of Telomerase.

O14640:
DR   Pathway_Interaction_DB; ps1pathway; Presenilin action in Notch and Wnt signaling.
DR   Pathway_Interaction_DB; wnt_canonical_pathway; Canonical Wnt signaling pathway.

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

UniProtKB release 14.8 of 10-Feb-2009

Cross-references to Plant Ontology (PO) have been added in the tisslist.txt file

Each term in the list may be mapped to a corresponding eVOC term and now it can also be mapped to a corresponding Plant Ontology (PO) term.

Examples:

ID   Aleurone.
AC   TS-0027
SY   Aleurone layer.
DR   PO; PO:0005360; aleurone layer.
//
ID   Embryo.
AC   TS-0229
SY   Embryonic; Embryonic tissue; Whole embryo; Parthenogenote.
DR   eVOC; EV:0300001; development-stage: embryo.
DR   PO; PO:0009009; embryo.
//
Changes concerning cross-references (DR line)
IPI

Cross-references have been added to the International Protein Index IPI. IPI maintains a database of cross references between the primary data sources, provides minimally redundant yet maximally complete sets of proteins and maintains stable identifiers for proteomes of higher eukaryotic organisms.

IPI is available at http://www.ebi.ac.uk/IPI/IPIhelp.html.

The format of the explicit link is:

Data bank identifier IPI
Primary identifier The primary identifier consists of a unique IPI identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q8NFR9:
DR   IPI; IPI00168887; -.
DR   IPI; IPI00177866; -.
DR   IPI; IPI00747706; -.
DR   IPI; IPI00789075; -.
DR   IPI; IPI00876915; -.

P03898:
DR   IPI; IPI00716083; -.
Bgee

Cross-references have been added to a dataBase for Gene Expression Evolution Bgee. Bgee is a database to retrieve and compare gene expression patterns between animal species. Bgee first maps heterogeneous expression data (currently EST, Affymetrix, and in situ hybridization data) on anatomical and developmental ontologies.

Bgee is available at http://bgee.unil.ch/bgee/bgee.

The format of the explicit link is:

Data bank identifier Bgee
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q9Z351:
DR   Bgee; Q9Z351; -.

P62835:
DR   Bgee; P62835; -.
Changes concerning the controlled vocabulary for Subcellular locations

New subcellular locations:

Deleted subcellular locations:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

UniProtKB release 14.7 of 20-Jan-2009

Release of a new document which describes the controlled vocabulary used in the comment line (CC) topic PATHWAY

The document pathlist.txt, available by ftp and on the Web site, describes the controlled vocabulary used in the comment line (CC) topic PATHWAY in the following format:

  ---------  -------------------------------   ----------------------------
  Line code  Content                           Occurrence in an entry
  ---------  -------------------------------   ----------------------------
  ID         Identifier                        Once; starts an entry
  AC         Accession number                  Once
  CL         UniPathway class                  Once
  DE         Definition                        Once or more
  SY         Synonym(s)                        Optional; once or more
  HI         Relationship is-a                 Optional; once or more
  HP         Relationship part-of              Optional; once or more
  DR         Cross-reference(s)                Optional; once or more
  //         Terminator                        Once; ends an entry
  

Example:

ID   D-alanine biosynthesis.
AC   UPA00042
CL   Pathway.
DE   Biosynthesis of D-alanine. D-alanine is used either as an energy
DE   source or as a component of bacterial cell wall, where it is directly
DE   involved in the cross-linking of adjacent peptidoglycan chains. In
DE   Gram-positive bacteria, D-alanine can also be found to variable
DE   extents in cell wall teichoic acid and lipoteichoic acid residues.
SY   D-2-aminopropionic acid biosynthesis.
HI   UPA00402; amino-acid biosynthesis.
DR   GO; GO:0030632; P:D-alanine biosynthetic process.
DR   KEGG; map00252; Alanine and aspartate metabolism.
DR   KEGG; map00473; D-Alanine metabolism.
DR   MetaCyc; ALADEG-PWY.
//
  
Syntax modification of the comment line (CC) topic PATHWAY

We have structured the comment line topic PATHWAY, using the controlled vocabulary provided by the UniPathway resource, in order to improve the consistency of annotation and to allow its content to be parsed.

The new format of PATHWAY is:

CC   -!- PATHWAY: Super-pathway; Pathway(; Sub-pathway: Enzymatic_reaction)?( [regulation])?.
  
Where:

Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional.

Examples:

P49367:
CC   -!- PATHWAY: Amino-acid biosynthesis; L-lysine biosynthesis via AAA
CC       pathway; L-alpha-aminoadipate from 2-oxoglutarate: step 2/4.
  
P0A877:
CC   -!- PATHWAY: Amino-acid biosynthesis; L-tryptophan biosynthesis; L-
CC       tryptophan from chorismate: step 5/5.
  
P95477:
CC   -!- PATHWAY: Siderophore biosynthesis; pseudomonine biosynthesis.
  
P52957:
CC   -!- PATHWAY: Mycotoxin biosynthesis; sterigmatocystin biosynthesis
CC       [regulation].
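The pattern above translates fairly directly into a regular expression. The following Python sketch shows one way to do this; the names `PATHWAY_RE`, `unwrap_cc` and `parse_pathway` are hypothetical. It assumes that pathway names contain no semicolon, that sub-pathway names contain no colon, and, as a heuristic only, that a line ending in a hyphen was wrapped in the middle of a word.

```python
import re
from typing import Dict, List, Optional, Union

PATHWAY_RE = re.compile(
    r"^(?P<super_pathway>[^;]+); (?P<pathway>[^;]+?)"
    r"(?:; (?P<sub_pathway>[^:;]+): (?P<reaction>.+?))?"
    r"(?P<regulation> \[regulation\])?\.$"
)


def unwrap_cc(lines: List[str]) -> str:
    """Join the wrapped CC lines of one comment into a single string."""
    text = ""
    for line in lines:
        part = line[5:].strip()  # drop the "CC   " prefix and indentation
        if part.startswith("-!- "):
            part = part[4:]
        if not text:
            text = part
        elif text.endswith("-"):
            text += part  # heuristic: the wrap fell right after a hyphen
        else:
            text += " " + part
    return text


def parse_pathway(lines: List[str]) -> Dict[str, Union[str, bool, None]]:
    """Parse one structured PATHWAY comment into its components."""
    text = unwrap_cc(lines)
    if not text.startswith("PATHWAY: "):
        raise ValueError(f"not a PATHWAY comment: {text!r}")
    match = PATHWAY_RE.match(text[len("PATHWAY: "):])
    if match is None:
        raise ValueError(f"unrecognised PATHWAY syntax: {text!r}")
    fields: Dict[str, Union[str, bool, None]] = dict(match.groupdict())
    # Report the optional [regulation] flag as a boolean.
    fields["regulation"] = match.group("regulation") is not None
    return fields
```

Applied to the P0A877 example, the sketch separates the super-pathway, the pathway, the sub-pathway and the reaction step, and reports that the regulation flag is absent.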
  
New comment line (CC) topic DISRUPTION PHENOTYPE

We have introduced the new CC line topic DISRUPTION PHENOTYPE to describe the effects caused by the disruption of the gene coding for a protein. Note that we only describe effects caused by the complete absence of a gene, and thus of a protein, in vivo (null mutants caused by random or targeted deletions, insertions of a transposable element, etc.). To avoid describing phenotypes due to partial or dominant negative mutants, missense mutations are not described in this topic, but in FT MUTAGEN instead. Defects caused by transient inactivation by methods such as RNA interference or blockage by antibodies are also not described in this topic because the results are difficult to interpret, except for C. elegans RNAi studies, which are widely used and done in vivo.

The format of the new topic is free text.

Examples:

Q8R1N0:
CC   -!- DISRUPTION PHENOTYPE: Death occurs by the end of preimplantation
CC       development. Embryos exhibit a dramatic reduction in the total cell
CC       number, a high mitotic index, and the presence of abnormal mitotic
CC       figures.
Q05753:
CC   -!- DISRUPTION PHENOTYPE: Developmental arrest of the embryos at the
CC       globular stage.
P11911:
CC   -!- DISRUPTION PHENOTYPE: Impaired B-cell development which fails to
CC       progress past the progenitor stage.
Changes concerning cross-references (DR line)
BRENDA

Cross-references have been added to the Comprehensive Enzyme Information System BRENDA. BRENDA (BRaunschweig ENzyme DAtabase) is a large publicly available enzyme information system, which includes information on all identified enzymes. The range of data encompasses functional, structural, sequence, localisation, disease-related, isolation and stability information on enzymes, as well as ligand-related data.

BRENDA is available at http://www.brenda-enzymes.org/.

The format of the explicit link is:

Data bank identifier BRENDA
Primary identifier The primary identifier consists of an EC number.
Secondary identifier The secondary identifier consists of an organism code from BRENDA.
Examples
O07560:
DR   BRENDA; 3.4.21.89; 150.
P06115:
DR   BRENDA; 1.11.1.6; 250.
Change of molecule type in the cross-references to EMBL

Following changes in the EMBL nucleotide sequence database, the term pre-RNA was replaced by Transcribed_RNA as a valid value for the field MoleculeType of cross-references to EMBL. The format of the DR EMBL line is:

DR   EMBL; AccessionNumber; ProteinID; StatusIdentifier; MoleculeType.

The controlled vocabulary of the field MoleculeType is:

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 14.6 of 16-Dec-2008

Changes concerning cross-references (DR line)
GeneCards

Cross-references have been added to GeneCards. GeneCards is a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes.

GeneCards is available at http://www.genecards.org/.

The format of the explicit link is:

Data bank identifier GeneCards
Primary identifier The primary identifier consists of a unique GeneCards identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q6PCB8:
DR   GeneCards; GC05M049731; -.
P69905:
DR   GeneCards; GC16P000162; -.
DR   GeneCards; GC16P000166; -.
PRIDE

Cross-references have been added to the PRIDE PRoteomics IDEntifications database. PRIDE is a centralized, standards-compliant, public data repository for proteomics data.

PRIDE is available at http://www.ebi.ac.uk/pride/.

The format of the explicit link is:

Data bank identifier PRIDE
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q9Y5P4:
DR   PRIDE; Q9Y5P4; -.
P25296:
DR   PRIDE; P25296; -.
IntAct

We have changed the format of the cross-reference lines to IntAct to add the number of interactions.

The secondary identifier is the number of interactions:

Secondary identifier The secondary identifier consists of the number of interactions.
Example
O01802:
DR   IntAct; O01802; 12.
UniProtKB release 14.5 of 25-Nov-2008

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keywords:

UniProtKB release 14.4 of 04-Nov-2008

Changes concerning cross-references (DR line)
NextBio

Cross-references have been added to NextBio. NextBio is a life science search engine that enables researchers and clinicians to access and understand the world's life sciences information. NextBio contains amongst other things gene-centric data for human, mouse, rat, fly, worm and yeast.

NextBio is available at http://www.nextbio.com/.

The format of the explicit link is:

Data bank identifier NextBio
Primary identifier The primary identifier consists of a unique NextBio identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O95793:
DR   NextBio; 26468; -.
P55002:
DR   NextBio; 291402; -.
Xenbase

Cross-references have been added to the Xenbase: Xenopus laevis and tropicalis biology and genomics resource. Xenbase is a model organism database integrating a diverse array of biological and genomic data on the frogs, Xenopus laevis and Xenopus (Silurana) tropicalis. Data is collected from other databases, high-throughput screens and the scientific literature and integrated into a number of database modules covering subjects such as community, literature, gene and genomic analysis.

Xenbase: Xenopus laevis and tropicalis biology and genomics resource is available at http://www.xenbase.org/.

The format of the explicit link is:

Data bank identifier Xenbase
Primary identifier The primary identifier consists of a Xenbase accession number.
Secondary identifier The secondary identifier consists of a gene name.
Examples
Q7ZXH3:
DR   Xenbase; XB-FEAT-942651; map3k7ip3.

P02281:
DR   Xenbase; XB-FEAT-5722946; -.
DR   Xenbase; XB-FEAT-5717970; hist1h2bj.
DR   Xenbase; XB-FEAT-5719554; hist1h2bk.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 14.3 of 14-Oct-2008

UniProtKB release 14.2 of 23-Sep-2008

UniProtKB release 14.1 of 02-Sep-2008

Changes concerning keywords (KW line)

New keywords:

New subcellular locations:

UniProtKB release 14.0 of 22-Jul-2008

Change of the protein description (DE line)

Until now, the UniProtKB description (DE) lines listed protein names in a computer-parsable format, but with a minimal amount of structure. In UniProtKB/Swiss-Prot the description starts with the recommended name of the protein and additional alternative names are indicated between parentheses. In UniProtKB/TrEMBL the description is derived directly from the underlying nucleotide entry and its accuracy relies on the information provided by the submitter of the nucleotide entry, unless it has been improved by automatic annotation procedures.

Consistent nomenclature is indispensable for communication, literature searching and entry retrieval. The protein names provided in the description lines of UniProtKB/Swiss-Prot are widely used by life scientists and often propagated during the annotation of new genomic sequences. For these reasons we have structured the UniProtKB DE lines more explicitly: We introduced 3 categories, as well as several subcategories, of protein names:

  --------  -----------  --------------------------  ------------------------------------------------
  Category  Subcategory  Cardinality                 Description
  --------  -----------  --------------------------  ------------------------------------------------
  RecName:               1 in UniProtKB/Swiss-Prot;  The name recommended by the UniProt consortium.
                         0-1 in UniProtKB/TrEMBL
            Full=        1                           The full name.
            Short=       0-n                         An abbreviation of the full name or an acronym.
            EC=          0-n                         An Enzyme Commission number.
  AltName:               0-n                         A synonym of the recommended name.
            Full=        0-1                         The full name.
            Short=       0-n                         An abbreviation of the full name or an acronym.
            EC=          0-n                         An Enzyme Commission number.
  AltName:  Allergen=    0-1                         See allergen.txt.
  AltName:  Biotech=     0-1                         A name used in a biotechnological context.
  AltName:  CD_antigen=  0-n                         See cdlist.txt.
  AltName:  INN=         0-n                         The international nonproprietary name: a generic
                                                     name for a pharmaceutical substance or active
                                                     pharmaceutical ingredient that is globally
                                                     recognized and is public property.
  SubName:               0 in UniProtKB/Swiss-Prot;  A name provided by the submitter of the
                         0-n in UniProtKB/TrEMBL     underlying nucleotide sequence.
            Full=        1                           The full name.
            EC=          0-n                         An Enzyme Commission number.

Each name is shown on a separate line; lines may therefore exceed 75 characters.

A block of DE lines may further contain multiple Includes: and/or Contains: sections and a separate field Flags: to indicate whether the protein sequence is a precursor or a fragment:

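  ---------  -----------  ---------------------------------------------------------
  Field      Cardinality  Value
  ---------  -----------  ---------------------------------------------------------
  Includes:  0-n          A block of protein names as described in the table above.
  Contains:  0-n          A block of protein names as described in the table above.
  Flags:     0-1          Precursor and/or Fragment or Fragments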

Examples:

P09919:

Previous format:

DE   Granulocyte colony-stimulating factor precursor (G-CSF) (Pluripoietin)
DE   (Filgrastim) (Lenograstim).

New format:

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: INN=Filgrastim;
DE   AltName: INN=Lenograstim;
DE   Flags: Precursor;
Q10743:

Previous format:

DE   ADAM 10 precursor (EC 3.4.24.81) (A disintegrin and metalloproteinase
DE   domain 10) (Mammalian disintegrin-metalloprotease) (Kuzbanian protein
DE   homolog) (CD156c antigen) (Fragment).

New format:

DE   RecName: Full=ADAM 10;
DE            EC=3.4.24.81;
DE   AltName: Full=A disintegrin and metalloproteinase domain 10;
DE   AltName: Full=Mammalian disintegrin-metalloprotease;
DE   AltName: Full=Kuzbanian protein homolog;
DE   AltName: CD_antigen=CD156c;
DE   Flags: Precursor; Fragment;
Q07908:

Previous format:

DE   Arginine biosynthesis bifunctional protein argJ [Includes: Glutamate
DE   N-acetyltransferase (EC 2.3.1.35) (Ornithine acetyltransferase)
DE   (Ornithine transacetylase) (OATase); Amino-acid acetyltransferase
DE   (EC 2.3.1.1) (N-acetylglutamate synthase) (AGS)] [Contains: Arginine
DE   biosynthesis bifunctional protein argJ alpha chain; Arginine
DE   biosynthesis bifunctional protein argJ beta chain].

New format:

DE   RecName: Full=Arginine biosynthesis bifunctional protein argJ;
DE   Includes:
DE     RecName: Full=Glutamate N-acetyltransferase;
DE              EC=2.3.1.35;
DE     AltName: Full=Ornithine acetyltransferase;
DE              Short=OATase;
DE     AltName: Full=Ornithine transacetylase;
DE   Includes:
DE     RecName: Full=Amino-acid acetyltransferase;
DE              EC=2.3.1.1;
DE     AltName: Full=N-acetylglutamate synthase;
DE              Short=AGS;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ alpha chain;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ beta chain;
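Because every name in the new format carries an explicit category and subcategory, a block of DE lines can be read without guessing at parentheses. The Python sketch below illustrates this under stated assumptions: the function name `parse_de_block` and the shape of the returned dictionary are hypothetical, and the sketch relies on each line holding exactly one `Key=Value;` pair, as in the examples above.

```python
import re
from typing import Any, Dict, List

NAME_RE = re.compile(r"^(RecName|AltName|SubName): (.+)$")


def parse_de_block(lines: List[str]) -> Dict[str, Any]:
    """Parse the new-format DE lines of one entry into nested lists."""
    result: Dict[str, Any] = {
        "names": [], "includes": [], "contains": [], "flags": [],
    }
    target = result["names"]  # the list that currently receives names
    for line in lines:
        content = line[2:].strip()  # drop the "DE" line code and indentation
        if content in ("Includes:", "Contains:"):
            # Start a new section; the names that follow belong to it.
            target = []
            result[content[:-1].lower()].append(target)
            continue
        if content.startswith("Flags:"):
            flags = content[len("Flags:"):].split(";")
            result["flags"] = [flag.strip() for flag in flags if flag.strip()]
            continue
        match = NAME_RE.match(content)
        if match:
            # A category keyword opens a new name.
            target.append({"category": match.group(1), "fields": []})
            content = match.group(2)
        if not target:
            raise ValueError(f"subcategory without a category: {line!r}")
        key, _, value = content.rstrip(";").partition("=")
        target[-1]["fields"].append((key, value))
    return result
```

For the P09919 example this gives four names (one RecName with `Full` and `Short` fields, and three AltName entries) plus the flag `Precursor`.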
Changes in the FASTA header line

The UniProtKB FASTA headers were unfortunately incompatible with the -o option of the NCBI's program formatdb. We have been working with the NCBI to remedy this and changes were required on both sides. The new version of formatdb now accepts a database code for UniProtKB/TrEMBL, and we have modified our UniProtKB FASTA headers accordingly. For consistency reasons, we also changed the FASTA headers of the other UniProt databases.

UniProtKB
>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName[ GN=GeneName]PE=ProteinExistence SV=SequenceVersion
Where:

Examples:

>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_RALEH Acetoin catabolism protein X OS=Ralstonia eutropha (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus PE=1 SV=1

>tr|A3SA23|A3SA23_9RHOB TonB dependent, hydroxamate-type ferrisiderophore, outer membrane receptor OS=Sulfitobacter sp. EE-36 GN=EE36_08023 PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN CDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens PE=2 SV=1
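The UniProtKB header layout above can be captured with a single regular expression. The following Python sketch is an illustration rather than a reference implementation; the names `HEADER_RE` and `parse_uniprotkb_header` are hypothetical. It assumes that the protein name does not itself contain " OS=" and it covers only canonical sp/tr headers, so headers of alternative isoforms, which lack PE and SV, are rejected.

```python
import re
from typing import Dict, Union

HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+) "
    r"(?P<protein_name>.+?) OS=(?P<organism>.+?)"
    r"(?: GN=(?P<gene>.+?))? PE=(?P<existence>\d+) SV=(?P<version>\d+)$"
)


def parse_uniprotkb_header(header: str) -> Dict[str, Union[str, int, None]]:
    """Split a UniProtKB FASTA header into its named fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised UniProtKB header: {header!r}")
    fields: Dict[str, Union[str, int, None]] = dict(match.groupdict())
    # PE and SV are numeric; the optional GN field stays None when absent.
    fields["existence"] = int(match.group("existence"))
    fields["version"] = int(match.group("version"))
    return fields
```

With the P04224 example, which has no GN field, the `gene` value is `None`, while the organism is still read in full.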
Alternative isoforms (this only applies to UniProtKB/Swiss-Prot):
>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName[ GN=GeneName]
Where: ProteinExistence and SequenceVersion do not apply to alternative isoforms (ProteinExistence is dependent on the number of cDNA sequences, which is not known for individual isoforms).

Example:

>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis GN=YWHAB
UniRef
>UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember
Where:

Example:

>UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU
UniParc
>UniqueIdentifier status=Status
Where:

Example:

>UPI0000000005 status=active
UniMES
>UniqueIdentifier ProteinName OS=OrganismName[ Pep=SourcePeptideIdentifier]SV=SequenceVersion
Where:

Example:

>MES00000000005 Putative uncharacterized protein GOS_3018412 (Fragment) OS=marine metagenome Pep=JCVI_PEP_1096688850003 SV=1
Archived UniProtKB sequence versions
>db|UniqueIdentifier archived from Release ReleaseNumber ReleaseDate SV=SequenceVersion
Where:

Examples:

"pre-UniProt":
>sp|P05067 archived from Release 18.0 01-MAY-1991 SV=3
>tr|Q55167 archived from Release 17.0 01-JUN-2001 SV=1
"post-UniProt":
>sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3
>tr|A0RTJ8 archived from Release 11.0/36.0 29-MAY-2007 SV=1
New OG (OrGanelle) line value: Chromatophore

We have added Chromatophore to the list of valid plastid values in the OG line. The chromatophore is the photosynthetic inclusion found in Paulinella chromatophora, a photosynthetic thecate amoeba. It encodes and houses the machinery necessary for photosynthesis and CO2 fixation; it also has the genetic capacity to synthesize some amino acids, some fatty acids and a few cofactors. It is not yet clear whether the chromatophore derives from the same endosymbiotic event that is thought to have led to all other plastids. The chromatophore genome of P. chromatophora has been sequenced (PubMed:18356055) and been found to be just over 1 Mb, approximately 9 times larger than the average photosynthetic plastid and approximately 1/3 smaller than the smallest cyanobacterial genome.

Example:

OG   Plastid; Chromatophore.
Changes concerning cross-references (DR line)
BindingDB

Cross-references have been added to The Binding Database. BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug-targets with small, drug-like molecules.

The Binding Database is available at http://www.bindingdb.org/.

The format of the explicit link is:

Data bank identifier BindingDB
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P50613:
DR   BindingDB; P50613; -.

P68850:
DR   BindingDB; P68850; -.
UniProt decoy databases

The target-decoy search strategy, which has become widespread and is recommended in journal guidelines, consists of attaching a decoy database to a forward database and searching MS/MS spectra against this composite database. It is more stringent than a simple search, and allows the false discovery rate to be estimated.
For this strategy to be efficient, the decoy database has to preserve the general composition of the target database while minimizing the peptide sequence overlap between the target and the decoy.
We developed a new algorithm that shuffles proteins and keeps re-shuffling each tryptic peptide until it no longer matches with any peptide from the original database. This method ensures that no tryptic peptide is shared between the target and decoy databases.

Decoy versions of UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and UniRef100 can now be retrieved in FASTA format from our public FTP site.
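The re-shuffling idea described above can be illustrated with a short Python sketch. It is not the algorithm used to build the released decoy databases, whose details are not given here. The function names, the cleavage rule (after K or R, but not before P), the choice to keep the last residue of each peptide in place and the retry limit are all assumptions made for illustration.

```python
import random
import re
from typing import List, Optional, Set


def tryptic_peptides(sequence: str) -> List[str]:
    """Cut after K or R unless the next residue is P (assumed rule)."""
    # Splitting on a zero-width match requires Python 3.7 or later.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def shuffle_until_novel(
    peptide: str,
    target_peptides: Set[str],
    rng: random.Random,
    max_tries: int = 100,
) -> Optional[str]:
    """Re-shuffle a peptide until it matches no target peptide.

    Returns None when no novel arrangement is found within max_tries,
    which happens for peptides that are too short to rearrange.
    """
    # Assumption: the C-terminal residue stays fixed so the cleavage
    # site at the end of the peptide is preserved.
    body, last = list(peptide[:-1]), peptide[-1]
    for _ in range(max_tries):
        rng.shuffle(body)
        candidate = "".join(body) + last
        if candidate not in target_peptides:
            return candidate
    return None
```

A caller would build the set of all tryptic peptides of the target database once, then pass each peptide through `shuffle_until_novel`. The sketch ignores that shuffling may separate a K or R from a following P and thereby create a new cleavage site; a complete implementation would have to re-digest and re-check the shuffled protein.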

Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

New subcellular locations:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'CROSSLNK':

Terms for the feature key 'MOD_RES':

UniProtKB release 13.6 of 01-Jul-2008

New RX (Reference cross-reference) line value: AGRICOLA

The RX (Reference cross-reference) line is an optional line which is used to indicate cross-references to bibliographic databases. We have introduced cross-references to AGRICOLA, the National Agricultural Library's catalog of citations to agricultural literature. The valid bibliographic database names and their associated identifiers are now:

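  --------  ------------------------------------------
  Name      Identifier
  --------  ------------------------------------------
  MEDLINE   Eight-digit MEDLINE Unique Identifier (UI)
  PubMed    PubMed Unique Identifier (PMID)
  DOI       Digital Object Identifier (DOI)
  AGRICOLA  AGRICOLA Unique Identifier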

Example:

RX   AGRICOLA=IND20450567;
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.5 of 10-Jun-2008

Changes concerning cross-references (DR line)
HOGENOM

Cross-references have been added to the HOGENOM Database of Homologous Genes from Fully Sequenced Organisms. HOGENOM allows one to select sets of homologous genes among species, and to visualize multiple alignments and phylogenetic trees. It is also possible to search for orthologous genes in a wide range of taxa. Thus HOGENOM is particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOGENOM gives an overall view of what is known about a particular gene family.

The HOGENOM Database of Homologous Genes from Fully Sequenced Organisms is available at http://pbil.univ-lyon1.fr/databases/hogenom.php.

The format of the explicit link is:

Data bank identifier HOGENOM
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P0A9I1:
DR   HOGENOM; P0A9I1; -.

P49642:
DR   HOGENOM; P49642; -.
HOVERGEN

Cross-references have been added to the HOVERGEN Database of Homologous Vertebrate Genes. HOVERGEN allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees. Thus HOVERGEN is particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOVERGEN gives an overall view of what is known about a particular gene family.

The HOVERGEN Database of Homologous Vertebrate Genes is available at http://pbil.univ-lyon1.fr/databases/hovergen.php.

The format of the explicit link is:

Data bank identifier HOVERGEN
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P31946:
DR   HOVERGEN; P31946; -.

Q91ZB4:
DR   HOVERGEN; Q91ZB4; -.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.4 of 20-May-2008

Changes concerning cross-references (DR line)
CGD

Cross-references have been added to the Candida Genome Database. CGD is a resource for genomic sequence data and gene and protein information for Candida albicans. CGD is based on the Saccharomyces Genome Database and is funded by the National Institute of Dental and Craniofacial Research at the US National Institutes of Health.

The Candida Genome Database is available at http://www.candidagenome.org/.

The format of the explicit link is:

Data bank identifier CGD
Primary identifier The primary identifier consists of a CGD identifier.
Secondary identifier The secondary identifier consists of a gene name.
Examples
O74198:
DR   CGD; CAL0006397; ERG6.

Q59TD3:
DR   CGD; CAL0079252; MED8.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.3 of 29-Apr-2008

Changes concerning cross-references (DR line)
NMPDR

Cross-references have been added to the National Microbial Pathogen Data Resource. NMPDR is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context.

The National Microbial Pathogen Data Resource is available at http://www.nmpdr.org/.

The format of the explicit link is:

Data bank identifier NMPDR
Primary identifier The primary identifier consists of an NMPDR protein identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q88K84:
DR   NMPDR; fig|160488.1.peg.2385; -.

Q1QN15:
DR   NMPDR; fig|323097.3.peg.1480; -.
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.2 of 08-Apr-2008

Release of a new document which lists all the secondary UniProtKB accession numbers together with their corresponding current primary accession number(s).

The document sec_ac.txt, available by ftp, lists all secondary accession numbers in UniProtKB (UniProtKB/Swiss-Prot and UniProtKB/TrEMBL), together with their corresponding current primary accession number(s).

Changes concerning cross-references (DR line)
HIV

Cross-references to HIV have been removed.

TRANSFAC

Cross-references to TRANSFAC have been removed.

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.1 of 18-Mar-2008

Changes concerning cross-references (DR line)
ProMEX

Cross-references have been added to the Protein Mass spectra EXtraction database. ProMEX is a mass spectral library consisting of tryptic peptide product ion spectra generated by liquid chromatography coupled to ion trap mass spectrometry (LC-ITMS) and was developed using samples derived from Arabidopsis thaliana and Medicago truncatula. The database serves as a reference and can be used for protein identification in uncharacterized samples. Protein identification by ProMEX is linked to other molecular levels of biological organization such as metabolite, pathway and transcript data. The database is further connected to annotation and classification services.

The Protein Mass spectra EXtraction database is available at http://promex.mpimp-golm.mpg.de/.

The format of the explicit link is:

Data bank identifier ProMEX
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O80448:
DR   ProMEX; O80448; -.
   
P49200:
DR   ProMEX; P49200; -.
   
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 13.0 of 26-Feb-2008

Change of the representation of non-standard amino acids (selenocysteine and pyrrolysine)

Previously, the non-standard amino acid selenocysteine was annotated with the feature key SE_CYS and represented by the one-letter code 'C' in the sequence, while pyrrolysine was annotated with the more generic feature key MOD_RES and represented by the one-letter code 'K'. In order to annotate these and future non-standard amino acids in the same fashion, we replaced both the feature key SE_CYS and the feature key MOD_RES (when used with the description Pyrrolysine) with the new feature key NON_STD (non-standard), using the descriptions Selenocysteine and Pyrrolysine, as appropriate. At the same time, we changed the sequence to use the IUPAC/IUBMB recommended one-letter codes 'U' for selenocysteine and 'O' for pyrrolysine.

Previous annotation:

ID   BTHD_DROME              Reviewed;         249 AA.
..
FT   SE_CYS       37     37
..
     MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSCRVF RRRAEELHSA LRERGLQQLQ
                                            *
ID   MTBB1_METAC             Reviewed;         467 AA.
..
FT   MOD_RES     356    356       Pyrrolysine (Probable).
..
     RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIKIGVG
                                                                 *

New annotation:

ID   BTHD_DROME              Reviewed;         249 AA.
..
FT   NON_STD      37     37       Selenocysteine.
..
     MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSURVF RRRAEELHSA LRERGLQQLQ
                                            *
ID   MTBB1_METAC             Reviewed;         467 AA.
..
FT   NON_STD     356    356       Pyrrolysine (Probable).
..
     RAVNFMKAAV QASPIPCHVD MGMGVGGIPM LETPPVDAVT RASKAMVEVA GVDGIOIGVG
                                                                 *
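
Software that reads UniProtKB sequences may need to cope with the new one-letter codes. The following sketch locates 'U' and 'O' residues in a sequence block; the function name `find_non_standard` is an illustrative assumption, and positions are counted from 1 after removing the blanks used for layout in the flat file.

```python
NON_STANDARD = {"U": "Selenocysteine", "O": "Pyrrolysine"}


def find_non_standard(sequence):
    """Return (position, code, description) for each 'U' or 'O' residue.

    Positions are 1-based; blanks and line breaks in the flat-file
    sequence block are ignored.
    """
    residues = "".join(sequence.split())
    return [
        (index, code, NON_STANDARD[code])
        for index, code in enumerate(residues, start=1)
        if code in NON_STANDARD
    ]


# First sequence line of BTHD_DROME in the new annotation.
print(find_non_standard(
    "MPPKRNKKAE APIAERDAGE ELDPNAPVLY VEHCRSURVF RRRAEELHSA LRERGLQQLQ"
))  # prints [(37, 'U', 'Selenocysteine')]
```

The reported position agrees with the `FT   NON_STD      37     37` line of the entry.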
Changes concerning cross-references (DR line)
PhosphoSite

Cross-references have been added to the Phosphorylation site database. PhosphoSite is an expert-curated knowledgebase of information focused on protein phosphorylation mainly in vertebrates. In addition to phosphorylation sites curated from the literature, large numbers of new unpublished sites discovered by MS/MS analyses are being added regularly.

The Phosphorylation site database is available at http://phosphosite.cellsignal.com/.

The format of the explicit link is:

Data bank identifier PhosphoSite
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P01266:
DR   PhosphoSite; P01266; -.
   
Q9JMH6:
DR   PhosphoSite; Q9JMH6; -.
   
2DBase-Ecoli

Cross-references have been added to the 2D-PAGE Database of Escherichia coli. The 2DBase-Ecoli database currently contains 12 gels with information on 1185 protein spots, in which 723 proteins were identified and annotated. Individual protein spots in the existing gels can be displayed, queried, analysed and compared in a tabular format based on various functional categories, enabling quick subsequent analysis.

The 2D-PAGE Database of Escherichia coli is available at http://2dbase.techfak.uni-bielefeld.de/.

The format of the explicit link is:

Data bank identifier 2DBase-Ecoli
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P02930:
DR   2DBase-Ecoli; P02930; -.
   
P04816:
DR   2DBase-Ecoli; P04816; -.
   
Changes concerning keywords (KW line)

New keywords:

New subcellular locations:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'MOD_RES':

UniProtKB release 12.8 of 05-Feb-2008

Changes concerning cross-references (DR line)
World-2DPAGE

Cross-references have been added to the public repository of 2D-gel data World-2DPAGE. All 2D gel data to be published in the journal Proteomics need to be available on the web. The World-2DPAGE repository hosts the data for resources that cannot build and maintain a web interface. There are currently two data sources submitted to World-2DPAGE, which are numbered consecutively:

The format of the explicit link is:

Data bank identifier World-2DPAGE
Primary identifier The primary identifier is a combination of the database name and the accession number (usually from UniProtKB) in this database. Both are concatenated by a ":".
Secondary identifier None; a dash '-' is stored in that field.
Examples
P61108:
DR   World-2DPAGE; 0002:P61108; -.
   
P77845:
DR   World-2DPAGE; 0001:P77845; -.
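
As a small illustration of the combined primary identifier, the sketch below separates the database number from the accession number at the ":". The function name `split_world_2dpage_id` is hypothetical.

```python
def split_world_2dpage_id(primary_id):
    """Split an identifier such as '0002:P61108' into (database, accession)."""
    database, sep, accession = primary_id.partition(":")
    if not sep or not database or not accession:
        raise ValueError("expected 'database:accession', got %r" % primary_id)
    return database, accession


print(split_world_2dpage_id("0002:P61108"))  # ('0002', 'P61108')
```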
   
Cornea-2DPAGE, DOSAC-COBS-2DPAGE, HSC-2DPAGE, REPRODUCTION-2DPAGE, SWISS-2DPAGE

In cross-references to Cornea-2DPAGE, DOSAC-COBS-2DPAGE, HSC-2DPAGE, REPRODUCTION-2DPAGE and SWISS-2DPAGE, the secondary identifier used to be the species origin. The species information has become obsolete/redundant since UniProtKB/Swiss-Prot no longer contains entries describing the same protein from different species (see Release 6.7). We have therefore removed the species information from these secondary identifiers and replaced them by "-".

Examples:

Previous format:

DR   SWISS-2DPAGE; P04217; HUMAN.
DR   Cornea-2DPAGE; P04217; HUMAN.
DR   DOSAC-COBS-2DPAGE; P04217; HUMAN.
DR   REPRODUCTION-2DPAGE; P04217; HUMAN.

New format:

DR   SWISS-2DPAGE; P04217; -.
DR   Cornea-2DPAGE; P04217; -.
DR   DOSAC-COBS-2DPAGE; P04217; -.
DR   REPRODUCTION-2DPAGE; P04217; -.
Release of a new document which provides the classification of human and mouse protein kinases into subfamilies or subgroups.

The document pkinfam.txt, available by ftp and on the Web site, provides the classification of human and mouse protein kinases into subfamilies or subgroups, as developed by Gerard Manning. The classification from Diego Miranda-Saavedra has also been taken into account.

This document contains all the human and mouse protein kinase UniProtKB/Swiss-Prot entries, subdivided into 10 subfamilies or subgroups. Each gene name is followed by the corresponding human and/or mouse 'UniProtKB/Swiss-Prot entry name (UniProtKB/Swiss-Prot accession number)'.

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 12.7 of 15-Jan-2008

New clustered sequence sets for UniMES

The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data. We now provide UniMES clusters, i.e. clustered sets of sequences, at two resolutions: 100% (unimes_cluster100.fasta) and >90% (unimes_cluster90.fasta). In unimes_cluster100.fasta, identical sequences and subfragments from unimes.fasta are placed into a single cluster. The unimes_cluster90.fasta file is built by clustering unimes_cluster100.fasta representative sequences (the longest sequence in a cluster) using the CD-HIT algorithm (Li W., Jaroszewski L., and Godzik A., Bioinformatics, 17: 282-283, 2001) such that each cluster is composed of sequences that have at least 90% sequence identity to the representative sequence. Only the representative sequences of the clusters are present in these files.

UniMES is available in the subdirectory current_release/unimes of the UniProt ftp servers ftp.uniprot.org/pub/databases/uniprot, ftp.ebi.ac.uk/pub/databases/uniprot and ftp.expasy.org/databases/uniprot.

Changes concerning cross-references (DR line)
dictyBase

The DictyBase database was renamed to dictyBase. We changed the database name in the relevant cross-references (DR lines) accordingly.

Example:

DR   dictyBase; DDB0201569; manA.
PDBsum

Cross-references have been added to the PDBsum database. PDBsum provides an overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.

The PDBsum database is available at http://www.ebi.ac.uk/pdbsum.

The format of the explicit link is:

Data bank identifier PDBsum
Primary identifier The primary identifier consists of a PDB entry name.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q07540:
DR   PDBsum; 2FQL; -.
DR   PDBsum; 2GA5; -.
   
P78536:
DR   PDBsum; 1BKC; -.
DR   PDBsum; 1ZXC; -.
DR   PDBsum; 2A8H; -.
DR   PDBsum; 2DDF; -.
DR   PDBsum; 2FV5; -.
DR   PDBsum; 2FV9; -.
DR   PDBsum; 2I47; -.
   
VectorBase

Cross-references have been added to the Invertebrate Vectors of Human Pathogens database. VectorBase is a NIAID Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community.

The VectorBase database is available at http://www.vectorbase.org/index.php.

The format of the explicit link is:

Data bank identifier VectorBase
Primary identifier The primary identifier consists of a VectorBase Gene ID.
Secondary identifier The secondary identifier consists of a species name.
Examples
Q17KX3:
DR   VectorBase; AAEL001551; Aedes aegypti.
   
Q7PD39:
DR   VectorBase; AGAP005024; Anopheles gambiae.
DR   VectorBase; AGAP005025; Anopheles gambiae.
   
Release of new species-specific documents which list entries and their corresponding gene designations

There are 9 new documents for several Brucella, Rickettsia and Coxiella complete proteomes, listing all the UniProtKB/Swiss-Prot entries from these proteomes and their corresponding gene designations.

The documents contain, for each relevant UniProtKB/Swiss-Prot entry, the corresponding ordered locus name, entry name, accession number, sequence length and gene name(s).

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Changes concerning the controlled vocabulary of subcellular locations and membrane topologies and orientations (comment line (CC) topic SUBCELLULAR LOCATION)

New subcellular locations:

UniProtKB release 12.6 of 04-Dec-2007

Changes concerning keywords (KW line)

Deleted keyword:

UniProtKB release 12.5 of 13-Nov-2007

Format change in the ptmlist.txt document file

The ptmlist.txt document, which is available by ftp and on the Web site, describes the post-translational modifications (PTMs) that are annotated in UniProtKB/Swiss-Prot entries in the feature (FT) keys CROSSLNK, LIPID and MOD_RES. The document was in a format that is suitable for computer applications (e.g. ExPASy's proteomics tools) but which was not very human readable. The new file format should improve this.

Previous format:

N,N-dimethylproline  MOD_RES P  BB Nter C2H4  28.031300  28.06  in  e:6446,7586,33682  Methylation  FT=MOD_RES%20dimethylproline&wild=1  AA0066  MOD:00075

New format:

ID   N,N-dimethylproline
AC   PTM-0179
FT   MOD_RES
TG   Proline.
PA   Amino acid backbone.
PP   N-terminal.
CF   C2 H4
MM   28.031300
MA   28.06
LC   Intracellular localisation.
TR   Eukaryota; taxId:6446 (Sipunculus nudus), taxId:7586 (Echinodermata), taxId:33682 (Euglenozoa).
KW   Methylation.
DR   RESID:AA0066.
DR   MOD:00075.
//

With the following definitions of the line types:

  ---------  ---------------------------     ----------------------
  Line code  Content                         Occurrence in an entry
  ---------  ---------------------------     ----------------------
  ID         Identifier (FT description)     Once; starts a PTM entry.
  AC         Accession (PTM-xxxx)            Once.
  FT         Feature key                     Once.
  TG         Target                          Once; two targets separated
                                             by a dash in case of intrachain
                                             crosslinks.
  PA         Position of the modified        Optional, once.
             amino acid
  PP         Position of the modification    Optional, once.
             in the polypeptide
  CF         Correction formula              Optional, once.
  MM         Monoisotopic mass difference    Optional, once.
  MA         Average mass difference         Optional, once.
  LC         Cellular location               Optional, once; alternatives
                                             can be proposed.
  TR         Taxonomic range                 Optional, once or more.
  KW         Keyword                         Optional, once or more.
  DR         Cross-reference to PTM          Optional, once or more.
             databases
  //         Terminator                      Once; ends an entry.
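
The line-code layout lends itself to a simple reader. The following sketch, which is only one possible approach, collects the lines of each entry into a dictionary keyed by line code; the function name `parse_ptm_entries` is an assumption, and the sample text is an abridged copy of the entry shown above.

```python
def parse_ptm_entries(text):
    """Yield one dict per entry, mapping each line code to a list of values."""
    entry = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):
            # Terminator: the current entry is complete.
            if entry:
                yield entry
            entry = {}
            continue
        code, value = line[:2], line[5:].strip()
        # Codes such as TR, KW and DR may occur more than once.
        entry.setdefault(code, []).append(value)


EXAMPLE = """\
ID   N,N-dimethylproline
AC   PTM-0179
FT   MOD_RES
TG   Proline.
MM   28.031300
KW   Methylation.
DR   RESID:AA0066.
DR   MOD:00075.
//
"""

for ptm in parse_ptm_entries(EXAMPLE):
    print(ptm["ID"][0], ptm["AC"][0], float(ptm["MM"][0]), ptm["DR"])
```

Every value is stored in a list so that optional, repeatable line types need no special handling.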
Changes concerning cross-references (DR line)
PDB

We added an additional field to the cross-reference (DR line) to the PDB database to show the resolution of structures that were determined by X-ray crystallography or electron microscopy.

For the chain names, we now use the remediated data from wwPDB; the chain names have therefore changed for some entries.

Previous format:

DR   PDB; ENTRY_NAME; METHOD; CHAIN.

New format:

DR   PDB; ENTRY_NAME; METHOD; RESOLUTION; CHAIN.

Examples:

Q20728:
DR   PDB; 1LPL; X-ray; 1.77 A; A=135-229.
Q5HEB7:
DR   PDB; 2I8C; X-ray; 2.46 A; A/B=1-356.   

A dash indicates that we found no information about the resolution or that the field is not applicable (for NMR structures and theoretical models).

Examples:

P02768:
DR   PDB; 2ESG; X-ray; -; C=25-609.
P12872:
DR   PDB; 1LBJ; NMR; -; A=26-47.   
P0AC41:
DR   PDB; 2AD0; Model; -; A=1-588.  
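
Parsers written for the previous four-field format need to account for the new RESOLUTION field. The sketch below reads the new format and converts the resolution to a number, or to `None` when a dash is stored; the function name `parse_pdb_dr` and the returned dictionary keys are illustrative assumptions.

```python
def parse_pdb_dr(line):
    """Parse 'DR   PDB; ENTRY_NAME; METHOD; RESOLUTION; CHAIN.'"""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    body = line[5:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating full stop
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 5 or fields[0] != "PDB":
        raise ValueError("not a five-field PDB cross-reference: %r" % line)
    _, entry, method, resolution, chains = fields
    # '-' means unknown or not applicable; otherwise the value looks like '1.77 A'.
    angstroms = None if resolution == "-" else float(resolution.split()[0])
    return {"entry": entry, "method": method,
            "resolution": angstroms, "chains": chains}


print(parse_pdb_dr("DR   PDB; 1LPL; X-ray; 1.77 A; A=135-229."))
print(parse_pdb_dr("DR   PDB; 1LBJ; NMR; -; A=26-47."))
```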
CleanEx

Cross-references have been added to the CleanEx database of gene expression profiles. CleanEx is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons.

The CleanEx database is available at http://www.cleanex.isb-sib.ch/.

The format of the explicit link is:

Data bank identifier CleanEx
Primary identifier The primary identifier consists of a combination of a species code and a gene identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O08788:
DR   CleanEx; MM_DCTN1; -.    
   
P78358:
DR   CleanEx; HS_CTAG1A; -.
DR   CleanEx; HS_CTAG1B; -.
   
Changes concerning keywords (KW line)

Modified keywords:

UniProtKB release 12.4 of 23-Oct-2007

Release of a new document which lists the controlled vocabularies used in the comment line (CC) topic SUBCELLULAR LOCATION

The document subcell.txt, available by ftp and on the Web site, lists the controlled vocabularies used in the comment line (CC) topic SUBCELLULAR LOCATION, their definitions and further information such as synonyms or relevant GO terms in the following format:

  ---------  -------------------------------   ----------------------------------------------
  Line code  Content                           Occurrence in an entry
  ---------  -------------------------------   ----------------------------------------------
  ID         Identifier (location)             Once; starts an entry
  IT         Identifier (topology)             Once; starts a 'topology' entry
  IO         Identifier (orientation)          Once; starts an 'orientation' entry
  AC         Accession (SL-xxxx)               Once
  DE         Definition                        Once or more
  SY         Synonyms                          Optional; Once or more
  SL         Content of subc. loc. lines       Once
  HI         Hierarchy ('is-a')                Optional; Once or more
  HP         Hierarchy ('part-of')             Optional; Once or more
  KW         Associated keyword (accession)    Optional; Once or more
  GO         Gene ontology (GO) mapping        Optional; Once or more
  WW         Interesting links or references   Optional; Once or more
  //         Terminator                        Once; ends an entry
  

Example:

ID   Cyanelle.
AC   SL-0082
DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.
DE   Cyanelles are surrounded by a double membrane and, in between, a
DE   peptidoglycan wall. Thylakoid membrane architecture and the presence
DE   of carboxysomes are cyanobacteria-like. Historically, the term
DE   cyanelle is derived from a classification as endosymbiotic
DE   cyanobacteria, and thus is not fully correct.
SY   Muroplast; Cyanoplast.
SL   Plastid, cyanelle.
HI   Plastid.
KW   KW-0194
GO   GO:0009842; cyanelle
//
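
To show how the line codes of subcell.txt might be consumed, here is a minimal sketch that summarizes one entry. The function name `summarize_location` and the keys of the returned dictionary are assumptions made for illustration; the input is the Cyanelle entry shown above.

```python
def summarize_location(lines):
    """Summarize one subcell.txt entry given as a list of lines."""
    fields = {}
    for line in lines:
        if line.startswith("//"):
            break  # terminator: the entry ends here
        fields.setdefault(line[:2], []).append(line[5:].strip())
    return {
        "accession": fields["AC"][0],
        # DE may span several lines; join them into one definition.
        "definition": " ".join(fields.get("DE", [])),
        "synonyms": [
            name.strip()
            for value in fields.get("SY", [])
            for name in value.rstrip(".").split(";")
        ],
        "is_a": [value.rstrip(".") for value in fields.get("HI", [])],
        "go": [
            tuple(part.strip() for part in value.split(";", 1))
            for value in fields.get("GO", [])
        ],
    }


CYANELLE = [
    "ID   Cyanelle.",
    "AC   SL-0082",
    "DE   A cyanelle is a photosynthetic organelle of glaucocystophyte algae.",
    "DE   Cyanelles are surrounded by a double membrane and, in between, a",
    "DE   peptidoglycan wall. Thylakoid membrane architecture and the presence",
    "DE   of carboxysomes are cyanobacteria-like. Historically, the term",
    "DE   cyanelle is derived from a classification as endosymbiotic",
    "DE   cyanobacteria, and thus is not fully correct.",
    "SY   Muroplast; Cyanoplast.",
    "SL   Plastid, cyanelle.",
    "HI   Plastid.",
    "KW   KW-0194",
    "GO   GO:0009842; cyanelle",
    "//",
]

summary = summarize_location(CYANELLE)
print(summary["accession"], summary["synonyms"], summary["is_a"], summary["go"])
```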
  
Syntax modification of the comment line (CC) topic SUBCELLULAR LOCATION

We have structured the comment line topic SUBCELLULAR LOCATION in order to improve the consistency of annotation and to allow its content to be parsed.

The new format of SUBCELLULAR LOCATION is:

CC   -!- SUBCELLULAR LOCATION:(( Molecule:)?( Location\.)+)?( Note=Free_text( Flag)?\.)?
  
Where:

Note: Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+). Alternative values are separated by a pipe symbol (|).

Examples:

P32755:
CC   -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;
CC       Peripheral membrane protein. Golgi apparatus membrane; Peripheral
CC       membrane protein.
  
Q96QV1:
CC   -!- SUBCELLULAR LOCATION: Cell membrane; Peripheral membrane protein
CC       (By similarity). Secreted (By similarity). Note=The last 22 C-
CC       terminal amino acids may participate in cell membrane attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm (Probable).
  
P35670:
CC   -!- SUBCELLULAR LOCATION: Golgi apparatus, trans-Golgi network
CC       membrane; Multi-pass membrane protein (By similarity).
CC       Note=Predominantly found in the trans-Golgi network (TGN). Not
CC       redistributed to the plasma membrane in response to elevated
CC       copper levels.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm.
CC   -!- SUBCELLULAR LOCATION: WND/140 kDa: Mitochondrion.
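
The structured syntax can be parsed with a few regular expressions. The sketch below is a simplified reading of the format, not a reference implementation: the function names are hypothetical, it assumes that molecule names contain neither '.' nor ':', it leaves flags such as "(By similarity)" attached to the location they follow, and it joins wrapped lines with a single blank (so a word hyphenated across a line break keeps a stray blank).

```python
import re

PREFIX = "CC   -!- SUBCELLULAR LOCATION:"


def topic_text(cc_lines):
    """Join the wrapped CC lines of one topic into a single string."""
    parts = []
    for line in cc_lines:
        if line.startswith(PREFIX):
            parts.append(line[len(PREFIX):].strip())
        else:
            parts.append(line[2:].strip())  # continuation line: drop 'CC'
    return " ".join(parts)


def parse_subcellular_location(text):
    """Return (molecule, locations, note) for one topic text."""
    body, note = text.strip(), None
    # Everything after 'Note=' is free text and may contain full stops.
    found = re.search(r"(?:^|\s)Note=(.*)$", body)
    if found:
        note = found.group(1).strip()
        body = body[:found.start()].strip()
    molecule = None
    # An optional molecule name precedes the first location and ends in ':'.
    found = re.match(r"([^.:]+):\s+(.*)$", body)
    if found:
        molecule, body = found.group(1), found.group(2)
    locations = [
        loc.strip() for loc in re.split(r"\.(?:\s+|$)", body) if loc.strip()
    ]
    return molecule, locations, note


P32755 = [
    "CC   -!- SUBCELLULAR LOCATION: Cytoplasm. Endoplasmic reticulum membrane;",
    "CC       Peripheral membrane protein. Golgi apparatus membrane; Peripheral",
    "CC       membrane protein.",
]
print(parse_subcellular_location(topic_text(P32755)))
print(parse_subcellular_location("WND/140 kDa: Mitochondrion."))
```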
  
Modification of the EC (Enzyme Commission) number format

EC numbers are used to describe enzyme reactions and are based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The EC numbers and the reactions they describe are stored in the ENZYME and IntEnz databases.

In the UniProt Knowledgebase, some enzymes are assigned so-called partial EC numbers, in which part of the number is replaced by dashes (e.g. EC 3.4.24.-). This happens in the following situations:

  1. The catalytic activity of the protein is not known exactly.
  2. The protein catalyzes a reaction that is known, but not yet included in the IUBMB EC list.

To distinguish these two meanings, we have started to use the letter 'n' with a preliminary number instead of a dash '-' for the latter case. Retrofitting the existing EC numbers of UniProtKB proteins that catalyze a known reaction not yet included in the IUBMB EC list will be an ongoing process.

Examples:

The catalytic activity of the protein is not known exactly:

Q9VAC5:
DE   ADAM 17-like protease precursor (EC 3.4.24.-).

The protein catalyzes a reaction that is known, but not yet included in the IUBMB's EC list:

Q9ES52:
DE   Phosphatidylinositol-3,4,5-trisphosphate 5-phosphatase 1 (EC 3.1.3.n1)
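
The distinction between the two cases can be checked mechanically. The following sketch classifies an EC number as complete, partial or preliminary; the function name and the three labels are illustrative, and the pattern assumes that the letter 'n' only occurs in the fourth field, as in the example above.

```python
import re

# Four dot-separated fields; later fields may be a dash, and the last
# field may be a preliminary number of the form 'n' plus digits.
EC_PATTERN = re.compile(r"^(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|n\d+|-)$")


def classify_ec_number(ec):
    """Return 'complete', 'partial' or 'preliminary' for an EC number."""
    match = EC_PATTERN.match(ec)
    if match is None:
        raise ValueError("unrecognised EC number: %r" % ec)
    fields = match.groups()
    if any(field.startswith("n") for field in fields):
        return "preliminary"  # known reaction, not yet in the IUBMB list
    if "-" in fields:
        return "partial"      # catalytic activity not known exactly
    return "complete"


print(classify_ec_number("3.4.24.-"))   # partial
print(classify_ec_number("3.1.3.n1"))   # preliminary
```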
UniProtKB release 12.3 of 02-Oct-2007

Changes concerning the comment line (CC) topic MASS SPECTROMETRY

To be consistent with other comment line topics, we have changed the field tags of the topic MASS SPECTROMETRY. At the same time, we have extracted literature references into a new field, Source=, and replaced all molecule descriptions by isoform identifiers.

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=mass(; MW_ERR=error)?; METHOD=method; RANGE=ranges( (molecule))?; NOTE=(references|free_text (references)).
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=mass(; Mass_error=error)?; Method=method; Range=ranges( (IsoformID))?(; Note=free_text)?; Source=references;
  

Examples:

P61409:

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=3979.9; METHOD=Electrospray; RANGE=1-31;
   CC       NOTE=Ref.1, Ref.2.
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=3979.9; Method=Electrospray; Range=1-31;
   CC       Source=Ref.1, Ref.2;
  
P04653:

Previous format:

   CC   -!- MASS SPECTROMETRY: MW=23638.14; MW_ERR=3.0; METHOD=Electrospray;
   CC       RANGE=16-214 (P04653-2; Allele A); NOTE=With eleven phosphate
   CC       groups (Ref.2).
  

New format:

   CC   -!- MASS SPECTROMETRY: Mass=23638.14; Mass_error=3.0; Method=Electrospray;
   CC       Range=16-214 (P04653-2); Note=Allele A, with 11 phosphate groups;
   CC       Source=PubMed:7601973;
  

Note that literature references of the form Ref.n are replaced by PubMed identifiers where this is possible.
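
Because every field of the new format is a `Tag=value` pair terminated by a semicolon, the topic can be split with very little code. The sketch below assumes that free text in Note= contains no semicolon; the function name `parse_mass_spectrometry` is hypothetical.

```python
def parse_mass_spectrometry(text):
    """Split the new-format topic text into a dict of its fields."""
    fields = {}
    for chunk in text.split(";"):
        key, sep, value = chunk.strip().partition("=")
        if sep:
            fields[key] = value.strip()
    # The mass is numeric; the other fields are kept as text.
    if "Mass" in fields:
        fields["Mass"] = float(fields["Mass"])
    return fields


P04653 = (
    "Mass=23638.14; Mass_error=3.0; Method=Electrospray; "
    "Range=16-214 (P04653-2); Note=Allele A, with 11 phosphate groups; "
    "Source=PubMed:7601973;"
)
print(parse_mass_spectrometry(P04653))
```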

Changes concerning cross-references (DR line)
RefSeq

Cross-references have been added to the NCBI Reference Sequences database. The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products for taxonomically diverse organisms including eukaryotes, bacteria, and viruses. RefSeq is a baseline for medical, functional, and diversity studies; it provides a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses.

The RefSeq database is available at http://www.ncbi.nlm.nih.gov/RefSeq/.

The format of the explicit link is:

Data bank identifier RefSeq
Primary identifier The primary identifier consists of a RefSeq protein accession ID.
Secondary identifier None; a dash '-' is stored in that field.
Examples
O34697:
DR   RefSeq; NP_390916.1; -.

Q8IN81:
DR   RefSeq; NP_524397.2; -.
DR   RefSeq; NP_732344.1; -.
DR   RefSeq; NP_732345.1; -.
DR   RefSeq; NP_732346.1; -.
DR   RefSeq; NP_732347.1; -.
DR   RefSeq; NP_732348.1; -.
DR   RefSeq; NP_732349.1; -.
DR   RefSeq; NP_732350.1; -.
     
GeneID

Cross-references have been added to the Database of genes from NCBI RefSeq genomes. Entrez Gene is the NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

The GeneID database is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene.

The format of the explicit link is:

Data bank identifier GeneID
Primary identifier The primary identifier consists of a GeneID accession ID.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P63272:
DR   GeneID; 6827; -.

P74750:
DR   GeneID; 951978; -.
DR   GeneID; 953863; -.
     
Change in the name of the documentation file orysa.txt

We changed the name of the documentation file orysa.txt, which is an index of Oryza sativa subsp. japonica (rice) entries and their corresponding gene designations, to rice.txt.

UniProtKB release 12.2 of 11-Sep-2007

Changes concerning the comment line (CC) topic WEB RESOURCE

To be consistent with other comment line topics, we have changed the topic WEB RESOURCE from

CC   -!- WEB RESOURCE: NAME=resource_name(; NOTE=free_text)?; URL="url".
to
CC   -!- WEB RESOURCE: Name=resource_name(; Note=free_text)?; URL="url";
Format change in the dbxref.txt and jourlist.txt document files

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. The jourlist.txt file lists the titles and abbreviations of all journals cited in the Swiss-Prot section of the UniProt Knowledgebase. We have added a new field, AC, to assign a stable identifier to each record in these files.

Examples:

dbxref.txt

AC    : DB-0022
Abbrev: EMBL
Name  : EMBL nucleotide sequence database
Ref   : Nucleic Acids Res. 35:D16-D20(2007); PubMed=17148479; DOI=10.1093/nar/gkl913;
LinkTp: Explicit
Server: http://www.ebi.ac.uk/embl/
Db_URL: www.ebi.ac.uk/htbin/expasyfetch?%s
Cat   : Sequence databases

jourlist.txt

AC    : JN-1120
Abbrev: J. Mol. Biol.
Title : Journal of Molecular Biology
ISSN  : 0022-2836
e-ISSN: 1089-8638
CODEN : JMOBAK
Short : JMB
Publis: Elsevier Science
Server: http://www.elsevier.com/locate/issn/00222836
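
Records in both files use a "label: value" layout, so the new AC field can be read in the same way as the existing ones. The sketch below parses one record and, as an assumption not stated in these notes, treats the '%s' in Db_URL as a placeholder for an entry identifier; the function names and the identifier 'X00001' are made up for illustration.

```python
def parse_record(lines):
    """Map each field label of a dbxref.txt or jourlist.txt record to its value."""
    record = {}
    for line in lines:
        # Split at the first colon only: values such as URLs contain colons.
        label, sep, value = line.partition(":")
        if sep:
            record[label.strip()] = value.strip()
    return record


def entry_url(record, identifier):
    """Fill the '%s' placeholder of Db_URL (assumed to stand for an identifier)."""
    return record["Db_URL"].replace("%s", identifier)


EMBL = parse_record([
    "AC    : DB-0022",
    "Abbrev: EMBL",
    "Name  : EMBL nucleotide sequence database",
    "LinkTp: Explicit",
    "Server: http://www.ebi.ac.uk/embl/",
    "Db_URL: www.ebi.ac.uk/htbin/expasyfetch?%s",
    "Cat   : Sequence databases",
])
print(EMBL["AC"], EMBL["Server"])
print(entry_url(EMBL, "X00001"))  # 'X00001' is a made-up identifier
```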
UniProtKB release 12.1 of 21-Aug-2007

Change of release cycle

We are changing our release cycle from 2 to 3 weeks, i.e. release 12.2 is going to be published on Sep 11th, 2007.

Changes concerning cross-references (DR line)
RZPD-ProtExp

Cross-references to RZPD-ProtExp have been removed.

UniProtKB release 12.0 of 24-Jul-2007

Introduction of the new line type PE (Protein Existence)

Most protein sequences are derived from translations of gene predictions. Some of them exhibit strong sequence similarity to known proteins in closely related species. For other proteins there is experimental evidence, such as Edman sequencing, clear identification by mass spectrometry (MSI), X-ray or NMR structure, detection by antibodies, etc. To indicate these different levels of evidence for the existence of a protein, we have introduced the PE (Protein Existence) line.

Note that the PE line does not describe the accuracy or correctness of a sequence displayed in UniProtKB, but the evidence for the existence of a protein. It may happen that the protein sequence is not entirely accurate, especially for sequences derived from gene predictions from genomic sequences.

The format of the PE line is:

PE   Level: Evidence;
With the following values:

Example:

PE   1: Evidence at protein level;

The PE line appears between the DR and KW lines of UniProtKB entries.
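
A parser can read the new line type with a single pattern. The sketch below returns the numeric level and the evidence text; the function name `parse_pe_line` is hypothetical, and the pattern assumes exactly the layout `PE   Level: Evidence;` given above.

```python
import re

PE_PATTERN = re.compile(r"^PE   (\d+): (.+);$")


def parse_pe_line(line):
    """Return (level, evidence) from a PE line."""
    match = PE_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("not a PE line: %r" % line)
    return int(match.group(1)), match.group(2)


print(parse_pe_line("PE   1: Evidence at protein level;"))
# prints (1, 'Evidence at protein level')
```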

Modification of the RL (Reference Location) line for submissions

The format of the RL line for submissions is:

RL   Submitted (MMM-YYYY) to DatabaseName.

We have replaced the DatabaseName value Swiss-Prot by UniProtKB. The full list of valid DatabaseName values is now:

Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keyword:

UniProtKB release 11.3 of 10-Jul-2007

Changes concerning cross-references (DR line)
PharmGKB

Cross-references have been added to the PharmGKB database. PharmGKB curates information that establishes knowledge about the relationships among drugs, diseases and genes, including their variations and gene products. It is a repository for genetic, genomic, molecular and cellular phenotype data and clinical information about people who have participated in pharmacogenomics research studies. The data includes, but is not limited to, clinical and basic pharmacokinetic and pharmacogenomic research in the cardiovascular, pulmonary, cancer, pathways, metabolic and transporter domains.

The PharmGKB database is available at http://www.pharmgkb.org/.

The format of the explicit link is:

Data bank identifier PharmGKB
Primary identifier The primary identifier consists of a PharmGKB accession ID.
Secondary identifier None; a dash '-' is stored in that field.
Example
Q96S55:
DR   PharmGKB; PA134982239; -.
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 11.2 of 26-Jun-2007

Changes concerning keywords (KW line)

New keyword:

Modified keywords:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'CROSSLNK':

Terms for the feature key 'LIPID':

Terms for the feature key 'MOD_RES':

UniProtKB release 11.1 of 12-Jun-2007

Changes concerning cross-references (DR line)
PeptideAtlas

Cross-references have been added to the PeptideAtlas database. PeptideAtlas is a multi-organism, publicly accessible compendium of peptides that have been identified in a large set of tandem mass spectrometry proteomics experiments. All results of sequence searching have subsequently been processed through PeptideProphet to derive a probability of correct identification for all results in a uniform manner to ensure a high-quality database. All peptides have been mapped to Ensembl and can be viewed as custom tracks on the Ensembl Genome Browser.

The PeptideAtlas database is available at http://www.peptideatlas.org/.

The format of the explicit link is:

Data bank identifier PeptideAtlas
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier None; a dash '-' is stored in that field.
Example
P08524:
DR   PeptideAtlas; P08524; -.
   
DisProt

Cross-references have been added to the Database of Protein Disorder (DisProt). DisProt is a curated database that provides information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part. DisProt is a collaborative effort between the Center for Computational Biology and Bioinformatics at Indiana University School of Medicine and the Center for Information Science and Technology at Temple University.

The DisProt database is available at http://www.disprot.org/.

The format of the explicit link is:

Data bank identifier DisProt
Primary identifier The primary identifier consists of a DisProt accession number.
Secondary identifier None; a dash '-' is stored in that field.
Example
P07293:
   DR   DisProt; DP00228; -.
   DR   DisProt; DP00440; -.
   
UniProtKB release 11.0 of 29-May-2007

New ftp directory for UniProt Metagenomic and Environmental Sequences (UniMES)

We are pleased to announce a new UniProt database. The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data. Currently the database contains only data from the Global Ocean Sampling Expedition (GOS). The environmental sample data contained within this database is not present in either the UniProt Knowledgebase or the UniProt Reference Clusters. UniMES is released in FASTA format and, to add further value, we have collaborated with the InterPro team to provide a file containing InterPro matches to UniMES.

UniMES is available in the new subdirectory current_release/unimes of the UniProt ftp servers ftp.uniprot.org/pub/databases/uniprot, ftp.ebi.ac.uk/pub/databases/uniprot and ftp.expasy.org/databases/uniprot.

New comment line (CC) topic SEQUENCE CAUTION

We have introduced the new CC line topic SEQUENCE CAUTION to describe protein sequence reports that differ from the sequence that is shown in UniProtKB due to conflicts that are not described in FT CONFLICT lines, such as frameshifts, erroneous gene model predictions, etc. This kind of information was previously reported in the CC line topic CAUTION together with other warnings that are unrelated to sequence conflicts.

The format of the SEQUENCE CAUTION topic is:

CC   -!- SEQUENCE CAUTION:
         Sequence=Sequence; Type=Type;[ Positions=Positions;][ Note=Note;]

Where:

These lines are not wrapped and their length may therefore exceed 75 characters.

Examples:

Q93W20:
Previous annotation:
CC   -!- CAUTION: Ref.2 (BAA97015) sequence differs from that shown due to
CC       erroneous gene model prediction. The predicted gene At5g49940 has
CC       been split into 2 genes: At5g49940 and At5g49945.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=BAA97015.1; Type=Erroneous gene model prediction; Note=The predicted gene At5g49940 has been split into 2 genes: At5g49940 and At5g49945;
Q83M39:
Previous annotation:
CC   -!- CAUTION: Ref.1 and Ref.2 sequences differ from that shown due to a
CC       stop codon at position 273 which was translated as Gln to extend
CC       the sequence.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAN42076.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln;
CC       Sequence=AAP15953.1; Type=Erroneous termination; Positions=273; Note=Translated as Gln;
P17814:
Previous annotation:
CC   -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to
CC       a frameshift in position 496.
CC   -!- CAUTION: Ref.1 (CAA36850) sequence differs from that shown due to
CC       erroneous gene model prediction.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=CAA36850.1; Type=Erroneous gene model prediction;
CC       Sequence=CAA36850.1; Type=Frameshift; Positions=496;
P0A7B3:
Previous annotation:
CC   -!- CAUTION: Ref.4 (X07863) sequence differs from that shown due to
CC       several frameshifts.
CC   -!- CAUTION: Ref.5 (Y00357) sequence differs from that shown due to
CC       frameshifts in positions 204, 215 and 282.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=X07863; Type=Frameshift; Positions=Several;
CC       Sequence=Y00357; Type=Frameshift; Positions=204, 215, 282;
P27612:
Previous annotation:
CC   -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to
CC       frameshifts in positions 4, 32, and 42.
CC   -!- CAUTION: Ref.2 (AAA39943) sequence differs from that shown due to
CC       contaminating sequence.
CC   -!- CAUTION: Ref.3 sequence differs from that shown due to a
CC       frameshift in position 697.
New annotation:
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAA39943.1; Type=Miscellaneous discrepancy; Note=Several frameshifts and contaminating sequence;
CC       Sequence=Ref.3; Type=Frameshift; Positions=697;
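
The SEQUENCE CAUTION examples above can be split into their items with a regular expression. This is a sketch only: it assumes each Sequence=...; entry sits on one unwrapped CC line, as stated above, and that the Sequence, Type and Positions values contain no semicolon. The function name is illustrative.

```python
import re

# Sequence and Type are mandatory; Positions and Note are optional items.
SEQUENCE_CAUTION = re.compile(
    r"^CC\s+Sequence=(?P<sequence>[^;]+); Type=(?P<type>[^;]+);"
    r"(?: Positions=(?P<positions>[^;]+);)?"
    r"(?: Note=(?P<note>.+);)?$"
)


def parse_sequence_caution(line):
    """Return the items of one SEQUENCE CAUTION entry, or None if no match."""
    match = SEQUENCE_CAUTION.match(line.strip())
    return match.groupdict() if match else None


entry = parse_sequence_caution(
    "CC       Sequence=AAN42076.1; Type=Erroneous termination; "
    "Positions=273; Note=Translated as Gln;"
)
print(entry)
```

Missing optional items are returned as None. The Positions value is kept as text because it may hold a list such as "204, 215, 282" or the word "Several".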
Multiple occurrence of comment line (CC) topic SUBCELLULAR LOCATION

From now on, the CC line topic SUBCELLULAR LOCATION may occur more than once per entry.

Changes concerning cross-references (DR line)
PseudoCAP

Cross-references have been added to the Pseudomonas aeruginosa Community Annotation Project database. This database provides genome annotation of P. aeruginosa strain PAO1 and of other Pseudomonas species, acting as a valuable comparative resource for P. aeruginosa research, as well as being useful for the larger Pseudomonas research community. Over the coming year this database will be further enhanced toward more focus on comparative analysis of P. aeruginosa isolates and more specific information about putative drug and vaccine targets.

The Pseudomonas aeruginosa Community Annotation Project database is available at http://www.pseudomonas.com/.

The format of the explicit link is:

Data bank identifier PseudoCAP
Primary identifier The primary identifier consists of the ordered locus name.
Secondary identifier None; a dash '-' is stored in that field.
Example
Q9I576:
   DR   PseudoCAP; PA0865; -.
   
Orphanet

Cross-references have been added to the Orphanet database. This database is dedicated to information on rare diseases and orphan drugs. It aims to improve management and treatment of genetic, auto-immune or infectious rare diseases, rare cancers, or not yet classified rare diseases. ORPHANET offers services adapted to the needs of patients and their families, health professionals and researchers, support groups and industry.

The Orphanet database is available at http://www.orpha.net/consor/cgi-bin/home.php?Lng=GB.

The format of the explicit link is:

Data bank identifier Orphanet
Primary identifier The primary identifier consists of the Orpha unique disease identifier.
Secondary identifier The secondary identifier consists of the name of the disease.
Example
P26439:
   DR   Orphanet; 418; Adrenal hyperplasia, congenital.
   DR   Orphanet; 3185; Stein-Leventhal syndrome.
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 10.4 of 01-May-2007

Changes concerning keywords (KW line)

Modified keyword:

UniProtKB release 10.2 of 03-Apr-2007

Changes concerning cross-references (DR line)
BuruList

Cross-references have been added to BuruList, the Mycobacterium ulcerans genome database. This database is dedicated to the analysis of the genome of Mycobacterium ulcerans, the Buruli ulcer bacillus. BuruList provides a complete dataset of DNA and protein sequences derived from the epidemic strain Agy99, linked to the relevant annotations and functional assignments. It allows one to easily browse through these data and retrieve information, using various criteria (gene names, location, keywords, etc.).

The Mycobacterium ulcerans genome database is available at http://genolist.pasteur.fr/BuruList/.

The format of the explicit link is:

Data bank identifier BuruList
Primary identifier The primary identifier consists of the ordered locus name.
Secondary identifier None; a dash '-' is stored in that field.
Example
A0PW55:
   DR   BuruList; MUL_4631; -.
   
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 10.0 of 06-Mar-2007

Format change in the dbxref.txt document file

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. We have added a new optional field, "Ref". This field contains the database reference in the following format:

Ref   : Journal_abbrev Volume:First_page-Last_page(YYYY); [PubMed=Pubmed_identifier; ][DOI=Digital_object_identifier;]

Example:

Abbrev: PROSITE
Name  : PROSITE; a protein domain and family database
Ref   : Nucleic Acids Res. 34:D227-D230(2006); PubMed=16381852; DOI=10.1093/nar/gkj063;
LinkTp: Explicit
Server: http://www.expasy.org/prosite/
Db_URL: www.expasy.org/cgi-bin/get-prosite-raw.pl?%s
Cat   : Family and domain databases
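
As a rough illustration of the "Ref" field format, the PROSITE record above can be parsed as follows. This sketch assumes one "Key: value" pair per line and a journal abbreviation that contains no colon; the names parse_record and REF_PATTERN are hypothetical.

```python
import re

# Journal_abbrev Volume:First_page-Last_page(YYYY); [PubMed=...; ][DOI=...;]
REF_PATTERN = re.compile(
    r"^(?P<journal>.+) (?P<volume>[^\s:]+):"
    r"(?P<first_page>[^\s-]+)-(?P<last_page>[^\s(]+)"
    r"\((?P<year>\d{4})\);"
    r"(?: PubMed=(?P<pubmed>\d+);)?"
    r"(?: DOI=(?P<doi>[^;]+);)?$"
)


def parse_record(lines):
    """Turn 'Key   : value' lines into a dict; split on the first colon only."""
    record = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


record = parse_record([
    "Abbrev: PROSITE",
    "Name  : PROSITE; a protein domain and family database",
    "Ref   : Nucleic Acids Res. 34:D227-D230(2006); PubMed=16381852; DOI=10.1093/nar/gkj063;",
    "LinkTp: Explicit",
    "Server: http://www.expasy.org/prosite/",
])
ref = REF_PATTERN.match(record["Ref"]).groupdict()
print(ref)
```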
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.7 of 20-Feb-2007

Changes concerning cross-references (DR line)
CYGD

Cross-references have been added to the MIPS Comprehensive Yeast Genome Database. This database aims to present information on the molecular structure and functional network of the entirely sequenced, well-studied model eukaryote, the budding yeast Saccharomyces cerevisiae. In addition the data of various projects on related yeasts are used for comparative analysis.

The CYGD is available at http://mips.gsf.de/genre/proj/yeast.

The format of the explicit links is:

Data bank identifier CYGD
Primary identifier The primary identifier consists of the ordered locus name.
Example
P35688:
   DR   CYGD; YDL240w; -.
   
New molecule type in the cross-references to EMBL

We added the value Viral_cRNA to the controlled vocabulary of the field MoleculeType of the cross-references to the EMBL nucleotide sequence database. The format of the DR EMBL line is:

DR   EMBL; AccessionNumber; ProteinID; StatusIdentifier; MoleculeType.

The controlled vocabulary of the field MoleculeType is:

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.6 of 06-Feb-2007

Changes concerning cross-references (DR line)
Cornea-2DPAGE

Cross-references have been added to the Human Cornea 2-DE database, a two-dimensional polyacrylamide gel electrophoresis federated database available at the Aarhus University (Denmark).

The Cornea-2DPAGE is available at http://www.cornea-proteomics.com/.

The format for the explicit links is:

Data bank identifier Cornea-2DPAGE
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier The secondary identifier consists of the organism common name.
Example
P31946:
   DR   Cornea-2DPAGE; P31946; HUMAN.
DOSAC-COBS-2DPAGE

Cross-references have been added to the DOSAC-COBS 2D Page, a two-dimensional polyacrylamide gel electrophoresis federated database available at the DOSAC and COBS genome and proteome laboratory (La Maddalena, Italy).

The DOSAC-COBS-2DPAGE is available at http://www.dosac.unipa.it/2d/.

The format for the explicit links is:

Data bank identifier DOSAC-COBS-2DPAGE
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier The secondary identifier consists of the organism common name.
Example
P15531:
   DR   DOSAC-COBS-2DPAGE; P15531; HUMAN.
REPRODUCTION-2DPAGE

Cross-references have been added to the REPRODUCTION-2DPAGE, a two-dimensional polyacrylamide gel electrophoresis database available at the laboratory of Reproductive Medicine, Nanjing Medical University, P. R. China.

The REPRODUCTION-2DPAGE is available at http://reprod.njmu.edu.cn/cgi-bin/2d/2d.cgi.

The format for the explicit links is:

Data bank identifier REPRODUCTION-2DPAGE
Primary identifier The primary identifier consists of a UniProtKB accession number.
Secondary identifier The secondary identifier consists of the organism common name.
Example
P32119:
   DR   REPRODUCTION-2DPAGE; P32119; HUMAN.
UniProtKB release 9.5 of 23-Jan-2007

Changes in the usage of the feature key INIT_MET

The feature key INIT_MET indicates that there is experimental evidence that the initiator methionine has been cleaved off. In the past, the initiator methionine was not included in the sequence of a UniProtKB entry in such a case and the INIT_MET sequence coordinates were therefore 0.

Example:

FT   INIT_MET      0      0
FT   CHAIN         1    104       Cytochrome c.
FT                                /FTId=PRO_0000108218.
..
SQ   SEQUENCE   104 AA;  11618 MW;  D47C9B513DF1C5C2 CRC64;
     GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG
     EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE
//

We have added back the initiator methionine to such protein sequences and changed the sequence coordinates of the feature key INIT_MET accordingly to 1.

Example:

FT   INIT_MET      1      1
FT   CHAIN         2    105       Cytochrome c.
FT                                /FTId=PRO_0000108218.
..
SQ   SEQUENCE   105 AA;  11749 MW;  8EE9689E0102506B CRC64;
     MGDVEKGKKI FIMKCSQCHT VEKGGKHKTG PNLHGLFGRK TGQAPGYSYT AANKNKGIIW
     GEDTLMEYLE NPKKYIPGTK MIFVGIKKKE ERADLIAYLK KATNE
//
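
The effect of this change on sequence coordinates can be sketched as follows, using the cytochrome c example above. The function is purely illustrative and assumes that every feature of the old entry shifts by exactly one position.

```python
def restore_initiator_met(sequence, features):
    """Prepend the initiator methionine and shift feature coordinates by 1.

    `features` is a list of (key, start, end) tuples in the old numbering,
    where INIT_MET was annotated at position 0.
    """
    new_sequence = "M" + sequence
    shifted = [(key, start + 1, end + 1) for key, start, end in features]
    return new_sequence, shifted


OLD_SEQUENCE = (
    "GDVEKGKKIF IMKCSQCHTV EKGGKHKTGP NLHGLFGRKT GQAPGYSYTA ANKNKGIIWG "
    "EDTLMEYLEN PKKYIPGTKM IFVGIKKKEE RADLIAYLKK ATNE"
).replace(" ", "")
OLD_FEATURES = [("INIT_MET", 0, 0), ("CHAIN", 1, 104)]

new_sequence, new_features = restore_initiator_met(OLD_SEQUENCE, OLD_FEATURES)
print(len(new_sequence), new_features)
```

Note that the molecular weight and CRC64 values on the SQ line also change with the sequence; they are not recomputed in this sketch.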
UniProtKB release 9.4 of 09-Jan-2007

Changes concerning cross-references (DR line)
MaizeGDB

We changed the Data bank identifier for the Maize Genetics and Genomics Database MaizeGDB from MaizeDB to MaizeGDB.

Example:

DR   MaizeDB; 58111; -.

has changed to

DR   MaizeGDB; 58111; -.
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.3 of 12-Dec-2006

Release of a new document presenting our Protein Spotlight articles and cited UniProtKB/Swiss-Prot entries

The document protspot.txt, available by ftp and on the Web site, lists the Protein Spotlight articles and cited UniProtKB/Swiss-Prot entries.

This document contains, for each Protein Spotlight article, the corresponding entries cited in that article. Protein Spotlight (ISSN 1424-4721) is a monthly review written by the Swiss-Prot team of the Swiss Institute of Bioinformatics. Spotlight articles describe a specific protein or family of proteins in an informal tone. Protein Spotlight is available at: http://www.expasy.org/spotlight/.

Changes concerning cross-references (DR line)
DIP

Cross-references have been added to the Database of Interacting Proteins (DIP). The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database were curated both manually by expert curators and automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.

The DIP is available at http://dip.doe-mbi.ucla.edu/.

The format for the explicit links is:

Data bank identifier DIP
Primary identifier The primary identifier consists of the DIP accession number.
Secondary identifier None; a dash '-' is stored in that field.
Examples
Q9W1K5:
   DR   DIP; DIP:19601N; -.
P41597:
   DR   DIP; DIP:5833N; -.
   DR   DIP; DIP:5839N; -.
Reactome

The primary identifier of the cross-references to the Reactome database has been modified. The primary identifier was Reactome's unique identifier for a protein, which was identical to the Swiss-Prot primary AC number of that protein. Now it is a stable Reactome identifier. In addition, the pathway name is given as a secondary identifier.

Primary identifier The primary identifier consists of the Reactome identifier.
Secondary identifier The secondary identifier consists of a Pathway name
Examples
P61978:
   DR   Reactome; REACT_1675.1; mRNA Processing.
   DR   Reactome; REACT_71.1; Gene Expression.
P62191:
   DR   Reactome; REACT_152.2; Cell Cycle, Mitotic.
   DR   Reactome; REACT_1538.1; Cell Cycle Checkpoints.
   DR   Reactome; REACT_383.2; DNA Replication.
   DR   Reactome; REACT_6185.3; HIV Infection.
   DR   Reactome; REACT_6850.1; Cdc20:Phospho-APC/C mediated degradation of Cyclin A.
GermOnline

The primary identifier of the cross-references to the GermOnline database has been modified. The primary identifier was GermOnline's identifier for a gene. Now it is a gene identifier from any source, e.g. Ensembl or a model organism database. In addition, the organism name is the secondary identifier.

Primary identifier The primary identifier consists of a gene identifier from any source, e.g. Ensembl or model organism database.
Secondary identifier The secondary identifier consists of the organism name
Examples
P02766:
   DR   GermOnline; ENSG00000118271; Homo sapiens.
P32559:
   DR   GermOnline; YMR023C; Saccharomyces cerevisiae.
Changes concerning keywords (KW line)

New keywords:

Modified keyword:

UniProtKB release 9.2 of 28-Nov-2006

Changes concerning cross-references (DR line)
Gene3D

Cross-references have been added to the Gene3D Structural and Functional Annotation of Protein Families database. The Gene3D database provides a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains.

The Gene3D is available at http://cathwww.biochem.ucl.ac.uk:8080/Gene3D/.

The format for the explicit links is:

Data bank identifier Gene3D
Primary identifier The primary identifier consists of the Gene3D ID.
Secondary identifier The secondary identifier consists of a Gene3D entry name
Examples
Q12933:
   DR   Gene3D; G3DSA:3.90.890.10; SIAH-type; 1.
   DR   Gene3D; G3DSA:2.60.210.10; TRAF-type; 1.
   
Q04311:
   DR   Gene3D; G3DSA:1.25.40.20; ANK; 1.
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 9.1 of 14-Nov-2006

Extension of the UniProtKB accession number format

Before release 9.1, UniProtKB accession numbers consisted of 6 alphanumerical characters in the following format:

1 2 3 4 5 6
[O,P,Q] [0-9] [A-Z, 0-9] [A-Z, 0-9] [A-Z, 0-9] [0-9]

Due to the large increase in the number of protein sequences in UniProtKB, we had to extend the existing accession number format by allowing the first character to be any of the 26 letters (instead of only O, P and Q). To avoid assigning accession numbers identical to those which have been used by the International Nucleotide Sequence Database, the extension in the first position goes along with a restriction in the third position which can only be a letter. The new format for UniProtKB accession numbers is therefore:

1 2 3 4 5 6
[A-N,R-Z] [0-9] [A-Z] [A-Z, 0-9] [A-Z, 0-9] [0-9]
[O,P,Q] [0-9] [A-Z, 0-9] [A-Z, 0-9] [A-Z, 0-9] [0-9]
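
The two accession number patterns above translate directly into a regular expression. The following sketch checks whether a string fits the extended format; the function name is_valid_accession is an assumption made for this example.

```python
import re

# First alternative: original O/P/Q format (third position letter or digit).
# Second alternative: extended format (third position must be a letter).
ACCESSION = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9][A-Z][A-Z0-9]{2}[0-9])"
)


def is_valid_accession(accession):
    """Return True if `accession` matches the release 9.1 format."""
    return ACCESSION.fullmatch(accession) is not None


for candidate in ("P08524", "A0PW55", "A01234"):
    print(candidate, is_valid_accession(candidate))
```

The last candidate is rejected because its first character is outside O, P and Q while its third character is a digit.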
Changes concerning cross-references (DR line)
DrugBank

Cross-references have been added to the DrugBank database. The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains many drug entries including FDA-approved small molecule drugs, FDA-approved biotech (protein/peptide) drugs, nutraceuticals and experimental drugs. Additionally, protein (i.e. drug target) sequences are linked to these drug entries. Each DrugCard entry contains more than 80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.

The DrugBank database is available at http://redpoll.pharmacy.ualberta.ca/drugbank/.

The format for the explicit links is:

Data bank identifier DrugBank
Primary identifier The primary identifier consists of the DrugBank ACcession number.
Secondary identifier The secondary identifier consists of a generic name
Examples
P08908:
   DR   DrugBank; APRD00096; Tegaserod.
   DR   DrugBank; APRD00222; Buspirone.
   DR   DrugBank; APRD00540; Ziprasidone.
   DR   DrugBank; APRD00638; Aripiprazole.
   DR   DrugBank; APRD00711; Ergoloid mesylate.
   DR   DrugBank; APRD00945; Eletriptan.
P08185:
   DR   DrugBank; APRD00065; Fluticasone Propionate.
   DR   DrugBank; APRD00197; Prednisolone.
   DR   DrugBank; APRD00422; Triamcinolone.
   DR   DrugBank; APRD00564; Beclomethasone.
   DR   DrugBank; APRD00783; Alclometasone.
   DR   DrugBank; APRD00975; Flumethasone Pivalate.
   DR   DrugBank; APRD00976; Flunisolide.
   DR   DrugBank; APRD00977; Fluocinolone Acetonide.
   DR   DrugBank; APRD00978; Fluocinonide.
   DR   DrugBank; APRD00980; Fluorometholone.
   DR   DrugBank; APRD00982; Flurandrenolide.
   DR   DrugBank; APRD01010; Halobetasol Propionate.
   DR   DrugBank; APRD01091; Medrysone.
   DR   DrugBank; APRD01220; Rimexolone.
euHCVdb

Cross-references have been added to the European Hepatitis C Virus database. The development of the European Hepatitis C Virus database (euHCVdb) started in 1999 as the French HCV Database. The euHCVdb is mainly oriented towards protein sequence, structure and function analyses and structural biology of HCV.

The European Hepatitis C Virus database is available at http://euhcvdb.ibcp.fr/euHCVdb/.

The format for the explicit links is:

Data bank identifier euHCVdb
Primary identifier The primary identifier consists of an EMBL ACcession number.
Examples
P26664:
   DR   euHCVdb; AF271632; -.
   DR   euHCVdb; M62321; -.
P27953:
   DR   euHCVdb; X53136; -.
Changes concerning cross-references (FT line)
dbSNP

Explicit links are present in the FT VARIANT lines of protein sequence entries of Hominidae to the Single Nucleotide Polymorphism database (dbSNP). We will prefix dbSNP identifiers in human FT VARIANT lines by "rs". NCBI/dbSNP has rs and ss numbers, but we only refer to SNPs with rs numbers.

Examples:

P08185: 
   FT   VARIANT     246    246       S -> A (in dbSNP:rs2228541).
   FT                                /FTId=VAR_024350.
P06307: 
   FT   VARIANT      32     32       G -> E (in dbSNP:rs11571848).
   FT                                /FTId=VAR_018818.
   FT   VARIANT      95     95       R -> W (in dbSNP:rs3774395).
   FT                                /FTId=VAR_024452.
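
With the "rs" prefix in place, dbSNP identifiers can be collected from FT VARIANT lines with a simple pattern. This is a small sketch; the helper name dbsnp_ids is illustrative.

```python
import re

# Matches the "dbSNP:rs<number>" form used in FT VARIANT descriptions.
DBSNP = re.compile(r"dbSNP:(rs\d+)")


def dbsnp_ids(ft_lines):
    """Collect all dbSNP rs identifiers found in the given FT lines."""
    ids = []
    for line in ft_lines:
        ids.extend(DBSNP.findall(line))
    return ids


lines = [
    "FT   VARIANT      32     32       G -> E (in dbSNP:rs11571848).",
    "FT                                /FTId=VAR_018818.",
    "FT   VARIANT      95     95       R -> W (in dbSNP:rs3774395).",
    "FT                                /FTId=VAR_024452.",
]
print(dbsnp_ids(lines))
```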
UniProtKB release 9.0 of 31-Oct-2006

Changes in the ID (IDentification) line

The format of the ID line was:

ID   EntryName DataClass; MoleculeType; SequenceLength.

We have changed the values of the DataClass field as described in this table:

Old DataClass   New DataClass   Description
STANDARD        Reviewed        Entries that have been manually reviewed and annotated by UniProtKB curators (Swiss-Prot section of the UniProt Knowledgebase).
PRELIMINARY     Unreviewed      Computer-annotated entries that have not been reviewed by UniProtKB curators (TrEMBL section of the UniProt Knowledgebase).

We have also dropped the field MoleculeType, which was a legacy of compatibility with the EMBL flat file format. The new format of the ID line is:

ID   EntryName DataClass; SequenceLength.

Examples:

ID   CYC_PIG                 Reviewed;         104 AA.
     
ID   Q3ASY8_CHLCH            Unreviewed;     36805 AA.
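
A parser for the new ID line might look like the sketch below. It assumes whitespace-separated columns as in the examples above; the dictionary simply restates the DataClass table, and all names are illustrative.

```python
import re

# Mapping of old DataClass values to the new ones.
NEW_DATA_CLASS = {"STANDARD": "Reviewed", "PRELIMINARY": "Unreviewed"}

ID_LINE = re.compile(
    r"^ID\s+(?P<entry_name>\S+)\s+(?P<data_class>Reviewed|Unreviewed);"
    r"\s+(?P<length>\d+) AA\.$"
)


def parse_id_line(line):
    """Return (entry_name, data_class, sequence_length) for a new-style ID line."""
    match = ID_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a new-format ID line: %r" % line)
    return match.group("entry_name"), match.group("data_class"), int(match.group("length"))


print(parse_id_line("ID   CYC_PIG                 Reviewed;         104 AA."))
```

An old-format line, which still carries the MoleculeType field, does not match this pattern and raises ValueError.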
Changes in the FASTA header line

We have standardized the FASTA header line of UniProtKB and UniRef entries in the following way:

Format for UniProtKB
>UniqueIdentifier|EntryName ProteinName - OrganismName

Examples:

>P24856|ANP_NOTCO Ice-structuring glycoprotein (Fragment) - Notothenia coriiceps neglecta

>P51650-1|SSDH_RAT Succinate semialdehyde dehydrogenase - Rattus norvegicus
Format for UniRef
>UniqueIdentifier Cluster: ClusterName; n=Members; Taxon|Rep: ProteinName - OrganismName

Example:

>UniRef50_P24856 Cluster: Ice-structuring glycoprotein (Fragment); n=15; Holacanthopterygii|Rep: Ice-structuring glycoprotein (Fragment) - Notothenia coriiceps neglecta
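
Both standardized header formats can be decomposed as sketched below. The code assumes that the last " - " in the header separates the protein name from the organism name, which holds for the examples above but is not guaranteed by the format description. Function names are hypothetical.

```python
import re

UNIREF_HEADER = re.compile(
    r"^>(?P<id>\S+) Cluster: (?P<cluster>.+); n=(?P<members>\d+); "
    r"(?P<taxon>.+)\|Rep: (?P<protein>.+) - (?P<organism>.+)$"
)


def parse_uniprotkb_header(header):
    """Return (identifier, entry_name, protein_name, organism_name)."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    identifier, sep, rest = header[1:].partition("|")
    if not sep:
        raise ValueError("missing '|' in header: %r" % header)
    entry_name, _, description = rest.partition(" ")
    # Assumption: the organism name follows the last " - " separator.
    protein, _, organism = description.rpartition(" - ")
    return identifier, entry_name, protein, organism


def parse_uniref_header(header):
    """Return the UniRef header fields as a dict, or None if no match."""
    match = UNIREF_HEADER.match(header)
    if match is None:
        return None
    fields = match.groupdict()
    fields["members"] = int(fields["members"])
    return fields


print(parse_uniprotkb_header(
    ">P51650-1|SSDH_RAT Succinate semialdehyde dehydrogenase - Rattus norvegicus"
))
```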
Changes concerning cross-references (DR line)
HPA

Cross-references have been added to the Human Protein Atlas. The Human Protein Atlas shows the expression and localization of proteins in a large variety of normal human tissues and cancer cells. The data is presented as high resolution images representing immunohistochemically stained tissue sections. Each antibody in the database has been used for immunohistochemical staining of both normal and cancer tissue. The immunohistochemical protocols used result in a brown-black staining, localized where an antibody has bound to its corresponding antigen.

The Human Protein Atlas is available at http://www.proteinatlas.org/.

The format for the explicit links is:

Data bank identifier HPA
Primary identifier The primary identifier consists of the antibody identifier.
Examples
O75843:
   DR   HPA; HPA004106; -.
P08183:
   DR   HPA; CAB001716; -.
Gene Ontology (GO)

The last field of the cross-references to the Gene Ontology (GO) database has been modified. This field displays the GO evidence code and we have appended to this the source database from which the cross-reference was obtained, separated by a colon.

Examples
Q15738:
   DR   GO; GO:0005783; C:endoplasmic reticulum; IDA:LIFEdb.
P13569:
   DR   GO; GO:0005524; F:ATP binding; TAS:ProtInc.
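
The modified last field can be separated into evidence code and source database as in the following sketch. It assumes "; " as the field separator and tolerates a missing source, in which case an empty string is returned; the function name is illustrative.

```python
def parse_go_dr_line(line):
    """Return (go_id, aspect, term, evidence_code, source) for a DR GO line."""
    body = line.strip()
    if not body.startswith("DR   GO; "):
        raise ValueError("not a DR GO line: %r" % line)
    body = body[len("DR   "):]
    # Remove only the terminating period of the line.
    if body.endswith("."):
        body = body[:-1]
    fields = body.split("; ")
    if len(fields) < 4:
        raise ValueError("unexpected number of fields: %r" % line)
    go_id = fields[1]
    # Rejoin in case the term name itself contains "; ".
    aspect, _, term = "; ".join(fields[2:-1]).partition(":")
    evidence, _, source = fields[-1].partition(":")
    return go_id, aspect, term, evidence, source


print(parse_go_dr_line("DR   GO; GO:0005783; C:endoplasmic reticulum; IDA:LIFEdb."))
```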
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 8.9 of 17-Oct-2006

Changes concerning keywords (KW line)

Modified keywords:

Deleted keyword:

UniProtKB release 8.7 of 19-Sep-2006

Changes concerning cross-references (DR line)
KEGG

Cross-references have been added to the KEGG database. KEGG: Kyoto Encyclopedia of Genes and Genomes is part of the research projects of the Kanehisa Laboratories in the Bioinformatics Center of Kyoto University and the Human Genome Center of the University of Tokyo. The aim of this bioinformatics resource is to provide as far as possible a complete computer representation of the cell, the organism, and the biosphere, which will enable computational prediction of higher-level complexity of cellular processes and organism behaviors from genomic and molecular information.

The KEGG database is available at http://www.genome.jp/kegg/.

The format for the explicit links is:

Data bank identifier KEGG
Primary identifier The primary identifier consists of a (KEGG-)organism code for the genome, colon, gene number.
Examples
P54609:
   DR   KEGG; ath:At3g09840; -.
O43623:
   DR   KEGG; hsa:6591; -.
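
Since the KEGG primary identifier combines an organism code and a gene number, it can be split at the first colon. A minimal sketch, with an illustrative function name:

```python
def split_kegg_id(primary_id):
    """Split a KEGG primary identifier into (organism_code, gene_number)."""
    organism, sep, gene = primary_id.partition(":")
    if not sep or not organism or not gene:
        raise ValueError("unexpected KEGG identifier: %r" % primary_id)
    return organism, gene


print(split_kegg_id("ath:At3g09840"))
print(split_kegg_id("hsa:6591"))
```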
UniProtKB release 8.6 of 05-Sep-2006

Changes concerning cross-references (DR line)
ArrayExpress

Cross-references have been added to ArrayExpress, a public repository database for microarray gene expression data. The ArrayExpress Data Warehouse stores gene-indexed expression profiles from a curated subset of experiments in the repository.

The ArrayExpress database is available at http://www.ebi.ac.uk/arrayexpress/.

The format for the explicit links is:

Data bank identifier ArrayExpress
Primary identifier UniProtKB primary AC.
Example
O00139:
   DR   ArrayExpress; O00139; -.
PeroxiBase

Cross-references have been added to PeroxiBase, a database that centralizes most sequences encoding members of the peroxidase superfamilies, in order to follow the evolution of peroxidases among living organisms and to compile information concerning putative functions and transcription regulation.

The PeroxiBase database is available at http://peroxidase.isb-sib.ch/.

The format for the explicit links is:

Data bank identifier PeroxiBase
Primary identifier PeroxiBase's accession number.
Secondary identifier PeroxiBase's entry name
Example
O23044:
   DR   PeroxiBase; 79; AtPrx03.
Changes concerning keywords (KW line)

New keyword:

Changes concerning the controlled vocabulary for PTMs

Terms introduced:

Terms for the feature key 'CROSSLNK':

Terms for the feature key 'LIPID':

Terms for the feature key 'MOD_RES':

UniProtKB release 8.4 of 25-Jul-2006

Changes in the RP (Reference Position) line

The qualifier 'LARGE SCALE ANALYSIS' was added in RP lines for references that report results of large-scale screens, to indicate that the results have not been extensively studied.

P33304:
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-22, AND MASS
RP   SPECTROMETRY.
UniProtKB release 8.3 of 11-Jul-2006

Changes concerning keywords (KW line)

Deleted keyword:

UniProtKB release 8.2 of 27-June-2006

Release of a new document which lists all the Oryza sativa (rice) entries and their corresponding gene designations

The document orysa.txt, available by ftp and on the Web site, lists all the Oryza sativa (rice) entries of UniProtKB/Swiss-Prot.

This document contains, for each individual UniProtKB/Swiss-Prot rice entry, the corresponding chromosome locus, the UniProtKB/Swiss-Prot accession number, the UniProtKB/Swiss-Prot entry name, the description and the gene name(s).

Changes concerning keywords (KW line)

Modified keyword:

UniProtKB release 8.1 of 13-Jun-2006

Release of a new document presenting our protein naming guidelines

The document nameprot.txt, available by ftp and on the Web site, lists a number of rules for naming proteins. UniProt is constantly striving to further standardize the nomenclature for a given protein across related organisms. In this context, we try to use these rules to attribute a recommended name to all the proteins of UniProtKB/Swiss-Prot. We also hope that authors/laboratories will follow these rules as much as possible when naming new proteins.

Changes concerning cross-references (DR line)
RZPD-ProtExp

Cross-references have been added to the RZPD-ProtExp. RZPD Deutsches Ressourcenzentrum fuer Genomforschung is a non-profit service center for genomics and proteomics research. We introduced a new cross-reference to the RZPD "Full ORF Clones" product, which is a collection of validated ORF protein expression clones containing the complete coding sequences for genes.

The RZPD-ProtExp database is available at http://www.rzpd.de/products/orfclones/.

The format for the explicit links is:

Data bank identifier RZPD-ProtExp
Primary identifier The primary identifier is the clone name.
Examples
Q8NHQ1:
   DR   RZPD-ProtExp; IOH13284; -.
   DR   RZPD-ProtExp; IOH22331; -.
   DR   RZPD-ProtExp; W0600; -.
   
Q9NP90:
   DR   RZPD-ProtExp; IOH42108; -.
   DR   RZPD-ProtExp; U1183; -.
   
Changes concerning keywords (KW line)

New keyword:

Deleted keyword:

UniProtKB release 8.0 of 30-May-2006

Replacement of the feature key VARSPLIC by VAR_SEQ

Pre-translational events have so far been represented by several feature keys, e.g. alternative splicing and promoter usage were annotated with the VARSPLIC feature key, alternative initiation with the INIT_MET feature key and RNA editing with the VARIANT feature key. In order to improve the consistency of annotation of pre- and co-translational events, we have removed the feature key VARSPLIC and introduced the new feature key VAR_SEQ for the description of alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. The INIT_MET feature key remains, but its usage is now restricted to the annotation of initiator methionine cleavage. We will continue to use the VARIANT feature key and the comment line topic RNA EDITING to describe RNA editing.

Syntax modification of the comment line (CC) topic ALTERNATIVE PRODUCTS

In order to improve the consistency of annotation of pre- and co-translational events, we have modified the syntax of the comment line topic ALTERNATIVE PRODUCTS. This modification allows programs to reconstruct alternative sequences according to the corresponding feature identifiers not only for alternative splicing events, as with the old syntax, but also for alternative promoter usage and alternative initiation events.

The new format of ALTERNATIVE PRODUCTS is:

 CC   -!- ALTERNATIVE PRODUCTS:
 CC       Event=Event(, Event)*; Named isoforms=Number_of_isoforms;
(CC         Comment=Free_text;)?
(CC       Name=Isoform_name;( Synonyms=Synonym(, Synonym)*;)?
 CC         IsoId=Isoform_identifier(, Isoform_identifier)*;
 CC         Sequence=(Displayed|External|Not described|Feature_identifier(, Feature_identifier)*);
(CC         Note=Free_text;)?)+

Note: Variable values are represented in italics. Perl-style multipliers indicate whether a pattern (as delimited by parentheses) is optional (?), may occur 0 or more times (*), or 1 or more times (+). Alternative values are separated by a pipe symbol (|).

The "Event" item lists one or a combination of the following values:

The "Note" item may specify the event(s), if there are several.

Example:

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing, Alternative initiation; Named isoforms=3;
CC         Comment=Isoform 1 and isoform 2 arise due to the use of two
CC         alternative first exons joined to a common exon 2 at the same
CC         acceptor site but in different reading frames, resulting in two
CC         completely different isoforms;
CC       Name=1; Synonyms=p16INK4a;
CC         IsoId=O77617-1; Sequence=Displayed;
CC       Name=3;
CC         IsoId=O77617-2; Sequence=VSP_004099;
CC         Note=Produced by alternative initiation at Met-35 of isoform 1;
CC       Name=2; Synonyms=p19ARF;
CC         IsoId=O77618-1; Sequence=External;
..
FT   VAR_SEQ       1     34       Missing (in isoform 3).
FT                                /FTId=VSP_004099.
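
As a minimal sketch of how a program might reconstruct an isoform from the displayed sequence, the following Python applies the VAR_SEQ features named in an isoform's "Sequence" item. The function name, the tuple layout and the toy sequence are illustrative assumptions and not part of UniProtKB; only deletions ("Missing") and simple replacements are handled.

```python
def apply_var_seq(displayed, features, wanted_ids):
    """Return the isoform sequence obtained by applying selected VAR_SEQ features.

    displayed  -- the displayed sequence as a string
    features   -- dict mapping FTId -> (start, end, replacement); positions are
                  1-based and inclusive, replacement is "" for "Missing"
    wanted_ids -- feature identifiers listed in the isoform's Sequence= item
    """
    selected = [features[ftid] for ftid in wanted_ids]
    seq = displayed
    # Apply from the C-terminus backwards so earlier coordinates stay valid.
    for start, end, replacement in sorted(selected, reverse=True):
        seq = seq[:start - 1] + replacement + seq[end:]
    return seq


# Toy data: a hypothetical 38-residue sequence; the feature mirrors
# "VAR_SEQ 1 34 Missing (in isoform 3)" from the example above.
displayed = "A" * 34 + "MKLV"
features = {"VSP_004099": (1, 34, "")}
isoform_3 = apply_var_seq(displayed, features, ["VSP_004099"])
```

Applying the features from the highest position downwards avoids having to shift coordinates after each deletion or replacement.
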
Replacement of the comment line (CC) topic DATABASE by WEB RESOURCE

We have replaced the CC line topic DATABASE by WEB RESOURCE to clarify the conceptual difference between the content of these lines and the DR (Database cross-Reference) lines. At the same time we have simplified the format by suppressing the 'FTP=' field, which is no longer in use.

The format of the DATABASE topic was:

CC   -!- DATABASE: NAME=ResourceName[; NOTE=FreeText][; WWW=WWWAddress][; FTP=FTPAddress].

The format of the WEB RESOURCE topic is:

CC   -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText]; URL=WWWAddress.

The length of these lines may exceed 75 characters because long URL addresses are not wrapped into multiple lines.
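
The format above can be read with a single regular expression. The sketch below is an assumption-laden illustration: the pattern follows the format exactly as written here, and the sample line (resource name, note and URL) is invented for the purpose.

```python
import re

# Pattern follows: NAME=ResourceName[; NOTE=FreeText]; URL=WWWAddress.
WEB_RESOURCE_RE = re.compile(
    r"^CC   -!- WEB RESOURCE: NAME=(?P<name>.+?)"
    r"(?:; NOTE=(?P<note>.+?))?"
    r"; URL=(?P<url>\S+)\.$"
)


def parse_web_resource(line):
    """Return a dict with 'name', 'note' (or None) and 'url', or None if no match."""
    match = WEB_RESOURCE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical line, not taken from a real entry.
sample = ("CC   -!- WEB RESOURCE: NAME=Example resource; "
          "NOTE=Illustrative only; URL=http://www.example.org/db.")
parsed = parse_web_resource(sample)
```

Because long URLs are not wrapped, the whole topic can be matched on one line without first joining continuation lines.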

Introduction of the new line type OH (Organism Host) for viral hosts

A virus is a living organism only if we consider it associated with its host. The viral taxonomy is arbitrarily based on the nature of viral genomes, and viruses of the same family can infect a wide range of hosts. There are numerous virus-host interactions, which we intend to annotate. We have therefore introduced to viral UniProtKB entries a new line type, the OH line, to indicate the host(s) either as a specific organism or taxonomic group of organisms.

The format of the OH line is:

OH   NCBI_TaxID=TaxID; HostName.
The HostName consists of the official name and, optionally, a common name and/or synonym. The length of an OH line may exceed 75 characters.

Example:

OS   Tomato black ring virus (strain E) (TBRV).
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Comoviridae;
OC   Nepovirus; Subgroup B.
OX   NCBI_TaxID=12277;
OH   NCBI_TaxID=4681; Allium porrum (Leek).
OH   NCBI_TaxID=4045; Apium graveolens (Celery).
OH   NCBI_TaxID=161934; Beta vulgaris (Sugar beet).
OH   NCBI_TaxID=38871; Fraxinus (ash trees).
OH   NCBI_TaxID=4236; Lactuca sativa (Garden lettuce).
OH   NCBI_TaxID=4081; Lycopersicon esculentum (Tomato).
OH   NCBI_TaxID=39639; Narcissus pseudonarcissus (Daffodil).
OH   NCBI_TaxID=3885; Phaseolus vulgaris (Kidney bean) (French bean).
OH   NCBI_TaxID=35938; Robinia pseudoacacia (Black locust).
OH   NCBI_TaxID=23216; Rubus (bramble).
OH   NCBI_TaxID=4113; Solanum tuberosum (Potato).
OH   NCBI_TaxID=13305; Tulipa.
OH   NCBI_TaxID=3603; Vitis.
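
A program reading the OH lines shown above could split each one into a taxonomy identifier and a host name. This is a minimal sketch; the function name is an illustrative choice and the pattern assumes exactly the layout given in the format description.

```python
import re

OH_RE = re.compile(r"^OH   NCBI_TaxID=(\d+); (.+)\.$")


def parse_oh_line(line):
    """Return (tax_id, host_name) for one OH line."""
    match = OH_RE.match(line)
    if match is None:
        raise ValueError("not an OH line: %r" % line)
    return int(match.group(1)), match.group(2)


hosts = [
    parse_oh_line("OH   NCBI_TaxID=4681; Allium porrum (Leek)."),
    parse_oh_line("OH   NCBI_TaxID=13305; Tulipa."),
]
```
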
UniProtKB release 7.7 of 16-May-2006

Release of a new document about the nomenclature of scorpion potassium channel toxins

The document scorpktx.txt, available by ftp and on the Web site, lists the potassium-channel-specific scorpion toxins known to date, according to the nomenclature system first described in 1999 by Tytgat et al. [1] and extended in de la Vega et al. [2].

[1] PubMed=10542442;
    Tytgat J., Chandy K.G., Garcia M.L., Gutman G.A.,
    Martin-Eauclaire M.F., van der Walt J.J., Possani L.D.;
    A unified nomenclature for short-chain peptides isolated from
    scorpion venoms: alpha-KTx molecular subfamilies.
    Trends Pharmacol. Sci. 20:444-447(1999).

[2] PubMed=15208019; DOI=10.1016/j.toxicon.2004.03.022;
    Rodriguez de la Vega R.C., Possani L.D.;
    Current views on scorpion toxins specific for K+-channels.
    Toxicon 43:865-875(2004).

This document contains, for each individual UniProtKB/Swiss-Prot scorpion potassium channel toxin, the UniProtKB/Swiss-Prot accession number, the UniProtKB/Swiss-Prot entry name, the systematic name, together with other scorpion potassium channel toxin names.

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 7.6 of 02-May-2006

Sequences with over 10000 amino acids in UniProtKB/Swiss-Prot
The first sequence with over 10000 amino acids has entered the Swiss-Prot section of the UniProt Knowledgebase: Q09165.
Changes concerning keywords (KW line)

New keyword:

UniProtKB release 7.5 of 18-Apr-2006

Changes in the tisslist.txt file
The tisslist.txt file lists the tissues that are used in the "TISSUE" topic of the RC lines of UniProtKB/Swiss-Prot entries. It has been changed in the following way:

Each term in the list has an accession number and optionally further relevant information such as synonyms and mappings to eVOC terms. eVOC contains four ontologies - Anatomical system, Cell type, Developmental stage, Pathology - which provide appropriate sets of detailed terms that describe the sample source of human experimental material such as cDNA and SAGE libraries.

The file contains the following line types:

---------  ---------------------------     ----------------------
Line code  Content                         Occurrence in an entry
---------  ---------------------------     ----------------------
ID         Identifier (tissue)             Once; starts an entry
AC         Accession (TS-xxxx)             Once
SY         Synonyms                        Optional; Once or more
DR         eVOC ontologies (eVOC) mapping  Optional; Once or more
//         Terminator                      Once; ends an entry

Examples:

ID   Embryonic lung fibroblast.
AC   TS-0254
DR   eVoc; 0100042; anatomical-system: lung.
DR   eVoc; 0200032; cell-type: fibroblast.
DR   eVoc; 0300001; development-stage: embryo.
//
ID   Mammary tumor.
AC   TS-0597
SY   Breast tumor; Mammary gland tumor; Mammary tumour.
DR   eVoc; 0100124; anatomical-system: breast.
DR   eVoc; 0400051; pathology: tumour.
//
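
The line types listed in the table can be read with a small line-oriented parser. The following is a sketch only: the dictionary keys are illustrative names, and the code assumes the three-space separator between line code and content seen in the examples.

```python
def parse_tisslist(text):
    """Parse tisslist.txt-style entries into a list of dicts."""
    entries = []
    current = None
    for line in text.splitlines():
        if line.startswith("//"):
            # Terminator: close the current entry, if any.
            if current is not None:
                entries.append(current)
            current = None
            continue
        code, _, value = line.partition("   ")
        value = value.strip()
        if code == "ID":
            current = {"id": value.rstrip("."), "ac": None,
                       "synonyms": [], "xrefs": []}
        elif current is None:
            continue  # ignore anything outside an entry
        elif code == "AC":
            current["ac"] = value
        elif code == "SY":
            names = value.rstrip(".").split(";")
            current["synonyms"].extend(n.strip() for n in names if n.strip())
        elif code == "DR":
            current["xrefs"].append(value.rstrip("."))
    return entries


sample = """ID   Mammary tumor.
AC   TS-0597
SY   Breast tumor; Mammary gland tumor; Mammary tumour.
DR   eVoc; 0100124; anatomical-system: breast.
DR   eVoc; 0400051; pathology: tumour.
//
"""
tissues = parse_tisslist(sample)
```
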
Changes concerning keywords (KW line)

Modified keyword:

UniProtKB release 7.4 of 04-Apr-2006

Changes concerning cross-references (DR line)
UniGene

Cross-references have been added to UniGene, a sequence database that automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and its map location.

The UniGene database is available at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=UniGene.

The format for the explicit links is:

Data bank identifier UniGene
Primary identifier UniGene cluster identifier, which consists of a 2-letter species code followed by a period and several digits
Examples
Q9ZNT7:
   DR   UniGene; At.24021.
   DR   UniGene; At.64486.
P59990:
   DR   UniGene; Hs.505267.
UniProtKB release 7.3 of 21-Mar-2006

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 7.2 of 07-Mar-2006

Changes concerning cross-references (DR line)
GenomeReviews

Cross-references have been added to GenomeReviews, a genome annotation database which provides an up-to-date, standardised and comprehensively annotated view of the genomic sequence of organisms with completely deciphered genomes.

The GenomeReviews database is available at http://www.ebi.ac.uk/GenomeReviews/.

The format for the explicit links is:

Data bank identifier GenomeReviews
Primary identifier GenomeReviews's accession number.
Secondary identifier Ordered locus name, or, if it doesn't exist, gene name.
Examples
P08409:
   DR   GenomeReviews; U00096_GR; b0016.
   DR   GenomeReviews; U00096_GR; b0582.
   DR   GenomeReviews; U00096_GR; b2394.
Q92YD2:
   DR   GenomeReviews; AE006469_GR; betB2.
Changes concerning keywords (KW line)

New keyword:

Modified keyword:

UniProtKB release 7.1 of 21-Feb-2006

New service: the UniProtKB Sequence/Annotation Version Database (UniSave)

The introduction of more exact sequence and entry modification dates allowed us to introduce a new service. The UniProtKB Sequence/Annotation Version Database (UniSave) is a comprehensive archive of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. Unlike the UniProt Knowledgebase, which contains only the latest Swiss-Prot and TrEMBL entry and sequence versions, the UniProtKB Sequence/Annotation Version Database provides access to all versions of these entries. This makes it possible to track sequence changes and to find out when a given annotation appeared in an entry and how it evolved. All archived entry versions are available through the UniProtKB Sequence/Annotation Version Database in flat file and FASTA format. Any two given entry versions can be compared to each other.

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 7.0 of 07-Feb-2006

Changes concerning dates and version numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases in the DT lines to displaying the date of the biweekly release at which an entry is integrated or updated. We dropped the information concerning the release number and introduced entry and sequence version numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino acid sequence is modified. The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates, using archived UniProtKB releases.
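
The three DT lines can be read into dates and version numbers as in the sketch below. The function and key names are illustrative assumptions; the month table is written out explicitly so that parsing does not depend on the locale.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

DT_RE = re.compile(
    r"^DT   (\d{2})-([A-Z]{3})-(\d{4}), "
    r"(?:integrated into UniProtKB/(?P<db>[\w-]+)"
    r"|sequence version (?P<seqv>\d+)"
    r"|entry version (?P<entv>\d+))\.$"
)


def parse_dt_lines(lines):
    """Return a dict with the dates and version numbers of the three DT lines."""
    info = {}
    for line in lines:
        match = DT_RE.match(line)
        if match is None:
            raise ValueError("unrecognised DT line: %r" % line)
        day, month, year = match.group(1), match.group(2), match.group(3)
        when = date(int(year), MONTHS[month], int(day))
        if match.group("db") is not None:
            info["database"] = match.group("db")
            info["integrated"] = when
        elif match.group("seqv") is not None:
            info["sequence_version"] = int(match.group("seqv"))
            info["sequence_date"] = when
        else:
            info["entry_version"] = int(match.group("entv"))
            info["entry_date"] = when
    return info


dt_info = parse_dt_lines([
    "DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.",
    "DT   15-OCT-2001, sequence version 3.",
    "DT   01-APR-2004, entry version 14.",
])
```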

Addition of a feature (FT) key CHAIN over the whole sequence length

The feature key CHAIN was previously used only to describe processed protein sequences. In UniProtKB/Swiss-Prot, we have now added a "FT CHAIN" to all entries that had neither a "FT CHAIN" nor a "FT PEPTIDE". This led to the addition of a "FT CHAIN", covering the full length of the sequence, to more than 170 000 entries. In this way, all mature proteins in UniProtKB/Swiss-Prot are described in the feature lines.

Release of a new document about post-translational modifications

The controlled vocabulary for the post-translational modifications (PTMs) that are annotated in the UniProtKB feature table has moved from the UniProtKB User Manual to a separate document, ptmlist.txt, that is available by ftp and on the Web site.

The document contains, for each individual modification, the controlled vocabulary term and its associated feature key, and additional information, such as the amino acid that can be modified and the mass difference, the position on the protein sequence, the taxonomic distribution, the protein location, associated keyword(s), as well as links to the UniProtKB entries that contain the annotation, and to the corresponding entry in the RESID Database of Protein Modifications.

Changes concerning the copyright statement
We have changed the copyright statement in all UniProt Knowledgebase entries. All UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries, as well as all Swiss-Prot documents, now contain the statement:
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
Changes concerning cross-references (DR line)
PptaseDB

Cross-references have been added to the Prokaryotic Protein Phosphatase Database, a database which provides information concerning prokaryotic and archaeal phosphatases for which experimental evidence exists demonstrating phosphatase activity. The Prokaryotic Protein Phosphatase Database is available at http://vigen.biochem.vt.edu/p3d/p3d.htm.

The format for the explicit links is:

Data bank identifier PptaseDB
Primary identifier PptaseDB's unique phosphatase identifier.
Secondary identifier None; a dash '-' is stored in that field.
Example
O52787: DR   PptaseDB; P3D040495; -.
BioCyc

Cross-references have been added to BioCyc, a collection of Pathway/Genome Databases. Each Pathway/Genome Database describes the genome and metabolic pathways of a single organism, with the exception of the MetaCyc database, which is a reference source on metabolic pathways from many organisms. BioCyc is available at http://www.biocyc.org/.

The format for the explicit links is:

Data bank identifier BioCyc
Primary identifier BioCyc database code:identifier.
Secondary identifier None; a dash '-' is stored in that field.
Examples
P21170: DR   BioCyc; EcoCyc:ARGDECARBOXBIO-MONOMER; -.
Q9HCC0: DR   BioCyc; MetaCyc:MONOMER-10082; -.
UniProtKB release 6.9 of 24-Jan-2006

Changes in the DR MIM line

Various MIM cross-references can be present in a single UniProtKB/Swiss-Prot human entry. They were annotated in the DR lines according to a format that does not distinguish between MIM entries describing a gene and MIM entries describing a phenotype:

DR   MIM; 608463; -.

We added a field to the DR MIM line to allow users and programs to distinguish between MIM "gene" and "phenotype" entries.

The new format of the DR MIM line is:

DR   MIM; MIM_identifier; token.

Where token is one of the following values:

gene
MIM entries which describe a gene
phenotype
MIM entries which describe a phenotype
gene+phenotype
MIM entries which describe both a gene and a phenotype

Examples:

DR   MIM; 608463; gene.
DR   MIM; 603813; phenotype.
DR   MIM; 124080; gene+phenotype.
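
A program can use the new token to tell gene entries from phenotype entries. The sketch below is illustrative: the function name and the returned set are assumptions, and lines in the old format (with a dash as token) are rejected.

```python
VALID_TOKENS = {"gene", "phenotype", "gene+phenotype"}


def parse_mim_line(line):
    """Return (mim_identifier, kinds), where kinds is a set such as {"gene"}."""
    prefix = "DR   MIM; "
    if not line.startswith(prefix) or not line.endswith("."):
        raise ValueError("not a DR MIM line: %r" % line)
    mim_id, token = line[len(prefix):-1].split("; ")
    if token not in VALID_TOKENS:
        raise ValueError("unexpected token: %r" % token)
    # "gene+phenotype" describes both, so split it into its two parts.
    return mim_id, set(token.split("+"))


mim_id, kinds = parse_mim_line("DR   MIM; 124080; gene+phenotype.")
```
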
Changes concerning keywords (KW line)

Deleted keywords:

UniProtKB release 6.8 of 10-Jan-2006

Format change in the dbxref.txt document file

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. We have added a new mandatory field, "Cat". This field contains the database category, and will allow us to display cross-references in our entry view in a more user-friendly and explicit manner.

Currently used categories are:

Example:

Abbrev: EcoGene
Name  : Escherichia coli strain K12 genome database
LinkTp: Explicit
Server: http://www.ecogene.org/
Db_URL: www.ecogene.org/geneInfo.php?eg_id=%s
Cat   : Organism-specific gene databases
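
A record such as the one above can be read into a dictionary, and its Db_URL template filled in with an identifier. This is a sketch under assumptions: the function names are illustrative, the "http://" prefix is added because the Db_URL value shown has no scheme, and the identifier EG10011 is a made-up example.

```python
def parse_dbxref_record(text):
    """Parse 'Key   : value' lines of one dbxref.txt record into a dict."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


def build_entry_url(record, identifier):
    """Fill the %s placeholder of Db_URL (the http:// scheme is an assumption)."""
    return "http://" + record["Db_URL"].replace("%s", identifier)


sample = """Abbrev: EcoGene
Name  : Escherichia coli strain K12 genome database
LinkTp: Explicit
Server: http://www.ecogene.org/
Db_URL: www.ecogene.org/geneInfo.php?eg_id=%s
Cat   : Organism-specific gene databases"""
record = parse_dbxref_record(sample)
url = build_entry_url(record, "EG10011")  # hypothetical identifier
```
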
Changes concerning keywords (KW line)

Modified keywords:

Deleted keywords:

UniProtKB release 6.7 of 20-Dec-2005

Changes concerning keywords (KW line)

Modified keyword:

Deleted keywords:

Changes concerning the controlled vocabulary for PTMs

New terms for the feature key 'CROSSLNK':

New terms for the feature key 'MOD_RES':

UniProtKB release 6.5 of 22-Nov-2005

Changes in the keywlist.txt file
The keywlist.txt file describes the keywords that are used in the KW lines of UniProtKB entries. It was changed in the following way:
Changes concerning keywords (KW line)

New keywords:

UniProtKB release 6.4 of 8-Nov-2005

Changes concerning cross-references (DR line)
LinkHub

Cross-references have been added to LinkHub, a database providing links to different genomics and protein resources. LinkHub is available at http://hub.gersteinlab.org/.

The format for the explicit links is:

Data bank identifier LinkHub
Primary identifier UniProtKB primary AC.
Secondary identifier None; a dash '-' is stored in that field.
Example
O00623: DR   LinkHub; O00623; -.
UniProtKB release 6.3 of 25-Oct-2005

Changes concerning keywords (KW line)

New keywords:

Deleted keyword:

UniProtKB release 6.2 of 11-Oct-2005

Definition of a further molecule type in the DR EMBL line

In the cross-references to the EMBL nucleotide sequence database, the term pre-RNA has been added as a valid value for the quaternary qualifier (MOLECULE_TYPE).

The format of the DR EMBL line is:

DR   EMBL; ACCESSION_NUMBER; PROTEIN_ID; STATUS_IDENTIFIER; MOLECULE_TYPE.

The controlled vocabulary of the MOLECULE_TYPE now consists of:

Changes concerning cross-references (DR line)
MAIZE-2DPAGE

Cross-references to the Maize-2DPAGE have been removed.

UniProtKB release 6.1 of 27-Sep-2005

Annotation changes concerning the feature key METAL

The feature key METAL describes the binding of metal ions. Previously, more than one metal ion could be listed in the description field when several ions bind to the same sequence residue. We have now restricted the annotation to one metal ion per FT METAL line. Example:

FT   METAL        61     61       Copper and zinc.

became:

FT   METAL        61     61       Copper.
FT   METAL        61     61       Zinc. 
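
Software that still holds descriptions in the old combined form could split them along the lines of the following sketch. It is an assumption that ions are separated only by commas or the word "and"; the function name is illustrative.

```python
import re


def split_metal_description(description):
    """Split an old-style METAL description into one description per ion."""
    text = description.strip().rstrip(".")
    parts = [p.strip() for p in re.split(r",|\band\b", text) if p.strip()]
    # Capitalise each ion name and restore the trailing period.
    return [p[0].upper() + p[1:] + "." for p in parts]


new_descriptions = split_metal_description("Copper and zinc.")
```
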
UniProtKB release 6.0 of 13-Sep-2005

Changes in the OG (OrGanelle) line

We changed the format of the OG Chloroplast and Cyanelle lines, to be able to indicate more precisely the kind of plastid organelle. So far we defined the following lines:

OG   Plastid.
OG   Plastid; Apicoplast.
OG   Plastid; Chloroplast.
OG   Plastid; Cyanelle.
OG   Plastid; Non-photosynthetic plastid.

The line "OG Plastid" is used when the type of plastid - from which the gene coding for a protein originates - is unknown. This will be the case for most TrEMBL entries.

The line "OG Plastid; Apicoplast" is used for plastid-type organelles from the apicomplexan parasites. These plastids are not photosynthetic, and encode a different suite of proteins than do the plastids of photosynthetic organisms.

The line "OG Plastid; Chloroplast" is used for plastids from all organisms able to perform photosynthesis except the glaucophyte algae (see next).

The line "OG Plastid; Cyanelle" is used for plastids from the glaucophyte algae.

The line "OG Plastid; Non-photosynthetic plastid" is used for plastids from organisms that are non-photosynthetic but not apicomplexan. Examples of such organisms are the land plant Epifagus virginiana, the chlorophyte alga Prototheca wickerhamii and the euglenoid Astasia longa, none of which encode the genes necessary for photosynthesis on their plastid genome.

Changes concerning cross-references (DR line)
TAIR

Cross-references have been added to TAIR, The Arabidopsis Information Resource, which is a model organism database providing a centralized, curated gateway to Arabidopsis biology. TAIR is available at http://arabidopsis.org. Implicit links to this database were already provided in the NiceProt view of relevant Swiss-Prot entries on ExPASy.

The format for the explicit links is:

Data bank identifier TAIR
Primary identifier TAIR's unique locus identifier.
Secondary identifier None; a dash '-' is stored in that field.
Example
P33487: DR   TAIR; At4g02980; -.
Introduction of a new feature identifier

The system of feature identifiers has been expanded. All feature keys concerning protein processing (CHAIN, PEPTIDE, PROPEP) have been tagged with the new feature identifier with the prefix PRO. We now have 4 types of feature identifiers:

Examples:

Q9W568: 
FT   CHAIN        23    611       Halfway protein.
FT                                /FTId=PRO_0000021413.
P15515: 
FT   PEPTIDE      20     57       Histatin 1.
FT                                /FTId=PRO_0000021416.
Q7XAD0: 
FT   PROPEP       25     48
FT                                /FTId=PRO_0000021449.
Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

UniProtKB release 5.7 of 16-Aug-2005

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 5.6 of 02-Aug-2005

Changes concerning keywords (KW line)

Deleted keyword:

UniProtKB release 5.5 of 19-Jul-2005

Obsolete file uniprot_trembl_varsplic.fasta.gz

All UniProtKB/TrEMBL entries with annotated alternative splicing events (KW Alternative splicing) have been moved to UniProtKB/Swiss-Prot. Thus the file uniprot_trembl_varsplic.fasta.gz became obsolete and has been removed from the ftp site.

Please note that UniProtKB/TrEMBL still includes splice isoforms, but each in an individual entry and not merged into one single entry.

Format change in the cross-reference (DR line) to Genew

The format of the UniProtKB cross-reference to the Human Gene Nomenclature Database Genew has changed: the term Genew has been replaced by HGNC, which stands for HUGO Gene Nomenclature Committee.

Example:

DR   Genew; HGNC:12849; YWHAB.

has changed to

DR   HGNC; HGNC:12849; YWHAB.
Changes concerning keywords (KW line)

New keyword:

Deleted keyword:

Changes concerning the controlled vocabulary for PTMs

New terms for the feature key 'CROSSLNK':

New term for the feature key 'MOD_RES':

UniProtKB release 5.4 of 5-Jul-2005

Modified wording of reldate.txt in UniProt Knowledgebase ftp directories

The official names for the manually and automatically annotated sections of the UniProt Knowledgebase are UniProtKB/Swiss-Prot and UniProtKB/TrEMBL.

To reflect this, we have changed the wording of the reldate.txt file in UniProt Knowledgebase ftp directories (example)

from

UniProt Release 5.4 consists of:
Swiss-Prot Release 47.4 of 05-Jul-2005
TrEMBL Release 30.4 of 05-Jul-2005

to

UniProt Knowledgebase Release 5.4 consists of:
UniProtKB/Swiss-Prot Release 47.4 of 05-Jul-2005
UniProtKB/TrEMBL Release 30.4 of 05-Jul-2005
Format change in the dbxref.txt document file

The dbxref.txt file lists the names, abbreviations and URLs of all databases cross-referenced in the UniProt Knowledgebase. We have added a new optional field, "Note". This field will be used, among other things, to list obsolete abbreviations for the cross-referenced databases.

Example:

Abbrev: MGI
Name  : Mouse genome database (MGD) from Mouse Genome Informatics (MGI)
LinkTp: Explicit
Server: http://www.informatics.jax.org/
Db_URL: www.informatics.jax.org/searches/accession_report.cgi?id=%s
Note  : Obsolete abbreviation: MGD
Multiple comment line (CC) topics COFACTOR

From now on, the CC line topic COFACTOR can occur more than once per entry. When an enzyme can bind several cofactors, each of them is indicated in a separate topic.

Example:

CC   -!- COFACTOR: Binds 1 2Fe-2S cluster per subunit (By similarity).
CC   -!- COFACTOR: Binds 1 Fe(2+) ion per subunit (By similarity).
CC   -!- COFACTOR: Binds 5 heme groups covalently per monomer.
CC   -!- COFACTOR: Binds 1 calcium ion per monomer.
Changes concerning keywords (KW line)

New keyword:

Deleted keywords:

Changes concerning the controlled vocabulary for PTMs

New term for the feature key 'CROSSLNK':

UniProtKB release 5.3 of 21-Jun-2005

New OG (OrGanelle) line value: Hydrogenosome

As it was recently found (see Nature 434:74-79(2005); PubMed=15744302) that some anaerobic ciliates such as Nyctotherus ovalis (which thrives in the hindgut of cockroaches!) have retained a rudimentary hydrogenosomal genome, we have added "Hydrogenosome" to the list of valid values in the OG line.

Example Q5DUX5: OG Hydrogenosome.
Changes concerning keywords (KW line)

New keywords:

Changes concerning the controlled vocabulary for PTMs

New terms for the feature key 'MOD_RES':

New terms for the feature key 'LIPID':

UniProtKB release 5.0 of 10-May-2005

Format change in the DR line

The DR (Database cross-Reference) lines are used as pointers to information related to entries and found in data collections other than Swiss-Prot. Until now, the format of a DR line was:

DR   DATABASE_IDENTIFIER; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER[; TERTIARY_IDENTIFIER].

We have introduced a fourth identifier, changing the DR line format to:

DR   DATABASE_IDENTIFIER; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER[; TERTIARY_IDENTIFIER][; QUATERNARY_IDENTIFIER].

The database cross-reference to EMBL is affected by this modification (see below).

Format change in the cross-reference to EMBL

The biological source of the molecule has been added as quaternary identifier to the cross-reference (DR line) of the EMBL database.

Former format:

DR   EMBL; ACCESSION_NUMBER; PROTEIN_ID; STATUS_IDENTIFIER.

New format:

DR   EMBL; ACCESSION_NUMBER; PROTEIN_ID; STATUS_IDENTIFIER; MOLECULE_TYPE.

The molecule type is a controlled vocabulary and currently includes:

Examples:

DR   EMBL; M68939; AAA26107.1; -; Genomic_DNA.
DR   EMBL; U56386; AAB72034.1; -; mRNA.
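
A generic reader for DR lines only has to allow for a variable number of identifiers. The sketch below is illustrative (the function and key names are assumptions) and accepts between one and four identifiers after the database name, which covers the EMBL lines above.

```python
def parse_dr_line(line):
    """Split a DR line into the database name and its list of identifiers."""
    if not line.startswith("DR   ") or not line.endswith("."):
        raise ValueError("not a DR line: %r" % line)
    fields = line[5:-1].split("; ")
    if not 2 <= len(fields) <= 5:
        raise ValueError("unexpected number of fields: %r" % line)
    return {"database": fields[0], "identifiers": fields[1:]}


embl = parse_dr_line("DR   EMBL; M68939; AAA26107.1; -; Genomic_DNA.")
molecule_type = embl["identifiers"][3]  # the quaternary identifier
```
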
Changes concerning cross-references (DR line)
PANTHER

Cross-references have been added to PANTHER, which stands for Protein ANalysis THrough Evolutionary Relationships, a classification system that was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. Proteins have been classified according to families and subfamilies, molecular functions, biological processes and pathways. PANTHER is available at https://panther.appliedbiosystems.com/.

The format for the explicit links is:

Data bank identifier PANTHER
Primary identifier PANTHER's unique identifier for a protein family or sub-family.
Secondary identifier PANTHER's entry name for a protein family or sub-family.
Tertiary identifier Number of domains found, which is generally 1, rarely 2 for the fusion of identical domains/proteins.
Example
O59826:
DR   PANTHER; PTHR11732:SF69; KCNAB_channel; 1.
New feature (FT) keys and redefinition of existing FT keys

The feature keys DOMAIN and SITE were used to describe distinct types of regions in a protein sequence and we found this situation unsatisfactory. We therefore redefined these two feature keys and introduced 5 new ones.

Redefinition of the feature keys DOMAIN and SITE:

Description of the 5 new feature keys:

The introduction of these new feature keys makes it possible to establish a clear sorting order for feature tables. The following order is used:

1. Molecule processing
     * INIT_MET, SIGNAL, PROPEP, TRANSIT, CHAIN, PEPTIDE
2. Regions
     * TOPO_DOM, TRANSMEM
     * DOMAIN, REPEAT
     * CA_BIND, ZN_FING, DNA_BIND, NP_BIND
     * REGION
     * COILED
     * MOTIF
     * COMPBIAS
3. Sites
     * ACT_SITE
     * METAL
     * BINDING
     * SITE
4. Amino acid modifications (pre and PTM)
     * SE_CYS
     * MOD_RES
     * LIPID
     * CARBOHYD
     * DISULFID
     * CROSSLNK
5. Natural variations
     * VARSPLIC
     * VARIANT
6. Experimental info
     * MUTAGEN
     * UNSURE
     * CONFLICT
     * NON_CONS
     * NON_TER
7. Secondary structure
      * HELIX, TURN, STRAND

Keys of equal priority (listed on one line above) are ordered according to sequence positions.
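
The sorting order can be expressed as a lookup table of priority groups, as in this sketch. Features are represented here as (key, start, end) tuples, which is an illustrative choice; breaking remaining ties on the end position is likewise an assumption, since the text only says that keys of equal priority are ordered by sequence position.

```python
# One tuple per line of the list above; keys in the same tuple share a priority.
SORT_ORDER = [
    ("INIT_MET", "SIGNAL", "PROPEP", "TRANSIT", "CHAIN", "PEPTIDE"),
    ("TOPO_DOM", "TRANSMEM"),
    ("DOMAIN", "REPEAT"),
    ("CA_BIND", "ZN_FING", "DNA_BIND", "NP_BIND"),
    ("REGION",), ("COILED",), ("MOTIF",), ("COMPBIAS",),
    ("ACT_SITE",), ("METAL",), ("BINDING",), ("SITE",),
    ("SE_CYS",), ("MOD_RES",), ("LIPID",), ("CARBOHYD",),
    ("DISULFID",), ("CROSSLNK",),
    ("VARSPLIC",), ("VARIANT",),
    ("MUTAGEN",), ("UNSURE",), ("CONFLICT",), ("NON_CONS",), ("NON_TER",),
    ("HELIX", "TURN", "STRAND"),
]
PRIORITY = {key: rank for rank, group in enumerate(SORT_ORDER) for key in group}


def sort_features(features):
    """Sort (key, start, end) tuples by priority group, then by position."""
    return sorted(features, key=lambda f: (PRIORITY[f[0]], f[1], f[2]))


ordered = sort_features([
    ("VARIANT", 10, 10),
    ("CHAIN", 23, 611),
    ("SIGNAL", 1, 22),
    ("METAL", 61, 61),
    ("TRANSMEM", 100, 120),
    ("TOPO_DOM", 23, 99),
])
```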

UniProtKB release 4.6 of 26-Apr-2005

New Swiss-Prot document: pathway.txt

The new Swiss-Prot document pathway.txt includes an index of CC PATHWAY lines. For each step of an annotated pathway, a list of Swiss-Prot entries is given that are annotated to participate in that pathway.

A text-only version of this index can be downloaded by ftp.

List of all Swiss-Prot documents.

Changes concerning cross-references (DR line)
Name change of the cross-referenced database MGI (former MGD)

Mouse Genome Informatics have asked us to use the acronym MGI in our cross-references to the Mouse Genome Database, which we used to refer to as "MGD". We changed the database name in the relevant cross-references (DR lines) accordingly.

Example:

AC   P07724;
DR   MGI; MGI:87991; Alb1.

The Index of MGD entries referenced in Swiss-Prot (mgdtosp.txt) keeps its name, and so does the "special selections file" (mgd.seq.gz) containing all entries with "DR MGI" lines.
Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keyword:

UniProtKB release 4.5 of 12-Apr-2005

Changes concerning cross-references (DR line)
SMR

Cross-references have been added to The SWISS-MODEL Repository, which is a database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL. The repository is developed at the Biozentrum Basel within the Swiss Institute of Bioinformatics and available at http://swissmodel.expasy.org/repository.

The format for the explicit links is:

Data bank identifier SMR
Primary identifier SWISS-MODEL's unique identifier for a protein, which is identical to the UniProtKB primary accession number of that protein.
Secondary identifier Range(s) covered by the structural model.
Example
P11416:
DR   SMR; P11416; 87-161, 182-416.
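
The secondary identifier can be turned into numeric ranges, for instance to compute how many residues the model covers. A minimal sketch, with illustrative function names:

```python
def parse_smr_ranges(field):
    """Turn '87-161, 182-416' into [(87, 161), (182, 416)]."""
    ranges = []
    for part in field.rstrip(".").split(","):
        start, end = part.strip().split("-")
        ranges.append((int(start), int(end)))
    return ranges


def covered_residues(ranges):
    """Count residues covered by inclusive, non-overlapping ranges."""
    return sum(end - start + 1 for start, end in ranges)


ranges = parse_smr_ranges("87-161, 182-416")
n_covered = covered_residues(ranges)
```
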
UniProtKB release 4.4 of 29-Mar-2005

New Swiss-Prot document: humsavar.txt

The new Swiss-Prot document humsavar.txt includes an index of sequence variation in human proteins. For each variant annotated in the feature table (FT) of the Swiss-Prot entry of a human protein, the following information is indicated:

A text-only version of this index can be downloaded by ftp.

List of all Swiss-Prot documents.

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 4.3 of 15-Mar-2005

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 4.2 of 1-Mar-2005

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 4.1 of 15-Feb-2005

Changes concerning cross-references (DR line)
GeneDB_Spombe

We changed the Data bank identifier for the Schizosaccharomyces pombe GeneDB Prototype from GeneDB_SPombe to GeneDB_Spombe.

LegioList

Cross-references have been added to LegioList, a database which provides a complete dataset of DNA and protein sequences derived from L. pneumophila strain Paris and strain Lens, linked to the relevant annotations and functional assignments. LegioList is available at http://genolist.pasteur.fr/LegioList/.

The format for the explicit links is:

Data bank identifier LegioList
Primary identifier Ordered locus name.
Secondary identifier None; a dash '-' is stored in that field.
Example
Q5X2T6:
DR   LegioList; lpp2301; -.
UniProtKB release 4.0 of 1-Feb-2005

Modification to the ftp server directory structure

In order to provide access to the last major Swiss-Prot and TrEMBL releases (as opposed to the biweekly releases) via the UniProt ftp servers, ftp.uniprot.org/databases/uniprot, ftp.expasy.org/databases/uniprot and ftp.ebi.ac.uk/databases/uniprot, we changed the directory structure of our ftp sites.

In addition to the possibility of downloading the complete databases, we provide the data in the form of taxonomic divisions for archaea, bacteria, fungi, human, invertebrates, mammals, plants, rodents, vertebrates, viruses and unclassified.

The new structure will be:

/databases/uniprot
     /current_release
          /knowledgebase
              /complete
                 uniprot_sprot.dat.gz
                 uniprot_sprot.fasta.gz
                 uniprot_sprot.xml.gz
                 uniprot_trembl.dat.gz
                 uniprot_trembl.fasta.gz
                 uniprot_trembl.xml.gz
                 etc
              /taxonomic_divisions
                 uniprot_sprot_archaea.dat.gz
                 uniprot_trembl_archaea.dat.gz
                 uniprot_sprot_bacteria.dat.gz
                 uniprot_trembl_bacteria.dat.gz
                 etc
          /uniref 
     /previous_major_releases
          /release1.0
              /knowledgebase
              /uniref
          /release2.0
              /knowledgebase
              /uniref
          etc
Symbolic links will be established for the following existing directories:
/databases/uniprot/knowledgebase to
                /databases/uniprot/current_release/knowledgebase
/databases/uniprot/uniref        to
                /databases/uniprot/current_release/uniref

On ftp.expasy.org and ftp.ebi.ac.uk:
/databases/swiss-prot/release_compressed to
                /databases/uniprot/previous_major_releases/releaseX.0/knowledgebase
/databases/trembl/release_compressed     to
                /databases/uniprot/previous_major_releases/releaseX.0/knowledgebase

The directory on ExPASy that used to contain uncompressed Swiss-Prot releases, /databases/swiss-prot/release/, will be removed.

Please note that if you are interested in complete proteome sets, you can download:

Extension of the TrEMBL entry name format

Previously, TrEMBL used the accession number as the entry name. With this release, TrEMBL entry names are composed of the accession number and an organism identification code (O95417_HUMAN, Q9VVG0_DROME, P71025_BACSU, Q9SR52_ARATH, etc.). The speclist.txt file lists the organism identification codes that are used to build the "organism" part of an entry name in Swiss-Prot. This file has been extended to include codes to be used in TrEMBL. As it is not possible to manually assign organism codes to all species represented in TrEMBL within a reasonable time frame, it was decided to define "virtual" codes that group organisms at a certain taxonomic level. Such codes are prefixed by the number "9" and generally correspond to a "pool" of organisms, which can be as wide as a kingdom. Here are some examples of such codes:

9BACT B      2: N=Bacteria
9CNID E   6073: N=Cnidaria
9FUNG E   4751: N=Fungi
9REOV V  10880: N=Reoviridae
9TETR E  32523: N=Tetrapoda
9VIRI E  33090: N=Viridiplantae

TrEMBL entries are widely used for sequence analysis such as similarity search, multiple sequence alignments or phylogenetic analysis. The extension of the entry name will simplify the species identification in the analysis results.
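
As a minimal sketch of how an analysis script might exploit the extended entry names, the Python snippet below splits an entry name at the underscore and flags "virtual" organism codes by their leading "9". The function name split_entry_name is hypothetical and is not part of any UniProt tool; the entry name Q12345_9BACT is made up for illustration.

```python
def split_entry_name(entry_name):
    """Split an entry name such as 'O95417_HUMAN' into its parts.

    Returns (prefix, organism_code, is_virtual). is_virtual is True when the
    organism code starts with '9', i.e. when it stands for a pool of
    organisms rather than for a single species.
    """
    prefix, separator, organism_code = entry_name.partition("_")
    if not separator or not prefix or not organism_code:
        raise ValueError(f"not a valid entry name: {entry_name!r}")
    return prefix, organism_code, organism_code.startswith("9")


print(split_entry_name("O95417_HUMAN"))
print(split_entry_name("Q12345_9BACT"))  # hypothetical name with a virtual code
```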

Change of the entry name in many Swiss-Prot entries

In the last release we introduced to Swiss-Prot the first entry with the new format of the entry name. With this release, many entry names have changed to the new format.

New comment line (CC) topic: INTERACTION

The CC line topic INTERACTION is used to convey information relevant to binary protein-protein interactions. It is automatically derived from the IntAct database and is updated on a monthly basis. There is one INTERACTION topic per entry, with each binary interaction presented on a separate line. Each data line can be longer than 75 characters.

Interactions can be derived by any appropriate experimental method, but an interaction resulting from a single yeast two-hybrid experiment must be confirmed by a second experiment. Interactions from large-scale experiments are included only if the authors assign them a high confidence.

The format of the CC line topic INTERACTION is:


CC   -!- INTERACTION:
CC       {{SP_Ac:identifier[ (xeno)]}|Self}; NbExp=n; IntAct=IntAct_Protein_Ac, IntAct_Protein_Ac;

where

SP_Ac is the Swiss-Prot or TrEMBL accession number of the interacting protein. If appropriate, the IsoId is used instead to specify the relevant interacting protein isoform.
identifier serves to describe the interacting protein. It is derived from the Swiss-Prot or TrEMBL GN line and thus presents either a "gene name", an "ordered locus name" or an "ORF name". When no GN line is available, a dash is indicated instead.
(xeno) is an optional qualifier indicating that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.
Self reflects a self-association; the corresponding current entry's SP_Ac and 'identifier' are not given/repeated.
NbExp=n refers to the number of experiments in IntAct supporting the interaction.
IntAct_Protein_Ac is the IntAct accession number of an interacting protein. The first IntAct_Protein_Ac refers to the protein or an isoform of the current entry, the second refers to the interacting protein or isoform.

Within the CC INTERACTION topic, homomeric interactions are listed before heteromeric interactions; the latter are sorted alphanumerically by 'identifier'.

"IntAct=IntAct_Protein_Ac, IntAct_Protein_Ac" identifies the interaction in IntAct by using the two IntAct protein identifiers.

Examples of interaction lines are given below. The CC INTERACTION topics are not complete; only explained interaction lines are indicated.

CC   -!- INTERACTION:
CC       P11450:fcp3c; NbExp=1; IntAct=EBI-126914, EBI-159556;

In this typical example, the current protein interacts with P11450, which is further characterized by "fcp3c", derived from the gene name "Fcp3C" in its GN line. The interaction is supported by one experiment stored in IntAct. Experimental details for this interaction can be found by querying IntAct with "EBI-126914, EBI-159556".

CC   -!- INTERACTION:
CC       Q9W1K5-1:cg11299; NbExp=1; IntAct=EBI-133844, EBI-212772;
CC       ...

The current protein interacts with an isoform of Q9W1K5 defined by the IsoID Q9W1K5-1.

CC   -!- INTERACTION:
CC       Q8NI08:-; NbExp=1; IntAct=EBI-80809, EBI-80799;

No gene name information for the interacting protein is available.

CC   -!- INTERACTION:
CC       Self; NbExp=1; IntAct=EBI-123485, EBI-123485;

The protein self-associates.

CC   -!- INTERACTION:
CC       Q8C1S0:2410018m14rik (xeno); NbExp=1; IntAct=EBI-394562, EBI-398761;

The source organisms of the interacting proteins are different.

CC   -!- INTERACTION:
CC       P51617:irak1; NbExp=1; IntAct=EBI-448466, EBI-358664;
CC       P51617:irak1; NbExp=1; IntAct=EBI-448472, EBI-358664;

Different isoforms of the current protein are shown to interact with the same protein (P51617). This is reflected by different IntAct_Protein_Acs for the current protein.

Example entry with many interaction lines: Q02821.
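
The interaction line format described above can be read with a single regular expression. The following Python sketch is one possible approach, not an official parser: the function name parse_interaction_line and the returned dictionary keys are hypothetical, and the pattern assumes that identifiers contain neither spaces nor semicolons.

```python
import re

# One data line of the CC INTERACTION topic, following the format given above.
INTERACTION_RE = re.compile(
    r"^CC\s+(?:(?P<self>Self)|(?P<ac>[^:;\s]+):(?P<identifier>[^;\s]+)(?P<xeno> \(xeno\))?)"
    r"; NbExp=(?P<nbexp>\d+); IntAct=(?P<a>EBI-\d+), (?P<b>EBI-\d+);\s*$"
)


def parse_interaction_line(line):
    """Return a dict describing one interaction line, or None for other lines."""
    match = INTERACTION_RE.match(line)
    if match is None:
        return None  # topic header, 'CC       ...' placeholder, unrelated line
    return {
        "self": match.group("self") is not None,
        "accession": match.group("ac"),          # None for self-association
        "identifier": match.group("identifier"),  # '-' when no GN line exists
        "xeno": match.group("xeno") is not None,
        "nb_exp": int(match.group("nbexp")),
        "intact": (match.group("a"), match.group("b")),
    }


example = "CC       Q8C1S0:2410018m14rik (xeno); NbExp=1; IntAct=EBI-394562, EBI-398761;"
print(parse_interaction_line(example))
```

The topic header line (CC   -!- INTERACTION:) does not match the pattern, so a caller can feed every CC line of the block to the function and discard the None results.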

New Swiss-Prot document: similar.txt

There is a new Swiss-Prot document, similar.txt (Index of CC SIMILARITY lines). This index lists all names of families and domains occurring in CC SIMILARITY lines of Swiss-Prot entries. A text-only version of this index can be downloaded by ftp.

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 3.5 of 4-Jan-2005

Extension of the Swiss-Prot entry name format

We endeavor to assign meaningful entry names that facilitate the identification of the proteins and the species of origin. Swiss-Prot uses a general purpose naming convention that can be symbolized as X_Y, where X is a mnemonic code of alphanumeric characters representing the protein name, the '_' sign serves as a separator, and the Y is a mnemonic species identification code of at most 5 alphanumeric characters representing the biological source of the protein.

The entry name used to consist of up to ten uppercase alphanumeric characters. We have now extended the mnemonic code for the protein name from at most 4 characters to at most 5 characters, so entry names can from now on consist of up to 11 characters.
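
Programs that validate entry names may need to accept the longer protein mnemonic. The Python sketch below illustrates the X_Y convention with two regular expressions, one for the former limit and one for the extended limit. The function name classify_entry_name and the example name ABCD_HUMAN are hypothetical; the patterns assume uppercase letters and digits only, as stated above.

```python
import re

# X_Y with a protein mnemonic of at most 4 characters (former format).
OLD_FORMAT = re.compile(r"[A-Z0-9]{1,4}_[A-Z0-9]{1,5}")
# X_Y with a protein mnemonic of at most 5 characters (extended format).
NEW_FORMAT = re.compile(r"[A-Z0-9]{1,5}_[A-Z0-9]{1,5}")


def classify_entry_name(name):
    """Return 'old', 'new' or 'invalid' for a Swiss-Prot entry name."""
    if OLD_FORMAT.fullmatch(name):
        return "old"       # also valid under the extended format
    if NEW_FORMAT.fullmatch(name):
        return "new"       # only valid under the extended format
    return "invalid"


for name in ("ABCD_HUMAN", "TINA1_DROME", "TOOLONG_HUMAN"):
    print(name, classify_entry_name(name))
```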

As this modification might have an impact on many programs, we introduced in this release only one Swiss-Prot entry with an entry name in the new format: TINA1_DROME (Q9W0Y1). With UniProtKB release 4.0 at the beginning of February, we will change the entry names of many Swiss-Prot entries.

We strongly advise users to cite Swiss-Prot entries by their unique and stable identifier, which is the first (primary) accession number of an entry. Occasionally, entries are referred to only by their entry name. As we will soon change the entry names of thousands of entries, we provide the tool IDtracker, which allows users of the Swiss-Prot protein knowledgebase to trace the identifiers (ID) of protein entries.

Changes concerning cross-references (DR line)
Ensembl

Cross-references have been added to the Ensembl database, a bioinformatics project that organizes biological information around the sequences of large genomes. Ensembl is available at http://www.ensembl.org.

The format for the explicit links is:

Data bank identifier Ensembl
Primary identifier Ensembl's unique identifier for a gene.
Secondary identifier Species name.
Example
O43462: 
DR   Ensembl; ENSG00000012174; Homo sapiens.
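
Explicit links of this kind share a common layout: the data bank identifier, the primary identifier and any further identifiers are separated by semicolons, and the line ends with a period. A minimal Python sketch for splitting such a line follows; the function name parse_dr_line is hypothetical, and the code assumes that no identifier itself contains a semicolon.

```python
def parse_dr_line(line):
    """Split a DR line into (database, primary_id, further_ids)."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 3:
        raise ValueError(f"expected at least three fields: {line!r}")
    return fields[0], fields[1], fields[2:]


print(parse_dr_line("DR   Ensembl; ENSG00000012174; Homo sapiens."))
```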
UniProtKB release 3.4 of 21-Dec-2004

Changes in the RP (Reference Position) line

We changed the following items of the RP line:

Example:

RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1), PROTEIN SEQUENCE 
RP   OF 108-131; 220-231 AND 349-393, CHARACTERIZATION, AND MUTAGENESIS OF 
RP   ARG-336.

If 2 qualifiers apply, both are indicated, separated by a '/'.

Example:

RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA / MRNA].
New comment line (CC) topic: BIOPHYSICOCHEMICAL PROPERTIES

A new comment line (CC) topic has been introduced: BIOPHYSICOCHEMICAL PROPERTIES. This topic conveys biophysical and physicochemical data, including information on pH dependence, temperature dependence, kinetic parameters, redox potentials and maximal absorption.

The format of this comment block is:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Absorption:
CC         Abs(max)=xx nm;
CC         Note=free_text;
CC       Kinetic parameters:
CC         KM=xx unit for substrate [(free_text)];
CC         Vmax=xx unit enzyme [free_text];
CC         Note=free_text;
CC       pH dependence:
CC         free_text;
CC       Redox potential:
CC         free_text;
CC       Temperature dependence:
CC         free_text;

A BIOPHYSICOCHEMICAL PROPERTIES block must contain at least one of the properties Absorption, Kinetic parameters, pH dependence, Redox potential, Temperature dependence and may have any combination of these properties (ordered as indicated above). The meaning of these subtopics is as follows:

Property Description
Absorption indicates the wavelength at which photoreactive proteins such as opsins and DNA photolyases show maximal absorption
Kinetic parameters mentions the Michaelis-Menten constant (KM) and maximal velocity (Vmax) of enzymes
pH dependence describes the optimum pH for enzyme activity and/or the variation of enzyme activity with pH variation
Redox potential reports the value of the standard (midpoint) oxido-reduction potential(s) for electron transport proteins
Temperature dependence indicates the optimum temperature for enzyme activity and/or the variation of enzyme activity with temperature variation; the thermostability/thermolability of the enzyme is also mentioned when it is known

Examples:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Absorption:
CC         Abs(max)=395 nm;
CC         Note=Exhibits a smaller absorbance peak at 470 nm. The
CC         fluorescence emission spectrum peaks at 509 nm with a shoulder
CC         at 540 nm;

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Kinetic parameters:
CC         KM=62 mM for glucose;
CC         KM=90 mM for maltose;
CC         Vmax=0.20 mmol/min/mg enzyme with glucose as substrate;
CC         Vmax=0.11 mmol/min/mg enzyme with maltose as substrate;
CC         Note=Acetylates glucose, maltose, mannose, galactose, and
CC         fructose with a decreasing relative rate of 1, 0.55, 0.20, 0.07,
CC         0.04; 

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Kinetic parameters:
CC         KM=1.76 uM for chlorophyll;
CC       pH dependence:
CC         Optimum pH is 7.5. Active from pH 5.0 to 9.0;
CC       Temperature dependence:
CC         Optimum temperature is 45 degrees Celsius. Active from 30 to 60
CC         degrees Celsius;
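
The nested layout of this comment block (subtopic lines indented by 7 spaces after "CC", content lines by 9 spaces) lends itself to a small indentation-based reader. The Python sketch below is an illustration under stated assumptions only: the function name parse_biophys_block is hypothetical, and the code assumes that a semicolon always ends a statement, which may not hold if free text itself contains semicolons.

```python
def parse_biophys_block(lines):
    """Group the statements of a BIOPHYSICOCHEMICAL PROPERTIES block by subtopic."""
    chunks = {}
    current = None
    for line in lines:
        if not line.startswith("CC"):
            continue
        rest = line[2:]
        indent = len(rest) - len(rest.lstrip(" "))
        text = rest.strip()
        if not text or text.startswith("-!-"):
            continue  # skip empty lines and the topic header
        if indent <= 7 and text.endswith(":"):
            current = text[:-1]  # subtopic such as 'Kinetic parameters'
            chunks[current] = []
        elif current is not None:
            chunks[current].append(text)  # content or continuation line
    # Re-join wrapped lines, then split into ';'-terminated statements.
    return {
        name: [s.strip() for s in " ".join(parts).split(";") if s.strip()]
        for name, parts in chunks.items()
    }


block = [
    "CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:",
    "CC       Kinetic parameters:",
    "CC         KM=1.76 uM for chlorophyll;",
    "CC       pH dependence:",
    "CC         Optimum pH is 7.5. Active from pH 5.0 to 9.0;",
]
print(parse_biophys_block(block))
```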
Changes concerning keywords (KW line)

New keywords:

Modified keywords:

Deleted keywords:

Changes concerning the controlled vocabulary for PTMs

New terms for the feature key 'CROSSLNK':

UniProtKB release 3.3 of 7-Dec-2004

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 3.2 of 23-Nov-2004

Removal of the file submit.txt

The file submit.txt is no longer distributed. Information on how to submit sequence data, updates or corrections can be found in the user manual.

UniProtKB release 3.1 of 09-Nov-2004

Conversion of Swiss-Prot to mixed-case characters

The conversion of Swiss-Prot entries from all UPPER CASE to MiXeD CaSe is now completed. This modification does not apply to the following line types:

Changes concerning keywords (KW line)

New keywords:

UniProtKB release 3.0 of 25-Oct-2004

Changes concerning keywords (KW line)

New keyword:

UniProtKB release notes: relnotes.html

With release 3.0 we introduce the UniProtKB release notes relnotes.html, which replace the Swiss-Prot release notes (rnote_sp.html) and TrEMBL release notes (rnote_tr.html). The UniProtKB release notes include the release statistics of both databases, the status of the model organisms and various other useful information. They can be downloaded from ftp://ftp.expasy.org/databases/uniprot/knowledgebase/docs/.

UniProtKB release 2.7 of 11-Oct-2004

Changes concerning cross-references (DR line)
H-InvDB

Cross-references have been added to the human gene database H-Invitational Database (H-InvDB), which provides information on annotated full-length cDNA clones available from six high throughput cDNA sequencing projects. The H-Invitational Database is available at http://www.h-invitational.jp/.

The format for the explicit links is:

Data bank identifier H-InvDB
Primary identifier H-InvDB's unique identifier for a cDNA cluster.
Secondary identifier None; a dash '-' is stored in that field.
Example
P78314:
DR   H-InvDB; HIX0004037; -.
WormBase

We have added cross-references to WormBase, which provides information concerning the genetics, genomics and biology of C. elegans and some related nematodes. WormBase is available at http://www.wormbase.org/.

The identifiers of the appropriate DR line are:

Data bank identifier WormBase
Primary identifier WormBase's unique identifier for a gene.
Secondary identifier Gene designation.
Example
DR   WormBase; WBGene00006806; unc-74.
Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

UniProtKB release 2.6 of 27-Sep-2004

Changes concerning keywords (KW line)

Deleted keywords:

UniProtKB release 2.5 of 13-Sep-2004

Changes concerning keywords (KW line)

Deleted keyword:

Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 2.4 of 31-Aug-2004

Release notes: rnote_sp.html & rnote_tr.html

The TrEMBL release notes (rnote_tr.html) were added to the documents distributed with the UniProtKB release. The name of the Swiss-Prot release notes changed from relnotes.html to rnote_sp.html accordingly. These documents can all be downloaded from ftp://ftp.expasy.org/databases/uniprot/knowledgebase/docs/.

UniProtKB release 2.3 of 16-Aug-2004

New RL line structure for electronic publications

Electronic publications have been indicated in the RL line with the '(er)' prefix that stands for electronic resource:

RL   (er) Free text.

Example:

RL   (er) Plant Gene Register PGR98-023.
Removal of the submission references to HIV data bank

We replaced all submission references to the HIV data bank by publications, thus RL lines of the type:

RL   Submitted (XXX-YYYY) to the HIV data bank.

no longer exist in Swiss-Prot.

Format change in the cross-reference to PDB

The structure determination method (X-ray, NMR, etc) as well as the mapping of the extent of the cross-reference on the sequence have been introduced as tertiary identifier to the PDB cross-reference line.

Former format:

DR   PDB; ENTRY_NAME; REVISION_DATE.

New format:

DR   PDB; ENTRY_NAME; Method; CHAIN[S]=RANGE.

The methods are controlled vocabulary and currently include:

Example:

DR   PDB; 1NB3; X-ray; A/B/C/D=116-335, P/R/S/T=98-105.

The tertiary identifier indicates the chain(s) and the corresponding sequence range for which the structure has been determined. If the range is unknown, a dash is given rather than the range positions. Example:

DR   PDB; 1IYJ; X-ray; B/D=-.

If both the chains and the range are unknown, a single dash is used. Example:

DR   PDB; 1N12; X-ray; -.

With the introduction of the new format, DR PDB lines can become longer than 75 characters.
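
To illustrate how the new format might be consumed, the Python sketch below reads a DR PDB line and covers the three cases shown above: known ranges, an unknown range for known chains, and unknown chains and range. The function name parse_pdb_dr and the returned structure are hypothetical choices made for this example.

```python
def parse_pdb_dr(line):
    """Parse 'DR   PDB; ENTRY_NAME; Method; CHAIN[S]=RANGE.' (new format)."""
    if not line.startswith("DR   PDB;"):
        raise ValueError(f"not a DR PDB line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    _, entry, method, mapping = [f.strip() for f in body.split(";", 3)]
    segments = []
    if mapping != "-":  # a lone dash means chains and range are unknown
        for part in mapping.split(","):
            chains, _, positions = part.strip().partition("=")
            if positions == "-":
                span = None  # chains known, range unknown
            else:
                start, _, end = positions.partition("-")
                span = (int(start), int(end))
            segments.append((chains.split("/"), span))
    return {"entry": entry, "method": method, "segments": segments}


print(parse_pdb_dr("DR   PDB; 1NB3; X-ray; A/B/C/D=116-335, P/R/S/T=98-105."))
```

A line in the former three-field format makes the unpacking step fail with a ValueError, which a caller can use to detect files that have not yet been converted.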

UniProtKB release 2.2 of 30-Jul-2004

Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 2.1 of 19-Jul-2004

Change in the keyword line (KW)

Keywords are now stored in alphabetical order on the KW lines of both Swiss-Prot and TrEMBL entries.

Format change in the comment line (CC) topic: MASS SPECTROMETRY

We have slightly changed the format for the comment line topic MASS SPECTROMETRY, which reports the exact molecular weight of a protein or part of a protein as determined by mass spectrometric methods. The modifications concern the topic RANGE, which has become mandatory, and the introduction of the new mandatory topic NOTE, which is used to indicate the relevant reference number.

New format:

CC   -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX]; RANGE=XX-XX[ (Name)]; NOTE={Free text (Ref.n)|Ref.n}.

Where:

Example:

CC   -!- MASS SPECTROMETRY: MW=32875.93; METHOD=MALDI;
CC       RANGE=1-284 (Isoform 3); NOTE=Ref.6.
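
Because the topic may wrap over several CC lines, a reader has to join the lines before splitting the fields. The following Python sketch shows one way to do so; the function name parse_mass_spectrometry is hypothetical, and the code assumes that the free text of NOTE contains no semicolon.

```python
def parse_mass_spectrometry(lines):
    """Return the fields of a MASS SPECTROMETRY topic as a dict of strings."""
    text = " ".join(line[2:].strip() for line in lines)  # undo line wrapping
    prefix = "-!- MASS SPECTROMETRY:"
    if not text.startswith(prefix):
        raise ValueError("not a MASS SPECTROMETRY topic")
    text = text[len(prefix):].strip()
    if text.endswith("."):
        text = text[:-1]  # drop the terminating period
    fields = {}
    for item in text.split(";"):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return fields


topic = [
    "CC   -!- MASS SPECTROMETRY: MW=32875.93; METHOD=MALDI;",
    "CC       RANGE=1-284 (Isoform 3); NOTE=Ref.6.",
]
print(parse_mass_spectrometry(topic))
```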
Changes concerning cross-references (DR line)
AGD

We have added cross-references to Ashbya genome database, available at http://agd.unibas.ch/.

The identifiers of the appropriate DR line are:

Data bank identifier AGD
Primary identifier AGD's unique identifier for a gene. This is generally the OLN (Ordered Locus Name) for that gene (eg: AAR059C), except for mitochondrial genes where AGD uses an identifier based on the gene name (eg: AgCOB1).
Secondary identifier None; a dash '-' is stored in that field.
Example
Q00063:
DR   AGD; AAR059C; -.
Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 2.0 of 05-Jul-2004

Incorporation of new entries into the biweekly UniProtKB releases of Swiss-Prot and TrEMBL

The files provided in the ftp directory /databases/uniprot/knowledgebase/new/ (and known as TrEMBL_New) have been removed. These files contained new sequence entries and sequences to be used to update existing Swiss-Prot or TrEMBL entries. Until now, these entries were integrated into Swiss-Prot and TrEMBL mostly only at full releases of these databases. We now incorporate these new and updated sequences into the biweekly UniProtKB releases of Swiss-Prot and TrEMBL (/databases/uniprot/knowledgebase/uniprot_sprot* and /databases/uniprot/knowledgebase/uniprot_trembl*), and, therefore, the distribution of these files is no longer necessary.

New format for the GN (Gene Name) line

We have introduced a new format for the GN (Gene Name) line and all gene names have been converted to mixed case. The new format is more structured than the previous one, in order to distinguish between three types of information:

  1. Gene names (a.k.a. gene symbols). The name(s) used to represent a gene. As more than one name can be assigned to a gene, we make a distinction between the one that we believe should be used as the official gene name and the other names, which are listed as "Synonyms".
  2. Ordered locus names (a.k.a. OLN, ORF numbers, CDS numbers or Gene numbers). A name used to represent an ORF in a completely sequenced genome or chromosome. It is generally based on a prefix representing the organism and a number which usually represents the sequential ordering of genes on the chromosome. Depending on the genome sequencing center, numbers are attributed only to protein-coding genes, or also to pseudogenes, or also to tRNAs and other features. Examples: HI0934, Rv3245c, At5g34500, YER456W.
  3. ORF names (a.k.a. Sequencing names or Contig names or Temporary ORFNames). A name temporarily attributed by a sequencing project to an open reading frame. This name is generally based on a cosmid numbering system. Examples: MtCY277.28c, SYGP-ORF50, SpBC2F12.04, C06E1.1, CG10954.

The new format of the GN line is:

GN   Name=<name>; Synonyms=<name1>[, <name2>...]; OrderedLocusNames=<name1>[, <name2>...];
GN   ORFNames=<name1>[, <name2>...];

None of the above four tokens is mandatory, but a "Synonyms" token can only be present if there is a "Name" token.

If there is more than one gene, GN line blocks for the different genes are separated by the following line:

GN   and

Wrapping is done preferentially at a semicolon, otherwise at a comma.

Examples:

GN   Name=atpG; Synonyms=uncG, papC;
GN   OrderedLocusNames=b3733, c4659, z5231, ECs4675, SF3813, S3955;

GN   ORFNames=SPAC1834.11c;

GN   Name=cysA1; Synonyms=cysA; OrderedLocusNames=Rv3117, MT3199;
GN   ORFNames=MTCY164.27;
GN   and
GN   Name=cysA2; OrderedLocusNames=Rv0815c, MT0837; ORFNames=MTV043.07c;
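
The structured GN format can be read with a few lines of code. The Python sketch below handles wrapping at semicolons or commas and the "GN   and" separator between genes; the function name parse_gn_lines and the shape of the result (one dict per gene) are hypothetical choices for this illustration.

```python
def parse_gn_lines(lines):
    """Parse the GN lines of one entry into a list of dicts, one per gene."""
    blocks = [[]]
    for line in lines:
        content = line[2:].strip()
        if content == "and":
            blocks.append([])  # 'GN   and' starts the next gene
        else:
            blocks[-1].append(content)
    genes = []
    for block in blocks:
        gene = {}
        # Joining first means a line wrapped at a comma is handled as well.
        for token in " ".join(block).split(";"):
            key, separator, value = token.strip().partition("=")
            if not separator:
                continue  # empty remainder after the last semicolon
            names = [name.strip() for name in value.split(",") if name.strip()]
            gene[key] = names[0] if key == "Name" else names
        genes.append(gene)
    return genes


entry = [
    "GN   Name=cysA1; Synonyms=cysA; OrderedLocusNames=Rv3117, MT3199;",
    "GN   ORFNames=MTCY164.27;",
    "GN   and",
    "GN   Name=cysA2; OrderedLocusNames=Rv0815c, MT0837; ORFNames=MTV043.07c;",
]
for gene in parse_gn_lines(entry):
    print(gene)
```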
Changes concerning cross-references (DR line)
IntAct

We have added cross-references to IntAct, the Protein interaction database and analysis system available at http://www.ebi.ac.uk/intact/.

The identifiers of the appropriate DR line are:

Data bank identifier IntAct
Primary identifier The Swiss-Prot primary AC number for the protein. This is used by IntAct as a link to all the interactions in which that protein is involved.
Secondary identifier None; a dash '-' is stored in that field.
Example
P14653:
DR   IntAct; P14653; -.
Name change of the cross-referenced database Reactome (former GK)

The Genome Knowledgebase (GK) was renamed to Reactome. We changed the database name in the relevant cross-references (DR lines) accordingly.

Example:

DR   Reactome; Q9BZJ0; -.
Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 1.12 of 21-Jun-2004

Digital Object Identifier (DOI) in the RX line

The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment. We introduced the new optional identifier "DOI" to the RX line. It is used to store the Digital Object Identifier of a cited document. The format for this RX line topic is:

DOI=Digital_object_identifier;

The order of the optional topics in an RX line is:

RX   [MEDLINE=Medline_identifier; ][PubMed=Pubmed_identifier; ][DOI=Digital_object_identifier;]

Example:

RX   MEDLINE=97291283; PubMed=9145897; DOI=10.1007/s00248-002-2038-4;

Note: The length of a DOI is not restricted. If the topic DOI does not fit into an RX line that already contains a topic, a further RX line will be created, which may be longer than 76 characters.
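
As an illustration of the RX line layout, the Python sketch below collects the optional topics of one RX line into a dictionary. The function name parse_rx_line is hypothetical, and the code assumes that the DOI itself contains no semicolon, which the DOI syntax does not strictly guarantee.

```python
def parse_rx_line(line):
    """Return the topics of one RX line, e.g. {'PubMed': '9145897', ...}."""
    if not line.startswith("RX"):
        raise ValueError(f"not an RX line: {line!r}")
    topics = {}
    for token in line[2:].split(";"):
        # partition splits at the first '=', so '=' inside a DOI is preserved
        key, separator, value = token.strip().partition("=")
        if separator:
            topics[key] = value
    return topics


print(parse_rx_line("RX   MEDLINE=97291283; PubMed=9145897; DOI=10.1007/s00248-002-2038-4;"))
```

When the DOI has been moved to a further RX line, the dictionaries returned for the individual lines of a reference block can simply be merged.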

New line type: RG (Reference Group)

The new reference line 'RG' (Reference Group) has been introduced to list the consortium name associated with a given citation. The RG line is mainly used in submission reference blocks, but can also be used in paper references, if the working group is cited as an author in the paper.

Note: RA (Reference Author) and RG lines can both be present in the same reference block; at least one RA or RG line is mandatory per reference block.

The same line type has recently been introduced in the EMBL nucleotide sequence database.

The format for this line is:

RG   Consortium_name;

Examples:

RG   The C. elegans sequencing consortium;
RG   The Brazilian network for HIV isolation and characterization;
Changes concerning cross-references (DR line)
EchoBASE

We have added cross-references to EchoBASE, the integrated post-genomic database for E. coli, available at http://www.biolws1.york.ac.uk/echobase/.

The identifiers of the appropriate DR line are:

Data bank identifier EchoBASE
Primary identifier EchoBASE's unique identifier for a gene.
Secondary identifier None; a dash '-' is stored in that field.
Example
O32528:
DR   EchoBASE; EB4119; -.
UniProtKB release 1.11 of 07-Jun-2004

Changes concerning keywords (KW line)

New keywords:


Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 1.10 of 24-May-2004

New comment line (CC) topic: TOXIC DOSE

We have introduced a new comment (CC) line topic: TOXIC DOSE. This topic is used to store information on the poisoning potential (acute toxicity) of a toxin.

Generally this topic holds information on the LD(50) and PD(50). LD stands for "Lethal Dose". LD(50) is the amount of a toxin, given all at once, which causes the death of 50% (one half) of a group of test animals.

PD(50) stands for "Paralytic dose". It is the amount of a toxin, which causes the paralysis of 50% of a group of test animals.

Examples:

CC   -!- TOXIC DOSE: PD(50) is 1.72 mg/kg by injection in blowfly larvae.
CC   -!- TOXIC DOSE: LD(50) is 0.015 mg/kg by intravenous injection for
CC       sarafotoxin-A and sarafotoxin-B, and 0.3 mg/kg for sarafotoxin-C.
Changes concerning keywords (KW line)

New keywords:


Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). For the feature key MOD_RES, the newly introduced controlled vocabularies for PTMs are:

UniProtKB release 1.9 of 10-May-2004

Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). Methylation sites are described in the description field of the feature key MOD_RES; all entries with such a site contain the keyword 'Methylation'. The initially defined controlled vocabulary for methylation sites is listed below:

N-methylalanine
N,N,N-trimethylalanine
Omega-N-methylated arginine
Omega-N-methylarginine
Asymmetric dimethylarginine
Symmetric dimethylarginine
5-methylarginine
N5-methylarginine
N4-methylasparagine
N4,N4-dimethylasparagine
S-methylcysteine
Cysteine methyl ester
2-methylglutamine
N5-methylglutamine
Glutamate methyl ester (Gln)
Glutamate methyl ester (Glu)
Methylhistidine
Pros-methylhistidine
Tele-methylhistidine
N-methylisoleucine
N-methylleucine
Leucine methyl ester
N6-methylated lysine
N6-methyllysine
N6,N6-dimethyllysine
N6,N6,N6-trimethyllysine
Lysine methyl ester
N-methylmethionine
N-methylphenylalanine
N,N-dimethylproline
N-methyltyrosine
Changes concerning keywords (KW line)

Deleted keyword:

UniProtKB release 1.8 of 26-Apr-2004

Changes concerning cross-references (DR line)
Structural and functional annotation of Arabidopsis thaliana gene and protein families (GeneFarm)

We have added cross-references to the Structural and functional annotation of Arabidopsis thaliana gene and protein families (GeneFarm), available at http://genoplante-info.infobiogen.fr/Genefarm/index.htpl.

The identifiers of the appropriate DR line are:

Data bank identifier GeneFarm
Primary identifier GeneFarm's unique identifier for a gene.
Secondary identifier GeneFarm's identifier for a gene family.
Example
O04500: 
DR   GeneFarm; 1671; 91.
Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). Sulfation sites are described in the description field of the feature key MOD_RES; all entries with such a site contain the keyword 'Sulfation'. The initially defined controlled vocabulary for sulfation sites is listed below:

Sulfotyrosine
Sulfoserine
Sulfothreonine
UniProtKB release 1.7 of 13-Apr-2004

Changes concerning cross-references (DR line)
Oxford GlycoProteomics 2-DE database (OGP)

We have added cross-references to the Oxford GlycoProteomics 2-DE database (OGP), available at http://proteomewww.bioch.ox.ac.uk/2d/2d.html.

The identifiers of the appropriate DR line are:

Data bank identifier OGP
Primary identifier OGP's unique identifier for a protein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier None; a dash '-' is stored in that field.
Example
P31946: 
DR   OGP; P31946; -.
2-DE database of rat heart

We have added cross-references to the 2-DE database of rat heart, at German Heart Institute Berlin, available at http://www.mpiib-berlin.mpg.de/2D-PAGE/RAT-HEART/2d/.

The identifiers of the appropriate DR line are:

Data bank identifier Rat-heart-2DPAGE
Primary identifier Rat-heart-2DPAGE's unique identifier for a protein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier None; a dash '-' is stored in that field.
Example
P03996:
DR   Rat-heart-2DPAGE; P03996; -.
Changes concerning keywords (KW line)

New keywords:


Controlled vocabularies

We are continuously overhauling the annotation of post-translational modifications (PTMs). Phosphorylation sites are described in the description field of the feature key MOD_RES; all entries with such a site contain the keyword 'Phosphorylation'. The initially defined controlled vocabulary for phosphorylation sites is listed below:

Phosphocysteine
4-aspartylphosphate
Phosphohistidine
Tele-phosphohistidine
Pros-phosphohistidine
Phosphoserine
Phosphothreonine
Phosphotyrosine 
UniProtKB release 1.6 of 29-Mar-2004

Discontinuation of the plain text versions of the user manual and release notes

Both the userman.txt and relnotes.txt files have been replaced by HTML-formatted versions, userman.html and relnotes.html. The plain text versions of these files are no longer available.

UniProtKB release 1.5 of 15-Mar-2004

Changes concerning cross-references (DR line)
Rat Genome Database (RGD)

We have added cross-references to the Rat Genome Database (RGD), available at http://rgd.mcw.edu/, which collects data from rat genetic and genomic research efforts and provides curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.

The identifiers of the appropriate DR line are:

Data bank identifier RGD
Primary identifier RGD's unique identifier for a gene.
Secondary identifier RGD's gene symbol.
Example
O08557:
DR   RGD; 70968; Ddah1.
Changes concerning keywords (KW line)

New keyword:

Changes concerning the controlled vocabulary for PTMs

New terms for the feature key 'CROSSLNK':

New term for the feature key 'LIPID':

UniProtKB release 1.4 of 1-Mar-2004

Changes concerning keywords (KW line)

New keyword:

UniProtKB release 1.3 of 16-Feb-2004

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 42.8 of 16-Jan-2004

Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

New documentation file strains.txt

The strain information is usually given in the RC line of the reference block, but can also be indicated in the Organism (OS) line of a database entry. Strains are controlled vocabulary and we created a list of the strains and their synonyms. This information is now made available in the documentation file strains.txt, together with the mnemonic species identification code representing the biological source of the protein in the knowledgebase.

New format of the documentation file keywlist.txt

Keywords are controlled vocabulary and the annotation follows strict rules. As biological terms can have several meanings, we added to the list of keywords the definition of their usage in the knowledgebase and further information such as synonyms or relevant GO terms.

Please note that the file header changed and the format for each keyword entry looks as follows:

---------  ---------------------------     ----------------------
Line code  Content                         Occurrence in an entry
---------  ---------------------------     ----------------------
ID         Identifier (keyword)            Once; starts an entry
AC         Accession (KW-xxxx)             Once
DE         Definition                      Once or more
SY         Synonyms                        Optional; Once or more
GO         Gene ontology (GO) mapping      Optional; Once or more
HI         Hierarchy                       Optional; Once or more
WW         Interesting WWW site            Optional; Once or more
CA         Category                        Once
//         Terminator                      Once; ends an entry

Example of a keyword definition entry:

ID   Acetoin catabolism.
AC   KW-0006
DE   Protein involved in the degradation of acetoin (3-hydroxy-2-butanone).
DE   Acetoin is a component of the butanediol cycle (butanediol
DE   fermentation) in microorganisms.
SY   Acetoin degradation.
GO   GO:0045150; acetoin catabolism
CA   Biological process; Pathway.
//
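
The entry layout above lends itself to a simple line-oriented reader. The following is a minimal sketch, not an official parser: the function name parse_keyword_entries is hypothetical, and it assumes that the file header has already been stripped and that every data line starts with a two-letter upper-case line code followed by three spaces.

```python
from collections import defaultdict


def parse_keyword_entries(text):
    """Split keywlist-style text into entries.

    Each entry is a dict mapping a line code (ID, AC, DE, ...) to the
    list of values found for that code, in file order.
    """
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            # Terminator: close the entry collected so far.
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        code = line[:2]
        # Assumption: data lines look like "XX   content".
        if len(line) >= 5 and code.isalpha() and code.isupper() and line[2:5] == "   ":
            current[code].append(line[5:].strip())
    return entries


SAMPLE = """\
ID   Acetoin catabolism.
AC   KW-0006
DE   Protein involved in the degradation of acetoin (3-hydroxy-2-butanone).
DE   Acetoin is a component of the butanediol cycle (butanediol
DE   fermentation) in microorganisms.
SY   Acetoin degradation.
GO   GO:0045150; acetoin catabolism
CA   Biological process; Pathway.
//
"""

entry = parse_keyword_entries(SAMPLE)[0]
print(entry["AC"][0], " ".join(entry["DE"]))
```

Optional line codes (SY, GO, HI, WW) are simply absent from the dict when an entry does not carry them.
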
Change in the name of the files containing deleted AC numbers

As UniProt knowledgebase documentation now comprises both Swiss-Prot and TrEMBL, we changed the name of the files "deleteac.txt" containing deleted AC numbers to

Discontinuation of the embltosp.txt index file

The embltosp.txt file, which contained an index of EMBL Nucleotide Sequence Database entries referenced in Swiss-Prot, is no longer available.

Swiss-Prot release 42.6 of 28-Nov-2003

New comment line (CC) topic RNA EDITING

We have introduced a new comment (CC) line topic: 'RNA EDITING'. This topic is used to convey information relevant to all types of RNA editing that lead to one or more amino acid changes.

The format of this comment block is:

CC   -!- RNA EDITING: Modified_positions={x[, y, z, ...] | Not_applicable | Undetermined}[; Note=Text].

Examples:

CC   -!- RNA EDITING: Modified_positions=393, 431, 452, 495.
CC   -!- RNA EDITING: Modified_positions=59, 78, 94, 98, 102, 121; Note=The
CC       stop codon at position 121 is created by RNA editing. The nonsense
CC       codon at position 59 is modified to a sense codon.
CC   -!- RNA EDITING: Modified_positions=Not_applicable; Note=Some
CC       positions are modified by RNA editing via nucleotide insertion or
CC       deletion. The initiator methionine is created by RNA editing.

The free text in the 'Note' is standardized.

All entries with such a topic have the keyword RNA editing.
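
As an illustration of the format above, here is a minimal sketch that reads one RNA EDITING comment block. The function name parse_rna_editing and the returned field names (including the status label "Known") are hypothetical, and the sketch assumes the block is passed as a list of its CC lines.

```python
def parse_rna_editing(cc_lines):
    """Parse the CC lines of one RNA EDITING block into a small dict."""
    # Drop the "CC" line code and join continuation lines with a space.
    text = " ".join(line[2:].strip() for line in cc_lines)
    prefix = "-!- RNA EDITING: Modified_positions="
    if not text.startswith(prefix):
        raise ValueError("not an RNA EDITING comment block")
    # The optional note follows "; Note=" and runs to the end of the block.
    value, _, note = text[len(prefix):].partition("; Note=")
    value = value.rstrip(".")
    if value in ("Not_applicable", "Undetermined"):
        status, positions = value, []
    else:
        # "Known" is a label invented for this sketch.
        status = "Known"
        positions = [int(p) for p in value.split(",")]
    return {"status": status, "positions": positions, "note": note or None}


block = [
    "CC   -!- RNA EDITING: Modified_positions=59, 78, 94, 98, 102, 121; Note=The",
    "CC       stop codon at position 121 is created by RNA editing. The nonsense",
    "CC       codon at position 59 is modified to a sense codon.",
]
print(parse_rna_editing(block)["positions"])
```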

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 42.4 of 14-Nov-2003

Content changes in the speclist.txt document file

The speclist.txt file lists the organism identification codes which are used to build the "organism" part of an entry name (Examples: ARATH, BACSU, DROME, HUMAN, etc). This file contains for each organism code, the corresponding NCBI taxonomic database node identifier (TaxID) as well as the specific official (scientific) name and optionally common name and synonym.

Up to now, organism identification codes were only used in Swiss-Prot, where all species represented in the database are associated with such a code. The TrEMBL section of the combined UniProt knowledgebase will soon also make use of entry names that are based on the species of origin. As it is not possible in a reasonable time frame to manually assign organism codes to all species represented in TrEMBL, it was decided to define "virtual" codes that regroup organisms at a certain taxonomic level. Such codes are prefixed by the number "9" and generally correspond to a "pool" of organisms which can be as wide as a kingdom. Here are some examples of such codes:

9BACT B      2: N=Bacteria
9CNID E   6073: N=Cnidaria
9FUNG E   4751: N=Fungi
9REOV V  10880: N=Reoviridae
9TETR E  32523: N=Tetrapoda
9VIRI E  33090: N=Viridiplantae

The list of all the "9" codes that have been defined has now been integrated as a subsection of the speclist.txt file.
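
The example lines above can be read with a short regular expression. This is a sketch under assumptions: the function name parse_speclist_line is hypothetical, the single letter in the second column is taken to be the kingdom code, and only the "N=" (scientific name) form shown above is handled.

```python
import re

# code, one-letter kingdom column, TaxID, then "N=" and the scientific name
SPECLIST_LINE = re.compile(r"^(\S{3,5})\s+([A-Z])\s+(\d+):\s+N=(.+)$")


def parse_speclist_line(line):
    """Return the fields of one speclist-style line, or None if it does not match."""
    match = SPECLIST_LINE.match(line.rstrip())
    if match is None:
        return None
    code, kingdom, taxid, name = match.groups()
    return {
        "code": code,
        "kingdom": kingdom,
        "taxid": int(taxid),
        "name": name,
        # "Virtual" codes are the ones prefixed by the number 9.
        "virtual": code.startswith("9"),
    }


print(parse_speclist_line("9REOV V  10880: N=Reoviridae"))
```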

Changes concerning the controlled vocabulary for PTMs

New terms for the Feature key 'LIPID':

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 42.3 of 07-Nov-2003

Changes concerning cross-references (DR line)
DictyBase

We have added cross-references to the DictyBase database (available at http://dictybase.org/), an online informatics resource for Dictyostelium discoideum. DictyBase's goals are to provide a single portal for access to Dictyostelium genome information and curated Dictyostelium literature, to facilitate access to experimental resources such as the Dictyostelium stock center, and to provide an on-line presence for the Dictyostelium community.

The identifiers of the appropriate DR line are:

Data bank identifier DictyBase
Primary identifier DictyBase's unique identifier for a gene.
Secondary identifier DictyBase's gene symbol.
Example
P34092:
DR   DictyBase; DDB0002013; myoB.
DictyDb

Due to the availability of DictyBase (see above) and in agreement with the maintainers of both databases, we have removed all cross-references to the DictyDb database.

PhotoList

We have added cross-references to the PhotoList database (available at http://genolist.pasteur.fr/PhotoList/), a database dedicated to the analysis of the genome of Photorhabdus luminescens strain TT01.

The identifiers of the appropriate DR line are:

Data bank identifier PhotoList
Primary identifier PhotoList's unique identifier for an ORF.
Secondary identifier None; a dash '-' is stored in that field.
Example
Q8KM01:
DR   PhotoList; plu1253; -.
Changes concerning keywords (KW line)

New keyword:

Deleted keywords:

Swiss-Prot release 42.1 of 24-Oct-2003

Format change in the jourlist.txt document file

The jourlist.txt file lists the titles and abbreviations of all journals cited in Swiss-Prot. This file also includes other types of information such as ISSN and CODEN identifiers, publishers, web sites, etc. As of this release, we have added a field for the ISSN of the electronic (on-line) version of journals. This field, termed "e-ISSN", is optional.

Example:

Abbrev: Acta Haematol.
Title : Acta Haematologica
ISSN  : 0001-5792
e-ISSN: 1421-9662
CODEN : ACHAAH
Publis: Karger AG
Server: http://www.karger.com/journals/aha/
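
Because the e-ISSN field is optional, code reading such entries should not assume it is present. A minimal sketch, with the hypothetical function name parse_journal_entry and the assumption that every field is a "Name : value" line as in the example above:

```python
def parse_journal_entry(lines):
    """Map field names (Abbrev, Title, ISSN, ...) to their values."""
    entry = {}
    for line in lines:
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


journal = parse_journal_entry([
    "Abbrev: Acta Haematol.",
    "Title : Acta Haematologica",
    "ISSN  : 0001-5792",
    "e-ISSN: 1421-9662",
    "CODEN : ACHAAH",
    "Publis: Karger AG",
    "Server: http://www.karger.com/journals/aha/",
])
# Use .get() for the optional field; it returns None when the field is absent.
print(journal.get("e-ISSN"))
```
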
Changes concerning keywords (KW line)

New keywords:

Deleted keywords:

Swiss-Prot release 41.26 of 04-Oct-2003

Controlled vocabulary in the feature (FT) key LIPID

We have revised the annotation of post-translationally modified amino acids in lipoproteins and made a major overhaul of the controlled vocabulary. Lipid annotation that was previously covered by feature (FT) keys other than LIPID, e.g. cholesterol-binding, has been moved to LIPID accordingly.

The currently defined controlled vocabulary for the feature descriptions of 'LIPID' FT lines is listed below:

Cholesterol glycine ester
Cis-14-hydroxy-10,13-dioxo-7-heptadecenoic acid aspartate ester 
GPI-anchor amidated alanine
GPI-anchor amidated asparagine
GPI-anchor amidated aspartate
GPI-anchor amidated cysteine
GPI-anchor amidated glycine
GPI-anchor amidated serine
GPI-anchor amidated threonine
GPI-like-anchor amidated glycine
GPI-like-anchor amidated serine
N-myristoyl glycine
N-palmitoyl cysteine
N(6)-myristoyl lysine
N(6)-palmitoyl lysine
O-octanoyl serine
O-palmitoyl serine
O-palmitoyl threonine
Phosphatidylethanolamine amidated glycine
S-12-hydroxyfarnesyl cysteine
S-archaeol cysteine
S-diacylglycerol cysteine
S-farnesyl cysteine
S-geranylgeranyl cysteine
S-myristoyl cysteine
S-palmitoleyl cysteine
S-palmitoyl cysteine

This continuously updated list can be found in Appendix G of the Swiss-Prot user manual.

Swiss-Prot release 41.24 of 19-Sep-2003

Changes concerning keywords (KW line)

Deleted keyword:

Swiss-Prot release 41.22 of 29-Aug-2003

Changes concerning keywords (KW line)

Modified keywords:

New keyword:

Swiss-Prot release 41.21 of 22-Aug-2003

Changes concerning keywords (KW line)

Modified keyword:

New keyword:

Swiss-Prot release 41.20 of 16-Aug-2003

Case and wording change for submissions to Swiss-Prot in reference location (RL) lines

While proceeding with the conversion to mixed case of the different line types of a Swiss-Prot entry, we have decided to do the same for the name of our database, e.g. we are now using "Swiss-Prot" (instead of previously "SWISS-PROT") as the prevalent way of referring to it. This change affects the Swiss-Prot RL (reference location) lines of entries which were submitted directly to Swiss-Prot, and which the authors have not (yet) published. At the same time, we have changed the wording of those lines.

Former format:

RL   Submitted (MAY-2002) to the SWISS-PROT data bank.

New format:

RL   Submitted (MAY-2002) to Swiss-Prot.

Note: RL lines concerning submissions to EMBL/GenBank/DDBJ, PDB and other databases are not affected by this modification.
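
For software that has to normalize older flat files, the change can be sketched as a single substitution. The function name modernize_rl is hypothetical, and the pattern assumes the date always has the MMM-YYYY form shown above.

```python
import re

# Matches only the former Swiss-Prot submission wording.
OLD_RL = re.compile(
    r"^RL   Submitted \(([A-Z]{3}-\d{4})\) to the SWISS-PROT data bank\.$"
)


def modernize_rl(line):
    """Rewrite a former-format Swiss-Prot submission RL line; leave others unchanged."""
    return OLD_RL.sub(r"RL   Submitted (\1) to Swiss-Prot.", line)


print(modernize_rl("RL   Submitted (MAY-2002) to the SWISS-PROT data bank."))
```

RL lines for submissions to other databases do not match the pattern and are returned as they are.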

New comment line (CC) topic ALLERGEN

We have introduced a new comment (CC) line topic type: ALLERGEN. This topic is used to convey information relevant to allergenic proteins.

The format of this comment block is:

CC   -!- ALLERGEN: Text.

Examples:

P19121: 
CC   -!- ALLERGEN: Causes an allergic reaction in human. Binds IgE. It is a
CC       partially heat-labile allergen that may cause both respiratory and
CC       food-allergy symptoms in patients with the bird-egg syndrome.
Q28050: 
CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
Swiss-Prot release 41.18 of 25-Jul-2003

Changes concerning keywords (KW line)

Modified keyword:

Swiss-Prot release 41.17 of 19-Jul-2003

Changes concerning cross-references (DR line)
GermOnline

We have added cross-references to the GermOnline database (available at http://germonline.unibas.org/), which is maintained by the Genome Bioinformatics group of the Swiss Institute of Bioinformatics. GermOnline is a gateway for gametogenesis. Its goals are to provide rapid access to a comprehensive compilation of genes, expression data and functions implicated in germline development, meiosis, gamete formation, and gamete function in 11 key model systems and H. sapiens. At this time, the majority of cross-references in Swiss-Prot concern Saccharomyces cerevisiae gene expression data.

The identifiers of the appropriate DR line are:

Data bank identifier GermOnline
Primary identifier GermOnline's identifier for a gene.
Secondary identifier None; a dash '-' is stored in that field.
Example
P58012:
DR   GermOnline; 305011; -.
Swiss-Prot release 41.16 of 11-Jul-2003

Changes concerning keywords (KW line)

New keywords:

Swiss-Prot release 41.14 of 27-Jun-2003

Changes concerning keywords (KW line)

New keywords:

Swiss-Prot release 41.12 of 16-Jun-2003

New feature key CROSSLNK, and removal of the feature keys THIOETH and THIOLEST

The feature key CROSSLNK has been introduced to describe bonds between amino acids, which are formed posttranslationally within a peptide or between peptides, such as isopeptidic bonds, carbon-carbon linkages, carbon-nitrogen linkages, thioether bonds, thiolester bonds, and backbone condensations.

Format:

FT   CROSSLNK    from     to      Description.       

The initially defined controlled vocabulary is listed below:

1'-histidyl-3'-tyrosine (His-Tyr)
2-cysteinyl-L-phenylalanine (Cys-Phe)
2-cysteinyl-D-phenylalanine (Cys-Phe)
2-cysteinyl-D-allo-threonine (Cys-Thr)
2-iminomethyl-5-imidazolinone (Gln-Gly)
2-oxazoline (Cys-Ser)
2'-(S-cysteinyl)histidine (Cys-His)
3-cysteinyl-aspartic acid (Cys-Asp)
3'-histidyl-3-tyrosine (His-Tyr)
3'-(S-cysteinyl)-tyrosine (Cys-Tyr)
4-cysteinyl-glutamic acid (Cys-Glu)
4'-cysteinyl-tryptophylquinone (Cys-Trp)
5-imidazolinone (Ser-Gly)
5-imidazolinone (Ala-Gly)
5-imidazolinone (Cys-Gly)
Beta-methyllanthionine (Cys-Thr)
Beta-methyllanthionine (Thr-Cys)
Beta-methyllanthionine sulfoxide (Cys-Thr)
Isoaspartyl glycine isopeptide (Asn-Gly)
Isoaspartyl lysine isopeptide (Lys-Asn) (interchain with N- )
Isoaspartyl lysine isopeptide (Asn-Lys) (interchain with K- )
Isodityrosine (Tyr-Tyr)
Isoglutamyl cysteine thioester (Gln-Cys)
Isoglutamyl lysine isopeptide (Lys-Gln)
Isoglutamyl lysine isopeptide (Gln-Lys)
Isoglutamyl lysine isopeptide (Gln-Lys) (interchain with K- )
Isoglutamyl lysine isopeptide (Lys-Gln) (interchain with Q- )
Lanthionine (Ser-Cys)
Lanthionine (Cys-Ser)
Lysinoalanine (Lys-Ser)
Lysine tyrosylquinone (Lys-Tyr)
Lysinoalanine (Ser-Lys)
Lysyl topaquinone (Lys-Tyr)
N-isoaspartyl cysteine isopeptide (Asn-Cys)
Oxazole (Cys-Ser)
Oxazole (Gly-Ser)
Pyrroloquinoline quinone (Glu-Tyr)
S-(2-aminovinyl)-D-cysteine (Ser-Cys)
S-(2-aminovinyl)-3-methyl-D-cysteine (Thr-Cys)
Thiazole (Gly-Cys)
Thiazole (Ser-Cys)
Thiazole (Phe-Cys)
Thiazole (Cys-Cys)
Thiazole (Lys-Cys)
Tryptophan tryptophylquinone (Trp-Trp)
Glycyl lysine isopeptide (Gly-Lys) (interchain with K- )
Glycyl lysine isopeptide (Lys-Gly) (interchain with G- )
Ubiquitinyl cysteine thioester (Cys)

Examples:

P01024:
FT   CROSSLNK   1010   1013       Isoglutamyl cysteine thioester (Cys-Gln).
P29827:
FT   CROSSLNK     60     77       Beta-methyllanthionine (Cys-Thr).
FT   CROSSLNK     63     73       Lanthionine (Ser-Cys).
FT   CROSSLNK     64     70       Beta-methyllanthionine (Cys-Thr).
FT   CROSSLNK     65     78       Lysinoalanine (Ser-Lys).
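
A CROSSLNK line in the format above can be read as follows. This is a sketch only: the function name parse_crosslnk is hypothetical, and it assumes the description fits on a single FT line, as in the examples.

```python
import re

CROSSLNK = re.compile(r"^FT   CROSSLNK\s+(\d+)\s+(\d+)\s+(.+?)\.?$")
# Residue pair in three-letter code, e.g. "(Ser-Cys)".
RESIDUES = re.compile(r"\(([A-Z][a-z]{2})-([A-Z][a-z]{2})\)")


def parse_crosslnk(line):
    """Return positions, description and residue pair of a CROSSLNK line."""
    match = CROSSLNK.match(line.rstrip())
    if match is None:
        return None
    start, end, description = match.groups()
    pair = RESIDUES.search(description)
    return {
        "from": int(start),
        "to": int(end),
        "description": description,
        # None for terms without a residue pair, e.g. "... thioester (Cys)".
        "residues": pair.groups() if pair else None,
    }


print(parse_crosslnk("FT   CROSSLNK     63     73       Lanthionine (Ser-Cys)."))
```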

Note: The feature keys THIOETH and THIOLEST have been removed. Various bonds between amino acids that used to be described by the feature keys BINDING, MOD_RES or SITE will progressively, in groups according to the type of PTM, be modified and indicated by CROSSLNK. Disulfide bonds occur so often in proteins that we decided to keep the special feature key DISULFID to annotate this kind of linkage.

Changes concerning keywords (KW line)

New keywords:

Swiss-Prot release 41.10 of 30-May-2003

Reference Comment (RC) line topics may span lines

The RC (Reference Comment) line stores comments relevant to the reference cited, currently in 5 distinct topics: PLASMID, SPECIES, STRAIN, TISSUE and TRANSPOSON. It is not always possible to list all information within one line. Therefore we allow multiple RC lines, and a single topic may span more than one line. Example:

Q9EVG8:
RC   STRAIN=AZ.026, DC.005, GA.039, GA2181, IL.014, IN.018, KY.172, KY2.37,
RC   LA.013, MN.001, MNb027, MS.040, NY.016, OH.036, TN.173, TN2.38,
RC   UT.002, AL.012, AZ.180, MI.035, VA.015, and IL2.17;
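
Parsers that used to treat each RC line independently now need to join the lines before splitting topics. A minimal sketch, with the hypothetical function name parse_rc_lines; it assumes that values are comma-separated, that topics end with a semicolon, and that values themselves contain no commas or semicolons.

```python
def parse_rc_lines(rc_lines):
    """Join the RC lines of one reference and return {topic: [values]}."""
    # Drop the "RC" line code and join continuation lines with a space.
    text = " ".join(line[2:].strip() for line in rc_lines)
    topics = {}
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue
        topic, _, value = token.partition("=")
        items = [item.strip() for item in value.split(",")]
        # The last value of a list is introduced by "and ".
        items = [item[4:] if item.startswith("and ") else item for item in items if item]
        topics[topic] = items
    return topics


rc = [
    "RC   STRAIN=AZ.026, DC.005, GA.039, GA2181, IL.014, IN.018, KY.172, KY2.37,",
    "RC   LA.013, MN.001, MNb027, MS.040, NY.016, OH.036, TN.173, TN2.38,",
    "RC   UT.002, AL.012, AZ.180, MI.035, VA.015, and IL2.17;",
]
print(len(parse_rc_lines(rc)["STRAIN"]))
```
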
Changes concerning cross-references (DR line)
Genome Knowledgebase (GK)

We have added cross-references to the Genome Knowledgebase (GK) (available at http://www.genomeknowledge.org/), which is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium to develop a curated resource of core pathways and reactions in human biology.

The identifiers of the appropriate DR line are:

Data bank identifier GK
Primary identifier GK's unique identifier for a protein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier None; a dash '-' is stored in that field.
Example
Q9BZJ0:
DR   GK; Q9BZJ0; -.
PIR SuperFamilies of iProClass

We have added cross-references to the PIR SuperFamilies of iProClass (available at http://pir.georgetown.edu/iproclass/), which is an integrated protein classification database.

The identifiers of the appropriate DR line are:

Data bank identifier PIRSF
Primary identifier iProClass superfamily number.
Secondary identifier Name for a superfamily.
Tertiary identifier Number of hits found in the sequence, which is generally '1'.
Example
O28076:
DR   PIRSF; PIRSF006414; FTR; 1.
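
Since some DR lines carry two identifiers and others (such as PIRSF) carry three, a reader should not hard-code the number of fields. A minimal sketch with the hypothetical function name parse_dr_line; it assumes identifiers contain no semicolons.

```python
def parse_dr_line(line):
    """Split one DR line into (data bank identifier, list of identifiers)."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    body = line[2:].strip()
    if body.endswith("."):
        # Drop only the terminating period; identifiers may contain periods.
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    database, identifiers = fields[0], fields[1:]
    # A dash marks an empty field; it is represented here as None.
    identifiers = [None if value == "-" else value for value in identifiers]
    return database, identifiers


print(parse_dr_line("DR   PIRSF; PIRSF006414; FTR; 1."))
print(parse_dr_line("DR   GK; Q9BZJ0; -."))
```
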
Swiss-Prot release 41.9 of 24-May-2003

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 41.5 of 23-Apr-2003

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 41.3 of 04-Apr-2003

Changes concerning keywords (KW line)

New keyword:

Swiss-Prot release 41.1 of 25-Mar-2003

New syntax of the CC line topic ALTERNATIVE PRODUCTS

In Swiss-Prot release 41.1 (and in the accompanying TrEMBL release), a new format was introduced for "CC ALTERNATIVE PRODUCTS" lines. The new format is more structured than the previous format. Associated with these changes are the introduction of stable identifiers for each named splice isoform in all entries that describe more than one splice isoform, and the extension of feature identifiers, previously only used for human VARIANT and certain CARBOHYD features, to VARSPLIC features in entries from all species.

The new format of the CC line topic ALTERNATIVE PRODUCTS is:

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative promoter;
CC         Comment=Free text;
CC       Event=Alternative splicing; Named isoforms=n;
CC         Comment=Optional free text;
CC       Name=Isoform_1; Synonyms=Synonym_1[, Synonym_n];
CC         IsoId=Isoform_identifier_1[, Isoform_identifier_n];
CC         Sequence=Displayed;
CC         Note=Free text;
CC       Name=Isoform_n; Synonyms=Synonym_1[, Synonym_n];
CC         IsoId=Isoform_identifier_1[, Isoform_identifier_n];
CC         Sequence=VSP_identifier_1 [, VSP_identifier_n];
CC         Note=Free text;
CC       Event=Alternative initiation;
CC         Comment=Free text;

The qualifiers are described in the table below:

Topic Description
Event Biological process that results in the production of the alternative forms (Alternative promoter, Alternative splicing, Alternative initiation).
Format: Event=controlled vocabulary;
Example: Event=Alternative splicing;
Named isoforms Number of isoforms listed in the 'Name' topics; currently only used for 'Event=Alternative splicing'.
Format: Named isoforms=number;
Example: Named isoforms=6;
Comment Any comments concerning one or more isoforms; optional for 'Alternative splicing'; in case of 'Alternative promoter' and 'Alternative initiation' there is always a 'Comment' of free text, which includes relevant information on the isoforms.
Format: Comment=free text;
Example: Comment=Experimental confirmation may be lacking for some isoforms;
Name A common name for an isoform used in the literature or assigned by Swiss-Prot; currently only available for spliced isoforms.
Format: Name=common name;
Example: Name=Alpha;
Synonyms Synonyms for an isoform as used in the literature; optional; currently only available for spliced isoforms.
Format: Synonyms=Synonym_1[, Synonym_n];
Example: Synonyms=B, KL5;
IsoId Unique identifier for an isoform, consisting of the Swiss-Prot accession number, followed by a dash and a number.
Format: IsoId=acc#-isoform_number[, acc#-isoform_number];
Example: IsoId=P05067-1;
Sequence Information on the isoform sequence. The term Displayed indicates that the sequence is shown in the entry; a list of feature identifiers (VSP_#) indicates that the isoform is annotated in the feature table, and the FTIds enable programs to create the sequence of a splice variant; External is used if the accession number of the IsoId does not correspond to the accession number of the current entry; Not described indicates that the sequence of the isoform is unknown.
Format: Sequence=VSP_#[, VSP_#]|Displayed|External|Not described;
Example: Sequence=Displayed;
Example: Sequence=VSP_000013, VSP_000014;
Example: Sequence=External;
Example: Sequence=Not described;
Note Lists isoform-specific information; optional.
Format: Note=Free text;
Example: Note=No experimental confirmation available;
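
The Sequence qualifier notes that the FTIds enable programs to create the sequence of a splice variant. The idea can be sketched as follows, assuming the VARSPLIC features listed for an isoform have already been looked up and reduced to (from, to, replacement) tuples with 1-based inclusive positions, an empty replacement standing for "Missing", and no overlapping ranges. The function name apply_varsplic and the toy sequence are hypothetical.

```python
def apply_varsplic(sequence, edits):
    """Apply (from, to, replacement) edits to the displayed sequence.

    Positions are 1-based and inclusive, as in FT lines. Edits are applied
    from the C-terminal end backwards so that earlier positions stay valid
    while the sequence length changes.
    """
    for start, end, replacement in sorted(edits, reverse=True):
        sequence = sequence[:start - 1] + replacement + sequence[end:]
    return sequence


# Toy data, not taken from a real entry:
displayed = "ABCDEFGH"
edits = [
    (2, 3, "X"),  # "BC -> X"
    (6, 6, ""),   # "Missing"
]
print(apply_varsplic(displayed, edits))
```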

Example of the CC lines and the corresponding FT lines for an entry with alternative splicing Q15746:

...
CC  -!- ALTERNATIVE PRODUCTS:
CC      Event=Alternative splicing; Named isoforms=6;
CC      Name=1;
CC        IsoId=Q15746-4; Sequence=Displayed;
CC      Name=2;
CC        IsoId=Q15746-5; Sequence=VSP_000040;
CC      Name=3A;
CC        IsoId=Q15746-6; Sequence=VSP_000041, VSP_000043; 
CC      Name=3B;
CC        IsoId=Q15746-7; Sequence=VSP_000040, VSP_000041, VSP_000042;
CC      Name=4;
CC        IsoId=Q15746-8; Sequence=VSP_000041, VSP_000042;
CC      Name=del-1790;
CC        IsoId=Q15746-9; Sequence=VSP_000044;
...
FT   VARSPLIC    437    506       VSGIPKPEVAWFLEGTPVRRQEGSIEVYEDAGSHYLCLLKA
FT                                RTRDSGTYSCTASNAQGQVSCSWTLQVER -> G (in
FT                                isoform 2 and isoform 3B).
FT                                /FTId=VSP_004791.
FT   VARSPLIC   1433   1439       DEVEVSD -> MKWRCQT (in isoform 3A,
FT                                isoform 3B and isoform 4).
FT                                /FTId=VSP_004792.
FT   VARSPLIC   1473   1545       Missing (in isoform 4).
FT                                /FTId=VSP_004793.
FT   VARSPLIC   1655   1705       Missing (in isoform 3A and isoform 3B).
FT                                /FTId=VSP_004794.
FT   VARSPLIC   1790   1790       Missing (in isoform Del-1790).
FT                                /FTId=VSP_004795.
 
...
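
The structured format makes the comment block straightforward to read by program. The following is a minimal sketch and is independent of Swissknife: the function name parse_alternative_products is hypothetical, and it assumes that free text in 'Comment' and 'Note' contains neither ';' nor '='.

```python
LIST_KEYS = {"Synonyms", "IsoId", "Sequence"}


def parse_alternative_products(cc_lines):
    """Return (events, isoforms) parsed from one ALTERNATIVE PRODUCTS block."""
    # Drop the "CC" line code and join continuation lines with a space.
    text = " ".join(line[2:].strip() for line in cc_lines)
    header = "-!- ALTERNATIVE PRODUCTS:"
    if not text.startswith(header):
        raise ValueError("not an ALTERNATIVE PRODUCTS block")
    events, isoforms = [], []
    target = None  # the event or isoform that later qualifiers belong to
    for token in text[len(header):].split(";"):
        key, sep, value = token.strip().partition("=")
        if not sep:
            continue
        if key == "Event":
            target = {"Event": value}
            events.append(target)
        elif key == "Name":
            target = {"Name": value}
            isoforms.append(target)
        elif target is not None:
            if key in LIST_KEYS:
                value = [item.strip() for item in value.split(",")]
            target[key] = value
    return events, isoforms


block = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative splicing; Named isoforms=6;",
    "CC       Name=1;",
    "CC         IsoId=Q15746-4; Sequence=Displayed;",
    "CC       Name=3A;",
    "CC         IsoId=Q15746-6; Sequence=VSP_000041, VSP_000043;",
]
events, isoforms = parse_alternative_products(block)
for isoform in isoforms:
    print(isoform["Name"], isoform["IsoId"], isoform["Sequence"])
```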

The corresponding modules of the Swiss-Prot parser Swissknife have been modified, and Release 1.31 of Swissknife can be downloaded.

Changes concerning cross-references (DR line)

We have added cross-references to the Gene Ontology (GO) database (available at http://www.geneontology.org/), which provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.

The identifiers of the appropriate DR line are:

Data bank identifier GO
Primary identifier GO's unique identifier for a GO term.
Secondary identifier A 1-letter abbreviation for one of the 3 ontology aspects, separated from the GO term by a colon. If the term is longer than 46 characters, the first 43 characters are indicated followed by 3 dots ('...'). The abbreviations for the 3 distinct aspects of the ontology are P (biological Process), F (molecular Function), and C (cellular Component).
Tertiary identifier 3-character GO evidence code. The meaning of the evidence codes is: IDA=inferred from direct assay, IMP=inferred from mutant phenotype, IGI=inferred from genetic interaction, IPI=inferred from physical interaction, IEP=inferred from expression pattern, TAS=traceable author statement, NAS=non-traceable author statement, IC=inferred by curator, ISS=inferred from sequence or structural similarity.
Examples
Q9XTD2:
DR   GO; GO:0008601; F:protein phosphatase type 2A, regulator acti...; IPI.
DR   GO; GO:0000080; P:G1 phase of mitotic cell cycle; IDA.
DR   GO; GO:0008285; P:negative regulation of cell proliferation; IDA.
DR   GO; GO:0006470; P:protein amino acid dephosphorylation; IDA.

P04406:
DR   GO; GO:0005737; C:cytoplasm; NAS.
DR   GO; GO:0004365; F:glyceraldehyde 3-phosphate dehydrogenase (p...; NAS.
DR   GO; GO:0006096; P:glycolysis; NAS.
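
The GO cross-reference fields, including the truncation rule for long terms, can be sketched as below. The names abbreviate_term and parse_go_dr_line are hypothetical, and the parser assumes that the GO term itself never contains the field separator "; ".

```python
ASPECTS = {
    "P": "biological process",
    "F": "molecular function",
    "C": "cellular component",
}


def abbreviate_term(term):
    """Apply the rule: terms longer than 46 characters keep 43 characters plus '...'."""
    return term if len(term) <= 46 else term[:43] + "..."


def parse_go_dr_line(line):
    """Parse a 'DR   GO; ...' line into id, aspect, term and evidence code."""
    body = line[2:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop only the terminating period
    database, go_id, secondary, evidence = body.split("; ")
    if database != "GO":
        raise ValueError("not a GO cross-reference")
    aspect, _, term = secondary.partition(":")
    return {
        "id": go_id,
        "aspect": ASPECTS[aspect],
        "term": term,
        "truncated": term.endswith("..."),
        "evidence": evidence,
    }


print(parse_go_dr_line("DR   GO; GO:0005737; C:cytoplasm; NAS."))
```
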
Changes concerning keywords (KW line)

New keywords:

Deleted keyword:

Swiss-Prot release 41.0, 28-Feb-2003

Progress in the conversion of Swiss-Prot to mixed-case characters

We are gradually converting Swiss-Prot entries from all UPPER CASE to MiXeD CaSe. With this release the RC (Reference Comment) line topic STRAIN and the CC line topic CATALYTIC ACTIVITY have been converted.

"Nucleomorph" added to the OrGanelle (OG) line

The OG (OrGanelle) line indicates from which genome a gene for a protein originates. Until now, the defined terms in the OG line were "Chloroplast", "Cyanelle", "Mitochondrion" and "Plasmid". The term "Nucleomorph" has been added, which is the residual nucleus of an algal endosymbiont that resides inside its host cell.

Multiple RP lines

Starting with release 41, there can be more than one RP (Reference Position) line per reference in a Swiss-Prot entry. The RP line describes the extent of the work carried out by the authors of the reference, e.g. the type of molecule that has been sequenced, protein characterization, PTM characterization, protein structure analysis, variation detection, etc.

As the number of experimental results per publication has increased over the years, a single RP line per reference no longer allowed us to add all the information while maintaining a consistent format. Therefore we decided to permit multiple RP lines.

Example:

RP   SEQUENCE FROM N.A., SEQUENCE OF 23-42 AND 351-365, AND
RP   CHARACTERIZATION.
Changes concerning cross-references (DR line)
Schizosaccharomyces pombe GeneDB Prototype

We have added cross-references to the Schizosaccharomyces pombe GeneDB Prototype (available at http://www.genedb.org/genedb/pombe/index.jsp), which contains all S. pombe known and predicted protein coding genes, pseudogenes and tRNAs. It is hosted by the Sanger Institute.

The identifiers of the appropriate DR line are:

Data bank identifier GeneDB_SPombe
Primary identifier GeneDB's unique identifier for a S. pombe gene.
Secondary identifier None; a dash "-" is stored in that field.
Example
DR   GeneDB_SPombe; SPAC9E9.12c; -.
Genew

We have added cross-references to the Human Gene Nomenclature Database Genew (available at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl), which provides data for all human genes which have approved symbols. It is managed by the HUGO Gene Nomenclature Committee (HGNC).

The identifiers of the appropriate DR line are:

Data bank identifier Genew
Primary identifier HGNC's unique identifier for a human gene
Secondary identifier HGNC's approved gene symbol.
Example
DR   Genew; HGNC:5217; HSD3B1.
Gramene

We have added cross-references to the Gramene database, a comparative mapping resource for grains (available at http://www.gramene.org/).

The format for the explicit links is:

Data bank identifier Gramene
Primary identifier Unique identifier for a protein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier None; a dash '-' is stored in that field.
Example
DR   Gramene; Q06967; -.
HAMAP

We have added cross-references to the collection of orthologous microbial protein families, generated manually by expert curators of the HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes) project in the framework of the Swiss-Prot protein knowledgebase. The data is accessible at /sprot/hamap/families.html.

The identifiers of the appropriate DR line are:

Data bank identifier HAMAP
Primary identifier HAMAP unique identifier for a microbe protein family
Secondary identifier The values are either '-', 'fused', 'atypical' or 'atypical/fused'. The value '-' is a placeholder for an empty field; the 'fused' value indicates that the family rule does not cover the entire protein; the value 'atypical' points out that the protein is divergent in sequence or has mutated functional sites, and should not be included in family datasets. The value 'atypical/fused' indicates that both of these conditions apply.
Tertiary identifier Number of domains found in the protein, generally '1', rarely '2' for the fusion of 2 identical domains.
Example
DR   HAMAP; MF_00012; -; 1.
Phosphorylation Site Database

We have added cross-references to the Phosphorylation Site Database, PhosSite (available at http://vigen.biochem.vt.edu/xpd/xpd.htm), which provides access to information from scientific literature concerning prokaryotic proteins that undergo covalent phosphorylation on the hydroxyl side chains of serine, threonine or tyrosine residues.

The identifiers of the appropriate DR line are:

Data bank identifier PhosSite
Primary identifier Unique identifier for a phosphoprotein, which is identical to the Swiss-Prot primary AC number of that protein.
Secondary identifier None; a dash '-' is stored in that field.
Example
DR   PhosSite; P00955; -.
TIGRFAMs

We have added cross-references to TIGRFAMs, a protein family database available at http://www.tigr.org/TIGRFAMs/. The identifiers of the appropriate DR line are:

Data bank identifier TIGRFAMs
Primary identifier TIGRFAMs unique identifier for a protein family.
Secondary identifier TIGRFAMs entry name for a protein family.
Tertiary identifier Number of hits found in the sequence.
Example
DR   TIGRFAMs; TIGR00630; uvra; 1.
CarbBank

We have removed the Swiss-Prot cross-references to CarbBank.

GCRDb

We have removed the Swiss-Prot cross-references to GCRDb.

Mendel

We have removed the Swiss-Prot cross-references to Mendel.

YEPD

We have removed the Swiss-Prot cross-references to the yeast electrophoresis protein database (YEPD).

Explicit links to dbSNP in FT VARIANT lines of human sequence entries

In human protein sequence entries we have introduced explicit links to the Single Nucleotide Polymorphism database (dbSNP) from the feature description of FT VARIANT keys.

The format of such links is:

FT   VARIANT    from     to       description (IN dbSNP:accession_number).
FT                                /FTId=VAR_number.
Example:
FT   VARIANT      65     65       T -> I (IN dbSNP:1065419).
FT                                /FTId=VAR_012009.
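
Extracting the dbSNP accession from such a line can be sketched as follows. The function name parse_variant is hypothetical, and the sketch only looks at the first line of the feature, assuming the link is not wrapped onto a continuation line.

```python
import re

VARIANT = re.compile(r"^FT   VARIANT\s+(\d+)\s+(\d+)\s+(.*)$")
DBSNP = re.compile(r"\(IN dbSNP:(\d+)\)")


def parse_variant(line):
    """Return positions, description and dbSNP accession (or None) of a VARIANT line."""
    match = VARIANT.match(line.rstrip())
    if match is None:
        return None
    start, end, description = match.groups()
    link = DBSNP.search(description)
    return {
        "from": int(start),
        "to": int(end),
        "description": description,
        "dbsnp": link.group(1) if link else None,
    }


print(parse_variant("FT   VARIANT      65     65       T -> I (IN dbSNP:1065419)."))
```
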
Feature key SIMILAR became obsolete

The feature key SIMILAR was used to describe the extent of a similarity with another protein sequence. Nowadays, most domains with similarity to other proteins are known regions described in domain and family databases, which are annotated in Swiss-Prot with the feature key DOMAIN or REPEAT and the comment (CC) line topic SIMILARITY; thus the feature key SIMILAR became obsolete and will not be used again.

Version of SP in XML format

A distribution version of Swiss-Prot and TrEMBL in XML format is being developed. The first draft of the XML specification was released for public review on February 21, 2002.
