
SAGB Genome Web

Index
Linkage Analysis
Peptide Identification
Protein General Sequence Analysis
Primer Design
Gene Expression and Microarrays
Nucleic Acid General Sequence Analysis
Human Mutation Databases
Human Genome Databases

Alternative Splicing Events
RNA Structures


Linkage Analysis


Some sites describing linkage analysis.

[info] Laboratory of Statistical Genetics at Rockefeller University
[info] WEB-PREPLINK
[info] Cooperative Human Linkage Center (CHLC)
[info] Genetic Power Calculator
[info] Pelican - Pedigree Editor for LInkage Computer ANalysis


Detailed information on the above options


Laboratory of Statistical Genetics at Rockefeller University


WEB-PREPLINK
WEB-PREPLINK is an alternative to the PREPLINK program, which prepares "datafile.dat" for the LINKAGE program. Once finished, WEB-PREPLINK displays a plain-text version of datafile.dat on the screen, which the user can save to a file on a local computer.


Cooperative Human Linkage Center (CHLC)
The goal of the Cooperative Human Linkage Center is to develop statistically rigorous, high heterozygosity genetic maps of the human genome that are greatly enriched for the presence of easy-to-use PCR-formatted microsatellite markers.


Genetic Power Calculator
A website for performing power calculations for the design of linkage and association genetic mapping studies of complex traits.


Pelican - Pedigree Editor for LInkage Computer ANalysis
Pelican is a Pedigree Editor for LInkage Computer ANalysis. It is a utility for graphically editing the pedigree data files used by programs such as FASTLINK, VITESSE, GENEHUNTER and MERLIN.

It can read in and write out pedigree files, saving changes that have been made to the structure of the pedigree.

Changes are made to the pedigree via a graphical display interface. The resulting display can be saved as a pedigree file and as a graphical image file.

If you use Pelican for a publication, please acknowledge our work: Dudbridge F, Carver T, Williams GW. "Pelican: Pedigree Editor for Linkage Computer Analysis." Bioinformatics 2004



Peptide Identification

This is a list of sites to aid in peptide identification.

[info] CombSearch
[info] PeptideSort
[info] PeptideSearch Database searching by mass spectrometric data
[info] PepSea
[info] PROPSEARCH - database query by amino acid composition
[info] ProteinProspector
[info] PROWL
[info] ProteinInfo
[info] ProFound
[info] PepFrag
[info] ExPASy Proteomic Tools
[info] Molecular Weight
[info] Protein Prospector


Detailed information on the above options


CombSearch
Attempts to provide a unified interface to query several protein identification tools accessible on the web. Currently it includes PeptIdent, TagIdent and MultiIdent from ExPASy, MS-Fit from ProteinProspector, ProFound from PROWL and PeptideSearch from EMBL Protein and Peptide Group.


PeptideSort
Shows the peptide fragments from a digest of an amino acid sequence.
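
As a rough illustration of what such a digest produces, the sketch below cuts a sequence with a trypsin-like rule (cleave after K or R unless the next residue is P). The choice of trypsin, the function name `digest` and the example sequence are assumptions made for illustration; PeptideSort itself may offer other enzymes and rules.

```python
def digest(sequence):
    """Split a protein sequence into peptides using a trypsin-like rule.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the final peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


print(digest("MKWVTFRPLLKAG"))
```

The R in the example is followed by P, so it is not treated as a cleavage site and stays inside the second peptide.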


PeptideSearch Database searching by mass spectrometric data
Protein identification by peptide mapping or peptide sequencing. This is a tool for database searching by mass spectrometric data, such as peptide mass maps or (partial) amino acid sequences.


PepSea
This is an advanced tool for protein database searching by mass spectrometric data, such as peptide mass maps or (partial) amino acid sequences.


PROPSEARCH - database query by amino acid composition
Compositional search using experimental Amino Acid Analysis data.

PROPSEARCH reads your amino acid compositional analysis data and performs a protein database query to identify the protein.


ProteinProspector
Tools for mining sequence databases in conjunction with mass spectrometry experiments.


PROWL
Provides an interactive environment for protein analysis that can be easily accessed with any web browser over the Internet.


ProteinInfo
A tool for retrieval and analysis of information from protein sequence databases.


ProFound
A tool for searching a protein sequence database using information from mass spectra of peptide maps.


PepFrag
A tool for searching protein or nucleotide sequences using information from fragmentation mass spectra of peptides.


ExPASy Proteomic Tools

plus many others.

Molecular Weight
This calculates the molecular weight of your peptide.
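
A minimal sketch of the underlying arithmetic, assuming an unmodified linear peptide: sum the average residue masses and add one water for the free termini. The table below uses commonly quoted average residue masses in daltons; the function name `peptide_mass` is hypothetical, and the linked calculator may use different mass values or handle modifications.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def peptide_mass(sequence):
    """Return the average molecular weight of an unmodified peptide."""
    sequence = sequence.upper()
    # A KeyError here means the sequence contains a non-standard residue code.
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


print(round(peptide_mass("GG"), 2))
```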


Protein Prospector
This is a suite of tools for analysing protein mass spectrometry data.

Sequence Database Search Programs
Peptide / Protein MS Utility Programs
FASTA Database Manipulation/Information Tools

Protein General Sequence Analysis

This is a collection of protein sequence analysis utilities.

[info] BIOLOGY WORKBENCH
[info] MOWSE - search by molecular weight fingerprint
[info] ProtFun - Prediction of functional category and enzyme class
[info] NetOglyc - Prediction of Mucin type O-glycosylation sites
[info] Prediction of GlcNAc O-glycosylation sites
[info] N-glycosylation sites prediction
[info] PSORT - Analyze and predict protein sorting signals.
[info] ProtComp - sub-cellular localization of Eukaryotic proteins
[info] SubLoc - Prediction of Protein Subcellular Localization
[info] PredictNLS - prediction and analysis of nuclear localization signals
[info] Peptide MW Calculator
[info] Compute pI/Mw tool
[info] ProtParam tool
[info] SAPS - Statistical Analysis of Protein Sequences
[info] CBRG at ETHZ
[info] PepSea - Protein identification by peptide mapping or peptide sequencing
[info] SPAC - identify polypeptide using amino-acid composition
[info] GeneFIND Family Identification System
[info] GeneQuiz
[info] PEDANT - Protein Extraction, Description, and ANalysis Tool
[info] PipeAlign
[info] PANAL - integrated resource for protein sequence analysis
[info] A280/A260 calculator
[info] Pratt Pattern Discovery
[info] META-PP - submit jobs simultaneously to various servers
[info] EMBOSS Pepinfo/Pepwindow/Pepstats


Detailed information on the above options


BIOLOGY WORKBENCH
The Biology Workbench is a point and click WWW interface for an integrated set of programs and database searching tools that allow you to carry out sequence analysis without having to log into a remote computer site.

The service is free to non-commercial researchers (you just need to register). There is a comprehensive demonstration to help you get started.


MOWSE - search by molecular weight fingerprint
You can use this page to submit a MOWSE database search. MOWSE will search the OWL protein database with the protein fragment information, and return the protein(s) which most likely correspond to your peptide-data.


ProtFun - Prediction of functional category and enzyme class
The ProtFun server produces ab initio predictions of protein function from sequence. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into final predictions of the functional category and enzyme class (if any) of the submitted sequence.


NetOglyc - Prediction of Mucin type O-glycosylation sites
The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family which links the carbohydrate GalNAc to the side chain of certain serine and threonine residues in mucin type glycoproteins are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serine, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at position -1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83 % of the glycosylated and 90 % of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods.


Prediction of GlcNAc O-glycosylation sites


N-glycosylation sites prediction
A WWW server for predicting N-glycosylation sites is now available online. The consensus triplet, Asn-Xaa-Ser/Thr (Xaa not Pro), is not sufficient to discriminate glycosylated and non-glycosylated asparagines. The server attempts to make this distinction.

The prediction method is based on artificial neural networks that examine the surrounding sequence context. In a cross-validated performance, the networks could identify 86% of the glycosylated and 61% of the non-glycosylated sequons, with an overall accuracy of 76%. The method can be optimised for high specificity or high sensitivity.
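
The consensus triplet itself is easy to scan for, which gives the list of candidate sequons that a predictor like this one then has to classify. The sketch below only finds Asn-Xaa-Ser/Thr (Xaa not Pro) matches; it does not predict glycosylation, and the function name `find_sequons` and the test sequence are made up for illustration.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNST") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


print(find_sequons("ANSTNPSANAT"))
```

In the example, the Asn at position 5 is skipped because it is followed by Pro.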


PSORT - Analyze and predict protein sorting signals.
Analyze and predict protein sorting signals coded in amino acid sequences.


ProtComp - sub-cellular localization of Eukaryotic proteins
The program is based on complex neural-network recognizers that estimate the probability of localization to the nuclear, plasma membrane, extracellular, cytoplasmic, mitochondrial, chloroplast, endoplasmic reticulum, peroxisomal, lysosomal or Golgi compartments.


SubLoc - Prediction of Protein Subcellular Localization
Subcellular localisation is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localisation is needed.

The total prediction accuracies of SubLoc reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by this approach are robust to errors in the protein N-terminal sequences.


PredictNLS - prediction and analysis of nuclear localization signals
PredictNLS is an automated tool for the analysis and determination of Nuclear Localization Signals (NLS).

You submit a protein sequence or a potential NLS. PredictNLS predicts whether your protein is nuclear, or checks whether your potential NLS is found in the PredictNLS database. The program also compiles statistics on the number of nuclear and non-nuclear proteins in which your potential NLS is found. Finally, proteins with similar NLS motifs are reported, and the experimental papers describing the particular NLS are given.


Peptide MW Calculator
This calculates the molecular weight of your peptide.


Compute pI/Mw tool
Compute pI/Mw is a tool which allows the computation of the theoretical pI (isoelectric point) and Mw (molecular weight) for a protein sequence.


ProtParam tool
ProtParam is a tool which allows the computation of various physical and chemical parameters for a protein sequence.

The computed parameters include the molecular weight, theoretical pI, amino acid composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).


SAPS - Statistical Analysis of Protein Sequences
This program, written by the group of Samuel Karlin, analyses proteins for statistically significant features like charge-clusters, repeats, hydrophobic regions, compositional domains etc. One of its options is to generate self-explanatory output.


CBRG at ETHZ


PepSea - Protein identification by peptide mapping or peptide sequencing
This is an advanced tool for protein database searching by mass spectrometric data, such as peptide mass maps or (partial) amino acid sequences.


SPAC - identify polypeptide using amino-acid composition
SPAC can retrieve, from protein or nucleic acid databases, the sequence corresponding to a protein or peptide for which only the amino acid composition and molecular weight are known. The algorithm is particularly suited to retrieving partial sequences, a task that other available software performs poorly. Its accuracy in assigning a protein fragment to a sequence makes it an easy and economical first step before more sophisticated and expensive methods in proteomic research.


GeneFIND Family Identification System
The GeneFIND family identification system aims at high-throughput, full-scale gene family identification by taking advantage of the strengths of various search methods and incorporating ProClass family information. Multi-level filters are used, starting with the fastest, MOTIFIND neural networks, followed by BLAST search, SSEARCH dynamic programming, and motif pattern search. The current implementation allows large-scale identification of 942 protein families.


GeneQuiz
GeneQuiz provides highly automated analysis of biological sequences.

GeneQuiz derives functional annotation for protein sequences and provides supporting evidence, including family alignments.


PEDANT - Protein Extraction, Description, and ANalysis Tool
PEDANT is a software system for completely automatic and exhaustive analysis of protein sequence sets - from individual sequences to complete genomes.

It analyses the predicted open reading frames from fully sequenced genomes using a combination of sequence comparison and prediction techniques.

PipeAlign
PipeAlign is an on-line protein family analysis tool providing both interactive and automatic workbench for the validation, integration and presentation of the biological insights resulting from the analysis. It integrates a 5 step process ranging from the search for sequence homologues in protein sequence and 3D structure databases to the definition of the hierarchical relationship between and within subfamilies. Each step relies upon the results from the previous ones until a validated multiple alignment integrating subfamilies information is produced. The Pipe can also be started from any point and intermediate results are easily consulted.


PANAL - integrated resource for protein sequence analysis
Panal is an integrated resource for protein sequence analysis. The tool allows the user to simultaneously search their protein query sequence for motifs from several databases, and to view an intuitive graphical summary.


A280/A260 calculator
This can be used to calculate the molecular weight (using average isotopic mass), extinction coefficient, the concentration, and the formal charge of a protein.


Pratt Pattern Discovery
Pratt is a tool that allows the user to search for patterns conserved in a set of protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported.


META-PP - submit jobs simultaneously to various servers
This service allows you to submit protein sequences simultaneously to many other analysis services. Included are:

Homology modelling
Secondary structure
Threading
Transmembrane helices
Various

EMBOSS Pepinfo/Pepwindow/Pepstats
Program pepinfo detects and displays various useful metrics about a protein sequence. Pepwindow reads in a protein sequence and displays a graph of the classic Kyte & Doolittle hydropathy plot of that protein. Pepstats outputs a report of simple protein sequence information.
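
The hydropathy plot mentioned here is a sliding-window average of per-residue Kyte & Doolittle values. The sketch below computes that profile as plain numbers rather than a graph. The window length of 9 and the function name `hydropathy_profile` are assumptions for illustration, not the EMBOSS defaults or API.

```python
# Kyte & Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    # Sequences shorter than the window yield an empty profile.
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


profile = hydropathy_profile("MKTLLVAGIILAVSS", window=9)
print([round(value, 2) for value in profile])
```

Positive values indicate hydrophobic stretches; plotting the list against window position gives the familiar curve.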



Primer Design

This is a collection of primer design sites.

Primer Prediction and Analysis programs

[info] The PCR Jump Station
[info] GeneFisher
[info] GeneWalker
[info] Web Primer
[info] Primer3
[info] POLAND - melting profiles of double stranded DNA
[info] CODEHOP - PCR primers designed from protein multiple sequence alignments
[info] NetPrimer
[info] rawprimer - a tool for selection of PCR primers
[info] ExonPrimer
[info] PrimerX - Design of mutagenic primers for site-directed mutagenesis
[info] RevTrans - reverse translates a peptide alignment
[info] AutoPrime - automated primer design

Primer Design Criteria

[info] Choosing Primers for sequencing
[info] Design of Primers for Automated Sequencing
[info] Primer Design Workshop

Primer Databases

[info] MENDB - a database of polymorphic loci from natural populations

Detailed information on the above options


The PCR Jump Station
The ultimate Web page for information and links on all aspects of the Polymerase Chain Reaction (PCR).


GeneFisher
GeneFisher processes aligned or unaligned sequences. GeneFisher comes with a built in alignment tool that feeds the results directly into the next step. You have the option of looking at a graphical representation of the alignments and the consensus sequence that is really useful for a variety of tasks, not just primer design. At this point you can reject the alignment and adjust the parameters to try again or you can continue to the primer design step.


GeneWalker
GeneWalker helps you design your primers. It is a powerful tool, yet it is built around ease of use. GeneWalker allows you to work with two primer sequences simultaneously. You can also toggle between reverse-complementary primer sequences. Basic functions include calculation of primer secondary structures, primer annealing properties and primer dimers. GeneWalker also helps you to connect easily to EMBL's databases.


Web Primer
An application that designs primers for PCR or sequencing purposes. The user must choose the purpose for which the primers will be used, and either specify a locus name or enter a sequence.


Primer3
Primer3 picks primers for PCR reactions, according to the conditions specified by the user. Primer3 considers things like melting temperature, concentrations of various solutions in PCR reactions, primer bending and folding, and many other conditions when attempting to choose the optimal pair of primers for a reaction. All of these conditions are user-specifiable, and can vary from reaction to reaction.


POLAND - melting profiles of double stranded DNA
The Poland server will calculate the thermal denaturation profile of double stranded RNA or DNA based on sequence input and parameter settings in this form. Calculation is based on D. Poland's algorithm in the implementation described by G. Steger.

Calculations can be done for oligonucleotides (>15 bases) or long double strands (>50 bases).


CODEHOP - PCR primers designed from protein multiple sequence alignments
The CODEHOP program designs PCR (Polymerase Chain Reaction) primers from protein multiple-sequence alignments. The program is intended for cases where the protein sequences are distant from each other and degenerate primers are needed.

The multiple-sequence alignments should be of amino acid sequences of the proteins and be in the Blocks Database format. Proper alignments can be obtained by different methods.

The results of the CODEHOP program are suggested degenerate sequences of DNA primers that you can use for PCR. You have to choose appropriate primer pairs, get them synthesized and perform the PCR.


NetPrimer
NetPrimer combines the latest primer design algorithms with a web-based interface allowing the user to analyze primers over the Internet. All primers are analyzed for melting temperature using the nearest neighbor thermodynamic theory to ensure accurate Tm prediction. Primers are analyzed for all secondary structures including hairpins, self-dimers, and cross-dimers in primer pairs. This ensures the availability of the primer for the reaction as well as minimizing the formation of primer dimer. The program eases quantitation of primers by calculating primer molecular weight and optical activity. To facilitate the selection of an optimal primer, each primer is given a rating based on the stability of its secondary structures. A comprehensive analysis report can be printed for individual primers or primer pairs.


rawprimer - a tool for selection of PCR primers
This page provides an interface to a PCR primer selection program based on xprimer.

It is designed for selection of sets of primers along very large queries, all with a relatively narrow Tm range. It is also useful in more traditional PCR applications.

This version of xprimer produces no graphics. It is set up to use human and dog repeat files and several species models. The query sequence (raw sequence with no header - only agct in either case are significant) can be dragged into the textarea in the interface.


ExonPrimer
This helps to design intronic primers for the PCR amplification of exons. The script needs a cDNA and the corresponding genomic sequence as input. It aligns these sequences using Blat and designs PCR primers to amplify each exon using Primer3. The positions of the exons are deduced from the alignment of the genomic and the cDNA sequences. The user can define the maximum exon size. Exons larger than this size will be divided into several parts. The poly-A tail of the cDNA should be clipped to allow the alignment of the cDNA and the genomic DNA sequence. The genomic sequence must be longer than the cDNA sequence. Otherwise the design of primers for the first and/or last exon is not possible.


PrimerX - Design of mutagenic primers for site-directed mutagenesis
PrimerX is a web-based program written to automate the design of complementary mutagenic primers for site-directed mutagenesis. Based on your input, PrimerX compares a template DNA sequence with a DNA or protein sequence that already incorporates the desired mutation. It then computes all possible oligonucleotide sequences of appropriate length that encode this mutation at the center and follow your specified constraints. Finally, PrimerX generates both forward and reverse primer sequences, and provides information such as melting temperature and GC content for each primer pair.

PrimerX can design mutagenic primers based on two different forms of input. One option is for you to directly enter the mutated DNA sequence, incorporating the desired base pair insertions, deletions, or substitutions. This is recommended for generating SNPs and indels. The other is to enter the mutated protein sequence, in which case the program first computes all possible DNA sequences that can encode the desired change, taking into account codon degeneracy. This is recommended for changing a specific amino acid residue into another. Primers are then generated that match constraints that you specify.

In addition to this, PrimerX can characterize primers that you have designed. Here, you only need to enter a mutagenic primer sequence and the number of mismatched bases, and PrimerX will compute and report back its reverse complement, GC content, melting temperature, etc.
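
The basic quantities reported in such a characterization can be sketched in a few lines. The example below computes the reverse complement and GC content, and estimates melting temperature with the simple Wallace rule (2 °C per A/T, 4 °C per G/C). The Wallace rule is swapped in here only because it is compact; it ignores mismatches and salt, and is not claimed to be the formula PrimerX uses. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer):
    """Return the reverse complement of a DNA primer (A, C, G, T only)."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def gc_content(primer):
    """Return the percentage of G and C bases in a non-empty primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGCGTACCTGA"
print(reverse_complement(primer), gc_content(primer), wallace_tm(primer))
```

The Wallace rule is only a rough guide for short oligonucleotides; nearest-neighbor methods are generally preferred for real primer design.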


RevTrans - reverse translates a peptide alignment
RevTrans aligns coding DNA sequences (codons) based on aligned protein sequences greatly assisting the design of PCR primers based on aligned families of proteins.


AutoPrime - automated primer design
AutoPrime allows the effortless design of primers for Real-Time PCR. It employs the Primer3 software and sequence information from the Ensembl database for a variety of organisms.


Choosing Primers for sequencing
Information provided by the DNA sequencing facility at the University of Chicago Cancer Research Centre.


Design of Primers for Automated Sequencing
Guide to designing primers from the DNA sequencing core facility at the University of Michigan.


Primer Design Workshop
qPCR made very very simple (ideal for medics and lazy academics)


MENDB - a database of polymorphic loci from natural populations
This is the sister database for Molecular Ecology Notes, containing the details for reported loci (i.e., primer sequences, amplification conditions, polymorphism levels, cross-species amplification, and literature citations) in a searchable format.

The database contains all Primer Note submissions to Molecular Ecology, as well as primer submissions to Molecular Ecology Notes. In the future, relevant submissions from other journals will be included, as it is hoped that this database will become the on-line resource for molecular markers developed for "non-commercial" and non-model species.



Gene Expression and Microarrays

This is a collection of gene expression and microarray links.

Gene Expression Databases

[info] Gene Expression Omnibus (GEO)
[info] Gene Expression Atlas: Text Query
[info] Worm Chip Directory
[info] Stanford Microarray Database (SMD)
[info] ExpressDB
[info] EPODB
[info] The Gene Expression Database (GXD)
[info] The microarray project (uAP)
[info] ArrayDB
[info] uArray Database (mAdb)
[info] ChipDB: A Genome Expression Monitoring Database System
[info] The ArrayExpress Database
[info] RNA Abundance Database (RAD)
[info] GeneX: a Collaborative Internet Database and Toolset for Gene Expression Data
[info] Microarray analysis tool(MAT) software
[info] FLEXGene Consortium
[info] Super Array


Gene Expression Analysis Tools

[info] Microarray Software
[info] microarrays.org Software
[info] Microarray group at the University of Manchester
[info] R Packages For Gene Expression Analysis
[info] ExpressYourself
[info] GAPAS - Gene expression pattern analysis suite
[info] Multi microarray normalisation
[info] Expression Profiler at the EBI

GO data mining and GO resources

[info] QuickGO - A fast GO Browser
[info] FatiGO - Data mining with Gene Ontology
[info] GoMiner
[info] FuncAssociate - The Gene Set Functionator

Other Data mining Tools

[info] EASE: the Expression Analysis Systematic Explorer
[info] Gene Network Inference from Large-Scale Gene Expression Data
[info] Knowledge-based Analysis of Microarray Gene Expression Data Using Support Vector Machines
[info] Data Mining: Making Sense of Gene Expression Data
[info] PubGene™ Gene Database and Tools
[info] DN - digiNorthern - digital expression analysis based on ESTs
[info] MatchMiner - navigation among gene and gene product identifiers

Data formats

[info] MicroArray and Gene Expression Markup Language - MAGE-ML
[info] Minimum information about a microarray experiment - MIAME

Lists and Resources

[info] Gene Chips (DNA Microarrays)
[info] DNA Microarrays
[info] Large-Scale Gene Expression and Microarray Links and Resources
[info] MicroArray related activities at the EBI
[info] Listing of DNA microarray links
[info] Vivian Cheung's group
[info] Microarrays.Org
[info] GRID IT: Resources for Microarray Technology
[info] DNA Microarrays
[info] Microarray & Data Analysis

Protocols

[info] Gene Chip Microarray Protocol Websites
[info] NIEHS Microarray Center

Detailed information on the above options


Gene Expression Omnibus (GEO)
In order to support the public use and dissemination of gene expression data, NCBI has launched the Gene Expression Omnibus. GEO is our effort to build a gene expression data repository and online resource for the retrieval of gene expression data from any organism or artificial source. Gene expression data from multiple platforms, including spotted microarray (microarray), high-density oligonucleotide array (HDA), hybridization filter (filter) and serial analysis of gene expression (SAGE) data, will be accepted, accessioned, and archived as a public data set. A series of precomputed definitions and descriptions of the data, as well as online tools for the interactive retrieval and analysis of this expression data will follow shortly thereafter. It is anticipated that this repository and resource will become operational and ready for general submissions in Spring 2000.


Gene Expression Atlas: Text Query
Search by accession number or keyword to search for in the following databases: Genbank, Unigene, LocusLink, SwissProt, PFAM.

Display levels of expression in different human and mouse tissues. Display genes with correlated expression levels.

Display Options


Worm Chip Directory
The full genome chips have been printed and are now being used for experiments. These microarrays contain a spot for each gene, plus some control spots.


Stanford Microarray Database (SMD)
SMD stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization.


ExpressDB
ExpressDB is a relational database containing yeast RNA expression data. As of July, 1999 it contains 17.5 million pieces of information loaded from 11 published and in-house expression studies. A manuscript describing the database and the process of managing and analyzing expression data has been submitted for publication.


EPODB
EpoDB (Erythropoiesis database) is a database of genes that relate to vertebrate red blood cells. It includes DNA sequence, structural features, protein information, gene expression information and transcription factor binding sites.


The Gene Expression Database (GXD)
GXD integrates the many types of expression data and provides links to other relevant resources to place the data into the larger biological and analytical context. The time and space of gene expression is described by a controlled Dictionary of Anatomical Terms that is part of the Anatomy Database. For in situ expression assays, the textual annotations in GXD are complemented by 2 images of original expression data that are indexed via the terms from the dictionary.


The microarray project (uAP)
The Microarray Project is a collaborative research effort between numerous intramural scientists in multiple Institutes and Divisions of the National Institutes of Health (NIH), including the National Human Genome Research Institute (NHGRI), National Center for Biotechnology Information (NCBI), National Cancer Institute (NCI), National Institute of Neurological Disorders and Stroke (NINDS), Biomedical Engineering and Instrumentation Program (BEIP), Division of Computer Research and Technology (DCRT) and many others.


ArrayDB
ArrayDB 2.1.03 is available in a BETA VERSION.


uArray Database (mAdb)
NCI/DCS uArray Center mAdb Gateway.


ChipDB: A Genome Expression Monitoring Database System
chipDB is a genome expression monitoring database system designed to allow members of the Young Lab and the yeast research community to analyze data produced by high-throughput expression monitoring technologies such as Affymetrix gene chips.


The ArrayExpress Database
The EBI has discussed the possibility of establishing a public repository for DNA microarray based gene expression data with many of the major laboratories developing and using these technologies in Europe and the USA.

Following these discussions, the European Bioinformatics Institute is committed to establishing a public repository for microarray based gene expression data, named ArrayExpress. Currently the EBI is establishing a pilot database containing microarray gene expression data that are available publicly.


RNA Abundance Database (RAD)
Slides for a talk explaining RAD.


GeneX: a Collaborative Internet Database and Toolset for Gene Expression Data
The National Center for Genome Resources and the Computational Genomics Group at the University of California, Irvine are participating in the GeneX project to provide an Internet-available repository of gene expression data with an integrated toolset that will enable researchers to analyze their data and compare them with other such data. The corpus of such data will allow more confidence to be placed on the conclusions reached in this analysis, as well as sharing the considerable cost of generating these datasets.


Microarray analysis tool(MAT) software
Microarray data management and analysis software has been developed at AECOM. The following is the data analysis flow chart.


FLEXGene Consortium
FLEXGene (Full-Length Expression) will be a complete repository of full-length cDNA clones for the human and other model organisms. The fundamental goal of this repository is to enable high-throughput protein expression. To this end, the clones are all constructed using a recombination-based vector system so that hundreds or thousands of coding regions can be simultaneously transferred into any protein expression vector overnight. These transfers can be made mutation free into virtually any kind of expression vector allowing the broadest variety of experiments.

Because most high-throughput approaches will require the addition of peptide tags, a crucial feature of the cloning strategy is to remove the untranslated regions of each cDNA to allow the production of fusion proteins on either or both ends of the protein if desired. This repository of "expression-ready" clones is an important next step after the completion of the genome, and one that has the potential to revolutionize the study of protein function by enabling the high-throughput production and analysis of proteins.


Microarray Software


microarrays.org Software


Microarray group at the University of Manchester
Software support for microarray expression analysis.


R Packages For Gene Expression Analysis
This list contains R packages and software based on the R system to analyze gene expression data from DNA array experiments, both for oligonucleotide chips and cDNA microarrays.


ExpressYourself
Here we present a fully automated system for analyzing the results produced by microarray experiments. The system incorporates both novel and published algorithms for filtering problematic regions of the array, correcting the background array signal, normalizing the Cy3 and Cy5 probe signals, scoring levels of differential expression, combining the results of replicate experiments, and assessing the quality of individual and replicate experiments. The results are presented using an intuitive web-based graphical interface to allow easy comparison with the original images of the array slides. The results are clear and simple, and allow immediate identification of differentially hybridized array probes.

We also compare alternative processing methods, which can provide highly divergent results, and test them against microarray data with confirmed results. In this way, we make an initial attempt to identify the processing methods that provide the most biologically meaningful answers.

The analysis tool, called EXPRESS YOURSELF!, is quick and easy to use; the user only needs to upload an output file from an image quantification program such as GenePix Pro, and can obtain the results in a few minutes. EXPRESS YOURSELF! can be used to assess data from gene expression, comparative genomic hybridization, and ChIP-chip experiments. It is freely available for use.


GAPAS - Gene expression pattern analysis suite
This is an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardisation and they are required before performing any pattern analysis. The preprocessed data set can be sent to other pattern analysis tools.
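
The preprocessing steps listed above can be sketched in a few lines of Python. This is only an illustration, not the GAPAS implementation: the function name `preprocess`, the `min_sd` threshold, and the specific choices made here (log2 transform, imputing a missing value with the gene's own mean, a standard-deviation filter for flat patterns, z-score standardisation) are assumptions, and replicate handling is left out.

```python
import math
import statistics


def preprocess(profiles, min_sd=0.5):
    """Log-transform, impute, filter and standardise expression patterns.

    profiles maps a gene name to a list of intensities, with None
    marking a missing value. min_sd is a hypothetical cut-off.
    """
    result = {}
    for gene, values in profiles.items():
        # Scale transformation: log2 of every positive value that is present.
        # Non-positive values are treated as missing.
        logged = [math.log2(v) if v is not None and v > 0 else None for v in values]
        present = [v for v in logged if v is not None]
        if len(present) < 2:
            continue  # too few measurements to describe a pattern
        # Missing values: replace with the mean of the gene's own pattern.
        mean = statistics.fmean(present)
        filled = [mean if v is None else v for v in logged]
        # Flat pattern filtering: drop genes that barely change.
        sd = statistics.pstdev(filled)
        if sd < min_sd:
            continue
        # Pattern standardisation: zero mean, unit standard deviation.
        result[gene] = [(v - mean) / sd for v in filled]
    return result
```

A gene whose values are all equal is dropped by the flat-pattern filter, and an imputed position ends up at exactly zero after standardisation.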


Multi microarray normalisation
The purpose of this facility is to allow you to compare microarray experimental data where you may have swapped the dyes in different runs of the experiment and want to allow for artefacts such as those due to differences in the print pins in the arraying robot. The method used is based on ANOVA (analysis of variance). The outcome of this procedure is a single file of normalised data which can be read into various analysis programs for cluster and other analyses. The current file identifies the treatment on each line along with the normalised figures and the accession number or other identifier which you use to identify the gene or DNA fragment which is responsible for the hybridisation.

Two types of input files can be analysed: those which contain both Cy3 and Cy5 data in a single file, and those in which the signal from the two dyes is written to separate files. Here at the Beatson, spotted microarray data are collected into separate files for Cy3 and Cy5.


Expression Profiler at the EBI
Expression Profiler: Next Generation is an open, extensible web-based collaborative platform for microarray gene expression, sequence and PPI data analysis, exposing distinct chainable components for clustering, pattern discovery, statistics (through R), machine-learning algorithms and visualization.


QuickGO - A fast GO Browser
QuickGO is a fast web based browser of the Gene Ontology data (see geneontology.org) based at the EBI, as well as the annotation of GO to UniProt and InterPro generated by the GOA project. It integrates into InterPro, providing links between the two data sets that are navigable via the web. Various search facilities also exist.


FatiGO - Data mining with Gene Ontology
FatiGO is a web interface that carries out simple data mining using Gene Ontology for DNA microarray data. The data mining consists of assigning the most characteristic Gene Ontology term to each cluster. GO terms are related to Human, Mouse, Fly, Worm and Saccharomyces genes and proteins. The most relevant GO terms are assigned to each cluster by means of a chi-square test. Since each gene can contribute a different number of GO terms, the p-value for the total chi-square test is obtained by means of a permutation test.
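
As a rough illustration of the idea, the Python sketch below scores one GO term against one cluster with a 2x2 chi-square statistic and estimates a p-value by permuting cluster membership. It is not FatiGO's code: the function names, the single-term setup, and the particular permutation scheme (drawing random clusters of the same size) are assumptions made for the example.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # a margin is empty, so there is no association to measure
    return n * (a * d - b * c) ** 2 / margins


def term_statistic(annotated, cluster, all_genes):
    """Chi-square for 'gene is in the cluster' versus 'gene carries the GO term'."""
    rest = all_genes - cluster
    return chi_square_2x2(
        len(cluster & annotated), len(cluster - annotated),
        len(rest & annotated), len(rest - annotated),
    )


def permutation_p_value(annotated, cluster, all_genes, n_perm=1000, seed=0):
    """Fraction of random clusters scoring at least as high as the real one."""
    rng = random.Random(seed)
    observed = term_statistic(annotated, cluster, all_genes)
    pool = sorted(all_genes)
    hits = 0
    for _ in range(n_perm):
        fake_cluster = set(rng.sample(pool, len(cluster)))
        if term_statistic(annotated, fake_cluster, all_genes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)
```

All three gene collections are expected to be Python sets of gene identifiers.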


GoMiner
GoMiner is a tool for biological interpretation of 'omic' data - including data from gene expression microarrays. Omic experiments often generate lists of dozens or hundreds of genes that differ in expression between samples, raising the question:

What does it all mean biologically?

To answer this question, GoMiner leverages the Gene Ontology (GO) to identify the biological processes, functions and components represented in these lists. Instead of analyzing microarray results with a gene-by-gene approach, GoMiner classifies the genes into biologically coherent categories and assesses these categories. The insights gained through GoMiner can generate hypotheses to guide additional research.


FuncAssociate - The Gene Set Functionator
This program takes a list of genes as input and produces a ranked list of the Gene Ontology attributes that the input list is enriched (or depleted) for.
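
One common way to score enrichment of an attribute in a gene list is a one-sided hypergeometric test. The sketch below shows that calculation and a simple ranking by p-value; whether FuncAssociate uses exactly this test is not stated here, so treat the method, the function names and the argument layout as assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(at least `overlap` annotated genes in a random sample).

    population: number of genes in the background set
    annotated:  number of background genes carrying the attribute
    sample:     number of genes in the input list
    overlap:    number of input genes carrying the attribute
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def rank_attributes(counts, population, sample):
    """Sort attributes by p-value; counts maps attribute -> (annotated, overlap)."""
    scored = [
        (enrichment_p_value(population, annotated, sample, overlap), name)
        for name, (annotated, overlap) in counts.items()
    ]
    return [name for _, name in sorted(scored)]
```

Depletion can be scored the same way by summing the lower tail instead of the upper one.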


EASE: the Expression Analysis Systematic Explorer
EASE is a customizable, standalone software application that facilitates the biological interpretation of gene lists derived from the results of microarray, proteomic, and SAGE experiments. EASE provides statistical methods for discovering enriched biological themes within gene lists, generates gene annotation tables, and enables automated linking to online analysis tools.


Gene Network Inference from Large-Scale Gene Expression Data
With the advent of the "Age of Genomics" an entirely new class of data is emerging. Can we really expect to construct a detailed biochemical model of, say, an entire yeast cell with some 6000 genes (only about 1000 of which were defined before sequencing started, and about 50% of which are clearly related to other known genes), by analyzing each gene and determining all the binding and reaction constants one by one? Likewise, from the perspective of drug target identification for human disease, we cannot realistically hope to characterize all the relevant molecular interactions one-by-one as a requirement for building a predictive disease model.

There is a need for methods that can handle this data in a global fashion, and that can analyze such large systems at some intermediate level, without going all the way down to the exact biochemical reactions. At the very least, such an analysis could help guide the traditional pharmacological and biochemical approaches towards those genes most worthy of attention among the thousands of newly discovered genes. Ideally, a sufficiently predictive and explanatory model at an intermediate level could obviate the need for an exact understanding of the system at the biochemical level.


Knowledge-based Analysis of Microarray Gene Expression Data Using Support Vector Machines
We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines. SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods such as hierarchical clustering methods and self organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.


Data Mining: Making Sense of Gene Expression Data


PubGeneTM Gene Database and Tools
PubGene is a collection of tools, including modules for supervised analysis of gene expression data based on the information contained in MEDLINE.

The complete MEDLINE titles and abstracts are indexed for the occurrence of all instances of human gene names and literature aliases, and mapped to their correct primary gene symbols, as defined by the HUGO nomenclature committee. All pairs of primary gene symbols are indexed to create a network of genes based on their literature associations. This approach opens the path for a whole new range of tools for the biologist, and for gene expression analysis in particular. The web site is a work in progress, and content may increase with time.

Among the more prominent features at present are:


DN - digiNorthern - digital expression analysis based on ESTs
DigiNorthern is a tool for virtually displaying the expression profile of query genes (it currently accepts only DNA sequence as input) based on the EST sequences currently available at NCBI GenBank. There are currently two versions of this program. DN1 takes one sequence as the query gene, lists all the cell lines/tissues/organs that express the gene, and displays the relative expression levels of the gene in these cell lines/tissues/organs based on the number of matched ESTs versus the total number of ESTs for the related libraries. Wherever available, a comparison is also made between the same tissue/organ in the normal and neoplastic states. DN2 takes two sequences as query genes and compares their expression profiles side by side. digiNorthern is currently available for human and mouse. Options for other species may become available in future depending on user requests and EST data availability.
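
The relative expression level described above is a ratio of matched ESTs to total ESTs per library. A minimal Python sketch of that arithmetic follows; the function name, the per-million scaling and the example library names are assumptions for illustration, not part of digiNorthern.

```python
def relative_expression(est_counts, per=1_000_000):
    """Matched ESTs per `per` ESTs for each library.

    est_counts maps a library name to (matched_ests, total_ests).
    Libraries with no ESTs at all are skipped.
    """
    levels = {}
    for library, (matched, total) in est_counts.items():
        if total > 0:
            levels[library] = matched * per / total
    return levels


# Hypothetical counts for one query gene in two libraries.
example = {"liver_normal": (5, 10000), "liver_tumor": (20, 10000)}
levels = relative_expression(example)
```

Scaling to a common library size is what makes the normal and neoplastic libraries comparable when their total EST counts differ.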

Important disclaimer: Due to the nature of EST data, the expression profile provided by digiNorthern may not be accurate and should therefore be used only for preliminary analysis. We strongly suggest that you verify the data by experimental methods.


MatchMiner - navigation among gene and gene product identifiers
MatchMiner is a freely available program package for batch navigation among gene and gene product identifier types commonly encountered in microarray studies and other forms of 'omic' research. The user inputs a list of gene identifiers and then uses the Merge function to find the overlap with a second list of identifiers of either the same or a different type or uses the LookUp function to find corresponding identifiers.


MicroArray and Gene Expression Markup Language - MAGE-ML
This is the homepage for the MAGE group. The group aims to provide a standard for the representation of microarray expression data that would facilitate the exchange of microarray information between different data systems.


Minimum information about a microarray experiment - MIAME
MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment.

MIAME checklist is a condensed description of MIAME principles, that is designed to help authors, reviewers and editors of scientific journals to meet MIAME requirements and to make microarray data available to the community in a useful way. MIAME is neither a dogma, nor a legal document - it assumes a cooperative data provider and a fair reviewer.


Gene Chips (DNA Microarrays)
Here are the basics on DNA microarray technology and a list of academic and industrial links related to this exciting new technology.


DNA Microarrays
Here are some interesting web sites and papers which will introduce you to one of the more exciting technological breakthroughs in genomics -- DNA microarrays. This site is far from a comprehensive listing, but it may be a good place to start.


Large-Scale Gene Expression and Microarray Links and Resources
This site is a collection of web resources and pointers to information on large-scale gene expression studies, and especially microarray technologies. The links included are biased towards the development and applications of informatics and bioinformatics for these technologies.


MicroArray related activities at the EBI
These pages are a jumping point to activities at the European Bioinformatics Institute that focus on microarrays, the gene expression data that results from them, the issues revolving around the immense influx of this data, and of course the analysis that one would like to perform on it.


Listing of DNA microarray links


Vivian Cheung's group
The main focus of our lab is the development of Direct Identical-by-Descent (IBD) Mapping. Direct IBD Mapping is a DNA microarray-based mapping technique that allows isolation and mapping of DNA fragments shared IBD between individuals. It unites two methods, genomic mismatch scanning (GMS) and DNA microarray technology. GMS allows physical isolation of IBD DNA fragments between two individuals. If two individuals have inherited the disease gene(s) from a common ancestor, then the gene(s) should reside within the IBD regions shared between them. Once the DNA fragments are isolated, their genomic location can be mapped by hybridization onto a DNA microarray that contains mapped clones arranged in physical map order. By comparing the IBD maps of sets of affected individuals, we can narrow the candidate gene region(s).
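
The final step, narrowing the candidate region by comparing IBD maps, amounts to intersecting sets of genomic intervals. The Python sketch below shows that intersection under simple assumptions: each map is a sorted list of non-overlapping (start, end) pairs on one chromosome, and the function names are hypothetical rather than taken from the lab's software.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def candidate_regions(ibd_maps):
    """Regions shared IBD across every map in the list."""
    if not ibd_maps:
        return []
    return reduce(intersect, ibd_maps)
```

Each additional map of affected relatives can only shrink the result, which is why adding maps narrows the candidate region.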


Microarrays.Org
Welcome to microarrays.org, a new public source for microarraying information, tools, and protocols.


GRID IT: Resources for Microarray Technology


DNA Microarrays


Microarray & Data Analysis
This is a collection of papers with emphasis on analysis of microarray (a.k.a. DNA chip) data. General reviews on microarrays are occasionally included, as are papers on the analysis of gene networks. However, papers on microarray technology itself are not usually included, nor are papers emphasizing the results of such analysis.


Gene Chip Microarray Protocol Websites
Lists of protocols.


NIEHS Microarray Center
This is a description of the technology as it exists at the NIEHS, how microarray technology can be applied to studies in environmental health, some sample data, as well as proposal and sample submission forms for interested investigators.



Nucleic Acid General Sequence Analysis

These are a collection of nucleic acid sequence analysis utilities.

[info] BIOLOGY WORKBENCH
[info] BCM Search Launcher
[info] Cutter - restriction mapping tool
[info] Multi-Cut
[info] DNA Mutation Checker
[info] bend.it Server
[info] AG BIODV - PromoterInspector, MatInspector, FastM
[info] Translation Utility
[info] Mutability - check sequences for potential Nonsense/Missense/Neutral mutations
[info] EnzFinder - A restriction enzyme search engine.
[info] MELTING: enthalpy, entropy and melting temperature
[info] A280/A260 calculator
[info] LabOnWeb
[info] CloneIt
[info] EMBOSS CpGPlot/CpGReport/Isochore
[info] EMBOSS Pairwise Alignment Algorithms
[info] EMBOSS Transeq
[info] Readseq - biosequence conversion tool


Detailed information on the above options


BIOLOGY WORKBENCH
The Biology Workbench is a point and click WWW interface for an integrated set of programs and database searching tools that allow you to carry out sequence analysis without having to log into a remote computer site.

The service is free to non-commercial researchers (you just need to register). There is a comprehensive demonstration to help you get started.


BCM Search Launcher
The BCM Search Launcher provides

Other BCM Search and Analysis Services:

Cutter - restriction mapping tool
Webcutter is an on-line tool for restriction mapping nucleotide sequences. It features:


Multi-Cut
Multi-cut is a database of restriction endonuclease buffers. It finds compatible buffers for a list of enzymes that you want to use in a multiple restriction endonuclease digest. Multi-Cut searches through activity data from the catalogs of several major restriction endonuclease manufacturers and finds buffers that will work with all of the endonucleases in the reaction.
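
The buffer search described above is essentially a set intersection over per-enzyme activity tables. Here is a small Python sketch of that logic; the enzyme names, buffer names, activity percentages and the 75% threshold are all made up for illustration and do not come from any manufacturer's catalog or from Multi-Cut itself.

```python
def compatible_buffers(activity, enzymes, min_activity=75):
    """Buffers in which every listed enzyme reaches at least min_activity percent.

    activity maps enzyme -> {buffer name: percent activity}.
    """
    shared = None
    for enzyme in enzymes:
        good = {
            buffer for buffer, percent in activity[enzyme].items()
            if percent >= min_activity
        }
        shared = good if shared is None else shared & good
    return sorted(shared or [])


# Placeholder data: not real enzymes, buffers or activity values.
ACTIVITY = {
    "EnzA": {"Buffer 1": 100, "Buffer 2": 75, "Buffer 3": 10},
    "EnzB": {"Buffer 1": 50, "Buffer 2": 100, "Buffer 3": 100},
    "EnzC": {"Buffer 1": 100, "Buffer 2": 100, "Buffer 3": 25},
}
```

With this table, `compatible_buffers(ACTIVITY, ["EnzA", "EnzB", "EnzC"])` leaves only the buffer in which all three enzymes clear the threshold.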


DNA Mutation Checker
The DNA Mutation Checker program has been created to help researchers and database curators verify the transcription and translation effects of DNA-level sequence variation.

The user is expected to give a valid reference sequence accession number, select the numbering system, and type in the start position of the change, followed by reference and variant nucleotide sequences.

The output gives details of the mutation and its effect in EMBL-like text format.


bend.it Server
This server predicts bendability and propensity to curvature from DNA sequences. This is an experimental, pre-release version.

The calculation is based on the observation that bendability is asymmetrically distributed in DNA segments that are intrinsically curved.

The server accepts DNA sequences of a maximum of 5000 nucleotides in length, given in one-letter code (A, C, G, T and white spaces are accepted). Two bendability scales are available. The default scale is based on DNase I digestion and reflects DNA's bendability towards the major groove. A second "consensus scale" is offered as an option; it is suitable for detecting intrinsically curved GC elements that are not detected by the DNase I-based scale.

The results of the calculation are bendability, curvature propensity and G+C content values listed along the sequence. The values are calculated in a sliding window (default is 30 residues). The curvature propensity is calculated with a constant twist angle value (default is 36 degrees, corresponding to ideal B-DNA).
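
The sliding-window part of the calculation is easy to illustrate for the G+C content (the bendability and curvature scales themselves are not reproduced here). The Python sketch below is an assumption-laden illustration, not the server's code: the function name is hypothetical, and only the 30-residue default window is taken from the description above.

```python
def gc_profile(sequence, window=30):
    """G+C fraction in each window of `window` residues, one value per start position.

    White space in the input is ignored, matching the accepted input format.
    """
    seq = "".join(sequence.split()).upper()
    if window <= 0 or len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    profile = [count / window]
    # Slide the window one residue at a time, updating the running count.
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile
```

For example, `gc_profile("AACC GGTT", window=4)` gives one value for each of the five windows in the eight-residue sequence.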

The results are presented as:

[...] bendability or the G+C content are plotted along the DNA sequence
[...] the bendability or the G+C content

AG BIODV - PromoterInspector, MatInspector, FastM
Range of utilities covering:


Translation Utility
A simple utility to translate a nucleic acid sequence into a protein.


Mutability - check sequences for potential Nonsense/Missense/Neutral mutations
Mutability will check a DNA sequence to determine how many single point mutations would result in:

It will also:

EnzFinder - A restriction enzyme search engine.
A restriction enzyme search engine. It searches REBASE for known restriction enzymes (REs) matching your query.

Find all you need to know about an RE!


MELTING: enthalpy, entropy and melting temperature
This program computes, for a nucleic acid duplex, the enthalpy, the entropy and the melting temperature of the helix-coil transitions. Three types of hybridisation are possible: DNA/DNA, DNA/RNA, and RNA/RNA. The program first computes the hybridisation enthalpy and entropy from the elementary parameters of each Watson-Crick pair by the nearest-neighbor method. Then the melting temperature is computed.
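
The two stages described above can be sketched in Python as follows. This is a simplified illustration and not the MELTING source: the nearest-neighbor table is supplied by the caller, the function names are hypothetical, and the temperature formula is the usual two-state expression Tm = dH / (dS + R ln(C/x)) with x = 4 for non-self-complementary duplexes and x = 1 for self-complementary ones. Salt corrections, and whichever parameter sets MELTING actually uses, are outside this sketch.

```python
import math

GAS_CONSTANT = 1.987  # cal / (K * mol)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def duplex_thermo(sequence, nn_table, initiation=(0.0, 0.0)):
    """Sum enthalpy (kcal/mol) and entropy (cal/(K*mol)) over nearest-neighbor steps.

    nn_table maps a dinucleotide step read 5'->3' to (dH, dS). A step that is
    missing is looked up through its reverse complement, so ten entries cover
    all sixteen DNA steps.
    """
    seq = sequence.upper()
    dh, ds = initiation
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in nn_table:
            step = step.translate(COMPLEMENT)[::-1]
        step_dh, step_ds = nn_table[step]
        dh += step_dh
        ds += step_ds
    return dh, ds


def melting_temperature(dh, ds, strand_conc, self_complementary=False):
    """Two-state melting temperature in degrees Celsius (no salt correction)."""
    factor = 1.0 if self_complementary else 4.0
    kelvin = 1000.0 * dh / (ds + GAS_CONSTANT * math.log(strand_conc / factor))
    return kelvin - 273.15


# Placeholder table with one invented value for every step.
# These are NOT published thermodynamic parameters.
TOY_TABLE = {step: (-8.0, -22.0) for step in
             ("AA", "AT", "TA", "CA", "GT", "CT", "GA", "CG", "GC", "GG")}
```

To obtain meaningful temperatures, replace `TOY_TABLE` and the `initiation` term with a published parameter set for the duplex type of interest.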


A280/A260 calculator
This can be used to calculate the molecular weight, the extinction coefficient, the concentration, and the melting temperature of a single stranded nucleic acid.


LabOnWeb
LabOnWeb is a collection of online lab protocols and life science research tools.

InstantRACE returns a full gene sequence from an EST, RNA, or mRNA sequence. InstantRACE also provides a comprehensive gene function report, based on a multistage, multiple database search.


CloneIt
Molecular biologists often have to sub-clone plasmid vectors: a DNA plasmid is cleaved and ligated with an exogenous DNA fragment previously excised from another plasmid. The necessary cuts are made by restriction enzymes, which must then be carefully chosen in order to minimize the steps required to obtain the desired molecule. During the selection of those enzymes, the main difficulties encountered come from:

We developed the CloneIt program, which quickly finds in-frame deletions using restriction enzymes and frameshifts (using digestion, fill-in and ligation) in a plasmid sequence. Then, as the main functions and procedures were being developed, we extended the capacities of the program to find strategies to sub-clone a fragment from a plasmid to another vector while still controlling the problems described above. This program is not an expert system, as it does not "learn" the logical steps accomplished by the biologist and it does not have to be accompanied in its search: it just runs an algorithm that explores all the possible enzyme combinations that could be used to clone the molecules. This program provides a useful aid for any molecular biologist who wants to quickly find sub-cloning, in-frame deletion and frameshift strategies, which would otherwise be difficult to discover.

EMBOSS CpGPlot/CpGReport/Isochore
Detection of regions of genomic sequences that are rich in the CpG pattern is important because such regions are resistant to methylation and tend to be associated with genes which are frequently switched on. Regions rich in the CpG pattern are known as CpG islands. The function of the program cpgplot is to plot CpG rich areas, and cpgreport to report all CpG rich regions.
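
A widely used measure of CpG richness is the observed/expected CpG ratio together with the G+C fraction. The Python sketch below computes both for a single stretch of sequence; it is not the EMBOSS code, the function name is hypothetical, and the use of the ratio (CpG count x length) / (C count x G count) is an assumption about how "rich in the CpG pattern" would be quantified.

```python
def cpg_stats(sequence):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    obs_exp = cpg_count * n / (c_count * g_count)
    return gc_fraction, obs_exp
```

Applying the function to successive windows along a genomic sequence gives the kind of profile from which CpG-rich regions can be picked out.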

The nuclear genomes of vertebrates are mosaics of isochores, very long stretches of DNA that are homogeneous in base composition and are compositionally correlated with the coding sequences that they embed. Isochores can be partitioned in a small number of families that cover a range of GC levels. Program isochore plots GC content over a sequence.


EMBOSS Pairwise Alignment Algorithms
This tool is used to compare 2 sequences. When you want an alignment that covers the whole length of both sequences, use needle. When you are trying to find the best region of similarity between two sequences, use water.
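
The difference between the two modes can be seen in a small dynamic-programming sketch. The Python below computes only the alignment score, with a simple match/mismatch/linear-gap scheme chosen for illustration; the EMBOSS programs use substitution matrices and separate gap opening and extension penalties, so the function name, parameters and scores here are assumptions and will not reproduce needle or water output.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    # Row 0: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    # Global: score in the bottom-right cell. Local: best cell anywhere.
    return best if local else prev[-1]
```

When one sequence is embedded in a longer one, the local score ignores the flanking residues while the global score is penalised for them.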


EMBOSS Transeq
Transeq translates nucleic acid sequences to the corresponding peptide sequence. It can translate in any of the three forward or three reverse sense frames, or in all three forward or reverse frames, or in all six frames.
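
Six-frame translation can be sketched in a few lines of Python using the standard genetic code. This is an illustration rather than the Transeq implementation: the function names are hypothetical, and the frame numbering (1 to 3 forward, -1 to -3 counted from the 5' end of the reverse complement) is an assumption that may differ from the convention a given tool uses.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(sequence):
    """Translate from the first base; unknown codons become 'X', stops become '*'."""
    seq = sequence.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(sequence):
    """Translations keyed by frame: 1, 2, 3 forward and -1, -2, -3 reverse."""
    forward = sequence.upper().replace("U", "T")
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(forward[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames
```

Trailing bases that do not fill a whole codon are dropped in every frame.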


Readseq - biosequence conversion tool
Readseq is a program to convert between different nucleic or protein sequence formats.



Human Mutation Databases

The following are a collection of human mutation and SNP databases.

Major databases

[info] The SNP Consortium
[info] dbSNP - A Database of Single Nucleotide Polymorphisms
[info] HGVbase - Human Genome Variation database
[info] The Human Gene Mutation Database - HGMD (Cardiff)
[info] Frequency of Inherited Disorders Database (FIDD)
[info] Single Nucleotide Polymorphisms in the Human Genome
[info] Human SNP Database
[info] SNPview: SNPs, SSLPs, Alleles and Haplotypes
[info] ALFRED - Allele Frequency Database
[info] Mutation Database Website
[info] Universal Mutation Database
[info] Protein Mutation Database
[info] IDbases - databases for immunodeficiency-causing mutations

Specific gene locus databases

[info] The Androgen Receptor Mutations Database
[info] Antithrombin Mutation Database Homepage
[info] Asthma Gene Database
[info] Breast Cancer Mutation Data Base (BIC)
[info] BCGD - The Breast Cancer Gene Database
[info] BIOMDB - Database of mutations causing tetrahydrobiopterin deficiencies
[info] Blood Group Antigen Mutation Database
[info] BTKbase - agammaglobulinemia XLA-causing mutations
[info] The European CD40L Defect Database (CD40Lbase)
[info] Database of Human Type I and Type III Collagen Mutations
[info] Emery-Dreifuss Muscular Dystrophy Mutation Database
[info] Factor VII Mutation Database
[info] GPCRDB: Information system for G protein-coupled receptors (GPCRs)
[info] GRAP Mutant Database (GPCRs, Family A)
[info] Haemophilia B Mutation Database
[info] HAMSTeRS - Haemophilia A Mutation, Search, Test and Resource Site
[info] Human HPRT database
[info] Hypertrophic Cardiomyopathy mutation database
[info] KinMutBase: mutations in protein kinase domains
[info] LDLR Mutation Database
[info] Long QT syndrome database
[info] Marfan Database
[info] MutRes - List of Mutation Resources
[info] Neuronal Ceroid Lipofuscinoses (NCL) Mutations
[info] PAH Genes and alleles (PAHDB)
[info] Human p53 database
[info] p53 gene mutations
[info] Somatic p53 mutations in human tumors and cell lines.
[info] Database of germline p53 mutations
[info] p53link - P53 database integration
[info] PAX2
[info] PAX6 mutation database
[info] Schindler Disease
[info] Schindler Disease - NORD
[info] VHL Mutation Database
[info] VMD2 Mutation Database
[info] von Willebrand Factor (vWF) Database
[info] WS-associated WRN mutations

Detailed information on the above options


The SNP Consortium
A non-profit foundation set up by the Wellcome Trust and various pharmaceutical companies to develop up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to those SNPs publicly available. The data are also submitted to dbSNP.


dbSNP - A Database of Single Nucleotide Polymorphisms
In collaboration with the National Human Genome Research Institute, the National Center for Biotechnology Information has established the dbSNP database to serve as a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. Once discovered, these polymorphisms could be used by additional laboratories, using the sequence information around the polymorphism and the specific experimental conditions. (Note that dbSNP takes the looser 'variation' definition for SNPs, so there is no requirement or assumption about minimum allele frequency.) The data in dbSNP will be integrated with other NCBI genomic data. As with all NCBI projects, the data in dbSNP will be freely available to the scientific community and made available in a variety of forms.


HGVbase - Human Genome Variation database
HGVbase (previously known as HGBASE) was initially created as a joint venture between the research team of Anthony Brookes in the Karolinska Institute (Sweden) and staff at Interactiva GmbH (Germany).

HGVbase attempts to summarize all known variations in the human genome as a non-redundant set of records. The primary purpose of HGVbase is to facilitate genotype-phenotype association analyses that explore how single nucleotide polymorphisms (SNPs) and other common sequence variations may influence phenotypes such as common disease risk and drug response differences. For this reason, all sequence variations are presented with details of how they are physically and functionally related to the closest neighbouring gene.

Variations represented in HGVbase encompass sequence changes known or suspected to exist in the human genome, including but not limited to SNPs, indels, and simple tandem repeats. Such variations are included regardless of chromosome location, allele frequency, or effect upon phenotype. Thus, polymorphisms (rarest allele >1% frequency), rare variants (<1% allele frequencies), and new mutations are all represented in HGVbase, regardless of whether they are known or not known to be functionally neutral or pathogenic to any degree.


The Human Gene Mutation Database - HGMD (Cardiff)
This database represents an attempt to collate the majority of known (published) gene lesions responsible for human inherited disease. Originally established for the study of mutational mechanisms in human genes (Cooper and Krawczak 1993), these databases have acquired a much broader utility in that they currently represent the only available comprehensive reference source to the spectrum of mutations underlying human genetic disease. They thus provide information of practical diagnostic importance to (i) researchers in human molecular genetics, (ii) physicians interested in a particular inherited condition in a given patient or family and (iii) genetic counsellors.


Frequency of Inherited Disorders Database (FIDD)
The Frequency of Inherited Disorders Database (FIDD) has been established for use in a clinical context, in medical research, for epidemiological studies, and in the planning of genetic services.

It represents the first easily accessible repository of published data on the frequency of human inherited disorders worldwide. Data collated include the disease categorized by organ system, Online Mendelian Inheritance in Man (OMIM) number, mode of inheritance, population origin, prevalence and/or incidence rate, and a literature reference.


Single Nucleotide Polymorphisms in the Human Genome
This website is designed to provide the human genetics community with access to single nucleotide polymorphisms (SNPs) that have been developed as genetic markers on the human genome. The site is organized by chromosome and cytogenetic location. Each SNP has PCR primers and conditions associated with it.

Currently, we only post the SNPs that we have helped to develop. After we have posted all of our SNPs, we'll be adding SNPs from the literature and from collaborators, and we will be happy to have others contribute to the database.


Human SNP Database
This is the Whitehead/MIT SNP data.


SNPview: SNPs, SSLPs, Alleles and Haplotypes
The viewer allows comparison of the distribution of SNPs and SSLPs, alleles and derived haplotype blocks in different regions of the mouse genome. Several different views are available for each chromosome: a chromatogram-style view of validated SNPs, major and minor alleles, haplotypes, and combinations of the above. The data for the selected strains are displayed for full chromosomes (or user-defined intervals), along with the interrogated SNP loci, the reference tracks of the LocusLink gene loci, publicly available SSLP markers, and the subset of SSLP markers informative for the two selected strains. Additionally, Celera SNP density is displayed for chromosome 16 and three strains only. A choice of Celera or Ensembl genomic coordinates is available.


ALFRED - Allele Frequency Database
This gives gene frequency data for a diverse set of population samples and genetic systems.

It contains data on more than 40 populations representing most major regions of the world and data on more than 150 genetic systems including SNPs, STRPs and insertion-deletion polymorphisms.


Mutation Database Website
Information on nomenclature and design of mutation databases.


Universal Mutation Database
Software and databases for mutations in human genes.


Protein Mutation Database
PMD is based on literature (not on proteins); that is, each entry of the database corresponds to one article which describes protein mutations.


IDbases - databases for immunodeficiency-causing mutations
IDbases are databases for immunodeficiency-causing mutations. Their aim is to establish a database for every immunodeficiency or to provide links to those maintained elsewhere.

In addition to gene mutations, these databases contain information about clinical presentation. Information has been collected from the literature as well as received directly from researchers. The maintainers would be most glad if those analysing the mutations would send their information using the web submission form available for every database.

The databases include:

ADAbase AICDAbase AIREbase AP3B1base BLMbase BLNKbase BTKbase C1QAbase C1Sbase C2base C3base C5base C6base C7base C9base CASP10base CASP8base CD3Dbase CD3Ebase CD3Gbase CD40Lbase CD79Abase CD8Abase CEBPEbase CHS1base CXCR4base CYBAbase CYBBbase DCLRE1Cbase DKC1base DNMT3Bbase ELA2base EVER1base EVER2base FCGR3Abase FUCT1base HF1base ICOSbase IFbase IFNGR1base IFNGR2base IGHMbase IGLL1base IKBKGbase IL12Bbase IL12RB1base IL2RAbase IL7Rbase IRAK4base ITGB2base JAK3base MASP2base MHC2TAbase MRE11Abase NCF1base NCF2base NFKBIAbase NPbase PFCbase PRF1base PTPRCbase RAB27Abase RAC2base RAG1base RAG2base RFX5base RFXANKbase RFXAPbase SH2D1Abase STAT1base STAT5Bbase TAP1base TAP2base TCIRG1base TNFRSF5base TNFSF6base UNC13Dbase UNGbase WHNbase ZAP70base


The Androgen Receptor Mutations Database
Constitutional mutations in the androgen receptor gene (AR) impair androgen-dependent male sexual differentiation to various degrees. Somatic mutations in the AR have been found in metastatic prostate cancer. Severe constitutional androgen insensitivity (AI) yields an external female phenotype. Partial constitutional AI yields a range of external genital phenotypes that vary from near-normal female to normal or near-normal male, with or without gynecomastia and other relatively "mild" signs of undervirilization.


Antithrombin Mutation Database Homepage
Antithrombin is a plasma inhibitor of thrombin and other blood coagulation proteinases. Its (functional) deficiency is a strong risk factor for venous thrombosis. The gene coding for antithrombin has been localised to chromosome 1q23-25.


Asthma Gene Database
This is a database for asthma and allergy linkages and mutations.

Because data can be entered and changed from anywhere in the world, access is password restricted. Registration to this database is free.


Breast Cancer Mutation Data Base (BIC)
A resource for the molecular biologist investigating inherited breast cancer, providing a central repository for information on mutations and polymorphisms in breast cancer susceptibility genes.

This requires you to register as a BIC member.


BCGD - The Breast Cancer Gene Database
Contains information about genes involved in human breast cancer.


BIOMDB - Database of mutations causing tetrahydrobiopterin deficiencies
BIOMDB is a locus-specific database with detailed records of disease-producing allelic variations and natural polymorphic markers.


Blood Group Antigen Mutation Database
This database will deal with mutations in loci of allelic genes that specify the common blood group antigens and the allelic variants of those common genes.


BTKbase - agammaglobulinemia XLA-causing mutations
X-linked agammaglobulinemia (XLA) is an immunodeficiency caused by mutations in the gene coding for Bruton's agammaglobulinemia tyrosine kinase (BTK).

A database (BTKbase) of BTK mutations has been compiled and the recent update lists 463 mutation entries from 406 unrelated families showing 303 unique molecular events. In addition to mutations, the database also lists variants or polymorphisms.


The European CD40L Defect Database (CD40Lbase)
X-linked Hyper-IgM syndrome-associated mutation database.


Database of Human Type I and Type III Collagen Mutations
Includes accounts of every known mutation in the genes encoding the alpha-1 and alpha-2 chains of type I collagen.


Emery-Dreifuss Muscular Dystrophy Mutation Database
Brief description of mutations.


Factor VII Mutation Database
Factor VII Mutation Database, with:


GPCRDB: Information system for G protein-coupled receptors (GPCRs)
Contains information about GPCR sequences, multiple sequence alignments of GPCR families, 3D models, articles, GPCR mutation data and more.


GRAP Mutant Database (GPCRs, Family A)
A database of mutants of family A G protein-coupled receptors. GRAP contains detailed descriptions of ligand binding and signal transduction properties.


Haemophilia B Mutation Database
A database of point mutations and short additions and deletions in the factor IX gene.


HAMSTeRS - Haemophilia A Mutation, Search, Test and Resource Site
Over the last decade there has been a dramatic increase in our understanding of the pathology of haemophilia A in molecular terms, at the level of both nucleic acid sequence and, to a much lesser extent, protein structure.


Human HPRT database
The database contains information on the mutagen, dose, spontaneous and induced mutant fraction, base position, amino acid position, amino acid change, local DNA sequence, cell type, citation, and other items. In addition, information regarding the cause and effect of mutations affecting splicing is given.


Hypertrophic Cardiomyopathy mutation database
Familial hypertrophic cardiomyopathy is a genetic disorder associated with defects in the sarcomere.


KinMutBase: mutations in protein kinase domains
The aim of KinMutBase is to collect information about disease-causing mutations that occur in the kinase domains of protein kinases. For this reason, the database does not contain mutations in parts of kinases other than the kinase domain. Polymorphisms are also not included, except in some uncertain cases.

This database will be useful if you want to know whether a certain disease is caused by a mutation in the kinase domain of an enzyme, or if you want to check the reported cases of a certain mutation. Cross-links to other databases make it easy to find information about the genes or the articles in which the mutations were reported.


LDLR Mutation Database
Mutations in the LDL receptor gene (LDLR) cause familial hypercholesterolemia (FH), a common autosomal dominant disorder. The LDLR database is a computerized tool developed to analyse the numerous mutations that have been identified in the LDLR gene.


Long QT syndrome database
Long QT syndrome (LQTS) is a heart disease that manifests as a prolonged QT interval on the ECG and, clinically, as a propensity for tachyarrhythmias, which can cause syncope and sudden cardiac death.


Marfan Database
The Marfan database is a software package that contains routines for the analysis of mutations identified in the FBN1 gene, which encodes fibrillin-1. Mutations in this gene are associated not only with Marfan syndrome but also with a spectrum of overlapping disorders.


MutRes - List of Mutation Reso