BIOINFORMATICS SOFTWARE TOOLS

A

Accession number: An identifier supplied by the curators of the major biological databases upon submission of a novel entry that uniquely identifies that sequence (or other) entry. 

Active site: The amino acid residues at the catalytic site of an enzyme. These residues bind the substrate and stabilize its transition state, lowering the energy barrier of the reaction undergoing catalysis.

Adenine: A purine base found in DNA and RNA.

Agents: Independent, autonomous software modules that can search the Internet for data or content pertinent to a particular application, such as a gene, protein, or biological system.

Algorithm: A series of steps defining a procedure or formula for solving a problem, that can be coded into a programming language and executed. Bioinformatics algorithms typically are used to process, store, analyze, visualize and make predictions from biological data. 

Alignment: The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity. Sequence alignments are used to determine the similarity, homology, function or other degree of relatedness between two or more genes or gene products. 
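As an illustration of how a pairwise alignment can be computed, the following is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example; real tools use substitution matrices and more refined gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```

The aligned strings have equal length, with "-" marking gaps; the fraction of identical columns gives a simple measure of similarity.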

Allele: A given form of a gene that occupies a specific position or locus on a chromosome. Variant forms of genes occurring at the same locus are said to be alleles of one another. 

Alternative splicing: The process, occurring in higher organisms, by which the exons of a single gene are joined in different combinations during mRNA splicing, so that one gene can give rise to several different mRNAs and proteins.

Alternative splice-form: One of the alternative mRNAs (and hence proteins) that can be produced by joining the exons of a gene in different combinations during mRNA splicing in higher organisms.

Alu family: A common set of dispersed DNA sequences found throughout the human genome; each is about 300 bases long and they are repeated at least 500,000 times. Alu sequences are thought to derive from the 7SL RNA gene and to have spread through primate genomes by retrotransposition over tens of millions of years.

Amino acid: One of the 20 chemical building blocks that are joined by amide (peptide) linkages to form the polypeptide chain of a protein.

Analogy: Reasoning by which the function of a novel gene or protein sequence may be deduced from comparisons with other gene or protein sequences of known function.  Identifying analogous or homologous genes via similarity searching and alignment is one of the chief uses of Bioinformatics. (See also alignment, similarity search.) 

Annotation: A combination of comments, notations, references, and citations, either in free format or utilizing a controlled vocabulary, that together describe all the experimental and inferred information about a gene or protein.  Annotations can also be applied to the description of other biological systems.  Batch, automated annotation of bulk biological sequence is one of the key uses of Bioinformatics tools. 

Anticodon: The triplet of contiguous bases on tRNA that binds to the codon sequence of nucleotides on mRNA. Example: The codon for Glycine is GGG. The anticodon for Glycine is CCC. 
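The codon/anticodon relationship in the example above can be sketched in a few lines. This is a simplified illustration that assumes strict Watson-Crick pairing and ignores wobble pairing and modified tRNA bases; the function name is hypothetical.

```python
# RNA base-pairing partners under strict Watson-Crick rules.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the anticodon (written 5'->3') for an mRNA codon (written 5'->3')."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input as well
    # The strands pair antiparallel, so complement each base and reverse the order.
    return "".join(COMPLEMENT[base] for base in reversed(codon))


if __name__ == "__main__":
    print(anticodon("GGG"))  # glycine codon from the example above
    print(anticodon("AUG"))  # start codon
```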

Antigen: Any foreign molecule that stimulates an immune response in a vertebrate organism. Many antigens are proteins such as the surface proteins of foreign organisms. 

Antisense: DNA or RNA composed of the complementary sequence to the target DNA/RNA. Also used to describe a therapeutic strategy that uses antisense DNA or RNA sequences to bind specific gene DNA sequences or mRNA implicated in disease and so physically block their expression.

Assembly: Compilation of overlapping sequences from one or more related genes that have been clustered together based on their degree of sequence identity or similarity. Sequence assembly may be used to piece together "shotgun" sequencing fragments (see shotgun sequencing) based upon overlapping restriction enzyme digests, or may be used to identify and index novel genes from "single-pass" cDNA sequencing efforts. 

 

B

 

Bacterial artificial chromosome (BAC): Cloning vector that can incorporate large fragments of DNA. (See YACs.)

Bacteriophage: A virus that infects bacteria. The bacteriophage DNA has served as a basis for cloning vectors, and is also utilized to create phage libraries containing human or other genes. 

Baculovirus: An insect virus that forms the basis of a protein expression system.

Base pair: A pair of nitrogenous bases (a purine and a pyrimidine), held together by hydrogen bonds, that form the core of DNA and RNA, i.e., the A:T, G:C and A:U interactions.

Beta sheet: A three dimensional arrangement taken up by polypeptide chains that consists of alternating strands linked by hydrogen bonds. The alternating strands together form a sheet that is frequently twisted. One of the secondary structural elements characteristic of proteins. 

Bioinformatics:
1. The field of endeavor that relates to the collection, organization and analysis of large amounts of biological data using networks of computers and databases (usually with reference to the genome project and DNA sequence information).
2. Bioinformatics is sometimes used interchangeably with the term Computational Biology. More precisely, Computational Biology is defined as the systematic development and application of computing systems and computational solution techniques to models of biological phenomena; Bioinformatics is defined as the systematic development and application of computing systems and computational solution techniques for analyzing biological data obtained by experiments, modeling, database searches, and instrumentation.

Bivalent: Having two binding sites; having 2 free electrons available for binding. 

Blunt-end (ligation): The joining of DNA fragments that contain no overhang at either end and consequently no DNA bases available for hybridization (cf. sticky-end ligation).

 

BLAST (Basic Local Alignment Search Tool): A sequence comparison algorithm (Altschul et al.), optimized for speed, that is used to search sequence databases for optimal local alignments to a query. The initial search is done for a word of length "W" that scores at least "T" when compared to the query using a substitution matrix. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold "S". The "T" parameter dictates the speed and sensitivity of the search. For additional details, see the BLAST tutorials or the narrative guide to BLAST.
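The seed-and-extend idea described above can be sketched as follows. This is a toy illustration, not the BLAST implementation: the function names, the match/mismatch scores (+5/-4) standing in for a substitution matrix, and the simple drop-off rule used to stop extension are all assumptions made for the example.

```python
def word_score(w1, w2, match=5, mismatch=-4):
    """Score two equal-length words position by position."""
    return sum(match if x == y else mismatch for x, y in zip(w1, w2))


def find_seeds(query, subject, W=3, T=15):
    """Return (query_pos, subject_pos) pairs whose length-W words score at least T."""
    seeds = []
    for i in range(len(query) - W + 1):
        for j in range(len(subject) - W + 1):
            if word_score(query[i:i + W], subject[j:j + W]) >= T:
                seeds.append((i, j))
    return seeds


def extend(query, subject, i, j, W=3, match=5, mismatch=-4, x_drop=10):
    """Ungapped extension of a seed in both directions.

    Returns (best_score, query_start, query_end) for the best-scoring span.
    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen so far.
    """
    best = word_score(query[i:i + W], subject[j:j + W], match, mismatch)
    left = right = 0

    running, k = best, 0
    while i + W + k < len(query) and j + W + k < len(subject):
        running += match if query[i + W + k] == subject[j + W + k] else mismatch
        k += 1
        if running > best:
            best, right = running, k
        elif best - running > x_drop:
            break

    running, k = best, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        running += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if running > best:
            best, left = running, k
        elif best - running > x_drop:
            break

    return best, i - left, i + W + right


if __name__ == "__main__":
    q, s = "GATTACA", "CCGATTACACC"
    for qi, sj in find_seeds(q, s):
        print((qi, sj), extend(q, s, qi, sj))
```

With this toy scoring, T=15 admits only exact 3-letter word matches; lowering T to 6 would also admit words with one mismatch, which yields more seeds (greater sensitivity) at the cost of more extension work (lower speed). A caller would keep only extended hits whose score exceeds the chosen "S".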

 

C

Carboxyl group: The -COOH functional group, acidic in nature, found in all amino acids.

cDNA (complementary DNA): A DNA strand copied from mRNA using reverse transcriptase. A cDNA library represents all of the expressed DNA in a cell. 

cDNA library: A set of DNA fragments prepared from the total mRNA obtained from a selected cell, tissue or organism. 

Chimeric clone: A cloning artifact created by a foreign gene being inserted into a vector in an incorrect orientation resulting in the expression of a protein consisting of a fusion of two different gene products. 

Chromat: Data file output from most popular DNA sequencers. Chromat files consist of the fluorescent traces generated by the sequencer for each of the four chemical bases, A, C, G, and T, together with the sequence and measures of the error in the traces at each sequence position. 

Chromatin: The chromosome as it appears in its condensed state, composed of DNA and associated proteins (mainly histones). 

Chromosome: The structure in the cell nucleus that contains all of the cellular DNA together with a number of proteins that compact and package the DNA. 

Clone: A population of genetically identical cells or DNA molecules. 

Cloning: The formation of clones or exact genetic replicas. 

Cluster: The grouping of similar objects in a multidimensional space.  Clustering is used for constructing new features which are abstractions of the existing features of those objects. The quality of the clustering depends crucially on the distance metric in the space. In bioinformatics, clustering is performed on sequences, high-throughput expression and other experimental data. Clusters of partial or complete gene sequences can be used to identify the complete (contiguous) sequence and to better identify its function. Clustering expression data enables the researcher to discern patterns of co-regulation in groups of genes. 
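To make the role of the distance metric concrete, here is a minimal sketch of single-linkage clustering of equal-length sequences, using Hamming distance (number of differing positions) as an assumed metric. The function names and the threshold are illustrative; production clustering of sequences or expression data uses more elaborate metrics and algorithms.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(items, max_dist, dist=hamming):
    """Group items so that any two items within max_dist share a cluster."""
    parent = list(range(len(items)))  # union-find forest over item indices

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "GGGG", "GGGC", "AATT"]
    print(single_linkage(seqs, max_dist=1))
```

Swapping in a different `dist` function changes which objects end up together, which is why the choice of metric matters so much to the quality of the result.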

Coding regions (CDS): The portion of a genomic sequence bounded by start and stop codons that identifies the sequence of the protein being coded for by a particular gene. 

Codon: A sequence of three adjacent nucleotides that designates a specific amino acid or a start/stop signal for translation.
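Reading codons can be illustrated with a short translation routine. This sketch assumes the standard genetic code and a sequence that is already in the correct reading frame; the names `CODON_TABLE` and `translate` are chosen for the example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - 2, 3):  # trailing bases that do not fill a codon are ignored
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGGGTAA"))
```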

Combinatorial chemistry: The use of chemical methods to generate all possible combinations of chemicals starting with a subset of compounds. The building blocks may be peptides, nucleic acids or small molecules. The libraries of compounds formed by this methodology are used to probe for new pharmaceutical reagents (see high-throughput screening). 

Complementarity determining region (CDR): The hypervariable regions of an antibody molecule, consisting of three loops from the heavy chain and three from the light chain, that together form the antigen-binding site.

Complexity (of gene sequence): The term "low complexity sequence" may be thought of as synonymous with regions of locally biased amino acid composition. In these regions, the sequence composition deviates from the random model that underlies the calculation of the statistical significance (P-value) of an alignment. Alignments among low complexity sequences may therefore be statistically but not biologically significant, i.e., one cannot infer homology from them.
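One simple way to flag locally biased composition is to compute the Shannon entropy of the residues in a sliding window. The sketch below is an illustration only: the window size and threshold are arbitrary assumptions, and dedicated low-complexity filters use more refined measures.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=1.5):
    """Start positions of windows whose entropy falls below the threshold."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < threshold
    ]


if __name__ == "__main__":
    protein = "QQQQQQQQQQQQ" + "ACDEFGHIKLMN"
    print(low_complexity_windows(protein))
```

A window made of a single repeated residue has entropy 0, while a window in which every residue is different has the maximum entropy for its length.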
 

Conformation: The precise three-dimensional arrangement of atoms and bonds in a molecule describing its geometry and hence its molecular function. 

Consensus sequence: A single sequence derived from an alignment of multiple constituent sequences that represents a "best fit" for all those sequences. A "voting" or other selection procedure is used to determine which residue (nucleotide or amino acid) is placed at a given position in the event that not all of the constituent sequences have the identical residue at that position.
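The "voting" procedure can be sketched as a simple majority vote per alignment column. This assumes the input sequences are already aligned to equal length; the function name is illustrative, and ties are broken in favor of the residue seen first in the column, which is an arbitrary choice.

```python
from collections import Counter


def consensus(aligned):
    """Majority-vote consensus of equal-length, pre-aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "ACCT"]))
```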

Constitutive synthesis (expression): Synthesis of mRNA and protein at an unchanging or constant rate regardless of a cell’s requirements (see housekeeping genes). 

Contig: A length of contiguous sequence assembled from partial, overlapping sequences generated from a "shotgun" sequencing project. Contigs are typically created computationally, by comparing the overlapping ends of several sequencing reads generated by restriction enzyme digestion of a segment of genomic DNA. Creating contigs in the presence of sequencing errors, ambiguities and repeats is one of the most computationally challenging aspects of the role of Bioinformatics in genome analysis.
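The idea of building a contig from overlapping read ends can be sketched with a greedy merge. This is a deliberately idealized illustration: it assumes error-free reads on the same strand with exact overlaps, and it does not handle repeats, which is exactly where real assemblers have to work hardest. The function names and the minimum overlap length are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTA"]))
```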

Convergence: The end-point of any algorithm that uses iteration or recursion to guide a series of data processing steps. An algorithm is usually said to have reached convergence when the difference between successive computed values, or between computed and observed values, falls below a pre-defined threshold.
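A convergence test of this kind can be illustrated with a small iterative computation. The sketch below uses Newton's method for a square root purely as a stand-in example; the tolerance and iteration cap are assumed values.

```python
def newton_sqrt(x, tol=1e-10, max_iter=100):
    """Approximate sqrt(x) iteratively; returns (estimate, iterations used)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0, 0
    guess = x if x >= 1 else 1.0
    for iteration in range(1, max_iter + 1):
        new_guess = 0.5 * (guess + x / guess)
        # Converged: successive estimates differ by less than the threshold.
        if abs(new_guess - guess) < tol:
            return new_guess, iteration
        guess = new_guess
    raise RuntimeError("did not converge within max_iter iterations")


if __name__ == "__main__":
    print(newton_sqrt(2.0))
```

The iteration cap guards against the case where the threshold is never reached, which is a common safeguard in iterative algorithms.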

Cosmids: DNA vectors that allow the insertion of long fragments of DNA (up to 50 kbases).

Crystal structure: Term used to describe the high-resolution molecular structure derived by X-ray crystallographic analysis of protein or other biomolecular crystals.

Cytoplasm: The medium of the cell between the nucleus and the cell membrane.

Cytosine: A pyrimidine base found in DNA and RNA.
