Nucleotide Structure: Software For Detection & Annotation

Published on: 2018/08/16

Himanshu Singh


Low-complexity genome sequences with a random distribution are defined as regions of “biased composition of nucleotides”, and they are mostly found in non-coding regions.

It is also well established that such biased nucleotide compositions are linked to the pathogenesis of some human diseases. Identifying and annotating these biased, repetitive genomic sequences is therefore important for understanding disease pathogenesis, and the results could also be used in a variety of applications in biology.

Interest Category

Nucleotide structure, genetic research, life science research


The Human Genome Project & Sequences in the human genome

The Human Genome Project (HGP), started in 1990, was an international scientific research project whose main aim was to determine the sequence of nucleotide base pairs that make up human DNA. It also aimed to identify and map the genes of the genome from both a functional and a physical standpoint.

The key findings of the draft published in 2001 were:

  1. The human genome has approximately 22,300 protein-coding genes, a number in the same range as in other mammals.
  2. The human genome has more segmental duplications (nearly identical, repeated sections of DNA) than had previously been suspected.

The project was completed in 2003. Besides the human genome, the researchers also studied the genomes of several other organisms, such as the roundworm, brewer's yeast, and the fruit fly, which allowed them to study the similarities between human genes and the genes of those organisms. The HGP has helped the medical community understand the blueprint of a human being. Knowledge of the functions of genes and proteins is expected to have a major impact on the medical, life science, and biotechnology industries.

Nucleotide Structure

Nucleotides are organic molecules that serve as the monomer units of the nucleic acid polymers DNA (deoxyribonucleic acid) and RNA (ribonucleic acid). These are the basic biomolecules of all forms of life.

Nucleotide structure: What is the molecular structure of a nucleotide?

A nucleotide consists of a nitrogenous base, a five-carbon sugar, and a phosphate group. A nucleoside is a nitrogenous base joined to a five-carbon sugar, without the phosphate group. There are five bases: adenine (A), cytosine (C), guanine (G), uracil (U), and thymine (T). When combined with a sugar, these bases form the nucleosides adenosine, cytidine, guanosine, uridine, and thymidine, respectively.

DNA uses four bases: adenine, guanine, cytosine, and thymine. RNA also uses four bases: adenine, cytosine, guanine, and uracil. Complementary bases pair with each other through hydrogen bonds, and this pairing holds the two strands of the double helix together. Adenine pairs with thymine in DNA (A-T) and with uracil in RNA (A-U). Cytosine and guanine pair with each other (C-G) in both.
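The pairing rules can be written down as a small lookup table. The following is a minimal sketch in Python; the names DNA_PAIRS, RNA_PAIRS, complement, and reverse_complement are illustrative choices, not part of any particular library.

```python
# Pairing rules from the text: A-T and C-G in DNA, A-U and C-G in RNA.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement(seq, pairs=DNA_PAIRS):
    """Return the base-by-base complement of seq (hypothetical helper)."""
    try:
        return "".join(pairs[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None


def reverse_complement(seq, pairs=DNA_PAIRS):
    """Return the complement read in the opposite direction."""
    return complement(seq, pairs)[::-1]


print(complement("ATGC"))             # TACG
print(reverse_complement("ATGC"))     # GCAT
print(complement("AUGC", RNA_PAIRS))  # UACG
```

The reverse complement is included because the two strands of a double helix run in opposite directions, so the partner strand is usually written reversed.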

Biased nucleotides

The biased composition of nucleotides is generally found in non-coding regions. These regions are low-complexity genome sequences with a random distribution. These unstable repeat sequences provide a physical basis for integrating different regions and are essential parts of genomic sequences. They help in coordinating different aspects of genome functions. Biased nucleotides are linked to the pathogenesis of some human diseases.
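One common way to make "low complexity" measurable is the Shannon entropy of the base composition in a sliding window. The sketch below is an illustration only and is not taken from the software discussed in the article; the function names, the window size of 12, and the threshold of 1.0 bit are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits per base: 0 for a single repeated base, 2 for equal A/C/G/T."""
    total = len(window)
    return sum((n / total) * math.log2(total / n) for n in Counter(window).values())


def low_complexity_windows(seq, window=12, threshold=1.0):
    """Yield (start, end, entropy) for every window whose entropy is below threshold.

    The window size and threshold are arbitrary illustrative defaults.
    """
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        entropy = shannon_entropy(seq[start:start + window])
        if entropy < threshold:
            yield start, start + window, entropy


# Made-up example: a run of 20 adenines flanked by mixed sequence.
example = "ACGTTGCA" + "A" * 20 + "GTCAGCTA"
for start, end, entropy in low_complexity_windows(example):
    print(start, end, round(entropy, 2))
```

Windows dominated by one or two bases score close to zero, while windows with an even mix of all four bases score close to two bits. A real tool would usually merge overlapping windows into a single annotated region.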

Identifying the biased composition helps in understanding disease pathogenesis. These data can be utilized for a variety of applications in biology.
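As a rough illustration of what detection and annotation could look like in practice, the sketch below finds maximal stretches made up only of a chosen set of bases and reports their coordinates. It is a hypothetical example and not the author's software; the function find_tracts, the default base set "AG", the minimum length of 10, and the sequence name "seq1" are assumptions.

```python
import re


def find_tracts(seq, bases="AG", min_len=10):
    """Return (start, end, tract) for maximal runs made only of the given bases.

    Coordinates are 0-based with an exclusive end, as in the BED format.
    """
    pattern = re.compile(f"[{re.escape(bases)}]{{{min_len},}}", re.IGNORECASE)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


# Made-up sequence: one purine-only (A/G) stretch of 10 bases and one of 3 bases.
example = "CTCT" + "AGGAAGAGGA" + "TTC" + "AGA"
for start, end, tract in find_tracts(example):
    # Tab-separated, BED-like annotation line; "seq1" is a placeholder name.
    print(f"seq1\t{start}\t{end}\t{tract}")
```

Here only the 10-base stretch is reported, because the shorter "AGA" run falls below the minimum length. Changing the bases argument (for example to "A" or "CT") would annotate other kinds of biased tracts.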

Read more – software for detection and annotation of tracks comprising defined nucleotides by Himanshu Singh.