
Genes

Gene

What is a gene? It is supposed to be a unit of hereditary information. Hence, a gene for blue eyes. Now, everyone knows genes are packed into deoxyribonucleic acid (DNA for short), with the full assembly of my DNA packed into 23 (that is, if you are lucky) chromosome pairs. So, naively, I would expect one of the chromosomes to have a region called eyes, a subregion called color, and then a properly delineated sequence which spells brown in DNA coding letters (A-adenine, G-guanine, T-thymine, C-cytosine). Say, something like AGGATCAGGCCT. And giving my baby blue eyes would be done simply by modifying that sequence to whatever spells blue in the strange language of our innards.

Turns out, not so simple. Only 1.1 % to 2 % (depending on the source, the internet being this big) of the human genome is actually protein-coding DNA, the rest being commonly referred to as junk DNA. The unit is then defined as the sequence required to code a single protein, i.e. a gene. Gene equals protein equals gene. This being a functional definition, we still lack the exact code that pertains to a particular protein. OK, a protein is much less than eye color, but it is a start. Still, there is no way to correlate a series of AGCT letters with a particular protein using only the definition above.

But our smart colleagues were, well, smart. They found out that:

* protein coding comes in triplets. That is, a sequence of three nucleotide bases (which is what A, G, C and T are called in chemical terms) forms a single letter in the DNA alphabet, called a codon. The arrangement is best seen in the translation phase of protein creation: the three-letter part of the tRNA (the anticodon) is the bridge between the three-letter genetic code and one of the 20 standard proteinogenic amino acids. The amino acids are called standard because they are directly coded by codons, and proteinogenic because, as they link, they form the peptide chains of plant and animal proteins.
* the parts around the protein-coding DNA have an impact on how the proteins are formed. Some of them are called enhancers or silencers, which help regulate the formation of proteins, resources being as limited as they are. Similarly, promoters promote, and operators activate, repress or corepress. There are 3-prime untranslated and 5-prime untranslated regions (UTRs); the 3-prime UTR, for example, can carry a signal that, in concert with a UGA stop codon, inserts selenocysteine, a non-standard proteinogenic amino acid.
* special codons exist that delimit the gene: a start codon (AUG) and no fewer than three stop codons (UAA, UAG, UGA). These are written in RNA letters, with U (uracil) taking the place of T. In essence, anything between a start and a stop codon is a gene, or at least its protein-coding part.
* since proteins are made away from the stored DNA, a messenger RNA is formed, typically equal in length to a single gene, then spliced to get rid of stretches copied from the DNA that do not end up in the protein (introns) and to acquire some stuff that is not in the DNA at all (such as the 5-prime cap and the poly-A tail). So one gene, one RNA, as far as I could glean, but I am possibly wrong.
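The triplet reading and the start/stop delimiters can be sketched in a few lines of Python. This is a toy under my own assumptions: the input is an already spliced mRNA string written with U (so ATG in DNA reads AUG here), the standard genetic code applies, and the helper names `first_orf` and `translate` are made up for illustration. It ignores the rest of the list above (UTRs, regulatory regions, selenocysteine recoding), and real gene finding is a lot messier.

```python
from itertools import product
from typing import Optional

BASES = "UCAG"
# Standard genetic code; "*" marks a stop. Codons are enumerated
# UUU, UUC, UUA, UUG, UCU, ... GGG, matching the order of this string.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def first_orf(mrna: str) -> Optional[str]:
    """Return the first stretch running from an AUG up to (not including)
    the first in-frame stop codon, or None if there is no such stretch."""
    start = mrna.find(START)
    while start != -1:
        # Step through the sequence three letters (one codon) at a time
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOPS:
                return mrna[start:i]
        # No in-frame stop for this AUG: try the next one
        start = mrna.find(START, start + 1)
    return None


def translate(orf: str) -> str:
    """Translate codon by codon into one-letter amino acid codes."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


if __name__ == "__main__":
    orf = first_orf("GGAUGGCCAAGUGGUAAGC")  # made-up toy mRNA
    print(orf, translate(orf))  # AUGGCCAAGUGG MAKW
```

The 64-letter string is just the standard codon table flattened in U, C, A, G order, which keeps the sketch short without typing out every codon.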

And today, we will see what an eigengene is, and what we can infer from it.

From Orly Alter's lecture

Datasets that have at least one axis in common (patients, time points) can be decomposed jointly using the generalized singular value decomposition (GSVD). In that way, similarities and dissimilarities between the datasets along the common axis can be recognized. SVD generates eigenvectors, mutually perpendicular patterns along the non-common axes (categories), and eigenvalues (strictly speaking, singular values), which correspond to the strength of those patterns. The larger the eigenvalue, the more relevant the pattern; in GSVD, comparing a pattern's strength in the two datasets shows whether it is shared or exclusive to one of them.
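To make the decomposition concrete, here is a minimal NumPy sketch under some assumptions of mine. `svd_patterns` takes a genes x samples matrix and, following what I understand to be the naming in Alter's papers, treats the rows of `vt` as eigengenes (patterns across samples) and `s**2 / sum(s**2)` as the fraction of the data each one captures. `gsvd` is a hypothetical helper (NumPy has no built-in GSVD) that computes the decomposition through a QR factorization of the stacked matrices followed by an SVD, assuming both matrices have at least as many rows as shared columns and full column rank. All data in the demo is random and made up.

```python
import numpy as np


def svd_patterns(data):
    """SVD of a genes x samples matrix.

    Returns (u, s, vt, fractions): columns of u are patterns across genes,
    rows of vt are patterns across samples (the eigengenes), s holds their
    strengths, and fractions is each pattern's share of the squared signal."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    fractions = s**2 / np.sum(s**2)
    return u, s, vt, fractions


def gsvd(d1, d2):
    """GSVD of two matrices sharing their column axis (e.g. patients).

    Finds d1 = u1 @ diag(c) @ xt and d2 = u2 @ diag(s) @ xt with
    c**2 + s**2 = 1 and a shared xt. Assumes d1 and d2 each have at least
    as many rows as columns and full column rank (so every s > 0)."""
    m1 = d1.shape[0]
    q, r = np.linalg.qr(np.vstack([d1, d2]))  # reduced QR of stacked data
    q1, q2 = q[:m1], q[m1:]
    u1, c, zt = np.linalg.svd(q1, full_matrices=False)
    q2z = q2 @ zt.T                   # columns come out mutually orthogonal
    s = np.linalg.norm(q2z, axis=0)   # equals sqrt(1 - c**2)
    u2 = q2z / s
    xt = zt @ r                       # patterns along the common axis
    return u1, u2, c, s, xt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up data: 100 genes x 6 samples, one up/down pattern plus noise
    pattern = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    data = rng.normal(size=(100, 1)) * pattern + 0.1 * rng.normal(size=(100, 6))
    _, _, vt, fractions = svd_patterns(data)
    print("first eigengene:", np.round(vt[0], 2))
    print("fraction it captures:", round(float(fractions[0]), 3))

    # Made-up "tumor" and "normal" matrices sharing 5 patients (columns)
    tumor, normal = rng.normal(size=(40, 5)), rng.normal(size=(30, 5))
    u1, u2, c, s, xt = gsvd(tumor, normal)
    # > 0: relatively stronger in tumor, < 0: in normal, near 0: shared
    print("angular distances:", np.round(np.arctan(c / s) - np.pi / 4, 2))
```

The `arctan(c / s) - pi/4` values, an angular distance that as far as I know is how Alter's comparative papers read GSVD patterns, range from -π/4 to π/4. Going through QR and an ordinary SVD is one standard way to get a GSVD when both matrices are well conditioned; a production setting would rather call a dedicated LAPACK routine.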

In cancer, such patterns can be used for disease grading, separating patients by outcome, and therapy selection. Current work is performed on cancer genome databases; future efforts aim at bringing the findings to the clinic.
