Gene
What is a gene? It is supposed to be a unit of hereditary information.
Hence, a gene for blue eyes. Now, everyone knows genes are packed into
deoxyribonucleic acid (DNA for short), with the full assembly of my DNA
packed into 23 (that is, if you are lucky) chromosome pairs. So, naively,
I would expect one of the chromosomes to have a region called eyes, a
subregion called color and then a properly delineated sequence which
spells brown in a sequence of DNA coding letters (A-adenine, G-guanine,
T-thymine, C-cytosine). Say, something like AGGATCAGGCCT. And giving my baby
blue eyes would be done simply by modifying that sequence to whatever
equals blue in the strange language of our innards.
Turns out, not so simple. Only 1.1 % to 2 % (depending on the source,
the internet being this big) of the human genome is actually protein-coding DNA,
the rest commonly referred to as junk DNA. The unit is then defined as
a sequence required to code a single protein, i.e. a gene. Gene equals protein
equals gene. This being a functional definition, we still lack the
exact code that pertains to a particular protein. OK, a protein is much
less than eye color, but it is a start. Still, there is no way to
link a series of AGCT letters to a particular protein using
only the definition above.
But our smart colleagues were, well, smart. They found out that
* protein coding comes in triplets. That is, a sequence of three
nucleotide bases (which is what A, G, C and T are called in chemical
terms) forms a single word of the genetic code, called a codon. The arrangement is
best seen in the translation phase of protein synthesis: the three-letter
part of the tRNA (the anticodon) is the bridge between the three-letter genetic
code and one of the 20 standard proteinogenic amino acids. The amino
acids are called standard because they are directly coded by codons, and
proteinogenic because, linked into peptide chains, they build the proteins
of plants and animals.
* the parts around the protein-coding DNA have an impact on how
the proteins are formed. Some parts, called enhancers or silencers,
help regulate the production of proteins, resources being limited as they
are. Similarly, promoters promote transcription, and operators are where regulatory
proteins switch it on or off (repressors, sometimes helped by corepressors).
There are also 5-prime and 3-prime untranslated regions (UTRs); in eukaryotes, for
example, a structure in the 3-prime UTR lets the UGA stop codon be read as
selenocysteine, a non-standard proteinogenic amino acid.
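* special codons delimit the gene: a start codon (AUG) and no fewer than three
stop codons (UAA, UAG, UGA), written here in RNA letters, with U in place of T. In essence,
anything between a start codon and the next stop codon in the same reading frame
(an open reading frame) is a gene.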
* Since proteins are made away from the stored DNA (in cells with a nucleus, the DNA stays
inside it), a messenger RNA is formed, typically equal in length to a single gene, then
spliced and respliced to get rid of stuff that is transcribed from DNA but not needed for
the protein (introns), and to acquire stuff that is not encoded in DNA but is in the mature
RNA (a 5-prime cap and a poly(A) tail). So one gene - one RNA, as far as I could glean, but I am possibly wrong.
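The codon rules in the list above can be made concrete with a few lines of Python. This is a toy sketch under simplifying assumptions: it treats the first AUG followed by an in-frame stop codon as "the gene" (ignoring introns, regulatory regions and the other DNA strand), uses the standard genetic code, and runs on a made-up sequence. The names `transcribe`, `first_orf` and `translate` are my own, not from any library.

```python
from typing import Optional

# Standard genetic code packed into the usual 64-letter string.
# Codon order: first, second and third base each run through U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def transcribe(coding_strand: str) -> str:
    """The mRNA reads like the DNA coding strand, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def first_orf(mrna: str) -> Optional[str]:
    """Return the first stretch from an AUG to the next in-frame stop codon, or None."""
    start = mrna.find("AUG")
    while start != -1:
        # Step through the sequence three letters (one codon) at a time.
        for i in range(start, len(mrna) - 2, 3):
            if CODON_TABLE[mrna[i:i + 3]] == "*":
                return mrna[start:i + 3]
        # No in-frame stop after this AUG: try the next AUG.
        start = mrna.find("AUG", start + 1)
    return None


def translate(orf: str) -> str:
    """One amino acid per codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCATGTTTGGCAAATGGTAAGC")  # made-up toy sequence
orf = first_orf(mrna)
print(mrna, orf, translate(orf))
```

Running it finds the open reading frame `AUGUUUGGCAAAUGGUAA` and the peptide `MFGKW` (methionine, phenylalanine, glycine, lysine, tryptophan). A real ORF finder would at least scan all six reading frames, three on each strand.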
And today, we will see what an eigengene is, and what we can infer from that.
From Orly Alter's lecture
Datasets that have at least one axis in common (patients, time points) can be
decomposed jointly using the generalized singular value decomposition (GSVD). In
that way, similarities and dissimilarities between samples in the common
category can be recognized. Like the ordinary SVD, it generates orthogonal patterns
along the non-common axes (categories), which can be thought of as patterns, and singular
values, which correspond to the strength of those patterns. For a single genes × samples
matrix, the SVD patterns across the samples are the eigengenes. The larger the value,
the more relevant the pattern; in the GSVD, the ratio of a pattern's values in the two
datasets tells whether it is exclusive to one dataset or common to both.
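To make eigengenes and singular values less abstract, here is a sketch in Python with NumPy on made-up data. The patterns, dataset sizes and noise level are all invented for illustration. The first part is an ordinary SVD of a genes × samples matrix: the rows of `Vt` are the eigengenes, and `s**2 / sum(s**2)` gives the share of the data each pattern captures (what, as I understand it, Alter calls the fraction of eigenexpression). The second part is *not* Alter's GSVD algorithm. `generalized_singular_value_ratios` is a hypothetical helper that obtains the ratios of generalized singular values from a generalized eigenproblem. This is simple, but numerically less robust than a proper GSVD routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Part 1: SVD of one made-up genes x samples matrix ---
n_samples = 12
t = np.linspace(0.0, 1.0, n_samples)
pattern_a = np.sin(2 * np.pi * t)         # invented oscillation across samples
pattern_b = np.where(t > 0.5, 1.0, -1.0)  # invented step across samples


def make_dataset(n_genes, weight_a, weight_b):
    """Each gene is a random mix of the two patterns plus a little noise."""
    loadings = rng.normal(size=(n_genes, 2))
    signal = loadings @ np.vstack([weight_a * pattern_a, weight_b * pattern_b])
    return signal + 0.1 * rng.normal(size=(n_genes, n_samples))


data1 = make_dataset(200, weight_a=3.0, weight_b=1.0)
U, s, Vt = np.linalg.svd(data1, full_matrices=False)
eigengenes = Vt                 # row k: k-th pattern across the samples
eigenarrays = U                 # column k: k-th pattern across the genes
fraction = s**2 / np.sum(s**2)  # share of the data captured by each pattern
print("top 3 fractions:", np.round(fraction[:3], 3))


# --- Part 2: two datasets sharing the sample axis (naive, hypothetical helper) ---
def generalized_singular_value_ratios(d1, d2):
    """Ratios sigma1_k / sigma2_k for d1 (m1 x n) and d2 (m2 x n).
    Solves d1^T d1 x = lam * d2^T d2 x, whose eigenvalues are
    (sigma1_k / sigma2_k)**2. Requires d2^T d2 to be positive definite."""
    g1, g2 = d1.T @ d1, d2.T @ d2
    L = np.linalg.cholesky(g2)               # g2 = L @ L.T
    m = np.linalg.solve(L, g1)               # L^-1 g1
    c = np.linalg.solve(L, m.T)              # L^-1 g1 L^-T, symmetric
    lam = np.linalg.eigvalsh((c + c.T) / 2)  # eigenvalues in ascending order
    return np.sqrt(np.clip(lam, 0.0, None))


data2 = make_dataset(150, weight_a=0.3, weight_b=4.0)
ratios = generalized_singular_value_ratios(data1, data2)
print("ratios (ascending):", np.round(ratios, 2))
```

A ratio far above 1 marks a pattern that is much stronger in the first dataset. A ratio far below 1 marks one that dominates the second, and values near 1 mark patterns the two datasets share.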
In cancer, such patterns lead to disease grading, separation of patients by outcome, and
therapy selection. Current work is performed on cancer genome databases; future efforts
aim at bringing the findings to clinics.