18.01.14 Pseudogenes Flashcards

Flashcards in 18.01.14 Pseudogenes Deck (28)

Give a definition of a pseudogene.

A DNA sequence that shows a high degree of sequence homology to a functional gene but does not generally encode a functional product itself.


What is the typical structure of a pseudogene?

Pseudogenes usually contain multiple exons of a functional gene and harbour inactivating mutations (e.g. truncating mutations) that abolish gene function. As such, they were generally considered to be non-functional ‘junk DNA’; however, evidence suggests multiple roles for pseudogenes at the DNA, RNA and protein level (see later).


How common are pseudogenes?

Pseudogenes are common! The human genome is predicted to contain 10,000-20,000 pseudogenes (the vast majority are processed pseudogenes), and the distribution of pseudogenes per gene is highly uneven: only 10% of human genes have a pseudogene counterpart.

Highly-transcribed genes are more likely to produce processed pseudogenes (the small number of genes encoding ribosomal proteins account for ~20% of all processed pseudogenes).


Why are pseudogenes so common?

1. Gene duplication events may be evolutionarily advantageous if they produce new functional gene variants with some kind of selective advantage – pseudogenes may be unsuccessful by-products of this duplication mechanism.

2. Pseudogenes are functional and evolutionarily-conserved!


What are the different classes of pseudogenes? Give examples where possible.

Non-processed (duplicated) pseudogenes e.g. SMN1/SMN2, CYP21A2/CYP21A1P

Processed (retrotransposed) pseudogenes e.g. UTP14a/UTP14c

Unitary (solitary) pseudogenes

RNA pseudogenes e.g. U6 snRNA has ~800 related pseudogenes

Mitochondrial pseudogenes


Give a description of non-processed (duplicated) pseudogenes.

1. A defective gene arising from a copy of genomic DNA sequence (may contain sequence corresponding to promoter, exons, introns).

2. Arise via tandem duplication (unequal crossover between homologous chromosomes or unequal sister chromatid exchange within the same chromosome – see S&R, 4th Ed., pg.409).

3. Often located close to functional gene counterparts – e.g. SMN1/SMN2, CYP21A2/CYP21A1P, PMS2 pseudogenes, α-globin/β-globin pseudogenes and Class I HLA gene clusters; however, some may be dispersed due to recombination (notably at unstable pericentromeric and subtelomeric regions) – e.g. NF1 pseudogenes.


How do processed (retrotransposed) pseudogenes arise?

A defective gene arising from copying at the cDNA level (contains exonic sequences only).

Arise via retrotransposition – cellular reverse transcriptases (encoded by LINE1 elements) use processed gene transcripts e.g. mRNA to produce cDNA that can then integrate into the genome at a new location – see S&R, 4th Ed, pg. 274 for overview.


What is the structure of processed pseudogenes?

Processed pseudogenes contain sequence corresponding to the exons of the functional gene and lack introns, promoters and regulatory elements and are therefore usually not expressed.

However, there are examples of expressed processed gene copies that retain some function; these are known as retrogenes.


What are retrogenes?

Retrogenes are created by chance, when the cDNA copy integrates adjacent to a promoter that can drive its expression (selective pressure may then ensure that the processed gene copy continues to make a functional gene product).

Parental gene and retrogene are often expressed in different ways and often in different cell types.

Retrogenes are often autosomal homologs of X-linked genes; this is thought to reflect a requirement for expression during male meiosis, when both X and Y are silenced.

E.g. UTP14a and UTP14c


What makes unitary pseudogenes different from other pseudogenes?

Very rare!

Different to other pseudogenes in that the gene was not duplicated before becoming inactivated (they do not have a genomic counterpart that encodes a fully-functional protein; instead, they arise through spontaneous mutations in protein-coding genes).


Describe mitochondrial pseudogenes.

A defective copy of a mitochondrial gene (but found in the nuclear genome).

Mitochondrial pseudogenes are more abundant than the actual mitochondrial genome!


What is the function of pseudogenes at the DNA level?

1. Cause mutation
2. Gene conversion
3. Parental gene inactivation
4. Sequence diversity
5. Acquisition of exons (exonisation).


What is the function of pseudogenes at the RNA level?

Sense RNA - can compete for microRNAs, RNA-binding proteins and the translation machinery

Antisense RNA - fine-tunes the corresponding sense transcript, has a function unrelated to the parental gene, or acts as a microRNA precursor


What is the function of pseudogenes at the protein level?

1. Encode fully functional protein - same/different activity to the functional protein

2. Partially functional protein

3. Antigenic peptide


Give clinically relevant examples of a pseudogene acting at the DNA level.

Pseudogenes share a high sequence similarity to their parental genes and can therefore elicit function via gene conversion/recombination with their parental homolog.
In some cases, upstream regulatory sequences may affect transcription of the parental gene.

They can interact with other gene loci, leading to alteration in their sequences and/or transcriptional activities.
Pseudogenes may subsequently acquire additional exons.

Clinically-relevant examples

– Gene conversion between PMS2 and its pseudogene leading to inactivation of PMS2.

– Homologous recombination between BRCA1 and its pseudogene leading to a deletion of the BRCA1 promoter.


Give clinically relevant examples of a pseudogene acting at the RNA level.

Transcribed pseudogenes play diverse roles in post-transcriptional regulation via a number of mechanisms.

– PTENP1 (a pseudogene of PTEN) acts as a ‘miRNA decoy’ by binding to miRNA and allowing PTEN to escape miRNA-mediated silencing.

PTENP1 has been reported to be deleted in several different cancer types including breast, colon and melanoma; this is thought to be a key carcinogenic mechanism as reduced PTENP1 expression leads to increased miRNA-mediated silencing of PTEN expression and loss-of-function of the tumour-suppressor. This is particularly crucial with PTEN due to the strict dose-dependency between PTEN abundance and cancer susceptibility; however, similar relationships have been described for other genes e.g. KRAS and its pseudogene, KRASP1.

PTEN is a tumour suppressor gene and deleterious mutations to PTEN can lead to Cowden syndrome and increased incidence of cancer.

In gastric tumours, LOH of PTEN has been identified without disruption of the remaining allele, suggesting haploinsufficiency.

Whilst inactivation through methylation is possible, studies that have examined this have shown methylation of PTENP1 but not PTEN. Loss of PTENP1 causes a loss of PTEN expression because it is no longer present to act as a miRNA decoy; the expression of this pseudogene could therefore be important diagnostically (Emadi-Baygi et al., 2017).


Give clinically relevant examples of a pseudogene acting at the protein level.

A pseudogene can be translated into a truncated or mutated protein with novel functionality. It can also retain activity but function in a different environment.

Clinically-relevant examples

– Ha-RAS2, a pseudogene of HRAS, encodes a constitutively-active form of HRAS due to mutation of Gly12 and Glu63.

– ψBRAF, a pseudogene of BRAF, can sometimes be selectively-expressed in tumours in the absence of BRAF expression, encoding a truncated protein product which can activate the ERK signalling pathway.


Give two examples of disease associated with non-processed pseudogenes.

1. SMA (spinal muscular atrophy)
2. CAH (congenital adrenal hyperplasia)


What is CAH?

CAH (congenital adrenal hyperplasia) is a family of autosomal recessive disorders involving impaired synthesis of cortisol from cholesterol by the adrenal cortex. This results in overproduction and accumulation of cortisol precursors and excessive production of adrenal androgens from early fetal life, leading to virilisation.


What is the most common cause of CAH?

The commonest cause is 21-hydroxylase deficiency (21-OHD).

The functional gene is CYP21A2 (6p21.33), which encodes steroid 21-hydroxylase.


Which pseudogene is associated with CYP21A2?

CYP21A2 (6p21.33) is located ~30kb from a non-functional pseudogene, CYP21A1P, within the human leukocyte antigen (HLA) gene cluster. CYP21A1P is inactive due to the presence of multiple deleterious mutations.


How does the presence of a pseudogene result in CAH?

CYP21A2 and CYP21A1P share significant sequence homology (98% between exons and 95% between introns) and occur in a region of other repeated (duplicated) genes arranged in tandem, which facilitates recombination events between repeated sequences.

Such recombination events are a major cause of CYP21A2 mutations (deletions/duplications) that result in 21-OHD CAH (~20%).

The high degree of sequence similarity between CYP21A2 and CYP21A1P also facilitates gene conversion events whereby a section of CYP21A2 may be converted into the defective CYP21A1P sequence containing disease-causing mutations (~75%).
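
The sequence homology figures quoted above are percent identities between aligned gene and pseudogene sequence. As a rough illustration only, the following Python sketch computes percent identity for two sequences that are assumed to be already aligned; the function name and the toy fragments are hypothetical and are not real CYP21A2 or CYP21A1P sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same base.

    Both inputs must already be aligned to the same length. Columns with a
    gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up fragments that differ at a single position (19 of 20 bases match).
gene_fragment = "ATGCTGCTCCTGGGCCTGCT"
pseudogene_fragment = "ATGCTGCTCCTGGGCCTACT"
print(percent_identity(gene_fragment, pseudogene_fragment))  # 95.0
```

The higher this value, the fewer positions are available to tell the two loci apart, which is why recombination and gene conversion between them are favoured and why gene-specific assays are hard to design.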


Give an example of a clinically-relevant pseudogene.

UTP14c is a retrotransposed copy of the ubiquitously-expressed UTP14a gene (UTP14a is on X chromosome; UTP14c is found within a widely-expressed putative glycosyl transferase-containing gene, GT8, on chromosome 13).

A premature truncating mutation, p.Tyr738*, has been reported in the UTP14c gene in three unrelated males with infertility (not identified in 208 fertile male controls) and is thought to lead to early spermatogenic arrest. UTP14c shows a gonad-specific pattern of expression, suggesting that the heterozygous mutation either acts through haploinsufficiency or a dominant negative effect in the testis (Rohozinski et al., 2006).


What is the association between pseudogenes and cancer?

Pseudogenes have been shown to play multifaceted roles in cancer pathogenesis.

Recent publications have highlighted the potential for using pseudogene-based expression profiling for tumour subtyping. Recent data suggest this strategy may have strong prognostic value in some cancer types.

Poliseno et al. (2015) consider pseudogenes (and, more generally, long non-coding RNAs) important in the classification of cancer subtypes, moving towards personalised medicine in the future.


Why are pseudogenes an issue when performing mutation scanning by sequencing-based methods?

Co-amplification of the parental gene and pseudogene could lead to false results in Sanger sequencing. It is therefore vital to ensure amplification of the parental gene only.


Give clinically relevant examples of disorders in which pseudogenes may be an issue.

1. PMS2 - Lynch syndrome

2. NF1

3. PKD1: PKD1 shares 97.7% similarity with 6 pseudogenes and so diagnostic testing for this disease is particularly problematic.

4. VWD (von Willebrand disease)


Which strategies can be used to overcome issues of sequencing of genes with pseudogenes?

Long-range PCR to specifically amplify the parental gene, followed by nested PCR to amplify individual exons (Clendenning et al., 2006).

Primers designed specific to the parental gene only (de Vos et al., 2004).

cDNA-based methods – however, truncating mutations may be missed due to nonsense-mediated decay; also, need to consider tissue-specific expression patterns, transcript levels and alternative splicing.

Whole genome sequencing (WGS) can overcome technical difficulties apparent in other NGS amplification-based or capture-based methods. WGS is also better at detecting other types of variants such as CNVs.

New long-read sequencing methods may be particularly useful for genes such as PKD1, which have been problematic diagnostically. There is strong clinical utility in obtaining a genetic diagnosis in cases of ADPKD, particularly in younger presymptomatic patients with a strong family history of the disease.
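
Designing primers specific to the parental gene relies on positions where the gene and pseudogene differ. The sketch below is a hedged illustration in Python, not the method of the cited papers: it lists the discriminating positions in a pre-aligned gene/pseudogene pair and checks whether a forward primer's 3' terminal base sits on one of them, a common heuristic for favouring gene-specific extension. The function names and the toy sequences are hypothetical.

```python
def discriminating_positions(gene: str, pseudogene: str) -> list[int]:
    """0-based alignment columns where gene and pseudogene carry different bases."""
    if len(gene) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (g, p) in enumerate(zip(gene.upper(), pseudogene.upper()))
        if g != p and g != "-" and p != "-"
    ]


def primer_is_gene_specific(primer_start: int, primer_length: int,
                            positions: list[int]) -> bool:
    """True if a forward primer's 3' terminal base lies on a discriminating position.

    Assumption for this sketch: the primer anneals to the strand shown, so its
    3' end is the last base of the interval [primer_start, primer_start + length).
    """
    three_prime_end = primer_start + primer_length - 1
    return three_prime_end in positions


# Toy, made-up aligned sequences differing at columns 10 and 19.
gene = "ACGTACGTACGTACGTACGT"
pseudogene = "ACGTACGTACATACGTACGA"
sites = discriminating_positions(gene, pseudogene)
print(sites)                                  # [10, 19]
print(primer_is_gene_specific(1, 10, sites))  # True: 3' end falls on column 10
print(primer_is_gene_specific(0, 10, sites))  # False: 3' end falls on column 9
```

In practice primer choice also depends on melting temperature, secondary structure and known polymorphisms, none of which this sketch models.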


What considerations should be made with regard to pseudogenes when designing a NGS assay?

Pseudogenes are also an important consideration in next generation sequencing pipelines: how can the assay enrich for parental genes only and not highly-similar pseudogenes? It is difficult to distinguish between parental gene sequence and pseudogene sequence with shorter read lengths (this may become easier with increasing read length). Transcribed pseudogenes will also be detected by RNA-seq-based methods.
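
The effect of read length can be illustrated with a simple model. The Python sketch below assumes that a read can only be assigned to the parental gene if it covers at least one position that differs from the pseudogene, and it counts the fraction of possible read placements that do so. The function name, region length and site positions are hypothetical, and real aligners use more information than this (e.g. paired reads and mapping quality).

```python
def fraction_of_informative_reads(region_length: int, positions: list[int],
                                  read_length: int) -> float:
    """Fraction of possible read start sites whose read covers >= 1 discriminating position.

    positions are 0-based coordinates of gene/pseudogene differences within a
    region of region_length bases; a read starting at s covers [s, s + read_length).
    """
    if read_length <= 0 or read_length > region_length:
        raise ValueError("read_length must be between 1 and region_length")
    starts = range(region_length - read_length + 1)
    informative = sum(
        any(start <= pos < start + read_length for pos in positions)
        for start in starts
    )
    return informative / len(starts)


# Hypothetical 1,000 bp region with only two discriminating positions.
sites = [100, 600]
for length in (100, 250, 500):
    print(length, fraction_of_informative_reads(1000, sites, length))
```

In this toy region only about a fifth of 100 bp placements cover a discriminating position, whereas every 500 bp placement does, which is the intuition behind longer reads being easier to assign.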

When devising new gene panels, reference to pseudogene databases such as dreamBase should be considered.

As NGS panels are expanded to include more genes, new pseudogenes are identified that can be problematic for diagnostic testing. SMAD4 has been associated with juvenile polyposis syndrome; however, it has a pseudogene that has been determined to be present in ~1 in 400 individuals. Although this is a processed pseudogene, little is known about whether it is transcribed or whether, in these individuals, it disrupts the function of the gene into which it has been inserted. Such rare events are likely to be problematic for diagnostic testing of expanded gene panels.