Filtering of repeat regions

The human genome contains many repeat regions that make primer design difficult, a well-known challenge in polymerase chain reaction (PCR) design. Ion AmpliSeq™ Designer is built to deliver the most robust set of amplicons it can generate. To give the best possible amplicon coverage in an actual reaction, the software specifically excludes amplicons that are placed in repeat elements or other hypervariable regions.

The Thermo Fisher Scientific Research & Development department is working to better understand the properties of repeat regions. The goal is to allow primer placement in these regions, which would raise the target design rate while maintaining coverage uniformity and on-target rates.

The Ion AmpliSeq™ Designer pipeline uses the RepeatMasker program as its biological filter for evaluating repeat elements. RepeatMasker screens DNA sequences for interspersed repeats and low-complexity DNA sequences. Its output is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked (default: replaced by Ns). RepeatMasker annotations are also available as an annotation track in the UCSC Genome Browser. Ion AmpliSeq™ Designer links directly to the browser and displays three BED files as custom annotation tracks so that users can compare them visually:

  • The BED file for the design that was submitted (the data appears under the blue "InputTargets" label in the UCSC Genome Browser)

  • The resulting BED file for the design that is generated by the application (the data appears under the green "CoveredBases" label in the UCSC Genome Browser)

  • The difference between these two BED files (the data appears under the red "MissedBases" label in the UCSC Genome Browser)
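
The relationship between the three tracks can be illustrated with a short Python sketch that subtracts covered intervals from submitted target intervals. This is an illustration only, not the Designer's actual implementation, which is not described here. The function names (parse_bed, merge, subtract, missed_bases) and the sample coordinates are hypothetical. The sketch assumes standard BED coordinates (0-based start, end not included) and reads only the first three columns.

```python
from collections import defaultdict


def parse_bed(text):
    """Parse BED text into {chrom: [(start, end), ...]}.

    Blank lines and track/browser/comment lines are skipped.
    """
    regions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(targets, covered):
    """Return the parts of the target intervals that no covered interval spans."""
    covered = merge(covered)
    missed = []
    for start, end in merge(targets):
        pos = start  # first target base not yet accounted for
        for c_start, c_end in covered:
            if c_end <= pos:
                continue  # covered interval lies entirely before pos
            if c_start >= end:
                break  # covered interval lies entirely after this target
            if c_start > pos:
                missed.append((pos, c_start))  # gap before this covered interval
            pos = max(pos, c_end)
            if pos >= end:
                break
        if pos < end:
            missed.append((pos, end))  # uncovered tail of the target
    return missed


def missed_bases(input_bed, covered_bed):
    """Per-chromosome difference: submitted targets minus covered bases."""
    targets = parse_bed(input_bed)
    covered = parse_bed(covered_bed)
    return {
        chrom: subtract(intervals, covered.get(chrom, []))
        for chrom, intervals in targets.items()
    }


# Hypothetical example data, not taken from a real design.
input_targets = "chr1\t100\t200\nchr1\t300\t400\nchr2\t50\t80"
covered_bases = "chr1\t120\t180\nchr1\t290\t350"
result = missed_bases(input_targets, covered_bases)
```

In this example, result lists the target bases that have no amplicon coverage, which is the kind of information the "MissedBases" track presents. Intervals are merged before subtraction so that overlapping entries in either file are not counted twice.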