PolyMarker is an automated bioinformatics pipeline for SNP assay development that increases the probability of generating homoeologue-specific assays for polyploid species. PolyMarker generates a multiple alignment between the target SNP sequence and the selected reference genome (chosen from the drop-down menu). It then generates a mask that highlights the informative polymorphic positions between homoeologues with respect to the target genome.
These positions include (see figure for example):
- Varietal polymorphism: this is the SNP that is targeted in the assay (&)
- Genome specific: this is a homoeologous polymorphism which is only present in the target genome (upper case)
- Genome semi-specific: this is a homoeologous polymorphism which is found in 2 of the 3 genomes, hence it discriminates against one of the off-target genomes (lowercase)
- Homoeologous: the targeted varietal SNP is also a homoeologous polymorphism between genomes (e.g. between the A, B and D genomes of the wheat reference Chinese Spring)
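To make the mask convention above concrete, here is a minimal sketch of how one column of a gap-free alignment could be turned into a mask character. It is not PolyMarker's code: the function names `mask_symbol` and `build_mask` are hypothetical, the `-` symbol for non-informative positions is an assumed placeholder, and "semi-specific" is generalised as "differs from some, but not all, off-target homoeologues" (for three genomes this means differing from exactly one of the two off-target genomes).

```python
def mask_symbol(target_base: str, other_bases: list[str], is_target_snp: bool = False) -> str:
    """Return a mask character for one aligned position (hypothetical helper).

    '&'        -> the varietal SNP targeted by the assay
    UPPERCASE  -> target base differs from every off-target homoeologue (genome specific)
    lowercase  -> target base differs from some, but not all, off-targets (semi-specific)
    '-'        -> not informative (assumed placeholder symbol)
    """
    if is_target_snp:
        return "&"
    target = target_base.upper()
    differs = [base.upper() != target for base in other_bases]
    if differs and all(differs):
        return target
    if any(differs):
        return target.lower()
    return "-"


def build_mask(target_seq: str, homoeologues: list[str], snp_index: int) -> str:
    """Build a mask for equal-length, gap-free aligned sequences (assumption)."""
    mask = []
    for i, base in enumerate(target_seq):
        others = [seq[i] for seq in homoeologues]
        mask.append(mask_symbol(base, others, i == snp_index))
    return "".join(mask)


# Target genome vs. two off-target homoeologues, varietal SNP at position 0.
print(build_mask("ACGTA", ["ACCTT", "ACGTT"], snp_index=0))  # &-g-A
```

Position 2 (`g`) discriminates against only one off-target genome, while position 4 (`A`) differs from both, so it is the better anchor for a genome-specific primer.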
PolyMarker generates KASP assays, which are based on a three-primer system. The two diagnostic primers each carry one of the alternative varietal SNP alleles at the 3' end but are otherwise similar (black boxed primers in figure). The third, common primer is preferentially selected to place a genome-specific base at its 3' end (red boxed primer in figure), or a semi-specific base in the absence of an adequate genome-specific position.
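The three-primer layout described above can be sketched as follows. This is an illustration only, not PolyMarker's primer design: the fixed primer `length` and the `min_gap` distance between the SNP and the common primer are hypothetical parameters, and a real design would also weigh melting temperature, amplicon size and secondary structure. The common primer search reads a mask that follows the convention listed earlier (uppercase for genome-specific, lowercase for semi-specific) and prefers uppercase positions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def diagnostic_primers(seq: str, snp_index: int, alleles, length: int = 20) -> list[str]:
    """Two forward primers ending on the SNP, identical except for the 3' base."""
    start = snp_index - (length - 1)
    if start < 0:
        raise ValueError("not enough sequence upstream of the SNP")
    stem = seq[start:snp_index]
    return [stem + allele for allele in alleles]


def common_primer(seq: str, mask: str, snp_index: int, length: int = 20, min_gap: int = 30):
    """Reverse primer whose 3' end sits on a downstream informative position.

    Genome-specific (uppercase) positions are tried first, then semi-specific
    (lowercase) ones, scanning outwards from the SNP. `mask` must be the same
    length as `seq`. Returns (anchor_index, primer) or None.
    """
    candidates = range(snp_index + min_gap, len(seq) - length + 1)
    for wanted in (str.isupper, str.islower):
        for anchor in candidates:
            if wanted(mask[anchor]):
                # The reverse primer binds downstream; its 3' base is the
                # complement of seq[anchor], the leftmost base of its footprint.
                return anchor, reverse_complement(seq[anchor:anchor + length])
    return None


seq = "ACGTACGTAC"
print(diagnostic_primers(seq, 3, ("T", "G"), length=4))           # ['ACGT', 'ACGG']
print(common_primer(seq, "---&-Ag---", 3, length=4, min_gap=2))   # (5, 'TACG')
```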
The code of the PolyMarker pipeline is available on GitHub.
- The input file must be uploaded as a CSV file (which can be exported from Excel) with the following columns:
  - Gene id: A unique identifier for the assay. It must be unique within each run.
  - Target chromosome: The nomenclature depends on the reference sequence being used. For wheat, use 1A, 2D, 7B, etc. For other species, you can find the exact chromosome nomenclature by generating an example on the home page (press the orange “Example” button once the reference is selected).
  - Sequence: The sequence flanking the SNP. The SNP must be marked in square brackets, e.g. [A/T] for a varietal SNP with the alternative bases A and T.
- PolyMarker takes ~1 minute per marker, assuming an input sequence of 200 bp with the varietal SNP in the middle. Longer sequences can be used, but they slow down the initial BLAST search against the wheat survey sequence. We have not seen an improvement in performance with longer sequences, so we recommend 200 bp of input sequence. The final multiple alignment used for primer design only considers 100 bp on either side of the target varietal SNP.
- BLAST is used to search for the contigs that align to the SNP sequence. By default, the minimum identity used to match across the genomes is 90%, and the alignment model used is est2genome.
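Before uploading, it can help to check the three input columns locally. The sketch below is a hypothetical pre-flight check, not part of PolyMarker: `validate_rows` is an illustrative name, it assumes the rows have already been read without a header line (for example with Python's `csv.reader`), and it only accepts A/C/G/T alleles inside the `[X/Y]` marker.

```python
import re

# Exactly one varietal SNP written as [X/Y], e.g. [A/T] (A/C/G/T only: an assumption).
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def validate_rows(rows) -> list[str]:
    """Check (gene id, target chromosome, sequence) rows and return error messages."""
    errors = []
    seen = set()
    for n, row in enumerate(rows, start=1):
        if len(row) != 3:
            errors.append(f"row {n}: expected 3 columns, got {len(row)}")
            continue
        gene_id, chromosome, sequence = (field.strip() for field in row)
        if gene_id in seen:
            errors.append(f"row {n}: duplicate gene id {gene_id!r}")
        seen.add(gene_id)
        if not chromosome:
            errors.append(f"row {n}: missing target chromosome")
        snps = SNP_PATTERN.findall(sequence)
        if len(snps) != 1:
            errors.append(f"row {n}: expected exactly one [X/Y] SNP, found {len(snps)}")
    return errors


rows = [["m1", "1A", "ACG[A/T]TTG"], ["m1", "2D", "ACGTT"]]
for message in validate_rows(rows):
    print(message)
```

Checking the chromosome name against the reference's own nomenclature is left out, since that list depends on the selected reference.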
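Because only 100 bp on either side of the varietal SNP are used for primer design, longer inputs can be trimmed to the recommended ~200 bp before submission. This is a hypothetical preprocessing helper (`crop_around_snp` is an illustrative name). It is not documented as something PolyMarker does itself, and it keeps the bracketed SNP marker intact.

```python
import re

SNP_MARKER = re.compile(r"\[[ACGT]/[ACGT]\]", re.IGNORECASE)


def crop_around_snp(sequence: str, flank: int = 100) -> str:
    """Keep at most `flank` bases on each side of the [X/Y] SNP marker."""
    match = SNP_MARKER.search(sequence)
    if match is None:
        raise ValueError("no [X/Y] SNP marker found")
    left = sequence[max(0, match.start() - flank):match.start()]
    right = sequence[match.end():match.end() + flank]
    return left + match.group(0) + right


long_input = "A" * 150 + "[C/T]" + "G" * 150
print(len(crop_around_snp(long_input)))  # 205: 100 + 5-character marker + 100
```

Sequences that are already shorter than the flank on either side are returned unchanged on that side.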
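The 90% identity cut-off can be illustrated on standard BLAST tabular output (`-outfmt 6`), whose third column is the percent identity. This sketch only shows the threshold idea: `filter_hits` is a hypothetical name, PolyMarker's internal filtering may apply further criteria, and the est2genome alignment step mentioned above is not reproduced here.

```python
def filter_hits(lines, min_identity: float = 90.0):
    """Keep BLAST -outfmt 6 hits whose percent identity (column 3) meets the threshold."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            hits.append((query, subject, pident))
    return hits


blast_lines = [
    "m1\tctg1\t95.50\t200\t9\t0\t1\t200\t5\t204\t1e-90\t330",
    "m1\tctg2\t85.00\t200\t30\t0\t1\t200\t7\t206\t1e-50\t250",
]
print(filter_hits(blast_lines))  # [('m1', 'ctg1', 95.5)]
```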
The example input file contains three markers to design.
The input text box also accepts tab-separated tables, so you can paste the three columns directly from Excel.
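Text pasted from Excel arrives tab-separated, while an exported file is comma-separated. A minimal sketch of reading either form is shown below. The helper name `parse_pasted_table` and the "use TAB if any is present, otherwise comma" rule are assumptions for illustration, not a description of how the web form parses its input.

```python
import csv
import io


def parse_pasted_table(text: str) -> list[list[str]]:
    """Split pasted input into rows, using TAB if present, otherwise comma."""
    delimiter = "\t" if "\t" in text else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[field.strip() for field in row] for row in reader if row]


pasted = "m1\t1A\tAC[A/T]GT\nm2\t2D\tGG[C/T]AA\n"
print(parse_pasted_table(pasted))
# [['m1', '1A', 'AC[A/T]GT'], ['m2', '2D', 'GG[C/T]AA']]
```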
The mask contains the details of the local alignment.