About PolyMarker
PolyMarker is an automated bioinformatics pipeline for SNP assay development that increases the probability of generating homoeologue-specific assays for polyploid species. PolyMarker generates a multiple alignment between the target SNP sequence and the selected reference genome (chosen from the drop-down menu in green below). It then generates a mask in which the informative polymorphic positions between homoeologues are highlighted with respect to the target genome.
These positions include (see figure for example):
- Varietal polymorphism: this is the SNP that is targeted in the assay (&)
- Genome specific: this is a homoeologous polymorphism which is only present in the target genome (upper case)
- Genome semi-specific: this is a homoeologous polymorphism which is found in 2 of the 3 genomes, hence it discriminates against one of the off-target genomes (lower case)
- Homoeologous: the target varietal SNP is also a homoeologous polymorphism between genomes (e.g. the A, B and D genomes in the wheat reference Chinese Spring)
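The mask notation above can be illustrated with a small Python sketch. It is a toy classification of already-aligned, gap-free sequences, not PolyMarker's own implementation: the function names are hypothetical, the use of `-` for positions without a polymorphism is an assumption, and the homoeologous case for the varietal SNP is not handled.

```python
def mask_column(target_base, other_bases):
    """Classify one alignment column relative to the target genome."""
    target = target_base.upper()
    matches = sum(1 for base in other_bases if base.upper() == target)
    if matches == len(other_bases):
        return "-"             # assumption: '-' marks a non-polymorphic position
    if matches == 0:
        return target          # genome specific: upper case
    return target.lower()      # genome semi-specific: lower case


def build_mask(target_seq, other_seqs, snp_index):
    """Build a mask string; snp_index is the position of the varietal SNP."""
    if any(len(seq) != len(target_seq) for seq in other_seqs):
        raise ValueError("all aligned sequences must have the same length")
    mask = []
    for i, base in enumerate(target_seq):
        if i == snp_index:
            mask.append("&")   # varietal polymorphism targeted by the assay
        else:
            mask.append(mask_column(base, [seq[i] for seq in other_seqs]))
    return "".join(mask)
```

For a target sequence `ACGTA` aligned against `ACCTA` and `ATCTA`, with the varietal SNP at the last position, the second column is semi-specific and the third is genome specific.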
PolyMarker will generate KASP assays, which are based on a three-primer system. Two diagnostic primers incorporate the alternative varietal SNP at the 3' end, but are otherwise identical (black boxed primers in figure). The third, common primer is preferentially selected to incorporate a genome-specific base at the 3' end (red boxed primer in figure), or a semi-specific base in the absence of an adequate genome-specific position.
The code of the PolyMarker pipeline is available on GitHub.
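To make the relationship between the two diagnostic primers concrete, here is a minimal Python sketch. It only shows that the two primers share the same upstream bases and differ at the 3' end. The function name and the fixed `length` of 20 are hypothetical; real primer selection involves strand choice and other criteria that are ignored here.

```python
import re

# Assumption: the SNP is written as [X/Y] with plain A/C/G/T alleles.
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def diagnostic_primers(sequence, length=20):
    """Return two forward-strand primers that differ only at the 3' base."""
    match = SNP_PATTERN.search(sequence)
    if match is None:
        raise ValueError("no [X/Y] SNP marker found in sequence")
    upstream = sequence[:match.start()]
    # Shared part of both primers: the (length - 1) bases before the SNP.
    start = max(0, len(upstream) - (length - 1))
    stem = upstream[start:]
    # Each primer ends in one of the two alternative alleles.
    return tuple(stem + allele.upper() for allele in match.groups())
```

Calling `diagnostic_primers("aaccggtt[C/T]ggaa", length=5)` gives two primers that share the stem `ggtt` and end in `C` and `T` respectively.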
Using PolyMarker
- The input file must be uploaded as a CSV file (can be exported from Excel) with the following columns:
- Gene id: A unique identifier for the assay. It must not be repeated within a run
- Target chromosome: This will depend on the Reference sequence being used. For wheat use 1A, 2D, 7B, etc. Note that for other species you can find the exact chromosome nomenclature by generating an example in the home page (press the orange “Example” button once the Reference is selected).
- Sequence: The sequence flanking the SNP. The SNP must be marked in the format [A/T] for a varietal SNP with alternative bases, A or T.
- PolyMarker takes ~1 minute per marker assuming an input sequence of 200 bp (with the varietal SNP in the middle). Longer sequences can be used, but this will slow down the initial BLAST against the wheat survey sequence. We have not seen improvement in performance with longer sequences; therefore we recommend 200 bp of input sequence. The final multiple alignment for the primer design only considers 100 bp on either side of the target varietal SNP.
- BLAST is used to search for the contigs that align to the SNP. By default, the minimum identity used to match across the genomes is 90% and the model used is est2genome.
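The input requirements above can be checked before uploading with a short Python sketch. This is not part of PolyMarker: the function name and the returned field names are hypothetical, and restricting alleles to A, C, G and T is an assumption.

```python
import csv
import io
import re

# Assumption: alleles are single A/C/G/T bases written as [X/Y].
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def parse_markers(csv_text):
    """Validate three-column marker rows and locate the SNP in each one."""
    markers = []
    seen_ids = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        gene_id, chromosome, sequence = (field.strip() for field in row)
        if gene_id in seen_ids:
            raise ValueError(f"duplicate gene id: {gene_id}")
        seen_ids.add(gene_id)
        matches = list(SNP_PATTERN.finditer(sequence))
        if len(matches) != 1:
            raise ValueError(f"{gene_id}: expected exactly one [X/Y] SNP")
        snp = matches[0]
        markers.append({
            "gene_id": gene_id,
            "chromosome": chromosome,
            "alleles": (snp.group(1).upper(), snp.group(2).upper()),
            "left_flank": snp.start(),                 # bases before the SNP
            "right_flank": len(sequence) - snp.end(),  # bases after the SNP
        })
    return markers
```

The flank lengths make it easy to confirm that the SNP sits near the middle of a roughly 200 bp sequence, as recommended above.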
Example
Input file
The example input file contains three markers to design.
1DS_1905169_Cadenza0423_2404_C2404T,1D,ccgccgtcgtatggagcaggccggccaattccttcaaggagtcaaccacctggcgcaaggaccatgaggtccatgctcacgaggtctctttcgttgacgg[C/T]aaaaacaagacggcgccaggctttgagttgctcccggctgtggtggatcaccaaggcaacccgcagccgaccttggtggggatccacgttggccatcccaa
1DS_40060_Cadenza0423_2998_G2998A,1D,ccagcagcgcccgtcccccttctcccccgaatccgccggagcccagcggacgccggccatgagcacctccgagtagtaagtccccggcgccgccgccgcc[G/A]ccgatctttctttctttctcgcttgatttgtctgcgtttcttttgttccgggtgattgattgatgtgcgtgggctgctgcagcgactacctcttcaagctg
1DS_1847781_Cadenza0423_2703_G2703A,1D,tttcctctcaaatgtagcttctgcagattcggtggaagggcattcaaccggagaacctcattctcatcacttgcggtcacctctaggtaggacaaaaact[G/A]catctgaataagagactcacagaggcgttcacagtagattctcttcacattcaataacctcaggcttctcatttgcctcagctctcccagttgtctaacag
The input text box also accepts TAB-separated tables, so you can paste the three columns directly from Excel.
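If the same tab-separated rows are needed as a CSV file for upload, the conversion can be sketched in Python as follows. The function name is hypothetical and the sketch assumes each pasted row already holds the three expected columns.

```python
import csv
import io


def tsv_to_csv(text):
    """Convert tab-separated rows (e.g. pasted from Excel) to CSV text."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if row:  # skip blank lines
            writer.writerow([field.strip() for field in row])
    return out.getvalue()
```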
Output: mask
The mask contains the details of the local alignment.
REST API
PolyMarker jobs can be submitted via a REST API. To do this, submit a POST request to the URL 'http://www.polymarker.info/snp_files.json'
The body of the request must have the following structure:
{
"snp_file":
{
"reference":"RefSeq v1.0",
"email":""
},
"polymarker_manual_input":
{
"post":"1DS_1905169_Cadenza0423_2404_C2404T,1D,ccgccgtcgtatggagcaggccggccaattccttcaaggagtcaaccacctggcgcaaggaccatgaggtccatgctcacgaggtctctttcgttgacgg[C/T]aaaaacaagacggcgccaggctttgagttgctcccggctgtggtggatcaccaaggcaacccgcagccgaccttggtggggatccacgttggccatcccaa\n1DS_40060_Cadenza0423_2998_G2998A,1D,ccagcagcgcccgtcccccttctcccccgaatccgccggagcccagcggacgccggccatgagcacctccgagtagtaagtccccggcgccgccgccgcc[G/A]ccgatctttctttctttctcgcttgatttgtctgcgtttcttttgttccgggtgattgattgatgtgcgtgggctgctgcagcgactacctcttcaagctg\n1DS_1847781_Cadenza0423_2703_G2703A,1D,tttcctctcaaatgtagcttctgcagattcggtggaagggcattcaaccggagaacctcattctcatcacttgcggtcacctctaggtaggacaaaaact[G/A]catctgaataagagactcacagaggcgttcacagtagattctcttcacattcaataacctcaggcttctcatttgcctcagctctcccagttgtctaacag"
}
}
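A request body of this shape can be built and sent from Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and sending the body with a `Content-Type: application/json` header is an assumption, since the header is not stated here.

```python
import json
import urllib.request

API_URL = "http://www.polymarker.info/snp_files.json"


def build_payload(markers, reference, email=""):
    """Build the request body from (gene_id, chromosome, sequence) tuples."""
    lines = [",".join(marker) for marker in markers]
    return {
        "snp_file": {"reference": reference, "email": email},
        # Markers are joined with newlines, one CSV row per marker.
        "polymarker_manual_input": {"post": "\n".join(lines)},
    }


def submit(payload, url=API_URL, timeout=30):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A job would then be submitted with `submit(build_payload(markers, "RefSeq v1.0"))`, where `markers` is a list of three-element tuples in the same column order as the input file.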
The response will contain the ID of the request (XXXXXXXXXXXXXXXXXXXX in the example) and the URL linking to the results, as follows:
{
"id":"XXXXXXXXXXXXXXXXXXXX",
"url":"http://www.polymarker.info/snp_files/XXXXXXXXXXXXXXXXXXXX",
"path":"/snp_files/XXXXXXXXXXXXXXXXXXXX"
}
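Reading the job ID and results URL out of such a response could look like the following Python sketch. The function name is hypothetical, and the check that the URL ends with the returned path is an extra sanity check based only on the example above.

```python
import json


def result_location(response_text):
    """Return (id, url) from a job submission response."""
    data = json.loads(response_text)
    missing = [key for key in ("id", "url", "path") if key not in data]
    if missing:
        raise KeyError(f"response is missing fields: {missing}")
    if not data["url"].endswith(data["path"]):
        raise ValueError("url and path in the response do not agree")
    return data["id"], data["url"]
```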
The valid reference values for this instance are:
'Glycine max var. Williams 82'
'Hordeum vulgare (barley)'
'Tetraploid wheat, based on Chinese Spring RefSeq v1.0'
'Brassica napus cv Darmor-bzh v4.1'
'Brassica oleracea kale-like type TO1000DH'
'Brassica rapa ssp. pekinensis line Chiifu 401-42'
'Rye (Secale cereale L.) inbred line Lo7 v2'
'Durum wheat genome (cv. Svevo)'
'Triticum urartu (Tu2.0)'
'Wheat cv Chinese Spring RefSeq v1.0'
'Triticum aestivum (cv. Paragon)'
'Triticum aestivum (cv. Cadenza)'
'Triticum aestivum (cv. Robigus)'
'Triticum aestivum (cv. Claire)'
'Triticum turgidum (cv. Kronos)'
'Wheat cv Chinese Spring RefSeq v1.0 (Full sequence)'
'Triticum aestivum, Fielder, 201216'
'Triticum turgidum, Kronos, v1.1'
'Triticum aestivum, Chinese Spring, IWGSC v2.1'