PrimerFinder
What does it do?
PrimerFinder performs in silico PCR analyses on FASTA- and FASTQ-formatted files.
There are two different modules:
- PrimerFinder Legacy computes primer binding and amplicon statistics on FASTA-formatted files using the now-retired ePCR suite of tools from NCBI.
- PrimerFinder Supremacy processes both FASTA- and FASTQ-formatted files. For FASTQ files, BBMap is used to bait out reads containing primer sequences, and a second round of baiting is performed on the previously baited reads. SPAdes then assembles contigs from the baited reads only.
For both FASTA-formatted files and contigs assembled from FASTQ-formatted files, the primers are BLASTed against the contigs, and the BLAST output is parsed to determine primer binding and amplicon statistics.
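To make the idea of "baiting" concrete, here is a toy Python sketch that keeps only the reads sharing at least one k-mer with a primer or its reverse complement. It is a simplified stand-in for illustration only: it does not call BBMap, and the function names and the k-mer size of 15 are hypothetical choices, not PrimerFinder's settings.

```python
# Toy k-mer baiting: keep reads that share at least one k-mer with a primer
# or with its reverse complement. Illustrative only; this is not BBMap.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(reads: list, primers: list, k: int = 15) -> list:
    """Return the reads containing any primer k-mer on either strand.

    k=15 is a hypothetical value chosen for this example.
    """
    bait = set()
    for primer in primers:
        bait |= kmers(primer.upper(), k)
        bait |= kmers(reverse_complement(primer), k)
    return [read for read in reads if kmers(read.upper(), k) & bait]


if __name__ == "__main__":
    primers = ["TGCCCAAAGCAGAGAGATTC"]  # example primer sequence
    reads = [
        "GGGTGCCCAAAGCAGAGAGATTCAAA",  # contains the primer
        "ACGTACGTACGTACGTACGTACGT",    # unrelated sequence
    ]
    print(bait_reads(reads, primers))  # ['GGGTGCCCAAAGCAGAGAGATTCAAA']
```

Checking both strands matters because a read may come from either strand of the target, so the primer can appear in it as written or as its reverse complement.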
How do I use it?
Subject
In the Subject field, put primer_finder. Spelling counts, but case does not.
Description
In the Description field, you must provide:
Required Components
- program=requested_program
  - acceptable programs are legacy and supremacy
- analysis=requested_analysis
  - acceptable analyses are custom and vtyper
  - custom analyses require a FASTA-formatted file of primers you wish to use (see the Attachments section for additional details)
  - vtyper analyses will use the primer set from the vtyper tool
- a list of SEQIDs (one per line)
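The required components are plain lines of text. The following Python sketch shows one hypothetical way such a Description could be split into settings and SEQIDs. The parsing rules (lines containing "=" are settings, a bare exportamplicons line is a flag, anything else is a SEQID) and the function name parse_description are assumptions for illustration, not PrimerFinder's actual parser.

```python
# Hypothetical parser for the Description field. Assumed rules: lines with
# "=" are key=value settings, a bare "exportamplicons" line is a flag, and
# every other non-empty line is a SEQID.

PROGRAMS = {"legacy", "supremacy"}
ANALYSES = {"custom", "vtyper"}


def parse_description(text: str):
    """Split a Description into (settings dict, list of SEQIDs)."""
    settings = {}
    seqids = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip()
        elif line.lower() == "exportamplicons":
            settings["exportamplicons"] = "true"
        else:
            seqids.append(line)

    # Check the required components.
    if settings.get("program", "").lower() not in PROGRAMS:
        raise ValueError("program must be legacy or supremacy")
    if settings.get("analysis", "").lower() not in ANALYSES:
        raise ValueError("analysis must be custom or vtyper")
    if not seqids:
        raise ValueError("at least one SEQID is required")
    return settings, seqids


if __name__ == "__main__":
    description = "program=supremacy\nanalysis=custom\n2014-SEQ-1390\n"
    settings, seqids = parse_description(description)
    print(settings)  # {'program': 'supremacy', 'analysis': 'custom'}
    print(seqids)    # ['2014-SEQ-1390']
```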
Optional Components
To customise your PrimerFinder analyses, the following optional settings can be modified:
- Number of mismatches allowed
  - default is 2
  - options are 0, 1, 2, 3. Anything else will return an error
  - modify as follows: mismatches=1
- Format of sequence files to use (note that PrimerFinder Legacy will still use FASTA-formatted files even if you try to specify FASTQ)
  - default is fasta
  - options are fasta and fastq
  - modify as follows: format=fastq
- K-mer size to use for SPAdes assembly
  - default is 55,77,99,127
  - either provide an integer, e.g. 33, or a comma-separated list, e.g. 21,33,55
  - modify as follows: kmersize=33 OR kmersize=21,33,55
- Export amplicons in PrimerFinder Legacy (amplicons are created by default by PrimerFinder Supremacy)
  - default is False
  - modify as follows: exportamplicons
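A sketch of how the optional settings and their defaults might be checked, using the defaults and allowed values listed above. The function validate_optional and its input (a dict of raw string values keyed by setting name) are hypothetical; the pipeline's own validation may differ.

```python
# Hypothetical check of the optional settings. Defaults and allowed values
# are the ones listed on this page; the function itself is illustrative.

DEFAULTS = {
    "mismatches": 2,
    "format": "fasta",
    "kmersize": [55, 77, 99, 127],
    "exportamplicons": False,
}


def validate_optional(settings: dict, program: str) -> dict:
    """Merge raw string settings with the defaults, rejecting bad values."""
    result = dict(DEFAULTS)

    if "mismatches" in settings:
        value = settings["mismatches"].strip()
        if value not in {"0", "1", "2", "3"}:
            raise ValueError("mismatches must be 0, 1, 2 or 3")
        result["mismatches"] = int(value)

    if "format" in settings:
        fmt = settings["format"].strip().lower()
        if fmt not in {"fasta", "fastq"}:
            raise ValueError("format must be fasta or fastq")
        result["format"] = fmt

    if "kmersize" in settings:
        parts = [p.strip() for p in settings["kmersize"].split(",")]
        if not all(p.isdigit() for p in parts):
            raise ValueError("kmersize must be an integer or comma-separated integers")
        result["kmersize"] = [int(p) for p in parts]

    if "exportamplicons" in settings:
        result["exportamplicons"] = True

    # PrimerFinder Legacy always works on FASTA files, even if FASTQ is requested.
    if program.lower() == "legacy":
        result["format"] = "fasta"

    return result


if __name__ == "__main__":
    # {'mismatches': 1, 'format': 'fasta', 'kmersize': [21, 33, 55], 'exportamplicons': False}
    print(validate_optional({"mismatches": "1", "kmersize": "21,33,55"}, "supremacy"))
```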
Attachments
For custom analyses, you must attach a FASTA-formatted file containing the primer set(s) you want analysed.
The file must have the following format:
```
>gene1-F
seq
>gene1-R
seq
>gene2-F1
seq
>gene2-R1
seq
>gene2-F2
seq
>gene2-R2
seq
.....
```
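One hypothetical way to read a primer file in this format and pair each forward primer with its reverse partner is sketched below. It assumes headers end in -F or -R, optionally followed by a set number, and that a pair shares the same gene name and set number; the helper names and the short placeholder sequences are illustrative only.

```python
import re
from collections import defaultdict

# Assumed header convention: <gene>-F or <gene>-R, optionally followed by a
# set number (e.g. gene2-F1 pairs with gene2-R1).
HEADER = re.compile(r"^(?P<gene>.+)-(?P<direction>[FR])(?P<number>\d*)$")


def read_fasta(text: str) -> dict:
    """Return {header: sequence} for a small FASTA string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def pair_primers(records: dict) -> dict:
    """Group primers into {(gene, number): {"F": seq, "R": seq}}."""
    pairs = defaultdict(dict)
    for name, seq in records.items():
        match = HEADER.match(name)
        if match is None:
            raise ValueError(f"unexpected primer header: {name!r}")
        pairs[(match["gene"], match["number"])][match["direction"]] = seq
    incomplete = [key for key, pair in pairs.items() if set(pair) != {"F", "R"}]
    if incomplete:
        raise ValueError(f"primer sets missing F or R: {incomplete}")
    return dict(pairs)


if __name__ == "__main__":
    # Placeholder sequences, not real primers.
    example = ">gene1-F\nACGTTG\n>gene1-R\nTTGACC\n>gene2-F1\nGGCCAA\n>gene2-R1\nAATTGG\n"
    for key, pair in pair_primers(read_fasta(example)).items():
        print(key, pair)
```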
You may use IUPAC degenerate bases in this file. Please don't add too many, as each degenerate base multiplies the number of primer combinations (a single N already gives four variants), so the count grows quickly.
```python
# Dictionary of degenerate IUPAC codes
iupac = {
    'R': ['A', 'G'],
    'Y': ['C', 'T'],
    'S': ['C', 'G'],
    'W': ['A', 'T'],
    'K': ['G', 'T'],
    'M': ['A', 'C'],
    'B': ['C', 'G', 'T'],
    'D': ['A', 'G', 'T'],
    'H': ['A', 'C', 'T'],
    'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-']
}
```
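The growth in primer combinations can be seen by expanding a degenerate primer with `itertools.product` over the same code table. In this sketch, naming the variants `<name>_0`, `<name>_1`, ... mirrors the `_0` suffix that PrimerFinder Supremacy appends to primer names, but the exact variant ordering the tool uses is an assumption, and `expand_primer` is a hypothetical helper.

```python
from itertools import product

# Same IUPAC table as above, plus the four unambiguous bases.
iupac = {
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['C', 'G'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'], 'B': ['C', 'G', 'T'],
    'D': ['A', 'G', 'T'], 'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'], '-': ['-'],
}
PLAIN = {'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T']}


def expand_primer(name: str, seq: str) -> dict:
    """Return {name_0: variant0, name_1: variant1, ...} for every concrete variant."""
    choices = []
    for base in seq.upper():
        if base in PLAIN:
            choices.append(PLAIN[base])
        elif base in iupac:
            choices.append(iupac[base])
        else:
            raise ValueError(f"unsupported base {base!r} in {name}")
    # product() yields one tuple per combination of choices at each position.
    return {f"{name}_{i}": "".join(combo) for i, combo in enumerate(product(*choices))}


if __name__ == "__main__":
    print(expand_primer("gene1-F", "ACRY"))
    # {'gene1-F_0': 'ACAC', 'gene1-F_1': 'ACAT', 'gene1-F_2': 'ACGC', 'gene1-F_3': 'ACGT'}
```

Five N bases in one primer already produce 4**5 = 1024 variants, each of which has to be searched.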
Examples
Example PrimerFinder analyses:
- PrimerFinder Legacy, custom analysis: issue 16084
- PrimerFinder Legacy, vtyper analysis: issue 16085
- PrimerFinder Supremacy, custom analysis, FASTA format: issue 16086
- PrimerFinder Supremacy, custom analysis, FASTQ format: issue 16087
Interpreting Results
PrimerFinder Legacy
PrimerFinder Legacy will upload ePCR_report.csv, which contains the strain name, the gene name as parsed from the primer file, the location of the calculated amplicon, the size of the amplicon, the name of the contig on which the amplicon was found, the total number of mismatches between the primer set and the target sequence, and the primer set used to create the amplicon.
Here's an example created using the example primer file from the Attachments section of Redmine issue 16084:
Sample | Gene | GenomeLocation | AmpliconSize | Contig | TotalMismatches | PrimerSet |
---|---|---|---|---|---|---|
2014-SEQ-1390 | gene1 | 30506-30699 | 194 | contig_1 | 0 | gene1_0_0 |
2014-SEQ-1390 | gene2 | 476139-476423 | 285 | contig_32 | 1 | gene2_1_1 |
Primer sets are numbered according to how many primer sets share the same gene name, counting from zero: gene1 had one primer set, gene1-F and gene1-R, represented as gene1_0_0. There were two primer sets for gene2, and in this example gene2-F2 and gene2-R2 annealed (with one mismatch), represented as gene2_1_1.
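In the table above, AmpliconSize is the inclusive length of GenomeLocation (30699 - 30506 + 1 = 194). The Python sketch below reads the example report, assuming ePCR_report.csv is comma-separated with the column names shown, checks that relationship, and splits PrimerSet into its gene name and zero-based indices. The helper names are hypothetical.

```python
import csv
import io

# Example report copied from the table above; assumes a comma-separated file
# whose header row uses the same column names.
REPORT = """Sample,Gene,GenomeLocation,AmpliconSize,Contig,TotalMismatches,PrimerSet
2014-SEQ-1390,gene1,30506-30699,194,contig_1,0,gene1_0_0
2014-SEQ-1390,gene2,476139-476423,285,contig_32,1,gene2_1_1
"""


def amplicon_size(location: str) -> int:
    """Inclusive length of a 'start-end' location string."""
    start, end = (int(x) for x in location.split("-"))
    return abs(end - start) + 1


def split_primer_set(primer_set: str):
    """'gene2_1_1' -> ('gene2', 1, 1); the two numbers are zero-based indices."""
    # rsplit keeps gene names that themselves contain underscores intact.
    gene, first, second = primer_set.rsplit("_", 2)
    return gene, int(first), int(second)


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(REPORT)):
        assert amplicon_size(row["GenomeLocation"]) == int(row["AmpliconSize"])
        gene, first, second = split_primer_set(row["PrimerSet"])
        print(row["Sample"], gene, first, second, row["TotalMismatches"])
```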
If requested with exportamplicons, amplicon sequences for each match will be created. Using the same example as above, 2014-SEQ-1390_amplicons.fa will contain:
```
>2014-SEQ-1390_contig_1_476139_476423_gene1_0_0
amplicon.....
>2014-SEQ-1390_contig_32_27409_27668_gene2_1_1
amplicon.....
```
PrimerFinder Supremacy
PrimerFinder Supremacy will generate ePCR_report.csv within the consolidated_report folder. This report contains the following fields: the strain name, the gene name parsed from the primer file, the location of the amplicon, the size of the amplicon, the name of the contig on which the amplicon was found, the forward and reverse primers used to create the amplicon, and the number of mismatches for the forward and reverse primers.
Here's an example created using the example primer file from the Attachments section of Redmine issue 16084:
Sample | Gene | GenomeLocation | AmpliconSize | Contig | ForwardPrimers | ReversePrimers | ForwardMismatches | ReverseMismatches |
---|---|---|---|---|---|---|---|---|
2014-SEQ-1390 | gene1 | 30506-30699 | 194 | contig_1 | gene1-F_0 | gene1-R_0 | 0 | 0 |
2014-SEQ-1390 | gene2 | 476139-476423 | 285 | contig_32 | gene2-F2_0 | gene2-R2_0 | 1 | 0 |
Note that the primer names have _0 appended to the end. If a primer sequence contains IUPAC degenerate bases, this number is incremented for each primer generated to cover all possible combinations.
Amplicon sequences are always included. Using the same example as above, 2014-SEQ-1390_amplicons.fa will contain:
```
>2014-SEQ-1390_contig_1_gene1-F_0_gene1-R_0
amplicon
>2014-SEQ-1390_contig_32_gene2-F2_0_gene2-R2_0
amplicon
```
Raw BLAST results are also uploaded. In the example above, 2014-SEQ-1390_rawresults.csv will contain:
qseqid | sseqid | positive | mismatch | gaps | evalue | bitscore | slen | length | qstart | qend | qseq | sstart | send | sseq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
contig_1 | gene1-R_0 | 25 | 0 | 0 | 3.07E-07 | 50.1 | 25 | 25 | 476399 | 476423 | TACGGTTCCTTTGACGGTGCGATGA | 25 | 1 | TACGGTTCCTTTGACGGTGCGATGA |
contig_1 | gene1-F_0 | 25 | 0 | 0 | 3.07E-07 | 50.1 | 25 | 25 | 476139 | 476163 | GTGAAATTATCGCCACGTTCGGGCA | 1 | 25 | GTGAAATTATCGCCACGTTCGGGCA |
contig_32 | gene2-F2_0 | 20 | 1 | 0 | 4.41E-06 | 42.1 | 21 | 21 | 27648 | 27668 | CGCCTTATTATACGACCAAAG | 21 | 1 | CGCCTTATTTTACGACCAAAG |
contig_32 | gene2-R2_0 | 20 | 0 | 0 | 1.74E-05 | 40.1 | 20 | 20 | 27409 | 27428 | TGCCCAAAGCAGAGAGATTC | 1 | 20 | TGCCCAAAGCAGAGAGATTC |
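Amplicon boundaries can be recovered from raw hits like these by pairing the forward and reverse primer hits that fall on the same contig for the same primer set, then taking the outermost query coordinates. The sketch below does this for a subset of the columns above (qseqid, sseqid, qstart, qend), assuming a comma-separated file with a header row. It is a naive illustration that ignores primer orientation, maximum amplicon length, and multiple hits per primer; it is not PrimerFinder's own parsing logic, and amplicons_from_hits is a hypothetical name.

```python
import csv
import io
import re
from collections import defaultdict

# Subset of the raw BLAST columns above; assumes comma-separated with a header.
RAW = """qseqid,sseqid,qstart,qend
contig_1,gene1-R_0,476399,476423
contig_1,gene1-F_0,476139,476163
contig_32,gene2-F2_0,27648,27668
contig_32,gene2-R2_0,27409,27428
"""

# Primer names look like <gene>-F[n]_<variant> or <gene>-R[n]_<variant>.
PRIMER = re.compile(r"^(?P<gene>.+)-(?P<direction>[FR])(?P<number>\d*)_\d+$")


def amplicons_from_hits(rows):
    """Pair F and R hits per (contig, gene, set) and return outermost spans."""
    groups = defaultdict(list)
    for row in rows:
        match = PRIMER.match(row["sseqid"])
        if match is None:
            continue
        key = (row["qseqid"], match["gene"], match["number"])
        groups[key].append((match["direction"], row))

    amplicons = []
    for (contig, gene, _number), hits in groups.items():
        if {direction for direction, _ in hits} != {"F", "R"}:
            continue  # an amplicon needs both a forward and a reverse hit
        coords = [int(row[col]) for _, row in hits for col in ("qstart", "qend")]
        start, end = min(coords), max(coords)
        amplicons.append((contig, gene, start, end, end - start + 1))
    return amplicons


if __name__ == "__main__":
    for amplicon in amplicons_from_hits(csv.DictReader(io.StringIO(RAW))):
        print(amplicon)
    # ('contig_1', 'gene1', 476139, 476423, 285)
    # ('contig_32', 'gene2', 27409, 27668, 260)
```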
How long does it take?
PrimerFinder Legacy and PrimerFinder Supremacy in FASTA mode are very fast (seconds per sample), while PrimerFinder Supremacy in FASTQ mode is relatively slow (a few minutes per sample).
What can go wrong?
- Requested SEQIDs are not available
- Not including the program=requested_program component, or requesting an unsupported program
- Not including the analysis=requested_analysis component, or requesting an unsupported analysis
- Specifying an unsupported number of mismatches
- Incorrectly formatted kmersize
- Attaching an incorrectly formatted primer file, or not including a primer file for custom analyses
- ?
If anything goes wrong, an error message explaining the error should be returned.