PrimerFinder

What does it do?

The PrimerFinder performs in silico PCR analyses on FASTA and FASTQ formatted files.

There are two different modules:

- PrimerFinder Legacy
- PrimerFinder Supremacy

PrimerFinder Legacy computes primer binding and amplicon statistics on FASTA-formatted files using the now-retired ePCR suite of tools from NCBI.

PrimerFinder Supremacy can process both FASTA- and FASTQ-formatted files. If using FASTQ files, BBMap is employed to bait out reads containing primer sequences. A second round of baiting is then performed using the previously baited reads. SPAdes assembles contigs from only the baited reads.

For both FASTA-formatted files and contigs assembled from FASTQ-formatted files, primers are BLASTed against the contigs, and the outputs are parsed to determine primer binding and amplicon statistics.

How do I use it?

Subject

In the Subject field, put primer_finder. Spelling counts, but case does not.

Description

In the Description field, you must provide:

Required Components

program=requested_program
- acceptable programs are legacy and supremacy
analysis=requested_analysis
- acceptable analyses are custom and vtyper
  - custom analyses require a FASTA-formatted file of primers you wish to use (see Attachments section for additional details)
  - vtyper analyses will use the primer set from the vtyper tool
a list of SEQIDs (one per line)

Optional Components

To customise your PrimerFinder analyses, several settings can optionally be modified:

Number of mismatches allowed.
- default is 2
- options are 0, 1, 2, 3. Anything else will return an error
- modify as follows:
  - mismatches=1
Format of sequence files to use (note that PrimerFinder legacy will still use FASTA-formatted files even if you try to specify FASTQ)
- default is fasta
- options are: fasta and fastq
- modify as follows:
  - format=fastq
K-mer size to use for SPAdes assembly
- default is 55,77,99,127
- either provide an integer, e.g. 33, or a comma-separated list, e.g. 21,33,55
- modify as follows:
  - kmersize=33 OR
  - kmersize=21,33,55
Export amplicons in PrimerFinder Legacy (amplicons are created by default by PrimerFinder Supremacy)
- default is False
- modify as follows:
  - exportamplicons
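
The required and optional components above can be checked before a request is submitted. The following is only a minimal sketch of such a check, not PrimerFinder's own parser: the function name parse_description, the case-insensitive handling of keys and values, and the rule that every remaining non-empty line is a SEQID are all assumptions made for illustration. The defaults and allowed values are the ones listed above.

```python
ALLOWED_PROGRAMS = {"legacy", "supremacy"}
ALLOWED_ANALYSES = {"custom", "vtyper"}
ALLOWED_MISMATCHES = {"0", "1", "2", "3"}
ALLOWED_FORMATS = {"fasta", "fastq"}


def parse_description(text):
    """Hypothetical helper: turn Description text into a validated settings dict."""
    # Defaults taken from the Optional Components list.
    settings = {"mismatches": "2", "format": "fasta",
                "kmersize": "55,77,99,127", "exportamplicons": False}
    seqids = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if "=" in line:
            # key=value component; unknown keys are stored but never checked.
            key, value = (part.strip().lower() for part in line.split("=", 1))
            settings[key] = value
        elif line.lower() == "exportamplicons":
            settings["exportamplicons"] = True
        else:
            # Assumption: any other non-empty line is a SEQID.
            seqids.append(line)

    if settings.get("program") not in ALLOWED_PROGRAMS:
        raise ValueError("program must be legacy or supremacy")
    if settings.get("analysis") not in ALLOWED_ANALYSES:
        raise ValueError("analysis must be custom or vtyper")
    if settings["mismatches"] not in ALLOWED_MISMATCHES:
        raise ValueError("mismatches must be 0, 1, 2 or 3")
    if settings["format"] not in ALLOWED_FORMATS:
        raise ValueError("format must be fasta or fastq")
    if not all(k.isdigit() for k in settings["kmersize"].split(",")):
        raise ValueError("kmersize must be an integer or a comma-separated list")
    if not seqids:
        raise ValueError("at least one SEQID is required")

    settings["mismatches"] = int(settings["mismatches"])
    settings["seqids"] = seqids
    return settings


request = """program=supremacy
analysis=custom
format=fastq
kmersize=21,33,55
2014-SEQ-1390
"""
print(parse_description(request))
```

A description that omits program=, asks for mismatches=4, or lists no SEQIDs raises a ValueError in this sketch, mirroring the error cases listed under "What can go wrong?".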

Attachments

For custom analyses, you are required to attach a FASTA-formatted file containing the primer set(s) you wish analysed. The file must have the following format:

>gene1-F
seq
>gene1-R
seq
>gene2-F1
seq
>gene2-R1
seq
>gene2-F2
seq
>gene2-R2
seq
.....

You are allowed to use IUPAC degenerate bases in this file. Please don't add too many degenerate bases, as the number of primer combinations can increase quickly.

# Dictionary of degenerate IUPAC codes
iupac = {
    'R': ['A', 'G'],
    'Y': ['C', 'T'],
    'S': ['C', 'G'],
    'W': ['A', 'T'],
    'K': ['G', 'T'],
    'M': ['A', 'C'],
    'B': ['C', 'G', 'T'],
    'D': ['A', 'G', 'T'],
    'H': ['A', 'C', 'T'],
    'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-']
}
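
To see how quickly the number of combinations grows, a degenerate primer can be expanded with itertools.product. This is a minimal sketch, not PrimerFinder's own code: the function name expand_primer and the order in which the _0, _1, ... suffixes are assigned are assumptions, and any character that is not a degenerate code is passed through unchanged.

```python
from itertools import product

# Same degenerate codes as the dictionary above.
IUPAC = {
    'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}


def expand_primer(name, sequence):
    """Hypothetical helper: return {numbered name: concrete sequence}."""
    # Each position contributes the list of bases it can stand for.
    choices = [IUPAC.get(base, base) for base in sequence.upper()]
    return {
        f"{name}_{index}": "".join(combo)
        for index, combo in enumerate(product(*choices))
    }


expanded = expand_primer("gene1-F", "ACRTY")
for primer_name, primer_seq in expanded.items():
    print(primer_name, primer_seq)

# Four N positions alone already give 4 ** 4 = 256 concrete primers.
print(len(expand_primer("example", "NNNN")))
```

The count is the product of the number of bases each position can represent, so a forward and a reverse primer with a few degenerate bases each can multiply into thousands of primer pairs.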

Examples

Example PrimerFinder analyses:

PrimerFinder Legacy, custom analyses issue 16084

PrimerFinder Legacy, vtyper analyses issue 16085

PrimerFinder Supremacy, custom analyses, FASTA format issue 16086

PrimerFinder Supremacy, custom analyses, FASTQ format issue 16087

Interpreting Results

PrimerFinder Legacy

PrimerFinder Legacy will upload ePCR_report.csv, which contains the strain name, gene name as parsed from the primer file, location of the calculated amplicon, the size of the amplicon, the name of the contig on which the amplicon was found, the total number of mismatches between the primer set and the target sequence, and the primers used to create the amplicon.

Here's an example created using the example primer file from the Attachments section of Redmine issue 16084:

Sample	Gene	GenomeLocation	AmpliconSize	Contig	TotalMismatches	PrimerSet
2014-SEQ-1390	gene1	30506-30699	194	contig_1	0	gene1_0_0
2014-SEQ-1390	gene2	476139-476423	285	contig_32	1	gene2_1_1

Note that the primer set is numbered according to the number of primer sets with the same gene name: gene1 had one primer set, gene1-F and gene1-R, which is represented as gene1_0_0. There were two primer sets for gene2, and in this example gene2-F2 and gene2-R2 annealed (with one mismatch).

If requested, amplicon sequences for each match will be created. Using the same example as above, 2014-SEQ-1390_amplicons.fa will contain:

>2014-SEQ-1390_contig_1_476139_476423_gene1_0_0
amplicon..... 
>2014-SEQ-1390_contig_32_27409_27668_gene2_1_1
amplicon.....

PrimerFinder Supremacy

PrimerFinder Supremacy will generate ePCR_report.csv within the consolidated_report folder. This report contains the following fields: strain name, gene name parsed from the primer file, location of the amplicon, size of the amplicon, name of the contig on which the amplicon was found, the forward and reverse primers used to create the amplicon, and the number of mismatches for the forward and reverse primers.

Here's an example created using the example primer file from the Attachments section of Redmine issue 16084:

Sample	Gene	GenomeLocation	AmpliconSize	Contig	ForwardPrimers	ReversePrimers	ForwardMismatches	ReverseMismatches
2014-SEQ-1390	gene1	30506-30699	194	contig_1	gene1-F_0	gene1-R_0	0	0
2014-SEQ-1390	gene2	476139-476423	285	contig_32	gene2-F2_0	gene2-R2_0	1	0
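
The report can be post-processed with Python's csv module, for example to add up the forward and reverse mismatches for each amplicon. This is a sketch under stated assumptions: the column names are the ones shown in the example above, the delimiter is assumed to be a comma because of the .csv extension (pass delimiter="\t" if your copy is tab-separated), and read_report is a hypothetical helper name.

```python
import csv
import io


def read_report(handle, delimiter=","):
    """Hypothetical helper: load report rows and add a TotalMismatches field."""
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        row["AmpliconSize"] = int(row["AmpliconSize"])
        row["TotalMismatches"] = (
            int(row["ForwardMismatches"]) + int(row["ReverseMismatches"])
        )
        rows.append(row)
    return rows


# The two example rows from above, written out comma-separated.
example = io.StringIO(
    "Sample,Gene,GenomeLocation,AmpliconSize,Contig,ForwardPrimers,"
    "ReversePrimers,ForwardMismatches,ReverseMismatches\n"
    "2014-SEQ-1390,gene1,30506-30699,194,contig_1,gene1-F_0,gene1-R_0,0,0\n"
    "2014-SEQ-1390,gene2,476139-476423,285,contig_32,gene2-F2_0,gene2-R2_0,1,0\n"
)
rows = read_report(example)

# Genes amplified without any primer mismatch.
perfect = [row["Gene"] for row in rows if row["TotalMismatches"] == 0]
print(perfect)
```

For a real report, replace the io.StringIO object with an open file handle, e.g. open("ePCR_report.csv", newline="").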

Note that the primer names have _0 appended to the end. In the case of IUPAC bases being present in the primer sequence, this number will be incremented for each primer created to satisfy all possible combinations.

Amplicon sequences are always included. Using the same example as above, 2014-SEQ-1390_amplicons.fa will contain:

>2014-SEQ-1390_contig_1_gene1-F_0_gene1-R_0
amplicon
>2014-SEQ-1390_contig_32_gene2-F2_0_gene2-R2_0
amplicon

Raw BLAST results are also uploaded. In the example above, 2014-SEQ-1390_rawresults.csv will contain:

qseqid	sseqid	positive	mismatch	evalue	bitscore	slen	length	qstart	qend	qseq	sstart	send	sseq
contig_1	gene1-R_0	25	0	3.07E-07	50.1	25	25	476399	476423	TACGGTTCCTTTGACGGTGCGATGA	25	1	TACGGTTCCTTTGACGGTGCGATGA
contig_1	gene1-F_0	25	0	3.07E-07	50.1	25	25	476139	476163	GTGAAATTATCGCCACGTTCGGGCA	1	25	GTGAAATTATCGCCACGTTCGGGCA
contig_32	gene2-F2_0	20	1	4.41E-06	42.1	21	21	27648	27668	CGCCTTATTATACGACCAAAG	21	1	CGCCTTATTTTACGACCAAAG
contig_32	gene2-R2_0	20	0	1.74E-05	40.1	20	20	27409	27428	TGCCCAAAGCAGAGAGATTC	1	20	TGCCCAAAGCAGAGAGATTC
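
The raw hits show where each primer binds on its contig (the qstart and qend columns). One way to derive an amplicon from a forward and a reverse hit is to take the outermost coordinates of the pair. The sketch below illustrates that idea only; it is an assumption about the calculation, not PrimerFinder's confirmed method, and amplicon_span is a hypothetical helper name.

```python
def amplicon_span(forward_hit, reverse_hit):
    """Hypothetical helper: return (start, end, size), or None if the two
    hits are on different contigs."""
    if forward_hit["qseqid"] != reverse_hit["qseqid"]:
        return None
    coords = [forward_hit["qstart"], forward_hit["qend"],
              reverse_hit["qstart"], reverse_hit["qend"]]
    start, end = min(coords), max(coords)
    # Coordinates are 1-based and inclusive, hence the + 1.
    return start, end, end - start + 1


# Contig coordinates copied from the raw results above.
gene1_f = {"qseqid": "contig_1", "qstart": 476139, "qend": 476163}
gene1_r = {"qseqid": "contig_1", "qstart": 476399, "qend": 476423}
gene2_f = {"qseqid": "contig_32", "qstart": 27648, "qend": 27668}
gene2_r = {"qseqid": "contig_32", "qstart": 27409, "qend": 27428}

print(amplicon_span(gene1_f, gene1_r))  # (476139, 476423, 285)
print(amplicon_span(gene2_f, gene2_r))  # (27409, 27668, 260)
```

Taking the minimum and maximum of all four coordinates keeps the sketch independent of which strand each primer landed on.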

How long does it take?

PrimerFinder Legacy and PrimerFinder Supremacy in FASTA mode are very fast (seconds per sample), while PrimerFinder Supremacy in FASTQ mode is relatively slow (a few minutes per sample).

What can go wrong?

Requested SEQIDs are not available.
Not including the program=requested_program component, or requesting an unsupported program
Not including the analysis=requested_analysis component, or requesting an unsupported analysis
Specifying an unsupported number of mismatches
Incorrectly formatted kmersize
Attaching an incorrectly formatted primer file, or not including a primer file for custom analyses
?

If anything goes wrong, an error message explaining the error should be returned.