GeneSeekr
What does it do?
The GeneSeekr is a suite of analyses that detect gene targets in FASTA-formatted files.
How do I use it?
Subject
In the Subject field, put geneseekr. Spelling counts, but case doesn't.
Description
Required Components
In the Description field, you must provide the requested analysis type as follows:
analysis=requested_analysis
The GeneSeekr pipeline supports the following analyses (again, spelling counts, but case doesn't):
- gdcs - determines the presence of genomically-dispersed conserved sequences in the following genera: Escherichia, Listeria, Salmonella, Vibrio. NOTE: you must provide an additional line: organism=ORGANISM
- genesippr - custom suite of genes derived from the following genera: Bacillus, Campylobacter, Escherichia, Listeria, Salmonella, Staphylococcus, Vibrio
- mlst - determines multi-locus sequence type for the following genera: Bacillus, Campylobacter, Escherichia, Listeria, Salmonella, Staphylococcus, Vibrio. NOTE: you must provide an additional line: organism=ORGANISM
- cgmlst - determines core genome multi-locus sequence type for the following genera: Escherichia, Yersinia. NOTE: you must provide an additional line: organism=ORGANISM
- resfinder - identifies acquired antimicrobial resistance genes
- rmlst - determines ribosomal multi-locus sequence type
- serosippr - calculates the serotype for Escherichia
- sixteens - determines closest 16S match
- virulence - finds virulence genes
- custom - you must attach a FASTA-formatted file of target(s) to the issue
- Bacterial Integrative and Conjugative Elements (ICEs) ICEberg databases from the ICEfinder publication:
  - all_ices - used for all ICE gene detection
  - aice - used for actinomycete (AICEs) type ICE gene detection
  - cime - used for cis-mobilizable elements (CIMEs) ICE gene detection
  - ime - used for Integrative and Mobilizable Elements (IME) type ICE gene detection
  - t4ss - used for Type IV Secretion System (T4SS) type ICE gene detection
You must also include a list of SEQIDs, one per line.
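As a rough illustration of the required components, the following Python sketch assembles a Description body. The helper name build_description and the placeholder SEQIDs (SEQID_1, SEQID_2) are hypothetical, and the line order shown (analysis, then organism, then SEQIDs) is an assumption rather than something the automator is known to require.

```python
# Hypothetical helper: composes the text to paste into the Description field.
# Analyses that the list above says need an extra organism=ORGANISM line.
NEEDS_ORGANISM = {"gdcs", "mlst", "cgmlst"}


def build_description(analysis, seqids, organism=None):
    """Return the Description text: analysis line, optional organism line, SEQIDs."""
    analysis = analysis.lower()  # case does not matter, so normalise it
    lines = [f"analysis={analysis}"]
    if analysis in NEEDS_ORGANISM:
        if not organism:
            raise ValueError(f"analysis '{analysis}' needs an organism=ORGANISM line")
        lines.append(f"organism={organism}")
    # One SEQID per line, as required
    lines.extend(seqids)
    return "\n".join(lines)


if __name__ == "__main__":
    # SEQID_1 and SEQID_2 are placeholders, not real sequence identifiers
    print(build_description("mlst", ["SEQID_1", "SEQID_2"], organism="Escherichia"))
```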
Optional Components
In order to customise your GeneSeekr analyses, several settings can be optionally modified:
- BLAST program. NOTE: GeneSeekr does not check to see if your query or database are the appropriate molecule for the requested program. Additionally, none of the standard analyses currently have protein databases.
  - default is blastn
  - You can select one of the following BLAST programs to use:
    - blastn - nt query: nt db
    - blastp - protein query: protein db
    - blastx - translated nt query: protein db
    - tblastn - protein query: translated nt db
    - tblastx - translated nt query: translated nt db
  - modify as follows: blast=tblastx
- Minimum cutoff for matches to be included in report.
  - default is 70
  - modify as follows: cutoff=80
- E-value cutoff
  - default is 1E-05
  - modify as follows: evalue=1E-10
- Include alignments in reports
  - default is False
  - modify as follows: align=True
- Report unique hits only - does not report multiple hits at the same location in a contig. Instead, only the best hit is reported, and the rest are discarded
  - default is False
  - modify as follows: unique=True
- Include FASTA file output of strain-specific target sequence matches
  - default is False
  - modify as follows: fasta=True
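To make the defaults and overrides above concrete, here is a minimal Python sketch that reads key=value lines from a Description and falls back to the documented defaults. It is not the automator's actual parser: the function name parse_options and the choice to ignore unknown keys are assumptions made for illustration.

```python
# Defaults as listed in the optional components above
DEFAULTS = {
    "blast": "blastn",
    "cutoff": 70,
    "evalue": 1e-05,
    "align": False,
    "unique": False,
    "fasta": False,
}
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def parse_options(description):
    """Return a dict of optional settings, starting from the documented defaults."""
    options = dict(DEFAULTS)
    for raw_line in description.splitlines():
        line = raw_line.strip()
        if "=" not in line:
            continue  # e.g. a SEQID line
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        if key not in DEFAULTS:
            continue  # assumption: keys such as analysis/organism are handled elsewhere
        if key == "blast":
            value = value.lower()
            if value not in BLAST_PROGRAMS:
                raise ValueError(f"unsupported BLAST program: {value}")
            options[key] = value
        elif key in ("cutoff", "evalue"):
            options[key] = float(value)  # float() accepts forms like 1E-10
        else:
            options[key] = value.lower() == "true"  # align, unique, fasta
    return options


if __name__ == "__main__":
    text = "analysis=resfinder\nblast=tblastx\ncutoff=80\nSEQID_1"
    print(parse_options(text))
```

Settings left out of the Description keep their default values in this sketch, which mirrors how the options are described above.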
Example
For example GeneSeekr issues, see issue 14470 (ResFinder), issue 14471 (custom), or issue 27867 (cgMLST).
Interpreting Results
The GeneSeekr automator will upload a file called geneseekr_output.zip once it has completed. This file will contain all the reports generated for the requested analysis.
How long does it take?
It depends on the analysis requested. The GeneSeekr pipeline should take about a minute to analyze each SEQID requested.
What can go wrong?
- Requested SEQIDs are not available. If we can't find some of the SEQIDs that you request, you will get a warning message informing you of it.
- There was an issue with the requested analysis: either one was not supplied, there was a typo, or you requested a currently-unsupported analysis. An error message detailing the problem will be added to the issue.
- The custom analysis requires an attached FASTA-formatted file of gene targets. If the file was not attached, or there was an issue reading the file, an error message detailing the problem will be added to the issue.
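Some of these problems can be caught before an issue is submitted. The Python sketch below checks a request against the analysis names listed earlier on this page. The function name check_request and its messages are hypothetical and do not reproduce the automator's own error text.

```python
# Analysis names taken from the list of supported analyses on this page
SUPPORTED_ANALYSES = {
    "gdcs", "genesippr", "mlst", "cgmlst", "resfinder", "rmlst", "serosippr",
    "sixteens", "virulence", "custom", "all_ices", "aice", "cime", "ime", "t4ss",
}


def check_request(analysis, has_fasta_attachment=False):
    """Return a list of problems found in a request; an empty list means none."""
    problems = []
    if not analysis:
        problems.append("no analysis was supplied")
    elif analysis.lower() not in SUPPORTED_ANALYSES:
        problems.append(f"'{analysis}' is misspelled or not a supported analysis")
    elif analysis.lower() == "custom" and not has_fasta_attachment:
        problems.append("the custom analysis needs an attached FASTA-formatted file")
    return problems


if __name__ == "__main__":
    for name in ("resfinder", "resfindr", "custom", ""):
        print(repr(name), check_request(name))
```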