Snippy
What does it do?
Snippy is a tool for rapid haploid variant calling and core genome alignment of closely related genomes. This automator can also be used to build a phylogenetic tree to attempt to show the relatedness of these strains. More info can be found at the Snippy github site.
How do I use it?
Subject
In the Subject
field, put snippy
. Spelling counts, but case sensitivity doesn't.
Description
Required Components
The first line of your description needs to be reference=
, the SEQID of the strain you want to act
as your reference strain. Ideally, you'll want to pick a high-quality assembly for your reference.
eg. reference=YYYY-MIN-NNNN
If you wish to attach a reference file instead of providing a SEQID, the line must be reference=attached
You must also include a list of SEQIDs one per line.
Optional Components
We have also included the ability to 'cleanup' a multiple sequence alignment, and generate a tree using Gubbins and IQ-tree. If you call SNPs for multiple isolates from the same reference, you are able to produce an alignment of "core SNPs" to build a phylogeny. To use this function, add the following line in the description:
cleanup_&_tree=True
This automator currently uses IQtree and default settings for gubbins, but other options are available. If you would like additional functions, please contact a bioinformatician and we will do our best to add them.
Example
For an example snippy, see issue 30388.
Interpreting Results
The zip file uploaded on snippy completion will contain folders for each of the comparator sequences, as well as core alignment files. Some of the files will also be uploaded/attached directly to the redmine request. Important files are:
core.txt
: A summary file. Tab-separated columnar list of alignment/core-size statistics of the number of: aligned, unaligned, variants, and low coverage bases between every strain submitted and the reference.core.tab
: Tab-separated columnar list of core SNP sites with alleles but NO annotationscore.vcf
: Multi-sample VCF file with genotype GT tags for all discovered alleles- Within each sequence folder there will also be files, including a tab separated summary for that particular sequence in comparison to the reference.
If cleanup_&_tree was included, then tree files will also be included in the output:
{issue.id}.snippy.gubbins.iqtree.tree
: An intermediate phylogenetic tree created by gubbins + iqtree.{issue.id}.snippy_final.clean.core.snp.aln.tree
: The final phylogeny created from the core snp alignment (created using IQtree and the GTR substitution method).
If you want to view these tree files, you can use a program such as FigTree or a web-based viewer like phylo.io.
Other files can also be important - see the docs on Snippy Output files for more information.
Variant types
For more information, see the docs on Snippy Output files:
Type | Name | Example |
---|---|---|
snp | Single Nucleotide Polymorphism | A => T |
mnp | Multiple Nuclotide Polymorphism | GC => AT |
ins | Insertion | ATT => AGTT |
del | Deletion | ACGG => ACG |
complex | Combination of snp/mnp | ATTC => GTTA |
How long does it take?
It depends on the number of sequences and coverage/size of sequence files. Small snippy requests take ~15 minutes to complete (see the example request). If you submit a request for a larger snippy (>30 strains), or include larger long-read sequences, it may take substantially longer.
What can go wrong?
A few things can go wrong with this process:
1) Requested SEQIDs are not available. If we can't find some of the SEQIDs that you request, you will get a warning message informing you of it.
2) Strains too far apart. Snippy requires that the strains you want to compare to the reference be closely related to
the reference. If you ask for a snippy-analysis with things that are not very related, the analysis will fail. In this case, look at the core.txt
file that is uploaded to the redmine request. If there are sequences with 0 under "Aligned", remove those sequences from the request and re-submit.
3) Why is the cleaned snp tree not showing my single-end sequences? - I'm not sure... the cleaning step(s) using gubbins appears to remove the minion (single-end) sequence data from the cleaned alignment when a multi-analysis is performed using both illumina and minion sequences. The {}.snippy_core_alignment.iqtree.treefile
, which is generated before the cleaning steps, will contain all SEQIDS.
Should I use Snippy or SNVPhyl?
...This depends on your goal...
Snippy appears to be faster than SNVPhyl. It also allows use of both paired-end and single-end raw data files in the same analysis (whereas SNVPhyl currently does not).
A comparison of snippy vs SNVPhyl has not been conducted by our lab. The SNVPhyl publication does briefly discuss snippy.
Version
Snippy version 4.6.0 (as of 2024-07-03)