SNVPhyl
What does it do?
SNVPhyl is a pipeline developed by the Public Health Agency of Canada for evaluating the number of SNPs between a reference strain and other closely related strains. It also builds a phylogenetic tree to attempt to show the relatedness of these strains. Lots more info can be found at the SNVPhyl readthedocs site.
How do I use it?
Subject
In the Subject
field, put SNVPhyl
. Spelling counts, but case sensitivity doesn't.
Description
The first line of your description needs to be reference
, and the second line the SEQID of the strain you want to act
as your reference strain. Ideally, you'll want to pick a high-quality assembly for your reference.
If you wish to attach a reference file instead of providing a SEQID, the second line must be attached
The third line of your description should be compare
, and lines after that the SEQIDs for strains you want to compare
your reference to.
Example
For an example SNVPhyl, see issue 12494.
Interpreting Results
The zip file uploaded on SNVPhyl completion should contain 10 files. Important files are:
- snvMatrix.tsv: Shows the number of SNVs between every strain submitted.
- vcf2core.tsv: Shows how much of the genome was covered by the analysis (look at the
Percentage of all positions that are valid, included, and part of the core genome
column in theall
row). This should be at least 90 percent, or the strains you were comparing were probably too far apart to get good results. - phylogeneticTree.nwk: The phylogenetic tree created by SNVPhyl. If you want to view this tree, you can use a program such as FigTree or a web-based viewer like phylo.io.
Other files can also be important - see the docs on SNVPhyl Output files for more information.
How long does it take?
Most SNVPhyl requests take ~1 hour to complete. If you submit a request for a larger SNVPhyl (>30 strains), it may take substantially longer.
How to check if your SNVPhyl will fail using Galaxy Docker
- Log into the head node by typing the following in the terminal
ssh ubuntu@head
orssh ubuntu@192.168.1.5
- If prompted for a password enter the standard bioinformatics password.
- Enter
watch squeue
. Under the columnNAME
there will be a list of biorequests running. In the same row as your biorequest note your Job ID and the node in which your issue is running underJOBID
andNODELIST
. - On a web browser of your choice search
http://[IP_address].[node#]:[jobID]
. (Eg. for a job on node 03 with the ID 34595 the url would be: http://[IP_address].3:34595/)- Please ask a bioinformatician or Cathy for the IP address.
- Under the tab user cick log in and login with the following credentials: user: admin@galaxy.org; password: admin.
- On the right there will be a tab labeled
history
that will display all the steps the the SNVPhyl runs. If there are many tabs with red x's it will likely fail.
What can go wrong?
A few things can go wrong with this process:
1) Requested SEQIDs are not available. If we can't find some of the SEQIDs that you request, you will get a warning message informing you of it.
2) Strains too far apart. SNVPhyl requires that the strains you want to compare to the reference be closely related to the reference. If you ask for a SNVPhyl with things that are not very related, you will get a warning telling you so.
3) No output files. Sometimes, SNVPhyl will say it has completed, but the typical output files will not be present. This is either because:
a. there are no SNVs between the two strains, and so SNVPhyl crashes. If you want to confirm that there are no SNVs between your two strains you can add a third strain to your description that is related enough to run (eg. try a strain with the same rMLST) but may have variants.
b. SNVPhyl crashed for an unknown reason, which does happen occasionally. If this happens, your best bet is to try running the SNVPhyl again.
If SNVPhyl keeps crashing even after subsequent attempts, let us know and we'll do our best to fix things.
Version
SNVPhyl on redmine is version 1.0.1 with snvphyl_cli_version=1.3 (as of 2024-04-05)