COWSNPhR
What does it do?
COWSNPhR (CFIA OLC Workflow for Single Nucleotide PHylogeny Reporting) is a pipeline developed in conjunction with the USDA APHIS Veterinary Services that identifies high-quality single nucleotide variants between a reference strain and other closely related strains. After annotating the reference strain with Prokka, the pipeline maps the raw reads of the query strains to the reference genome using bowtie2, finds genetic variants with DeepVariant, extracts variants, creates multiple sequence alignments, and calculates a phylogenetic tree using FastTree to show the relatedness of these strains. The location of each variant is mapped to the annotation of the reference strain, and all variants are entered in a summary table.
How do I use it?
Subject
In the Subject field, put cowsnphr. Spelling counts, but case does not.
Description
The first line of your description needs to be reference, and the second line the SEQID of the strain you want to act as your reference strain. Ideally, you'll want to pick a high-quality assembly for your reference.
If you wish to attach a reference file instead of providing a SEQID, the second line must be attached.
The third line of your description should be compare, and the lines after that the SEQIDs of the strains you want to compare to your reference.
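The description layout above can be sketched as a small Python helper. This is only an illustration of the line order described in this section: the function name build_description and the SEQIDs shown are hypothetical placeholders, not part of COWSNPhR itself.

```python
def build_description(reference, compare, attached=False):
    """Return the description text for a COWSNPhR request.

    reference: SEQID of the reference strain (ignored when attached=True).
    compare:   SEQIDs of the strains to compare to the reference.
    attached:  True when a reference file is attached instead of a SEQID.
    """
    # Line 1 is the keyword "reference"; line 2 is the SEQID or "attached".
    second_line = "attached" if attached else reference
    # Line 3 is the keyword "compare", followed by one SEQID per line.
    lines = ["reference", second_line, "compare", *compare]
    return "\n".join(lines)


# Placeholder SEQIDs for illustration only, not real strains.
print(build_description("SEQID-REF", ["SEQID-A", "SEQID-B"]))
```

With an attached reference file, calling build_description(None, ["SEQID-A"], attached=True) would give a description whose second line is attached.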
Example
For example COWSNPhR analyses, see issue 15681 and issue 15682 (uploaded reference file).
Interpreting Results
The zip file uploaded on COWSNPhR completion should contain four folders:
vcf_files: compressed global VCF files
summary_tables: table summarizing the location, prevalence, and annotation of variants
alignments: multiple sequence alignment of variants
tree_files: phylogenetic tree of alignment. If you want to view this tree, you can use a program such as FigTree or a web-based viewer like phylo.io.
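As a quick sanity check on a downloaded result, the following Python sketch reports which of the four folders are missing from a zip file. It assumes the folders sit at the top level of the archive, which this page does not state explicitly. The helper name missing_folders and the in-memory demo archive are hypothetical.

```python
import io
import zipfile

# The four folders this page says the result zip should contain.
EXPECTED_FOLDERS = {"vcf_files", "summary_tables", "alignments", "tree_files"}


def missing_folders(zip_source):
    """Return the expected folders that are absent from the archive.

    zip_source may be a path or a file-like object.
    Assumption: the folders are at the top level of the zip.
    """
    with zipfile.ZipFile(zip_source) as archive:
        top_level = {
            name.split("/", 1)[0]
            for name in archive.namelist()
            if "/" in name
        }
    return EXPECTED_FOLDERS - top_level


# Demo on an in-memory archive that deliberately lacks tree_files.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for folder in ("vcf_files", "summary_tables", "alignments"):
        archive.writestr(folder + "/placeholder.txt", "demo")
demo.seek(0)
print(sorted(missing_folders(demo)))  # ['tree_files']
```

For a real result, pass the path of the downloaded zip file instead of the demo buffer. An empty set means all four folders were found.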
How long does it take?
Most COWSNPhR requests take ~1 hour to complete. If you submit a request for a larger COWSNPhR analysis (>30 strains), it may take substantially longer.
What can go wrong?
A few things can go wrong with this process:
1) Requested SEQIDs are not available. If we can't find some of the SEQIDs that you request, you will get a warning message informing you of it.
2) Strains too far apart. COWSNPhR requires that the strains you want to compare be closely related to the reference. If you request an analysis with strains that are not closely related, you will get a warning telling you so.
3) Other errors. This software is still in active development, so unforeseen issues may arise for novel reference:query combinations.