RoaryScoary
What does it do?
Roary
Roary is a pipeline for calculating pan genomes. Roary is not intended for metagenomics or for comparing extremely diverse sets of genomes. If you want to learn more about it, check out the Roary publication.
Scoary
Scoary takes the gene_presence_absence.csv file output from Roary and a traits file created by the user, and calculates the associations between the given traits and all genes in the accessory genome. For more information, check out the Scoary github or Scoary publication.
How do I use it?
Subject
In the Subject field, put roary. Spelling counts, but case sensitivity doesn't.
Description
Required Components
In the Description field, you must provide the requested analysis type as follows:
analysistype=requested_analysis
The Roary automator supports the following analyses, each of which operates on the pan genome of the isolate sequences (again, spelling counts, but case sensitivity doesn't):
- union - reports the union of genes found in sequences
- intersection - reports the intersection of genes found in isolate sequences (core genes)
- complement - reports the complement of genes found in isolates (accessory genes)
- gene_multifasta - extracts the sequence of each gene listed and creates multi-fasta files for each gene listed (outputs protein multi-fastas). NOTE: you must provide an additional line:
  genes=gene1,gene2,geneN
  which is a comma-separated list of genes you would like multi-fasta files for (e.g. genes=fliC,gyrA). Case sensitivity counts for gene names.
- difference - reports the gene differences between sets of isolates. You must differentiate between sequence sets by adding the line
  set_two
  above your second set of sequences in the description (sequences you want compared).
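The set vocabulary in the list above can be illustrated with plain Python sets. This is only a toy sketch: the isolate names and gene contents are made up, and it makes no claim about how Roary itself computes these results.

```python
# Toy gene content for three hypothetical isolates (made-up data).
isolates = {
    "isolate_a": {"gyrA", "fliC", "recA"},
    "isolate_b": {"gyrA", "recA", "tetA"},
    "isolate_c": {"gyrA", "recA", "fliC", "blaZ"},
}
gene_sets = list(isolates.values())

# union: every gene seen in at least one isolate
union = set.union(*gene_sets)

# intersection: genes present in every isolate (core genes)
intersection = set.intersection(*gene_sets)

# complement: genes missing from at least one isolate (accessory genes)
complement = union - intersection

# difference: genes found in one set of isolates but not in the other
set_one = isolates["isolate_a"] | isolates["isolate_b"]
set_two = isolates["isolate_c"]
only_in_set_one = set_one - set_two
only_in_set_two = set_two - set_one

print("union:", sorted(union))
print("intersection:", sorted(intersection))
print("complement:", sorted(complement))
print("difference:", sorted(only_in_set_one), sorted(only_in_set_two))
```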
You must also include a list of SEQIDs, one per line.
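To make the Description layout concrete, here is a minimal sketch that assembles the text from its parts. The helper `build_description` is hypothetical (it is not part of the automator), the SEQIDs are placeholders, and the ordering of the analysistype, genes and SEQID lines is an assumption; only the placement of set_two above the second set of sequences comes from the instructions above.

```python
def build_description(analysistype, seqids, genes=None, set_two=None):
    """Assemble issue Description text (hypothetical helper for illustration)."""
    lines = [f"analysistype={analysistype}"]
    if genes:
        # Gene names are case-sensitive, so they are passed through unchanged.
        lines.append("genes=" + ",".join(genes))
    # One SEQID per line.
    lines.extend(seqids)
    if set_two:
        # For analysistype=difference: the set_two line goes above the second set.
        lines.append("set_two")
        lines.extend(set_two)
    return "\n".join(lines)


description = build_description(
    "gene_multifasta",
    ["YYYY-SEQ-0001", "YYYY-SEQ-0002", "YYYY-SEQ-0003"],
    genes=["fliC", "gyrA"],
)
print(description)
```

For a difference request, the same helper would be called with `set_two=[...]` and no `genes` argument.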
Optional Components - Scoary
If you would like Scoary analysis to be completed (calculating the associations between traits and all genes in the accessory genome), you must attach a csv file of traits (traits.csv) to the issue. Attaching a traits.csv file will automatically result in Scoary analysis being conducted.
This traits file must be formatted in a specific way.
- the rows should correspond to your isolate sequence IDs
- the top left cell should be left blank
- the trait(s) data must be binary (0 for trait absent, 1 for trait present)
- all SEQ-IDs and traits should be uniquely named and not contain any weird characters (e.g. %;,/&[]@? etc)
The below table is an example from the Scoary github:
| | Trait 1 | Trait 2 | ... | Trait N |
|---|---|---|---|---|
| YYYY-SEQ-0001 | 1 | 0 | ... | 1 |
| YYYY-SEQ-0002 | 0 | 0 | ... | 1 |
| YYYY-SEQ-0003 | 1 | 0 | ... | 1 |
For more information, check out the Scoary github.
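Because a badly formatted traits file fails silently, it can be worth checking the file before attaching it. The sketch below is an assumption-laden illustration, not the automator's or Scoary's own validation: `check_traits` is a hypothetical helper, and the disallowed characters are just the examples listed above.

```python
import csv
import io

# The example "weird characters" named in the formatting rules above.
FORBIDDEN = set("%;,/&[]@?")


def check_traits(text):
    """Return a list of problems found in traits.csv text (empty means none found)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if header[0].strip():
        problems.append("the header cell above the SEQID column should be blank")
    names = {"trait": header[1:], "SEQID": [row[0] for row in body]}
    for kind, values in names.items():
        if len(set(values)) != len(values):
            problems.append(f"{kind} names are not unique")
        for value in values:
            if FORBIDDEN & set(value):
                problems.append(f"{kind} {value!r} contains a disallowed character")
    for row in body:
        if len(row) != len(header):
            problems.append(f"{row[0]}: expected {len(header) - 1} trait values")
        for value in row[1:]:
            if value.strip() not in ("0", "1"):
                problems.append(f"{row[0]}: {value!r} is not 0 or 1")
    return problems


# Made-up example with one non-binary value in the last row.
sample = ",Trait1,Trait2\nYYYY-SEQ-0001,1,0\nYYYY-SEQ-0002,0,2\n"
for problem in check_traits(sample):
    print(problem)
```

To check a real file, read it first, for example `check_traits(open("traits.csv", newline="").read())`.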
Example
For an example Roary/Scoary issue, see issue 24968. For an example Roary analysistype=gene_multifasta issue, see issue 25013. (NOTE: the output files for these issues no longer exist on the ftp server.)
Interpreting Results
The Roary automator will upload links to the ftp for files called prokka_output.zip and roary_output.zip once it has completed.
The prokka_output.zip contains all of the outputs from prokka. Prokka will output a lot of files for each genome you give it - you can find a quick description of each file here. Of particular interest are the .gff files, which Roary uses for analysis.
The roary_output.zip file contains all of the outputs from roary. Some additional files will be present depending on the analysis type chosen:
- gene_multifasta - the zip file will include .fa files that contain the aligned protein sequences for the requested gene(s) (one file for each gene included in the analysis, e.g. gene_multifasta_fliC.fa and gene_multifasta_gyrA.fa). If you'd like, these sequences can be further analysed using NCBI protein blast.
- scoary - the zip file will include .csv file(s) for each of the traits (column headers) provided in the attached traits.csv file.
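A quick way to see which of these extra files arrived is to group the archive contents by extension. The following is a small sketch using Python's standard zipfile module; `group_zip_contents` is a hypothetical helper, and the demo archive is built in memory with file names taken from this page, so it does not reflect the full contents of a real roary_output.zip.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_zip_contents(zip_file):
    """Group the file names inside a zip archive (path or file object) by extension."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[PurePosixPath(name).suffix or "(none)"].append(name)
    return dict(groups)


# Demo: build a tiny in-memory archive so the sketch runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in (
        "gene_presence_absence.csv",
        "gene_multifasta_fliC.fa",
        "gene_multifasta_gyrA.fa",
    ):
        archive.writestr(name, "")
buffer.seek(0)

for extension, names in sorted(group_zip_contents(buffer).items()):
    print(extension, names)
```

With a downloaded archive, pass its path instead, for example `group_zip_contents("roary_output.zip")`.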
How long does it take?
Prokka isn't the quickest thing around - expect it to take 2 to 3 minutes for each genome you give it. After prokka is finished, the time the Roary pipeline takes will depend on the analysis type and on the number of genes/traits requested (in the case of gene_multifasta and scoary).
What can go wrong?
- Requested SEQIDs are not available. If we can't find some of the SEQIDs that you request, you will get a warning message informing you of it.
- There was an issue with the requested analysistype: either one was not supplied, there was a typo, or you requested a currently-unsupported analysis. An error message detailing the problem will be added to the issue.
- Scoary: You will not get an error if the traits.csv file is not attached or is formatted incorrectly. Roary will complete, but no per-trait .csv files will be included in the zip file. Unfortunately, you will have to submit the analysis request again with the correct traits.csv formatting.