bcgTree
What does it do?
bcgTree is a an automated phylogenetic tree building pipeline that builds trees from bacterial core genomes. The pipeline "automatically extracts 107 essential single-copy core genes, found in a majority of bacteria" (these genes were statistically determined, see bcgTree publication). It then uses hidden Markov models and performs a partitioned maximum-likelihood analysis. If you want to learn more about it, check out the bcgTree publication. Dont forget to cite Ankenbrand and Keller, 2016!
How do I use it?
Subject
In the Subject
field, put bcgtree
. Spelling counts, but case sensitivity doesnt.
Description
Required Components
In the Description
field, you must include a list of SEQIDs one per line.
Optional Components
In order to customise your bcgTree analysis, several settings can be optionally modified.
- If you would like to change the number of bootstraps performed, add the following line to the description before the list of SEQIDs:
- default is
100
- modify as follows:
bootstraps=1000
- default is
- If you would like to change the minimum number of proteomes in which a gene must occur in order to be kept:
- default is
2
- modify as follows:
min_proteomes=5
- default is
- If you would like to change the amino acid substitution model used for the partitions by RAxML:
- default is
AUTO
- modify as follows:
aa_substitution_model=GTR
- you can select one of the following amino acid substitution models to use:
- AUTO, DAYHOFF, DCMUT, JTT, MTREV, WAG,RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, MTART, MTZOA, PMB, HIVB,HIVW, JTTDCMUT, FLU, STMTREV, DUMMY, DUMMY2, LG4M, LG4X, PROT_FILE, GTR_UNLINKED, GTR
- default is
Example
For an example bcgtree issue see issue 25083. NOTE: the output files for these issues no longer exist on the ftp server.
Interpreting Results
The bcgTree automator will upload links to the ftp for files called prokka_output.zip
and bcgtree_output.zip
once it has completed.
The prokka_output.zip
contains all of the outputs from prokka. Prokka will output a lot of files for each genome you give it - you can find a quick description of
each file here. Of particular interest are the .faa
files, which bcgTree uses for analysis.
The bcgtree_output.zip
file contains all of the outputs from bcgTree. The alignment files, and gene-id files output by bcgtree can be found in subfolders in the zip file. The most interesting outputs from bcgtree are the RAxML files. These files conaint the phylogenetic trees output by bcgTree:
- RAxML_bestTree.final
- RAxML_bipartitionsBranchLabels.final
- RAxML_bipartitions.final
- RAxML_bootstrap.final
These files can be opened using a phylogenetic tree viewer of your choice.
There will also be the config.txt
file containing the command(s) passed to bcgTree, as well as a bcgtree.log
file containing all of the executed commands and their output(s) (including the random seed values used by RAxML).
How long does it take?
Prokka isnt the quickest thing around - expect it to take 2 to 3 minutes for each genome you give it. After prokka is finished the bcgTree pipeline time will depend on the number of sequences and bootstraps requested.
What can go wrong?
- Requested SEQIDs are not available. If we cant find some of the SEQIDs that you request, you will get a warning message informing you of it.
- There was an issue with the requested amino acid substitution model: there was a typo, or you requested a currently-unsupported substitution model. An error message detailing the problem will be added to the issue.