Table of Contents
- Introduction
- Prerequisites
- Inputs
- Running the Script
- Command Line Arguments
- Outputs
- Interpreting the Results
- Troubleshooting
- Additional Resources
Introduction
This script processes STEC alleles with both A and B subunits. It performs BLAST analyses to identify the best matching alleles for each subunit, detects novel alleles, and generates reports summarizing the results. This is essential for accurate subunit-level typing and characterization of Shiga toxin-producing Escherichia coli (STEC).
Prerequisites
Before running the script, ensure you have the following:
- Query files in FASTA format containing nucleotide or amino acid sequences.
- Allele database files for the relevant subunits (A and B) in FASTA format.
Inputs
The script requires the following inputs:
- Query files in FASTA format.
- Allele database files for the A and B subunits (e.g.,
Stx1A_aa_*.fasta
,Stx1B_aa_*.fasta
,Stx2A_aa_*.fasta
,Stx2B_aa_*.fasta
). - Optionally, a directory containing split AA allele databases for subunit-level BLAST analyses.
Running the Script
To run the script, use the following command:
stec.py stec_combined_subunits --allele_path /path/to/allele_folder --report_path /path/to/output_folder --query_path /path/to/query_folder --blast_mode blastn --split_aa_db_dir /path/to/split_aa_db_dir
Command Line Arguments
The script accepts the following command line arguments:
-h, --help
: Show this help message and exit.--version
: Show program's version number and exit.-a, --allele_path
: Specify the path to the folder containing allele files. Default:alleles
in the current working directory.-r, --report_path
: Specify the path to the folder for output reports. Default:reports
in the current working directory.-q, --query_path
: Specify the path to the folder containing query files in FASTA format. Default:query
in the current working directory.--blast_mode
: Choose BLAST mode:blastn
,tblastx
,blastx
, orblastn+tblastx
. Default:blastx
.--preliminary
: Run a preliminary screen using blastn (do not run additional downstream analyses).--split_aa_db_dir
: Path to directory containing split AA allele databases (required for full analysis).-n, --num_alignments
: Number of alignments to return. Default: 10.-c, --cutoff
: Percent identity cutoff. Default: 99.0.-v, --verbosity
: Set the logging level. Options: debug, info, warning, error, critical. Default: info.
Outputs
The script generates the following output files:
novel_alleles.fasta
: FASTA file containing all novel alleles discovered.stec_combined_nt_report.tsv
: TSV report summarizing nucleotide-level matches and novel alleles.stec_combined_aa_report.tsv
: TSV report summarizing amino acid-level matches and novel alleles.stec_combined_nt_aa_report.tsv
: Combined report for both nucleotide and amino acid analyses.- Individual FASTA files for novel subunits and operons, as appropriate.
Interpreting the Results
The main reports (stec_combined_nt_report.tsv
, stec_combined_aa_report.tsv
, and stec_combined_nt_aa_report.tsv
) summarize the best matches for each query, percent identity, and any novel alleles detected. Additional notes may indicate partial matches, ambiguous bases, or gap events.
Troubleshooting
If you encounter issues:
- Ensure all input files are in the correct format and location.
- Confirm that BLAST+ is installed and accessible from your command line.
- Check that the split AA database directory contains the required subunit FASTA files.
- Review error messages for missing files or permission issues.
Additional Resources
For more information, see the stec_combined_subunits
documentation or open an issue on GitHub.