Table of Contents
- Introduction
- Prerequisites
- Inputs
- Running the Script
- Command Line Arguments
- Outputs
- Interpreting the Results
- Troubleshooting
- Additional Resources
Introduction
This script uses BLAST to compare nucleotide query sequences against a nucleotide allele database and determine their sequence types. It also updates the nucleotide and amino acid profiles and allele databases with any novel alleles and profiles it finds. This helps characterize the genetic diversity of your samples.
Prerequisites
Before running the script, ensure you have the following:
- Nucleotide query files in FASTA format.
- All outputs from allele_translate_reduce.
Inputs
The script requires the following inputs:
- Nucleotide query files in FASTA format.
- All outputs from allele_translate_reduce.
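Before a run, a quick check can catch files in the query folder that are not FASTA. The shell function below is a minimal sketch and is not part of stec.py. The name check_fasta_folder is hypothetical. The function only tests whether each file's first non-blank line starts with ">". It does not validate the sequence characters.

```shell
# Hypothetical helper, not part of stec.py: flag files whose first non-blank
# line does not start with '>' (the FASTA header marker). Sequence content is
# not validated; this is only a quick sanity check.
check_fasta_folder() {
    # Defaults to the "query" folder, mirroring the script's documented default.
    local dir="${1:-query}" f first found_bad=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # First line containing any non-whitespace text (empty if none).
        first=$(awk 'NF { print; exit }' "$f")
        case "$first" in
            '>'*) ;;  # looks like a FASTA header
            *) echo "Not FASTA-like: $f"; found_bad=1 ;;
        esac
    done
    return "$found_bad"
}
```

For example, check_fasta_folder /path/to/query_folder prints any offending files. It returns a non-zero status if it finds at least one.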
Running the Script
To run the script, use the following command:
stec.py allele_find --nt_profile /path/to/nt_profile_file --aa_profile /path/to/aa_profile_file --nt_alleles /path/to/nt_allele_folder --aa_alleles /path/to/aa_allele_folder -r /path/to/output_folder -q /path/to/query_folder
Command Line Arguments
The script accepts the following command line arguments:
-h, --help: Show this help message and exit.
-version, --version: Show the program's version number and exit.
-v verbosity, --verbosity verbosity: Set the logging level. Options are debug, info, warning, error, and critical. Default is info.
--nt_profile nt_profile: Specify the name and path of the nucleotide profile file. If not provided, profile.txt in the nt_profile folder in the current working directory is used by default.
--aa_profile aa_profile: Specify the name and path of the amino acid profile file. If not provided, profile.txt in the aa_profile folder in the current working directory is used by default.
--nt_alleles nt_alleles: Specify the name and path of the folder containing nucleotide alleles. If not provided, the nt_allele folder in the current working directory is used by default.
--aa_alleles aa_alleles: Specify the name and path of the folder containing amino acid alleles. If not provided, the aa_allele folder in the current working directory is used by default.
-r report_path, --report_path report_path: Specify the name and path of the folder into which reports are placed. If not provided, the reports folder in the current working directory is used.
-q query_path, --query_path query_path: Specify the name and path of the folder containing query files in FASTA format. If not provided, the query folder in the current working directory is used.
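To rely on the defaults, the working directory needs the folder layout described above. The sketch below creates that layout. The name make_default_layout is hypothetical. You still need to copy your actual profile.txt files, allele FASTA files, and query files into place.

```shell
# Hypothetical helper, not part of stec.py: create the default folder layout
# that allele_find falls back to when path options are omitted.
# The profile files themselves must be placed as nt_profile/profile.txt and
# aa_profile/profile.txt; this function only creates empty folders.
make_default_layout() {
    # Root is the directory you will run stec.py from (default: current dir).
    local root="${1:-.}"
    mkdir -p "$root/nt_profile" "$root/aa_profile" \
             "$root/nt_allele" "$root/aa_allele" \
             "$root/reports" "$root/query"
}
```

After populating the folders, running stec.py allele_find from that directory with no path options should pick up the defaults listed above.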
Outputs
The script generates the following output files:
aa_novel_profiles.txt: A text file containing all novel amino acid profiles generated from query sequences.
aa_gene_name_novel_alleles.fasta: A FASTA-formatted text file containing all novel amino acid alleles from query sequences.
nt_novel_profiles.txt: A text file containing all novel nucleotide profiles generated from query sequences.
nt_gene_name_novel_alleles.fasta: A FASTA-formatted text file containing all novel nucleotide alleles from query sequences.
stec_report.tsv: A tab-separated (TSV) file containing results for each query, including the sample name, nucleotide allele identifiers, nucleotide sequence type, amino acid allele identifiers, amino acid sequence type, and notes.
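For a quick count of the novel alleles that were written, you can count the FASTA header lines in each novel-allele file. This sketch assumes that these files are written to the report folder (reports by default) and that their names end in _novel_alleles.fasta, as in the names above. The name count_novel_alleles is hypothetical.

```shell
# Hypothetical helper: print each novel-allele FASTA file name and the number
# of records in it (counted as lines starting with '>').
# Assumption: the files live in the report folder (default "reports").
count_novel_alleles() {
    local dir="${1:-reports}" f
    for f in "$dir"/*_novel_alleles.fasta; do
        [ -f "$f" ] || continue   # skip the literal pattern if nothing matched
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}
```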
Interpreting the Results
The stec_report.tsv file contains the main results of the analysis. Each row corresponds to a query sequence. The columns give the identified nucleotide and amino acid alleles and their sequence types.
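The exact column headers are not listed here, so it can help to inspect them directly. The awk-based sketch below assumes that the first line of stec_report.tsv is a header row. The report_sample function additionally assumes that the sample name is in the first column. Both function names are hypothetical.

```shell
# Hypothetical helper: list the report's column headers with their numbers.
# Assumption: line 1 of the TSV is a header row.
report_columns() {
    local report="${1:-reports/stec_report.tsv}"
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i; exit }' "$report"
}

# Hypothetical helper: print "header: value" pairs for one sample.
# Assumption: the sample name is in the first column.
report_sample() {
    local report="$1" sample="$2"
    awk -F'\t' -v s="$sample" '
        NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i; next }  # remember headers
        $1 == s { for (i = 1; i <= NF; i++) printf "%s: %s\n", h[i], $i }
    ' "$report"
}
```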
Troubleshooting
If you encounter any issues while running the script, check the following:
- Ensure all input files are in the correct format and location.
- Make sure you have the necessary permissions to read the input files and write to the output directory.
- If the script fails with an error message, read it carefully; it often points to the input file, folder, or step that failed. For more detail, rerun the command with -v debug to raise the logging level.
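The permission check in the list above can be scripted. The sketch below, using the hypothetical name check_permissions, tests two conditions for the current user. First, the query folder must exist and be readable. Second, the report folder must exist and be writable. It assumes the report folder is created before the run.

```shell
# Hypothetical helper: check the folders allele_find reads from and writes to.
# Defaults mirror the documented "query" and "reports" folders.
check_permissions() {
    local query_dir="${1:-query}" report_dir="${2:-reports}" failed=0
    # Listing a folder's files needs both read (r) and search (x) permission.
    if [ ! -d "$query_dir" ] || [ ! -r "$query_dir" ] || [ ! -x "$query_dir" ]; then
        echo "Query folder missing or not readable: $query_dir"
        failed=1
    fi
    # Assumption: the report folder already exists before the run.
    if [ ! -d "$report_dir" ] || [ ! -w "$report_dir" ]; then
        echo "Report folder missing or not writable: $report_dir"
        failed=1
    fi
    return "$failed"
}
```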
Additional Resources
For more information about the script and its functionality, refer to the allele_find documentation.