Find Alleles from Amino Query Files
This script runs BLAST analyses of amino acid query sequences against an amino acid
allele database prepared by allele_find to find matching alleles. It then updates the allele database.
Inputs
- Amino acid query files in FASTA format. One query allele per file. Note that the allele naming scheme must match the outputs from the previous scripts
- Amino acid allele database prepared by allele_translate_reduce or allele_find
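Because each query file is expected to hold a single allele, it can help to check the query folder before running the script. The following is a minimal sketch, not part of stec.py: the function names and the list of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def find_invalid_query_files(query_folder, extensions=(".fasta", ".fa", ".faa")):
    """Return {file name: record count} for files without exactly one record.

    The extension list is an assumption; adjust it to match your query files.
    """
    problems = {}
    for path in sorted(Path(query_folder).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            count = count_fasta_records(path)
            if count != 1:
                problems[path.name] = count
    return problems
```

Calling find_invalid_query_files("query") returns an empty dictionary when every FASTA file in the folder contains exactly one record.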
Running the Script
stec.py aa_allele_find --aa_alleles /path/to/aa_allele_folder -r /path/to/output_folder -q /path/to/query_folder
An example with the amino acid alleles in the aa_alleles folder, the amino acid query files in the query folder, the desired reports folder reports, and a cutoff value of 100 (all in the current working directory):
stec.py aa_allele_find --aa_alleles aa_alleles -q query -r reports -c 100
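If the command is launched from a Python pipeline rather than typed by hand, the same call can be assembled as an argument list. This is only a sketch: the helper names are hypothetical, and it assumes stec.py is available on the PATH.

```python
import subprocess


def build_aa_allele_find_command(aa_alleles, query_path, report_path, cutoff=100):
    """Build the argument list for the example command shown above."""
    return [
        "stec.py", "aa_allele_find",
        "--aa_alleles", str(aa_alleles),
        "-q", str(query_path),
        "-r", str(report_path),
        "-c", str(cutoff),
    ]


def run_aa_allele_find(aa_alleles, query_path, report_path, cutoff=100):
    """Run the command; raises CalledProcessError on a non-zero exit code.

    Assumes stec.py can be found on the PATH (an assumption of this sketch).
    """
    command = build_aa_allele_find_command(aa_alleles, query_path, report_path, cutoff)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.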
Usage
usage: stec.py aa_allele_find [-h] [-version] [-v verbosity]
                              [--aa_alleles aa_alleles] [-r report_path]
                              [-q query_path] [-c cutoff]

Analyse amino acid sequences to determine allele complement. Update profiles and databases. Keep notes

optional arguments:
  -h, --help            show this help message and exit
  -version, --version   show program's version number and exit
  -v verbosity, --verbosity verbosity
                        Set the logging level. Options are debug, info, warning, error, and critical. Default is info.
  --aa_alleles aa_alleles
                        Specify name and path of folder containing amino acid alleles. If not provided, the aa_allele folder in the current working directory will be used by default
  -r report_path, --report_path report_path
                        Specify name and path of folder into which reports are to be placed. If not provided, the reports folder in the current working directory will be used
  -q query_path, --query_path query_path
                        Specify name and path of folder containing query files in FASTA format. If not provided, the query folder in the current working directory will be used
  -c cutoff, --cutoff cutoff
                        Specify the percent identity cutoff for matches. Allowed values are between 90 and 100. Default is 100
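To illustrate how the folder defaults and the 90 to 100 cutoff range described above could be enforced, here is a minimal argparse sketch. It is not the actual stec.py implementation: the function names and the program name are hypothetical, and treating the cutoff as an integer is an assumption.

```python
import argparse


def cutoff_value(text):
    """Parse a percent identity cutoff and require 90 <= value <= 100.

    Treating the cutoff as an integer is an assumption of this sketch.
    """
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"cutoff must be an integer, got {text!r}")
    if not 90 <= value <= 100:
        raise argparse.ArgumentTypeError(f"cutoff must be between 90 and 100, got {value}")
    return value


def build_parser():
    """Build a parser mirroring the documented options and their defaults."""
    parser = argparse.ArgumentParser(prog="aa_allele_find_sketch")
    parser.add_argument("--aa_alleles", metavar="aa_alleles", default="aa_allele")
    parser.add_argument("-r", "--report_path", metavar="report_path", default="reports")
    parser.add_argument("-q", "--query_path", metavar="query_path", default="query")
    parser.add_argument("-c", "--cutoff", metavar="cutoff", type=cutoff_value, default=100)
    return parser
```

With this sketch, build_parser().parse_args(["-c", "95"]) yields a cutoff of 95, while a value such as 80 makes argparse print an error and exit.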
Outputs
All amino acid allele files will be updated
reports
- allele_report.tsv: TSV file containing results for each query. Includes sample name, matching allele(s), notes
- gene_name_filtered_alleles.fasta: FASTA-formatted text file containing all filtered query alleles
- gene_name_novel_alleles.fasta: FASTA-formatted text file containing all novel alleles
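For downstream processing, allele_report.tsv can be loaded with Python's csv module. The sketch below assumes the report starts with a header row. It does not assume specific column names, since the exact headers are not listed here. The function name is hypothetical.

```python
import csv


def read_allele_report(report_file):
    """Return (column names, rows) from a tab-separated report.

    Assumes the first line of the file is a header row.
    Each row is a dict keyed by the column names found in that header.
    """
    with open(report_file, "r", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = list(reader.fieldnames or [])
        rows = list(reader)
    return columns, rows
```

For example, columns, rows = read_allele_report("reports/allele_report.tsv") gives the header names in columns and one dictionary per query in rows.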