Hi,
I am trying to use the nf-core/ampliseq Nextflow pipeline (https://github.com/nf-core/ampliseq ) with the multi-region option, which uses SIDLE. For the taxonomy step, SIDLE can use two ready-made databases (SILVA or Greengenes2). Unfortunately, those databases only cover 16S. I am working with plant ITS, plant rbcL, bird 12S and mammal 12S samples.
Are there other databases available for SIDLE that would fit my samples (ITS, rbcL or 12S)?
If none exist, could you advise me on how to build a custom database that works with SIDLE?
Thank you very much for your kind and helpful answer!
On your advice, I used UNITE with my ampliseq pipeline. Unfortunately, the taxonomy is wrong (I got Cercozoa sp. instead of plants). I BLASTed the ASV sequences against NCBI and they match my plants, so I think there may be a problem with the UNITE database I used. I will check this out.
Currently, I have 5 different amplicons: ITS and rbcL, which are amplified separately (simplex PCR), and 12S plus two 16S amplicons, which are amplified together (multiplex PCR). I guess I will need to create 3 different databases (one for ITS, one for rbcL and one for 12S + 16S + 16S). I'll take a closer look at how RESCRIPt works and keep you posted on my progress!
I am also interested in running nf-core/ampliseq for taxonomic assignment of multi-region ITS amplicons using SIDLE. For a custom database, nf-core/ampliseq (v2.11.0) requires three input files: FASTA, aligned FASTA, and taxonomy. The UNITE 10.0 database (QIIME release) includes FASTA and taxonomy files, but no aligned FASTA. How did you manage to create the missing aligned FASTA?
The aligned file is required for tree construction. AFAIK, there isn't an insertion backbone for UNITE right now. So, you should be able to get away with just the sequences and taxonomy, with the clear caveat that you get no phylogeny.
Thank you for the clarification! I can't comment on fungal phylogeny, as I have no prior experience with fungi.
I modified the nf-core/ampliseq pipeline to bypass the requirement for an aligned file. I am targeting the ITS1 and ITS2 regions. Unfortunately, the feature-classifier step for the ITS2 region failed with 'No matches found':
Caused by:
Process `NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT (ITS2,100)` terminated with an error exit status (1)
Command executed:
# https://q2-sidle.readthedocs.io/en/latest/database_preparation.html#prepare-a-regional-database-for-each-primer-set
export XDG_CONFIG_HOME="./xdgconfig"
export MPLCONFIGDIR="./mplconfigdir"
export NUMBA_CACHE_DIR="./numbacache"
#extract sequences
qiime feature-classifier extract-reads \
--p-n-jobs 6 \
--i-sequences db_filtered_sequences.qza \
--p-identity 2 \
--p-f-primer AACTTTYRRCAAYGGATCWCT \
--p-r-primer AGCCTCCGCTTATTGATATGCTTAART \
--o-reads db_ITS2.qza
#prepare to be used in alignment
qiime sidle prepare-extracted-region \
--p-n-workers 6 \
--i-sequences db_ITS2.qza \
--p-region "ITS2" \
--p-fwd-primer AACTTTYRRCAAYGGATCWCT \
--p-rev-primer AGCCTCCGCTTATTGATATGCTTAART \
--p-trim-length 100 \
--o-collapsed-kmers db_ITS2_100_kmers.qza \
--o-kmer-map db_ITS2_100_map.qza
cat <<-END_VERSIONS > versions.yml
"NFCORE_AMPLISEQ:AMPLISEQ:SIDLE_WF:SIDLE_DBEXTRACT":
qiime2: $( qiime --version | sed '1!d;s/.* //' )
qiime2 plugin sidle: $( qiime sidle --version | sed 's/ (.*//' | sed 's/.*version //' )
q2-sidle: $( qiime sidle --version | sed 's/.*version //' | sed 's/)//' )
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Plugin error from feature-classifier:
No matches found
Debug info has been saved to /tmp/qiime2-q2cli-err-hpf5_hoc.log
Analyzing the ITS1 and ITS2 regions independently using the nf-core/ampliseq pipeline (configured with the default single-region setup) works fine.
I'll caveat this with the fact that I've not run SIDLE through Nextflow; I tend to run it locally. So, if it's a Nextflow issue, we may need to check for that.
Would it be possible to share that full output log file so we can check it?
I'm not sure why the command would fail serially but not in the Nextflow workflow. I have some other (potentially stupid) ideas that need more testing but wouldn't be implemented in nf-core.
Traceback (most recent call last):
File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
results = action(**arguments)
File "<decorator-gen-119>", line 2, in extract_reads
File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
output_views = self._callable(**view_args)
File "/opt/conda/envs/sidle-0.1.0-beta/lib/python3.8/site-packages/q2_feature_classifier/_cutter.py", line 215, in extract_reads
raise RuntimeError("No matches found")
RuntimeError: No matches found
The error message seems clear enough: you do not have any sequences that contain both primers.
You should spot-check a few sequences to make sure. Orientation issues and the like could always be involved (though I think this action checks both orientations).
The more likely issue: UNITE has a "developer" version (with untrimmed sequences) and a regular version (trimmed to the ITS domain). The primers sit outside the ITS domain proper, in the conserved SSU, 5.8S or LSU domains. You are probably using the regular version, so the sequences are trimmed and there are no hits.
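To follow up on the spot-check suggestion, here is a minimal Python sketch that tests whether each reference sequence contains the ITS2 forward primer followed by the reverse complement of the reverse primer, in either orientation. It assumes the reference has first been exported to a plain FASTA file (for example with `qiime tools export`, which writes `dna-sequences.fasta` for sequence artifacts); the script name `check_primers.py` and the helper names are hypothetical. It allows only IUPAC degeneracy and no mismatches, so it is stricter than `extract-reads`, which uses an identity threshold. Treat a zero count as a strong hint that the primer sites are missing (as with trimmed UNITE sequences), not as proof.

```python
"""Spot-check whether reference sequences contain both primer sites.

Sketch only: assumes an exported, unaligned FASTA file. Matching is exact
apart from IUPAC degenerate codes (no mismatches are tolerated).
"""
import re
import sys

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# ITS2 primers copied from the SIDLE_DBEXTRACT command in the log above.
FWD = "AACTTTYRRCAAYGGATCWCT"
REV = "AGCCTCCGCTTATTGATATGCTTAART"


def revcomp(seq):
    """Reverse complement, preserving IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer (possibly degenerate) into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def has_amplicon(seq, fwd, rev):
    """True if fwd is followed by revcomp(rev) on either strand of seq."""
    fwd_re = primer_regex(fwd)
    rev_rc_re = primer_regex(revcomp(rev))
    for strand in (seq.upper(), revcomp(seq)):
        hit = fwd_re.search(strand)
        # The reverse primer site must lie downstream of the forward site.
        if hit and rev_rc_re.search(strand, hit.end()):
            return True
    return False


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def main(path):
    total = hits = 0
    for _header, seq in read_fasta(path):
        total += 1
        hits += has_amplicon(seq, FWD, REV)
    print(f"{hits} of {total} sequences contain both primer sites")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, `qiime tools export --input-path db_filtered_sequences.qza --output-path exported` followed by `python check_primers.py exported/dna-sequences.fasta`. Running it on both the regular and the developer UNITE release should show whether trimming explains the missing matches.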