running qiime2 on metagenome-derived illumina mitags

bkolody · January 14, 2021, 2:49am

Hi all!

I'm trying to process 16S sequences derived from Tara Oceans metagenomic assemblies (miTAGs) with qiime2 so that I can compare them with my own 16S marine amplicon data.

I was able to import the data as a 'FeatureData[Sequence]' because I'm dealing with .fna files, not .fastqs.

Now, I'm trying to run dada2 in order to create ASVs, but I'm coming across the problem that dada2 will only run on fastq files.

Is there any way around this, or is it impossible to make ASVs without quality score info? How does qiime2 typically handle metagenome-derived 16S sequences?

I'm running qiime2-2019.4 with conda.

Here is my command:

cat import_fnas.py
#!/usr/bin/env python
import os
os.system("source activate qiime2-2019.4 && qiime tools import \
--input-path tara_mitags_formatted_for_q2.fna \
--output-path tara_mitags.qza \
--type 'FeatureData[Sequence]'")

Here is the error message:
There were some problems with the command:
(1/3) Invalid value for "--i-demultiplexed-seqs": Expected an artifact of at
least type SampleData[PairedEndSequencesWithQuality]. An artifact of type
FeatureData[Sequence] was provided.
(2/3) Missing option "--p-trunc-len-f".
(3/3) Missing option "--p-trunc-len-r".

Thanks!

bkolody · January 14, 2021, 6:39am

Sorry, just realized I included the importing script but not the dada2 command. It was:

cat call_mitag_asvs.py

#!/usr/bin/env python
import os
os.system('source activate qiime2-2019.4 && \
qiime dada2 denoise-paired \
--i-demultiplexed-seqs tara_mitags.qza \
--p-n-threads 28 \
--p-n-reads-learn 1000000 \
--p-chimera-method pooled \
--o-table tara_mitags_16S_table-dada2.qza \
--o-representative-sequences tara_mitags_16S_rep-seqs-dada2.qza \
--o-denoising-stats tara_mitags_16S_stats-dada2.qza')

thermokarst · January 14, 2021, 2:29pm

Thanks @bkolody!

A few notes for you:

The first error is letting you know that you are providing the wrong kind of data to this command. DADA2 needs sequences with quality scores, but you appear to be providing reads sans quality. Taking a step back, I'm not sure if running DADA2 on metagenome data is recommended, and if it is, what kinds of steps you might need to take. I suggest you take a close look at "DADA2: Fast and accurate sample inference from amplicon data with single-nucleotide resolution" for more information.
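
As a quick sanity check before calling DADA2, here is a minimal sketch that guesses whether a file carries quality scores at all. The function name guess_sequence_format is hypothetical and not part of QIIME 2; it relies only on the convention that FASTA headers start with ">" and FASTQ headers start with "@".

```python
def guess_sequence_format(path):
    """Return 'fasta', 'fastq', or 'unknown' based on the first non-blank line."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"  # sequences only, no quality scores
            if line.startswith("@"):
                return "fastq"  # sequences with per-base quality scores
            return "unknown"
    return "unknown"  # empty file
```

A result of 'fasta' for a file such as tara_mitags_formatted_for_q2.fna would be consistent with the type mismatch reported in the first error.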

The second and third errors are letting you know that you have omitted required parameters for this command (--p-trunc-len-f and --p-trunc-len-r), and your script appears to confirm that. Please check out "denoise-paired: Denoise and dereplicate paired-end sequences" (QIIME 2 2020.11.1 documentation) for more information.
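
To illustrate the missing parameters, here is a rough sketch that assembles a denoise-paired command line with both truncation options present. The helper name build_denoise_paired_cmd, the input name demux-paired.qza, and the truncation values are placeholders, not recommendations; the input would still need to be a SampleData[PairedEndSequencesWithQuality] artifact, and real truncation lengths should come from inspecting your own quality profiles.

```python
import shlex


def build_denoise_paired_cmd(seqs, trunc_len_f, trunc_len_r, prefix, n_threads=1):
    """Build the argument list for `qiime dada2 denoise-paired` (hypothetical helper)."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", seqs,
        # The two required options that were missing from the original call.
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-n-threads", str(n_threads),
        "--o-table", f"{prefix}_table-dada2.qza",
        "--o-representative-sequences", f"{prefix}_rep-seqs-dada2.qza",
        "--o-denoising-stats", f"{prefix}_stats-dada2.qza",
    ]


# Placeholder values only; 240 and 200 are not tuned for any dataset.
cmd = build_denoise_paired_cmd("demux-paired.qza", 240, 200, "example", n_threads=4)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) inside an activated QIIME 2 environment would avoid the quoting pitfalls of a single os.system string.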

Finally, I noticed you're invoking q2cli via inline shell commands inside of a python script. While you can certainly do this, you can also "cut out the middleman" so to speak, and run QIIME 2 directly in python. Please see Artifact API — QIIME 2 2020.11.1 documentation for more details.
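
As a rough idea of what that could look like for your import step, here is a minimal sketch using the Artifact API. It assumes it is run inside an activated QIIME 2 environment; the function name import_fasta_as_artifact is hypothetical, and the file names in the commented usage are the ones from your script.

```python
SEQUENCE_TYPE = "FeatureData[Sequence]"


def import_fasta_as_artifact(fasta_path, output_path):
    """Import a FASTA file as a QIIME 2 artifact and save it as a .qza file."""
    # Imported here so this module can be loaded outside a QIIME 2 environment.
    from qiime2 import Artifact

    artifact = Artifact.import_data(SEQUENCE_TYPE, fasta_path)
    artifact.save(output_path)
    return artifact


# Example usage (requires QIIME 2 to be installed):
# import_fasta_as_artifact("tara_mitags_formatted_for_q2.fna", "tara_mitags.qza")
```

This replaces only the `qiime tools import` call; it does not change the fact that DADA2 expects reads with quality scores.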

