I'm trying to process 16S sequences derived from Tara Oceans metagenomic assemblies (miTAGs) with QIIME 2 so that I can compare them as directly as possible with my own 16S marine amplicon data.
I was able to import the data as 'FeatureData[Sequence]', since I'm working with .fna files rather than FASTQ files.
Now I'm trying to run DADA2 to create ASVs, but I've run into the problem that DADA2 only accepts FASTQ data.
Is there any way around this, or is it impossible to make ASVs without quality score information? How does QIIME 2 typically handle metagenome-derived 16S sequences?
Here is the error message:
There were some problems with the command:
(1/3) Invalid value for "--i-demultiplexed-seqs": Expected an artifact of at
least type SampleData[PairedEndSequencesWithQuality]. An artifact of type
FeatureData[Sequence] was provided.
(2/3) Missing option "--p-trunc-len-f".
(3/3) Missing option "--p-trunc-len-r".
The first error here is telling you that you are providing the wrong kind of data to this command: DADA2 needs sequences with quality scores, but you appear to be providing reads without them. Taking a step back, I'm not sure whether running DADA2 on metagenome-derived data is recommended, and if it is, what extra steps you might need to take. For more information, I suggest taking a close look at "DADA2: Fast and accurate sample inference from amplicon data with single-nucleotide resolution".
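To make the quality-score point concrete: DADA2 learns its error model from how often bases at each quality score disagree with the inferred true sequences, so every read needs a per-base score. The sketch below contrasts what a FASTQ record carries with what a FASTA/.fna record carries. It assumes standard 4-line FASTQ with Phred+33 encoding, and the function names are hypothetical, for illustration only.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (read_id, sequence, phred_scores).

    Assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding.
    """
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Each quality character encodes one per-base Phred score: Q = ord(c) - 33
    scores = [ord(c) - 33 for c in qual]
    return header[1:], seq, scores


def parse_fasta_record(lines):
    """Parse one 2-line FASTA (.fna) record into (read_id, sequence, None).

    FASTA has no quality line, so there are no scores to return.
    """
    header, seq = (line.rstrip("\n") for line in lines)
    if not header.startswith(">"):
        raise ValueError("not a FASTA record")
    return header[1:], seq, None


def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


fq = parse_fastq_record(["@read1", "ACGTA", "+", "IIII#"])
fa = parse_fasta_record([">read1", "ACGTA"])
print(fq[2])  # [40, 40, 40, 40, 2]
print(fa[2])  # None
```

The FASTQ record yields one score per base (a Phred 40 base has roughly a 1 in 10,000 chance of being wrong), while the .fna record yields nothing for an error model to learn from.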
Finally, I noticed you're invoking q2cli via inline shell commands inside a Python script. While you can certainly do this, you can also cut out the middleman, so to speak, and run QIIME 2 directly in Python. See the "Artifact API" page of the QIIME 2 2020.11.1 documentation for more details.
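As a rough sketch of what that looks like, here is the denoising step run through the Artifact API instead of shelling out to q2cli. It assumes a QIIME 2 release that ships the q2-dada2 plugin (e.g. 2020.11) and an already-demultiplexed paired-end artifact; the function name `denoise_paired_qza` and the output file names are hypothetical. It does not get around the problem above: a FeatureData[Sequence] artifact imported from .fna would still be rejected.

```python
def denoise_paired_qza(demux_qza, trunc_len_f, trunc_len_r, out_prefix):
    """Run DADA2 denoise-paired through the QIIME 2 Artifact API (sketch).

    demux_qza must be a SampleData[PairedEndSequencesWithQuality] artifact.
    Returns the paths of the saved output artifacts.
    """
    # trunc_len_f / trunc_len_r are required (errors 2 and 3 in the CLI output);
    # 0 disables truncation, negative values are not meaningful.
    for name, value in (("trunc_len_f", trunc_len_f), ("trunc_len_r", trunc_len_r)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")

    # Imported lazily so the argument checks above work without QIIME 2 installed.
    from qiime2 import Artifact
    from qiime2.plugins.dada2.methods import denoise_paired

    demux = Artifact.load(demux_qza)
    # The method checks the input's semantic type and refuses a mismatch,
    # just as q2cli did with the FeatureData[Sequence] artifact.
    result = denoise_paired(
        demultiplexed_seqs=demux,
        trunc_len_f=trunc_len_f,
        trunc_len_r=trunc_len_r,
    )

    # Outputs mirror the CLI: table, representative_sequences, denoising_stats.
    paths = {
        "table": f"{out_prefix}-table.qza",
        "rep_seqs": f"{out_prefix}-rep-seqs.qza",
        "stats": f"{out_prefix}-stats.qza",
    }
    result.table.save(paths["table"])
    result.representative_sequences.save(paths["rep_seqs"])
    result.denoising_stats.save(paths["stats"])
    return paths
```

Working with artifacts as Python objects also lets you inspect them (for example, checking an artifact's type) before passing them to the next method, instead of parsing CLI error text.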