DADA2 pairwise alignments parameter tuning

salias · May 31, 2024, 10:58am

Hi again,

I'm doing the benchmarking and have run into a problem. I'm using the mock ITS community from Bakker, 2018 (3 replicates of an Even mock community, 3 replicates of Staggered A and 3 replicates of Staggered B, Standard PCR conditions). I manually created a TSV (attached, expected-taxonomy.tsv) with the expected taxonomic composition, using the tables in the paper and the taxonomy from the latest UNITE release (sh_general_release_dynamic_s_all_04.04.2024.fasta). I could annotate every species down to species level except one (Candida apicola), which I did not find in the UNITE file and therefore annotated only down to the g__Candida level. I converted the TSV to BIOM and the BIOM to QZA following this part of the fungal ITS tutorial:

Fungal ITS analysis tutorial

Convert to biom and import

biom convert \
  -i expected-taxonomy-mod.tsv \
  -o expected-taxonomy.biom \
  --table-type="OTU table" \
  --to-json
qiime tools import \
  --type FeatureTable[RelativeFrequency] \
  --input-path expected-taxonomy.biom \
  --input-format BIOMV100Format \
  --output-path expected-taxonomy.qza

Then I followed the DADA2 ITS tutorial (the only difference from the general DADA2 tutorial is the use of Cutadapt to remove primers and their reverse complements to prevent read-through). After running Cutadapt and DADA2, I assigned taxonomy directly with the native naive Bayes implementation in DADA2, using the exact same FASTA file I used to build the expected composition manually (sh_general_release_dynamic_s_all_04.04.2024.fasta). Then I exported the ASV table as BIOM and the taxonomy as a text file, and converted both to QZA format following the instructions in the tutorial Importing dada2 and Phyloseq objects to QIIME 2.

Then I followed the rest of the Fungal ITS analysis tutorial: I collapsed the table to the species level, and converted it to relative frequencies. But the evaluate-composition step keeps failing:

Plugin error from quality-control:

  min() arg is an empty sequence
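
For reference, the collapse, relative-frequency and evaluate-composition steps described above correspond roughly to the sketch below. It is an illustration under assumptions, not the exact commands that were run: table.qza, taxonomy.qza, table-l7.qza and evaluate-composition.qzv are placeholder names, the function name evaluate_mock is hypothetical, and level 7 assumes the usual seven UNITE ranks (kingdom to species).

```shell
# Sketch only: table.qza, taxonomy.qza, table-l7.qza and the .qzv name are placeholders.
evaluate_mock() {
  # Collapse ASVs to species level (level 7 assumes k/p/c/o/f/g/s ranks).
  qiime taxa collapse \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 7 \
    --o-collapsed-table table-l7.qza &&
  # Convert counts to relative frequencies.
  qiime feature-table relative-frequency \
    --i-table table-l7.qza \
    --o-relative-frequency-table feature-table-relative.qza &&
  # Compare the observed composition against the expected one.
  qiime quality-control evaluate-composition \
    --i-expected-features expected-taxonomy.qza \
    --i-observed-features feature-table-relative.qza \
    --o-visualization evaluate-composition.qzv
}
```

The commands are chained with `&&` so that a failure in an earlier step stops the later ones from running on stale files.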

Searching the forum for the same error, I found (e.g. here) that others got it because the expected composition table was not built correctly, and also because they used different databases for the observed and expected tables. However, I think my expected table is built correctly, and I used the same database for the taxonomic assignment and for building the expected table. I don't know where I am going wrong. Any ideas? Thanks in advance.
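
One check that might narrow this down is to list the taxonomy labels that the expected and observed tables have in common. This is a minimal sketch, assuming the error appears when the two tables share no labels (which is what the threads mentioned above suggest, not something confirmed here). It also assumes both tables are available as TSV files with the taxonomy string in the first column and header lines starting with `#`. The function name shared_labels and the file name observed-taxonomy.tsv are hypothetical.

```shell
# Print taxonomy labels (first TSV column) found in both files.
# Lines starting with '#' are treated as headers and skipped.
shared_labels() {
  awk -F'\t' '
    /^#/ { next }                    # skip header/comment lines in both files
    NR == FNR { seen[$1] = 1; next } # first file: remember each label
    ($1 in seen) { print $1 }        # second file: print labels seen before
  ' "$1" "$2" | sort -u
}

# Example call (observed-taxonomy.tsv is a placeholder name):
# shared_labels expected-taxonomy.tsv observed-taxonomy.tsv
```

Empty output would mean the label strings never match exactly, for example because of differences in rank prefixes, trailing ranks or whitespace.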

Best
expected-taxonomy.qza (8.8 KB)
expected-taxonomy.tsv (4.6 KB)
feature-table-relative.qza (27.1 KB)