I am using QIIME 2 for the first time to analyze ITS data (Illumina MiSeq V3, 300 bp paired-end reads) on fungal endophyte diversity in plant roots.
I received several folders from the sequencing facility containing sequences at different stages of pre-processing. I started the pipeline by importing the primer-clipped sequences -> trimming with Q2-ITSxpress -> DADA2 to identify sequence variants (truncation length set to 0 because the data quality was good).
At the end, I got the "taxa-bar-plots.qzv" file and, to my surprise, there was very little diversity in my samples. I have seen the same pattern in a different data set that I analyzed. My question is: is there a chance that I am doing something wrong and ending up with this classification?
I know that some of my negative controls show contamination, but even so, I should be able to get a more refined taxonomic assignment, right?
Off the top of my head, I'm not sure. I would recommend sharing the exact commands that you ran, as well as the actual .qzv you mentioned, if possible. That will make it much easier for somebody to spot-check your analysis.
Thank you for your reply. I tried a different database and got different results (see attached taxa-bar-plot.qzv), but many taxa are still unidentified. What I did differently this time was to switch to a more recent UNITE database (sh_qiime_release_s_04.02.2020.tar.gz).
I'm also not sure which version of UNITE I should use: the one that "Includes singletons set as RefS (in dynamic files)" or the one that "Includes global and 97% singletons", or whether it makes no difference.
Hi @Danilo_Reis ,
Two hypotheses for the low diversity: either (1) too many reads were lost during QC, or (2) your reads are hitting junk reads in the UNITE database, e.g., abnormally short seqs.
Troubleshooting/solutions:
Look at your DADA2 stats and feature table summaries, keeping an eye on whether and where reads are lost. If you are losing many reads during merging, analyze single-end reads instead of paired-end.
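One way to see where reads drop out is to tabulate the per-step counts from the DADA2 denoising stats. The sketch below is a minimal Python example, assuming the stats have been exported to a tab-separated table with a `sample-id` column and the count columns `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`. The sample names and counts are made up for illustration, and `biggest_loss` and `summarize` are hypothetical helpers, not part of QIIME 2.

```python
import csv
import io

# Made-up example table; real values would come from the exported DADA2 stats.
# The column names are an assumption about the exported file.
EXAMPLE_STATS = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "root_A\t50000\t45000\t44000\t12000\t11500\n"
    "root_B\t40000\t20000\t19500\t19000\t18000\n"
)

# Processing steps in the order reads pass through them.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def biggest_loss(row):
    """Return (step, fraction_lost) for the step that discards the
    largest share of the reads entering it."""
    worst_step, worst_frac = None, 0.0
    for prev, cur in zip(STEPS, STEPS[1:]):
        before, after = int(row[prev]), int(row[cur])
        if before == 0:
            continue  # nothing entered this step, so nothing to compare
        frac = (before - after) / before
        if frac > worst_frac:
            worst_step, worst_frac = cur, frac
    return worst_step, worst_frac


def summarize(tsv_text):
    """Map each sample ID to its (step, fraction_lost) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    report = {}
    for row in reader:
        # Skip a "#q2:types"-style directive line if one is present.
        if row["sample-id"].startswith("#"):
            continue
        report[row["sample-id"]] = biggest_loss(row)
    return report


if __name__ == "__main__":
    for sample, (step, frac) in summarize(EXAMPLE_STATS).items():
        print(f"{sample}: largest loss at '{step}' "
              f"({frac:.0%} of reads entering that step)")
```

In this made-up table, root_A loses most of its reads at the merging step, which is the situation where trying single-end reads would be worthwhile, while root_B loses most of its reads at quality filtering.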
Hi @Nicholas_Bokulich ,
Thank you so much for your reply. I think I figured out the problem. I tried two different UNITE databases: one including only fungal sequences and another with all eukaryote sequences.
The sequences that were previously not assigned to any fungal phylum are actually plant sequences.
I'm looking at root-associated fungal communities in 10 different plant species and this happened in some of them, especially those that are known to be less colonized by fungi.
I'll filter out the plant sequences and work only with the fungal ones. I just don't know whether this low number of fungal reads is enough to compare my samples. What do you think?
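To judge whether the remaining fungal reads are enough, it may help to tally the fungal reads per sample once the non-fungal features are dropped. Below is a rough Python sketch, assuming UNITE-style taxonomy strings that begin with `k__Fungi` and a small in-memory table. The feature IDs, counts, and the `min_depth` threshold are hypothetical placeholders, not recommendations.

```python
# Hypothetical taxonomy assignments keyed by feature ID
# (UNITE-style strings are assumed).
TAXONOMY = {
    "asv1": "k__Fungi;p__Ascomycota",
    "asv2": "k__Viridiplantae;p__Anthophyta",
    "asv3": "k__Fungi;p__Basidiomycota",
    "asv4": "Unassigned",
}

# Hypothetical read counts: sample -> {feature ID -> reads}.
COUNTS = {
    "plant_1": {"asv1": 800, "asv2": 9000, "asv3": 150, "asv4": 50},
    "plant_2": {"asv1": 20, "asv2": 9900, "asv3": 0, "asv4": 80},
}


def fungal_depth(counts, taxonomy, prefix="k__Fungi"):
    """Sum reads per sample over features whose taxonomy starts with prefix."""
    fungal_ids = {fid for fid, tax in taxonomy.items() if tax.startswith(prefix)}
    return {
        sample: sum(n for fid, n in feats.items() if fid in fungal_ids)
        for sample, feats in counts.items()
    }


def flag_shallow(depths, min_depth):
    """Return the samples whose fungal read total is below min_depth."""
    return sorted(s for s, d in depths.items() if d < min_depth)


if __name__ == "__main__":
    depths = fungal_depth(COUNTS, TAXONOMY)
    print(depths)
    # min_depth=500 is an arbitrary illustration, not a recommended cutoff.
    print("below threshold:", flag_shallow(depths, min_depth=500))
```

Samples flagged this way could be examined separately before deciding on a sampling depth for comparing diversity across plant species.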