Different results between QIIME2 and other platforms

Humberto_Aponte · November 24, 2020, 3:32pm

Hi QIIME2 community,

I ask for help since I have a very strange situation with my data.

The context: my data are Illumina 16S sequences from soil bacterial communities. I processed them with QIIME2 and got results similar to those from the sequencing company, which also processed the data in QIIME2.

The problem: for this study, an expert collaborator ran the bioinformatic analysis in Unix (Ubuntu), and we realized that our results are the opposite. In summary, we got opposite trends for alpha diversity! Also, in QIIME2 I got half as many sequences as the collaborator. I suspected mislabeled samples, but that's not the problem. Has anyone experienced something like this?

I realized that: 1) I obtained approx. 7000 features after denoising and he got approx. 14000; 2) we got similar trends up to the rarefaction step, but after that the outputs show opposite results; he rarefies in R, while I did all the analysis in QIIME2.

I attach part of my script in case someone wants to check it.

Thanks!!!

Script_QIIME2_forum.txt (5.4 KB)

jwdebelius · November 24, 2020, 3:55pm

Hi @Humberto_Aponte,

Without seeing the two workflows with parameters side by side, it's hard to diagnose where things may differ. So, did you both denoise with DADA2? Did you use the same trimming parameters? How did you pick rarefaction depths? What were the pre-filtering parameters? All these things will make a difference in the final community, so without an in-depth comparison of how the data was processed for each dataset, it's hard to figure out why the results differ.

Best,
Justine

Humberto_Aponte · November 24, 2020, 5:00pm

Hi @jwdebelius,

Yes, there are many details; sorry if my description was unclear. I have answered your questions below.

did you use the same trimming parameters?

No, I think. I explained the QIIME2 suggestions about trimming parameters to the collaborator, but he does not apply a fixed-length cut-off; instead, he finds matches to the actual primers. He also truncates from the 3’-end by defining a minimum Q-value, so I do not know what parameters he used.
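
For anyone comparing the two trimming styles, here is a minimal Python sketch of the difference. It is not the collaborator's code (his parameters are unknown to me) and not how QIIME2 implements truncation; the function names, the Phred+33 offset, the rule that short reads are discarded, and the toy read are all assumptions for illustration.

```python
def truncate_fixed(seq, qual, trunc_len):
    """Fixed-length cut-off: keep the first trunc_len bases.

    Assumption in this sketch: reads shorter than trunc_len are discarded (None).
    """
    if len(seq) < trunc_len:
        return None
    return seq[:trunc_len], qual[:trunc_len]


def trim_3prime_by_quality(seq, qual, min_q, offset=33):
    """Quality-based trimming: strip bases from the 3' end while Q < min_q.

    qual is the FASTQ quality string; offset=33 assumes Phred+33 encoding.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIII###"

fixed = truncate_fixed(read_seq, read_qual, trunc_len=8)
by_quality = trim_3prime_by_quality(read_seq, read_qual, min_q=20)
print(fixed)       # keeps one low-quality base
print(by_quality)  # length depends on this read's quality profile
```

With a fixed cut-off every retained read has the same length, while quality-based trimming gives each read its own length, which is one reason the two pipelines may not produce comparable sequences.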

did you both denoise with DADA2?

No. I denoised with DADA2 in QIIME2, but the collaborator did it in Unix as follows: primer trimming (cutadapt), paired-end read merging, quality filtering, and chimera removal (vsearch). He does not use DADA2.

How did you pick rarefaction depths?

I followed the QIIME2 recommendations based on quality plots and chose the highest possible value that avoids losing samples. The collaborator performs rarefaction in R based on 100 iterations but, as far as I can see in his script, he does not use a "sampling depth". As I understand from his R script, he obtains several rarefaction tables, which are merged to get median values; however, I am not sure about it.
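
To make the difference concrete, here is a rough Python sketch of the two rarefaction styles as I understand them: one subsample at a fixed depth versus the median over many subsamples. It is only an illustration. The function names, the toy counts, and the explicit `depth` argument are assumptions; his R script may choose the depth implicitly, and I have not reproduced it here.

```python
import random
from collections import Counter
from statistics import median


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer than `depth` reads (sample is lost).
    """
    if sum(counts.values()) < depth:
        return None
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def median_rarefied_richness(counts, depth, iterations=100, seed=0):
    """Median observed richness over repeated rarefactions of one sample."""
    rng = random.Random(seed)
    values = []
    for _ in range(iterations):
        subsample = rarefy(counts, depth, rng)
        if subsample is None:
            return None
        values.append(observed_features(subsample))
    return median(values)


# Hypothetical sample, not data from this study.
sample = {"asv_a": 50, "asv_b": 30, "asv_c": 15, "asv_d": 4, "asv_e": 1}

single = rarefy(sample, depth=50, rng=random.Random(1))
print(observed_features(single))                   # one draw, varies with the seed
print(median_rarefied_richness(sample, depth=50))  # summary of 100 draws
```

A single draw can land above or below the median for rare features, so the two approaches need the same depth before their alpha diversity values can be compared.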

What were the pre-filtering parameters?

In both cases, we filtered organelles and Eukarya. Also, I filtered by frequency (min = 10) and rare features (present in min = 2 samples). The collaborator applied the following filters:

threshold_rare = 0.001 # minimum percent abundance of ASV in whole dataset
threshold_sparse = 2 # minimum number of samples where ASV occurs
threshold_outlier = 0.1 # ratio from second most abundant to most abundant sample for a given ASV
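
To show how differently the two filter sets can behave, here is a small Python sketch that applies both to the same toy table. It is my reading of the comments on the three thresholds, not the collaborator's actual code, and the second function is only a plain-Python stand-in for my frequency and sample filters, not the QIIME2 implementation. The function names, the table layout (feature -> counts per sample), and the counts are hypothetical.

```python
def filter_collaborator_style(table, threshold_rare=0.001,
                              threshold_sparse=2, threshold_outlier=0.1):
    """Keep features that pass all three thresholds (assumed interpretation).

    table maps feature id -> list of counts, one per sample (at least 2 samples).
    threshold_rare is read here as a percentage of all reads in the dataset.
    """
    grand_total = sum(sum(counts) for counts in table.values())
    kept = {}
    for feature, counts in table.items():
        percent = 100.0 * sum(counts) / grand_total
        if percent < threshold_rare:
            continue  # too rare in the whole dataset
        if sum(1 for c in counts if c > 0) < threshold_sparse:
            continue  # present in too few samples
        top, second = sorted(counts, reverse=True)[:2]
        if top > 0 and second / top < threshold_outlier:
            continue  # dominated by a single sample
        kept[feature] = counts
    return kept


def filter_frequency_style(table, min_frequency=10, min_samples=2):
    """Keep features with enough total reads and enough samples."""
    return {
        feature: counts
        for feature, counts in table.items()
        if sum(counts) >= min_frequency
        and sum(1 for c in counts if c > 0) >= min_samples
    }


# Hypothetical counts for three samples.
toy_table = {
    "asv1": [500, 400, 300],
    "asv2": [0, 0, 1],     # singleton in one sample
    "asv3": [900, 5, 3],   # almost entirely in one sample
    "asv4": [10, 12, 0],
}

print(sorted(filter_collaborator_style(toy_table)))
print(sorted(filter_frequency_style(toy_table)))
```

In this toy case the outlier threshold removes a feature that the frequency filter keeps, so the two filtered tables already differ before any diversity metric is computed.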

As you can see, there are many differences, but even so, we expected similar trends, and the most concerning part is the literally opposite trend. Since I don't understand his bioinformatic pipeline well, I attached my QIIME2 pipeline so that at least my process can be checked.

Many thanks for your help!! And sorry for the long explanation.

Regards!

jwdebelius · November 24, 2020, 7:54pm

Hi @Humberto_Aponte,

Thanks for the detailed description! You're not comparing the same data processing or diversity approaches. I might look at primer trimming using the sequences if I were you, on the off chance there is variation due to length there. The quality filtering without denoising or clustering may inflate the number of sequences because it's not able to distinguish error from true signal - which is the promise of the denoising algorithms. It likely also left a lot more singletons/doubletons, which DADA2 removes.
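
As a toy illustration of that last point (made-up counts, not your data), the Python sketch below shows how low-count features alone can reverse a richness comparison. Dropping features with fewer than three reads is only a crude stand-in for what a denoiser does, and the function names and sample contents are hypothetical.

```python
def observed_richness(counts):
    """Number of features with at least one read in a {feature: count} dict."""
    return sum(1 for n in counts.values() if n > 0)


def drop_low_count(counts, min_count=3):
    """Remove singletons and doubletons (features with fewer than min_count reads)."""
    return {feature: n for feature, n in counts.items() if n >= min_count}


# Sample A: 10 abundant features plus 30 singletons that could be sequencing errors.
sample_a = {f"a{i}": 100 for i in range(10)}
sample_a.update({f"noise{i}": 1 for i in range(30)})

# Sample B: 20 moderately abundant features and no singletons.
sample_b = {f"b{i}": 50 for i in range(20)}

before = (observed_richness(sample_a), observed_richness(sample_b))
after = (observed_richness(drop_low_count(sample_a)),
         observed_richness(drop_low_count(sample_b)))

print(before)  # (40, 20): A looks richer while the singletons are kept
print(after)   # (10, 20): B looks richer once they are removed
```

If error-derived sequences are more common in one group of samples than another, this kind of reversal is possible between a pipeline that keeps them and one that removes them.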

Best,
Justine

Humberto_Aponte · November 26, 2020, 1:34pm

Hi @jwdebelius,

Many thanks for your answer. I also removed primers with cutadapt using the sequences, but the differences are still there. I will try several parameters and let you know if I solve the mystery. Many thanks for your help.

Regards,
