Good morning. I am a student using QIIME 2 in a project to characterize the microbiome of protozoa. My tutors have not started using it yet, so they are running the same analysis as me but with QIIME 1. I know the results will not be directly comparable, but I have some doubts I would like to understand better.
Starting with QIIME 2 alone: I have been given the sequences of my samples both as joined reads and as separate forward and reverse reads. It is not clear to me whether I should expect the same results (OTU clustering) from the two imports, i.e. the joined reads denoised with Deblur versus the forward+reverse reads denoised with DADA2 (with the same parameters).
For taxonomic assignment I am using classify-consensus-vsearch. I want to compare my results with those my colleagues obtained using QIIME 1. I have never used QIIME 1, so I would like to know which taxonomy assignment method in QIIME 1 is most similar to classify-consensus-vsearch.
Since I start from the unjoined sequences, I use DADA2 for the denoising step. After this step I get a table of OTUs, but it is not clear to me how the clustering works. Are the OTUs formed at 99% similarity? I don't quite understand whether it is possible to choose how the clustering is done. I have seen that the following commands exist:
qiime vsearch cluster-features-de-novo
qiime vsearch cluster-features-open-reference
qiime vsearch cluster-features-closed-reference
But I don't know whether they can be used after DADA2. I'm a bit lost on this.
Another question concerns the number of OTUs, which differs a lot between our analyses: with QIIME 1 we get up to 4 times as many as with QIIME 2. As I said, I use DADA2, while my colleagues pick OTUs by combining open-reference and de novo picking. Hence my question above about whether I can choose how the OTUs are generated, so that they are as similar as possible to my colleagues'. I know the results are not directly comparable, but it is very strange that the final number of OTUs is so different.
OTU picking is based on clustering similar sequences (typically at 97% similarity) into a group, whereas DADA2 and Deblur use the ASV approach, in which sequences that differ by even a single nucleotide each get a unique ID. I recommend reading this
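To make the difference concrete, here is a toy sketch (not QIIME code) that counts features on the same input both ways: exact unique sequences, as in the ASV view, versus greedy centroid clustering at 97% identity, roughly in the style of de novo OTU picking. It assumes hypothetical, equal-length, pre-aligned reads and scores identity as the fraction of matching positions; real tools such as vsearch align the sequences instead.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions in equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances, threshold=0.97):
    """Simplified abundance-sorted greedy centroid clustering.

    Sequences are visited from most to least abundant; each joins the first
    centroid it matches at >= threshold, otherwise it becomes a new centroid.
    Returns a mapping from each sequence to its centroid.
    """
    centroids = []
    membership = {}
    for seq, _count in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                membership[seq] = centroid
                break
        else:
            centroids.append(seq)
            membership[seq] = seq
    return membership


def mutate(seq, pos, base):
    """Hypothetical helper: substitute one base to fake a single-nucleotide variant."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical 100 bp sequences with made-up abundances.
ref = "ACGT" * 25
v1 = mutate(ref, 0, "T")   # 1 mismatch vs ref -> 99% identity
v2 = mutate(ref, 1, "A")   # 1 mismatch vs ref -> 99% identity
far = "TGCA" * 25          # unrelated sequence
abundances = {ref: 50, v1: 20, v2: 10, far: 30}

print(len(abundances))                                     # 4 ASVs
print(len(set(greedy_otus(abundances, 0.97).values())))    # 2 OTUs
```

The two single-nucleotide variants stay separate features in the ASV view but fall into the same 97% OTU as the reference sequence, which is exactly the distinction described above.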
Well, you have already done that step, so I think it is not possible.
I recommend reading the DADA2 paper to see how ASVs are inferred.
I agree with @Sreevatshan: I generally prefer DADA2 to Deblur in most cases (though there are a few places where Deblur solves all my problems). The key point for you is that if you plan to use the joined reads, you need to use Deblur.
In QIIME 2, a naive Bayes classifier (classify-sklearn) will be closer to the QIIME 1 implementation, if I remember correctly. I think you'd need to run it against Greengenes 13_8 to get a matching reference.
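For contrast with the naive Bayes approach, this is a rough sketch of the idea behind a consensus classifier such as classify-consensus-vsearch: collect the taxonomies of the top database hits for a query, then walk down the ranks and keep a label only while a large enough fraction of hits agrees on the lineage. It is a simplified illustration, not the plugin's code; the hit taxonomies are invented, and the 0.51 threshold is what I believe the plugin's --p-min-consensus defaults to.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Toy consensus assignment over the top hits for one query sequence.

    At each rank, count how many hits share the same lineage down to that
    rank; keep the most common lineage if its fraction of all hits reaches
    min_consensus, otherwise stop and report the last agreed lineage.
    """
    split_hits = [[rank.strip() for rank in t.split(";")] for t in hit_taxonomies]
    n_hits = len(split_hits)
    consensus = []
    for level in range(max(len(h) for h in split_hits)):
        # Compare whole lineages truncated at this rank, not isolated labels.
        lineages = Counter(tuple(h[: level + 1]) for h in split_hits if len(h) > level)
        lineage, count = lineages.most_common(1)[0]
        if count / n_hits < min_consensus:
            break
        consensus = list(lineage)
    return ";".join(consensus) if consensus else "Unassigned"


# Hypothetical top-3 hits for one query.
hits = [
    "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria",
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "k__Bacteria; p__Firmicutes; c__Bacilli",
]
print(consensus_taxonomy(hits))  # k__Bacteria;p__Proteobacteria
```

Here two of three hits agree down to phylum (about 67%), but no class is shared by a majority, so the assignment stops at phylum.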
This is a good explanation, thanks @Sreevatshan. I do think it's worth noting that these are two different techniques that bring you to the same end. ASV generation does not perform clustering; it's closer to a really good quality-filtering approach (not strictly true, but probably the most analogous). I think the docs are a good place to start. You might also find this video on denoising useful.
You can absolutely cluster denoised sequences. Whether you want to is a matter of debate. (I'm generally in the ASV >>> OTU camp, but lots of people disagree.) It depends on what your goals are.
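Conceptually, clustering denoised sequences means grouping the ASVs into OTUs and summing their counts in each sample, which is roughly what qiime vsearch cluster-features-de-novo does with the DADA2 feature table and representative sequences. The sketch below is a toy version under loud assumptions: a hypothetical table stored as a plain dict of sample to {sequence: count}, equal-length pre-aligned sequences, and identity scored as the fraction of matching positions rather than from a real alignment.

```python
from collections import defaultdict


def identity(a, b):
    """Toy identity: fraction of matching positions in equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_feature_table(table, threshold=0.97):
    """Collapse a toy ASV table {sample: {asv_seq: count}} into an OTU table.

    ASVs are visited in order of total abundance across all samples; each
    joins the first existing centroid it matches at >= threshold, otherwise
    it becomes a new centroid. Counts of merged ASVs are summed per sample.
    """
    totals = defaultdict(int)
    for counts in table.values():
        for seq, n in counts.items():
            totals[seq] += n

    centroids, assign = [], {}
    for seq in sorted(totals, key=lambda s: -totals[s]):
        match = next((c for c in centroids if identity(seq, c) >= threshold), None)
        if match is None:
            centroids.append(seq)
            match = seq
        assign[seq] = match

    otu_table = {}
    for sample, counts in table.items():
        collapsed = defaultdict(int)
        for seq, n in counts.items():
            collapsed[assign[seq]] += n
        otu_table[sample] = dict(collapsed)
    return otu_table


# Hypothetical ASVs: a reference, a 1-mismatch variant, and an unrelated sequence.
ref = "ACGT" * 25
var = "T" + ref[1:]
other = "TGCA" * 25
table = {"S1": {ref: 40, var: 5}, "S2": {var: 10, other: 25}}

otus = cluster_feature_table(table, threshold=0.97)
print({s: sorted(c.values()) for s, c in otus.items()})  # {'S1': [45], 'S2': [10, 25]}
```

Per-sample totals are unchanged; only the number of features drops, because the variant ASV is folded into the reference OTU at 97%.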
This is discussed in the benchmarking papers I linked above.