I used my 16S rRNA paired-end reads to compare methods for joining paired-end reads, including QIIME2 (vsearch), QIIME1.9 (fastq-join), QIIME1.9 (SeqPrep), and usearch10. All parameters were left at their defaults. The results are shown below:
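| Sample | No. of raw reads | QIIME2 (vsearch) | QIIME1.9 (fastq-join) | QIIME1.9 (SeqPrep) | usearch10 |
|---------|-------|-------|-------|-------|------|
| sample1 | 31487 | 7883 | 20566 | 28966 | 1689 |
| sample2 | 43406 | 11435 | 29377 | 39957 | 2458 |
| sample3 | 33070 | 8937 | 22612 | 30736 | 2226 |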
Why do the results vary so much among these methods? Which method should I choose to analyze my data?
Thanks!
Probably because the default parameters and algorithms are very different between these methods.
The high variability makes me think something else is going on.
Did you perform any kind of read trimming or quality filtering prior to joining?
You should evaluate the joining, too, if you are concerned that they may be joining differently. Look at read length distributions and compare those to your expected amplicon length distribution (keep in mind that some level of length variation exists for 16S, so there is a distribution).
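To make the length check concrete, here is a minimal sketch for tabulating read lengths from a joined FASTQ file. It assumes a standard four-line-per-record FASTQ (plain or gzipped); the function names and the expected length range you pass in are placeholders for illustration, not part of any of the tools discussed.

```python
import gzip
from collections import Counter


def read_lengths(fastq_path):
    """Count sequence lengths in a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.strip())] += 1
    return counts


def summarize(counts, expected_min, expected_max):
    """Report how many joined reads fall inside the expected amplicon length range."""
    total = sum(counts.values())
    in_range = sum(
        n for length, n in counts.items() if expected_min <= length <= expected_max
    )
    fraction = in_range / total if total else 0.0
    return {"total": total, "in_range": in_range, "fraction_in_range": fraction}
```

Running this on the output of each joining method and comparing `fraction_in_range` should show whether one method is producing many joins that are much longer or shorter than your amplicon.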
You could also process all of these through QIIME2 to see if, e.g., you get more unclassified reads from the QIIME1 methods (implying that these are bad joins). As an alternative, use vsearch or another global aligner to see how well these align to full-length 16S reads.
If length distribution and taxonomic abundances/alignment look okay, I'd use the method that gives me the most reads!
I think there are several other threads in the QIIME 1 forum. But these should help get you started. You can browse through mergepairs parameters for usearch. Also check out this post on why there may be differences between vsearch and usearch.
In brief, I try to trim off the low-quality tails prior to joining the reads. This reduces the number of mismatches; if there are too many, the merge will fail. This is one reason why DADA2 takes this approach, as outlined here. Also, it is generally a good idea to remove primers from each of the reads prior to merging, especially if you get read-through, which can also cause mismatches during merging. In that case you may need to check whether both of your primers appear in each read separately.
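As a rough illustration of what quality-based tail trimming does to a single read, here is a small sketch. It assumes Phred+33 encoding and cuts at the first base scoring at or below the threshold; the function name is made up for this example, and the exact rule each real tool applies (which end it scans from, whether the comparison is strict) should be checked in that tool's documentation.

```python
def truncate_at_low_quality(seq, qual, threshold, phred_offset=33):
    """Cut a read at the first base whose Phred score is <= threshold.

    Illustrative rule only; real tools may define the cut point differently.
    """
    for i, ch in enumerate(qual):
        if ord(ch) - phred_offset <= threshold:
            return seq[:i], qual[:i]
    return seq, qual  # no low-quality base found, read is left intact


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33
trimmed_seq, trimmed_qual = truncate_at_low_quality("ACGTACGT", "IIIII#II", threshold=5)
print(trimmed_seq)  # ACGTA
```

The higher the threshold, the earlier the cut tends to fall, so each read gets shorter and the pair has less sequence left to overlap.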
Thanks!
I tried changing the parameters to make them the same. QIIME1.9 (fastq-join and SeqPrep) and usearch10 gave similar results, but vsearch still showed high variance. I removed the primers from the data prior to joining, and didn't do any quality filtering.
Thank you!
I tested trimming off the low-quality bases using the parameter '--p-truncqual' in QIIME2 and '-fastq_trunctail' in usearch10. The results are shown below:
The results are pretty similar in the Q=5-10 range. You did not test lower for usearch (default is obviously Q=5) but I'd expect similar.
Read yield drops off in the low range because, as @SoilRotifer described, there is too much bad sequence there, preventing suitable alignment.
Read yield drops off in the high range because you would be trimming off too much sequence, so the reads are unlikely to overlap at all: there are no overlapping tails left.
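The high-range drop-off can be checked with simple arithmetic: for an amplicon of a given length, the forward and reverse reads overlap by roughly forward length + reverse length - amplicon length. A minimal sketch, assuming no read-through and using made-up numbers (the 460 bp amplicon, the read lengths, and the minimum overlap of 10 are all hypothetical, not taken from this dataset or from any tool's defaults):

```python
def overlap_length(fwd_len, rev_len, amplicon_len):
    """Expected overlap between a read pair; negative means a gap between the reads."""
    return fwd_len + rev_len - amplicon_len


def can_merge(fwd_len, rev_len, amplicon_len, min_overlap):
    """True if the pair overlaps by at least min_overlap bases."""
    return overlap_length(fwd_len, rev_len, amplicon_len) >= min_overlap


# Untrimmed 2 x 250 reads on a hypothetical 460 bp amplicon
print(overlap_length(250, 250, 460))   # 40
print(can_merge(250, 250, 460, 10))    # True

# After aggressive quality trimming the same pair no longer reaches across
print(overlap_length(220, 200, 460))   # -40
print(can_merge(220, 200, 460, 10))    # False
```

If a tool still reports a successful join for a pair where this number is negative, that join deserves a closer look.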
The higher range does not correspond between usearch and vsearch. This really comes down to differences between those algorithms (QIIME 2 is just wrapping vsearch here, not doing anything special), which as far as I know are supposed to be very similar (perhaps not for read joining, though).
Without seeing other evaluation evidence (length distribution, classification/alignment to reference) I would actually trust the vsearch yields more here, based on my explanation above. usearch might just be gluing together two reads that don't actually overlap; there may be different minimum overlap parameters or something along those lines.
No, probably not. I think this still looks like different parameters and/or possibly differences in the algorithms.
In any case (again, lacking evals like read length and alignment to reference), joining with --p-truncqual in the Q=5-10 range looks pretty good! Looks like you've done a good evaluation to figure out what works best for your data!
As @Nicholas_Bokulich mentioned, QIIME 2 is simply wrapping vsearch. You may have missed that I had updated my original post with this additional caveat:
This should help to answer your questions about usearch vs vsearch merging.
Thanks.
I have understood most of the points about the question. However, I used the same data and the same parameters to test QIIME2 (vsearch) and usearch. In QIIME2 (vsearch), as Q increased (except from 0 to 1), the read yield dropped off. In usearch, as Q increased, the read yield increased. Is the algorithm the only explanation?
Probably yes. As I explained above (or at least predicted, since I have not seen read length distribution results to evaluate):
So check out the length distributions after joining. I suspect usearch may be gluing together two non-overlapping reads. The vsearch results make more sense to me, because aggressive Q-score trimming will lead to shorter sequences and lower likelihood of successful joining.