DADA2 outputs extremely short rep-seqs

omera.WIS · November 26, 2018, 6:05pm

Hey all,
I’m running qiime on fungal ITS2 amplicon samples.
I pre-merge the reads with PEAR (I know it is not advisable, but it prevents massive data loss), trim the primers with cutadapt and then pass them to the qiime pipeline.
Cutadapt discards reads shorter than 80 bp, and no truncation or trimming is applied in DADA2, yet I still end up with a few very short rep-seqs (less than 50 bp, the shortest 20-30 bp).

What could be the cause of this issue?

Thanks!

thermokarst · November 27, 2018, 2:00pm

DADA2 is not designed to work on pre-joined reads — this violates assumptions for the effective use of this algorithm. There are many posts floating around this forum describing this (as well as in the DADA2 paper and docs), so you should be able to get your hands on more detailed material. I don’t really understand how you can wind up with such short reads given the steps you have described. Have you tried running deblur? That algorithm isn’t impacted by pre-joined reads. Otherwise, you could try OTU clustering.
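
To narrow down where the short sequences come from, it may help to list exactly which records fall below the 80 bp cutoff. Below is a minimal sketch in plain Python. It is not part of QIIME 2 or DADA2, the helper names `read_fasta` and `short_records` are made up for illustration, and it assumes the rep-seqs have been exported to a FASTA file.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def short_records(lines, min_len=80):
    """Return (header, length) for every sequence shorter than min_len.

    min_len defaults to 80 to match the cutadapt cutoff described above.
    """
    return [(h, len(s)) for h, s in read_fasta(lines) if len(s) < min_len]


# Small in-memory example; for a real file, pass an open file handle instead,
# e.g. `with open("rep-seqs.fasta") as fh: ...` (hypothetical file name).
example = [">seq_a", "ACGT" * 25, ">seq_b", "ACGT" * 5]
print(short_records(example))
```

Running the same check on the reads that go into DADA2 and on the rep-seqs that come out would show whether the short sequences are already present in the input or only appear after denoising.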

omera.WIS · November 29, 2018, 1:40pm

I know it’s against recommendations, but as far as I understand from topics here and in the DADA2 forum, it is done in practice because it saves many reads from being discarded.

ITS2 amplicons vary in length (and tend to have lower quality than usual) so deblur can’t rescue me here.

Regarding vsearch, as far as I understand, it takes a single FASTA file as input? How can I process all of my samples together?

Thanks

thermokarst · November 29, 2018, 10:04pm

I don't agree with that. Using pre-joined reads in DADA2 causes problems because of the quality scores at the overlapping nucleotides: what do those quality scores even mean after PEAR has gotten through with them?

Ah bummer.

Not quite --- you need at least a FeatureTable[Frequency] artifact and a FeatureData[Sequence] artifact, and, if you want to do open or closed ref clustering, a reference database (FeatureData[Sequence]).

You can dereplicate your SampleData[JoinedSequencesWithQuality] to get the FeatureTable[Frequency] and FeatureData[Sequence]. Keep us posted!
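
For anyone wondering what dereplication produces conceptually, here is a minimal sketch in plain Python. It is not the q2-vsearch implementation: the function name `dereplicate` and the use of a SHA-1 hash of each sequence as its feature ID are assumptions made for illustration. It collapses identical sequences across samples into one feature list (the analogue of FeatureData[Sequence]) and one per-sample count table (the analogue of FeatureTable[Frequency]).

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(samples):
    """Collapse identical sequences across samples.

    samples: dict mapping sample ID -> list of sequence strings.
    Returns (table, features), where
      table:    feature ID -> {sample ID: count}
      features: feature ID -> sequence
    Feature IDs are SHA-1 hashes of the sequence (an assumption for this sketch).
    """
    table = defaultdict(Counter)
    features = {}
    for sample_id, seqs in samples.items():
        for seq in seqs:
            fid = hashlib.sha1(seq.encode("ascii")).hexdigest()
            features[fid] = seq
            table[fid][sample_id] += 1
    return {fid: dict(counts) for fid, counts in table.items()}, features


# Toy input: two samples sharing one sequence.
samples = {"s1": ["ACGT", "ACGT", "TTTT"], "s2": ["ACGT"]}
table, features = dereplicate(samples)
for fid, counts in table.items():
    print(features[fid], counts)
```

The point of the sketch is that all samples are handled in one pass, so the clustering step afterwards sees a single table and a single set of sequences rather than one file per sample.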

omera.WIS · December 13, 2018, 10:43am

Well, I did as you advised, and after dereplicating my samples I ran qiime vsearch cluster-features-open-reference more than 3 days ago and it hasn't finished yet!
I let it use 3 threads (I have a 4-core processor), and currently qiime is consuming only one core but ~15 GB of RAM. Is that reasonable?

Thanks!

thermokarst · December 14, 2018, 2:09am

Hey there @omera.WIS!

I am not sure! I am not the developer of vsearch, so my experience is limited to just what I have done with it in relation to q2-vsearch, but I suppose this doesn't surprise me. Let it keep going, hopefully it'll be done soon!
