DADA2 outputs extremely short rep-seqs


(Omer) #1

Hey all,
I’m running QIIME 2 on fungal ITS2 amplicon samples.
I pre-merge the reads with PEAR (I know this is not advisable, but it prevents massive data loss), trim the primers with cutadapt, and then pass the merged reads to the QIIME 2 pipeline.
Cutadapt discards reads shorter than 80 bp, and I apply no truncation or trimming in DADA2, yet I still end up with a few very short rep-seqs (less than 50 bp, some as short as 20-30 bp).

What could be the cause of this issue?

Thanks!


(Matthew Ryan Dillon) #2

DADA2 is not designed to work on pre-joined reads — this violates assumptions for the effective use of this algorithm. There are many posts floating around this forum describing this (as well as in the DADA2 paper and docs), so you should be able to get your hands on more detailed material. I don’t really understand how you can wind up with such short reads given the steps you have described. Have you tried running deblur? That algorithm isn’t impacted by pre-joined reads. Otherwise, you could try OTU clustering.
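
To narrow down where the short sequences first appear, it may help to check sequence lengths at each stage (for example, on FASTA exports of the cutadapt output and of the rep-seqs). Below is a minimal sketch in plain Python; the function names and the 80 bp default are illustrative assumptions, not part of QIIME 2, DADA2, or cutadapt.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open FASTA handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def short_sequences(handle, min_length=80):
    """Return (sequence_id, length) for every record shorter than min_length.

    min_length=80 mirrors the cutadapt cutoff mentioned in this thread;
    it is a hypothetical default, so adjust it to your own settings.
    """
    return [
        (seq_id, len(seq))
        for seq_id, seq in read_fasta(handle)
        if len(seq) < min_length
    ]


# Tiny in-memory example; in practice, pass open("your_file.fasta") instead.
example = StringIO(">seq1\nACGT\nACGT\n>seq2\n" + "A" * 100 + "\n")
print(short_sequences(example))
```

Running the same check before and after denoising should show whether the short sequences are already present in the input or only appear in the rep-seqs.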


(Omer) #3

I know it’s against recommendations, but as far as I understand from topics here and in the DADA2 forum, it is done in practice because it saves many reads from being discarded.

ITS2 amplicons vary in length (and tend to have lower quality than usual) so deblur can’t rescue me here.

Regarding vsearch, as far as I understand, it takes individual FASTA files as input? How can I process all of my samples together?

Thanks


(Matthew Ryan Dillon) #4

(Matthew Ryan Dillon) #5

I don’t agree with that — using the pre-joined reads in DADA2 causes problems because of the quality scores at the overlapping nts — what do those quality scores even mean after PEAR has gotten through with it?

Ah bummer.

Not quite — you need at least a FeatureTable[Frequency] artifact and a FeatureData[Sequence] artifact, and, if you want to do open or closed ref clustering, a reference database (FeatureData[Sequence]).

You can dereplicate your SampleData[JoinedSequencesWithQuality] to get the FeatureTable[Frequency] and FeatureData[Sequence]. Keep us posted! :t_rex:
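
To make the two outputs of dereplication concrete, here is a conceptual sketch in plain Python. It is not the q2-vsearch implementation: the `dereplicate` function, the dictionary layout, and the use of a SHA-1 hash of the sequence as the feature ID are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(reads_by_sample):
    """Collapse identical reads into features.

    reads_by_sample maps a sample ID to a list of read sequences
    (hypothetical input layout). Returns two dictionaries:
      table     -> {feature_id: {sample_id: count}}  (like a frequency table)
      sequences -> {feature_id: sequence}            (like the rep-seqs)
    """
    table = defaultdict(Counter)
    sequences = {}
    for sample_id, reads in reads_by_sample.items():
        for read in reads:
            seq = read.upper()
            # Assumption: identify each unique sequence by its SHA-1 hash.
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            sequences[feature_id] = seq
            table[feature_id][sample_id] += 1
    return {fid: dict(counts) for fid, counts in table.items()}, sequences


# Two samples sharing one sequence; the case difference is ignored.
table, sequences = dereplicate(
    {"sample1": ["ACGT", "acgt", "TTTT"], "sample2": ["ACGT"]}
)
for feature_id, counts in table.items():
    print(sequences[feature_id], counts)
```

The frequency table and the sequence set produced this way correspond in spirit to the FeatureTable[Frequency] and FeatureData[Sequence] artifacts that the clustering step takes as input.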


(Matthew Ryan Dillon) #6

(Omer) #7

Well, I did as you advised. After dereplicating my samples, I started qiime vsearch cluster-features-open-reference more than 3 days ago, and it still hasn’t finished!
I let it use 3 threads (I have a 4-core processor), but qiime is currently consuming only one core and ~15 GB of RAM. Is that reasonable?

Thanks!


(Nicholas Bokulich) #8

(Matthew Ryan Dillon) #9

Hey there @omera.WIS!

I am not sure! I am not the developer of vsearch, so my experience is limited to just what I have done with it in relation to q2-vsearch, but I suppose this doesn’t surprise me. Let it keep going, hopefully it’ll be done soon!

:t_rex: :qiime2:


(Matthew Ryan Dillon) #10

(system) #11

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.