Trimming data inside or outside qiime2?


Do you think it is better to trim paired-end reads before starting the QIIME 2 analysis, or should I run the trimming inside QIIME 2 using the cutadapt plugin?

Thank you.

First of all, I would say it depends on the size of the overlapping region. I don't quality-trim PE reads because our protocol only yields about 30 overlapping nts, so trimming even 10 nts (which sounds like just a few) could prevent the reads from merging. On the other hand, with an overlap of 50+ nts I would trim the reads, either inside or outside QIIME 2; that is up to you.
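To make the overlap argument concrete, here is a small arithmetic sketch. It assumes the forward and reverse reads start at opposite ends of the same amplicon, so the overlap is simply the read length left after 3' trimming minus the amplicon length. The amplicon lengths (270 and 250 nt) and the 12 nt minimum overlap are hypothetical values chosen for illustration, not numbers from this thread.

```python
def overlap_after_trim(amplicon_len: int, fwd_len: int, rev_len: int,
                       fwd_trim: int = 0, rev_trim: int = 0) -> int:
    """Expected overlap (nt) between forward and reverse reads after removing
    fwd_trim / rev_trim bases from their 3' ends.

    Assumes both reads start at opposite ends of the amplicon; a negative
    result means the reads no longer reach each other.
    """
    return (fwd_len - fwd_trim) + (rev_len - rev_trim) - amplicon_len


def can_merge(overlap: int, min_overlap: int = 12) -> bool:
    # min_overlap is an illustrative threshold, not a value taken from the thread.
    return overlap >= min_overlap


# 2x150 reads on a hypothetical 270 nt amplicon leave 30 nt of overlap.
print(overlap_after_trim(270, 150, 150))                     # 30
# Trimming 10 nt from the 3' end of *each* read costs 20 nt of overlap.
print(overlap_after_trim(270, 150, 150, 10, 10))             # 10
print(can_merge(overlap_after_trim(270, 150, 150, 10, 10)))  # False
# A shorter (250 nt) amplicon starts with 50 nt of overlap and survives the same trim.
print(can_merge(overlap_after_trim(250, 150, 150, 10, 10)))  # True
```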
As for primer trimming, you should definitely do it, but again, whether you run cutadapt inside or outside QIIME 2 is up to you.
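A rough sketch of what 5' primer removal amounts to, similar in spirit to an anchored 5' primer search in cutadapt but much simpler: the primer must sit at the very start of the read, degenerate IUPAC bases are honoured, a small number of mismatches is tolerated, and indels are ignored. The primer GTGYCAGCMGCCGCGGTAA (a 515F variant) and the function names are only examples; substitute your own primers.

```python
# IUPAC nucleotide codes -> bases each code can match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(primer_base: str, read_base: str) -> bool:
    """True if a read base is compatible with a (possibly degenerate) primer base."""
    return read_base in IUPAC[primer_base]


def trim_5prime_primer(read: str, primer: str, max_mismatches: int = 1):
    """Remove `primer` from the start of `read`.

    Returns the trimmed read, or None if the primer is not found at the 5'
    end (such reads would usually be discarded). Gap-free matching only.
    """
    if len(read) < len(primer):
        return None
    mismatches = sum(not matches(p, r) for p, r in zip(primer, read[:len(primer)]))
    if mismatches > max_mismatches:
        return None
    return read[len(primer):]


primer = "GTGYCAGCMGCCGCGGTAA"              # example primer only
read = "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGTGC"
print(trim_5prime_primer(read, primer))     # TACGGAGGGTGC
```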
All the best

I have 2x150 bp demultiplexed paired-end libraries (80 libraries in total, ~300 GB). The libraries contain Nextera XT adapters. Do you think it's necessary to trim the adapters before the QIIME 2 analysis?

Yes, I do. I've been trimming adapters because they are so commonplace in 16S reads that it is complicated to account for this part of the sequence when assigning taxonomy, etc. Just to illustrate: I had some samples where Bacillus megaterium was at about 25%, but I knew it was not in there. We investigated and found that the classification was being skewed toward B. megaterium because of the primers. It was literally classifying based on the primers (the first ~20 nts)! Once I removed them, the classification changed.
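With 2x150 reads, whenever the DNA fragment between the adapters is shorter than 150 nt the read runs through into the adapter at its 3' end, which is one reason adapter trimming matters. The sketch below cuts a read at the first full adapter match, or removes a partial adapter hanging off the 3' end. The default sequence CTGTCTCTTATACACATCT is the one commonly listed for trimming Nextera-based libraries, but treat it as an assumption and check it against your kit documentation; real trimmers such as cutadapt or BBDuk also tolerate mismatches.

```python
# Commonly listed Nextera read-through sequence; verify against your kit documentation.
NEXTERA_READTHROUGH = "CTGTCTCTTATACACATCT"


def trim_3prime_adapter(read: str, adapter: str = NEXTERA_READTHROUGH,
                        min_partial: int = 3) -> str:
    """Cut a read where adapter read-through begins (exact matches only).

    1. If the full adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends in a prefix of the adapter that is at
       least `min_partial` nt long, remove that partial adapter.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest possible adapter prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_partial - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


insert = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCG"  # made-up biological sequence
print(trim_3prime_adapter(insert + NEXTERA_READTHROUGH + "ATGTGTATAAG") == insert)  # True
print(trim_3prime_adapter(insert + "CTGTC") == insert)  # True (partial adapter)
print(trim_3prime_adapter(insert) == insert)            # True (nothing to trim)
```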


OK Leo, thank you for your reply. Just one question: did you trim your input files inside the QIIME 2 pipeline with cutadapt, or beforehand with another trimmer such as Trimmomatic, BBDuk…?

I trim the primers and low-quality bases (for single-end data) outside QIIME 2 because that's what I did even before working with QIIME 2… so literally because I am comfortable with it :slight_smile: I use cutadapt for primer removal, which is available in QIIME 2, and Trimmomatic for quality trimming (I'm not sure whether that one is in QIIME 2).
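For reference, this is a simplified version of the idea behind Trimmomatic's SLIDINGWINDOW step: scan from the 5' end and cut once the mean quality of a window drops below a threshold. It is not Trimmomatic's exact algorithm, and the window size (4), the Q20 threshold and the Phred+33 encoding are illustrative assumptions.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integers."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean: float = 20.0) -> tuple[str, str]:
    """Cut the read at the start of the first window (scanning from the 5' end)
    whose mean Phred score is below `min_mean`. Reads shorter than the window
    are returned unchanged."""
    scores = phred_scores(quality)
    for start in range(0, max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], quality[:start]
    return seq, quality


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```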

Okay, thank you. I use BBDuk, which combines adapter/primer removal and quality trimming.
After trimming the files, do you use DADA2 or Deblur for denoising?

I use DADA2 and have also run some tests with VSEARCH. They gave very similar results, so I stuck with DADA2 and that's what I've been working with. However, DADA2 is pretty slow.

All right, I think that's what I'm using too.
However, I don't know how to link the metadata in my metadata.tsv file to my demultiplexed, cleaned data. I have already asked this question on the forum, but they didn't know. If you have an idea, please let me know.

Hi @lmanchon,
@lca123 is right on the money that you should trim your primers and any other non-biological bits (5' end) before denoising (DADA2 or Deblur), and that how much you truncate from the other end (3') depends on your overlap region. DADA2 does have options to trim from both the 5' and 3' ends, which is quite handy, so if you're just looking to remove primers that sit at a fixed position at the start of every read, you can do it all in one step. If you have other non-biological bps such as your adapters, you'll need to get rid of those before DADA2. In short, get rid of everything you synthetically created, up to and including your primers.
Cutadapt is a very powerful tool that lets you do all this, but you can certainly do it outside of QIIME 2 if you prefer. The benefit of doing it within QIIME 2 is convenience: you automatically keep track of exactly what you did via the provenance tab, which makes sharing and reproducing your workflow easier. I doubt there will be any appreciable difference between these trimming tools, since the task is pretty straightforward. Perhaps some differences in run time? :man_shrugging:
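To illustrate the position-based trimming mentioned above, here is a plain-Python sketch of the semantics of DADA2's trim/truncate options (in `qiime dada2 denoise-paired` these are `--p-trim-left-f/r` and `--p-trunc-len-f/r`): truncate at a fixed position, discard reads shorter than that position, and drop a fixed number of bases from the 5' end. This is not q2-dada2 code, and it only removes primers correctly if they start at position 0 of every read and have a known length.

```python
from typing import Optional


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Position-based trimming in the style of DADA2's trim/trunc options.

    - trunc_len > 0: reads shorter than trunc_len are discarded (None);
      longer reads are cut to their first trunc_len bases.
    - trunc_len == 0: no truncation or length filtering.
    - The first trim_left bases (e.g. a primer of known length) are removed.
    """
    if trunc_len:
        if len(read) < trunc_len:
            return None
        read = read[:trunc_len]
    return read[trim_left:]


print(trim_and_truncate("AAAACCCCGG", 4, 8))  # CCCC
print(trim_and_truncate("ACGT", 0, 10))       # None (shorter than trunc_len)
print(trim_and_truncate("ACGT", 1, 0))        # CGT
```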
I will note, however, that if you plan to use DADA2 for denoising, you shouldn't do any prior quality filtering, since DADA2 does this internally, and doing so may in fact affect its error-model building step.

Trimmomatic is not available in QIIME 2; only cutadapt is.

This is up to you and a matter of preference; perhaps this paper comparing these methods can help.
I will point out, however, that if you want to use Deblur instead of DADA2, you'll have to merge your paired-end reads first, because Deblur works only on single-end sequences (e.g. forward reads) and, unlike DADA2, doesn't merge pairs internally.
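For context, merging means finding where the end of the forward read overlaps the reverse complement of the reverse read and stitching the two together. The sketch below does this with exact or near-exact matching only; real mergers (for example VSEARCH, or DADA2's internal merging) use alignment and quality scores. It also assumes the amplicon is longer than each read, and the example sequence and 8 nt minimum overlap are made up.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12, max_mismatches: int = 0):
    """Merge a read pair via the longest acceptable overlap between the end of
    `fwd` and the start of reverse_complement(rev). Returns None if none is found.
    At mismatching positions the forward base is kept (no quality-aware consensus)."""
    rc = reverse_complement(rev)
    # Try the longest possible overlap first, down to min_overlap.
    for ov in range(min(len(fwd), len(rc)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-ov:], rc[:ov]))
        if mismatches <= max_mismatches:
            return fwd + rc[ov:]
    return None


amplicon = "GATTACAGGCTTACCGATCG"              # made-up 20 nt amplicon
fwd = amplicon[:14]                            # 14 nt forward read
rev = reverse_complement(amplicon)[:14]        # 14 nt reverse read
print(merge_pair(fwd, rev, min_overlap=8) == amplicon)  # True (8 nt overlap)
print(merge_pair(fwd, rev, min_overlap=12))             # None (overlap too short)
```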
Lastly, I wouldn't compare OTU clustering methods such as VSEARCH with ASV methods such as DADA2/Deblur; in my opinion they are too fundamentally different. Unless you have a specific reason to do otherwise, I would stick with DADA2 or Deblur.
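A toy contrast between the two ideas: ASV methods report each distinct (denoised) sequence as its own feature, while OTU methods collapse sequences that fall within an identity threshold of a centroid. This sketch skips denoising entirely and uses simple per-position identity on equal-length sequences as a stand-in for the alignment-based identity that tools such as VSEARCH compute; the 97% threshold and the sequences are illustrative.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_variants(reads):
    """ASV-style view: every distinct sequence is its own feature."""
    return Counter(reads)


def greedy_otus(reads, threshold: float = 0.97):
    """OTU-style view: process sequences from most to least abundant; each joins
    the first centroid it matches at >= threshold, otherwise it becomes a new one."""
    centroids = {}
    for seq, n in Counter(reads).most_common():
        for c in centroids:
            if identity(seq, c) >= threshold:
                centroids[c] += n
                break
        else:
            centroids[seq] = n
    return centroids


base = "ACGT" * 25                        # 100 nt
variant = base[:50] + "A" + base[51:]     # differs at one position (99% identity)
reads = [base] * 10 + [variant] * 3
print(len(exact_variants(reads)))         # 2 features
print(len(greedy_otus(reads)))            # 1 feature
```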


Does the DADA2 step use the metadata file, or is it only used in the taxonomy step? I don't know how QIIME 2 incorporates the information stored in my metadata file.

Hey there @lmanchon, we have already answered this general question when you asked here:

Have you had a chance to read and practice the Moving Pictures tutorial? Besides the links we provided in the other topic, this tutorial demonstrates several different ways that metadata is incorporated throughout an analysis.
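As far as I understand, metadata is not attached to the reads during import or denoising; it is joined to downstream results (feature tables, diversity results, visualizations) through the sample IDs, so the IDs in metadata.tsv must match the sample IDs your reads were imported under. The sketch below checks that match. The file-name pattern `<sample-id>_R1.fastq.gz` is a hypothetical convention, and the parser assumes the first line is the header, the first column holds the IDs, and later lines starting with `#` (such as a `#q2:types` directive) can be skipped.

```python
import csv
import io
import re

# Hypothetical file-name convention: "<sample-id>_R1.fastq.gz" / "<sample-id>_R2.fastq.gz".
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_R[12]\.fastq(?:\.gz)?$")


def metadata_sample_ids(tsv_text: str) -> set[str]:
    """IDs from the first column of a metadata TSV (header on line 1,
    later '#' lines treated as comments/directives and skipped)."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows)  # skip the header row
    return {row[0] for row in rows if row and not row[0].startswith("#")}


def fastq_sample_ids(filenames) -> set[str]:
    """Sample IDs parsed from FASTQ file names that follow FASTQ_NAME."""
    ids = set()
    for name in filenames:
        m = FASTQ_NAME.match(name)
        if m:
            ids.add(m.group("sample"))
    return ids


metadata = (
    "sample-id\tsite\tdepth\n"
    "#q2:types\tcategorical\tnumeric\n"
    "S01\triver\t1.5\n"
    "S02\tlake\t3.0\n"
    "S03\tlake\t2.0\n"
)
files = ["S01_R1.fastq.gz", "S01_R2.fastq.gz", "S02_R1.fastq.gz",
         "S02_R2.fastq.gz", "S04_R1.fastq.gz", "S04_R2.fastq.gz"]

meta_ids = metadata_sample_ids(metadata)
read_ids = fastq_sample_ids(files)
print(sorted(meta_ids & read_ids))  # ['S01', 'S02'] linked samples
print(sorted(read_ids - meta_ids))  # ['S04'] reads without a metadata row
print(sorted(meta_ids - read_ids))  # ['S03'] metadata rows without reads
```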


Okay, I'm going to look at the Moving Pictures tutorial.
Thank you.


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.