DADA2 vs Cutadapt

Mehrbod_Estaki · October 14, 2020, 8:43am

Hi @Rui,
Welcome to the forum! And thanks for providing such details of your issue.

I don't recall any discussions that favored one method over the other. Can you link to those threads so I can have a read? As far as I'm concerned there really isn't a preference for one over the other. The DADA2 trimming function is a convenience, but it is limited in functionality, so not all datasets can be handled with it. Cutadapt, on the other hand, provides quite a bit more flexibility and additional features for removing primers/adapters, so let's see what works best for you. In general though, the two should give the same output as long as the reads actually start with the primers and not something else like spacers or stacked primers.

First, when you include the --p-discard-untrimmed parameter you are discarding reads that don't have the primer in them. When you trim a fixed number of bases with DADA2, you retain everything regardless of what your reads start with. I prefer discarding the reads that don't have the primers in them, because to me those are more likely to be junk reads anyway. But that is one difference between the two approaches you've tried.

Trial 1 and Trial 4: The only difference here is the 1 nt you remove from the 5' end in Trial 1. I'm not sure what the purpose was, but those 2 runs look very similar to me and the difference is negligible. I'd personally not trim that 1 nt.

Trial 1 and Trial 3: The reason you are seeing more reads from Trial 1 than Trial 3 is that you are truncating more from the 3' end of your reverse reads, which are generally poor in quality. By getting rid of those poor quality tails, you retain more reads during the filtering step (Table 3).

Trial 2 and Trial 4: There are a couple of differences here. Your starting input is somewhat higher when you don't use cutadapt (Trial 2), meaning that there were a bunch of reads that didn't have primers in them but still went through DADA2. As I mentioned, these can be bad reads or some sort of contaminant, so they may be artificially inflating your # of unique features. Another possibility is that there is something else before your primers, so that trimming a fixed number of bases (DADA2) produces extra unique features. This isn't an issue when you trim with cutadapt, because it gets rid of everything before the primers along with them.
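
To make those two effects concrete, here is a toy sketch in Python. It is not how cutadapt or DADA2 are implemented: the primer and the reads are made up, and the matching is exact (real cutadapt tolerates mismatches and has many more options). It only illustrates why fixed-length trimming can end up with more unique sequences than primer-based trimming with untrimmed reads discarded.

```python
def trim_by_primer(reads, primer, discard_untrimmed=True):
    """Remove the primer and everything before it (exact match only).

    Reads without the primer are dropped when discard_untrimmed is True,
    otherwise they are passed through unchanged.
    """
    trimmed = []
    for read in reads:
        start = read.find(primer)
        if start != -1:
            trimmed.append(read[start + len(primer):])
        elif not discard_untrimmed:
            trimmed.append(read)
    return trimmed


def trim_fixed(reads, n):
    """Remove the first n bases from every read, whatever they are."""
    return [read[n:] for read in reads]


PRIMER = "ACGTAC"  # made-up 6-nt primer, for illustration only
reads = [
    "ACGTACTTGGCC",    # starts with the primer
    "GGACGTACTTGGCC",  # 2-nt spacer before the primer
    "CCCCCCCCAAAA",    # no primer at all (junk or contaminant)
]

by_primer = trim_by_primer(reads, PRIMER)
by_length = trim_fixed(reads, len(PRIMER))

print("primer-based:", by_primer, "unique:", len(set(by_primer)))
print("fixed-length:", by_length, "unique:", len(set(by_length)))
```

With primer-based trimming the two real reads collapse to the same sequence and the junk read is dropped. With fixed-length trimming the spacer read keeps part of its primer and the junk read survives, so three distinct sequences come out.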

As you can see, you aren't really comparing apples to apples here, so it's hard to say what the exact source of the differences is, but my personal recommendation is to go with Trial 4. You could probably even increase your yield a bit more by truncating a bit more from the 3' end in that run.

Here's a quick calculation:
806 - 515 = 291 nt amplicon size
2 x 250 nt run - 291 nt amplicon = 209 nt overlap region
209 - 12 nt (min overlap for DADA2) - 20 nt (extra for natural length variation, to be safe) = 177 nt max truncation

Based on this you can technically truncate up to 177 nt combined from your forward and reverse reads and still have enough left to merge your reads. Since this would be getting rid of poor quality tails, it should (from experience) lead to more reads being retained during the filtering step and ultimately more reads at the end.
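
If you want to play with the numbers, the arithmetic above can be sketched in a few lines of Python. The function and parameter names are just ones I made up for illustration, and the 20 nt safety margin is my own rule of thumb rather than anything built into DADA2.

```python
def max_total_truncation(fwd_pos, rev_pos, read_len,
                         min_overlap=12, safety_margin=20):
    """Combined number of bases that can be truncated from the forward
    and reverse reads while still leaving enough overlap to merge.

    fwd_pos / rev_pos: primer positions (e.g. 515 and 806)
    read_len: cycles per read (e.g. 250 for a 2x250 run)
    min_overlap: minimum overlap needed for merging (12 nt, as above)
    safety_margin: extra bases kept for natural length variation
                   (assumed value, adjust to taste)
    """
    amplicon = rev_pos - fwd_pos          # 806 - 515 = 291
    overlap = 2 * read_len - amplicon     # 500 - 291 = 209
    return overlap - min_overlap - safety_margin


def within_budget(removed_fwd, removed_rev, budget):
    """True if the bases removed from both reads fit in the budget."""
    return removed_fwd + removed_rev <= budget


budget = max_total_truncation(515, 806, 250)
print("max combined truncation:", budget)

# Example: remove 30 nt from the forward and 100 nt from the reverse reads
print("30 + 100 fits:", within_budget(30, 100, budget))
```

Keep in mind this counts bases removed from the 3' ends, so a truncation length of 220 on a 250 nt read counts as 30 nt removed.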

Hope this helps.