Questions about 16S data from Novogene (UK)

colinvwood · July 20, 2023, 6:16pm

Since the read 1 and read 2 adapters can be found towards the 3' end of some F and R reads, I assume I should use the --p-adapter-f and --p-adapter-r parameters of qiime cutadapt trim-paired to remove them both from F and R reads (removing also all downstream bases).

Be careful when passing multiple trimming sequences to cutadapt at once; see this post for an example of why. In this case it shouldn't really matter, because it would be exceedingly strange to see two adapters in a single read, but I would probably still trim them separately or use the --p-times option.

Since the V3–V4 F primer is present in some of my F reads near the 5' end, but not exactly at the 5' end, and the V3–V4 R primer is present in some of my R reads near the 5' end, but not exactly at the 5' end, I would have thought it best to remove the primers and all upstream sequence (assuming that the sequence downstream of the primer might be biological sequence from the 16S rRNA gene).

Yes, you are right. The upstream comment I made was referring to the adapters, which you only expect to see at the 3' end. At the 5' end you remove the preceding sequence, not the subsequent sequence, as the cutadapt help text says.
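To make the 5' versus 3' distinction concrete, here is a toy sketch in Python. It is not how cutadapt works internally (cutadapt does error-tolerant alignment, this only does exact matching), and the function names and sequences are made up for illustration.

```python
def trim_front(read: str, primer: str) -> str:
    """5' (front) trimming: drop the primer and everything upstream of it."""
    i = read.find(primer)
    # Leave the read untouched if the primer is not found.
    return read if i == -1 else read[i + len(primer):]


def trim_adapter(read: str, adapter: str) -> str:
    """3' adapter trimming: drop the adapter and everything downstream of it."""
    i = read.find(adapter)
    # Leave the read untouched if the adapter is not found.
    return read if i == -1 else read[:i]


# Hypothetical read: 2 leading bases, a 5-base "primer", a 14-base insert,
# a 7-base "adapter", then 4 trailing bases. None of these are real sequences.
read = "NN" + "CCTAC" + "GATTACAGATTACA" + "AGATCGG" + "TTTT"

after_primer = trim_front(read, "CCTAC")               # primer + upstream removed
after_adapter = trim_adapter(after_primer, "AGATCGG")  # adapter + downstream removed

print(after_primer)
print(after_adapter)
```

In this toy case only the 14-base insert is left after both steps, which is the behaviour you want from the front and adapter options respectively.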

By the way, I would run the primer trimming before the adapter trimming, not the other way around. This is because you expect the primers to be nested within the adapters (see the diagrams of the library fragments above), so trimming primers will in most cases also trim the adapters.

I was thinking of also doing some quality trimming at the 3' ends of the reads.

Those sequences do not look like they need any quality filtering. Even if the quality were lower towards the end of the sequences, you could set the truncation positions in dada2 rather than dedicating an entire extra step solely to quality filtering.

This is possible, but dada2 takes quality scores into account in its algorithm, so this would mostly be personal preference.

I also thought about adding a --p-minimum-length parameter (maybe --p-minimum-length 180) to remove very short reads. I seem to have a small number of very short reads (shortest F read length = 24 nt, shortest R read length = 31 nt) in my data. Is it worth removing those, or will q2-dada2 deal with them?

Dada2 will deal with them by discarding all reads shorter than the truncation length you provide for each read direction.
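As a rough illustration of that rule (a sketch of the length filter only, not dada2's actual code), the Python below keeps reads that reach the truncation length and cuts them to exactly that length. The 24 nt and 31 nt lengths are the ones from your question; the 250 nt read and the truncation length of 220 are made-up values, and the function name is hypothetical. Treating 0 as "no truncation" follows the q2-dada2 convention.

```python
def truncate_reads(reads, trunc_len):
    """Keep reads of at least trunc_len bases, each cut to exactly trunc_len.

    A trunc_len of 0 is treated as "no truncation", so nothing is cut
    or discarded by this step.
    """
    if trunc_len == 0:
        return list(reads)
    return [r[:trunc_len] for r in reads if len(r) >= trunc_len]


# Two very short reads (24 and 31 nt) plus one hypothetical full-length read.
reads = ["A" * 24, "C" * 31, "G" * 250]

kept = truncate_reads(reads, trunc_len=220)  # 220 is an illustrative value
print(len(kept), [len(r) for r in kept])
```

With any truncation length above 31, both short reads are dropped here without needing a separate --p-minimum-length step.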