Reads processing with different primers

Nicholas_Bokulich · January 30, 2018, 3:26pm

Hi @Lu_Yang,
You have a few different options here:

1. @Mehrbod_Estaki's suggestion is excellent (thank you for the suggestion!): comparing datasets amplified with different primers is indeed what q2-fragment-insertion was designed for (to my knowledge). I would think that plugin is most advantageous, however, when comparing datasets with non-overlapping amplicons, so you have other options.
2. The process that you describe, trimming the longer reads to 515f-806r, is absolutely okay. Of course you lose the additional sequence information, but it sounds like that is not important here. You would trim the reads, process each dataset separately with dada2/deblur, then merge the results into a single feature table and a single representative sequence file. Then classify taxonomy on the merged sequences.
3. You could process separately (without trimming), then classify taxonomy separately. Then collapse both feature tables at level 7 (species) taxonomy and merge those tables. All downstream analyses (e.g., diversity analyses) would then be based on taxonomic information, and the precise primer site does not matter too much. However, this is definitely the weakest option of the three, because (1) the longer reads may yield deeper taxonomic assignments, so collapsing at species level may still yield very different profiles between the datasets, and (2) collapsing on taxonomy reduces the amount of information you have, i.e., diversity analyses with ASVs are much more sensitive for differentiating sub-groups within your data. This approach would really only be advantageous when trying to merge datasets from very different amplicons (e.g., 16S rRNA genes and a protein-coding gene).
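
To make the collapse-then-merge idea concrete, here is a minimal plain-Python sketch. It is not QIIME 2 code (in QIIME 2 these steps would be done with `qiime taxa collapse` and `qiime feature-table merge`); the function names, the nested-dict table layout, the `__` padding for missing ranks, and the toy IDs, counts, and taxonomy strings are all assumptions made up for illustration.

```python
from collections import defaultdict

def collapse_to_level(table, taxonomy, level=7):
    """Sum counts of features that share the same taxonomy down to `level`.

    table:    {feature_id: {sample_id: count}}  (hypothetical layout)
    taxonomy: {feature_id: "k__...;p__...;..."}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, sample_counts in table.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # Assumption of this sketch: pad unassigned ranks with "__"
        ranks += ["__"] * (level - len(ranks))
        key = ";".join(ranks[:level])
        for sample_id, count in sample_counts.items():
            collapsed[key][sample_id] += count
    return {k: dict(v) for k, v in collapsed.items()}

def merge_tables(*tables):
    """Merge collapsed tables, summing counts where row and sample overlap."""
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for key, sample_counts in table.items():
            for sample_id, count in sample_counts.items():
                merged[key][sample_id] += count
    return {k: dict(v) for k, v in merged.items()}

# Toy data (hypothetical): the same genus seen by two different amplicons.
GENUS = ("k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
         "f__Lactobacillaceae;g__Lactobacillus")

# Shorter amplicon: two ASVs, both classified only to genus.
short_table = {"asv1": {"A1": 10}, "asv2": {"A1": 4}}
short_taxonomy = {"asv1": GENUS, "asv2": GENUS}

# Longer amplicon: one ASV, classified to species.
long_table = {"asv9": {"B1": 7}}
long_taxonomy = {"asv9": GENUS + ";s__reuteri"}

collapsed_short = collapse_to_level(short_table, short_taxonomy)
collapsed_long = collapse_to_level(long_table, long_taxonomy)
merged = merge_tables(collapsed_short, collapsed_long)

for taxon, counts in sorted(merged.items()):
    print(taxon, counts)
```

In this toy case the two ASVs from the shorter amplicon are summed into a single row, which is the loss of resolution described in caveat (2). The merged table also ends up with two separate rows for what may be the same organism, because only the longer reads reached species level, which is caveat (1).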

I hope that helps!