Possible Analysis Pipeline for Ion Torrent 16S Metagenomics Kit Data in QIIME2?

Nicholas_Bokulich · April 24, 2020, 6:49pm

I am not 100% certain of the semantics there, but I think the point was that reads should not be pre-joined if they are imported in that format.

Your reads are all forward or reverse because in this case F/R refer to the read direction on the sequencing instrument, not the orientation relative to the genome (which is mixed in this case).

So you are doing the right thing, and this is the correct format.

dada2 can handle reads that are in mixed orientation relative to the genome; technically speaking, that is not a problem. But mixing F + R reads, or using pre-joined reads, will cause issues.

So again you are doing things correctly.

The only issue I can think of for mixed-orientation reads and dada2 is that you will get separate ASVs for reads that come from the same genome but are in opposite orientations. But in theory that is not a dada2 problem, it is an alpha diversity problem! (As I think we've discussed above, but this topic is so long I can't remember anymore.)
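To make the alpha diversity point concrete, here is a minimal toy sketch in plain Python. It is not what dada2 does internally, and the reads and the `canonical` helper are made up for illustration. It only shows that a sequence and its reverse complement are counted as two distinct features unless something collapses them:

```python
from collections import Counter

# Translation table for complementing DNA bases (toy: A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Hypothetical helper: pick one orientation-independent representative
    (the lexicographically smaller of the sequence and its reverse complement)."""
    return min(seq, reverse_complement(seq))


# Made-up reads: "AGCCTT" is the reverse complement of "AAGGCT".
reads = ["AAGGCT", "AGCCTT", "AAGGCT", "GGGTTA"]

raw_counts = Counter(reads)                                # orientation-sensitive
collapsed_counts = Counter(canonical(r) for r in reads)    # orientation-insensitive

print("unique features as-is:     ", len(raw_counts))
print("unique features collapsed: ", len(collapsed_counts))
```

The observed richness is inflated in the first count only because the same underlying sequence shows up in both orientations.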

The issue is that you do not need SEPP, and should not use SEPP here. When you use closed-reference OTU clustering, the features are no longer ASVs that need to be aligned/spliced into a new phylogeny. The features are now the matching reference OTUs, and you adopt the reference phylogeny.

See this issue for more details on why you should not use SEPP after closed-reference OTU picking:

So what should you (and everyone else who wants to use this pipeline) do instead? You should use the reference tree that ships with your reference database of choice (e.g., in your case use the Greengenes 99% OTU reference tree, since you used that same database for clustering with vsearch).
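One way to sanity-check this, sketched in plain Python under some assumptions: after closed-reference clustering your feature IDs should be reference OTU IDs, so they should all appear as tips in the reference tree. The tree string and IDs below are invented toy values (not real Greengenes IDs), the function names are hypothetical, and the regex-based tip extraction only works for simple unquoted Newick, so treat it as an illustration rather than a proper Newick parser:

```python
import re


def newick_tip_names(newick: str) -> set:
    """Extract tip labels from a simple, unquoted Newick string.

    Tips are labels that directly follow "(" or ","; internal node
    labels follow ")" and are therefore not matched.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def features_missing_from_tree(feature_ids, newick: str) -> list:
    """Return the feature IDs that are not tips in the tree (sorted)."""
    return sorted(set(feature_ids) - newick_tip_names(newick))


# Toy reference tree and toy feature IDs (assumed values for illustration).
toy_tree = "((111:0.1,222:0.2):0.3,333:0.4);"
toy_features = ["111", "333", "999"]

missing = features_missing_from_tree(toy_features, toy_tree)
print("features not found in the reference tree:", missing)
```

If the list comes back empty, every feature has a matching tip and the reference tree can be used as-is for phylogenetic diversity metrics. Anything left in the list suggests the tree and the clustering reference do not come from the same database release or identity level.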

Thanks everyone! I feel like we're making a lot of progress!