An off-topic reply has been split into a new topic: installing Greengenes2 in a minimal environment
Please keep replies on-topic in the future.
Hello @wasade - Very helpful information! I have a couple of questions. I have paired-end human stool data (processed with dada2) that is good quality through 250 nt.
Do you know if there is any advantage to using the single-end versus paired-end data (i.e. the filter-features versus non-v4-16s approach)? And you mention trimming to 150nt (in the Deblur/Dada2 section) - is that a recommendation for filter-features and/or non-v4-16s? Thanks!
Hi @m_s,
If the sequences were generated using 515F-806R EMP primers, then you could trim them to 150nt and use filter-features. If you'd prefer to keep the full length, then you'd need to use non-v4-16s.
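Trimming to a common length has a side effect worth knowing about. The following is a minimal sketch, not what filter-features does internally. It assumes a hypothetical in-memory feature table: a plain dict keyed by ASV sequence, not the QIIME 2 artifact format. After trimming to a 150nt prefix, ASVs that differ only beyond position 150 collapse into one feature, so their counts are summed. ASVs shorter than 150nt cannot reach the common length and are dropped. In practice the trim length is usually set at the denoising step (e.g. Deblur's trim length or dada2's truncation length) rather than applied to ASVs afterwards.

```python
from collections import defaultdict


def trim_feature_table(table, length=150):
    """Trim each ASV to its first `length` nt and merge counts of collapsed ASVs.

    `table` is a hypothetical layout: {asv_sequence: {sample_id: count}}.
    ASVs shorter than `length` are dropped because they cannot be trimmed
    to the common length.
    """
    trimmed = defaultdict(lambda: defaultdict(int))
    for seq, counts in table.items():
        if len(seq) < length:
            continue  # too short to reach the common trim length
        prefix = seq[:length]
        for sample, n in counts.items():
            trimmed[prefix][sample] += n  # collapsed ASVs share one feature
    return {seq: dict(counts) for seq, counts in trimmed.items()}


# Toy example: two 160nt ASVs that agree over their first 150nt, plus one 140nt ASV.
a = "ACGT" * 40
b = "ACGT" * 38 + "TTTTTTTT"
c = "T" * 140
result = trim_feature_table({a: {"S1": 5, "S2": 1}, b: {"S1": 2}, c: {"S2": 9}})
print(result[a[:150]])  # -> {'S1': 7, 'S2': 1}
```

The toy example shows why the number of features can drop after trimming, even though no reads from the retained ASVs are lost.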
I'm unaware of literature that has independently benchmarked the various read stitching strategies. In my own analyses, I only use the fwd read from the EMP primers. Most of the taxonomic and phylogenetic signal is proximal to 515F as well, which is why studies like Yatsunenko et al. 2012 (Nature), which used 90 cycles if I recall correctly, were still quite exciting and compelling. In fact, quite a few of the analyses in the Thompson et al. 2017 EMP paper were at 90nt too.
Best,
Daniel
Hi Daniel, thank you for this resource! Can you provide brief instructions on how to use this database outside of QIIME? For instance, I'd prefer to use Kraken2, and I have both 16S and shotgun sequencing. I presume I need the 16S sequence database, the whole-genome sequence database, and the shared taxonomy, but I can't immediately tell which files these correspond to since there are many files in the FTP repository with similar descriptions.
Hi @John_McElderry,
For shotgun, we recommend using the Woltka toolkit. The genome identifiers in the database are relative to the Web of Life version 2. It is possible Kraken2 will work, although we haven't evaluated that. The exact commands we use are buried in here; as an alternative, I would encourage you to consider depositing the data into Qiita, as that resource will take care of the compute.
Best,
Daniel