I am attempting to use Qiime 2 (version 2020.11) to pass ASV information into Ghost Tree. I am following this tutorial: Q2-ghost-tree Plugin: Community Tutorial for Creating Hybrid-Gene Phylogenetic Trees. I have successfully created my ASV table and representative sequence file using DADA2, and I am now trying to get my ASVs ready for Ghost Tree.
According to the tutorial, my next step is to dereplicate my ASVs using vsearch (the tutorial links to this page for those steps: https://docs.qiime2.org/2018.8/tutorials/otu-clustering/). This is where I run into trouble: vsearch’s dereplicate-sequences command requires a SampleData[Sequences] artifact, which is not an output of DADA2. How can I extract the sequence data from my ASV output so that it fits vsearch’s input requirements?
I have tried running the vsearch closed-reference clustering command (the next step) on my rep set of ASV sequences without dereplication, and it purges most of my sequences: I begin with ~7,000 ASVs and end with ~750 OTUs, whereas a UPARSE-clustered dataset from the same sequences yields ~4,000 OTUs. The tutorial mentions that moving forward without dereplication is problematic for vsearch and Ghost Tree.
Where am I going wrong with the vsearch dereplication? Is there a way to obtain a seqs.fna file from my ASVs to use in the vsearch dereplication? Thank you,
As part of the denoising process, you perform a step equivalent to dereplication: duplicate sequences are collapsed into a feature table, along with some nice quality filtering and cleanup. So your output is ready to go into vsearch without an explicit dereplication step, because that has already been done.
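To make the point concrete, here is a minimal Python sketch of what dereplication amounts to: identical reads are collapsed into one representative sequence per unique sequence, plus a per-sample count table. The input layout (`reads_by_sample`) and the function name are hypothetical, and using an MD5 hash as the feature ID is an assumption for illustration; this is not DADA2’s or vsearch’s actual code.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(reads_by_sample):
    """Collapse identical reads into unique features with per-sample counts.

    reads_by_sample: hypothetical input, dict of sample ID -> list of read sequences.
    Returns (rep_seqs, table):
      rep_seqs maps feature ID -> sequence (the "rep set"),
      table maps feature ID -> {sample ID: count} (the "feature table").
    """
    rep_seqs = {}
    table = defaultdict(Counter)
    for sample, reads in reads_by_sample.items():
        for seq in reads:
            # Identical sequences hash to the same ID, so they collapse into one feature.
            # MD5 as the ID scheme is an illustrative assumption.
            fid = hashlib.md5(seq.encode("ascii")).hexdigest()
            rep_seqs[fid] = seq
            table[fid][sample] += 1
    return rep_seqs, {fid: dict(counts) for fid, counts in table.items()}
```

A rep set plus a feature table is exactly the shape of DADA2’s output, with error correction applied before the collapse, which is why a separate dereplication step on it adds nothing.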
Thank you for responding so quickly. In that case, I should be able to pass my DADA2 output (ASV table and rep set) directly into the cluster-features-closed-reference command and recluster my ASVs into 97% OTUs against a given database (I am using the UNITE version 7 database for the Ghost Tree analysis), correct? When I do this, I lose most of my expected OTUs: if Qiime 2 and UPARSE (v11) give roughly comparable results, I expect close to 4,000 OTUs, but instead I get ~750. Does this mean I am doing something wrong in that step? I need to recluster at 97% against this specific database in order to use the pre-built ghost tree for that database. Thank you,
Yes, your ASV table and repset should go directly into your closed reference command.
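As a rough mental model of why closed-reference clustering can drop so many features, the Python sketch below assigns each ASV to its best-matching reference sequence if identity meets the threshold (`perc_identity`, named after the q2-vsearch parameter), sums the counts of ASVs that land on the same reference, and discards ASVs with no qualifying hit. The `identity` function is a deliberately simplified stand-in (position-by-position matching on equal-length sequences only); vsearch computes identity from alignments, so treat this as a toy, not its algorithm. All names here are hypothetical.

```python
from collections import defaultdict


def identity(a, b):
    """Toy identity: fraction of matching positions, equal-length sequences only.

    Simplification for illustration; real tools align sequences first.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(rep_seqs, table, refs, perc_identity=0.97):
    """Toy closed-reference clustering.

    rep_seqs: feature ID -> sequence; table: feature ID -> {sample: count};
    refs: reference ID -> sequence.
    Returns (clustered table keyed by reference ID, list of unmatched feature IDs).
    """
    clustered = defaultdict(lambda: defaultdict(int))
    unmatched = []
    for fid, seq in rep_seqs.items():
        # Find the single best reference hit for this feature.
        best_id, best_score = None, 0.0
        for rid, rseq in refs.items():
            score = identity(seq, rseq)
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is None or best_score < perc_identity:
            # Closed-reference: no hit at the threshold means the feature is dropped.
            unmatched.append(fid)
            continue
        # Features hitting the same reference are merged; their counts are summed.
        for sample, n in table.get(fid, {}).items():
            clustered[best_id][sample] += n
    return {rid: dict(c) for rid, c in clustered.items()}, unmatched
```

Every ASV in `unmatched` takes its counts with it, so comparing total counts before and after clustering shows how much of the data actually found a reference hit.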
There are a handful of possible reasons for the number of features you end up with, and checks you can do:
1. You may be losing a lot of counts because your ASVs aren’t clustering well against the reference. You can check this by comparing the table summaries before and after clustering and seeing how many total counts remain.
2. When you cluster the raw sequences, you have a lot more noise, which results in more essentially random hits against the reference.
3. Clustering algorithms can be unstable depending on a variety of factors, and vsearch and usearch aren’t identical. They are close, though: close enough for most estimates.
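For the first check, running `qiime feature-table summarize` on both tables reports the totals directly. If you prefer to script the comparison, here is a minimal Python sketch that assumes both tables have been exported to tab-separated text (for example with `qiime tools export` followed by `biom convert --to-tsv`), with comment lines starting with `#`, a `#OTU ID` header row, and one row per feature. The function names are hypothetical.

```python
def total_counts(tsv_text):
    """Sum every count in a feature table exported as tab-separated text.

    Assumed layout: lines starting with '#' (comments and the '#OTU ID' header)
    are skipped; each remaining line is a feature ID followed by per-sample counts.
    """
    total = 0.0
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, comment, and header lines
        fields = line.split("\t")
        total += sum(float(x) for x in fields[1:])
    return total


def fraction_retained(before_tsv, after_tsv):
    """Fraction of total counts that survive clustering (0.0 if the input is empty)."""
    before = total_counts(before_tsv)
    return total_counts(after_tsv) / before if before else 0.0
```

A retained fraction well below 1 means many ASVs had no hit in the reference at the chosen identity, which is the first issue above rather than a problem with dereplication.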
If it were me, I would worry most about the first issue: whether or not you’re losing counts. Different processing pipelines will give you different numbers of features.