vsearch open-reference pipeline and phylogenetic tree


My commands are as follows:

```bash
qiime tools import \
  --input-path "$2/seqs.fna" \
  --output-path "$2/seqs.qza" \
  --type 'SampleData[Sequences]'

# import reference database Greengenes 13_8
qiime tools import \
  --input-path gg_13_8_otus/rep_set/97_otus.fasta \
  --output-path "$2/97_otus.qza" \
  --type 'FeatureData[Sequence]'

# dereplication
mkdir -p "$2/dereplicated"
qiime vsearch dereplicate-sequences \
  --i-sequences "$2/seqs.qza" \
  --o-dereplicated-table "$2/dereplicated/table.qza" \
  --o-dereplicated-sequences "$2/dereplicated/rep-seqs.qza"

# 97% OTU clustering
mkdir -p "$2/clustered"

# open reference
qiime vsearch cluster-features-open-reference \
  --i-table "$2/dereplicated/table.qza" \
  --i-sequences "$2/dereplicated/rep-seqs.qza" \
  --i-reference-sequences "$2/97_otus.qza" \
  --p-perc-identity 0.97 \
  --o-clustered-table "$2/clustered/table-or-97.qza" \
  --o-clustered-sequences "$2/clustered/rep-seqs-or-97.qza" \
  --o-new-reference-sequences "$2/clustered/new-ref-seqs-or-97.qza" \
  --p-threads 48

# remove chimeras - open reference

# 1. run de novo chimera checking
qiime vsearch uchime-denovo \
  --i-table "$2/clustered/table-or-97.qza" \
  --i-sequences "$2/clustered/rep-seqs-or-97.qza" \
  --output-dir "$2/uchime-dn-out"

# 2. visualize chimera result
qiime metadata tabulate \
  --m-input-file "$2/uchime-dn-out/stats.qza" \
  --o-visualization "$2/uchime-dn-out/stats.qzv"

# 3. Exclude chimeras but retain "borderline chimeras"
qiime feature-table filter-features \
  --i-table "$2/clustered/table-or-97.qza" \
  --m-metadata-file "$2/uchime-dn-out/chimeras.qza" \
  --p-exclude-ids \
  --o-filtered-table "$2/uchime-dn-out/table-nonchimeric-w-borderline.qza"

qiime feature-table filter-seqs \
  --i-data "$2/clustered/rep-seqs-or-97.qza" \
  --m-metadata-file "$2/uchime-dn-out/chimeras.qza" \
  --p-exclude-ids \
  --o-filtered-data "$2/uchime-dn-out/rep-seqs-nonchimeric-w-borderline.qza"
```
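If borderline chimeras should be removed as well, the QIIME 2 OTU clustering tutorial keeps only the IDs listed in nonchimeras.qza instead of excluding the IDs in chimeras.qza. The sketch below assumes the same directory layout as the commands above (pass the same directory you used as $2); the function name exclude_all_chimeras and the *-wo-borderline output names are hypothetical.

```bash
# Hypothetical helper: drop both chimeras and borderline chimeras.
# $1 = the output directory the commands above refer to as $2.
exclude_all_chimeras() {
  if [ "$#" -ne 1 ]; then
    echo "usage: exclude_all_chimeras OUTDIR" >&2
    return 2
  fi
  local out="$1"

  # Keeping only the IDs in nonchimeras.qza (no --p-exclude-ids)
  # removes chimeras and borderline chimeras in one pass.
  qiime feature-table filter-features \
    --i-table "$out/clustered/table-or-97.qza" \
    --m-metadata-file "$out/uchime-dn-out/nonchimeras.qza" \
    --o-filtered-table "$out/uchime-dn-out/table-nonchimeric-wo-borderline.qza" || return

  # Keep the representative sequences consistent with the table.
  qiime feature-table filter-seqs \
    --i-data "$out/clustered/rep-seqs-or-97.qza" \
    --m-metadata-file "$out/uchime-dn-out/nonchimeras.qza" \
    --o-filtered-data "$out/uchime-dn-out/rep-seqs-nonchimeric-wo-borderline.qza" || return

  # Per-sample and per-feature counts, to compare with the w-borderline table.
  qiime feature-table summarize \
    --i-table "$out/uchime-dn-out/table-nonchimeric-wo-borderline.qza" \
    --o-visualization "$out/uchime-dn-out/table-nonchimeric-wo-borderline.qzv"
}
```

Usage: `exclude_all_chimeras "$2"`. Comparing the two summaries shows how many OTUs were flagged only as borderline.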

Q1: In the end, with 159 samples, I have 74,219 OTUs. Is that too many?
I searched the forum and found a similar question, "Denoising vs OTU picking methods", where the poster had 80,000 OTUs, which was considered too high. So I am wondering whether the vsearch open-reference pipeline is reliable.
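One way to look into Q1 is to check how many of the OTUs are very rare (for example, seen only once or in a single sample), since such OTUs can arise from sequencing errors and inflate the count. The sketch below summarizes the chimera-filtered table and then drops low-abundance OTUs with filter-features. The thresholds (default 2 reads and 2 samples) are illustrative assumptions, not recommendations; the function name and output file names are hypothetical, and `filter-seqs --i-table` assumes a reasonably recent QIIME 2 release.

```bash
# Hypothetical helper: inspect and remove rare OTUs.
# $1 = output directory ($2 in the commands above)
# $2 = minimum total frequency per OTU (example default: 2)
# $3 = minimum number of samples per OTU (example default: 2)
filter_rare_otus() {
  if [ "$#" -lt 1 ]; then
    echo "usage: filter_rare_otus OUTDIR [MIN_FREQ] [MIN_SAMPLES]" >&2
    return 2
  fi
  local out="$1" min_freq="${2:-2}" min_samples="${3:-2}"
  local table="$out/uchime-dn-out/table-nonchimeric-w-borderline.qza"

  # Feature counts and frequency distribution of the unfiltered table.
  qiime feature-table summarize \
    --i-table "$table" \
    --o-visualization "$out/uchime-dn-out/table-nonchimeric-w-borderline.qzv" || return

  # Keep OTUs with at least MIN_FREQ reads in total and present in
  # at least MIN_SAMPLES samples.
  qiime feature-table filter-features \
    --i-table "$table" \
    --p-min-frequency "$min_freq" \
    --p-min-samples "$min_samples" \
    --o-filtered-table "$out/uchime-dn-out/table-filtered.qza" || return

  # Keep only representative sequences whose IDs remain in the filtered table.
  qiime feature-table filter-seqs \
    --i-data "$out/uchime-dn-out/rep-seqs-nonchimeric-w-borderline.qza" \
    --i-table "$out/uchime-dn-out/table-filtered.qza" \
    --o-filtered-data "$out/uchime-dn-out/rep-seqs-filtered.qza"
}
```

Usage: `filter_rare_otus "$2" 2 2`. If most of the 74,219 OTUs disappear at these low thresholds, the total count is driven mainly by rare features.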

Q2: Can I do downstream diversity analysis using the Greengenes tree?
I have read related questions about how to build a phylogenetic tree after vsearch OTU clustering, and found a forum post with a similar question:

In that question, it was mentioned that the Greengenes tree can be used directly if closed-reference clustering is used. In my case I used open-reference clustering, but I did not use the de novo sequences. Can I directly use the Greengenes tree?
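For Q2, OTUs that hit the reference during open-reference clustering keep their Greengenes IDs, while de novo OTUs get new IDs. One way to use the Greengenes tree is therefore to keep only features whose IDs occur in the imported reference (97_otus.qza), import gg_13_8_otus/trees/97_otus.tree, and run the phylogenetic diversity pipeline, which includes UniFrac. This sketch assumes the gg_13_8_otus directory from the import step is in the working directory and that the tree can be imported as rooted; the sampling depth and sample metadata file must be chosen by the user, and the function and output names are hypothetical.

```bash
# Hypothetical helper: reference-only OTUs + Greengenes 13_8 97% tree.
# $1 = output directory ($2 in the commands above)
# $2 = rarefaction depth, $3 = sample metadata file (both user-chosen)
gg_tree_unifrac() {
  if [ "$#" -ne 3 ]; then
    echo "usage: gg_tree_unifrac OUTDIR SAMPLING_DEPTH SAMPLE_METADATA" >&2
    return 2
  fi
  local out="$1" depth="$2" metadata="$3"

  # Keep only features whose IDs are Greengenes reference IDs,
  # which drops every de novo OTU.
  qiime feature-table filter-features \
    --i-table "$out/uchime-dn-out/table-nonchimeric-w-borderline.qza" \
    --m-metadata-file "$out/97_otus.qza" \
    --o-filtered-table "$out/table-ref-only.qza" || return

  # Import the Greengenes tree; its tips are the same OTU IDs
  # (assumed rooted here).
  qiime tools import \
    --input-path gg_13_8_otus/trees/97_otus.tree \
    --output-path "$out/97_otus_tree.qza" \
    --type 'Phylogeny[Rooted]' || return

  # Rarefy to DEPTH and compute UniFrac and other diversity metrics.
  # The output directory must not already exist.
  qiime diversity core-metrics-phylogenetic \
    --i-phylogeny "$out/97_otus_tree.qza" \
    --i-table "$out/table-ref-only.qza" \
    --p-sampling-depth "$depth" \
    --m-metadata-file "$metadata" \
    --output-dir "$out/core-metrics-gg-tree"
}
```

The extra tips in the Greengenes tree are not a problem; what matters is that every feature in the table is present in the tree, which the ID filter is meant to ensure.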

Thank you so much for helping!


Hi @Lu_Zhang!

Have you seen the OTU Clustering Tutorial?

Yes. I read the tutorial and already got the feature table and representative sequences by following it. Now I want to use these results for downstream analysis, such as UniFrac distances for beta diversity, which requires a phylogenetic tree. Open-reference clustering combines closed-reference and de novo clustering, so I do have some de novo sequences, but I only use the reference-based ones, which is effectively closed-reference. (I chose open-reference because I might later be interested in the de novo sequences and could go back to check them.) So I am wondering whether I can use the Greengenes tree itself. Thank you so much for your reply!
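To keep the de novo OTUs available for later inspection, as described above, they can be separated by excluding every ID that occurs in the Greengenes reference. A sketch, assuming the file layout of the original commands; the function name and the denovo/ output names are hypothetical.

```bash
# Hypothetical helper: set the de novo OTUs aside.
# $1 = output directory ($2 in the original commands)
set_aside_denovo_otus() {
  if [ "$#" -ne 1 ]; then
    echo "usage: set_aside_denovo_otus OUTDIR" >&2
    return 2
  fi
  local out="$1"
  mkdir -p "$out/denovo" || return

  # De novo OTUs are the features whose IDs are NOT Greengenes IDs.
  qiime feature-table filter-features \
    --i-table "$out/uchime-dn-out/table-nonchimeric-w-borderline.qza" \
    --m-metadata-file "$out/97_otus.qza" \
    --p-exclude-ids \
    --o-filtered-table "$out/denovo/table-denovo.qza" || return

  qiime feature-table filter-seqs \
    --i-data "$out/uchime-dn-out/rep-seqs-nonchimeric-w-borderline.qza" \
    --m-metadata-file "$out/97_otus.qza" \
    --p-exclude-ids \
    --o-filtered-data "$out/denovo/rep-seqs-denovo.qza" || return

  # Browsable listing of the de novo sequences.
  qiime feature-table tabulate-seqs \
    --i-data "$out/denovo/rep-seqs-denovo.qza" \
    --o-visualization "$out/denovo/rep-seqs-denovo.qzv"
}
```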

Hello @Lu_Zhang
Sorry for our late response. I want to preface this by saying that I am not 100% sure if I am answering your question. If I don’t answer your question, please let me know and I will try my best to give a better answer next time :grinning:
I think you might be able to use SEPP fragment insertion (the q2-fragment-insertion plugin) with the Greengenes reference tree if you want.
This is an article that talks about using open-reference clustering and creating phylogenies.

Here is a QIIME 2 doc that has the Greengenes reference for SEPP insertion.
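A minimal sketch of SEPP insertion for all OTUs, reference and de novo alike, assuming the Greengenes SEPP reference artifact mentioned above has been downloaded (its path is passed in, e.g. a file named sepp-refs-gg-13-8.qza) and that the file layout matches the original commands. The thread count of 48 is reused from the vsearch step; the function name and output names are hypothetical.

```bash
# Hypothetical helper: insert OTU sequences into the SEPP reference tree.
# $1 = output directory, $2 = path to the Greengenes SEPP reference .qza
sepp_insert_otus() {
  if [ "$#" -ne 2 ]; then
    echo "usage: sepp_insert_otus OUTDIR SEPP_REFERENCE_QZA" >&2
    return 2
  fi
  local out="$1" ref="$2"
  mkdir -p "$out/sepp" || return

  # Place each representative sequence into the reference phylogeny.
  qiime fragment-insertion sepp \
    --i-representative-sequences "$out/uchime-dn-out/rep-seqs-nonchimeric-w-borderline.qza" \
    --i-reference-database "$ref" \
    --p-threads 48 \
    --o-tree "$out/sepp/insertion-tree.qza" \
    --o-placements "$out/sepp/insertion-placements.qza" || return

  # Drop features SEPP could not place, so every table feature is in the tree.
  qiime fragment-insertion filter-features \
    --i-table "$out/uchime-dn-out/table-nonchimeric-w-borderline.qza" \
    --i-tree "$out/sepp/insertion-tree.qza" \
    --o-filtered-table "$out/sepp/table-in-tree.qza" \
    --o-removed-table "$out/sepp/table-not-in-tree.qza"
}
```

The resulting insertion-tree.qza and table-in-tree.qza can then be used as the phylogeny and table for UniFrac, without discarding the de novo OTUs.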
Thank you for your patience.