q2-phylogeny for vertebrates

Hello, I'm working with Illumina short reads of fish environmental DNA amplified with the 12S MiFish primers. I've generated a de novo phylogenetic tree of my OTU clusters with qiime phylogeny fasttree, after building a MAFFT alignment and masking noisy positions. However, I'd like to include some reference sequences in the tree, either from the mitohelper reference database (formatted as .qza artifacts) or by adding them to my rep_seqs.qza file. Is there a way to do this in QIIME 2, or is it best done outside of it?

As I understand it, I can't use q2-fragment-insertion, because according to the QIIME 2 forum post here, no SeppReferenceDatabase is available or validated for the reference database I'm using.

Thank you for any guidance.

Hi @DeannaB,

I think the easiest approach would be the following:

  1. First, extract the amplicon region from 12S-seqs-derep-uniq.qza and save it for later. Replace the primer placeholders with your MiFish primer sequences, and xx, yy, and zz with a truncation length and minimum/maximum read lengths suited to your amplicon:
```
qiime feature-classifier extract-reads \
  --i-sequences 12S-seqs-derep-uniq.qza \
  --p-f-primer forward-primer-sequence \
  --p-r-primer reverse-primer-sequence \
  --p-trunc-len xx \
  --p-min-length yy \
  --p-max-length zz \
  --o-reads 12S-seqs-derep-extract-uniq.qza
```
  1. Generate a visualization of the reference taxonomy, 12S-tax-derep-uniq.qza:
```
qiime metadata tabulate \
  --m-input-file 12S-tax-derep-uniq.qza \
  --o-visualization 12S-tax-derep-uniq.qzv
```
  1. Then, in QIIME 2 View, use the search box (upper right of the visualization) or scroll through the taxonomy shown in 12S-tax-derep-uniq.qzv, and write the IDs of the reference sequences you'd like to use into a text file, which we'll call seq-ids-to-keep.txt. The file must follow the QIIME 2 metadata format: a header line reading feature-id, followed by one ID per line.

  1. Then you can run the following command to pull those reference sequences out of the amplicon-region sequences extracted in the first step and write them to file:
```
qiime feature-table filter-seqs \
  --i-data 12S-seqs-derep-extract-uniq.qza \
  --m-metadata-file seq-ids-to-keep.txt \
  --o-filtered-data 12S-seqs-derep-extract-uniq-subset.qza
```
  1. Now you can merge your reference sequences (the subset of the mitohelper amplicon region we extracted earlier) with your OTUs/ESVs (here called my-otus.qza) using merge-seqs:
```
qiime feature-table merge-seqs \
  --i-data 12S-seqs-derep-extract-uniq-subset.qza my-otus.qza \
  --o-merged-data merged-12S-refs-and-my-otus.qza
```
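  1. Then you should be able to build your phylogeny from the merged sequences however you'd like (for example, with the same MAFFT alignment, masking, and FastTree steps you used before).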
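One thing worth checking before the merge: merge-seqs keeps the sequence from the first input when the same feature ID appears in more than one input, so a reference ID that happens to match an OTU ID would be collapsed silently. A minimal shell check, assuming both artifacts have been exported with qiime tools export (which writes dna-sequences.fasta); the file names, IDs, and sequences below are hypothetical stand-ins:

```shell
# Hypothetical stand-ins for exported FASTA files, e.g. produced by
#   qiime tools export --input-path <reference-subset>.qza --output-path refs
#   qiime tools export --input-path my-otus.qza --output-path otus
# (each export writes dna-sequences.fasta into its output directory)
printf '>ref_0001\nACGTACGT\n>seq_42\nACGTTTGA\n' > refs.fasta
printf '>seq_42\nACGAACGA\n>otu_b2\nACGGACGG\n' > otus.fasta

# Collect the IDs from both files (drop ">" and any description after
# whitespace) and keep those seen more than once. This assumes IDs are
# already unique within each file, as they are after dereplication.
grep -h '^>' refs.fasta otus.fasta \
  | sed 's/^>//; s/[[:space:]].*//' \
  | sort | uniq -d > shared-ids.txt

cat shared-ids.txt
```

An empty shared-ids.txt means no ID is shared between the two inputs.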


This is a great suggestion. Thank you for this.

I've added one step: merging the FeatureData[Taxonomy] for my OTUs with the FeatureData[Taxonomy] of the reference database:

```
qiime feature-table merge-taxa \
  --i-data rep_seqs_mitofish_blast_taxonomy.qza 12S-tax-derep-uniq.qza \
  --o-merged-data rep_seqs_mitofish_blast_taxonomy_12S-tax-derep-uniq.qza
```

I couldn't find a way to filter the FeatureData[Taxonomy] artifact of the reference database (12S-tax-derep-uniq.qza) prior to merging. I did find this post on the topic.
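One workaround that needs no extra plugin, sketched here under assumptions rather than taken from the thread: export the reference taxonomy to TSV, keep only the rows whose IDs are in the metadata ID list, and re-import the result before running merge-taxa. The qiime export/import commands are shown as comments (not run); the file contents and the output artifact name are hypothetical, and the --input-format may need adjusting if your exported file differs:

```shell
# Export step (not run here):
#   qiime tools export --input-path 12S-tax-derep-uniq.qza --output-path ref-tax
#   -> ref-tax/taxonomy.tsv, with a "Feature ID<TAB>Taxon" header
printf 'feature-id\nref_0001\nref_0003\n' > seq-ids-to-keep.txt
printf 'Feature ID\tTaxon\n' > taxonomy.tsv
printf 'ref_0001\tEukaryota;Chordata;Danio rerio\n' >> taxonomy.tsv
printf 'ref_0002\tEukaryota;Chordata;Salmo trutta\n' >> taxonomy.tsv
printf 'ref_0003\tEukaryota;Chordata;Gadus morhua\n' >> taxonomy.tsv

# First file: the ID list (skip its feature-id header).
# Second file: print the header plus every row whose ID is in the list.
awk -F'\t' '
  NR == FNR { if (FNR > 1) keep[$1] = 1; next }
  FNR == 1 || ($1 in keep)
' seq-ids-to-keep.txt taxonomy.tsv > taxonomy-subset.tsv

# Re-import step (not run here):
#   qiime tools import --type 'FeatureData[Taxonomy]' \
#     --input-format TSVTaxonomyFormat \
#     --input-path taxonomy-subset.tsv \
#     --output-path 12S-tax-derep-uniq-filtered.qza
```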

In the end, though, this doesn't seem to matter, because I'm importing the tree, feature table, and taxonomy into a phyloseq object in R to draw the tree with the plot_tree function, and according to the phyloseq documentation: "OTUs and samples are included in the combined object only if they are present in all components. For instance, extra “leaves” on the tree will be trimmed off when that tree is added to a phyloseq object." Since my reference sequences are not in the feature table, they are trimmed from the tree as well.

If you have any suggestions on how to circumvent this problem, I'd love to know. The end goal is to draw a phylogenetic tree of my OTUs that contains some reference sequences (with taxonomic labels or accession numbers for reference sequences).
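One possible way around the trimming (my assumption, not something suggested in the thread) is to skip phyloseq for the drawing step and label the tree tips directly: export the rooted tree (qiime tools export writes tree.nwk) and the merged taxonomy (taxonomy.tsv), then rename each tip to its ID plus its lowest-rank name. The sketch assumes a plain Newick string without quoted labels or [comments]; the file contents are hypothetical, and tips without a taxonomy entry keep their ID:

```shell
# Hypothetical stand-ins for the exported taxonomy.tsv and tree.nwk
printf 'Feature ID\tTaxon\n' > taxonomy.tsv
printf 'ref_0001\tEukaryota;Chordata;Actinopteri;Cyprinidae;Danio rerio\n' >> taxonomy.tsv
printf 'otu_a1\tEukaryota;Chordata;Actinopteri;Salmonidae;Salmo trutta\n' >> taxonomy.tsv
printf '(ref_0001:0.1,(otu_a1:0.05,otu_b2:0.07)0.95:0.02);\n' > tree.nwk

awk -F'\t' '
  NR == FNR {                             # first file: the taxonomy table
    if (FNR > 1) {                        # skip the "Feature ID<TAB>Taxon" header
      n = split($2, ranks, ";")
      lab = ranks[n]                      # lowest rank, e.g. the species name
      gsub(/[^A-Za-z0-9_.-]+/, "_", lab)  # Newick-safe: no spaces or semicolons
      gsub(/^_+|_+$/, "", lab)
      map[$1] = $1 "_" lab                # keep the ID so tip names stay unique
    }
    next
  }
  {                                       # second file: the Newick tree
    out = ""; tok = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (index("(),:;", c)) {            # delimiter: flush the current token
        out = out ((tok in map) ? map[tok] : tok) c
        tok = ""
      } else {
        tok = tok c
      }
    }
    print out tok
  }' taxonomy.tsv tree.nwk > tree-labeled.nwk

cat tree-labeled.nwk
```

The relabeled tree-labeled.nwk keeps both reference and OTU tips, so it can be opened in any Newick viewer or read in R with ape::read.tree.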

Hi @DeannaB,
I just want to chime in to add to @SoilRotifer's excellent advice.

One amendment to this step:

Making a text file of IDs should not be necessary — there should be programmatic ways of accomplishing the same thing, whether you want IDs that are in a feature table, or to filter based on IDs found in a taxonomy.
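As a minimal sketch of that programmatic route without any extra plugin: export the reference taxonomy (qiime tools export writes a taxonomy.tsv with a Feature ID/Taxon header) and let awk write the IDs whose taxonomy matches a search term straight into a metadata file usable with --m-metadata-file. The taxonomy rows and the search term below are hypothetical:

```shell
# Hypothetical stand-in for an exported taxonomy.tsv
printf 'Feature ID\tTaxon\n' > taxonomy.tsv
printf 'ref_0001\tEukaryota;Chordata;Actinopteri;Cyprinidae;Danio rerio\n' >> taxonomy.tsv
printf 'ref_0002\tEukaryota;Chordata;Actinopteri;Salmonidae;Salmo trutta\n' >> taxonomy.tsv
printf 'ref_0003\tEukaryota;Chordata;Actinopteri;Salmonidae;Oncorhynchus mykiss\n' >> taxonomy.tsv

term='Salmonidae'   # hypothetical search term, treated as an awk regex

# Print the metadata ID header, then the ID of every taxonomy row
# (after the table header) whose Taxon column matches the term.
awk -F'\t' -v term="$term" '
  BEGIN { print "feature-id" }
  FNR > 1 && $2 ~ term { print $1 }
' taxonomy.tsv > seq-ids-to-keep.txt

cat seq-ids-to-keep.txt
```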

The RESCRIPt plugin (not currently included in the QIIME 2 distribution, but installation instructions are available on the forum) has an action, filter-taxa, that filters a taxonomy based on a list of IDs or on search terms. See this tutorial:

RESCRIPt, by the way, can also be used to programmatically download reference sequences and taxonomies directly from NCBI based on an Entrez search query (the get-ncbi-data action), so that could also be an option if you only want to grab a limited number of accessions rather than all of MitoFish.

Good luck!


@Nicholas_Bokulich thanks so much for these suggestions.

And I agree with your point above that "there should be programmatic ways of accomplishing the same thing." For this particular reference database, I've used a Python tool called mitohelper for this purpose (its get record and get alignment functions).

I didn't know about these functionalities of the RESCRIPt plugin! Thank you for this information. This looks like another viable option to obtain references & associated taxonomies.

