How to create a feature table with taxonomic assignments in QIIME 2 for PICRUSt?

Hi everybody

I'm working with QIIME 2 to produce a feature_table.biom file for use with PICRUSt, but this feature table doesn't include the taxonomic assignments. How can I add them? Is it possible?

Best regards,

Hi @Jibda,
There is no official q2-picrust plugin (yet?), but see this discussion about some workarounds that people have used. There's also a really good GitHub link in there which shows by example how you can go from DADA2 to PICRUSt outside of QIIME 2.
Hope that gets you started at least!


Hi @Jibda! To add to @Mehrbod_Estaki’s suggestions: Check out this forum topic for instructions on how to export the .biom file and taxonomy info from QIIME 2, and then add the taxonomy info to the .biom file for use with other software (e.g. PICRUSt).


Hi, @jairideout. The info in that forum topic was very clear. I joined the featuretable.biom with the taxonomy, but I still have a problem. The resulting taxonomy file is a table with the taxonomic assignments, but it doesn't contain the Greengenes IDs. I tried --i-reference-taxonomy with 97_otu_taxonomy.qza (imported from the 97_otu_taxonomy.txt file in the Greengenes database) and --i-classifier with gg-13-8-99-515-806-nb-classifier.qza, which is available on the QIIME 2 site for the examples. But combining my featuretable.biom with either of the two taxonomy files produces a file without the Greengenes IDs, so I can't use the table for PICRUSt.



Hi @Jibda! Here are the general steps you should take to create a closed-reference feature table in QIIME 2 for use with PICRUSt:

  1. PICRUSt expects you to perform closed-reference OTU picking using the exact version of Greengenes that PICRUSt was trained against. There are ways to retrain PICRUSt using different reference databases, but that’s outside the scope of what I can help you with here; see the PICRUSt retraining docs for details.

    Since PICRUSt was trained against Greengenes version 13_5 (instead of 13_8, which is used in many of the QIIME 2 tutorials), you’ll need to download that reference database from the PICRUSt docs. Once you have the reference database FASTA files, import the reference sequences into a QIIME 2 artifact with type FeatureData[Sequence]. Use this artifact when performing closed-reference OTU picking with q2-vsearch (see Step 2 below). When choosing which reference sequences to import, you’ll probably want to use the 90% or 97% identity Greengenes reference sequences.

  2. Follow the steps in the q2-vsearch OTU picking Community Tutorial to create a closed-reference feature table. The feature table’s feature IDs will correspond to Greengenes OTU IDs.

    Note: the tutorial uses Greengenes version 13_8 85% identity reference sequences. Don’t use these reference sequences with your own closed-reference OTU picking analyses. Use the reference sequences you imported in Step 1 instead. Also make sure that you use the same percent identity corresponding to your reference sequences with the --p-perc-identity option supplied to qiime vsearch cluster-features-closed-reference.

  3. Once you have a closed-reference feature table, export the feature table to obtain a .biom file.

  4. Now that you have a .biom file containing PICRUSt-compatible Greengenes IDs, you can add the corresponding Greengenes taxonomic annotations using biom add-metadata. There is no need to import the Greengenes taxonomic annotations into QIIME 2 to perform taxonomy assignment; you can use the taxonomic annotations that are distributed with the Greengenes reference database you downloaded in Step 1.

Let us know how it goes!


I followed all the steps up to getting the .biom file for PICRUSt, then ran that file through PICRUSt. But one of the PICRUSt result files has no taxonomic information: the OTUs have their Greengenes IDs, but the names of the microorganisms are missing.

I'd appreciate it if you could take a look at the commands I ran:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path TRIMMO_GZ/ \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path 16S-paired-end.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 16S-paired-end.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --o-representative-sequences 16S-paired-end-rep-seqs-dada2.qza \
  --o-table table_16S-dada2.qza

I set these parameters to 0 because I had already cleaned the reads with Trimmomatic.

qiime tools import \
  --input-path /vault2/homehpc/jdcarrenoca/SEQ16S/SEQ_16S/TRIMMOMATIC/97_otus.fasta \
  --output-path gg_13_5_otu_97.qza \
  --type 'FeatureData[Sequence]'

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-path 97_otu_taxonomy.txt \
  --source-format HeaderlessTSVTaxonomyFormat \
  --output-path 97_otu_taxonomy.qza

qiime vsearch cluster-features-closed-reference \
  --i-table table_16S-dada2.qza \
  --i-sequences 16S-paired-end-rep-seqs-dada2.qza \
  --i-reference-sequences gg_13_5_otu_97.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-for-picrust-97.qza \
  --o-unmatched-sequences unmatched.qza

qiime tools export \
  --output-dir exported-table-for-picrust

The exported-table-for-picrust directory contains the feature-table.biom file.

Then I started working with that feature table in PICRUSt.
But the taxonomic information is missing from the ko_metagenome_contributions file.
ko_metagenome_contributions.txt (50.6 KB)

qiime feature-classifier classify-consensus-vsearch \
  --i-query 16S-paired-end-rep-seqs-dada2.qza \
  --i-reference-reads gg_13_5_otu_97.qza \
  --i-reference-taxonomy 97_otu_taxonomy.qza \
  --o-classification taxonomy_completed.qza

I tried to create the taxonomy assignment file and join it to the .biom file, but I get the same result in the KO contributions table when I use the new .biom file with taxonomy.
I have one big question: is the "feature ID" that QIIME 2 uses in the taxonomy and in the outputs of the DADA2 plugin an RDP ID, or is it a random ID that QIIME 2 assigns?
taxonomywithmyclassifier.qzv (1.2 MB)

Hope someone can help!


Thanks for the details @Jibda!

Since you performed your own sequence quality control with Trimmomatic, I'd recommend skipping DADA2, because DADA2 works best with reads that haven't been quality controlled yet (DADA2 has its own denoising algorithm and will trim and join reads for you). If you want to use DADA2, I'd recommend importing the raw paired-end sequences instead, making sure that barcodes, primers, and any other sequencing artifacts are removed beforehand.

If you decide to skip DADA2 and use the reads that have been quality-controlled with Trimmomatic, you can use qiime vsearch dereplicate-sequences to dereplicate the reads and produce a FeatureTable[Frequency], which can then be used with qiime vsearch cluster-features-closed-reference (see the q2-vsearch OTU picking tutorial for details).

Before you dereplicate your sequences, you have a couple of options:

  • Use only the forward reads when dereplicating. In this case, import your data as SampleData[SequencesWithQuality].

  • Join the reads, either using qiime vsearch join-pairs (see the read-joining tutorial for details), or an external read-joining program. If you use an external program to join reads, import your joined reads as SampleData[JoinedSequencesWithQuality].

Are the reference sequences and taxonomy annotations you imported the exact files required for PICRUSt analyses (i.e. from the gg_13_5_otus.tar.gz downloaded from the PICRUSt docs)? I can't tell from the file paths whether those are the right files, or whether modifications have been made to them (I'd avoid making any modifications to the Greengenes database that PICRUSt is expecting).

Also, you don’t need to import the Greengenes taxonomy annotations into a QIIME 2 artifact. You can add the taxonomy annotations (97_otu_taxonomy.txt) directly to the exported .biom file (see below).

You’ll need to use biom add-metadata to add the Greengenes taxonomy annotations (97_otu_taxonomy.txt) to the exported .biom file (check out the biom add-metadata docs for details). You’ll need to add the following header line to 97_otu_taxonomy.txt in order for it to work (make sure the two fields are separated by a tab character):

#OTUID	taxonomy
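If you'd rather not edit the file by hand, one possible way to prepend that header from the shell is sketched below. This is only an illustration: the header-demo directory, the two-row stand-in file with made-up OTU IDs and lineages, and the output name 97_otu_taxonomy_with_header.txt are all assumptions, and with real data you would point the second command at your own 97_otu_taxonomy.txt.

```shell
# Demo only: a tiny stand-in for 97_otu_taxonomy.txt.
# The OTU IDs and lineages below are made up for illustration.
mkdir -p header-demo
printf '1001\tk__Bacteria; p__Firmicutes\n1002\tk__Bacteria; p__Proteobacteria\n' \
  > header-demo/97_otu_taxonomy.txt

# Write the tab-separated header first, then the original rows,
# into a new file so the downloaded Greengenes file stays unmodified.
{
  printf '#OTUID\ttaxonomy\n'
  cat header-demo/97_otu_taxonomy.txt
} > header-demo/97_otu_taxonomy_with_header.txt

# Show the first line to confirm the header is in place.
head -n 1 header-demo/97_otu_taxonomy_with_header.txt
```

If you write the result to a new file like this, pass that new file to --observation-metadata-fp instead of the original one.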

Then use this command to add the taxonomy annotations to the .biom file as observation metadata (feel free to change the file paths to whatever makes sense for your analyses):

biom add-metadata -i exported-table-for-picrust/feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp 97_otu_taxonomy.txt --sc-separated taxonomy

After that you can use table-with-taxonomy.biom with PICRUSt!

The qiime feature-classifier classify-consensus-vsearch step isn't necessary; see above for details about handling taxonomic annotations.

The feature IDs output by the DADA2 plugin are MD5 sums of each representative sequence. Once you use q2-vsearch to create a closed-reference feature table, the feature IDs will correspond to Greengenes IDs.
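To make the difference between the two kinds of IDs concrete, here is a rough sketch. It assumes GNU md5sum is available and that the hash is taken over the plain sequence string; the sequence is made up, and the helper names is_md5_id and is_numeric_id are hypothetical, not part of QIIME 2.

```shell
# Hypothetical representative sequence, used only for illustration.
seq='ACGTACGTACGTACGT'

# MD5 hex digest of the sequence string (printf avoids a trailing newline).
feature_id=$(printf '%s' "$seq" | md5sum | cut -d ' ' -f 1)
echo "MD5-style feature ID: $feature_id"

# Hypothetical helpers to tell the two ID styles apart.
is_md5_id()     { printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}$'; }
is_numeric_id() { printf '%s\n' "$1" | grep -Eq '^[0-9]+$'; }

# Classify the hash and a made-up numeric ID in the Greengenes style.
for id in "$feature_id" "12345"; do
  if is_md5_id "$id"; then
    echo "$id looks like a hashed (DADA2-style) feature ID"
  elif is_numeric_id "$id"; then
    echo "$id looks like a numeric (Greengenes-style) OTU ID"
  fi
done
```

If the IDs in your exported table still look like 32-character hashes, the table probably hasn't been through closed-reference clustering yet.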

Let us know how this workflow works out for you!


@jairideout It worked!
I was able to join the .biom table with the taxonomy. I'm really thankful. You have been a great help! I'll probably be bothering you again in the future.

