How to create a feature table with QIIME 2 for PICRUSt, with taxonomic assignments?

jairideout · January 30, 2018, 8:07pm

Thanks for the details @Jibda!

Since you already performed your own quality control with Trimmomatic, I'd recommend skipping DADA2: it works best on reads that haven't been quality-controlled yet, because it has its own denoising algorithm and will trim and join reads for you. If you want to use DADA2, I'd recommend importing the raw paired-end sequences instead, making sure that barcodes, primers, and any other sequencing artifacts are removed beforehand.

If you decide to skip DADA2 and use the reads that were quality-controlled with Trimmomatic, you can use qiime vsearch dereplicate-sequences to dereplicate them. It produces a FeatureTable[Frequency] plus the matching FeatureData[Sequence], and both can then be passed to qiime vsearch cluster-features-closed-reference (see the q2-vsearch OTU picking tutorial for details).
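A minimal sketch of these two steps, wrapped in a shell function. The function name `derep_and_cluster`, the output file names, and the 97% identity threshold are assumptions for illustration (PICRUSt's precomputed files are built around the 97% Greengenes OTUs). The input is assumed to be a `SampleData[SequencesWithQuality]` or `SampleData[JoinedSequencesWithQuality]` artifact, and the reference a `FeatureData[Sequence]` artifact of the Greengenes sequences.

```shell
# Hypothetical helper: dereplicate quality-controlled reads, then cluster them
# closed-reference against a reference FeatureData[Sequence] artifact.
# Usage: derep_and_cluster <seqs.qza> <reference-seqs.qza> <output-dir>
derep_and_cluster() {
  seqs_qza="$1"
  ref_qza="$2"
  out_dir="$3"

  # Fail early if the QIIME 2 CLI isn't on PATH.
  command -v qiime >/dev/null 2>&1 || { echo "qiime not found on PATH" >&2; return 1; }
  mkdir -p "$out_dir" || return 1

  # Dereplicate: yields a FeatureTable[Frequency] plus the matching
  # FeatureData[Sequence] of unique sequences.
  qiime vsearch dereplicate-sequences \
    --i-sequences "$seqs_qza" \
    --o-dereplicated-table "$out_dir/table.qza" \
    --o-dereplicated-sequences "$out_dir/rep-seqs.qza" || return 1

  # Closed-reference clustering at 97% identity (assumed threshold). Feature
  # IDs in the clustered table are the reference (Greengenes) OTU IDs.
  qiime vsearch cluster-features-closed-reference \
    --i-table "$out_dir/table.qza" \
    --i-sequences "$out_dir/rep-seqs.qza" \
    --i-reference-sequences "$ref_qza" \
    --p-perc-identity 0.97 \
    --o-clustered-table "$out_dir/table-cr-97.qza" \
    --o-clustered-sequences "$out_dir/rep-seqs-cr-97.qza" \
    --o-unmatched-sequences "$out_dir/unmatched-cr-97.qza"
}
```

For example, `derep_and_cluster seqs.qza 97_otus.qza otus` would leave the closed-reference table in `otus/table-cr-97.qza` (all paths here are placeholders). Reads that don't match the reference end up in the unmatched output rather than the table, which is what PICRUSt needs, since it can only make predictions for Greengenes IDs.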

Before you dereplicate your sequences, you have a couple of options:

- Use only the forward reads when dereplicating. In this case, import your data as SampleData[SequencesWithQuality].
- Join the reads, either with qiime vsearch join-pairs (see the read-joining tutorial for details) or with an external read-joining program. If you use an external program, import the joined reads as SampleData[JoinedSequencesWithQuality].
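If you go the joining route inside QIIME 2, the call is roughly as sketched below. This assumes the paired reads were already imported as `SampleData[PairedEndSequencesWithQuality]`; the wrapper name `join_read_pairs` and the file names are hypothetical. Newer q2-vsearch releases may have renamed this action, so check `qiime vsearch --help` for the name in your installed version.

```shell
# Hypothetical helper: join paired-end reads with q2-vsearch.
# Input:  SampleData[PairedEndSequencesWithQuality]
# Output: SampleData[JoinedSequencesWithQuality]
# Usage: join_read_pairs <demux-paired.qza> <joined.qza>
join_read_pairs() {
  demux_qza="$1"
  joined_qza="$2"

  # Fail early if the QIIME 2 CLI isn't on PATH.
  command -v qiime >/dev/null 2>&1 || { echo "qiime not found on PATH" >&2; return 1; }

  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$demux_qza" \
    --o-joined-sequences "$joined_qza"
}
```

The joined artifact can go straight into dereplication as the `--i-sequences` input.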

Are these reference sequences and taxonomy annotations the exact files PICRUSt expects (i.e., the ones in gg_13_5_otus.tar.gz, downloaded as described in the PICRUSt docs)? I can't tell from the file paths whether they're the right files or whether they've been modified (I'd avoid making any modifications to the Greengenes database that PICRUSt is expecting).
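If they are the PICRUSt-provided files, only the reference sequences need to become a QIIME 2 artifact for closed-reference clustering; the files themselves are left unmodified. The sketch below assumes the archive unpacks to `gg_13_5_otus/rep_set/97_otus.fasta` (check the layout of your copy and adjust the path). The function name `import_gg_reference` is hypothetical.

```shell
# Hypothetical helper: unpack the Greengenes 13_5 archive and import the
# 97% OTU reference sequences as a FeatureData[Sequence] artifact.
# Usage: import_gg_reference <gg_13_5_otus.tar.gz> <output.qza>
import_gg_reference() {
  archive="$1"
  out_qza="$2"

  # Fail early if the QIIME 2 CLI isn't on PATH.
  command -v qiime >/dev/null 2>&1 || { echo "qiime not found on PATH" >&2; return 1; }
  tar -xzf "$archive" || return 1

  # Assumed path inside the archive; the FASTA itself is not edited.
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path gg_13_5_otus/rep_set/97_otus.fasta \
    --output-path "$out_qza"
}
```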

Also, you don't need to import the Greengenes taxonomy annotations into a QIIME 2 artifact. You can add the taxonomy annotations (97_otu_taxonomy.txt) directly to the exported .biom file (see below).

Use biom add-metadata to add the Greengenes taxonomy annotations (97_otu_taxonomy.txt) to the exported .biom file (check out the biom add-metadata docs for details). For this to work, 97_otu_taxonomy.txt needs the following header line (make sure the two fields are separated by a tab character):

#OTUID	taxonomy
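One way to add that header without editing the downloaded file in place is to write a copy, as in this sketch. The function name `add_taxonomy_header` and the demo file names are hypothetical, and the two demo records are made up, not real Greengenes entries.

```shell
# Hypothetical helper: write a copy of the taxonomy file with the
# "#OTUID<TAB>taxonomy" header that biom add-metadata expects, leaving the
# original Greengenes file untouched.
# Usage: add_taxonomy_header <97_otu_taxonomy.txt> <output.txt>
add_taxonomy_header() {
  in_txt="$1"
  out_txt="$2"
  {
    # printf (not echo) so the \t is reliably a literal tab character.
    printf '#OTUID\ttaxonomy\n'
    cat "$in_txt"
  } > "$out_txt"
}

# Demo on a tiny made-up file (IDs and lineages are placeholders).
printf '1111\tk__Bacteria; p__Firmicutes\n2222\tk__Bacteria; p__Proteobacteria\n' > demo_taxonomy.txt
add_taxonomy_header demo_taxonomy.txt demo_taxonomy_with_header.txt
head -n 2 demo_taxonomy_with_header.txt
```

If you use a copy like this, point `--observation-metadata-fp` at the new file instead of `97_otu_taxonomy.txt`.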

Then use this command to add the taxonomy annotations to the .biom file as observation metadata (feel free to change the file paths to whatever makes sense for your analyses):

biom add-metadata -i exported-table-for-picrust/feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp 97_otu_taxonomy.txt --sc-separated taxonomy
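To sanity-check that the taxonomy made it into the table, you can dump it to TSV with the taxonomy as an extra column using `biom convert --to-tsv --header-key taxonomy`. The wrapper name `biom_to_tsv_with_taxonomy` and the output path are hypothetical; this is just a quick inspection step, not something PICRUSt requires.

```shell
# Hypothetical helper: dump a BIOM table to TSV with the "taxonomy"
# observation metadata as the last column, then show the first few lines.
# Usage: biom_to_tsv_with_taxonomy <table-with-taxonomy.biom> <output.tsv>
biom_to_tsv_with_taxonomy() {
  biom_in="$1"
  tsv_out="$2"

  # Fail early if the biom CLI isn't on PATH.
  command -v biom >/dev/null 2>&1 || { echo "biom not found on PATH" >&2; return 1; }

  biom convert \
    -i "$biom_in" \
    -o "$tsv_out" \
    --to-tsv \
    --header-key taxonomy || return 1

  head -n 5 "$tsv_out"
}
```

If the last column is empty for every row, the feature IDs probably didn't match the IDs in the taxonomy file (for example, because the table still has DADA2 MD5 IDs rather than Greengenes IDs).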

After that you can use table-with-taxonomy.biom with PICRUSt!

This step isn't necessary; see above for details about handling taxonomic annotations.

The feature IDs output by the DADA2 plugin are MD5 sums of each representative sequence. Once you use q2-vsearch to create a closed-reference feature table, the feature IDs will correspond to Greengenes IDs.
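To see the difference concretely, here is a sketch that computes an MD5-style ID for a sequence string. It assumes the ID is the hex MD5 digest of the sequence with no trailing newline, and it uses GNU coreutils `md5sum` (on macOS, `md5 -q -s "$seq"` is the rough equivalent). The function name `seq_md5` and the example sequence are hypothetical.

```shell
# Hypothetical helper: hex MD5 digest of a sequence string, assuming DADA2
# feature IDs are MD5 sums of the representative sequence.
# Usage: seq_md5 <sequence>
seq_md5() {
  # printf '%s' adds no newline, which would otherwise change the digest.
  printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

seq_md5 "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG"
```

DADA2-style IDs are therefore 32-character hex strings, whereas Greengenes OTU IDs are numeric, which makes it easy to tell which kind of table you have.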

Let us know how this workflow works out for you!