PICRUST2 Input Data setting

Youngtae_Won · July 4, 2025, 4:37am

Hello, I am a researcher who is new to microbiome studies. I am reaching out to seek advice and insights from those with more experience regarding some questions I have about using PICRUST2.

I have been processing and analyzing fastq files with dada2, and I am now interested in performing additional functional prediction analysis using PICRUST2.

My main question concerns the input sequences for PICRUST2.
Should I provide all unique sequences from the original, unfiltered fastq files, or should I use the sequences that have been filtered and processed by dada2?

Additionally, I would like to know if there are any other tools, besides PICRUST2, that are useful for pathway or functional analysis, such as KEGG or GO analysis.

I am aware that I can find information on the QIIME2 forum, but as of posting this, I have not yet found the answers I am looking for.

I would greatly appreciate any advice or help from the community. Thank you very much.

colinbrislawn · July 8, 2025, 12:04am

For the input sequences, I would use the ASVs from DADA2 as described here: q2 picrust2 Tutorial · picrust/picrust2 Wiki · GitHub

And yes, there are other tools! See Figure 2 from the PICRUSt2 paper (PMC7365738):
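
To make "use the ASVs" a bit more concrete, here is a minimal Python sketch of turning per-sample ASV counts into the two files PICRUSt2 works from: a FASTA of ASV sequences and an ASV-by-sample count table. The function name `build_picrust2_inputs`, the `ASV1`, `ASV2`, ... IDs, the `#OTU ID` column header, and the toy counts are all assumptions for illustration, not something from the tutorial; in practice you would export these from your DADA2 sequence table.

```python
def build_picrust2_inputs(counts_by_sample, id_prefix="ASV"):
    """Build a FASTA string and a tab-delimited count table from ASV counts.

    counts_by_sample: dict mapping sample name -> {ASV sequence: read count}
    (a hypothetical in-memory stand-in for a DADA2 sequence table).
    """
    # Collect unique ASV sequences in first-seen order.
    seqs = []
    seen = set()
    for sample_counts in counts_by_sample.values():
        for seq in sample_counts:
            if seq not in seen:
                seen.add(seq)
                seqs.append(seq)

    # Give each sequence a short ID so the FASTA and the table share keys.
    ids = {seq: f"{id_prefix}{i}" for i, seq in enumerate(seqs, start=1)}

    # FASTA: one record per ASV.
    fasta = "".join(f">{ids[seq]}\n{seq}\n" for seq in seqs)

    # Table: ASVs as rows, samples as columns, missing counts filled with 0.
    samples = list(counts_by_sample)
    lines = ["\t".join(["#OTU ID"] + samples)]  # header name is a placeholder
    for seq in seqs:
        row = [ids[seq]] + [str(counts_by_sample[s].get(seq, 0)) for s in samples]
        lines.append("\t".join(row))
    table = "\n".join(lines) + "\n"
    return fasta, table


# Toy example with made-up sequences and counts.
toy_counts = {"S1": {"ACGT": 10}, "S2": {"TTGA": 5}}
fasta_text, table_text = build_picrust2_inputs(toy_counts)
print(fasta_text)
print(table_text)
```

The important part is that the IDs in the FASTA match the row IDs in the table, and that both come from the denoised DADA2 output rather than from the raw reads.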

Piphillin
GitHub - dmcskim/pyphillin: Functional profiling of 16S sequence data. Public, open source version of Piphillin.
PanFP
GitHub - srjun/panfp
Tax4Fun2
GitHub - fjossandon/Tax4Fun2

Remember that a 16S V4 amplicon contains only about 250 base pairs, which is not a lot of information: at 2 bits per base, that is only about 500 bits, or <0.1 kB, per ASV. It's miraculous that PICRUSt2 gets more than halfway to shotgun metagenomic sequencing from so little data.
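
The back-of-the-envelope numbers can be checked with a few lines of Python. This is only a rough upper bound that assumes each base is an independent, equally likely choice among A, C, G, and T (2 bits per base); real 16S sequences are highly conserved, so the usable information is lower. The function name `amplicon_information` is made up for this sketch.

```python
import math


def amplicon_information(length_bp, alphabet_size=4):
    """Upper bound on the information in a sequence, in bits and bytes.

    Assumes every position is an independent, uniform choice among
    `alphabet_size` symbols, i.e. log2(4) = 2 bits per base for DNA.
    """
    bits = length_bp * math.log2(alphabet_size)
    return bits, bits / 8


bits, n_bytes = amplicon_information(250)
print(f"{bits:.0f} bits = {n_bytes} bytes = {n_bytes / 1000} kB")
```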

EDIT: My friend and colleague David Toole did his dissertation on functional prediction. View the PDF here: https://ufdcimages.uflib.ufl.edu/UF/E0/05/61/89/00001/Toole_D.pdf