Q2-picrust2: recommended max NSTI value

picrust
(Jai Ram Rideout) #1

I’m using q2-picrust2’s custom-tree-pipeline as described in the q2-picrust2 tutorial, where the default max-nsti cutoff is 2. However, the Hidden state prediction page says “sequences with extremely high NSTI values (e.g. > 1) should be removed”, and further down on that page: “There is no clear cut-off for a high NSTI values, but a good rule of thumb is that sequences placed with NSTI > 0.15 will be less reliable.”

Is the default max-nsti=2 reasonable for human fecal samples, or is it better to use something more stringent as described on the HSP wiki page, such as max-nsti=1?

Thanks for your insight!


(Mehrbod Estaki) #2

I actually was questioning this as well last week and totally forgot to follow up on it. Thanks for the reminder!
Pinging q2-picrust2’s developer @gmdouglas for his input :crossed_fingers:


(Gavin Douglas) #3

Hey @jairideout and @Mehrbod_Estaki,

Sorry for the confusion - what was written on that page dates from before I tested a number of different NSTI cut-offs on different datasets. I’m now suggesting a max NSTI cut-off of 2, which should essentially just eliminate junk sequences that can’t be placed in the reference tree. I’ve changed that wiki page to reflect the difference.

That being said, the choice of max NSTI had little impact on the concordance between 16S-predicted functions and metagenomics-identified functions (except when the cut-off threw out >90% of ASVs). This was true even for environmental samples, so using either cut-off should have very little impact on the metagenome-wide predicted function abundances.
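If you want to check how many ASVs a given cut-off would throw out of your own data, something like the following rough sketch may help. It is not part of PICRUSt2 or q2-picrust2: the function name `filter_by_nsti` and the NSTI values are made up for illustration, and it assumes that an ASV is dropped only when its NSTI is strictly above the cut-off.

```python
from typing import Dict, List, Tuple


def filter_by_nsti(nsti_by_asv: Dict[str, float],
                   max_nsti: float) -> Tuple[List[str], float]:
    """Return (retained ASV IDs, fraction of ASVs discarded).

    Hypothetical helper: assumes ASVs with NSTI > max_nsti are removed
    and ASVs with NSTI <= max_nsti are kept.
    """
    total = len(nsti_by_asv)
    if total == 0:
        return [], 0.0
    kept = [asv for asv, nsti in nsti_by_asv.items() if nsti <= max_nsti]
    return kept, (total - len(kept)) / total


# Made-up NSTI values, for illustration only (not real results).
example_nsti = {
    "ASV1": 0.02,
    "ASV2": 0.11,
    "ASV3": 0.40,
    "ASV4": 1.30,
    "ASV5": 2.70,
}

# Compare the three cut-offs mentioned in this thread.
for cutoff in (2.0, 1.0, 0.15):
    kept, frac_discarded = filter_by_nsti(example_nsti, cutoff)
    print(f"max NSTI {cutoff}: kept {len(kept)} ASVs, "
          f"discarded {frac_discarded:.0%}")
```

With real data you would read the per-ASV NSTI values from your own output rather than typing them in. If a cut-off discards the vast majority of your ASVs, that is the situation where the predictions started to diverge in my tests.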

Gavin


(Jai Ram Rideout) #4

@gmdouglas Awesome, thanks for the details and for updating the docs!
