PICRUSt2: Removed 3,751 input sequences that aligned poorly to reference sequences

kkl45 · June 26, 2023, 12:35pm

I am trying to use my output from QIIME 2 in PICRUSt2.

In total I have around 9,000 input sequences.

Does having 3,751 sequences removed for aligning poorly mean I am doing something wrong, or is that level of filtering normal?

This is my first time using PICRUSt2, and I am wondering whether I am inputting the data incorrectly.

(picrust2) picrust2_out_pipeline % place_seqs.py -s ../seqs.fna -o out.tre -p 1 \
              --intermediate intermediate/place_seqs
**Warning - 3751 input sequences aligned poorly to reference sequences** (--min_align option specified a minimum proportion of 0.8 aligning to reference sequences). These input sequences will not be placed and will be excluded from downstream steps

crusher083 · June 26, 2023, 12:54pm

It depends on the input, the environment the samples came from, etc.
The warning means that this many of your sequences don't have a good reference within PICRUSt2. Be careful with interpretation, because roughly 40% of your sequences won't be used for predictions.
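To make the "roughly 40%" figure concrete, here is a minimal Python sketch of the arithmetic. The 3,751 comes from the warning above; the 9,000 is the approximate total quoted in the question, so the result is only approximate. The helper names `count_fasta_records` and `excluded_fraction` are made up for illustration and are not part of PICRUSt2.

```python
def count_fasta_records(lines):
    """Count sequences in FASTA-formatted text by counting header lines ('>')."""
    return sum(1 for line in lines if line.startswith(">"))


def excluded_fraction(n_excluded, n_input):
    """Fraction of input sequences that were excluded from placement."""
    if n_input <= 0:
        raise ValueError("n_input must be positive")
    return n_excluded / n_input


# Toy FASTA content (hypothetical), only to show how the input could be counted.
toy_fasta = [">ASV1", "ACGT", ">ASV2", "GGCC"]
print(count_fasta_records(toy_fasta))  # 2

# Numbers from this thread: 3,751 excluded out of roughly 9,000 input sequences.
print(f"{excluded_fraction(3751, 9000):.1%}")  # 41.7%
```

To get an exact figure, the same count could be run on the real `seqs.fna`, for example by passing an open file handle to `count_fasta_records`.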

Cheers
V

kkl45 · June 26, 2023, 4:45pm

I see. Is this range typical, or is it hard to say without more context? I used the following:

FASTQ reads were trimmed, and then filtered to remove reads containing Ns or with maximum expected errors >= 2.
For ASVs passing inference, chimeras were removed before taxonomic assignment.
Samples with fewer than 1,000 sequences were discarded.
ASVs accounting for less than one millionth of all pass-filter reads were discarded.

I'm wondering if there is any quality check I can do to ensure the samples are adequate?
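The last two filters in the list above can be sketched in plain Python as follows. This is only an illustration under assumptions, not the pipeline that was actually run: the nested-dict table layout, the function name `filter_table`, and the choice to compute the one-millionth threshold over the reads remaining after low-depth samples are dropped are all assumed here.

```python
def filter_table(table, min_sample_reads=1000, min_asv_fraction=1e-6):
    """Apply the two abundance filters to a table of {sample_id: {asv_id: count}}.

    Assumed order: drop low-depth samples first, then drop rare ASVs.
    """
    # Discard samples with fewer than min_sample_reads total sequences.
    kept = {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_sample_reads
    }

    # Total reads per ASV across the retained samples.
    asv_totals = {}
    for counts in kept.values():
        for asv, n in counts.items():
            asv_totals[asv] = asv_totals.get(asv, 0) + n
    grand_total = sum(asv_totals.values())

    # Keep ASVs that account for at least min_asv_fraction of all retained reads.
    keep_asvs = {
        asv
        for asv, n in asv_totals.items()
        if grand_total > 0 and n / grand_total >= min_asv_fraction
    }

    return {
        sample: {asv: n for asv, n in counts.items() if asv in keep_asvs}
        for sample, counts in kept.items()
    }


# Toy table (made-up counts): S2 is too shallow, and ASV "b" is below one millionth.
toy = {"S1": {"a": 1_999_999, "b": 1}, "S2": {"a": 500}}
filtered = filter_table(toy)
print(filtered)  # {'S1': {'a': 1999999}}
```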

gregcaporaso · June 26, 2023, 5:42pm

Hi @kkl45,
What type of environment are you sequencing? This seems high for human gut, but normal (at least as far as I know) for environmental samples. It may be worth checking with the PICRUSt folks though - they'll have a much better handle on this.

As for a sanity check, I would do your typical 16S analyses on the data that you're providing as input to PICRUSt (i.e., with the same filters, etc. applied) and make sure that the results align with any expectations you have about the samples.
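One additional check, offered as a suggestion rather than anything PICRUSt2 prescribes, is to ask what fraction of reads (not just of ASVs) sits in the excluded sequences, since many poorly aligning ASVs could be rare. A minimal Python sketch follows; it assumes you already have per-ASV read totals and the set of excluded ASV IDs. The function name and the toy numbers are made up for illustration.

```python
def read_fraction_lost(asv_read_counts, excluded_ids):
    """Fraction of all reads that belong to the excluded ASVs."""
    total = sum(asv_read_counts.values())
    if total == 0:
        raise ValueError("table contains no reads")
    lost = sum(n for asv, n in asv_read_counts.items() if asv in excluded_ids)
    return lost / total


# Toy data (hypothetical): half of the ASVs are excluded, but they hold few reads.
counts = {"ASV1": 9000, "ASV2": 800, "ASV3": 150, "ASV4": 50}
excluded = {"ASV3", "ASV4"}

asv_fraction = len(excluded) / len(counts)
read_fraction = read_fraction_lost(counts, excluded)
print(f"ASVs excluded:  {asv_fraction:.0%}")   # 50%
print(f"reads excluded: {read_fraction:.0%}")  # 2%
```

If the read-weighted fraction is also large, the predictions would be missing a substantial part of the community and should be interpreted with more caution.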

colinbrislawn · December 9, 2023, 4:36pm

An off-topic reply has been merged into an existing topic: What it means???? I am not sure about good or bad results??!!!

Please keep replies on-topic in the future.