I am trying to use my output from QIIME 2 in PICRUSt2.
In total I have around 9,000 input sequences.
Does having 3,751 of them removed for aligning poorly mean I am doing something wrong, or is that level of filtering normal?
This is my first time using PICRUSt2, and I am wondering whether I am inputting the data incorrectly.
(picrust2) picrust2_out_pipeline % place_seqs.py -s ../seqs.fna -o out.tre -p 1 \
--intermediate intermediate/place_seqs
**Warning - 3751 input sequences aligned poorly to reference sequences** (--min_align option specified a minimum proportion of 0.8 aligning to reference sequences). These input sequences will not be placed and will be excluded from downstream steps
It depends on the input, the environment the samples came from, etc.
The warning means that this many of your sequences don't have a good reference within PICRUSt2. Be careful with interpretation, because about 40% of your sequences (3,751 of roughly 9,000) won't be used for predictions.
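To judge how much the exclusion matters, it can help to look at the share of reads the dropped sequences carry, not only the share of sequences. Here is a minimal Python sketch of that calculation. The function name `excluded_fraction`, the toy counts, and the ID list are all hypothetical and not part of PICRUSt2 or QIIME 2; it assumes you have already collected per-sequence total read counts and the IDs of the poorly aligned sequences yourself.

```python
def excluded_fraction(total_counts, excluded_ids):
    """Return (fraction of sequences excluded, fraction of reads excluded).

    total_counts: dict mapping sequence ID -> total read count across samples.
    excluded_ids: iterable of sequence IDs that were not placed.
    Both inputs are assumed to be prepared by the user (hypothetical helper).
    """
    # Only count IDs that actually appear in the table.
    excluded = set(excluded_ids) & set(total_counts)
    n_total = len(total_counts)
    reads_total = sum(total_counts.values())
    if n_total == 0 or reads_total == 0:
        raise ValueError("the count table is empty")
    seq_frac = len(excluded) / n_total
    read_frac = sum(total_counts[s] for s in excluded) / reads_total
    return seq_frac, read_frac


# Toy data, purely illustrative: two of four sequences are excluded,
# but they carry only a small share of the reads.
counts = {"asv1": 900, "asv2": 50, "asv3": 30, "asv4": 20}
seq_frac, read_frac = excluded_fraction(counts, ["asv3", "asv4"])
print(f"sequences excluded: {seq_frac:.1%}, reads excluded: {read_frac:.1%}")
```

If the excluded sequences turn out to be mostly low-abundance, the loss may matter less than the raw sequence count suggests; if they carry a large share of the reads, the predictions cover much less of the community.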
Hi @kkl45,
What type of environment are you sequencing? This seems high for human gut, but normal (at least as far as I know) for environmental samples. It may be worth checking with the PICRUSt folks though; they'll have a much better handle on this.
As a sanity check, I would run your typical 16S analyses on the same data that you're providing as input to PICRUSt (i.e., with the same filters applied) and make sure the results align with any expectations you have about the samples.