I am trying to use my output from QIIME 2 in PICRUSt2.
In total I have around 9,000 input sequences.
Does having 3,751 of them removed for aligning poorly mean I am doing something wrong, or is that level of filtering normal?
This is my first time using PICRUSt2, and I am wondering if I am inputting the data incorrectly.
(picrust2) picrust2_out_pipeline % place_seqs.py -s ../seqs.fna -o out.tre -p 1 \
**Warning - 3751 input sequences aligned poorly to reference sequences** (--min_align option specified a minimum proportion of 0.8 aligning to reference sequences). These input sequences will not be placed and will be excluded from downstream steps
It depends on the input, the environment the samples came from, and so on.
The warning means that this many of your sequences don't have a good reference within PICRUSt2. Be careful with interpretation, because roughly 40% of your input sequences (3,751 of about 9,000) won't be used for predictions.
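To put a number on that, here is a minimal Python sketch that compares the fraction of sequences excluded with the fraction of reads they represent. The function name `excluded_fractions`, the ASV IDs, and the counts are all hypothetical; only 3,751 and the approximate total of 9,000 come from this thread. The two fractions can differ a lot, since poorly aligning sequences may be rare ones.

```python
def excluded_fractions(asv_counts, excluded_ids):
    """Return (fraction of ASVs excluded, fraction of reads excluded).

    asv_counts   -- dict mapping ASV ID -> total read count (hypothetical input)
    excluded_ids -- set of ASV IDs that were not placed (hypothetical input)
    """
    total_asvs = len(asv_counts)
    total_reads = sum(asv_counts.values())
    excluded = [asv for asv in asv_counts if asv in excluded_ids]
    frac_asvs = len(excluded) / total_asvs
    frac_reads = sum(asv_counts[asv] for asv in excluded) / total_reads
    return frac_asvs, frac_reads


# Toy data, invented for illustration only.
asv_counts = {"asv1": 900, "asv2": 50, "asv3": 30, "asv4": 20}
excluded_ids = {"asv3", "asv4"}

frac_asvs, frac_reads = excluded_fractions(asv_counts, excluded_ids)
print(f"{frac_asvs:.0%} of ASVs excluded, {frac_reads:.0%} of reads excluded")

# The numbers from this thread, taking "around 9,000" as exactly 9,000.
thread_fraction = 3751 / 9000
print(f"{thread_fraction:.1%} of input sequences excluded")
```

In the toy data, half of the ASVs are excluded but they carry only a small share of the reads, so the read-weighted fraction is arguably the more useful one to look at before interpreting predictions.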
I see. Is this range typical, or is it hard to say without more context? I used the following:
- FASTQ reads were trimmed, and then filtered to remove reads containing Ns, or with maximum [expected errors] >=2.
- For ASVs passing inference, chimeras were removed before taxonomic assignment.
- Samples with fewer than 1,000 sequences were discarded.
- ASVs accounting for less than one millionth of all pass-filter reads were discarded.
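
For reference, the last two filters in the list above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the function name `filter_table`, the nested-dict table layout, and the order of the two steps (samples first, then ASVs, with "pass-filter reads" taken to mean the reads left after the sample filter) are not stated in this thread, and the actual pipeline may differ.

```python
def filter_table(table, min_sample_reads=1000, min_asv_fraction=1e-6):
    """Apply a sample-depth filter, then a rare-ASV filter.

    table -- dict mapping sample ID -> {ASV ID: read count} (hypothetical layout)
    """
    # Step 1 (assumed to come first): drop samples with too few reads.
    kept = {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_sample_reads
    }

    # Step 2: total reads per ASV across the remaining samples.
    asv_totals = {}
    for counts in kept.values():
        for asv, n in counts.items():
            asv_totals[asv] = asv_totals.get(asv, 0) + n
    grand_total = sum(asv_totals.values())

    # Keep ASVs that make up at least min_asv_fraction of all remaining reads.
    keep_asvs = {
        asv
        for asv, n in asv_totals.items()
        if grand_total > 0 and n / grand_total >= min_asv_fraction
    }
    return {
        sample: {asv: n for asv, n in counts.items() if asv in keep_asvs}
        for sample, counts in kept.items()
    }


# Toy table, invented for illustration only.
table = {
    "s1": {"a": 1_500_000, "b": 1},
    "s2": {"a": 500_000, "c": 600},
    "s3": {"a": 900, "b": 50},  # 950 reads in total, below the 1,000 cutoff
}
filtered = filter_table(table)
print(filtered)
```

Running the same filters on the table used for ordinary 16S analysis and on the table given to PICRUSt2 keeps the two comparable.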
I'm wondering if there is any quality check I can do to ensure the samples are adequate.
What type of environment are you sequencing? This seems high for human gut, but normal (at least as far as I know) for environmental samples. It may be worth checking with the PICRUSt folks though - they'll have a much better handle on this.
As for a sanity check, I would do your typical 16S analyses on the data that you're providing as input to PICRUSt (i.e., with the same filters applied) and make sure that the results align with any expectations you have about the samples.