Dear QIIME 2 community,
I used to remove likely non-prokaryotic sequences by running qiime quality-control exclude-seqs against the old Greengenes database (OTUs clustered at 88% identity), as discussed in the thread "16S V3-V4 length: what to do with the sequences of shorter than expected length after DADA2?" (post #3 by Mehrbod_Estaki). I used the following command:
qiime quality-control exclude-seqs \
  --i-query-sequences rep-seqs-merged.qza \
  --i-reference-sequences 88_otus.qza \
  --p-method vsearch \
  --p-perc-identity 0.65 \
  --p-perc-query-aligned 0.5 \
  --p-threads 8 \
  --o-sequence-hits hits.qza \
  --o-sequence-misses misses.qza
This filtering worked well: it discarded ASVs that were shorter than expected and that had strange BLAST hits or no BLAST hits at all.
I am now trying to do the same step with Greengenes2, using this command:
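To make the two thresholds concrete, here is a minimal Python sketch of the hit/miss decision as described by the parameter names: a query counts as a hit only if its best alignment reaches at least `--p-perc-identity` identity and covers at least `--p-perc-query-aligned` of the query. This is a simplification, not the plugin's code. The `Alignment` record and its fields are hypothetical, and vsearch's own identity definition (how gaps and terminal gaps are counted) differs from the plain ratio used here. With bars this low, an ASV ends up in misses.qza only if *no* reference sequence clears both, so the size and diversity of the reference can matter.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Best alignment of one query ASV against a reference (hypothetical record)."""
    query_id: str
    query_len: int    # length of the ASV
    aligned_len: int  # query bases covered by the alignment (0 = no alignment)
    matches: int      # identical positions within the alignment


def is_hit(aln, perc_identity=0.65, perc_query_aligned=0.5):
    """Accept a query only if it is both similar enough and covered enough.

    Simplified identity = matches / aligned length; vsearch computes
    identity differently, so treat this as an approximation.
    """
    if aln.aligned_len == 0:
        return False  # nothing aligned at all -> always a miss
    identity = aln.matches / aln.aligned_len
    coverage = aln.aligned_len / aln.query_len
    return identity >= perc_identity and coverage >= perc_query_aligned


def split_hits_misses(alignments, **thresholds):
    """Partition query IDs into (hits, misses), preserving input order."""
    hits, misses = [], []
    for aln in alignments:
        (hits if is_hit(aln, **thresholds) else misses).append(aln.query_id)
    return hits, misses


# Illustrative (made-up) alignments for three 420 bp ASVs:
# asv1: high identity, full coverage      -> hit
# asv2: high identity, only ~36% covered  -> miss (coverage below 0.5)
# asv3: 71% covered, but only 60% identity -> miss (identity below 0.65)
example = [
    Alignment("asv1", 420, 420, 400),
    Alignment("asv2", 420, 150, 140),
    Alignment("asv3", 420, 300, 180),
]
hits, misses = split_hits_misses(example)
```

Lowering `perc_query_aligned` to 0.3 in this toy example moves `asv2` from misses to hits, which shows how a short ASV can pass once some reference sequence aligns over enough of it.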
qiime quality-control exclude-seqs \
  --i-query-sequences rep-seqs-merged.qza \
  --i-reference-sequences 2024.09.backbone.full-length.fna.qza \
  --p-method vsearch \
  --p-perc-identity 0.65 \
  --p-perc-query-aligned 0.5 \
  --p-threads 8 \
  --o-sequence-hits hits.qza \
  --o-sequence-misses misses.qza
However, misses.qza now contains almost nothing. I am afraid that sequences which did not align to the previous version of Greengenes now do align, even though they are non-prokaryotic.
Similarly, when I use the RESCRIPt-curated SILVA 138.2 reference, I get only half as many misses as with the old Greengenes reference.
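One way to see which ASVs changed status is to compare the IDs in the misses from each run. Both runs use the same rep-seqs-merged.qza, so the feature IDs should be identical across runs. The sketch below assumes each misses.qza was exported with `qiime tools export` into a FASTA file (for example `misses_gg88/dna-sequences.fasta`; the directory names are hypothetical). The ASVs missed with the old reference but not with the new one are the candidates to BLAST manually.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines.

    Accepts any iterable of lines, e.g. an open file handle.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip empty headers instead of crashing
                ids.add(tokens[0])
    return ids


def compare_misses(old_ids, new_ids):
    """Split miss IDs from two references into overlap and differences."""
    return {
        "missed_by_both": old_ids & new_ids,
        # Missed with the old reference, but now counted as hits:
        "missed_only_old": old_ids - new_ids,
        # Missed only with the new reference:
        "missed_only_new": new_ids - old_ids,
    }


# Example usage (hypothetical paths after `qiime tools export`):
# with open("misses_gg88/dna-sequences.fasta") as f_old, \
#      open("misses_gg2/dna-sequences.fasta") as f_new:
#     result = compare_misses(fasta_ids(f_old), fasta_ids(f_new))
#     print(sorted(result["missed_only_old"]))
```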
Any idea what could be going on?
Thanks a lot for your help!