Hi @Lu_Yang,
Sorry to hear QIIME2 has been giving you trouble! This workflow looks fine to me (and it sounds
like you are following this tutorial) — but @wasade may be able to give better guidance on deblur inputs.
Is this the full file? What is the total sequence count vs. the total number of demultiplexed sequences? Sharing your fj-joined-demux.qzv, deblur-stats.qzv, and table.qzv could help us assess this issue more fully.
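In case it is useful, those visualizations can be regenerated from the corresponding artifacts. A minimal sketch, assuming artifact names that match the tutorial (adjust to your actual filenames; per-sample deblur stats also require that denoise-16S was run with --p-sample-stats):

```
# Summarize the joined reads (per-sample sequence counts + quality plots)
qiime demux summarize \
  --i-data fj-joined-demux.qza \
  --o-visualization fj-joined-demux.qzv

# Per-sample deblur statistics
qiime deblur visualize-stats \
  --i-deblur-stats deblur-stats.qza \
  --o-visualization deblur-stats.qzv

# Feature table summary (total frequency, features per sample)
qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv
```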
Why not take advantage of the read merging performed in deblur/dada2 within QIIME2? It does not sound like FLASH performs any additional quality checks, but I wonder if it is somehow interfering (e.g., the alignments are not quite right, causing sequences to look like chimeras). I'm not asking you to defend your approach (every user has their own needs before bringing their data into QIIME2), nor am I trying to push a particular method, but it might be worth comparing your current workflow against importing the non-merged reads and processing them with dada2 or deblur (either as PairedEndSequencesWithQuality, or as JoinedSequencesWithQuality into deblur following the tutorial linked above), just to make sure that FLASH isn't causing some sort of incompatibility. Just a "sanity check" worth exploring; see the sketch below.
The final possibility is just that the sequences are very noisy and deblur is working as intended. Are they 16S sequences? What types of samples? @wasade may have more insight on what could be causing large numbers of sequences to be filtered.
(You could also try OTU picking in QIIME2 for comparison; this does not perform any denoising, so it should deliver a higher yield of sequences per sample, but then you must ask yourself: do I trust these data? Are there peculiarities about your data, e.g., a novel marker gene, that lead you to believe that denoising methods are not optimal for your use case?)
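If you do go that route, a de novo clustering sketch with q2-vsearch might look like the following (filenames are placeholders; the input is assumed to be your joined, quality-filtered reads):

```
# Dereplicate the joined, quality-filtered reads, then cluster de novo at 97%
qiime vsearch dereplicate-sequences \
  --i-sequences demux-joined-filtered.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-otu97.qza \
  --o-clustered-sequences rep-seqs-otu97.qza
```

Comparing per-sample yields between this table and your deblur table may help show whether the losses are specific to the denoising step.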
Thanks for your patience!