Too many ASVs in my dataset

Dear all,
After processing the data with DADA2, I found that the number of sequences per sample ranged from 30,000 to 400,000.

But the sequencing company said it was normal.
Then I filtered the dataset to keep only bacteria:

# filter table
qiime taxa filter-table \
  --i-table ../table-dada2-3.qza \
  --i-taxonomy ../taxonomy-3.qza \
  --p-include d__Bacteria \
  --o-filtered-table Bacteria-dada2-table.qza

# filter sequences
qiime taxa filter-seqs \
  --i-sequences ../rep-seq-dada2-3.qza \
  --i-taxonomy ../taxonomy-3.qza \
  --p-include d__Bacteria \
  --o-filtered-sequences Bacteria-sequences.qza

qiime feature-table summarize \
  --i-table Bacteria-dada2-table.qza \
  --o-visualization Bacteria-dada2-table.qzv \
  --m-sample-metadata-file ../metadata_qiime2.txt

Results (Bacteria-dada2-table.qzv):


Then I aligned the sequences and built the tree:

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences Bacteria-sequences.qza \
  --p-n-threads auto \
  --o-alignment aligned-Bacteria-rep-seqs.qza \
  --o-masked-alignment masked-aligned-Bacteria-rep-seqs.qza \
  --o-tree Bacteria-unrooted-tree.qza \
  --o-rooted-tree Bacteria-rooted-tree.qza


This command failed, and from what I found online, the error may be caused by having too many ASVs.

So, what should I do with my data after DADA2?
Should I rarefy, or should I first filter out low-frequency features before rarefying to reduce the number of ASVs?

Filtering out features with low overall counts sounds like a good idea to reduce memory requirements for the job.
So you can filter out features whose total count is below a chosen threshold (50, 100, or another number), filter the sequences to match the filtered feature table, and then align them. In my experience, this reduces memory requirements and speeds up the process.
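In QIIME 2 itself, these two filtering steps correspond to `qiime feature-table filter-features` (threshold via `--p-min-frequency`) and `qiime feature-table filter-seqs` (passing the filtered table as `--i-table`), after which the filtered sequences go into `align-to-tree-mafft-fasttree`. The sketch below only illustrates the logic on plain-text exports so it can be tried without QIIME 2. It assumes a tab-separated feature table in the layout `biom convert --to-tsv` writes and a FASTA file of representative sequences; the function names, file names, and toy counts are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (not QIIME 2 code): drop features whose total count across
# all samples is below a threshold, then keep only the matching sequences.
# Assumed input: a tab-separated table whose comment/header lines start with
# '#', followed by one row per feature ID and its per-sample counts.

# filter_table_by_total TABLE MIN -> filtered table on stdout (hypothetical name)
filter_table_by_total() {
  awk -v min="$2" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                      # keep comment/header lines
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i   # sum counts over all samples
      if (total >= min + 0) print             # keep features with total >= MIN
    }' "$1"
}

# filter_fasta_by_table TABLE FASTA -> FASTA records whose IDs remain in TABLE
filter_fasta_by_table() {
  awk 'BEGIN { FS = "\t" }
    NR == FNR { if ($0 !~ /^#/) keep[$1] = 1; next }   # pass 1: kept IDs
    /^>/ { id = substr($0, 2); sub(/[ \t].*/, "", id); show = (id in keep) }
    show' "$1" "$2"
}

# Toy example with hypothetical file names.
dir=$(mktemp -d)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\nASV1\t40\t30\nASV2\t10\t5\nASV3\t0\t60\n' > "$dir/table.tsv"
printf '>ASV1\nACGT\n>ASV2\nGGCC\n>ASV3\nTTAA\n' > "$dir/seqs.fasta"

filter_table_by_total "$dir/table.tsv" 50 > "$dir/filtered.tsv"
# Prints the ASV1 and ASV3 records; ASV2 (total 15) falls below 50.
filter_fasta_by_table "$dir/filtered.tsv" "$dir/seqs.fasta"
```

The sequence step always follows whatever IDs remain in the filtered table, which is why the table is filtered first and the sequences second; changing 50 to 100 only changes which features survive.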

Do I need to rarefy after filtering?

Not necessarily. The core-metrics pipelines require rarefied data, but rarefaction is already built into them. You don't need to rarefy for other plugins unless the plugin's options call for it.
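For reference, the core-metrics pipelines (e.g. `qiime diversity core-metrics-phylogenetic`) take the rarefaction depth as `--p-sampling-depth`, and samples with fewer reads than that depth are left out. To show what rarefying does, here is a minimal shell/awk sketch: each sample is subsampled without replacement down to a fixed depth using selection sampling, and samples below the depth are dropped. It is an illustration under those assumptions, not QIIME 2's implementation; the function name, seed, and toy table are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of rarefaction (not QIIME 2's implementation): subsample each
# sample's reads without replacement down to DEPTH and drop samples with fewer
# than DEPTH reads. Selection sampling: each read is kept with probability
# (reads still needed) / (reads not yet examined).
# Assumed input: tab-separated table with a '#OTU ID' header line.

# rarefy_table TABLE DEPTH SEED -> rarefied table on stdout (hypothetical name)
rarefy_table() {
  awk -v depth="$2" -v seed="$3" 'BEGIN { FS = OFS = "\t"; srand(seed + 0) }
    /^#OTU ID/ { hdr = $0; next }             # column names, printed in END
    /^#/ { print; next }                      # other comment lines pass through
    {
      n++; id[n] = $1
      for (j = 2; j <= NF; j++) { c[n, j] = $j + 0; tot[j] += $j }
      if (NF > ncol) ncol = NF
    }
    END {
      split(hdr, name, "\t")
      line = name[1]
      for (j = 2; j <= ncol; j++)             # keep samples with >= DEPTH reads
        if (tot[j] >= depth + 0) { keepc[j] = 1; line = line OFS name[j] }
      print line
      for (j = 2; j <= ncol; j++) {
        if (!(j in keepc)) continue
        need = depth + 0; left = tot[j]
        for (i = 1; i <= n; i++) {
          r[i, j] = 0
          for (k = 0; k < c[i, j]; k++) {     # walk this feature reads one by one
            if (rand() < need / left) { r[i, j]++; need-- }
            left--
          }
        }
      }
      for (i = 1; i <= n; i++) {
        line = id[i]
        for (j = 2; j <= ncol; j++) if (j in keepc) line = line OFS r[i, j]
        print line
      }
    }' "$1"
}

dir=$(mktemp -d)
printf '#OTU ID\tS1\tS2\tS3\nASV1\t40\t30\t1\nASV2\t10\t5\t2\nASV3\t0\t60\t3\n' > "$dir/table.tsv"
# Rarefy to 20 reads per sample; S3 (6 reads in total) is dropped.
rarefy_table "$dir/table.tsv" 20 42
```

Because the keep probability reaches 1 exactly when the reads still needed equal the reads left, every retained sample ends up with exactly DEPTH reads; only which features they come from varies with the seed.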

How do I determine the filter criteria?


Should I decide based on this figure? My goal is to study bacteria. Do I need to filter the data down to bacteria first and then filter out the low-quality features?

I would do it in that order.
For bacteria, I usually filter out all features with absolute counts below 50, but you can also filter based on relative abundance (e.g., below 1% or 0.1%) and on prevalence.
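A relative-abundance threshold is usually applied per sample and combined with prevalence: keep a feature only if it reaches the threshold in at least some fraction of samples. As far as I know, recent QIIME 2 releases expose this rule as `qiime feature-table filter-features-conditionally` with `--p-abundance` and `--p-prevalence` (check `qiime feature-table --help` for your version), and prevalence alone can be set with `--p-min-samples` on `filter-features`. The sketch below implements that rule in shell/awk on a tab-separated table; the thresholds, function name, and toy counts are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (not QIIME 2 code) of a relative-abundance + prevalence
# filter: keep a feature if its relative abundance within a sample is at
# least MIN_REL in at least a MIN_PREV fraction of samples. Both thresholds
# are fractions (0.001 = 0.1%). Assumed input: tab-separated table with
# '#'-prefixed comment/header lines and one row per feature.

# filter_by_abundance_prevalence TABLE MIN_REL MIN_PREV (hypothetical name)
filter_by_abundance_prevalence() {
  awk -v minrel="$2" -v minprev="$3" 'BEGIN { FS = OFS = "\t" }
    NR == FNR {                               # pass 1: per-sample totals
      if ($0 !~ /^#/) {
        for (j = 2; j <= NF; j++) tot[j] += $j
        if (NF - 1 > nsamp) nsamp = NF - 1
      }
      next
    }
    /^#/ { print; next }                      # pass 2: keep header lines
    {
      hits = 0                                # samples where feature is abundant
      for (j = 2; j <= NF; j++)
        if (tot[j] > 0 && $j / tot[j] >= minrel + 0) hits++
      if (hits >= (minprev + 0) * nsamp) print
    }' "$1" "$1"
}

dir=$(mktemp -d)
printf '#OTU ID\tS1\tS2\tS3\nASV1\t90\t50\t0\nASV2\t9\t45\t99\nASV3\t1\t5\t1\n' > "$dir/table.tsv"
# Keep features at >= 10% relative abundance in at least half of the samples.
filter_by_abundance_prevalence "$dir/table.tsv" 0.1 0.5
```

With these toy counts (each sample totals 100 reads), ASV1 and ASV2 are kept because each reaches 10% in two of three samples, while ASV3 is removed; raising MIN_PREV to 1 would remove all three.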
