Filtering taxa/samples and re-running alpha and beta diversity

Hello, I followed your tutorial and analyzed my data. I am studying the bacteria associated with a forest tree and a beetle, using qiime2-2020.2.
After reviewing the taxa bar plot and the data structure (mitochondria, unnecessary samples…), I decided to clean my table.qza and rep-seqs.qza (commands below) so that I can run a new analysis on the filtered data, starting from the ‘tree generation for phylogenetic diversity analysis’ step.

Input: Feature table tableJ.qza and Representative sequences rep-seqsJ.qza

Filtering out unnecessary samples

qiime feature-table filter-samples \
  --i-table tableJ.qza \
  --m-metadata-file metadata-filtered.tsv \
  --o-filtered-table tableJ_filter.qza

Filtering out all organisms/organelles other than bacteria

qiime taxa filter-table \
  --i-table tableJ_filter.qza \
  --i-taxonomy taxonomy_SILVA_138J.qza \
  --p-exclude mitochondria,chloroplast,archaea,eukaryota \
  --o-filtered-table tableJ_idfilter_138.qza

Filtering the sequences

qiime feature-table filter-seqs \
  --i-data rep-seqsJ.qza \
  --i-table tableJ_idfilter_138.qza \
  --o-filtered-data rep-seqsJ_idfilter_138m2.qza

One difference I noticed is that my new feature table “tableJ_idfilter_138.qza” is not interactive, so I cannot directly assess the sampling depth (see screenshot).
[Screenshot: Feature table filtered]

My question is whether I should now set a low sampling depth (e.g. 100) in qiime diversity core-metrics-phylogenetic, since I want to keep all the samples. In the first analysis I set the sampling depth to 1164, and I am afraid that if I use this value now, some samples will be discarded.
Do you usually proceed in this way to analyze the alpha and beta diversity of only bacteria, without being biased by other organisms/organelles?


Welcome, @snones!

Sounds exciting!

Sounds good! A common problem, especially when looking at host-associated communities, and you are taking the right approach.

You should use qiime feature-table summarize to get that information. It looks like you used qiime metadata tabulate, which shows you everything (too much in this case; you need a summary).

Use the summary visualization to select an appropriate sampling depth based on the filtered data. You could also use alpha rarefaction or another method to select an appropriate depth.
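As a rough sketch of how the summary can guide that choice: the per-sample frequencies can be saved from the summary as a CSV, and a few lines of shell then show how many samples each candidate depth would keep. The file name `sample-frequency-detail.csv`, the sample IDs, the counts, and the helper `samples_retained` are invented placeholders, not your data. The sketch assumes a sample is kept when its total frequency is at least the sampling depth.

```shell
# Per-sample totals come from the summary of the filtered table, e.g.:
#   qiime feature-table summarize \
#     --i-table tableJ_idfilter_138.qza \
#     --m-sample-metadata-file metadata-filtered.tsv \
#     --o-visualization tableJ_idfilter_138.qzv
# Hypothetical two-column CSV (sample ID, total frequency) with invented counts.
cat > sample-frequency-detail.csv <<'EOF'
S1,5230
S2,1164
S3,980
S4,15012
S5,87
EOF

# Count the samples whose total frequency is at least the given depth,
# i.e. the samples that would survive rarefaction at that depth.
# Usage: samples_retained DEPTH CSV_FILE
samples_retained() {
  awk -F, -v depth="$1" '$2 + 0 >= depth { n++ } END { print n + 0 }' "$2"
}

for depth in 100 980 1164; do
  echo "depth $depth keeps $(samples_retained "$depth" sample-frequency-detail.csv) of 5 samples"
done
```

With these invented counts, a depth of 1164 keeps 3 of the 5 samples, while 100 and 980 both keep 4. If your downloaded CSV starts with a header line, drop it first (for example with `tail -n +2`).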

Yes, by filtering the way you have done. I would then select a sampling depth in the same way (as I’ve described above), whether or not I had done such filtering.
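For reference, re-running the downstream steps on the filtered artifacts might look like the sketch below. The input file names are the ones from your post. The output names (`rooted-tree-filtered.qza`, `core-metrics-filtered`, and so on), the `run` and `rerun_diversity` helpers, and the `DRY_RUN` switch are my own placeholders, and the depth is whatever you choose from the summary.

```shell
# Print the command instead of running it when DRY_RUN=1
# (handy for checking the flags before launching a long job).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: rerun_diversity SAMPLING_DEPTH
rerun_diversity() {
  depth="$1"

  # Rebuild the tree from the filtered representative sequences.
  run qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences rep-seqsJ_idfilter_138m2.qza \
    --o-alignment aligned-rep-seqs-filtered.qza \
    --o-masked-alignment masked-aligned-rep-seqs-filtered.qza \
    --o-tree unrooted-tree-filtered.qza \
    --o-rooted-tree rooted-tree-filtered.qza || return 1

  # Alpha and beta diversity on the filtered table at the chosen depth.
  run qiime diversity core-metrics-phylogenetic \
    --i-phylogeny rooted-tree-filtered.qza \
    --i-table tableJ_idfilter_138.qza \
    --p-sampling-depth "$depth" \
    --m-metadata-file metadata-filtered.tsv \
    --output-dir core-metrics-filtered
}
```

Calling `DRY_RUN=1 rerun_diversity <depth>` only prints the two commands; without `DRY_RUN` it runs them, which requires an activated QIIME 2 environment.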

Good luck!

Thank you very much for the fast response and for the feedback @Nicholas_Bokulich! :smiley:

I have another question. Once the alpha and beta diversity analysis on the filtered data (no mitochondria, unwanted samples…) is done, repeating the taxonomy step (e.g. classification) on the new filtered sequences is completely unnecessary, right?

You are correct, re-classifying the taxonomy is totally unnecessary. This is because you have done nothing to the sequences themselves; you have only removed features that are not wanted. The original taxonomy remains valid and usable (and should not change even if you did reclassify the filtered sequences).

Good luck!

Thank you very much once again!
