Filtering part of the taxa present in PCR-blanks from the FeatureTable

yanxianl · November 22, 2017, 9:46pm

You're right! The blank-library-sequences.qza is empty! I've done what you suggested and it's working now.

There's one more problem, though.

As discussed in another thread, filtering out every sequence found in the negative controls is not a good idea, as some of them may come from cross-contamination by biological samples. As such, it's best to carefully check the sequences present in the negative controls before filtering.

Initially, I decided to filter the following 4 taxa based on their prevalence in the negative controls, mock and biological samples:

Yet, when checking blank-library-contaminant-sequences.qzv (333.2 KB), I found 68 features assigned to these 4 obvious contaminant taxa. In particular, some features assigned to Halomonadaceae were found in samples but not in the negative controls, indicating that I've filtered out real features from the samples.

Therefore, filtering contaminant sequences by feature ID is probably a better approach. I tried to get a feature table file (CSV or TSV) showing the count of each feature in each sample, like the taxonomy table (DADA2-level-7.txt (155.2 KB)) that can be downloaded from the taxa-bar-plots visualization. However, the 'Frequency per feature detail.csv' file only gives the total count of each feature. How can I get the distribution of features across samples?
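
To illustrate what I'm after, here is a rough Python sketch of what I would do once I have such a per-sample table. It assumes the table is available as TSV in the layout that `biom convert --to-tsv` writes (a comment line, then a `#OTU ID` header row with one column per sample). The feature IDs, sample names, and counts below are made up, and the helper functions are my own, not part of QIIME 2.

```python
import csv
import io

# Hypothetical per-sample feature table in the TSV layout written by
# `biom convert --to-tsv`; feature IDs, sample names, and counts are invented.
EXAMPLE_TSV = """# Constructed from biom file
#OTU ID\tblank-1\tmock-1\tsample-1\tsample-2
feat-a\t120\t0\t0\t0
feat-b\t15\t0\t300\t250
feat-c\t0\t0\t80\t0
"""

# Assumed names of the PCR blanks; replace with the real sample IDs.
NEGATIVE_CONTROLS = {"blank-1"}


def read_feature_table(handle):
    """Return {feature_id: {sample_id: count}} from a BIOM-style TSV."""
    table = {}
    samples = None
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # The header is the commented row that also lists the sample IDs.
            if len(row) > 1:
                samples = row[1:]
            continue
        if samples is None:
            raise ValueError("no header row with sample IDs found")
        table[row[0]] = {s: float(c) for s, c in zip(samples, row[1:])}
    return table


def features_only_in_controls(table, controls):
    """Feature IDs with reads in at least one control and in no other sample."""
    hits = []
    for feature_id, counts in table.items():
        in_controls = any(counts.get(s, 0) > 0 for s in controls)
        elsewhere = any(c > 0 for s, c in counts.items() if s not in controls)
        if in_controls and not elsewhere:
            hits.append(feature_id)
    return hits


table = read_feature_table(io.StringIO(EXAMPLE_TSV))
candidates = features_only_in_controls(table, NEGATIVE_CONTROLS)
print(candidates)
```

In this toy table only `feat-a` is flagged. `feat-b` also shows up in the blank, but it is abundant in the biological samples, so it looks more like cross-contamination and would need a manual check rather than automatic removal.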