Issue filtering multiple specific taxa out of feature table

Hi all,

My colleague @Lauren and I are trying to filter a list of 25 specific contaminants out of our feature table in order to then do further analysis without them being included. We would like to target each contaminant down to the species level.

Before trying to filter out all 25 at once, I tried to filter out just 3 contaminants as a test run. The code is as follows:

#!/bin/bash
#SBATCH -c 5
#SBATCH -t 24:00:00
#SBATCH --mail-type=all
#SBATCH [email protected]
#SBATCH --job-name="test filtering contam"
#SBATCH --mem=5G
#SBATCH -p parallel
#SBATCH --array=1-6

# configure
source ~/.bashrc
source activate qiime2-2020.6

# array for different regions ${SLURM_ARRAY_TASK_ID} goes from 1 to 6
declare -a arr=("v2" "v3" "v4" "v67" "v8" "v9")
#TASKID=1
REGION=${arr[${SLURM_ARRAY_TASK_ID}-1]}
echo "$REGION"
mkdir -p $outDir/$REGION

qiime taxa filter-table \
  --i-table ../analysis/P05-clust-tree-cm-goods/$REGION/table.qza \
  --i-taxonomy ../analysis/P05-clust-tree-cm-goods/$REGION/tax-class.qza \
  --p-mode exact \
  --p-exclude "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned","k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum","k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned" \
  --o-filtered-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza \
  --verbose

qiime feature-table summarize \
  --i-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza \
  --o-visualization ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qzv \
  --m-sample-metadata-file ../data/atcc-metadata.tsv

To reiterate, the 3 contaminants we tried to filter out in this test run were:

  1. k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned
  2. k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum
  3. k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned

They were only present in 1 of our 5 samples in the V67 region. The code ran with no errors, but when I compared the filtered V67 feature table to the original V67 feature table, only the 2nd contaminant in the list was filtered!

I verified which taxon was eliminated by finding the ID of the one feature that was removed in the Feature Detail tab, and then looking up its taxonomic assignment in one of our other files.
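
For anyone repeating this check, the comparison can be scripted. This is only a minimal sketch, assuming the feature IDs of both tables and the taxonomy have already been exported to plain-text files; the file names, feature IDs and shortened taxonomy strings below are invented for illustration:

```shell
#!/bin/bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical stand-ins for the exported files (IDs and taxa are invented).
printf '%s\n' featA featB featC > "$tmpdir/original-ids.txt"
printf '%s\n' featC featA       > "$tmpdir/filtered-ids.txt"
printf 'Feature ID\tTaxon\n'                  >  "$tmpdir/taxonomy.tsv"
printf 'featA\tk__Bacteria; g__Example_one\n' >> "$tmpdir/taxonomy.tsv"
printf 'featB\tk__Bacteria; g__Example_two\n' >> "$tmpdir/taxonomy.tsv"
printf 'featC\tk__Bacteria; g__Example_three\n' >> "$tmpdir/taxonomy.tsv"

# IDs present in the original table but missing from the filtered one.
removed=$(comm -23 <(sort "$tmpdir/original-ids.txt") <(sort "$tmpdir/filtered-ids.txt"))

# Look up the taxonomy of each removed feature.
while read -r id; do
  awk -F'\t' -v id="$id" '$1 == id { print $1 ": " $2 }' "$tmpdir/taxonomy.tsv"
done <<< "$removed"
```

With the invented data above, only featB is reported as removed.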

Any ideas as to how we can fix this issue and eventually filter out all 25 contaminants?

Thanks,
Carli

Hi!

Double-check the taxonomy strings for misspellings. In exact mode, even a single wrong character means that taxon will not match and will not be filtered from the table.

Alternatively, you can use https://docs.qiime2.org/2020.11/plugins/available/feature-table/filter-features/ to filter the table based on the number of samples in which a feature was found.
If your table contains several regions, you can split it by region, filter each part, and merge the parts back together.
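
A rough sketch of the sample-count approach, using the V67 table path from the script above. The threshold of 2 and the output file name are assumptions for illustration. Note that this removes every feature seen in fewer than 2 samples, not only the contaminants. The snippet prints the command as a dry run rather than executing it:

```shell
#!/bin/bash
set -euo pipefail

REGION="v67"
TABLE="../analysis/P05-clust-tree-cm-goods/$REGION/table.qza"
# Hypothetical output name, chosen only for this example.
OUT="../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-min2-samples-table.qza"

# Keep only features observed in at least 2 samples (assumed threshold).
cmd=(qiime feature-table filter-features
     --i-table "$TABLE"
     --p-min-samples 2
     --o-filtered-table "$OUT")

# Dry run: print the command instead of executing it.
printf '%s ' "${cmd[@]}"; echo
# "${cmd[@]}"   # uncomment to run inside the QIIME 2 environment
```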

Hi @cjone228,

I think the issue is that you have multiple separate double-quoted strings:
--p-exclude "k__Bacteria; ...","k__Bacteria; ...","k__Bacteria; ..."

Keep everything in one double-quoted string and separate each taxonomy string by commas:

--p-exclude "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned,k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum,k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned"

I’ve just tried this on one of my feature tables and it worked as expected. Give this a try and let us know how it goes.
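
For the full list of 25, it may be easier to build that single comma-separated string from a text file than to type it by hand. A minimal sketch, assuming a hypothetical file contaminants.txt with one full taxonomy string per line (only the three test taxa are shown here):

```shell
#!/bin/bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical list file: one taxonomy string per line, no quotes, no commas.
cat > "$tmpdir/contaminants.txt" <<'EOF'
k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned
k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum
k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned
EOF

# Join all lines into one string, separated by commas.
EXCLUDE=$(paste -sd, "$tmpdir/contaminants.txt")
printf '%s\n' "$EXCLUDE"

# The joined string would then be passed as one quoted argument, e.g.:
#   qiime taxa filter-table \
#     --i-table ../analysis/P05-clust-tree-cm-goods/$REGION/table.qza \
#     --i-taxonomy ../analysis/P05-clust-tree-cm-goods/$REGION/tax-class.qza \
#     --p-mode exact \
#     --p-exclude "$EXCLUDE" \
#     --o-filtered-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza
```

Keeping the list in a file also makes it easy to copy strings straight from an exported taxonomy file instead of retyping them.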

I noticed that you have selected two different s__unassigned taxa for removal. You should probably also look at the features assigned to the same genera without any species label, e.g. g__Varibaculum and g__Microbacterium. Just because they lack the ambiguous species label does not mean they should remain in your data; they may still be what you consider contaminants.

-Cheers!

Hi there,

Thanks for the suggestion! I just tried that, but unfortunately got the same result… super weird. I am not sure what to do from here…

Carli
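
One possible reason the result did not change: bash strips the quotes and joins adjacent quoted pieces before the program ever sees them, so both ways of writing --p-exclude should arrive as the same single argument. A quick check, using shortened placeholder strings rather than real taxa and a small helper function (show_args, defined here only for the demonstration):

```shell
#!/bin/bash
set -euo pipefail

# Print how many arguments were received, then each one in brackets.
show_args() {
  echo "argc=$#"
  printf '[%s]\n' "$@"
}

# Three separately quoted strings joined by bare commas (as in the original script).
original=$(show_args "k__A; s__x","k__B; s__y","k__C; s__z")
# One quoted string with commas inside (as suggested above).
suggested=$(show_args "k__A; s__x,k__B; s__y,k__C; s__z")

echo "$original"
if [ "$original" = "$suggested" ]; then
  echo "identical"
fi
```

Both forms give one argument containing all three strings, which would suggest the quoting style is not what keeps the other two taxa in the table.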

That is indeed odd. I’d be happy to look at the data if you’d like to share them with me privately.

Otherwise, make sure there are no odd hidden characters or stray whitespace in your taxonomy strings. Perhaps export the taxonomy .qza file as a text file, then copy the strings you want directly from that file and try again?
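
A minimal sketch of that check. In practice the text file would come from something like `qiime tools export --input-path tax-class.qza --output-path exported-taxonomy`; here a tiny made-up taxonomy.tsv stands in for it, with invented feature IDs, shortened taxonomy strings, and a deliberately planted trailing space. The helper count_exact is hypothetical and only compares strings in the exported file; it does not reproduce how the plugin itself matches:

```shell
#!/bin/bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
tsv="$tmpdir/taxonomy.tsv"

# Made-up stand-in for the exported taxonomy file.
printf 'Feature ID\tTaxon\tConfidence\n' > "$tsv"
printf 'feat1\tk__Bacteria; g__Varibaculum; s__unassigned\t0.99\n' >> "$tsv"
# This row has a trailing space after "s__unassigned" (planted on purpose).
printf 'feat2\tk__Bacteria; g__Microbacterium; s__unassigned \t0.97\n' >> "$tsv"

# Count rows whose Taxon column equals the given string character for character.
count_exact() {
  awk -F'\t' -v t="$1" '$2 == t { n++ } END { print n + 0 }' "$tsv"
}

count_exact "k__Bacteria; g__Varibaculum; s__unassigned"     # prints 1
count_exact "k__Bacteria; g__Microbacterium; s__unassigned"  # prints 0 (trailing space in the file)

# Make invisible characters visible: tabs appear as ^I and line ends as $.
grep 'g__Microbacterium' "$tsv" | cat -vet
```

A count of 0 for a string you expect to be present is a hint that the string you typed differs from what is actually stored in the file.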

-Mike