Hi all,
My colleague @Lauren and I are trying to filter a list of 25 specific contaminants out of our feature table so that they are excluded from further analysis. We would like to target each contaminant down to the species level.
Before trying to filter out all 25 at once, I tried to filter out just 3 contaminants as a test run. The code is as follows:
#!/bin/bash
#SBATCH -c 5
#SBATCH -t 24:00:00
#SBATCH --mail-type=all
#SBATCH [email protected]
#SBATCH --job-name="test filtering contam"
#SBATCH --mem=5G
#SBATCH -p parallel
#SBATCH --array=1-6
# configure
source ~/.bashrc
source activate qiime2-2020.6
# array for different regions ${SLURM_ARRAY_TASK_ID} goes from 1 to 6
declare -a arr=("v2" "v3" "v4" "v67" "v8" "v9")
#TASKID=1
REGION=${arr[${SLURM_ARRAY_TASK_ID}-1]}
echo "$REGION"
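# output directory (assumed here to be the same directory the inputs are read from)
outDir=../analysis/P05-clust-tree-cm-goods
mkdir -p "$outDir/$REGION"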
qiime taxa filter-table \
--i-table ../analysis/P05-clust-tree-cm-goods/$REGION/table.qza \
--i-taxonomy ../analysis/P05-clust-tree-cm-goods/$REGION/tax-class.qza \
--p-mode exact \
--p-exclude "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned","k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum","k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned" \
--o-filtered-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza \
--verbose
qiime feature-table summarize \
--i-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza \
--o-visualization ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qzv \
--m-sample-metadata-file ../data/atcc-metadata.tsv
To reiterate, the 3 contaminants we tried to filter out in this test run were:
- k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Varibaculum; s__unassigned
- k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Corynebacteriales; f__Corynebacteriaceae; g__Corynebacterium_1; s__Corynebacterium_amycolatum
- k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Micrococcales; f__Microbacteriaceae; g__Microbacterium; s__unassigned
They were present in only 1 of our 5 samples in the V67 region. The code ran with no errors, but when I compared the filtered V67 feature table to the original V67 feature table, only the 2nd contaminant in the list had been filtered out!

I verified which taxon was eliminated by finding the ID of the one feature that was missing from the filtered table in the feature detail tab, and then looking up its taxonomic assignment in one of our other files.
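The same check can be sketched on the command line. This is only a rough sketch, and it assumes the feature IDs of each table have already been written to plain text files (one ID per line) and that the taxonomy is available as a tab-separated file with the feature ID in the first column and the taxon in the second. The file names and the helper names `removed_ids` and `lookup_taxa` are hypothetical and not part of our script.

```shell
# Print the feature IDs that are in the original list but not in the filtered one.
# $1: hypothetical file of original feature IDs, one per line
# $2: hypothetical file of filtered feature IDs, one per line
removed_ids() {
  # -F fixed strings, -x whole-line match, -v invert: keep lines of $1 absent from $2
  grep -F -x -v -f "$2" "$1"
}

# Print "feature ID <TAB> taxon" for every removed feature.
# $1: file of removed IDs (output of removed_ids)
# $2: tab-separated taxonomy file (assumed: ID in column 1, taxon in column 2)
lookup_taxa() {
  awk -F'\t' 'NR == FNR { want[$1] = 1; next } ($1 in want) { print $1 "\t" $2 }' "$1" "$2"
}

# Hypothetical usage:
# removed_ids v67-original-ids.txt v67-filtered-ids.txt > v67-removed-ids.txt
# lookup_taxa v67-removed-ids.txt v67-taxonomy.tsv
```

Printing the full taxon string for each removed feature also makes it easy to compare it character by character against the strings passed to `--p-exclude`.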
Any ideas as to how we can fix this issue and eventually filter out all 25 contaminants?
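Since `--p-exclude` takes the taxa as a single comma-separated string (as in the script above), the full list of 25 could be kept one per line in a text file and joined at run time instead of being typed on one line. Below is a minimal sketch of that idea. The file name `contaminants.txt` and the helper name `join_terms` are hypothetical, and this only builds the argument; it does not change how the matching itself behaves.

```shell
# Join the non-blank lines of a file into one comma-separated string.
# $1: hypothetical text file with one full taxonomy string per line
join_terms() {
  grep -v '^[[:space:]]*$' "$1" | paste -s -d ',' -
}

# Hypothetical usage inside the job script (paths as in the script above):
# qiime taxa filter-table \
#   --i-table ../analysis/P05-clust-tree-cm-goods/$REGION/table.qza \
#   --i-taxonomy ../analysis/P05-clust-tree-cm-goods/$REGION/tax-class.qza \
#   --p-mode exact \
#   --p-exclude "$(join_terms contaminants.txt)" \
#   --o-filtered-table ../analysis/P05-clust-tree-cm-goods/$REGION/$REGION-test-filtered-table.qza
```

Keeping the list in a file would also let us check each string individually against the taxonomy before running the filter.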
Thanks,
Carli