Hello!
Using a file for this kind of thing is possible in some QIIME 2 plugins (e.g. RESCRIPt lets you supply a tabular file with your replacements when editing taxonomy). In this case, after reading the documentation, I'm afraid you cannot do what you want by directly feeding QIIME 2 your CSV path. But that does not mean you cannot use some Bash magic to make it work automatically.
First of all, your CSV file. I will assume you have a CSV file called contaminants.csv that contains only one column: the taxonomy of the contaminants.¹ I'll also assume this file has no headers.²
You can add the content of your file to a Bash variable:
# If CSV file has Linux newlines (\n):
contaminants=$(tr '\n' ',' < contaminants.csv | sed 's/,$//')
# If CSV file has Windows newlines (\r\n):
contaminants=$(tr -s '\r\n' ',' < contaminants.csv | sed 's/,$//')
The tr command replaces newlines with commas (the default value of --p-query-delimiter in qiime taxa filter-table). You can use any delimiter you want, as long as you pass the same one to --p-query-delimiter. The sed command removes the trailing comma.
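To see what these two pipelines produce, here is a small sketch run on made-up files. The file names, the temporary directory and the taxonomy strings are only assumptions for illustration, not part of your data:

```shell
# Hypothetical sample files in a temporary directory; the taxa are made up.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'd__Bacteria; p__Proteobacteria' \
  'd__Bacteria; p__Firmicutes' \
  'Unassigned' > "$tmpdir/linux.csv"
printf 'Taxon A\r\nTaxon B\r\n' > "$tmpdir/windows.csv"

# Linux newlines: every \n becomes a comma, then sed drops the last comma.
linux_list=$(tr '\n' ',' < "$tmpdir/linux.csv" | sed 's/,$//')

# Windows newlines: \r and \n both become commas, and -s squeezes each
# resulting ",," pair into a single comma.
windows_list=$(tr -s '\r\n' ',' < "$tmpdir/windows.csv" | sed 's/,$//')

# Brackets make any stray leading or trailing character visible.
printf '[%s]\n' "$linux_list"
# prints: [d__Bacteria; p__Proteobacteria,d__Bacteria; p__Firmicutes,Unassigned]
printf '[%s]\n' "$windows_list"
# prints: [Taxon A,Taxon B]

rm -r "$tmpdir"
```

Printing the variable like this before running QIIME 2 is a cheap way to check that no empty entry or trailing comma slipped in.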
Once you have this, you can run your command as follows:
# Quote the variable so taxonomy strings containing spaces are passed as one argument
qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-mode exact \
  --p-exclude "$contaminants" \
  --o-filtered-table table_no_contaminants.qza
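If any of your taxonomy strings contains a comma, a different delimiter is safer. The following is only a sketch of that idea: the taxon names and the temporary file are made up, the "|" delimiter is just one possible choice, and the QIIME 2 call is left as a comment rather than executed:

```shell
# Hypothetical example: the taxon names and the file location are made up.
tmpdir=$(mktemp -d)
printf '%s\n' 'Taxon A, strain 1' 'Taxon B' > "$tmpdir/contaminants.csv"

# Join the lines with "|" instead of ","; sed strips the trailing "|".
contaminants_pipe=$(tr '\n' '|' < "$tmpdir/contaminants.csv" | sed 's/|$//')
printf '[%s]\n' "$contaminants_pipe"
# prints: [Taxon A, strain 1|Taxon B]

# The same delimiter would then be passed to QIIME 2 (not run here):
# qiime taxa filter-table ... --p-exclude "$contaminants_pipe" --p-query-delimiter '|'
rm -r "$tmpdir"
```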
Best,
Sergio
--
¹ What if my file has more than one column?
There are a lot of things you can do (like manually creating another CSV with only your column of interest). For the sake of completeness, I'll provide one possible command line solution:
desired_column=$(awk -F',' '{print $3}' contaminants.csv)
# Translate only line endings, so spaces inside a taxonomy string are preserved
contaminants=$(echo "$desired_column" | tr -s '\r\n' ',' | sed 's/,$//')
The first command keeps only one column and stores it in the variable desired_column. Here I assume the column of interest is the third ($3), but you can adapt it to your needs. I also assume the CSV field separator is a comma (-F',').
The second command creates the contaminants variable in a similar way as before, but it takes the desired_column variable as input instead of the CSV file.
Now you are ready to run QIIME 2 with $contaminants, using the command I wrote above.
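As a quick check of the column extraction, here is a sketch on a made-up three-column file. The file name, the feature IDs, the confidence values and the taxa are all assumptions for illustration. It translates only line endings into commas, so spaces inside a taxonomy string are kept:

```shell
# Hypothetical three-column CSV: feature ID, confidence, taxonomy.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'ASV1,0.98,d__Bacteria; p__Firmicutes' \
  'ASV2,0.87,Unassigned' > "$tmpdir/multi.csv"

# Keep only the third comma-separated field of every line.
third_column=$(awk -F',' '{print $3}' "$tmpdir/multi.csv")

# Turn line endings (\n, or \r\n from Windows files) into single commas,
# then remove the trailing comma.
taxa_list=$(printf '%s\n' "$third_column" | tr -s '\r\n' ',' | sed 's/,$//')

printf '[%s]\n' "$taxa_list"
# prints: [d__Bacteria; p__Firmicutes,Unassigned]

rm -r "$tmpdir"
```

Note that this simple awk split assumes no field contains a comma of its own; a quoted CSV field with embedded commas would need a real CSV parser.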
² What if my file has headers?
Again, I will give one of the multiple possible command line solutions, although you could simply open the CSV file and manually remove the first row.
Assuming you have already created the contaminants variable (either with the method in the main post or with the one in footnote 1), all you have to do before running QIIME 2 is:
contaminants=$(echo "$contaminants" | cut -d',' -f2-)
This removes the first comma-separated value of the contaminants variable (that is, the column header).
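The sketch below shows the header removal on made-up values, together with one possible alternative: skipping the first line of the file with tail before building the variable. The header name "Taxon", the file name and the taxa are assumptions for illustration:

```shell
# Hypothetical single-column CSV whose first line is a header.
tmpdir=$(mktemp -d)
printf '%s\n' 'Taxon' 'd__Bacteria; p__Proteobacteria' 'Unassigned' \
  > "$tmpdir/with_header.csv"

# Option 1: build the variable first, then drop the first comma-separated value.
joined=$(tr '\n' ',' < "$tmpdir/with_header.csv" | sed 's/,$//')
no_header_cut=$(echo "$joined" | cut -d',' -f2-)

# Option 2: skip the first line of the file before joining the rest.
no_header_tail=$(tail -n +2 "$tmpdir/with_header.csv" | tr '\n' ',' | sed 's/,$//')

printf '[%s]\n' "$no_header_cut"
# prints: [d__Bacteria; p__Proteobacteria,Unassigned]
printf '[%s]\n' "$no_header_tail"
# prints: [d__Bacteria; p__Proteobacteria,Unassigned]

rm -r "$tmpdir"
```

Both options give the same list here. The tail variant has the small advantage of not depending on the delimiter you chose for the variable.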