Filter feature table based on taxonomy

lautaro.rostoll · April 18, 2021, 7:10pm

Hello everyone!
After running DADA2, I filtered my rep-seqs by sequence length to remove sequences that were too short.
I used the following command based on this post: Filtering out ASVs from DADA2 based on length - #4 by thermokarst.
qiime feature-table filter-seqs \
    --i-data trim-rep-seqs.qza \
    --m-metadata-file trim-rep-seqs.qza \
    --p-where 'length(sequence) > 400' \
    --o-filtered-data filtered-seqs1.qza
After that, I ran taxonomic classification with an amplicon-specific sklearn classifier.
When I tried to make a taxa barplot, I ran into a problem: some feature IDs in the feature table are not present in the taxonomy.qza file, because I removed their sequences from the rep-seqs when I filtered by sequence length.
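The IDs missing from the taxonomy are exactly the ones the length filter dropped, so they can be listed straight from the sequences. A minimal sketch, assuming trim-rep-seqs.qza was first exported to FASTA (for example with qiime tools export, which writes a dna-sequences.fasta); the helper name short_ids is hypothetical. It prints the IDs of records whose sequence is at or below the cutoff, i.e. the records that fail 'length(sequence) > 400'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of FASTA records whose sequence length
# is <= cutoff, i.e. the records a `length(sequence) > cutoff` filter drops.
# Works with sequences wrapped over several lines.
short_ids() {
  local fasta="$1" cutoff="$2"
  awk -v cutoff="$cutoff" '
    # Print the previous record once its full length is known.
    function flush() { if (id != "" && len <= cutoff) print id }
    # Header line: finish the previous record, start a new one.
    /^>/ { flush(); id = substr($1, 2); len = 0; next }
    # Sequence line: accumulate its length.
    { len += length($0) }
    END { flush() }
  ' "$fasta"
}
```

For example, short_ids exported/dna-sequences.fasta 400 > short-ids.txt would write one dropped ID per line (the paths here are illustrative only).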

I tried filtering out those feature IDs with the following command, but it ran for several hours and then crashed.
qiime taxa filter-table \
    --i-table trim-table.qza \
    --i-taxonomy taxonomy1.qza \
    --p-mode exact \
    --p-exclude "df328ed9e23e9ea802c48e3fc0fa68e1; 49b4951c905212d8a671647a253681d9; 9608adcdb80fe95c904395c465537e3b; 942b345986d54917d3b13e6be81cf14a; a6d491a6a84ce2619169cd3538c63e62; dff45ac694b2ef7186c0093a633fe5de; 44c4a170630b3d3ecebd657ccd8025c1; 65fa2f461d4ba68b33bf45cb391e5e47; f3c4ba9a574064e92e84e4c6bd9f6d85; 18dc779e014ddff6a1930b6f0fc36844; a08dee45da8bf236f75aaa29b5d9d981; d29d4f22de8703dd09351691a88b8ae6; b25c25dd8c4ea66d515ba4726a770f81; f2721c9022b3d0000ad4b9205ac91ccf; a426de20831b91b128502328779c0861; 48fd17f893369fea4a59608a90c5b4a4; cac5f1d3af38646bd58ec32c7570239a; 80dc1e882c31eb054a283adcc32c66ef; 606d46f710fe184c2e74891b7d9177d7; 044485901c8eef72882ca8b965887bc1; 68f65eb31bbaf9f92d61ba378a8d12b8; 62b9df9ec9666c59c505c8887183efc3; c299c4838a6d9720aa5b4e7c40533b91; f98ba37b882428fa2a8cb47c6ab46e36; 09a65ef6f12541d910fed421f2a5b95b; 090078decff21ae8a4331fc1f9ec86d7; 468ffa2047e3544cd216b8c4f878723e; 00f2449b8351431d61eb3e4bd6523da4; ca7cba37ae4d1d37d3c7f8bf17e9bd97" \
    --o-filtered-table trim-table1.qza

Is there an option in QIIME 2 to remove from the feature table the feature IDs that were lost from the rep-seqs when I removed the shorter sequences?
@thermokarst @Nicholas_Bokulich
thanks a lot!

Lautaro

SoilRotifer · April 18, 2021, 9:27pm

If I understand correctly, you want to keep only the features in trim-table.qza that are present in your filtered-seqs1.qza and remove the rest. If so, you can do the following:

qiime feature-table filter-features \
    --i-table ./trim-table.qza \
    --m-metadata-file ./filtered-seqs1.qza \
    --o-filtered-table ./trim-table1.qza

You can add --p-exclude-ids if you instead want to remove the IDs present in --m-metadata-file. By default, only those IDs are kept.
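If you already have a list of IDs to drop, like the semicolon-separated list in the original post, one option is to write them to an ID-only metadata file and pass that with --p-exclude-ids. A minimal sketch, assuming QIIME 2 accepts a metadata TSV that has only a feature-id column (I believe it does, but check against your version); the helper name ids_to_metadata and the file names are hypothetical, and it assumes IDs contain no internal whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a ';'- or ','-separated list of feature IDs
# into an ID-only QIIME 2 metadata TSV (header line "feature-id").
ids_to_metadata() {
  local ids="$1" out="$2"
  {
    printf 'feature-id\n'
    # Split on ';' or ',', then print the first field of each non-blank
    # line, which also trims surrounding whitespace.
    printf '%s\n' "$ids" | tr ';,' '\n\n' | awk 'NF { print $1 }'
  } > "$out"
}
```

The resulting file could then be passed to qiime feature-table filter-features as --m-metadata-file exclude-ids.tsv together with --p-exclude-ids. Filtering directly against filtered-seqs1.qza avoids maintaining the list by hand, so this is mainly useful when the IDs come from somewhere else.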

You can also use RESCRIPt to filter taxonomy files and to filter sequences by length. Check out the tutorial here:

Check out qiime rescript filter-taxa ... (I do not think it made it into the linked tutorial).

-Mike

system · May 20, 2021, 7:55pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.