Can I do this filtering in one step?

Hello, I have a master feature table with a related taxonomy table (taxa.qza) and a representative sequences file.

I would like to remove the ASVs appearing fewer than 10 times, and I know this can easily be done with the qiime feature-table filter-features command. However, I would like to know:

1) Is there any way I can do this in one step (one command) and get three filtered files: the filtered feature table, the filtered taxonomy and the filtered representative sequences? Right now I have to filter three times.

2) If I can't do it in one step, I am wondering whether my workflow order is wrong. I assign taxonomy before filtering, which means I have a master feature table with a matching master taxonomy table.

Should I finish all the filtering of the master feature table first, before I assign taxonomy? Then I wouldn't need to filter the taxonomy at all.

3) I just want to know how you all do this. I always filter the representative sequences file whenever I filter the table. Do you normally do this, or is the representative sequences file not very useful for downstream analysis? For now, I can only think of some phylogenetic distance analyses (e.g. UniFrac, tree building) that would use the sequences, so I am not sure whether I have to filter it. Would using a tree built from the master representative sequences together with a filtered feature table give errors?

Thanks

Hi @sdpapet,

I'll start from the bottom, if you don't mind!
3) I usually don't filter the rep seqs. The analyses I usually run are fine as long as every ASV in the table is represented in the taxonomy file/tree (as is the case for most of the diversity plugins in QIIME 2). Having extra representative sequences or taxonomy entries should not be a problem in most cases, at least in my experience. If any ASVs are missing from the taxonomy table/phylogenetic tree, however, that will cause an error!
But again, I suppose it depends on the type of analysis and the tool/plugin you are using.
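The rule of thumb above boils down to a set comparison: every feature ID in the table needs an entry in the taxonomy/tree, while unused extra entries are usually harmless. A minimal Python sketch of that check, working on plain lists of hypothetical feature IDs (this is not a QIIME 2 API):

```python
def missing_ids(table_ids, reference_ids):
    """Feature IDs in the table that have no entry in the taxonomy/tree.

    A non-empty result is the situation that tends to cause errors.
    """
    return sorted(set(table_ids) - set(reference_ids))


def unused_ids(table_ids, reference_ids):
    """Entries in the taxonomy/tree that the table never uses (usually harmless)."""
    return sorted(set(reference_ids) - set(table_ids))


# Hypothetical IDs: a filtered table checked against the master taxonomy.
filtered_table_ids = ["asv1", "asv3"]
master_taxonomy_ids = ["asv1", "asv2", "asv3", "asv4"]

print(missing_ids(filtered_table_ids, master_taxonomy_ids))  # []
print(unused_ids(filtered_table_ids, master_taxonomy_ids))   # ['asv2', 'asv4']
```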

2) My understanding is that you have three runs (a, b, c) and, for each run, you are filtering out any ASV whose total count is <10 (which I generally agree with!). The question is: what if an ASV is present with 4 reads in run a, 4 in run b and 3 in run c? With your approach you are going to lose it, while if you merge first and then filter, you will retain it (its total is 11). I don't think there is a right or wrong here, only what you want to achieve with your design and analysis! Merging first has the benefit that you filter only once!

1) Back to the beginning: if you need to apply the same filtering to three different files, you probably need a bit of scripting; you should be able to do it with a loop. I personally would go for a bash loop because I am familiar with it; others may choose Python or other languages!
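The merge-first versus filter-per-run trade-off can be checked with the 4 + 4 + 3 example. A minimal Python sketch, assuming each run is reduced to a plain dict of total counts per ASV (hypothetical IDs, not a QIIME 2 object) and that "fewer than 10" means keeping totals of 10 or more:

```python
from collections import Counter


def filter_by_total(counts, min_total=10):
    """Keep only ASVs whose total count is at least min_total."""
    return {asv: n for asv, n in counts.items() if n >= min_total}


# Hypothetical per-run totals; asv1 is the 4 + 4 + 3 case.
runs = {
    "a": {"asv1": 4, "asv2": 50},
    "b": {"asv1": 4, "asv2": 20},
    "c": {"asv1": 3, "asv3": 12},
}

# Option 1: filter each run separately (asv1 is dropped from every run).
per_run = {name: filter_by_total(counts) for name, counts in runs.items()}

# Option 2: merge the runs first, then filter once (asv1 survives with 11).
merged = Counter()
for counts in runs.values():
    merged.update(counts)  # Counter.update adds counts instead of replacing them
merged_filtered = filter_by_total(merged)

print(per_run)          # {'a': {'asv2': 50}, 'b': {'asv2': 20}, 'c': {'asv3': 12}}
print(merged_filtered)  # {'asv1': 11, 'asv2': 70, 'asv3': 12}
```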
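The idea behind looping over the three files is to decide once which feature IDs survive and then apply that same ID set to the table, the taxonomy and the representative sequences. Luca mentions bash; the sketch below is a Python version of the same idea and works on plain dicts (hypothetical layout: table as {feature ID: {sample: count}}, taxonomy and rep seqs as {feature ID: value}), not on .qza artifacts directly:

```python
def filter_all(table, taxonomy, rep_seqs, min_total=10):
    """Filter the table by total count, then subset taxonomy and sequences to match.

    table:    {feature_id: {sample_id: count}}
    taxonomy: {feature_id: taxon string}
    rep_seqs: {feature_id: DNA sequence}
    """
    # Decide once which features survive, based on their total count across samples.
    keep = {fid for fid, per_sample in table.items() if sum(per_sample.values()) >= min_total}

    # Apply the same ID set to all three objects so they stay consistent.
    filtered_table = {fid: counts for fid, counts in table.items() if fid in keep}
    filtered_taxonomy = {fid: tax for fid, tax in taxonomy.items() if fid in keep}
    filtered_seqs = {fid: seq for fid, seq in rep_seqs.items() if fid in keep}
    return filtered_table, filtered_taxonomy, filtered_seqs


# Hypothetical example data.
table = {
    "asv1": {"S1": 6, "S2": 5},  # total 11 -> kept
    "asv2": {"S1": 1, "S2": 1},  # total 2  -> removed
}
taxonomy = {"asv1": "k__Bacteria; p__Firmicutes", "asv2": "k__Bacteria"}
rep_seqs = {"asv1": "ACGT", "asv2": "TTGA"}

t, tax, seqs = filter_all(table, taxonomy, rep_seqs)
print(sorted(t), sorted(tax), sorted(seqs))  # ['asv1'] ['asv1'] ['asv1']
```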

Hope it helps
Luca


Hi Luca,

Thank you for the help. I personally don't think the tree matters either, so I've never filtered the tree. Thank you for sharing your experience.

However, I am stuck on how to filter the taxonomy table based on the subset ASV feature table.

For example, I have a total ASV.qza file and a total Taxonomy.qza file that goes with this total ASV feature table.

Later, I filtered ASV.qza (say I removed all singletons and doubletons) and got a subset feature table; let's call it subASV.qza. Can you tell me which command I should use to filter the total Taxonomy.qza file?

Basically, I would like to get a subset of the taxonomy table, i.e. only the taxonomy information for the features in the filtered ASV.qza (the subset feature table).
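Outside QIIME 2, the subset can be taken on exported text files. A minimal Python sketch, assuming the taxonomy has been exported to a tab-separated file with a header row containing a "Feature ID" column (check the header of your own export), and that the IDs to keep come from the filtered table. The function name, IDs and taxa below are hypothetical:

```python
import csv
import io


def subset_taxonomy_tsv(taxonomy_tsv, keep_ids):
    """Return the taxonomy TSV text with only the rows whose Feature ID is in keep_ids."""
    reader = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    out = io.StringIO()
    # Reuse the original header so the output has the same columns.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["Feature ID"] in keep_ids:
            writer.writerow(row)
    return out.getvalue()


# Hypothetical exported taxonomy and the IDs present in subASV.
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\tk__Bacteria; p__Firmicutes\t0.99\n"
    "asv2\tk__Bacteria; p__Proteobacteria\t0.95\n"
    "asv3\tk__Bacteria\t0.80\n"
)
keep_ids = {"asv1", "asv3"}

print(subset_taxonomy_tsv(taxonomy_tsv, keep_ids), end="")
```

With real files, the text would be read with open(...).read() and the result written back to a new .tsv file.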

Thanks,

Hi @sdpapet,

For example, if you are using 'taxa barplot', you can use subASV.qza with Taxonomy.qza; there is no need to filter anything else. That should be the behaviour of many (if not most) QIIME 2 plugins in this respect, but again it depends on the developer, so the only way to know is to try it and see.
Hope it helps
Luca


Well, there is no problem plotting in QIIME 2. The reason I need a subset ASV table and a subset taxonomy table is to import them into R.

Currently I can't find a way to filter it, so I have to assign taxonomy twice: once for the total ASV table and once for the subset ASV table. I guess this solves my problem too. :smile:

Hi @sdpapet,
I see. Yes, assigning taxonomy twice may work, but since you are importing into R,
wouldn't it be better to filter after importing?

Luca
