Filtering rep-seqs.qza

Hi everyone,

I have looked through the QIIME 2 tutorials on filtering data, but I couldn't find the solution I am looking for. I have roughly 16,000 unique representative sequences, and I would like to randomly select only 20 of them from rep-seqs.qza. Is there any suggestion for how I could achieve this?

Reason behind my question:

I ran qiime phylogeny align-to-tree-mafft-fasttree on rep-seqs.qza in order to generate rooted-tree.qza. Running this command also created the aligned-rep-seqs.qza and masked-aligned-rep-seqs.qza files.

I was curious to see how the representative sequence, the aligned representative sequence, and the masked aligned representative sequence differ for a particular feature ID. I found the following:

The representative sequences have different lengths to begin with, the aligned representative sequences all have the same length (5420) for every feature ID, and the masked aligned representative sequences all have the same length (5222) for every feature ID.

Now I would like to run qiime phylogeny align-to-tree-mafft-fasttree on a rep-seqs.qza file with far fewer unique feature IDs (around 20) and see how the lengths of the representative sequences, aligned representative sequences, and masked aligned representative sequences differ afterwards.

Use qiime feature-table subsample to randomly sample features from a table, then use the subsampled table to filter your rep-seqs.qza with qiime feature-table filter-seqs.
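Here is a rough sketch of those two steps as a small shell function. It assumes you have a feature table artifact (called table.qza below) that contains the same features as rep-seqs.qza. The function name and the output filenames are made up for illustration, and the option names are as I remember them, so please confirm them with qiime feature-table subsample --help and qiime feature-table filter-seqs --help for your release (some releases may list the subsample action under a different name).

```shell
# Hypothetical helper: randomly keep N features from a table, then filter
# the representative sequences down to those same features.
# Arguments: <feature table .qza> <rep-seqs .qza> <number of features to keep>
subsample_rep_seqs() {
  table="$1"
  rep_seqs="$2"
  n="$3"

  # Step 1: randomly sample n features (not samples) from the feature table.
  qiime feature-table subsample \
    --i-table "$table" \
    --p-subsampling-depth "$n" \
    --p-axis feature \
    --o-sampled-table subsampled-table.qza || return 1

  # Step 2: keep only the sequences whose feature IDs remain in that table.
  qiime feature-table filter-seqs \
    --i-data "$rep_seqs" \
    --i-table subsampled-table.qza \
    --o-filtered-data subsampled-rep-seqs.qza
}
```

With that defined, subsample_rep_seqs table.qza rep-seqs.qza 20 should leave you with a subsampled-rep-seqs.qza holding about 20 sequences, which you can then pass to qiime phylogeny align-to-tree-mafft-fasttree.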

Good luck!

Thank you Nicholas for providing the solution.
