I am trying to pick representative sequences from the OTU table I have created, so that I have one representative sequence per OTU cluster and a smaller OTU table that can then be used to generate a phylogenetic tree. Currently I have >20,000 ASVs, which I have turned into an OTU table using the command below:
I believe what you are describing is in fact already produced by the second output in your command:
--o-clustered-sequences rep-seqs-dn-99.qza
These are the representative sequences; they map each feature ID in the feature table to a sequence.
Generally in QIIME 2 we produce the table and rep-seqs at the same time, since you need to generate the same mapping to define either of them.
What we don't have is an equivalent to the old "OTU map", which would show how different individual sequences were binned to some representative sequence.
Thanks for your response! That makes sense. I guess I thought the number of sequences would drop more when the OTUs were clustered and representative sequences pulled out. My data went from 20,000 ASVs to 14,000, which is still far too many to include in a phylogeny.
Is there a way to keep only the most prevalent sequences in the rep-seqs file, above a certain threshold? I have filtered the feature table but can't find the equivalent for rep-seqs. Also, is it possible to filter by species, i.e. to find the most prevalent sequences within a species rather than within a sample? I have multiple samples per species.
Hi @Phoebe_Cunningham, you can use qiime feature-table filter-seqs to do the filtering you're requesting. First, filter the feature table how you want it (e.g., with qiime feature-table filter-features), and then call qiime feature-table filter-seqs, providing the filtered feature table as the table input. That will filter the sequences to only those whose feature ID shows up in the feature table.
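To make the logic of those two steps concrete, here is a minimal sketch in plain Python. It is not the QIIME 2 API: the dictionaries, feature IDs, counts, sequences, the min_samples threshold, and the two helper functions are all made up for illustration. It only shows the idea of first filtering the table by prevalence and then keeping the sequences whose IDs survive in the filtered table.

```python
# Toy stand-ins for a feature table and rep-seqs (hypothetical IDs, counts, sequences).
# table maps feature ID -> {sample ID -> count}
table = {
    "otu1": {"s1": 10, "s2": 4, "s3": 7},
    "otu2": {"s1": 0, "s2": 1, "s3": 0},
    "otu3": {"s1": 3, "s2": 0, "s3": 5},
}
rep_seqs = {"otu1": "ACGT", "otu2": "ACGA", "otu3": "TCGT"}


def filter_features(table, min_samples):
    """Keep features observed (count > 0) in at least min_samples samples."""
    return {
        fid: counts
        for fid, counts in table.items()
        if sum(1 for c in counts.values() if c > 0) >= min_samples
    }


def filter_seqs(rep_seqs, table):
    """Keep only sequences whose feature ID is still present in the table."""
    return {fid: seq for fid, seq in rep_seqs.items() if fid in table}


# Step 1: filter the table; step 2: filter the sequences against that table.
filtered_table = filter_features(table, min_samples=2)
filtered_seqs = filter_seqs(rep_seqs, filtered_table)
print(sorted(filtered_seqs))
```

In this toy data, otu2 appears in only one sample, so it is dropped from the table and its sequence is dropped with it. In QIIME 2 itself the same two steps are done by the filter-features and filter-seqs actions named above, operating on .qza artifacts rather than dictionaries.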