gathering a random subset of reference sequences

thermokarst · August 10, 2020, 8:54pm

Hey @devonorourke, this is a great feature request (feel free to open an issue on the qiime2/q2-feature-table GitHub repository, the QIIME 2 plugin supporting operations on feature tables). In the meantime, here is a SQL-based workaround that stays entirely within QIIME 2:

 qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --m-metadata-file rep-seqs.qza \
  --p-where "[Feature ID] IN (SELECT [Feature ID] FROM metadata ORDER BY RANDOM() LIMIT 10)" \
  --o-filtered-data filtered.qza

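Passing rep-seqs.qza as the metadata file makes its feature IDs available to the --p-where query. The subquery shuffles those IDs with ORDER BY RANDOM() and keeps the first 10. Change the LIMIT value to select however many sequences you need.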
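If you plan to draw subsets of several sizes, it may help to build the --p-where clause from a variable instead of editing the query by hand. The bash sketch below uses a hypothetical helper, random_subset_where, which is not part of QIIME 2. It checks that the subset size is a non-negative integer before splicing it into the LIMIT clause, and it only calls qiime when qiime is on the PATH. The output name filtered-<N>.qza is also just an illustrative choice. Because SQLite's RANDOM() is not seeded from the query, each run will likely return a different subset.

```shell
#!/usr/bin/env bash
# Sketch only: random_subset_where is a hypothetical helper, not a QIIME 2 command.
# It builds the SQLite WHERE clause used by filter-seqs for a random subset of N features.
random_subset_where() {
  local n="$1"
  # LIMIT needs a non-negative integer; reject anything else before it reaches SQLite.
  if ! [[ "$n" =~ ^[0-9]+$ ]]; then
    echo "error: subset size must be a non-negative integer, got '$n'" >&2
    return 1
  fi
  printf '[Feature ID] IN (SELECT [Feature ID] FROM metadata ORDER BY RANDOM() LIMIT %s)' "$n"
}

N=25  # number of sequences to keep; adjust as needed
WHERE="$(random_subset_where "$N")" || exit 1

if command -v qiime >/dev/null 2>&1; then
  # Same call as in the post, with the subset size substituted in.
  # The output name filtered-<N>.qza is an illustrative assumption.
  qiime feature-table filter-seqs \
    --i-data rep-seqs.qza \
    --m-metadata-file rep-seqs.qza \
    --p-where "$WHERE" \
    --o-filtered-data "filtered-${N}.qza"
else
  # qiime is not on the PATH: just print the clause that would have been passed.
  echo "$WHERE"
fi
```

Validating N up front means a typo such as "2O" fails immediately with a clear message rather than as an SQL error inside QIIME 2.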
