qiime vsearch cluster-features-open-reference issue

Hello Qiime members,

I am trying to run "cluster-features-open-reference" for diversity analysis. We would like to run differential abundance analysis on the OTUs corresponding to the de novo reference sequences, but there is no option to export the table corresponding to

--o-new-reference-sequences output_99_perc/new-ref-seqs-or-99.qza

like we have --o-clustered-table output_99_perc/table-or-99.qza for open reference.

My command

qiime vsearch cluster-features-open-reference \
  --i-table step2_dada2/filtered-table.qza \
  --i-sequences filtered-rep-seqs.qza \
  --i-reference-sequences /database_99_perc_similarity/unite_database_99perc.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table output_99_perc/table-or-99.qza \
  --o-clustered-sequences output_99_perc/rep-seqs-or-99.qza \
  --o-new-reference-sequences output_99_perc/new-ref-seqs-or-99.qza

Also, for diversity analysis, when I try to build a tree from the de novo sequences using

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences new-ref-seqs-or-99.qza \
  --o-alignment new-ref-seqs/aligned-new-ref-seqs-or-99.qza \
  --o-masked-alignment new-ref-seqs/masked-aligned-new-ref-seqs-or-99.qza \
  --o-tree new-ref-seqs/unrooted-tree_new-ref-seqs-or-99.qza \
  --o-rooted-tree new-ref-seqs/rooted-tree_new-ref-seqs-or-99.qza \
  --p-parttree

I am getting an error

Command '['FastTree', '-quote', '-nt', '/tmp/qiime2/arora172/data/030cdd4e-6c90-49df-bcde-ce022e89f1cf/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.

How can I filter the sequences, or run this on the de novo identified sequences? (I tried increasing the number of threads to 8 but still get the error.)
The total number of sequences identified de novo is ~180k.

Hello @devenderarora,

We would like to run differential abundance analysis on the OTUs corresponding to the de novo reference sequences, but there is no option to export the table corresponding to
--o-new-reference-sequences output_99_perc/new-ref-seqs-or-99.qza
like we have --o-clustered-table output_99_perc/table-or-99.qza for open reference.

There is no --o-new-reference-sequences output for the de novo approach because all references are new. In the open reference approach, clusters that form beyond the pre-supplied references are new, and so there is an additional option for them.

Command '['FastTree', '-quote', '-nt', '/tmp/qiime2/arora172/data/030cdd4e-6c90-49df-bcde-ce022e89f1cf/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.

This error commonly happens if you run out of memory. Increasing the number of threads will only make the problem worse.
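
For readers unfamiliar with the signal in that message: a process that dies from SIGKILL (signal 9) was stopped from outside, and on Linux the kernel's out-of-memory killer is one common sender. Shells such as bash and dash report this as exit status 128 + 9. The sketch below only reproduces that exit status with a throwaway child shell; it does not involve FastTree or QIIME 2, and the variable name is just for illustration.

```shell
#!/bin/sh
# Start a throwaway child shell that sends SIGKILL (signal 9) to itself,
# mimicking a process that is killed from outside.
status=0
sh -c 'kill -9 $$' || status=$?

# bash and dash report "killed by signal N" as exit status 128 + N.
echo "exit status: $status"
```

Seeing this status (or the SIGKILL message above) says nothing about a bug in the command itself, which is why reducing the input size or running on a machine with more memory is the usual direction to look.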

Dear @colinvwood

I would like to rephrase my question, as there was some confusion.
We ran open-reference clustering using:

qiime vsearch cluster-features-open-reference \
  --i-table table-dada2.qza \
  --i-sequences rep_seqs-dada2.qza \
  --i-reference-sequences database_99_perc_similarity/unite_database_99perc.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-or-99.qza \
  --o-clustered-sequences rep-seqs-or-99.qza \
  --o-new-reference-sequences new-ref-seqs-or-99.qza

Here we get these outputs:
--o-clustered-table table-or-99.qza
--o-clustered-sequences rep-seqs-or-99.qza
--o-new-reference-sequences new-ref-seqs-or-99.qza

How can we get a table for the new reference sequences (something like an --o-new-reference-table output), the way we get the sequences?

We are interested in using the new reference table for downstream analysis.

Regards,
Devender

Hello @devenderarora,

I believe you can do:

qiime feature-table filter-features \
    --i-table table-or-99.qza \
    --m-metadata-file new-ref-seqs-or-99.qza \
    --o-filtered-table new-table.qza

which will subset your clustered table to include only the novel features.

@colinvwood
Dear Colinvwood,

We filtered the table using the command you suggested and found that the resulting OTU table is the same as the input table.

Our input table has ~10k OTUs and new-ref-seqs-or-99.qza has ~100k sequences. We are now wondering how to get the table for the ~90k novel OTUs. The filtered table gave us back the same 10k-OTU table we already have.

Could you please guide us on how to get the table corresponding to the 90k sequences, or to all 100k?

Regards,
Devender

Hello @devenderarora,

Sorry, I thought that --o-new-reference-sequences were only the newly clustered features but they are in fact all of the clusters, old and new. Instead, this should work:

qiime feature-table filter-features \
    --i-table table-or-99.qza \
    --m-metadata-file table-dada2.qza \
    --p-exclude-ids \
    --o-filtered-table new-table.qza

because table-or-99.qza contains all features, table-dada2.qza includes only old features, and using --p-exclude-ids will give the set difference all - old = new features.
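
The set difference described above can be illustrated on plain ID lists. This is a toy sketch only: the feature IDs and file names are invented, and it uses the standard `comm` utility rather than QIIME 2 itself. It also shows the edge case where the two ID lists share nothing, in which case the result is identical to the input.

```shell
#!/bin/sh
# Toy illustration of an "exclude these IDs" set difference.
# All IDs below are made up for the example.
workdir=$(mktemp -d)

# IDs in the table being filtered ("all").
printf '%s\n' featA featB featC featD | sort > "$workdir/all_ids.txt"
# IDs supplied as the exclusion list ("old").
printf '%s\n' featB featD | sort > "$workdir/old_ids.txt"

# comm -23 prints lines found only in the first (sorted) file: all - old.
new_ids=$(comm -23 "$workdir/all_ids.txt" "$workdir/old_ids.txt")
echo "$new_ids"

# If the exclusion list shares no IDs with the table, nothing is removed.
printf '%s\n' otherX otherY | sort > "$workdir/unrelated_ids.txt"
unchanged_ids=$(comm -23 "$workdir/all_ids.txt" "$workdir/unrelated_ids.txt")
echo "$unchanged_ids"

rm -r "$workdir"
```

A filtered table that comes back identical to its input is therefore consistent with the exclusion list having no IDs in common with the table's features.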

Hello @colinvwood,

We ran feature-table filter-features as recommended, but it also returns the same OTUs.

From my original command

qiime vsearch cluster-features-open-reference \
  --i-table table-dada2.qza \
  --i-sequences rep_seqs-dada2.qza \
  --i-reference-sequences database_99_perc_similarity/unite_database_99perc.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-or-99.qza \
  --o-clustered-sequences rep-seqs-or-99.qza \
  --o-new-reference-sequences new-ref-seqs-or-99.qza

Here: --o-new-reference-sequences new-ref-seqs-or-99.qza

The new-ref sequences are generated from the known references and the de novo identified OTUs, using the UNITE database at 99% similarity. Filtering this table with the DADA2 table still leaves it unchanged. We are looking for the table of the de novo identified OTUs along with the known identified sequences.

Hello @devenderarora,

It's possible that none of your features clustered to the references (everything clustered de novo). Can you confirm that your feature table has features from the provided references?

Hello @colinvwood,

Thank you for your continued support and suggestions.

After running the qiime vsearch cluster-features-open-reference command against the UNITE database with 99% similarity clustering, we obtained two sequence output files:

--o-clustered-sequences rep-seqs-or-99.qza

--o-new-reference-sequences new-ref-seqs-or-99.qza

We also have a table file (table-or-99.qza) that corresponds to rep-seqs-or-99.qza:

--o-clustered-table table-or-99.qza

We can assign taxonomy up to level 8 for the sequences in rep-seqs-or-99.qza.

We are trying to generate a table ("new-ref-table.qza") for "new-ref-seqs-or-99.qza".

As you suggested we used:

qiime feature-table filter-features \
    --i-table table-or-99.qza \
    --m-metadata-file table-dada2.qza \
    --p-exclude-ids \
    --o-filtered-table new-table.qza

But our new-table.qza is exactly the same as table-or-99.qza.

Are there any other scripts that can generate new-table.qza?

Thanks again for your help.

Best regards,

Devender

Hello @devenderarora,

Please read my previous post again. This could be happening because none of your features clustered to the reference sequences provided to cluster-features-open-reference.

Hello @colinvwood,

To check whether our features cluster to the references or not, we ran the same command on the example data provided by QIIME 2 at https://docs.qiime2.org/2023.9/tutorials/otu-clustering/
and performed open-reference clustering using:

qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences /database/unite_database_99perc.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-or-99.qza \
  --o-clustered-sequences rep-seqs-or-99.qza \
  --o-new-reference-sequences new-ref-seqs-or-99.qza

We exported table-or-99.qza to TSV and found 20 OTUs, which correspond to the rep-seqs-or-99.qza file.

The new-ref-seqs-or-99.qza file contains ~190k sequences and, as you suggested, we excluded the IDs in table.qza to get a new table for new-ref-seqs-or-99.qza using:

qiime feature-table filter-features \
  --i-table table-or-99.qza \
  --m-metadata-file table.qza \
  --p-exclude-ids \
  --o-filtered-table new-table.qza

When we export new-table.qza to TSV and look at the number of OTUs, it is still the same 20.

Does this mean that none of the features clustered to the reference sequences for the test data either?

Regards,
Devender

Hello @devenderarora,

Sorry for the confusion so far. The distinction between the --o-clustered-sequences and --o-new-reference-sequences outputs is:

  • in the --o-clustered-sequences output, for each cluster formed around a reference sequence, the representative sequence is that of the most abundant feature in the cluster (not the reference sequence itself). The feature ID comes from the reference.
  • in the --o-new-reference-sequences output, for each cluster formed around a reference sequence, the representative sequence is the reference sequence itself. The feature ID still comes from the reference.

De novo clustered features, in both outputs, have as their representative sequence and feature identifier the ones from the most abundant feature. There is no guarantee that any features cluster to a reference--the output table may contain completely de novo-clustered features.

Now, in your case, you want to filter a clustered feature table to remove features clustered to a reference, and keep those that are clustered de novo. So we can take the clustered table, filter with the reference sequences, and invert the filter:

qiime feature-table filter-features \
    --i-table table-or-99.qza \
    --m-metadata-file unite_database_99perc.qza \
    --p-exclude-ids \
    --o-filtered-table new-table.qza
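
One way to sanity-check the result is to apply the same "drop every feature whose ID is a reference ID" logic to a table exported to TSV and count what remains. The sketch below runs on invented data: the `REF_` and `denovo_` IDs, the counts, and the file names are all hypothetical, and real UNITE and DADA2 identifiers look different. It uses only `awk` and `grep`, not QIIME 2.

```shell
#!/bin/sh
# Toy check of the "exclude reference IDs" logic on an exported feature table.
# All IDs, counts, and file names are invented for the example.
workdir=$(mktemp -d)

# Clustered table: two rows carry reference IDs, two carry de novo IDs.
{
  printf '#OTU ID\tsample1\tsample2\n'
  printf 'REF_0001\t10\t0\n'
  printf 'REF_0002\t3\t7\n'
  printf 'denovo_a1b2\t0\t5\n'
  printf 'denovo_c3d4\t8\t1\n'
} > "$workdir/table.tsv"

# IDs of the reference sequences, one per line; REF_0003 attracted no reads.
printf '%s\n' REF_0001 REF_0002 REF_0003 > "$workdir/ref_ids.txt"

# First file: remember every reference ID. Second file: keep header lines
# and rows whose feature ID is not a reference ID.
awk -F '\t' 'NR == FNR { ref[$1] = 1; next } /^#/ || !($1 in ref)' \
  "$workdir/ref_ids.txt" "$workdir/table.tsv" > "$workdir/denovo_table.tsv"

# Count the remaining feature rows (header lines start with '#').
denovo_count=$(grep -vc '^#' "$workdir/denovo_table.tsv")
echo "de novo features kept: $denovo_count"

rm -r "$workdir"
```

If the count after filtering equals the count before, no feature in the table carried a reference ID, which matches the case where everything clustered de novo.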

Again, apologies for my misunderstandings.
