I am trying to run "cluster-features-open-reference" for diversity analysis, but I get the following error:
Command '['FastTree', '-quote', '-nt', '/tmp/qiime2/arora172/data/030cdd4e-6c90-49df-bcde-ce022e89f1cf/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.
How can I filter the sequences, or run it on only the de novo identified sequences? (I tried increasing nthreads to 8 but still get the error.)
The total number of sequences identified de novo is ~180k.
We would like to run differential abundance analysis on the OTUs corresponding to the de novo reference sequences, but there is no option to export the table corresponding to --o-new-reference-sequences output_99_perc/new-ref-seqs-or-99.qza,
like we have --o-clustered-table output_99_perc/table-or-99.qza for open reference.
There is no --o-new-reference-sequences output for the de novo approach because all references are new. In the open reference approach, clusters that form beyond the pre-supplied references are new, and so there is an additional option for them.
Command '['FastTree', '-quote', '-nt', '/tmp/qiime2/arora172/data/030cdd4e-6c90-49df-bcde-ce022e89f1cf/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.
This error commonly happens if you run out of memory. Increasing the number of threads will only make the problem worse.
Here we are getting these outputs:
--o-clustered-table table-or-99.qza
--o-clustered-sequences rep-seqs-or-99.qza
--o-new-reference-sequences new-ref-seqs-or-99.qza
How can we get a table for the new reference sequences (something like
--o-new-reference-table), just as we get the sequences?
We are interested in using the new reference table for downstream analysis.
We filtered the table using the command you suggested, but the resulting OTU table is the same as the input table.
Our input table has ~10k OTUs and new-ref-seqs-or-99.qza has ~100k sequences. We are now wondering how to get the table for the ~90k novel OTUs. The filtered table gave us back the same 10k-OTU table we already have.
Can you please guide us on how to get the corresponding 90k-sequence table or the 100k table?
Sorry, I thought that --o-new-reference-sequences were only the newly clustered features but they are in fact all of the clusters, old and new. Instead, this should work:
because table-or-99.qza contains all features, table-dada2.qza includes only old features, and using --p-exclude-ids will give the set difference all - old = new features.
The new-ref sequences are generated from the known references and the de novo identified OTUs, using the UNITE database at 99% similarity. After filtering with the DADA2 table, the table still remains the same. We are looking for the table of the de novo identified OTUs along with the known identified sequences.
It's possible that none of your features clustered to the references (everything clustered de novo). Can you confirm that your feature table has features from the provided references?
Thank you for your continued support and suggestions.
After running the qiime vsearch cluster-features-open-reference command with the UNITE database for 99% similarity clustering, we generated 2 sequence output files:
Please read my previous post again. This could be happening because none of your features clustered to the reference sequences provided to cluster-features-open-reference.
We exported table-or-99.qza to TSV and found 20 OTUs, which correspond to the rep-seqs-or-99.qza file.
The new-ref-seqs-or-99.qza file contains ~190k sequences and, as you suggested, we excluded the IDs in table.qza to get a new table for new-ref-seqs-or-99.qza using:
Sorry for the confusion so far. The distinction between the --o-clustered-sequences and --o-new-reference-sequences outputs is:
In the --o-clustered-sequences output, for each cluster formed around a reference sequence, the representative sequence is that of the most abundant feature in the cluster (not the reference sequence itself). The feature ID comes from the reference.
In the --o-new-reference-sequences output, for each cluster formed around a reference sequence, the representative sequence is the reference sequence itself. The feature ID still comes from the reference.
De novo clustered features, in both outputs, take their representative sequence and feature identifier from the most abundant feature in the cluster. There is no guarantee that any features cluster to a reference; the output table may consist entirely of de novo-clustered features.
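To make the distinction concrete, here is a toy Python sketch of the rules described above. It is not the q2-vsearch implementation: the function name `summarize_cluster`, the feature IDs, and the sequences are all made up for illustration.

```python
def summarize_cluster(members, reference=None):
    """Toy model of how one cluster is labeled in the two sequence outputs.

    members:   list of (feature_id, sequence, abundance) tuples (hypothetical data)
    reference: (ref_id, ref_sequence) if the cluster formed around a reference,
               or None if the cluster formed de novo
    Returns (feature_id, clustered_sequence, new_reference_sequence).
    """
    # The most abundant member supplies the representative sequence.
    top_id, top_seq, _ = max(members, key=lambda m: m[2])
    if reference is None:
        # De novo cluster: ID and sequence both come from the most abundant feature.
        return top_id, top_seq, top_seq
    ref_id, ref_seq = reference
    # Reference cluster: the ID comes from the reference in both outputs;
    # only the new-reference output carries the reference sequence itself.
    return ref_id, top_seq, ref_seq


ref_cluster = summarize_cluster(
    [("f1", "ACGTAC", 40), ("f2", "ACGTAT", 10)],
    reference=("REF_0001", "ACGTAA"),
)
de_novo_cluster = summarize_cluster([("f3", "TTGCA", 5), ("f4", "TTGCC", 25)])

print(ref_cluster)
print(de_novo_cluster)
```

In this toy model the reference cluster keeps the ID `REF_0001` in both outputs, while the de novo cluster is named after its most abundant member, `f4`.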
Now, in your case, you want to filter a clustered feature table to remove features clustered to a reference, and keep those that are clustered de novo. So we can take the clustered table, filter with the reference sequences, and invert the filter:
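The logic of that inverted filter can be sketched in plain Python with a toy table. This is only an illustration under assumptions: `keep_de_novo`, the feature IDs, and the counts are hypothetical, and the table is modeled as a plain dict rather than a QIIME 2 artifact. On the command line, the inversion corresponds to the --p-exclude-ids option mentioned earlier.

```python
def keep_de_novo(table, reference_ids):
    """Return only the rows of `table` whose feature ID is NOT a reference ID.

    table:         dict mapping feature ID -> list of per-sample counts (toy data)
    reference_ids: iterable of IDs taken from the reference sequences
    """
    reference_ids = set(reference_ids)
    # Inverted filter: drop every feature that matches a reference ID.
    return {fid: counts for fid, counts in table.items() if fid not in reference_ids}


# Hypothetical clustered table: two reference-named features, two de novo ones.
clustered_table = {
    "REF_0001": [12, 0, 5],
    "REF_0002": [3, 7, 0],
    "a1b2c3": [0, 4, 9],
    "d4e5f6": [1, 1, 0],
}
# Hypothetical reference IDs; REF_0003 attracted no features.
reference_ids = ["REF_0001", "REF_0002", "REF_0003"]

de_novo_table = keep_de_novo(clustered_table, reference_ids)
print(sorted(de_novo_table))
```

If no feature ID in the table matches a reference ID, the filter returns the table unchanged, which would be consistent with getting back the same table as the input.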