Quality trimming

Hi all!

I’ve analyzed data from 2 studies downloaded from NCBI.

I grouped each table by its own category and merged them. Afterwards, I tried to draw an alpha-rarefaction plot and found that the sample depths in one study are far lower than in the other. The highest depth is almost 50,000 times the lowest one.
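To put a number on a depth gap like this, note that each sample's depth is simply the sum of its counts in the feature table. A minimal sketch, assuming the table has already been exported into a plain Python mapping of sample ID to {feature ID: count}; the function names, sample IDs and counts below are hypothetical:

```python
from typing import Dict

# Hypothetical in-memory feature table: sample ID -> {feature ID: count}.
FeatureTable = Dict[str, Dict[str, int]]


def sample_depths(table: FeatureTable) -> Dict[str, int]:
    """Return the total read count (sampling depth) of each sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def depth_ratio(table: FeatureTable) -> float:
    """Ratio of the deepest to the shallowest sample, ignoring empty samples."""
    depths = [d for d in sample_depths(table).values() if d > 0]
    if not depths:
        raise ValueError("table contains no reads")
    return max(depths) / min(depths)


if __name__ == "__main__":
    table = {
        "studyA_s1": {"f1": 120000, "f2": 30000},
        "studyB_s1": {"f1": 2, "f3": 1},
    }
    print(sample_depths(table))  # {'studyA_s1': 150000, 'studyB_s1': 3}
    print(depth_ratio(table))    # 50000.0
```

Running this per study before merging can show whether the gap comes from one study as a whole or from a few near-empty samples.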

I thought this might be because I didn’t do any quality trimming. I skipped it because I wanted to cluster OTUs: the FeatureTable produced by denoising with dada2 or deblur doesn’t work for OTU clustering, so I used a FeatureTable made by dereplicating the demultiplexed .qza file.

The FeatureTable from denoising shows far greater depth than the one I used. So, here are my questions:

  1. Is there a way to trim sequences by quality before clustering OTUs?
  2. Why doesn’t OTU clustering work with the denoised FeatureTable?

Thank you!

@1115 good questions!

Yes, see q2-quality-filter.
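The general idea behind quality trimming can be sketched in a few lines: cut a read where its quality drops, then discard reads that became too short. This is a simplified illustration, not q2-quality-filter's actual code; the function name is hypothetical and the threshold values are illustrative assumptions:

```python
from typing import List, Optional


def truncate_by_quality(
    seq: str,
    quals: List[int],
    min_quality: int = 4,
    window: int = 3,
    min_length_fraction: float = 0.75,
) -> Optional[str]:
    """Truncate a read before the first run of `window` consecutive bases
    with a Phred score below `min_quality`; return None if what remains is
    shorter than `min_length_fraction` of the original read length.

    Simplified illustration; the default values are assumptions.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    cut = len(seq)
    run = 0
    for i, q in enumerate(quals):
        # Count consecutive low-quality bases; reset on a good base.
        run = run + 1 if q < min_quality else 0
        if run == window:
            cut = i - window + 1  # start of the low-quality run
            break
    if cut < min_length_fraction * len(seq):
        return None  # too much was trimmed, so drop the read
    return seq[:cut]


if __name__ == "__main__":
    print(truncate_by_quality("ACGTACGTACGT", [30] * 9 + [2, 2, 2]))  # ACGTACGTA
```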

It should work. dada2 and deblur (just like vsearch dereplicate-sequences) output a FeatureTable[Frequency] artifact and a FeatureData[Sequence] artifact. These are the same input types required by any of the OTU picking methods in q2-vsearch, e.g., this.
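Why closed-reference clustering needs both inputs can be seen in a toy version: each feature's sequence is compared against the reference sequences, and the table's counts are then collapsed onto the matching reference IDs, with unmatched features dropped. This sketch uses a plain per-position identity on equal-length sequences instead of the alignment vsearch actually performs; all names and data are hypothetical:

```python
from typing import Dict, Optional

Table = Dict[str, Dict[str, int]]  # sample ID -> {feature ID: count}


def identity(a: str, b: str) -> float:
    """Fraction of identical positions (toy metric, equal-length sequences only)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_reference(seq: str, refs: Dict[str, str], perc_identity: float) -> Optional[str]:
    """ID of the most similar reference at or above the threshold, else None."""
    best_id, best_score = None, -1.0
    for ref_id, ref_seq in refs.items():
        score = identity(seq, ref_seq)
        if score >= perc_identity and score > best_score:
            best_id, best_score = ref_id, score
    return best_id


def cluster_closed_reference(
    table: Table, seqs: Dict[str, str], refs: Dict[str, str], perc_identity: float = 0.97
) -> Table:
    """Collapse feature counts onto reference IDs; unmatched features are dropped."""
    mapping = {fid: best_reference(s, refs, perc_identity) for fid, s in seqs.items()}
    if all(ref_id is None for ref_id in mapping.values()):
        raise ValueError("no features matched any reference sequence")
    clustered: Table = {}
    for sample, counts in table.items():
        out: Dict[str, int] = {}
        for fid, n in counts.items():
            ref_id = mapping.get(fid)
            if ref_id is not None:
                out[ref_id] = out.get(ref_id, 0) + n
        clustered[sample] = out
    return clustered
```

For example, with references {"R1": "ACGTACGTAC", "R2": "TTTTGGGGCC"}, features {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA", "f3": "GGGGGGGGGG"} and perc_identity=0.9, a sample with counts {"f1": 5, "f2": 3, "f3": 7} collapses to {"R1": 8}, because f3 matches nothing.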

My guess is you were trying to use the denoised data as inputs to dereplicate-seqs? Don't do that — go straight to the clustering commands (denoised seqs are already dereplicated).
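That denoised sequences are already dereplicated can be illustrated with a toy dereplication step that collapses identical reads into one feature with a count. In this sketch the single-sample input, the MD5-hash feature IDs and the function name are simplifying assumptions:

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple


def dereplicate(reads: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Collapse identical reads.

    Returns (feature ID -> sequence, feature ID -> count); feature IDs are
    MD5 hashes of the sequence, an assumption made for illustration.
    """
    counts = Counter(reads)
    seqs = {hashlib.md5(s.encode()).hexdigest(): s for s in counts}
    table = {hashlib.md5(s.encode()).hexdigest(): n for s, n in counts.items()}
    return seqs, table


if __name__ == "__main__":
    seqs, table = dereplicate(["ACGT", "ACGT", "TTGA"])
    # A second pass over the already-unique sequences finds nothing to collapse.
    seqs2, table2 = dereplicate(seqs.values())
    print(sorted(table.values()), sorted(table2.values()))  # [1, 2] [1, 1]
```

Running the step again on the unique sequences it produced finds nothing left to collapse (every count is 1). Denoised output is in the same situation: its sequences are unique and its table already carries the abundances.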

I hope that helps clarify!

Hi, @Nicholas_Bokulich! I really appreciate your advice.
The first problem, quality filtering, is solved thanks to you.

The second one, however, still doesn’t work. I tried OTU clustering with the command qiime vsearch cluster-features-closed-reference, and it shows this error message:

Traceback (most recent call last):
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 275, in cluster_features_closed_reference
    collapse_f = _collapse_f_from_sqlite(conn)
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 97, in _collapse_f_from_sqlite
    raise ValueError("No sequence matches were identified by vsearch.")
ValueError: No sequence matches were identified by vsearch.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
    results = action(**arguments)
  File "", line 2, in cluster_features_closed_reference
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/microbiome/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 278, in cluster_features_closed_reference
    raise VSearchError('No matches were identified to '
q2_vsearch._cluster_features.VSearchError: No matches were identified to reference_sequences. This can happen if sequences are not homologous to reference_sequences, or if sequences are not in the same orientation as reference_sequences (i.e., if sequences are reverse complemented with respect to reference sequences). Sequence orientation can be adjusted with the strand parameter.

Plugin error from vsearch:

No matches were identified to reference_sequences. This can happen if sequences are not homologous to reference_sequences, or if sequences are not in the same orientation as reference_sequences (i.e., if sequences are reverse complemented with respect to reference sequences). Sequence orientation can be adjusted with the strand parameter.

See above for debug info.
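The error points at orientation as one possible cause. As a toy illustration of why the strand setting matters, a read that is the reverse complement of part of a reference only matches once the reverse strand is searched too. Exact substring matching stands in for vsearch's alignment here, the sequences are made up, and the "plus"/"both" values mirror vsearch's own --strand option:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC ambiguity codes not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_reference(seq: str, ref: str, strand: str = "plus") -> bool:
    """Toy exact-substring check of seq against ref on one or both strands."""
    if strand not in ("plus", "both"):
        raise ValueError("strand must be 'plus' or 'both'")
    if seq in ref:
        return True
    # Only try the reverse complement when both strands are requested.
    return strand == "both" and reverse_complement(seq) in ref


if __name__ == "__main__":
    ref = "ACGGTCATGC"
    read = "TGACC"  # reverse complement of "GGTCA", which occurs in ref
    print(matches_reference(read, ref))                 # False
    print(matches_reference(read, ref, strand="both"))  # True
```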

BUT if I try the dereplicated data from the same study, made with qiime vsearch dereplicate-sequences, it works! This is why I asked the first question. Is there a way to fix this problem? I really want to use the FeatureTable made by dada2.

Thank you!

Wow — I have not seen that error message before. It is fairly self-explanatory but I cannot think of a reason why it would work with dereplicated sequences but not dada2. Perhaps the orientation of your sequences was somehow reversed? I will need a few things from you:

  1. The feature table and reference sequences that you are using as input.
  2. The exact command that you are using when you get that error.
  3. The exact command that you are using when you cluster the dereplicated sequences.
  4. Run qiime feature-table summarize on the feature table and attach the result here, please.
  5. Run qiime demux summarize on the sequences that you are inputting to dada2 and attach the result here, please.

I have used OTU clustering on dada2 sequences plenty of times without ever seeing this error, so this is not an incompatibility between the two. This should work, so something is likely wrong with your sequences.

OTU clustering is not necessary by any means after denoising your sequences with dada2. Denoising methods are, if anything, a replacement for OTU clustering methods, and tend to perform much better. So if it is dada2 denoising that matters to you and you are not explicitly trying to test out the effect of clustering after denoising, I recommend just proceeding with the dada2 results.

Thanks!
