DADA2 issues when denoising trimmed.qza from ITSxpress

mouldinator · October 9, 2018, 2:08pm

Hello!
I'm trying to run the following:

```shell
qiime itsxpress trim-single \
  --i-per-sample-sequences demux.qza \
  --p-region ALL \
  --p-taxa F \
  --o-trimmed trimmed.qza

echo "trimming complete"
echo "beginning denoise"
qiime dada2 denoise-single \
  --i-demultiplexed-seqs trimmed.qza \
  --p-trunc-len 0 \
  --output-dir dada2wrongout
```

The trim-single command is slow (approx. 2 hrs) but does produce the trimmed.qza output.
The dada2 denoise-single command fails with the error: "No reads passed the filter. trunc_len (1) may be longer than read lengths, or other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter."

Here's an image of the demux.qzv.

Any ideas why this might be happening?
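Since `--p-trunc-len 0` disables truncation, a likely culprit is the expected-error filter named in the message (`--p-max-ee`, 2.0 by default in q2-dada2 as far as I know). DADA2 computes a read's expected errors as the sum of 10^(-Q/10) over its Phred scores, so a read several kb long at typical Nanopore quality ends up far above 2. The sketch below is a hypothetical helper (`expected_errors` is my own name, not part of QIIME 2 or DADA2), assuming plain 4-line FASTQ records with Phred+33 qualities, such as the per-sample files you get by exporting demux.qza. It prints each read's length and expected errors so you can see how far over the threshold the reads are.

```shell
#!/usr/bin/env bash
# expected_errors: hypothetical helper, not part of QIIME 2 or DADA2.
# Reads 4-line FASTQ records (Phred+33 assumed) from the given files, or stdin,
# and prints "<read length><TAB><expected errors>" per read, where
# expected errors = sum over bases of 10^(-Q/10), the quantity max_ee is compared against.
expected_errors() {
  awk '
    BEGIN {
      # Map each printable ASCII character to its Phred+33 quality score.
      for (i = 33; i <= 126; i++) q[sprintf("%c", i)] = i - 33
    }
    FNR % 4 == 2 { len = length($0) }   # sequence line of the record
    FNR % 4 == 0 {                      # quality line of the record
      ee = 0
      for (j = 1; j <= length($0); j++) ee += 10 ^ (-q[substr($0, j, 1)] / 10)
      printf "%d\t%.2f\n", len, ee
    }
  ' "$@"
}
```

For example, `gunzip -c <exported-sample>.fastq.gz | expected_errors | sort -k2,2nr | head` would list the reads with the most expected errors (the file name is a placeholder).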

ebolyen · October 9, 2018, 5:34pm

Hey @mouldinator,

The length of these reads is impossible for Illumina data. What kind of sequencing platform is this, and what kind of data are you collecting? Are these genome contigs or something like a PacBio SMRT amplicon?

mouldinator · October 9, 2018, 6:15pm

Hey @ebolyen
They're Nanopore data, with 4k reads per FASTQ file.

ebolyen · October 9, 2018, 6:25pm

Hey @mouldinator,

I think your data is incompatible with DADA2, although this may have changed recently (@benjjneb, are Nanopore amplicons supported?)

You might try something simple like q2-vsearch for OTU picking, which will probably work, although I have no idea what the runtime will be for such long sequences.

mouldinator · October 9, 2018, 7:21pm

@ebolyen
The aim here is to extract the ITS regions (regardless of whether they can be assigned a taxonomy by BLAST or something similar) and discard the rest of the data. The ITS sequences will then be given a nominal ID and metadata. Hopefully we will then see repeats of the same unknown ITS sequences across several samples and different sequencing runs, to the point of certainty. We can then send the data to UNITE and the world of Fungi grows. Any rough idea how to go about that if DADA2 is off the table and q2-vsearch is only going to find known ITS sequences? (Also, have I got the function of q2-vsearch right when I say that?)

Cheers!

ebolyen · October 10, 2018, 7:49pm

Almost: q2-vsearch also has a de novo OTU picking pipeline, which is probably the most appropriate for this kind of task. Alternatively, if you have other ideas for quality control or feel particularly confident about your sequencing, you can just dereplicate and leave it at that (giving you a table and rep-seqs).
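The two options could look roughly like this with the q2-vsearch CLI. This is a sketch rather than a tested recipe: trimmed.qza comes from the thread, while the function names, output file names and the 0.97 identity default are placeholders of my own. The `run` wrapper prints each command instead of executing it when `DRY_RUN=1`, so you can check the commands first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; output file names and the 0.97 default are placeholders.
# run: execute a command, or only print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Option 1: dereplicate only (collapse exact duplicates) -> table + rep-seqs.
derep_only() {
  run qiime vsearch dereplicate-sequences \
    --i-sequences "$1" \
    --o-dereplicated-table table.qza \
    --o-dereplicated-sequences rep-seqs.qza
}

# Option 2: dereplicate, then cluster the unique sequences de novo at the
# fractional identity given as the second argument (0.97 if omitted).
derep_and_cluster_de_novo() {
  derep_only "$1"
  run qiime vsearch cluster-features-de-novo \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --p-perc-identity "${2:-0.97}" \
    --o-clustered-table table-dn.qza \
    --o-clustered-sequences rep-seqs-dn.qza
}
```

`DRY_RUN=1 derep_and_cluster_de_novo trimmed.qza 0.99` prints both commands; calling it without `DRY_RUN=1` runs them, assuming your QIIME 2 release accepts trimmed.qza as input to dereplicate-sequences.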

mouldinator · October 11, 2018, 8:32am

I tried the following on a subsample of real data:
import, itsxpress trim-single, vsearch dereplicate-sequences, vsearch cluster-features-de-novo. The clustering step took a while, but it seemed to work! Would you suggest running any further commands, or is that as far as we need to go for what we're trying to achieve?
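The thread doesn't prescribe further steps, but one cheap, optional addition is to turn the clustered outputs into visualizations you can open with `qiime tools view` or QIIME 2 View, to eyeball feature counts and the representative sequences. In this sketch the helper name and the default artifact names (table-dn.qza, rep-seqs-dn.qza) are hypothetical placeholders for whatever you called your clustering outputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper; artifact names are placeholders for your clustering outputs.
# run: execute a command, or only print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

inspect_outputs() {
  local table="${1:-table-dn.qza}" seqs="${2:-rep-seqs-dn.qza}"
  # Per-sample and per-feature counts for the clustered table.
  run qiime feature-table summarize \
    --i-table "$table" \
    --o-visualization "${table%.qza}.qzv"
  # Browsable list of the representative (centroid) sequences.
  run qiime feature-table tabulate-seqs \
    --i-data "$seqs" \
    --o-visualization "${seqs%.qza}.qzv"
}
```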

Thank you so much for all your time and help; it's very greatly appreciated!

ebolyen · October 11, 2018, 5:06pm

Hey @mouldinator,

I think you should be basically all set. Hopefully the results came out well!

It occurs to me that I have no idea whether Nanopore produces variable-length reads, and I'm not sure how vsearch would handle that while clustering, so it might be worth double-checking that your output seems reasonable. Unfortunately, the q2 plugin doesn't report anything like an OTU map, so that may be easier said than done.
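One possible workaround for the missing OTU map, outside the q2 plugin and an assumption on my part rather than a q2-vsearch feature, is to export the dereplicated sequences and run standalone vsearch with a `--uc` output, e.g. `vsearch --cluster_fast rep-seqs.fasta --id 0.97 --centroids centroids.fasta --uc clusters.uc` (file names are placeholders, and the results will not be identical to the plugin's). In the tab-separated .uc file, `S` records are new centroids and `H` records are hits, with the sequence length in column 3, the query label in column 9 and the centroid label in column 10. The hypothetical helpers below turn that into a member-to-centroid map and a per-OTU length range, which also shows how much read lengths vary within each cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of QIIME 2 or vsearch.
# uc_to_otu_map: "<member><TAB><centroid><TAB><member length>" from a .uc file.
# S records (centroids) map to themselves; H records map to their centroid;
# C records (per-cluster summaries) are skipped.
uc_to_otu_map() {
  awk -F '\t' '
    $1 == "S" { print $9 "\t" $9 "\t" $3 }
    $1 == "H" { print $9 "\t" $10 "\t" $3 }
  ' "$@"
}

# otu_length_range: "<centroid><TAB><members><TAB><min length><TAB><max length>"
otu_length_range() {
  uc_to_otu_map "$@" | awk -F '\t' '
    {
      c = $2; len = $3 + 0            # force numeric comparison
      n[c]++
      if (!(c in lo) || len < lo[c]) lo[c] = len
      if (!(c in hi) || len > hi[c]) hi[c] = len
    }
    END { for (c in n) print c "\t" n[c] "\t" lo[c] "\t" hi[c] }
  '
}
```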

mouldinator · October 16, 2018, 12:35pm

The reads are indeed variable in length: from 200 bp up to 50 kb, the longest I remember seeing.
Thanks for the heads-up! I'll post an example dataset here for the community to look at, since my idea of what is "reasonable" may differ from that of fellow Nanopore users. I'm also very much a greenhorn, so I would really value your opinion on this too!
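With lengths spanning 200 bp to 50 kb, a quick length summary of the reads before and after ITSxpress trimming, or of the clustered rep-seqs, is a cheap sanity check before sharing the dataset. The helper below is a hypothetical sketch (`fasta_length_summary` is my own name) that assumes FASTA input, exported from the artifacts and decompressed if needed; it tolerates sequences wrapped over several lines and reports the count, minimum, maximum and mean length.

```shell
#!/usr/bin/env bash
# fasta_length_summary: hypothetical helper, not part of QIIME 2.
# Prints "<n sequences><TAB><min><TAB><max><TAB><mean>" for FASTA input
# (files or stdin); sequence lines may be wrapped across multiple lines.
fasta_length_summary() {
  awk '
    function flush() {
      if (len < 0) return                       # no record open yet
      n++; sum += len
      if (n == 1 || len < min) min = len
      if (n == 1 || len > max) max = len
    }
    BEGIN { len = -1 }
    /^>/ { flush(); len = 0; next }             # a header starts a new record
    len >= 0 { gsub(/[ \t\r]/, ""); len += length($0) }   # count bases only
    END {
      flush()
      if (n == 0) { print "0\t0\t0\t0"; exit }
      printf "%d\t%d\t%d\t%.1f\n", n, min, max, sum / n
    }
  ' "$@"
}
```

Running it on the raw reads and on the clustered rep-seqs side by side shows whether the trimming and clustering steps left the length distribution you expect for ITS.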

system · November 20, 2018, 12:22am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.