Deblur trim-length for ITS1 data

maoniu · April 18, 2018, 5:57pm

Hi,

I was wondering: when using deblur to denoise ITS1 data (variable length) generated from a MiSeq 2x300 bp run, should I set --p-trim-length to a lower value (e.g., 150) to keep species with shorter ITS1 sequences, at the cost of taxonomic resolution, or can I skip trimming (-1)? What is the downside of not trimming all reads to the same length in deblur?
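One way to weigh this trade-off is to check what fraction of reads would survive each candidate trim length. Below is a minimal Python sketch. It assumes deblur-like behaviour: reads shorter than the trim length are discarded, longer reads are truncated, and -1 disables trimming. The function name and the example lengths are hypothetical.

```python
def retention_by_trim_length(lengths, candidates):
    """Fraction of reads kept at each candidate trim length.

    Assumption: reads shorter than the trim length are discarded and longer
    reads are truncated (so they are kept); -1 means no trimming at all.
    """
    total = len(lengths)
    result = {}
    for trim in candidates:
        if trim == -1:
            kept = total  # no trimming: every read survives
        else:
            kept = sum(1 for n in lengths if n >= trim)
        result[trim] = kept / total if total else 0.0
    return result


lengths = [120, 150, 180, 210, 240, 260]  # hypothetical read lengths in bp
print(retention_by_trim_length(lengths, [-1, 150, 200, 250]))
```

In practice you would feed it the real read lengths of your quality-filtered data rather than the toy list above.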

Briefly, I imported the demultiplexed, PEAR-joined FASTQ files into QIIME 2, with the adapters and the flanking SSU and 5.8S sequences already trimmed. Starting from "qiime quality-filter q-score", I followed this tutorial: Alternative methods of read-joining in QIIME 2 (QIIME 2 2018.2.0 documentation).

Also, how can I view the length distribution of the quality-filtered data? The only summary I found is the quality plot from:
qiime demux summarize \
  --i-data fj-joined-demux.qza \
  --o-visualization fj-joined-demux.qzv

Thanks!

Nicholas_Bokulich · April 18, 2018, 11:51pm

I would try this both ways to see how it impacts the results. See below.

The downside is just that, without trimming, sequences of different lengths are dereplicated as separate unique sequence variants even when they should technically collapse into a single variant (though whether reads of different lengths really represent the same variant is hard to say).
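A toy illustration of that effect: exact-match dereplication treats any length difference as a different sequence, while trimming to a common length collapses length variants. This is a minimal Python sketch of plain dereplication only, not deblur's error-correction model; the function name and the reads are hypothetical.

```python
from collections import Counter


def dereplicate(seqs, trim_length=-1):
    """Count exact-match unique sequences, optionally after trimming.

    trim_length=-1 keeps full lengths; otherwise reads shorter than
    trim_length are dropped and longer reads are cut to trim_length.
    """
    if trim_length == -1:
        kept = seqs
    else:
        kept = [s[:trim_length] for s in seqs if len(s) >= trim_length]
    return Counter(kept)


# Hypothetical reads: the same variant observed at three different lengths.
reads = ["ACGTACGTAA", "ACGTACGTAAC", "ACGTACGTAACG"]
print(len(dereplicate(reads)))      # 3 unique sequences without trimming
print(len(dereplicate(reads, 10)))  # 1 unique sequence after trimming to 10
```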

With ITS1 I am always a bit nervous about paired-end data, because it is a hypervariable region and I believe some clades have ITS1 regions longer than 600 bp. Long variants that are dropped because the reads do not overlap sufficiently will bias the results against these clades (length is variable, but it is not randomly distributed across clades). I would pay close attention to how many sequences fail to merge. Of course, that could be due to low-quality sequence at the read ends failing to overlap, but it could also be very long variants that cannot overlap. If you have some way to figure out which is which, it would be beneficial (to everyone who has this problem!)
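The overlap constraint is simple arithmetic: two reads of length L that must overlap by at least m bases can only span an amplicon of up to 2L - m bases. With 2x300 bp reads and a hypothetical minimum overlap of 20 bp that is 580 bp, and since the amplicon also contains primers and flanking SSU/5.8S sequence, an ITS1 over 600 bp cannot merge. The Python sketch below also flags samples with a low merge rate; the per-sample counts and the 0.8 threshold are hypothetical.

```python
def max_mergeable_amplicon(read_length, min_overlap):
    """Longest amplicon whose forward and reverse reads still overlap by min_overlap."""
    return 2 * read_length - min_overlap


def merge_rate(reads_in, reads_joined):
    """Fraction of read pairs that were successfully joined."""
    return reads_joined / reads_in if reads_in else 0.0


print(max_mergeable_amplicon(300, 20))  # 580

# Hypothetical per-sample counts: (read pairs in, pairs joined).
counts = {"sample-1": (10000, 9200), "sample-2": (10000, 6100)}
low = [s for s, (n_in, n_joined) in counts.items()
       if merge_rate(n_in, n_joined) < 0.8]
print(low)  # ['sample-2']
```

Quality trimming before joining shortens the reads, so the real limit is lower than this best case.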

We do not currently have a good way to look at length distributions in QIIME 2, but it is on our radar and should be available in a future release. We will post back here when that feature is available.

For now, you could export those sequences and use something like standalone vsearch to get quick stats on sequence length (vsearch is included in the QIIME 2 installation, so that should be easy).
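If you would rather not use vsearch, a few lines of Python also work on the exported files. This is a minimal sketch that assumes the export produces gzipped FASTQ files (`*.fastq.gz`) with standard unwrapped 4-line records. The function names and the directory name are hypothetical.

```python
import gzip
from collections import Counter
from pathlib import Path
from statistics import mean, median


def fastq_lengths(lines):
    """Yield sequence lengths from FASTQ lines (assumes unwrapped 4-line records)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def length_summary(lengths):
    """Basic statistics plus an exact-length histogram."""
    lengths = list(lengths)
    if not lengths:
        return {"n": 0}
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "histogram": Counter(lengths),
    }


def summarize_directory(directory):
    """Pool read lengths across every *.fastq.gz file in an exported directory."""
    all_lengths = []
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        with gzip.open(path, "rt") as handle:
            all_lengths.extend(fastq_lengths(handle))
    return length_summary(all_lengths)
```

For example, after exporting the artifact with `qiime tools export` (the exact arguments differ between QIIME 2 releases), `summarize_directory("exported-demux")` would pool the lengths from every sample file in that directory. Files such as the manifest are skipped because they do not match the glob.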

I hope that helps!
