Trimming long reads to a shorter region for a methods comparison

bosullivan · April 18, 2022, 3:59pm

Hi all,

I'm currently comparing two different approaches for 16S sequencing studies. For this comparison, I extracted bacterial DNA from ~50 samples. For each sample, I used two different primer pairs and sequenced the amplicons on different platforms. The first is a fairly traditional V1-V3 approach on an Illumina MiSeq. The second uses a ~2,500 bp amplicon that encompasses the entire 16S rRNA gene, as well as the ITS region and part of the 23S rRNA gene, sequenced on a PacBio Sequel IIe.

One thing I want to look at is the potential for primer bias. To address this, my plan is to take the long PacBio reads, trim them to just the V1-V3 region, and then analyze both datasets in exactly the same way. My thinking is that the best time to trim is before the DADA2 step, because reads that might otherwise get thrown out could be retained as an ASV once they are shortened. Additionally, trimming the sequences after DADA2 would likely leave several distinct ASVs with identical V1-V3 sequences.
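
To make the trimming idea concrete, here is a minimal Python sketch of what I mean by cutting a long read down to the region between two primers. It is only an illustration, not a replacement for a proper tool: it uses exact matching (degenerate bases allowed, but no mismatches or indels), and the primer sequences are placeholders (the commonly cited 27F and 534R), not necessarily the ones used in this study. The function and constant names are made up for the example.

```python
import re

# Placeholder primers (assumption): commonly cited 27F / 534R for V1-V3.
# Swap in the primers actually used for the short-read amplicon.
FWD_PRIMER = "AGAGTTTGATCMTGGCTCAG"
REV_PRIMER = "ATTACCGCGGCTGCTGG"

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(read, fwd_primer=FWD_PRIMER, rev_primer=REV_PRIMER):
    """Return the sequence between the two primers (primers removed).

    Long reads can be in either orientation, so the reverse complement
    of the read is tried as well. Returns None if no primer pair is found.
    """
    pattern = re.compile(
        primer_regex(fwd_primer) + "(.+?)" + primer_regex(revcomp(rev_primer))
    )
    read = read.upper()
    for candidate in (read, revcomp(read)):
        match = pattern.search(candidate)
        if match:
            return match.group(1)
    return None
```

Reads for which `extract_region` returns None would be dropped, which is analogous to discarding untrimmed reads in a primer-trimming tool. Real data would need some tolerance for sequencing errors in the primer sites, which this sketch does not handle.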

Does anyone have suggestions on the best way to go about this? I have a demultiplexed fastq file that I imported into QIIME 2. The "qiime feature-classifier extract-reads" command does almost exactly what I want, but from what I understand, it only works on a rep-seqs.qza file like the one I would get after running DADA2. Would "qiime cutadapt trim-single" accomplish what I want?

Thanks!