Any workflow for paired reads that don't overlap?

mamillerpa · December 15, 2017, 5:07pm

I'm analyzing a paired-end 515F-926R bacterial dataset for a collaborator, and I believe some of the reads don't overlap. The reads are 250 nt, from a MiSeq.

Specifically, I have done BLASTs and QIIME 1 rtax analyses that lead me to believe that >70% of the amplicons in some of the samples are ~550 nt fragments of fungal 18S, specifically from Pichia spp. Since 250 nt reads from a 550 nt amplicon don't overlap, I can't classify these reads with a DADA2 feature table, can I?

Is there some other way I can build a feature table in QIIME 2, including any paired but non-overlapping reads, and then classify with Silva's all SSU database?

thanks,
Mark

Nicholas_Bokulich · December 15, 2017, 8:47pm

Hi @mamillerpa,

Do you really want to include these in your analysis? As you are using (presumably specifically) bacterial primers, it sounds like these are effectively non-target hits. If you are hoping to profile fungal communities simultaneously using these sequences, I would be skeptical of the results unless you know that these primers are designed to have broad coverage of fungi as well. But I may be misunderstanding the experimental design and specificity of your primers.

No, you can't; you are correct. DADA2 will discard all read pairs that fail to overlap. In your case, and based on my assumption above, I would say that this is beneficial, as these fungal hits are non-target and should be excluded.
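To see why the fungal amplicons fail to merge while the bacterial ones do, the expected overlap can be worked out from read and amplicon lengths. This is a minimal sketch, assuming untrimmed 250 nt reads, a bacterial 515F-926R amplicon of roughly 411 nt (926 - 515, E. coli positions, an approximation), and DADA2's default minimum merge overlap of 12 nt; the function names are hypothetical.

```python
# Assumed DADA2 default minimum overlap (nt) required to merge a read pair.
MIN_OVERLAP = 12


def expected_overlap(fwd_len: int, rev_len: int, amplicon_len: int) -> int:
    """Overlap in nt between forward and reverse reads; negative means a gap."""
    return fwd_len + rev_len - amplicon_len


def can_merge(fwd_len: int, rev_len: int, amplicon_len: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the pair overlaps by at least min_overlap nt."""
    return expected_overlap(fwd_len, rev_len, amplicon_len) >= min_overlap


# Approximate 515F-926R bacterial 16S amplicon vs. the ~550 nt fungal 18S fragment.
bacterial_overlap = expected_overlap(250, 250, 411)  # 500 - 411 = 89 nt
fungal_overlap = expected_overlap(250, 250, 550)     # 500 - 550 = -50 nt (a gap)
```

Truncating reads during denoising shortens them further, so when checking a real run, pass the truncated lengths rather than 250.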

As for building a feature table from paired but non-overlapping reads in QIIME 2: I think not. QIIME 2 does not have a method like RTAX, nor does it support the sort of data type you are describing (unjoined paired reads). If you want to include these fungal sequences in your analysis, I am afraid you will need to use only the forward reads and proceed as if you were using single-end data.

Sorry I can't offer a better solution!

mamillerpa · December 15, 2017, 8:53pm

Thanks. I appreciate that this isn't a suitable way to accurately tabulate all of the bacteria and all of the fungi that are present. I was just looking for a way to account for the low number of reads that can be classified as any bacterium at all, without using any tools outside of QIIME 2.

I'll just try it with the forward-only data as you suggested.

Nicholas_Bokulich · December 15, 2017, 9:09pm

Got it, thanks for clarifying!

Yes, I think single-end reads are the way to achieve what you are after, and that information should still be enough to determine the composition of non-bacterial reads with the SILVA all SSU database (if these are indeed all 18S reads).
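Once the forward reads are denoised and classified, the share of reads falling outside Bacteria can be tallied from the feature table and the taxonomy. This is a minimal sketch, assuming both have been exported to plain Python dicts (feature ID to read count, and feature ID to a semicolon-delimited lineage) and that lineages carry rank prefixes such as `D_0__`; the function name, feature IDs and counts are made up for illustration.

```python
from collections import Counter


def reads_per_domain(feature_counts: dict, taxonomy: dict) -> Counter:
    """Sum read counts by the top-level (domain) rank of each feature's lineage."""
    totals = Counter()
    for feature_id, count in feature_counts.items():
        lineage = taxonomy.get(feature_id, "Unassigned")
        top = lineage.split(";")[0].strip()
        # Drop an assumed rank prefix such as "D_0__" or "k__" if present.
        if "__" in top:
            top = top.split("__", 1)[1]
        totals[top or "Unassigned"] += count
    return totals


# Hypothetical example values, not data from this thread.
counts = {"f1": 700, "f2": 200, "f3": 100}
lineages = {
    "f1": "D_0__Eukaryota;D_1__Opisthokonta",
    "f2": "D_0__Bacteria;D_1__Proteobacteria",
    "f3": "Unassigned",
}
totals = reads_per_domain(counts, lineages)
bacterial_fraction = totals["Bacteria"] / sum(totals.values())
```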

Good luck!

system · January 16, 2018, 3:09am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.