Denoising doubt (F and R reads do not overlap)

glabher · August 7, 2019, 7:00pm

Hello there,

I do not know whether it is possible to analyze the V3-V4 region of 16S when I only have 150 bp reads (the V3-V4 amplicon is approx. 500 bp), so there is no overlap between the F and R reads. DADA2 and vsearch give me an error, and only Deblur allowed me to denoise the reads, but I am not sure I am doing it correctly. Should this be treated as single-end instead of paired-end, since the F and R reads do not overlap? We are now sequencing with 301 cycles instead of 151.

Thank you in advance!!

G.

Mehrbod_Estaki · August 7, 2019, 8:25pm

Hi @glabher,
Welcome to the forum!
As you have already pointed out, the V3-V4 region is longer than what 2x150 bp reads can span, meaning the reads cannot be merged properly. Deblur actually only uses your forward reads (and discards the reverse) when denoising, which is why it completed without any errors. You should notice that the Deblur output gives you sequences that are <150 bp in length. If you repeat the run with 300 cycles, you should be able to merge those reads successfully.
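
To make the arithmetic behind this concrete, here is a minimal sketch in Python. The 500 bp amplicon length is the approximate figure quoted in this thread, and `expected_overlap` is a hypothetical helper written only for illustration, not part of QIIME 2 or any plugin.

```python
def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Expected overlap in bp between the forward and reverse reads.

    A negative value means the two reads leave a gap in the middle
    of the amplicon and therefore cannot be merged.
    """
    return 2 * read_len - amplicon_len


# Approximate V3-V4 amplicon length quoted in this thread (an assumption).
AMPLICON_LEN = 500

for read_len in (150, 300):
    overlap = expected_overlap(AMPLICON_LEN, read_len)
    if overlap > 0:
        print(f"2x{read_len}: reads overlap by about {overlap} bp")
    else:
        print(f"2x{read_len}: gap of about {-overlap} bp, reads cannot be merged")
```

Keep in mind this is an upper bound: any quality truncation shortens the reads and reduces the real overlap accordingly.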

Mehrbod_Estaki · August 8, 2019, 6:28am

Just a comment on @timanix's suggestion here.

There's no need to re-import these as single-end reads. If you imported paired-end reads, you can still use dada2 denoise-single or deblur, and they will just use the forward reads, ignoring the reverse.
Concatenating non-merging reads is technically doable, for example in DADA2 as @timanix pointed out, but I would strongly advise against it. It almost always produces subpar-quality features, and the taxonomic assignments are particularly problematic (this isn't just an issue within QIIME 2). In fact, you would most likely end up using only your forward reads for taxonomic assignment anyway.
I would recommend either sticking with the method you already have, which is to use just your forward reads, or, once you get your new run (300 cycles) back, using conventional approaches to merge the paired ends.
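
As a rough illustration of the forward-reads-only route, the sketch below builds the `qiime dada2 denoise-single` command line in Python. The file names and the truncation length are placeholders assumed for this example (choose the truncation length from your own quality plots), and `denoise_single_cmd` is a hypothetical helper, not part of QIIME 2.

```python
def denoise_single_cmd(demux="demux-paired-end.qza", trunc_len=150):
    """Build the command for denoising forward reads only with DADA2.

    `demux` is a placeholder name for the already imported paired-end
    artifact; denoise-single reads only the forward reads from it.
    `trunc_len` is a placeholder value, not a recommendation.
    """
    return [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len", str(trunc_len),
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


cmd = denoise_single_cmd()
print(" ".join(cmd))

# To actually run it inside an activated QIIME 2 environment:
# import subprocess
# subprocess.run(cmd, check=True)
```

The command is only printed here so that you can inspect it before running anything.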

edit: This post was in reply to a previous comment that has been since removed by the user.

glabher · August 8, 2019, 12:56pm

Hi again,

Reading this paper (https://www.nature.com/articles/sdata20197.pdf), I see that they conclude that the V2-V3 region has higher resolution for lower-rank taxa and allows a more precise distance-based clustering of reads into species-level OTUs, compared with the V3-V4 region. However, how accurate is it to analyze a single region? I mean, I have a run with 150 cycles, as I told you, so if only the F reads are analyzed, only the V3 region is covered. I do not know whether there will be many differences between analyzing V3 and V3-V4. Nevertheless, I will compare them.

Thanks!!

glabher · August 8, 2019, 12:57pm

Thank you very much @timanix @Mehrbod_Estaki!!! Now, I understand how to use them properly.

G.

Mehrbod_Estaki · August 8, 2019, 7:27pm

Hi @glabher,
It's good that you are reading further into this problem. You will soon also read that these types of studies are specific to the source community they compare. For example, the V2-V3 region may be preferred for these lake water samples, but it may not perform as well in samples where the bacterial community is better captured by the V3-V4 region. It all depends on your question at hand. Nevertheless, it seems you will have no choice but to use the V3 region for now, which is still totally fine!

glabher · August 9, 2019, 1:47pm

Hi @Mehrbod_Estaki,

Thanks a lot! I am quite new to the 16S field, but I really like it. Yes, for now I will only use the V3 region, but I will repeat some samples with 300 cycles to see whether the results differ much from those with 150. I am working with human fecal samples, and most papers I have read analyze the V3-V4 region, hence my concern.

Regards,

Gema

timanix · August 9, 2019, 2:43pm

I am using V3-V4 300 nt paired reads, but one of my colleagues sequenced a small dataset with similar samples and the same primers and got paired 150 nt reads, which do not overlap. I analyzed only the forward reads in QIIME 2 to compare them with my results from the 300 nt merged reads, and the results were quite similar. You are also working with the human microbiome, which is the best studied, so you should get quite good results with these samples. IMHO.
