Analysing merged sequences

Hi all,

I'm currently working with 16S datasets from bioreactors that I downloaded from the NCBI database, and I have been having issues with some of them. A few authors deposited merged sequences instead of the raw paired-end data. I'm not sure how to proceed with classification, because none of them describe whether the reads have already been processed in any way. Does anyone have any tips?

Cheers,
Carolinne

Hi @carolinnerdc,

This sounds challenging!

I would start by reviewing the papers to see where (and if!) they mention read joining. It's occasionally done as part of the demultiplexing process.

I would also look at the quality profile of the reads. If there is a bridge of high quality in the middle, sitting between two drop-offs, that would indicate merged reads. If there is just a single quality drop at the end, that might suggest they only deposited single-end reads. (Qiita only submits single-end reads, for example.)
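
In case it helps, here is a rough Python sketch of that check, assuming standard 4-line FASTQ records with Phred+33 quality encoding. The function names, the window size and the coverage cut-off are all made up for illustration and are not part of any tool. It averages quality per position and then reports the largest recovery in quality after a dip. A clear rebound hints at a merged read, while a value near zero is what you would expect from a plain single-end drop-off. Treat it as a quick screen and still look at the actual quality plot.

```python
import gzip


def mean_quality_by_position(fastq_path, max_reads=10000, phred_offset=33,
                             min_fraction=0.5):
    """Mean Phred score per position over the first max_reads records.

    Assumes 4-line FASTQ records and Phred+33 encoding (both assumptions).
    Positions covered by fewer than min_fraction of the sampled reads are
    dropped, because merged reads vary in length and the tail is noisy.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    totals, counts = [], []
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 != 3:  # the quality string is the 4th line of a record
                continue
            for pos, char in enumerate(line.rstrip("\n")):
                if pos == len(totals):
                    totals.append(0)
                    counts.append(0)
                totals[pos] += ord(char) - phred_offset
                counts[pos] += 1
    n_reads = counts[0] if counts else 0
    return [t / c for t, c in zip(totals, counts)
            if c >= min_fraction * n_reads]


def max_rebound(profile, window=10):
    """Largest rise in smoothed quality following an earlier low point.

    The window size is an arbitrary illustrative choice.
    """
    if len(profile) < window:
        return 0.0
    # Moving average over full windows only, to damp per-position noise.
    smoothed = [sum(profile[i:i + window]) / window
                for i in range(len(profile) - window + 1)]
    lowest = smoothed[0]
    best = 0.0
    for value in smoothed:
        lowest = min(lowest, value)
        best = max(best, value - lowest)
    return best


# Example usage ("reads.fastq.gz" is a placeholder path):
# profile = mean_quality_by_position("reads.fastq.gz")
# print(max_rebound(profile))
```

What counts as a "clear" rebound depends on the sequencing run and the merging software, so I would compare the number across a few datasets that you already know are merged or single-end before relying on it.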

If you do have merged paired-end reads, you may need to denoise them with either Deblur (q2-deblur) or a technique like UNOISE3.

Best,
Justine