Extracting V4 region from V3-V4 data

Hi

I have two different 16S rRNA gene data sets of the microbiome of a species

i) The first dataset covers only the V4 region (primers 515F: TGCCAGCMGCCGCGGTAA and 806R: GGACTACNNGGGTATCTAAT).
ii) The second dataset covers the V3-V4 region (standard Illumina primers; forward primer: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG, reverse primer: TCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC).

I want to classify only the V4 region from both data sets.
Would it be possible to extract the V4 region from the second data set before denoising the reads with DADA2?

Looking forward to hearing from you

Thanks in advance

Ashutosh

Hi @ashutosh,
Yes, you can certainly do this with cutadapt trim-paired (see the linked tutorial). I would recommend processing the two datasets separately (for downstream use with DADA2 this is in fact required) and using cutadapt to remove the 515F/806R primers from both runs. I also like to add the --p-discard-untrimmed parameter to get rid of any reads that didn't contain these primers, but that's a personal choice.
Because cutadapt removes a 5' primer together with any sequence that precedes it, your remaining reads should all cover just that specific region.
I'll also point you to this previous discussion on analyzing data from different regions, which may be of interest.
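To make the idea concrete, here is a small Python sketch of what primer trimming with discarding of untrimmed reads does conceptually. It is only an illustration, not how cutadapt is implemented: it matches the degenerate primer exactly via IUPAC codes, whereas cutadapt also tolerates mismatches and partial matches. The function names (`primer_to_regex`, `trim_front`) and the toy reads are made up for this example; the primer is the 515F sequence quoted in the question. In q2-cutadapt the forward and reverse primers would, as far as I know, be passed with `--p-front-f` and `--p-front-r`, but check `qiime cutadapt trim-paired --help` for your version.

```python
import re

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Turn a degenerate primer into a regex with one character class per base."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))


def trim_front(read, primer):
    """Return the part of the read after the first primer match.

    Returns None when the primer is absent, which mimics discarding
    untrimmed reads. Exact matching only (an assumption of this sketch).
    """
    match = primer_to_regex(primer).search(read.upper())
    if match is None:
        return None
    return read[match.end():]


PRIMER_515F = "TGCCAGCMGCCGCGGTAA"  # 515F as quoted in the question

# Toy reads (made up): a stretch upstream of the primer, then the primer
# site with M resolved to A, then the start of the region of interest.
upstream = "CCTACGGGAGGCAGCAG" + "ACGTACGT"
v4_start = "TACGGAGGGT"
read_with_primer = upstream + "TGCCAGCAGCCGCGGTAA" + v4_start
read_without_primer = upstream + v4_start

trimmed = [trim_front(r, PRIMER_515F) for r in (read_with_primer, read_without_primer)]
kept = [t for t in trimmed if t is not None]  # drop reads with no primer
print(kept)  # ['TACGGAGGGT']
```

The first toy read loses the primer and everything upstream of it, and the second is dropped because the primer is never found, which is the behaviour you want before sending the reads on to DADA2.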

Good luck!

Hi @Mehrbod_Estaki
Thank you so much for your nice suggestions.
I will give it a go.

Thanks again

Ashutosh
