How to analyze sequence data with 2 primers targeting 2 different gene regions multiplexed into one run?

ylor · April 22, 2022, 3:51pm

Hello,

It's been several years since I've posted on the forum and it's changed a lot, so I hope this is posted in the right place. I have a sequencing run that contains sequence data from 2 different primers that amplify 2 different gene regions targeting the same group of species. We first ran 2 different PCRs for each sample then did a second PCR to attach the Illumina overhang adapter sequences. The amplicons from both primers were pooled together by sample then we used Illumina tags to index our samples. These samples were then pooled into one library and sequenced on the MiSeq (2x250 bp). I now have demultiplexed fastq files that are labeled by sample but contain reads from both primers in each sample fastq file. What is the best/easiest way to go about analyzing the sequence data?

Should I:

Option 1: pre-filter the fastq files for each primer set first, then import the separate files into QIIME2 and use cutadapt followed by DADA2? What would be the best method of sorting the fastq files per primer set? However, according to the forum thread "Analyzing Different sample batches in the same sequencing run", it is not recommended to split the sequence run data because DADA2 will run better with more samples from the same run.

or

Option 2: import into QIIME2 as is, with sequence data from both primers, use cutadapt to trim both primer sets, then run DADA2? This approach was recommended in the forum thread "Separating two different amplicons from demultiplexed data". The only problem with this option is that I export the rep seqs file and use standalone blastn. We are interested in seeing which species we can detect with these primers, so it doesn't make sense to spend time building a reference database that encompasses everything in GenBank when standalone blast can do the same job. Therefore, if I trim the primers, I won't be able to differentiate which reads belong to which primer set after DADA2...

I want to be able to differentiate the taxonomy assignment per primer set downstream so I am worried about not being able to differentiate which sequence belongs to which primer set if I trim with cutadapt before DADA2.

If I missed any relevant forums related to my questions, please share them with me.

Thank you for reading through my long post!

Yer

SoilRotifer · May 23, 2022, 6:29pm

Hi @ylor,

Splitting the run is not necessarily a problem. I typically separate out my different amplicons / data sets prior to denoising. IMO, as long as you have enough data for the denoiser to make reasonable estimates, you should be fine. I doubt any differences would be significant. Also, many users perform additional taxonomic and sequence-based quality control in addition to denoising anyway.

Just run cutadapt trim-paired separately for each primer pair with the --p-discard-untrimmed option set. This flag forces cutadapt to write out only the paired-end reads in which the primers were detected and trimmed. This way you have a separate output for each primer set, and you can keep track of which sequences came from which primer set.
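As a rough sketch of what that could look like on the command line: the artifact name, the output names, and the primer sequences below are placeholders I made up (they are not from this thread), so swap in your own. The script only prints the two commands by default so you can check them first; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input name: replace with your own demultiplexed paired-end artifact.
DEMUX="demux-paired-end.qza"

# Print the command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: trim_primer_set <label> <forward primer> <reverse primer>
# Keeps only read pairs in which the primers were found (--p-discard-untrimmed),
# so each output artifact holds reads from a single primer set.
trim_primer_set() {
  run qiime cutadapt trim-paired \
    --i-demultiplexed-sequences "$DEMUX" \
    --p-front-f "$2" \
    --p-front-r "$3" \
    --p-discard-untrimmed \
    --o-trimmed-sequences "trimmed-$1.qza"
}

# Placeholder labels and primer strings: substitute your real primer sequences.
trim_primer_set geneA "FWD_PRIMER_A" "REV_PRIMER_A"
trim_primer_set geneB "FWD_PRIMER_B" "REV_PRIMER_B"
```

Each trimmed artifact can then be denoised on its own, so the representative sequences you export for blastn stay separated by primer set.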

ylor · May 25, 2022, 2:54pm

Hi @SoilRotifer,

Thank you so much for your insight! I completely missed the --p-discard-untrimmed option. I will go ahead and process my datasets as you have suggested.

Thank you!

Yer