I have about 500 FASTQ files, and each file contains the following 5 types of amplicon sequences:
16S: 1 primer pair
18S: 1 primer pair
ITS: 1 primer pair
COI: 2 primer pairs
Chloroplast: 3 primer pairs
Given that, I need to separate the reads and make 5 independent FASTQ files from each original FASTQ file. Is there a way in QIIME 2 to extract reads based on their primers? Or is there any other method to accurately pull out the respective reads?
I am not sure how to handle this but what I would try:
Import the samples into QIIME 2.
Use cutadapt to trim the primers with the "--p-discard-untrimmed" and (optionally) "--p-minimum-length" parameters. The idea is that cutadapt will discard any read in which it does not find the given primers, so if you run it once per primer set, each run keeps only the reads from that amplicon.
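To make the "one run per primer set" idea concrete, here is a minimal Python sketch that only builds the q2-cutadapt command lines, it does not run them. It assumes paired-end data (hence `trim-paired`). The primer sequences are illustrative placeholders because the thread does not give the real ones, and the file names (`demux.qza`, `<name>-trimmed.qza`) are hypothetical.

```python
import shlex

# Illustrative primers only (assumption): replace with the primers actually
# used, and add one entry per primer set (8 in total for this dataset).
PRIMER_SETS = {
    "16S": ("GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT"),
    "ITS": ("CTTGGTCATTTAGAGGAAGTAA", "GCTGCGTTCTTCATCGATGC"),
}


def build_trim_commands(primer_sets, input_qza, min_length=None):
    """Return {primer set name: argv list} for one cutadapt run per primer set."""
    commands = {}
    for name, (forward, reverse) in primer_sets.items():
        cmd = [
            "qiime", "cutadapt", "trim-paired",
            "--i-demultiplexed-sequences", input_qza,
            "--p-front-f", forward,
            "--p-front-r", reverse,
            # Drop every read pair in which the primers were not found.
            "--p-discard-untrimmed",
            # Hypothetical output name, one artifact per primer set.
            "--o-trimmed-sequences", f"{name}-trimmed.qza",
        ]
        if min_length is not None:
            cmd += ["--p-minimum-length", str(min_length)]
        commands[name] = cmd
    return commands


if __name__ == "__main__":
    for cmd in build_trim_commands(PRIMER_SETS, "demux.qza", min_length=50).values():
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run`. The input artifact is the same for every run; only the primers and the output name change.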
PS. To all: Please feel free to jump in with better suggestions. I am also curious.
I step in just to say I agree with @timanix - the easiest option here is to use q2-cutadapt.
Also, I see that your idea is to have one FASTQ file for COI (including the amplicons derived from both primer sets) and another FASTQ file for Chloroplast (including all three amplicons).
I think each amplicon should be treated separately. So you would end up with 8 groups instead of 5, one for each primer set.
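As an answer to the "any other method" part of the question, here is a rough pure-Python sketch of splitting reads into one bin per primer set. It is a simplification, not how cutadapt works: it only checks whether a read starts with the forward primer, allows no mismatches or indels, and honours IUPAC degenerate bases in the primer. The primer names and the short toy sequences are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer):
    """True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def bin_reads(reads, forward_primers):
    """Assign each read to the first primer set whose forward primer it starts with."""
    bins = {name: [] for name in forward_primers}
    bins["unassigned"] = []
    for read in reads:
        for name, primer in forward_primers.items():
            if starts_with_primer(read, primer):
                bins[name].append(read)
                break
        else:
            # No primer matched, comparable to a read cutadapt would discard.
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    # Toy primers and reads, made up for illustration.
    toy_primers = {"COI_set1": "GGWACW", "COI_set2": "CCHGAY"}
    toy_reads = ["GGAACTTTT", "CCTGATAAA", "TTTTTTTT"]
    for name, members in bin_reads(toy_reads, toy_primers).items():
        print(name, members)
```

With 8 entries in the primer dictionary this gives the 8 groups described above. For real data, cutadapt is the safer choice because it tolerates sequencing errors in the primer region.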
If you still want to merge amplicons, you can always use qiime feature-table merge and qiime feature-table merge-seqs once you have built a feature table for each primer set (suggested by @llenzi - thanks for that!).
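A small sketch of what those two merge calls could look like, again only building the command lines in Python. The artifact names (`<name>-table.qza`, `<name>-rep-seqs.qza`, `merged-*.qza`) are hypothetical. One assumption worth checking against your QIIME 2 version: because every primer set comes from the same samples, the tables share sample IDs, and as far as I know the default overlap method rejects that, so the sketch passes `--p-overlap-method sum`.

```python
import shlex


def build_merge_commands(names, overlap_method="sum"):
    """Return (merge table argv, merge-seqs argv) for the given primer set names."""
    merge_table = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for name in names:
        # Hypothetical per-primer-set artifacts produced by denoising.
        merge_table += ["--i-tables", f"{name}-table.qza"]
        merge_seqs += ["--i-data", f"{name}-rep-seqs.qza"]
    merge_table += [
        "--p-overlap-method", overlap_method,
        "--o-merged-table", "merged-table.qza",
    ]
    merge_seqs += ["--o-merged-data", "merged-rep-seqs.qza"]
    return merge_table, merge_seqs


if __name__ == "__main__":
    for cmd in build_merge_commands(["COI_set1", "COI_set2"]):
        print(shlex.join(cmd))
```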
I want to add that if you merge amplicons from different primer sets and then compute a beta diversity PCoA, don't be surprised if you get a Y shape or see samples clustering along an almost perfect line. Different primers produce different amplicons, which are denoised into different ASVs, so samples share ASVs only where the primer sets overlap.
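A toy illustration of why this happens, using Jaccard distance on made-up ASV sets. The sample names and ASV IDs are hypothetical, and each sample is assumed to carry ASVs from a single primer set. Samples from different primer sets share no ASVs, so every such pair sits at the maximum distance of 1.0, and the primer set becomes the dominant signal in the ordination.

```python
def jaccard_distance(asvs_a, asvs_b):
    """Jaccard distance between two samples, based on ASV presence/absence."""
    a, b = set(asvs_a), set(asvs_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical samples: two amplified with a 16S primer set, one with COI.
samples = {
    "sample1_16S": {"16S_asv1", "16S_asv2"},
    "sample2_16S": {"16S_asv2", "16S_asv3"},
    "sample3_COI": {"COI_asv1", "COI_asv2"},
}

if __name__ == "__main__":
    names = sorted(samples)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            distance = jaccard_distance(samples[first], samples[second])
            print(f"{first} vs {second}: {distance:.2f}")
```

The two 16S samples share one of three ASVs, so their distance is below 1.0, while both comparisons against the COI sample are exactly 1.0.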