I have a dataset generated with the QIAseq 16S/ITS Panels, which target a set of regions: ITS, V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9. I want to classify the reads at the species or genus level.
I use the following script for each target region/primer set, for example for primers 341F (5'-CCTACGGGNGGCWGCAG-3') and 805R (5'-GACTACHVGGGTATCTAATCC-3'):
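For illustration only (this is not the script referred to above), here is a minimal exact-match sketch of what extracting the 341F/805R amplicon from a reference sequence involves: each IUPAC degenerate base becomes a regex character class, and the reverse primer is searched for as its reverse complement. The function names are hypothetical. The real tool, qiime feature-classifier extract-reads, also tolerates primer mismatches, which this sketch deliberately omits.

```python
import re
from typing import Optional

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code, e.g. R (A/G) pairs with Y (C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a degenerate primer into a regex, e.g. W -> [AT]."""
    return "".join(f"[{IUPAC[base]}]" for base in primer.upper())


def extract_amplicon(ref: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the two primers (primers excluded), or None.

    Exact matching only: real tools also allow some primer mismatches.
    """
    pattern = primer_regex(fwd) + "(.*?)" + primer_regex(reverse_complement(rev))
    match = re.search(pattern, ref.upper())
    return match.group(1) if match else None


# Toy reference: a 341F site, an 8-bp insert, then the reverse-complemented 805R site.
ref = "TTTT" + "CCTACGGGAGGCAGCAG" + "ACGTACGT" + "GGATTAGATACCCTGGTAGTC" + "AAAA"
print(extract_amplicon(ref, "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"))  # prints ACGTACGT
```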
You could make one classifier for each primer pair. At the very least, you should make separate classifiers for ITS and 16S, but you could also use a full-length 16S classifier (e.g., one of the pre-trained classifiers on the QIIME 2 website) to classify all of the 16S regions simultaneously.
Judging from the QIAGEN website, it appears that they split the different primer pairs into separate datasets and classify these separately.
So I recommend you do the same. If the primers are still present in your reads, you can use q2-cutadapt to split your sequences into groups (one per primer pair) prior to denoising. If not, denoise everything together and then use qiime quality-control exclude-seqs to separate the individual amplicon regions, by aligning against reference sequences trimmed to each amplicon region.
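If the primers are still in the reads, splitting by primer pair boils down to asking which forward primer each read starts with. The sketch below does this with exact, IUPAC-aware matching and trims the matched primer off; q2-cutadapt does the same job with error tolerance (for example, one trim run per primer pair with --p-discard-untrimmed). Only 341F comes from this thread; the 515F sequence used for V4V5 is a widely used primer included as a placeholder and is not necessarily what the QIAseq panel uses. The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Forward primer per amplicon region. 341F is from the thread; the V4V5
# entry (515F) is an illustrative placeholder, not a confirmed QIAseq primer.
FORWARD_PRIMERS = {
    "V3V4": "CCTACGGGNGGCWGCAG",
    "V4V5": "GTGYCAGCMGCCGCGGTAA",
}


def split_by_primer(reads: List[str], primers: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by the forward primer found at their 5' end, trimming the primer."""
    patterns = {
        region: re.compile("".join(f"[{IUPAC[b]}]" for b in primer.upper()))
        for region, primer in primers.items()
    }
    groups = defaultdict(list)
    for read in reads:
        for region, pattern in patterns.items():
            match = pattern.match(read.upper())  # anchored at the start of the read
            if match:
                groups[region].append(read[match.end():])  # keep only the insert
                break
        else:  # no primer matched this read
            groups["unassigned"].append(read)
    return dict(groups)


reads = ["CCTACGGGAGGCAGCAGAAAA", "GTGCCAGCAGCCGCGGTAATTTT", "ACGTACGT"]
print(split_by_primer(reads, FORWARD_PRIMERS))
# prints {'V3V4': ['AAAA'], 'V4V5': ['TTTT'], 'unassigned': ['ACGTACGT']}
```

Each group can then be denoised and classified on its own, which mirrors the per-primer-pair workflow described above.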
Thank you, Nicholas.
QIAGEN uses multiple primers for each target. For example, the following primers (modified for this post) were used for V1V2 amplification:
Which primer pair is the most appropriate to use for extracting reads from a reference database?
The primers are not actually targeting different sites. There are two things going on here, and it would be possible to collapse all of those V1 forward and V2 reverse primers into two primers (one forward, one reverse) if you account for the following:
For some reason, only some of the degenerate bases are listed (probably because they use a mixture of specific primers to constrain the degeneracy, instead of including every possible combination of bases at each variable site). You can still collapse these primers for your purposes, as long as you account for all of the degeneracy at each site.
They use phased primers, i.e., primers targeting the same site but of different lengths, to increase base diversity in the early sequencing cycles and thereby improve cluster quality.
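One way to confirm that a set of primers consists only of phased (and degenerate) variants of a single site is to line them up on their 3' ends and check that every position shares at least one possible base. This is a sketch that assumes the extra bases in the longer primers sit at the 5' end. The function name is hypothetical, and the primer sequences in the example are hypothetical variants built on the widely used 27F (V1) primer, because the QIAseq sequences were not included in the post.

```python
from typing import Dict

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def same_site(a: str, b: str) -> bool:
    """True if the shorter primer is compatible with the 3' end of the longer one.

    Two positions are compatible when their IUPAC codes share at least one base.
    """
    short, long_ = sorted((a.upper(), b.upper()), key=len)
    tail = long_[len(long_) - len(short):]  # 3'-aligned stretch of the longer primer
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(short, tail))


# Hypothetical V1 forward variants: one degenerate, one phased (+1 base at the 5' end).
print(same_site("AGAGTTTGATCMTGGCTCAG", "TAGAGTTTGATCCTGGCTCAG"))  # prints True
# A primer for a different site (341F) is not compatible.
print(same_site("AGAGTTTGATCMTGGCTCAG", "CCTACGGGNGGCWGCAG"))  # prints False
```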
For your purposes, you should:
pick the shortest primer
find all degenerate sites by comparing against the other primers targeting that region, and write them into the shortest primer as IUPAC ambiguity codes
That primer is what you should use for:
q2-cutadapt for trimming primers and adapters from your reads prior to denoising
extracting the same region from the reference database (if you choose to do so) before taxonomic classification, e.g., with qiime feature-classifier extract-reads
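The two steps above (pick the shortest primer, then fold in the degeneracy from the other primers) can be sketched as a small function. It trims every variant to the shortest length from its 3' end and replaces each column with the IUPAC code that covers all bases seen there. It assumes, in line with the phasing description, that the variants differ only by extra 5' bases. The function name is hypothetical, and the input primers are hypothetical 27F-style variants, since the actual QIAseq V1V2 primers were not posted.

```python
from typing import Dict, FrozenSet, List

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code (covers all 15 non-empty subsets of ACGT).
CODE_FOR: Dict[FrozenSet[str], str] = {frozenset(v): k for k, v in IUPAC.items()}


def collapse_primers(primers: List[str]) -> str:
    """Collapse primer variants for one site into a single degenerate primer."""
    n = min(len(p) for p in primers)  # length of the shortest primer
    trimmed = [p.upper()[len(p) - n:] for p in primers]  # align on 3' ends
    collapsed = []
    for column in zip(*trimmed):
        bases = set()
        for code in column:
            bases |= set(IUPAC[code])  # union of every base allowed at this position
        collapsed.append(CODE_FOR[frozenset(bases)])
    return "".join(collapsed)


variants = [
    "AGAGTTTGATCATGGCTCAG",
    "AGAGTTTGATCCTGGCTCAG",
    "TAGAGTTTGATCCTGGCTCAG",  # phased: one extra 5' base
]
print(collapse_primers(variants))  # prints AGAGTTTGATCMTGGCTCAG
```

Run it separately for the forward and the reverse primers of each target; the reverse primers are handled as written (5' to 3'), so their phasing bases are also trimmed from the 5' end.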