How to train a classifier for paired-end reads from QIAseq 16S/ITS Panel data using the QIIME 2 feature classifier

Hello,

I have a dataset from QIAseq 16S/ITS Panels. These panels cover several target regions, i.e., ITS, V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9. I want to perform species/genus-level classification.

https://www.qiagen.com/us/products/next-generation-sequencing/qiaseq-16s-its-index-kits/#productdetails

I use the following script for each specific target region/primer set. For example, for primers 341F 5’-CCTACGGGNGGCWGCAG-3’ and 805R 5’-GACTACHVGGGTATCTAATCC-3’:

qiime feature-classifier extract-reads \
  --i-sequences repseqs.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads seqs_extracted.qza

However, QIAseq data is based on multiple primers (i.e., for each sample, I have 7 FASTQ files: ITS, V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9).

How can I extract reads using multiple primer pairs? What is the best way to classify data such as the QIAseq 16S/ITS Panel, which uses multiple primers?

Thank you,

You could make one classifier for each primer pair. At the very least, you should make separate classifiers for ITS and 16S, but you could also use a full-length 16S classifier (e.g., one of the pre-trained classifiers on the QIIME 2 website) to classify all of the 16S domains simultaneously.
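As a rough sketch of the "one classifier per primer pair" option, the Python snippet below only builds and prints the QIIME 2 command lines for each region; it does not run them. The `PRIMER_PAIRS` table, the `classifier_commands` helper, and all file names (`ref-seqs.qza`, `ref-taxonomy.qza`, and the per-region outputs) are hypothetical placeholders. The single entry uses the 341F/805R pair quoted in the question, assumed here to be the V3V4 pair; the other regions would need their own primers filled in.

```python
import shlex

# Hypothetical table: region name -> (forward primer, reverse primer).
# Only the 341F/805R pair from the question is filled in; it is assumed
# here to correspond to the V3V4 region.
PRIMER_PAIRS = {
    "V3V4": ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),
}


def classifier_commands(region, f_primer, r_primer,
                        ref_seqs="ref-seqs.qza",
                        ref_tax="ref-taxonomy.qza"):
    """Return the two command lines needed to train one region-specific classifier.

    File names are placeholders; nothing is executed here.
    """
    reads = f"ref-seqs-{region}.qza"
    classifier = f"classifier-{region}.qza"
    # Step 1: trim the reference sequences to the amplicon region.
    extract = [
        "qiime", "feature-classifier", "extract-reads",
        "--i-sequences", ref_seqs,
        "--p-f-primer", f_primer,
        "--p-r-primer", r_primer,
        "--o-reads", reads,
    ]
    # Step 2: train a naive Bayes classifier on the trimmed reference reads.
    fit = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", reads,
        "--i-reference-taxonomy", ref_tax,
        "--o-classifier", classifier,
    ]
    return [extract, fit]


# Print the commands so they can be reviewed before being run in a shell.
for region, (fwd, rev) in PRIMER_PAIRS.items():
    for cmd in classifier_commands(region, fwd, rev):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and makes it easy to check each primer pair before training.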

Judging from the Qiagen website, it appears that they split the different primer pairs into separate datasets and classify these separately.

So I recommend you do the same. If primers are present in your reads, you can use q2-cutadapt to split up your sequences into groups prior to denoising. If not, denoise everything together and use qiime quality-control exclude-seqs to filter out individual amplicon sites by aligning against reference sequences trimmed to the same amplicon region.

I hope that helps!

Thank you, Nicholas.
QIAGEN is using multiple primers for each target. For example, the following primers (modified for this post) were used for V1V2 amplification.

Which primer pair is the most appropriate one to use for extracting reads from a reference database?

V1V2 Region V1_Forward_0 AGRGTTTGATYMTGGCTC
V1V2 Region V1_Forward_01 GAGRGTTTGAAYMTGGCTC
V1V2 Region V1_Forward_02 ctAGRGTTTGATYMTGGCAC
V1V2 Region V1_Forward_03 XXXXXXXXXXXXXXXXXXXXX
V1V2 Region V1_Forward_04 GCttAGRGTTTGATYMTGGCTC
V1V2 Region V1_Forward_05 TGCtcAGRGTATGATYMTGGCTC
V1V2 Region V1_Forward_06 XXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region V1_Forward_07 TATcCAcAGRGTTTGATYMTGGCTC
V1V2 Region V1_Forward_08 CTATGCAcAGRGTTTGATYMTGGCTC
V1V2 Region V1_Forward_09 GCTATGCAcAGRGTTTGATYMTGGCTC
V1V2 Region V1_Forward_10 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region V1_Forward_11 TAGCTAcGCAcAGRGTTTGATYMTGGCTC
V1V2 Region V2_Reverse_0 CTGCTGCCTYCCGTA
V1V2 Region V2_Reverse_01 TCTGCTGACTYCCGTA
V1V2 Region V2_Reverse_02 GgCTGCTGCCTYCCGTA
V1V2 Region V2_Reverse_03 XXXXXXXXXXXXXXXXXXX
V1V2 Region V2_Reverse_04 AaGTCTGCTGCCTYCCGTA
V1V2 Region V2_Reverse_05 TAaGTCAGCTGCCTYCCGTA
V1V2 Region V2_Reverse_06 XXXXXXXXXXXXXXXXXXXXX
V1V2 Region V2_Reverse_07 GCAAaaTCTGCTGACTYCCGTA
V1V2 Region V2_Reverse_08 XXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region V2_Reverse_09 TAGCaACATCTGCTGACTYCCGTA

Thank you,
Ashok

The primers are not actually targeting different sites. Two things are going on here, and you could collapse all of those V1 forward and V2 reverse primers into just two primers (one forward, one reverse) if you account for the following:

  1. For some reason, only some degenerate bases are listed (probably because they make a mixture of specific primers so that the degeneracy is constrained, instead of having all possible combinations of degenerate bases at each variable site). You can collapse these primers for your purposes if you account for all of the degeneracy at each site.
  2. They use phased primers, i.e., primers that target the same site but differ in length, to improve sequencing cluster quality.

For your purposes you should:

  1. pick the shortest primer
  2. find all degenerate sites by comparing against the other primers targeting that region

That primer is what you should use for:

  1. q2-cutadapt for trimming primers and adapters from your reads prior to denoising
  2. extracting reads from the reference database (if you choose to do so) for taxonomic classification
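The two steps described above (pick the shortest primer, then fold in the degeneracy seen in the other primers for that region) can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name `collapse_phased_primers` is made up for this example, and it assumes that the phasing bases are added at the 5' end so that all primers line up at their 3' ends. Masked entries (the `XXXX...` rows) are skipped, and because the listed primers were modified for the post, the result is illustrative only.

```python
# IUPAC nucleotide codes mapped to the set of bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def collapse_phased_primers(primers):
    """Collapse phased primers for one site into a single degenerate primer.

    Assumes the extra (phasing) bases sit at the 5' end, so the primers
    are compared after right-aligning them on their 3' ends.
    """
    primers = [p.upper() for p in primers]
    # Step 1: the shortest primer defines the core that all primers share.
    core_len = min(len(p) for p in primers)
    cores = [p[-core_len:] for p in primers]
    # Step 2: at each position, take the union of all bases seen and
    # re-encode that union as one IUPAC code.
    consensus = []
    for column in zip(*cores):
        bases = set()
        for code in column:
            bases.update(IUPAC[code])
        consensus.append(BASES_TO_CODE[frozenset(bases)])
    return "".join(consensus)


# Three of the V1 forward primers from the post (as modified by the poster).
v1_forward = [
    "AGRGTTTGATYMTGGCTC",
    "GAGRGTTTGAAYMTGGCTC",
    "ctAGRGTTTGATYMTGGCAC",
    "XXXXXXXXXXXXXXXXXXXXX",  # masked entry, skipped below
]
# Drop masked primers, i.e. any entry containing a non-IUPAC character.
usable = [p for p in v1_forward if set(p.upper()) <= set(IUPAC)]
print(collapse_phased_primers(usable))
```

The collapsed sequence can then be passed both to q2-cutadapt and to the read-extraction step, so the reads and the trimmed reference cover the same region.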

Good luck!
