How to train a classifier for paired-end reads from QIAseq 16S/ITS Panel data using QIIME2 feature classifier

adinasarapu · June 24, 2019, 12:28am

Hello,

I have a dataset from the QIAseq 16S/ITS Panels. This is based on a panel of variable regions, i.e., ITS, V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9. I want to perform species- or genus-level classification.

https://www.qiagen.com/us/products/next-generation-sequencing/qiaseq-16s-its-index-kits/#productdetails

I use the following command for each target region/primer set, for example for primers 341F 5'-CCTACGGGNGGCWGCAG-3' and 805R 5'-GACTACHVGGGTATCTAATCC-3':

qiime feature-classifier extract-reads \
  --i-sequences repseqs.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads seqs_extracted.qza

However, QIAseq (and ) data is based on multiple primers (i.e., for each sample, I have 7 FASTQ files: ITS, V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9).

How do I extract reads using multiple primer pairs? And what is the best way to classify data, like the QIAseq 16S/ITS Panel, that come from multiple primers?

Thank you,

Nicholas_Bokulich · June 24, 2019, 11:41am

You could make one classifier for each primer pair. At the very least, you should make separate classifiers for ITS and 16S, but you could also use a full-length 16S classifier (e.g., one of the pre-trained classifiers on the QIIME 2 website) to classify all of the 16S domains simultaneously.
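A minimal sketch of the one-classifier-per-primer-pair approach, written in Python as a dry run that only prints the QIIME 2 commands. Only the V3V4 pair (341F/805R) comes from this thread; the `PRIMERS` table, the helper `classifier_commands`, and all file names (`ref-seqs.qza`, `ref-taxonomy.qza`, and so on) are hypothetical placeholders. The ITS region would need its own reference sequences and taxonomy (e.g., UNITE rather than a 16S database), passed through the `ref_seqs`/`ref_tax` arguments.

```python
import shlex

# Hypothetical primer table: only the V3V4 pair (341F/805R) is from this thread.
# Add the other regions' collapsed forward/reverse primers yourself.
PRIMERS = {
    "V3V4": ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),
}


def classifier_commands(region, fwd, rev,
                        ref_seqs="ref-seqs.qza", ref_tax="ref-taxonomy.qza"):
    """Return the two QIIME 2 CLI calls that train one region-specific classifier."""
    extracted = f"ref-seqs-{region}.qza"
    # 1) Trim the reference sequences to the amplicon defined by this primer pair.
    extract = ["qiime", "feature-classifier", "extract-reads",
               "--i-sequences", ref_seqs,
               "--p-f-primer", fwd,
               "--p-r-primer", rev,
               "--o-reads", extracted]
    # 2) Train a naive Bayes classifier on the trimmed references.
    fit = ["qiime", "feature-classifier", "fit-classifier-naive-bayes",
           "--i-reference-reads", extracted,
           "--i-reference-taxonomy", ref_tax,
           "--o-classifier", f"classifier-{region}.qza"]
    return [extract, fit]


if __name__ == "__main__":
    for region, (fwd, rev) in PRIMERS.items():
        for cmd in classifier_commands(region, fwd, rev):
            print(shlex.join(cmd))  # dry run: print instead of executing
```

Building the argument lists first keeps the primer bookkeeping easy to check; once the printed commands look right, each list can be handed to `subprocess.run(cmd, check=True)`.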

Judging from the Qiagen website, it appears that they split the different primer pairs into separate datasets and classify these separately.

So I recommend you do the same. If primers are present in your reads, you can use q2-cutadapt to split up your sequences into groups prior to denoising. If not, denoise everything together and use qiime quality-control exclude-seqs to filter out individual amplicon sites by aligning against reference sequences trimmed to the same amplicon region.
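Both branches of that advice, sketched as Python helpers that build the QIIME 2 command lines for one region. The helper names and file names are hypothetical. `--p-discard-untrimmed` is assumed to be available in your q2-cutadapt release (check `qiime cutadapt trim-paired --help`). The cutadapt helper would be run once per region on the same demultiplexed artifact to get one artifact per amplicon; the exclude-seqs helper assumes reference sequences already trimmed to that region, e.g. with `extract-reads`.

```python
import shlex


def cutadapt_split_command(region, fwd, rev, demux="demux.qza"):
    """Keep only read pairs that carry this region's primers (hypothetical names)."""
    return ["qiime", "cutadapt", "trim-paired",
            "--i-demultiplexed-sequences", demux,
            "--p-front-f", fwd,        # forward primer, expected near the 5' end of R1
            "--p-front-r", rev,        # reverse primer as written, near the 5' end of R2
            "--p-discard-untrimmed",   # drop pairs in which the primers were not found
            "--o-trimmed-sequences", f"demux-{region}.qza"]


def exclude_seqs_command(region, rep_seqs="rep-seqs.qza"):
    """Primers absent: keep denoised sequences that hit region-trimmed references."""
    return ["qiime", "quality-control", "exclude-seqs",
            "--i-query-sequences", rep_seqs,
            "--i-reference-sequences", f"ref-seqs-{region}.qza",
            "--p-method", "vsearch",
            "--o-sequence-hits", f"rep-seqs-{region}.qza",
            "--o-sequence-misses", f"rep-seqs-not-{region}.qza"]


if __name__ == "__main__":
    # Dry run with the V3V4 primers quoted earlier in this thread.
    print(shlex.join(cutadapt_split_command(
        "V3V4", "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC")))
    print(shlex.join(exclude_seqs_command("V3V4")))
```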

I hope that helps!

adinasarapu · June 24, 2019, 3:04pm

Thank you, Nicholas.
QIAGEN uses multiple primers for each target. For example, the following primers (modified for this post) were used for V1V2 amplification.

Which primer pair is most appropriate to use for extracting reads from a reference database?

V1V2 Region	V1_Forward_0	AGRGTTTGATYMTGGCTC
V1V2 Region	V1_Forward_01	GAGRGTTTGAAYMTGGCTC
V1V2 Region	V1_Forward_02	ctAGRGTTTGATYMTGGCAC
V1V2 Region	V1_Forward_03	XXXXXXXXXXXXXXXXXXXXX
V1V2 Region	V1_Forward_04	GCttAGRGTTTGATYMTGGCTC
V1V2 Region	V1_Forward_05	TGCtcAGRGTATGATYMTGGCTC
V1V2 Region	V1_Forward_06	XXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region	V1_Forward_07	TATcCAcAGRGTTTGATYMTGGCTC
V1V2 Region	V1_Forward_08	CTATGCAcAGRGTTTGATYMTGGCTC
V1V2 Region	V1_Forward_09	GCTATGCAcAGRGTTTGATYMTGGCTC
V1V2 Region	V1_Forward_10	XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region	V1_Forward_11	TAGCTAcGCAcAGRGTTTGATYMTGGCTC
V1V2 Region	V2_Reverse_0	CTGCTGCCTYCCGTA
V1V2 Region	V2_Reverse_01	TCTGCTGACTYCCGTA
V1V2 Region	V2_Reverse_02	GgCTGCTGCCTYCCGTA
V1V2 Region	V2_Reverse_03	XXXXXXXXXXXXXXXXXXX
V1V2 Region	V2_Reverse_04	AaGTCTGCTGCCTYCCGTA
V1V2 Region	V2_Reverse_05	TAaGTCAGCTGCCTYCCGTA
V1V2 Region	V2_Reverse_06	XXXXXXXXXXXXXXXXXXXXX
V1V2 Region	V2_Reverse_07	GCAAaaTCTGCTGACTYCCGTA
V1V2 Region	V2_Reverse_08	XXXXXXXXXXXXXXXXXXXXXXXX
V1V2 Region	V2_Reverse_09	TAGCaACATCTGCTGACTYCCGTA

Thank you,
Ashok

Nicholas_Bokulich · June 24, 2019, 3:23pm

The primers are not actually targeting different sites. There are two things going on here, and it would be possible to collapse all of those V1 forward and V2 reverse primers into two primers (one forward, one reverse) if you account for the following:

1. For some reason, only some degenerate bases are listed (probably because they make a mixture of specific primers so that the degeneracy is constrained, instead of having all possible combinations of degenerate bases at each variable site). You can collapse these primers for your purposes if you account for all of the degeneracy at each site.
2. They use phased primers, i.e., primers targeting the same site but with different lengths, to improve sequencing cluster quality.
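To see what accounting for all of the degeneracy means in practice, it helps to count how many concrete sequences a degenerate primer stands for. A minimal Python sketch using the standard IUPAC nucleotide codes; the helper names `expand` and `degeneracy` are illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """List every concrete A/C/G/T sequence a degenerate primer encodes."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of concrete sequences, computed without enumerating them."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    core = "AGRGTTTGATYMTGGCTC"  # shortest V1 forward primer in the list above
    print(degeneracy(core))
    for seq in expand(core):
        print(seq)
```

The V1 forward core AGRGTTTGATYMTGGCTC has three two-way degenerate sites (R, Y, M), so it stands for 2 × 2 × 2 = 8 sequences; a synthesized mix of a few specific primers can cover fewer combinations than the degenerate notation implies.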

For your purposes you should:

1. Pick the shortest primer.
2. Find all degenerate sites by comparing it against the other primers targeting that region.
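The two steps above can be sketched in Python, assuming that the phasing bases are added at the 5' end so that all variants share the same 3' end: cut every variant to the length of the shortest one from the 3' side, then merge the bases at each position into the IUPAC code covering all of them. `merge_phased_primers` is a hypothetical helper, and the example input is the list posted above, which the poster says was modified, so the result is illustrative rather than the real QIAseq primer.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> the IUPAC code covering exactly that set.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def merge_phased_primers(primers):
    """Collapse phased, partially degenerate variants of one primer into one.

    Assumes phasing bases sit at the 5' end, so variants are aligned on
    their 3' end and cut to the length of the shortest variant.
    """
    # Case-insensitive; skip entries that are not IUPAC sequences (e.g. XXXX).
    seqs = [p.upper() for p in primers if p and set(p.upper()) <= set(IUPAC)]
    if not seqs:
        raise ValueError("no valid primer sequences given")
    core_len = min(len(s) for s in seqs)
    merged = []
    for i in range(core_len):
        bases = set()
        for s in seqs:
            # Same position, counted from the 3' end of every variant.
            bases |= set(IUPAC[s[len(s) - core_len + i]])
        merged.append(CODE_FOR[frozenset(bases)])
    return "".join(merged)


if __name__ == "__main__":
    v1_forward = [
        "AGRGTTTGATYMTGGCTC", "GAGRGTTTGAAYMTGGCTC", "ctAGRGTTTGATYMTGGCAC",
        "XXXXXXXXXXXXXXXXXXXXX", "GCttAGRGTTTGATYMTGGCTC",
        "TGCtcAGRGTATGATYMTGGCTC", "TATcCAcAGRGTTTGATYMTGGCTC",
        "CTATGCAcAGRGTTTGATYMTGGCTC", "GCTATGCAcAGRGTTTGATYMTGGCTC",
        "TAGCTAcGCAcAGRGTTTGATYMTGGCTC",
    ]
    print(merge_phased_primers(v1_forward))
```

With the V1V2 lists above this yields AGRGTWTGAWYMTGGCWC (forward) and CWGCTGMCTYCCGTA (reverse). The redacted XXXX entries are skipped because X is not an IUPAC nucleotide code.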

That primer is what you should use for:

- q2-cutadapt, for trimming primers and adapters from your reads prior to denoising
- extracting reads from the reference database (if you choose to do so) for taxonomic classification
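A simplified illustration of why a single collapsed primer works for 5' trimming even though the phased variants carry different spacer bases: cutadapt's regular 5' adapters (`--p-front-f`/`--p-front-r` in q2-cutadapt) remove the primer together with anything upstream of it, and IUPAC wildcards in the primer are honored by default. The Python sketch below mimics that with an exact regular-expression match; unlike cutadapt it allows no mismatches or indels, and the helper names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex: one character class per base."""
    return re.compile("".join(f"[{IUPAC[b]}]" for b in primer.upper()))


def trim_5prime(read, primer):
    """Drop the primer and everything upstream of it (e.g. a phasing spacer).

    Exact matching only; returns None when the primer is not found, i.e. the
    kind of read that --p-discard-untrimmed would remove.
    """
    match = primer_regex(primer).search(read.upper())
    return None if match is None else read[match.end():]


if __name__ == "__main__":
    # Hypothetical read: 4 spacer bases + one concrete primer variant + insert.
    read = "GCTT" + "AGAGTTTGATCATGGCTC" + "ACGTACGT"
    print(trim_5prime(read, "AGRGTWTGAWYMTGGCWC"))
```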

Good luck!

Nicholas_Bokulich · June 29, 2019, 2:59pm

A post was split to a new topic: ITS classification only kingdom level
