I am new to QIIME and am trying to train taxonomic classifiers for several datasets that were generated with different primers and target different amplified regions. The extract reference reads step requires the primer sequences and several length-related parameters. How should I proceed?
Hi @kundan.bi, welcome to the forum!
It’s not clear exactly what you’re asking. Have you looked over the feature classifier training tutorial? There’s a section on how to extract reference reads, and lots of other useful information.
If you still need help once you’ve gone through that, please try to make your question as clear and specific as possible, and we’ll do our best to help.
Hello @ChrisKeefe, thank you for the reply.
My question is about this step:
qiime feature-classifier extract-reads
I am using the Greengenes database and its taxonomy to classify my sequencing data, so my input for the step above is --i-sequences greengenes.qza.
My question is what values I should use for --p-f-primer, --p-r-primer, --p-trunc-len, --p-min-length, and --p-max-length.
My sequencing data were generated using different primers.
Thank you again for the reply.
It is essential to know the expected length distribution of any given amplicon, and that information can be retrieved in a number of ways, e.g., from the literature.
As @ChrisKeefe already mentioned, those parameter settings and how to choose them are described in the tutorials and help documentation for the plugin methods, so please see those tutorials for general information.
However, the specifics of your primers, amplicons, and experiments are best known to you, so beyond the general information in the tutorial we cannot offer many specifics. For example, we cannot do a literature search for you to find the expected amplicon length distribution for a given primer pair, in part because the effective length depends on your experimental setup.
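If you already have a FASTA file of reads for one amplicon, one rough way to see its length distribution is to tabulate the sequence lengths yourself. This is only a minimal sketch: the function name fasta_lengths and the toy sequences are made up for illustration, and it assumes the reads are available as a plain FASTA file.

```shell
#!/usr/bin/env bash
# Print the length of every sequence in a FASTA file, one per line.
# Handles sequences wrapped over several lines.
fasta_lengths() {
  awk '
    /^>/ { if (seq != "") print length(seq); seq = ""; next }
         { seq = seq $0 }
    END  { if (seq != "") print length(seq) }
  ' "$1"
}

# Toy input (made-up sequences) so the sketch runs on its own.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
>seq1
ACGT
ACGT
>seq2
ACGTAC
>seq3
ACGTACGT
EOF

# Count how many sequences have each length (count, then length).
fasta_lengths "$demo" | sort -n | uniq -c
rm -f "$demo"
```

The shortest and longest lengths that look like genuine amplicons give a starting point for --p-min-length and --p-max-length, to be checked against what the literature reports for your primer pair.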
As far as I understand, your problem is that you have different amplicons in your dataset, while qiime feature-classifier extract-reads asks for a single --p-f-primer, a single --p-r-primer, and so on.
This happens because extract-reads assumes you are analyzing a single amplicon (the region spanned by a single primer pair), so you may first need to split your dataset, before taxonomic assignment, into subsets that each contain a single amplicon of interest.
You can do this with q2-cutadapt by “demultiplexing” your reads using the primers as barcodes, probably right after sample demultiplexing. After this double demultiplexing you should have one “sample” for each sample-amplicon pair. Each amplicon then forms its own dataset that you run through your pipeline in parallel, including the qiime feature-classifier extract-reads step, where you can specify the parameters for that amplicon.
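As a rough sketch of the per-amplicon extract-reads step, the loop below prints one command per amplicon instead of running it. Everything specific here is an assumption: the amplicon names, the FWD_PRIMER_*/REV_PRIMER_* placeholders, the 100/400 length bounds, and the output file names are hypothetical and must be replaced with the values for your own primers.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one extract-reads command per amplicon.
# Primer sequences, length bounds, and output names are placeholders.

REF_SEQS="greengenes.qza"   # reference sequences artifact named in the thread

# Args: amplicon name, forward primer, reverse primer, min length, max length
extract_reads_cmd() {
  printf 'qiime feature-classifier extract-reads'
  printf ' --i-sequences %s' "$REF_SEQS"
  printf ' --p-f-primer %s --p-r-primer %s' "$2" "$3"
  printf ' --p-min-length %s --p-max-length %s' "$4" "$5"
  printf ' --o-reads ref-seqs-%s.qza\n' "$1"
}

# One line per amplicon: name, forward primer, reverse primer, min, max.
while read -r name fwd rev min_len max_len; do
  extract_reads_cmd "$name" "$fwd" "$rev" "$min_len" "$max_len"
done <<'EOF'
ampliconA FWD_PRIMER_A REV_PRIMER_A 100 400
ampliconB FWD_PRIMER_B REV_PRIMER_B 100 400
EOF
```

Printing the commands first makes it easy to review each primer pair before anything runs. Once they look right, the same lines can be executed directly, and --p-trunc-len can be added for an amplicon if its reads are truncated to a fixed length.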
Hope this helps!