q2-feature-classifier for PR2 dataset

airbender97 · April 15, 2024, 8:16am

Hello,

I am trying to train q2-feature-classifier on the PR2 dataset. I have run into some problems and have a few questions.

First, I downloaded the pr2_version_5.0.0_SSU_taxo_long.fasta.gz and pr2_version_5.0.0_taxonomy.xlsx files. Importing pr2_version_5.0.0_SSU_taxo_long.fasta.gz failed, but I solved that by simply running gunzip on the file.
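For readers following along, the decompress-then-import step can be sketched roughly as below. This is only an illustration: `decompress_keep` is a hypothetical helper name, `pr2-seqs.qza` is an assumed output name, and whether the import then succeeds still depends on the content of the FASTA file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress a .gz file next to the original, keeping the
# compressed copy (gzip -dc writes to stdout, so the .gz is left untouched).
decompress_keep() {
  gzip -dc "$1" > "${1%.gz}"
}

# Only runs where QIIME 2 and the downloaded file are both present.
# pr2-seqs.qza is an assumed output name, not one from the thread.
FASTA_GZ="pr2_version_5.0.0_SSU_taxo_long.fasta.gz"
if command -v qiime >/dev/null 2>&1 && [ -f "$FASTA_GZ" ]; then
  decompress_keep "$FASTA_GZ"
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "${FASTA_GZ%.gz}" \
    --output-path pr2-seqs.qza
fi
```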

I then hit another error when importing the taxonomy:

There was a problem importing /home/mpolat/ANALYSIS/Qiime2/feature-classifier/pr2_version_5.0.0_taxonomy.txt:

/home/mpolat/ANALYSIS/Qiime2/feature-classifier/pr2_version_5.0.0_taxonomy.txt is not a(n) TSVTaxonomyFormat file:

['Feature ID', 'Taxon'] must be the first two header values. The first two header values provided are: ['domain', 'supergroup'] (on line 1).

I worked around this by adding two header columns, Feature ID and Taxon, with empty rows. I am not sure whether this is the correct way to fix it.

I am following the instructions in the "Training feature classifiers with q2-feature-classifier" tutorial and adapting the inputs to my dataset.

Now I am trying to extract the reads by modifying the following command:

qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 120 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza

My primers are longer than 30 nt, so they probably contain non-biological sequence as well. The main problem is --p-trunc-len 120 --p-min-length 100 --p-max-length 400: I have no idea how to set these for my dataset. I am open to any kind of help and suggestions.

colinvwood · April 15, 2024, 4:21pm

Hello @airbender97,

First of all, I don't think that the taxonomy import went as planned. I took a look at the taxonomy you linked and it's not in the format that the TSVTaxonomyFormat expects. Take a look at the taxonomy linked in the moving pictures tutorial here (click on taxonomy.qzv). You'll have to convert the taxonomy into this format (don't worry about the confidence column).
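As a rough illustration of that conversion, here is a minimal sketch. It assumes a tab-separated table with a header row, the sequence ID in the first column, and one taxonomic rank per remaining column. The function name `ranks_to_taxonomy` is hypothetical, and the actual column layout of the PR2 files should be checked before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a tab-separated table (ID in column 1, one
# taxonomic rank per remaining column) into the two-column layout that
# TSVTaxonomyFormat expects: "Feature ID<TAB>Taxon".
ranks_to_taxonomy() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "Feature ID", "Taxon" }
    { sub(/\r$/, "") }               # tolerate Windows line endings
    NR == 1 { next }                 # skip the original header row
    $1 == "" { next }                # skip rows without an ID
    {
      taxon = ""
      for (i = 2; i <= NF; i++) {
        if ($i == "") continue       # ignore empty rank cells
        taxon = (taxon == "") ? $i : taxon ";" $i
      }
      print $1, taxon
    }
  '
}
```

It reads from standard input and writes to standard output, e.g. `ranks_to_taxonomy < ranks.tsv > taxonomy.tsv` (both file names are placeholders). The result could then be imported with `qiime tools import --type 'FeatureData[Taxonomy]'`.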

As far as feature-classifier extract-reads goes, did you read the help text for each of the parameters?
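As I read the help text, --p-trunc-len cuts each extracted read to a fixed length (0 disables truncation), while --p-min-length and --p-max-length discard amplicons outside that range (0 disables each filter). The sketch below shows one way to put that together, passing only the locus-specific part of each primer. Everything specific in it is an assumption: `strip_overhang` is a hypothetical helper, the overhang strings are made up, `pr2-seqs.qza` is an assumed file name, and the length bounds are simply carried over from the tutorial command and would need adjusting to the expected amplicon size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop a known non-biological prefix (adapter/overhang)
# from a primer so only the locus-specific part is passed to extract-reads.
# The primer is returned unchanged if the prefix is absent.
strip_overhang() {
  printf '%s\n' "${1#"$2"}"
}

# Placeholder values: the overhangs are made up, and the locus-specific parts
# are the tutorial primers quoted above, not primers chosen for PR2.
FWD="$(strip_overhang "ACGTACGTACGTGCCAGCMGCCGCGGTAA" "ACGTACGTAC")"
REV="$(strip_overhang "TGCATGCATGGGACTACHVGGGTWTCTAAT" "TGCATGCATG")"

# Only runs where QIIME 2 is installed and the assumed input file exists.
# --p-trunc-len 0 leaves the extracted reads untruncated.
if command -v qiime >/dev/null 2>&1 && [ -f pr2-seqs.qza ]; then
  qiime feature-classifier extract-reads \
    --i-sequences pr2-seqs.qza \
    --p-f-primer "$FWD" \
    --p-r-primer "$REV" \
    --p-trunc-len 0 \
    --p-min-length 100 \
    --p-max-length 400 \
    --o-reads ref-seqs.qza
fi
```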

airbender97 · April 16, 2024, 2:26pm

Thank you for your response @colinvwood.

I checked the PR2 database and tried several options to train the classifier. It turns out I had downloaded the wrong database and taxonomy files at the beginning.
