Hello,
I am trying to train a classifier with q2-feature-classifier on the PR2 dataset. I have run into some problems and have a few questions.
First, I downloaded the pr2_version_5.0.0_SSU_taxo_long.fasta.gz and pr2_version_5.0.0_taxonomy.xlsx files. Importing pr2_version_5.0.0_SSU_taxo_long.fasta.gz failed at first, but I solved that by simply running gunzip on the file.
I then got another error:
There was a problem importing /home/mpolat/ANALYSIS/Qiime2/feature-classifier/pr2_version_5.0.0_taxonomy.txt:
/home/mpolat/ANALYSIS/Qiime2/feature-classifier/pr2_version_5.0.0_taxonomy.txt is not a(n) TSVTaxonomyFormat file:
['Feature ID', 'Taxon'] must be the first two header values. The first two header values provided are: ['domain', 'supergroup'] (on line 1).
I worked around this by adding two header columns, Feature ID and Taxon, with empty rows. I am not sure whether this is the correct way to fix it...
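For illustration, the two-column table that the error message asks for could be built from the FASTA headers instead of being patched by hand. This is only a minimal sketch: it assumes that each header in the taxo_long FASTA has the form >ID|rank1|rank2|..., which should be checked against the real file first. The file names and the two sample records below are made up.

```shell
# Hypothetical two-record sample in the ASSUMED header layout: >ID|rank1|rank2|...
cat > sample_taxo_long.fasta <<'EOF'
>seq1|Eukaryota|TSAR|Alveolata
ACGTACGT
>seq2|Eukaryota|Obazoa|Opisthokonta
TTGACCAA
EOF

# Build the taxonomy table: "Feature ID" <TAB> "Taxon", ranks joined with ";".
awk -F'|' 'BEGIN { OFS = "\t"; print "Feature ID", "Taxon" }
/^>/ {
    id = substr($1, 2)                              # drop the leading ">"
    taxon = $2
    for (i = 3; i <= NF; i++) taxon = taxon ";" $i  # append remaining ranks
    print id, taxon
}' sample_taxo_long.fasta > sample_taxonomy.tsv

# Write a FASTA whose headers carry only the ID, so the IDs match the table.
awk -F'|' '/^>/ { print $1; next } { print }' sample_taxo_long.fasta > sample_seqs.fasta

cat sample_taxonomy.tsv
```

With this layout every sequence ID has a non-empty Taxon value, and the same IDs appear in both files, which is what the classifier training step needs in order to pair sequences with their taxonomy.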
I am following the instructions in the "Training feature classifiers with q2-feature-classifier" tutorial and adapting the inputs to my dataset.
Now I am trying to extract the reads by modifying the following command:
qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 120 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza
My primers are longer than 30 nt, so they probably contain non-biological sequences as well. The main problem is with --p-trunc-len 120, --p-min-length 100 and --p-max-length 400: I have no idea how to change them for my dataset. I am open to any kind of help and suggestions.
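One possible starting point for choosing the length parameters is to look at how long the sequences for the target region actually are. The sketch below summarises the sequence lengths in a FASTA file. The file names and the three sample records are hypothetical, and the summary is only a rough guide, not a rule for setting --p-min-length, --p-max-length or --p-trunc-len.

```shell
# Hypothetical sample; replace with a FASTA of your own amplicons or reads.
cat > sample_amplicons.fasta <<'EOF'
>a
ACGTACGTAC
>b
ACGT
ACGTAC
GG
>c
ACGTAC
EOF

# One length per record (multi-line sequences are summed), sorted ascending.
awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
     { len += length($0) }
     END { if (seen) print len }' sample_amplicons.fasta | sort -n > sample_lengths.txt

# Count, minimum, median (lower middle value for even counts) and maximum.
awk '{ v[NR] = $1 }
     END { if (NR) printf "n=%d min=%d median=%d max=%d\n", NR, v[1], v[int((NR + 1) / 2)], v[NR] }' \
    sample_lengths.txt | tee sample_summary.txt
```

If the minimum and maximum of the expected amplicons sit well inside the 100 to 400 range from the tutorial command, those bounds may already be reasonable. If not, they would discard real amplicons and probably need adjusting.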