Regarding issue in reference dataset

Hi everyone!
I am trying to train a classifier in QIIME 2, but I am having trouble importing the reference sequences in 97_otus.fasta.

Input:
$ qiime tools import --type 'FeatureData[Sequence]' --input-path 97_otus.fasta --output-path 97_otus.qza

Error message:

There was a problem importing 97_otus.fasta:

97_otus.fasta is not a(n) DNAFASTAFormat file:

Invalid characters on line 2 (does not match IUPAC characters for a DNA sequence).

(qiime2-2019.7)

Kindly suggest how I can fix this error.
I have also tried other reference sequences, i.e. 99_otus.fasta and 85_otus.fasta, and both imported fine. However, one of the subsequent commands (shown below, using 99_otus.qza) gave the following error:

Input:
$ qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG \
  --p-r-primer GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG \
  --p-trunc-len 530 \
  --p-min-length 390 \
  --p-max-length 600 \
  --o-reads ref-seqs.qza

Output:
Plugin error from feature-classifier:
No matches found
Debug info has been saved to /tmp/qiime2-q2cli-err-l_x18rdb.log
(qiime2-2019.7)

Kindly suggest how I can fix these errors. Thank you very much in advance.

Hello Ashish,

Thanks for posting your full commands and errors. Let’s start with the first error:

97_otus.fasta is not a(n) DNAFASTAFormat file:

I wonder if the file is corrupted in some way, especially as the other files work fine. Type this to show the first few lines of the file and post them here.

head 97_otus.fasta
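
If the first few lines look fine by eye, a quick scan of the whole file can show which characters the importer is objecting to. This is only a rough sketch: `non_iupac_chars` is a made-up helper name, and it assumes the importer accepts just the upper-case IUPAC DNA codes ACGTRYSWKMBDHVN. One possible culprit it would reveal is gap characters such as - or . from an aligned reference file.

```shell
# Hypothetical helper: print the distinct characters on sequence lines
# that fall outside the IUPAC DNA alphabet.
# ASSUMPTION: valid characters are the upper-case codes ACGTRYSWKMBDHVN;
# adjust the set if your importer is more permissive.
non_iupac_chars() {
  # Drop header lines, delete every valid character and line ending,
  # then reduce what is left to one copy of each offending character.
  sed '/^>/d' "$1" \
    | tr -d 'ACGTRYSWKMBDHVN\n\r' \
    | fold -w 1 \
    | LC_ALL=C sort -u \
    | tr -d '\n'
  echo
}

# Usage (file name taken from the post above):
#   non_iupac_chars 97_otus.fasta
```

An empty line of output means every sequence character is in the assumed alphabet, so the problem would lie elsewhere (for example in the header lines).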

Now on to the second error!

No matches found

This one is easy! Based on the settings you provided, those primers didn’t match any reads in 99_otus.qza. :cry:

Which region are your primers trying to amplify? Maybe we could get a recommendation from another user?

Colin


If I’m not mistaken, these sequences

TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

are not biological primers but rather the adapter sequences, so naturally there are no hits in the reference database. You should replace them with your actual locus-specific primer sequences.
It is also worth noting that your --p-trunc-len 530 setting may be too high for most paired-end Illumina primer sets. It may lead to most, if not all, of your matching sequences being discarded, and you would then get another error hinting at an empty file.
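
If your ordered oligos are adapter plus locus-specific primer in one string, the adapter part can be peeled off before passing the rest to --p-f-primer and --p-r-primer. The sketch below is illustrative only: `strip_adapter` is a made-up helper, and the locus-specific parts are the V3-V4 pair commonly listed in Illumina's 16S protocol, used here as an assumption because the actual primers were not posted in this thread.

```shell
# Adapter overhangs, copied from the extract-reads command earlier in this thread.
FWD_ADAPTER="TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_ADAPTER="GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# ASSUMPTION: example full-length oligos (adapter + locus-specific primer).
# The locus-specific parts are a commonly used V3-V4 pair, shown for
# illustration only; substitute the oligos you actually ordered.
FWD_OLIGO="${FWD_ADAPTER}CCTACGGGNGGCWGCAG"
REV_OLIGO="${REV_ADAPTER}GACTACHVGGGTATCTAATCC"

# Hypothetical helper: remove adapter $2 from the 5' end of oligo $1.
# If the oligo does not start with the adapter, it is printed unchanged.
strip_adapter() {
  printf '%s\n' "${1#"$2"}"
}

F_PRIMER=$(strip_adapter "$FWD_OLIGO" "$FWD_ADAPTER")
R_PRIMER=$(strip_adapter "$REV_OLIGO" "$REV_ADAPTER")

# Print the values in the form expected by extract-reads.
printf '%s\n' "--p-f-primer $F_PRIMER"
printf '%s\n' "--p-r-primer $R_PRIMER"
```

Only the part that anneals to the 16S gene can match the reference sequences, which is why the adapter-only strings return "No matches found".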


Great catch @Mehrbod_Estaki!

Ashish, if you want to remove adapters, try this:
https://docs.qiime2.org/2019.7/plugins/available/cutadapt/trim-paired/

Colin


Thank you, Sir, for the suggestion. I was mistakenly using the aligned version of 97_otus.fasta, which led to this error. The next command also worked once I removed the adaptor sequences and kept only the locus-specific primers for V3 and V4 amplification.
Thanks! :slightly_smiling_face:


Yes, Sir, the sequences I supplied were indeed a combination of both adaptors and locus-specific primers. I have now removed the adaptor sequences and used only the locus-specific primers for the analysis. I have changed --p-trunc-len to 460, with --p-min-length and --p-max-length set to 390 and 550. I chose these values based on the guidance given in the tutorial for these parameters. Kindly correct me if anything looks off in these settings.
Thanks for the help. :slightly_smiling_face:
