qiime tools import error

Hi, I was trying to import my reference database (PR2) and encountered the following error:

my command is:
qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.12.0_16S_dada2.fasta --output-path pr2_16S.qza

And this is the error:

There was a problem importing pr2_version_4.12.0_16S_dada2.fasta:

pr2_version_4.12.0_16S_dada2.fasta is not a(n) DNAFASTAFormat file:

ID on line 5 is a duplicate of another ID on line 3.

Is it possible that I am using the wrong import type?

Thanks a lot in advance.

Hello @Clarissa. That error message is saying that the sequence ID on line 3 of your file is duplicated on line 5, which is not allowed. Have you checked the file to verify whether this is actually the case?
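
If it helps with checking, here is a minimal sketch (not part of QIIME 2) that lists every repeated ID in a FASTA file together with the line numbers where it occurs, so you can see all duplicates at once instead of one pair per import attempt. The function name `find_duplicate_ids` is made up for illustration, and treating the first whitespace-delimited token after `>` as the ID is an assumption about how the IDs are compared.

```python
from collections import defaultdict


def find_duplicate_ids(fasta_lines):
    """Map each repeated FASTA ID to the 1-based line numbers where it appears."""
    seen = defaultdict(list)
    for line_number, line in enumerate(fasta_lines, start=1):
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            seen[seq_id].append(line_number)
    # Keep only the IDs that occur on more than one header line.
    return {seq_id: nums for seq_id, nums in seen.items() if len(nums) > 1}


# Small in-memory example; for a real file, pass an open file handle instead.
example = [">Eukaryota;A;", "ACGT", ">Eukaryota;A;", "GGCC", ">Eukaryota;B;", "TTAA"]
print(find_duplicate_ids(example))  # {'Eukaryota;A;': [1, 3]}
```

For a file on disk, something like `with open("pr2_version_4.12.0_16S_dada2.fasta") as handle: find_duplicate_ids(handle)` should work, since the function only iterates over lines.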

Hello @Oddant1, thank you for your quick response. I should have specified this in my first post: I already tried deleting the duplicated lines, but as soon as those were deleted the error showed a new pair of duplicated lines.

  1. attempt: There was a problem importing test_pr2_version_4.12.0_16S_dada2.fasta:

test_pr2_version_4.12.0_16S_dada2.fasta is not a(n) DNAFASTAFormat file:

ID on line 18 is a duplicate of another ID on line 16.

  2. attempt: There was a problem importing test_pr2_version_4.12.0_16S_dada2.fasta:

test_pr2_version_4.12.0_16S_dada2.fasta is not a(n) DNAFASTAFormat file:

ID on line 31 is a duplicate of another ID on line 29.

Since it's a database, I am not sure if I can just delete the duplicated lines.

Hello @Clarissa. I looked at the file you're trying to import (another admin sent me a link to the database), and it looks like the sequence IDs here are taxonomy strings. So I suppose the duplicated IDs belong to sequences that share the same taxonomy. It does seem inadvisable to remove any sequences from the file in this case.

What are you intending to use this database for downstream? That could help inform how to proceed with the import of this file.

Hello @Oddant1,
I am analyzing phytoplankton samples using QIIME 2. My goal is taxonomic classification.
I want to use different databases and compare the resulting taxa afterwards. Therefore I am trying to import this database so that I can extract reference reads, train the classifier, apply it to my samples, and obtain my taxonomy list.

Is this useful? Thanks a lot!

Hi @Clarissa! Are you using the PR2 files from here:

I think they will require some intermediate processing before importing into QIIME 2, namely the taxonomy and ref seqs will need to be split into two separate files. Right now the files look like this:

>Eukaryota;Alveolata;Apicomplexa;Coccidiomorphea;Eimeriida;Eimeriidae;Eimeria;Eimeria_sp.;
GAGAGTTTGATCCTGGCTCAGGATGAACGCACAAGACGTGCCTAACACATGCAAATCGAATGAAAATAATTAAATTGTTTTCATGGTGAACGGGTGAGTAATACATGAGAATCTACTTTTAGATAAGGCAAAATAAAAGTAATATTTTGGTAATTCCTT
>Eukaryota;Alveolata;Apicomplexa;Coccidiomorphea;Eimeriida;Eimeriidae;Eimeria;Eimeria_sp.;
TGCAAATCGAATGGGAGACAGCGAGCTCTCATGGTGAACGGGTGAGTATAACATGAGAATCAACCATTAGACAAGGTATAACAAGAGGAAACTTTTAGTAATCCCTTATATGCGCAGTAGTGCTGAGAAAGGTTAATTTAATTAGATAAGGACCGTCTA
>Eukaryota;Alveolata;Apicomplexa;Colpodellidea;Vitrelladida;Vitrellaceae;Vitrella;Vitrella_brassicaformis;
AGGGTTTGATCCTGGCTCAGGATGAACGCGAGTCGGCGTGCCTAACACATGCAAGTCGTATGGGGCTTCGGCCTCATGGCGTACGGGTGAGTAACACGTGGGCACCTGCCCCCAGATGGGGTATAAGGGAGGGAAACCTCCGGTAAACCCCCATGGGCG
...

But they will need to be formatted as two separate files:

sequences:

>feature1
GAGAGTTTGATCCTGGCTCAGGATGAACGCACAAGACGTGCCTAACACATGCAAATCGAATGAAAATAATTAAATTGTTTTCATGGTGAACGGGTGAGTAATACATGAGAATCTACTTTTAGATAAGGCAAAATAAAAGTAATATTTTGGTAATTCCTT
>feature2
TGCAAATCGAATGGGAGACAGCGAGCTCTCATGGTGAACGGGTGAGTATAACATGAGAATCAACCATTAGACAAGGTATAACAAGAGGAAACTTTTAGTAATCCCTTATATGCGCAGTAGTGCTGAGAAAGGTTAATTTAATTAGATAAGGACCGTCTA
>feature3
AGGGTTTGATCCTGGCTCAGGATGAACGCGAGTCGGCGTGCCTAACACATGCAAGTCGTATGGGGCTTCGGCCTCATGGCGTACGGGTGAGTAACACGTGGGCACCTGCCCCCAGATGGGGTATAAGGGAGGGAAACCTCCGGTAAACCCCCATGGGCG
...

and taxonomy:

id           taxon
feature1     Eukaryota;Alveolata;Apicomplexa;Coccidiomorphea;Eimeriida;Eimeriidae;Eimeria;Eimeria_sp.;
feature2     Eukaryota;Alveolata;Apicomplexa;Coccidiomorphea;Eimeriida;Eimeriidae;Eimeria;Eimeria_sp.;
feature3     Eukaryota;Alveolata;Apicomplexa;Colpodellidea;Vitrelladida;Vitrellaceae;Vitrella;Vitrella_brassicaformis;
...

You might be able to write up a script to assist with that (if you're comfortable with SQL, the PR2 folks provide a well-structured SQLite database; that's probably how I would tackle this problem).
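
As a rough starting point for such a script, here is a minimal sketch that works directly on the FASTA file rather than the SQLite database. It assigns a new unique ID to every record and moves the original header into a separate taxonomy table. The names `split_fasta` and `write_outputs` and the `feature` prefix are made up for illustration. The taxonomy file is written tab-separated and without a header row, which is an assumption; whether a header is needed depends on the import format you choose.

```python
def split_fasta(fasta_lines, prefix="feature"):
    """Split a FASTA whose headers are taxonomy strings.

    Returns (records, taxonomy), where records is a list of (new_id, sequence)
    and taxonomy is a list of (new_id, original_header).
    """
    records = []
    taxonomy = []
    seq_chunks = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if taxonomy:
                records.append((taxonomy[-1][0], "".join(seq_chunks)))
            seq_chunks = []
            new_id = f"{prefix}{len(taxonomy) + 1}"
            taxonomy.append((new_id, line[1:]))
        elif taxonomy:
            # Sequence line (sequences may span several lines).
            seq_chunks.append(line)
    if taxonomy:
        records.append((taxonomy[-1][0], "".join(seq_chunks)))
    return records, taxonomy


def write_outputs(records, taxonomy, fasta_path, taxonomy_path):
    """Write the sequences as FASTA and the taxonomy as a headerless TSV."""
    with open(fasta_path, "w") as fasta_out:
        for seq_id, seq in records:
            fasta_out.write(f">{seq_id}\n{seq}\n")
    with open(taxonomy_path, "w") as tax_out:
        for seq_id, taxon in taxonomy:
            tax_out.write(f"{seq_id}\t{taxon}\n")


# Hypothetical usage (output file names are placeholders):
# with open("pr2_version_4.12.0_16S_dada2.fasta") as handle:
#     records, taxonomy = split_fasta(handle)
# write_outputs(records, taxonomy, "pr2_seqs.fasta", "pr2_taxonomy.tsv")
```

Numbering the records in file order keeps the two outputs trivially in sync, at the cost of IDs that carry no meaning on their own.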

Keep us posted!

Hi,
I split the file into two separate files and it worked! Thank you so much for your help!
