Error importing SILVA_138_SSURef_tax_silva.fasta.gz database into QIIME 2

Hi,

I am new to QIIME 2 and am failing to import the SILVA_138_SSURef_tax_silva.fasta.gz database.
I downloaded the database into my working directory with
wget https://www.arb-silva.de/fileadmin/silva_databases/release_138/Exports/SILVA_138_SSURef_tax_silva.fasta.gz
and decompressed it using gunzip SILVA_138_SSURef_tax_silva.fasta.gz

The script I am using is:

qiime tools import --input-path ./SILVA_138_SSURef_tax_silva.fasta --type 'FeatureData[Sequence]' --output-path ./q2files/SSUreferenceSeqs.qza
echo; echo "ASV table, ASV sequences, and reference sequences have been converted to .qza"

I get this error:
SILVA_138_SSURef_tax_silva.fasta.gz is not a(n) DNAFASTAFormat file:
First line of file is not a valid description. Descriptions must start with '>'
Invalid character 'U' at position 2 on line 2 (does not match IUPAC characters for this sequence type
This is what the head of the file looks like:

AB000106.1.1343 Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae;Sphingobium;Sphingomonas sp.
GGAAUCUGCCCUUGGGUUCGGAAUAACGUCUGGAAACGGACGCUAAUACCGGAUGAUGACGUAAGUCCAAAGAUUUAUCG
CCCAGGGAUGAGCCCGCGUAGGAUUAGCUAGUUGGUGAGGUAAAGGCUCACCAAGGCGACGAUCCUUAGCUGGUCUGAGA

I tried following the tutorials on similar topics, but I still was not able to resolve the error.
Thank you for your help.

Hi @Nde,

Welcome to the Qiime2 forum :qiime2:.

The error above is telling you that the files you have downloaded and are trying to import are not in the correct format.
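To see the two problems behind that error for yourself, a quick check like the one below may help. It is a minimal sketch assuming bash with the standard gzip, coreutils and grep tools; check_fasta is a hypothetical helper, not part of QIIME 2. It flags a gzipped file (its first bytes are binary, so the first line cannot start with '>') and sequence lines containing U, since the SILVA export is RNA and QIIME 2's DNAFASTAFormat only accepts DNA characters.

```shell
#!/usr/bin/env bash
# check_fasta FILE  (hypothetical helper, not a full FASTA validator)
# Prints each problem it finds and returns 1, or prints an OK message and
# returns 0. A gzipped file is reported on its own.
check_fasta() {
  local f="$1" status=0
  # gzip files begin with the magic bytes 1f 8b; the importer needs plain text.
  if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
    echo "gzipped: decompress with gunzip before importing"
    return 1
  fi
  # A FASTA file must open with a description line starting with '>'.
  if [ "$(head -c 1 "$f")" != ">" ]; then
    echo "first line does not start with '>'"
    status=1
  fi
  # SILVA exports are RNA; 'U' is not a valid character in DNA sequences.
  if grep -v '^>' "$f" | grep -q '[Uu]'; then
    echo "RNA sequence found (contains U): convert U to T first"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "looks like importable DNA FASTA"
  fi
  return "$status"
}
```

Running check_fasta on the .gz file reports the compression; on the decompressed SILVA_138_SSURef_tax_silva.fasta it would be expected to report the U problem instead.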

Don't worry, though: the very helpful people at QIIME 2 supply correctly formatted versions, as do the people behind the SILVA database.

The QIIME 2 Data resources page has plenty of information about these files and supplies both the sequence and taxonomy files as .qza files. Just scroll down to the Silva (16S/18S rRNA) section to find them.

Then you can move on to the extract-reads step or the training step, as desired (information on how to perform those steps is here).
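For reference, those two steps correspond to qiime feature-classifier extract-reads and qiime feature-classifier fit-classifier-naive-bayes. The sketch below assumes the SILVA sequence and taxonomy .qza files from the Data resources page are in the working directory and that the amplicon is V4 with the 515F/806R primers (GTGYCAGCMGCCGCGGTAA / GGACTACNVGGGTWTCTAAT); swap in your own primers. The wrapper function and output names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed V4 primers (515F/806R); replace with the primers used for your amplicons.
F_PRIMER="GTGYCAGCMGCCGCGGTAA"
R_PRIMER="GGACTACNVGGGTWTCTAAT"

# train_region_classifier SEQS TAX PREFIX  (hypothetical wrapper)
#   SEQS    FeatureData[Sequence] reference, e.g. silva-138-99-seqs.qza
#   TAX     FeatureData[Taxonomy] reference for the same sequences
#   PREFIX  prefix for the output artifacts
train_region_classifier() {
  local seqs="$1" tax="$2" prefix="$3"
  # 1) Trim the full-length references to the region between the primers.
  qiime feature-classifier extract-reads \
    --i-sequences "$seqs" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads "${prefix}-seqs.qza" || return 1
  # 2) Train a naive Bayes classifier on the trimmed reads and their taxonomy.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "${prefix}-seqs.qza" \
    --i-reference-taxonomy "$tax" \
    --o-classifier "${prefix}-nb-classifier.qza"
}
```

Training on full-length SILVA can take a lot of memory, so on a cluster this is usually best submitted as a batch job rather than run on a login node.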

Hope that helps, happy :qiime2:ing.

Vic :+1:

Hi @Nde,

I just wanted to add to @buzic's great comments. Fortunately, you do not need to do any of this yourself...

You can also install the RESCRIPt plugin to download and curate the SILVA database to suit your needs, as outlined in this tutorial.
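As a rough outline of the RESCRIPt route, assuming RESCRIPt is installed in the active QIIME 2 environment: get-silva-data downloads a SILVA release as RNA sequences plus a matching taxonomy, and reverse-transcribe converts U to T so the sequences become ordinary DNA. Release 138.1, target SSURef_NR99, and the output names are example choices; the tutorial covers the further curation steps (filtering, dereplication) that are not shown here, and the wrapper function is hypothetical.

```shell
#!/usr/bin/env bash
# Example SILVA release and target; see the RESCRIPt tutorial for other options.
SILVA_VERSION="138.1"
SILVA_TARGET="SSURef_NR99"

# fetch_silva_with_rescript  (hypothetical wrapper; needs the RESCRIPt plugin)
fetch_silva_with_rescript() {
  local base="silva-${SILVA_VERSION}-${SILVA_TARGET}"
  # Download sequences (as RNA, containing U) and taxonomy as .qza artifacts.
  qiime rescript get-silva-data \
    --p-version "$SILVA_VERSION" \
    --p-target "$SILVA_TARGET" \
    --o-silva-sequences "${base}-rna-seqs.qza" \
    --o-silva-taxonomy "${base}-tax.qza" || return 1
  # Reverse-transcribe RNA to DNA (U -> T) for DNA-based QIIME 2 methods.
  qiime rescript reverse-transcribe \
    --i-rna-sequences "${base}-rna-seqs.qza" \
    --o-dna-sequences "${base}-seqs.qza"
}
```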

Note: if you are running the latest version of QIIME 2 (qiime2-amplicon-2023.9), you can follow these install instructions.

Hi,

Thank you for the response.
I am sorry to ask, but how do I download the database to the cluster?
Also, do I still need to use RESCRIPt to process the database?

I'd work with your cluster admins for help with such things.

You can copy the links to the SILVA and/or Greengenes2 classifier files from the Data resources page that was linked earlier. Then, while logged into your compute cluster, run the wget command as you did before.
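A minimal sketch of that on a cluster, assuming the 2023.9 release paths mentioned later in this thread (set QIIME_RELEASE to the release you actually have installed) and a hypothetical target directory $HOME/qiime2-db. After each download, qiime tools peek prints the artifact's UUID and type as a quick sanity check. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed QIIME 2 release; keep it the same as your installed QIIME 2.
QIIME_RELEASE="2023.9"
# Hypothetical directory for reference databases on the cluster.
DB_DIR="${DB_DIR:-$HOME/qiime2-db}"

# Build a data.qiime2.org URL for a file in the release's 'common' folder.
qiime_data_url() {
  printf 'https://data.qiime2.org/%s/common/%s\n' "$QIIME_RELEASE" "$1"
}

# fetch_reference NAME...  (hypothetical helper)
fetch_reference() {
  mkdir -p "$DB_DIR" || return 1
  local name
  for name in "$@"; do
    # Download only if the file is not already there.
    if [ ! -e "$DB_DIR/$name" ]; then
      wget -P "$DB_DIR" "$(qiime_data_url "$name")" || return 1
    fi
    # Print UUID, type and format to confirm it is a valid artifact.
    qiime tools peek "$DB_DIR/$name" || return 1
  done
}
```

For example, fetch_reference silva-138-99-seqs.qza silva-138-99-tax.qza, where the taxonomy file name is assumed to follow the same pattern as the sequence file on the Data resources page.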

As for RESCRIPt: you only need it if you want to curate the database yourself by working through the linked tutorial.

Sorry, I have still not been able to get it right.

There was a problem importing ./SILVA_138_SSURef_tax_silva.fasta.gz:

SILVA_138_SSURef_tax_silva.fasta.gz is not a(n) DNAFASTAFormat file:

First line of file is not a valid description. Descriptions must start with '>'

I downloaded the file from the SILVA website as suggested:

wget https://www.arb-silva.de/fileadmin/silva_databases/release_138.1/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz
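For anyone curious what the importer needs from this raw export: it has to be decompressed, the RNA U has to become DNA T, and the taxonomy that SILVA keeps in each header line has to move into a separate file. The sketch below does only that, with awk; the function and file names are hypothetical, and it performs none of the filtering or dereplication that RESCRIPt or the pre-formatted files provide, so those remain the better route.

```shell
#!/usr/bin/env bash
# silva_to_qiime_inputs IN.fasta.gz OUT.fasta OUT_taxonomy.tsv  (hypothetical)
silva_to_qiime_inputs() {
  local in="$1" out_fa="$2" out_tax="$3"
  gzip -dc "$in" | awk -v fa="$out_fa" -v tax="$out_tax" '
    /^>/ {
      # Header ">ID Taxon;Path;...": keep only the ID in the FASTA and
      # write "ID<TAB>Taxon;Path;..." to the taxonomy file.
      id = substr($1, 2)
      taxon = $0
      sub(/^>[^ ]+ ?/, "", taxon)
      print ">" id > fa
      print id "\t" taxon > tax
      next
    }
    {
      # Sequence line: reverse-transcribe RNA to DNA.
      gsub(/U/, "T"); gsub(/u/, "t")
      print > fa
    }'
}
```

The outputs could then be imported with qiime tools import --type 'FeatureData[Sequence]' for the FASTA and qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat for the taxonomy file.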

Thank you for your help

Actually, I was suggesting that you download the pre-formatted classifiers from the QIIME 2 Data resources page that @buzic linked to. That is, the qza files.

Again, if you truly want to import and format the SILVA database yourself, then you'll need to install RESCRIPt plugin and follow the tutorial.

Hello,

Thank you for your continued assistance, I really appreciate it.
I am trying to download this file

Using wget https://data.qiime2.org/2022.2/command/silva-138-99-tax-515-806.qza

But I get this error message

--2023-11-20 11:31:59-- https://data.qiime2.org/2022.2/command/silva-138-99-tax-515-806.qza
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2023-11-20 11:31:59 ERROR 404: NOT FOUND.

My guess is that I am not supplying the correct link. I have tried modifying the link in several ways, but I still couldn't get through.

Could you please show me the right way to download the file using wget?

Hi @Nde

I think you have just mistyped the web address: the folder in the URL should be "common", not "command".

try:

wget https://data.qiime2.org/2022.2/common/silva-138-99-seqs.qza
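Before starting a large download, wget --spider can confirm that a URL exists without fetching the file, which catches a typo like this one up front. A minimal sketch; url_exists and check_urls are hypothetical helpers.

```shell
#!/usr/bin/env bash
# url_exists URL  (hypothetical): succeeds if the server answers without an
# error such as 404; --spider checks the URL without downloading the file.
url_exists() {
  wget --spider -q "$1"
}

# check_urls URL...  (hypothetical): report which candidate URLs resolve.
check_urls() {
  local url
  for url in "$@"; do
    if url_exists "$url"; then
      echo "OK      $url"
    else
      echo "MISSING $url"
    fi
  done
}
```

For example, check_urls https://data.qiime2.org/2022.2/command/silva-138-99-tax-515-806.qza would be expected to print MISSING, matching the 404 above.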

best,

Vic

Thank you so much,

Finally, the download was successful.

Hi @Nde,

Glad you got it to work! Note, though, that the 515-806 files are specific to the V4 region. Anyway, some other thoughts...

I'd recommend using the latest version of QIIME 2 (2023.9). The version you are using (2022.2) is over a year old.

Assuming you want to run the classifier via feature-classifier classify-sklearn, you should download the pre-trained classifiers instead. The following should work:

wget https://data.qiime2.org/2023.9/common/silva-138-99-nb-classifier.qza

or

wget https://data.qiime2.org/2023.9/common/silva-138-99-515-806-nb-classifier.qza

That is, unless you are interested in using vsearch, BLAST, etc., in which case downloading the taxonomy and sequence files is the way to go.
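The two routes look roughly like this. A sketch assuming your representative sequences are in a hypothetical rep-seqs.qza and the SILVA files keep their Data resources names. Note that a pre-trained classifier generally has to come from the same QIIME 2 release you are running (the scikit-learn version must match), which is one more reason to pair the 2023.9 classifiers with 2023.9. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical input name; adjust to your own representative sequences.
REP_SEQS="rep-seqs.qza"

# Naive Bayes classification with a pre-trained classifier.
classify_with_sklearn() {
  qiime feature-classifier classify-sklearn \
    --i-classifier silva-138-99-nb-classifier.qza \
    --i-reads "$REP_SEQS" \
    --o-classification taxonomy-sklearn.qza
}

# Consensus classification from VSEARCH hits against the reference
# sequences and taxonomy; recent releases also save the raw hits.
classify_with_vsearch() {
  qiime feature-classifier classify-consensus-vsearch \
    --i-query "$REP_SEQS" \
    --i-reference-reads silva-138-99-seqs.qza \
    --i-reference-taxonomy silva-138-99-tax.qza \
    --o-classification taxonomy-vsearch.qza \
    --o-search-results vsearch-hits.qza
}
```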

-Mike

Thank you, I intend to use vsearch to cluster the ASVs into OTUs.

However, I am new to QIIME 2, and I am open to suggestions on the right approach.
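For clustering ASVs into OTUs, the QIIME 2 methods are qiime vsearch cluster-features-de-novo and qiime vsearch cluster-features-closed-reference; only the closed-reference mode uses the SILVA sequences. A sketch assuming a feature table table.qza and sequences rep-seqs.qza from your denoising step (hypothetical names) and a 97% identity threshold, which is a common convention rather than a requirement. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical inputs from your denoising step (e.g. DADA2 or Deblur).
TABLE="table.qza"
REP_SEQS="rep-seqs.qza"
# 97% identity is a common OTU convention, not a requirement.
PERC_ID="0.97"

# De novo: OTUs are built from your own ASVs; no reference is needed.
cluster_de_novo() {
  qiime vsearch cluster-features-de-novo \
    --i-table "$TABLE" \
    --i-sequences "$REP_SEQS" \
    --p-perc-identity "$PERC_ID" \
    --o-clustered-table table-dn.qza \
    --o-clustered-sequences rep-seqs-dn.qza
}

# Closed reference: ASVs are assigned to SILVA reference sequences;
# ASVs without a hit at PERC_ID are written to a separate artifact.
cluster_closed_reference() {
  qiime vsearch cluster-features-closed-reference \
    --i-table "$TABLE" \
    --i-sequences "$REP_SEQS" \
    --i-reference-sequences silva-138-99-seqs.qza \
    --p-perc-identity "$PERC_ID" \
    --o-clustered-table table-cr.qza \
    --o-clustered-sequences rep-seqs-cr.qza \
    --o-unmatched-sequences unmatched-cr.qza
}
```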