Plugin error from RESCRIPt with the dereplicate command

Hi!

I am building a classifier for a marker gene (dsrB).

After several rounds of discussion on the forum, I was able to make a ref-seqs.fasta file and convert it to ref-seqs.qza.
I also made a ref-tax.tsv taxonomy file, which I converted to ref-tax.qza as well.
I built the taxonomy file manually in Excel and saved it as TSV.

Using those two files, I ran the dereplicate step with the following command:

qiime rescript dereplicate \
    --i-sequences ref-seqs.qza \
    --i-taxa ref-tax.qza \
    --p-mode 'uniq' \
    --p-threads 4 \
    --p-rank-handles 'disable' \
    --o-dereplicated-sequences ref-seqs-derep.qza \
    --o-dereplicated-taxa ref-tax-derep.qza

I got a plugin error from RESCRIPt: '[9] not in index'

Does anyone know which step the error comes from?

Thanks,

Hee-Sung

Hi @baehsung,

I am not exactly sure what the issue is. Could you send me a private message with a link to those qza files?

Just a thought: could it be related to the uneven taxonomy lengths (different numbers of ranks per lineage) in your file?
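A quick way to see whether lineages have uneven lengths is to count the ";"-separated ranks in each row of the taxonomy TSV. The sketch below is a hypothetical helper (the name rank_histogram is made up for illustration), assuming a headerless two-column file of ID, a tab, then the taxonomy string. It also warns about rows that are not exactly two tab-separated columns.

```shell
#!/bin/sh
# Hypothetical helper: summarize how many ";"-separated ranks each lineage has
# in a headerless, two-column (ID <tab> taxonomy) TSV.
# Prints "<number of ranks> <number of lineages>", sorted by rank count;
# more than one output line means the lineages have uneven lengths.
rank_histogram() {
  # Drop carriage returns first so Windows line endings do not skew the counts.
  tr -d '\r' < "$1" | awk -F '\t' '
    NF != 2 {
      # Flag rows that are not exactly two tab-separated columns.
      printf "line %d: expected 2 tab-separated fields, found %d\n", NR, NF > "/dev/stderr"
      next
    }
    { n = split($2, ranks, ";"); count[n]++ }
    END { for (n in count) print n, count[n] }
  ' | sort -n
}
```

For example, rank_histogram ref-tax.tsv printing both "6 ..." and "7 ..." would mean some lineages are one rank short.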

-Mike

Thanks Mike,

Here are two files.

dsrB-ref-seqs.qza (21.7 KB)
dsrB-ref-tax.qza (7.1 KB)

Hi @baehsung,

Part of the issue was that the FASTA and taxonomy files contained Windows newline characters. When I fixed these, the initial error you saw was resolved... however, I then hit another error, which I could not figure out. I'd recommend avoiding Excel or similar tools, as they often introduce hidden characters that can cause problems; edit only in a raw text editor.
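Before importing, you can check for and strip Windows (CRLF) line endings from the command line. This is a minimal POSIX shell sketch; has_crlf and strip_crlf are hypothetical names used only for illustration. It removes every carriage return, which is safe for FASTA and taxonomy TSV files, since neither should contain one.

```shell
#!/bin/sh
# Hypothetical helpers for Windows (CRLF) line endings.

# Exit status 0 if the file contains any carriage-return characters.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Write the file to stdout with every carriage return removed,
# leaving plain Unix (LF) line endings.
strip_crlf() {
  tr -d '\r' < "$1"
}
```

Typical use: has_crlf ref-tax.tsv && strip_crlf ref-tax.tsv > ref-tax-lf.tsv, then import the cleaned file instead of the original.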

Anyway, I was unable to make use of your files. Instead, I pulled the accessions from your reference data and generated a metadata file of the GenBank accessions: dsrB-accs.txt (832 Bytes). I then used the RESCRIPt plugin to re-download the data directly from GenBank like so:
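An accession list like this can be built from the FASTA headers. The sketch below is hypothetical (fasta_to_accession_metadata is a made-up name) and assumes each header begins with the GenBank accession followed by optional whitespace and a description; adjust the extraction if your headers differ. QIIME 2 metadata files need a header row, and "id" is one of the accepted ID column names.

```shell
#!/bin/sh
# Hypothetical helper: build a QIIME 2 metadata file of accession IDs from a
# FASTA file. Assumes each header starts with the GenBank accession, e.g.
# ">AB123456.1 dsrB partial cds"; adjust the extraction if yours differ.
fasta_to_accession_metadata() {
  # QIIME 2 metadata needs a header row; "id" is an accepted ID column name.
  printf 'id\n'
  # Keep header lines, drop CRs and the ">", then keep the first token.
  grep '^>' "$1" | tr -d '\r' | sed 's/^>//' | awk '{ print $1 }'
}
```

For example: fasta_to_accession_metadata ref-seqs.fasta > dsrB-accs.txt.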

qiime rescript get-ncbi-data \
    --m-accession-ids-file dsrB-accs.txt \
    --o-sequences dsrB-seqs-from-gb.qza \
    --o-taxonomy dsrB-tax-from-gb.qza

... and then I dereplicated the data:

qiime rescript dereplicate \
    --i-sequences dsrB-seqs-from-gb.qza \
    --i-taxa dsrB-tax-from-gb.qza \
    --p-mode 'uniq' \
    --p-rank-handles 'disable' \
    --p-threads 1  \
    --o-dereplicated-sequences dsrB-seqs-from-gb-derep.qza \
    --o-dereplicated-taxa dsrB-tax-from-gb-derep.qza \
    --verbose

Note: since get-ncbi-data defaults to the seven ranks kpcofgs (kingdom through species), the same set Greengenes uses, you can use --p-rank-handles greengenes if you'd like.
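To picture what a full seven-rank, Greengenes-style lineage looks like when short lineages are padded with empty rank handles, here is an illustrative sketch. It is not RESCRIPt's own code, and backfill_greengenes is a hypothetical name. It assumes the existing ranks already carry k__/p__/... prefixes, appear in kpcofgs order, and are separated by "; ".

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not RESCRIPt's own code): pad each
# lineage in a headerless ID<tab>taxonomy TSV to the seven greengenes ranks.
# Assumes existing ranks are already prefixed (k__, p__, ...), appear in
# kpcofgs order, and are separated by "; ".
backfill_greengenes() {
  tr -d '\r' < "$1" | awk -F '\t' -v OFS='\t' '
    BEGIN { split("k__ p__ c__ o__ f__ g__ s__", handle, " ") }
    {
      n = split($2, ranks, "; ")
      # Append an empty handle for every rank that is missing.
      for (i = n + 1; i <= 7; i++)
        $2 = ($2 == "") ? handle[i] : $2 "; " handle[i]
      print
    }
  '
}
```

A lineage of "k__Bacteria; p__Desulfobacterota" would come out as "k__Bacteria; p__Desulfobacterota; c__; o__; f__; g__; s__".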

-Mike

Mike, I really appreciate your big effort to solve this!

It is unfortunate that I cannot edit the taxonomy file in Excel, as I wished to add sequences retrieved from sources other than NCBI (e.g., the FunGene pipeline) to the ref-seqs file. I also wanted to revise the NCBI reference sequences, because I found that some of them had wrong names due to depositor mistakes.

Interestingly, the ref-tax.qza that I edited in Excel did work for building a classifier, although I am not sure whether that classifier will work for taxonomic assignment of my marker gene sequences.

HS

This is one of the reasons we developed RESCRIPt: to help curate your sequence-taxonomy reference databases. In fact, you can use qiime rescript edit-taxonomy ... to make any taxonomy edits after import.
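If you would rather make a one-off fix to the raw taxonomy TSV before import, a scripted plain-text edit avoids spreadsheets entirely. This is a hypothetical sketch (set_lineage is a made-up name), assuming a headerless two-column ID-tab-taxonomy file; it rewrites only the row whose ID matches and prints the result.

```shell
#!/bin/sh
# Hypothetical helper: replace the lineage of one feature ID in a headerless
# ID<tab>taxonomy TSV (edited as plain text, no spreadsheet involved).
# Usage: set_lineage ref-tax.tsv ID 'new; lineage' > ref-tax-fixed.tsv
set_lineage() {
  tr -d '\r' < "$1" | awk -F '\t' -v OFS='\t' -v id="$2" -v lineage="$3" '
    $1 == id { $2 = lineage }   # only the matching row is changed
    { print }
  '
}
```

For changes that apply across many lineages at once, edit-taxonomy after import is likely the better fit.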

Also, do not forget that you can use basic QIIME 2 commands like qiime feature-table merge-seqs ... and qiime feature-table merge-taxa ... to add other sequences to your reference database.
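When adding sequences from another source (such as FunGene), it can be worth confirming that every sequence has a taxonomy entry and vice versa before importing and merging. A hypothetical sketch (compare_ids is a made-up name), assuming the FASTA ID is the first token after ">" and the taxonomy is a headerless ID-tab-taxonomy TSV:

```shell
#!/bin/sh
# Hypothetical helper: list IDs that appear in only one of a FASTA file and
# its headerless ID<tab>taxonomy TSV.
# Output lines are "seq-only <id>" or "tax-only <id>"; no output means the
# two files cover the same IDs. Assumes the FASTA ID is the first token after ">".
compare_ids() {
  seq_ids=$(mktemp)
  tax_ids=$(mktemp)
  # Sorted, de-duplicated ID lists (comm needs sorted input).
  grep '^>' "$1" | tr -d '\r' | sed 's/^>//' | awk '{ print $1 }' | sort -u > "$seq_ids"
  tr -d '\r' < "$2" | cut -f1 | grep -v '^$' | sort -u > "$tax_ids"
  comm -23 "$seq_ids" "$tax_ids" | sed 's/^/seq-only /'
  comm -13 "$seq_ids" "$tax_ids" | sed 's/^/tax-only /'
  rm -f "$seq_ids" "$tax_ids"
}
```

For example, compare_ids fungene-seqs.fasta fungene-tax.tsv (hypothetical file names) should print nothing when the two files match.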

Great!

Why wouldn't it work?

Thanks for your reply, Mike!

There are so many functions within the RESCRIPt plugin that I have not tried yet. RESCRIPt is a wonderful tool, and I am keen to learn it now.

HS
