Plugin Error from rescript with dereplicate command

baehsung · April 7, 2022, 10:27pm

Hi!

I am constructing a classifier for a marker gene (i.e., dsrB).

After several exchanges on the forum, I was able to make a ref-seqs.fasta file and convert it to ref-seqs.qza.
I also made a ref-tax.tsv taxonomy file, which I converted to ref-tax.qza as well.
The taxonomy file was prepared manually in Excel and saved as TSV.

Using those two files, I ran the dereplicate step with the following command:

qiime rescript dereplicate \
    --i-sequences ref-seqs.qza \
    --i-taxa ref-tax.qza \
    --p-mode 'uniq' \
    --p-threads 4 \
    --p-rank-handles 'disable' \
    --o-dereplicated-sequences ref-seqs-derep.qza \
    --o-dereplicated-taxa ref-tax-derep.qza

This produced a plugin error from rescript: '[9] not in index'

Does anyone know which step the error comes from?

Thanks,

Hee-Sung

SoilRotifer · April 8, 2022, 12:03am

Hi @baehsung,

I am not exactly sure what the issue is. Can you private message me with a link to those qza files?

Just a thought: it may be related to uneven taxonomy lengths. See below.
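
One quick way to test the uneven-taxonomy idea is to count how many ranks each row carries. The following is only a sketch: it assumes the taxonomy is a headerless, tab-separated file with the feature ID in column 1 and a semicolon-delimited taxonomy string in column 2, and `rank_depth_counts` is a hypothetical helper name, not part of QIIME 2 or RESCRIPt.

```shell
# Print "<number of ranks> <number of rows>" for each rank depth found.
# Assumption: column 1 = feature ID, column 2 = taxonomy string, tab-separated.
# A header row, if present, is counted like any other row.
rank_depth_counts() {
    awk -F '\t' '
        { n = split($2, ranks, ";"); depth[n]++ }
        END { for (d in depth) print d, depth[d] }
    ' "$1" | sort -n
}

# Example call (file name taken from the first post):
# rank_depth_counts ref-tax.tsv
```

If this prints more than one line, the taxonomy strings do not all have the same number of ranks.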

-Mike

baehsung · April 8, 2022, 9:38pm

Thanks Mike,

Here are two files.

dsrB-ref-seqs.qza (21.7 KB)
dsrB-ref-tax.qza (7.1 KB)

SoilRotifer · April 9, 2022, 10:58pm

Hi @baehsung,

Part of the issue was that there were Windows newline characters within the FASTA and taxonomy files. When I fixed these, the initial error you saw was resolved. However, I then obtained another error, which I could not figure out. I'd recommend avoiding Excel and similar tools, as they often insert hidden characters that can cause problems, and editing only in a plain-text editor.
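
For anyone hitting the same problem, Windows line endings can be detected and removed with standard tools before importing. This is a minimal sketch, not necessarily how the files were fixed in this thread; `count_cr_lines` and `strip_cr` are hypothetical helper names.

```shell
# Count lines that contain a carriage return (the "\r" in Windows "\r\n" endings).
count_cr_lines() {
    # grep -c exits non-zero when the count is 0, so the exit status is ignored.
    grep -c "$(printf '\r')" "$1" || true
}

# Write a copy of the input file with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example calls (output file name is a placeholder):
# count_cr_lines ref-tax.tsv
# strip_cr ref-tax.tsv ref-tax-unix.tsv
```

The cleaned files would then need to be re-imported, since an existing .qza still contains the original text.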

Anyway, I was unable to make use of your files. Given that, I pulled the accessions from your reference data and generated a metadata file with the GenBank accessions: dsrB-accs.txt (832 Bytes). I then used the RESCRIPt plugin to re-download the data directly from GenBank like so:

qiime rescript get-ncbi-data \
    --m-accession-ids-file dsrB-accs.txt \
    --o-sequences dsrB-seqs-from-gb.qza \
    --o-taxonomy dsrB-tax-from-gb.qza

... and then I dereplicated the data:

qiime rescript dereplicate \
    --i-sequences dsrB-seqs-from-gb.qza \
    --i-taxa dsrB-tax-from-gb.qza \
    --p-mode 'uniq' \
    --p-rank-handles 'disable' \
    --p-threads 1  \
    --o-dereplicated-sequences dsrB-seqs-from-gb-derep.qza \
    --o-dereplicated-taxa dsrB-tax-from-gb-derep.qza \
    --verbose

Note: since get-ncbi-data defaults to the ranks kpcofgs, which are the same as Greengenes, you can use --p-rank-handles greengenes if you'd like.
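
An accession list like dsrB-accs.txt can be built from FASTA headers with standard tools. The sketch below is an assumption about one way to do it, not a record of how the file in this thread was made. It assumes each header starts with the accession (">ACCESSION optional description") and writes a one-column file with an "id" header line, which QIIME 2 accepts as a metadata ID column. `fasta_to_accession_list` is a hypothetical helper name.

```shell
# Build a one-column metadata file of unique accessions from FASTA headers.
# Assumption: the accession is the first whitespace-delimited token after ">".
fasta_to_accession_list() {
    {
        echo "id"
        # Drop carriage returns first so Windows line endings do not end up in the IDs.
        tr -d '\r' < "$1" | grep '^>' | sed 's/^>//' | awk '{ print $1 }' | sort -u
    } > "$2"
}

# Example call (input name from the first post, output name from this post):
# fasta_to_accession_list ref-seqs.fasta dsrB-accs.txt
```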

-Mike

baehsung · April 11, 2022, 2:07am

Mike, I really appreciate all your effort in solving this!

It is unfortunate that we cannot edit the taxonomy file in Excel, as I had wished to add sequences retrieved from sources other than NCBI (e.g., the FunGene pipeline) to the ref-seqs file. I also wanted to revise the NCBI reference sequences because I found that some of them had wrong names due to depositor mistakes.

Interestingly, the ref-tax.qza that I edited in Excel did work for building a classifier, although I am not sure whether it will work for the taxonomic assignment of my marker gene sequences.

HS

SoilRotifer · April 11, 2022, 3:37pm

This is one of the reasons why we developed RESCRIPt: to help curate sequence-taxonomy reference databases. In fact, you can use qiime rescript edit-taxonomy ... to make any taxonomy edits after import.

Also, do not forget that you can use basic QIIME 2 commands like qiime feature-table merge-seqs ... and qiime feature-table merge-taxa ... to add other sequences to your reference database.
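
As a rough sketch of the merge step, the two commands can be wrapped in a small function. `merge_reference_sets` is a hypothetical helper, and the fungene-* file names in the example call are placeholders. It assumes both sequence sets and both taxonomies have already been imported as .qza artifacts.

```shell
# Merge two sequence artifacts and their two taxonomy artifacts.
# Arguments: seqs_a seqs_b tax_a tax_b output_prefix
merge_reference_sets() {
    local seqs_a=$1 seqs_b=$2 tax_a=$3 tax_b=$4 prefix=$5

    # Combine the sequence sets into one FeatureData[Sequence] artifact.
    qiime feature-table merge-seqs \
        --i-data "$seqs_a" \
        --i-data "$seqs_b" \
        --o-merged-data "${prefix}-seqs.qza" || return 1

    # Combine the matching taxonomies into one FeatureData[Taxonomy] artifact.
    qiime feature-table merge-taxa \
        --i-data "$tax_a" \
        --i-data "$tax_b" \
        --o-merged-data "${prefix}-tax.qza"
}

# Example call (fungene-* names are placeholders):
# merge_reference_sets dsrB-seqs-from-gb.qza fungene-seqs.qza \
#     dsrB-tax-from-gb.qza fungene-tax.qza dsrB-merged
```

The merged pair can then go through qiime rescript dereplicate as shown earlier in the thread. For edit-taxonomy, check qiime rescript edit-taxonomy --help for the options available in your installed version.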

Great!

Why wouldn't it work?

baehsung · April 11, 2022, 3:54pm

Thanks Mike for your reply!

There are so many functions within the RESCRIPt plugin that I have not tried yet. RESCRIPt is a wonderful invention, and I am interested in learning them now.

HS

system · May 12, 2022, 9:54pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.