I am building a classifier for a marker gene (dsrB).
After several exchanges on the forum, I was able to make a ref-seqs.fasta file and convert it to ref-seqs.qza.
I also made a ref-tax.tsv taxonomy file, which I converted to ref-tax.qza as well.
I created this file manually in Excel and saved it as TSV.
Using those two files, I ran the dereplicate step with the following commands:
Part of the issue was that the FASTA and taxonomy files contained Windows newline characters. When I fixed these, the initial error you saw was resolved; however, I then hit another error that I could not figure out. I'd recommend avoiding Excel and similar tools, as they often insert hidden characters that can cause problems. Edit these files only in a plain-text editor.
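If a file has already been through Excel, a small script can normalize it before import. The following is a minimal sketch, assuming the file is UTF-8 (possibly with a byte-order mark); the function name `clean_text_file` is hypothetical and not part of QIIME 2 or RESCRIPt. It converts Windows (CRLF) line endings to Unix (LF), drops a leading BOM, and reports, without changing, any lines that still contain non-ASCII characters such as non-breaking spaces, so they can be checked by hand.

```python
from pathlib import Path


def clean_text_file(src: str, dst: str) -> dict:
    """Write a copy of src to dst with LF line endings and no leading BOM.

    Assumes the input is UTF-8 (optionally with a BOM); a file saved by Excel
    in a legacy encoding such as Windows-1252 may raise UnicodeDecodeError here.
    """
    raw = Path(src).read_bytes()  # read bytes so CR characters are not silently translated
    had_bom = raw.startswith(b"\xef\xbb\xbf")
    text = raw.decode("utf-8-sig")  # 'utf-8-sig' drops a leading BOM if present
    n_cr = text.count("\r")
    # Convert Windows (CRLF) and bare CR line endings to LF
    cleaned = text.replace("\r\n", "\n").replace("\r", "\n")
    if cleaned and not cleaned.endswith("\n"):
        cleaned += "\n"  # make sure the file ends with a newline
    Path(dst).write_bytes(cleaned.encode("utf-8"))
    # Report (but do not modify) lines with non-ASCII characters for manual review
    non_ascii = [
        i
        for i, line in enumerate(cleaned.split("\n"), start=1)
        if any(ord(ch) > 127 for ch in line)
    ]
    return {"had_bom": had_bom, "carriage_returns": n_cr, "non_ascii_lines": non_ascii}
```

For example, `clean_text_file("ref-tax.tsv", "ref-tax.clean.tsv")` writes a cleaned copy and leaves the original untouched, so the two can be compared before re-importing.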
In the end, I was unable to use your files directly. Instead, I pulled the accessions from your reference data and generated a metadata file of the GenBank accessions, dsrB-accs.txt. I then used the RESCRIPt plugin to re-download the data directly from GenBank like so:
Mike, I really appreciate your great effort in solving this!
It is unfortunate that I cannot edit the taxonomy file in Excel, because I wanted to add sequences retrieved from sources other than NCBI (e.g., the FunGene pipeline) to the ref-seqs file. I also wanted to revise the NCBI reference sequences, because I found that some of them were misnamed due to mistakes by the depositors.
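When sequences from other sources are added by hand, it is easy for the FASTA and taxonomy files to drift out of sync. The following is a minimal sketch of a pre-import consistency check, assuming the sequence ID is the first whitespace-delimited token after `>` and that the taxonomy file is tab-separated with the ID in the first column and an optional `Feature ID` header row. The function name `check_reference_ids` is hypothetical; it only reports problems and does not edit either file.

```python
def check_reference_ids(fasta_path: str, taxonomy_path: str) -> dict:
    """Compare sequence IDs in a FASTA file with IDs in a taxonomy TSV (report only)."""
    fasta_ids = set()
    with open(fasta_path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split(maxsplit=1)  # ID = first token after '>'
                if parts:
                    fasta_ids.add(parts[0])

    tax_ids, duplicates, malformed = set(), [], []
    with open(taxonomy_path, encoding="utf-8") as fh:
        for n, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if n == 1 and fields[0] == "Feature ID":
                continue  # skip the header row
            if len(fields) < 2 or not fields[1].strip():
                malformed.append(n)  # no tab-separated taxonomy string on this line
                continue
            if fields[0] in tax_ids:
                duplicates.append(fields[0])
            tax_ids.add(fields[0])

    return {
        "missing_taxonomy": sorted(fasta_ids - tax_ids),
        "missing_sequence": sorted(tax_ids - fasta_ids),
        "duplicate_taxonomy_ids": duplicates,
        "malformed_taxonomy_lines": malformed,
    }
```

An empty result in every field means each sequence has exactly one taxonomy entry; anything else points to the specific IDs or line numbers to fix before import.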
Interestingly, the ref-tax.qza that I edited in Excel did work for building the classifier, although I am not sure whether it will correctly assign taxonomy to my marker gene sequences.
This is one of the reasons we developed RESCRIPt: to help curate sequence-taxonomy reference databases. In fact, you can use qiime rescript edit-taxonomy ... to make any taxonomy edits after import.