Hello,
Sorry for my previous post; I copied it directly from my script, and I can now see how awful that looked.
I hope this attempt is better.
Background information
I am using qiime2-2020.8, accessed through a conda environment.
I have searched the forum for a similar issue.
Aim
Train feature classifiers with q2-feature-classifier using my own reference database for the functional target gene pmoA (methanotrophs).
Use the trained classifier to assign taxonomy to the methanotrophs in my environmental samples.
To do this, I have a file of representative sequences that I want to assign taxonomy to, which I need to import into QIIME 2 as an artefact. I have already done this, and I am attaching the artefact because I cannot upload the original FASTA file:
rep_set90nonchimeras.qza (241.9 KB)
I have worked through the online tutorial "Training feature classifiers with q2-feature-classifier" (QIIME 2 2019.10.0 documentation) using the example files provided, so I know the code works.
I then substituted my own files into the code.
I have a reference database for the functional target gene pmoA (methanotrophs) containing the nucleotide sequences and their associated accession numbers: pmoa7809_YangDB.qza (347.5 KB).
I already converted this to a .qza file because I cannot upload the original .fasta file.
I also have the complementary text file containing the accession numbers and their taxonomic lineages:
pmoa4rdp1_qiime.txt (1006.1 KB)
I got all the way through to training the classifier with my own files, and it is at this step that I get the following error message:
(1/1) Invalid value for '--i-reference-taxonomy': 'ref-taxonomy.gza' is not a valid filepath
The code I have run
qrsh
source activate qiime2-2020.8
Importing rep-seqs into QIIME 2
qiime tools import \
  --input-path rep_set90nonchimeras.fna \
  --output-path rep_set90nonchimeras.gza \
  --type 'FeatureData[Sequence]'
My representative sequences are now imported as a FeatureData[Sequence] artefact.
Training a classifier in QIIME 2
Create QIIME 2 artefacts (.qza files)
First, I did this for the reference sequences (the .fasta file).
Some sequences in the database started with '.' characters, so I had to remove these first (only on sequence lines, not on '>' header lines):
sed -i '/^>/! s/^\.*//' pmoa7809_YangDB.fasta
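In the sed pattern s/.//, the unescaped '.' matches any character, so it deletes the first character of every sequence line, not only leading dots. A minimal sketch comparing that pattern with an anchored, escaped one (s/^\.*//) on a made-up two-record FASTA; strip_any_first_char and strip_leading_dots are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash
# Pattern as originally written: the unescaped '.' matches ANY character,
# so the first character of every non-header line is deleted.
strip_any_first_char() { sed '/^>/! s/.//' "$1"; }

# Anchored, escaped pattern: deletes only literal '.' characters at the
# start of non-header lines; lines without a leading '.' are unchanged.
strip_leading_dots() { sed '/^>/! s/^\.*//' "$1"; }

# Demo on a made-up two-record FASTA.
demo=$(mktemp)
printf '>seq1\n..ACGT\n>seq2\nGGCA\n' > "$demo"
echo "original pattern:"; strip_any_first_char "$demo"
echo "anchored pattern:"; strip_leading_dots "$demo"
rm -f "$demo"
```

The original pattern turns ..ACGT into .ACGT and GGCA into GCA, whereas the anchored one gives ACGT and leaves GGCA intact. With GNU sed, -i.bak instead of -i keeps a backup copy of the file before editing it in place.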
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pmoa7809_YangDB.fasta \
  --output-path pmoa7809_YangDB
I then did the same for the associated taxonomy text file, which I checked has no header and looks the same as the example file.
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path pmoa4rdp1_qiime.txt \
  --output-path ref-taxonomy
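The header and layout check mentioned above can also be scripted. A minimal sketch, assuming the headerless format means one feature ID and one lineage per line separated by a single tab, and assuming a header row, if present, would start with "Feature ID"; check_taxonomy_tsv is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a taxonomy file intended for
# --input-format HeaderlessTSVTaxonomyFormat.
# Assumes: no header row, and exactly two tab-separated columns per line
# (feature ID, then the lineage string).
check_taxonomy_tsv() {
  local file=$1
  # Assumption: a header row, if present, starts with "Feature ID".
  if head -n 1 "$file" | grep -qi '^feature id'; then
    echo "$file: first line looks like a header"
    return 1
  fi
  # Windows (CRLF) line endings leave a stray \r at the end of each lineage.
  if grep -q $'\r' "$file"; then
    echo "$file: warning, CRLF line endings found"
  fi
  # Report every line that does not have exactly two tab-separated fields;
  # awk exits 1 if any such line was seen, 0 otherwise.
  awk -F '\t' 'BEGIN { bad = 0 }
               NF != 2 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
               END { exit bad }' "$file"
}

# Usage: check_taxonomy_tsv pmoa4rdp1_qiime.txt && echo "layout looks OK"
```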
So I now have the representative sequences that I want to assign taxonomy to in .qza format.
I also have the reference database sequences and their associated taxonomy in .qza format.
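Because the reference sequences and the taxonomy file are meant to describe the same accessions, one thing worth ruling out before training is a sequence ID with no taxonomy row. A minimal sketch, assuming each FASTA ID is the text between '>' and the first whitespace and each taxonomy ID is the first tab-separated column; missing_taxonomy_ids is a hypothetical helper and needs bash for process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FASTA sequence IDs that have no row in a
# headerless taxonomy TSV. Returns 0 if every ID is covered, 1 otherwise.
missing_taxonomy_ids() {
  local fasta=$1 taxonomy=$2 missing
  # comm -23 prints lines that appear only in the first sorted list:
  #   list 1 = FASTA IDs (header text after '>' up to the first whitespace)
  #   list 2 = first tab-separated column of the taxonomy file
  missing=$(comm -23 \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u) \
    <(cut -f1 "$taxonomy" | sort -u))
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}

# Usage: missing_taxonomy_ids pmoa7809_YangDB.fasta pmoa4rdp1_qiime.txt
```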
Extract reference reads
The notes for this section suggest leaving out the minimum and maximum length options (--p-min-length, --p-max-length) and the truncation option (--p-trunc-len) when using paired-end reads for a non-taxonomic gene, so I omitted them:
qiime feature-classifier extract-reads \
  --i-sequences pmoa7809_YangDB.qza \
  --p-f-primer GGNGACTGGGACTTCTGG \
  --p-r-primer ACGTCCTTACCGAAGGT \
  --o-reads pmoA_ref-seqs
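To get a rough idea of how many reference sequences contain the primer sites at all, the sketch below converts the degenerate primers to regular expressions and counts exact matches in the original FASTA. As I understand it, extract-reads aligns primers and accepts matches above an identity threshold (--p-identity), so this exact-match count is stricter and is best read as a lower bound. iupac_to_regex, revcomp and count_primer_hits are hypothetical helpers; the reverse primer is reverse-complemented on the assumption that it appears in that orientation on the reference strand.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough, exact-match primer check.

# Expand IUPAC degenerate bases into bracket expressions for grep -E.
# Safe to apply in sequence: inserted brackets contain only A, C, G, T.
iupac_to_regex() {
  echo "$1" | tr 'a-z' 'A-Z' | sed -e 's/N/[ACGT]/g' -e 's/R/[AG]/g' \
    -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' -e 's/K/[GT]/g' \
    -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
    -e 's/V/[ACG]/g'
}

# Reverse complement, including IUPAC ambiguity codes.
revcomp() {
  echo "$1" | tr 'a-z' 'A-Z' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
    tr 'ACGTRYKMBVDHSWN' 'TGCAYRMKVBHDSWN'
}

# Count records whose sequence (joined across lines) contains the primer.
count_primer_hits() {
  local fasta=$1 pattern
  pattern=$(iupac_to_regex "$2")
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print seq }' "$fasta" |
    grep -ciE "$pattern"
}

# Usage with the primers passed to extract-reads:
#   count_primer_hits pmoa7809_YangDB.fasta GGNGACTGGGACTTCTGG
#   count_primer_hits pmoa7809_YangDB.fasta "$(revcomp ACGTCCTTACCGAAGGT)"
```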
Train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads pmoA_ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.gza \
  --o-classifier classifier.qza
Error message
There was a problem with the command:
(1/1) Invalid value for '--i-reference-taxonomy': 'ref-taxonomy.gza' is not a valid filepath
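As far as I understand, this message means that no file exists at exactly the path given. QIIME 2 artefacts use the .qza extension, and I believe the import commands append .qza to --output-path when it does not already end in .qza, so the name on disk may differ from the name that was typed. A minimal sketch for checking this before running fit-classifier-naive-bayes; find_artifact is a hypothetical helper that, when the path is missing, lists files in the same directory starting with the same stem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm an artefact path exists before passing it to
# a qiime command; if it does not, list files that share its stem.
find_artifact() {
  local path=$1 dir stem
  if [ -f "$path" ]; then
    echo "found: $path"
    return 0
  fi
  dir=$(dirname "$path")
  stem=$(basename "$path")
  stem=${stem%%.*}   # drop everything from the first '.' onwards
  echo "not found: $path"
  echo "files in $dir starting with '$stem':"
  # Unmatched globs stay literal in bash, so silence the resulting ls error.
  ls -1 "$dir/$stem"* 2>/dev/null
  return 1
}

# Usage: find_artifact ref-taxonomy.gza
```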
I have tried various versions of the output filenames, both with the extension (filename.gza) and without it (filename).
For example, when I import the reference sequences with:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pmoa7809_YangDB.fasta \
  --output-path pmoa7809_YangDB.gza
the resulting file in the directory is called pmoa7809_YangDB.gza.gza.
Therefore, I chose not to add the extension to the output filename:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path pmoa7809_YangDB.fasta \
  --output-path pmoa7809_YangDB
When I then look in the directory, I see pmoa7809_YangDB.gza.
I wonder whether the problem lies here.
Please let me know if this is clearer.