I was attempting to build a new BOLD database and was wondering whether there is a suggested fix for the error that began appearing after the update to "2020.8", other than reverting to a version from before the update. Here is the error:
There was a problem importing bold_rawSeqs_forQiime.fasta:
bold_rawSeqs_forQiime.fasta is not a(n) AlignedDNAFASTAFormat file:
The sequence starting on line 34 was length 597. All previous sequences were length 602. All sequences must be the same length for AlignedFASTAFormat.
Hello, I use primers from “An improved method for utilizing high-throughput amplicon sequencing to determine the diets of insectivorous animals”, and I see that you use a different reverse primer. One is 5'-GGWACTAATCATTTCAAATCC-3', and yours is 5'-ggatttggaaattgagtwcc-3'. Can I use the "bold_anml_classifier.qza" classifier you provide for taxonomy annotation? Would that work?
I would not suggest doing this, as the primer sequences affect how the amplicon region is extracted from the reference sequence database, which could in turn affect your ability to classify your reads appropriately.
I would use your own primers to extract the amplicon region. You can also supplement parts of the BOLD tutorial with this tutorial.
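For anyone following along, one common way to extract an amplicon region with your own primers in QIIME 2 is `qiime feature-classifier extract-reads`. Whether that is exactly the step meant above is an assumption on my part. The sketch below only builds and prints the command so you can check it first. The file names and the forward primer are placeholders, and the reverse primer is copied as typed in the question.

```shell
# Placeholders (not from the tutorial): swap in your own primers and artifacts.
f_primer="YOUR_FORWARD_PRIMER"
r_primer="GGWACTAATCATTTCAAATCC"   # reverse primer as quoted in the question

# Build the command as an array so it can be inspected before running.
cmd=(qiime feature-classifier extract-reads
  --i-sequences bold_seqs.qza
  --p-f-primer "$f_primer"
  --p-r-primer "$r_primer"
  --o-reads bold_amplicon_seqs.qza)

# Preview the full command on one line.
printf '%s ' "${cmd[@]}"; echo

# Uncomment to actually run it once the names above are real:
# "${cmd[@]}"
```

A classifier would then be trained on the extracted reads rather than on the full-length references.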
Thanks for your detailed instructions on building the COI database! I am trying to use them to build a COI database for Diptera, but I am currently stuck at this step:
qiime rescript dereplicate
I received this error:
Plugin error from rescript:
Parameter 'rank_handles' received ['greengenes'] as an argument, which is incompatible with parameter type: List[Str % Choices('disable')] | List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')]
Debug info has been saved to /dev/shm/jobs/39316550/qiime2-q2cli-err-f9wqntj8.log
Looks like it does not like "greengenes". Could you please help me with that? That would be much appreciated!
Thanks (as usual!) to some insight from @SoilRotifer, I've learned that part of the error you're seeing arises because the methods in my tutorial have since been updated, and the --p-rank-handles 'greengenes' parameter is no longer the method of choice. With the old version of RESCRIPt that existed when I was putting this tutorial together, I would have pointed out that the taxonomic divisions your various labels fall into are outside the groupings that the greengenes rank-handle style expects. These taxonomic "groupings" are referred to as rank-handles. You could see the old rank-handle methods that were permitted in this code here; however, this may not be exactly your issue if you are using an updated version of QIIME 2 and RESCRIPt.
Instead, the new code handling taxonomic ranks is perfectly fine with these expanded rank-handles, including the ones listed in your error message. Thus, I believe the trick for you is to drop the --p-rank-handles 'greengenes' argument altogether and instead define the ranks you have directly. Don't quote me on that very last bit about defining your ranks directly: I haven't used the updated RESCRIPt version, and others in the forum can likely point you to exactly what your next steps are if you can't find them in the help output of these commands:
qiime rescript dereplicate --help
qiime rescript merge-taxa --help
To conclude, I suspect the issue is that the instructions in the tutorial are now outdated for certain input taxonomy files, like yours might be. Nevertheless, RESCRIPt is now indeed capable of handling these expanded rank-handle types, and you just need to adjust that --p-rank-handles parameter accordingly to resolve the error message you are receiving.
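As a rough sketch of what "define the ranks directly" could look like: the rank names below are taken from the list of allowed choices in your error message, but the particular seven I picked assume a kingdom-to-species taxonomy string, and the artifact names are placeholders. Keep whatever other options (for example --p-mode) you were already using. The snippet only prints the command so it can be checked against the --help output first.

```shell
# Assumed seven-level taxonomy; adjust to the ranks your labels really contain.
ranks=(kingdom phylum class order family genus species)

# Placeholder artifact names; substitute your own files.
cmd=(qiime rescript dereplicate
  --i-sequences bold_seqs.qza
  --i-taxa bold_taxa.qza
  --p-rank-handles "${ranks[@]}"
  --o-dereplicated-sequences bold_derep_seqs.qza
  --o-dereplicated-taxa bold_derep_taxa.qza)

# Preview the full command on one line.
printf '%s ' "${cmd[@]}"; echo

# Uncomment to actually run it once the names above are real:
# "${cmd[@]}"
```

According to the same error message, 'disable' is the other accepted value for this parameter.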
I'm sorry, but I do not understand your questions.
Are you asking if I am suggesting to import data from BOLD to QIIME? You can certainly use that as a resource, and this tutorial shows how to leverage it, though I would point out that you can alternatively import COI data from NCBI. We even compared the two data sources in the RESCRIPt paper, so have a look there if you're wondering which way to go.
Can you possibly rephrase your question about what you mean by "next steps when it requires..."? What are the exact steps you are referring to? Are they QIIME-specific, or as shown in this tutorial, related to preprocessing before any particular QIIME analysis?
So sorry for the confusion here!
I was having a problem importing my BOLD database into QIIME 2 as type 'FeatureData[AlignedSequence]' because it has gaps, and Chris Field suggested using seqkit to remove the gaps before importing into QIIME 2 as 'FeatureData[Sequence]'. I tried this and it works. However, when I used the output .qza to continue with your RESCRIPt tutorial, for example by running "qiime rescript degap-seqs", it did not work, because that command requires an input artifact of type 'FeatureData[AlignedSequence]'.
I replied to his comment and am not sure why my reply ended up directed at you. Hopefully I am not confusing you further.
The reason why you can't import as a FeatureData[Sequence] or FeatureData[AlignedSequence] is that the file from BOLD is neither. See earlier in this thread. For whatever reason, there are gaps in some sequences but not others.
This creates a conundrum, as you can't import as FeatureData[Sequence] as there is a sequence within the file that contains a gap. So, we then decide to import as a FeatureData[AlignedSequence], but we can't because we're not actually importing an alignment. That is, in an alignment, every sequence should be the same exact length, i.e. the nucleotides and the gaps for each sequence should sum to the same string length, hence this error message:
This is why the seqkit command was used: to remove those spurious gaps in a few sequences, which were in fact not part of a sequence alignment. If you were able to import an actual alignment as FeatureData[AlignedSequence], then you could use qiime rescript degap-seqs .... This command takes an alignment and removes all gap characters to, essentially, "unalign" the sequences. That is, it converts the alignment to FeatureData[Sequence].
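To make the "neither aligned nor unaligned" situation concrete, here is a small sketch on a made-up three-record FASTA (not real BOLD data). It assumes one sequence per line, that is, an unwrapped FASTA. The thread used seqkit for the degapping step; this version uses plain sed instead, only so that it runs without extra tools.

```shell
# Hypothetical stand-in for a BOLD download: one record has gap
# characters, and the records are not all the same length.
cat > demo_bold.fasta <<'EOF'
>seq1
ACGTACGTAC
>seq2
ACG--CGTAC
>seq3
ACGTACG
EOF

# Tally the distinct sequence lengths (gaps included). More than one
# distinct length means the file is not a true alignment.
awk '/^>/ {next} {print length($0)}' demo_bold.fasta | sort -n | uniq -c

# Count sequence lines containing "-". Any hit means the file cannot be
# imported as FeatureData[Sequence] as it stands.
grep -v '^>' demo_bold.fasta | grep -c -- '-'

# Remove "-" from sequence lines only; header lines are left untouched.
sed '/^>/!s/-//g' demo_bold.fasta > demo_bold.degapped.fasta
```

The degapped file is the one that would then be imported as FeatureData[Sequence], with no need for degap-seqs afterwards.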
Thanks a lot for clarifying this! I have the same problem: there are spurious gaps in some sequences. I used seqkit to remove these gaps and imported the result as 'FeatureData[Sequence]'. Then I used "qiime alignment mafft" to align the sequences and produce a QIIME artifact of type FeatureData[AlignedSequence]. However, this command requires a huge amount of memory. I allocated 100 GB and it still failed to run.
Are you adding a \ at the end of each line? Remember when spreading a command over multiple lines in the terminal, you need to add \ at the end of each line. Otherwise it will execute each line as if it is a separate command. This is the most common reason for the error message you're seeing. So, your command should look like this:
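As a generic sketch of the continuation rule only (the arguments are placeholders, not the command from this thread):

```shell
# One logical command spread over four lines. Every line except the
# last ends in "\" with nothing after it, not even a trailing space.
joined=$(printf '%s %s %s' \
  first \
  second \
  third)

# Prints the three arguments joined by single spaces.
echo "$joined"
```

If one of those backslashes is missing, or has a space after it, the shell treats the following line as a separate command, which is when the "command not found" style errors tend to show up.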