An off-topic reply has been split into a new topic: Using BOLD database
Please keep replies on-topic in the future.
An off-topic reply has been split into a new topic: Using BOLD database
Please keep replies on-topic in the future.
An off-topic reply has been split into a new topic: error while using rescript
Please keep replies on-topic in the future.
Hello, I use primers from “ An improved method for utilizing high-throughput amplicon sequencing to determine the diets of insectivorous animals”, and I find that you use different reverse primers. One is 5 '- GGWACTAATCATTTCAAATCC-3', and your is 5 '- ggatttggaaattgagtwcc-3'. Can I use the "bold_anml_classifier. qza" classifier provided by you for taxonomy annotation.Is that work?
Hi @PanZhu921,
I would not suggest doing this, as the primer sequences can affect how these amplicon regions are extracted from the sequence reference database. Potentially affecting your ability to classify your reads appropriately.
I would use your own primers to extract the amplicon region. You can also supplement parts of the BOLD tutorial, with this tutorial.
Hi @devonorourke,
Thanks for your detail instruction of building the COI database! I am trying to use it to build the COI database for Diptera, but currently stuck at the step:
qiime rescript dereplicate
--i-sequences bold_ambi_hpoly_length_filtd_seqs.qza
--i-taxa bold_rawTaxa.qza
--p-mode 'super'
--p-derep-prefix
--p-rank-handles 'greengenes'
--o-dereplicated-sequences bold_derep1_seqs.qza
--o-dereplicated-taxa bold_derep1_taxa.qza
I received this error:
Plugin error from rescript:
Parameter 'rank_handles' received ['greengenes'] as an argument, which is incompatible with parameter type: List[Str % Choices('disable')] | List[Str % Choices('domain', 'superkingdom', 'kingdom', 'subkingdom', 'superphylum', 'phylum', 'subphylum', 'infraphylum', 'superclass', 'class', 'subclass', 'infraclass', 'cohort', 'superorder', 'order', 'suborder', 'infraorder', 'parvorder', 'superfamily', 'family', 'subfamily', 'tribe', 'subtribe', 'genus', 'subgenus', 'species group', 'species subgroup', 'species', 'subspecies', 'forma')]
Debug info has been saved to /dev/shm/jobs/39316550/qiime2-q2cli-err-f9wqntj8.log
Looks like it does not like "greengenes". Could you please help me with that? That would be much appreciated!
Thank you
Kind regards,
Toan
Hi @Toan,
Thanks (as usual!) with some insight from @SoilRotifer, I've learned that part of the error you're seeing is because my tutorial methods have since been updated, and the --p-rank-handles 'greengenes'
parameter is no longer the method of choice. What I would have mentioned, using the old version of RESCRIPt that existed when I was putting this tutorial together would be to point out that the taxonomic divisions that your various labels fall into - are outside of the expected groupings that the greengenes
rank-handle style is looking for. These taxonomic "groupings" are referred to as rank-handles
. You could see the old rank-handle methods that were permitted in this code here, however, this my not exactly be your issue if you are using an updated version of QIIME2 and RESCRIPt.
Instead, the new code handling taxonomic ranks is perfectly fine with these expanded rank-handles, including the ones listed in your error message. Thus, I believe the trick for you is to just drop the --p-rank-handles 'greengenes'
argument altogether, and instead define the ranks you have directly. Don't quote me on that very last bit about defining your ranks directly - I haven't used the updated RESCRIPt version, and others in the forum can likely point you exactly what your next steps are if you can't find them directly from the help command with:
rescript dereplicate --help
or
qiime rescript merge-taxa --help
To conclude, I suspect the issue is that the instructions in the tutorial are now outdated for certain input taxonomy files, like yours might be. Nevertheless, RESCRIPt is now indeed capable of handling these expanded rank-handle types, and you just need to adjust that --p-rank-handles
parameter accordingly to resolve the error message you are receiving.
Hi @cjfields,
Your suggestion is actually good to import the bold database to qiime2, but how can you deal with the next steps when it requires type FeatureData[AlignedSequence]?
Thanks,
Toan
Hi @Toan,
I'm sorry, but I do not understand your questions.
Good luck,
Devon
Hi @devonorourke,
So sorry for the confusion here !
I was having a problem to import my bold database with the type 'FeatureData[AlignedSequence]' into Qiime2 because it has gaps and Chris Field suggested to use seqkit to remove the gaps before importing to Qiime2 under 'FeatureData[Sequence]'. I have tried this and it works. However, when I used the output .qza to continue with your tutorial on REScript, for example, when I run this command "qiime rescript degap-seqs", it did not work because it requires the input Qiime artifact qza under 'FeatureData[AlignedSequence]' type.
I replied to his comment, not sure why it comes down to you. Hopefully I am not making you more confused
.
Thanks,
Toan
Hi @Toan,
The reason why you can't import as a FeatureData[Sequence]
or FeatureData[AlignedSequence]
is that the file from BOLD is neither. See earlier in this thread. For whatever reason, there are gaps in some sequences but not others.
This creates a conundrum, as you can't import as FeatureData[Sequence]
as there is a sequence within the file that contains a gap. So, we then decide to import as a FeatureData[AlignedSequence]
, but we can't because we're not actually importing an alignment. That is, in an alignment, every sequence should be the same exact length, i.e. the nucleotides and the gaps for each sequence should sum to the same string length, hence this error message:
This is why the seqkit command was used, to remove those spurious gaps in a few sequences, which were in fact, not in a sequence alignment. If you were able to import an actual alignment as FeatureData[AlignedSequence]
, then you can use qiime rescript degap-seqs ...
. This command will take an alignment and remove all gap characters to, essentially, "unalign" the sequences. That is, convert the alignment to FeatureData[Sequence]
.
Hopefully, this helps to clarify things.
Hi @SoilRotifer,
Thanks a lot for clarifying this! I have the same problem that there are some spurious gaps in some sequences. I tried to use seqkit to remove these gap, and import as 'FeatureData[Sequence]'. Then, I used "qiime alignment mafft" to do alignment and convert it to QIIME artifact under the type FeatureData[AlignedSequence]. However, this command requires huge memory. I set 100GB but still failed to run
Do you have any suggestions on this?
Thank you,
Toan
No problem.
Did you try using the --p-parttree
option of mafft? That might help.
-Mike
Hi @SoilRotifer
I tried to add --p-parttree, but it keeps showing this error: Missing option '--o-alignment'. ("--output-dir" may also be used), even i already had it in the command.
Here is my command:
qiime alignment mafft
--i-sequences $input/COins_Seqs.qza
--p-parttree
--output-dir $output/alligned_COins_Seqs.qza
I tried either --o-alignment' or "--output-dir", it gave me the same error. I did not get this error when I did not use --p-parttree
Can you please check if i need to add anything else?
Many many thanks,
Toan
Hi @Toan,
Are you adding a \
at the end of each line? Remember when spreading a command over multiple lines in the terminal, you need to add \
at the end of each line. Otherwise it will execute each line as if it is a separate command. This is the most common reason for the error message you're seeing. So, your command should look like this:
qiime alignment mafft \
--i-sequences $input/COins_Seqs.qza \
--p-parttree \
--output-dir $output/alligned_COins_Seqs.qza
I assume you've appropriately set up your $input
and $output
variables.
Hi @SoilRotifer,
You are absolutely right! when I added --p-parttree, I completely forgot to add \ at the end. Then, I gave a try without --p-parttree option with 300GB of memory but it still failed because of OOM!
I just set another job with --p-parttree with 300GB for 96 hours. Hopefully it will be enough. I has been running well till now. Wish me luck!
Thanks a lot for your kind support!
Toan