Fungal ITS analysis tutorial

Thanks @angrybee! Could you please clarify: is this a newer release of UNITE that has that issue? Please let us know the version. Thanks!

I use version 8, dated 02.02.2019, and the developer files.

Maybe a side effect of this change:

[q2-types] @Oddant1 Made validation of FASTA files more robust.

There was a problem importing ../../data/classifiers_trained_2019.7_190731/qiime_ver8_99_02.02.2019_dev_uppercase.fasta:
../../data/classifiers_trained_2019.7_190731/qiime_ver8_99_02.02.2019_dev_uppercase.fasta is not a(n) DNAFASTAFormat file:
Invalid characters on line 10274 (does not match IUPAC characters for a DNA sequence).
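
The error reports only the first offending line. As a rough sketch, the helper below lists every sequence line that would likely fail the same check. The function name `find_invalid_lines` is hypothetical, and the character set is an assumption (uppercase IUPAC DNA codes only); the validator's exact alphabet is not quoted in this thread.

```shell
# Print the line numbers of sequence lines (those not starting with ">")
# that contain anything other than uppercase IUPAC DNA codes.
# Assumption: lowercase letters, spaces, tabs and carriage returns are invalid.
find_invalid_lines() {
  awk '!/^>/ && /[^ACGTRYKMSWBDHVN]/ { print NR }' "$1"
}

# Usage (hypothetical file name):
#   find_invalid_lines my_refs.fasta | head
```

Piping the result through `head` is usually enough to see whether the problem is lowercase bases, trailing blanks, or something else.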

You are right! Looks like the upgraded FASTA validator is doing its job!

I have edited the post above to include your fix, and credit you.

I think you can skip awk. The sed command uppercases every line that does not start with > and strips trailing blanks from each line.

The sed command is not replacing all lowercases in the sequences. I am happy to drop the awk if you send a fixed sed command, but it's not a big deal — the chained command takes seconds.

A post was split to a new topic: Cutadapt trimming in ITS tutorial

2 posts were split to a new topic: UNITE error in Developer DB

Hi,

I’m having trouble with the above sed command. It is adding a “U” to the beginning of each of my sequences, which throws an error when I try to create the classifier.

I’m new to sed so any help would be appreciated!

Here is the sed command I tried running, the only difference should be my file names:
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' unite_db_files/sh_refs_qiime_ver8_97_02.02.2019.fasta | sed -e '/^>/!s/\(.*\)/\U\1/;s/[[:blank:]]*$//' > unite_db_files/sh_refs_qiime_ver8_97_02.02.2019_uppercase.fasta

I played around with the command and if I remove /^>/!s/\(.*\)/\U\1/ (which is the portion adding the U) from the command it doesn’t add any U’s to my sequences and appears to retain the separation between the header and sequence portions of the file. The classifier step also runs without error.

Have I broken something I’m not aware of and my taxonomy classifications will be off or is it ok to use the command with the alteration I made?

Thanks,
Samantha

Thanks for reporting @saatkinson!

I am guessing this is breaking because of the new UNITE release (this was written based on an older UNITE release).

I think your edits are fine… I have tested using this command and it fixes the lowercase and blank space issues for this release:

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' developer/sh_refs_qiime_ver8_99_02.02.2019_dev.fasta | tr -d ' ' > developer/sh_refs_qiime_ver8_99_02.02.2019_dev_uppercase.fasta

I have updated the tutorial above to replace that command.
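
One possible explanation for the stray "U", not confirmed in this thread, is that `\U` in a sed replacement is a GNU extension, so other sed implementations may emit it as a literal U. The sketch below avoids sed entirely and does the cleanup in a single awk call. It is a variant, not the tutorial's command: unlike `tr -d ' '`, it leaves header lines untouched, and it also strips tabs and carriage returns from sequence lines. The function name `clean_fasta` is hypothetical.

```shell
# Uppercase sequence lines and strip spaces, tabs and carriage returns
# from them. Header lines (starting with ">") are printed unchanged.
clean_fasta() {
  awk '/^>/ { print; next }
       { gsub(/[ \t\r]/, ""); print toupper($0) }' "$1"
}

# Usage (hypothetical file names):
#   clean_fasta refs.fasta > refs_uppercase.fasta
```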

A post was split to a new topic: Feature Classifier version error

An off-topic reply has been split into a new topic: importing paired-end fungal ITS reads

Please keep replies on-topic in the future.

An off-topic reply has been split into a new topic: Newest ITS tutorial

Thanks for your really helpful tutorial. One issue I had when running it with my own data was that running cutadapt in one step didn't trim the adapters correctly. Running it in two steps seems to work, though. I think there have been a few other reports of this on the forum.

In case you're interested in the original issue and solutions, see these topics:

ITS cutadapt trimming of primer and reverse complement of the reverse primer
cutadapt / trim-paired / option "front" and "adapter"
Cut-adapt trim paired - different results when primers separate vs linked
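
Those topics involve passing the reverse complement of a primer to cutadapt. As a minimal sketch, a reverse complement can be computed in the shell as follows. The function name `revcomp` is hypothetical, and the example primer (the widely used ITS2 primer) is illustrative rather than taken from this thread.

```shell
# Reverse-complement a nucleotide sequence, including IUPAC ambiguity codes.
# S, W and N are their own complements, so tr leaves them unchanged.
revcomp() {
  printf '%s\n' "$1" | awk '{
    out = ""
    # Walk the sequence from the last base to the first
    for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
    print out
  }' | tr 'ACGTRYKMBDHVacgtrykmbdhv' 'TGCAYRMKVHDBtgcayrmkvhdb'
}

# Usage:
#   revcomp GCTGCGTTCTTCATCGATGC   # prints GCATCGATGAAGAACGCAGC
```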

2 off-topic replies have been split into a new topic: analyzing paired-end fungal ITS sequence data with QIIME 2

An off-topic reply has been split into a new topic: Link BIOM table with taxonomy

An off-topic reply has been split into a new topic: high proportion of unclassified_Fungi with classifier trained on UNITE database

Hi there,

I am assuming this is still the most up-to-date tutorial for processing ITS sequences?

I am wondering if this tutorial includes the ITS extraction step using the Q2-ITSxpress plugin or if that is a separate tutorial?

Also, I have read about some issues in this post with cutadapt not trimming the primers properly. My samples do not have additional adapters, so I will not be using --p-adapter-f and --p-adapter-r. Is it still recommended to use ^ in the --p-front-f option to anchor my primers?
Thanks a bunch.

Hi @emmlemore,

Yes, unfortunately we have not had time to release a more complete tutorial, but the options described in this tutorial are still current.

See the note at the top of the tutorial for a link to a separate tutorial using q2-itsxpress.

Good luck!

Hi @Nicholas_Bokulich, thank you for your response.

Regarding the Q2-ITSxpress plugin, this tutorial does not have the ITS extraction step. Apologies for my naivety, as I am a newbie, but is that step necessary, or is the primer trimming in this tutorial sufficient?

Thank you.

Hi @emmlemore,

The q2-itsxpress tutorial does have an ITS extraction step with the qiime itsxpress trim-pair-output-unmerged command. This tutorial does not.

They are two ways to achieve similar goals. ITSxpress trims to the ITS domain, removing the adjacent rRNA gene domains. But the primers are usually situated quite close to the ITS, with only a bit of the rRNA gene domain present, so the difference is usually minimal. Which method you choose depends on your primer set and experimental goals, but both are valid options.

Good luck!