ITSxpress trims too much of my reads

(Einar Marius Hjellestad Martinsen) #1

Dear QIIME developers, other QIIME users and @Adam_Rivers!

We have some fungal data from human samples, and we are trying to compare different trimming methods. It seems that ITSxpress does well in terms of number of ASVs and taxonomic assignment. However, some ASVs are assigned only to Fungi at kingdom level and nothing below that, and some are assigned to Fungi at kingdom level and then unidentified. We blasted these ASVs using https://blast.ncbi.nlm.nih.gov/Blast.cgi and https://www.ncbi.nlm.nih.gov/sites/batchentrez, and found some worrying results. For instance, Candida (albicans, dubliniensis, tropicalis) and Cyberlindnera were present. Trimmomatic assigned these down to species level. Hence we compared the sequences from ITSxpress and Trimmomatic, and the blast result is shown below (a screenshot):

“full” is Trimmomatic, and “itsxpress” is ITSxpress. 18S should be from <1…190, ITS1 from 191…330, and 5.8S from 331…487.
The Trimmomatic sequence spans the whole ITS1 region. ITSxpress seems to trim too much of the ITS1 region, resulting in a sequence ranging from position 60 to 255. We aligned the two sequences, and the ITSxpress sequence matched only the first half of the Trimmomatic sequence; the second half had been trimmed.

We used ITSxpress in QIIME2 with taxa F, region ITS1 and cluster ID default (0.995). We are rerunning now with cluster ID 1.0.

Why does ITSxpress trim so much of the ITS1 end?

Please also explain the difference between cluster ID 0.995 and 1.0

Thank you!


(Adam Rivers) #2

In an older version of ITSxpress about 0.2%-0.5% of reads were mistrimmed due to an error parsing HMM reports. This looks like that error. See this closed issue for more details: https://github.com/USDA-ARS-GBRU/itsxpress/issues/8. Updating to the newest versions of ITSxpress and Q2-ITSxpress should solve it.

The cluster ID lets you temporarily cluster reads that are very similar and use the start and stop sites from the representative sequence to trim the others. Setting the cluster ID to 0.995 instead of 1.0 speeds up the trimming with a small loss in sensitivity. I’ve changed the default setting to 1.0 because for most use cases the speed increase is not really necessary.
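To compare the two settings side by side, the trimming can be run once per cluster ID. The following is only a dry-run sketch: it prints the commands instead of executing them. The file names demux.qza and trimmed-cid-*.qza are placeholders, and the parameter names (--i-per-sample-sequences, --p-region, --p-taxa, --p-cluster-id, --o-trimmed) are written from memory of the Q2-ITSxpress plugin, so check them against qiime itsxpress trim-pair-output-unmerged --help for your installed version.

```shell
# Dry run: print one trimming command per cluster ID (nothing is executed).
# demux.qza and the output names are placeholders, not files from this thread.
print_trim_commands() {
    for cid in 0.995 1.0; do
        echo qiime itsxpress trim-pair-output-unmerged \
            --i-per-sample-sequences demux.qza \
            --p-region ITS1 \
            --p-taxa F \
            --p-cluster-id "$cid" \
            --o-trimmed "trimmed-cid-$cid.qza"
    done
}

print_trim_commands
```

Once the printed commands look right, removing the echo runs them for real. Setting the cluster ID explicitly also avoids depending on whichever default the installed version happens to use.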


(Einar Marius Hjellestad Martinsen) #3

Thanks @Adam_Rivers!

We used old versions of QIIME2 and ITSxpress, with the default cluster-ID. So that means 0.995, or were the old versions also updated to default=1.0 when you made the change?

We have now updated to the latest versions, and have repeated the analyses both with cluster-ID 0.995 and 1.0.

Cluster-ID 1.0 and updated versions:
When we blast those assigned to Fungi at kingdom level and nothing more, we still find some Fungi (Malassezia, Candida, Cyberlindnera). We compared a sequence (a Candida) to the corresponding sequence from the old versions, and they are trimmed exactly the same.
Additionally, we lost 7 genera by using the updated versions. None were added. We didn’t like that. Do you know why?

Cluster-ID 0.995 and updated versions:
Finds the same genera as cluster-ID 1.0 using updated versions, but some more ASVs (assigned to already existing genera).

Extra comment: We typed qiime itsxpress trim-pair-output-unmerged --help, and according to the output, the default cluster-ID is still 0.995 for ITSxpress v. 1.7.2. Is this correct, or are we missing something?


(Adam Rivers) #4

Hi Einar,

Thanks for following up.

I’m happy to investigate this more. I can’t tell too much from the screenshot above, and the links did not take me to your Blast reports. I can’t make your description in the text match up with what I see in the screenshot. You can PM me and set up a way to send me the data and the command you ran, and I can see if I can replicate it. The ITS1 region for Candida should be around 110-140 depending on the strain and species, so the “Full” sequences look too long. It is likely that a trimmed sequence will score lower in a gapped alignment than an untrimmed sequence. Where are you trimming with Trimmomatic and what is your sequencing primer set?

I do suspect you may still be using an old version of one of the ITSxpress programs. Running conda list within your correct qiime2 environment and looking at the results may shed some light on it. The independent versioning of q2-itsxpress and itsxpress and multiple installation channels mean that old versions can sometimes be called accidentally. That was happening to me and I ended up installing in a clean conda environment to fix everything.
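As a sketch of that check, the conda list output can be narrowed to just the ITSxpress rows. The helper name itsxpress_rows is made up for illustration; conda list and grep are the only real commands involved, and the block does nothing if conda is not on the PATH.

```shell
# Keep only the rows of a `conda list` listing (read from stdin) that mention itsxpress.
# itsxpress_rows is a hypothetical helper name, not part of conda or ITSxpress.
itsxpress_rows() {
    grep -i 'itsxpress' || echo "no itsxpress package found"
}

# Run this inside the activated QIIME 2 environment.
if command -v conda >/dev/null 2>&1; then
    conda list | itsxpress_rows
fi
```

Both itsxpress and q2-itsxpress should show up, each with its own version and channel, which makes a mismatch between the two easy to spot.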

I’m planning to combine the two packages into one single package to prevent some of this updating and versioning confusion, but the US Government shutdown has meant I couldn’t make any improvements. I hope to do upgrades on the package in a few weeks.

By the way, cluster_id 1.0 is the default for ITSxpress 1.7.2 but not Q2-ITSxpress 1.7.2 currently.


(Einar Marius Hjellestad Martinsen) #5

I am sorry for the late reply. I will come back to this. I am currently doing some more testing.

According to the conda list, I am using q2-itsxpress 1.7.2 and “standalone itsxpress” v 1.7.2.

But looking into the following path: /…/miniconda3/pkgs/
three itsxpress folders are present: “itsxpress-1.7.1-py35_0”, “itsxpress-1.7.2-py35_0” and “itsxpress-1.7.2-py36_0”. I reanalysed the data using ITSxpress outside QIIME2, and the results are identical. How can I be sure that I used the 1.7.2 version? Is it OK if the conda list tells me so, or is there a possibility that I am still using an old version?

Also, the third line in the log from ITSxpress outside QIIME contains: java -Djava.library.path=/…/miniconda3/envs/qiime2-2019.1/opt/bbmap-38.22-0/jni/
Does this mean that I am still using ITSxpress inside QIIME2? That is, should the path in the third line rather be something like …/miniconda3/pkgs/ ?

I am sorry I mix things up and get confused. I am sadly no computer expert :tired_face:


(Evan Bolyen) #6

You are doing better than most, and looking in all the places to boot!

That pkgs/ folder is the conda package cache (where it lives depends on your installation, but it’s always called something like pkgs/). This directory accounts for 99% of the problems with conda (caching is hard).

In this case, it probably isn’t causing any issues. You have just at some point installed different versions, but that doesn’t mean those are the versions that your environment is using.

You are correct that it is running that from inside a QIIME 2 environment. How are you running itsxpress? I suspect you are running it while still inside the environment (just not via a QIIME 2 command, which is fine).
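One quick way to see which itsxpress the shell actually resolves is to compare its path with CONDA_PREFIX, the variable conda sets when an environment is activated. This is only a sketch, and where_is is a made-up helper name.

```shell
# Classify a path as inside or outside the active conda environment.
# where_is is a hypothetical helper; CONDA_PREFIX is set by conda on activation.
where_is() {
    prefix="${CONDA_PREFIX:-}"
    if [ -z "$1" ]; then
        echo "not found"
    elif [ -n "$prefix" ] && [ "${1#"$prefix"/}" != "$1" ]; then
        echo "inside active environment"
    else
        echo "outside active environment"
    fi
}

# Resolve itsxpress on the current PATH and report where it lives.
where_is "$(command -v itsxpress)"
```

If this reports "inside active environment" while the QIIME 2 environment is active, the standalone run is using the copy installed in that environment, whatever else is sitting in pkgs/.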


If you’d like to test ITSxpress without QIIME 2 getting in the way here, you should create a new conda environment:

conda create -n itsxpress -c conda-forge -c bioconda -c defaults itsxpress=1.7.2

Then, to be extra sure that nothing is getting in the way (such as some Java environment variables), start a new terminal session and run:

source activate itsxpress

This should ensure that 1.7.2 is the only version being used, and that there’s nothing weird going on with environment variables which change library-paths. You could provide the output of:

env

if you would like us to confirm everything looks right inside that environment (make sure to read it first and redact anything that you know to be sensitive, but there’s probably nothing sensitive in there).
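If the full env output feels like too much to share, a filtered version may be enough. The sketch below keeps only conda-, Java-, and path-related variables. Which variables matter is an assumption on my part rather than an official checklist, and relevant_env is a made-up helper name.

```shell
# Keep only conda-, Java-, and path-related variables from `env`-style input on stdin.
# The variable list is an assumption about what is relevant here; adjust as needed.
relevant_env() {
    grep -E '^(CONDA_[A-Za-z0-9_]*|JAVA_[A-Za-z0-9_]*|_JAVA_OPTIONS|LD_LIBRARY_PATH|PATH)=' | sort
}

env | relevant_env
```

The output is still worth reading through before posting, since paths can contain user names.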


(Einar Marius Hjellestad Martinsen) #7

Thanks @ebolyen!

That was really clarifying :grinning: And the env output looks fine to me.

You are right, but when I discovered that, I also did it outside the QIIME environment (with identical results). Now I will try to run itsxpress inside the new environment and compare the results :smile:

