Trim ITS2 with ITSxpress and cutadapt

Hi, I am trying to trim ITS2 sequences using ITSxpress and cutadapt, but the sequence counts and the resulting taxonomy differ greatly between the two methods.

ITS2 primer: gITS7/ITS4

gITS7: GTGAATCATCGARTCTTTG

ITS4: TCCTCCGCTTATTGATATGC

Trim with cutadapt:

time qiime cutadapt trim-paired --i-demultiplexed-sequences /Users/houjianjian/ajmal_fungi/demux-paired-end.qza --p-front-f GTGAATCATCGARTCTTTG --p-front-r TCCTCCGCTTATTGATATGC --p-adapter-f TCCTCCGCTTATTGATATGC --p-adapter-r GTGAATCATCGARTCTTTG --o-trimmed-sequences demux-paired-end-front-end-remove-primers.qza

Saved SampleData[PairedEndSequencesWithQuality] to: demux-paired-end-front-end-remove-primers.qza

real 2m19.795s

user 4m12.330s

sys 0m8.096s

demux-paired-end-front-end-remove-primers.qzv (296.7 KB)

dada2-denoise:

time qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-front-end-remove-primers.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --output-dir front-end-remove-primers-dada2-denoised

denoising_stats.qzv (1.2 MB)

table.qzv (441.7 KB)

representative_sequences.qzv (349.7 KB)

Trim with ITSxpress:

qiime itsxpress trim-pair-output-unmerged --i-per-sample-sequences /Users/houjianjian/ajmal_fungi/demux-paired-end.qza --p-region ITS2 --p-taxa F --o-trimmed trimmed.qza --p-cluster-id 1.0

trimmed.qzv (293.1 KB)

qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --output-dir dada2out

table.qzv (439.9 KB)

representative_sequences.qzv (335.5 KB)

denoising_stats.qzv (1.2 MB)

Sequence counts after trimming with ITSxpress are very low. Every sample had more than 56,000 input sequences, but after ITSxpress trimming many samples have fewer than 10,000, some have only around 2,000 to 3,000, and some have fewer than 1,000. Does this mean my sequence quality is very bad? DADA2 denoising then removed few additional sequences, so that step looks OK.

With cutadapt, in contrast, trimming removed few input sequences, but DADA2 discarded many sequences during denoising. The final counts are still higher than with ITSxpress. It seems my samples' sequence quality really is not good.

I would like to know what the difference is between trimming with ITSxpress and with cutadapt, and which is better for my samples.

For taxonomic analysis I used a classifier that I trained myself on UNITE (version 8, release 2018-11-18).

taxonomy trimming by cutadapt: taxonomy-barplot.qzv (393.7 KB)

taxonomy trimming by ITSxpress: taxa-barplot.qzv (513.4 KB)

The most abundant OTUs in the taxonomy from cutadapt trimming are unidentified even at the phylum level, whereas the taxonomy from ITSxpress trimming looks OK.

I use qiime2-2019.4, natively installed on a Mac pro (2.7 GHz Intel Core i5, 8 GB 1867 MHz DDR3).

Now I am very confused about which method I should use and which is correct.

Please help me.

Thank you very much!

Welcome to the forum @JIANJIAN_HOU!

Here is what is going on:

  1. you have lots of non-target sequences in your data. These may be plants or other eukaryotes that are amplified by that primer pair.
  2. q2-itsxpress filters out those sequences because they do not resemble fungal ITS. Hence, q2-itsxpress loses many more sequences at filtering, but the taxonomic profiles look good (no or very few unassigned sequences)
  3. q2-cutadapt just trims the primers and does not filter out those sequences. So you get more sequences but these non-target reads are apparent in the taxonomic profiles: the unassigned sequences and those classified only to kingdom or phylum level are almost certainly non-target (e.g., plant) sequences.
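One rough way to check how many features fall into the poorly classified group described in point 3 is to count taxonomy strings with and without a class-level label. This is only a sketch under assumptions: it expects a taxonomy TSV with a header row and the columns Feature ID, Taxon, Confidence, it assumes UNITE-style `c__` rank prefixes with `c__unidentified` as the placeholder label, and the function name `count_class_level` is made up for illustration.

```shell
# Count features with and without a usable class-level label in a taxonomy TSV.
# Assumed layout: header row, then tab-separated Feature ID, Taxon, Confidence.
count_class_level() {
  awk -F '\t' '
    NR == 1 { next }                      # skip the header row
    # "c__" followed by a non-empty name that is not the "unidentified" placeholder
    $2 ~ /c__[^;]/ && $2 !~ /c__unidentified/ { ok++; next }
    { bad++ }                             # Unassigned, kingdom-only, phylum-only, etc.
    END { printf "with_class=%d without_class=%d\n", ok, bad }
  ' "$1"
}
```

For instance, after exporting a taxonomy artifact with `qiime tools export --input-path taxonomy.qza --output-path exported` (where `taxonomy.qza` is a placeholder name), `count_class_level exported/taxonomy.tsv` prints the two counts. Note that this counts features, not reads, so it says nothing about relative abundance.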

So which to use? Potentially either. I suspect that if you filter out all sequences without at least class-level classification, your taxonomic profiles would look very similar. But the q2-itsxpress results look good as they are now! It has been designed to take care of these non-target reads on the front end.
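The class-level filter suggested above could be sketched roughly like this with `qiime taxa filter-table`. The file names are placeholders (the thread does not name the classification output), the wrapper function `filter_class_level` is hypothetical, and whether `c__unidentified` is the right label to exclude is an assumption that depends on how the UNITE reference taxonomy was formatted.

```shell
# Keep only features classified to at least class level (a sketch, not a tested recipe).
# $1 = feature table, $2 = taxonomy artifact, $3 = output path (all placeholders).
filter_class_level() {
  # --p-include c__ keeps features whose taxonomy string contains a class rank;
  # --p-exclude c__unidentified drops the assumed UNITE placeholder label.
  qiime taxa filter-table \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --p-include c__ \
    --p-exclude c__unidentified \
    --o-filtered-table "$3"
}
```

It might be called as `filter_class_level table.qza taxonomy.qza table-class-level.qza`, and the filtered table can then be passed to `qiime taxa barplot` to compare against the q2-itsxpress profile.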

good luck!


Thanks a lot! @Nicholas_Bokulich
That was suddenly enlightening!
There must be many non-target sequences in my data.
I will try filtering out sequences without class-level classification. This has been a very good learning experience for me.

Now I have another question: when I prepare fungal sequencing data, how can I avoid amplifying plants or other eukaryotes? Should I use more specific primers? I am sorry if I should not ask this question here.

Thank you! Thanks to QIIME team!


Yes, choose a different primer set