>90% of sequences are unidentified?!? WHY?

Hello all q2 experts,

Can someone please kindly give me some advice on my situation.
Here is a summary of my project:

I am wondering why there is such a large percentage of non-target amplification or error in all of my samples.


Welcome @NPK!

What sorts of samples are you analyzing? You could be amplifying e.g., plant DNA… I used to run into this problem frequently when using similar primers to examine plant-associated microbial communities.

What taxonomy classification method did you use? Another possibility is that your reads are in the wrong orientation, which will confuse classify-sklearn. Try the BLAST- or vsearch-based classifiers in QIIME 2. If you get the same result, this is probably non-target amplification.
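A read in the "wrong orientation" is simply the reverse complement of what the reference database contains. The minimal sketch below is a hypothetical heuristic, not how classify-sklearn or any QIIME 2 classifier works internally: it counts the k-mers a read shares with a reference in each orientation and reports whichever matches better. The function names, k = 8, and the toy sequence are assumptions for illustration only.

```python
# Hypothetical orientation check: compare k-mer overlap of a read and of its
# reverse complement against a reference sequence. Illustration only; this is
# not the algorithm used by classify-sklearn or any QIIME 2 plugin.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(read: str, reference: str, k: int = 8) -> str:
    """Return 'same' or 'reverse-complement', whichever shares more k-mers."""
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(read, k) & ref_kmers)
    reverse_hits = len(kmers(reverse_complement(read), k) & ref_kmers)
    return "same" if forward_hits >= reverse_hits else "reverse-complement"


if __name__ == "__main__":
    # Toy reference and read (made up for the demo, not a curated sequence).
    reference = "CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGG"
    read = reference[5:45]
    print(guess_orientation(read, reference))                      # same
    print(guess_orientation(reverse_complement(read), reference))  # reverse-complement
```

A flipped read still carries the same information; it just has to be compared against the database in the matching orientation, which is why a classifier that assumes one orientation can fail on reads in the other.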

The ITSxpress results you see could indicate either of these problems, but maybe @Adam_Rivers has some more ideas?

Hi Nicholas,
Yes, you are correct. I am analyzing fungal communities from eucalyptus leaves. I thought ITS1F was supposed to prevent plant DNA amplification?

Yes, I used classify-sklearn. Thanks for your recommendation; I will try the other classifiers and see how it goes.

ITSxpress is most likely not merging most of your reads because it has pretty high quality thresholds, but it is a bit hard to tell from the information I can see in the post.

I’d second the suggestion to BLAST a subset of reads and try to get a better sense of what’s happening. Plant contamination seems like the most likely culprit.
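If the representative sequences are exported to FASTA (e.g., with qiime tools export), a small helper like the hypothetical sketch below can draw a reproducible random subset to paste into BLAST. The function names, the seed, and the demo records are assumptions; the parser only handles plain FASTA.

```python
import random
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def subsample_records(lines: Iterable[str], n: int, seed: int = 42) -> list[tuple[str, str]]:
    """Return up to n randomly chosen records; the same seed gives the same subset."""
    records = list(read_fasta(lines))
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


def to_fasta(records: Iterable[tuple[str, str]]) -> str:
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Hypothetical in-memory records standing in for an exported FASTA file.
    demo = [">feat1", "ACGTACGT", "TTGA", ">feat2", "GGCCAATT", ">feat3", "CATCAT"]
    print(to_fasta(subsample_records(demo, n=2)))
```

With a real export you would pass an open file handle instead of the demo list, pointing at whatever FASTA file the export wrote. A fixed seed keeps the subset identical between runs, so BLAST results can be compared later.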


I tried again with the UNITE database that includes all eukaryotes. Well, it is true that all those bastards are plant contamination. Thank you @Nicholas_Bokulich @Adam_Rivers

Now, could I have some advice on how to avoid this situation? How can I lower the chance of amplifying plant DNA? Is it all decided at the library prep step?


You have already used the best method: choose primers that do not amplify plant DNA. ITS1F is supposed to do that, but obviously is not doing its job!

Library prep is where most of this should happen, e.g., by removing plant matter from your samples prior to DNA extraction if you can, perhaps by rinsing the leaves and then filtering.

When I have done plant-associated microbiome work, I have just tried to increase the sequencing depth (i.e., put fewer samples on a single sequencing run) so that I can afford to lose some of my sequences to non-target hits. In some samples I would lose 90% of my sequences! And some samples could not be recovered at all. But if you have enough non-plant sequences left over, you can just proceed with the leftovers.
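The trade-off is simple arithmetic: if a fraction f of reads is non-target, keeping T target reads needs about T / (1 − f) raw reads per sample. The sketch below works this out with integer percentages to avoid floating-point rounding; the 5,000-read target and the 10-million-read run size are hypothetical numbers, not recommendations for any platform.

```python
def reads_needed(target_reads: int, nontarget_percent: int) -> int:
    """Raw reads per sample so that target_reads remain after losing nontarget_percent.

    Integer arithmetic for ceil(target * 100 / (100 - percent)).
    """
    if not 0 <= nontarget_percent < 100:
        raise ValueError("nontarget_percent must be in [0, 100)")
    return -(-target_reads * 100 // (100 - nontarget_percent))


def samples_per_run(run_reads: int, reads_per_sample: int) -> int:
    """How many samples fit on one run at the given per-sample depth (rounded down)."""
    return run_reads // reads_per_sample


if __name__ == "__main__":
    run_reads = 10_000_000  # hypothetical run output; check your own platform
    for loss in (0, 50, 90):
        depth = reads_needed(5_000, loss)
        print(f"{loss}% non-target -> {depth} reads/sample, "
              f"{samples_per_run(run_reads, depth)} samples/run")
```

With these numbers, 0%, 50%, and 90% non-target reads require 5000, 10000, and 50000 reads per sample, i.e. 2000, 1000, and 200 samples per run. Losing 90% of reads therefore costs a tenfold reduction in how many samples fit on one run.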


This may be analogous: I have the same problem with low-biomass lung samples. To exclude reads from eukaryotic sources, you can add a quality-filter step where you essentially BLAST/vsearch your sequences against the reference sequences from your training set (here, Greengenes 99_otus). Once that generates hits and misses artifacts (99_hits.qza and 99_misses.qza), you can filter ALL of the “misses” out of your table and sequences.

I use this code:

qiime quality-control exclude-seqs \
  --i-query-sequences ~/id-filtered-seqs.qza \
  --i-reference-sequences ~/greengenes/trained.v4/99_otus/99_otus.qza \
  --p-method vsearch \
  --p-perc-identity 0.97 \
  --p-perc-query-aligned 0.97 \
  --p-threads 4 \
  --o-sequence-hits ~/99_hits.qza \
  --o-sequence-misses ~/99_misses.qza

Then filter your sequences file to exclude the misses (e.g., with qiime feature-table filter-seqs), and filter your feature table the same way (qiime feature-table filter-features).
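Conceptually, the filtering step boils down to set membership on feature IDs. The pure-Python sketch below only illustrates that logic on hypothetical in-memory data (a feature table as a dict of feature ID to per-sample counts); it is not QIIME 2's implementation, and the IDs and counts are made up.

```python
Counts = dict[str, int]  # sample ID -> read count


def exclude_features(
    table: dict[str, Counts], sequences: dict[str, str], misses: set[str]
) -> tuple[dict[str, Counts], dict[str, str]]:
    """Drop every feature ID in misses from both the table and the sequences."""
    kept_table = {fid: counts for fid, counts in table.items() if fid not in misses}
    kept_seqs = {fid: seq for fid, seq in sequences.items() if fid not in misses}
    return kept_table, kept_seqs


def sample_totals(table: dict[str, Counts]) -> Counts:
    """Total reads per sample across all features."""
    totals: Counts = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


if __name__ == "__main__":
    # Hypothetical data: asv1 stands in for a non-target "miss".
    table = {"asv1": {"S1": 90, "S2": 10}, "asv2": {"S1": 10, "S2": 40}, "asv3": {"S2": 50}}
    sequences = {"asv1": "ACGT", "asv2": "GGTT", "asv3": "TTAA"}
    kept_table, kept_seqs = exclude_features(table, sequences, misses={"asv1"})
    before, after = sample_totals(table), sample_totals(kept_table)
    for sample in before:
        print(sample, f"{after.get(sample, 0) / before[sample]:.0%} of reads retained")
```

Comparing per-sample totals before and after filtering shows which samples lost most of their reads (here S1 keeps only 10%), which helps decide whether a sample is still usable.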



That method is great for miscellaneous non-target DNA, but I would actually discourage this for ITS data, just because your non-target plant hits are still ITS sequences and you would need to figure out a reasonable threshold of sequence similarity (i.e., how dissimilar plant ITS is from fungal ITS) to use the exclude-seqs method.
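To make "threshold of sequence similarity" concrete, here is a simplified percent-identity calculation on two pre-aligned sequences (gaps written as '-'). It uses one common definition (identical columns divided by alignment columns); vsearch offers several definitions through its --iddef option, so treat this as an illustration, not what exclude-seqs computes. The example sequences are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) of two equal-length aligned sequences.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    base counts as a mismatch. One simplified definition among several.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
    print(percent_identity("ACG-T", "ACGAT"))            # 80.0
    # A 0.97 cutoff (as in --p-perc-identity 0.97) would reject both pairs.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC") >= 97.0)  # False
```

The difficulty with ITS is where to put the cutoff: plant and fungal ITS are both "ITS", so a single identity threshold against a mixed reference may not cleanly separate them.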

Instead, ITSxpress should do a good job of removing most plant reads. Any plant reads that still pass can be filtered out after taxonomy classification, using qiime taxa filter-table as shown in this tutorial. Something like this:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude k__Viridiplantae \
  --o-filtered-table table-no-plants.qza

or better yet (in case you hit multiple non-fungal kingdoms):

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-include k__Fungi \
  --o-filtered-table table-no-plants.qza
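For intuition about why --p-include k__Fungi is the safer of the two commands, here is a rough pure-Python imitation of substring ("contains") matching on taxonomy strings. It is a sketch, not QIIME 2's code, and the feature IDs and taxonomy strings are hypothetical.

```python
from typing import Iterable, Optional


def filter_by_taxonomy(
    taxonomy: dict[str, str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> set[str]:
    """Return feature IDs whose taxonomy string passes substring include/exclude terms."""
    include = list(include) if include is not None else None
    exclude = list(exclude) if exclude is not None else None
    kept = set()
    for fid, lineage in taxonomy.items():
        # Keep only features matching at least one include term (if given).
        if include is not None and not any(term in lineage for term in include):
            continue
        # Drop features matching any exclude term (if given).
        if exclude is not None and any(term in lineage for term in exclude):
            continue
        kept.add(fid)
    return kept


if __name__ == "__main__":
    taxonomy = {  # hypothetical classifications
        "f1": "k__Fungi; p__Ascomycota; c__Dothideomycetes",
        "f2": "k__Viridiplantae; p__Streptophyta",
        "f3": "k__Metazoa; p__Arthropoda",
        "f4": "Unassigned",
    }
    print(sorted(filter_by_taxonomy(taxonomy, exclude=["k__Viridiplantae"])))  # ['f1', 'f3', 'f4']
    print(sorted(filter_by_taxonomy(taxonomy, include=["k__Fungi"])))          # ['f1']
```

Excluding plants still lets other non-fungal and unassigned features through, whereas including only fungi drops everything that is not explicitly classified as fungal.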


Thanks Nick, I figured there should be caveats with ITS. I am not an expert on ITS.


Thanks everyone for all the advice. I really appreciate it.