>90% of sequences are unidentified?!? WHY?

Hello all q2 experts,

Can someone please give me some advice on my situation?
Herewith the summary of my project:

I am wondering why there is such a large percentage of non-target amplification or error in all of my samples.

Regards,
Namie

Welcome @NPK!

What sorts of samples are you analyzing? You could be amplifying e.g., plant DNA… I used to run into this problem frequently when using similar primers to examine plant-associated microbial communities.

What taxonomy classification method did you use? Another possibility is that your reads are in the wrong orientation, which will confuse classify-sklearn. Try the BLAST- or vsearch-based classifiers in QIIME 2. If you get the same result, this is probably non-target amplification.
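For anyone following along, trying the vsearch-based classifier could look roughly like the sketch below. The function name and all file names are placeholders I made up for illustration, and the reference artifacts are assumed to have been imported already. I pass `--output-dir` rather than naming individual outputs, because the set of outputs has changed between QIIME 2 releases.

```shell
# Sketch only: the function name and file names are hypothetical.
# Re-classify representative sequences with the vsearch consensus classifier.
# $1 = representative sequences (.qza)
# $2 = reference reads (.qza)
# $3 = reference taxonomy (.qza)
# $4 = output directory (must not exist yet)
classify_with_vsearch() {
  qiime feature-classifier classify-consensus-vsearch \
    --i-query "$1" \
    --i-reference-reads "$2" \
    --i-reference-taxonomy "$3" \
    --output-dir "$4"
}

# Example call (needs an activated QIIME 2 environment):
# classify_with_vsearch rep-seqs.qza unite-seqs.qza unite-taxonomy.qza vsearch-taxonomy
```

If the orientation is the suspect, classify-sklearn also has a `--p-read-orientation` option that can be set explicitly instead of relying on auto-detection.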

The ITSxpress results you see could indicate either of these problems, but maybe @Adam_Rivers has some more ideas?

Hi Nicholas,
Yes, you are correct. I am analyzing the fungal community from eucalyptus leaves. I thought ITS1F was supposed to prevent plant DNA amplification?

Yes, I used classify-sklearn. Thanks for your recommendation, I will try the other classifiers and see how it goes.

ITSxpress is most likely not merging most of your reads because it has pretty high quality thresholds, but it is a bit hard to tell from the information I can see in the post.

I’d second the suggestion to BLAST a subset of reads and try to get a better sense of what’s happening. Plant contamination seems like the most likely culprit.
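One possible way to pull out a small subset for a manual BLAST search is sketched below. It assumes the representative sequences have already been exported to a FASTA file; the helper name `fasta_head` is made up for this example. It simply takes the first N records, so it is not a random sample.

```shell
# Sketch only: fasta_head is a hypothetical helper, not part of QIIME 2.
# The FASTA file can be obtained with, for example:
#   qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
# which writes exported-rep-seqs/dna-sequences.fasta (rep-seqs.qza is a placeholder name).

# Print the first N records of a FASTA file (each record starts with ">").
# Usage: fasta_head N file.fasta
fasta_head() {
  awk -v n="$1" '
    /^>/ { seen++ }        # count record headers
    seen > n { exit }      # stop once record N+1 begins
    { print }              # otherwise pass the line through
  ' "$2"
}

# Example: save the first 20 records and paste them into a web BLAST search.
# fasta_head 20 exported-rep-seqs/dna-sequences.fasta > subset.fasta
```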

I tried again with the UNITE database that includes all eukaryotes. Well, it is true that all those bastards are plant contamination. Thank you @Nicholas_Bokulich @Adam_Rivers

Now, can I have some advice on how to avoid this situation? How can I lower the chance of amplifying plant DNA? Does it all come down to the library prep step?

You have already used the best method: choose primers that do not amplify plant DNA. ITS1F is supposed to do that, but obviously is not doing its job!

Library prep is where most of this should happen; e.g., you may be able to remove plant matter from your samples prior to DNA extraction, perhaps by rinsing leaves and then filtering.

When I have done plant-associated microbiome work I have just attempted to increase the sequencing depth (i.e., put fewer samples on a single sequencing run) so that I can afford to lose some of my sequences to non-target hits. In some samples I would lose 90% of my sequences! And some samples could not be recovered. But if you have enough non-plant sequences left over you can just proceed with the leftovers.
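As a rough back-of-the-envelope sketch of that trade-off: if you expect a given percentage of reads to be plant and want a certain number of non-plant reads left per sample, the raw depth to aim for can be estimated as below. The function name and the numbers in the example are illustrative assumptions, not values from this thread's data.

```shell
# Sketch only: raw_reads_needed is a hypothetical helper.
# Usage: raw_reads_needed TARGET_NONPLANT_READS EXPECTED_PLANT_PERCENT
# EXPECTED_PLANT_PERCENT must be an integer below 100.
raw_reads_needed() {
  target=$1
  plant_pct=$2
  keep_pct=$((100 - plant_pct))
  # Integer ceiling of target * 100 / keep_pct
  echo $(( (target * 100 + keep_pct - 1) / keep_pct ))
}

# Example: to keep 5000 non-plant reads when 90% of reads are plant,
# aim for about 50000 raw reads per sample:
# raw_reads_needed 5000 90   # prints 50000
```

Dividing the expected output of a run by that number gives a rough idea of how many samples to pool.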

@NPK

This may be analogous: I have the same problem with low-biomass lung samples. To exclude reads from eukaryotic sources, you can add a quality-control step where you essentially BLAST/vsearch your sequences against the reference file (99_otus.txt) from your training set. Once you have generated the hits and misses artifacts (hits.qza and misses.qza), you can filter ALL of the “misses” out of your table and sequences.

I use this code:

qiime quality-control exclude-seqs \
  --i-query-sequences ~/id-filtered-seqs.qza \
  --i-reference-sequences ~/greengenes/trained.v4/99_otus/99_otus.qza \
  --p-method vsearch \
  --p-perc-identity 0.97 \
  --p-perc-query-aligned 0.97 \
  --p-threads 4 \
  --o-sequence-hits ~/99_hits.qza \
  --o-sequence-misses ~/99_misses.qza

Then filter your sequences file by excluding the misses. Once that is done, filter your table the same way.
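The table-filtering step might look something like the sketch below, which passes the misses artifact as feature metadata and excludes those IDs. The function name and the table file names are placeholders; only `99_misses.qza` comes from the command above.

```shell
# Sketch only: drop_misses_from_table is a hypothetical wrapper.
# $1 = feature table (.qza)
# $2 = misses artifact from exclude-seqs (.qza)
# $3 = output path for the filtered table (.qza)
drop_misses_from_table() {
  qiime feature-table filter-features \
    --i-table "$1" \
    --m-metadata-file "$2" \
    --p-exclude-ids \
    --o-filtered-table "$3"
}

# Example call (needs an activated QIIME 2 environment):
# drop_misses_from_table table.qza ~/99_misses.qza table-no-misses.qza
```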

Ben

That method is great for miscellaneous non-target DNA, but I would actually discourage this for ITS data, just because your non-target plant hits are still ITS sequences and you would need to figure out a reasonable threshold of sequence similarity (i.e., how dissimilar plant ITS is from fungal ITS) to use the exclude-seqs method.

Instead, ITSxpress should do a good job of removing most plant reads. Anything that passes you can filter out after taxonomy classification, using qiime taxa filter-table as shown in this tutorial. Something like this:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude k__Viridiplantae \
  --o-filtered-table table-no-plants.qza

or better yet (in case you hit multiple non-fungal kingdoms):

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-include k__Fungi \
  --o-filtered-table table-no-plants.qza
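The commands above only filter the table. If you also want the representative sequences to match (e.g., before building a tree), the same include filter can probably be applied with qiime taxa filter-seqs. This is a sketch; the function name and the file names are placeholders.

```shell
# Sketch only: keep_fungal_seqs is a hypothetical wrapper.
# $1 = representative sequences (.qza)
# $2 = taxonomy (.qza)
# $3 = output path for the filtered sequences (.qza)
keep_fungal_seqs() {
  qiime taxa filter-seqs \
    --i-sequences "$1" \
    --i-taxonomy "$2" \
    --p-include k__Fungi \
    --o-filtered-sequences "$3"
}

# Example call (needs an activated QIIME 2 environment):
# keep_fungal_seqs rep-seqs.qza taxonomy.qza rep-seqs-fungi.qza
```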

@NPK

Thanks Nick, I figured there should be caveats with ITS. I am not an expert on ITS.

Thanks everyone for all the advice. I really appreciate it.