Unassigned reads (k__bacteria;__;__;__;__;__;) in only one sample type of murine samples

Hi all,

I did a quick search for related questions, but I have a large number of unassigned reads in low-abundance samples (e.g., lung/BAL samples). Reads from other sample types (gut, nasal, skin, TI, cecum) are quite robust, with unassigned reads making up < 5% of the relative frequency.
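To put a number on the per-sample share of unassigned reads, it can be computed from a feature table and a taxonomy once both are loaded into Python. This is a minimal sketch, assuming the table is a dict of `{sample_id: {feature_id: count}}` and the taxonomy a dict of `{feature_id: taxon string}`; the helper names, the toy data, and the rule for "unassigned" (nothing resolved below kingdom) are hypothetical choices for illustration.

```python
def is_unassigned(taxon: str) -> bool:
    """True if nothing is resolved below kingdom level.

    Covers 'Unassigned' as well as strings such as 'k__bacteria;__;__;__;__;__;'
    where every rank after the first is empty or only a bare prefix.
    """
    lower_ranks = [rank.strip() for rank in taxon.split(";")[1:]]
    return all(rank == "" or rank.endswith("__") for rank in lower_ranks)


def unassigned_fraction(table: dict, taxonomy: dict) -> dict:
    """Per-sample share of reads that fall in unassigned features.

    table: {sample_id: {feature_id: count}}
    taxonomy: {feature_id: taxon string}
    Features missing from the taxonomy are counted as unassigned.
    """
    fractions = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        unassigned = sum(
            count
            for feature_id, count in counts.items()
            if is_unassigned(taxonomy.get(feature_id, "Unassigned"))
        )
        # Guard against empty samples rather than dividing by zero.
        fractions[sample] = unassigned / total if total else 0.0
    return fractions


# Hypothetical toy data: one gut sample and one BAL sample.
taxonomy = {
    "f1": "k__Bacteria;p__Firmicutes;c__Bacilli",
    "f2": "k__bacteria;__;__;__;__;__;",
}
table = {"gut1": {"f1": 950, "f2": 50}, "bal1": {"f1": 100, "f2": 900}}
print(unassigned_fraction(table, taxonomy))
```

Treating a bare rank prefix such as `p__` as "not resolved" is a judgment call; a stricter or looser rule only changes `is_unassigned`.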

From what I found, most people who have dealt with this concluded that it is generally an annotation issue, with classification parameters that are too stringent (e.g., a 99% OTU match). However, I would add that when we used UCLUST in QIIME 1.9.1 with open- or closed-reference OTU picking, we never had this many unassigned reads.

Any advice? I have attached a screenshot of the taxa bar plot. The yellow bars are unassigned:


What's even more interesting is that the PBS blanks we ran in this study have only around 5% unassigned, which is acceptable, but in another study the lung samples are again dominated by unassigned reads. I'm pretty sure this is animal DNA (brown is the unassigned fraction in the BAL samples).

This is probably NOT an annotation issue (unless you mean that the lack of animal DNA in the reference database is to blame). Do you have any posts to reference? This post may hold the answer.

Particularly in low-biomass samples, a high proportion of unassigned reads will probably be host DNA and/or other non-target DNA or artifacts. That is better than cross-contamination, which would be much more difficult to eliminate :smile:

I’d recommend doing a preliminary check (e.g., NCBI BLAST a few unassigned reads) just to see what these reads might be, then filter out all unassigned reads without giving it another thought.
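For the BLAST spot check, a handful of kingdom-only features can be pulled out and written as FASTA to paste into the NCBI BLAST web form. A minimal sketch, assuming the taxonomy has been exported to a tab-separated taxonomy.tsv (Feature ID, Taxon, Confidence, one header line) and the representative sequences are already in a `{feature_id: sequence}` dict; `kingdom_only_ids`, `to_fasta` and the default of 5 reads are hypothetical.

```python
import csv
import io


def kingdom_only_ids(taxonomy_tsv: str) -> list:
    """Feature IDs whose taxonomy has nothing resolved below kingdom.

    Assumes the tab-separated layout of an exported taxonomy.tsv:
    Feature ID <tab> Taxon <tab> Confidence, with one header line.
    """
    ids = []
    for row in csv.reader(io.StringIO(taxonomy_tsv), delimiter="\t"):
        # Skip blank lines, the header and any comment lines.
        if not row or row[0] == "Feature ID" or row[0].startswith("#"):
            continue
        feature_id, taxon = row[0], row[1]
        lower_ranks = [rank.strip() for rank in taxon.split(";")[1:]]
        if all(rank == "" or rank.endswith("__") for rank in lower_ranks):
            ids.append(feature_id)
    return ids


def to_fasta(ids: list, sequences: dict, limit: int = 5) -> str:
    """FASTA text for the first `limit` IDs that have a sequence."""
    chosen = [fid for fid in ids if fid in sequences][:limit]
    return "".join(f">{fid}\n{sequences[fid]}\n" for fid in chosen)


# Hypothetical usage on a tiny taxonomy export.
tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__bacteria;__;__;__;__;__;\t0.91\n"
    "f2\tk__Bacteria; p__Firmicutes\t0.99\n"
    "f3\tUnassigned\t0.62\n"
)
print(to_fasta(kingdom_only_ids(tsv), {"f1": "ACGT", "f3": "GGCC"}))
```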

That’s very stringent. I would lower that, personally. But that does not actually seem related to the issue you are having, since unassigned reads are only high in the low-biomass samples, suggesting that they may be artifacts, background noise, or host DNA.

Closed-reference OTU picking removes these before you ever see them, because they do not match the reference database. Open-reference picking would build novel OTUs from them, but differences in the filtering and chimera-checking methods between QIIME 1 and QIIME 2 could be leading to this disparity.
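The closed- versus open-reference difference can be illustrated with a toy model. This sketch is a deliberate simplification: it scores identity position by position on pre-aligned, equal-length reads, whereas real tools such as UCLUST or VSEARCH align the sequences, and "open" mode here only collects the failed reads instead of clustering them de novo. `identity`, `pick_otus` and the 0.97 default are illustrative assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b) or not a:
        raise ValueError("toy identity needs non-empty, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus(reads: dict, references: dict, threshold: float = 0.97, mode: str = "closed"):
    """Assign each read to its best reference hit at or above `threshold`.

    closed: reads with no good hit are dropped and never reach the table.
    open:   those reads are kept aside as candidates for novel OTUs.
    """
    assigned, novel = {}, []
    for read_id, seq in reads.items():
        best_ref, best_identity = max(
            ((ref_id, identity(seq, ref)) for ref_id, ref in references.items()),
            key=lambda hit: hit[1],
        )
        if best_identity >= threshold:
            assigned[read_id] = best_ref
        elif mode == "open":
            novel.append(read_id)
        # In closed mode the failed read is silently discarded.
    return assigned, novel


# Hypothetical example: a bacterial read that matches the reference, a host read that does not.
refs = {"r1": "ACGTACGTAC"}
reads = {"bact": "ACGTACGTAC", "host": "TTTTTTTTTT"}
print(pick_otus(reads, refs))               # host read vanishes
print(pick_otus(reads, refs, mode="open"))  # host read survives as "novel"
```

In the closed case the host read never appears anywhere, which is why such reads would not have shown up as "unassigned" in the QIIME 1 closed-reference results.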

Lower your % similarity threshold a bit, filter all unassigned features, and don’t look back.

I hope that helps!


Thanks, will give it a spin, appreciate the input. Ben

edit: Yes, sorry, I misspoke. It is not that they are unannotated or annotated incorrectly; in actuality there is nothing in the reference to annotate them to!

@Nicholas_Bokulich I am probably missing this, but how can I see which SVs were assigned to k__bacteria;__;__;__;__;__;?

Never mind, I figured it out:
I have to view taxonomy.qzv and sequence.qzv and cross-reference them with each other. Thanks!

tip: use qiime metadata tabulate --m-input-file taxonomy.qza --m-input-file sequence.qza --o-visualization taxonomy-and-sequences.qzv to have these appear in a single qzv. No cross-referencing required!
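If the same taxonomy-plus-sequence view is ever needed outside QIIME 2, for example after `qiime tools export` has written out a taxonomy.tsv and a dna-sequences.fasta, the cross-referencing is just a join on feature ID. A minimal sketch, assuming the taxonomy is already in a `{feature_id: taxon}` dict; `parse_fasta` and `join_taxonomy` are hypothetical helpers, not part of QIIME 2.

```python
def parse_fasta(text: str) -> dict:
    """Map record IDs (first word of each header) to their sequences."""
    sequences, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            sequences[current] = ""
        elif current is not None:
            # Sequences may be wrapped over several lines.
            sequences[current] += line
    return sequences


def join_taxonomy(taxonomy: dict, sequences: dict) -> list:
    """One (feature_id, taxon, sequence) row per sequence, sorted by ID.

    Features without a taxonomy entry get an empty taxon string.
    """
    return [(fid, taxonomy.get(fid, ""), seq) for fid, seq in sorted(sequences.items())]


# Hypothetical usage.
seqs = parse_fasta(">f2\nGG\n>f1 some description\nACG\nT\n")
for row in join_taxonomy({"f1": "k__bacteria;__;__"}, seqs):
    print(*row, sep="\t")
```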

Thanks, I'm re-running the filtering on the representative sequences generated by DADA2 denoising:

qiime quality-control exclude-seqs \
  --i-query-sequences ~/rep-seqs.qza \
  --i-reference-sequences ~/gg-13-8-99-515-806-ref-seqs.qza \
  --p-method vsearch \
  --p-perc-identity 0.97 \
  --p-perc-query-aligned 0.97 \
  --p-threads 4 \
  --o-sequence-hits ~/filter.new/hits.qza \
  --o-sequence-misses ~/filter.new/misses.qza

qiime feature-table filter-features \
  --i-table ~/table.qza \
  --m-metadata-file ~/filter.new/misses.qza \
  --p-exclude-ids \
  --o-filtered-table ~/no-misses-filtered-table.qza
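One detail worth checking in the filter-features step: when a metadata file is passed, `qiime feature-table filter-features` retains only the IDs listed in it unless `--p-exclude-ids` is given, so dropping the misses (the reads that did not match the reference) needs that flag. A toy sketch of the two behaviours on a plain dict-of-dicts table; `filter_features` here is a hypothetical helper, not the q2-feature-table implementation.

```python
def filter_features(table: dict, ids, exclude: bool = False) -> dict:
    """Keep (default) or drop (exclude=True) the listed feature IDs in every sample.

    table: {sample_id: {feature_id: count}}. Samples are kept even if they
    end up with no features.
    """
    ids = set(ids)
    return {
        # (fid in ids) != exclude: keep listed IDs normally, unlisted ones when excluding.
        sample: {fid: count for fid, count in counts.items() if (fid in ids) != exclude}
        for sample, counts in table.items()
    }


# Hypothetical example: "host" is a feature that landed in misses.qza.
table = {"bal1": {"bact": 10, "host": 990}, "gut1": {"bact": 1000}}
misses = ["host"]
print(filter_features(table, misses, exclude=True))  # misses removed
print(filter_features(table, misses))                # only misses kept
```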

I'll post the filtered taxa bar plots afterwards.

edit/update: I just wanted to say that applying this method, as pointed out in the other thread, removed all of the unassigned k__bacteria reads. Thank you again. I had the opportunity to run it on the HPC; one of the sequencing runs took up to 24 hours at a 97% match, but I think it was worth it: the lung samples now look as we expect them to. We did have to retrain a feature classifier. Thanks @Nicholas_Bokulich

