Is my low abundance fungal DNA sequencing data salvageable?

vpandyar · October 12, 2018, 4:56pm

Hello! I'm brand new to the microbiome scene and am quickly learning how to use qiime to do my own data analysis! Since I am so brand new I had some questions regarding my data! To preface this I am interested in looking at the fungal elements of the microbiome. I recently obtained DNA sequences from mice fed different diets (normal chow vs a high fat fed group) and was told specifically from our sequencing core that they observed "a low assembly ratio" in the high fat reads (for ITS sequencing) which indicated to them that some of the reads were from non-specific PCR reads. The ITS sequences in my normal fed group were okay. Since my 16S reads were fine from the same samples this indicated a low abundance of fungal DNA. I managed to run through the steps of the "moving pictures" tutorial and got some results despite this, but this leaves me with a few questions.

My questions are: 1) Does having low abundance limit the interpretation of my results in any way? I'm worried that I may not be picking up enough sequences to process the data effectively and get meaningful results. For example, the sequencing core tried aligning the data to a fungal database and I was told they got a low mapping ratio, which again indicated a large proportion of background DNA.
2) Are there ways to account for or overcome this? Or has anyone run into something similar before?

I'm sure I will have more questions and if you need to see any of my data files please let me know and I would be happy to post it!

Thanks in advance!

Nicholas_Bokulich · October 12, 2018, 5:48pm

Hi @vpandyar,
Welcome to the scene!

The scenario you describe is quite common, especially with ITS primers, most of which are not fungi-specific. Hence host DNA and other non-targets are commonly detected and dilute on-target sequences. Usually this is not so pronounced and enough on-target reads are left over to use... but not always.

I recommend that you attempt to analyze your data and see how many on-target sequences you have, and whether that is enough for your purposes! After demultiplexing, trimming, and denoising your data, you can use this command to filter out sequences that do not resemble the reference sequences (or, better yet, use deblur for denoising, since a positive filter is built into the pipeline).
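One way to see how many on-target sequences are left is to compare per-sample read counts before and after the filter. Below is a minimal Python sketch, assuming the counts have already been exported into plain dictionaries. The function name, sample names, and numbers are invented for illustration and are not part of any QIIME 2 API.

```python
def on_target_summary(before, after):
    """Return {sample: (reads_kept, fraction_kept)} for each sample in `before`."""
    summary = {}
    for sample, total in before.items():
        kept = after.get(sample, 0)  # a sample missing from `after` kept no reads
        fraction = kept / total if total else 0.0
        summary[sample] = (kept, fraction)
    return summary


# Hypothetical per-sample read counts, for illustration only.
before = {"chow_1": 20000, "hfd_1": 18000}
after = {"chow_1": 15000, "hfd_1": 900}

for sample, (kept, fraction) in on_target_summary(before, after).items():
    print(f"{sample}: {kept} on-target reads ({fraction:.1%} of input)")
```

A sample that keeps only a small fraction of its reads is a hint that most of its amplicons were off-target.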

I hope that helps!

vpandyar · October 13, 2018, 5:35pm

Thank you for your response! I originally used dada2, but does deblur tend to work better for this type of data? I managed to remove the off-target sequences as per your suggestion, but what is considered having "enough"? Just skimming through the taxonomy bar plots I don't see nearly as much diversity, but how can I be sure this is true and not just a technical error related to not having enough fungal DNA? Thanks again so much for your reply!

bsen2018 · October 14, 2018, 3:01pm

I would first check the literature for the fungal diversity reported in my sample/environment. Each environment has its own fungal diversity, and I would do a quick search to find out how many fungal taxa have been reported so far from my sample/environment instead of playing around with sequence analysis. In many unexplored or yet-to-be-described environments, the fungal diversity, often poorly represented, mostly comprises a few phyla. So it would not be surprising to see very few fungal taxa in your samples. What is your sample type, mouse gut or something else, and why do you expect to see many fungal taxa?

vpandyar · October 15, 2018, 1:07pm

Thanks for the response bsen2018. Unfortunately there is only one paper that used conditions similar to mine (collecting stool from mice fed different diets), so it is difficult to make many comparisons. In that paper they did not report a low abundance but did mention that the PCR counts using a universal fungal primer were quite high (35ish). While I think it's true that there might be a lower abundance of fungal DNA in my sample, I want to make sure I am covering all my bases.

Nicholas_Bokulich · October 15, 2018, 1:39pm

@vpandyar let me start out by emphasizing a key quote from @bsen2018 (thank you for your insight!):

Of course sampling depth, methodology, and experimental idiosyncrasies all impact diversity, making it difficult to compare between experiments, but you should at least be able to use this as a point of reference (and control for even sampling depth and methodology).

On dada2 versus deblur: that's a matter of personal taste. Both methods perform similarly in my hands, though I am sure others will disagree.

As for what counts as "enough": that will depend on your experimental goals and the characteristics of your own data. You can run alpha rarefaction or something like that if representative sampling depth matters to you. But at least 2000 sequences per sample is probably a good rule of thumb for "enough" (and that's still quite shallow if you are interested in rare species). There is no single good answer to this question.
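As a rough illustration of that rule of thumb, here is a small Python sketch that splits samples by a minimum depth. The 2000-read default comes from the suggestion above. The function name, sample names, and counts are hypothetical.

```python
def split_by_depth(depths, min_depth=2000):
    """Split {sample: read_count} into (kept, dropped) by a minimum depth."""
    kept = {s: n for s, n in depths.items() if n >= min_depth}
    dropped = {s: n for s, n in depths.items() if n < min_depth}
    return kept, dropped


# Hypothetical post-filtering read counts, for illustration only.
depths = {"chow_1": 15000, "chow_2": 8200, "hfd_1": 900, "hfd_2": 2100}

kept, dropped = split_by_depth(depths)
print("deep enough:", sorted(kept))
print("too shallow:", sorted(dropped))
```

If most of one treatment group lands in the "too shallow" set, a lower threshold or deeper sequencing may be needed before comparing groups.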

As much diversity as what? Unless you have some sort of standard control that you can test in your run, there is really no way to determine whether something went wrong or whether this is technical error. If you have too few reads per sample, that would explain it. Run alpha rarefaction on your samples to see how diversity scales with sampling depth.
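To make the idea behind alpha rarefaction concrete, here is a minimal standard-library Python sketch. It repeatedly subsamples one sample's reads without replacement at several depths and averages the number of distinct features seen. It is a toy illustration of the concept, not the QIIME 2 implementation, and the taxon counts are made up.

```python
import random


def observed_features(counts, depth, rng):
    """Subsample `depth` reads without replacement; count distinct features."""
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return len(set(rng.sample(pool, depth)))


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Return {depth: mean observed features over `iterations` subsamples}."""
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        draws = [observed_features(counts, depth, rng) for _ in range(iterations)]
        curve[depth] = sum(draws) / iterations
    return curve


# Hypothetical feature counts for a single sample, for illustration only.
sample = {"Candida": 1500, "Saccharomyces": 400, "Aspergillus": 90, "Malassezia": 10}

for depth, mean_observed in rarefaction_curve(sample, [100, 500, 1000, 2000]).items():
    print(depth, mean_observed)
```

When the mean stops rising as depth increases, deeper sampling is unlikely to reveal many more features in that sample.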

I hope that helps!

vpandyar · October 15, 2018, 4:53pm

These responses are very helpful! I suppose I will have to play around more and run more experiments to verify my results. You are correct that I don't have a great comparison study for my data and so my results may be true. My only point of reference is my own control data which appeared to have significantly more diversity and taxonomic assignments than my experimental groups. I'll run the alpha rarefaction some time today and let you know how it goes.

Thanks again, I'm learning a ton!

vpandyar · October 16, 2018, 6:11am

I've attached my alpha-rarefaction data here. alpha-rarefaction.qzv (313.4 KB). If I am interpreting this correctly, at a sampling depth above approximately 1000 I am essentially sampling an adequate mix of species and am technically not "missing" anything (barring extremely rare species). Is this correct? I see that my number of observed OTUs is low for my HFD and CD-HFD groups as well. Is there anything else I can glean from or say about these data? Again thank you very much, this has been extremely helpful for a newbie such as myself!

Nicholas_Bokulich · October 16, 2018, 12:10pm

Sounds about right, though I would go for 2000 instead. There is a bit of an increase from 1k to 2k in ND.
