I have soil samples sequenced for ITS2 and V3-V4. I used only the forward reads for analysis, and I used the following code to train the UNITE classifier. In the taxa bar plot, major phyla making up about 20-25% of all phyla appear to be missing.
Can you provide more information? What is it you are expecting? What is the purpose of your study?
Normally, quite a bit of quality control is performed even after taxonomy assignment, such as removing unassigned sequences and anything that does not have at least a phylum-level assignment.
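To illustrate what "at least a phylum-level assignment" means in practice, here is a minimal plain-Python sketch. It assumes the taxonomy has already been exported to a dict that maps feature IDs to semicolon-delimited, UNITE-style strings (e.g. `k__Fungi;p__Glomeromycota;...`). The helper names, the example IDs, and the decision to treat `p__unidentified` as "no phylum" are assumptions for illustration. Within QIIME 2 itself, this step is normally done with `qiime taxa filter-table`.

```python
def has_phylum(taxon: str) -> bool:
    """Return True if the taxonomy string carries a named phylum (p__...) rank."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        if rank.startswith("p__"):
            name = rank[3:]
            # Assumption: empty or "unidentified" phylum names count as no phylum.
            return name != "" and not name.lower().startswith("unidentified")
    # No p__ field at all, e.g. "Unassigned" or a kingdom-only "k__Fungi".
    return False


def filter_features(taxonomy: dict) -> dict:
    """Keep only features assigned at least to phylum level."""
    return {fid: tax for fid, tax in taxonomy.items() if has_phylum(tax)}


# Hypothetical feature IDs and taxonomy strings.
taxonomy = {
    "asv1": "k__Fungi;p__Glomeromycota;c__Glomeromycetes",
    "asv2": "k__Fungi",
    "asv3": "Unassigned",
    "asv4": "k__Fungi;p__unidentified",
    "asv5": "k__Fungi;p__Ascomycota",
}
kept = filter_features(taxonomy)
print(sorted(kept))  # ['asv1', 'asv5']
```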
A good place to start is the QIIME 2 tutorials, I'd recommend this
Hello Mike,
My objective is to find out how crop rotation and nitrogen fertilizer rate affect soil microbial communities. I am interested in fungal community composition, especially arbuscular mycorrhizal fungi (Glomeromycota). What I notice is that even at level 2, the information for the top four phyla is missing: the bar plot shows Kingdom Fungi and nothing beyond that. I was wondering why. Is there a mistake in my code, or did I fail to train the classifier properly?
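A level-2 bar plot groups every feature by its first two ranks, so a feature classified only to kingdom ends up in a bin such as `k__Fungi;__` rather than under any phylum. The sketch below reproduces that grouping on toy data to show how much of a sample is "kingdom only". It is written in plain Python and assumes a feature-count dict and a taxonomy dict. The `'__'` padding mirrors roughly how bar plots label missing ranks, and all IDs and counts are hypothetical.

```python
from collections import defaultdict


def collapse_level(counts, taxonomy, level=2):
    """Sum feature counts by the first `level` ranks of each taxonomy string.

    Features with fewer ranks are padded with '__', so kingdom-only
    assignments form their own 'k__Fungi;__' bin.
    """
    totals = defaultdict(int)
    for fid, n in counts.items():
        ranks = [r.strip() for r in taxonomy.get(fid, "Unassigned").split(";")]
        ranks = (ranks + ["__"] * level)[:level]
        totals[";".join(ranks)] += n
    return dict(totals)


def relative_abundance(totals):
    """Convert summed counts to percentages of the sample total."""
    total = sum(totals.values())
    return {k: 100 * v / total for k, v in totals.items()}


# Hypothetical counts for one sample.
counts = {"asv1": 500, "asv2": 250, "asv3": 250}
taxonomy = {
    "asv1": "k__Fungi;p__Ascomycota",
    "asv2": "k__Fungi",
    "asv3": "k__Fungi;p__Glomeromycota",
}
rel = relative_abundance(collapse_level(counts, taxonomy))
print(rel["k__Fungi;__"])  # 25.0
```

If this kingdom-only share is large, that points to the classifier being unable to place those reads below kingdom. Filtering alone will not reveal which phyla they belong to.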
Likely due to DNA extraction or the quality of the data. Again, your file looks quite typical. Are you sure that the primers you used are good at targeting arbuscular mycorrhizal fungi? Usually, one picks the primer set that best targets the organisms of interest.
Have you performed any data filtering and analysis yet? I wouldn't worry too much about taxonomy until you have done some alpha and beta diversity analysis with your ASVs. That will address your question about how nitrogen fertilization affects your fungal communities, because the analysis is performed at the ASV level. If you see changes based on your treatment, then the taxonomy won't matter much anyway, unless you have other questions specific to taxonomy.
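To show why ASV-level diversity needs no taxonomy at all, here is a minimal sketch of one alpha metric (Shannon index) and one beta metric (Bray-Curtis dissimilarity). Both are computed directly from ASV count vectors. This is illustrative plain Python, not QIIME 2's diversity plugin, and the sample vectors are hypothetical. The code uses the natural log, but some tools default to log base 2, and real analyses usually rarefy or otherwise normalize counts first.

```python
import math


def shannon(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over non-zero ASV counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors in the same ASV order."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


# Hypothetical ASV counts for two samples (same ASV order).
even_sample = [10, 10, 10, 10]
dominated_sample = [40, 0, 0, 0]
print(round(shannon(even_sample), 3))               # 1.386 (ln 4)
print(bray_curtis(even_sample, dominated_sample))   # 0.75
```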
I noticed that your truncation length for DADA2 denoise-single is 285 bases. That is quite long. How is the quality in that part of the sequence? If the values are below 25 or 20 I'd try for a shorter truncation length. Otherwise there is too much noise for the denoiser to disambiguate true base changes from PCR / sequencing error and you'll obtain spurious ASVs (assuming low quality in that region).
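The reason low-quality tails add noise follows from the Phred scale: a score Q corresponds to an error probability of 10^(-Q/10). Q20 therefore means 1 error in 100 bases, and Q30 means 1 in 1,000. Summing those probabilities along a read gives its expected number of errors, which is the quantity DADA2's expected-errors filter works with. The read below is a hypothetical quality profile, chosen only to compare truncating at 245 against 285.

```python
def phred_to_error(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Expected number of erroneous bases in a read: sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in quals)


# Hypothetical read: 200 bases at Q35 followed by 85 bases at Q20 (285 total).
read = [35] * 200 + [20] * 85
print(round(expected_errors(read[:245]), 3))  # 0.513
print(round(expected_errors(read), 3))        # 0.913
```

In this toy case, the last 40 bases nearly double the expected errors per read.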
Actually, those are nice quality scores. When possible, I like to set my truncation to the position just before the 'bottom of the box' goes below 30. If I were to do this with your data and wanted to be super strict, I'd set the truncation length to ~245 (if using the forward read only). But many people are okay with 20-25, which I think is acceptable on occasion when merging paired ends. Or, to be more lenient, you could set the truncation length to ~278, where the 'bottom of the box' is 25.
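The rule of thumb above can be written as a short function. In a quality box plot, the "bottom of the box" is the lower quartile (25th percentile) of the scores at each position. The sketch returns the last position before that quartile first drops below a chosen threshold. It assumes you have per-position lists of quality scores. The quartile here comes from `statistics.quantiles`, which may interpolate slightly differently from the plotting tool, and the profile is hypothetical, shaped only to echo the ~245 / ~278 values discussed above.

```python
import statistics


def lower_quartile(scores):
    """25th percentile ('bottom of the box') of the quality scores at one position."""
    return statistics.quantiles(scores, n=4)[0]


def suggest_trunc_len(per_position_scores, threshold=30):
    """Last 1-based position before the lower quartile first drops below threshold."""
    for i, scores in enumerate(per_position_scores):
        if lower_quartile(scores) < threshold:
            return i  # positions 1..i all stayed at or above the threshold
    return len(per_position_scores)


# Hypothetical profile: 300 positions, each summarized by 4 reads' scores.
profile = (
    [[38, 37, 36, 35]] * 245    # lower quartile 35.25
    + [[34, 31, 29, 28]] * 33   # lower quartile 28.25
    + [[30, 26, 22, 20]] * 22   # lower quartile 20.5
)
print(suggest_trunc_len(profile, threshold=30))  # 245
print(suggest_trunc_len(profile, threshold=25))  # 278
```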
There are many opinions on this, so you'll just have to play around and see what works. Often you are stuck with the data you have. Again, I suggest you do some preliminary analysis first. In fact, you can compare whether or not the truncation settings affect your data interpretation.
Hello Mike,
Thank you for explaining so well. I was actually using only the forward reads, but I think I should try using both. When I used both forward and reverse reads for 16S, the percentage of merged non-chimeric sequences was only 36%. Do you think this percentage is enough for further analysis, or do you suggest using only the forward reads?
I will be grateful for your suggestions.
Are you losing data at the merging step or at chimera filtering? If at merging, then I suggest playing around with the truncation parameters and/or trying deblur for denoising. If you are losing many reads to chimera checking, I'd suggest setting DADA2's parameter --p-min-fold-parent-over-abundance to 8, as per:
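To answer "merge or chimera?", compare the read counts at successive steps of the DADA2 denoising stats, relative to the reads that entered each step. The sketch below does this for one sample. It assumes the stats were exported with columns named `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`, and the counts are hypothetical, picked so that about 36% of input survives.

```python
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def stage_losses(stats):
    """Percent of reads lost at each step, relative to the reads entering that step."""
    losses = {}
    for prev, cur in zip(STAGES, STAGES[1:]):
        losses[cur] = 100 * (stats[prev] - stats[cur]) / stats[prev]
    return losses


def biggest_loss(stats):
    """Name of the step that discards the largest share of its incoming reads."""
    losses = stage_losses(stats)
    return max(losses, key=losses.get)


# Hypothetical per-sample denoising stats.
sample = {"input": 10000, "filtered": 8500, "denoised": 8200,
          "merged": 3900, "non-chimeric": 3600}
print({k: round(v, 1) for k, v in stage_losses(sample).items()})
print(biggest_loss(sample))  # merged
```

Measuring each loss relative to its own step's input stops an early filtering loss from making the later steps look worse than they are.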
Hi Mike,
Yes, I lost most during merging.
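When most reads are lost at merging, a quick sanity check is whether the truncated forward and reverse reads can still overlap. Their overlap is roughly `trunc_f + trunc_r - amplicon_length`, and DADA2's read merging requires a minimum overlap (12 bases by default in DADA2's `mergePairs`). The amplicon length and truncation pairs below are hypothetical. Real amplicons vary in length, and ITS amplicons vary widely, so leave some margin above the minimum.

```python
def overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads of one amplicon."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Hypothetical 460 bp amplicon (length after any primer trimming).
for f, r in [(245, 200), (278, 200), (278, 240)]:
    print(f, r, overlap(f, r, 460), can_merge(f, r, 460))
# 245 200 -15 False
# 278 200 18 True
# 278 240 58 True
```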
I am wondering whether we could use AMF-specific primers with the UNITE classifier instead of the primers used by the genomics center?