Hello! This is a conceptual question, but I'd like to hear your opinions here.
I'm running a 16S analysis on some old plant samples that were sequenced by Roche 454 pyrosequencing 5 years ago.
I used the latest version (gg_13_8) of the Greengenes database to build my taxonomy reference file and then performed the taxonomic analysis, resulting in the following barplot:
The dark blue part of the bars is labelled as 'k__Bacteria;p__Cyanobacteria;c__Chloroplast;o__Streptophyta;f__;g__;s__'
Sadly, I think the problem is due to a bad primer choice (27 and 533).
I don't think there is any way to obtain something useful from these results, because the contamination affects more than half of the samples, making comparisons impossible, but I'm asking just out of curiosity.
While reading the QIIME2 tutorial, I noticed that the feature-classifier extract-reads method lets you train the classifier on reference reads extracted with your primer sequences, and that this has been shown to improve results. I doubt it would help in my case, but do you think it could make the results even slightly better?
No. Since your issue is chloroplast amplification, extract-reads will not help: your reads are chloroplast any way you cut it.
The only advice I can give is to use qiime taxa filter-table to drop all chloroplast reads and hope you have enough reads and samples left over to make a meaningful analysis.
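To get a feel for what that filtering leaves behind, here is a rough sketch in plain Python, not the QIIME 2 implementation. The table, the sample names (S1 to S3), the read counts, and the 100-read cutoff are all made up for illustration, and the case-insensitive substring match is my own choice. On the command line, I believe the usual invocation passes `--p-exclude chloroplast` to qiime taxa filter-table, but check the plugin help for your version.

```python
# Hypothetical feature table: taxonomy string -> read counts per sample.
# All names and numbers below are invented for illustration only.
table = {
    "k__Bacteria;p__Cyanobacteria;c__Chloroplast;o__Streptophyta;f__;g__;s__":
        {"S1": 900, "S2": 1000, "S3": 200},
    "k__Bacteria;p__Proteobacteria": {"S1": 100, "S2": 0, "S3": 500},
    "k__Bacteria;p__Firmicutes": {"S1": 50, "S2": 0, "S3": 300},
}

CHLOROPLAST = "chloroplast"  # substring to exclude (matched case-insensitively here)


def drop_chloroplast(feature_table):
    """Return a copy of the table without features annotated as chloroplast."""
    return {
        taxonomy: counts
        for taxonomy, counts in feature_table.items()
        if CHLOROPLAST not in taxonomy.lower()
    }


def reads_per_sample(feature_table):
    """Sum the read counts of all remaining features for each sample."""
    totals = {}
    for counts in feature_table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


def samples_with_depth(feature_table, min_reads):
    """List the samples that still have at least min_reads reads."""
    totals = reads_per_sample(feature_table)
    return sorted(s for s, n in totals.items() if n >= min_reads)


filtered = drop_chloroplast(table)
remaining = reads_per_sample(filtered)
kept = samples_with_depth(filtered, min_reads=100)  # arbitrary cutoff

print(remaining)
print(kept)
```

If most of your samples fall below whatever depth you consider reasonable after this step, that is a good sign the dataset cannot be rescued by filtering alone.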
As I said, in the plot chloroplast DNA is represented by the dark blue part of the bars. Sadly, too many samples (6 out of 12) consist entirely of this type of DNA.
I think the experiment has to be repeated, with a more specific primer choice and maybe with Illumina reads this time.