Hi everyone in the Qiime2 community,
I'm a Qiime2 user who ran into some odd problems while trying to reuse a public rumen microbiome dataset. I would really appreciate it if I could discuss what I'm facing with you.
Here's the background: the dataset I'm using is from "A heritable subset of the core rumen microbiome dictates dairy cow productivity and emissions" (https://www.science.org/doi/10.1126/sciadv.aav8391), and I'm trying to reuse their paired-end 16S rRNA sequences to perform taxonomic classification with Qiime2.
I downloaded the SRA files, each of which covers a specific domain, such as bacteria or archaea. When I ran FastQC on each FASTQ file converted from the SRA files, the "Per base sequence quality" plot looked very odd: it has an arch shape.
In the paper, the authors describe how the amplicon sequences were initially processed with OBITools. Given that plot and some other clues, I infer that each FASTQ file contains joined paired-end reads produced by the function "obijoinpairedend" or "illuminapairedend" from OBITools. So I decided to use DADA2 single-end denoising instead of paired-end, though I'm not sure whether I can simply treat the files as single-end sequence files.
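In case it helps, here is a minimal sketch of how I could check the "already joined" guess by looking at the read-length distribution of one FASTQ file. The function names are my own, and it assumes plain four-line FASTQ records (optionally gzipped). Raw Illumina reads usually share one length, while joined reads tend to vary, but trimming can also produce variable lengths, so this is only a hint and not proof.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count how many reads have each sequence length.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def modal_fraction(counts):
    """Fraction of reads that have the most common length."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


# Example usage (the path is hypothetical):
# counts = read_length_counts("bacteria_sample.fastq")
# print(sorted(counts.items()), modal_fraction(counts))
```

A modal fraction close to 1.0 would point to untouched fixed-length reads; a broad spread of lengths would fit the joined-reads explanation.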
The denoising stats look fine to me. Here is the qiime visualization file I got from the bacterial sequence files.
bacteria_denoising_stats.qzv (1.2 MB)
For the taxonomic assignment, I used the pretrained gg-13-8-99-515-806-nb-classifier.qza from Qiime2. In the end, I got 30924 entries across all the bacterial sequence files (from over 200 samples), as you can see in the file below. However, if you look into the file, fewer than 100 entries have genus-level resolution, and all of those belong to the same genus.
bateria_taxonomy.qzv (3.9 MB)
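To make the "fewer than 100 genus-level entries" observation easier to reproduce, here is a rough sketch of how the deepest assigned rank per entry could be tallied. It assumes Greengenes-style taxonomy strings such as "k__Bacteria; p__Firmicutes; g__; s__", where an empty name after the prefix means that rank was not assigned. The function names are my own and are not part of Qiime2.

```python
from collections import Counter

# Greengenes rank prefixes, from kingdom down to species.
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def deepest_rank(taxon):
    """Return the prefix of the deepest rank that has a non-empty name.

    Returns None for strings without any named rank (e.g. "Unassigned").
    """
    deepest = None
    for level in taxon.split(";"):
        prefix, sep, name = level.strip().partition("__")
        if sep and prefix in RANKS and name:
            deepest = prefix
    return deepest


def summarize(taxa):
    """Count how many taxonomy strings stop at each rank."""
    return Counter(deepest_rank(t) for t in taxa)


# Example usage with made-up taxonomy strings:
# summarize(["k__Bacteria; p__Firmicutes; c__; o__; f__; g__; s__",
#            "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
#            "f__Prevotellaceae; g__Prevotella; s__"])
```

Running this over the Taxon column exported from the taxonomy artifact would show at which rank most assignments stop.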
So now I'm really confused. Is the dataset simply not usable, or did I miss something? What do you think?