Hi, I’m a beginner in the QIIME 2 world (working with 16S rRNA sequences).
I have some questions about the taxa barplot built from collapsed ASVs (via q2-taxa collapse).
Below are the details of our analysis.
To start, I obtained paired-end FASTQ files from an Illumina MiSeq run (V3-V4 region) and followed the standard pipeline from the Moving Pictures tutorial using q2-dada2 (forward right trim 280, forward left trim 21, reverse right trim 220, reverse left trim 17). After that, I classified the features with the scikit-learn naive Bayes classifier (database: Greengenes 99%). Finally, the feature table was collapsed with q2-taxa collapse.
I often use the .csv files downloaded from the barplot in QIIME 2 View, but I have some questions about the relative abundance values in a barplot built from collapsed ASVs.
I have two questions about how features in FeatureData [Taxonomy] become ASVs in the taxa barplot CSV file.
- In FeatureData [Taxonomy], there are two kinds of features (rows) that are not identified at the species level. For example, multiple rows belong to the genus Neisseria but carry no species information. They show one of two different annotations:
Feature 1) … g__Neisseria.s__
Feature 2) … g__Neisseria
How do the two annotations differ? Is there any difference in meaning between feature 1 and feature 2?
- FeatureData [Taxonomy] contains multiple features annotated like feature 1 and like feature 2. However, when I downloaded the taxa barplot as a CSV file and opened it, I saw only two ASVs:
ASV 1) … g__Neisseria.s__
ASV 2) … g__Neisseria.__
Therefore I guessed that the features annotated like feature 1 were merged into a single ASV (ASV 1), and the features annotated like feature 2 were merged into another single ASV (ASV 2).
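To make my guess concrete, here is a minimal pandas sketch of what I assume the collapse does. The feature IDs, the counts, and the `collapse_label` helper are all made up for illustration. I keep only the genus and species ranks (so the collapse level is 2 rather than 7), and the rule of padding a missing rank with `__` is my assumption about q2-taxa collapse, not something I have confirmed:

```python
import pandas as pd


def collapse_label(taxon: str, level: int) -> str:
    """Truncate or pad a ';'-separated taxonomy string to `level` ranks.

    Assumed rule: ranks missing from the annotation are filled with '__'.
    """
    ranks = [r.strip() for r in taxon.split(";")]
    ranks = ranks[:level] + ["__"] * (level - len(ranks))
    return ";".join(ranks)


# Hypothetical taxonomy annotations (higher ranks omitted on purpose)
taxonomy = pd.Series({
    "feat_a": "g__Neisseria; s__",
    "feat_b": "g__Neisseria; s__",
    "feat_c": "g__Neisseria",
    "feat_d": "g__Neisseria",
    "feat_e": "g__Neisseria; s__mucosa",
})

# Hypothetical feature table: rows are samples, columns are features
counts = pd.DataFrame(
    {
        "feat_a": [10, 0],
        "feat_b": [5, 5],
        "feat_c": [2, 8],
        "feat_d": [3, 1],
        "feat_e": [20, 6],
    },
    index=["sample_1", "sample_2"],
)

# Features that share a collapsed label are summed into one column
labels = taxonomy.map(lambda t: collapse_label(t, level=2))
collapsed = counts.T.groupby(labels).sum().T
print(collapsed)
```

If this is right, every feature annotated `g__Neisseria; s__` ends up in one column and every feature annotated `g__Neisseria` ends up in another, which would match the two ASVs I see in the CSV.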
Can we regard ASV 1 and ASV 2 as a single component when drawing a stacked barplot (composition) at the species level?
To elaborate, there are also species-level ASVs such as:
ASV 3) … g__Neisseria.s__mucosa
Is it fair to compare ASV 1 and ASV 2 with ASV 3 at the species level? Or is it better not to perform species-level composition comparisons when using 16S rRNA data?
These questions first arose when we downloaded the taxa barplot data as a CSV file and tried to draw a species-level composition plot from it. I’m greatly concerned that these "species-level unassigned" ASVs might bias our interpretation of the relative abundance data.
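For reference, this is roughly the merge I am considering before plotting. It is only a sketch with made-up counts: the `merge_unassigned_species` helper and the `unassigned_species` label are hypothetical names of mine, and I assume the table holds only taxon count columns with `;`-separated ranks (the real CSV also has metadata columns that would need to be dropped first):

```python
import pandas as pd

# Made-up collapsed counts standing in for the taxa barplot CSV
table = pd.DataFrame(
    {
        "g__Neisseria;s__": [15, 5],
        "g__Neisseria;__": [5, 9],
        "g__Neisseria;s__mucosa": [20, 6],
    },
    index=["sample_1", "sample_2"],
)


def merge_unassigned_species(df: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Sum columns whose last rank is empty ('s__' or '__') into one column per genus."""

    def relabel(col: str) -> str:
        *higher, last = col.split(sep)
        if last in ("s__", "__"):
            return sep.join(higher + ["unassigned_species"])
        return col

    # Group the columns by their new label and add up the counts
    return df.T.groupby(relabel).sum().T


merged = merge_unassigned_species(table)

# Relative abundance: each sample (row) sums to 1
relative = merged.div(merged.sum(axis=1), axis=0)
print(relative)
```

My worry is whether treating the merged column as one "species" next to `s__mucosa` is meaningful, since it may contain several different unassigned species.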
Those were my two questions about ASVs; I also have a short question about representative sequences.
- I’m not sure what the representative sequences given for each ID in FeatureData [Sequence] mean. Are they representatives drawn from our own data, or are they the representatives of each item from the reference database?
I know these questions are quite long. Thanks in advance for your kind help.
Best,
Kinam