I'm trying to locate some sequences in my rep_seq file after dada2 and relate them to their corresponding sample IDs...
Specifically, I'm analysing the effect of adding some strains to soil samples, so I want to locate the sequences from my strains in the samples. Looking at the rep_seq.qza file after dada2, I can see the sequences and the number of samples each one is found in. However, I cannot tell which samples those are, and for me it is very important to know whether they are treated soils or control ones. I have already located the sequences by downloading the fasta file, but I don't know in which samples they were detected.
The answer to this question is in the core semantic types offered by QIIME 2.
The first is FeatureData[Sequence], which links each sequence to its feature ID.
You will also need FeatureTable[Frequency], which links feature IDs to sample IDs.
Finally, you will need the sample metadata, which links sample IDs to their metadata context, like the soil treatment group.
It goes like this:
feature sequence
    |  FeatureData[Sequence]
feature ID
    |  FeatureTable[Frequency]
sample ID
    |  metadata
soil treatment groups for those samples
This question is tricky because it requires a mental model linking together the data structures of QIIME 2. We should probably have a diagram that explains this.
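Until there is a diagram, the chain above can be sketched with three small lookup tables. This is only an illustration of the mental model, not QIIME 2 code: the feature IDs, sample IDs, sequences, counts and the helper name `samples_for_sequence` are all made up.

```python
# Toy stand-ins for the three data structures; every ID and value is hypothetical.

# FeatureData[Sequence]: feature ID -> sequence
rep_seqs = {
    "feat_A": "ACGTACGT",
    "feat_B": "TTGACCAA",
}

# FeatureTable[Frequency]: feature ID -> {sample ID: read count}
table = {
    "feat_A": {"S1": 120, "S2": 0, "S3": 45},
    "feat_B": {"S1": 0, "S2": 30, "S3": 0},
}

# Sample metadata: sample ID -> soil treatment group
metadata = {"S1": "treated", "S2": "control", "S3": "treated"}


def samples_for_sequence(sequence):
    """Follow sequence -> feature ID -> sample IDs -> treatment group."""
    hits = []
    for feature_id, seq in rep_seqs.items():
        if seq != sequence:
            continue
        for sample_id, count in table[feature_id].items():
            # Keep only the samples where the feature was actually observed.
            if count > 0:
                hits.append((sample_id, metadata[sample_id], count))
    return hits


result = samples_for_sequence("ACGTACGT")
print(result)
```

Each dictionary plays the role of one link in the chain, so answering "is my strain in treated or control soil?" is just three lookups in a row.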
Great! And how can I obtain this kind of result? I mean, I already have both outputs (after dada2 you get the rep. sequences and the frequency table), and also the metadata table, but I don't know how to get the table I need: a table similar to the frequency table, but with all the samples and the number of times I got each sequence in each sample.
Afterwards I have 3 files: the stats, with the quality filter info; the rep.seqs file, with the following columns:
Feature ID | Sequence Length | Sequence
and the table.qza, similar to this:
Feature ID                          Frequency   # of Samples Observed In
565510dc4fd6dc5637f352518bcad024    19,277      5
a29d551ec1784e2d4c623a6dc65a7ebb    15,773      5
ef7f45e97788eee3959ece505f84a23f    9,200       5
So I just have the number of samples each feature is observed in, but I don't know which samples they are. How can I get the FeatureTable[Frequency] with the sample IDs and the counts per sample? It would be amazing to get that!
First, we run qiime dada2 denoise-single ... --o-table ./dada2_table.qza,
and later we run qiime feature-table summarize ... --o-visualization ./dada2_table.qzv
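The table you posted is the output of feature-table summarize, which is a .qzv file.
Go back a step to the output of dada2 denoise-single, which is a .qza file. Export the contents of that file with qiime tools export. The exported table is a BIOM file (feature-table.biom), which is not plain text, so convert it with biom convert ... --to-tsv before opening it in a text editor or spreadsheet. Inside there is the table we are looking for: one row per feature ID and one column per sample ID.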
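Once the table is available as tab-separated text, listing the samples for a given feature takes only a few lines. The sketch below assumes the layout that biom convert --to-tsv typically writes (a comment line, then a header row starting with "#OTU ID", then one row per feature); the sample IDs, feature IDs and counts in `EXAMPLE_TSV` are made up, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the converted table; for a real file, pass
# open("feature-table.tsv", newline="") to read_feature_table instead.
EXAMPLE_TSV = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tS3\n"
    "feat_A\t120.0\t0.0\t45.0\n"
    "feat_B\t0.0\t30.0\t0.0\n"
)


def read_feature_table(handle):
    """Return {feature ID: {sample ID: count}} from a TSV feature table."""
    sample_ids = None
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # A commented line with several columns is the header row.
            if len(row) > 1:
                sample_ids = row[1:]
            continue
        if sample_ids is None:
            raise ValueError("no header row with sample IDs found")
        # Counts are often written as floats such as 120.0.
        counts = [int(float(value)) for value in row[1:]]
        table[row[0]] = dict(zip(sample_ids, counts))
    return table


def samples_with_feature(table, feature_id):
    """List the sample IDs in which the feature has a non-zero count."""
    return [s for s, count in table[feature_id].items() if count > 0]


table = read_feature_table(io.StringIO(EXAMPLE_TSV))
print(samples_with_feature(table, "feat_A"))
```

The resulting sample IDs can then be looked up in the metadata file to see whether each one is a treated or a control soil.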