subsample, but not randomly subsample

nmgduan · May 2, 2019, 3:36pm

Dear developers,

I have two reactors, labelled A and B. Each was run in triplicate (replicates labelled α, β and γ) and sampled at three time points (labelled 1, 2 and 3). So I have 18 paired-end sequence data, A_α_1, A_β_1, A_γ_1 and so on, with an average of about 50,000 reads each.
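The design above can be spelled out as a short Python sketch. It is illustrative only, and the variable names are my own. It enumerates the 18 sample IDs in the reactor_replicate_timepoint pattern used in the post:

```python
# Experimental design from the post: 2 reactors x 3 replicates x 3 time points.
reactors = ["A", "B"]
replicates = ["α", "β", "γ"]
time_points = ["1", "2", "3"]

# Build IDs like "A_α_1", grouping the triplicates of each reactor/time point together.
sample_ids = [
    f"{reactor}_{rep}_{t}"
    for reactor in reactors
    for t in time_points
    for rep in replicates
]

print(len(sample_ids))   # 18
print(sample_ids[:3])    # ['A_α_1', 'A_β_1', 'A_γ_1']
```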

I want to compare the microbial composition of A_1 and B_1 using edgeR. So I imported A_α_1, A_β_1 and A_γ_1 into A_1.qza, and B_α_1, B_β_1 and B_γ_1 into B_1.qza. I then denoised the two .qza files separately with DADA2, which gave me two FeatureData[Sequence] and two FeatureTable[Frequency] artifacts, and I used these files for the comparison.

However, the description of the DADA2 parameter --p-n-reads-learn says that training the error model on more reads makes it more reliable. So would it be better to import all 18 sequences data into one .qza file?
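A rough read budget shows why pooling matters. It assumes about 50,000 reads per sample (the average quoted above) and the q2-dada2 default of 1,000,000 for --p-n-reads-learn. This is back-of-the-envelope Python, not DADA2 itself. DADA2 adds whole samples while learning errors, so treat the cap as approximate:

```python
READS_PER_SAMPLE = 50_000          # average from the post (assumed uniform here)
N_READS_LEARN_DEFAULT = 1_000_000  # assumed q2-dada2 default for --p-n-reads-learn

def reads_for_error_model(n_samples: int) -> int:
    """Approximate number of reads available to train one DADA2 error model."""
    available = n_samples * READS_PER_SAMPLE
    # Error learning stops (roughly) once n-reads-learn reads have been used.
    return min(available, N_READS_LEARN_DEFAULT)

separate = reads_for_error_model(3)   # one triplicate per .qza, e.g. A_1.qza
pooled = reads_for_error_model(18)    # all 18 samples in one .qza
print(separate, pooled)               # 150000 900000
```

With separate imports, each error model sees only one triplicate's reads. Pooling all 18 samples gives roughly six times as many reads to learn from, which is still below the default cap.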

If so, how do I subset the FeatureData[Sequence] and FeatureTable[Frequency] to just the A_1 and B_1 samples, so that I compare only A_1 and B_1?
The qiime feature-table subsample would randomly pick samples.
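Selecting A_1 and B_1 is a deterministic filter on sample metadata, not random subsampling. In QIIME 2 I believe this corresponds to qiime feature-table filter-samples (driven by a metadata file), followed by qiime feature-table filter-seqs to keep only the sequences still present in the filtered table. The plain-Python sketch below only illustrates that logic. The function names, feature IDs, counts and sequence fragments are all hypothetical toy data:

```python
# Hypothetical toy feature table: sample ID -> {feature ID: read count}.
table = {
    "A_α_1": {"asv1": 120, "asv2": 30},
    "A_β_1": {"asv1": 95, "asv2": 12},
    "B_α_1": {"asv2": 60, "asv4": 5},
    "B_β_1": {"asv2": 48, "asv4": 9},
    "A_α_2": {"asv1": 80, "asv3": 15},
    "B_α_2": {"asv3": 40},
}
# Metadata parsed from the reactor_replicate_timepoint sample IDs.
metadata = {s: dict(zip(("reactor", "replicate", "time"), s.split("_"))) for s in table}
# Hypothetical representative sequences: feature ID -> sequence fragment.
rep_seqs = {"asv1": "TACGGAGGGT", "asv2": "TACGTAGGGG", "asv3": "TACGGAGGAT", "asv4": "TACAGAGGGT"}

def filter_samples(table, metadata, keep):
    """Keep only samples whose metadata row satisfies `keep`; no randomness involved."""
    return {s: dict(counts) for s, counts in table.items() if keep(metadata[s])}

def filter_seqs(rep_seqs, table):
    """Keep only sequences for features that still have reads in `table`."""
    observed = {f for counts in table.values() for f, c in counts.items() if c > 0}
    return {f: seq for f, seq in rep_seqs.items() if f in observed}

t1 = filter_samples(table, metadata, lambda row: row["time"] == "1")
seqs_t1 = filter_seqs(rep_seqs, t1)
groups = [metadata[s]["reactor"] for s in sorted(t1)]

print(sorted(t1))       # ['A_α_1', 'A_β_1', 'B_α_1', 'B_β_1']
print(sorted(seqs_t1))  # ['asv1', 'asv2', 'asv4']
print(groups)           # ['A', 'A', 'B', 'B']
```

The group labels at the end describe the two-group design (reactor A vs reactor B at time point 1) that the edgeR comparison would be run on.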

Thanks in advance.

colinbrislawn · May 2, 2019, 6:17pm

Hello @nmgduan,

Thanks for posting again on the QIIME 2 forums!

So would it be better to import all 18 sequences data into one .qza file?

Yes! Importing all 18 samples into a single .qza file is the recommended way to do this.

The qiime feature-table subsample would randomly pick samples.

No... that command would randomly pick reads from samples.
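To make the distinction concrete, randomly picking reads from a sample (often called rarefying) draws a fixed number of reads without replacement from that sample's counts. The result depends on the random seed, unlike filtering samples by metadata. This is a minimal plain-Python sketch, not QIIME 2 code. The function name and toy counts are hypothetical:

```python
import random
from collections import Counter

def rarefy(counts: dict, depth: int, seed: int = 0) -> dict:
    """Randomly draw `depth` reads without replacement from one sample's feature counts."""
    if depth > sum(counts.values()):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, then sample reads without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))

sample = {"asv1": 120, "asv2": 30}
sub = rarefy(sample, depth=100, seed=1)
print(sum(sub.values()))  # 100
```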

I think the tutorials might answer some of these questions, and they would also help with terminology. For example, you wrote "So I have 18 paired-end sequence data", where I would say "I have 18 samples." But I still knew what you meant.

Colin

system · June 3, 2019, 12:17am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.