Feature counts by sample? And how to rarefy correctly

moonlight · February 7, 2020, 2:08am

Hi Mehrbod,

Thanks for your patience; I really appreciate it. I have figured out most of my questions.
Some follow-up questions:

1> "Which taxa do you not want? Is there a specific reason why you need to remove some?" This is a good question. We are trying to target Archaea, but so far there are no perfect archaeal primers. If you do 16S sequencing and assign taxonomy, a lot of the reads are assigned to Bacteria or left unassigned. So we want to filter those out and keep only the reads assigned to Archaea.
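A minimal Python sketch of this filtering step, assuming the feature table is held as nested dicts and the taxonomy as a map from feature ID to lineage string (all names and data here are hypothetical). QIIME 2 itself provides `qiime taxa filter-table` with an `--p-include` option for this; the sketch only illustrates the logic of keeping features whose lineage contains a given taxon.

```python
from typing import Dict

Table = Dict[str, Dict[str, int]]  # sample ID -> {feature ID -> count}


def keep_taxon(table: Table, taxonomy: Dict[str, str], taxon: str) -> Table:
    """Keep only features whose lineage contains `taxon` (case-insensitive).

    Features missing from `taxonomy` are dropped as well.
    """
    wanted = {fid for fid, lineage in taxonomy.items() if taxon.lower() in lineage.lower()}
    return {
        sample: {fid: n for fid, n in counts.items() if fid in wanted}
        for sample, counts in table.items()
    }


# Hypothetical taxonomy assignments and counts
taxonomy = {
    "asv1": "k__Archaea; p__Euryarchaeota",
    "asv2": "k__Bacteria; p__Proteobacteria",
    "asv3": "Unassigned",
    "asv4": "k__Archaea; p__Crenarchaeota",
}
table = {
    "sample1": {"asv1": 40, "asv2": 300, "asv3": 60, "asv4": 10},
    "sample2": {"asv1": 5, "asv2": 120, "asv3": 0, "asv4": 25},
}

archaea = keep_taxon(table, taxonomy, "Archaea")
print({s: sum(c.values()) for s, c in archaea.items()})  # {'sample1': 50, 'sample2': 30}
```

Substring matching is simple but broad; a stricter version would compare only the domain/kingdom rank.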

2> "The option of justConcatenate is not available in QIIME 2 as it is in native DADA2 in R"

Just to confirm: if I have 100 F reads and 100 R reads and their quality is perfect, it should show "200" reads in the merged column (https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2019.10%2Fdata%2Ftutorials%2Fatacama-soils%2Fdenoising-stats.qzv).

Am I correct? We don't do any concatenation; we simply pair them.
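For reference, a toy sketch of how paired-end merging relates to read counts. It uses exact overlap only and is not DADA2's algorithm; the 12-base minimum overlap is an assumption chosen to mirror DADA2's documented default, and the 100 simulated, error-free read pairs are hypothetical. Each forward/reverse pair describes one DNA fragment, so it yields at most one merged sequence.

```python
import random
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a read pair by exact overlap of fwd's end with rev's reverse complement."""
    rc = reverse_complement(rev)
    # Try the longest possible overlap first, down to min_overlap.
    for olap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-olap:] == rc[:olap]:
            return fwd + rc[olap:]
    return None  # the pair fails to merge and is lost at this step


def simulate_pair(rng: random.Random, amplicon_len: int = 250,
                  read_len: int = 150) -> Tuple[str, str, str]:
    """Hypothetical error-free amplicon read from both ends (50 bp overlap by default)."""
    amplicon = "".join(rng.choice("ACGT") for _ in range(amplicon_len))
    fwd = amplicon[:read_len]
    rev = reverse_complement(amplicon[-read_len:])
    return amplicon, fwd, rev


rng = random.Random(0)
pairs = [simulate_pair(rng) for _ in range(100)]  # 100 F reads + 100 R reads
merged: List[Optional[str]] = [merge_pair(f, r) for _, f, r in pairs]
n_merged = sum(m is not None for m in merged)
print(f"input pairs: {len(pairs)}, merged sequences: {n_merged}")
# input pairs: 100, merged sequences: 100
```

In this sketch the count of merged sequences can never exceed the number of input pairs.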

3> I know you don't advise concatenating reads into long reads. I am wondering whether longer reads would have some advantage for taxonomic assignment (more accuracy) in downstream analysis?
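To make the difference from merging concrete, a sketch of what concatenation does, assuming DADA2's `justConcatenate` behaviour of joining the forward read and the reverse-complemented reverse read with a spacer of 10 Ns. The amplicon and read lengths are hypothetical. The concatenated sequence is longer than either read, but the spacer stands in for bases that were never sequenced.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def just_concatenate(fwd: str, rev: str, spacer_len: int = 10) -> str:
    """Join fwd and the reverse-complemented rev with an N spacer; no overlap is needed."""
    return fwd + "N" * spacer_len + reverse_complement(rev)


# Hypothetical 400 bp amplicon read with 150 bp reads: bases 150-249 are never sequenced.
amplicon = "ACGT" * 100
fwd = amplicon[:150]
rev = reverse_complement(amplicon[-150:])
concat = just_concatenate(fwd, rev)
print(len(amplicon), len(concat), concat.count("N"))  # 400 310 10
```

The 310-character result carries 300 observed bases; the 100-base gap is replaced by 10 placeholder Ns, not recovered.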

4> Thanks for the suggestion on fungal analyses; I will read that link. My fungal primers' barcodes are on the reverse reads (EMP primers), so I am not sure whether I should use only the reverse reads in this case. Basically, I have three FASTQ files: F.fastq, R.fastq, and barcodes.fastq. Since F.fastq has no barcode, I might not be able to demultiplex if I use the single-end denoising workflow on the forward reads alone. Normally the forward reads have better quality, so using the reverse reads doesn't make sense to me. I will ask if I run into more questions. Any suggestions in advance?
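A sketch of how a separate index file is used, assuming EMP-style data in which barcodes.fastq holds one barcode per read, in the same order and with the same read IDs as the sequence file. The reads, barcodes and sample names are hypothetical, and `rev_comp_barcodes` is a hypothetical switch for cases where barcodes must be reverse-complemented before lookup. Because each barcode is matched by read ID rather than read from the sequence itself, the same logic applies whichever sequence file (F or R) is paired with barcodes.fastq.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_fastq(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (read ID, sequence) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i][1:].split()[0], lines[i + 1]


def demux_by_index(reads_fq: str, barcodes_fq: str, barcode_to_sample: Dict[str, str],
                   rev_comp_barcodes: bool = False) -> Dict[str, List[str]]:
    """Assign each read to a sample using the barcode record with the same read ID."""
    per_sample: Dict[str, List[str]] = defaultdict(list)
    for (read_id, seq), (bc_id, barcode) in zip(read_fastq(reads_fq), read_fastq(barcodes_fq)):
        if read_id != bc_id:
            raise ValueError(f"records out of sync: {read_id} vs {bc_id}")
        if rev_comp_barcodes:
            barcode = reverse_complement(barcode)
        sample = barcode_to_sample.get(barcode)
        if sample is not None:  # reads with unmatched barcodes are discarded
            per_sample[sample].append(seq)
    return dict(per_sample)


forward_fq = """
@r1 1:N:0
ACGTACGT
+
IIIIIIII
@r2 1:N:0
TTTTGGGG
+
IIIIIIII
@r3 1:N:0
CCCCAAAA
+
IIIIIIII
"""
barcodes_fq = """
@r1 2:N:0
AAAA
+
IIII
@r2 2:N:0
CCCC
+
IIII
@r3 2:N:0
GGGG
+
IIII
"""
mapping = {"AAAA": "sample1", "CCCC": "sample2"}
print(demux_by_index(forward_fq, barcodes_fq, mapping))
# {'sample1': ['ACGTACGT'], 'sample2': ['TTTTGGGG']}
```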

5> The last question is about the command "qiime diversity core-metrics" and its output. I followed the "Moving Pictures" tutorial (QIIME 2 2019.10.0 documentation).

A>
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 1103 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results
Are the default metrics Shannon, Chao1, observed_otus, etc.? As far as I know, there are more than 20 metrics in QIIME 2. If I want to add one more metric, such as Good's coverage, how can I add it to the core diversity command?

Or do I have to compute it manually, as suggested here (Alpha and Beta Diversity Explanations and Commands)?
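Good's coverage is usually defined as C = 1 - F1/N, where F1 is the number of features observed exactly once (singletons) in a sample and N is that sample's total read count. A minimal sketch with hypothetical per-sample counts; scikit-bio, which QIIME 2 builds on for many alpha metrics, provides the same estimate as `skbio.diversity.alpha.goods_coverage`.

```python
from typing import List


def goods_coverage(counts: List[int]) -> float:
    """Good's coverage: 1 - F1 / N (F1 = singleton features, N = total reads)."""
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    return 1 - f1 / n


# Hypothetical per-sample feature counts (100 reads each)
table = {
    "sample1": [50, 30, 10, 5, 1, 1, 1, 1, 1, 0],
    "sample2": [80, 15, 4, 1, 0, 0, 0, 0, 0, 0],
}
for sample, counts in table.items():
    print(sample, round(goods_coverage(counts), 3))
# sample1 0.95
# sample2 0.99
```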

B> In the core diversity analysis above, the command rarefies the full feature table to an even depth of 1103, right? I don't think the core diversity output saves the subsampled feature table. Is it possible to save it with this command? If I don't save the subsampled table, I can't use it for downstream analysis. I know I can rarefy the feature table manually and then calculate alpha diversity from the subsampled table, but the tutorial doesn't seem to do this. Mostly, I would use the subsampled feature table to make taxonomy bar plots. Is this a good idea? When you make taxonomy plots for your research, do you use the full table or the subsampled (even-depth) table?
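A minimal sketch of rarefying to an even depth, assuming random subsampling without replacement and that samples with fewer reads than the depth are dropped rather than kept (assumed here to mirror `qiime feature-table rarefy`). The counts are hypothetical; 1103 is the sampling depth from the command above.

```python
import random
from collections import Counter
from typing import Dict

Table = Dict[str, Dict[str, int]]  # sample ID -> {feature ID -> count}


def rarefy(table: Table, depth: int, seed: int = 0) -> Table:
    """Subsample each sample to `depth` reads without replacement; drop shallower samples."""
    rng = random.Random(seed)
    rarefied: Table = {}
    for sample, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # too few reads: the sample is removed, not padded
        # One entry per read, labelled with the feature it belongs to.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        rarefied[sample] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


# Hypothetical counts
table = {
    "sample1": {"asv1": 900, "asv2": 250, "asv3": 10},
    "sample2": {"asv1": 400, "asv2": 300},
    "sample3": {"asv1": 2000, "asv3": 5, "asv4": 1},
}
rarefied = rarefy(table, depth=1103)
print({s: sum(c.values()) for s, c in rarefied.items()})
# {'sample1': 1103, 'sample3': 1103}
```

Because the draw is random, a different seed gives a slightly different table; saving one rarefied table and reusing it keeps downstream steps consistent with each other.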

C> About the "observed_otus" output: I generated an observed_otus_vector.qza file and used qiime tools export to export it to a CSV file.

It looks something like this:

            observed_OTUs
sample 1    1000
sample 2    200
I am confused about the header "observed_OTUs". Does this mean observed ASVs? I used the DADA2 workflow and did not do any OTU clustering.
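A sketch of what an observed-features count computes, with hypothetical ASV IDs and counts: the number of distinct feature IDs with a nonzero count in each sample. The calculation only sees feature IDs, so it is the same whether those IDs are OTUs or DADA2 ASVs.

```python
from typing import Dict

Table = Dict[str, Dict[str, int]]  # sample ID -> {feature ID -> count}


def observed_features(table: Table) -> Dict[str, int]:
    """Number of distinct features with a nonzero count in each sample."""
    return {s: sum(1 for n in counts.values() if n > 0) for s, counts in table.items()}


# Hypothetical ASV counts
table = {
    "sample 1": {"asv_a": 12, "asv_b": 0, "asv_c": 3},
    "sample 2": {"asv_a": 5, "asv_b": 7, "asv_c": 1},
}
print(observed_features(table))  # {'sample 1': 2, 'sample 2': 3}
```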

6> This is a general question. Do you normally filter out rare OTUs/ASVs in your research? If so, are there any general rules? The tutorial removes low-abundance features with a total frequency below 10, but I am not sure whether this is a general rule. I used a cutoff of 50 for my dataset. Is this too high? I think this is a trade-off: if you remove too many features, you get better-looking rarefaction curves, but you lose diversity.
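A sketch of the cutoff trade-off, assuming the cutoff applies to each feature's total frequency summed across all samples, as with `qiime feature-table filter-features --p-min-frequency` (the counts are hypothetical). Raising the cutoff from 10 to 50 removes more low-abundance features, which directly lowers observed richness.

```python
from typing import Dict, Set

Table = Dict[str, Dict[str, int]]  # sample ID -> {feature ID -> count}


def filter_min_frequency(table: Table, min_frequency: int) -> Table:
    """Drop features whose total count across all samples is below min_frequency."""
    totals: Dict[str, int] = {}
    for counts in table.values():
        for feature, n in counts.items():
            totals[feature] = totals.get(feature, 0) + n
    keep: Set[str] = {f for f, total in totals.items() if total >= min_frequency}
    return {s: {f: n for f, n in counts.items() if f in keep} for s, counts in table.items()}


# Hypothetical counts (feature totals: asv1 900, asv2 45, asv3 12, asv4 2, asv5 20)
table = {
    "sample1": {"asv1": 500, "asv2": 30, "asv3": 8, "asv4": 2},
    "sample2": {"asv1": 400, "asv2": 15, "asv3": 4, "asv5": 20},
}
for cutoff in (10, 50):
    filtered = filter_min_frequency(table, cutoff)
    features = sorted({f for counts in filtered.values() for f in counts})
    print(cutoff, features)
# 10 ['asv1', 'asv2', 'asv3', 'asv5']
# 50 ['asv1']
```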

What do you normally do?

Thanks in advance