What is a decent number and quality of reads for 16S sequencing?

Abul_Bashar · February 17, 2024, 4:55pm

Hi everyone! I am very new to this world.

To determine the archaeal community structure, I prepared my own library using 16S archaeal primer pairs (519F and 915R) and sent it for MiSeq (2×250) sequencing. Sequencing yielded a minimum of 33k (mean 96k) FASTQ reads per sample. Is that number of reads decent for downstream analysis?
All bases have a quality score (at the 25th percentile) above 37 up to position 250. Considering the quality plot, should I keep all the bases when denoising with DADA2, or should I truncate somewhere before 250?
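
To put a quality score of 37 in context, here is a minimal Python sketch of the standard Phred relationship (error probability = 10^(-Q/10)) and the expected number of errors in a read, which is what DADA2's max expected error filter is compared against. The flat Q37 read is an idealised assumption: the 25th percentile only says that 75% of reads are at least this good at each position, so individual reads can be worse.

```python
def error_probability(q):
    """Convert a Phred quality score Q into the probability of a wrong base call."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in quals)


# Idealised read (assumption): every one of the 250 bases sits exactly at Q37.
ideal_read = [37] * 250
print(f"expected errors at Q37 over 250 bases: {expected_errors(ideal_read):.3f}")
```

For such an idealised read the expected error count comes out at roughly 0.05, far below the default of 2.0 that QIIME 2 uses for --p-max-ee-f and --p-max-ee-r, so on quality alone little would need to be trimmed.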

timanix · February 17, 2024, 6:55pm

Hello!

Happy birthday then

Counts are great!

I would truncate the F reads at 235/240 and the R reads at 225 and see what the output looks like. I would also increase the ee (max expected errors) values, at least for the reverse reads.
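
As a rough illustration of that suggestion, the Python sketch below only assembles and prints a qiime dada2 denoise-paired command; it does not run it. The flags are the ones the plugin exposes for truncation and max expected errors. The input name demux-full.qza is borrowed from later in this thread, while the output file names and the reverse max-ee of 4.0 are hypothetical placeholders, not recommended values.

```python
import shlex


def denoise_paired_cmd(trunc_f, trunc_r, max_ee_f=2.0, max_ee_r=2.0,
                       demux="demux-full.qza"):
    """Build (but do not execute) a qiime dada2 denoise-paired command line."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_f),
        "--p-trunc-len-r", str(trunc_r),
        "--p-max-ee-f", str(max_ee_f),   # 2.0 is the plugin default
        "--p-max-ee-r", str(max_ee_r),
        # Output names below are hypothetical placeholders.
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


# Assumption: 4.0 is just an example of a relaxed reverse-read threshold.
cmd = denoise_paired_cmd(240, 225, max_ee_r=4.0)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or a cluster job script once the values have been adjusted.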

When you proceed to taxonomy assignment, check out the RESCRIPt plugin for QIIME 2, which will help you train your own classifier based on the primers you are working with.

Best,

Abul_Bashar · February 19, 2024, 5:43am

Hi @timanix thanks a lot for the insight!

I've been running the QIIME 2 pipeline on my university server. I have been trying to denoise with DADA2 using truncation lengths of 240 and 225, respectively. Each time, it ran for 4-5 hrs and finished with no result. Should I subsample my data to speed up the run?

qiime demux subsample-paired \
  --i-sequences demux-full.qza \
  --p-fraction 0.3 \
  --o-subsampled-sequences demux-subsample.qza

In that case, would subsampling cause me to lose data, and what fraction should I use?

Thanks again for your response.

timanix · February 19, 2024, 7:47am

Could you please elaborate? Does "no result" mean that the job was killed with no output produced, or that you got output but all/most sequences were filtered out? In the first case, you should request more resources from the cluster (make sure you have more RAM and storage for temp files); in the second, try reducing the 240 parameter to 230.
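
When shortening a truncation length, it is worth checking that the forward and reverse reads would still overlap enough to merge. The sketch below does that arithmetic in Python. The amplicon length of 400 bp for 519F/915R is an assumed placeholder (it is not stated in this thread and depends on whether primers are still in the reads), and the minimum overlap of 12 is, to my knowledge, the q2-dada2 default for --p-min-overlap; both should be checked against your own data and `qiime dada2 denoise-paired --help`.

```python
MIN_OVERLAP = 12            # assumed q2-dada2 default for --p-min-overlap
ASSUMED_AMPLICON_LEN = 400  # hypothetical amplicon length in bp, replace with your own


def overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads after merging."""
    return trunc_f + trunc_r - amplicon_len


# Compare the two forward truncation lengths discussed in the thread.
for trunc_f in (240, 230):
    ov = overlap(trunc_f, 225, ASSUMED_AMPLICON_LEN)
    status = "ok" if ov >= MIN_OVERLAP else "too short to merge"
    print(f"trunc-len-f={trunc_f}, trunc-len-r=225 -> overlap {ov} bp ({status})")
```

Under these assumptions both settings leave a comfortable overlap, so dropping from 240 to 230 would trade a little overlap for removing more of the lower-quality tail.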

Subsampling does exactly what its name says: it subsamples the sequences from each sample down to the specified fraction. I would avoid it if possible (by requesting more RAM and storage from the cluster). If that is not possible, I would start with 0.9 and try DADA2, then decrease the fraction by 0.1 with every attempt.
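
To get a feel for what each fraction costs, this small Python sketch lists the schedule described above (0.9, then down by 0.1 per attempt) and the approximate number of reads left per sample, using the minimum (33k) and mean (96k) read counts reported earlier in the thread. The counts are estimates only, since the actual number retained by subsampling will vary a little from run to run.

```python
MIN_READS = 33_000   # minimum reads per sample reported in the thread
MEAN_READS = 96_000  # mean reads per sample reported in the thread


def fractions(start=0.9, step=0.1, stop=0.1):
    """Fractions to try in order: start, start - step, ... down to stop."""
    n = round((start - stop) / step) + 1
    return [round(start - i * step, 1) for i in range(n)]


for frac in fractions():
    print(f"fraction {frac}: ~{round(MIN_READS * frac)} reads (smallest sample), "
          f"~{round(MEAN_READS * frac)} reads (average sample)")
```

At a fraction of 0.3, for example, the smallest sample would keep only about 9,900 reads, which is why starting high and stepping down is the more conservative route.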

Best,