PacBio 16S full-length results

Hello everyone, I used DADA2 to denoise full-length PacBio 16S data, and only about 52% of the reads remain, meaning roughly half of the reads were filtered out. Is there any way to increase the number of non-chimeric reads? :star_struck:

Below are my run parameters (almost all defaults, since the sequence quality is good :blush:):
--p-min-len 1000 \
--p-max-len 1600 \
--p-max-ee 2 \
--p-chimera-method consensus \
--p-trunc-len 0 \
--p-front AGRGTTYGATYMTGGCTCAG --p-adapter RGYTACCTTGTTACGACTT \
--p-pooling-method pseudo

| sample-id | input | primer-removed | percentage of input primer-removed | filtered | percentage of input passed filter | denoised | non-chimeric | percentage of input non-chimeric |
|---|---|---|---|---|---|---|---|---|
| #q2:types | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric |
| TEST-2-1 | 110107 | 96818 | 87.93 | 68197 | 61.94 | 66051 | 64843 | 58.89 |
| TEST-2-10 | 91515 | 78719 | 86.02 | 55035 | 60.14 | 52532 | 52285 | 57.13 |
| TEST-2-2 | 106974 | 91424 | 85.46 | 63453 | 59.32 | 60568 | 60245 | 56.32 |
| TEST-2-3 | 79565 | 67482 | 84.81 | 46097 | 57.94 | 41891 | 41196 | 51.78 |
| TEST-2-4 | 130730 | 118701 | 90.8 | 82551 | 63.15 | 80004 | 79610 | 60.9 |
| TEST-2-5 | 100513 | 87518 | 87.07 | 60225 | 59.92 | 56492 | 54668 | 54.39 |
| TEST-2-6 | 65963 | 55426 | 84.03 | 36095 | 54.72 | 30673 | 29070 | 44.07 |
| TEST-2-7 | 62975 | 53712 | 85.29 | 35325 | 56.09 | 29044 | 27983 | 44.44 |
| TEST-2-8 | 113017 | 99407 | 87.96 | 68709 | 60.8 | 66229 | 65375 | 57.85 |
| TEST-2-9 | 62697 | 52938 | 84.43 | 34858 | 55.6 | 28950 | 28095 | 44.81 |
| Control1 | 122802 | 105774 | 86.13 | 73771 | 60.07 | 70371 | 69612 | 56.69 |
| Control10 | 65815 | 56335 | 85.6 | 36811 | 55.93 | 32124 | 31870 | 48.42 |
| Control2 | 87008 | 74489 | 85.61 | 50744 | 58.32 | 46378 | 46034 | 52.91 |
| Control3 | 88636 | 75912 | 85.64 | 49556 | 55.91 | 44859 | 44449 | 50.15 |
| Control4 | 90676 | 78532 | 86.61 | 52777 | 58.2 | 48690 | 48496 | 53.48 |
| Control5 | 98532 | 84166 | 85.42 | 56824 | 57.67 | 52006 | 51682 | 52.45 |
| Control6 | 132235 | 113503 | 85.83 | 80783 | 61.09 | 79550 | 79170 | 59.87 |
| Control7 | 57607 | 48528 | 84.24 | 30199 | 52.42 | 25273 | 25053 | 43.49 |
| Control8 | 90543 | 76989 | 85.03 | 51824 | 57.24 | 47621 | 47056 | 51.97 |
| Control9 | 69478 | 59249 | 85.28 | 38333 | 55.17 | 34027 | 33461 | 48.16 |
| Model1 | 102228 | 87159 | 85.26 | 60575 | 59.25 | 58187 | 57810 | 56.55 |
| Model10 | 79340 | 66546 | 83.87 | 45283 | 57.07 | 41115 | 40382 | 50.9 |
| Model2 | 124623 | 106422 | 85.4 | 74589 | 59.85 | 72593 | 72375 | 58.08 |
| Model3 | 58893 | 50807 | 86.27 | 33940 | 57.63 | 26314 | 25334 | 43.02 |
| Model4 | 108932 | 92409 | 84.83 | 64621 | 59.32 | 61939 | 61535 | 56.49 |
| Model5 | 77426 | 66561 | 85.97 | 46222 | 59.7 | 42811 | 42212 | 54.52 |
| Model6 | 128813 | 114801 | 89.12 | 79910 | 62.04 | 77416 | 75340 | 58.49 |
| Model7 | 130817 | 111453 | 85.2 | 78068 | 59.68 | 74622 | 73891 | 56.48 |
| Model8 | 55531 | 46416 | 83.59 | 31201 | 56.19 | 27240 | 26667 | 48.02 |
| Model9 | 69684 | 58807 | 84.39 | 38825 | 55.72 | 33134 | 32213 | 46.23 |

Here is the quality summary for the full-length 16S sequences:
forward-seven-number-summaries.tsv (162.4 KB)

Hi @Rainjie,

Altering the value for --p-min-fold-parent-over-abundance might help. See here.
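
For anyone wondering what this parameter changes: as I understand the DADA2 documentation, a sequence is only considered as a possible chimera parent if it is more than this many times as abundant as the sequence being checked. Below is a simplified Python illustration of that rule, not DADA2's actual code. The function name `candidate_parents` and the ASV abundances are made up for the example.

```python
def candidate_parents(abundances, query, min_fold):
    """Names of sequences abundant enough to be considered parents of `query`.

    Simplified rule (assumption): a sequence qualifies only if its abundance
    is more than min_fold times the abundance of the query sequence.
    """
    threshold = min_fold * abundances[query]
    return sorted(
        name
        for name, count in abundances.items()
        if name != query and count > threshold
    )


# Hypothetical ASV abundances, made up for illustration only.
abundances = {"asv_a": 900, "asv_b": 400, "asv_c": 120, "asv_x": 50}

for fold in (1, 8, 16):
    # A two-parent chimera needs at least two candidate parents.
    print(fold, candidate_parents(abundances, "asv_x", fold))
```

In this toy case, raising the value from 1 to 8 leaves only one candidate parent for `asv_x`, so it could no longer be flagged as a two-parent chimera. Raising the value can therefore only rescue reads that were being removed at the chimera step.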

Hi, thanks for your reply. However, adding the parameter --p-min-fold-parent-over-abundance 8/16 did not improve the results. Nearly half of the reads still did not pass the chimera filter.
Any suggestions? :blush:

Hi @Rainjie,

Sadly, I've never worked with PacBio data before, so that was my only quick suggestion. Not sure what the quality plots look like, but would truncation help at all? Perhaps someone in the forum with more experience working with PacBio data can help here?

-Mike

Hi Mike,

Thank you for your quick reply and for the suggestion about truncation. I really appreciate you taking the time to think about it. I’ll keep experimenting and also see if anyone else with PacBio experience can offer additional advice.

Thanks again, and have a nice day!

Hello @Rainjie,

Do you still require additional assistance with this?

Yes, any other suggestions?
Adding --p-min-fold-parent-over-abundance 8/16 did not improve the results.

Can you please run qiime demux summarize on your data and post the resulting visualization here? That will help us figure out where your parameters should be set.
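
One way to read a quality summary against `--p-max-ee` is to work out how many expected errors a read of a given length accumulates. DADA2 defines expected errors as the sum of the per-base error probabilities, 10^(-Q/10). The Python sketch below uses an assumed read length of 1450 bp (not a value from this thread) and the `--p-max-ee 2` setting quoted above. The function names are made up for the example.

```python
import math


def expected_errors(quals):
    """Expected errors of a read: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def min_uniform_quality(max_ee, read_len):
    """Uniform Phred score at which a read of read_len bases has exactly max_ee expected errors."""
    return -10 * math.log10(max_ee / read_len)


READ_LEN = 1450  # assumed read length, not a value from the thread
MAX_EE = 2       # value of --p-max-ee used in the thread

# Minimum flat quality score a read of this length needs to pass the filter.
print(round(min_uniform_quality(MAX_EE, READ_LEN), 1))

# A read passes when its expected errors do not exceed MAX_EE.
print(expected_errors([30] * READ_LEN) <= MAX_EE)
print(expected_errors([28] * READ_LEN) <= MAX_EE)
```

Under these assumptions a read needs an average quality of roughly Q28.6 or better to pass, so a uniform Q30 read is kept and a uniform Q28 read is discarded. Comparing that threshold with the per-position quality summary may show whether the expected-error filter could explain reads lost at the filtering step.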

I’m using dada2 denoise-ccs. Given the expected amplicon length of the full-length 16S rRNA gene, I’m currently using:
--p-min-len 1000 \
--p-max-len 1600 \
--p-max-ee 2 \
--p-chimera-method consensus \
--p-trunc-len 0
Any suggestions are greatly appreciated—thanks!


forward-seven-number-summaries.tsv (162.4 KB)
per-sample-fastq-counts.tsv (478 Bytes)
The denoising-stats file from dada2 denoise-ccs is shown above.

Hi, I'm sorry to bother you, but I really need your help.

Hi @Rainjie,
Thank you for your patience!

I am not very familiar with using dada2 on PacBio sequences. What I notice in your post-dada2 stats is that the largest loss happens at the filtering step: the percentage of input retained drops from ~85% after primer removal to ~60% after filtering. I am wondering if adjusting your --p-min-len may help?

Have you tried a lower min sequence length?
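
To make the per-step losses easier to see, here is a small Python sketch that takes three rows copied from the denoising-stats table earlier in the thread and reports, for each step, the share of input reads lost. The step labels and the function name `step_losses` are my own, not QIIME 2 terms.

```python
# Rows copied from the denoising-stats table posted earlier in the thread:
# (sample-id, input, primer-removed, filtered, denoised, non-chimeric)
ROWS = [
    ("TEST-2-1", 110107, 96818, 68197, 66051, 64843),
    ("TEST-2-6", 65963, 55426, 36095, 30673, 29070),
    ("Control7", 57607, 48528, 30199, 25273, 25053),
]

# Labels chosen for this example; one per transition between count columns.
STEPS = ["primer removal", "filtering", "denoising", "chimera removal"]


def step_losses(row):
    """Return {step: percent of input reads lost at that step}."""
    _, *counts = row
    total = counts[0]
    return {
        step: round(100.0 * (before - after) / total, 2)
        for step, before, after in zip(STEPS, counts, counts[1:])
    }


for row in ROWS:
    losses = step_losses(row)
    worst = max(losses, key=losses.get)
    print(row[0], losses, "largest loss:", worst)
```

For these three samples the filtering step accounts for the largest loss (about 26% to 32% of input), while chimera removal accounts for less than 3%.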

Thanks a lot! We’ll try it out first and see how it goes. I’ll follow up in about two days (after the weekend) :star_struck:

Hi @cherman2,
Thanks for your suggestion!
Despite repeated attempts, we were unable to improve the outcome. With the same raw data and otherwise identical parameters, widening the allowed sequence-length range to 800–1600 bp did not change the results: approximately 40% of sequences were still removed during filtering. We have therefore proceeded with the downstream analyses.

All the best
Rainjie
