How to visualise the length of representative sequences after dada2 filtering

Hajar · October 4, 2018, 10:03am

I am running DADA2 filtering without truncation, because truncating discarded 70%-80% of my reads, which was not acceptable. So I decided to trim the sequences to a common length after obtaining the representative sequences. I searched the forum for how to do that but did not find an answer.
Here is the command I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-chimera-method none \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

Filtering

…
2) Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 388236 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
Convergence after 5 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 560079 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
Convergence after 4 rounds.

Denoise remaining samples Not all sequences were the same length.
Not all sequences were the same length.
.Not all sequences were the same length.

Does anyone know how to visualise the lengths of all my sequences, and how to trim them to the same length before taxonomic analysis?

Thanks a lot.

thermokarst · October 4, 2018, 1:24pm

Hello @Hajar!

Does that mean that your reads had non-biological sequence present in it when processed in DADA2? If so, that is a problem, and will need to be addressed.

Taking a step back to your question - there isn't a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:

AAACGT
AAACGA

But if you take off the last nucleotide:

AAACG
AAACG

So now your feature table is all wrong, because these two features are now the same - see the issue here?

Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let's take our time to think about why that was happening, then we can move forward. Thanks!
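
To make the collision concrete, here is a minimal Python sketch (plain Python, not a QIIME 2 command) using the two toy sequences above. The read counts and the helper name `truncate_features` are made up for illustration only.

```python
from collections import defaultdict

def truncate_features(table, length):
    """Truncate each feature sequence to `length` and sum the counts of features that collide."""
    collapsed = defaultdict(int)
    for seq, count in table.items():
        collapsed[seq[:length]] += count
    return dict(collapsed)

# Hypothetical counts for the two toy features from the example above.
table = {"AAACGT": 120, "AAACGA": 30}

trimmed = truncate_features(table, 5)
print(trimmed)  # {'AAACG': 150}
```

Two distinct features have become one, and their counts can no longer be told apart.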

Hajar · October 4, 2018, 3:21pm

Hello Matthew thermokarst,
Thanks for your quick reply and clarifications.

"Does that mean that your reads had non-biological sequence present in it when processed in DADA2? If so, that is a problem, and will need to be addressed."
No, my reads do not have any non-biological sequences or any chimeras.

"Taking a step back to your question - there isn’t a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:"

Yes, for sure they are not the same. What I mean is: after alignment, is there a way to trim alligned-seq.qza and use it in all further analysis instead of representative_sequences.qza?
qiime alignment mafft \
  --i-sequences representative_sequences.qza \
  --o-alignment alligned-seq.qza

Or do you think it would be OK to run the diversity and taxonomy analyses with sequences of different lengths?

"Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let’s take our time to think about why that was happening, then we can move forward. Thanks"

Regarding the previous filtering, I used this:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 250 \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose
and I got this:

sampleid	Filtered	NoFiltered
sample149	996849	1729815
sample148	995030	1999153
sample137	943225	1821175
sample193	786771	1588410
sample68	743510	1738200
sample135	727715	1309344
sample54	724839	1497721
sample202	718309	2075071
sample46	701031	1363918
sample76	656272	1280884
sample60	634270	1349454
sample122	606535	1592833
sample166	606291	1183932
sample96	598620	1256152
sample147	591489	1038659
sample75	578145	1055157
sample120	577642	1060561
sample144	572054	1150436
sample134	566249	1075430
sample145	550086	877663
sample189	538855	1112052
The length restriction discarded many of my reads, so I cannot afford to use this command.
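
For a sense of scale, the share of reads retained can be worked out directly from the table above. This is a minimal Python sketch using three rows copied from the table. It assumes the NoFiltered column holds the read count before filtering, which the post does not state explicitly; the function and variable names are illustrative only.

```python
# (Filtered, NoFiltered) counts copied from three rows of the table above.
# Assumption: NoFiltered is the number of reads before filtering.
counts = {
    "sample149": (996849, 1729815),
    "sample202": (718309, 2075071),
    "sample145": (550086, 877663),
}

def percent_retained(filtered, total):
    """Percentage of input reads that survived filtering."""
    return 100.0 * filtered / total

retained = {s: round(percent_retained(f, t), 1) for s, (f, t) in counts.items()}
for sample, pct in retained.items():
    print(f"{sample}: {pct}% retained, {round(100 - pct, 1)}% discarded")
```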

thanks a lot.

thermokarst · October 4, 2018, 3:27pm

Thanks @Hajar,

You missed one critical piece of information I asked for, and I can't comment until I see it: your demux summarize viz.

Hajar · October 4, 2018, 3:37pm

Sorry, I do not know exactly which part is needed; please see attached.

I used the V4-V5 region of 16S; the expected amplicon length is 466 bp.
Let me know if this is what you are looking for:

Hajar · October 4, 2018, 5:12pm

thermokarst · October 4, 2018, 9:32pm

Hey @Hajar!

In the future please just attach the QZV - these photos of your monitor are very difficult to read.

I took a look through and your demux seqs look good!

Okay, so the results from running your untrimmed reads through DADA2 seem reasonable.

With that out of the way, back to the main question:

No - why would you do this? I have been asking around and I haven't been able to figure out a reasonable workflow that would do this to your ASVs. Can you provide some context?

From what I see, you should be fine to proceed with your FeatureTable[Frequency] & FeatureData[Sequence] as-is, no trimming necessary...
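
If you still want to look at the length distribution of the representative sequences before moving on, one option is to export the FeatureData[Sequence] artifact to a FASTA file and tally the record lengths. Below is a minimal Python sketch, assuming the sequences are already available as FASTA text. The records are made-up placeholders, and `fasta_lengths` is a hypothetical helper, not part of QIIME 2.

```python
from collections import Counter
import io

def fasta_lengths(handle):
    """Return a list with the length of each record read from a FASTA file handle."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

# Made-up placeholder records standing in for an exported FASTA file.
example = io.StringIO(
    ">feature1\nACGTACGT\nACGT\n>feature2\nACGTAC\n>feature3\nACGTACGTACGT\n"
)

lengths = fasta_lengths(example)
histogram = Counter(lengths)
for length in sorted(histogram):
    print(f"{length} bp: {'#' * histogram[length]} ({histogram[length]})")
```

With a real file, replace the `io.StringIO` object with an open file handle. This only reports lengths; it does not change the sequences or the feature table.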

Hajar · October 5, 2018, 8:34am

morning thermokarst,

Thanks for your reply about the demux summary; it is a big relief to me. So the data is OK.
Sorry for the inconvenience; I did not know how to attach the file from my HPC account.

Do you think it would be fine to proceed with my FeatureData[Sequence] without trimming? My supervisor suggested trimming before any taxonomic analysis to make the assignment as accurate as possible and to avoid false-positive assignments caused by the short length of some representative sequences.

Do you think length variation in my FeatureData[Sequence] won’t impact alpha and beta diversity?
Thanks a lot for your help and advice.

Hajar

thermokarst · October 5, 2018, 9:54pm

Yes!

No, I don't. I do think that trimming them will impact your analysis in ways that might be undesirable.

Keep us posted!

Nicholas_Bokulich · October 17, 2018, 7:04pm

An off-topic reply has been split into a new topic: Length-based sorting of rep seqs?

Please keep replies on-topic in the future.
