How to visualise the length of representative sequences after dada2 filtering

I am running DADA2 filtering without truncation, because truncating discarded 70%-80% of my reads, which was not acceptable. So I decided to trim the length after I got the representative sequences. I searched the forum for how to do that but did not find an answer.
Here is the command I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-chimera-method none \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

1) Filtering

2) Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 388236 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
Convergence after 5 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 560079 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
Convergence after 4 rounds.

3) Denoise remaining samples
Not all sequences were the same length.
Not all sequences were the same length.
.Not all sequences were the same length.

Does anyone know how I can visualise the lengths of all my sequences, and also trim them to the same length before taxonomic analysis?

Thanks a lot.

Hello @Hajar!

Does that mean that your reads had non-biological sequence present in it when processed in DADA2? If so, that is a problem, and will need to be addressed.

Taking a step back to your question - there isn't a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:

AAACGT
AAACGA

But if you take off the last nucleotide:

AAACG
AAACG

So now your feature table is all wrong, because these two features are now the same - see the issue here?
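To make the collision concrete, here is a minimal sketch in plain Python (not a QIIME 2 API). The feature IDs `feature_a` and `feature_b` and the helper `collapse_after_trim` are made up for illustration; the two sequences are the ones from the example above.

```python
from collections import defaultdict


def collapse_after_trim(features, trim_len):
    """Group feature IDs by their sequence truncated to trim_len bases."""
    groups = defaultdict(list)
    for feature_id, seq in features.items():
        groups[seq[:trim_len]].append(feature_id)
    return dict(groups)


# Hypothetical feature IDs; the sequences come from the example above.
features = {"feature_a": "AAACGT", "feature_b": "AAACGA"}

before = collapse_after_trim(features, 6)  # full length: two distinct sequences
after = collapse_after_trim(features, 5)   # last base removed: they collide

print(len(before), "distinct sequences at full length")
print(len(after), "distinct sequence(s) after removing the last base")
```

After trimming, both IDs map to the same sequence, but the feature table still has two separate rows for them, which is why the table no longer matches the sequences.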

Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let's take our time to think about why that was happening, then we can move forward. Thanks! :qiime2: :t_rex:

Hello Matthew thermokarst,
Thanks for your quick reply and clarifications.

> Does that mean that your reads had non-biological sequence present in it when processed in DADA2? If so, that is a problem, and will need to be addressed.

No, my reads do not have any non-biological sequences or any chimeras.

> Taking a step back to your question - there isn’t a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:

Yes, for sure they are not the same, but what I mean is: after alignment, is there a way to trim alligned-seq.qza and use it in all further analysis instead of representative_sequences.qza?
qiime alignment mafft \
  --i-sequences representative_sequences.qza \
  --o-alignment alligned-seq.qza

Or do you think it would be OK to do the diversity and taxonomy analyses with sequences of different lengths?

> Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let’s take our time to think about why that was happening, then we can move forward. Thanks

Regarding the previous filtering, I used this:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 250 \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose
and I got this:

sampleid   Filtered  NoFiltered
sample149    996849     1729815
sample148    995030     1999153
sample137    943225     1821175
sample193    786771     1588410
sample68     743510     1738200
sample135    727715     1309344
sample54     724839     1497721
sample202    718309     2075071
sample46     701031     1363918
sample76     656272     1280884
sample60     634270     1349454
sample122    606535     1592833
sample166    606291     1183932
sample96     598620     1256152
sample147    591489     1038659
sample75     578145     1055157
sample120    577642     1060561
sample144    572054     1150436
sample134    566249     1075430
sample145    550086      877663
sample189    538855     1112052
The length restriction discarded many of my reads, so I cannot afford to use this command.
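To put a number on the loss per sample, a quick sketch in plain Python using the counts above. It assumes that "Filtered" is the number of reads that survived filtering and "NoFiltered" is the number of input reads; that reading of the column names is an assumption.

```python
# Read counts copied from the table above as (filtered, unfiltered).
# Assumption: "Filtered" = reads kept, "NoFiltered" = reads going in.
counts = {
    "sample149": (996849, 1729815),
    "sample148": (995030, 1999153),
    "sample137": (943225, 1821175),
    "sample193": (786771, 1588410),
    "sample68": (743510, 1738200),
    "sample135": (727715, 1309344),
    "sample54": (724839, 1497721),
    "sample202": (718309, 2075071),
    "sample46": (701031, 1363918),
    "sample76": (656272, 1280884),
    "sample60": (634270, 1349454),
    "sample122": (606535, 1592833),
    "sample166": (606291, 1183932),
    "sample96": (598620, 1256152),
    "sample147": (591489, 1038659),
    "sample75": (578145, 1055157),
    "sample120": (577642, 1060561),
    "sample144": (572054, 1150436),
    "sample134": (566249, 1075430),
    "sample145": (550086, 877663),
    "sample189": (538855, 1112052),
}

# Fraction of input reads retained after filtering, per sample.
retained = {
    sample: filtered / unfiltered
    for sample, (filtered, unfiltered) in counts.items()
}

for sample, fraction in retained.items():
    print(f"{sample}\t{fraction:.1%} retained")
```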

Thanks a lot.

Thanks @Hajar,

You missed one critical piece of information I asked for, and I can't comment until I see this:

Sorry, I do not know exactly which part is needed; please see the attached.

I used the V4-V5 region of 16S; the expected amplicon length is 466 bp.
Let me know if this is what you are looking for:

Hey @Hajar!

In the future please just attach the QZV - these photos of your monitor are very difficult to read.

I took a look through and your demux seqs look good!

Okay, so your untrimmed reads through DADA2 seem reasonable.

With that out of the way, back to the main question:

No - why would you do this? I have been asking around and I haven't been able to figure out a reasonable workflow that would do this to your ASVs --- can you provide some context?

From what I see, you should be fine to proceed with your FeatureTable[Frequency] & FeatureData[Sequence] as-is, no trimming necessary...
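If you still want to look at the length distribution of the representative sequences before moving on, here is a rough sketch in plain Python, with no QIIME 2 API involved. It assumes the sequences have first been exported to a FASTA file (for example with `qiime tools export`). The helper names and the demo records below are made up for illustration.

```python
import io
from collections import Counter


def sequence_lengths(handle):
    """Return {sequence_id: length} for a FASTA file-like object."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def length_histogram(lengths, bin_width=10):
    """Count sequences per length bin, keyed by the bin's lower bound."""
    return Counter((n // bin_width) * bin_width for n in lengths.values())


# Made-up records standing in for an exported FASTA file.
demo = io.StringIO(
    ">asv1\nACGTACGTAC\nGT\n>asv2\nACGTA\n>asv3\nACGTACGTACGTACG\n"
)
bin_width = 10
lengths = sequence_lengths(demo)
hist = length_histogram(lengths, bin_width)

# Crude text histogram: one '#' per sequence in each length bin.
for start in sorted(hist):
    print(f"{start:>4}-{start + bin_width - 1:<4} {'#' * hist[start]}")
```

To run this on real data, replace `demo` with an open file handle to the exported FASTA. The exported file is, as far as I know, named `dna-sequences.fasta`, but check your export directory.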

Morning thermokarst,

Thanks for this reply about the demux summary; it is really a big relief to me. So the data is OK :slight_smile:.
Sorry for the inconvenience caused, I did not know how to attach the file from my HPC account.

Do you think it would be fine to proceed with my FeatureData[Sequence] without trimming? My supervisor suggested trimming before any taxonomic analysis to ensure the assignment is as accurate as possible, and to avoid false-positive assignments due to the short length of some representative sequences.

Do you think length variation in my FeatureData[Sequence] won’t impact alpha and beta diversity?
Thanks a lot for your help and advice.

Hajar

Yes!

No, I don't. I do think that trimming them will impact your analysis in ways that might be undesirable.

Keep us posted! :qiime2: :t_rex:

