I am using DADA2 filtering without truncation, because truncation discarded 70-80% of my reads, which was not acceptable. So I decided to trim to a common length after I got the representative sequences. I looked in the forum for how to do that, but I did not find an answer.
Here is the command I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-chimera-method none \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose
R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0
Filtering
…
2) Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 388236 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
Convergence after 5 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 1381178 reads in 560079 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
Convergence after 4 rounds.
Denoise remaining samples Not all sequences were the same length.
Not all sequences were the same length.
.Not all sequences were the same length.
Does anyone know how to visualise the lengths of all my sequences, and how to trim them to the same length before taxonomic analysis?
Does that mean that your reads had non-biological sequence present in them when processed in DADA2? If so, that is a problem, and it will need to be addressed.
Taking a step back to your question - there isn't a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:
AAACGT
AAACGA
But if you take off the last nucleotide:
AAACG
AAACG
So now your feature table is all wrong, because these two features are now the same - see the issue here?
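A minimal sketch (in Python, with made-up feature counts) of how trimming silently merges distinct features:

```python
from collections import Counter

# Two distinct ASVs that differ only at the last position (hypothetical counts).
features = {"AAACGT": 120, "AAACGA": 80}

# Trim the last nucleotide from each feature and re-tally the counts.
trimmed = Counter()
for seq, count in features.items():
    trimmed[seq[:-1]] += count

# The two features collapse into one: {'AAACG': 200}
print(dict(trimmed))
```

The feature table now reports a single feature with 200 observations, and the distinction between the two original ASVs is unrecoverable.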
Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let's take our time to think about why that was happening, then we can move forward. Thanks!
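P.S. If you just want to eyeball your sequence lengths in the meantime, you can export the FeatureData[Sequence] artifact with qiime tools export and summarize the resulting FASTA with a few lines of Python. A rough sketch (the exported file name may differ depending on your QIIME 2 version):

```python
from collections import Counter

def length_distribution(fasta_path):
    """Count how many sequences have each length in a FASTA file."""
    lengths = Counter()
    seq_parts = []
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts; tally the previous one, if any.
                if seq_parts:
                    lengths[len("".join(seq_parts))] += 1
                seq_parts = []
            elif line:
                seq_parts.append(line)
    if seq_parts:
        lengths[len("".join(seq_parts))] += 1
    return lengths

# After exporting, e.g. (exact flags vary by QIIME 2 version):
#   qiime tools export representative_sequences.qza --output-dir exported
# print(sorted(length_distribution("exported/dna-sequences.fasta").items()))
```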
Hello @thermokarst,
Thanks for your quick reply and clarifications.
Does that mean that your reads had non-biological sequence present in them when processed in DADA2? If so, that is a problem, and it will need to be addressed.
No, my reads do not contain any non-biological sequences or chimeras.
Taking a step back to your question - there isn’t a way to trim the FeatureData[Sequence] type - and I think for good reason, since that is effectively altering the identity of a feature, after that feature has been identified. You could imagine two features that are different:
Yes, for sure they are not the same. But what I mean is: after alignment, is there a way to trim alligned-seq.qza and use those sequences in all further analyses instead of representative_sequences.qza?
qiime alignment mafft \
  --i-sequences representative_sequences.qza \
  --o-alignment alligned-seq.qza
Or do you think it would be OK to do the diversity and taxonomy analyses with sequences of different lengths?
Okay, so as I mentioned above, we need to double-check that the denoising portion of this is working as expected. How about you send along your demux summarize viz, and the command you ran previously that resulted in such a huge loss of reads. Let's take our time to think about why that was happening, then we can move forward. Thanks!
Regarding the previous filtering, I used this:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-lane2-all.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 250 \
  --p-n-threads 0 \
  --output-dir dada2 \
  --verbose
and I got this:
sampleid    Filtered    Unfiltered
sample149   996,849     1,729,815
sample148   995,030     1,999,153
sample137   943,225     1,821,175
sample193   786,771     1,588,410
sample68    743,510     1,738,200
sample135   727,715     1,309,344
sample54    724,839     1,497,721
sample202   718,309     2,075,071
sample46    701,031     1,363,918
sample76    656,272     1,280,884
sample60    634,270     1,349,454
sample122   606,535     1,592,833
sample166   606,291     1,183,932
sample96    598,620     1,256,152
sample147   591,489     1,038,659
sample75    578,145     1,055,157
sample120   577,642     1,060,561
sample144   572,054     1,150,436
sample134   566,249     1,075,430
sample145   550,086     877,663
sample189   538,855     1,112,052
The length restriction discarded many of my reads, so I cannot afford to use this command.
In the future please just attach the QZV - these photos of your monitor are very difficult to read.
I took a look through and your demux seqs look good!
Okay, so your untrimmed reads run through DADA2 look reasonable.
With that out of the way, back to the main question:
No - why would you do this? I have been asking around and I haven't been able to find a reasonable workflow that would do this to your ASVs. Can you provide some context?
From what I see, you should be fine to proceed with your FeatureTable[Frequency] & FeatureData[Sequence] as-is, no trimming necessary...
Thanks for this reply about the demux summary; it is really a big relief to me. So the data is OK.
Sorry for the inconvenience; I did not know how to attach the file from my HPC account.
Do you think it would be fine to proceed with my FeatureData[Sequence] without trimming? My supervisor suggested trimming before any taxonomic analysis, both to ensure the assignment is as accurate as possible and to avoid false-positive assignments due to the short length of some representative sequences.
Do you think length variation in my FeatureData[Sequence] won’t impact alpha and beta diversity?
Thanks a lot for your help and advice.