Hardly any features found

We received FASTQ files that were barcoded via the reverse reads. For the demultiplexing step we therefore used the rev-comp command, and got tens of thousands of reads for each sample.

The problem is that in the next dada2 step very few features were found and even those were found only in ~5 reads (out of hundreds of thousands!). Obviously something went wrong there…

My question is: Is there a different command for the dada2 step when using rev-comp for demultiplexing?

Also, could the reason be that the insert we sequenced was quite short (169 bp), and that I trimmed the reverse reads to about half their length because of quality?

Hi @Naamah!

Definitely sounds like something is going wrong. You should check out the stats file that dada2 produces to see where your sequences are being dropped, and how many are being dropped.
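As a rough illustration of how to read those per-step counts, here is a minimal Python sketch that finds the step losing the largest share of reads. The step names and numbers are made up for illustration; they are not from this dataset and are not guaranteed to match the column names in your stats file.

```python
def biggest_drop(counts):
    """Return (step, fraction_lost) for the step that loses the most reads.

    counts: ordered list of (step_name, reads_remaining) pairs,
    starting with the input read count.
    """
    worst_step, worst_loss = None, 0.0
    for (_, before), (step, after) in zip(counts, counts[1:]):
        if before == 0:
            break  # no reads left, so later steps cannot lose any
        loss = (before - after) / before
        if loss > worst_loss:
            worst_step, worst_loss = step, loss
    return worst_step, worst_loss


# Hypothetical counts for one sample (illustrative only).
example = [
    ("input", 50000),
    ("filtered", 46000),
    ("denoised", 45000),
    ("merged", 5),
    ("non-chimeric", 5),
]

step, loss = biggest_drop(example)
print(f"Largest loss at '{step}': {loss:.1%} of reads dropped")
```

If nearly all reads disappear at the merging step, as in this made-up sample, that points to insufficient overlap rather than to quality filtering or chimera removal.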

Based on your description, my guess is that this is an issue with getting paired-end reads to merge. There should be at least ~20 nt overlap between forward and reverse reads.

So the majority of your reads are being dropped, and the few features you see at the other end are most likely unusually short amplicons that still manage to merge. You will need to adjust your read trimming parameters so that the other reads can merge and you get a higher feature count. If you cannot keep your reads long enough to overlap, you will need to use either the forward or the reverse reads in your analysis as if they were single-end reads.
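To check whether a given pair of truncation lengths leaves enough overlap, a back-of-the-envelope calculation can help. This is only a sketch: it assumes the forward and reverse reads start at opposite ends of the amplicon, uses the ~20 nt rule of thumb mentioned above as the default threshold, and all lengths in the example are hypothetical.

```python
def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Nucleotides shared by forward and reverse reads after truncation.

    A negative value means there is a gap between the two reads.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=20):
    """True if the expected overlap reaches the chosen minimum.

    min_overlap=20 is the rule of thumb from this thread, not a
    value read from any tool's configuration.
    """
    return expected_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical lengths (illustrative only): a 250 nt amplicon.
amplicon = 250
for trunc_f, trunc_r in [(150, 75), (150, 130)]:
    overlap = expected_overlap(amplicon, trunc_f, trunc_r)
    print(trunc_f, trunc_r, overlap, can_merge(amplicon, trunc_f, trunc_r))
```

In this made-up case, truncating the reverse reads to 75 nt leaves a 25 nt gap, so nothing of the expected length can merge, while keeping 130 nt gives a 30 nt overlap.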

I hope that helps!

Thank you so much @Nicholas_Bokulich! It sure helped! When I redid the analysis as single-end it worked and gave thousands of features (as expected). Apparently the reverse reads were poor, so I only used half of their length, which was too short for any overlap with the forward reads. So all good, thanks!!

Unfortunately, this often happens! I have experienced the same issue with my data many times.

Good luck!
