Uneven coverage: subsample joined R1 and R2 reads, or only R1?


Thanks for the recent feedback about rarefaction/corrections using subsampling to handle uneven coverage among samples.
I was also experiencing difficulties with classification.

I think that by starting from the same number of joined R1 and R2 sequences in every sample (i.e. 30,000), I could manage to classify the composition of the samples.
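For anyone reading along, here is a minimal sketch of what "starting from the same number of sequences" means in practice: each sample's feature counts are subsampled without replacement down to a fixed depth. The `rarefy` helper, the OTU names and the counts are hypothetical illustrations, not the command any particular pipeline uses.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Hypothetical helper for illustration only.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand the counts into one entry per read, labelled by its feature (e.g. an OTU).
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    # Draw `depth` reads without replacement; a fixed seed makes the draw reproducible.
    drawn = random.Random(seed).sample(pool, depth)
    # Features that lose all their reads simply do not appear in the result.
    return dict(Counter(drawn))


if __name__ == "__main__":
    # Hypothetical OTU counts for one sample with 30,000 joined reads.
    sample = {"OTU_1": 18000, "OTU_2": 9000, "OTU_3": 3000}
    sub = rarefy(sample, depth=12000)
    print(sum(sub.values()))  # 12000
```

Drawing without replacement is what keeps the subsample a faithful random subset of the reads actually observed; the same depth is then applied to every sample so their counts become comparable.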

I understand that if it were possible to start with more sequences and even coverage, the results could be better; I just wanted to go into the details of the experiment to highlight possible improvements.

Since a certain percentage of the joined sequences tends to be filtered out due to quality (from 30,000 I retain nearly 12,000), I was wondering whether it could be a reasonable compromise to use only R1 (300 nt) for OTU clustering and classification. I imagine that we lose power, in the sense that the sequence is shorter; however, the quality could be better, so we could end up with more data if classification works...
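To weigh that compromise, a quick sketch like the one below could help: it computes the fraction of reads each sample keeps after quality filtering and the deepest level every sample could be evenly subsampled to. The function name and sample names are hypothetical, and only the 30,000 → 12,000 figure comes from the post above; you would plug in your own per-sample counts for the joined and the R1-only runs.

```python
def read_retention(
    input_reads: dict[str, int], retained: dict[str, int]
) -> tuple[dict[str, float], int]:
    """Per-sample fraction of reads kept after quality filtering, plus the
    largest depth every sample could be evenly subsampled to.

    Hypothetical helper for illustration only.
    """
    fractions = {}
    for sample, n_in in input_reads.items():
        n_kept = retained[sample]
        if n_in <= 0 or not 0 <= n_kept <= n_in:
            raise ValueError(f"{sample}: retained {n_kept} outside 0..{n_in}")
        fractions[sample] = n_kept / n_in
    # Even subsampling can go no deeper than the smallest retained sample.
    common_depth = min(retained[s] for s in input_reads)
    return fractions, common_depth


if __name__ == "__main__":
    # Joined R1+R2 figure from the post: 30,000 reads in, about 12,000 kept.
    fractions, depth = read_retention({"sample_A": 30000}, {"sample_A": 12000})
    print(fractions, depth)  # {'sample_A': 0.4} 12000
```

Running it once on the joined-read counts and once on the R1-only counts would show which strategy leaves a deeper common subsampling depth, which is the trade-off against the shorter R1 sequences.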

Would it be acceptable?

Thanks a lot

Hi @MichelaRiba,
I am not quite sure what process you went through to merge your reads. I personally try to do everything I can to keep my reverse reads, because they allow for better classification and let you investigate your data a bit more.

However, if there is no way to increase the number of reads that make it through quality control, using just your forward reads is fine. With shorter sequences your taxonomic resolution definitely decreases, but if you are losing too many reads, that might be your best option. It's really up to the analyst. So, to directly answer your question: yes, it is acceptable. I personally would try hard to make joined reads work, but if I couldn't without losing a lot of reads, I would switch to just forward reads.

I would also probably run the joined-read analysis (with fewer reads) and the forward-read analysis in parallel, so that I could compare them side by side at later steps.
Hope that helps!


Thank you very much for your kind expert opinion!
