Amplicon size is longer than paired-end read length

the_dummy · November 8, 2019, 10:59am

Well... This might sound stupid.

I have 2x300 MiSeq reads of the 18S region, but I have now learnt that the amplicon size is 634 bp. Even without truncation, the paired reads only cover 600 bp, so I'm short by about 34 bp.
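To make that arithmetic concrete, here is a minimal sketch in Python. The function name `overlap_bp` is just an illustrative helper, not part of any pipeline, and the check `overlap > 0` is a simplification: actual read-joining tools also require some minimum overlap, which depends on the tool.

```python
def overlap_bp(read_length: int, amplicon_length: int) -> int:
    """Overlap between the forward and reverse read, in bp.

    A negative value means there is a gap in the middle of the amplicon
    that neither read covers.
    """
    return 2 * read_length - amplicon_length


# Values from this post: 2x300 reads, 634 bp amplicon.
overlap = overlap_bp(read_length=300, amplicon_length=634)

# Simplifying assumption: any positive overlap counts as "joinable".
# Real mergers need a tool-dependent minimum overlap on top of this.
can_join = overlap > 0

print(overlap)   # -34
print(can_join)  # False
```

A negative result means a gap rather than an overlap, which is the 34 bp I'm missing.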

Maybe I've misunderstood this completely, so please help me understand it better.

Is there anything I can do to analyse the data, or is it all trash? Can someone point me in the right direction?

I'm completely lost on this subject...

jwdebelius · November 8, 2019, 12:36pm

Hi @the_dummy,

The data is perfectly fine. It just means that you can't join the paired ends, so you're working with reads of at most 300 bp. A lot of people work only with single-end data (including a couple of tutorials here). So, import/denoise/cluster as single-end data, then make a feature table, and keep going.

Best,
Justine

the_dummy · November 8, 2019, 1:38pm

Is it OK to analyse the reverse reads as single-end reads, too? Can I compare the results of the forward and reverse reads?

jwdebelius · November 8, 2019, 3:19pm

Hi @the_dummy,

You could work on the reverse reads alone; you can't combine forward and reverse reads easily. If you're running Illumina, then your reverse reads are likely lower quality anyway. So, I would just ignore your reverse reads and leave them be.

Best,
Justine

the_dummy · November 12, 2019, 6:16am

I was planning to add up the OTU tables from the forward and reverse reads. Yes, the reverse reads are lower quality, but don't the sequences from the reverse reads align to a different region, and so result in different OTUs? I would miss the 34 bp gap plus whatever length is lost to low quality, but at least I would cover the longest region possible with my reads.

Is there something wrong with my thought process?

jwdebelius · November 12, 2019, 12:41pm

Hi @the_dummy,

Are your samples such low quality that you need to add the extra reads? If you've got >5000 seqs/sample, you're probably good for analysis, and the benefit you'd get from combining the two sets of reads just isn't there. Again, the most common practice is to simply generate the table from the forward reads.
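As a rough illustration of that rule of thumb, here is a minimal Python sketch that splits samples by sequencing depth. The sample names and counts are made up purely for the example, and `split_by_depth` is a hypothetical helper rather than a function from any existing tool; only the 5000 seqs/sample figure comes from the paragraph above.

```python
MIN_DEPTH = 5000  # rule-of-thumb threshold mentioned above


def split_by_depth(counts, min_depth=MIN_DEPTH):
    """Return (kept, dropped) sample IDs.

    A sample is kept when it has more than `min_depth` sequences.
    """
    kept = sorted(s for s, n in counts.items() if n > min_depth)
    dropped = sorted(s for s, n in counts.items() if n <= min_depth)
    return kept, dropped


# Hypothetical per-sample sequence counts (illustrative only, not real data).
example_counts = {"sample_A": 15200, "sample_B": 4100, "sample_C": 9800}

kept, dropped = split_by_depth(example_counts)
print(kept)     # ['sample_A', 'sample_C']
print(dropped)  # ['sample_B']
```

If most of your samples land in the "kept" group using forward reads alone, that is usually a sign you don't need the reverse reads.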

Best,
Justine

the_dummy · November 12, 2019, 12:56pm

Well, there is a clear difference between the forward and reverse reads. I analysed them both, and I didn't expect this much difference in the results.

The feature count for the forward reads was around 15k, whereas for the reverse reads it was around 2k. You were right, there is no need to add the reverse-read results to the forward-read results.

Thanks, I've learned so much from you.