trimming read files containing a mixture of forward and reverse primers

adamsorbie · June 2, 2021, 1:20pm

Hi,

I'm working on a meta-analysis using 16S data obtained from the ENA/SRA. For this analysis I've written a DADA2-based pipeline to resolve ASVs. I do use qiime2 downstream but prefer to run DADA2 outside of qiime so I have a bit more freedom to fine-tune things. I recently obtained some files which seem to be in a very weird format, where the forward and reverse read files each contain a roughly 50/50 mixture of sequences carrying either the forward or the reverse primer. Unfortunately this means removal of primers is not simple, and I can't just trim a fixed number of bases from the left of each read. I've tried using cutadapt as well, but it doesn't seem to be working. Has anyone come across fastq files like the ones I have described before? If so, and you could offer some advice on how to deal with this, that would be great!

SoilRotifer · June 3, 2021, 6:05pm

Hi @adamsorbie,

I think this is what you're looking for:

and this bit:

-Mike

adamsorbie · June 4, 2021, 8:33am

Hi,

Thanks a lot! That looks very similar to the problem I'm having; I'm just having some trouble understanding exactly what this means.

As detailed information on primers can often be limited in certain publications, for the purposes of this meta-analysis I rewrote Benjamin Callahan's R code for detecting primers from this tutorial: DADA2 ITS Pipeline Workflow (1.8).

The output I get is something like this:

                 Forward Complement Reverse RevComp
FWD.ForwardReads   11227          0       0       1
FWD.ReverseReads    7752          0       0       1
REV.ForwardReads    7839          0       0       0
REV.ReverseReads   11353          0       0       2
                 Forward Complement Reverse RevComp
FWD.ForwardReads    9683          0       0       0
FWD.ReverseReads    8637          0       0       1
REV.ForwardReads    8831          0       0       0
REV.ReverseReads   10240          0       0       0
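
For readers who want to reproduce this kind of table outside R, here is a minimal Python sketch of the same idea: count how many reads contain a primer in each of its four orientations. It is not the DADA2 tutorial's code. The function names (`orientations`, `primer_hits`) are hypothetical, the primer sequences are copied from the read excerpt in this thread (the real primers may contain degenerate bases), and matching is exact, so IUPAC ambiguity codes are not handled.

```python
# Count reads that contain a primer in each of four orientations,
# mirroring the columns of the table above (exact matching only).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def orientations(primer):
    """Return the primer in its four orientations."""
    comp = primer.translate(COMPLEMENT)
    return {
        "Forward": primer,
        "Complement": comp,
        "Reverse": primer[::-1],
        "RevComp": comp[::-1],
    }

def primer_hits(primer, reads):
    """Number of reads containing each orientation of the primer."""
    return {
        name: sum(seq in read for read in reads)
        for name, seq in orientations(primer).items()
    }

# Primer sequences as they appear in the reads quoted in this thread
# (assumption: no degenerate positions).
FWD = "GTGCCAGCAGCCGCGGTAAT"  # labelled 514f in the thread
REV = "GGACTACCCGGGTTTCTAAT"  # labelled 805r in the thread

# Toy reads, shortened from the R1 excerpt in this thread.
r1_reads = [
    "TTCTCTGAGCC" + REV + "CCTGTTTGCTCC",
    "GGCAAGA" + FWD + "ACGGAGGGTGCA",
]

print("FWD in R1:", primer_hits(FWD, r1_reads))
print("REV in R1:", primer_hits(REV, r1_reads))
```

Hits for both primers in the "Forward" column of a single file, as in the toy data here, are the pattern shown in the tables above.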

To my understanding, this means that each read file contains a mixture of forward and reverse primers. I also quickly checked using grep, and the primers, whether forward or reverse, are present at the start of each read.

e.g. this is an excerpt from the R1 file:

@NS500181:15:H0YA6BGXX:4:23612:7477:3050 2:N:0
TTCTCTGAGCC 805r **GGACTACCCGGGTTTCTAAT** CCTGTTTGCTCCCCACGCTTTCGCACCTGAGCGTCAGTCTTCGTCCAGGGGGCCGCCTTCGCCACCGGTATTCCTCCAGATCTCTACGCATTTCACCGCTACACCTGGAATTCTACCCCC
+
<AAAAFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFF7FFFF.FFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFAFFAFFFFFFFFFF<FFFFFFF.FFFAF.F<FFFFFFFA<AFAF<7FF
@NS500181:15:H0YA6BGXX:4:23612:12806:3814 2:N:0
GGCAAGA 514f **GTGCCAGCAGCCGCGGTAAT** ACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTC
+
AAAAAFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFAFFFFFFF<FFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFAAAFFFFFFAFFAFFA.FFFFFF.FFF<FFFFFAFF.FFFFF<AF.FFFFFF<
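
The grep check described above can also be done per read. The following is a rough Python sketch, under the assumption that the primer sits within the first few bases of the read, after a short variable-length prefix as in the two records above. `classify_read` and `max_offset` are hypothetical names, the primer sequences are copied from the excerpt, and matching is exact.

```python
from collections import Counter

# Primer sequences as they appear in the excerpt above
# (assumption: no degenerate positions).
FWD = "GTGCCAGCAGCCGCGGTAAT"  # labelled 514f
REV = "GGACTACCCGGGTTTCTAAT"  # labelled 805r

def classify_read(seq, fwd, rev, max_offset=15):
    """Label a read by the primer that starts within max_offset bases of its 5' end.

    Returns (label, start_position), or ("none", -1) if neither primer is found.
    """
    for label, primer in (("FWD", fwd), ("REV", rev)):
        # Restrict the search window so the primer must start at or before max_offset.
        pos = seq.find(primer, 0, max_offset + len(primer))
        if pos != -1:
            return label, pos
    return "none", -1

# The two R1 sequences from the excerpt, truncated after the primer.
reads = [
    "TTCTCTGAGCC" + REV + "CCTGTTTGCTCCCCACGC",
    "GGCAAGA" + FWD + "ACGGAGGGTGCAAGCGTT",
]

tally = Counter(classify_read(r, FWD, REV)[0] for r in reads)
print(dict(tally))
```

Running a tally like this over a whole file gives the forward/reverse split directly, and the reported start positions show how long the prefix before each primer is.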

Now, this is where my understanding gets a little muddled. If the orientation were mixed, would I not expect to find just one primer in each read file, but in different orientations (for example, as either the forward or the complement sequence)?

SoilRotifer · June 4, 2021, 5:12pm

Hi @adamsorbie,

No problem.
Based on the description in your initial post, and the new output you provided, this appears to be the exact same issue, unless I am missing something.

Not sure I understand...

Yep, this is correct. You've confirmed that your reads are in mixed orientation. The threads I linked you to should resolve the trimming for you.

Can you clarify what is causing the confusion, as I am not sure what you are asking here?

But as you suspected in your initial post, and as your example shows, you'll have either the forward primer or the reverse primer in the 5' portion of either the R1 or R2 reads. This is why you'd enter both primers for trimming the R1 and R2 reads in the cutadapt example I linked to.
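
To make the logic concrete, here is a minimal Python sketch of what trimming with both primers amounts to for a single read pair. It is not the cutadapt command from the linked thread. `orient_and_trim` and `max_offset` are hypothetical names, the primer sequences are copied from the excerpt earlier in this thread, matching is exact, and only sequences are handled (quality strings would need the same slicing).

```python
# Primer sequences as they appear in the reads quoted in this thread
# (assumption: no degenerate positions).
FWD = "GTGCCAGCAGCCGCGGTAAT"  # labelled 514f
REV = "GGACTACCCGGGTTTCTAAT"  # labelled 805r

def find_primer(seq, primer, max_offset=15):
    """Start position of a primer beginning within max_offset bases, else -1."""
    return seq.find(primer, 0, max_offset + len(primer))

def orient_and_trim(r1, r2, fwd, rev, max_offset=15):
    """Put a pair into forward/reverse order and strip primers plus any prefix.

    Returns (forward_read, reverse_read, swapped), or None if the pair does not
    carry one primer on each mate.
    """
    for first, second, swapped in ((r1, r2, False), (r2, r1, True)):
        f = find_primer(first, fwd, max_offset)
        r = find_primer(second, rev, max_offset)
        if f != -1 and r != -1:
            return first[f + len(fwd):], second[r + len(rev):], swapped
    return None

# Toy mates (hypothetical, built from fragments of the excerpt).
fwd_mate = "GGCAAGA" + FWD + "ACGGAGG"
rev_mate = "TTCTCTGAGCC" + REV + "CCTGTTT"

print(orient_and_trim(fwd_mate, rev_mate, FWD, REV))  # pair already in forward/reverse order
print(orient_and_trim(rev_mate, fwd_mate, FWD, REV))  # pair stored the other way round
```

Both calls return the same trimmed sequences. Only the `swapped` flag differs, which is the sense in which the reads are in mixed orientation.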

adamsorbie · June 7, 2021, 1:07pm

Ah I think I misunderstood what mixed orientation actually means. I actually struggled to find a concrete definition anywhere but I had assumed it meant that reads contained a mixture of forward, reverse/reverse complement sequences, rather than orientation being forward or reverse. Thanks for your help.