Quick update: I tried to check whether the reverse complement of my reverse primer is present in my forward reads, and I couldn’t find it.
(To do this I opened the forward-reads FASTA in Notepad++ and used Ctrl+F to search for my primer sequence. It contains one degenerate base (K), so I tried both possible variants.) Would this actually have worked, or is there a better way to go about it?
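For what it is worth, a text-editor search can miss hits when the FASTA wraps sequences over several lines, and it will not tolerate any mismatch. Below is a minimal sketch of a scripted check, assuming plain FASTA input. The function names and the toy primer `ACGTKCA` are made up for illustration and are not from this thread. It expands IUPAC codes such as K into a regular expression and joins wrapped sequence lines before searching. It still only finds exact matches.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (K <-> M, R <-> Y, B <-> V, D <-> H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer into a regex, expanding degenerate bases."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_primer_hits(lines, primer):
    """Return (reads containing the primer, total reads)."""
    pattern = primer_regex(primer)
    total = hits = 0
    for _, seq in read_fasta(lines):
        total += 1
        if pattern.search(seq):
            hits += 1
    return hits, total


# Toy data: the hit in r1 spans a line break, which Ctrl+F would miss.
toy_fasta = [">r1", "AAATGA", "ACGTCC", ">r2", "GGGGGG"]
rc_reverse_primer = reverse_complement("ACGTKCA")  # hypothetical primer
print(count_primer_hits(toy_fasta, rc_reverse_primer))
```

To run it on a real file, pass an open file handle instead of the toy list, e.g. `count_primer_hits(open("forward_reads.fasta"), rc_reverse_primer)`, where the file name is a placeholder.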
Thanks for the advice (sorry for the extra questions!). I can’t really see if/how I can use cutadapt in isolation. Is it a component of demux-paired/single & trim-paired/single? If so, this dataset has already been passed through demux. Do you think it is worth attempting a trim-paired step?
Thanks I will give that a shot!
Is it okay if I just ask for some advice on how to run trim-paired? Under the --p-front-f & --p-front-r options, am I supposed to give my adaptor sequence, or can I give it my PCR primers so that it removes everything upstream of them?
Is there any way for me to find out my adaptor sequence?
I would suggest looking at the cutadapt docs (https://cutadapt.readthedocs.io/en/stable/), which describe its filtering semantics. q2-cutadapt simply exposes the parameters defined by cutadapt. Both of the examples in your question are cases covered in those docs.
Not in cutadapt, to my knowledge. Tools like FastQC can help identify Illumina-specific non-biological sequence, YMMV.
You shouldn’t actually need to know the adapter, because those sequences flank your primers, which you do (or at least should) know. So if you trim everything before your forward primer and everything after your RC’ed reverse primer, you will remove all non-biological sequence.
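To make the trimming idea above concrete, here is a small pure-Python sketch. It is not how cutadapt is implemented, and cutadapt additionally allows mismatches, which this exact-match toy does not. The function names and sequences are invented for illustration. As in the description above, the forward primer and everything before it is removed, and the reverse-complemented reverse primer and everything after it is removed.

```python
import re

# IUPAC codes expanded into the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(primer):
    """Turn a primer with degenerate bases into a regex string."""
    return "".join(
        IUPAC[b] if len(IUPAC[b]) == 1 else "[" + IUPAC[b] + "]"
        for b in primer.upper()
    )


def trim_to_insert(read, fwd_primer, rc_rev_primer):
    """Return the sequence between the two primers.

    Returns None when the forward primer is not found. If the RC'ed
    reverse primer is not found, the read is only trimmed at the 5' end.
    """
    read = read.upper()
    start = re.search(to_regex(fwd_primer), read)
    if start is None:
        return None
    insert = read[start.end():]          # drop the primer and all upstream bases
    end = re.search(to_regex(rc_rev_primer), insert)
    return insert[:end.start()] if end else insert


# Toy read: upstream junk + forward primer + insert + RC reverse primer + junk.
toy_read = "TTTT" + "ACGTGCA" + "CCCCGGGG" + "TTAAC" + "AAAA"
print(trim_to_insert(toy_read, "ACGTKCA", "TTAAC"))
```

In practice you would let cutadapt (or q2-cutadapt) do this rather than a script like the one above. The sketch is only meant to show which part of the read is kept.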
Thanks again for your replies! I have run the sequences through cutadapt (see below for code) and then passed them through dada2 again, and I am still losing the same proportion at the chimera checking step. Is what I ran correct?
Another interesting development…
I have run both the forward and reverse reads through dada2 denoise-single separately, treating them as single-end reads. I submitted these as scripts to our cluster, as it can sometimes take >10 hours for dada2 to process this dataset, so I don’t have any details of how many reads were filtered at each step. What I found, however, is that the forward reads retained ~70% of the sequences, whereas I only retained ~12% of the reverse reads. I have run some of the most abundant features through BLAST and nothing looks like a chimera! Could the issue here be something to do with my reverse reads? Or some merging problem resulting in chimeras?
Maybe. I have had such a problem, and it turned out to be a truncating + merging issue. In my case I was truncating too much and leaving less than the necessary overlap (<20 bp). In addition, the overlapping region in my reads is already small, 30-40 bp, so check whether that is also the case for your reads.
Thanks for the suggestion! I have looked into this and am now even more confused… My amplicon is 167 bp, yet somehow my reads are ~245 bp (looking at the reads after removing adaptors/primers, with no other processing). Judging by the output from previous dada2 runs (see above), the merging seemed to go okay, although if my reads are that length my --p-trunc-len parameters wouldn’t have left much overlap.
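A quick way to sanity-check these numbers is to do the arithmetic explicitly. The sketch below is only back-of-the-envelope: it assumes the amplicon length and the truncation lengths are measured on the same basis (both with or both without primers), and the function names are made up. A read longer than the amplicon must have run past the far end of the insert, and two truncated reads overlap by roughly the sum of their lengths minus the amplicon length.

```python
def read_through(read_len, amplicon_len):
    """Bases of a read that extend past the end of the amplicon."""
    return max(0, read_len - amplicon_len)


def biological_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Approximate overlap between truncated forward and reverse reads.

    Each read can cover at most the whole amplicon, so lengths are capped
    at amplicon_len before summing.
    """
    covered_f = min(trunc_len_f, amplicon_len)
    covered_r = min(trunc_len_r, amplicon_len)
    return max(0, covered_f + covered_r - amplicon_len)


# Numbers from the post above: 167 bp amplicon, ~245 bp reads.
print(read_through(245, 167))             # bases beyond the amplicon
# Hypothetical truncation lengths, for illustration only.
print(biological_overlap(100, 80, 167))   # small overlap
print(biological_overlap(245, 245, 167))  # reads fully cover the amplicon
```

If the first number is well above zero, the extra bases are presumably primer/adapter read-through at the 3' end, which would be worth trimming before denoising.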
I don’t remember seeing a table or anything, but it appeared to run to completion (no error messages etc) and it took a few hours to complete, other than that I am not entirely sure… sorry! (it was a while ago and I am afraid I did not save it!)
No problem, I will when I get a chance. (I’m currently progressing with annotating the forward reads and attempting to train a classifier using the SILVA database, which is proving challenging! I’ll report back after this is done.)
Recently I ran into a similar issue: the sequences after merging still look good, but the chimera check loses ~80% of them. I tried trimming more of the sequences off and using “--p-min-fold-parent-over-abundance 8”, and now ~80% of sequences are retained. I am now doing the taxonomy assignment and will keep updating.
I have the same issue. I am trying to trim more sequences using “--p-min-fold-parent-over-abundance 8”. My question is: does this increase the time required to get the denoising stats to more than one day?
Thanks for flagging this up. The main reason I altered “--p-min-fold-parent-over-abundance” was to test whether this parameter was responsible for my reads being discarded, by lowering the threshold. In the end I did not use this setting for the final analysis. We have since repeated this experiment and have much higher quality sequencing coverage on some independent replicates of these samples, and I did not appear to have the same chimera issue in the second dataset.
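For anyone wondering why this parameter changes the chimera count so much, here is a rough illustration. It is not DADA2's actual code. As I understand the DADA2 documentation, only sequences more abundant than a given sequence by this fold can be considered as its potential "parents", so raising the value leaves fewer candidate parents and fewer sequences get flagged as chimeric. The sequence names and counts are invented.

```python
def candidate_parents(abundances, query, min_fold):
    """Sequences abundant enough to be considered parents of `query`.

    A sequence qualifies only if its abundance is strictly greater than
    min_fold times the abundance of the query (an assumption based on the
    documented meaning of the parameter, not on DADA2's source).
    """
    threshold = min_fold * abundances[query]
    parents = [s for s, n in abundances.items() if s != query and n > threshold]
    return sorted(parents, key=lambda s: abundances[s], reverse=True)


# Invented counts for three denoised sequences.
abundances = {"seq_A": 1000, "seq_B": 300, "seq_C": 100}

for fold in (1, 8, 12):
    print(fold, candidate_parents(abundances, "seq_C", fold))
```

With a fold of 1, both seq_A and seq_B could be parents of seq_C. With 8, only seq_A qualifies. With 12, nothing does, so seq_C could not be called a chimera at all.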
I am afraid what you are looking at is probably what you think: PCR artifacts caused by a high number of cycles, low starting material and possibly contamination.
Some amount of chimeras inevitably arises during PCR, and they compete with your sequences for amplification. Hence the advice to use high starting material and a low number of cycles.
The fact that this scheme worked the first time also points to the possibility of contamination of the working environment. I have also had times when a PCR worked once, but when I repeated the same PCR I didn't get what I needed, only strange short fragments. This can be prevented by using filter tips and sticking to general PCR guidelines, but completely removing contamination is not always possible. Again, high starting material and low cycle numbers reduce the effect of contamination.