Do I have adaptors and/or primers in my sequences?

asbarros · September 16, 2023, 1:53pm

Hey everyone,

I am aware this topic has been explored at length in some posts, but I wanted to double-check two major questions.

I have received 16S data for V4 using the EMP primers, using MiSeq. I do not know if the primers and adaptors were removed from my samples. Therefore:

(1) Regarding primers, do I need to check for them? And if so, what would be the best way to do it?
I am asking because there is a post stating that I should not worry about primers if I used the EMP protocol: Using Cutadapt w 16s Primers - #2 by Nicholas_Bokulich
Nevertheless, I wanted to be sure;

(2) For adaptors, would these show up in external tools that identify adaptors, such as FastQC? I ran it and it does not seem to show any adaptors whatsoever.

The samples just as imported are presented here:
LI_GF_Samples.qzv (313.0 KB)

Nevertheless, I ran cutadapt just for "sanity purposes" and I got the following result:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences intermediate_files/LI_GF_Samples.qza \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r GGACTACNVGGGTWTCTAAT \
--o-trimmed-sequences intermediate_files/LI_GF_trimmed_Samples.qza \
--verbose

LI_GF_trimmed_Samples.qzv (318.8 KB)

Looking at both the number of sequences and the sequence length, neither seems to be affected by this step. Is that the output I should expect?

I am also attaching the FastQC report for one of the forward files, which I used to check for adaptors.

dcpedroso.Olf.KO.F1.LI_36_L001_R1_001_fastqc .zip (1.1 MB)

Thanks in advance!

colinbrislawn · September 16, 2023, 4:56pm

Hello André,

It sounds like you have a good understanding of the options Qiime2 gives you. I want to zoom in on why this step is so difficult.

If your reads do not contain any adapters/primers, then the output will be the same as the input, like you observe here.

If your reads do contain adapters/primers, running cutadapt with the wrong settings will also produce output that is the same as the input. But this is wrong.

If cutadapt finds no primers, this could be because there were none there, or because the wrong settings were used.

One of these cases is a problem, but the two outputs are identical, so you cannot tell them apart.

For these reasons, cutadapt is unhelpful for finding primers.

Instead, you could look at the fastqc report:

So fastqc is not finding primers!

You wrote: "I have received 16S data for V4 using the EMP primers, using MiSeq."

If run correctly, this method will not sequence the adapters, and this matches our fastqc results!

asbarros · September 16, 2023, 5:20pm

Thanks @colinbrislawn for the detailed answer!

Is a QIIME 2 plugin similar to FastQC or fastp in the works, or already developed? This would be very helpful, exactly for the reasons we've discussed.

This came up because I was looking at the first positions of the reads (up to 5), which seem to have lower quality than the ones that follow. I thought that could be something related to primers or adapters. Do you think it is wise to remove those 5 positions in the DADA2 step?

colinbrislawn · September 16, 2023, 7:54pm

A low-quality start for 16S reads is pretty common due to the low diversity of bases there. You could trim.

As for a plugin: not that I know of. I've used fastp and it's great, so I could see that being a useful plugin.

But if most folks are using the EMP primers, there's nothing to remove.
(I suspect this is why this is not a standard part of the pipeline.)
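
On the low base diversity at the start of the reads: a rough way to look at it yourself is to compute the Shannon entropy of the base composition at each position. This is just a sketch, and `positional_entropy` is a name made up for this example. It takes any iterable of read sequences (plain strings).

```python
import math
from collections import Counter


def positional_entropy(sequences, n_positions=10):
    """Shannon entropy (bits) of base composition at each of the first n positions."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_positions]):
            counts[i][base] += 1

    entropies = []
    for counter in counts:
        total = sum(counter.values())
        if total == 0:
            # No read reaches this position.
            entropies.append(0.0)
            continue
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counter.values())
        )
    return entropies
```

A position where every read has the same base gives 0 bits, and a position with A, C, G and T in equal proportions gives 2 bits. If the first few positions sit near 0 while later ones are higher, that is the low-diversity start described above.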

asbarros · September 16, 2023, 8:01pm

Thanks @colinbrislawn, really helpful!

asbarros · September 16, 2023, 8:43pm

Sorry to sound like a broken record, but I really want to understand this in order to create a framework to use.

Based on what you said regarding the low diversity in the first bases, that would mean that, even if I trim them, the impact at the level of features/ASVs will not be considerable and, thus, the phylogenetic analysis will not be greatly affected. At least, this is what I would assume. Am I right? If so, would there be a theoretical benefit to trimming these bases versus keeping them?

Thanks once again

5cr34m · September 16, 2023, 10:24pm

Good evening.

Regards, sn

colinbrislawn · September 17, 2023, 1:12am

Right. Most reads are highly similar in this area (from the FastQC report).

This leads to low diversity/variety on the flowcell and lowers quality scores.
Trimming a highly conserved region will not change phylogeny or taxonomy very much.

Trimming this section will improve the cumulative expected error of the full read, which helps a little bit at the DADA2 quality filtering step. (Other pipelines like deblur may not care.)
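
To make the expected-error point concrete, here is a small sketch using the nominal definition of a Phred score, where the error probability of a base is 10^(-Q/10) and the expected error of a read is the sum over its bases. The quality scores are made up for illustration, and the helper names are not from any library.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, where p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def phred_from_fastq_line(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


# Made-up read: five low-quality bases (Q12) followed by 145 bases at Q35.
scores = [12] * 5 + [35] * 145
full = expected_errors(scores)
trimmed = expected_errors(scores[5:])
print(f"EE full read: {full:.3f}, EE after trimming 5 bases: {trimmed:.3f}")
```

In this toy read the five Q12 bases contribute most of the total, so trimming them lowers the expected error considerably even though only five bases are removed. Real reads will differ, but this is why trimming a low-quality start can help more reads pass a maximum expected error filter.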
