Do primers need to be removed? Or are we talking about adapters?

Hi!

I have a conceptual question. I'm working with Illumina MiSeq 2x300 bp files.

When preparing libraries, three main "concepts" are combined: primers for amplifying the specific gene, adapters for compatibility with the sequencer, and barcodes for sample identification. Is this correct?

I receive one FASTQ file per sample, so the sequencing is already demultiplexed and I understand that the barcodes are no longer within the sequences I receive, right?

What remains is to understand what to do with the adapters and primers. Would the adapter and primer sequences be present in the FASTQ files we receive from the sequencer? I've checked different posts and tutorials and everyone talks about primers, but do they mean the PCR primers or the adapters?

The genomics platform told us that they used the following protocol:

On page 3, it talks about primers:

16S Amplicon PCR Forward Primer = 5'
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S Amplicon PCR Reverse Primer = 5'
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

But also about adapters:

Forward overhang: 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus-specific sequence]
Reverse overhang: 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[locus-specific sequence]

And if we look, the adapter sequence matches the beginning of the primers above...

Knowing these sequences, how should I use cutadapt? Should I remove the adapter sequence or the adapter+primer? We are unsure whether it's better to remove everything or whether removing just the adapter is sufficient, as the primer is part of the gene...

Thank you so much in advance!

Hello @MiriamGorostidi,

When preparing libraries, three main "concepts" are combined: primers for amplifying the specific gene, adapters for compatibility with the sequencer, and barcodes for sample identification. Is this correct?

Yes

I receive one FASTQ file per sample, so the sequencing is already demultiplexed and I understand that the barcodes are no longer within the sequences I receive, right?

The sequences are already demultiplexed, yes, but the barcodes may or may not be in the sequences still. Usually they are not. This is something the sequencing center should have communicated to you. It depends on the protocol used.

What remains is to understand what to do with the adapters and primers. Would the adapter and primer sequences be present in the FASTQ files we receive from the sequencer? I've checked different posts and tutorials and everyone talks about primers, but do they mean the PCR primers or the adapters?

PCR primers are usually called primers and the adapters are usually called adapters or sequencing primers. But yes, sometimes people use one to refer to the other.

And if we look, the adapter sequence matches the beginning of the primers above...

It looks like TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG is the adapter and CCTACGGGNGGCWGCAG is the 16S primer. (And similar for the other two sequences).
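As a quick sanity check on that split, here is a minimal shell sketch that strips each overhang from the front of the full primer quoted from the protocol. The variable names are only illustrative; the sequences are copied from the post above.

```shell
# Overhang adapters and full primers, copied from the protocol quoted above.
fwd_overhang=TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
rev_overhang=GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
fwd_full=TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
rev_full=GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

# ${var#prefix} removes a leading prefix; what is left is the
# locus-specific (16S) part of each primer.
fwd_primer=${fwd_full#"$fwd_overhang"}
rev_primer=${rev_full#"$rev_overhang"}

printf 'forward primer: %s\n' "$fwd_primer"   # forward primer: CCTACGGGNGGCWGCAG
printf 'reverse primer: %s\n' "$rev_primer"   # reverse primer: GACTACHVGGGTATCTAATCC
```

The two sequences that remain are the ones to hand to cutadapt as primers.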

Knowing these sequences, how should I use cutadapt? Should I remove the adapter sequence or the adapter+primer? We are unsure whether it's better to remove everything or whether removing just the adapter is sufficient, as the primer is part of the gene...

You should be safe to trim using the two primers. This should remove any adapters that are present as well because they will be downstream (i.e. further to the 3' end) of the primers.

That being said, it's a little strange that a "metagenomic library prep" was used for these sequences if these are 16S sequences. Do you know which of these your sequences are?

Hi @colinvwood !!

Thank you so much for your rapid and clear response!

The sequences are already demultiplexed, yes, but the barcodes may or may not be in the sequences still. Usually they are not. This is something the sequencing center should have communicated to you. It depends on the protocol used.

Yes! I've concluded that we should contact the sequencing center again and ask about all of this.

You should be safe to trim using the two primers. This should remove any adapters that are present as well because they will be downstream (i.e. further to the 3' end) of the primers.

OK, amazing! I was not sure whether trimming the primers would trim the adapters too, so it is good to know. Thank you!

Would you then recommend using the --p-front or the --p-anywhere parameter? (--p-adapter would be incorrect since it refers to the 3' end, right?)

This is what I have right now:

  qiime cutadapt trim-paired \
    --p-cores 4 \
    --i-demultiplexed-sequences samples.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences samples-trimmed.qza \
    --quiet

Regarding the "Metagenomic" protocol, I understand that what is odd is calling it metagenomic, since it is amplicon sequencing, right? But it does amplify only the V3 and V4 regions. The protocol says:

Metagenomic studies are commonly performed by analyzing the prokaryotic 16S ribosomal RNA gene (16S rRNA), which is approximately 1,500 bp long and contains nine variable regions interspersed between conserved regions. Variable regions of 16S rRNA are frequently used in phylogenetic classifications such as genus or species in diverse microbial populations.

Which 16S rRNA region to sequence is an area of debate, and your region of interest might vary depending on things such as experimental objectives, design, and sample type. This protocol describes a method for preparing samples for sequencing the variable V3 and V4 regions of the 16S rRNA gene.

So I guess it would be correct, right? This step was decided by the genomics service.

Thank you so much again!

Yeah. I define 'metagenomics' as untargeted (shotgun) DNA sequencing.

Amplicon sequencing is much cheaper than shotgun sequencing, so there's a business incentive to conflate the two and sell the cheaper amplicon sequencing as 'metagenomics.' 🤷

Hello @MiriamGorostidi,

You definitely want:

  • --p-front-f (forward primer)
  • --p-front-r (reverse primer)

which will remove the forward primer from the 5' ends of the forward reads and the reverse primer from the 5' ends of the reverse reads.

You may also want:

  • --p-adapter-f (reverse complement of reverse primer)
  • --p-adapter-r (reverse complement of forward primer)

which will deal with reads that read through the entire amplicon and into the primer (and possibly the adapter) at the other end. Since you have the V3/V4 region, which I believe averages about 450 bp, and 300 bp reads, this might not have happened at all in your case. It could still be worth checking, though, perhaps as a separate cutadapt run.

You don't want --p-anywhere because it looks for the same sequence at either the 5' or the 3' end of a read, and we wouldn't expect one sequence to turn up at both ends in this situation.
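To make the optional 3' pass concrete, here is a rough shell sketch that computes the reverse complements of the two primers (including the IUPAC ambiguity codes) and prints, rather than runs, a possible second trim-paired command. The output file name samples-trimmed-3p.qza is hypothetical, and leaving out --p-discard-untrimmed is an assumption on my part: most reads probably won't contain a 3' primer, so discarding untrimmed reads in this pass would likely throw away most of the data.

```shell
# Reverse-complement a DNA sequence, including IUPAC ambiguity codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDHSWN' 'TGCAYRMKVBHDSWN'
}

fwd_primer=CCTACGGGNGGCWGCAG
rev_primer=GACTACHVGGGTATCTAATCC

fwd_rc=$(revcomp "$fwd_primer")
rev_rc=$(revcomp "$rev_primer")

# Print the command for a separate 3' pass on the already 5'-trimmed reads.
# Forward reads get the reverse complement of the reverse primer, and
# reverse reads get the reverse complement of the forward primer.
# samples-trimmed-3p.qza is a hypothetical output name.
cat <<EOF
qiime cutadapt trim-paired \\
  --i-demultiplexed-sequences samples-trimmed.qza \\
  --p-adapter-f $rev_rc \\
  --p-adapter-r $fwd_rc \\
  --p-match-adapter-wildcards \\
  --o-trimmed-sequences samples-trimmed-3p.qza
EOF
```

Printing the command first makes it easy to double-check the sequences before running anything.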

Yeah, I guess it is a marketing thing 🙄

Hi @colinvwood !

Thank you so much! I think we should consider doing the --p-adapter run that you mention. However, how can I know if that would be necessary or not? And what would happen if I don't apply it?

Thank you!

Hello @MiriamGorostidi,

However, how can I know if that would be necessary or not?

You can't really know without checking for them. But there's no harm in checking and finding nothing.

And what would happen if I don't apply it?

You could end up with primers and adapters at the 3' ends of some of your reads. Depending on the downstream denoising method you use, they could get removed at that stage.
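One rough way to check before committing to a second cutadapt run is to count how many reads contain the reverse-complemented primer at all. The sketch below assumes you have plain, uncompressed FASTQ files (for example, exported from the .qza and unzipped); the function names and file names are hypothetical, and a simple pattern match like this allows no mismatches, so treat the count as a lower bound.

```shell
# Turn IUPAC ambiguity codes into regex character classes.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads in an uncompressed FASTQ whose sequence line (line 2 of
# every 4-line record) contains the given primer. Exact matches only.
count_reads_with() {
  awk 'NR % 4 == 2' "$2" | grep -c -E "$(iupac_to_regex "$1")" || true
}

# Usage (file names are hypothetical):
#   reverse complement of the reverse primer, searched in forward reads
#     count_reads_with GGATTAGATACCCBDGTAGTC sample_R1.fastq
#   reverse complement of the forward primer, searched in reverse reads
#     count_reads_with CTGCWGCCNCCCGTAGG sample_R2.fastq
```

If both counts come back at or near zero, the extra 3' trimming pass is probably unnecessary for that sample.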

Hi @colinvwood!

Perfect! Thank you so much for your help, it's been crucial 🙂