Using Cutadapt with 16S Primers

fabipc · July 19, 2019, 8:36pm

Hello,

I have a quick question regarding the removal of the 16S primers from demultiplexed Illumina files. First, I wanted to see if this is required, as I ran cutadapt using the code below and no bases were removed (based on the "demultiplexed sequence length summary").

Could it be that I did something wrong, or does cutadapt not trim the primers off 16S reads in general?

qiime cutadapt trim-paired
--i-demultiplexed-sequences Bacteria-demux-paired-end.qza
--p-adapter-f CTGCWGCCNCCCGTAGG
--p-front-f GACTACHVGGGTATCTAATCC \
--p-adapter-r GGATTAGATACCCBDGTAGTC
--p-front-r CCTACGGGNGGCWGCAG \
--o-trimmed-sequences Bacteria-demux-trimmed.qza
--verbose
**not sure why the \ are not showing on the code on here, but they were included when the code was processed.

Bacteria-demux-summary.qzv (300.7 KB) Bacteria-demux-trimmed.qzv (305.5 KB)

Nicholas_Bokulich · July 19, 2019, 9:57pm

cutadapt works the same on any type of DNA sequence data; it is agnostic to the marker gene.

Are you sure you have the primers in your reads? E.g., if you used the EMP format the primers are not found in the reads because the PCR primers are also used as the sequencing primer. I recommend checking out the fastq to make sure.
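One rough way to check the FASTQ is to count how many reads contain the primer, treating degenerate IUPAC bases as character classes. This is only a sketch: the function names `iupac_to_regex` and `count_primer_hits` and the file name in the usage comment are made up for illustration, and it assumes an uppercase primer and a gzipped FASTQ with four lines per record.

```shell
# Turn an IUPAC primer into a grep -E pattern
# (each degenerate code becomes a character class, e.g. W -> [AT]).
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count the reads whose sequence line contains the primer anywhere.
# $1 = gzipped FASTQ (hypothetical path), $2 = primer in IUPAC notation
count_primer_hits() {
  gzip -dc "$1" | awk 'NR % 4 == 2' | grep -c -E "$(iupac_to_regex "$2")"
}

# Example (hypothetical file name):
# count_primer_hits sample_R1.fastq.gz CCTACGGGNGGCWGCAG
```

If the count is close to zero for every primer you try, the primers are probably not in the reads and there is nothing for cutadapt to trim.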

Use three backtick characters ("```") before and after all code blocks to display "preformatted text" (or click on the formatting button at the top of the text box that looks like this: </> and type your text inside).

fabipc · July 22, 2019, 9:25pm

Hi Nicholas,

The primers are in fact in the sequences. I had entered the code incorrectly, so I re-ran it and was able to get a file which shows the primers have been trimmed, but it also produced a warning.

Unfortunately, I did not copy the warning, but it said something along the lines of: "One or more of your adapter sequences may be incomplete ... The adapter is preceded by an 'A' quite often, so the results should be interpreted with care."

At first I was not concerned since it's just a warning, but after looking at my file, I see that most of the bars starting at position 97 are pink, and it says:

  *"The plot at position 97 was generated using a random sampling of 9999 out of 7713524 sequences without replacement. This position (97) is greater than the minimum sequence length observed during subsampling (96 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity."*

Did I set up the code incorrectly?

# --p-adapter-f: RC of the F primer (found at the 3' end of the sequence if you have primer read-through)
# --p-front-f: should be the primer at the 5' end of the read
qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-adapter-f [RC of SD-Bact-0341-b-S-17 (F)] \
--p-front-f [SD-Bact-0785-a-A-21 (R)] \
--p-adapter-r [RC of SD-Bact-0785-a-A-21 (R)] \
--p-front-r [SD-Bact-0341-b-S-17 (F)] \
--o-trimmed-sequences demux-trimmed.qza

Hi,

The primers were present, and after checking my code and rerunning it I get a trimmed file, but I am concerned about the forward reads. When I zoom in to choose trim and trunc parameters for the forward reads, I see that starting at position 80 the bars are pink and this message shows up:

The plot at position 80 was generated using a random sampling of 9999 out of 7713524 sequences without replacement. This position (80) is greater than the minimum sequence length observed during subsampling (79 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity.

I am unsure what to do with this, but I feel like I did something wrong, since the "pink bars" on my ITS dataset did not show up until I scrolled to the tail of the plot.

Also, I ran the code twice, once with this code: Bacteria-demux-trimmed.qzv (306.0 KB)

qiime cutadapt trim-paired \
--i-demultiplexed-sequences Bacteria-demux-paired-end.qza \
--p-adapter-f GGATTAGATACCCBDGTAGTC \
--p-front-f GACTACHVGGGTATCTAATCC \
--p-adapter-r CTGCWGCCNCCCGTAGG \
--p-front-r CCTACGGGNGGCWGCAG \
--o-trimmed-sequences Bacteria-demux-trimmed.qza

Then with the code below, because when I checked the files I noticed that there were bases preceding and following the primers. Not sure if this is right, but I gave it a try, and in both cases I got back the same results. (P.S. I decided to run this because the previous code gave me a warning along the lines of "WARNING: One or more of your adapter sequences may be incomplete ... usually preceded by an A". Per one of your previous comments, this is just a warning and not an error, so it should be fine, but I ran it just to see.) Bacteria-demux-trimmed-1.qzv (305.9 KB)

qiime cutadapt trim-paired \
--i-demultiplexed-sequences Bacteria-demux-paired-end.qza \
--p-adapter-f GGATTAGATACCCBDGTAGTCCCTGACTTGG \
--p-front-f GGACTACHVGGGTATCTAATCC \
--p-adapter-r CTGCWGCCNCCCGTAGGC \
--p-front-r CCTACGGGNGGCWGCAG \
--o-trimmed-sequences Bacteria-demux-trimmed-1.qza \
--verbose

This is the original file
Bacteria-demux-summary.qzv (300.7 KB)

Nicholas_Bokulich · July 22, 2019, 9:58pm

The warning is just a warning; you can probably ignore it, because it sounds like you have some sort of non-biological DNA upstream of the primers. Maybe a barcode? Or a linker sequence?

Looks like you did not reverse-complement the "W" in your sequence.

But overall I think everything looks fine. Do not worry about the pink warning message in the visualization; look at the length distribution at the bottom to get a better idea. Basically, it looks like cutadapt successfully trimmed your sequences (before trimming, 100% of sequences were 300 nt; after trimming, 90% are 270 nt, so what was removed is basically your primers and any upstream adapter). A small fraction of sequences are shorter. This could be true biological variation (e.g., chloroplast and mitochondrial DNA is amplified by most 16S primers but makes a shorter amplicon, I think), but it could also be imprecise cutting due to low-quality sequence at the 3' ends.

All in all, I recommend proceeding and letting downstream quality control steps (DADA2, taxonomy classification) weed out any abnormalities.
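The length distribution mentioned above can also be tabulated outside the visualization, which avoids the subsampling behind the pink bars. A minimal sketch, assuming the trimmed reads have already been exported from the artifact to gzipped per-sample FASTQ files with four lines per record; the function name `read_length_table` and the file name in the usage comment are hypothetical.

```shell
# Print "length<TAB>count" for every read length in a gzipped FASTQ,
# shortest length first.
# $1 = gzipped FASTQ (hypothetical path)
read_length_table() {
  gzip -dc "$1" |
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' |
    sort -n
}

# Example (hypothetical file name):
# read_length_table trimmed_sample_R1.fastq.gz
```

A single dominant length with a thin tail of shorter reads would match the pattern described above; a broad spread of lengths would be worth a closer look before denoising.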

fabipc · July 22, 2019, 11:31pm

Sweet, thank you for the explanation.

I do have a question, though. I remember that we also reverse-complemented "W" as "S" for the ITS primers, but the reverse complement website I was using says:

"Thanks to Joost Kolkman at Maxygen who pointed out that revcomp(S)=S and revcomp(W)=W; in the source above (no longer online), revcomp(S) was W and vice-versa, which is is incorrect. I knew that but hadn't verified".

Since I wasn't sure, I asked my labmate, who does a lot of biomolecular work, and he said that yes, the reverse complement of "W" is "W". Is this no longer the case?

Thanks again

Nicholas_Bokulich · July 23, 2019, 12:17am

You are correct. I did not know offhand what the degenerate IUPAC codes are, but the RC of A or T (W) is definitely A or T (W)! So you are right.
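For anyone who wants to double-check degenerate primers without a website, here is a small sketch of a reverse-complement helper that handles the IUPAC codes. The function name `revcomp` is made up for illustration, and it assumes the primer is written in uppercase.

```shell
# Reverse-complement a primer, including IUPAC degenerate codes.
# Complement pairs: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

revcomp CCTACGGGNGGCWGCAG      # prints CTGCWGCCNCCCGTAGG
revcomp GACTACHVGGGTATCTAATCC  # prints GGATTAGATACCCBDGTAGTC
```

Both outputs keep the W in place and match the reverse-complemented sequences used in the commands earlier in this thread.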

system · August 23, 2019, 6:19am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.