Cutadapt for ITS primers with degenerate bases

Hello,

I am working with dual-indexed ITS amplicons generated with the 5.8S-Fun and ITS4-Fun primers (Taylor et al., 2016). Per the sequencing lab, my files have been demultiplexed, and both the adapters and indexes have already been removed from the forward and reverse ends.

I have imported my files as paired-end demultiplexed files and tried to trim them with cutadapt by following the [Fungal ITS Qiime tutorial]. Unfortunately, the resulting file had a mean of 54 sequences per sample.

Per the tutorial and per what I have previously done with single indexed sequences, I have the code as follows:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences Fungal-demux-paired-end.qza \
  --p-adapter-f AGCCTCCGCTTATTGATATGCTTAART \
  --p-front-f AGWGATCCRTTGYYRAAAGTT \
  --p-adapter-r AACTTTYRRCAAYGGATCWCT \
  --p-front-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --o-trimmed-sequences Fungal-demux-trimmed-new.qza

As you can see, my primers contain degenerate bases, and they were kept as-is when I took the reverse complement. Could this be a problem? If so, how can I deal with it?

Could it be the way I imported the data? I am not sure if this step differs since I am working with dual-indexed primers.

Thanks again for all your help

Yes, you need to reverse complement the degenerate bases too. So instead of AGWGATCCRTTGYYRAAAGTT I think you want AGSGATCCRTTGYYRAAAGTT (as the RC of AACTTTYRRCAAYGGATCWCT... looks like the W was not RC-ed in your example).
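For anyone double-checking a reverse complement by hand: in the standard IUPAC table the complement pairs are A/T, C/G, R/Y, K/M, B/V and D/H, while W (A or T), S (C or G) and N are their own complements. A minimal shell sketch, assuming only POSIX `awk` and `tr` are available (`revcomp` is a made-up helper name, not part of QIIME 2 or cutadapt):

```shell
# Reverse-complement a primer that may contain IUPAC degenerate codes.
# revcomp is a hypothetical helper written for this illustration.
revcomp() {
  printf '%s\n' "$1" |
    # reverse the string character by character
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    # complement: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H (W, S, N unchanged)
    tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

revcomp AACTTTYRRCAAYGGATCWCT        # prints AGWGATCCRTTGYYRAAAGTT
revcomp AGCCTCCGCTTATTGATATGCTTAART  # prints AYTTAAGCATATCAATAAGCGGAGGCT
```

With this table the W in the 5.8S-Fun primer stays a W after reverse complementing, and the second line is the RC of ITS4-Fun used in the commands in this thread.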

Also, --p-front-* should be the primer on the 5' end of the read, and --p-adapter-* should be the RC of the primer that may be found somewhere toward the 3' end of the sequence if you have primer read-through. It looks like you may have switched these.
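The front/adapter layout can be sketched as a small helper that prints the four primer flags once you know which primer sits at the 5' end of each read. This is only an illustration: `primer_flags` and `revcomp` are hypothetical helper names, and which primer belongs to R1 versus R2 is an assumption you have to confirm from your own reads.

```shell
# Hypothetical helpers for illustration; only the four --p-* flag names
# come from the qiime cutadapt trim-paired commands in this thread.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# $1 = primer expected at the 5' end of R1, $2 = primer expected at the 5' end of R2
primer_flags() {
  r1_primer=$1
  r2_primer=$2
  printf '%s %s\n' '--p-front-f'   "$r1_primer"            # 5' primer of the forward read
  printf '%s %s\n' '--p-adapter-f' "$(revcomp "$r2_primer")" # read-through at the 3' end of R1
  printf '%s %s\n' '--p-front-r'   "$r2_primer"            # 5' primer of the reverse read
  printf '%s %s\n' '--p-adapter-r' "$(revcomp "$r1_primer")" # read-through at the 3' end of R2
}

# Assumed case: R1 starts with ITS4-Fun and R2 starts with 5.8S-Fun.
primer_flags AGCCTCCGCTTATTGATATGCTTAART AACTTTYRRCAAYGGATCWCT
```

Swapping the two arguments gives the layout for reads in the opposite orientation.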

Don't follow the fungal ITS tutorial too literally; that tutorial uses a mock community in which the reads were swapped around and there may be some other funny details. So you will need to double-check the direction of your reads, e.g., look at the first few sequences manually to see if you can detect the presence of your forward/reverse primers to determine the read direction and appropriate settings for those parameters.

I think not, unless you switched the read order (importing forward as reverse and vice versa).
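One rough way to do that manual check from the command line is to turn each primer into a regular expression and count how many of the first reads contain it. The sketch below assumes plain four-line FASTQ records on standard input and exact matches only (no mismatches, unlike cutadapt); `iupac_to_regex` and `count_primer_hits` are hypothetical helper names.

```shell
# Expand IUPAC degenerate codes into regex character classes.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count reads (among the first $2, default 1000) whose sequence contains primer $1.
# Reads FASTQ from stdin; sequence lines are every 4th line starting at line 2.
count_primer_hits() {
  regex=$(iupac_to_regex "$1")
  head -n "$(( ${2:-1000} * 4 ))" | awk 'NR % 4 == 2' | grep -c -E "$regex" || true
}
```

For example, with a hypothetical file name, `gzip -cd sample_R1.fastq.gz | count_primer_hits AGCCTCCGCTTATTGATATGCTTAART 1000` reports how many of the first 1000 R1 reads contain ITS4-Fun. Repeating this for both primers and their reverse complements in R1 and R2 should show which primer is at which end.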

Try this and let me know how it goes:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences Fungal-demux-paired-end.qza \
  --p-adapter-f AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-f AACTTTYRRCAAYGGATCWCT \
  --p-adapter-r AGWGATCCRTTGYYRAAAGTT \
  --p-front-r AGCCTCCGCTTATTGATATGCTTAART \
  --o-trimmed-sequences Fungal-demux-trimmed-new.qza

and if that doesn't work, try this (switching the forward and reverse read primers):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences Fungal-demux-paired-end.qza \
  --p-adapter-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-r AACTTTYRRCAAYGGATCWCT \
  --p-adapter-f AGWGATCCRTTGYYRAAAGTT \
  --p-front-f AGCCTCCGCTTATTGATATGCTTAART \
  --o-trimmed-sequences Fungal-demux-trimmed-new.qza

It worked! I now have a mean of 120877 sequences.

I cannot believe I put the primers in the wrong spot. Thank you for catching my mistake. I ran the code as you suggested, and made sure that the “W” was in fact changed to an “S”.


Just to be sure, you should check this against the demultiplexed but untrimmed sequences. If you know you have primers in the reads, you will want to make sure that these were in fact trimmed!
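One way to make that comparison is to look at the read-length distribution before and after trimming: if primers were removed, the trimmed reads should be shorter. A minimal sketch, assuming both artifacts have been exported to FASTQ first (for instance with `qiime tools export --input-path ... --output-path ...`); `length_histogram` is a hypothetical helper name.

```shell
# Print "length count" pairs for the reads in a FASTQ stream on stdin,
# sorted by read length. Sequence lines are every 4th line starting at line 2.
length_histogram() {
  awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' | sort -n
}
```

Running `gzip -cd untrimmed/sample_R1.fastq.gz | length_histogram` and the same on the trimmed export (file names here are placeholders) gives two tables to compare side by side; identical tables suggest nothing was trimmed.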

Hi Nicholas,

I had forgotten to compare the trimmed file to the original file, and in fact, the primers were not trimmed. What a dumb mistake :confused: Both files have exactly the same sequences, so the primers were not removed.

I re-read the article by Taylor et al. (2016), who designed the primers, and they state that “forwards reads obtained from the Illumina sequencing are in reverse orientation with respect to ribosomal operons.” So I have found the issue, but I am unsure how to fix it. I found this, which says that ITSxpress is not suitable for this type of data. Embedded in that post was this link to one of your GitHub posts, but to be honest, I am stumped on how to deal with this. Do you have any ideas or suggestions for how I can work through this problem?

Sorry about all the questions, but no one in my lab has worked with these new primers, so I am unsure how to move forward and have no one in my lab to ask about the issue.

Are you sure the primers are still in the sequences? Could you post a few sequences here as examples?

That all depends a bit on the library prep method, not only the primer sequences themselves. So I recommend hand-checking your sequences to see if you can detect the orientation (easiest way is to find either of the primers, but you could also align against one of the reference sequences in the UNITE database).

It is quite easy, actually (I think). If your data are in the reverse orientation, you can just switch the reads when importing: import the forward as reverse and vice versa. Then these reads are in the orientation that ITSxpress expects, and you can use that plugin to trim instead of cutadapt.

Not a problem! That's what we are here for. Sorry I don't have all the answers :fearful:

I have done a little more experimentation using the mock community in the Taylor et al., 2016 article (I am assuming you followed the protocol described in there so am assuming your sequences are in the same format/orientation though again you should manually check this). A few things I’ve confirmed about the structure:

  1. The “forward” file is definitely in the reverse orientation with respect to the UNITE reference sequences, and the “reverse” file is the forward orientation.
  2. Each read already has the 5’-end primer trimmed (so the reverse primer in the “forward” file and the forward primer in the “reverse” file).

I cannot find primers on the 3’ ends, but looking back at the Taylor et al. article I see in Fig 1 that the estimated amplicon range for that primer is 257-511 bp. This means that (1) we should not have any primer read-through, but also (2) the reads will probably be too short to merge successfully after low-quality bases are trimmed (this is certainly true for the mock community I am looking at, but maybe your reads are higher quality!). So when I use cutadapt on the mock community reads I do not see any trimming either, but after checking out the sequences and expected lengths I think this is the expected outcome.

I also tried reversing the forward/reverse reads at import, as I recommended above, but I am still having problems with q2-ITSxpress (it runs but outputs an empty file) so I am not sure what is going on there. q2-cutadapt should be fine for our purposes (though it seems unnecessary since there should not be primer read-through if Taylor’s length estimates are accurate).

The bottom line: if your protocol matches the Taylor 2016 protocol, then q2-cutadapt probably should not be trimming your sequences at all. But I recommend double-checking your input sequences just to be sure (see if you can find the forward/reverse primers anywhere in there).

Hi Nicholas

Thank you for looking into this problem with such detail.

I opened two of the sequence files to look for the primers manually, and you are correct: the reverse complements of the primers are not present in either the R1 (F) or R2 (R) files.

Both the ITS4-Fun and 5.8S-Fun primers are still present, in the R1 and R2 files, respectively.

Just to make sure I am understanding correctly: given the protocol, there should be no trimming of my sequences by cutadapt, so I should be "fine" to proceed with DADA2, is that right?

I am attaching a copy of Fungal-demux-trimmed.qzv, as well as the original file, Fungal-demux-summary.qzv in case you want to take a look.

Fungal-demux-summary.qzv (296.2 KB)
Fungal-demux-trimmed.qzv (301.1 KB)

Again, thank you very much for all your help :smile:

So you should use the second command I listed above (did you try this already?):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences Fungal-demux-paired-end.qza \
  --p-adapter-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-r AACTTTYRRCAAYGGATCWCT \
  --p-adapter-f AGWGATCCRTTGYYRAAAGTT \
  --p-front-f AGCCTCCGCTTATTGATATGCTTAART \
  --o-trimmed-sequences Fungal-demux-trimmed-new.qza

No. It sounds like you are using the same primers as Taylor 2016 but not quite the same protocol:

  1. Taylor 2016 seems to use the PCR primers as the sequencing primers also, so that the primer does not appear in the sequences, or at least that's what I am seeing in the mock community reads from that study.
  2. Your read lengths are longer! (300 nt.) So you could have some primer read-through.

So cutadapt or q2-itsxpress should still be used, though I would not worry if you don't see too many reads being trimmed at the 3' end by q2-itsxpress since read through will only occur for species with shorter amplicons (and it is not clear to me whether the amplicon lengths cited by Taylor 2016 include the primers or not in the length calculations; if not then you will have no primer read-through).
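The read-through reasoning can be put into rough numbers. The sketch below assumes the amplicon length includes both primers and that the read starts at the first base of its own 5' primer; as noted, it is not clear whether the lengths cited by Taylor 2016 include the primers, so treat the results as illustrative. `primer_bases_in_read` is a hypothetical helper name.

```shell
# How many bases of the far-end primer (as its reverse complement) can a read reach?
# $1 = amplicon length incl. both primers (assumption), $2 = read length,
# $3 = length of the far-end primer
primer_bases_in_read() {
  amplicon=$1
  read_len=$2
  primer_len=$3
  # the far primer starts (amplicon - primer_len) bases into the read
  seen=$(( read_len - (amplicon - primer_len) ))
  if [ "$seen" -lt 0 ]; then seen=0; fi
  if [ "$seen" -gt "$primer_len" ]; then seen=$primer_len; fi
  echo "$seen"
}

# 300 nt reads, 21 nt far-end primer (the length of 5.8S-Fun),
# at the two ends of the 257-511 bp range mentioned in this thread
primer_bases_in_read 257 300 21   # prints 21 (full read-through)
primer_bases_in_read 511 300 21   # prints 0 (read never reaches the far primer)
```

Under these assumptions only the shorter amplicons would show anything to trim at the 3' end.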

Have you tried swapping the forward and reverse reads when you import to see if this allows you to run q2-itsxpress on your data? I would be curious if that solution I proposed actually works.

We are nearly there! :mushroom:

The second code you suggested is running now. I forgot I was running the code last night before I went to bed, and closed my laptop so the run stopped. I will let you know how this works out.

I also have another computer running the new commands, after swapping the F and R reads and re-importing the files into Qiime2, so I’ll keep you posted.

Thanks :slight_smile:


Okay so, running the second code:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences Fungal-demux-paired-end.qza \
  --p-adapter-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-r AACTTTYRRCAAYGGATCWCT \
  --p-adapter-f AGSGATCCRTTGYYRAAAGTT \
  --p-front-f AGCCTCCGCTTATTGATATGCTTAART \
  --o-trimmed-sequences Fungal-demux-trimmed-2.qza \
  --verbose

With the "W" switched to "S" for the correct reverse complement, this left me with exactly the same output that I shared before (same number of sequences): Fungal-demux-trimmed-2.qzv (302.0 KB), except for the tails, which appear better in the demux.qza than demux-2.qza.

Not sure how to proceed now :confused:

Additionally, I thought I could figure out how to import the reads in reverse order (F reads as R reads, and vice versa), but it turns out I do not understand it. I tried to see if there was a way of importing them so that the reads were in reverse order, but I realized that qiime tools import does not give me that option. I am thinking of manually renaming the files, just to see, but I wanted to check with you whether there is an easier way that I am just not seeing.

Success! Scroll down to the sequence length summary. You have successfully trimmed the primers from the sequences!

I am not sure what format you have, but here goes:

  1. manifest format: just swap the forward/reverse designations in the appropriate column of the manifest
  2. casava format: use the manifest format to import
  3. EMP or a multiplexed format: rename the files (switch "forward" and "reverse")
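For the manifest route, the swap can be scripted rather than edited by hand. The sketch below assumes a comma-separated manifest with the header `sample-id,absolute-filepath,direction` and no commas in the file paths; if your manifest uses a different layout, the column handling will need adjusting. `swap_manifest_directions` is a hypothetical helper name.

```shell
# Swap "forward" and "reverse" in the third column of a CSV manifest.
# Reads the manifest on stdin and writes the swapped version to stdout.
swap_manifest_directions() {
  awk 'BEGIN { FS = OFS = "," }
       NR == 1 || /^#/ { print; next }                  # keep header and comment lines
       $3 == "forward" { $3 = "reverse"; print; next }
       $3 == "reverse" { $3 = "forward"; print; next }
       { print }'
}
```

Something like `swap_manifest_directions < manifest.csv > manifest-swapped.csv` (file names are placeholders) then gives a manifest to import in place of the original.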

I would be interested in hearing if swapping on import allows you to use q2-itsxpress successfully, but for your purposes you can proceed with your freshly trimmed sequences.

Success! :mushroom: :scream: :mushroom:

wooo!!! You are the best! :1st_place_medal:

I will let the rest of my lab know so that you don't receive a million messages. Although, with all the questions I've asked, anyone can replicate the analysis now.

I have demultiplexed files, casava format, so I will definitely test it out and let you know :smiley:

Again, thank you for everything
