I am working with dual-indexed ITS amplicons generated with the 5.8S-Fun and ITS4-Fun primers (Taylor et al., 2016). Per the sequencing lab, my files have been demultiplexed, and both the adapters and the indexes have already been removed from the forward and reverse ends.
I imported my files as paired-end demultiplexed files and tried to run cutadapt by following the [Fungal ITS Qiime tutorial]. Unfortunately, the resulting file had a mean of only 54 sequences per sample.
Per the tutorial and per what I have previously done with single indexed sequences, I have the code as follows:
Yes, you need to reverse complement the degenerate bases as well. So instead of AGWGATCCRTTGYYRAAAGTT I think you want AGSGATCCRTTGYYRAAAGTT (as the RC of AACTTTYRRCAAYGGATCWCT… it looks like the W was not RC-ed in your example).
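A small sketch for checking a degenerate reverse complement by computing it rather than editing it by hand. It uses the standard IUPAC complement table, in which W (A or T) and S (C or G) are each their own complement; the function name is only illustrative.

```python
# Standard IUPAC nucleotide codes and their complements.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R",  # R = A/G, Y = C/T
    "S": "S", "W": "W",  # S = C/G and W = A/T are self-complementary
    "K": "M", "M": "K",  # K = G/T, M = A/C
    "B": "V", "V": "B",  # B = not A, V = not T
    "D": "H", "H": "D",  # D = not C, H = not G
    "N": "N",
}


def reverse_complement(seq: str) -> str:
    """Reverse the sequence, then complement each (possibly degenerate) base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


primer = "AACTTTYRRCAAYGGATCWCT"  # primer sequence quoted in this thread
print(reverse_complement(primer))  # AGWGATCCRTTGYYRAAAGTT
```

For this primer the table gives a string that keeps the W, so it may be worth comparing both candidate strings against this output before changing the command.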
Also, --p-front-* should be the primer expected at the 5’ end of each read, and --p-adapter-* should be the RC of the opposite primer, which may appear toward the 3’ end of the read if you have primer read-through. It looks like you may have switched these.
Don’t follow the fungal ITS tutorial too literally; that tutorial uses a mock community in which the reads were swapped around, and there may be some other funny details. So you will need to double-check the direction of your reads, e.g., look at the first few sequences manually to see if you can detect your forward/reverse primers, and use that to determine the read direction and the appropriate settings for those parameters.
I think not, unless you switched the read order (importing forward as reverse and vice versa).
I had forgotten to compare the trimmed file to the original file, and in fact the primers were not trimmed. What a dumb mistake! Both files have exactly the same sequences.
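For anyone wanting to make the same comparison, here is a minimal sketch. It assumes the reads have been exported to plain four-line FASTQ files (optionally gzipped) and that trimming did not discard or reorder any reads; the function names and file paths are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_sequences(path):
    """Yield the sequence line of each record in a four-line FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record
                yield line.strip()


def count_changed(original_path, trimmed_path):
    """Return (total_reads, reads_whose_sequence_differs)."""
    total = changed = 0
    pairs = zip_longest(read_sequences(original_path),
                        read_sequences(trimmed_path))
    for before, after in pairs:
        total += 1
        if before != after:
            changed += 1
    return total, changed


# Example call with hypothetical file names:
# total, changed = count_changed("sample1_R1.fastq.gz",
#                                "sample1_R1.trimmed.fastq.gz")
# print(f"{changed} of {total} reads were modified by trimming")
```

If the second number is zero, trimming left every read untouched, which matches what was seen here.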
I re-read the article by Taylor et al., 2016, who designed the primers, and they state that “forwards reads obtained from the Illumina sequencing are in reverse orientation with respect to ribosomal operons.” So I have found the issue, but I am unsure how to fix it. I found this, which says that ITSxpress is not suitable for this type of data. Embedded in that post was a link to one of your GitHub posts, but to be honest, I am stumped on how to deal with this. Do you have any ideas or suggestions for how I can work through this problem?
Sorry about all the questions, but no one in my lab has worked with these new primers, so I am unsure how to move forward and have no one in my lab to ask about the issue.
Are you sure the primers are still in the sequences? Could you post a few sequences here as examples?
That all depends a bit on the library prep method, not only the primer sequences themselves. So I recommend hand-checking your sequences to see if you can detect the orientation (easiest way is to find either of the primers, but you could also align against one of the reference sequences in the UNITE database).
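The hand-check can also be scripted. Below is a minimal sketch that converts a degenerate primer into a regular expression and reports where it occurs in a handful of reads, both as given and reverse-complemented. It assumes exact matches only (no mismatches or indels, unlike cutadapt), and the helper names are hypothetical.

```python
import re

# IUPAC codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Complement every IUPAC base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def locate_primers(reads, primers):
    """Map (primer_name, orientation) to the match start in each matching read."""
    report = {}
    for name, primer in primers.items():
        patterns = {
            "as_given": primer_regex(primer),
            "revcomp": primer_regex(reverse_complement(primer)),
        }
        for orientation, pattern in patterns.items():
            matches = (pattern.search(read.upper()) for read in reads)
            report[(name, orientation)] = [m.start() for m in matches if m]
    return report


# Only the primer quoted in this thread is listed; add the second primer
# of your pair to this dictionary before running it on your own reads.
primers = {"5.8S-Fun": "AACTTTYRRCAAYGGATCWCT"}
```

Start positions near 0 for the "as_given" orientation suggest the primer sits at the 5’ end of those reads; hits in the "revcomp" orientation toward the end of the reads would point to read-through. Running it separately on the first few reads of the R1 and R2 files should show which file carries which primer.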
It is quite easy, actually (I think). If your data are in the reverse orientation, you can just switch the reads when importing: import the forward as reverse and vice versa. Then these reads are in the orientation that ITSxpress expects, and you can use that plugin to trim instead of cutadapt.
Not a problem! That’s what we are here for. Sorry I don’t have all the answers.
I have done a little more experimentation using the mock community from the Taylor et al., 2016 article. (I am assuming you followed the protocol described there, and therefore that your sequences are in the same format and orientation, though again you should check this manually.) A few things I’ve confirmed about the structure:
The “forward” file is definitely in the reverse orientation with respect to the UNITE reference sequences, and the “reverse” file is the forward orientation.
Each read already has the 5’-end primer trimmed (so the reverse primer in the “forward” file and the forward primer in the “reverse” file).
I cannot find primers on the 3’ ends, but looking back at the Taylor et al. article I see in Fig 1 that the estimated amplicon range for that primer is 257-511 bp. This means that (1) we should not have any primer read-through, but also (2) the reads will probably be too short to merge successfully after low-quality bases are trimmed (this is certainly true for the mock community I am looking at, but maybe your reads are higher quality!). So when I use cutadapt on the mock community reads I do not see any trimming either, but after checking out the sequences and expected lengths I think this is the expected outcome.
I also tried swapping the forward/reverse reads at import, as I recommended above, but I am still having problems with q2-itsxpress (it runs but outputs an empty file), so I am not sure what is going on there. q2-cutadapt should be fine for our purposes (though it seems unnecessary, since there should not be primer read-through if Taylor’s length estimates are accurate).
The bottom line: if your protocol matches the Taylor 2016 protocol, then q2-cutadapt probably should not be trimming your sequences at all. But I recommend double-checking your input sequences just to be sure (see if you can find the forward/reverse primers anywhere in there).
No. It sounds like you are using the same primers as Taylor 2016 but not quite the same protocol:
Taylor 2016 seems to use the PCR primers as the sequencing primers as well, so that the primers do not appear in the sequences, or at least that’s what I am seeing in the mock community reads from that study.
Your reads are longer (300 nt), so you could have some primer read-through.
So cutadapt or q2-itsxpress should still be used, though I would not worry if you don’t see many reads being trimmed at the 3’ end by q2-itsxpress, since read-through will only occur for species with shorter amplicons (and it is not clear to me whether the amplicon lengths cited by Taylor 2016 include the primers in the length calculations; if not, then you will have no primer read-through).
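To make the length argument concrete, here is a rough sketch of the arithmetic. It assumes the amplicon length includes both primers (which, as noted, is not certain), ignores indels and quality trimming unless you pass a shorter read length, and uses an illustrative minimum-overlap threshold of 12 nt; the function name is hypothetical.

```python
def read_geometry(amplicon_len, read_len, min_overlap=12):
    """Return (read_through_nt, overlap_nt, can_merge) for one amplicon.

    read_through_nt: bases each read extends past the end of the amplicon.
    overlap_nt: bases of the amplicon covered by both reads.
    can_merge: whether the overlap reaches the illustrative threshold.
    """
    read_through = max(0, read_len - amplicon_len)
    overlap = max(0, min(amplicon_len, 2 * read_len - amplicon_len))
    return read_through, overlap, overlap >= min_overlap


# Amplicon range from Taylor et al. (257-511 bp) with 300 nt reads,
# plus the same range after truncating reads to a hypothetical 250 nt.
for read_len in (300, 250):
    for amplicon_len in (257, 400, 511):
        read_through, overlap, can_merge = read_geometry(amplicon_len, read_len)
        print(f"reads {read_len} nt, amplicon {amplicon_len} bp: "
              f"read-through {read_through} nt, overlap {overlap} nt, "
              f"mergeable: {can_merge}")
```

Under these assumptions only amplicons shorter than the read length show read-through, and the longest amplicons stop overlapping once reads are truncated much below 300 nt.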
Have you tried swapping the forward and reverse reads when you import to see if this allows you to run q2-itsxpress on your data? I would be curious if that solution I proposed actually works.
Switching the "W" to an "S" for the correct reverse complement left me with exactly the same output that I shared before (same number of sequences), Fungal-demux-trimmed-2.qzv, with the exception of the tails, which appear better in demux.qza than in demux-2.qza.
I am not sure how to proceed now.
Additionally, I thought I could figure out how to import the reads in reverse order (F reads as R reads, and vice versa), but it turns out I do not understand how. I looked for a way to import them with the read order reversed, but I realized that qiime tools import does not give me that option. I am thinking of manually renaming the files, just to see, but I wanted to check with you in case there is an easier way that I am just not seeing.
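One route that avoids renaming files is to import from a manifest and simply list the R2 file in the forward column and the R1 file in the reverse column. The sketch below writes such a manifest; it assumes the tab-separated paired-end manifest layout (sample-id, forward-absolute-filepath, reverse-absolute-filepath), and the function name, sample IDs, and paths are hypothetical.

```python
import csv
from pathlib import Path


def write_swapped_manifest(samples, manifest_path):
    """Write a paired-end manifest with forward and reverse files swapped.

    samples maps each sample ID to a (R1_path, R2_path) tuple.
    """
    header = ["sample-id",
              "forward-absolute-filepath",
              "reverse-absolute-filepath"]
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        for sample_id, (r1, r2) in sorted(samples.items()):
            # Swapped on purpose: R2 goes in the forward column, R1 in reverse.
            writer.writerow([sample_id,
                             str(Path(r2).resolve()),
                             str(Path(r1).resolve())])


# Hypothetical sample; replace with your own IDs and file locations.
samples = {
    "sample1": ("reads/sample1_R1.fastq.gz", "reads/sample1_R2.fastq.gz"),
}
# write_swapped_manifest(samples, "swapped-manifest.tsv")
```

The resulting file could then be passed to qiime tools import with --input-format PairedEndFastqManifestPhred33V2 (assuming Phred33 quality scores) and --type 'SampleData[PairedEndSequencesWithQuality]'. It is worth spot-checking a few imported reads afterwards to confirm the orientation is what you expect.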