I am using a sequencing company for Illumina MiSeq 2x300 bp paired-end 16S rRNA amplicon sequencing (V3/V4 region, circa 450-500 bp). This company uses an unusual sequencing strategy: they do not use long concatemer primers in the Illumina workflow, but create actual libraries out of the individual amplicons. The result is two raw files, R1.fastq and R2.fastq (on BaseSpace), in which the forward (5’-3’) and reverse (3’-5’) reads are mixed up. Half of the sequences in each file start with a barcode, followed by the forward primer and the forward sequence, whereas the other half start with the reverse primer followed by the reverse sequence. Please correct me if I am wrong, but so far I do not see a nice way to import this into QIIME 2.
The sequencing company suggests (and they also do this in their own data analysis pipeline) joining the reads with QIIME 1 (join_paired_ends.py), then re-orienting all reads in the forward direction and removing the barcodes (extract_barcodes.py). In the resulting fastq file, the sequences are still multiplexed, the forward and reverse primers are still present, and the barcodes have been extracted into an additional fastq file. From here on, I can import the re-oriented and joined reads into QIIME 2 using the EMPSingleEndSequences protocol (as suggested a couple of days ago: How to demultiplex fastq file that still includes Barcodes and LinkerPrimer?).
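To make the mixed orientation described at the start of this post concrete, here is a minimal Python sketch that labels a raw read as forward or reverse by looking for a primer near its 5' end. It is only an illustration, not part of QIIME 1 or QIIME 2. The primer sequences are assumed to be the commonly published ones for the Klindworth et al. pair (check them against your own protocol), the `max_offset` of 12 bases for the barcode is a hypothetical value, and matching is exact apart from IUPAC degenerate bases, so reads with sequencing errors in the primer end up as "unknown".

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


# Assumed primer sequences (S-D-Bact-0341-b-S-17 / S-D-Bact-0785-a-A-21).
FWD = primer_regex("CCTACGGGNGGCWGCAG")
REV = primer_regex("GACTACHVGGGTATCTAATCC")


def classify_read(seq, max_offset=12):
    """Return 'forward', 'reverse' or 'unknown' for one raw read.

    A primer only counts if it begins within the first `max_offset`
    bases, which leaves room for a leading barcode (hypothetical length).
    """
    seq = seq.upper()
    for label, pattern in (("forward", FWD), ("reverse", REV)):
        match = pattern.search(seq)
        if match and match.start() <= max_offset:
            return label
    return "unknown"


# Toy reads: barcode + forward primer + insert, and reverse primer + insert.
forward_read = "ACGTACGT" + "CCTACGGGAGGCAGCAG" + "TGGGGAATATTG"
reverse_read = "GACTACAGGGGTATCTAATCC" + "TGTTTGCTCC"
print(classify_read(forward_read), classify_read(reverse_read))
```

A classifier like this could be used to split each file into forward-starting and reverse-starting reads before joining, but that step is not shown here.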
However, there are some minor issues:
1. Is it somehow possible to import these forward/reverse mixed-up R1.fastq and R2.fastq files using QIIME 2? (Then I could use DADA2 or q2-vsearch to join the reads without needing QIIME 1.)
2. Is it possible to detect the reverse primer, trim it off, and delete all sequences that do not have a correctly matching reverse primer, either using a fastq file with joined reads or after importing the fastq into QIIME 2? QIIME 1 had the truncate_reverse_primer.py script, but it only works with fasta, not fastq.
3. The same as in (2) would also be nice for the forward primer. With DADA2 I can trim off the first bases, which in most cases correspond to the forward primer. However, in some instances the forward primer is incorrect, and I would rather delete the whole sequence than trim it.
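The behaviour asked for in (2) and (3) can be sketched in plain Python as follows: keep a joined, forward-oriented read only if both primers are found, and return the sequence and quality string between them. This is an illustrative sketch, not an existing QIIME function. The primer sequences are assumed (the commonly published Klindworth et al. pair), the function name `trim_or_discard` is hypothetical, and matching is exact apart from IUPAC degenerate bases, whereas a real tool would tolerate some mismatches.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(primer):
    return "".join(IUPAC[base] for base in primer.upper())


def trim_or_discard(seq, qual, fwd_primer, rev_primer):
    """Return (seq, qual) between the primers, or None if either is missing.

    In a joined, forward-oriented read the reverse primer appears as its
    reverse complement near the 3' end.
    """
    pattern = re.compile(
        to_regex(fwd_primer) + "(.+)" + to_regex(revcomp(rev_primer))
    )
    match = pattern.search(seq.upper())
    if match is None:
        return None  # discard: at least one primer did not match
    start, end = match.span(1)
    return seq[start:end], qual[start:end]


# Assumed primer sequences; verify against your own protocol.
FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"

insert = "TGGGGAATATTGCACAATGG"
good = "CCTACGGGAGGCAGCAG" + insert + "GGATTAGATACCCTAGTAGTC"
no_reverse = "CCTACGGGAGGCAGCAG" + insert

kept = trim_or_discard(good, "I" * len(good), FWD_PRIMER, REV_PRIMER)
dropped = trim_or_discard(no_reverse, "I" * len(no_reverse), FWD_PRIMER, REV_PRIMER)
```

Because the insert length is captured by the pattern rather than fixed, amplicons that differ slightly in size are handled the same way.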
Of note: yesterday, a nice tutorial, “Analyzing paired end reads in QIIME2”, was published (Analyzing paired end reads in QIIME 2). This was really helpful. Maybe it would be nice to add a comment to this tutorial about the reverse primer issue and the importing of multiplexed fastq data?
That is quite an involved workflow! We’re still working on a couple of upstream steps (e.g. extract_barcodes.py) so a lot of this is kind of hypothetical:
There is a format for paired-end multiplexed data; however, it presumes that your barcodes are in a separate file, which isn’t the case without something like extract_barcodes.py.
This is actually something we are looking to have implemented soon. The same situation generally exists for ITS data, where your reverse primer ends up on your forward read. I don’t think we had really expected to filter via the reverse primer, but it’s an interesting idea (though not relevant to ITS, which is what we’ve mostly been thinking about). We’re thinking about a cutadapt plugin to handle both extract_barcodes.py and truncate_reverse_primer.py, so this could probably fit into that somewhere at some point.
That makes sense. Perhaps we could have something that filters sequences based on primers/adapters, which could also handle your reverse-primer filtering scenario.
In short, we basically don’t have any of those pieces in QIIME 2 yet, but we’re working on it! Thanks for letting us know this is something you need!
To illustrate the issue I have with the reverse primers, I have added a screenshot. It shows the last bases of my 16S rRNA sequences after using DADA2, with the reverse primer marked. The sequences were generated using primers for the V3/V4 region (S-D-Bact-0341-b-S-17 and S-D-Bact-0785-a-A-21, Klindworth et al., 2012). The resulting sequences differ slightly in length, which prevents simply trimming a fixed number of nucleotides from the end.
Further, in some sequences the reverse primer is missing, and I would like to delete these sequences (I guess there were some problems with the joining of the paired-end sequences or the re-orientation).
Would love to see this problem solved in a future release. Thanks for the great work you people do!
QIIME 2 2017.12 is now out and it includes a cutadapt plugin for assisting with demultiplexing and trimming adapter sequences. A community tutorial is still in the works, so keep an eye on the release announcement for that!
Hi @Martin! Sorry to hear things aren’t going well.
Strange - when you ran the command, did it include that trailing slash, like you provided above?
If so then the command never actually executed - your shell was just sitting there waiting for your next command. If you saw nothing printed back to your terminal (stdout/stderr), this seems like the most likely culprit to me. If this isn’t the case, can you please provide any of the output that is generated (when running with --verbose)? Thanks!
Oops, you are totally right, this was a stupid mistake.
Now cutadapt works perfectly fine! Great work.
I just encountered one issue: I tried to remove the forward and reverse primers simultaneously by using the --p-front and --p-adapter flags together. However, when doing so, only the reverse primer was removed. To solve this, I had to remove the forward primer first, followed by a second run in which I removed the reverse primer. Is this expected, or is it a bug?
It is possible to specify more than one adapter sequence by using the options -a, -b and -g more than once. Any combination is allowed, such as five -a adapters and two -g adapters. Each read will be searched for all given adapters, but only the best matching adapter is removed.
To remove the 5’ primer (--p-front) and the 3’ primer (--p-adapter) simultaneously, applying --p-times 2 worked excellently!
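The interaction between "only the best matching adapter is removed" and --p-times 2 can be pictured with a toy Python model. This is a deliberately simplified sketch and not cutadapt's actual algorithm: it assumes exact, non-degenerate primer matches and treats the longest hit as the "best" match, which is an assumption made only for illustration. The function names are hypothetical.

```python
def trim_once(seq, front, adapter):
    """One trimming round: remove only the single best-matching adapter.

    'Best' is modelled here as the longest exact hit (a simplification).
    Returns the new sequence and whether anything was removed.
    """
    candidates = []
    front_pos = seq.find(front)
    if front_pos != -1:
        candidates.append((len(front), "front", front_pos))
    adapter_pos = seq.rfind(adapter)
    if adapter_pos != -1:
        candidates.append((len(adapter), "adapter", adapter_pos))
    if not candidates:
        return seq, False
    _, kind, pos = max(candidates, key=lambda c: c[0])
    if kind == "front":
        # 5' adapter: drop the primer and everything before it.
        return seq[pos + len(front):], True
    # 3' adapter: drop the primer and everything after it.
    return seq[:pos], True


def trim(seq, front, adapter, times=1):
    """Repeat the single-adapter round up to `times` times."""
    for _ in range(times):
        seq, changed = trim_once(seq, front, adapter)
        if not changed:
            break
    return seq


# Toy, non-degenerate primer instances (17 and 21 bases long).
FRONT = "CCTACGGGAGGCAGCAG"
ADAPTER = "GGATTAGATACCCTAGTAGTC"
INSERT = "TGGGGAATATTGCACAATGG"
read = FRONT + INSERT + ADAPTER

one_round = trim(read, FRONT, ADAPTER, times=1)   # only the 3' primer is removed
two_rounds = trim(read, FRONT, ADAPTER, times=2)  # both primers are removed
```

In this model a single round removes just one of the two primers, and a second round picks up the one that was left behind, which mirrors the behaviour reported above.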
As you mentioned, cutadapt has the interesting linked-primers feature. According to the docs, this trims primers only if the 5’ forward AND the 3’ reverse primers both match. This feature is not mentioned in the QIIME 2 cutadapt docs. Is it already possible to use the cutadapt linked-primer feature in QIIME 2, or do I have to use cutadapt “stand alone” (which, of course, does not accept QIIME 2 artifacts)?
In the cutadapt docs, I also found that a --discard-untrimmed option (or --untrimmed-output FILE) is implemented, which removes sequences without matching adapters (rather than only trimming off matching adapters). Is it planned to make this feature available in QIIME 2 as well?
Thanks @Martin and @thermokarst for bringing up and flagging the --discard-untrimmed option of cutadapt. At least for the specific case of primer trimming, I recommend that this flag always be used. If not, you’ll have some offset OTU sequences if length trimming is involved later on.
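The offset problem mentioned here can be shown with a small Python sketch. It assumes exact primer matching at the start of the read and a fixed truncation length afterwards; the function names and the toy reads are hypothetical and only serve to illustrate the effect.

```python
def trim_front(seq, primer):
    """Remove the primer if the read starts with it; report whether it did."""
    if seq.startswith(primer):
        return seq[len(primer):], True
    return seq, False


def process(reads, primer, trunc_len, discard_untrimmed):
    """Primer-trim each read, then truncate it to a fixed length."""
    out = []
    for seq in reads:
        trimmed, found = trim_front(seq, primer)
        if not found and discard_untrimmed:
            continue  # drop reads whose primer did not match
        out.append(trimmed[:trunc_len])
    return out


PRIMER = "CCTACGGGAGGCAGCAG"
TEMPLATE = "TGGGGAATATTGCACAATGGGCGAAAGC"

reads = [
    PRIMER + TEMPLATE,               # primer matches and is trimmed
    "CCTACGGGAGGCAGCAC" + TEMPLATE,  # one primer error: read stays untrimmed
]

# Without discarding, the second read still starts with its primer, so after
# truncation it is shifted relative to the first read from the same template.
kept_all = process(reads, PRIMER, trunc_len=20, discard_untrimmed=False)

# With discarding, only the properly trimmed read survives.
kept_trimmed = process(reads, PRIMER, trunc_len=20, discard_untrimmed=True)
```

Both reads come from the same template, yet without discarding they end up as two different truncated sequences, which is how untrimmed reads can turn into spurious, shifted OTU sequences downstream.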
My collaborators and I use --discard-untrimmed as an additional form of sequence quality control. Anyway, I just wanted to voice my strong support that this option be added to the cutadapt plugin.
Otherwise, thank you very much for including cutadapt as a QIIME 2 plugin! Great work!