Import multiplexed R1.fastq and R2.fastq with mixed forward and reverse reads + truncate reverse primer

Martin · December 1, 2017, 8:09pm

Dear community,

I am using a sequencing company for Illumina MiSeq 2x300 bp paired-end 16S rRNA amplicon sequencing (V3/V4 region, circa 450-500 bp). This company uses an unusual sequencing strategy: they do not use long concatamer primers as part of the Illumina data, but create actual libraries out of the individual amplicons. The result is two raw files, R1.fastq and R2.fastq (on BaseSpace), in which the forward (5'-3') and reverse (3'-5') reads are mixed. Half of the sequences in each file start with a barcode, followed by the forward primer and the forward sequence, whereas the other half start with the reverse primer followed by the reverse sequence. Please correct me if I am wrong, but so far I do not see a nice way to import this into qiime2.

The sequencing company suggests (and they also do this in their own data analysis pipeline) joining the reads with qiime1 (join_paired_ends.py), then re-orienting all reads in the forward direction and removing the barcodes (extract_barcodes.py). In the resulting fastq file, the sequences are still multiplexed and the forward and reverse primers are still present, while the barcodes have been extracted into an additional fastq file. From here on, I can import the re-oriented and joined reads into qiime2 using the EMPSingleEndSequences protocol (as suggested a couple of days ago in "How to demultiplex fastq file that still includes Barcodes and LinkerPrimer?").

However, there are some minor issues:

1. Is it somehow possible to import these forward/reverse mixed-up R1.fastq and R2.fastq files using qiime2? (Then I could use DADA2 or q2-vsearch to join the reads without needing qiime1.)
2. Is it possible to detect the reverse primer, trim it off, and delete all sequences that do not have a correctly matching reverse primer, either by using a fastq file with joined reads or after importing the fastq into qiime2? For qiime1, there was the truncate_reverse_primer.py script; however, this works only with fasta and not fastq.
3. The same as in (2) would also be nice for the forward primer: with DADA2 I can trim off the first bases, which in most cases correspond to the forward primer. However, in some instances the forward primer is incorrect, and I would rather delete the whole sequence than trim it.
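To make the mixed-orientation problem concrete, here is a minimal Python sketch that labels a read as forward or reverse from the primer near its 5' end. It is only an illustration, not part of QIIME 1 or QIIME 2. The primer sequences are the ones quoted later in this thread; the maximum barcode length (`MAX_BARCODE_LEN`) is a hypothetical value, and exact IUPAC matching without mismatches is a simplifying assumption.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer with IUPAC codes into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


FWD_PRIMER = "CCTACGGGNGGCWGCAG"
# The thread quotes the reverse primer as it appears at the 3' end of joined,
# forward-oriented reads; a raw reverse read starts with its reverse complement.
REV_PRIMER = reverse_complement("GGATTAGATACCCBDGTAGTC")
MAX_BARCODE_LEN = 12  # hypothetical upper bound on the barcode length

FWD_RE = primer_regex(FWD_PRIMER)
REV_RE = primer_regex(REV_PRIMER)


def orientation(read):
    """Return 'forward', 'reverse' or 'unknown' for one read sequence."""
    if REV_RE.match(read):  # reverse reads start directly with the primer
        return "reverse"
    # Forward reads carry a barcode first, so search a short 5' window.
    window = read[:MAX_BARCODE_LEN + len(FWD_PRIMER)]
    if FWD_RE.search(window):
        return "forward"
    return "unknown"
```

Reads labelled "unknown" are the ones with an incorrect primer, which is the case where deleting the whole sequence would be preferable to trimming.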

Of note: yesterday, a nice tutorial, "Analyzing paired end reads in QIIME 2", was published. It was really helpful. Maybe it would be nice to add a comment about the reverse primer issue and the importing of multiplexed fastq data to this tutorial?

Best,
Martin

ebolyen · December 1, 2017, 10:58pm

Hi @Martin!

That is quite an involved workflow! We're still working on a couple of upstream steps (e.g. extract_barcodes.py) so a lot of this is kind of hypothetical:

1. There is a format for paired-end multiplexed data; however, it presumes that your barcodes are in a separate file, which isn't the case without something like extract_barcodes.py.

2. This is actually something we are looking to have implemented soon; the same situation generally exists for ITS data, where your reverse primer ends up on your forward read. I don't think we had really expected to filter via the reverse primer, but it's an interesting idea (though not relevant to ITS, which is what we've mostly been thinking about). We're thinking about a cutadapt plugin to handle both extract_barcodes.py and truncate_reverse_primer.py, so this could probably fit into that somewhere at some point.

3. That makes sense. Perhaps we could have something that filters sequences based on primers/adapters, which could also handle your reverse-primer filtering scenario.

In short, we basically don't have any of those pieces in QIIME 2 yet, but we're working on it! Thanks for letting us know this is something you need!

Martin · December 4, 2017, 3:52pm

Thanks for the information and processing.

To illustrate the issue I have with the reverse primers, I added a screenshot. It shows the last bases of my 16S rRNA sequences after using DADA2, with the reverse primer marked. The sequences were generated using primers for the V3/V4 region (S-D-Bact-0341-b-S-17 and S-D-Bact-0785-a-A-21, Klindworth et al., 2012). The resulting sequences differ slightly in length, which prevents a simple trimming of the last nucleotides.

Further, in some sequences the reverse primer is missing, and I would like to delete these sequences (I guess there were some problems with the joining of the paired-end sequences or the re-orientation).
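As a rough illustration of the behaviour described here (cut each read where the reverse primer starts, whatever the read length, and drop reads that lack the primer), a minimal Python sketch follows. It is not a QIIME 2 feature; the record layout `(header, sequence, quality)` and the exact IUPAC matching without mismatches are assumptions, and the primer is written as it appears at the 3' end of joined, forward-oriented reads (the sequence quoted later in this thread).

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

# Reverse primer as seen at the 3' end of forward-oriented, joined reads.
REV_PRIMER_3P = "GGATTAGATACCCBDGTAGTC"
REV_RE = re.compile("".join(IUPAC[base] for base in REV_PRIMER_3P))


def truncate_at_reverse_primer(records):
    """Yield (header, seq, qual) cut at the reverse primer.

    Reads without an exact primer match are dropped (assumption: no
    mismatches are tolerated in this sketch).
    """
    for header, seq, qual in records:
        matches = list(REV_RE.finditer(seq))
        if not matches:
            continue  # primer missing: discard the whole read
        start = matches[-1].start()  # use the match closest to the 3' end
        yield header, seq[:start], qual[:start]


reads = [
    ("read1", "ACGTAC" + "GGATTAGATACCCTGGTAGTC", "I" * 27),
    ("read2", "ACGTACGT", "I" * 8),  # no reverse primer
]
kept = list(truncate_at_reverse_primer(reads))
```

Because the cut position comes from the primer match rather than from a fixed length, reads of slightly different sizes are handled the same way.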

Would love to see this problem solved in a future release. Thanks for the great work you people do!

Best,

Martin

thermokarst · December 22, 2017, 6:00pm

QIIME 2 2017.12 is now out and it includes a cutadapt plugin, for assisting with demultiplexing and trimming adapter sequence. A community tutorial is still in the works, so keep an eye on the release announcement for that!

Martin · December 27, 2017, 3:19pm

Hi there,

I am very excited to see that QIIME 2 2017.12 includes a new feature to trim forward and reverse primers. Apparently, you were quite busy during Christmas.

I was testing the preliminary q2-cutadapt tutorial, which works fine.

Now I wanted to trim the forward primer from my demultiplexed sequences using "qiime cutadapt trim-single"; however, this has not worked so far. I used the following command:

qiime cutadapt trim-single \
  --i-demultiplexed-sequences DemuxSeq.qza \
  --p-cores 4 \
  --p-front CCTACGGGNGGCWGCAG \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose \

The command does not return any error, it is simply running forever. However, the CPU is idle and no memory is used. I stopped it after a while with ctrl + c.

In the following you see a screenshot, showing the first bases of my sequences (including forward primer: CCTACGGGNGGCWGCAG) in fastq format.

Well... it's open for discussion and suggestions.

thermokarst · December 27, 2017, 5:06pm

Hi @Martin! Sorry to hear things aren't going well.

Strange - when you ran the command, did it include that trailing backslash, like you provided above?

If so then the command never actually executed - your shell was just sitting there waiting for your next command. If you saw nothing printed back to your terminal (stdout/stderr), this seems like the most likely culprit to me. If this isn't the case, can you please provide any of the output that is generated (when running with --verbose)? Thanks!

Martin · December 29, 2017, 11:22am

Oops, you are totally right, this was a stupid mistake.

Now cutadapt works perfectly fine! Great work.

I just encountered one issue: I tried to remove the forward and reverse primers simultaneously by using the --p-front and --p-adapter flags together. However, when doing so, only the reverse primer was removed. To solve this, I had to remove the forward primer first, followed by a second run in which I removed the reverse primer. Is this expected? Or a bug?

Best,
Martin

thermokarst · December 29, 2017, 1:35pm

Hi @Martin! Glad to hear you got moving forward!

This is expected! From the cutadapt docs:

It is possible to specify more than one adapter sequence by using the options -a, -b and -g more than once. Any combination is allowed, such as five -a adapters and two -g adapters. Each read will be searched for all given adapters, but only the best matching adapter is removed.

If you want to remove more than one adapter at once:

By default, at most one adapter sequence is removed from each read, even if multiple adapter sequences were provided. This can be changed by using the --times option

So, if you bump up the --p-times value to 2, that sounds like it should work for you.
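The quoted behaviour can be mimicked with a toy Python model: in each round only the best-matching adapter is removed, and the number of rounds plays the role of --times. This is a simplified sketch, not cutadapt's actual algorithm. It assumes exact IUPAC matching and takes "best" to mean the longest match; the function name `trim` is made up for the example.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def primer_regex(primer):
    """Compile a primer with IUPAC codes into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


FRONT = primer_regex("CCTACGGGNGGCWGCAG")        # 5' primer (--p-front)
ADAPTER = primer_regex("GGATTAGATACCCBDGTAGTC")  # 3' primer (--p-adapter)


def trim(read, times=1):
    """Remove at most one adapter per round, for `times` rounds."""
    for _ in range(times):
        candidates = []  # (match length, read after removing that adapter)
        m = FRONT.search(read)
        if m:  # a 5' adapter is removed together with everything before it
            candidates.append((m.end() - m.start(), read[m.end():]))
        m = ADAPTER.search(read)
        if m:  # a 3' adapter is removed together with everything after it
            candidates.append((m.end() - m.start(), read[:m.start()]))
        if not candidates:
            break
        # Assumption: the "best" match is simply the longest one.
        read = max(candidates, key=lambda c: c[0])[1]
    return read


read = "CCTACGGGAGGCAGCAG" + "ACGTACGTAC" + "GGATTAGATACCCTGGTAGTC"
once = trim(read, times=1)
twice = trim(read, times=2)
```

In this model the 21-nt reverse primer wins the single round over the 17-nt forward primer, so `once` still starts with the forward primer, while `twice` is the bare insert. That is consistent with only the reverse primer being removed in a single pass.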

Another option is to experiment with linked-primers, which might be more appropriate, given what you are trying to do.

Let us know how it goes!

Martin · January 1, 2018, 6:35pm

Hi,

To remove the 5' primer (--p-front) and the 3' primer (--p-adapter) simultaneously, applying --p-times 2 worked excellently!

As you mentioned, cutadapt has the interesting linked-primers feature. According to the docs, this trims primers only if both the 5' forward AND the 3' reverse primers match. This feature is not mentioned in the QIIME 2 cutadapt docs. Is it already possible to use the cutadapt linked-primer feature in qiime2, or do I have to use cutadapt "stand alone" (which, of course, does not accept qiime2 artifacts)?

In the cutadapt docs, I also found that a --discard-untrimmed option (or --untrimmed-output FILE) is implemented, which removes sequences without matching adapters (instead of only trimming off matching adapters). Is it planned to make this feature available in qiime2 as well?

Best,
Martin

thermokarst · January 2, 2018, 10:50pm

Awesome!

The cutadapt docs indicate that linked primers use a special ... syntax with the existing flags: ADAPTER1...ADAPTER2. So theoretically, you should be able to do something like this with q2-cutadapt:

$ qiime cutadapt trim-single \
  --i-demultiplexed-sequences DemuxSeq.qza \
  --p-cores 4 \
  --p-front 'CCTACGGGNGGCWGCAG...ADAPTER2SEQUENCE' \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose

I have opened an issue to remind us to add some more documentation regarding linked primers!

Good point! I opened an issue to put this on our radar - thanks!

Martin · January 3, 2018, 11:21pm

Hi @thermokarst,

I tested

--p-front CCTACGGGNGGCWGCAG...GGATTAGATACCCBDGTAGTC

and

--p-adapter CCTACGGGNGGCWGCAG...GGATTAGATACCCBDGTAGTC

and both worked!

I also played around with "anchoring" the primers using something like this:

--p-front ^CCTACGGGNGGCWGCAG...GGATTAGATACCCBDGTAGTC$

which resulted in fewer sequences being trimmed.

I am still figuring out the consequences of anchoring, but it is quite well explained in the cutadapt docs here and here and I think I am getting there...
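A toy Python model may help show why anchoring trims fewer sequences: an anchored 5' primer must sit at the very start of the read and an anchored 3' primer at the very end, so any leftover leading or trailing bases prevent a match. This sketch uses exact IUPAC matching and plain regular expressions; it only illustrates the idea and is not how cutadapt implements linked or anchored adapters.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def to_regex(primer):
    """Expand IUPAC codes in a primer to a regex fragment."""
    return "".join(IUPAC[base] for base in primer)


FWD = to_regex("CCTACGGGNGGCWGCAG")
REV = to_regex("GGATTAGATACCCBDGTAGTC")

# Unanchored: the primers may occur anywhere, as long as FWD precedes REV.
UNANCHORED = re.compile(FWD + "(.*?)" + REV)
# Anchored: FWD must start the read and REV must end it.
ANCHORED = re.compile("^" + FWD + "(.*)" + REV + "$")


def insert_between(read, anchored):
    """Return the sequence between the primers, or None if they do not match."""
    pattern = ANCHORED if anchored else UNANCHORED
    m = pattern.search(read)
    return m.group(1) if m else None


clean = "CCTACGGGAGGCAGCAG" + "ACGTACGTAC" + "GGATTAGATACCCTGGTAGTC"
shifted = "AC" + clean  # two leftover bases before the forward primer
```

Here `clean` is trimmed in both modes, whereas `shifted` is trimmed only in the unanchored mode and left untouched when the primers are anchored.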

Thanks for forwarding the issues with the documentation and filtering of untrimmed sequences!

Cheers,
Martin

SoilRotifer · January 17, 2018, 10:10pm

Thanks @Martin and @thermokarst for bringing up and flagging the --discard-untrimmed option of cutadapt. At least for the specific case of primer trimming, I recommend that this flag always be used. If not, you'll have some offset OTU sequences if length trimming is involved later on.

My collaborators and I use --discard-untrimmed as an additional form of sequence quality control. Anyway, I just wanted to voice my strong support that this option be added to the cutadapt plugin.

Otherwise, thank you very much for including cutadapt as a QIIME 2 plugin! Great work!

thermokarst · February 16, 2018, 4:05pm

QIIME 2 2018.2 is out now, and it includes the help text revisions described above. We haven't gotten to the discard-untrimmed request just yet, so please stay tuned!

system · March 19, 2018, 10:13pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.