I am using a sequencing company for Illumina MiSeq 2x300 bp paired-end 16S rRNA amplicon sequencing (V3/V4 region, circa 450-500 bp). This company uses an unusual sequencing strategy: they do not use long concatemer primers for the Illumina run, but instead create actual libraries from the individual amplicons. The result is two raw files, R1.fastq and R2.fastq (on BaseSpace), in which the forward (5’-3’) and reverse (3’-5’) reads are mixed up. Half of the sequences in each file start with a barcode, followed by the forward primer and the forward sequence, whereas the other half start with the reverse primer followed by the reverse sequence. Please correct me if I am wrong, but so far I do not see a nice way to import this into QIIME 2.
The sequencing company suggests (and does this in its own data analysis pipeline) joining the reads with QIIME 1 (join_paired_ends.py), then re-orienting all reads in the forward direction and removing the barcodes (extract_barcodes.py). In the resulting fastq file the sequences are still multiplexed, the forward and reverse primers are still present, and the barcodes have been extracted into an additional fastq file. From there, I can import the re-oriented, joined reads into QIIME 2 using the EMPSingleEndSequences type (as suggested a couple of days ago: How to demultiplex fastq file that still includes Barcodes and LinkerPrimer?).
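To make the mixed-up layout concrete, here is a minimal Python sketch of how a read pair could be put back into forward/reverse order before joining. It is only an illustration under assumptions: the primer sequences and the barcode length are placeholders (commonly used V3/V4 primers, not necessarily the ones the company uses), and the helper names `primer_matches` and `orient_pair` are hypothetical, not part of QIIME 1 or QIIME 2.

```python
# Sketch only: primers and barcode length are placeholders, not the
# sequencing company's actual values.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

EXAMPLE_FWD_PRIMER = "CCTACGGGNGGCWGCAG"      # assumed placeholder
EXAMPLE_REV_PRIMER = "GACTACHVGGGTATCTAATCC"  # assumed placeholder
EXAMPLE_BARCODE_LEN = 8                        # assumed barcode length


def primer_matches(primer, seq, offset=0, max_mismatches=0):
    """Return True if `primer` (IUPAC codes allowed) matches `seq` at `offset`."""
    window = seq[offset:offset + len(primer)].upper()
    if len(window) < len(primer):
        return False
    mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
    return mismatches <= max_mismatches


def orient_pair(r1, r2, fwd_primer, rev_primer, barcode_len):
    """Return (forward_read, reverse_read), or None if the primers are not found.

    A forward read is expected to be barcode + forward primer + sequence;
    a reverse read is expected to be reverse primer + sequence.
    """
    if primer_matches(fwd_primer, r1, barcode_len) and primer_matches(rev_primer, r2):
        return r1, r2  # pair is already in forward/reverse order
    if primer_matches(rev_primer, r1) and primer_matches(fwd_primer, r2, barcode_len):
        return r2, r1  # pair is flipped, so swap the mates
    return None        # neither layout fits: candidate for discarding
```

The sketch works on sequences only. For real fastq records the header and quality lines would have to be swapped together with the sequence, so that each re-oriented file stays internally consistent.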
However, there are some minor issues:
1. Is it somehow possible to import these forward/reverse mixed-up R1.fastq and R2.fastq files directly into QIIME 2? (Then I could use DADA2 or q2-vsearch to join the reads without needing QIIME 1.)
2. Is it possible to detect the reverse primer, trim it off, and delete all sequences that do not have a correctly matching reverse primer, either in a fastq file with joined reads or after importing the fastq into QIIME 2? QIIME 1 had the truncate_reverse_primer.py script for this, but it only works with fasta, not fastq.
3. The same as in (2) would also be nice for the forward primer. With DADA2 I can trim off the first bases, which in most cases correspond to the forward primer. However, in some instances the forward primer is incorrect, and I would rather delete the whole sequence than trim it.
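To illustrate the primer filtering I have in mind for joined, forward-oriented reads whose barcodes have already been extracted, here is a rough Python sketch. It is a sketch under assumptions, not an existing QIIME function: the primers are placeholders (commonly used V3/V4 primers, possibly not the ones in my data), `trim_primers` and the other helpers are hypothetical names, and only substitutions (no indels) are tolerated when matching.

```python
# Sketch only: placeholder primers; helper names are hypothetical.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

EXAMPLE_FWD_PRIMER = "CCTACGGGNGGCWGCAG"      # assumed placeholder
EXAMPLE_REV_PRIMER = "GACTACHVGGGTATCTAATCC"  # assumed placeholder


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC ambiguity codes valid."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_matches(primer, seq, offset=0, max_mismatches=0):
    """Return True if `primer` (IUPAC codes allowed) matches `seq` at `offset`."""
    window = seq[offset:offset + len(primer)].upper()
    if len(window) < len(primer):
        return False
    mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
    return mismatches <= max_mismatches


def trim_primers(seq, qual, fwd_primer, rev_primer, max_mismatches=0):
    """Trim both primers from a joined, forward-oriented read.

    Returns (seq, qual) without primers, or None if the forward primer is not
    at the start or the reverse primer is not found, so the read is dropped
    instead of being blindly trimmed.
    """
    if not primer_matches(fwd_primer, seq, 0, max_mismatches):
        return None
    # In a joined forward-oriented read the reverse primer appears
    # reverse-complemented near the 3' end.
    rev_rc = reverse_complement(rev_primer)
    insert_start = len(fwd_primer)
    # Scan from the 3' end towards the forward primer; take the rightmost hit.
    for pos in range(len(seq) - len(rev_rc), insert_start - 1, -1):
        if primer_matches(rev_rc, seq, pos, max_mismatches):
            return seq[insert_start:pos], qual[insert_start:pos]
    return None
```

Reads for which `trim_primers` returns None would be left out of the output fastq, and the quality string is cut at the same positions as the sequence so the record stays valid.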
Of note: yesterday, a nice tutorial on “Analyzing paired end reads in QIIME2” was published (Analyzing paired end reads in QIIME 2). This was really helpful. Maybe a comment about the reverse primer issue and about importing multiplexed fastq data could be added to this tutorial?