Technical support re: Illumina want to use only forward reads

(benjamin w.) #1

Hi all,

Running into a weird problem I’m helping to troubleshoot. We have a run that may have some technical issues and that we are re-running, but we wanted to see whether we could use only the forward reads to demultiplex and annotate.

Essentially, the forward and reverse reads are in fastq files and the barcodes are in a separate fastq file. Since the barcode is not in the sequence itself, I realized that cutadapt is not useful here. The forward read, reverse read, and barcode fastq files were generated following the EMP protocol on an Illumina MiSeq.

Here’s what I already looked at:

Low quality reads

Only forward reads Deblur

Thanks for any input.



(Nicholas Bokulich) #2

As long as the forward reads and barcodes are okay, you can definitely use just the forward reads.

I (and many others on this forum!) have run into this issue many times. The reverse reads may be too low-quality to use, or may not be long enough (after trimming) to merge with the forward reads.

So go ahead… toss the reverse reads and pretend you have single-end data.


(benjamin w.) #3

Great, I tried this earlier today, but I’m having some issues importing - let me see what I tried to do:

qiime tools import --type MultiplexedSingleEndBarcodeInSequence --input-path ~/QIIME2_fastq_forward/ --output-path ~/QIIME2_3_Demux/ImportedonlyforwardlibA.qza

which is obviously not right, but which import should I use?

qiime tools import --type SampleData[SequencesWithQuality] --input-path ~/QIIME2_fastq_forward/ --output-path ~/QIIME2_3_Demux/ImportedonlyforwardlibASingle.qza


(Nicholas Bokulich) #4

Did you import as paired-end already? demux emp-single can actually operate on EMPPairedEndSequences artifacts and will use only the forward reads, so that is one possibility. Since your data are multiplexed, I am assuming they are in EMP format.
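To make the idea concrete, here is a minimal Python sketch of what demultiplexing forward reads against a separate barcode fastq amounts to conceptually. It is not QIIME 2's implementation: the function names (`read_fastq`, `demux_forward_only`), the toy reads, and the barcode-to-sample mapping are all made up for illustration, and it assumes the two files list records in the same order and that barcodes need no reverse-complementing or error correction.

```python
from collections import defaultdict
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        # Keep only the ID before the first space, dropping the leading '@'.
        yield header[1:].split(" ")[0], sequence, quality


def demux_forward_only(forward_handle, barcode_handle, barcode_to_sample):
    """Group forward reads by sample using a parallel barcode FASTQ (hypothetical helper)."""
    per_sample = defaultdict(list)
    for (read_id, seq, qual), (bc_id, barcode, _) in zip(
        read_fastq(forward_handle), read_fastq(barcode_handle)
    ):
        if read_id != bc_id:
            raise ValueError(f"Record order mismatch: {read_id} vs {bc_id}")
        sample = barcode_to_sample.get(barcode)
        if sample is not None:  # reads with unknown barcodes are dropped
            per_sample[sample].append((read_id, seq, qual))
    return dict(per_sample)


# Toy, made-up data purely for illustration.
forward = StringIO(
    "@r1 1:N:0\nACGTACGT\n+\nIIIIIIII\n"
    "@r2 1:N:0\nTTGGCCAA\n+\nIIIIIIII\n"
    "@r3 1:N:0\nGGGGAAAA\n+\nIIIIIIII\n"
)
barcodes = StringIO(
    "@r1 1:N:0\nAAAC\n+\nIIII\n"
    "@r2 1:N:0\nCCGT\n+\nIIII\n"
    "@r3 1:N:0\nAAAC\n+\nIIII\n"
)
demuxed = demux_forward_only(
    forward, barcodes, {"AAAC": "sampleA", "CCGT": "sampleB"}
)
print({sample: [r[0] for r in reads] for sample, reads in demuxed.items()})
```

The reverse reads never enter the picture here, which is why dropping them does not affect demultiplexing as long as the forward reads and barcodes are intact.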


(benjamin w.) #5

Yes, well, it does. I will try this; I hadn’t thought about running demux off of that imported file! Ben

edit: Do I need to run it with any modifiers if I want only forward reads demultiplexed?



Excuse my dumb question... but if I decide to import forward reads only, is it essentially the same as running single-end data?

What if I run the forward and reverse reads separately, both as single-end data? Provided the quality of both is similar, would I get similar results (single forward vs. single reverse)?

I recently ran forward reads as single-end and got 2x the number of features (maybe more) in each sample.


(Nicholas Bokulich) #7


Not necessarily. One end may be more informative than the other. E.g., the forward reads of the V4 domain are more variable among species than the reverse reads.

More variation = higher diversity. So if you are targeting a domain like V4, that makes sense: there are not actually more species; the reverse reads just contain less variation.




Thanks for the clarity,

It’s the V3-V4 region, so that makes perfect sense.

Really appreciate the prompt reply.

So I guess I’ll try running the reverse reads as single-end, but it seems like ultimately a paired-end run would be most accurate, right?


(Nicholas Bokulich) #9

Run both as single-end and compare your results. See which gives better classification. More accurate is not the question; more variability and deeper classification are more to the point.
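One rough way to compare "depth of classification" between the two runs is to count how many taxonomic ranks each feature was assigned. The sketch below assumes Greengenes-style taxonomy strings with rank prefixes such as `g__`; the helper names and the example taxa are invented for illustration and are not results from any real run.

```python
def classification_depth(taxon):
    """Count leading ranks that carry a name, e.g. 'g__Lactobacillus' but not 'g__'."""
    if taxon.strip() == "Unassigned":
        return 0
    depth = 0
    for rank in taxon.split(";"):
        name = rank.strip().split("__", 1)[-1]
        if not name:
            break  # first empty rank ends the assigned lineage
        depth += 1
    return depth


def mean_depth(taxa):
    """Average classification depth over a list of taxonomy strings."""
    return sum(classification_depth(t) for t in taxa) / len(taxa)


# Made-up taxonomy assignments, for illustration only.
forward_taxa = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__",
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacteriales; f__Enterobacteriaceae; g__; s__",
]
reverse_taxa = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__; s__",
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacteriales; f__; g__; s__",
]

print("forward mean depth:", mean_depth(forward_taxa))
print("reverse mean depth:", mean_depth(reverse_taxa))
```

A higher mean depth on one read direction would suggest that direction resolves taxa further down the hierarchy, which is the comparison suggested above.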


(benjamin w.) #10

Thanks for the help everyone, I’m finding out some other issues re: the run. Ben

Lock it up!


(Jeongsu) #11


I have a question regarding your comment.

Why are the forward reads of the V4 domain more variable than the reverse reads?

Also, in general, the reverse reads have substantially lower quality than the forward ones. What could be the reasons for this?

Thank you in advance,


(benjamin w.) #12

Not sure, but I think that @Nicholas_Bokulich may answer. My issues were strictly in the reverse reads during this run actually. I had high base variability on early base pairs. Ben


(Nicholas Bokulich) #13

Higher entropy, i.e., the sequences of different species differ more at the front end than at the back end of the V4. E.g., see this plot (source):

(the V4 primers used are 515f + 806r, so look at those positions on this plot)

Higher entropy is a good thing. It means that you are more likely to be able to differentiate those species based on that sequence. Lower entropy = less likely to differentiate species (more likely to get classification at family level and below, and likely to get fewer ASVs/OTUs, yielding lower sensitivity for differentiation of sample groups).
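For readers unfamiliar with the term, per-position entropy can be sketched as the Shannon entropy of the bases observed in each column of an alignment. The snippet below is a minimal illustration under that assumption; the function names and the toy alignment are made up and are not the data behind the plot mentioned above.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy per position for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment: only the second position differs between "species".
toy_alignment = ["ACGTA", "ATGTA", "AGGTA", "AAGTA"]
entropies = alignment_entropy(toy_alignment)
print(entropies)
```

In this toy case the second position, where all four sequences differ, scores 2 bits, while the conserved positions score 0. A region with many high-entropy positions gives a classifier more to work with.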


(Jeongsu) #14

Thank you for your detailed explanations :)