Technical support re: Illumina want to use only forward reads

ben · March 29, 2019, 9:22pm

Hi all,

Running into a weird problem I'm helping to troubleshoot. We have a run that may have some technical issues, which we are re-running, but we wanted to see if we could use only the forward reads to demultiplex and annotate.

Essentially, the forward and reverse reads are in fastq files and the barcodes are in a separate fastq file. I realized that since the barcode is not in the sequence itself, cutadapt is not useful here. The forward read, reverse read, and barcode fastq files were generated following the EMP protocol on an Illumina MiSeq.

Here's what I already looked at:

Low quality reads

Only forward reads Deblur

Thanks for any input.

Ben

Nicholas_Bokulich · March 29, 2019, 9:33pm

As long as the forward reads and barcodes are okay, you can definitely use just the forward reads.

I (and many others on this forum!) have run into this issue many times. The reverse reads may be too low-quality to use, or may not be long enough (after trimming) to merge with the forward reads.

So go ahead... toss the reverse reads and pretend you have single-end data.
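
One way to "pretend" the data are single-end is to stage a directory that holds only the forward reads and the barcodes, then import that directory as EMPSingleEndSequences. The sketch below assumes the files already carry the paired-end EMP names (forward.fastq.gz, barcodes.fastq.gz) and that the single-end import expects sequences.fastq.gz plus barcodes.fastq.gz; the helper names and paths are hypothetical, and the import command is only built, not run:

```python
import shutil
from pathlib import Path


def stage_single_end_emp(paired_dir, single_dir):
    """Copy forward reads and barcodes into a single-end EMP layout.

    Assumes paired_dir holds forward.fastq.gz and barcodes.fastq.gz
    (the paired-end EMP file names); the reverse reads are left behind.
    """
    paired_dir, single_dir = Path(paired_dir), Path(single_dir)
    single_dir.mkdir(parents=True, exist_ok=True)
    # Assumed single-end EMP layout: sequences.fastq.gz + barcodes.fastq.gz.
    shutil.copyfile(paired_dir / "forward.fastq.gz",
                    single_dir / "sequences.fastq.gz")
    shutil.copyfile(paired_dir / "barcodes.fastq.gz",
                    single_dir / "barcodes.fastq.gz")
    return single_dir


def import_command(single_dir, output_qza):
    """Build (but do not run) the qiime tools import call for the staged directory."""
    return [
        "qiime", "tools", "import",
        "--type", "EMPSingleEndSequences",
        "--input-path", str(single_dir),
        "--output-path", str(output_qza),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) in an environment where QIIME 2 is installed.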

ben · March 29, 2019, 9:39pm

Great, I tried this earlier today, but I'm having some issues importing - let me see what I tried to do:

qiime tools import --type MultiplexedSingleEndBarcodeInSequence --input-path ~/QIIME2_fastq_forward/ --output-path ~/QIIME2_3_Demux/ImportedonlyforwardlibA.qza

which is obviously not right, but which import should I use?

qiime tools import --type SampleData[SequencesWithQuality] --input-path ~/QIIME2_fastq_forward/ --output-path ~/QIIME2_3_Demux/ImportedonlyforwardlibASingle.qza

Nicholas_Bokulich · March 29, 2019, 9:45pm

Did you import as paired-end already? demux emp-single can actually operate on EMPPairedEndSequences artifacts and will only use the forward reads, so that is one possibility. If your data are multiplexed, I am assuming they are in EMP format.
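
As a rough sketch of that route, the helper below builds a demux emp-single call that takes an existing paired-end EMP artifact as its input. The file names and the barcode column name are placeholders, and --o-error-correction-details is an output that only newer QIIME 2 releases have, so it is optional here. The command is assembled as a list and printed rather than executed:

```python
def emp_single_command(seqs_qza, metadata_tsv, barcode_column,
                       demux_qza, details_qza=None):
    """Build (but do not run) a `qiime demux emp-single` call.

    All paths and the column name are placeholders supplied by the caller.
    """
    cmd = [
        "qiime", "demux", "emp-single",
        "--i-seqs", seqs_qza,                # may be an EMPPairedEndSequences artifact
        "--m-barcodes-file", metadata_tsv,   # sample metadata with a barcode column
        "--m-barcodes-column", barcode_column,
        "--o-per-sample-sequences", demux_qza,
    ]
    if details_qza is not None:
        # Newer releases also write Golay error-correction details.
        cmd += ["--o-error-correction-details", details_qza]
    return cmd


# Hypothetical file and column names, for illustration only.
cmd = emp_single_command(
    "emp-paired-end-sequences.qza", "sample-metadata.tsv",
    "barcode-sequence", "demux-forward-only.qza",
)
print(" ".join(cmd))
```

If the barcodes turn out to be in reverse-complement orientation, --p-rev-comp-barcodes or --p-rev-comp-mapping-barcodes can be appended to the list.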

ben · March 29, 2019, 9:52pm

Yes, well it does. I will try this; I hadn't thought about running demux off of that imported file! Ben

edit: Do I need to run it with any modifiers if I want only forward reads demultiplexed?

sabasu · March 30, 2019, 1:32am

Excuse my dumb question... But if I decide to import forward reads only, is it essentially running single-end data?

What if I run the forward and reverse reads separately, both as single-end data? Provided the quality of both is comparable, would I potentially get similar results (single fwd vs. single rev)?

I ran forward reads as single recently and got 2x the amount of features present (maybe more) in each sample.

Nicholas_Bokulich · March 30, 2019, 1:38am

Correct

Not necessarily. One end may be more informative than another. E.g., the forward reads of V4 domain are more variable among species than the reverse end.

More variation = higher diversity. So if you are targeting a domain like V4, that makes sense... there are not actually more species; the reverse reads just contain less variation.

sabasu · March 30, 2019, 1:47am

@Nicholas_Bokulich

Thanks for the clarity,

It's the v3-v4 region so that makes perfect sense.

Really appreciate the prompt reply.

So I guess I'll try running reverse as single-end, but it seems like ultimately a paired-end run would be most accurate, right?

Nicholas_Bokulich · March 30, 2019, 3:42am

Run both as single-end and compare your results. See which gives better classification. More accurate is not the question; more variability and deeper classification are more to the point.
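
One hedged way to put a number on "deeper classification" is to count how many ranks each feature's taxonomy string resolves, then compare the average between the two runs. The sketch below assumes semicolon-separated, Greengenes-style labels in which an unresolved rank is left empty after the `__` prefix; the function names and the toy strings are made up for illustration and are not results from this run:

```python
def assigned_depth(taxon):
    """Number of leading ranks with a non-empty name in a 'k__X; p__Y; ...' string."""
    if taxon.strip() in ("", "Unassigned"):
        return 0
    depth = 0
    for rank in taxon.split(";"):
        rank = rank.strip()
        # Treat 'g__' (prefix only) or an empty field as unassigned.
        name = rank.split("__", 1)[1] if "__" in rank else rank
        if not name:
            break
        depth += 1
    return depth


def mean_depth(taxa):
    """Average assigned depth over a collection of taxonomy strings."""
    taxa = list(taxa)
    return sum(assigned_depth(t) for t in taxa) / len(taxa)


# Toy labels, not real results.
forward = ["k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
           "f__Lactobacillaceae; g__Lactobacillus; s__"]
reverse = ["k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
           "f__Lactobacillaceae; g__; s__"]
print(mean_depth(forward), mean_depth(reverse))
```

Other reference databases format their labels differently, so the parsing would need adjusting for them.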

ben · March 30, 2019, 2:56pm

Thanks for the help everyone, I'm finding out some other issues re: the run. Ben

Lock it up!

Jeongsu_Kim · April 15, 2019, 2:26pm

Hi,

I have a question regarding your comment.

Why are forward reads more variable than reverse reads of V4 domain?

Also, in general, the reverse reads have substantially lower quality than the forward ones. What can be the reasons for this issue?

Thank you in advance,
Sue

ben · April 15, 2019, 2:35pm

Not sure, but I think that @Nicholas_Bokulich may answer. My issues were strictly in the reverse reads during this run actually. I had high base variability on early base pairs. Ben

Nicholas_Bokulich · April 15, 2019, 2:45pm

Higher entropy, i.e., the sequences of different species differ more at the front end than at the back end of the V4. E.g., see this plot (source):

(the V4 primers used are 515f + 806r, so look at those positions on this plot)

Higher entropy is a good thing. It means that you are more likely to be able to differentiate those species based on that sequence. Lower entropy = less likely to differentiate species (more likely to get classification only to family level or coarser, and likely to get fewer ASVs/OTUs, yielding lower sensitivity for differentiating sample groups).
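
For readers unfamiliar with the term, entropy in plots like this is typically the Shannon entropy of the base composition at each alignment position, H = -sum(p * log2(p)) over the bases observed there. A minimal sketch, assuming equal-length aligned sequences and using made-up toy sequences rather than real 16S data:

```python
import math
from collections import Counter


def column_entropy(aligned_seqs):
    """Shannon entropy (bits) at each position of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts.values())
        # Sum of -p * log2(p) over the bases seen at this position.
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment: position 0 is conserved, position 1 splits evenly
# between two bases, position 2 differs in every sequence.
toy = ["AAA", "AAC", "ACG", "ACT"]
print(column_entropy(toy))
```

A conserved position scores zero, and the score rises as the bases at a position become more evenly mixed, which is why high-entropy stretches help separate taxa.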

Jeongsu_Kim · April 16, 2019, 3:54am

Thank you for your detailed explanations
