Concatenate R1 and R2 for reads that can't join

colinbrislawn · July 16, 2022, 2:22pm

EDIT: This is now tracked on GitHub as qiime2/q2-vsearch Issue #98, "Expose `fastq_join` command to concatenate (not merge) PE reads".

I've got some amplicon reads that do not, and should not, overlap.

These amplicons are continuous, but the region sequenced is not.
(Meaning it's unlike the multi-region Ion Torrent kit.)

Some programs support aligning non-overlapping paired-end reads without joining them.

VSEARCH supports --fastq_join, in which "sequences are not merged as with the fastq_mergepairs command, but simply joined with a gap."

|------------------------| Full amplicon
|---------->               R1 
              <----------| R2
|----------nnnn----------| Concatenated (no overlap)
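
As a rough sketch of what a join like the one in the diagram amounts to: my understanding of the VSEARCH manual is that the joined sequence is the forward read, a padding string, and the reverse complement of the reverse read. The `join_pair` helper and the toy reads below are made up for illustration, quality scores are ignored, and this is not VSEARCH's actual code.

```python
# Illustrative sketch only; not VSEARCH's implementation.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(r1: str, r2: str, pad: str = "NNNNNNNN") -> str:
    """Concatenate R1, a padding string, and the reverse complement of R2.

    No overlap is searched for, so this is a join, not a merge.
    The eight-N default is an assumption about VSEARCH's default padding.
    """
    return r1 + pad + reverse_complement(r2)


# Toy reads (hypothetical data); R2 is read from the opposite strand.
r1 = "ACGTACGTAC"
r2 = "TTGGCCAATT"
joined = join_pair(r1, r2, pad="NNNN")
print(joined)
```

The padding length says nothing about the true gap size; it only marks where the unsequenced stretch sits.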

Does QIIME 2 support discontiguous reads these days? Looks like it didn't in 2019:

What parts of the pipeline could be configured to support a read with NNNs in the middle?

Is this a use case for q2-sidle, or is that for separate amplicons instead of amplicons with gaps in the middle?

jwdebelius · July 16, 2022, 9:30pm

Hi @colinbrislawn,

I'm trying to work through how this works. So, is it a case where you're missing the ends of the reads and cannot overlap them, but theoretically, if your reads were long enough, you could? Or do you have, say, an amplicon from the 16S and ITS gene that you're trying to combine?

I think the first case could work with Sidle. The current tutorial actually uses forward and reverse reads from the same primer pair as one of the regions amplified. Whether or not they overlap is irrelevant to the processing.

There are a few caveats though, in my mind:

The quality of the alignment goes down based on the number of sequences in your database that are the same over a region. You might choose to dereplicate your database if you're working in a smaller subset or region.
Sidle's alignment parameters tend to be pretty stringent, and that can cause some weirdness and you may lose reads. I'm still working on an "accounting" function that would tell you how many you've lost, and I have so many things I'm trying to do right now.
You will likely want to build your table in "average" mode (default) where the read "depth" is provided based on the average sequencing depth over all regions, which is probably more appropriate if you're merging forward and reverse reads.

If you try it, please let us know how it works!

Best,
Justine

colinbrislawn · July 17, 2022, 1:50pm

Correct. Like a 16S V4 amplicon with 100 bp paired-end reads (50 bp gap).
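
The arithmetic behind that example, as a tiny sketch. The 250 bp amplicon length is an assumption inferred from the numbers above (two 100 bp reads plus a 50 bp gap), not a measured value, and `expected_gap` is a hypothetical helper.

```python
def expected_gap(amplicon_len: int, r1_len: int, r2_len: int) -> int:
    """Bases left unsequenced between R1 and R2 (a negative value means overlap)."""
    return amplicon_len - (r1_len + r2_len)


# Assumed amplicon length of 250 bp, with 100 bp paired-end reads.
gap = expected_gap(250, 100, 100)
print(gap)  # 50
```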

Thank you for telling me more about Sidle. I'll look into that!

Mehrbod_Estaki · July 18, 2022, 7:07am

As far as I know there isn't anything in the core plugins still. Outside of Sidle, exposing vsearch's option in the Q2 plugin or even DADA2's justConcatenate in mergePairs would be the easiest Q2 adaptation in a future release I would think.

Nicholas_Bokulich · July 18, 2022, 8:03am

Correct

This would be easy and probably less likely to cause issues downstream than some other options (it would break dada2 and break classify-sklearn but would work for OTU clustering and alignment). One hang-up is to specify semantic types accordingly to prevent passing this to dada2 etc.

This has been discussed a bit on the forum and off. It could lead to serious issues downstream with taxonomy classification, alignment/phylogeny building, and maybe elsewhere. So exposing it would be easy, but it is not advisable (unless a new semantic type is created for such an output, which would not be compatible with many downstream steps).

This is something that has been on my radar to look into for quite some time, so maybe later this year I can explore it, if others do not beat me to it.

Sounds like a promising secondary use case for q2-sidle!

Mehrbod_Estaki · July 19, 2022, 7:09am

100% @Nicholas_Bokulich !

This is how I saw it in my head as well

dnlrx · April 17, 2024, 5:05am

Hi there,

just wondering if there is now a way to join non-overlapping paired reads in q2?

All best,
Daniel

colinvwood · April 17, 2024, 4:49pm

Hello @dnlrx,

I believe this is still not possible, unfortunately.

fenny · August 7, 2024, 10:56am

@colinvwood : you might look at NG-Tax, which builds ASVs based on both the forward and reverse of a pair, even if not overlapping.

http://wurssb.gitlab.io/ngtax/docs/intro.html

SoilRotifer · August 7, 2024, 6:45pm

Hi @dnlrx,

You should be able to run vsearch --fastq_join external to QIIME 2. You can look up the command here.
Note this is not the same as fastq_mergepairs!
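
A minimal sketch of driving that from a script, assuming `vsearch` is on your PATH. The file names and the `build_join_command` / `run_join` helpers are hypothetical. The flags are the ones I understand the VSEARCH manual to document for joining (`--fastq_join`, `--reverse`, `--fastqout`, `--join_padgap`, `--join_padgapq`), so please check them against the manual for your version.

```python
import subprocess


def build_join_command(r1_fastq, r2_fastq, out_fastq,
                       pad="NNNNNNNN", pad_qual="IIIIIIII"):
    """Build a vsearch --fastq_join command line as a list of arguments.

    pad and pad_qual mirror what I believe are VSEARCH's defaults; if you
    change one, keep the other the same length (an assumption on my part).
    """
    return [
        "vsearch",
        "--fastq_join", r1_fastq,      # forward reads
        "--reverse", r2_fastq,         # reverse reads
        "--fastqout", out_fastq,       # joined reads, FASTQ
        "--join_padgap", pad,          # bases placed in the gap
        "--join_padgapq", pad_qual,    # quality symbols for the gap
    ]


def run_join(r1_fastq, r2_fastq, out_fastq):
    """Run the join; raises CalledProcessError if vsearch exits non-zero."""
    subprocess.run(build_join_command(r1_fastq, r2_fastq, out_fastq), check=True)


# Hypothetical file names; print the command instead of running it.
cmd = build_join_command("sample_R1.fastq", "sample_R2.fastq", "sample_joined.fastq")
print(" ".join(cmd))
```

The joined FASTQ would still need to be handled outside the standard QIIME 2 denoising and classification steps, per the caveats earlier in this thread.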

But @fenny 's suggestion looks great!