I'm trying to work through how this works. So, is it a case where you're missing the ends of the reads and cannot overlap them, but theoretically, if your sequencing was long enough, you could? Or is it more like you have an amplicon from 16S and one from ITS that you're trying to combine?
I think the first case could work with Sidle. The current tutorial actually uses forward and reverse reads from the same primer pair as one of the regions amplified. Whether or not they overlap is irrelevant to the processing.
There are a few caveats, though, in my mind:
The quality of the alignment goes down as the number of sequences in your database that are identical over a region goes up. You might choose to dereplicate your database if you're working in a smaller subset or region.
Sidle's alignment parameters tend to be pretty stringent, which can cause some weirdness, and you may lose reads. I'm still working on an "accounting" function that would tell you how many you've lost, but I have so many things I'm trying to do right now.
You will likely want to build your table in "average" mode (default) where the read "depth" is provided based on the average sequencing depth over all regions, which is probably more appropriate if you're merging forward and reverse reads.
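To make the first caveat above a bit more concrete, here is a rough sketch of what "the same over a region" means in practice. This is plain Python, not Sidle's code; the function name `dereplicate_region` and the toy sequences are made up for illustration, and it assumes the reference sequences have already been trimmed to the amplified region.

```python
from collections import defaultdict


def dereplicate_region(ref_seqs):
    """Group reference IDs that share an identical sequence over one region.

    ref_seqs: dict mapping reference ID -> sequence already trimmed to the
    region of interest (hypothetical input format, not a Sidle artifact).
    Returns a dict mapping each unique sequence -> list of IDs carrying it.
    """
    groups = defaultdict(list)
    for seq_id, seq in ref_seqs.items():
        # Uppercase so soft-masked (lowercase) bases do not split a group.
        groups[seq.upper()].append(seq_id)
    return dict(groups)


# Toy database: ref1 and ref2 cannot be told apart over this region.
region = {
    "ref1": "ACGTACGT",
    "ref2": "ACGTACGT",
    "ref3": "ACGTTCGT",
}

groups = dereplicate_region(region)

# Sequences shared by more than one reference are the ambiguous ones.
ambiguous = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
```

The more entries end up in `ambiguous`, the less a read from that region can tell you about which reference it came from, which is the motivation for dereplicating first.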
If you try it, please let us know how it works!
As far as I know, there still isn't anything in the core plugins. Outside of Sidle, exposing vsearch's option in the Q2 plugin, or even DADA2's justConcatenate in mergePairs, would be the easiest Q2 adaptation in a future release, I would think.
This would be easy and probably less likely to cause issues downstream than some other options (it would break dada2 and classify-sklearn, but would work for OTU clustering and alignment). One hang-up is that the semantic types would need to be specified accordingly, to prevent passing this output to dada2, etc.
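For anyone unsure what "just concatenating" means here, a minimal sketch of the idea follows. This is not DADA2's or vsearch's implementation; `concatenate_pair` and `reverse_complement` are hypothetical helpers. The 10-N spacer default mirrors what DADA2 documents for justConcatenate, and the assumption is that the reverse read is reverse-complemented before being appended.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatenate_pair(forward, reverse, spacer_len=10):
    """Join a non-overlapping read pair with an N spacer.

    The reverse read is reverse-complemented so both halves are on the
    same strand. The spacer length is a fixed placeholder (assumption),
    not an estimate of the true gap between the reads.
    """
    return forward + "N" * spacer_len + reverse_complement(reverse)


# Toy read pair, far shorter than real reads.
joined = concatenate_pair("ACGTAC", "GGATCA")
```

Because the run of Ns is a placeholder rather than real sequence, the joined read does not match the true amplicon across the gap, so any tool that treats it as one contiguous sequence needs to be checked carefully.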
This has been discussed a bit on the forum and off. It could lead to serious issues downstream with taxonomy classification, alignment/phylogeny building, and maybe elsewhere. So exposing it would be easy, but it is not advisable (unless a new semantic type is created for such an output, which would not be compatible with many downstream steps).
This is something that has been on my radar to look into for quite some time, so maybe later this year I can explore it, if others do not beat me to it.
Sounds like a promising secondary use case for q2-sidle!