BUG: q2-itsxpress's dependency BBMap cannot handle PacBio CCS reads

I am using q2-itsxpress to process my full-length fungal PacBio CCS reads.

The error appears at the BBMap step, which ITSxpress reports as a BBMerge read-merging failure:

qiime itsxpress trim-single --i-per-sample-sequences rawdata/demux.qza --p-region ALL --o-trimmed test.qza --verbose
ERROR:root:could not perform read merging with BBmerge. Error from BBmerge was:
java -ea -Xms300m -cp /home/test/miniconda3/envs/qiime2-2020.8/opt/bbmap-38.69-0/current/ jgi.ReformatReads in=/tmp/qiime2-archive-dqnjht7w/d5b51a90-b232-4a1f-a88d-4a37e382e0e3/data/DJ0h1.fastq.gz_10_L001_R1_001.fastq.gz out=/tmp/itsxpress_d0tb7kep/seq_r1.fq.gz out2=/tmp/itsxpress_d0tb7kep/seq_r2.fq.gz
Executing jgi.ReformatReads [in=/tmp/qiime2-archive-dqnjht7w/d5b51a90-b232-4a1f-a88d-4a37e382e0e3/data/DJ0h1.fastq.gz_10_L001_R1_001.fastq.gz, out=/tmp/itsxpress_d0tb7kep/seq_r1.fq.gz, out2=/tmp/itsxpress_d0tb7kep/seq_r2.fq.gz]

Set INTERLEAVED to true
Input is being processed as paired
Changed from ASCII-33 to ASCII-64 on input ]: 93 -> 62

The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5.
Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'.
Problematic read number 0:

@m54166_190711_083138/4260081/ccs
AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTACTGTGATTTACTACTACACTGCGTGAGCGGAACGAAAACAACAACACCTAAAATGTGGAATATAGCATATAGTCGACAAGAGAAATCTACGAAAAACAAACAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAGCGCAGCGAAATGCGATACCTAGTGTGAATTGCAGCCATCGTGAATCATCGAGTTCTTGAACGCACATTGCGCCCCTCGGCATTCCGGGGGGCATGCCTGTTTGAGCGTCGTTTCCATCTTGCGCGTGCGCAGAGTTGGGGGAGCGGAGCGGACGACGTGTAAAGAGCGTCGGAGCTGCGACTCGCCTGAAAGGGAGCGAAGCTGGCCGAGCGAACTAGACTTTTTTTCAGGGACGCTTGGCGGCCGAGAGCGAGTGTTGCGAGACAACAAAAAGCTCGACCTCAAATCAGGTAGGAATACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACAGGGATTGCCTCAGTAGCGGCGAGTGAAGCGGCAAGAGCTCAGATTTGAAATCGTGCTTTGCGGCACGAGTTGTAGATTGCAGGTTGGAGTCTGTGTGGAAGGCGGTGTCCAAGTCCCTTGGAACAGGGCGCCCAGGAGGGTGAGAGCCCCGTGGGATGCCGGCGGAAGCAGTGAGGCCCTTCTGACGAGTCGAGTTGTTTGGGAATGCAGCTCCAAGCGGGTGGTAAATTCCATCTAAGGCTAAATACTGGCGAGAGACCGATAGCGAACAAGTACAGTGATGGAAAGATGAAAAGCACTTTGAAAAGAGAGTGAAACAGCACGTGAAATTGTTGAAAGGGAAGGGTATGCGATTAGCGGCCAGCAGGAGGTGCCTTCTCGTGAAAAGGCCGTGCACCGTCTTCGGACACCGTGCGCGGAGATGGCGAGGGGGCGCCTGAGGTCTGCGAACTCGAGGTTGCTGGCGTAATGATTGCATACCA
+
@@@@@@@@@@@@@@@@@}bstq~_iH~@~IYVs~P~P~Hqtm~aconp]t|qGin\mjmjXdYj^Ion\KMbsN~Ro^~AmxHg{FnglJR}l}@Rrb~@]PE_Taminsq]qqvjrda|U[[m|~@stnst]~r@Z~b@f]nv@k{x@q~Sm~IS~OB_b^q~a~axOZlouujtX\tqs~AkslXessrutt~JquuutXs~qqqfsm~b~LSmookGjitsalx@ttpZo\qm~PsiCT~Kdstqrq~Xgar~~~MZs~NCT~b~Y~~~~~@sUdm~b[h~~@jsruZHYn~~CtRlmHnuusjstuuuqrrm|Kbvwe@rqu~YHgu~RU]VthX@^pr~Ea@hXXlZ|ZVhI]tuusrsuqx@cZ~~T~eIZndWuVprr~a~busqdu~RmqqqLm|~~~~~@kUx~@]ktq~_~ai~Z~aEkirquusphs~Xtptsqth~@hiu~~ChmJg]t~XWog~@tvn~Jmqn_rZt^~~Wlnk^~V~c@ramfgvf~[~su~u~~~~@s~~JcF~atp~~Wu~u~Css]qsng[vKLebirv~aov~Np{S]pf[budsXUv@]biakx~@p[~NWTkt^p~\Iiuqsraaun~a~~tnt[ntqtm~O~akWs~@leo~a~qu~tG~@s~Hlr~~Htu~~Fk~Mr~~Wnpr]u]d~~@mq~~Qtqr~\~^vt@~Vquqpttq~@~wK~Vqquvs^f[dVQgVn@Ump@VLnHPHktrs~b~qs~{HkwZZ~~J~be@stq^~~rq~~Vssam~Gt]W]eorq\tctobtu~Rl~OppjXlqruup~b~~Tququ~~~JqrkU~~Tu~~~Aqnnhrtt~~@lqrurr]ltjoC|FT@@@[q@cj@a@j~OYtUguZ[r@otu~a~Tkq]j~Isc@MPu_zYjdtalii|~A~_~QrbGs^~uuq~b^d@ee^zTIgstnr~MsZts~urZx~~~Duu~oq[pU\qov[ts]YgrjpxYpGtnLY@u\s~Nquu~duusrIg[q

Offset=64
java.lang.Exception: Aborting.
at shared.KillSwitch.kill(KillSwitch.java:108)
at stream.FASTQ.quadToRead_slow(FASTQ.java:754)
at stream.FASTQ.toReadList(FASTQ.java:625)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)
Traceback (most recent call last):
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/itsxpress/main.py", line 586, in split_interleaved
p1.check_returncode()
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['reformat.sh', 'in=/tmp/qiime2-archive-dqnjht7w/d5b51a90-b232-4a1f-a88d-4a37e382e0e3/data/DJ0h1.fastq.gz_10_L001_R1_001.fastq.gz', 'out=/tmp/itsxpress_d0tb7kep/seq_r1.fq.gz', 'out2=/tmp/itsxpress_d0tb7kep/seq_r2.fq.gz']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
results = action(**arguments)
File "", line 2, in trim_single
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_itsxpress/_itsxpress.py", line 123, in trim_single
cluster_id=cluster_id)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_itsxpress/_itsxpress.py", line 203, in main
threads=threads)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_itsxpress/_itsxpress.py", line 74, in _set_fastqs_and_check
reversed_primers=reversed_primers)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/itsxpress/main.py", line 603, in __init__
self.split_interleaved(reversed_primers=reversed_primers)
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/itsxpress/main.py", line 596, in split_interleaved
raise e
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/itsxpress/main.py", line 586, in split_interleaved
p1.check_returncode()
File "/home/test/miniconda3/envs/qiime2-2020.8/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['reformat.sh', 'in=/tmp/qiime2-archive-dqnjht7w/d5b51a90-b232-4a1f-a88d-4a37e382e0e3/data/DJ0h1.fastq.gz_10_L001_R1_001.fastq.gz', 'out=/tmp/itsxpress_d0tb7kep/seq_r1.fq.gz', 'out2=/tmp/itsxpress_d0tb7kep/seq_r2.fq.gz']' returned non-zero exit status 1.

Plugin error from itsxpress:

Command '['reformat.sh', 'in=/tmp/qiime2-archive-dqnjht7w/d5b51a90-b232-4a1f-a88d-4a37e382e0e3/data/DJ0h1.fastq.gz_10_L001_R1_001.fastq.gz', 'out=/tmp/itsxpress_d0tb7kep/seq_r1.fq.gz', 'out2=/tmp/itsxpress_d0tb7kep/seq_r2.fq.gz']' returned non-zero exit status 1.

See above for debug info.

I guess BBTools cannot handle PacBio Phred+33 quality scores right now.
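The log above suggests re-running with the flag 'qin=33'. As a rough sketch of what that could look like when calling reformat.sh by hand, outside the plugin, the snippet below builds the command with the offset pinned. The file names and both helper functions are hypothetical, and whether ITSxpress itself can pass this flag through is not something this thread confirms.

```python
import shutil
import subprocess


def build_reformat_command(fastq_in, fastq_out, offset=33):
    """Return a reformat.sh argument list that pins the input quality offset."""
    return ["reformat.sh", f"in={fastq_in}", f"out={fastq_out}", f"qin={offset}"]


def run_reformat(fastq_in, fastq_out, offset=33):
    """Run reformat.sh if it is on PATH; return None when BBTools is not installed."""
    if shutil.which("reformat.sh") is None:
        return None
    cmd = build_reformat_command(fastq_in, fastq_out, offset)
    return subprocess.run(cmd, capture_output=True, text=True)


# Hypothetical file names, for illustration only.
cmd = build_reformat_command("ccs_reads.fastq.gz", "ccs_checked.fastq.gz")
print(" ".join(cmd))
```

If the file parses cleanly with the offset fixed at 33, that would point at offset autodetection rather than corrupt reads.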

Hi @sixvable,
I am just cc:ing @Adam_Rivers to see if he can help debug.

It looks like there is an issue with the data: either the Phred scores were altered, maybe by another program after sequencing, or there is a file encoding or conversion error. BBMerge autodetects the offset (33 or 64), but your first read (@m54166_190711_083138/4260081/ccs) has Phred scores as high as 62 (~) in ASCII-64 space, which seems off. This can sometimes happen if a program uses both reads and recalculates the confidence based on that. The solution would be to use the unprocessed reads from the run.
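To make the offset arithmetic concrete, here is a minimal sketch that decodes a few quality characters under both offsets. The function name and the three-character sample are made up for illustration; only the character-to-score arithmetic is standard.

```python
def decode_quality(quality_string, offset):
    """Convert FASTQ quality characters to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


# '@', ']' and '~' all occur in the quality line of the read shown in the log.
sample = "@]~"
for offset in (33, 64):
    print(offset, decode_quality(sample, offset))
# 33 [31, 60, 93]
# 64 [0, 29, 62]
```

The same '~' character is Q93 under offset 33 and Q62 under offset 64, which is where the 62 mentioned above comes from.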

PacBio CCS reads are encoded with Phred+33 but have higher Q-values than typical Illumina reads. The Q-value of PacBio CCS reads commonly reaches 93 (ASCII character ~), which is still valid in Sanger-format Phred+33. I guess that BBMerge's offset autodetection treats these reads as Illumina 1.5 format Phred+64.
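The guess above can be illustrated with a simplified, range-based offset heuristic. This is not BBTools' actual detection code; the function name and the thresholds (59 as the lowest Phred+64 character, 41 as an assumed Illumina-style score ceiling) are assumptions chosen only to show how Q93 data can be mistaken for Phred+64.

```python
def naive_guess_offset(quality_strings, max_phred33_score=41):
    """Guess the ASCII offset (33 or 64) from the range of quality characters.

    Hypothetical heuristic for illustration; real tools may use different rules.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters to inspect")
    if min(codes) < 59:
        # Characters this low would be below Q-5 under offset 64, so it must be 33.
        return 33
    if max(codes) > 33 + max_phred33_score:
        # Scores above the assumed ceiling look "impossible" for Phred+33.
        return 64
    return 33


# A Phred+33 string with scores between Q31 ('@') and Q93 ('~') is misclassified.
print(naive_guess_offset(["@@@~~~"]))
# A string containing low characters such as '#' (Q2) is classified as Phred+33.
print(naive_guess_offset(["#5I"]))
```

Under these assumptions, a read whose quality characters all fall between '@' and '~' gives the heuristic no evidence for offset 33, so the high scores tip it toward 64.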

Hmm. I've never run PacBio reads through ITSxpress because I've never had any. It looks like it is failing the read pre-check prior to running.

Can you DM me a link to the data? I'll use it to add PacBio support. We are almost done with version 2, which simplifies installation, fixes some things, and adds features.

Dr. @Adam_Rivers

I have sent the data to your USDA email. Please check.

Sixvable