Importing Error: Quality Score doesn't match

kayleecastle · August 23, 2019, 2:06pm

hiya!!

Currently I'm trying to import sequence data into Qiime2 2019.4 via Virtual Machine. I've already processed the fastq file by removing barcodes and linkers. When I go to import the sequences into Qiime2, it gives me an error:

There was a problem importing /home/qiime2/Documents/pacbio/qiime2/emp-single-end-sequences/:

/home/qiime2/Documents/pacbio/qiime2/emp-single-end-sequences/sequences.fastq.gz is not a(n) FastqGzFormat file:
Quality score length doesn't match sequence length for record beginning on line 5

I ran the normal import command with type as EMPSingleEndSequences with no option to run "--verbose" flag.

qiime tools import --input-path '/home/qiime2/Documents/pacbio/qiime2/emp-single-end-sequences/' --type EMPSingleEndSequences --output-path '/home/qiime2/Documents/pacbio/Qiimedocs/empsingleend.qza'

Any and all help is appreciated!!
Thank you!!
--Kaylee

ebolyen · August 23, 2019, 4:13pm

Hi @kayleecastle!

Welcome to the forum!

Based on the error, it looks like something is up with the second record in your sequences.fastq.gz file.

Since we are lucky and it's at the very top of the file, would you be able to post the result of running this command?

zcat ~/Documents/pacbio/qiime2/emp-single-end-sequences/sequences.fastq.gz | head

That should give us the first 10 lines of the file.

Thanks!

kayleecastle · August 23, 2019, 5:21pm

Hope this helps!! We did have a small issue with processing but got it to work with a minor change..

ebolyen · August 23, 2019, 8:06pm

Thanks for the screenshot @kayleecastle!

So I have a few observations and questions:

1. The first record is actually the problematic one; its quality sequence is indeed shorter than the read. I don't know how this happened.
2. This is from a PacBio instrument; could you describe what kind?
3. It looks like the orientation (and perhaps even the location) is mixed, but without knowing the instrument, I am uncertain how to interpret the fastq headers.
4. What kind of data is this? Is it still amplicon, or are we working with shotgun/other?

Oh yeah? You can't just leave us hanging with that! What kind of steps did you end up needing to take? It might explain problem 1 from above.
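For anyone who wants to find every record like this rather than only the first one, here is a minimal Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines). The function names are made up for illustration and are not part of QIIME 2. The line numbers it reports are 1-based and point at each record's header line, so they may not match the numbering in the import error message.

```python
import gzip
import io


def find_length_mismatches(handle):
    """Yield (header_line_number, seq_len, qual_len) for every record whose
    quality string and sequence differ in length.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    line_number = 1  # 1-based line number of the current record's header
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            yield line_number, len(seq), len(qual)
        line_number += 4


def check_fastq_gz(path):
    """Open a gzipped FASTQ file as text and list its mismatched records."""
    with gzip.open(path, "rt") as handle:
        return list(find_length_mismatches(handle))


if __name__ == "__main__":
    # Tiny in-memory example: the first record has 4 bases but only 3 quality characters.
    demo = io.StringIO("@read1\nACGT\n+\nIII\n@read2\nACGT\n+\nIIII\n")
    for line_number, seq_len, qual_len in find_length_mismatches(demo):
        print(f"record at line {line_number}: {seq_len} bases, {qual_len} quality scores")
```

Calling check_fastq_gz on the sequences.fastq.gz file from the import directory should return an empty list once the file is well-formed.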

ebolyen · August 23, 2019, 8:50pm

Looking into this more, I don't think we have the capacity to deal with PacBio CCS data yet in QIIME 2, although it is on our todo list.

This may also necessitate some changes to our upstream types as it seems that the notion of paired or single end is fairly irrelevant to CCS data.

(cc @benjjneb who knows more about this data)

In the meantime, using DADA2 directly, as discussed in this paper, is probably your best bet!

I'm also very interested in what this process generally looks like from your (@kayleecastle) perspective, as I haven't had a chance to work with this kind of data before. What transformations did you have to do to get a fastq file? It appears that is not the native format for CCS data.

benjjneb · August 24, 2019, 1:58am

The raw fastqs that come out of the ccs/lima applications from amplicon data seem to typically be in mixed orientation, contain the primers, and have quality scores that range up to 93.

We will add a dedicated denoise-pacbio dada2 workflow at some point, but I might wait until R package version 1.12 propagates to bioconda/qiime2, as there were some pacbio fixes between 1.10 and 1.12 that make implementing such a workflow significantly easier.
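As a quick way to see whether a given fastq has the high quality scores mentioned above, here is a rough Python sketch that reports the largest score in a file. It assumes 4-line records and the Phred+33 (Sanger) encoding, in which a score of 93 is written as the character '~'. The function names are hypothetical and not from dada2 or QIIME 2.

```python
import gzip
import io


def max_phred(handle, offset=33):
    """Return the highest quality score in a 4-line-per-record FASTQ stream,
    or None if there are no quality characters.

    offset=33 assumes the Sanger / Phred+33 encoding.
    """
    best = None
    for index, line in enumerate(handle):
        if index % 4 == 3:  # every fourth line holds the quality string
            qual = line.rstrip("\r\n")
            if qual:
                score = max(ord(char) for char in qual) - offset
                best = score if best is None else max(best, score)
    return best


def max_phred_gz(path, offset=33):
    """Same check for a gzipped FASTQ file on disk."""
    with gzip.open(path, "rt") as handle:
        return max_phred(handle, offset)


if __name__ == "__main__":
    # In-memory example: '~' decodes to 93, 'I' to 40, '!' to 0.
    demo = io.StringIO("@read1\nACGT\n+\nI~I!\n")
    print(max_phred(demo))
```

A maximum well above the low 40s is a hint that the reads came from a CCS-style pipeline rather than a typical MiSeq run.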

kayleecastle · August 26, 2019, 1:35pm

Thank you for taking the time to respond! It is PacBio Sequel and is amplicon data. The company that does our sequencing provides a Fastq Processor, which we have used with 3 other sets of MiSeq data. Usually when processing the data, we remove linker primers, barcodes and reverse primers. This time we just left the reverse primers on, since it was a new type of data set and was giving errors with our normal parameters. We were going to address this further in Qiime2, but obviously no luck lol

kayleecastle · August 26, 2019, 1:40pm

Thank you for helping me with this issue!! QIIME2 team is awesome!

Anyways, I used a Fastq Processor that the sequencing company provides (http://www.mrdnafreesoftware.com/). The processor takes the Fastq file, removes linker primers, barcodes and reverse primers, and zips it. Unfortunately I have only worked with MiSeq data prior to this, so I'm pretty unfamiliar with PacBio/CCS data.

DhebbieF · October 7, 2021, 12:21am

Hi, I would like to know whether QIIME2 now supports importing PacBio CCS data?
Apologies for bumping in like this.

Cheers, Deborah

Keegan-Evans · October 7, 2021, 12:25am

@DhebbieF

Not yet, though it is on our list of things to do, especially with the growing use of PacBio sequencing!

DhebbieF · October 7, 2021, 5:49am

OK..thanks for the prompt response.