Visualization for Paired-end Demultiplexed Fastq Imports

We have imported our data (split over two lanes) using the Casava 1.8 paired-end demultiplexed fastq format, and performed the dada2 command (denoise-paired) for each lane. Next, we merged our denoised data using the FMT tutorial; however, we encountered the following error while trying to generate a FeatureTable[Frequency] artifact:


This response led us to question the congruency of our sample-metadata.tsv and our SampleData[PairedEndSequencesWithQuality] imported data.

We then attempted to generate a visualization of our denoised data, but encountered the following error from the demux summarize command:


Is there a way to create a visualization of demultiplexed SampleData[PairedEndSequencesWithQuality]? We suspect that our issue with generating the FeatureTable[Frequency] artifact lies in a mismatch between the #SampleIDs of our imported data and those in our sample-metadata.tsv.

Good interpretation of our nasty error message; that is very likely the issue!

Yup, it just looks to me like you are passing the wrong artifact as input to demux summarize.

I'm guessing rep-seqs-Lane1.qza is an artifact of type FeatureData[Sequence] (you could use qiime metadata tabulate to look at that if you want, but it won't have sample ids, only feature ids).

You'll want to provide the artifact(s) you passed to qiime dada2 denoise-single/paired to qiime demux summarize (in fact you probably already made these to pick the trim/trunc parameters for the denoise step). These summaries should have the sample ids present on the first tab at the bottom.

I hope that helps!

Created an issue for the gross error message.

Hi @ebolyen,

Thanks for your help! When I re-ran qiime demux summarize with the correct artifact, I got the following error:

[image: screenshot of the error message]

This artifact was generated while using qiime2-2017.5 (currently using 2017.7), and when imported, contained 68 samples. Do you have any suggestions for working around this error?

Hey @Lexie_Keding,

That’s weird. Would you be able to post the log-file (or re-run with --verbose)? I’d have to look at the traceback to be sure what is happening.

Thanks!

Hi @ebolyen,

Here is the log-file from the demux summarize:

[image: screenshot of the log-file]

Thanks again for your help!!

Hey @Lexie_Keding,

Thanks for the traceback!

Based on that, it would seem that the summarize visualizer is trying to plot a distribution of a single element, which seaborn (one of our dependencies) fails to do because it's a nonsense thing to try. I'm not 100% sure that this is the problem, but it's a place to start.

I wonder if something has gone very wrong here. We should have detected this and failed earlier, but maybe something isn't right.

Would you be able to run:

qiime tools export sample_Lane2.qza --output-dir sample_Lane2_debug

and then

ls sample_Lane2_debug
cat sample_Lane2_debug/MANIFEST

That will tell us what ended up inside of your artifact and what samples it thinks exist (the contents of MANIFEST).

That should get us enough info to figure out where to start looking for the bug.
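
If it helps, here is a rough Python sketch for counting how many distinct sample IDs a MANIFEST lists. It assumes the file is comma-separated with a sample-id,filename,direction header row and that any lines starting with '#' are comments; the function name and the example rows are made up for illustration and are not from the real artifact.

```python
import csv
from collections import Counter


def sample_counts(manifest_text):
    """Return a Counter mapping each sample ID to its number of MANIFEST rows."""
    # Skip blank lines and '#' comment lines (assumed comment convention).
    lines = [ln for ln in manifest_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    # Assumes the first remaining line is a header containing 'sample-id'.
    reader = csv.DictReader(lines)
    return Counter(row["sample-id"] for row in reader)


# Hypothetical MANIFEST contents in which every row has the same sample ID.
example = """sample-id,filename,direction
raw,a_R1.fastq.gz,forward
raw,a_R2.fastq.gz,reverse
raw,b_R1.fastq.gz,forward
"""

counts = sample_counts(example)
print(len(counts), "distinct sample ID(s):", dict(counts))
```

To try it on a real export, read sample_Lane2_debug/MANIFEST into a string and pass that in. A count far below the number of samples you expect would point at the import step.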

Thanks so much!

Hi @ebolyen,

Below are the contents of the MANIFEST:

I appreciate all of your help on this issue!

Hey @Lexie_Keding,

That explains exactly what is wrong! It looks like your artifact is of the opinion that there is only one sample: raw (all rows in the first column of that file have the same ID).

I’ve created an issue on q2-demux to handle a single sample. But that is a little beside the point for you, as you have 68 samples. I suspect you ran into the underscore bug from a couple of releases ago (you mentioned importing on 2017.5). Do your sample-ids have underscores? If so, try re-importing with the current release; things should work as expected.

If that isn’t the case, do you remember how you imported the data? If your file is small enough you could look at the provenance in view.qiime2.org.

Hey @ebolyen,

Our sample-ids do contain underscores, so I re-imported the samples with 2017.7, but still encountered the same error (below) when running the demux summarize command with the artifacts.

[image: screenshot of the error message]

I then re-ran with --verbose and ran the debug commands, which resulted in the exact same traceback and MANIFEST, respectively.

I'm working with @Lexie_Keding on this project.

There were a couple of errors in our setup here, which I think we have fixed (at least as far as this step is concerned!). The first was that we were pointing to a parent folder of the correct input folder. The second was more subtle: the file names were not structured correctly. We have, e.g.,

B6-Chow-24-L2_L002_R1_001.fastq.gz

This has only four tokens when split on '_'. According to the documentation:

In this format, there are two fastq.gz files for each sample in the study, and the file name includes the sample identifier. The forward and reverse read file names for a single sample might look like L2S357_15_L001_R1_001.fastq.gz and L2S357_15_L001_R2_001.fastq.gz, respectively. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.

As you can see, we were missing the barcode id token, which I think was (one of the things) creating problems.
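
For anyone who wants to check their file names before importing, a minimal Python sketch of the five-field layout described in the quoted documentation follows. The regular expression and the parse_name helper are illustrative assumptions of mine, not the pattern QIIME 2 uses internally, so treat a match here as a sanity check only.

```python
import re

# Five underscore-separated fields, per the documentation quoted above:
# sample id, barcode (sequence or identifier), lane, read, set number.
# Illustrative pattern only; the sample id itself may contain underscores.
CASAVA_LIKE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<set>\d+)\.fastq\.gz$"
)


def parse_name(filename):
    """Return the parsed fields as a dict, or None if the name does not fit."""
    match = CASAVA_LIKE.match(filename)
    return match.groupdict() if match else None


for name in ("L2S357_15_L001_R1_001.fastq.gz",      # example from the docs
             "B6-Chow-24-L2_L002_R1_001.fastq.gz"):  # our renamed file
    print(name, "->", parse_name(name))
```

The documentation's example parses into all five fields, while our renamed file returns None because there is no barcode field between the sample id and the lane.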

We tried running the import from a clean directory with the original filenames, and both the import and demux summarize appear to now work (on a subset of the data).

[Backstory: our facility mainly processes RNA-Seq data, and we have in-house protocols for sample and file naming, part of which we had used in moving the files from the Illumina demultiplex output to the analysis folder. It looks like the import tool is designed to work directly with the file names generated by Illumina bcl2fastq2 (which makes absolute sense, of course).]

@ebolyen Thanks for the assistance: you got us moving in the right direction.

The MANIFEST posted above makes sense now: QIIME 2 interpreted part of the parent dir (fastqs/TH-HF-18850-L2) as the barcode ID, leaving raw as the sample id because it looked like the sample segment.

Would you be able to tell me the exact command you used to import? I think we can probably keep this from happening to other people, but I can't really figure out how you would have ended up with a MANIFEST referencing a directory, as that file is generated either from a directory it searches or from a fastq-manifest, which outlines the sample ids in the first place.

That is excellent to hear!

To follow up:

It looks like the single-sample issue in the visualization was fixed in an upstream package. Additionally, 2017.9 now has some better validation to prevent the import issues you guys ran into!

Hi @ebolyen,

Thanks for the update and for all of your help over the past few months!!

