Very few sequences, overlap issues & "Cannot describe a DataFrame without columns" error

Hello, I'm new to QIIME 2 as well as microbiome bioinformatics. I want to make sure I'm importing a sample properly and figure out why, when I do import, the sample contains few sequences and the reads can't be joined.

When I extract and examine my fastq.gz files, the read sequence matches the Casava 1.8 Format exactly, but the filenames I was given (possibly renamed to retain anonymity) do not match the proper format for easy importing. I tried renaming them according to their lane number, but then I noticed there were multiple lanes in the sample. It is possibly worth noting that I was given only one sample (2 fastq.gz files: R1 and R2) to work with.

When I import using a manifest, it seems like a lot of sequences are missing and there isn't enough overlap to join pairs.
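One way to check whether sequences are really going missing at import is to count the reads in the raw files directly and compare that with the demux summary. This is only a minimal sketch: it assumes standard four-line FASTQ records, and the filenames are hypothetical placeholders for the R1 and R2 files.

```shell
# Count reads in a gzipped FASTQ file, assuming four lines per record.
# gzip -dc is used rather than zcat, which on macOS expects .Z files.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Print the shortest and longest read length in a gzipped FASTQ file.
read_length_range() {
    gzip -dc "$1" | awk 'NR % 4 == 2 {
        n = length($0)
        if (min == "" || n < min) min = n
        if (n > max) max = n
    } END { print min, max }'
}

# Hypothetical filenames; substitute your own R1/R2 files.
for f in sample_R1.fastq.gz sample_R2.fastq.gz; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads, length range $(read_length_range "$f")"
    fi
done
```

If the two files report the same read count and that count matches the imported artifact, nothing was lost at import, and the read lengths give a first hint about how much overlap to expect.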

This is the command I run which completes without issue:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

This is the manifest file: pe-33-manifest.tsv (131 Bytes)
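For anyone reading without the attachment: a V2 paired-end manifest is a tab-separated file with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath. The sketch below writes a one-sample manifest in that layout. The sample ID, filenames and output name are hypothetical and are not necessarily what the attached file contains.

```shell
# Write a one-sample paired-end manifest in the V2 tab-separated layout.
# $1 = sample ID, $2 = forward reads, $3 = reverse reads, $4 = output file
write_manifest() {
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$4"
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$4"
}

# Hypothetical sample ID and filenames; the paths must be absolute.
write_manifest sample-1 "$PWD/sample_R1.fastq.gz" "$PWD/sample_R2.fastq.gz" example-manifest.tsv
cat example-manifest.tsv
```

With a manifest like this the original filenames do not need to follow the Casava naming scheme, since the sample ID comes from the first column.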

And here is the .qza and .qzv:
paired-end-demux.qza (2.9 MB)
demux.qzv (269.1 KB)

I then run:
qiime vsearch join-pairs \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-joined-sequences demux-joined.qza

demux-joined.qza (10.6 KB)

and then:
qiime demux summarize \
  --i-data demux-joined.qza \
  --o-visualization demux-joined.qzv

which outputs the following:

Plugin error from demux:

Cannot describe a DataFrame without columns

Debug info has been saved to /var/folders/5l/48xfw0c53419vl06526g32kr0000gs/T/qiime2-q2cli-err-9oa7xejf.log
(qiime2-2019.10) ~/Desktop/fastq $ qiime demux summarize --i-data demux-joined.qza --o-visualization demux-joined.qzv --verbose
Traceback (most recent call last):
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/", line 328, in call
results = action(**arguments)
File "</opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/>", line 2, in summarize
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/", line 240, in bound_callable
output_types, provenance)
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/", line 445, in callable_executor
ret_val = self._callable(output_dir=temp_dir, **view_args)
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_demux/_summarize/", line 165, in summarize
forward_stats = _compute_stats_of_df(forward_scores)
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_demux/_summarize/", line 92, in _compute_stats_of_df
percentiles=[0.02, 0.09, 0.25, 0.5, 0.75, 0.91, 0.98])
File "/opt/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/", line 10178, in describe
raise ValueError("Cannot describe a DataFrame without columns")
ValueError: Cannot describe a DataFrame without columns

Plugin error from demux:

Cannot describe a DataFrame without columns

See above for debug info.

I'm still new to all of this, but it looks like there just isn't enough overlap, as seen in the demux summary visualization? Or is there something else I'm doing wrong? Can it not join pairs for a single sample?


Additional information
Mac / Mojave / 16GB RAM / 2.6 GHz Intel Core i7 / 500 GB HDD

Hi @glados,

Welcome to the QIIME 2 forum!

I think your problem has to do with your read length. The error is saying, in a very roundabout way, that there is nothing to summarize in the new file because none of your reads joined.

My guess is that you don't have enough of an overlap to be able to join your reads (this will depend on your primer pair, among other things). If this is the case, my suggestion is to continue processing with just the forward read.
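As a rough check, the expected overlap is the forward read length plus the reverse read length minus the amplicon length. The sketch below does that arithmetic. The read and amplicon lengths are hypothetical, and the minimum overlap of 10 bases is an assumption about the joiner's default, so check the help text of the version you have installed.

```shell
# Expected overlap in bases: forward_len + reverse_len - amplicon_len.
# A negative result means the reads leave a gap and cannot be joined.
expected_overlap() {
    echo $(( $1 + $2 - $3 ))
}

# Succeeds if the expected overlap meets the minimum ($4).
# The default of 10 is an assumption, not a confirmed setting.
can_join() {
    overlap=$(expected_overlap "$1" "$2" "$3")
    [ "$overlap" -ge "${4:-10}" ]
}

# Hypothetical numbers: 2 x 150 bp reads over a 460 bp amplicon.
echo "overlap: $(expected_overlap 150 150 460)"
if can_join 150 150 460; then
    echo "reads should overlap enough to join"
else
    echo "reads are too short to join"
fi
```

Quality trimming shortens the reads further, so the real overlap is usually smaller than this estimate.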



Thanks for the quick reply and reassurance on the read length. I will pass this information along.
