Commas in SampleID column

Chris_Hemmerich · November 7, 2017, 4:51pm

Hi,

I'd like to report a bug with either Keemei or demux. I have a metadata file with a SampleID containing a comma that Keemei validates as okay for QIIME 2, but demux summarize fails because the comma corrupts the MANIFEST file in demux.qza. It would be great if these two could be on the same page.

Thanks,
Chris

thermokarst · November 7, 2017, 11:58pm

Hi @Chris_Hemmerich, thanks for writing! Can you please provide a bit more detail to help us out:

The exact command you were trying to run (copy-and-paste please!)
The exact error message you received (copy-and-paste the output when run with --verbose, or attach the detailed error log)
A few example Sample IDs

If you are able to, it would be great if you could upload a copy of your demultiplexed sequences artifact, but we know that isn't always possible.

Thanks!

Chris_Hemmerich · November 9, 2017, 3:38pm

Thanks for the reply.

Here are the commands that I ran; the error occurs in demux summarize:

qiime demux emp-paired \
  --m-barcodes-file meta.tsv \
  --m-barcodes-category Barcode \
  --i-seqs ../reads.qza \
  --o-per-sample-sequences demux

qiime demux summarize \
  --i-data demux.qza \
  --verbose \
  --o-visualization demux.qzv

The error is:

Traceback (most recent call last):
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2cli/commands.py", line 218, in call
    results = action(**arguments)
  File "", line 2, in summarize
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
    output_types, provenance)
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 416, in callable_executor
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_demux/_summarize/_visualizer.py", line 102, in summarize
    header=0, comment='#')
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/io/parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/io/parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "/N/u/chemmeri/Karst/opt/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/io/parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
  File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
  File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
  File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 84, saw 7

Plugin error from demux:

Error tokenizing data. C error: Expected 3 fields in line 84, saw 7

See above for debug info.

I concluded that a comma was causing the error after poking around the qza archive for tabular files. Line 84 (the line named in the error) of data/MANIFEST is:

GSF1493_Mock2,3,4,GSF1493_Mock2,3,4_42_L001_R1_001.fastq.gz,forward

compared with the line for another sample:

GSF1493_CGB_EB_Neg,GSF1493_CGB_EB_Neg_46_L001_R1_001.fastq.gz,forward

The sample names from my metadata file for these two samples are:

GSF1493_Mock2,3,4
GSF1493_CGB_EB_Neg

When I removed these commas and reran demux emp-paired, it worked okay.
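
For anyone who hits the same error, the field counts can be reproduced with a short sketch. It uses Python's standard csv module rather than the pandas call in the traceback, and it assumes the MANIFEST is plain comma-separated text with three columns, as the error message suggests. The helper name count_fields is made up for illustration.

```python
import csv
import io


def count_fields(line):
    """Return the number of comma-separated fields in one CSV line."""
    return len(next(csv.reader(io.StringIO(line))))


# The two MANIFEST lines quoted above.
bad = "GSF1493_Mock2,3,4,GSF1493_Mock2,3,4_42_L001_R1_001.fastq.gz,forward"
good = "GSF1493_CGB_EB_Neg,GSF1493_CGB_EB_Neg_46_L001_R1_001.fastq.gz,forward"

print(count_fields(good))  # 3
print(count_fields(bad))   # 7, matching "Expected 3 fields in line 84, saw 7"

# For comparison: csv.writer quotes any field that contains the delimiter,
# so the same three values survive a round trip as three fields.
buf = io.StringIO()
csv.writer(buf).writerow(
    ["GSF1493_Mock2,3,4", "GSF1493_Mock2,3,4_42_L001_R1_001.fastq.gz", "forward"]
)
quoted = buf.getvalue()
print(count_fields(quoted))  # 3
```

The quoting part only shows that a comma inside a value can be kept intact in CSV in general. I am not claiming this is how demux writes the MANIFEST or how a fix would be implemented.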

I can't share the qza file or non-control sample info because the data is not mine to share. Please let me know if you have any other questions or if I wasn't clear.

Thanks,
Chris

thermokarst · November 9, 2017, 10:55pm

Thanks @Chris_Hemmerich, this is a bummer, but thanks so much for the detailed info. I went ahead and opened up a bug report, and we will update this thread as soon as this bug is fixed (I don't have an ETA at the moment). It sounds like removing the commas from your sample IDs will work around the issue for now, but I recognize that that is a huge pain. Stay tuned, and thanks!

Chris_Hemmerich · November 13, 2017, 4:03pm

Thanks! Fortunately this was early in the pipeline, so replacing the commas was not much of a pain. From the bug report, it sounds like fixing this in QIIME2 may not be trivial, in which case I'd be equally happy if Keemei could complain about it.
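
In the meantime, a quick pre-flight check along these lines could flag such IDs before running demux. This is only a minimal sketch, not anything Keemei or QIIME 2 provides. It assumes a tab-separated metadata file whose first column holds the sample ID and whose header and comment lines start with '#'. The function name and the example rows are made up for illustration, and the barcodes are placeholders.

```python
import csv
import io


def find_ids_with_commas(metadata_text):
    """Return (line_number, sample_id) pairs for sample IDs containing a comma."""
    flagged = []
    reader = csv.reader(io.StringIO(metadata_text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        # Skip blank lines and header/comment lines such as "#SampleID".
        if not row or row[0].startswith("#"):
            continue
        if "," in row[0]:
            flagged.append((line_number, row[0]))
    return flagged


# Placeholder metadata: two sample IDs from this thread, made-up barcodes.
example = (
    "#SampleID\tBarcode\n"
    "GSF1493_Mock2,3,4\tAAAAAAAAAAAA\n"
    "GSF1493_CGB_EB_Neg\tCCCCCCCCCCCC\n"
)

for line_number, sample_id in find_ids_with_commas(example):
    print("line {}: sample ID {!r} contains a comma".format(line_number, sample_id))
```

To check a real file, read its contents into a string and pass that to find_ids_with_commas.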
