Deblur error - can only use .str accessor with string values!

I’m running QIIME 2 v2019.10 on a cluster and following the "Alternative methods of read-joining in QIIME 2" tutorial with my own data, and I am stuck at the deblur denoising/sub-OTU picking step. I ran the command:

qiime deblur denoise-16S --i-demultiplexed-seqs demux-joined-filtered.qza --p-trim-length 250 --p-sample-stats --o-representative-sequences rep-seqs-deblur250.qza --o-table table-deblur250.qza --o-stats deblur-stats250.qza

and got this error:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-449>", line 2, in denoise_16S
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 100, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 150, in _denoise_helper
    ids_with_underscores = df[df.index.str.contains('_')].index.tolist()
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/strings.py", line 1917, in __init__
    self._inferred_dtype = self._validate(data)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/strings.py", line 1967, in _validate
    raise AttributeError("Can only use .str accessor with string " "values!")
AttributeError: Can only use .str accessor with string values!

Plugin error from deblur: Can only use .str accessor with string values!

The test dataset in the tutorial runs fine, so it is something with my file rather than the qiime2 setup. The only thing I can think of is that the script is looking for samples which no longer exist because they were filtered out at the previous step (it is an old labeling experiment so some samples had very few reads to start with). If indeed this is likely to be the case, how do I filter out these fastq files from the .qza file? Or do you have any other ideas about what might have gone wrong?

Thanks so much!

Hi @GrSeq !

It seems like there is an issue with the sample IDs. Can you share what your sample IDs look like?

For example, the sample IDs in the tutorial data all have the format XXXYYYY.Y[.Y], where X is a letter, Y is a digit, and the part in square brackets is optional. For instance, BAQ1552.1.1, BAQ2420.2, and YUN3856.1.3 are some of the sample IDs in the file.
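To make that format concrete, here is a small sketch that checks whether an ID follows the tutorial-style pattern. The regular expression is my own reading of the description above (uppercase letters, single digits after each dot), and the function name is hypothetical. It is not a rule that QIIME 2 enforces, just a quick way to see how your IDs differ from the tutorial's.

```python
import re

# Assumed pattern for tutorial-style IDs such as BAQ1552.1.1 or BAQ2420.2:
# three uppercase letters, four digits, then one or two ".digit" groups.
TUTORIAL_ID = re.compile(r"[A-Z]{3}\d{4}\.\d(\.\d)?")


def looks_like_tutorial_id(sample_id):
    """Return True if sample_id matches the assumed tutorial ID pattern."""
    # str() lets the check accept IDs that were read in as numbers.
    return TUTORIAL_ID.fullmatch(str(sample_id)) is not None


tutorial_checks = [looks_like_tutorial_id(s)
                   for s in ("BAQ1552.1.1", "BAQ2420.2", "YUN3856.1.3")]
numeric_check = looks_like_tutorial_id(1)
```

A purely numeric ID such as 1 does not match, which is the kind of difference worth looking for in your own data.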

Hi @gwarmstrong,

Thank you for your message.

The headers in my fastq files look like this:
@M00517:58:000000000-A458E:1:1101:17332:1534 1:N:0:0

Hmmm…

What do you get for sample-id if you do something like this?

$ python
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from qiime2 import Artifact
>>> from q2_types.per_sample_sequences import SingleLanePerSampleSingleEndFastqDirFmt
>>> import pandas as pd
>>> demux_seqs = Artifact.load('demux-joined-filtered.qza')
>>> fastq_dir = demux_seqs.view(SingleLanePerSampleSingleEndFastqDirFmt)
>>> df = fastq_dir.manifest.view(pd.DataFrame)
>>> df
                                                       forward
sample-id
BAQ1552.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.2  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.3  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.2    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
...                                                        ...
YUN3856.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.1.2  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.1.3  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.2    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.3    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...

[61 rows x 1 columns]

I get the following:

sample-id                                                   
1          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
2          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
3          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
4          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
5          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
...                                                      ...
267        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
268        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
270        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
271        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
275        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...

[227 rows x 1 columns]

Should the sample IDs here be matching the sample IDs in the fastq header of the corresponding files?

In case it affects your interpretation of the output: I ran the commands above in the QIIME 2 VirtualBox image installed on my laptop, but I normally run QIIME 2 on a cluster. My laptop isn’t powerful enough for some of the QIIME 2 steps, and on the cluster QIIME 2 isn’t installed in a way that lets me use it as a Python package for the commands you suggested.

Thanks!

This is super helpful!

What you did worked great!

Did you upload the data as a fastq, following something like the "Importing data" page in the QIIME 2 2019.10.0 documentation?
If so, can you also post the first few lines of the manifest file?

I did upload the data as a fastq file, and used the --type EMPPairedEndSequences option rather than making a manifest file. The sequencing was a Golay-barcoded, multiplexed 2×150 nt Illumina run using the original EMP primers from Caporaso. Is the --type EMPPairedEndSequences option only for the newer EMP protocol? I’m not sure whether QIIME 2 would create a manifest file when the data are imported and, if so, how I would access it. Sorry!

Ah. I think the manifest would have been created during the demultiplexing step. Can you post the qiime command you used to demultiplex the data?

Yes - here you go. Thanks!
qiime demux emp-paired --i-seqs ToolikSequences.qza --m-barcodes-file mapping_Toolik.txt --m-barcodes-column "BarcodeSequence" --o-per-sample-sequences demux --o-error-correction-details demux_details --p-rev-comp-mapping-barcodes --p-golay-error-correction False

Awesome. Can you post the first few lines of the mapping_Toolik.txt file?

Yes - here it is.

#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer Horizon Block Season IC ICPairedsample Location Coverclass Project Description
1 TCTGAGGTTGCC GTGTGCCAGCMGCCGCGGTAA CCGGACTACHVGGGTWTCTAAT O1 1 July NO 1 Toolik NA Toolik NA
2 TCCAACTGCAGA GTGTGCCAGCMGCCGCGGTAA CCGGACTACHVGGGTWTCTAAT O2 1 July NO 2 Toolik NA Toolik NA
3 ATAATTGCCGAG GTGTGCCAGCMGCCGCGGTAA CCGGACTACHVGGGTWTCTAAT min 1 July NO 3 Toolik NA Toolik NA

There it is!

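So the reason you are getting the error above is that the entries in the #SampleID column are integers. Deblur checks the sample IDs with a string operation (the df.index.str.contains call in the traceback), and that call fails when the IDs have been read as numbers rather than as text.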
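Here is a minimal sketch that reproduces the same kind of failure outside QIIME 2, assuming pandas is installed. The data frame is a made-up stand-in for the manifest (the file names are placeholders), and the "_" pattern is my assumption based on the variable name ids_with_underscores in the traceback.

```python
import pandas as pd

# Hypothetical manifest-like frame with integer sample IDs (placeholder paths).
numeric_ids = pd.DataFrame(
    {"forward": ["a.fastq.gz", "b.fastq.gz"]},
    index=pd.Index([1, 2], name="sample-id"),
)


def ids_with_underscores(df):
    # Mirrors the expression from the traceback; "_" is an assumed pattern.
    return df[df.index.str.contains("_")].index.tolist()


try:
    ids_with_underscores(numeric_ids)
    error_message = None
except AttributeError as exc:
    # An integer index has no usable .str accessor, so pandas raises here.
    error_message = str(exc)

# With the IDs cast to strings, the same check runs without an error.
string_ids = numeric_ids.copy()
string_ids.index = string_ids.index.astype(str)
result = ids_with_underscores(string_ids)
```

This only illustrates the cause. Inside QIIME 2 the IDs come from the files you provide, so the fix has to happen there rather than by casting the index yourself.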

Are you sure these match the #SampleID values in your sample metadata?

If so, then I think you can work around the error by creating a new mapping file with each #SampleID remapped to something with non-numeric characters, e.g., ['sample-1', 'sample-2', 'sample-3', ...]. You will need to do this in both the mapping file and the sample metadata file.

If these do not match your sample metadata, you may need to investigate further to determine how these sample IDs correspond to the ones in the fastq headers you showed above.
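If the IDs do match, the remapping could be sketched like this. It assumes the mapping file is tab-separated with the sample ID in the first column and a header line starting with "#". The function name and the "sample-" prefix are just illustrative choices, and the example rows are shortened from the lines you posted.

```python
import csv
import io


def prefix_sample_ids(mapping_text, prefix="sample-"):
    """Return tab-separated mapping text with prefix added to each sample ID."""
    reader = csv.reader(io.StringIO(mapping_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Leave blank lines and header/comment lines (starting with "#") as-is.
        if row and not row[0].startswith("#"):
            row[0] = prefix + row[0]
        writer.writerow(row)
    return out.getvalue()


# Shortened example rows (only three of the columns are kept here).
example = (
    "#SampleID\tBarcodeSequence\tHorizon\n"
    "1\tTCTGAGGTTGCC\tO1\n"
    "2\tTCCAACTGCAGA\tO2\n"
)
remapped = prefix_sample_ids(example)
```

To use it on real files, read the file contents into a string, pass them through the function, and write the result to a new file. Running the same function with the same prefix on the sample metadata file keeps the two sets of IDs in sync.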

Excellent - thanks so much. I will test this out tomorrow and let you know how it goes.

It worked - @gwarmstrong thanks so much!
