Deblur error - can only use .str accessor with string values!

GrSeq · December 15, 2019, 11:23pm

I'm running QIIME 2 v2019.10 on a cluster and following the "Alternative methods of read-joining in QIIME 2" tutorial with my data, and I'm stuck at the Deblur denoising/sub-OTU picking step. I ran the command:

qiime deblur denoise-16S --i-demultiplexed-seqs demux-joined-filtered.qza --p-trim-length 250 --p-sample-stats --o-representative-sequences rep-seqs-deblur250.qza --o-table table-deblur250.qza --o-stats deblur-stats250.qza

And I get this error:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-449>", line 2, in denoise_16S
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
    output_views = self._callable(**view_args)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 100, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 150, in denoise_helper
    ids_with_underscores = df[df.index.str.contains('')].index.tolist()
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/strings.py", line 1917, in __init__
    self._inferred_dtype = self._validate(data)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/pandas/core/strings.py", line 1967, in _validate
    raise AttributeError("Can only use .str accessor with string " "values!")
AttributeError: Can only use .str accessor with string values!

Plugin error from deblur:

  Can only use .str accessor with string values!

The test dataset in the tutorial runs fine, so it is something with my file rather than the qiime2 setup. The only thing I can think of is that the script is looking for samples which no longer exist because they were filtered out at the previous step (it is an old labeling experiment so some samples had very few reads to start with). If indeed this is likely to be the case, how do I filter out these fastq files from the .qza file? Or do you have any other ideas about what might have gone wrong?

Thanks so much!

gwarmstrong · December 18, 2019, 6:05pm

Hi @GrSeq !

It seems like there is an issue with the sample IDs. Can you share what your sample IDs look like?

For example, the sample IDs for the tutorial data all have the format XXXYYYY.Y[.Y], where X is a letter, Y is a digit, and the square brackets indicate that the enclosed part is optional. E.g., BAQ1552.1.1, BAQ2420.2, and YUN3856.1.3 are some of the sample IDs in the file.
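As a rough illustration of that format, here is a minimal sketch that encodes it as a regular expression. The pattern and the helper name `looks_like_tutorial_id` are assumptions based only on the description above, not anything QIIME 2 itself checks.

```python
import re

# Assumed reading of "XXXYYYY.Y[.Y]": three letters, four digits,
# a dot and a digit, then an optional second dot and digit.
TUTORIAL_ID = re.compile(r"[A-Za-z]{3}\d{4}\.\d(?:\.\d)?")


def looks_like_tutorial_id(sample_id):
    """Return True if the whole ID matches the assumed tutorial format."""
    return TUTORIAL_ID.fullmatch(sample_id) is not None


# The three tutorial IDs quoted above, plus a purely numeric ID for contrast.
checks = {
    sid: looks_like_tutorial_id(sid)
    for sid in ["BAQ1552.1.1", "BAQ2420.2", "YUN3856.1.3", "275"]
}
print(checks)
```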

GrSeq · December 18, 2019, 8:16pm

Hi @gwarmstrong,

Thank you for your message.

The headers in my fastq files look like this:
@M00517:58:000000000-A458E:1:1101:17332:1534 1:N:0:0

gwarmstrong · December 18, 2019, 8:23pm

Hmmm...

What do you get for sample-id if you do something like this?

$ python
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 02:16:08)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from qiime2 import Artifact
>>> from q2_types.per_sample_sequences import SingleLanePerSampleSingleEndFastqDirFmt
>>> import pandas as pd
>>> demux_seqs = Artifact.load('demux-joined-filtered.qza')
>>> fastq_dir = demux_seqs.view(SingleLanePerSampleSingleEndFastqDirFmt)
>>> df = fastq_dir.manifest.view(pd.DataFrame)
>>> df
                                                       forward
sample-id
BAQ1552.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.2  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.1.3  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
BAQ2420.2    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
...                                                        ...
YUN3856.1.1  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.1.2  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.1.3  /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.2    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...
YUN3856.3    /var/folders/5q/m7zbbc_s08n4xh4tqjdbwmfw0000gq...

[61 rows x 1 columns]

GrSeq · December 18, 2019, 9:17pm

I get the following:

sample-id                                                   
1          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
2          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
3          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
4          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
5          /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
...                                                      ...
267        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
268        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
270        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
271        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...
275        /tmp/qiime2-archive-0j85o6bx/f847378f-6244-40a...

[227 rows x 1 columns]

Should the sample IDs here match the sample IDs in the fastq headers of the corresponding files?

In case it affects your interpretation of the output: I ran the commands above in the QIIME 2 VirtualBox image installed on my laptop, but I normally run QIIME 2 on a cluster. My laptop isn't powerful enough for some of the QIIME 2 steps, and on the cluster QIIME 2 isn't installed in a way that lets me use it as a Python package for the commands you suggested above.

Thanks!

gwarmstrong · December 18, 2019, 9:53pm

This is super helpful!

What you did worked great!

Did you upload the data as a fastq? Something like: Importing data — QIIME 2 2019.10.0 documentation
If so, can you also post the first few lines of the manifest file?

GrSeq · December 18, 2019, 10:23pm

I did upload the data as a fastq file, and used the --type EMPPairedEndSequences option rather than making a manifest file. The sequencing was a Golay-barcoded, multiplexed 2×150 nt Illumina run using the original EMP primers from Caporaso. Is the --type EMPPairedEndSequences option only for the newer EMP protocol? I'm not sure whether QIIME 2 would make a manifest file when the data are uploaded, and if so, how I would access it. Sorry!

gwarmstrong · December 18, 2019, 10:57pm

Ah. I think the manifest would have been created during the demultiplexing step. Can you post the qiime command you used to demultiplex the data?

GrSeq · December 18, 2019, 11:00pm

Yes - here you go. Thanks!
qiime demux emp-paired --i-seqs ToolikSequences.qza --m-barcodes-file mapping_Toolik.txt --m-barcodes-column "BarcodeSequence" --o-per-sample-sequences demux --o-error-correction-details demux_details --p-rev-comp-mapping-barcodes --p-golay-error-correction False

gwarmstrong · December 18, 2019, 11:32pm

Awesome. Can you post the first few lines of the mapping_Toolik.txt file?

GrSeq · December 18, 2019, 11:43pm

Yes - here it is.

#SampleID	BarcodeSequence	LinkerPrimerSequence	ReversePrimer	Horizon	Block	Season	IC	ICPairedsample	Location	Coverclass	Project	Description
1	TCTGAGGTTGCC	GTGTGCCAGCMGCCGCGGTAA	CCGGACTACHVGGGTWTCTAAT	O1	1	July	NO	1	Toolik	NA	Toolik	NA
2	TCCAACTGCAGA	GTGTGCCAGCMGCCGCGGTAA	CCGGACTACHVGGGTWTCTAAT	O2	1	July	NO	2	Toolik	NA	Toolik	NA
3	ATAATTGCCGAG	GTGTGCCAGCMGCCGCGGTAA	CCGGACTACHVGGGTWTCTAAT	min	1	July	NO	3	Toolik	NA	Toolik	NA

gwarmstrong · December 19, 2019, 12:17am

There it is!

So the reason you are getting the error above is that the entries in the #SampleID column are integers.
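A minimal sketch of the failure, assuming only that pandas is installed. The two-row table and its file names are hypothetical stand-ins for the real manifest; the point is that an integer index has no `.str` accessor, while string IDs do.

```python
import pandas as pd

# Hypothetical stand-in for the manifest: purely numeric sample IDs
# end up as an integer index rather than a string index.
numeric_ids = pd.DataFrame(
    {"forward": ["a.fastq.gz", "b.fastq.gz"]},
    index=pd.Index([1, 2], name="sample-id"),
)

try:
    # Same kind of call as in the traceback: .str on the index.
    numeric_ids.index.str.contains("_")
    str_accessor_failed = False
except AttributeError as err:
    str_accessor_failed = True
    print(f"AttributeError: {err}")

# With non-numeric characters in the IDs the index holds strings,
# so the .str accessor is available.
string_ids = numeric_ids.copy()
string_ids.index = pd.Index(
    [f"sample-{i}" for i in numeric_ids.index], name="sample-id"
)
has_underscore = string_ids.index.str.contains("_").tolist()
print(has_underscore)
```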

Are you sure these match the #SampleID values in your sample metadata?

If so, then I think you can work around the error by creating a new mapping file with each #SampleID remapped to something that contains non-numeric characters, e.g., sample-1, sample-2, sample-3, and so on. You will need to do this in both the mapping file and the sample metadata file.
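One way to do the renaming is sketched below. It assumes a tab-separated file whose first column is the sample ID and whose header line starts with "#", as in the excerpt posted above; the function name `prefix_sample_ids` and the shortened example rows are illustrative only.

```python
def prefix_sample_ids(lines, prefix="sample-"):
    """Return the lines with `prefix` added to the first tab-separated field.

    Blank lines and lines starting with '#' (header/comments) are left as-is.
    """
    renamed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            renamed.append(line)
            continue
        sample_id, sep, rest = line.partition("\t")
        renamed.append(f"{prefix}{sample_id}{sep}{rest}")
    return renamed


# Shortened example rows (only three of the original columns).
example = [
    "#SampleID\tBarcodeSequence\tHorizon",
    "1\tTCTGAGGTTGCC\tO1",
    "2\tTCCAACTGCAGA\tO2",
]
renamed_example = prefix_sample_ids(example)
print("\n".join(renamed_example))
```

To apply it to real files, read the lines with `open(path).read().splitlines()`, pass them through the function, and write the result to a new file, using the same prefix for the mapping file and the sample metadata file so the IDs stay in sync.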

If these do not match your sample metadata, you may need to do some more investigating to determine how these sample IDs correspond to the ones in the fastq headers you showed above.
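A quick way to check whether the two files agree is to compare their ID sets. This is only a sketch: it assumes both files are tab-separated with the sample ID in the first column and a header line starting with "#", and the helper names and example rows are hypothetical.

```python
def read_sample_ids(lines):
    """Collect first-column IDs, skipping blank lines and '#' lines."""
    return {
        line.split("\t", 1)[0].strip()
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def compare_ids(mapping_lines, metadata_lines):
    """Return (IDs only in the mapping file, IDs only in the metadata)."""
    mapping_ids = read_sample_ids(mapping_lines)
    metadata_ids = read_sample_ids(metadata_lines)
    return sorted(mapping_ids - metadata_ids), sorted(metadata_ids - mapping_ids)


# Hypothetical example: ID 2 is missing from the metadata, ID 3 from the mapping.
mapping = ["#SampleID\tBarcodeSequence", "1\tTCTGAGGTTGCC", "2\tTCCAACTGCAGA"]
metadata = ["#SampleID\tSeason", "1\tJuly", "3\tJuly"]
only_in_mapping, only_in_metadata = compare_ids(mapping, metadata)
print(only_in_mapping, only_in_metadata)
```

Two empty lists mean every ID appears in both files.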

GrSeq · December 19, 2019, 12:21am

Excellent - thanks so much. I will test this out tomorrow and let you know how it goes.

GrSeq · December 19, 2019, 8:14pm

It worked - @gwarmstrong thanks so much!