Plugin error from demux: sample not present in the demultiplexed data - Part 2

Hello again!
I am making a brand new post because the previous one (Plugin error from demux: sample not present in the demultiplexed data) was unfortunately closed without a solution.

I am facing the same error again. The solution proposed by @Mandussi_Montiel unfortunately doesn't work for me. I was wondering whether @Oddant1 could help me again.

So, the problem was I was getting the error:

Plugin error from demux:
'N' is not a sample present in the demultiplexed data.
Debug info has been saved to /tmp/qiime2-q2cli-err-tbduy2ea.log

The command is:
singularity run singularity_containers/qiime2_2023.2 qiime demux filter-samples --i-demux $name.qza --m-metadata-file $name.tsv --p-where "CAST([forward sequence count] AS INT) > 500" --o-filtered-demux $name_filtered.qza --verbose

The result is:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-pi199yjq because the default path (/home/qiime2/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 146, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 49, in pandas._libs.index.Int64Engine._check_type
KeyError: '34'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_demux/_filter.py", line 39, in filter_samples
forward = manifest.loc[id].forward
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
return self._get_label(key, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
return self.obj.xs(label, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
loc = index.get_loc(key)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: '34'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "", line 2, in filter_samples
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_demux/_filter.py", line 47, in filter_samples
raise ValueError(f'{id!r} is not a sample present in the '
ValueError: '34' is not a sample present in the demultiplexed data.

Plugin error from demux:

'34' is not a sample present in the demultiplexed data.

See above for debug info.

Any idea what the problem is?
Thanks in advance for your support!

Hello @AstroBioJack, it looks like '34' is a value that is being pulled out of the metadata as a sample-id, but there is no sample with the id '34' in the demux. I suspect what is happening is the metadata does not have the sample-ids labeled properly and instead the index of the row in the file is being used. If you look at the metadata here from the moving pictures tutorial, there is a column called "sample-id." Does your metadata contain this column with the sample-ids in it? Is the number 34 meant to be the id of one of your samples?
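One quick way to answer both questions is to print the header name and the values of the first column of the metadata file. The snippet below is only an illustrative sketch in plain Python, not part of QIIME 2: the helper name `first_column` is hypothetical, and it assumes a tab-separated file whose first row is the header.

```python
import csv
import io


def first_column(tsv_text):
    """Return (header_name, ids) for the first column of tab-separated text."""
    # Skip completely empty rows so trailing blank lines do not break indexing.
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    return header[0], [r[0] for r in body]


# Tiny stand-in for a real metadata file (values taken from this thread).
example = "sample-id\tforward sequence count\n33\t81284\n34\t50750\n"
name, ids = first_column(example)
print(name, ids)
```

If the printed header is not "sample-id", or the IDs are not the ones you expect, the metadata file is the first thing to fix.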

Hello @Oddant1!

Thank you very much for your answer.
What is strange to me is that if I re-run the same command, the value changes randomly! I have a test set of samples numbered 33 to 39, and the same error appears with each of them, in random order, when I re-run the command.
I generate my own metadata document with a series of rather home-made commands that look as follows:

ls -d -1 $input_dir/*R1.fastq > $output_dir/list1
ls -d -1 $input_dir/*R2.fastq > $output_dir/list2
cut -d "/" -f5 $output_dir/list1 > $output_dir/sample
cut -d "_" -f1 $output_dir/sample > $output_dir/sample2
paste -d "\t" $output_dir/sample2 $output_dir/list1 $output_dir/list2 > $output_dir/output.tsv
paste -d "\r" $config/header $output_dir/output.tsv > $output_dir/manifest.tsv
sed -i 's3Analysis/3/home/users/XXX/Analysis/3g' $output_dir/manifest.tsv

This results in a document that, for my input folder, is printed as follows:

sample-id forward-absolute-filepath reverse-absolute-filepath^M33 /home/users/XXX/Analysis/$output_dir/33_R1.fastq /home/users/XXX/Analysis/$output_dir/33_R2.fastq
^M34 /home/users/XXX/Analysis/$output_dir/34_R1.fastq /home/users/XXX/Analysis/$output_dir/34_R2.fastq
^M35 /home/users/XXX/Analysis/$output_dir/35_R1.fastq /home/users/XXX/Analysis/$output_dir/35_R2.fastq
^M36 /home/users/XXX/Analysis/$output_dir/36_R1.fastq /home/users/XXX/Analysis/$output_dir/36_R2.fastq
^M37 /home/users/XXX/Analysis/$output_dir/37_R1.fastq /home/users/XXX/Analysis/$output_dir/37_R2.fastq
^M38 /home/users/XXX/Analysis/$output_dir/38_R1.fastq /home/users/XXX/Analysis/$output_dir/38_R2.fastq
^M39 /home/users/XXX/Analysis/$output_dir/39_R1.fastq /home/users/XXX/Analysis/$output_dir/39_R2.fastq

(I admit it is rough, but I would like it to work with simple commands and in any situation without "external" interference)
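The ^M characters in the listing above look like carriage returns, most likely introduced by the paste -d "\r" step. As a point of comparison, here is a minimal sketch that builds the same three-column table with plain newlines. It is an assumption-laden illustration, not a QIIME 2 tool: the function name `build_manifest` is hypothetical, and it assumes files are named `<sample>_R1.fastq` with a matching `<sample>_R2.fastq` in the same directory.

```python
from pathlib import PurePosixPath

HEADER = ("sample-id", "forward-absolute-filepath", "reverse-absolute-filepath")


def build_manifest(r1_paths):
    """Build tab-separated manifest text from a list of *_R1.fastq paths."""
    lines = ["\t".join(HEADER)]
    for r1 in sorted(r1_paths):
        path = PurePosixPath(r1)
        # Sample ID is everything before the first underscore in the file name.
        sample = path.name.split("_")[0]
        # Assumption: the reverse read differs only by R1 -> R2 in the name.
        r2 = path.with_name(path.name.replace("_R1.fastq", "_R2.fastq"))
        lines.append("\t".join((sample, str(path), str(r2))))
    # Plain "\n" line endings, with no carriage returns.
    return "\n".join(lines) + "\n"


print(build_manifest(["/data/34_R1.fastq", "/data/33_R1.fastq"]))
```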

As you can see, the metadata table seems okay. Before qiime demux filter-samples, I run qiime tools import (to get the first .qza), qiime demux summarize, and qiime tools export to get the number of reads per sample, as suggested by the tutorial. They all run without reporting any problem.

The problem is that qiime demux filter-samples requires a different metadata file: the one produced by qiime tools export, which is per-sample-fastq-counts.tsv (if I am correct). That file, by the way, seems to contain all the samples, as follows:

sample-id forward sequence count reverse sequence count
33 81284 81284
36 57921 57921
38 57663 57663
39 51621 51621
34 50750 50750
35 47504 47504
37 40727 40727

What do you think about that?
(Sorry, I feel I should point out that I am following the Atacama soil microbiome tutorial rather than the Moving Pictures tutorial because I have paired-end sequences, and I am skipping qiime demux emp-paired and qiime demux subsample-paired because my samples are already demultiplexed and I want to work with the full dataset.)

Thank you in advance for your support!

@AstroBioJack, that metadata file looks fine to me. If you look at the MANIFEST file in the .qza you are using as input to filter-samples, what do you see? You can get to the manifest by opening the artifact and going into the data directory inside of it. The MANIFEST file should contain the sample-ids, filenames, and directions for all of the samples in your .qza, so you should see something like

33, 33_R1.fastq.gz, forward
33, 33_R2.fastq.gz, reverse

for all of 33 through 39. Additionally, the referenced .fastq.gz files should all be present in the same directory as the manifest. If any of the samples that are referenced in the metadata are not present here, that would cause this issue.
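If unpacking the artifact by hand is inconvenient, the MANIFEST can also be read directly, since a .qza is a zip archive with the data stored under a top-level directory. The following is a minimal sketch using only Python's standard library; the function name `read_manifest` is hypothetical, and the demo builds a tiny stand-in archive in memory rather than using a real artifact.

```python
import io
import zipfile


def read_manifest(qza):
    """Return the text of the first */data/MANIFEST entry in a .qza archive."""
    with zipfile.ZipFile(qza) as zf:
        for name in zf.namelist():
            if name.endswith("/data/MANIFEST"):
                return zf.read(name).decode("utf-8")
    raise FileNotFoundError("no data/MANIFEST entry found in archive")


# Self-contained demo: "some-uuid" is a placeholder for the artifact's
# real top-level directory name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "some-uuid/data/MANIFEST",
        "sample-id,filename,direction\n33,33_R1.fastq.gz,forward\n",
    )
buf.seek(0)
print(read_manifest(buf))
```

With a real artifact, a path such as `read_manifest("demux.qza")` (file name assumed) would be passed instead of the in-memory buffer.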

@Oddant1 thanks again.
So, I tried to look at the manifest as you described. I copy it below.

sample-id,filename,direction
33,33_0_L001_R1_001.fastq.gz,forward
34,34_1_L001_R1_001.fastq.gz,forward
35,35_2_L001_R1_001.fastq.gz,forward
36,36_3_L001_R1_001.fastq.gz,forward
37,37_4_L001_R1_001.fastq.gz,forward
38,38_5_L001_R1_001.fastq.gz,forward
39,39_6_L001_R1_001.fastq.gz,forward
33,33_7_L001_R2_001.fastq.gz,reverse
34,34_8_L001_R2_001.fastq.gz,reverse
35,35_9_L001_R2_001.fastq.gz,reverse
36,36_10_L001_R2_001.fastq.gz,reverse
37,37_11_L001_R2_001.fastq.gz,reverse
38,38_12_L001_R2_001.fastq.gz,reverse
39,39_13_L001_R2_001.fastq.gz,reverse

To me, it looks like the filenames have that kind of _N_L001 infix. I don't know how it is generated.

UPDATE: I tried moving the manifest into the same directory as the .fastq.gz files, but it is still not working :frowning:

As usual, the software says
Plugin error from demux:

'35' is not a sample present in the demultiplexed data.

Debug info has been saved to /tmp/qiime2-q2cli-err-h3p0829g.log

and the log file is:

Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 146, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 49, in pandas._libs.index.Int64Engine._check_type
KeyError: '35'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_demux/_filter.py", line 39, in filter_samples
forward = manifest.loc[id].forward
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
return self._get_label(key, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
return self.obj.xs(label, axis=axis)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
loc = index.get_loc(key)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: '35'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "", line 2, in filter_samples
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_demux/_filter.py", line 47, in filter_samples
raise ValueError(f'{id!r} is not a sample present in the '
ValueError: '35' is not a sample present in the demultiplexed data.

UPDATE 2: I tried rewriting the manifest "manually" (opening Excel and typing in my information), but it is still not working.

@AstroBioJack, at this point, it would probably be easiest if I could see your inputs. Can you send me your .qza and your metadata? You can either post them in this thread or DM them to me if you don't want to make them publicly accessible. I should be able to figure out more precisely what's wrong if I can see them. Thank you.

@AstroBioJack Thank you for sending me the data. I have figured out what the issue is, and it is a bug on our end. Fortunately, it should be simple to work around.

You cannot use only an integer number (like 33) as an ID for a sample. When we read in your metadata, we read the sample-ids as strings. When we read in the manifest, if your sample-ids can be integers, we read them as integers. I am not sure how familiar you are with programming and how computers store data, but basically, we are storing your ids from your metadata and your ids from your manifest in two different ways causing us to throw that ValueError you are seeing because the integer 33 is not equal to the string "33".
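The mismatch can be reproduced in a few lines. This is a simplified sketch in plain Python, not the actual q2-demux code: a dictionary with integer keys stands in for the manifest index, a list of strings stands in for the metadata IDs, and the helper name `lookup` is hypothetical.

```python
# Stand-in for the manifest index: IDs that look like integers were parsed as ints.
manifest_index = {33: "33_R1.fastq.gz", 34: "34_R1.fastq.gz"}

# Stand-in for the metadata: IDs are always kept as strings.
metadata_ids = ["33", "34"]

print(33 == "33")  # False: an int never equals a str in Python


def lookup(index, sample_id):
    """Look up a sample, turning a missing key into the error seen in this thread."""
    try:
        return index[sample_id]
    except KeyError as err:
        raise ValueError(
            f"{sample_id!r} is not a sample present in the demultiplexed data."
        ) from err


for sample_id in metadata_ids:
    try:
        lookup(manifest_index, sample_id)
    except ValueError as err:
        print(err)

# With non-numeric IDs both sides stay strings, so the lookup succeeds.
string_index = {"s33": "33_R1.fastq.gz"}
print(lookup(string_index, "s33"))
```

Every string ID fails against the integer keys, which matches the report that any of the samples 33 to 39 could show up in the error.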

The workaround is fortunately fairly simple. Prefix your sample-ids with an 's' or something like that. That will force the ids to always be strings both in the metadata and the manifest.
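As an illustration of the workaround, here is a minimal sketch that prefixes the first column of a tab-separated manifest or metadata file, leaving the header row untouched. The function name `prefix_ids` and the default prefix are assumptions for the example, and the data would presumably need to be re-imported afterwards so that the artifact and the metadata carry the same IDs.

```python
def prefix_ids(tsv_text, prefix="s"):
    """Prefix the first column of every non-header row in tab-separated text."""
    lines = tsv_text.splitlines()
    out = [lines[0]]  # keep the header row unchanged
    for line in lines[1:]:
        if not line.strip():
            continue  # drop blank lines
        sample_id, sep, rest = line.partition("\t")
        out.append(prefix + sample_id + sep + rest)
    return "\n".join(out) + "\n"


manifest = (
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n"
    "33\t/data/33_R1.fastq\t/data/33_R2.fastq\n"
    "34\t/data/34_R1.fastq\t/data/34_R2.fastq\n"
)
print(prefix_ids(manifest))
```

The paths in the demo are placeholders; only the first column changes, so the file paths still point at the original FASTQ files.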

Sorry about that.

@Oddant1 Alright, that is a relief for me (I was worried about something being wrong with my commands!!). Thank you very much for the time you dedicated!

EDIT: It worked! Yay! Have a nice weekend :smiley:

We resolved the issue you were having so it shouldn't be a thing in future versions of QIIME 2.
