min() arg is an empty sequence error for qiime quality-control evaluate-composition

Hello everyone! I've already looked at a previous forum post from a user who had the same error.

I used the Gut Microbiome Zymobiomics Standard for the mock communities.

In the previous post @Nicholas_Bokulich said that the error comes from expected and observed taxonomy not matching.

Based on this, I copied the taxonomy of the expected features from the taxa_barplot.qzv I generated from this data into the expected taxonomy file, so in theory the observed and expected taxonomies should match exactly. I've done this both down to the species level and down to the deepest level the SILVA database can assign (genus level for most taxa). Both versions give the same error. The expected taxonomy file down to species level is zymo-taxonomy.tsv, the one down to genus/species level is zymo-taxonomy-genus.tsv, and the observed frequency table is zymo-freq-table.qza.
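One detail that can make otherwise identical labels differ: as far as I understand, when a table is collapsed with qiime taxa collapse, each feature is labeled with the first N ranks of its taxonomy joined by ';' with no spaces, whereas taxonomy strings copied from classifier output often have a space after each ';' or more ranks than the collapse level. The sketch below is a hypothetical helper, not part of QIIME 2; it trims an expected-taxonomy TSV to a chosen depth and removes those spaces, assuming a tab-separated file whose first column holds the taxonomy string and whose header lines start with '#'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: truncate_taxonomy LEVEL FILE
# Keeps only the first LEVEL ';'-separated ranks of the first column and
# trims the spaces around each rank. Lines starting with '#' pass through.
truncate_taxonomy() {
  local level="$1" file="$2"
  awk -v level="$level" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                 # header/comment lines unchanged
    {
      n = split($1, ranks, ";")
      label = ""
      for (i = 1; i <= n && i <= level; i++) {
        r = ranks[i]
        gsub(/^ +| +$/, "", r)           # trim spaces around this rank
        label = (i == 1) ? r : (label ";" r)
      }
      $1 = label                         # rebuilds the line with tabs
      print
    }' "$file"
}
```

For example, truncate_taxonomy 6 zymo-taxonomy.tsv > expected-L6.tsv. Rows that end up with the same label after truncation (two species of one genus at level 6, say) are not merged by this sketch; their abundances would have to be summed before converting and importing the file.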

I can't figure out where the taxonomy file is not matching. I'm sure there's something I'm missing.
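To see exactly which expected labels have no counterpart in the observed table, the observed artifact can be exported with qiime tools export, converted with biom convert --to-tsv, and the first columns of the two TSV files compared. A minimal sketch with hypothetical helper names, assuming both files are tab-separated with the feature label in the first column and header lines starting with '#':

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique feature labels (first column) of a
# TSV, skipping '#' header lines and blank lines, sorted bytewise.
feature_labels() {
  awk -F '\t' '!/^#/ && NF { print $1 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: labels present in the expected file but absent from
# the observed file (comm -23 keeps lines unique to the first input).
missing_from_observed() {
  local expected="$1" observed="$2"
  LC_ALL=C comm -23 <(feature_labels "$expected") <(feature_labels "$observed")
}
```

For example: qiime tools export --input-path zymo-freq-table.qza --output-path exported, then biom convert -i exported/feature-table.biom -o observed.tsv --to-tsv, then missing_from_observed zymo-taxonomy.tsv observed.tsv. An empty result means every expected label also occurs in the observed table.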

Below is the error message I received:

Traceback (most recent call last):
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in __call__
    results = action(**arguments)
  File "<decorator-gen-626>", line 2, in evaluate_composition
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
    outputs = self._callable_executor_(
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 558, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_quality_control/quality_control.py", line 77, in evaluate_composition
    results = _evaluate_composition(
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_quality_control/_utilities.py", line 127, in _evaluate_composition
    score_plot = _pointplot_multiple_y(
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_quality_control/_utilities.py", line 281, in _pointplot_multiple_y
    sns.pointplot(data=results, x=xval, y=score, ax=axes, color=color)
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/seaborn/categorical.py", line 2839, in pointplot
    plotter = _PointPlotter(x, y, hue, data, order, hue_order,
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/seaborn/categorical.py", line 1603, in __init__
    self.establish_colors(color, palette, 1)
  File "/hpc/home/klt75/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/seaborn/categorical.py", line 707, in establish_colors
    lum = min(light_vals) * .6
ValueError: min() arg is an empty sequence
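The traceback ends inside seaborn's pointplot at min(light_vals), which, as far as I can tell, fails this way when the table of per-sample scores it receives is empty, i.e. when no expected and observed features could be paired up. A quick diagnostic is to look at what the observed feature IDs actually are: uncollapsed ASV tables usually carry 32-character hexadecimal hash IDs, whereas a table collapsed by taxonomy carries ';'-separated taxonomy strings. The helper below is hypothetical and assumes the observed table has already been exported to TSV (qiime tools export followed by biom convert --to-tsv):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the feature IDs (first column) of an exported
# TSV as hash-like (32 lowercase hex characters), taxonomy-like (contain ';'),
# or other, and print the three counts. '#' lines and blank lines are skipped.
summarize_feature_ids() {
  awk -F '\t' '
    /^#/ || !NF { next }
    $1 ~ /^[0-9a-f]+$/ && length($1) == 32 { hashes++; next }
    index($1, ";") { taxa++; next }
    { other++ }
    END { printf "hash-like: %d, taxonomy: %d, other: %d\n", hashes, taxa, other }
  ' "$1"
}
```

If nearly every ID is hash-like, the observed table has not been collapsed by taxonomy, so its feature IDs cannot match taxonomy labels in the expected table.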

These are the commands I used:


# ----------------- SCRIPT START -------------------- # 

# load parameters
dos2unix ./config.sh
source ./config.sh

# setting input/output variables
echo -e "setting input/output variables"  
inputDir="${WKPATH}/output/04-classify/qza"
# directory containing .qza outputs from the classification step (04-classify)
outputDir="${WKPATH}/output/05-zymoQC"
outputDirQZA="${outputDir}/qza" 
outputDirQZV="${outputDir}/qzv" 
# directories to place results from script

# if previous output folder exists, delete it 
echo -e "checking for old folders, will remove to rerun analysis" 

if [ -d "$outputDir" ]
then 
    echo -e "Previous output folder exists, deleting now..." 
    rm -Rfv -- "$outputDir"
fi 
    # -R deletes recursively, -f ignores nonexistent files, -v verbose
    # '--' : no more flags for rm command 

# making new import folders 
echo -e "creating new output folders" 
mkdir -p "${outputDir}"/{qza,qzv}
    # -p ; make parent directories if needed

echo -e "Input directory is...$inputDir" 

echo -e "Output directories are... 
main folder: $outputDir
qza: $outputDirQZA
qzv: $outputDirQZV"

echo -e "setting up zymo reference variables..."
echo -e "$(date)"

ZYMOrefseq="/hpc/group/kimlab/Qiime2/reference/zymo-refs/zymo-seqs.fasta" 
ZYMOtax="/hpc/group/kimlab/Qiime2/reference/zymo-refs/zymo-taxonomy-genus.tsv"
echo -e "Zymo references are...
zymo ref seqs: $ZYMOrefseq
zymo taxonomy: $ZYMOtax"

echo -e "finished setting up folders and variables" 
echo -e "$(date)"

# import expected zymo relative abundances (taxonomy table) into qiime2 format
echo -e "importing expected taxonomy into qiime2 format" 
echo -e "$(date)"
biom convert \
  -i "$ZYMOtax" \
  -o "$outputDir"/expected-taxonomy.biom \
  --table-type="OTU table" \
  --to-json
  ## convert tsv into biom 
qiime tools import \
 --type "FeatureTable[RelativeFrequency]" \
 --input-path "$outputDir"/expected-taxonomy.biom \
 --input-format BIOMV100Format \
 --output-path "$outputDirQZA"/expected-taxonomy.qza

  ## import biom into rel.freq feature table

# import expected zymo sequences into qiime2 format
echo -e "importing expected sequences into qiime2 format" 
echo -e "$(date)"

qiime tools import \
  --input-path "$ZYMOrefseq" \
  --output-path "$outputDirQZA"/expected-seqs.qza \
  --type 'FeatureData[Sequence]'
  ## import fasta file into qza format

# filter ASV table + rep-seqs to only zymo controls 
echo -e "filtering ASV and rep-seqs table to only include zymo controls" 
echo -e "$(date)"
qiime feature-table filter-samples \
    --i-table "$tableQZA" \
    --m-metadata-file "$MAPname" \
    --p-where '[control]="zymo"' \
    --o-filtered-table "$outputDir"/zymo-table.qza
    ## filter table to only contain zymo controls

qiime feature-table filter-seqs \
    --i-table "$outputDir"/zymo-table.qza \
    --i-data "$inputDir/rep-seqs.qza" \
    --o-filtered-data "$outputDirQZA"/zymo-rep-seqs.qza

# turn ASV table into a relative abundance table  
echo -e "creating relative abundance table" 
echo -e "$(date)"
qiime feature-table relative-frequency \
    --i-table "$outputDir"/zymo-table.qza \
    --o-relative-frequency-table "$outputDirQZA"/zymo-freq-table.qza

# compare expected v actual frequencies 
echo -e "compare expected vs actual relative abundances and sequences" 
echo -e "$(date)"
qiime quality-control evaluate-composition \
  --i-expected-features "$outputDirQZA"/expected-taxonomy.qza \
  --i-observed-features "$outputDirQZA"/zymo-freq-table.qza \
  --o-visualization "$outputDirQZV"/eval-mock-freq-test.qzv

# compare expected sequences to actual sequences 
qiime quality-control evaluate-seqs \
  --i-query-sequences "$outputDirQZA"/zymo-rep-seqs.qza \
  --i-reference-sequences "$outputDirQZA"/expected-seqs.qza \
  --o-visualization "$outputDirQZV"/eval-mock-seqs-test.qzv

echo -e "finished zymo QC" 
echo -e "$(date)"
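Because the expected table is imported as FeatureTable[RelativeFrequency], each of its sample columns should sum to roughly 1. A small hypothetical check, assuming the classic tab-separated layout that biom convert reads (feature label first, then one column per sample, with a '#' header row naming the samples); the default tolerance of 0.01 is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_column_sums FILE [TOL]
# Prints "sample<TAB>sum" for every sample column and returns a non-zero
# status if any column sum differs from 1 by more than TOL.
check_column_sums() {
  local file="$1" tol="${2:-0.01}"
  awk -F '\t' -v tol="$tol" '
    /^#/ {                               # header: remember sample names
      if (NF > 1) for (i = 2; i <= NF; i++) name[i] = $i
      next
    }
    NF { for (i = 2; i <= NF; i++) { sum[i] += $i; if (i > ncol) ncol = i } }
    END {
      bad = 0
      for (i = 2; i <= ncol; i++) {
        label = (i in name) ? name[i] : "column " i
        printf "%s\t%.4f\n", label, sum[i]
        d = sum[i] - 1; if (d < 0) d = -d
        if (d > tol) bad = 1
      }
      exit bad
    }' "$file"
}
```

Running it on the expected TSV before biom convert catches abundances entered as percentages or columns with a missing taxon.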

Hello @klterwelp,

I can see that you mentioned you fixed your problem but then deleted that post. Did you mean to delete the whole topic? If you solved the problem, there's no harm in leaving your solution so others can see.


I fixed the error. There were several problems with my script.

Errors

  1. I didn't collapse the feature table by taxonomy prior to inputting it into the quality control evaluate-composition command.
  2. My expected taxonomy file was incorrect. While the formatting was correct, I did not have the right number of columns. I only used one column for expected taxonomy instead of one for each positive control.
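For the second problem, a quick way to spot the mismatch is to compare the sample-ID header of the expected TSV with that of the observed table exported to TSV. This sketch assumes, as the fix described here suggests, that each observed positive control needs a column with the same sample ID in the expected file; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample IDs (header columns after the first)
# from the first '#' line that has more than one tab-separated field.
sample_ids() {
  awk -F '\t' 'NF > 1 && /^#/ { for (i = 2; i <= NF; i++) print $i; exit }' "$1" |
    LC_ALL=C sort -u
}

# Hypothetical helper: observed sample IDs with no matching expected column
# (comm -13 keeps lines unique to the second input).
unmatched_samples() {
  local expected="$1" observed="$2"
  LC_ALL=C comm -13 <(sample_ids "$expected") <(sample_ids "$observed")
}
```

An empty result means every observed positive control has a same-named column in the expected file.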

Solutions

  1. Added a qiime taxa collapse step so that the features in the table are taxonomy labels rather than ASV IDs:
    qiime taxa collapse \
        --i-table "$outputDir"/zymo-table.qza \
        --i-taxonomy "$taxQZA" \
        --p-level 7 \
        --o-collapsed-table collapsed-zymo-table-L7.qza
  2. Remade the expected mock taxonomy file so that it has as many expected-taxonomy columns as there are observed positive-control samples.
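If every positive control is expected to have the same composition, the multi-column expected file can be generated from a single-column one instead of being edited by hand. A minimal sketch with a hypothetical helper, assuming a tab-separated input whose header row starts with '#' and whose second column holds the expected relative abundances; the sample IDs zymo1 and zymo2 in the example are placeholders for the real positive-control IDs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replicate_expected FILE SAMPLE_ID...
# Copies the expected abundances in column 2 into one column per sample ID.
# The '#' header row keeps its first field and gets the new sample IDs;
# other '#' lines (e.g. "# Constructed from biom file") are dropped.
replicate_expected() {
  local file="$1"; shift
  local ids
  ids=$(IFS=$'\t'; printf '%s' "$*")    # tab-joined sample IDs for the header
  awk -F '\t' -v ids="$ids" -v n="$#" 'BEGIN { OFS = "\t" }
    /^#/ { if (NF > 1) print $1, ids; next }
    NF {
      line = $1
      for (i = 1; i <= n; i++) line = line OFS $2
      print line
    }' "$file"
}
```

For example, replicate_expected zymo-taxonomy-genus.tsv zymo1 zymo2 > expected-multi.tsv, after which the output goes through biom convert and qiime tools import exactly as in the script. The real sample IDs can be read off the header row of the observed table exported to TSV.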

Below is the updated script in case you're curious:

# ----------------- SCRIPT START -------------------- # 

# load parameters
dos2unix ./config.sh
source ./config.sh

# setting input/output variables
echo -e "setting input/output variables"  
inputDir="${WKPATH}/output/04-classify/qza"
# directory containing .qza outputs from the classification step (04-classify)
outputDir="${WKPATH}/output/05-zymoQC"
outputDirQZA="${outputDir}/qza" 
outputDirQZV="${outputDir}/qzv" 
# directories to place results from script

# if previous output folder exists, delete it 
echo -e "checking for old folders, will remove to rerun analysis" 

if [ -d "$outputDir" ]
then 
    echo -e "Previous output folder exists, deleting now..." 
    rm -Rfv -- "$outputDir"
fi 
    # -R deletes recursively, -f ignores nonexistent files, -v verbose
    # '--' : no more flags for rm command 

# making new import folders 
echo -e "creating new output folders" 
mkdir -p "${outputDir}"/{qza,qzv}
    # -p ; make parent directories if needed

echo -e "Input directory is...$inputDir" 

echo -e "Output directories are... 
main folder: $outputDir
qza: $outputDirQZA
qzv: $outputDirQZV"

echo -e "setting up zymo reference variables..."
echo -e "$(date)"

echo -e "Zymo references are...
zymo ref seqs: $MOCKrefseq
zymo taxonomy: $MOCKtax"

echo -e "finished setting up folders and variables" 
echo -e "$(date)"

# import expected zymo relative abundances (taxonomy table) into qiime2 format
echo -e "importing expected taxonomy into qiime2 format" 
echo -e "$(date)"
biom convert \
  -i "$MOCKtax" \
  -o "$outputDir"/expected-taxonomy.biom \
  --table-type="OTU table" \
  --to-json
  ## convert tsv into biom 
qiime tools import \
 --type "FeatureTable[RelativeFrequency]" \
 --input-path "$outputDir"/expected-taxonomy.biom \
 --input-format BIOMV100Format \
 --output-path "$outputDirQZA"/expected-taxonomy.qza

  ## import biom into rel.freq feature table

# import expected zymo sequences into qiime2 format
echo -e "importing expected sequences into qiime2 format" 
echo -e "$(date)"

qiime tools import \
  --input-path "$MOCKrefseq" \
  --output-path "$outputDirQZA"/expected-seqs.qza \
  --type 'FeatureData[Sequence]'
  ## import fasta file into qza format

# filter ASV table + rep-seqs to only zymo controls 
echo -e "filtering ASV and rep-seqs table to only include zymo controls" 
echo -e "$(date)"
qiime feature-table filter-samples \
    --i-table "$tableQZA" \
    --m-metadata-file "$MAPname" \
    --p-where "[${controlCol}]='${mockname}'" \
    --o-filtered-table "$outputDir"/zymo-table.qza
    ## filter table to only contain zymo controls
qiime feature-table filter-seqs \
    --i-table "$outputDir"/zymo-table.qza \
    --i-data "$inputDir/rep-seqs.qza" \
    --o-filtered-data "$outputDirQZA"/zymo-rep-seqs.qza

qiime taxa collapse \
    --i-table "$outputDir"/zymo-table.qza \
    --i-taxonomy "$taxQZA" \
    --p-level 7 \
    --o-collapsed-table "$outputDir"/taxa-zymo-table.qza
    ## collapse ASVs into taxa (level 7 = species for SILVA)

# turn collapsed taxa table into a relative abundance table  
echo -e "creating relative abundance table" 
echo -e "$(date)"
qiime feature-table relative-frequency \
    --i-table "$outputDir"/taxa-zymo-table.qza \
    --o-relative-frequency-table "$outputDirQZA"/zymo-freq-table.qza

# compare expected v actual frequencies 
echo -e "compare expected vs actual relative abundances and sequences" 
echo -e "$(date)"
qiime quality-control evaluate-composition \
  --i-expected-features "$outputDirQZA"/expected-taxonomy.qza \
  --i-observed-features "$outputDirQZA"/zymo-freq-table.qza \
  --o-visualization "$outputDirQZV"/eval-mock-freq-test.qzv

# compare expected sequences to actual sequences 
qiime quality-control evaluate-seqs \
  --i-query-sequences "$outputDirQZA"/zymo-rep-seqs.qza \
  --i-reference-sequences "$outputDirQZA"/expected-seqs.qza \
  --o-visualization "$outputDirQZV"/eval-mock-seqs-test.qzv 

echo -e "finished zymo QC" 
echo -e "$(date)"
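The --p-where argument of qiime feature-table filter-samples is a SQLite WHERE clause assembled from shell variables, which is easy to get wrong when quotes are nested. One option is to build it in a helper first; a sketch with a hypothetical function, assuming (as in the script) that the metadata column name goes in square brackets and the value is a single-quoted string literal:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_where COLUMN VALUE
# Prints [COLUMN]='VALUE', doubling any single quote inside VALUE so the
# SQLite string literal stays well formed.
build_where() {
  local column="$1" value="$2" q="'"
  value=${value//$q/$q$q}               # escape embedded single quotes as ''
  printf "[%s]='%s'" "$column" "$value"
}
```

In the script this would be used as --p-where "$(build_where "$controlCol" "$mockname")", which keeps the whole clause a single argument even if the column name contains spaces.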