q2-shotgun bracken error

Hey, you all!

I'm trying the recent q2-shotgun library to find fungal sequences in our shallow shotgun sequencing data.

Following @Micro_Biologist tutorial (Importing Kraken2 and bracken databases into qiime2!), I downloaded the PlusPF database (since this one includes fungi information) and imported it into the qiime2-shotgun distribution.

Then I ran kraken2 (following the GitHub tutorial https://github.com/caporaso-lab/q2-books/blob/main/q2-shotgun/q2-shotgun/00-tutorial.md ), and this run apparently succeeded. However, when I try to run the bracken step, the following error appears:


Plugin error from moshpit:

  An error was encountered while running Bracken, (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-cs94efi4.log

Taking a look at the .log file, I see that there's something wrong with the Kraken2 step, since no reads were found:

cat /tmp/qiime2-q2cli-err-cs94efi4.log

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: bracken -d /tmp/qiime2/mgorostidi/data/ab5f1fa9-9c39-4a7d-b7b4-81fd3f428998/data -i /tmp/qiime2/mgorostidi/data/24e0eb23-b553-4272-b706-d2e8d265b722/data/11326-Q-76401-0059-2019-03-08-290.report.txt -o /tmp/tmpiwgpufzu/11326-Q-76401-0059-2019-03-08-290.bracken.output.txt -w /tmp/q2-Kraken2ReportDirectoryFormat-le_95_e2/11326-Q-76401-0059-2019-03-08-290.report.txt -t 0 -r 100 -l S

Checking for Valid Options...
Running Bracken
python src/est_abundance.py -i /tmp/qiime2/mgorostidi/data/24e0eb23-b553-4272-b706-d2e8d265b722/data/11326-Q-76401-0059-2019-03-08-290.report.txt -o /tmp/tmpiwgpufzu/11326-Q-76401-0059-2019-03-08-290.bracken.output.txt -k /tmp/qiime2/mgorostidi/data/ab5f1fa9-9c39-4a7d-b7b4-81fd3f428998/data/database100mers.kmer_distrib -l S -t 0

Checking report file: /tmp/qiime2/mgorostidi/data/24e0eb23-b553-4272-b706-d2e8d265b722/data/11326-Q-76401-0059-2019-03-08-290.report.txt
Error: no reads found. Please check your Kraken report
.......

Could someone help me with this? I've checked the GitHub issues and the QIIME 2 forum but haven't found anything about it.

And a few more questions:

  1. This part is not included in the tutorial, but should we remove adapters from the files? (I guess yes.)
  2. What depth is required for this kind of analysis?
  3. Would the following read counts per sample be enough for fungal kingdom analysis?
  4. What's the difference between using the whole PlusPF db or the 16GB one?

Sample information:

  • Shallow Shotgun Sequencing
  • Sequence quality plots look good
  • Paired-end sequencing (150 bp per read)
  • demux.qzv:

Demultiplexed sequence counts summary

          forward reads   reverse reads
Minimum   36              36
Median    379.0           379.0
Mean      467.6875        467.6875
Maximum   2403            2403
Total     29932           29932

THANK YOU SO MUCH IN ADVANCE :smiley:

Hi @MiriamGorostidi ,

Thanks for giving q2-moshpit a try!

Have you inspected the kraken2 report as indicated in the error message? Either it really is empty for some reason, or it does not have any species-level hits. Please export the kraken2 report and inspect it and let us know what you see.

bracken requires a level specification, which is set to species level by default. If there are no species-level hits in your kraken2 report, then bracken fails because it has nothing to estimate abundances from. If that is the case, you could add --p-level G (or whatever level makes sense based on your kraken2 report) to your estimate-bracken command to see whether you can estimate results at genus level.
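A quick way to check this on an exported report is to count the rows at a given rank. The sketch below is only an illustration: `count_rank_rows` is a hypothetical helper, not part of QIIME 2 or Kraken 2, and it assumes the usual tab-separated Kraken 2 report layout, where the rank code sits in column 4, or in column 6 when the report was written with --report-minimizer-data.

```shell
# Count rows in a Kraken 2 report whose rank code equals the requested level.
# Assumption: 6 tab-separated columns in a standard report (rank code in
# column 4), 8 columns with --report-minimizer-data (rank code in column 6).
count_rank_rows() {
    report="$1"
    rank="$2"
    awk -F'\t' -v rank="$rank" '
        { col = (NF >= 8) ? 6 : 4 }
        $col == rank { n++ }
        END { print n + 0 }
    ' "$report"
}

# Example usage (hypothetical path to an exported report):
#   count_rank_rows exported-reports/sample1.report.txt S
#   count_rank_rows exported-reports/sample1.report.txt G
```

If the count for S is 0 but the count for G is not, rerunning estimate-bracken with --p-level G is worth a try.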

Yes, you should remove adapters. That tutorial is an alpha release, and more detailed tutorials are forthcoming.

For read-based classification with kraken2 I am not sure there is a rule of thumb for depth, but this would be a good paper to consult if you want to know how shallow you can go with shallow shotgun sequencing:
https://journals.asm.org/doi/10.1128/msystems.00069-18

You can read more about the databases here:
https://benlangmead.github.io/aws-indexes/k2

In some cases (i.e. for collections with “-8” or “-16” in the name) we used the --max-db-size option to cap the size of the database produced. This makes the index smaller at the expense of some sensitivity and accuracy. In all cases we use the defaults for k-mer length, minimizer length, and minimizer spacing.

Good luck!

Hi @Nicholas_Bokulich !

Thank you so much for your rapid response :slight_smile:

How can I export the kraken2 report? Do you mean converting the .qza to a .qzv? Which command would that be? (Sorry, this is my first time working with kraken2 too.)

I will read the two references you mentioned carefully!

Thank you so much for everything!

Hi @MiriamGorostidi ,

Use the qiime tools export command to export to a text file.
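A minimal sketch of that step, assuming the Kraken 2 reports artifact is called reports.qza and that the exported directory holds one <sample>.report.txt file per sample (as the paths in the Bracken log above suggest). The function names and the output directory are placeholders.

```shell
# Export a QIIME 2 artifact (for example reports.qza) to a plain directory.
export_artifact() {
    qiime tools export --input-path "$1" --output-path "$2"
}

# Print the first lines of every exported report so they can be inspected.
# Assumption: exported files are named <sample>.report.txt.
show_reports() {
    for f in "$1"/*.report.txt; do
        [ -e "$f" ] || continue
        echo "== $f"
        head -n 5 "$f"
    done
}

# Example usage:
#   export_artifact reports.qza exported-reports
#   show_reports exported-reports
```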

Good luck!

Thank you @Nicholas_Bokulich !

I checked, and you were right! All the kraken2 reports contain the following:

100.00 667 667 0 0 U 0 unclassified

What do you suggest I do?

I mean, given that the PlusPF db includes the protozoa and fungi kingdoms, could this mean that there are no fungi or protozoa in my samples? Or should I try other approaches?

Thank you!

Yes it looks that way. You could try a different database to see if you get any other hits to other organisms.

Or for a quick check you could take some of the raw reads (before inputting to QIIME 2 or doing any QC) and just try BLASTing a few to see what's there.

Let me know what you find. Good luck!

Hi @Nicholas_Bokulich !

I BLASTed a couple of them and found bacteria at least, so I will try the pipeline again with another database.

Actually, I already tried it and got the following error (everything went fine the first time, and now this). Do you know what this problem about space refers to?

Traceback (most recent call last):
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/setuptools/_distutils/file_util.py", line 58, in _copy_file_contents
fdst.write(buf)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 409, in from_data
Format.write(rec, type, format, data_initializer,
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/format/v5.py", line 20, in write
super().write(archive_record, type, format, data_initializer,
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/format/v1.py", line 18, in write
super().write(archive_record, type, format, data_initializer,
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/format/v0.py", line 62, in write
data_initializer(data_dir)
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/path.py", line 45, in _move_or_copy
return self._copy_dir_or_file(other)
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/path.py", line 33, in _copy_dir_or_file
return distutils.dir_util.copy_tree(str(self), str(other))
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/setuptools/_distutils/dir_util.py", line 185, in copy_tree
copy_file(
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/setuptools/_distutils/file_util.py", line 163, in copy_file
_copy_file_contents(src, dst)
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/setuptools/_distutils/file_util.py", line 60, in _copy_file_contents
raise DistutilsFileError(
distutils.errors.DistutilsFileError: could not write to '/tmp/qiime2/mgorostidi/processes/1889-1707433229.17@mgorostidi/4ad844a0-0145-4f04-b916-281545965f1c.7349497406739894442/4ad844a0-0145-4f04-b916-281545965f1c/data/hash.k2d': No space left on device

During handling of the above exception, another exception occurred:

OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 267, in import_data
artifact = qiime2.sdk.Artifact.import_data(type, input_path,
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/sdk/result.py", line 329, in import_data
return cls.from_view(type, view, view_type, provenance_capture,
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/sdk/result.py", line 364, in _from_view
artifact._archiver = archive.Archiver.from_data(
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 421, in from_data
cls._destroy_temp_path(uuid)
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 301, in _destroy_temp_path
cache.process_pool.remove(str(process_alias))
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/cache.py", line 1637, in remove
with self.cache.lock:
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/qiime2/core/cache.py", line 292, in __enter__
self.flufl_lock.lock()
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/flufl/lock/_lockfile.py", line 328, in lock
self._write()
File "/pool0/home/mgorostidi/miniconda3/envs/qiime2-shotgun-2023.9/lib/python3.8/site-packages/flufl/lock/_lockfile.py", line 493, in _write
fp.write(self._claimfile)
OSError: [Errno 28] No space left on device

An unexpected error has occurred:

[Errno 28] No space left on device

See above for debug info.

Hi @MiriamGorostidi ,

This means that you ran out of disk space on your computer (or wherever this is running). You will need to delete some files to make space, or find another computer with more disk space.
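To see how much room is left where the import is writing, something like the following can help. It is a sketch using only standard tools; `free_kb` and `enough_space` are hypothetical helper names, and /tmp is used in the example because that is where the traceback above shows the failed write.

```shell
# Print the free space, in kilobytes, on the filesystem holding a directory.
# df -P gives one line per filesystem; column 4 is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the directory has at least the requested number of
# kilobytes free.
enough_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example usage (the 100 GB threshold is an arbitrary illustration):
#   free_kb /tmp
#   enough_space /tmp 104857600 || echo "not enough room in /tmp"
```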

Good luck!

Hii @Nicholas_Bokulich !

I figured out what was happening.

First, the person who sent me the fastq files sent the wrong ones, so of course nothing in them was classified.

Second, the server where I am running the analysis had a temp folder without enough space, which is why the disk space error appeared. Once tech support redirected the analysis to a temp folder with more space, I never got that error again.
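For others hitting the same error, one common way to redirect temporary files is the TMPDIR environment variable. This is a sketch under the assumption that QIIME 2 honours TMPDIR, which is consistent with the temporary paths in the traceback and in later logs in this thread. `use_tmpdir` is a hypothetical helper and the directory is a placeholder.

```shell
# Point temporary files at a directory on a larger filesystem.
# TMPDIR is the standard variable read by mktemp and Python's tempfile module.
use_tmpdir() {
    mkdir -p "$1" && export TMPDIR="$1"
}

# Example usage (hypothetical path), run in the same shell before qiime:
#   use_tmpdir "$HOME/tmpdirs"
#   echo "$TMPDIR"
```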

Thank you so much for your help :slight_smile:

Hi again @Nicholas_Bokulich !

I'm back because I have tried to analyze my samples using the PlusPF and Standard reference databases from Ben Langmead's Index Zone.

However, the results were not what we were expecting. As far as I know, these exact samples were already analyzed some years ago using SHOGUN, and at least the bacterial information was richer than what I am obtaining. Should I give SHOGUN a try? My goal is to find fungal information in shallow shotgun data.

Thank you!

Hi @MiriamGorostidi ,

You have run into what's known as "Segal's Law": a man with a watch knows what time it is, but a man with two watches is never sure.

You are comparing two different methods and see a difference, but you do not actually know the true composition as you are analyzing a sample with unknown composition (I assume, based on your descriptions). The only way to tell which is giving a more accurate answer would be to test these using samples of known composition, e.g., a simulated or mock community.

Good luck!

Hi @Nicholas_Bokulich !

No sorry, I didn't explain myself correctly.

I have some metagenomics files in which I want to see whether there are any fungal reads. Since it is my first time doing this kind of analysis, I gave q2-shotgun a try. I went straight to the PlusPF database, but every taxonomic classification was 100% unclassified. At this point I had two options: 1) assume that my samples don't contain any fungi, or 2) run the same pipeline with the Standard database to confirm that the pipeline I wrote was OK and that it found bacterial reads.

However, the results for the Standard db were not good either: there were almost no bacterial reads, and the percentage of unclassified reads was really high, almost 100% too.

What I didn't know is that these exact samples were already analyzed years ago using SHOGUN, and that analysis found many more bacterial results.

So at this point I don't know whether I am doing something wrong and not using the q2-shotgun plugin correctly, whether the problem is the Kraken2 profiler and I should try another one, or whether I should use SHOGUN to replicate at least the bacterial analysis and then try the fungal one. (I am finding SHOGUN quite hard to set up, though, since there are almost no examples of how to use it. Do you know if SHOGUN can be used for fungal analysis?)

If you have any other idea I'm glad to hear from you :smiley:

Thank you so much again!

Best,

Miriam

Hi @MiriamGorostidi,

just to be sure we are on the same page, could you please share the exact commands you ran for both Kraken2 classification and Bracken estimation? Thanks!

Cheers,
Michal

Hi @misialq

Yes, of course, here is the code:

CONFIDENCE_PERCENTAGE=0.60
MINIMUM_BASE_QUALITY=20
THREADS=10

echo "Running Kraken2 classification..."
qiime moshpit classify-kraken2 \
     --i-seqs demux.qza \
     --i-kraken2-db "$SHOTGUN_FOLDER/kraken2-databases/$REF_DATABASE_NAME/krakendb_$REF_DATABASE_NAME.qza" \
     --p-threads "$THREADS" \
     --p-confidence "$CONFIDENCE_PERCENTAGE" \
     --p-minimum-base-quality "$MINIMUM_BASE_QUALITY" \
     --output-dir "$SHOTGUN_FOLDER/results/$ANALYSIS_NAME" \
     --p-report-minimizer-data

READ_LENGTH=150
echo "Running Bracken estimation..."
qiime moshpit estimate-bracken \
     --i-bracken-db "$SHOTGUN_FOLDER/kraken2-databases/$REF_DATABASE_NAME/brackendb_$REF_DATABASE_NAME.qza" \
     --p-read-len "$READ_LENGTH" \
     --i-kraken-reports "$SHOTGUN_FOLDER/results/$ANALYSIS_NAME/reports.qza" \
     --o-reports "$SHOTGUN_FOLDER/results/$ANALYSIS_NAME/bracken-reports.qza" \
     --o-taxonomy "$SHOTGUN_FOLDER/results/$ANALYSIS_NAME/taxonomy-bracken.qza" \
     --o-table "$SHOTGUN_FOLDER/results/$ANALYSIS_NAME/table-bracken.qza"

However, Bracken has always failed, since almost all the Kraken2 results were 100% unclassified.
(While I'm here, I'd like to ask something: imagine that I have 10 samples and 8 of them have results, but 2 don't (these two could be 100% unclassified). Would Bracken run correctly? Or is it necessary to remove the samples without results?)
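Whichever way Bracken handles such samples, it helps to know which ones they are. The sketch below lists exported reports that contain nothing but unclassified rows. `list_unclassified_only` is a hypothetical helper, and it assumes tab-separated reports named <sample>.report.txt with the rank code in column 4 (or column 6 with --report-minimizer-data).

```shell
# Print the sample name of every report in a directory whose rows all carry
# the rank code U (unclassified). An empty report is listed as well.
list_unclassified_only() {
    for f in "$1"/*.report.txt; do
        [ -e "$f" ] || continue
        awk -F'\t' '
            { col = (NF >= 8) ? 6 : 4 }
            $col != "U" { other = 1 }
            END { exit (other ? 1 : 0) }
        ' "$f" && basename "$f" .report.txt
    done
    return 0
}

# Example usage (hypothetical directory of exported reports):
#   list_unclassified_only exported-reports
```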

** I downloaded and created the kraken databases following the steps from the tutorial I mentioned above:

Thank youuu in advance for your help!

Hi @MiriamGorostidi,

Thanks for sharing those! Your commands seem to be OK, so I'm not sure what's going on. I see you did not run the classify-kraken2 command in verbose mode. Could you re-run it with the --verbose flag and share the log? I'm curious to see what Kraken is saying while it's running. Once you've done that, do you think you could also share the reports qza file (if it's not too large)?

Thanks!

Michal

Hi @misialq !

Yes of course, I am already running it with --verbose. However, where is the .log file saved? How can I read it?

Thank you again!

Hi @MiriamGorostidi ,

When running in --verbose mode, messages are written directly to the terminal and are not saved anywhere, but you can read them there.
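If a file copy is useful for sharing, the terminal output can be captured with standard shell redirection. This is a generic sketch, not a QIIME 2 feature: `run_logged` is a hypothetical wrapper and the log file name is a placeholder.

```shell
# Run a command, show everything it prints (stdout and stderr) in the
# terminal, and save a copy to a log file. The return status is tee's,
# not the wrapped command's.
run_logged() {
    log="$1"
    shift
    "$@" 2>&1 | tee "$log"
}

# Example usage: prefix the usual command, keeping its options unchanged and
# adding --verbose:
#   run_logged kraken2-verbose.log qiime moshpit classify-kraken2 --verbose
```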

Hii!!

Here's part of the output that appears in the terminal (it is almost the same for all the samples: 0 sequences classified):

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: kraken2 --threads 10 --confidence 0.6 --minimum-base-quality 20 --minimum-hit-groups 2 --report-minimizer-data --db /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/b2987de0-d162-427d-b2f0-a59b08fc60c0/data --paired --report /pool0/home/mgorostidi/tmpdirs/q2-Kraken2ReportDirectoryFormat-p0sye63s/11326-Q-76401-0036-2018-02-10.report.txt --output /pool0/home/mgorostidi/tmpdirs/q2-Kraken2OutputDirectoryFormat-zm788ojt/11326-Q-76401-0036-2018-02-10.output.txt /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0036-2018-02-10_38_L001_R1_001.fastq.gz /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0036-2018-02-10_157_L001_R2_001.fastq.gz

Loading database information... done.
188636 sequences (55.78 Mbp) processed in 1.188s (9525.0 Kseq/m, 2816.73 Mbp/m).
0 sequences classified (0.00%)
188636 sequences unclassified (100.00%)

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: kraken2 --threads 10 --confidence 0.6 --minimum-base-quality 20 --minimum-hit-groups 2 --report-minimizer-data --db /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/b2987de0-d162-427d-b2f0-a59b08fc60c0/data --paired --report /pool0/home/mgorostidi/tmpdirs/q2-Kraken2ReportDirectoryFormat-p0sye63s/11326-Q-76401-0036-2018-02-12.report.txt --output /pool0/home/mgorostidi/tmpdirs/q2-Kraken2OutputDirectoryFormat-zm788ojt/11326-Q-76401-0036-2018-02-12.output.txt /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0036-2018-02-12_39_L001_R1_001.fastq.gz /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0036-2018-02-12_158_L001_R2_001.fastq.gz

Loading database information... done.
415307 sequences (123.18 Mbp) processed in 2.423s (10284.0 Kseq/m, 3050.29 Mbp/m).
0 sequences classified (0.00%)
415307 sequences unclassified (100.00%)

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: kraken2 --threads 10 --confidence 0.6 --minimum-base-quality 20 --minimum-hit-groups 2 --report-minimizer-data --db /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/b2987de0-d162-427d-b2f0-a59b08fc60c0/data --paired --report /pool0/home/mgorostidi/tmpdirs/q2-Kraken2ReportDirectoryFormat-p0sye63s/11326-Q-76401-0037-2018-03-12.report.txt --output /pool0/home/mgorostidi/tmpdirs/q2-Kraken2OutputDirectoryFormat-zm788ojt/11326-Q-76401-0037-2018-03-12.output.txt /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0037-2018-03-12_40_L001_R1_001.fastq.gz /pool0/home/mgorostidi/tmpdirs/qiime2/mgorostidi/data/48ac24f2-5421-4327-8efe-3f4bf47cefeb/data/11326-Q-76401-0037-2018-03-12_159_L001_R2_001.fastq.gz

Loading database information... done.
482549 sequences (140.00 Mbp) processed in 2.685s (10784.3 Kseq/m, 3128.71 Mbp/m).
2 sequences classified (0.00%)
482547 sequences unclassified (100.00%)

Hi @MiriamGorostidi,
Sorry for the delay!
Our developers are currently hosting a workshop but they will respond as soon as possible!
Thank you for your patience.
