metagenome_pipeline.py: ValueError: No sequence ids overlap between all three of the input files.

Mike26 · October 22, 2019, 11:06am

Hello all. I updated qiime2-2019.4 to qiime2-2019.7 and the picrust2 qiime plug-in using conda (on an Ubuntu desktop), following the tutorial. In qiime2-2019.4, picrust2 ran smoothly without any errors on my dataset (whole community). However, we are interested in the functions of the differentially abundant OTUs detected by ALDEx2 in R. So I created a separate text file containing only the representative sequences of the differentially abundant OTUs, along with the OTU abundance file, which I successfully converted to biom format and exported as qiime artifacts (.qza). The workflow ran smoothly until I got an error in qiime picrust2 custom-tree-pipeline. This is the same workflow that I used for the whole-community dataset.

Based on this issue (Sequence ids don't overlap at metagenome pipeline step · Issue #16 · picrust/picrust2 · GitHub), the error is likely caused by differences in the sequence identifiers between the repseq and OTU abundance files. I think that in my case the representative sequences are not the problem, because I used the qiime2 dada2 plug-in and I have checked that the sequence IDs in the FASTA file and the table are the same.
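A check like the one described above (that the IDs in the FASTA file and the table match) can be done programmatically rather than by eye. The following is only a minimal sketch and is not part of PICRUSt2 or QIIME 2. The helper names (table_ids, fasta_ids, compare_ids) and the toy data are made up for illustration. It assumes the table is a plain tab-separated file whose first column holds the feature IDs and whose header lines start with "#". Quote characters are deliberately left untouched, so 'asv1' and asv1 count as different IDs.

```python
import csv
import io


def table_ids(tsv_text):
    """Return the feature IDs (first column) of a tab-separated abundance table."""
    # QUOTE_NONE keeps any quote characters as literal parts of the ID.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    ids = []
    for row in reader:
        # Skip blank lines and header/comment lines that start with "#".
        if not row or row[0].startswith("#"):
            continue
        ids.append(row[0])
    return ids


def fasta_ids(fasta_text):
    """Return the sequence IDs (first token after '>') from FASTA text."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def compare_ids(tsv_text, fasta_text):
    """Report which IDs are shared and which appear in only one input."""
    table = set(table_ids(tsv_text))
    fasta = set(fasta_ids(fasta_text))
    return {
        "shared": table & fasta,
        "table_only": table - fasta,
        "fasta_only": fasta - table,
    }


if __name__ == "__main__":
    # Toy data (hypothetical): the table IDs carry stray single quotes.
    toy_table = "#OTU ID\tS1\tS2\n'asv1'\t5\t0\n'asv2'\t3\t7\n"
    toy_fasta = ">asv1\nACGT\n>asv2\nGGCC\n"
    report = compare_ids(toy_table, toy_fasta)
    for name, ids in report.items():
        print(name, sorted(ids))
```

An empty "shared" set in such a report corresponds to the situation the error message describes: none of the IDs in one input can be found in the other.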

I have attached the files and the workflow that I used. Any suggestions on how to resolve this error will be greatly appreciated.

Thank you very much.

(qiime2-2019.7) mmbl@mmbl:~/Desktop/picrust/spongeballs_Aldex2$ qiime picrust2 custom-tree-pipeline --i-table ballspongeOTUtableforpicrust_v2.qza --i-tree insert_ballsponge_picrust/tree.qza --output-dir spongeballs_Aldex2_picrust2 --p-threads 2 --p-hsp-method mp --p-max-nsti 2 --verbose

Running the below commands:
hsp.py -i 16S -t /tmp/tmpo7fbcumv/placed_seqs.tre -p 1 -n -o /tmp/tmpo7fbcumv/picrust2_out/16S_predicted.tsv.gz -m mp
hsp.py -i EC -t /tmp/tmpo7fbcumv/placed_seqs.tre -p 2 -o /tmp/tmpo7fbcumv/picrust2_out/EC_predicted.tsv.gz -m mp
hsp.py -i KO -t /tmp/tmpo7fbcumv/placed_seqs.tre -p 2 -o /tmp/tmpo7fbcumv/picrust2_out/KO_predicted.tsv.gz -m mp
metagenome_pipeline.py -i /tmp/tmpo7fbcumv/intable.biom -m /tmp/tmpo7fbcumv/picrust2_out/16S_predicted.tsv.gz -f /tmp/tmpo7fbcumv/picrust2_out/EC_predicted.tsv.gz -o /tmp/tmpo7fbcumv/picrust2_out/EC_metagenome_out --max_nsti 2.0

Error running this command:
metagenome_pipeline.py -i /tmp/tmpo7fbcumv/intable.biom -m /tmp/tmpo7fbcumv/picrust2_out/16S_predicted.tsv.gz -f /tmp/tmpo7fbcumv/picrust2_out/EC_predicted.tsv.gz -o /tmp/tmpo7fbcumv/picrust2_out/EC_metagenome_out --max_nsti 2.0

Standard output of failed command:
""

Standard error of failed command:
"0 of 46 ASVs were above the max NSTI cut-off of 2.0 and were removed.
Traceback (most recent call last):
  File "/home/mmbl/miniconda3/envs/qiime2-2019.7/bin/metagenome_pipeline.py", line 122, in <module>
    main()
  File "/home/mmbl/miniconda3/envs/qiime2-2019.7/bin/metagenome_pipeline.py", line 104, in main
    skip_norm=args.skip_norm)
  File "/home/mmbl/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/picrust2/metagenome_pipeline.py", line 68, in run_metagenome_pipeline
    pred_marker)
  File "/home/mmbl/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/picrust2/util.py", line 358, in three_df_index_overlap_sort
    "input files.")
ValueError: No sequence ids overlap between all three of the input files.
"

ballspongeOTUtableforpicrust_v2.tsv (4.7 KB) ballspongeOTUtableforpicrust_v2.qza (10.4 KB) ballspongeforpicrust_v2.qza (9.3 KB)

Nicholas_Bokulich · October 23, 2019, 2:13pm

Hi @Mike26,
I am cc:ing @gmdouglas to take a look.
Thanks!

gmdouglas · October 23, 2019, 2:30pm

Hey @Mike26,

I took a look at ballspongeOTUtableforpicrust_v2.tsv and it looks like there are single-quotes around all the ASV and sample ids. I bet this is causing the error since these quotes are probably not present in the ASV FASTA file. If these quotes were added outside of QIIME2 then removing them will hopefully fix this problem.
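The fix suggested above (removing the quotes from the ASV and sample ids) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a PICRUSt2 or QIIME 2 utility. The function names strip_quotes and clean_table are hypothetical, the table is assumed to be plain tab-separated text, and no legitimate id is assumed to begin or end with a quote character.

```python
def strip_quotes(value):
    """Remove surrounding whitespace, then surrounding single or double quotes."""
    return value.strip().strip("'\"")


def clean_table(tsv_text):
    """Return the table text with surrounding quotes removed from every field."""
    cleaned_lines = []
    for line in tsv_text.splitlines():
        # Clean each tab-separated field; numeric counts pass through unchanged.
        fields = [strip_quotes(field) for field in line.split("\t")]
        cleaned_lines.append("\t".join(fields))
    return "\n".join(cleaned_lines) + "\n"


if __name__ == "__main__":
    # Toy data (hypothetical) mimicking a table whose ids were quoted on export.
    raw = "'#OTU ID'\t'S1'\t'S2'\n'asv1'\t5\t0\n'asv2'\t3\t7\n"
    print(clean_table(raw), end="")
```

The cleaned text would then need to be written back to a .tsv file, converted to biom format, and imported as a .qza artifact again before rerunning the pipeline.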

Best,

Gavin

Mike26 · October 24, 2019, 2:44am

Hello @gmdouglas

Thank you for pointing that out. I am not sure how the ASV and sample ids ended up with single quotes, since I created the text file without any. It seems the quotes were added when I opened the text file in Excel and saved it as .tsv.

But yeah, removing the single quotes worked perfectly.

Thank you very much.
