Deblurring failed for a specific file

Hi all!

I am working with demultiplexed 16S Illumina paired-end files. After quality filtering with quality-filter q-score-joined, I am now deblurring them. The process hasn't finished yet, but the log shows that it failed to deblur the first file (the others have been deblurred properly so far).
I am attaching an extract of the log where file AD12 failed but AD9 was OK. Also, the file AD12_2_L001_R1_001.fastq.gz.trim.derep.no_artifacts.msa is empty. Will I get rep seqs anyway, or should I terminate the process? How can I fix this error in AD12? Thank you very much for your help!

INFO(140294221977408)2018-03-08 09:46:18,286:*************************
INFO(140294221977408)2018-03-08 09:46:18,286:deblurring started
WARNING(140294221977408)2018-03-08 09:46:18,286:deblur version 1.0.3 workflow started on /tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data
WARNING(140294221977408)2018-03-08 09:46:18,286:parameters: {'pos_ref_db_fp': (), 'logger': <logging.Logger object at 0x7f98a30ddd68>, 'pos_ref_fp': (), 'seqs_fp': '/tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data', 'mean_error': 0.005, 'log_level': 2, 'log_file': '/home/mqbpqoa4/qiime2-afon-goch/deblur.log', 'output_dir': '/tmp/tmpw14o6ewd', 'threads_per_sample': 1, 'neg_ref_db_fp': (), 'min_reads': 10, 'trim_length': 460, 'is_worker_thread': None, 'indel_prob': 0.01, 'keep_tmp_files': True, 'jobs_to_start': 1, 'left_trim_length': 0, 'neg_ref_fp': (), 'indel_max': 3, 'error_dist': [1, 0.06, 0.02, 0.02, 0.01, 0.005, 0.005, 0.005, 0.001, 0.001, 0.001, 0.0005], 'overwrite': True, 'min_size': 2}
INFO(140294221977408)2018-03-08 09:46:18,286:error_dist is : [1, 0.06, 0.02, 0.02, 0.01, 0.005, 0.005, 0.005, 0.001, 0.001, 0.001, 0.0005]
INFO(140294221977408)2018-03-08 09:46:18,286:deblur main program started
INFO(140294221977408)2018-03-08 09:46:18,286:processing directory /tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data
INFO(140294221977408)2018-03-08 09:46:18,287:building negative db sortmerna index files
INFO(140294221977408)2018-03-08 09:46:18,287:build_index_sortmerna files ['/home/mqbpqoa4/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/deblur/support_files/artifacts.fa'] to dir /tmp/tmpw14o6ewd/deblur_working_dir
INFO(140294221977408)2018-03-08 09:46:19,717:building positive db sortmerna index files
INFO(140294221977408)2018-03-08 09:46:19,718:build_index_sortmerna files ['/home/mqbpqoa4/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/deblur/support_files/88_otus.fasta'] to dir /tmp/tmpw14o6ewd/deblur_working_dir
INFO(140294221977408)2018-03-08 09:47:04,656:processing per sample fasta files
INFO(140294221977408)2018-03-08 09:47:04,656:--------------------------------------------------------
INFO(140294221977408)2018-03-08 09:47:04,656:launch_workflow for file /tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data/AD12_2_L001_R1_001.fastq.gz
INFO(140294221977408)2018-03-08 09:56:03,807:dereplicate seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD12_2_L001_R1_001.fastq.gz.trim
INFO(140294221977408)2018-03-08 09:56:05,884:remove_artifacts_seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD12_2_L001_R1_001.fastq.gz.trim.derep
INFO(140294221977408)2018-03-08 09:56:10,989:total sequences 61636, passing sequences 61636, failing sequences 0
INFO(140294221977408)2018-03-08 09:56:10,989:multiple_sequence_alignment seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD12_2_L001_R1_001.fastq.gz.trim.derep.no_artifacts
INFO(140294221977408)2018-03-08 10:54:29,247:msa failed for file /tmp/tmpw14o6ewd/deblur_working_dir/AD12_2_L001_R1_001.fastq.gz.trim.derep.no_artifacts (maybe only 1 read?)
WARNING(140294221977408)2018-03-08 10:54:31,985:msa failed. aborting
WARNING(140294221977408)2018-03-08 10:54:32,122: failed for file /tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data/AD12_2_L001_R1_001.fastq.gz
INFO(140294221977408)2018-03-08 10:54:32,558:--------------------------------------------------------
INFO(140294221977408)2018-03-08 10:54:32,558:launch_workflow for file /tmp/qiime2-archive-s1dgm_48/73f122f8-fd87-44fb-b921-7cce1b19a361/data/AD9_10_L001_R1_001.fastq.gz
INFO(140294221977408)2018-03-08 10:56:29,008:dereplicate seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD9_10_L001_R1_001.fastq.gz.trim
INFO(140294221977408)2018-03-08 10:56:29,653:remove_artifacts_seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD9_10_L001_R1_001.fastq.gz.trim.derep
INFO(140294221977408)2018-03-08 10:56:31,618:total sequences 14661, passing sequences 14661, failing sequences 0
INFO(140294221977408)2018-03-08 10:56:31,619:multiple_sequence_alignment seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD9_10_L001_R1_001.fastq.gz.trim.derep.no_artifacts
INFO(140294221977408)2018-03-08 11:03:30,168:deblurring 14661 sequences
INFO(140294221977408)2018-03-08 11:09:03,676:6094 unique sequences left following deblurring
INFO(140294221977408)2018-03-08 11:09:03,825:remove_chimeras_denovo_from_seqs seqs file /tmp/tmpw14o6ewd/deblur_working_dir/AD9_10_L001_R1_001.fastq.gz.trim.derep.no_artifacts.msa.deblurto working dir /tmp/tmpw14o6ewd/deblur_working_dir
INFO(140294221977408)2018-03-08 11:09:16,443:finished processing file
INFO(140294221977408)2018-03-08 11:09:16,462:--------------------------------------------------------

Hi @Oscar,

Is it possible AD12 is full of artifacts? I'd let the process complete and then see how things are.
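Once the run has gone further, a per-file summary of the log can make it easier to see how things are. Below is a minimal Python sketch, not part of Deblur itself. It assumes the log keeps the line layout shown in the extract above (`LEVEL(thread)timestamp:message`) and the three messages seen there (`launch_workflow for file`, `failed for file`, `finished processing file`). The function name `summarize_deblur_log` is made up for illustration.

```python
import os
import re
from collections import OrderedDict

# Layout assumed from the log extract: LEVEL(thread-id)YYYY-MM-DD HH:MM:SS,mmm:message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\(\d+\)"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):"
    r"(?P<msg>.*)$"
)
LAUNCH = "launch_workflow for file "


def summarize_deblur_log(lines):
    """Map each input file name to 'ok', 'failed' or 'in progress'.

    Hypothetical helper: it relies only on the messages visible in the
    log extract, so other Deblur versions may need different patterns.
    """
    status = OrderedDict()
    current = None
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip anything that is not a log record
        msg = match.group("msg").strip()
        if msg.startswith(LAUNCH):
            # A new per-sample workflow starts; remember which file it is.
            current = os.path.basename(msg[len(LAUNCH):])
            status[current] = "in progress"
        elif current is not None and "failed for file" in msg:
            status[current] = "failed"
        elif current is not None and msg == "finished processing file":
            status[current] = "ok"
    return status

# Example use, assuming the log was written to deblur.log:
#     with open("deblur.log") as handle:
#         for name, state in summarize_deblur_log(handle).items():
#             print(state, name)
```

A file that is still listed as "in progress" long after its `launch_workflow` line is the one the run was working on when the log stopped updating.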

Best,
Daniel


I tried to let the process complete, but after deblurring 6 of the 12 files it got stuck. It still appears to be running in the terminal, but the log hasn't been updated in 24 hours, and deblur no longer shows up when I use the top command. I guess this is a different topic, but any suggestions? This is the third time it has happened.

Okay, so are some of the Deblur processes still running according to top? Behind the scenes, there will be various calls to programs like mafft, sortmerna and vsearch, which should show up in top while they are running.
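If scanning top by hand is awkward, the same check can be scripted. This is a rough Python sketch that lists processes whose command line mentions deblur, mafft, sortmerna or vsearch. It assumes a POSIX-style `ps` that accepts `-A` and `-o`. Matching on the full command line rather than the process name is also an assumption, made because some tools run through wrapper scripts. The function names are invented for this example.

```python
import os
import subprocess

# Programs named in this thread; extend the tuple if needed.
HELPERS = ("deblur", "mafft", "sortmerna", "vsearch")


def match_helpers(ps_output, names=HELPERS, skip_pid=None):
    """Return (pid, matched_name, command_line) for lines mentioning a helper.

    `ps_output` is expected to hold one 'PID COMMAND...' entry per line.
    """
    hits = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # ignore blank or malformed lines
        pid, args = int(parts[0]), parts[1]
        if pid == skip_pid:
            continue  # do not report this script itself
        for name in names:
            if name in args:
                hits.append((pid, name, args))
                break
    return hits


def running_helpers():
    """Ask ps for all processes and filter them with match_helpers."""
    result = subprocess.run(
        # 'pid=' and 'args=' print the two columns without a header row.
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return match_helpers(result.stdout, skip_pid=os.getpid())

# Example use:
#     for pid, name, args in running_helpers():
#         print(pid, name, args)
```

An empty result while the terminal still looks busy would suggest that no helper program is doing any work at that moment.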

When you have a moment, would you be able to describe any upstream processing that was performed?

For runtime, I've previously run Deblur on the Yatsunenko et al. 2012 dataset, which has about 2 billion sequences over ~500 samples, and IIRC it finished in less than a day (using a few threads, but I don't recall the specifics offhand). So I am a bit surprised to hear that the execution time is so high, which suggests there may be something unexpected with the inputs. Would you be able to try a conservative trim length of, say, 100 nt?
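Before picking a trim length, it can help to see how many reads in a sample are at least that long, since Deblur drops reads shorter than the trim length (the log above shows 'trim_length': 460). The Python sketch below counts them for one gzipped FASTQ file. It assumes plain four-line FASTQ records with no wrapped sequence lines, and the function name and file path are placeholders.

```python
import gzip


def count_reads_at_least(fastq_gz_path, trim_length):
    """Return (total_reads, reads_with_length >= trim_length).

    Assumes each record occupies exactly four lines
    (header, sequence, '+', qualities), as is typical for Illumina output.
    """
    total = 0
    long_enough = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of each record is the sequence
                total += 1
                if len(line.strip()) >= trim_length:
                    long_enough += 1
    return total, long_enough

# Example use with a placeholder path; compare a long and a short trim length:
#     for length in (460, 100):
#         print(length, count_reads_at_least("sample.fastq.gz", length))
```

If only a small fraction of reads reach the chosen length, a shorter trim length keeps more data per sample.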

Best,
Daniel
