Deblur - no sequences hitting the reference in mock community

cathb · June 13, 2018, 2:53pm

Hi,

I'm using deblur on some 16S amplicon sequences. They are paired end, and I have gone through the following steps: trim adaptors and primers using cutadapt --> merge read pairs using vsearch join-pairs --> deblur.

After the deblur step, most of my samples look fine, i.e. there are a couple of hundred unique deblur sequences that hit the reference database. The exception is my mock community sample: here I have 19 sequences coming out of the deblur algorithm, but none are hitting the reference database.

I checked the quality of the merged reads and that looks good (mostly Q35 and above across the read). I've also checked the sequence lengths, and they are mostly 250 and above (only 50 are shorter).
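For anyone wanting to repeat the length check, here is a minimal Python sketch that counts reads shorter than the trim length. It assumes a plain four-line-per-record FASTQ; the function name `count_short_reads` and the toy records are made up for illustration. The count matters because deblur discards reads shorter than the trim length.

```python
import io


def count_short_reads(handle, min_length=250):
    """Return (total_reads, short_reads) for a four-line-per-record FASTQ handle."""
    total = short = 0
    for line_number, line in enumerate(handle):
        # In a four-line FASTQ record, the sequence is the second line.
        if line_number % 4 == 1:
            total += 1
            if len(line.strip()) < min_length:
                short += 1
    return total, short


# Toy data (hypothetical): one 260 nt read and one 120 nt read.
records = [
    "@read1", "A" * 260, "+", "I" * 260,
    "@read2", "C" * 120, "+", "I" * 120,
]
toy_fastq = io.StringIO("\n".join(records) + "\n")
print(count_short_reads(toy_fastq))  # (2, 1)
```

For a real gzipped file, the handle could be opened with `gzip.open(path, "rt")` instead of the in-memory toy data.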

I've previously used this workflow with the same mock community and had no problems, so was worried I had mislabelled the sample, but I extracted the merged reads for the mock community and blasted them against a database of the species the mock community contains, and 99.9% of sequences have 97% similarity or higher to one of these sequences.
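A check like this can be scripted roughly as follows. This is only a sketch: it assumes BLAST tabular output (`-outfmt 6`) with the default columns, where column 1 is the query ID and column 3 is percent identity. The function name and the toy rows are hypothetical, not taken from the actual run.

```python
import csv
import io


def fraction_at_or_above(handle, min_identity=97.0):
    """Fraction of query reads whose best hit has percent identity >= min_identity."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        query, identity = row[0], float(row[2])
        # Keep only the highest identity seen for each query read.
        best[query] = max(identity, best.get(query, 0.0))
    if not best:
        return 0.0
    passing = sum(1 for identity in best.values() if identity >= min_identity)
    return passing / len(best)


# Toy rows (hypothetical): read1 has a 99.6% best hit, read2 only 96.0%.
toy_hits = io.StringIO(
    "read1\tref_A\t99.6\t250\t1\t0\t1\t250\t1\t250\t1e-130\t455\n"
    "read1\tref_B\t95.2\t250\t12\t0\t1\t250\t1\t250\t1e-110\t390\n"
    "read2\tref_B\t96.0\t250\t10\t0\t1\t250\t1\t250\t1e-112\t400\n"
)
print(fraction_at_or_above(toy_hits))  # 0.5
```

Reads with no hit at all do not appear in tabular output, so the denominator here is reads with at least one hit, not all reads in the sample.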

I'm not sure how to troubleshoot this - is there a way to get the deblurred seqs before the stage of checking against the reference database?

Here is the output for that sample from the deblur log:

INFO(47077780597696)2018-06-13 09:58:27,691:--------------------------------------------------------
INFO(47077780597696)2018-06-13 09:58:27,691:launch_workflow for file /tmp/qiime2-archive-ndrxwqdx/92a4a30a-82b4-4d81-88c0-199a895f127d/data/HMP_mock_2_1_L001_R1_001.fastq.gz
INFO(47077780597696)2018-06-13 09:58:47,836:dereplicate seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim
INFO(47077780597696)2018-06-13 09:58:47,914:remove_artifacts_seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep
INFO(47077780597696)2018-06-13 09:58:48,484:total sequences 1096, passing sequences 1096, failing sequences 0
INFO(47077780597696)2018-06-13 09:58:48,484:multiple_sequence_alignment seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep.no_artifacts
INFO(47077780597696)2018-06-13 09:58:51,314:deblurring 1096 sequences
INFO(47077780597696)2018-06-13 09:58:51,398:19 unique sequences left following deblurring
INFO(47077780597696)2018-06-13 09:58:51,399:remove_chimeras_denovo_from_seqs seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep.no_artifacts.msa.deblurto working dir /tmp/tmp183f0y2y/deblur_working_dir
INFO(47077780597696)2018-06-13 09:58:51,433:finished processing file
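The counts in a log like this can be pulled out with a few regular expressions. The sketch below is only illustrative: the patterns are written against the lines shown above, and the function name `summarize_deblur_log` is made up, not part of deblur.

```python
import re

# Patterns matched against the log wording shown above.
LOG_PATTERNS = {
    "total": re.compile(r"total sequences (\d+)"),
    "passing": re.compile(r"passing sequences (\d+)"),
    "failing": re.compile(r"failing sequences (\d+)"),
    "after_deblur": re.compile(r"(\d+) unique sequences left following deblurring"),
}


def summarize_deblur_log(lines):
    """Collect the per-step counts found in the log lines for a single sample."""
    summary = {}
    for line in lines:
        for key, pattern in LOG_PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = int(match.group(1))
    return summary


log_excerpt = [
    "INFO(47077780597696)2018-06-13 09:58:48,484:total sequences 1096, "
    "passing sequences 1096, failing sequences 0",
    "INFO(47077780597696)2018-06-13 09:58:51,398:19 unique sequences left "
    "following deblurring",
]
print(summarize_deblur_log(log_excerpt))
# {'total': 1096, 'passing': 1096, 'failing': 0, 'after_deblur': 19}
```

If a log covers several samples, later values overwrite earlier ones, so the lines would need splitting per sample first (for example at each `launch_workflow` line).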

And here is the command I used for the deblur step

qiime deblur denoise-16S --i-demultiplexed-seqs merged_seqs.qza --p-trim-length 250 --p-sample-stats --o-representative-sequences rep-seqs-deblur.qza --o-table table-deblur.qza --o-stats deblur-stats.qza

And finally, the summary from the stats table for the mock sample and a couple of others for comparison.

Any advice on how to troubleshoot what is happening here would be much appreciated.

Thanks!

Cath

thermokarst · June 15, 2018, 1:15pm

Hi there @cathb!

Hmm - I'll be honest, I am not sure what is going on here, either - pinging @wasade for some advice.

You won't be able to get those files easily through q2-deblur, but you could re-run the command using deblur directly, since deblur is already available in your Q2 environment. That is a bit of a pain, because you would need to export your demultiplexed sequences first. Sorry I don't have a better answer for you at the moment. Stay tuned! :qiime2:
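As a rough sketch of what running deblur directly might look like, the Python snippet below only builds the command line and prints it, without running anything. The paths `exported_seqs` and `deblur_out` are hypothetical, and the options are worth checking against `deblur workflow --help` for the installed version.

```python
import shlex


def deblur_workflow_command(seqs_fp, output_dir, trim_length=250, keep_tmp_files=True):
    """Build (but do not run) a `deblur workflow` command as a list of arguments."""
    command = [
        "deblur", "workflow",
        "--seqs-fp", seqs_fp,
        "--output-dir", output_dir,
        "-t", str(trim_length),
    ]
    if keep_tmp_files:
        # Asks deblur to keep its temporary files, which is useful for debugging.
        command.append("--keep-tmp-files")
    return command


# Hypothetical paths: a directory of exported per-sample FASTQ files.
command = deblur_workflow_command("exported_seqs", "deblur_out")
print(shlex.join(command))
# deblur workflow --seqs-fp exported_seqs --output-dir deblur_out -t 250 --keep-tmp-files
```

The demultiplexed sequences would first need to be exported from the artifact with `qiime tools export`. The list could then be passed to `subprocess.run(command, check=True)` inside the activated environment.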

wasade · June 15, 2018, 7:10pm

Thanks, @thermokarst.

@cathb, I'm also not sure yet what's going on. It is possible to disable the "positive" filter when running deblur directly (i.e., not through q2-deblur).

Would it be possible to share merged_seqs.qza with me (or at least just the data for the mock sample)? I'd be curious to investigate the sequences.

Best,
Daniel

cathb · June 16, 2018, 1:06am

Hi @thermokarst and @wasade,

Thanks for your responses. As an update, when I look at the feature table, there are still 19 different features for that mock community sample. So even though it looks like nothing is passing filter from the deblur summary table, they are still in the feature table after all.

I used an inappropriate sample name when demultiplexing outside of qiime2 (it had underscores: HMP_mock2) - could this cause any funny behaviour?

In any case, I still have the sequences in the feature table, so I assume it should be fine to move on with the analysis. Thanks for your help.

Cath

cathb · June 18, 2018, 12:36am

Apologies, I forgot to attach the merged_seqs.qza file. I've linked to the file below - you'll need to enter the password "password" to access the file.

https://cloudstor.aarnet.edu.au/plus/s/4wCQjNuiQNp8smU

thermokarst · July 12, 2018, 3:28pm

Hey there @cathb - sorry it took so long to get back to you on this! I think you might've found a bug!

It looks like the underscores in the sample IDs are breaking some part of the pipeline --- if I reprocess with modified sample IDs:

[Screenshot: results after reprocessing with modified sample IDs]

Looks like with a modified sample ID the reads all hit the reference. I opened up an issue for this bug here.

In the meantime, it looks like you are good to go with underscore-free sample IDs.
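One possible way to generate underscore-free IDs before importing is sketched below in Python. The helper name `sanitize_sample_ids` and the choice of a period as the replacement are assumptions for illustration; QIIME 2's metadata documentation recommends identifiers made of ASCII alphanumerics, periods, and dashes. `HMP_mock2` is the ID from this thread, while `soil.1` is a made-up second sample.

```python
def sanitize_sample_ids(sample_ids, replacement="."):
    """Map each sample ID to a version without underscores, refusing collisions."""
    renamed = {}
    seen = set()
    for sample_id in sample_ids:
        new_id = sample_id.replace("_", replacement)
        if new_id in seen:
            # Two different original IDs must not collapse into the same new ID.
            raise ValueError(f"Renaming {sample_id!r} would duplicate {new_id!r}")
        seen.add(new_id)
        renamed[sample_id] = new_id
    return renamed


print(sanitize_sample_ids(["HMP_mock2", "soil.1"]))
# {'HMP_mock2': 'HMP.mock2', 'soil.1': 'soil.1'}
```

The same mapping would need to be applied to the sample metadata so that the IDs still match after renaming.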

Thanks for being patient! :qiime2:

PS - thanks @wasade for the suggestion of looking into the sample IDs!

cathb · July 13, 2018, 4:48pm

Hi @thermokarst - thanks so much for tracking down the issue! Much appreciated.

Cath

system · August 13, 2018, 10:56pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.