Hi,
I'm using deblur on some 16S amplicon sequences. They are paired-end, and I have gone through the following steps: trim adaptors and primers using cutadapt --> merge read pairs using vsearch join-pairs --> deblur.
After the deblur step, most of my samples look fine, i.e. there are a couple of hundred unique deblur sequences that hit the reference database. The exception is my mock community sample: here 19 sequences come out of the deblur algorithm, but none of them hit the reference database.
I checked the quality of the merged reads and it looks good (mostly Q35 and above across the read). I've also checked the sequence lengths, and they are mostly 250 and above (only 50 are shorter).
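For anyone wanting to repeat the length check, a minimal sketch might look like the following. It is not the exact script used here: it assumes a plain four-line-per-record FASTQ, and the function names and the file name in the usage comment are hypothetical placeholders. The default threshold of 250 simply mirrors the --p-trim-length value used in the deblur command.

```python
import gzip
from collections import Counter


def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def summarise_lengths(handle, trim_length=250):
    """Count reads at or above trim_length and reads shorter than it."""
    counts = Counter()
    for n in read_lengths(handle):
        counts["kept" if n >= trim_length else "too_short"] += 1
    return counts


# Hypothetical usage; the file name is a placeholder, not a real path.
# with gzip.open("mock_merged.fastq.gz", "rt") as fh:
#     print(summarise_lengths(fh))
```

Reads counted as "too_short" are the ones that would not reach the chosen trim length.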
I've previously used this workflow with the same mock community and had no problems, so I was worried I had mislabelled the sample. However, I extracted the merged reads for the mock community and BLASTed them against a database of the species the mock community contains, and 99.9% of sequences have 97% similarity or higher to one of these sequences.
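The similarity figure can be reproduced from BLAST tabular output with something like the sketch below. It assumes the standard tabular format (-outfmt 6), where column 1 is the query ID and column 3 is the percent identity, and it keeps only the best hit per read. The function name is hypothetical, and reads with no hit at all do not appear in the file, so they are not counted in the denominator.

```python
import csv


def fraction_at_or_above(handle, min_identity=97.0):
    """Return the fraction of queries whose best hit reaches min_identity percent.

    Assumes BLAST tabular output (-outfmt 6): query ID in column 1,
    percent identity in column 3.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, identity = row[0], float(row[2])
        best[query] = max(identity, best.get(query, 0.0))
    if not best:
        return 0.0
    return sum(v >= min_identity for v in best.values()) / len(best)


# Hypothetical usage; the file name is a placeholder.
# with open("mock_vs_reference.tsv") as fh:
#     print(fraction_at_or_above(fh))
```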
I'm not sure how to troubleshoot this. Is there a way to get the deblurred seqs before the stage of checking against the reference database?
Here is the output for that sample from the deblur log:
INFO(47077780597696)2018-06-13 09:58:27,691:--------------------------------------------------------
INFO(47077780597696)2018-06-13 09:58:27,691:launch_workflow for file /tmp/qiime2-archive-ndrxwqdx/92a4a30a-82b4-4d81-88c0-199a895f127d/data/HMP_mock_2_1_L001_R1_001.fastq.gz
INFO(47077780597696)2018-06-13 09:58:47,836:dereplicate seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim
INFO(47077780597696)2018-06-13 09:58:47,914:remove_artifacts_seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep
INFO(47077780597696)2018-06-13 09:58:48,484:total sequences 1096, passing sequences 1096, failing sequences 0
INFO(47077780597696)2018-06-13 09:58:48,484:multiple_sequence_alignment seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep.no_artifacts
INFO(47077780597696)2018-06-13 09:58:51,314:deblurring 1096 sequences
INFO(47077780597696)2018-06-13 09:58:51,398:19 unique sequences left following deblurring
INFO(47077780597696)2018-06-13 09:58:51,399:remove_chimeras_denovo_from_seqs seqs file /tmp/tmp183f0y2y/deblur_working_dir/HMP_mock_2_1_L001_R1_001.fastq.gz.trim.derep.no_artifacts.msa.deblurto working dir /tmp/tmp183f0y2y/deblur_working_dir
INFO(47077780597696)2018-06-13 09:58:51,433:finished processing file
And here is the command I used for the deblur step:
qiime deblur denoise-16S --i-demultiplexed-seqs merged_seqs.qza --p-trim-length 250 --p-sample-stats --o-representative-sequences rep-seqs-deblur.qza --o-table table-deblur.qza --o-stats deblur-stats.qza
And finally, the summary from the stats table for the mock sample and a couple of others for comparison.
Any advice on how to troubleshoot what is happening here would be much appreciated.
Thanks!
Cath