Deblur without 16s filter

qwertz · April 29, 2018, 2:28pm

Hi everyone,

I am trying to keep all Deblur reads, as in the 'all.biom' file described at https://github.com/biocore/deblur, but the Deblur plugin (deblur denoise-16S) outputs only the reads that hit the reference. Is it possible to get the full output without converting the qza file back to fastq and processing it with the deblur workflow?

Cheers,
Xi

wasade · April 30, 2018, 5:31pm

Hi @qwertz,

We have not exposed that output from q2-deblur because all.biom will contain non-target sequences, so we do not recommend using it for subsequent analysis unless you have an explicit question to ask. Right now you'd need to execute deblur directly to obtain that output. If your data are part of or uploaded to Qiita, you would be able to obtain this output directly from that resource.

Best,
Daniel
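
For readers who want to try the standalone route mentioned above, here is a minimal sketch of how such a run might be scripted from Python. The flags (`--seqs-fp`, `--output-dir`, `-t`) and the output file names follow the deblur README. The helper names, input file, output directory, and trim length are hypothetical placeholders, not values from this thread.

```python
import subprocess

# Tables that a standalone `deblur workflow` run writes to its output directory
# (file names as listed in the deblur README).
ALL_TABLE = "all.biom"                      # every sOTU that passed the negative filter
REFERENCE_HIT_TABLE = "reference-hit.biom"  # only sOTUs that also passed the positive filter


def build_deblur_command(seqs_fp, output_dir, trim_length):
    """Return the argument list for a standalone `deblur workflow` run.

    All three arguments are placeholders chosen by the caller.
    """
    return [
        "deblur", "workflow",
        "--seqs-fp", str(seqs_fp),
        "--output-dir", str(output_dir),
        "-t", str(trim_length),
    ]


def run_deblur(seqs_fp, output_dir, trim_length):
    """Run deblur; requires the `deblur` executable to be installed and on PATH."""
    subprocess.run(build_deblur_command(seqs_fp, output_dir, trim_length), check=True)
```

Calling `run_deblur("all_samples.fna", "deblur_output", 150)` would then be expected to leave both `all.biom` and `reference-hit.biom` in `deblur_output`.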

qwertz · May 1, 2018, 12:00pm

Hi @wasade,

thank you very much!

I am trying to analyse the richness of some EMP datasets and additional datasets that are not uploaded to Qiita. As far as I know, the EMP paper used all sequences instead of the reference-hit sequences for this kind of analysis (due to suspicion of missing sequences in the reference database), so I would like to process the other datasets in the same way to facilitate comparison. Would you recommend using all sequences in this setting? (I am a newbie)

Cheers,
Xi

wasade · May 1, 2018, 5:47pm

Hi @qwertz,

That's exciting! You may be interested in redbiom which facilitates searches and retrieval from Qiita.

The EMP only used sequences which passed the positive filter. The positive filter is incredibly permissive and only requires 60% sequence identity. At that level, I'd expect sequences to be retained even if they originated from candidate phyla not present in the reference. The intention with the filter is to keep anything that is putatively like the target (e.g., 16S v4).

Best,
Daniel

qwertz · May 3, 2018, 4:14pm

Hi @wasade,

thank you very much! Just one follow-up question regarding the positive filter: the default similarity threshold sim_thresh in deblurenv seems to be 0.65, but the parameter is not mentioned in q2-deblur. Do they actually use the same default? Are Qiita data also filtered with a 0.65 cutoff?

I am also confused about the EMP dataset, because it seems that only all.biom was kept in the script run_deblur_emp_new.sh (maybe I missed something).

Best
Xi

wasade · May 7, 2018, 6:30pm

@Luke_Thompson, can you comment regarding the use of all.biom in the EMP?

@qwertz, you're right, coverage is 60% and similarity is 65%. q2-deblur uses the same values as deblur, and Qiita relies on the values used by deblur.

Best,
Daniel
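
To make the two thresholds concrete, here is a minimal sketch of the decision rule they imply. The values are the ones quoted in this thread (65% similarity, 60% coverage), not values read from the deblur source. The function name and the inclusive `>=` comparison are assumptions made for illustration. The real filter works on sequence alignments inside deblur, so this only shows how the two cutoffs combine.

```python
# Thresholds as quoted in the thread; expressed as fractions between 0 and 1.
SIM_THRESH = 0.65       # minimum similarity to a reference sequence
COVERAGE_THRESH = 0.60  # minimum fraction of the read covered by the alignment


def passes_positive_filter(similarity, coverage,
                           sim_thresh=SIM_THRESH,
                           coverage_thresh=COVERAGE_THRESH):
    """Return True if a read's best reference hit meets both cutoffs.

    Assumption: cutoffs are inclusive. A read must satisfy both conditions,
    so a highly similar but very short alignment is still rejected.
    """
    return similarity >= sim_thresh and coverage >= coverage_thresh


# Hypothetical (similarity, coverage) pairs for three reads.
hits = {
    "read_1": (0.70, 0.90),  # distant relative, well covered: kept
    "read_2": (0.60, 0.95),  # below the similarity cutoff: dropped
    "read_3": (0.98, 0.40),  # near-identical but short alignment: dropped
}
kept = [name for name, (sim, cov) in hits.items() if passes_positive_filter(sim, cov)]
```

Because both cutoffs are low, a read only has to resemble the target region loosely to be kept, which is why the filter is described as permissive.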

Luke_Thompson · May 9, 2018, 1:35pm

Hi @qwertz, I believe you are correct that all.biom was used for the EMP. However, note that the code that was actually used for the paper is run_deblur_emp_original.sh. The script was written and executed by Amnon Amir. If I recall correctly, the only filtering we did was to remove sOTUs found fewer than 25 times (total count) across all samples in the dataset.

Best,
Luke
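
The abundance filter described above can be sketched as follows. This is an illustration only, not the EMP script: the table is modelled as a plain dictionary of per-sample counts, and the function name and example data are hypothetical. The only detail taken from the thread is the cutoff: sOTUs with a total count below 25 across all samples are removed.

```python
def filter_low_abundance(table, min_total=25):
    """Drop sOTUs whose summed count across all samples is below min_total.

    table: dict mapping an sOTU identifier to a list of per-sample counts.
    Returns a new dict; the input is left unchanged.
    """
    return {
        sotu: counts
        for sotu, counts in table.items()
        if sum(counts) >= min_total
    }


# Hypothetical counts for three sOTUs across three samples.
example_table = {
    "sOTU_A": [10, 10, 5],  # total 25: kept (not fewer than 25)
    "sOTU_B": [3, 0, 1],    # total 4: removed
    "sOTU_C": [0, 24, 0],   # total 24: removed
}
filtered = filter_low_abundance(example_table)
```

Note that the cutoff applies to the total across the whole dataset, so an sOTU that is rare in every individual sample can still be kept if it is widespread.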
