Q2-deblur questions

Hello,
I used q2-deblur to clean my sequences using this command:

deblur workflow --seqs-fp name.fasta --output-dir test -t 420

and I have a few questions about this.

  1. My sequences were demultiplexed and each sequence was labeled with the sample name. After the analysis, the labels disappeared. How can I change this? I would like to keep the sample names linked to each sequence.
  2. My sequences are from euks, not bacteria. Should I use a different command for euks? Not sure I totally got how deblur is working. Sorry!
  3. Can I convert the biom file that I got with deblur using qiime? I used to use Qiime 1.9 but I am new to QIIME 2.

Thank you so much for your help,

Francesca

Hi @ranocchia,
While you installed deblur through the QIIME 2 plugin, you’re not using it through QIIME 2 when you call it this way. The command that you’re running here is part of the deblur software package, which is used by q2-deblur (the QIIME 2 plugin). I realize this is a bit confusing, but the deblur package has its own command line interface, and that’s what you’re accessing here. To use deblur through QIIME 2, you’d call qiime deblur --help (for example).

You’re not required to use deblur through QIIME 2, but if you want to go that route, I recommend that you start by working through the QIIME 2 documentation, as there are a few concepts that you’ll need to be familiar with. The Getting Started page describes what we think is the best path through the documentation.

I’m going to leave your deblur-related questions to @wasade, who is the developer of the q2-deblur plugin, as I’m not very familiar with its methods. @wasade, would you be able to help @ranocchia with the deblur-specific questions here?

Thanks for your interest in QIIME 2 @ranocchia, and sorry for the confusion!

Thanks, @gregcaporaso.

@ranocchia, I’m not sure what you mean by 1); can you provide an example? For 2), you most likely want to specify a different positive filtering database because, by default, the method attempts to retain only those sequences which appear to be 16S. So for instance, if you’re targeting 18S, you may want to use SILVA. For 3), if you are using deblur via q2-deblur, then there isn’t a need to import the resulting files, as they are already QIIME 2 artifacts. If you’re using deblur directly, you will need to use the qiime tools import command, and the semantic type of the BIOM table in this context is FeatureTable[Frequency].
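For the standalone case, the import step might look roughly like the sketch below. The wrapper name import_deblur_table and both file paths are placeholders, and it assumes a recent QIIME 2 release and a BIOM v2.1.0 (HDF5) table. The name of the format option has changed across releases, so check qiime tools import --help for your version.

```shell
# Wrap `qiime tools import` for a BIOM table produced by standalone deblur.
# Assumptions: a recent QIIME 2 release (which uses --input-format) and a
# BIOM v2.1.0 (HDF5) table; both file paths are placeholders.
import_deblur_table() {
    qiime tools import \
        --input-path "$1" \
        --type 'FeatureTable[Frequency]' \
        --input-format BIOMV210Format \
        --output-path "$2"
}

# Usage: import_deblur_table path/to/table.biom table.qza
```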

Best,
Daniel

Thank you @gregcaporaso and @wasade for the clarifications!

For question #1, what I meant was that my sequences (from MiSeq 300x2) are already demultiplexed and merged. I used Pandaseq to merge them and split_libraries.py via QIIME 1.9 to demultiplex them, and obtained a fasta file with all my merged sequences, which I am using as the input file for deblur. In this input file, the header of each sequence looks like this:

Za2.slurry_2 M02542:199:000000000-AW0BR:1:1101:14785:1215:GTGAAA orig_bc=AGAGGATT new_bc=AGAGGATT bc_diffs=0

After I run deblur, there is no header anymore, only the symbol > followed directly by the sequence, like this:

AGAGCTAAGCTGCGGTAATTCCAGCTCTG…

Also, I noticed today that the output sequences are double the length of what they should be. Is deblur merging them?
My sequences should be about 480 bp, at most 500 bp, long, and now I get sequences 840 bp long. I am not sure what I am doing wrong. Can you help me figure this out?

For question #2, I am now using the option --pos-ref-fp $database.fasta with our custom database. It is compatible with QIIME 1.9; is it okay for deblur as well?

I am sorry, I know I have a lot of questions, but I am totally new to deblur, and I feel a little lost.

Thank you very much for your help, it is really appreciated.

Francesca

The output sequence file from Deblur only contains the observed deblurred sequences. The output BIOM table, however, contains the associations between which sample contained which sequence, and at what frequency. I think it is this latter object which you’re interested in.
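To see which sample IDs are present in the input, and so which sample names to expect in the table, a rough sketch like the following counts reads per sample from the FASTA headers. It assumes split_libraries-style headers of the form >SampleID_N, where the sample ID is everything before the last underscore of the first token; reads_per_sample is just an illustrative helper name.

```shell
# Count reads per sample in a split_libraries-style FASTA, where each header
# looks like ">SampleID_N rest of header". Assumption: the sample ID is
# everything before the LAST underscore of the first whitespace-delimited token.
reads_per_sample() {
    awk '/^>/ {
        id = substr($1, 2)            # first token without the leading ">"
        sub(/_[^_]*$/, "", id)        # drop the trailing "_<read number>"
        counts[id]++
    }
    END { for (s in counts) print s "\t" counts[s] }' "$1" | sort
}

# Usage: reads_per_sample name.fasta
```

For a header such as the Za2.slurry_2 one quoted earlier in the thread, this would report the read under the sample ID Za2.slurry.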

Deblur does not merge sequences, and the output sequences must exist in the input set. Are you positive that the 840 nt output sequence you’re finding does not exist in your input dataset?
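One way to check this is to compare the sequence length distribution of the input and output FASTA files. The sketch below is only an illustration (fasta_length_histogram is a made-up helper name); it skips header lines, so only the sequence lines themselves count towards the length, and it joins wrapped sequences before measuring them.

```shell
# Print "length<TAB>count" for every sequence length in a FASTA file.
# Header lines (starting with ">") are skipped, and wrapped sequences are
# joined before their length is measured.
fasta_length_histogram() {
    awk '
        /^>/ { if (seen) hist[len]++; seen = 1; len = 0; next }
        { sub(/\r$/, ""); len += length($0) }   # strip a trailing CR, then add
        END {
            if (seen) hist[len]++
            for (l in hist) print l "\t" hist[l]
        }' "$1" | sort -n
}

# Usage: fasta_length_histogram name.fasta
```

Running it on both the input file and the deblurred output shows whether any input read is really as long as the longest output sequence.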

Regarding the reference, it should only be necessary to change the reference if you are using non-16S data. If that is the case, then any FASTA file should be fine.

Best,
Daniel

Perfect, thank you!

I will try to figure out why I get 800 bp sequences in the output file. My sequences are 18S reads from MiSeq 300x2 bp, so they cannot be longer than 600 bp. It is super weird, and I am not sure why; I am surely doing something wrong. Anyway, thank you very much for your help. If I find out why they are so long, I’ll post it.

Thanks again

Francesca

Great, please let me know if I can help further!

Best,
Daniel