Switching from QIIME 1 to QIIME 2 with MiSeq Data and using Deblur

Dear Whom It May Concern,

I am switching from QIIME 1 to QIIME 2 and have some questions about importing and downstream analysis of my data. From my sequencing company I have multiple MiSeq 16S V4 runs; some samples share barcodes because they were sequenced on different runs. My raw data consists of R1 and R2 sequences for each sample. The primer sequences are already removed, but I do not have the barcodes file required for the import tutorial. I was also hoping to use Deblur in downstream analysis, but I read on another topic thread that it only works with single-end data. My main question is: how do I actually import this data into QIIME 2 in a way that can be used for Deblur analysis down the line? Can I still use the SILVA database?

Thank you very much for your time and help.

Sincerely,

David Bradshaw

I used the following scripts in QIIME 1 on each run, then concatenated the results into one fna file and continued down the pipeline to de novo sequencing using the SILVA database:

rule multiple_join_paired_ends:
    input:
        config.get("input_dir","")
    output:
        config['output_dir'],
        expand(config['output_dir'] + "/{sample}_R1/fastqjoin.join.fastq",sample=sample_list)
    params:
        in_dir=config.get("input_dir",""),
        out_dir=config.get("output_dir",""),
        r1=config.get("read1_suffix","_R1"),
        r2=config.get("read2_suffix","_R2")
    shell:
        "multiple_join_paired_ends.py -i {params.in_dir} -o {params.out_dir} --read1_indicator {params.r1} --read2_indicator {params.r2}"

rule create_1_dir:
    input:
        config['output_dir']
    output:
        config['output_dir'] + "_1"
    shell:
        "mkdir -p {output}"


rule move_join_fastq:
    input:
        config['output_dir'] + "_1",
        config['output_dir'] + "/{sample}_R1/fastqjoin.join.fastq"
    output:
        config['output_dir'] + "_1/{sample}_R1.fastq"
    shell:
        "mv {input[1]} {output}"

rule multiple_split_libraries_fastq:
    input:
        dir=config['output_dir'] + "_1",
        seqs=expand(config['output_dir'] + "_1/{sample}_R1.fastq",sample=sample_list)
    output:
        dir=config['output_dir'] + "_2",
        seqs=config['output_dir'] + "_2/seqs.fna"
    params:
        bc_type=config.get("barcode_type","not-barcoded")
    shell:
        "multiple_split_libraries_fastq.py -m sampleid_by_file -i {input.dir} -o {output.dir}"

Hi David,

You are right, deblur only works with the forward sequence because it doesn’t handle pair joining on the fly, so your approach of joining before running deblur is the right one.

Now, you can import your files into Q2 via a FASTQ manifest, and you can specify the database of your preference with the denoise-other option. I suggest checking: qiime deblur denoise-other --help.
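As a rough illustration of the manifest approach, here is a minimal sketch that builds a single-end manifest for already-joined reads. The function name is hypothetical, and two things are assumptions rather than anything stated in this thread: the tab-separated "V2" manifest layout (header `sample-id` and `absolute-filepath`), and that the sample ID is the file name up to `_R1`, as in the `{sample}_R1.fastq` files produced by the Snakemake rules above.

```python
from pathlib import Path


def build_single_end_manifest(fastq_paths):
    """Return manifest text with one row per joined FASTQ file.

    Assumptions (not confirmed in the thread):
    - tab-separated manifest with header "sample-id<TAB>absolute-filepath"
    - sample ID = file name up to the first "_R1"
    """
    rows = ["sample-id\tabsolute-filepath"]
    seen = set()
    for path in sorted(Path(p) for p in fastq_paths):
        sample_id = path.name.split("_R1")[0]
        # The same sample ID can show up in several runs; stop instead of
        # silently writing two rows with the same ID.
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)
        rows.append(f"{sample_id}\t{path.resolve()}")
    return "\n".join(rows) + "\n"
```

The returned text can be written to a file and passed to `qiime tools import` with `--type 'SampleData[SequencesWithQuality]'` and `--input-format SingleEndFastqManifestPhred33V2`. Older releases expect a comma-separated manifest with a `direction` column instead, so it is worth checking `qiime tools import --help` for your version.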

Hope this helps.

Dear Antonio,

Thank you for the help. Should I use the paired-end version of the FASTQ manifest to import the data, with a script that takes care of joining the paired ends later? Or do I first join the paired ends using something like QIIME 1 and then use the single-end version instead, so that I can use deblur in later steps?

What are the main differences between DADA2 and deblur? Do you have a preference? Sorry if these are inappropriate questions for the forum; I understand if you cannot answer.

Does it matter if my samples are a different file type (picture below)?

Thank you very much for your time and help,

David Bradshaw

Well, I think it’s always hard to say which tool is “better” (for any step), because at the end of the day it depends on the kind of samples you are working on, the kind of analysis you want to do, and the full analytical methodology (including wet lab) that you plan to carry out. Anyway, my suggestion is to check both papers and make your decision based on that. Another option is to check what current publications with datasets similar to yours are doing and try to follow that pipeline.

Now, one of those decisions is whether you want to work with a single read (forward) or both reads (forward/reverse). DADA2 has a joining step (I would suggest checking its assumptions) and deblur doesn’t. Thus, if you want to use deblur with your joined sequences, the only option (currently) is to do the joining outside of Q2.

Finally, for the compression/import: if you use a manifest file, Q2 will detect and automatically gzip the files that are not gzipped, but if you import a directory, the files need to be compressed already.
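For the directory case, a minimal sketch of compressing the files beforehand using only the Python standard library. The function name is hypothetical, and it assumes the uncompressed files end in `.fastq`.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq_files(directory):
    """Gzip every *.fastq file in `directory` that has no .fastq.gz yet.

    Returns the list of newly created .fastq.gz paths. The original files
    are left in place; removing them is left to the caller.
    """
    created = []
    for src in sorted(Path(directory).glob("*.fastq")):
        dest = src.with_name(src.name + ".gz")
        if dest.exists():
            continue  # a compressed copy is already there
        with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        created.append(dest)
    return created
```

Keeping the originals makes the step safe to re-run; delete them only after checking the compressed copies.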

Again, hope this helps.

Dear Antonio Pena,

That is very helpful, thank you. I also wanted to run an alternative by you, since I still want to do chimera checking on my samples before deblur analysis.

In my QIIME 1 protocol I joined the paired ends of my demultiplexed sequences, ran multiple_split_libraries_fastq.py on each of my runs, then combined them and performed chimera checking and filtering, resulting in the seqs_chimeras_filtered.fna file. Can that file be imported into QIIME 2 somehow to continue with deblur or dada2? Something similar to the thread “Importing demultiplexed sequence data from QIIME 1.9.1”?

I would not be skipping any suggested quality filtering steps by doing that, correct?

Sorry for all the questions, I am very new to microbiome analysis and greatly appreciate your time and help.

Sincerely,

David Bradshaw

@David_Bradshaw, chimera checking is a step that is already included in deblur and, AFAIK, also in dada2, so in theory you should be fine if you are using one of these methods.

Now, in practice, chimera checking can be done de novo or reference-based. Each of them has its biases and each tool/algorithm has its pros and cons - I suggest checking the literature if you are really interested in this topic. Anyway, my suggestion (*) is to stick to one published method and follow it without changing any of the steps, in this case: dada2 or deblur.

(*) I base this suggestion on my guess that you want to do your analysis as well as possible using the “best” (see previous responses) possible published techniques vs. a methods/algorithm developer that wants to benchmark different methods and generate new ones.

Again, hope this helps.

Dear Antonio,

Thank you very much, that helps. I did not realize that deblur had chimera checking built in; the way I read the paper, it used vsearch separately.

I imagine that the newest feature, “Analyzing paired end reads in QIIME 2”, would likely be the workflow that I use now to import my sequences? My raw data from the sequencer gives me a folder with the R1 and R2 sequences in it, but barcodes are not present. Can I still just follow that protocol and import that folder, or do I need to make any modifications or changes?

Thank you very much for your time and help. I appreciate your guidance and patience with a new user as well.

Sorry for the slow reply. Yes, that looks like the right tutorial, and I’d suggest following it from top to bottom (with one single tool).
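If the R1/R2 folder is imported through a paired-end manifest rather than directly, the sketch below shows one way to pair the files. The function name is hypothetical, and two things are assumptions: file names contain `_R1` / `_R2` after the sample name (the defaults in the Snakemake config above), and the manifest uses the tab-separated "V2" paired-end layout.

```python
from pathlib import Path


def build_paired_end_manifest(directory, r1_tag="_R1", r2_tag="_R2"):
    """Pair R1/R2 FASTQ files in `directory` and return manifest text.

    Assumption: sample ID = file name up to the R1/R2 tag.
    """
    forward, reverse = {}, {}
    for path in sorted(Path(directory).iterdir()):
        name = path.name
        if not name.endswith((".fastq", ".fastq.gz")):
            continue  # skip anything that is not a FASTQ file
        if r1_tag in name:
            forward[name.split(r1_tag)[0]] = path.resolve()
        elif r2_tag in name:
            reverse[name.split(r2_tag)[0]] = path.resolve()
    # Every sample needs both reads; report the ones that do not.
    unpaired = sorted(set(forward) ^ set(reverse))
    if unpaired:
        raise ValueError(f"samples missing a mate file: {unpaired}")
    rows = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    for sample_id in sorted(forward):
        rows.append(f"{sample_id}\t{forward[sample_id]}\t{reverse[sample_id]}")
    return "\n".join(rows) + "\n"
```

A manifest like this would go to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`. Confirm the format names against `qiime tools import --help` for your installed release.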
