Importing and demultiplexing process for 4 FASTQ files: R1, R2, Index1 and Index2

Hi Matthew,

Thank you very much for your help with this. I could re-attach the barcodes to my reads with the merge barcode Python script I used with QIIME 1, but these are still dual-barcoded, paired-end reads that have not been joined. I assume from your statement that cutadapt will not handle paired-end, dual-barcoded reads, since they will need to be joined? Is that correct, that QIIME 2 doesn’t join reads?

Unfortunately, I can’t process my data as single-end reads, as I will not be able to de-multiplex on single forward or single reverse reads, as I used a dual-indexing sequencing strategy.

I appreciate the link you sent on how to import the seq.fna file. It seems like this may be the best way to utilize QIIME 2 with my data at this time.

Also, I normally remove chimeras from this seq.fna file, but should I wait until after importing into QIIME 2 to do this step? Or should I remove chimeras with QIIME 1 and import the filtered seq.fna file into QIIME 2?

Thank you,

Sara

Hi @Sara_Jeanne08 - I kinda figured you had DI data, but thought “just in case…”.

Are you talking about joining forward and reverse reads, or, joining your barcodes back into your reads?

I am about to start working on supporting DI demultiplexing with q2-cutadapt (open issue here), where the barcodes are in the reads (as opposed to the fastq header, or a separate file, like your case). We also have an outstanding ticket that @colinbrislawn linked to above to support DI demultiplexing for EMP protocol data, which is basically what you have. It is safe to say that the q2-cutadapt solution will be in place first, so if you have a way to “re-attach” your barcodes to your reads, you could use this new method when it is released (maybe 2017.2?).
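
As a rough illustration of what “re-attaching” barcodes can look like for dual-indexed data, the Python sketch below prepends the Index1 and Index2 barcode of each record to the matching read. It is not the QIIME 1 merge script mentioned above, nor the planned q2-cutadapt method; the helper names (`read_fastq`, `attach_barcodes`) and the in-memory records are hypothetical, and it assumes the three files list their records in the same order.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def attach_barcodes(reads, index1, index2) -> List[Record]:
    """Prepend index1 + index2 (bases and qualities) to each read."""
    merged = []
    for (h, s, q), (h1, b1, q1), (h2, b2, q2) in zip(reads, index1, index2):
        # Compare read names only (the text before the first space).
        names = {x.split()[0] for x in (h, h1, h2)}
        if len(names) != 1:
            raise ValueError("records out of sync: %s" % sorted(names))
        merged.append((h, b1 + b2 + s, q1 + q2 + q))
    return merged


# Hypothetical single-record inputs standing in for R1, Index1 and Index2.
r1 = ["@read1 1:N:0", "ACGTACGT", "+", "IIIIIIII"]
i1 = ["@read1 1:N:0", "AAAA", "+", "FFFF"]
i2 = ["@read1 2:N:0", "CCCC", "+", "GGGG"]
result = attach_barcodes(read_fastq(r1), read_fastq(i1), read_fastq(i2))
```

The same pairing logic would apply to the reverse reads; the name check is there because a silent mismatch between files would assign reads to the wrong samples.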

As far as read joining goes (well, joining forward and reverse reads), this is accomplished in QIIME 2 in a few ways. First, the denoise methods in q2-dada2 join PE reads (this is how DADA2 works generally speaking, outside of Q2). Please note, DADA2 actually relies on processing unjoined PE data - the error model won’t work as expected if you provide already joined reads, so please keep that in mind! We also have a q2-vsearch method for joining reads, which is helpful if you are going to use deblur, or one of the OTU-based methods in q2-vsearch.

RE chimeras, I don’t know if it makes more sense for you to remove pre-import or not, but, in case you haven’t seen it, we have a community tutorial for removing chimeras using q2-vsearch.

Stay tuned, and thanks for writing! :t_rex:

Hi @Sara_Jeanne08,

QIIME2 supports chimera checking as part of the dada2 or deblur methods, or separately in vsearch as @thermokarst mentioned. So I think it makes sense to use qiime1 to demultiplex your data, then import to QIIME2 for denoising (with built-in chimera checking) or OTU picking followed by chimera checking.

Hi Matthew,

Thank you very much for your help with this. To clarify what I meant by “joining”: my first step is to re-attach my barcodes, then to run the extract barcode script to make a properly formatted barcode file for QIIME 1 (my reads no longer have the barcodes attached after this step). To be able to demultiplex with the split_libraries_fastq.py script in QIIME 1, my reads then need to be joined (join_paired_ends.py).

[quote="thermokarst, post:8, topic:2586"]
DADA2 actually relies on processing unjoined PE data - the error model won’t work as expected if you provide already joined reads, so please keep that in mind! We also have a q2-vsearch method for joining reads, which is helpful if you are going to use deblur, or one of the OTU-based methods in q2-vsearch.
[/quote]

As part of demultiplexing with split libraries, I include a quality filter, so I am guessing that not being able to use DADA2 may be OK for my samples. I have been able to import the seq.fna file after this step and plan to remove chimeras within QIIME 2 with UChime.

Another question: in the first tutorial you sent, it states:

Dereplicating a SampleData[Sequences] artifact

If you are beginning your analysis with dereplicated, quality controlled sequences, such as those in a QIIME 1 seqs.fna file, your first step is to import that data into a QIIME 2 artifact. The semantic type used here is SampleData[Sequences], indicating that the data represents collections of sequences associated with one or more samples.

So does that mean I do not need to dereplicate my imported seq.fna file from QIIME 1? It is already dereplicated, correct?

Last question (for now): Will there be an issue installing QIIME 1 again on my Mac (OS X) after I installed QIIME 2? I have my seq.fna from my MiSeq run completed last year that I imported into QIIME 2, but I still need to pre-process my new sequencing run with QIIME 1.

Thank you very much,

Sara

Hi @Sara_Jeanne08,

The quality filtering applied by dada2 is more effective at removing errors than qiime1-style quality filtering (during demultiplexing) and OTU picking, but there is nothing technically wrong with what you are doing so that is fine. Besides, since it sounds like you need to join reads to get the dual index demultiplexing to work, it sounds like dada2 is off the table.

However, you could still use deblur to denoise your sequences (instead of OTU picking) in QIIME2 if you want to take advantage of one of these new denoising methods. Deblur actually expects that sequences already have a rough quality filter step (the same qiime1 quality filter that you are using, and in QIIME2 we implement the same process in q2-quality-filter), so passing your data to deblur instead of OTU picking should be fine. You will not need to chimera check after using deblur.

It looks like that was a typo in the community tutorial — I have fixed this (these sequences are “demultiplexed”, not “dereplicated”). If your seqs.fna file is the output of split_libraries_fastq.py, it has not been dereplicated — you will need to dereplicate prior to OTU picking (but not prior to deblur if you go that route).
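
To make the distinction concrete: demultiplexing assigns each read to a sample, while dereplication collapses identical sequences and records how often each one occurs. The Python sketch below illustrates the idea on QIIME 1 style labels (`<SampleID>_<n>`). It is a toy illustration only, not how q2-vsearch implements dereplication, and `dereplicate` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def dereplicate(records):
    """Collapse identical sequences and count occurrences per sample.

    `records` is an iterable of (label, sequence) pairs whose labels
    follow the QIIME 1 seqs.fna convention "<SampleID>_<n>".
    """
    table = defaultdict(Counter)
    for label, seq in records:
        # The sample ID is everything before the final underscore.
        sample_id = label.rsplit("_", 1)[0]
        table[seq][sample_id] += 1
    return {seq: dict(counts) for seq, counts in table.items()}


# Demultiplexed but NOT dereplicated: "ACGT" appears three times.
demuxed = [
    ("gut.1_0", "ACGT"),
    ("gut.1_1", "ACGT"),
    ("soil.2_2", "ACGT"),
    ("soil.2_3", "TTGA"),
]
table = dereplicate(demuxed)
```

Four demultiplexed reads collapse to two unique sequences here, with the per-sample counts preserved, which is the kind of information a feature table carries forward into OTU picking.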

No, but this is a great question! QIIME1 and QIIME2 can be installed as separate environments with conda (e.g., if you have an old native installation and have not already installed QIIME1 as a conda environment, you can follow these instructions). Thus, when you want to switch between QIIME1 and QIIME2 it is as simple as activating the different environments:
source activate qiime1
source activate qiime2-2017.12

I hope that helps!

Thank you so much, Nicholas. I am glad that with my sequences I can use the quality filtering via deblur. I cannot run MacQIIME on my updated Mac, but I am working on installing it (I found that others have had this issue due to Apple’s new security measures). I am glad I can use both QIIME 1 and 2 without any interference.

I am a little confused how deblur replaces OTU-Picking … or is OTU-picking part of the deblur workflow?
Also, you mention that after passing my data through deblur I do not need to do chimera checking - is that because it is included as part of the deblur tool?

Thanks!

Sara

Deblur uses a static error profile to resolve “true” sequence variants and identify errors, so is similar to dada2 in that respect (vis-a-vis OTU picking). You can read the original article for more details on how it works. This replaces OTU picking because the sequence variants are already dereplicated and denoised (so you can effectively think of them as 100% OTUs minus erroneous sequences). This replaces the two-fold goal of OTU clustering:

  1. collapse sequences based on sequence similarity to reduce downstream computation (this is achieved in deblur by dereplication)
  2. provide a certain level of quality filtering by clustering sequences — sequences with low error rates will be clustered together with theoretically error-free sequences to diminish noise (this is achieved by denoising in deblur)

Yes! The article might provide more details on the specific algorithm. Deblur also outputs a stats visualization that contains info on how many sequences were removed at each stage (including chimera filtering) so you can peek under the hood a bit.

I hope that helps!

Thanks so much! Your explanation is very helpful!

Hi Nicholas and Matthew,

I am trying to use QIIME 2 with my QIIME 1 seq.fna files, which I have now imported. From what I understand, my next step is to quality filter with qiime quality-filter q-score and then to run deblur denoise-16S.

However, I read in the deblur plugin page the following:

Usage: qiime deblur denoise-16S [OPTIONS]

Perform sequence quality control for Illumina data using the Deblur workflow with a 16S reference as
a positive filter. Only forward reads are supported at this time…

I have joined reads (forward and reverse put together) at this point that are about 311 bp long. Does this mean I can’t actually use Deblur? Or do I need to specify a special option when I pass my data through?

Otherwise it is looking like my only option is VSearch, together with a separate UChime chimera checking step.

Thank you very much for your time and for clarifying,

Sara

Hi @Sara_Jeanne08,

My understanding was that you have already demultiplexed these with QIIME1’s split_libraries_fastq.py, which already performs the same quality filter as quality-filter. If so, you can proceed directly to deblur (but if you’ve already passed your data through quality-filter, that’s fine too).

I just mentioned the q2-quality-filter plugin in case you want to replicate these steps in the future when QIIME2 supports methods for dual-index paired-end reads and QIIME1 is no longer needed for your analysis.

Not to worry — I think that text really just means that unjoined paired-end reads are not supported. If it’s any reassurance, the community tutorial for processing paired-end reads recommends deblur for denoising joined paired-end reads. If you do get an error then perhaps I have misunderstood the format that your data are in, but as far as I can tell you should be fine.

I hope that clarifies!

Hi Nicholas,

I have been trying to get deblur to work with my qiime 1 pre-processed demultiplexed reads but it has not been successful:

(qiime2-2017.12) bash-3.2$ qiime deblur denoise-16S --i-demultiplexed-seqs seqs.qza --p-trim-length -1 --o-representative-sequences 01212018_pilot_deblur1_rep_seqs.qza --o-table 01212018_pilot_deblur1_table.qza --p-sample-stats --o-stats 01212018_pilot_deblur1_stats.qza --verbose
Traceback (most recent call last):
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "<decorator-gen-344>", line 2, in denoise_16S
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 196, in bound_callable
    self.signature.check_types(**user_input)
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/core/type/signature.py", line 299, in check_types
    "subtype of %r." % (name, spec.qiime_type))
TypeError: Argument to parameter 'demultiplexed_seqs' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

Plugin error from deblur:

  Argument to parameter 'demultiplexed_seqs' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

See above for debug info.

Also, I tried to summarize / visualize the quality of my reads with demux and got a similar error:

(qiime2-2017.12) bash-3.2$ qiime demux summarize --i-data seqs.qza --o-visualization seqs.qzv
Plugin error from demux:

  Argument to parameter 'data' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

Debug info has been saved to /var/folders/pv/pn9r2m8x7y5dqnnh35m1m9fm0000gn/T/qiime2-q2cli-err-0e0_my7k.log

Thank you very much for your time and help with this. I really appreciate it.

Sara

Hi @Sara_Jeanne08,
Looks like seqs.qza is not any of the required data formats. Could you please share the precise command that you used to import this file into QIIME2? (I’m assuming that this is the imported demultiplexed sequence file that you mentioned at the start of this thread, and that no other processing has occurred since importing)

Could you also please show us the output of the following command?
qiime tools peek seqs.qza
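
For context, a .qza artifact is a zip archive, and the UUID, type and format that `qiime tools peek` reports are recorded in a `metadata.yaml` file inside it. The Python sketch below reads those fields directly. It assumes the usual `<uuid>/metadata.yaml` layout with simple `key: value` lines; the `peek` helper and the stand-in archive (with placeholder values) are hypothetical and not part of QIIME 2.

```python
import io
import zipfile


def peek(qza) -> dict:
    """Return the key/value fields from an artifact's metadata.yaml."""
    with zipfile.ZipFile(qza) as archive:
        # Look for the top-level "<uuid>/metadata.yaml" entry.
        name = next(n for n in archive.namelist()
                    if n.count("/") == 1 and n.endswith("/metadata.yaml"))
        text = archive.read(name).decode("utf-8")
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


# Build a tiny stand-in archive in memory so the sketch runs without QIIME 2.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "00000000-0000-0000-0000-000000000000/metadata.yaml",
        "uuid: 00000000-0000-0000-0000-000000000000\n"
        "type: SampleData[Sequences]\n"
        "format: QIIME1DemuxDirFmt\n",
    )
buffer.seek(0)
info = peek(buffer)
```

With a real file, `peek("seqs.qza")` would be the equivalent call; the `type` field is what an action's type check compares against its accepted input types.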

Thanks!

Hi Nicholas,

Here is the command I used to import my qiime 1 demultiplexed sequence file from the earlier tutorial link:

(qiime2-2017.12) bash-3.2$ qiime tools import --input-path /Users/Sara_Jeanne/Desktop/QIIME_122017/seqs.fna --output-path /Users/Sara_Jeanne/Desktop/QIIME_122017/seqs.qza --type SampleData[Sequences]

Also, here is the output for the command you provided:

(qiime2-2017.12) bash-3.2$ qiime tools peek seqs.qza
UUID:        f688e65d-bd66-460a-bf4b-5a15157f2dc8
Type:        SampleData[Sequences]
Data format: QIIME1DemuxDirFmt

I am assuming there is a more appropriate type for importing my data - SampleData[JoinedSequencesWithQuality]

Is there a source parameter I am supposed to use for importing joined reads?

Thank you again!

Sara

Hi @Sara_Jeanne08,
Sorry! Looks like I gave bad advice when I suggested you could still use deblur with your data. I was mistaken about the data types that could be used as input. :disappointed:

It sounds like for now q2-vsearch dereplication/OTU picking is the only method that can handle qiime1 demultiplexed data (i.e., following the first tutorial that @thermokarst linked to above). We do have an open issue here to add support for this data type with deblur, and this has been discussed previously here.

I’m sorry if my advice held you up! In any case, it sounds like we should have upcoming support for (1) dual-index reads and (2) processing qiime1 demultiplexed data in deblur, so in the future QIIME2 should be better equipped to handle data sets like yours.

Hi Nicholas,

Thank you for letting me know - I had a feeling I may have to go back and try V-Search.

My big concern is that I use the SILVA database as a reference for alignment and classification. I was using it successfully in QIIME 1, with a specific parameter file for each step of the open-reference clustering/OTU picking, but I am not sure how to set this up in QIIME 2.

Is there a tutorial for using a different reference from the default GreenGenes (it hasn’t been updated in a very long time)? Do I import the Silva database as follows:
qiime tools import \
  --input-path aligned-sequences.fna \
  --output-path aligned-sequences.qza \
  --type 'FeatureData[AlignedSequence]'

How do I best import Silva’s taxonomy strings / levels?

Sorry for all the questions - I am a bit stuck in using this new QIIME program to analyze my specific dataset (especially since I can’t use default settings and whatnot with my dual-indexed samples).

So far I have been able to do the following:

  1. Import my joined and demultiplexed reads into QIIME 2
  2. De-replicate sequences via VSearch

Next:

  3. Remove chimeras
  4. Run open-reference OTU picking with VSearch
  5. Align and classify my representative sequences
  6. Build phylogenetic trees from my newly aligned rep set
  7. Run alpha-diversity tests
  8. Run beta-diversity tests

Thank you very much for your time and help with this. It is greatly appreciated.

Sara

Hi @Sara_Jeanne08,

No problem at all! SILVA maintains an archive of qiime1-and-2-compatible downloads (note that the most recent release, 132, is not yet available, but 128 is). You can download one of those archives and use the files in the same way that one would use greengenes.

For all steps (e.g., as a reference sequence database for open-reference OTU picking with vsearch), I believe you would use the unaligned sequences and import as FeatureData[Sequence] type. So look in the “rep_set” directory in the SILVA release files linked above.

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path 99_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza
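
Before importing, it can help to sanity-check the taxonomy file. The Python sketch below assumes the headerless layout used by the Greengenes and SILVA QIIME releases, with one `<feature ID><TAB><taxonomy string>` pair per line and semicolon-separated levels. `parse_taxonomy` and the two sample rows are hypothetical and only illustrate the check.

```python
import io


def parse_taxonomy(handle):
    """Map feature ID -> list of taxonomic levels from a headerless TSV."""
    taxonomy = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 tab-separated fields, found %d"
                % (line_number, len(fields)))
        feature_id, lineage = fields
        if feature_id in taxonomy:
            raise ValueError(
                "line %d: duplicate ID %r" % (line_number, feature_id))
        taxonomy[feature_id] = [lvl.strip() for lvl in lineage.split(";")]
    return taxonomy


# Hypothetical rows in the assumed two-column layout.
sample = io.StringIO(
    "seq1\tD_0__Bacteria;D_1__Firmicutes;D_2__Bacilli\n"
    "seq2\tD_0__Archaea;D_1__Euryarchaeota\n"
)
levels = parse_taxonomy(sample)
```

A check like this catches stray header rows or space-separated columns before they surface as a less obvious import error.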

Understood! The fact that dual-indexed sequences are not yet supported does complicate matters.

Excellent — the remaining steps should go smoothly once you have the correct data types imported. The remaining steps that you have listed are all covered in various tutorials, e.g., the “moving pictures” tutorial covers most of the post-OTU picking steps (that tutorial uses dada2, but the downstream steps are the same).

I hope that helps!

Hi Nicholas,

I was able to complete de novo and reference-based chimera checking and am now moving on to OTU picking with VSearch. I am having a hard time importing my SILVA taxonomy following the format below:

This is the command I passed:

(qiime2-2017.12) bash-3.2$ qiime tools import --type FeatureData[Taxonomy] --source-format HeaderLessTSVTaxonomyFormat --input-path ./consensus_taxonomy_7_levels.txt --output-path ref-taxonomy.qza
Traceback (most recent call last):
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/util.py", line 91, in parse_format
    format_record = pm.formats[format_str]
KeyError: 'HeaderLessTSVTaxonomyFormat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/result.py", line 180, in import_data
    view_type = qiime2.sdk.parse_format(view_type)
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/util.py", line 93, in parse_format
    raise TypeError("No format: %s" % format_str)
TypeError: No format: HeaderLessTSVTaxonomyFormat

An unexpected error has occurred:

  No format: HeaderLessTSVTaxonomyFormat

See above for debug info.

Thank you very much for your time and help with this,

Sara

Try HeaderlessTSVTaxonomyFormat instead — sorry, I had copied/pasted the above command from elsewhere and obviously propagated a typo. You can see the correct command in this tutorial, and I have corrected this typo above.

I hope that helps!

EDIT: the original command I wrote (copied/pasted) above is correct, but for some reason the forum is automatically capitalizing the “less” in “Headerless”. Make sure that the “less” is lowercase. HeaderlessTSVTaxonomyFormat. Thanks!
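
The traceback above shows why the capitalization matters: the format name is looked up as a dictionary key (`pm.formats[format_str]`), and Python dictionary keys are case-sensitive. The sketch below mimics that lookup with a stand-in dictionary and adds a case-insensitive hint. `find_format` and the `formats` dictionary are hypothetical and not part of QIIME 2.

```python
def find_format(name, formats):
    """Look up a format by exact name; hint at case-only mismatches."""
    try:
        return formats[name]
    except KeyError:
        by_lower = {key.lower(): key for key in formats}
        match = by_lower.get(name.lower())
        hint = " Did you mean %r?" % match if match else ""
        raise TypeError("No format: %s.%s" % (name, hint)) from None


# Stand-in registry; the real one is built from the installed plugins.
formats = {"HeaderlessTSVTaxonomyFormat": "headerless TSV taxonomy"}

try:
    # Capital "L", as in the failing command.
    find_format("HeaderLessTSVTaxonomyFormat", formats)
except TypeError as error:
    message = str(error)
```

Here the failed lookup still raises a `TypeError`, but the message also points at the correctly capitalized name.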
