Importing and demultiplexing 4 FASTQ files: R1, R2, Index1, and Index2

Hi,

Last year I used QIIME 1 with 4 FASTQ files (Read1, Read2, Index1, Index2) for my paired-end (PE), multiplexed MiSeq data. It took many steps, but with your support back then it worked. With QIIME 2, I am not sure where to start, because this sequencing-file layout is not listed in the import docs.

Below are the pre-processing steps I took with QIIME 1. I am hoping you can help me do the same with QIIME 2:

$ python merge_bcs_reads.py Undetermined_S0_L001_I2_001.fastq Undetermined_S0_L001_R1_001.fastq Full_Read1_wBarcodes.fastq

$ python merge_bcs_reads.py Undetermined_S0_L001_I1_001.fastq Undetermined_S0_L001_R2_001.fastq Full_Read2_wBarcodes.fastq

$ extract_barcodes.py --input_type barcode_paired_end -f Full_Read1_wBarcodes.fastq -r Full_Read2_wBarcodes.fastq -m Pilot_QIIME_MapFile_2.txt --rev_comp_bc2 --switch_bc_order --bc1_len 8 --bc2_len 8 -o parsed_barcodes_fullreads/

$ join_paired_ends.py -f reads1.fastq -r reads2.fastq -b barcodes.fastq -o /Users/Sara_Jeanne/Desktop/QIIME/fastq-join_joined_FullReads

$ split_libraries_fastq.py -i fastqjoin.join.fastq -b fastqjoin.join_barcodes.fastq -m Pilot_QIIME_MapFile_2.txt -o Split_Lib_FullReads_attachedBC_q20/ --store_qual_scores --barcode_type 16 -q 19  
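To show what the two barcode steps above do to each read pair, here is a minimal Python sketch on in-memory records. It is an illustration under assumptions, not the code of either script: we have not seen merge_bcs_reads.py and assume it simply prepends each index read (sequence and qualities) to its matching read, and we read the extract_barcodes.py flags as "take 8 nt from the start of each read, reverse-complement barcode 2, swap the order, and concatenate into one 16-nt barcode". Record, merge_barcode, and extract_pair_barcodes are hypothetical names, and the mapping file (-m) is ignored.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical in-memory FASTQ record."""
    header: str
    seq: str
    qual: str


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_barcode(index: Record, read: Record) -> Record:
    """Assumed merge_bcs_reads.py behavior: prepend the index read to the read."""
    return Record(read.header, index.seq + read.seq, index.qual + read.qual)


def extract_pair_barcodes(r1: Record, r2: Record, bc1_len: int = 8,
                          bc2_len: int = 8, rev_comp_bc2: bool = True,
                          switch_bc_order: bool = True) -> Tuple[str, Record, Record]:
    """Our reading of extract_barcodes.py --input_type barcode_paired_end."""
    bc1, bc2 = r1.seq[:bc1_len], r2.seq[:bc2_len]
    if rev_comp_bc2:
        bc2 = revcomp(bc2)
    if switch_bc_order:
        bc1, bc2 = bc2, bc1
    # The barcodes are cut back off the reads, which is why the joined
    # reads later need a separate barcode file.
    trimmed1 = Record(r1.header, r1.seq[bc1_len:], r1.qual[bc1_len:])
    trimmed2 = Record(r2.header, r2.seq[bc2_len:], r2.qual[bc2_len:])
    return bc1 + bc2, trimmed1, trimmed2


# Toy read pair: I2 goes onto R1 and I1 onto R2, as in the commands above.
index1 = Record("pair1", "ACGTTTTT", "IIIIIIII")
index2 = Record("pair1", "GGGGCCCC", "IIIIIIII")
read1 = Record("pair1", "ACGTACGT", "FFFFFFFF")
read2 = Record("pair1", "TTTTGGGG", "FFFFFFFF")
full1 = merge_barcode(index2, read1)
full2 = merge_barcode(index1, read2)
barcode, trimmed1, trimmed2 = extract_pair_barcodes(full1, full2)
print(barcode)  # AAAAACGTGGGGCCCC
```

The 16-nt result is what --barcode_type 16 in split_libraries_fastq.py then matches against the mapping file.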

Thank you very much!

Sara


Good morning Sara,

You are correct: support for this is still being developed. These two issues are tracking it.

It looks like QIIME 2 2017.12 now supports cutadapt, which might help you demultiplex your reads. Take a look at this plugin and let me know if demux-paired is a good fit for your data.
https://docs.qiime2.org/2017.12/plugins/available/cutadapt/

Colin

Hi Colin,

Thank you for your time and help with this. My barcodes are all the same length, but it looks like from the other issues that I would need to start with QIIME 1 to re-attach my barcodes to my sequences, then extract these barcodes into a single barcode file, and then join my reads…

Or is there an extract-barcodes and join script for QIIME 2 that I can run before the cutadapt tool you mentioned? cutadapt works with barcodes attached to my reads, but the reads are not joined together yet, so I am not sure how this tool's paired demux would work without joining my reads first.

Or should I complete all my preprocessing steps and the demultiplexing (split libraries) in QIIME 1 and then import this data (seqs.fna file) into QIIME 2?

Thank you very much,

Sara

Hi @Sara_Jeanne08!

Unfortunately, q2-cutadapt probably won't work for you: it only supports demultiplexing reads with the barcode still in the read sequence (I think this is what you mean when you say that your reads need to be "joined" together).

Your last option (preprocessing and demultiplexing in QIIME 1, then importing seqs.fna into QIIME 2) could get you moving quickest; check out this Community Tutorial for some help getting started with that route.

Another option would be to process these data as single-end reads: just follow the EMP import and demux protocol. That assumes you can demultiplex forward reads with just the forward barcodes (i.e., this isn't a dual-indexing scenario). Similarly, if this isn't a dual-indexing scenario, you could import and demultiplex your forward reads, import and demultiplex your reverse reads, export both resulting sets of demultiplexed reads, and re-import them as paired-end demultiplexed data using a manifest format. That is a lot packed in there, and I might be missing some critical assumption, but if that is something you want to pursue we could try to walk you through it.
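The last route (re-importing as paired-end demultiplexed data) hinges on a manifest that pairs each sample's forward and reverse files. A minimal Python sketch for writing one, assuming the comma-separated layout with sample-id, absolute-filepath, and direction columns used by paired-end manifest imports around this release (please check the import docs for your version); build_paired_manifest is a hypothetical helper and the file paths are placeholders.

```python
import csv
import io
import os
from typing import Dict, Tuple


def build_paired_manifest(samples: Dict[str, Tuple[str, str]]) -> str:
    """Return manifest text with a forward and a reverse row for each sample.

    Assumed layout: header 'sample-id,absolute-filepath,direction', where
    direction is 'forward' or 'reverse'.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id in sorted(samples):
        fwd_path, rev_path = samples[sample_id]
        writer.writerow([sample_id, os.path.abspath(fwd_path), "forward"])
        writer.writerow([sample_id, os.path.abspath(rev_path), "reverse"])
    return buf.getvalue()


# Placeholder paths for the exported forward and reverse demultiplexed reads.
manifest = build_paired_manifest({
    "S2": ("/data/fwd/S2.fastq.gz", "/data/rev/S2.fastq.gz"),
    "S1": ("/data/fwd/S1.fastq.gz", "/data/rev/S1.fastq.gz"),
})
print(manifest, end="")
```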

Those options make a lot of assumptions, so your mileage may vary. In the meantime, we will track this use-case for future support of this sequence format. Thanks, and if you get stuck, you know where to find us! :t_rex:

Hi Matthew,

Thank you very much for your help with this. I could re-attach the barcodes to my reads with the merge_bcs_reads.py script I used with QIIME 1, but these are still dual-barcoded, paired-end reads that have not been joined. I assume from your statement that cutadapt will not handle paired-end, dual-barcoded reads, since they would need to be joined? Is that correct: QIIME 2 doesn't join reads?

Unfortunately, I can't process my data as single-end reads: I used a dual-indexing sequencing strategy, so I cannot demultiplex on the forward or reverse reads alone.

I appreciate the link you sent on how to import the seqs.fna file. It seems like this may be the best way to use QIIME 2 with my data at this time.

Also, I normally remove chimeras from this seqs.fna file. Should I wait until after importing into QIIME 2 to do this step, or should I remove chimeras with QIIME 1 and import the filtered seqs.fna file into QIIME 2?

Thank you,

Sara

Hi @Sara_Jeanne08 - I kinda figured you had dual-index (DI) data, but thought "just in case...".

Are you talking about joining forward and reverse reads, or, joining your barcodes back into your reads?

I am about to start working on supporting DI demultiplexing with q2-cutadapt (open issue here), where the barcodes are in the reads (as opposed to in the FASTQ header or a separate file, as in your case). We also have an outstanding ticket, which @colinbrislawn linked to above, to support DI demultiplexing for EMP-protocol data, which is basically what you have. It is safe to say that the q2-cutadapt solution will be in place first, so if you have a way to "re-attach" your barcodes to your reads, you could use this new method when it is released (maybe 2018.2?).
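To make the "barcodes in the reads" case concrete: with dual indexing, neither barcode alone identifies a sample; only the pair does. A minimal sketch, assuming exact-match barcodes of fixed length at the start of each read and a hypothetical (forward, reverse) to sample table. It is not q2-cutadapt's implementation, whose dual-index support was still being planned at this point.

```python
from typing import Dict, Iterable, List, Tuple

BarcodeMap = Dict[Tuple[str, str], str]


def demux_dual_index(pairs: Iterable[Tuple[str, str]], barcode_map: BarcodeMap,
                     fwd_len: int, rev_len: int
                     ) -> Tuple[Dict[str, List[Tuple[str, str]]], int]:
    """Assign read pairs to samples by their combined barcodes, then trim them."""
    assigned: Dict[str, List[Tuple[str, str]]] = {}
    unassigned = 0
    for fwd, rev in pairs:
        key = (fwd[:fwd_len], rev[:rev_len])  # both barcodes together
        sample = barcode_map.get(key)
        if sample is None:
            unassigned += 1
            continue
        assigned.setdefault(sample, []).append((fwd[fwd_len:], rev[rev_len:]))
    return assigned, unassigned


# "AAAA" alone is ambiguous (S1 or S2); only the pair resolves it.
barcode_map = {("AAAA", "CCCC"): "S1", ("AAAA", "GGGG"): "S2", ("TTTT", "CCCC"): "S3"}
pairs = [("AAAAACGT", "CCCCTTGG"), ("AAAAGGTT", "GGGGAACC"), ("TTTTACGT", "GGGGACGT")]
assigned, unassigned = demux_dual_index(pairs, barcode_map, fwd_len=4, rev_len=4)
print(unassigned)  # 1
```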

As far as read joining goes (joining forward and reverse reads, that is), QIIME 2 handles it in a few ways. First, the denoise-paired method in q2-dada2 joins PE reads (this is how DADA2 works in general, outside of QIIME 2 as well). Please note that DADA2 actually relies on processing unjoined PE data: the error model won't work as expected if you provide already-joined reads, so please keep that in mind! We also have a q2-vsearch method for joining reads, which is helpful if you are going to use deblur or one of the OTU-based methods in q2-vsearch.
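As a toy illustration of what joining forward and reverse reads means: the reverse read is reverse-complemented and overlapped with the end of the forward read. This sketch only tries ungapped overlaps, longest first, and ignores quality scores; vsearch's read merging is quality-aware and more careful, so treat join_pair as a hypothetical teaching helper.

```python
from typing import Optional

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def join_pair(r1: str, r2: str, min_overlap: int = 10,
              max_mismatches: int = 0) -> Optional[str]:
    """Merge a read pair via the longest acceptable ungapped overlap, or None."""
    rc2 = revcomp(r2)  # put the reverse read on the forward strand
    for olap in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        tail, head = r1[len(r1) - olap:], rc2[:olap]
        mismatches = sum(x != y for x, y in zip(tail, head))
        if mismatches <= max_mismatches:
            return r1 + rc2[olap:]
    return None


amplicon = "ATGCGTACCTGAAGTCCATG"
r1 = amplicon[:14]           # forward read covers positions 0-13
r2 = revcomp(amplicon[6:])   # reverse read covers positions 6-19 (8-nt overlap)
merged = join_pair(r1, r2, min_overlap=5)
print(merged)  # ATGCGTACCTGAAGTCCATG
```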

Regarding chimeras, I don't know whether it makes more sense for you to remove them pre-import or not, but in case you haven't seen it, we have a community tutorial for removing chimeras using q2-vsearch.

Stay tuned, and thanks for writing! :t_rex:


Hi @Sara_Jeanne08,

QIIME 2 supports chimera checking as part of the dada2 and deblur methods, or separately in vsearch as @thermokarst mentioned. So I think it makes sense to use QIIME 1 to demultiplex your data, then import into QIIME 2 for denoising (with built-in chimera checking) or for OTU picking followed by chimera checking.
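For intuition about what a chimera check looks for: a PCR chimera (here a "bimera") is a read whose left part exactly matches one more-abundant sequence and whose right part matches another. The sketch below is our own simplified exact-match, two-parent check, not UCHIME (which scores alignments) and not the dada2 or deblur code; is_bimera is a hypothetical name and the sequences are made up.

```python
from typing import List


def _shared_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_bimera(query: str, parents: List[str]) -> bool:
    """True if query is exactly a prefix of one parent joined to a suffix of another.

    `parents` should be the more abundant sequences; no mismatches are tolerated.
    """
    if query in parents:
        return False
    n = len(query)
    left = [_shared_prefix_len(query, p) for p in parents]
    right = [_shared_prefix_len(query[::-1], p[::-1]) for p in parents]
    return any(
        i != j and left[i] > 0 and right[j] > 0 and left[i] + right[j] >= n
        for i in range(len(parents))
        for j in range(len(parents))
    )


parents = ["ACGTACGTAC", "TTGGCCAATT"]
print(is_bimera("ACGTACAATT", parents))  # True: ACGTA + CAATT
print(is_bimera("ACGTACGTTC", parents))  # False: a 1-nt error, not a chimera
```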


Hi Matthew,

Thank you very much for your help with this. To clarify what I meant by "joining": my first step is to re-attach my barcodes, then run the extract-barcodes script to make a properly formatted barcode file for QIIME 1 (my reads no longer have the barcodes attached after this step). To demultiplex with the split_libraries_fastq.py script in QIIME 1, my reads then need to be joined (join_paired_ends.py).

[quote="thermokarst, post:8, topic:2586"]
DADA2 actually relies on processing unjoined PE data - the error model won’t work as expected if you provide already joined reads, so please keep that in mind! We also have a q2-vsearch method for joining reads, which is helpful if you are going to use deblur, or one of the OTU-based methods in q2-vsearch.
[/quote]

As part of demultiplexing with split libraries I include a quality filter, so I am guessing that not being able to use DADA2 may be OK for my samples. I have been able to import the seqs.fna file after this step and plan to remove chimeras within QIIME 2 with UCHIME.

Another question: the first tutorial you sent states:

Dereplicating a SampleData[Sequences] artifact

If you are beginning your analysis with dereplicated, quality controlled sequences, such as those in a QIIME 1 seqs.fna file, your first step is to import that data into a QIIME 1 artifact. The semantic type used here is SampleData[Sequences], indicating that the data represents collections of sequences associated with one or more samples.

So does that mean I do not need to dereplicate my imported seqs.fna file from QIIME 1? It is already dereplicated, correct?

Last question (for now): will there be an issue installing QIIME 1 again on my Mac (OS X) after installing QIIME 2? I have already imported the seqs.fna from last year's MiSeq run into QIIME 2, but I still need to pre-process my new sequencing run with QIIME 1.

Thank you very much,

Sara

Hi @Sara_Jeanne08,

The quality filtering applied by dada2 is more effective at removing errors than QIIME 1-style quality filtering (during demultiplexing) followed by OTU picking, but there is nothing technically wrong with what you are doing, so that is fine. Besides, since it sounds like you need to join reads to get the dual-index demultiplexing to work, dada2 seems to be off the table.

However, you could still use deblur to denoise your sequences (instead of OTU picking) in QIIME 2 if you want to take advantage of one of these new denoising methods. Deblur actually expects sequences to have already passed a rough quality-filtering step (the same QIIME 1 quality filter that you are using; QIIME 2 implements the same process in q2-quality-filter), so passing your data to deblur instead of OTU picking should be fine. You will not need to check for chimeras after using deblur.
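A minimal sketch of the QIIME 1-style quality filter referred to here, as we understand it: a read is truncated at the start of the first run of more than max_bad_run_length consecutive bases with a Phred score at or below max_bad_quality, and discarded if what remains is shorter than min_length_fraction of the original read. The max_bad_quality default mirrors -q 19 from the split_libraries_fastq.py command at the top of the thread; the other two defaults are what we believe QIIME 1 uses (3 and 0.75). The N-base check is omitted, and quality_filter is a hypothetical name, not the q2-quality-filter code.

```python
from typing import List, Optional


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def quality_filter(seq: str, quals: List[int], max_bad_quality: int = 19,
                   max_bad_run_length: int = 3,
                   min_length_fraction: float = 0.75) -> Optional[str]:
    """Return the truncated read, or None if it is too short to keep."""
    cut = len(seq)
    run_start, run_len = 0, 0
    for i, q in enumerate(quals):
        if q <= max_bad_quality:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > max_bad_run_length:
                cut = run_start  # truncate where the long bad run begins
                break
        else:
            run_len = 0
    if cut < min_length_fraction * len(seq):
        return None
    return seq[:cut]


print(quality_filter("A" * 10, phred33("IIIIIIII##")))  # AAAAAAAAAA (short run kept)
print(quality_filter("A" * 10, phred33("IIIIII####")))  # None (cut to 6 < 7.5)
```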

It looks like that was a typo in the community tutorial; I have fixed it (these sequences are "demultiplexed", not "dereplicated"). If your seqs.fna file is the output of split_libraries_fastq.py, it has not been dereplicated: you will need to dereplicate prior to OTU picking (but not prior to deblur, if you go that route).
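To illustrate the difference between demultiplexed and dereplicated: a QIIME 1 seqs.fna is a FASTA file (no quality scores) in which each header starts with <sample-id>_<counter>, so every read is still listed separately. Dereplicating collapses identical sequences into unique features and counts them per sample. The sketch assumes that header convention (the fields after the first token are illustrative) and, purely for illustration, uses SHA-1 digests of the sequences as feature IDs; it is not q2-vsearch's code.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_id(header: str) -> str:
    """'S1_0 read1 ...' -> 'S1' (split on the last '_' of the first token)."""
    return header.split()[0].rsplit("_", 1)[0]


def dereplicate(records: Iterable[Tuple[str, str]]):
    """Return (feature ID -> sequence, feature ID -> per-sample counts)."""
    sequences: Dict[str, str] = {}
    table: Dict[str, Counter] = defaultdict(Counter)
    for header, seq in records:
        seq = seq.upper()
        feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
        sequences[feature_id] = seq
        table[feature_id][sample_id(header)] += 1
    return sequences, table


seqs_fna = """\
>S1_0 read1 orig_bc=AAA new_bc=AAA bc_diffs=0
ACGTACGT
>S1_1 read2 orig_bc=AAA new_bc=AAA bc_diffs=0
ACGTACGT
>S2_2 read3 orig_bc=CCC new_bc=CCC bc_diffs=0
ACGTACGT
>S2_3 read4 orig_bc=CCC new_bc=CCC bc_diffs=0
TTGGCCAA
"""
sequences, table = dereplicate(read_fasta(seqs_fna.splitlines()))
print(len(sequences))  # 2 unique sequences from 4 reads
```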

No, there shouldn't be an issue, but this is a great question! QIIME 1 and QIIME 2 can be installed as separate conda environments (e.g., if you have an old native installation and have not already installed QIIME 1 as a conda environment, you can follow these instructions). Switching between QIIME 1 and QIIME 2 is then as simple as activating the corresponding environment:
source activate qiime1
source activate qiime2-2017.12

I hope that helps!


Thank you so much, Nicholas. I am glad that with my sequences I can use the quality filtering via deblur. I cannot run MacQIIME on my updated Mac, but I am working on installing QIIME 1 (I found that others have had this issue due to Apple's new security measures). I am glad I can use both QIIME 1 and 2 without any interference.

I am a little confused about how deblur replaces OTU picking… or is OTU picking part of the deblur workflow?
Also, you mention that after passing my data through deblur I do not need to do chimera checking. Is that because it is included as part of the deblur tool?

Thanks!

Sara

Deblur uses a static error profile to resolve "true" sequence variants and identify errors, so it is similar to dada2 in that respect (vis-a-vis OTU picking). You can read the original article for more details on how it works. It replaces OTU picking because the sequence variants are already dereplicated and denoised (so you can effectively think of them as 100% OTUs minus erroneous sequences). This fulfills the two-fold goal of OTU clustering:

  1. collapse sequences based on sequence similarity to reduce downstream computation (in deblur this is achieved by dereplication)
  2. provide a certain level of quality filtering by clustering sequences: sequences with a few errors are clustered together with the theoretically error-free sequences they derive from, to diminish noise (in deblur this is achieved by denoising)
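To see both goals at work, here is a toy abundance-sorted greedy clustering (our own sketch, not deblur and not vsearch's algorithm): sequences are visited from most to least abundant, and each joins the first existing centroid it matches at or above the identity threshold; otherwise it founds a new one. Identity is a simple position-by-position match fraction, so equal-length sequences are assumed, and the counts are made up.

```python
from typing import Dict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (equal-length sequences assumed)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(counts: Dict[str, int], threshold: float = 0.97) -> Dict[str, int]:
    """Abundance-sorted greedy clustering; returns centroid -> summed count."""
    centroids: Dict[str, int] = {}
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                centroids[centroid] += counts[seq]  # goal 2: absorb likely errors
                break
        else:
            centroids[seq] = counts[seq]  # goal 1: one feature per cluster
    return centroids


counts = {"ACGTACGTAC": 100, "ACGTACGTAA": 3, "TTGGCCAATT": 50}
# 10-nt toy sequences, so 0.9 means "at most one mismatch".
print(greedy_cluster(counts, threshold=0.9))
# {'ACGTACGTAC': 103, 'TTGGCCAATT': 50}
```

Deblur reaches a similar end point differently: rather than merging anything within a fixed radius, it uses its error profile to decide which low-abundance neighbors are errors.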

Yes! Chimera filtering is part of the deblur workflow, and the article may provide more details on the specific algorithm. Deblur also outputs a stats visualization that shows how many sequences were removed at each stage (including chimera filtering), so you can peek under the hood a bit.

I hope that helps!


Thanks so much! Your explanation is very helpful!


Hi Nicholas and Matthew,

I am trying to use QIIME 2 to work with the seqs.fna files that I demultiplexed in QIIME 1 and imported. From what I understand, my next step is to quality filter via the qiime quality-filter q-score command and then to run deblur denoise-16S.

However, I read the following on the deblur plugin page:

Usage: qiime deblur denoise-16S [OPTIONS]

Perform sequence quality control for Illumina data using the Deblur workflow with a 16S reference as
a positive filter. Only forward reads are supported at this time...

I have joined reads (forward and reverse put together) at this point that are about 311 bp long. Does this mean I can't actually use deblur? Or do I need to specify a special option when I pass my data through?

Otherwise, it looks like my only option is VSEARCH, followed by a separate UCHIME chimera-checking step.

Thank you very much for your time and for clarifying,

Sara

Hi @Sara_Jeanne08,

My understanding was that you had already demultiplexed these with QIIME 1's split_libraries_fastq.py, which already performs the same quality filter as quality-filter. If so, you can proceed directly to deblur (but if you've already passed your data through quality-filter, that's fine too).

I just mentioned the q2-quality-filter plugin in case you want to replicate these steps in the future, once QIIME 2 supports methods for dual-index paired-end reads and QIIME 1 is no longer needed for your analysis.

Not to worry: I think that text really just means that unjoined paired-end reads are not supported. If it's any reassurance, the community tutorial for processing paired-end reads recommends deblur for denoising joined paired-end reads. If you do get an error, then perhaps I have misunderstood the format your data are in, but as far as I can tell you should be fine.

I hope that clarifies!

Hi Nicholas,

I have been trying to get deblur to work with my QIIME 1-preprocessed, demultiplexed reads, but without success:

(qiime2-2017.12) bash-3.2$ qiime deblur denoise-16S --i-demultiplexed-seqs seqs.qza --p-trim-length -1 --o-representative-sequences 01212018_pilot_deblur1_rep_seqs.qza --o-table 01212018_pilot_deblur1_table.qza --p-sample-stats --o-stats 01212018_pilot_deblur1_stats.qza --verbose
Traceback (most recent call last):
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in __call__
    results = action(**arguments)
  File "<decorator-gen-344>", line 2, in denoise_16S
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 196, in bound_callable
    self.signature.check_types(**user_input)
  File "/Users/Sara_Jeanne/miniconda2/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/core/type/signature.py", line 299, in check_types
    "subtype of %r." % (name, spec.qiime_type))
TypeError: Argument to parameter 'demultiplexed_seqs' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

Plugin error from deblur:

  Argument to parameter 'demultiplexed_seqs' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

See above for debug info.

Also, I tried to summarize / visualize the quality of my reads with demux and got a similar error:

(qiime2-2017.12) bash-3.2$ qiime demux summarize --i-data seqs.qza --o-visualization seqs.qzv
Plugin error from demux:

  Argument to parameter 'data' is not a subtype of SampleData[JoinedSequencesWithQuality | PairedEndSequencesWithQuality | SequencesWithQuality].

Debug info has been saved to /var/folders/pv/pn9r2m8x7y5dqnnh35m1m9fm0000gn/T/qiime2-q2cli-err-0e0_my7k.log

Thank you very much for your time and help with this. I really appreciate it.

Sara

Hi @Sara_Jeanne08,
It looks like seqs.qza is not one of the required data types. Could you please share the exact command that you used to import this file into QIIME 2? (I'm assuming that this is the imported demultiplexed sequence file that you mentioned at the start of this thread, and that no other processing has occurred since importing.)

Could you also please show us the output of the following command?
qiime tools peek seqs.qza

Thanks!

Hi Nicholas,

Here is the command I used to import my QIIME 1 demultiplexed sequence file, following the earlier tutorial link:

(qiime2-2017.12) bash-3.2$ qiime tools import --input-path /Users/Sara_Jeanne/Desktop/QIIME_122017/seqs.fna --output-path /Users/Sara_Jeanne/Desktop/QIIME_122017/seqs.qza --type SampleData[Sequences]

Also, here is the output for the command you provided:

(qiime2-2017.12) bash-3.2$ qiime tools peek seqs.qza
UUID:        f688e65d-bd66-460a-bf4b-5a15157f2dc8
Type:        SampleData[Sequences]
Data format: QIIME1DemuxDirFmt

I am assuming there is a more appropriate type for importing my data, such as SampleData[JoinedSequencesWithQuality]?

Is there a source format (--source-format) I am supposed to use for importing joined reads?

Thank you again!

Sara

Hi @Sara_Jeanne08,
Sorry! Looks like I gave bad advice when I suggested you could still use deblur with your data. I was mistaken about the data types that could be used as input. :disappointed:

It sounds like for now q2-vsearch dereplication/OTU picking is the only method that can handle qiime1 demultiplexed data (i.e., following the first tutorial that @thermokarst linked to above). We do have an open issue here to add support for this data type with deblur, and this has been discussed previously here.

I’m sorry if my advice held you up! In any case, it sounds like we should have upcoming support for (1) dual-index reads and (2) processing qiime1 demultiplexed data in deblur, so in the future QIIME2 should be better equipped to handle data sets like yours.

Hi Nicholas,

Thank you for letting me know - I had a feeling I may have to go back and try V-Search.

My big concern is that I use the SILVA database as a reference for alignment and classification. I was using it successfully in QIIME 1, with a specific parameter file for each step of open-reference OTU picking, but I am not sure how to set this up in QIIME 2.

Is there a tutorial for using a reference other than the default Greengenes (which hasn't been updated in a very long time)? Do I import the SILVA database as follows:
qiime tools import \
  --input-path aligned-sequences.fna \
  --output-path aligned-sequences.qza \
  --type 'FeatureData[AlignedSequence]'

What is the best way to import SILVA's taxonomy strings/levels?

Sorry for all the questions. I am a bit stuck using this new QIIME version to analyze my specific dataset (especially since I can't use default settings and whatnot with my dual-indexed samples).

So far I have been able to do the following:

  1. Import my joined and demultiplexed reads into QIIME 2
  2. Dereplicate sequences via VSEARCH

Next:

  3. Remove chimeras
  4. Run open-reference OTU picking with VSEARCH
  5. Align and classify my representative sequences
  6. Build phylogenetic trees from my newly aligned rep set
  7. Run alpha-diversity tests
  8. Run beta-diversity tests

Thank you very much for your time and help with this. It is greatly appreciated.

Sara

Hi @Sara_Jeanne08,

No problem at all! SILVA maintains an archive of qiime1-and-2-compatible downloads (note that the most recent release, 132, is not yet available, but 128 is). You can download one of those archives and use the files in the same way that one would use greengenes.

For all steps (e.g., as a reference sequence database for open-reference OTU picking with vsearch), I believe you would use the unaligned sequences and import them as FeatureData[Sequence] type, so look in the "rep_set" directory in the SILVA release files linked above. The taxonomy file can be imported like this:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path 99_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza
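As we understand HeaderlessTSVTaxonomyFormat, the file has no header row and two tab-separated columns: the reference sequence ID and a semicolon-delimited lineage. A small Python sketch for inspecting such a file before importing (for example, to check how many levels the SILVA strings have); read_headerless_taxonomy and truncate_lineage are hypothetical helpers, and the IDs and lineages below are made up.

```python
import csv
import io
from typing import Dict, List


def read_headerless_taxonomy(text: str) -> Dict[str, List[str]]:
    """Map each feature ID to its list of lineage levels."""
    taxa: Dict[str, List[str]] = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature_id, lineage = row[0].strip(), row[1]
        taxa[feature_id] = [lvl.strip() for lvl in lineage.split(";") if lvl.strip()]
    return taxa


def truncate_lineage(levels: List[str], depth: int) -> str:
    """Rejoin the first `depth` levels, e.g. to collapse to phylum."""
    return ";".join(levels[:depth])


example = (
    "ref1\tk__Bacteria; p__Firmicutes; c__Bacilli\n"
    "ref2\tk__Bacteria; p__Proteobacteria\n"
)
taxa = read_headerless_taxonomy(example)
print(truncate_lineage(taxa["ref1"], 2))  # k__Bacteria;p__Firmicutes
```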

Understood! The fact that dual-indexed sequences are not yet supported does complicate matters.

Excellent! The remaining steps should go smoothly once you have the correct data types imported. They are all covered in various tutorials; e.g., the "moving pictures" tutorial covers most of the post-OTU-picking steps (that tutorial uses dada2, but the downstream steps are the same).

I hope that helps!