Questions about importing paired-end sequencing data

Hello, I have some questions about getting started with paired-end (PE) sequencing data. My data are 16S MiSeq reads generated with EMP primers and barcodes. I am following the instructions in this tutorial (https://docs.qiime2.org/2019.10/tutorials/atacama-soils/).

I have some questions:

1> In the first step, I need to run “qiime tools import” to generate a qza file (emp-paired-end-sequences.qza). It worked for me.

My question here: it seems the folder that I import must contain three files, and they must be named exactly forward.fastq.gz, reverse.fastq.gz, and barcodes.fastq.gz. If I don’t use these file names, QIIME 2 gives me errors.
So every time before I import my data, I have to rename those files?

2> Next, I need to run the demultiplex command. This is an example:

qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --p-rev-comp-mapping-barcodes \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

My questions:

a> At first, I assumed that I didn’t need “--p-rev-comp-mapping-barcodes”. However, if I don’t add it, the command gives me an error: “No sequences were mapped to samples. Check that your barcodes are in the correct orientation (see the rev_comp_barcodes and/or rev_comp_mapping_barcodes options).”

What is the difference between rev_comp_barcodes and rev_comp_mapping_barcodes? In my case, only rev_comp_mapping_barcodes works. When should I use rev_comp_barcodes?

b> To be honest, I don’t know why I need to reverse complement at all. These are old data. I used to split the library with QIIME 1’s split_libraries_fastq.py, and I don’t think QIIME 1 did an automatic reverse complement for me. I am using the same barcode column in QIIME 2 as I used in QIIME 1, so I don’t know why it doesn’t work in QIIME 2 without reverse complementing.

c> This question follows on from the one above. After demultiplexing, I compared the read counts between the QIIME 1 and QIIME 2 results. I found that I lost 95% of my reads with the QIIME 2 demultiplexing workflow.

I am not sure whether I did it correctly. My barcodes are 12 nt. I found this parameter: “--p-golay-error-correction / --p-no-golay-error-correction Perform 12nt Golay error correction on the barcode reads.”

What barcode length does QIIME 2 assume by default? 8 nt? Should I try 12 nt both with and without the reverse complement parameters?

I lost too many reads.

d> Once demultiplexing finished, I created a qzv file so that I could visualize the result. Viewing it requires running "qiime tools view" and opening Firefox to see the demultiplexing statistics.

I am connected remotely to a supercomputer, so this is not convenient for me. Is it possible to generate a txt file instead?


Hi @moonlight,

There are a lot of different questions here, so let's break them down. In the future, you may want to post separate questions in separate threads: that is easier for archiving and searching, and from experience you will get faster responses from the community. But let's begin!

If you want to use the EMP import method, yes: the three files need to have exactly those names.
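If renaming the originals every time is a nuisance, one option is to leave them alone and stage links with the expected names in a separate directory. Below is a minimal Python sketch of that idea. The function name `stage_emp_paired` and the example source file names are hypothetical, and it assumes the importer will follow symbolic links (copy the files instead if it does not).

```python
from pathlib import Path

# File names the EMP paired-end import expects, as discussed in this thread.
EMP_NAMES = ("forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz")


def stage_emp_paired(forward, reverse, barcodes, dest):
    """Symlink three existing FASTQ files into `dest` under the EMP names.

    Hypothetical helper, not part of QIIME 2. The original files are left
    untouched; only links are created.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for source, name in zip((forward, reverse, barcodes), EMP_NAMES):
        link = dest / name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale entry from an earlier run
        link.symlink_to(Path(source).resolve())
        staged.append(link)
    return staged


if __name__ == "__main__":
    # Example source names are made up; substitute your own files.
    stage_emp_paired(
        "run1_R1.fastq.gz", "run1_R2.fastq.gz", "run1_I1.fastq.gz",
        "emp-paired-end-sequences",
    )
```

The staged directory can then be passed to `qiime tools import` with `--input-path`, as in the tutorial.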

Here is what the help file says: qiime demux emp-paired --help

--p-rev-comp-barcodes / --p-no-rev-comp-barcodes
                       If provided, the barcode sequence reads will be
                       reverse complemented prior to demultiplexing.
                                                              [default: False]
  --p-rev-comp-mapping-barcodes / --p-no-rev-comp-mapping-barcodes
                       If provided, the barcode sequences in the sample
                       metadata will be reverse complemented prior to
                       demultiplexing.                        [default: False] 

So you'll need to look at your raw barcode reads and the barcode sequences in your metadata column to see which setup you have and which flag you need. Those options are there to help you match those two entities. I'm not very familiar with QIIME 1, but from a quick look I think you are right that it doesn't apply them by default, although the options were certainly there.
Losing 95% of your reads is certainly a cause for concern, so something is definitely amiss here. The number of retained reads should be much higher than that, especially since, as you say, it worked fine with QIIME 1.
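One way to look at both before choosing flags is to count how many barcode reads match the metadata barcodes exactly, as written and reverse-complemented. Here is a minimal Python sketch under some assumptions: the function names are my own (not part of QIIME 2), the metadata is a tab-separated file with a `barcode-sequence` column, and only exact matches are counted, so Golay error correction is ignored entirely. Exact matching can only tell you whether the two sides are in the same or opposite orientation, not which of the two flags Golay decoding will want.

```python
import csv
import gzip
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_barcode_reads(fastq_gz_path, limit=10000):
    """Return the sequence lines of the first `limit` FASTQ records."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Each FASTQ record has 4 lines; the sequence is the second one.
        seq_lines = islice(handle, 1, None, 4)
        return [line.strip() for line in islice(seq_lines, limit)]


def read_mapping_barcodes(metadata_path, column="barcode-sequence"):
    """Return the barcode column from a tab-separated metadata file."""
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        id_column = reader.fieldnames[0]
        # Skip directive rows such as '#q2:types' and rows with no barcode.
        return [row[column] for row in reader
                if not row[id_column].startswith("#") and row[column]]


def count_orientations(barcode_reads, mapping_barcodes):
    """Count exact matches against the mapping barcodes in both orientations."""
    as_is = set(mapping_barcodes)
    flipped = {reverse_complement(b) for b in mapping_barcodes}
    return {
        "same_orientation": sum(r in as_is for r in barcode_reads),
        "reverse_complemented": sum(r in flipped for r in barcode_reads),
    }
```

For example, `count_orientations(read_barcode_reads("barcodes.fastq.gz"), read_mapping_barcodes("sample-metadata.tsv"))` gives the two counts for the first 10,000 reads. If almost everything lands under `reverse_complemented`, exactly one side needs flipping.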

As per the help file, 12nt.

--p-golay-error-correction / --p-no-golay-error-correction
                       Perform 12nt Golay error correction on the barcode
                       reads.                                  [default: True]

The golay-error-correction parameter, which is turned on by default, should only be used if your barcodes are in fact Golay error-correcting barcodes. If they are not, try running this again with the --p-no-golay-error-correction flag. This is a common issue for users who don't have error-correcting barcodes, so it is a good first step.

Certainly! You can extract the raw data from any QIIME 2 artifact, including visualizations; have a look through the export/extract tutorial here to see how. These files can also be unzipped with your preferred zip program; look in the data folder for the .txt.
That said, the visualization files can also be downloaded to your local computer if you want to use their neat features in a browser. I find that quite useful.
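Since a .qzv is a zip archive, you can also peek inside it on the remote machine without a browser. Here is a minimal Python sketch using only the standard library. The function names are my own, and it assumes the usual layout where the files of interest sit under a top-level folder followed by `data/`; which files you find there depends on the visualizer.

```python
import zipfile


def list_data_files(qzv_path):
    """List archive members under the '<top-level folder>/data/' directory."""
    with zipfile.ZipFile(qzv_path) as archive:
        members = []
        for name in archive.namelist():
            parts = name.split("/")
            # Keep files (not directory entries) whose second path part is 'data'.
            if len(parts) > 2 and parts[1] == "data" and parts[-1]:
                members.append(name)
        return members


def read_text_member(qzv_path, member):
    """Return one archive member decoded as UTF-8 text."""
    with zipfile.ZipFile(qzv_path) as archive:
        return archive.read(member).decode("utf-8")


if __name__ == "__main__":
    # 'demux.qzv' is an assumed file name; use your own visualization.
    for name in list_data_files("demux.qzv"):
        print(name)
```

Once you have spotted a text, CSV, or TSV member in the listing, `read_text_member` prints it straight to the terminal.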

Hope this helps, keep us posted!


Hello Mehrod,

Thanks for your reply. Just to follow up, I ran some tests:

1> In my dataset, if I add neither "--p-rev-comp-barcodes" nor "--p-rev-comp-mapping-barcodes", the command gives me an error.
2> If I add only "--p-rev-comp-barcodes", the command gives me an error.
3> If I add only "--p-rev-comp-mapping-barcodes", I lose 95% of my reads compared to QIIME 1.
4> If I add both "--p-rev-comp-barcodes" and "--p-rev-comp-mapping-barcodes", the demultiplexing results are almost the same as with QIIME 1.

Some comments:

This is really odd to me. Here is how I understand the parameters: "--p-rev-comp-barcodes" takes all the barcodes in my raw reads and reverse complements them before demultiplexing, and "--p-rev-comp-mapping-barcodes" takes the barcode column in my mapping file and reverse complements it before demultiplexing.

If you reverse complement both, that should be the same as reverse complementing neither. :wink:

When I demultiplexed in QIIME 1, I never had to think about reverse complementing. I just used the barcodes in their normal orientation.

Well, I guess I will have to try the different combinations the next time I get new data.


False (according to the method defaults). The mapping barcodes must be reverse-complemented to be considered valid Golay barcodes. Then the barcode reads must be reverse-complemented to match your mapping barcodes!

But you are correct: no reverse complementing (RC) needs to occur if --p-no-golay-error-correction is used, since your barcode reads and mapping barcodes are already in the same orientation (just not the correct orientation to be recognized as Golay barcodes).
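To make the orientation point concrete, here is a minimal Python sketch of exact barcode matching with two optional reverse-complement switches. It is an illustration only, not how demux emp-paired is implemented: the function names and the 12 nt example barcode are made up, and Golay decoding is left out on purpose.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcodes_match(read, mapping, rc_reads=False, rc_mapping=False):
    """Exact comparison after optional reverse complements (no Golay step)."""
    if rc_reads:
        read = reverse_complement(read)
    if rc_mapping:
        mapping = reverse_complement(mapping)
    return read == mapping


if __name__ == "__main__":
    barcode = "AGCTGACTAGTC"  # hypothetical 12 nt barcode, same on both sides
    for rc_reads in (False, True):
        for rc_mapping in (False, True):
            matched = barcodes_match(barcode, barcode, rc_reads, rc_mapping)
            print(rc_reads, rc_mapping, matched)
```

When the reads and the mapping file start out in the same orientation, flipping neither or both gives a match, and flipping only one does not. What this leaves out is the error-correction step, which additionally needs the barcodes in the orientation that decodes as valid Golay codewords.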

The chief difference between demux emp-paired and QIIME 1's split_libraries_fastq.py is that emp-paired is designed to work with the Earth Microbiome Project protocols, whereas split_libraries_fastq.py is a generic method for demultiplexing (and for trimming/filtering reads, so keep in mind that the methods are not at all equivalent). The default settings for these methods will therefore not necessarily be the same.

Sounds like you found your solution!

