Demultiplexing - Paired-end - Barcode


(Meha) #1

Hi Friends,

In the demultiplexing step of the QIIME 2 tutorial https://docs.qiime2.org/2019.1/tutorials/atacama-soils/#atacama-demux it says that the user needs a barcode file with a fastq.gz extension. I do not have my barcodes in that format (I only know the barcode sequences used in my amplicons). Do I have to create the file manually? If so, how: with QIIME 2 or with something else? How else can I provide it?

Next question: once I have the three files (R, F and barcode), should they be put in one directory, and should I then run the command below?

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path emp-paired-end-sequences \
  --output-path emp-paired-end-sequences.qza

If so: when I view the artifact this command produces, it shows:

  • name:“emp-paired-end-sequences.qza”
  • uuid:“84315e83-532c-40e5-a70c-1b86173c0d92”
  • type:“EMPPairedEndSequences”
  • format:“EMPPairedEndDirFmt”

But there is no FORMAT argument in the command mentioned.

Another question: this step looks very much like the importing step. Why do we do it as part of demultiplexing, if it was already done in the previous step?

Finally, the main demultiplexing command takes an argument with a .tsv extension. How can I provide that file for my data? From what I read in the tutorial, this file is used in later steps, not in earlier ones.

Thanks


(Matthew Ryan Dillon) #3

Let’s take a step back @Mehrdad — what data do you have on hand? Is it multiplexed? Is it demultiplexed? How many files did you receive from your sequencing center? What are the filenames?


(Meha) #5

Hi, thank you for replying and for the thoughtful questions.

I got 8 fastq.gz files: four are R1 (forward) and four are R2 (reverse). They are definitely multiplexed, and I definitely need to demultiplex them.

As a first step, I put all 8 files in one directory and imported them from fastq.gz into a .qza file, so the artifact is available. The next step is demultiplexing, which uses the two commands I mentioned in my previous message.

The first command is:
qiime tools import \
  --type EMPPairedEndSequences \
  --input-path emp-paired-end-sequences \
  --output-path emp-paired-end-sequences.qza

The second command is:
qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column BarcodeSequence \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --p-rev-comp-mapping-barcodes

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

These commands are in this link below:
https://docs.qiime2.org/2019.1/tutorials/atacama-soils/#atacama-demux

Now, what I am trying to find out:

  1. How can I provide sample-metadata.tsv for my data?
  2. My barcodes are in an Excel sheet, but the tutorial says they should be in a fastq.gz file. How can I
    provide a value for this argument? --m-barcodes-column BarcodeSequence
  3. Should I change the values of these arguments?
    --i-seqs XXXXXXXXX
    --o-perXXXXXXXXXX
    --p-revXXXXXXXXXX
  4. Should I also change the values here?
    qiime demux summarize \
      --i-data demux.qza \
      --o-visualization demux.qzv

I tried to give you a complete reply. If you have any questions, let me know.
Thanks


(Matthew Ryan Dillon) #6

Thanks @Mehrdad - let’s slow down before jumping into commands, we need to understand what you are working with first.

What do those 8 files represent? I am a bit confused, because it sounds like you have data that is already demultiplexed — are there 4 samples represented here (2 files per sample)? It isn’t possible to import 8 files as the EMPPairedEndSequences format, so how exactly did you accomplish that? Maybe the 8 files represent 4 sequencing runs?

One of the things I asked about above was the names of the files - any chance you can provide some insight there? Thanks! :qiime2:


(Meha) #7

I probably should have put the 8 files in four directories; maybe I made a mistake. To be honest, my data files come from four time points. In other words, I think I should create four directories, each containing one pair of R1 and R2 fastq.gz files. To my eye that is the right way, but this is my first time working with QIIME, which is why I made the mistake. No problem, though: now I know how to import and make a .qza file.

So far I have only created the .qza file (the artifact). That is all. I have not done the demultiplexing step yet; that is what I am focusing on now.

The tutorial works with EMP data, not Casava. My data are Casava because they have R1 and R2 fastq.gz files. But I am following the tutorial at this link
https://docs.qiime2.org/2019.1/tutorials/atacama-soils/#atacama-demux
because it is for paired-end data.

Does your last question mean I should put the reverse reads in one directory and the forward reads in another, that is, separate the forwards from the reverses?

Thanks


(Matthew Ryan Dillon) #8

Hi there @Mehrdad — I am honestly a bit confused at this point — I need more information in order to help provide guidance for you to get started with QIIME 2. Can you please provide information about the filenames in your dataset? Please provide a screenshot, or, copy and paste the results of running ls or tree on your “raw” data directory (the files you received from your sequencing center).


(Meha) #9

Dear Friend,
As the photo I attached shows, I have four libraries (each library has two fastq.gz files, one forward and one reverse), but I did not get any barcode file from the sequencing center. I should mention that I used different barcoded primers in the PCR for each library. The sequencing center also used adaptors (a kind of barcode) to pool the libraries in the same sequencing run (according to my colleague, the sequencing center removed these adaptors after sequencing).

I will re-import each library from its own directory, not from ONE directory (last time I put all libraries in one directory). To repeat, I did not get any barcode file from the sequencing center. I used different barcoded primers for each library when I ran the PCR to make the amplicons. I have the barcode and primer sequences in an Excel file (I got these sequences from the primer supplier, not from the sequencing center). If anything is unclear, let me know!

One of the C files in the photo is a zip file (quality control). Ignore it!

My question is about the command below.
You can find the command at this link:
https://docs.qiime2.org/2019.1/tutorials/atacama-soils/

qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column BarcodeSequence \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --p-rev-comp-mapping-barcodes

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

There are several parameters in the first command above. The first is the metadata file: it must be a .tsv file, but I do not know how to create it. The second concerns the barcodes, which according to the tutorial must be in a fastq.gz file; I do not have that, but I have the barcode sequences in an Excel sheet. The third is the artifact I already made in the importing step, so that one is OK. The fourth is demux.qza; I do not know whether I have to change its name or not. The fifth is about the mapping barcodes; I have no clue what that is. Lastly, I do not know what the values of these parameters should be, or whether they need to be changed at all:
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv
I just need a small hint to fix the issue!

Most recently I found the page below. It seems to fit my case, but its commands are not as clear as the EMP ones, and there are more options. How should I handle it? Please tell me what to do based on the details I gave above.
https://docs.qiime2.org/2018.2/plugins/available/cutadapt/demux-paired/
I attached some photos to help explain my situation. Maybe they help!
Thanks


(Matthew Ryan Dillon) #12

It sounds to me like you have already demultiplexed data (4 paired-end samples). This means you don’t need barcodes at all — you don’t need a barcode file, or barcodes in your sample metadata file. Barcodes are only used for demultiplexing multiplexed data, and since that step has already been done for you (my guess is your sequencing center did it for you), you don’t need to think about barcodes at all.
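On the metadata file itself, since it was asked about a few times: it is a plain tab-separated text file with one row per sample, and it needs no barcode column for data that is already demultiplexed. Below is a minimal sketch of one way to produce it from a spreadsheet, assuming you export the sheet as CSV first, that the first column header is one QIIME 2 accepts (for example sample-id), and that no field contains commas or quotes. The function name csv_to_metadata_tsv and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a CSV exported from a spreadsheet into a tab-separated
# metadata file.
# Assumptions (not from the thread): no field contains a comma or a quote,
# and the first column header is an accepted ID header such as "sample-id".
csv_to_metadata_tsv() {
    # Drop Windows carriage returns, then swap commas for tabs.
    tr -d '\r' < "$1" | tr ',' '\t'
}

# Usage (hypothetical file names):
#   csv_to_metadata_tsv samples.csv > sample-metadata.tsv
```

The sample IDs in this file have to match the sample IDs used when importing the reads.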

Please import your data using the FASTQ Manifest format. Once your data is imported you can skip demultiplexing and move straight to running demux summarize.
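A minimal sketch of what building the manifest could look like. I have not seen your actual file names, so this assumes each sample is a pair named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz in one directory; the function name make_manifest is made up for illustration. The header row and the import command in the comments follow the paired-end manifest format as I understand it from the 2019.1 importing docs, and Phred33 quality encoding is an assumption you should confirm for your data.

```shell
#!/bin/sh
# Sketch: write a paired-end FASTQ manifest for already-demultiplexed reads.
# Assumption: files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
# (hypothetical naming; adjust the patterns to your real file names).
make_manifest() {
    # The manifest needs absolute file paths.
    dir=$(cd "$1" && pwd)
    echo "sample-id,absolute-filepath,direction"
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue          # no matching files at all
        base=$(basename "$fwd")
        sample=${base%_R1.fastq.gz}
        rev="$dir/${sample}_R2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "warning: no reverse file for $sample, skipping" >&2
            continue
        fi
        echo "$sample,$fwd,forward"
        echo "$sample,$rev,reverse"
    done
}

# Usage (not run here; paths are placeholders):
#   make_manifest /path/to/raw-data > manifest.csv
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.csv \
#     --output-path demux.qza \
#     --input-format PairedEndFastqManifestPhred33
#   qiime demux summarize \
#     --i-data demux.qza \
#     --o-visualization demux.qzv
```

With four libraries this should give a manifest of nine lines: the header plus one forward and one reverse line per library.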

Another question that you will need to be able to answer before moving on to denoising is whether or not your primers/barcodes have been removed from the reads. Please consult your sequencing center for that if you are unsure - it is not possible for us or QIIME 2 to answer that question for you.
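A rough sanity check you can run yourself in the meantime; it is only a hint and not a substitute for confirming with the sequencing center. The sketch counts how many reads in a fastq.gz file begin with a sequence you supply, such as a barcode followed by its primer from your Excel sheet. The function name count_prefix_reads is hypothetical, and the match is exact, so degenerate (IUPAC) bases in a primer are not handled.

```shell
#!/bin/sh
# Sketch: count reads in a gzipped FASTQ file that start with a given sequence.
# Prints "<reads starting with the sequence> <total reads>".
# Assumptions: standard 4-line FASTQ records; exact string match only
# (no IUPAC ambiguity codes, no mismatches allowed).
count_prefix_reads() {
    gzip -dc "$1" | awk -v p="$2" '
        NR % 4 == 2 {                 # line 2 of each record is the sequence
            total++
            if (index($0, p) == 1) hit++
        }
        END { printf "%d %d\n", hit, total }
    '
}

# Usage (hypothetical file name and sequence):
#   count_prefix_reads library1_R1.fastq.gz ACGTACGTGTGCCAGC
```

If nearly every read starts with the barcode and primer, they are most likely still in the reads; if almost none do, they were probably removed or sit elsewhere in the read.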


I strongly encourage you to spend an hour or two reading through the documentation for QIIME 2. Many people have volunteered countless hours towards crafting efficient, comprehensive docs, and all of the questions asked here are answered in greater detail there.


(Meha) #14

I spent three days reading the whole tutorial.
I imported my data with the Casava paired-end format. In the next step, demultiplexing, I need to work with the Casava 1.8 paired-end format, but there is only the EMP format.

My libraries have not been demultiplexed. I have only completed the importing stage, that is all. By the way, each library includes treated and untreated samples, so I definitely have to demultiplex them!
I am at the demultiplexing step. My data are Casava 1.8, not EMP.
I need a demultiplexing command for Casava 1.8. Judging from what the Linux terminal shows, there is no such plugin. You told me to follow the FASTQ manifest format, but for the demultiplexing step the docs offer only these options:

“Read more about demultiplexing and give it a spin with the moving pictures tutorial (for single-end data) and Atacama soils tutorial (for paired-end data). Those tutorials cover EMP format data (as described in the importing docs). Have barcodes and primers in-line in your reads? See the cutadapt tutorials for using the demux methods in q2-cutadapt . Have dual-indexed reads or mixed-orientation reads or some other unusual format? Pray hard :pray:. Then check out the QIIME 2 forum to see if someone has found a workaround.”

I did not find any demultiplexing plugin for Casava 1.8 paired-end!
Should I follow the EMP paired-end demultiplexing instead? If so, how would that be valid when I started with Casava 1.8 paired-end?

The only tutorials available are Moving Pictures and Atacama, nothing more!
I am confused!


(Matthew Ryan Dillon) #15

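CASAVA 1.8 is already demultiplexed. As I said above, once your data is imported you can skip demultiplexing and move straight to running demux summarize.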
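To illustrate why Casava 1.8 data counts as demultiplexed: in that format the sample is already encoded in each file name, with underscore-separated fields for the sample identifier, the barcode (or a barcode identifier), the lane, the read direction, and the set number. Here is a small sketch that splits such a name into its fields, using the example name from the QIIME 2 importing docs. The function name parse_casava_name is hypothetical, and the sketch assumes the sample identifier itself contains no underscore.

```shell
#!/bin/sh
# Sketch: split a Casava 1.8 style file name into its five fields.
# Assumption: the sample identifier contains no underscore.
parse_casava_name() {
    name=$(basename "$1" .fastq.gz)
    old_ifs=$IFS
    IFS=_
    set -- $name          # split the name on underscores
    IFS=$old_ifs
    if [ "$#" -ne 5 ]; then
        echo "not a Casava 1.8 style name: $name" >&2
        return 1
    fi
    echo "sample=$1 barcode=$2 lane=$3 read=$4 set=$5"
}

# Usage:
#   parse_casava_name L2S357_15_L001_R1_001.fastq.gz
```

Each sample has its own R1 and R2 file, so there is nothing left to split by barcode.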


(Matthew Ryan Dillon) split this topic #16

An off-topic reply has been split into a new topic: Can’t Find my Metadata

Please keep replies on-topic in the future.


(Matthew Ryan Dillon) closed #17