How to determine whether fastq files are already demultiplexed?

Dear all,

I have 26 fastq files (paired-end: 13 R1 and 13 R2). I am not sure whether my files require a demultiplexing step or not.

When I opened each fastq file, it looks as follows.

The files are attached below:


Bac18-041119-F2-R22_S233_L001_R1_001.fastq.gz (2.2 MB) Bac18-041119-F2-R22_S233_L001_R2_001.fastq.gz (2.5 MB)

By looking at the sequence information, can I determine whether barcodes are still attached to the reads and whether I need to remove them through a demultiplexing step?


Hello @Parul_Baranwal,

Do these 13 pairs of files correspond to your 13 samples? If you have separate files for separate samples, then your data has already been demultiplexed!
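One quick way to double-check from the filenames alone: the `_S233_L001_R1_001` pattern is Illumina's Casava naming convention, where everything before `_S###` is the sample ID. If each sample ID shows up in exactly one R1/R2 pair, the sequencing center already demultiplexed for you. A rough sketch in Python (the regex and the file list are just an illustration, not part of any QIIME 2 tool):

```python
import re

def sample_id(filename):
    """Extract the sample ID from a Casava-style fastq filename,
    e.g. Bac18-041119-F2-R22_S233_L001_R1_001.fastq.gz."""
    m = re.match(r"(.+)_S\d+_L\d{3}_R[12]_001\.fastq\.gz$", filename)
    return m.group(1) if m else None

files = [
    "Bac18-041119-F2-R22_S233_L001_R1_001.fastq.gz",
    "Bac18-041119-F2-R22_S233_L001_R2_001.fastq.gz",
]
# One unique sample ID per R1/R2 pair -> already demultiplexed
print(sorted({sample_id(f) for f in files}))
```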

You could import these 13 samples using a fastq manifest format:
https://docs.qiime2.org/2020.2/tutorials/importing/#fastq-manifest-formats
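For reference, a paired-end manifest (the PairedEndFastqManifestPhred33V2 format from that tutorial) is just a tab-separated file with one row per sample. A sketch of generating one in Python; the `/data/...` paths are placeholders for wherever your files actually live:

```python
import csv, io

# Placeholder absolute paths -- substitute the real locations of your files.
samples = [
    ("Bac18-041119-F2-R22",
     "/data/Bac18-041119-F2-R22_S233_L001_R1_001.fastq.gz",
     "/data/Bac18-041119-F2-R22_S233_L001_R2_001.fastq.gz"),
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
# Column headers required by the PairedEndFastqManifestPhred33V2 format
writer.writerow(["sample-id", "forward-absolute-filepath",
                 "reverse-absolute-filepath"])
writer.writerows(samples)
print(buf.getvalue())
```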


Good question!

If your data has already been demultiplexed, you might not have to remove barcodes at all. If you do find you need to remove barcodes, you could run your reads through the cutadapt plugin, which will remove them automatically.
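To illustrate what barcode removal means at the read level, here is a deliberately simplified sketch (the barcode and read below are made up, and real tools like cutadapt also handle mismatches, anchoring, and quality scores):

```python
def strip_barcode(read, barcode):
    """Drop the barcode and everything preceding it, if the barcode
    occurs in the read -- a toy version of what demultiplexers do."""
    idx = read.find(barcode)
    if idx == -1:
        return read  # barcode not found; leave the read untouched
    return read[idx + len(barcode):]

# Hypothetical 8 nt barcode followed by biological sequence
read = "ACGTACGTTTGACTGGGCGTAAAG"
print(strip_barcode(read, "ACGTACGT"))  # -> TTGACTGGGCGTAAAG
```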

Let us know if you have any other questions!

Colin


P.S. We have been working on a Quick Reference Guide for Demultiplexing Fastq Files. Check it out and let us know if it’s helpful!

  1. Yes, I have 13 pairs of files, so there are 13 forward reads and 13 reverse reads. If I am correct (please correct me if I am wrong), since I have separate forward and reverse reads for each sample, my files are already demultiplexed.
    I have uploaded the image file for all 13 pairs of data.

  2. I know demultiplexers remove the barcode and everything preceding it. Suppose my data are already demultiplexed and I then demultiplex again in q2; will it affect my data?

You are correct!

Nope. You will not be able to demultiplex twice, so you can't hurt your data.


While you do not need to demultiplex your data (because it is already demultiplexed!), you still need to import your data into Qiime. That's why I recommended the Fastq Manifest format.

Let me know what you try next and if you have any questions!
Colin

@colinbrislawn Thank you so much!
I imported my data like this

Is this correct?

When you use qiime demux summarize to look at the quality score, do you see all your sample names in the file? How does the quality look?

I was suggesting the Fastq Manifest format, but if Casava works for your data set, that’s good too!

Qiime 2 is pretty flexible so you can choose whatever works well for your data.

Colin


When I ran demux summarize, I got the following:
demux-paired-end.qzv (289.2 KB)

After removing the primers, when I ran demux summarize, I got the following:
trimmed-remove-primers.qzv (294.6 KB)
Looking at the quality plot for the trimmed-remove-primers.qzv file, I am planning to trim with dada2 using --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 200 --p-trunc-len-r 200

Can you please tell me whether I am on the right track?

Thank you


Those look great! Looks like trimming worked very well.

That makes sense to me!

Sometimes dada2 likes longer regions of overlap, and your quality is pretty good. You could also try some higher settings, like
--p-trunc-len-f 220 --p-trunc-len-r 220
or maybe even
--p-trunc-len-f 240 --p-trunc-len-r 220
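For intuition on why higher truncation values can help: dada2 merges read pairs that overlap by at least about 12 bases (its default minimum), and the overlap left after truncation is roughly the two truncation lengths minus the amplicon length. A back-of-the-envelope check in Python; the 300 bp amplicon length here is an assumption for illustration only, since yours depends on your primer set:

```python
DADA2_MIN_OVERLAP = 12  # dada2's default minimum overlap for merging

def estimated_overlap(trunc_f, trunc_r, amplicon_len):
    """Approximate bases of overlap left after truncating both reads,
    for an amplicon of the given length (primers already removed)."""
    return trunc_f + trunc_r - amplicon_len

# Assumed amplicon length for illustration only -- use your own.
amplicon = 300
for trunc_f, trunc_r in [(200, 200), (220, 220), (240, 220)]:
    ov = estimated_overlap(trunc_f, trunc_r, amplicon)
    print(f"{trunc_f}/{trunc_r}: ~{ov} bp overlap "
          f"({'ok' if ov >= DADA2_MIN_OVERLAP else 'too short'})")
```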

Let me know what you find!

Colin


Thank you for the reply @colinbrislawn
I tried to run the other two settings, but was only able to run one due to limited memory.

When I ran --p-trunc-len-f 200 --p-trunc-len-r 200, I got the following:
stats-dada2-trimmed.qzv (1.2 MB)

When I ran --p-trunc-len-f 220 --p-trunc-len-r 220, I got the following:
stats-dada-trimmed220.qzv (1.2 MB)

I seem to be getting a better non-chimeric yield when truncating at 200 as compared to 220.
Please suggest!

Both 200 and 220 look pretty good. The 200 run appears to retain a larger number of reads, so I would use that one!

Great progress! Raw fastq files to a dada2 feature table in just a week!

Colin


Hello everyone,

I am running 13 paired-end 16S samples.

I got the quality reads as follow:
demux-paired-end.qzv (289.2 KB)

I trimmed the primers using cutadapt and then used DADA2 for trimming and truncating. I used the following command:


I got the following:
stats-dada2-trimmed.qzv (1.2 MB)

After feature table construction, I got the following,
rep-seqs-trimmed.qzv (296.1 KB)

I am wondering: if I truncated the sequences at 200, why am I getting a minimum length of 249 and a maximum length of 312 in the sequence length statistics table? I thought both values should be 200 (just as both equaled the truncation value of 120 in the moving pictures tutorial).

Is the merging not proper?

Please help! Thank you so much in advance!

Hello again,

(I've merged your new thread into your old thread because I think these questions are related.)

Dada2 truncates and trims reads before they are merged.
But this table lists read lengths after they are merged.

So before merging, all the reads are 200 bp

R1 ====================>
R2 <====================

But then after merging, they could be different total lengths depending on how much they overlap between reads.

R1 ====================>
R2      <====================
M: ========================== 250 bp after merging, from a 150 bp overlap

R1 ====================>
R2           <====================
M: =============================== 300 bp after merging, from a 100 bp overlap

So different lengths of overlap cause different read lengths.
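The arithmetic in the diagrams above can be checked directly: merged length = forward read length + reverse read length minus the overlap.

```python
def merged_length(len_f, len_r, overlap):
    """Length of a merged read pair, given the two truncated read
    lengths and how many bases they overlap by."""
    return len_f + len_r - overlap

# Reproduce the two diagrams: 200 bp reads with different overlaps
print(merged_length(200, 200, 150))  # -> 250
print(merged_length(200, 200, 100))  # -> 300
```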

Colin


Thank you again for the instant reply and the explanation!

So do you think the merging is okay and I can continue?


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.