By looking at the sequence information, can I determine whether the barcodes are still attached to the reads in these files, and whether I need to remove them with a demultiplexing step?
Do these 13 pairs of files correspond to your 13 samples? If you have separate files for separate samples, then your data has already been demultiplexed!
If your data has already been demultiplexed, you might not have to remove barcodes at all. If you do find you need to remove barcodes, you could import your data and then use the cutadapt plugin, which can remove them for you.
Yes, I have 13 pairs of files, so there are 13 forward-read files and 13 reverse-read files. If I understand correctly (please correct me if I am wrong), since I have separate forward and separate reverse read files, my files are already demultiplexed.
I have uploaded an image showing all 13 pairs of files.
I know demultiplexers remove the barcodes and everything preceding them. If my data are already demultiplexed and I demultiplex them again in q2, will that affect my data?
Nope. You will not be able to demultiplex twice, so you can't hurt your data.
While you do not need to demultiplex your data (because it is already demultiplexed!), you still need to import your data into QIIME 2. That's why I recommended the Fastq Manifest format.
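If it helps, here is a minimal Python sketch for generating a paired-end manifest from a list of file paths. It assumes the tab-separated manifest layout with the columns `sample-id`, `forward-absolute-filepath`, and `reverse-absolute-filepath`, and it assumes your files are named like `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`. The function name `build_manifest` and the file-name pattern are only illustrative, so adjust the pattern to match your own files and check the importing docs for the manifest variant your QIIME 2 version expects.

```python
import re
from pathlib import PurePosixPath

# Assumed naming convention: <sample>_R1.fastq(.gz) / <sample>_R2.fastq(.gz).
# Change this pattern if your files are named differently.
READ_PATTERN = re.compile(r"(.+)_R([12])\.fastq(\.gz)?$")

HEADER = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"


def build_manifest(fastq_paths):
    """Return tab-separated manifest text for paired-end FASTQ files.

    fastq_paths: iterable of absolute paths (strings), one per FASTQ file.
    """
    pairs = {}
    for path in sorted(fastq_paths):
        match = READ_PATTERN.match(PurePosixPath(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        sample_id, read = match.group(1), match.group(2)
        pairs.setdefault(sample_id, {})[read] = path

    lines = [HEADER]
    for sample_id in sorted(pairs):
        reads = pairs[sample_id]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample_id!r} is missing a forward or reverse file")
        lines.append(f"{sample_id}\t{reads['1']}\t{reads['2']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example = [
        "/data/s1_R1.fastq.gz", "/data/s1_R2.fastq.gz",
        "/data/s2_R1.fastq.gz", "/data/s2_R2.fastq.gz",
    ]
    print(build_manifest(example), end="")
```

With 13 samples you should end up with one header line plus 13 rows, one per forward/reverse pair.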
Let me know what you try next and if you have any questions!
Colin
After removing the primers, I ran demux summarize and got the following: trimmed-remove-primers.qzv (294.6 KB)
Based on the quality plot in trimmed-remove-primers.qzv, I am planning to run dada2 with --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 200 --p-trunc-len-r 200
Those look great! Looks like trimming worked very well.
That makes sense to me!
Sometimes dada2 likes longer areas of overlap, and your quality is pretty good. You could also try some higher settings, like --p-trunc-len-f 220 --p-trunc-len-r 220
or maybe even --p-trunc-len-f 240 --p-trunc-len-r 220
After feature table construction, I got the following: rep-seqs-trimmed.qzv (296.1 KB)
I am wondering: if I truncated the sequences at 200, why am I getting a minimum length of 249 and a maximum length of 312 in the sequence length statistics table? I expected both values to be 200 (in the Moving Pictures tutorial, the lengths matched the truncation value, i.e. 120).
(I've merged your new thread into your old thread because I think these questions are related.)
Dada2 truncates and trims reads before they are merged.
But this table lists read lengths after they are merged.
So before merging, all the reads are 200 bp:
R1 ====================>
R2 <====================
But then after merging, they could be different total lengths depending on how much they overlap between reads.
R1 ====================>
R2      <====================
M: ========================== 250 bp after merging from 150 bp overlap
R1 ====================>
R2           <====================
M: =============================== 300 bp after merging from 100 bp overlap
So different lengths of overlap cause different read lengths.
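To put numbers on this, here is a small Python sketch of the arithmetic. It is a simplified model, not what dada2 does internally: it assumes nothing is trimmed from the left of either read (as with your trim-left settings of 0) and that the reads overlap without overhanging each other. The function names are just for illustration.

```python
def merged_length(trunc_len_f, trunc_len_r, overlap):
    """Length of a merged read under a simple model:
    forward length + reverse length - overlapping bases."""
    if overlap < 0 or overlap > min(trunc_len_f, trunc_len_r):
        raise ValueError("overlap must be between 0 and the shorter read length")
    return trunc_len_f + trunc_len_r - overlap


def overlap_for(trunc_len_f, trunc_len_r, merged):
    """Overlap implied by an observed merged length (same simple model)."""
    return trunc_len_f + trunc_len_r - merged


# The two cases drawn above, with both reads truncated at 200 bp.
print(merged_length(200, 200, 150))  # 250
print(merged_length(200, 200, 100))  # 300

# The minimum and maximum lengths reported in the sequence length table.
for observed in (249, 312):
    print(observed, overlap_for(200, 200, observed))  # 249 -> 151, 312 -> 88
```

So under this model, the 249 bp sequences came from reads overlapping by about 151 bp, and the 312 bp sequences from reads overlapping by about 88 bp.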