Okay, thanks. Yes, the quality was very high with this round of sequencing. However, I lost most of my negative control reads after the DADA2 filtering step.
Awesome. Thanks! Do you think inputting demultiplexed files to QIIME would make any difference? That is, could QIIME fail to assign some reads that the sequencing center would have assigned? I only used the multiplexed files because it was faster to read everything into QIIME at once: just two files (R1 and I1) instead of so many demultiplexed sequence files.
It’s hard to say, probably not. The only difference might occur if different similarity thresholds are set between qiime2 demultiplexing and your sequencing facility. For example, some sequencing facilities allow for 1 mismatch in their barcodes because their barcodes are designed with the Hamming distance between them in mind, so a read with 1 mismatch still cannot be mistaken for a different sample's barcode. I’m guessing in qiime2, mismatches are not allowed, but perhaps @thermokarst or @Nicholas_Bokulich can confirm this. That means that reads with 1 barcode mismatch might be dropped completely in qiime2, whereas they may have been saved with a different algorithm.
Like I said before though, the differences would be so minor that I wouldn’t get caught up with this.
That’s correct — no barcode error correction.
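To make the mismatch point concrete, here is a minimal Python sketch of exact-match versus 1-mismatch barcode assignment. This is only an illustration, not how QIIME 2 or any sequencing facility actually implements demultiplexing; the function names, sample IDs, and barcodes are all made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_barcode, sample_barcodes, max_mismatches=0):
    """Return the sample whose barcode is within max_mismatches of the read.

    Returns None when no barcode qualifies, or when more than one does
    (an ambiguous read). Hypothetical helper, for illustration only.
    """
    hits = [
        sample
        for sample, barcode in sample_barcodes.items()
        if hamming(read_barcode, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Made-up barcodes; real sets are designed so that any two differ at many positions.
sample_barcodes = {"sampleA": "ACGTACGT", "sampleB": "TTGCAAGC"}
observed = "ACGTACGA"  # one sequencing error relative to sampleA

exact = assign_barcode(observed, sample_barcodes, max_mismatches=0)
tolerant = assign_barcode(observed, sample_barcodes, max_mismatches=1)
print(exact, tolerant)
```

With exact matching the read is dropped, while allowing 1 mismatch rescues it for sampleA. That is the kind of small difference in read counts you might see between two demultiplexers.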
I decided to try analyzing my data using paired-end reads from the exact same sequences. Weirdly, even though this is the exact same dataset, the QIIME plot I am getting shows much lower read quality. What you see above is from the forward read only; what I am uploading here is from both reads. They are very different, and even looking only at the dark parts of the plot still shows very low quality. FastQC again shows much higher quality.
Also, out of the 35 samples, only 13 are shown here. I don’t know what happened to the rest.
Hey there @Negin - can you share the original QZV files used to generate the single and paired versions of the demux summary plots? We are missing a lot of critical information when you only provide the box plots.
Wouldn’t that mean that everyone would have access to my sequences if I share them here?
No, not if you share the QZVs.
ah okay. cool then. Here are the qzv files:
Here is the one for single-end
V4-20181012-demux.qzv (287.4 KB)
here is the paired-end
V4-20181012-demux-p.qzv (293.0 KB)
Okay, that is really helpful! First, your paired-end plot is based on 13 sequences (!!!), while your single-end plot is based on 555154… So, that is a good first step in tracking down why the plots look so drastically different (the sample sizes for the box plots differ by orders of magnitude). BTW, I pulled that information from the table near the top of the first page of the viz.
Second, by looking through the provenance of the visualizations, I noted that you demuxed the single-end reads using the --p-no-rev-comp-mapping-barcodes flag, while the paired-end reads used the --p-rev-comp-mapping-barcodes flag, which is going to completely change how the demux process works. In the case of the paired-end reads, this took the reverse complement of your barcodes before demuxing.
So, before proceeding, maybe it makes sense to determine exactly what you need first: do you need to take the reverse complement? It seems like that is a “no”, but it is something you should talk about with whoever did the sample prep (or maybe your sequencing center).
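If it helps to see why that flag matters so much, here is a rough Python sketch that counts how many barcode reads match the mapping-file barcodes as given versus after reverse-complementing them. It is a toy illustration under made-up data, not what q2-demux does internally; the function names and barcodes are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_matches(read_barcodes, mapping_barcodes):
    """Count barcode reads that exactly match any mapping-file barcode."""
    known = set(mapping_barcodes)
    return sum(barcode in known for barcode in read_barcodes)


def compare_orientations(read_barcodes, mapping_barcodes):
    """Compare match counts for both orientations of the mapping barcodes."""
    flipped = [reverse_complement(b) for b in mapping_barcodes]
    return {
        "as_given": count_matches(read_barcodes, mapping_barcodes),
        "reverse_complemented": count_matches(read_barcodes, flipped),
    }


# Made-up example: barcodes from the mapping file and from the index reads.
mapping = ["AAACCC", "AGGTCA"]
index_reads = ["AAACCC", "AGGTCA", "AAACCC", "GGGTTT"]

result = compare_orientations(index_reads, mapping)
print(result)
```

In this toy case most reads match the barcodes as given and almost none match after reverse-complementing. Picking the wrong orientation leaves only a handful of chance matches, which would be consistent with a demux summary built on a tiny number of sequences.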
As for your original post, where you were comparing the quality plots between QIIME 2 and FastQC: what exactly did you plot in FastQC? Was it the original multiplexed data, the demuxed data, or one single sample? Again, we are missing some critical context there to help you. For example, for the Moving Pictures dataset, here is the FastQC plot for the full, multiplexed reads:
And here is the QIIME 2 demux summary, which is generated on the demultiplexed reads (so not all reads will be present, due to barcode errors) and is plotted from a subsample of sequences (by default, 10,000):
Finally, the last thing to consider is that FastQC is binning nt positions, while QIIME 2 is plotting a boxplot for each nt position.
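As a rough sketch of those last two points (subsampling, and per-position summaries versus binned positions), here is a small Python example. It does not reproduce the actual QIIME 2 or FastQC code or FastQC's real binning scheme; the function names, the fixed bin size, and the quality scores are assumptions for illustration.

```python
import random
import statistics


def per_position_medians(quality_rows, n_subsample=10000, seed=0):
    """Median quality at each position, computed on a random subsample of reads."""
    rng = random.Random(seed)
    if len(quality_rows) > n_subsample:
        rows = rng.sample(quality_rows, n_subsample)
    else:
        rows = quality_rows
    # Only summarize positions that every sampled read covers.
    length = min(len(row) for row in rows)
    return [statistics.median(row[i] for row in rows) for i in range(length)]


def binned_means(values, bin_size):
    """Average consecutive positions into bins (a simplified stand-in for binning)."""
    return [
        statistics.mean(values[i:i + bin_size])
        for i in range(0, len(values), bin_size)
    ]


# Made-up Phred scores for three short reads.
reads = [
    [38, 38, 36, 20, 12, 10],
    [36, 38, 34, 22, 14, 8],
    [40, 36, 38, 18, 10, 12],
]

medians = per_position_medians(reads)
binned = binned_means(medians, bin_size=3)
print(medians)
print(binned)
```

The per-position view keeps the sharp drop between adjacent positions visible, while the binned view averages neighbouring positions together, so the two plots can look different even when the underlying reads are the same.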
Hope that helps!
Thank you for all the explanations. This makes sense. As for your second question, I mentioned above that I used the multiplexed file for the FastQC plot but the demultiplexed one for QIIME 2.
For your first note, I am pretty sure I should not take the reverse complement of the barcodes because obviously, what I am getting for the single-end makes more sense in terms of number of samples that I have. I used the paired-end code from the Atacama soil microbiome tutorial and I was not aware that this code is taking the reverse complement. I will try again with the correct code and update you.
Thanks so much
I tried what you said and it worked. Thanks! Just a quick question: how should my metadata change from single-end to paired-end in terms of the LinkerPrimerSequence? I used the V4 primer for read 1 in my metadata when I used single-end. What should I include for the paired-end?
I am not sure that metadata column is actually being used here (i.e., the atacama tutorial), so do not worry about it. Please correct me if I am wrong or if I am contradicting other advice — I have not followed this whole topic thread so am not aware of all commands you are running.
I hope that helps!
But in general, would the LinkerPrimerSequence ever be used? I don’t know how to include both forward and reverse primers in my metadata file if needed.
The LinkerPrimerSequence column was required in QIIME 1 metadata, but is not required in QIIME 2. In fact, many users don’t even wind up knowing what those values are for their runs, since the sequencing center demuxes and prepares their reads for them. If you don’t need it for anything you are doing, then don’t worry about including it.