FastQC compared to QIIME 2 quality plot

thermokarst · November 20, 2018, 9:53pm

Okay, that is really helpful! First, your paired-end plot is based on 13 sequences (!!!), while your single-end plot is based on 555,154. So that is a good first step in tracking down why the plots look so drastically different: the sample sizes behind the box plots differ by orders of magnitude. BTW, I pulled those numbers from the table near the top of the first page of the viz.
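If you want to double-check those counts outside the visualization, one option is to export the demultiplexed artifact and count the records in the per-sample FASTQ files yourself. Here is a minimal sketch, assuming standard 4-line FASTQ records (plain or gzipped); the function names and any file paths you pass in are hypothetical, not part of QIIME 2.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line, ignored
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def count_reads(path: str) -> int:
    """Count records in a plain or gzipped FASTQ file (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in iter_fastq(handle))
```

Summing `count_reads(...)` over all of the exported per-sample files should land in the same ballpark as the totals in the demux summary table.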

Second, looking through the provenance of the visualizations, I noticed that you demuxed the single-end reads with the --p-no-rev-comp-mapping-barcodes flag, while the paired-end reads were demuxed with the --p-rev-comp-mapping-barcodes flag, which completely changes how demultiplexing works. For the paired-end reads, this took the reverse complement of your barcodes before demuxing.
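To make the effect of that flag concrete, here is a toy sketch of exact-match demultiplexing where the mapping-file barcodes are optionally reverse complemented first. This is a simplification and an assumption on my part, not QIIME 2's implementation (the EMP demux methods also do barcode error correction); the sample IDs and barcodes are made up. It shows how the wrong setting can leave almost every read unassigned.

```python
# Complement table for DNA bases (upper and lower case; N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_barcode_lookup(barcodes: dict, rev_comp: bool) -> dict:
    """Map barcode -> sample ID, optionally reverse complementing the
    mapping-file barcodes first (the idea behind rev-comp mapping barcodes)."""
    lookup = {}
    for sample_id, barcode in barcodes.items():
        key = reverse_complement(barcode) if rev_comp else barcode
        lookup[key] = sample_id
    return lookup


def assign_reads(read_barcodes, lookup: dict) -> dict:
    """Count reads per sample using exact matches; unmatched reads are dropped."""
    counts = {}
    for bc in read_barcodes:
        sample = lookup.get(bc)
        if sample is not None:
            counts[sample] = counts.get(sample, 0) + 1
    return counts


# Hypothetical mapping file and barcodes observed on four reads.
barcodes = {"S1": "AACCGT", "S2": "GGTTAC"}
observed = ["AACCGT", "AACCGT", "GGTTAC", "TTTTTT"]
print(assign_reads(observed, build_barcode_lookup(barcodes, rev_comp=False)))
print(assign_reads(observed, build_barcode_lookup(barcodes, rev_comp=True)))
```

With the barcodes used as-is, three of the four reads are assigned; with them reverse complemented, none are. Whichever orientation matches how your barcodes were actually read is the one to use.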

So, before proceeding, it makes sense to determine exactly what you need: do you need to take the reverse complement? It seems like the answer is "no," but that is something to confirm with whoever did the sample prep (or with your sequencing center).

As for your original post, where you compared the quality plots from QIIME 2 and FastQC: what exactly did you plot in FastQC? Was it the original demuxed data, or a single sample? Again, we are missing some critical context needed to help you. For example, for the Moving Pictures dataset, here is the FastQC plot for the full, multiplexed reads:

[Image: FastQC quality plot for the full, multiplexed Moving Pictures reads]

And here is the QIIME 2 demux summary. It is generated from the demultiplexed reads, so reads that could not be assigned because of barcode errors are missing, and its plot is made from a subsample of sequences (10,000 by default):
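The subsampling step is worth keeping in mind when comparing plots: FastQC reads every record, while the QIIME 2 plot is drawn from a subsample (controlled, as far as I know, by the `--p-n` option of `qiime demux summarize`). Below is a generic single-pass uniform subsample (reservoir sampling, "Algorithm R") as an illustration only; I am not claiming q2-demux uses this exact method. Note that when there are fewer reads than `n`, as with the 13 paired-end sequences, the "subsample" is simply all of them.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Draw up to n items uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep item i with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = item
    return reservoir
```

In practice `items` would be the demultiplexed reads (for example, records from a FASTQ parser), and the per-position box plots would be computed from the returned subsample.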

[Image: QIIME 2 demux summary quality plot for the Moving Pictures dataset]

Finally, keep in mind that FastQC bins nucleotide positions together, while QIIME 2 plots a separate box for each nucleotide position.
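To see why binning alone can change how a plot looks, here is a small sketch that collects Phred scores per position (one box per position) and then pools them into fixed-width bins (one box per bin). Assumptions: Phred+33 quality encoding, and simple fixed-width bins; FastQC's real grouping scheme is more elaborate (and, if I recall correctly, its `--nogroup` option turns grouping off, which can make the two plots easier to compare).

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


def per_position(quals: Sequence[str]) -> Dict[int, List[int]]:
    """Collect scores at each 1-based position: one box per position."""
    columns: Dict[int, List[int]] = {}
    for qual in quals:
        for pos, score in enumerate(phred_scores(qual), start=1):
            columns.setdefault(pos, []).append(score)
    return columns


def binned(columns: Dict[int, List[int]], width: int) -> Dict[Tuple[int, int], List[int]]:
    """Pool per-position scores into fixed-width bins: one box per bin."""
    bins: Dict[Tuple[int, int], List[int]] = {}
    for pos in sorted(columns):
        start = ((pos - 1) // width) * width + 1
        bins.setdefault((start, start + width - 1), []).extend(columns[pos])
    return bins


def medians(groups: Dict) -> Dict:
    """Median score for each position or bin."""
    return {key: median(values) for key, values in groups.items()}
```

For two reads with quality string `I#I#` (scores 40, 2, 40, 2), the per-position medians alternate between 40 and 2, but with a bin width of 2 every bin's median is 21.0, so the dips disappear from the binned view.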

Hope that helps!