For brief context, I received sequencing data back (Illumina MiSeq 2x250) and randomly selected 3 samples (out of 240) to check in FastQC. I noticed a pattern in the reverse reads: there is a major drop in quality at position 6 in the "Per base sequence quality" plot. In the "Per base N content" plot, there is also elevated N content at the same position. I find it a little strange that all 3 randomly selected samples show the same pattern.
I wanted to ask if there is a tool (similar to FastQC) that can analyze and summarize the information for all 240 samples.
I haven't seen this before in other sequencing data that I have received, so I'd also like to understand if this is cause for concern. Thanks!
Because fastq files are the de facto standard for raw sequencing data, there are a lot of tools that summarize and visualize their quality!
Within QIIME 2, there's qiime demux summarize, which makes this output. Click on the tab called Interactive Quality Plot to see a graph similar to the one FastQC makes.
where I can analyze and summarize the information of all 240 samples.
Yes, the QIIME 2 plugin provides summary stats across all the samples you have imported! So if you import 240 samples, you will see the q-score box plots computed from all 240 samples at once!
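If you want to double check the position 6 pattern outside of QIIME 2, here is a minimal sketch in plain Python that pools per-position mean quality and N fraction across any number of fastq files. This is not how qiime demux summarize or FastQC work internally; the function names are made up for illustration, and it assumes standard 4-line fastq records with Phred+33 quality scores.

```python
import gzip
import sys
from collections import defaultdict


def open_fastq(path):
    """Open a FASTQ file as text, handling gzip-compressed files by extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def per_position_stats(paths, phred_offset=33):
    """Return {position: (mean_quality, n_fraction)} pooled over all files.

    Positions are 1-based. Assumes 4-line records and Phred+33 encoding.
    """
    qual_sum = defaultdict(int)    # summed quality scores per position
    n_count = defaultdict(int)     # number of N calls per position
    base_count = defaultdict(int)  # number of reads covering each position

    for path in paths:
        with open_fastq(path) as handle:
            for line_no, line in enumerate(handle):
                line = line.rstrip("\r\n")
                if line_no % 4 == 1:  # sequence line
                    for pos, base in enumerate(line, start=1):
                        base_count[pos] += 1
                        if base in "Nn":
                            n_count[pos] += 1
                elif line_no % 4 == 3:  # quality line
                    for pos, char in enumerate(line, start=1):
                        qual_sum[pos] += ord(char) - phred_offset

    return {
        pos: (qual_sum[pos] / base_count[pos], n_count[pos] / base_count[pos])
        for pos in sorted(base_count)
    }


if __name__ == "__main__":
    # Pass the reverse-read files on the command line.
    stats = per_position_stats(sys.argv[1:])
    for pos, (mean_q, n_frac) in stats.items():
        print(f"{pos}\t{mean_q:.2f}\t{n_frac:.4f}")
```

Running it on all 240 reverse-read files should show whether the dip and the extra Ns at position 6 are shared by the whole run rather than just the 3 samples you happened to pick.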
I've seen this before. This is why plugins like DADA2 allow you to trim bases from the start of the read! Chopping off the first 6 bases from R2 is a good option!
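In QIIME 2 the trimming itself happens inside the DADA2 step (as far as I know, the reverse-read setting for qiime dada2 denoise-paired is --p-trim-left-r), so you don't need to edit the files yourself. Just to make clear what "chopping off the first 6 bases" means, here is a small Python sketch with a made-up helper name that removes the first n bases and their quality scores from each 4-line fastq record.

```python
def trim_left_lines(lines, n=6):
    """Yield FASTQ lines with the first n bases and quality characters removed.

    Assumes 4-line records: header, sequence, separator, quality.
    """
    for line_no, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if line_no % 4 in (1, 3):  # sequence and quality lines only
            line = line[n:]
        yield line + "\n"


# Example: one reverse read whose first 6 positions are low quality.
record = ["@read1\n", "NNNNNNACGT\n", "+\n", "######IIII\n"]
print("".join(trim_left_lines(record, n=6)), end="")
```

The sequence and quality lines are trimmed by the same amount, so they stay the same length as each other, which is what downstream tools expect.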
And thank you for the reminder about the tools already available within QIIME 2! I will definitely check out demux summarize and vsearch fastq-stats.
I saw a post on SEQanswers about MultiQC and AfterQC, which seemed interesting, but I am currently unfamiliar with them.