Interactive quality plot

Farzad_Aslani · March 12, 2020, 9:53am

Hi,
I ran the "qiime cutadapt demux-paired" plugin to split my Illumina data, which are multiplexed paired-end sequences with in-sequence barcodes, into samples. After visualizing the output (demultiplexed01B.qza), I got the attached interactive plots. I have two questions regarding these plots:

The shape of the box plots does not look like a normal interactive quality plot. The quality of the sequences does not decrease toward the later positions as it usually does.
Most of the boxes are red instead of blue, with this warning: (The plot at position 127 was generated using a random sampling of 9998 out of 19678677 sequences without replacement. This position (127) is greater than the minimum sequence length observed during subsampling (29 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity). Although I have read the comments made on the forum about this message, I still do not know the reason. Is it a bug, or is there an error in my data or demultiplexing process?
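
To illustrate what the warning describes, here is a minimal sketch in Python (it is not the code QIIME 2 uses). When reads in the subsample differ in length, every position beyond the shortest sampled read is covered by only part of the subsample, which is the situation the message flags. The function name `coverage_by_position` and the example read lengths are made up for illustration.

```python
def coverage_by_position(read_lengths, positions):
    """Return {position: number of reads long enough to cover it}.

    Positions are 1-based, so a read of length n covers positions 1..n.
    """
    return {pos: sum(1 for n in read_lengths if n >= pos) for pos in positions}


# Hypothetical subsample: most reads are near full length, one is very short.
sampled_lengths = [250, 250, 240, 231, 29]
shortest = min(sampled_lengths)

coverage = coverage_by_position(sampled_lengths, [1, 29, 30, 127, 250])
for pos, n_reads in coverage.items():
    # Any position past the shortest read is based on a partial sample.
    status = "partial sample" if pos > shortest else "all sampled reads"
    print(pos, n_reads, status)
```

In this toy example a single 29-base read is enough to mark every position from 30 onward as based on a partial sample, even though most reads still cover those positions.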

Thank you very much in advance
Farzad

colinbrislawn · March 12, 2020, 2:23pm

Hello Farzad,

Welcome to the Qiime forums!

I think I know what's going on here, though I'm not sure of its root cause. Here was the biggest clue:

These plots show forward and reverse reads of 250 bp each, which is a common Illumina run length. However... why is it 9998 and not the default of 10,000 reads?

Can you scroll down on that page a bit and post your Demultiplexed sequence length summary?

Thanks,
Colin

Farzad_Aslani · March 13, 2020, 1:14am

Hi Colin,
Thanks for your response.
This is the summary requested.

=== Summary ===

Total read pairs processed: 11,871,738
Read 1 with adapter: 9,842,281 (82.9%)
Read 2 with adapter: 9,842,281 (82.9%)
Pairs that were too short: 40 (0.0%)
Pairs written (passing filters): 23,743,396 (200.0%)

Total basepairs processed: 5,935,869,000 bp
Read 1: 2,967,934,500 bp
Read 2: 2,967,934,500 bp
Total written (filtered): 5,704,439,571 bp (96.1%)
Read 1: 2,851,694,857 bp
Read 2: 2,852,744,714 bp
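
A quick arithmetic check of the figures above, as a sketch in Python. It suggests the 200.0% is a counting artifact: the "pairs written" figure equals exactly twice the number of pairs that were not too short, which would be consistent with each pair being counted once per read. That interpretation is an assumption drawn from the arithmetic, not something the log states.

```python
# Figures copied from the cutadapt summary above.
total_pairs = 11_871_738
too_short_pairs = 40
pairs_written_reported = 23_743_396
total_bp = 5_935_869_000
read1_bp, read2_bp = 2_967_934_500, 2_967_934_500
written_bp = 5_704_439_571
written_read1_bp, written_read2_bp = 2_851_694_857, 2_852_744_714

# Pairs left after dropping the ones that were too short.
surviving_pairs = total_pairs - too_short_pairs

# Is the reported "pairs written" exactly double the surviving pairs?
doubled = pairs_written_reported == 2 * surviving_pairs

# Average input read length per direction (bases per read).
read1_len = read1_bp // total_pairs
read2_len = read2_bp // total_pairs

# Do the per-read written totals add up to the overall written total?
written_consistent = written_read1_bp + written_read2_bp == written_bp

# Share of bases kept after trimming, in percent.
kept_percent = round(100 * written_bp / total_bp, 1)

print(surviving_pairs, doubled, read1_len, read2_len, written_consistent, kept_percent)
```

The per-read lengths come out at 250 bases each, and about 96% of the bases survive trimming.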

The log file is attached as well: Demultiplex01B.txt (229.6 KB)
Cheers,
Farzad

Farzad_Aslani · March 13, 2020, 2:48am

Sorry, I forgot to attach the demultiplexed sequence length summary.

colinbrislawn · March 13, 2020, 2:55pm

Thanks for sending me the log file and length summary.

I took a look at the log file, and I think this data set is OK. I would suggest continuing on to denoising and seeing whether most of your reads make it through. If you don't discover any issues downstream, then you are good to go!

Sometimes the graphs are strange

If you want to post your stats after filtering, joining, and chimera checking I can take another look to make sure all is well.

Colin

Farzad_Aslani · March 16, 2020, 1:14am

Thank you Colin
I denoised the demultiplexed sequences and got the attached output.
My concerns are 1) the very low percentage of input non-chimeric
and
2) the high variation between samples.
I would appreciate it if you could let me know your view of the output of the denoising stage.

Regards,
Farzad
metadata.tsv (5.2 KB)

colinbrislawn · March 16, 2020, 4:37pm

Hello Farzad,

Thanks for attaching the table!

That table shows the number of reads (and percent of reads) that made it through each stage of dada2 denoising. These stages are:

filtering
merging
chimera filtering

So now I have some questions for you:

In which stage were the most reads removed?
What could cause reads to be removed during this stage?

Answering these questions will tell us what to try next.
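
One way to answer the first question is to compute, for each sample, how large a share of the input reads each stage removed. The following is a minimal sketch in Python. The stage names "input", "filtered", "merged" and "non-chimeric" are assumed to match the columns of the denoising stats table, and the read counts are hypothetical, not taken from the attached file.

```python
def stage_losses(counts):
    """Return the fraction of input reads removed at each stage.

    `counts` maps a stage name to the reads remaining after that stage and
    must contain the keys "input", "filtered", "merged" and "non-chimeric".
    """
    order = ["input", "filtered", "merged", "non-chimeric"]
    total = counts["input"]
    losses = {}
    for before, after in zip(order, order[1:]):
        # Reads lost between two consecutive stages, as a share of the input.
        losses[after] = (counts[before] - counts[after]) / total
    return losses


# Hypothetical sample, for illustration only.
example = {"input": 100_000, "filtered": 40_000, "merged": 5_000, "non-chimeric": 800}
losses = stage_losses(example)
worst_stage = max(losses, key=losses.get)

print(losses)
print(worst_stage)
```

Running this per sample shows at a glance which stage dominates the loss, and whether it is the same stage in every sample.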

Colin
P.S. You are making great progress! Going from fastq files to a dada2 feature table in only 4 days is quite good.

Farzad_Aslani · March 17, 2020, 12:24pm

Hello Colin,

Based on the file attached to the previous post, 50-70% of sequences were removed at the filtering stage and the rest were lost during merging and chimera filtering, so that less than one percent of reads remained in most samples. Unfortunately, I have no idea what caused this major loss of sequences. I tried the deblur method and got a different feature table (a much higher number of features and frequency); please see Deblur stats.csv (9.9 KB).

Thank you for your kind help
Farzad

colinbrislawn · April 24, 2020, 2:39pm

Hello @Farzad_Aslani,

Sorry to keep you waiting. I wasn't really sure what to try next, but I got some advice from the devs and I think I have a way forward!

Try importing only the forward read, then process with dada2 or deblur. This avoids any issues with pairing, and gives you similar settings to balance quality vs quantity of your data.

This should also reduce the number of singletons.
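
As a rough sketch of the "forward read only" idea, the Python below builds a single-end manifest that lists only the forward-read files. It assumes per-sample fastq files are available on disk, that forward reads carry `_R1_` in their file names, and that the sample ID is the part of the name before the first underscore. The function name, the file names, and the header `sample-id` / `absolute-filepath` (my understanding of the single-end manifest layout) are assumptions to check against your own data and the QIIME 2 importing docs.

```python
import os


def build_forward_manifest(fastq_paths, forward_tag="_R1_"):
    """Build tab-separated manifest text from the forward-read files only.

    Assumes (hypothetically) that file names look like
    <sample-id>_..._R1_001.fastq.gz and that the sample ID is everything
    before the first underscore.
    """
    lines = ["sample-id\tabsolute-filepath"]
    for path in sorted(fastq_paths):
        name = os.path.basename(path)
        if forward_tag not in name:
            continue  # skip reverse reads and any unrelated files
        sample_id = name.split("_", 1)[0]
        lines.append(f"{sample_id}\t{os.path.abspath(path)}")
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
paths = [
    "/data/S1_L001_R1_001.fastq.gz",
    "/data/S1_L001_R2_001.fastq.gz",
    "/data/S2_L001_R1_001.fastq.gz",
]
manifest = build_forward_manifest(paths)
print(manifest)
```

The reverse-read file is skipped, so the resulting manifest has one row per sample.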

Let me know if this works well for you, or if you found another way to move forward.

Colin
