Hi,
I ran "qiime cutadapt demux-paired" to split my Illumina data, which are multiplexed paired-end sequences with in-sequence barcodes, into samples. After visualizing the output (demultiplexed01B.qza), I got the attached interactive plots. I have two questions regarding these plots:
1) The shape of the box plots does not look like a normal interactive quality plot. The quality of the sequences does not decrease toward the later positions of the reads as it usually does.
2) Most of the boxes are red instead of blue, with this message: "The plot at position 127 was generated using a random sampling of 9998 out of 19678677 sequences without replacement. This position (127) is greater than the minimum sequence length observed during subsampling (29 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity." I have read the comments made on the forum regarding this message, but I still do not know the reason for it. Is it a bug, or is there an error in my data or in the demultiplexing process?
I think I know what's going on here, though I'm not sure of its root cause. Here was the biggest clue:
These plots show forward and reverse reads of 250 bp each, which is a common Illumina run length. However... why is it 9998 and not the default of 10,000 reads?
Can you scroll down on that page a bit and post your Demultiplexed sequence length summary?
Hi Colin,
Thanks for your response.
This is the summary requested.
=== Summary ===
Total read pairs processed: 11,871,738
Read 1 with adapter: 9,842,281 (82.9%)
Read 2 with adapter: 9,842,281 (82.9%)
Pairs that were too short: 40 (0.0%)
Pairs written (passing filters): 23,743,396 (200.0%)
Total basepairs processed: 5,935,869,000 bp
Read 1: 2,967,934,500 bp
Read 2: 2,967,934,500 bp
Total written (filtered): 5,704,439,571 bp (96.1%)
Read 1: 2,851,694,857 bp
Read 2: 2,852,744,714 bp
The log file is attached as well: Demultiplex01B.txt (229.6 KB)
Cheers,
Farzad
Thanks for sending me the log file and length summary.
I took a look at the log file, and I think this data set is OK. I would suggest continuing on to denoising and seeing whether most of your reads make it through. If you don’t discover any issues downstream, then you are good to go!
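As a rough cross-check, the figures in the summary posted above can be tested for internal consistency with a few lines of Python. This is only a sketch: the numbers are copied from the summary, the variable names are my own, and it says nothing about per-sample results.

```python
# Figures copied from the cutadapt summary posted above.
total_pairs = 11_871_738
pairs_with_adapter = 9_842_281
read1_bp_in = 2_967_934_500
read2_bp_in = 2_967_934_500
read1_bp_out = 2_851_694_857
read2_bp_out = 2_852_744_714

# Mean input read length: total bases divided by the number of reads.
mean_len_r1 = read1_bp_in / total_pairs
mean_len_r2 = read2_bp_in / total_pairs

# Fraction of bases that were written after trimming and filtering.
bp_in = read1_bp_in + read2_bp_in
bp_out = read1_bp_out + read2_bp_out
retained = bp_out / bp_in

# Fraction of read pairs in which an adapter was found.
matched = pairs_with_adapter / total_pairs

print(f"mean input length: R1 {mean_len_r1:.1f} bp, R2 {mean_len_r2:.1f} bp")
print(f"bases retained: {retained:.1%}")
print(f"pairs with adapter: {matched:.1%}")
```

The mean input length works out to exactly 250 bp for both reads, and the two percentages agree with the 96.1% and 82.9% reported in the summary. That supports the log being self-consistent, but it does not explain the 29-base minimum length reported in the plot message.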
Sometimes the graphs are strange
If you want to post your stats after filtering, joining, and chimera checking I can take another look to make sure all is well.
Thank you Colin
I denoised the demultiplexed sequences and got the attached output.
My concerns are 1) the very low "percentage of input non-chimeric" and 2) the high variation between samples.
I would appreciate it if you could let me know your view of the output of the denoising stage.
Based on the file attached to the previous post, 50-70% of sequences were removed at the filtering stage, and most of the remainder were lost during merging and chimera filtering, so that less than one percent of reads remained in most samples. Unfortunately, I have no idea what caused this major loss of sequences. I also tried the deblur method and got a different feature table (a much higher number of features and a much higher frequency); please see Deblur stats.csv (9.9 KB).
Sorry to keep you waiting. I wasn’t really sure what to try next, but I got some advice from the devs and I think I have a way forward!
Try importing only the forward read, then process with dada2 or deblur. This avoids any issues with pairing, and gives you similar settings to balance quality vs quantity of your data.
This should also reduce the number of singletons.
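One way to set up a forward-only import is to write a single-end manifest that lists just the R1 files. The sketch below rests on several assumptions that are not from this thread: that the demultiplexed reads have been exported to a directory, that the files use Casava-style names such as sampleA_0_L001_R1_001.fastq.gz, and that the manifest uses the tab-separated "sample-id" / "absolute-filepath" header of QIIME 2's V2 single-end manifest format. The function name is hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed Casava-style forward-read name, e.g. "sampleA_0_L001_R1_001.fastq.gz".
FORWARD_RE = re.compile(r"^(?P<sample>.+)_[^_]+_L\d{3}_R1_001\.fastq\.gz$")


def write_forward_manifest(fastq_dir, manifest_path):
    """Write a single-end manifest listing only the R1 files in fastq_dir."""
    rows = []
    for path in sorted(Path(fastq_dir).iterdir()):
        match = FORWARD_RE.match(path.name)
        if match:  # reverse reads (R2) and any other files are skipped
            rows.append((match.group("sample"), str(path.resolve())))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header assumed from the V2 single-end manifest format.
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
    return rows
```

The resulting file could then be passed to "qiime tools import" with a single-end manifest input format before running dada2 or deblur. Check the sample IDs it produces against your metadata first, since the filename pattern here is a guess.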
Let me know if this works well for you, or if you find another way to move forward.