Dada2 vs Qiime2 plugin Dada2 on Quality score

Zentoo · June 29, 2021, 9:17am

Hi, I'm quite new to bioinformatics and am working with paired-end FASTQ files (*.fastq.gz) that I think were already demultiplexed by the sequencing facility.

I imported these files using a manifest file and ran demux summarize to check the quality, using the commands below:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.csv \
  --input-format PairedEndFastqManifestPhred33 \
  --output-path demux-paired-end.qza

(Another quick question here: I tried to use Phred33v2 with a TSV file, but it gave the error message below:

Traceback (most recent call last):
  File "/home/cbjeong/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/util.py", line 90, in parse_format
    format_record = pm.formats[format_str]
KeyError: 'PairedEndFastqManifestPhred33v2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cbjeong/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 157, in import_data
    artifact = qiime2.sdk.Artifact.import_data(type, input_path,
  File "/home/cbjeong/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/result.py", line 206, in import_data
    view_type = qiime2.sdk.parse_format(view_type)
  File "/home/cbjeong/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/util.py", line 92, in parse_format
    raise TypeError("No format: %s" % format_str)
TypeError: No format: PairedEndFastqManifestPhred33v2

An unexpected error has occurred:

  No format: PairedEndFastqManifestPhred33v2

See above for debug info.
)

and then

qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization paired-end-demux.qzv

These are the quality plots:

The Q score drops so rapidly that it seems unusual, so I also ran Dada2 in R.

And the Q scores for the forward reads of my seven samples were as below:

To me, the results from Dada2 in R look more reliable.
I am not sure what I did wrong in Qiime2.
And is it correct to obtain only two representative plots for forward and reverse reads even though I imported 14 FASTQ files?
Any comments or advice would be highly appreciated!
Please help.

ChrisKeefe · June 29, 2021, 4:13pm

Welcome to the forum, @Zentoo! Let's address this guy before we get into any of your other questions. What version of QIIME 2 are you using? (running qiime --version in an active QIIME 2 env should get you that information.)

ChrisKeefe · June 29, 2021, 5:52pm

@Zentoo, after taking another look at this post, I've decided to reclassify it as User Support for now. I'm wondering whether your general questions will resolve themselves when we troubleshoot your error message.

From your error traceback, it looks like you're running version 2021.4, so the V2 manifest formats should be available. That leaves the possibility that PairedEndFastqManifestPhred33v2 doesn't exist because it is spelled incorrectly. Can you spot the error?

How did you work around the error message in order to produce the quality plots you show above?

Zentoo · June 30, 2021, 5:50am

I really appreciate your advice.

Yes, the version of my Qiime2 is 2021.4.
And I think there was no typo in my commands as you can see below.

What I did was use Phred33 version 1 with a manifest CSV file. Those quality plots were generated with the PairedEndFastqManifestPhred33 format.
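
For reference, a version 1 manifest is a CSV with one row per file, while the V2 paired-end layout is tab-separated with one row per sample. A minimal Python sketch of converting one to the other follows. `v1_csv_to_v2_tsv` is a hypothetical helper, the sample ID and file paths are made up, and the column names are assumed to match the manifest headers described in the QIIME 2 importing documentation.

```python
import csv
import io


def v1_csv_to_v2_tsv(csv_text):
    """Convert a paired-end V1 manifest (CSV text) into the V2 layout (TSV text)."""
    paths = {}  # sample-id -> {"forward": path, "reverse": path}
    for row in csv.DictReader(io.StringIO(csv_text)):
        paths.setdefault(row["sample-id"], {})[row["direction"]] = row["absolute-filepath"]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"])
    for sample_id, by_direction in paths.items():
        # A KeyError here means a sample is missing its forward or reverse file.
        writer.writerow([sample_id, by_direction["forward"], by_direction["reverse"]])
    return out.getvalue()


# Hypothetical sample name and file paths, for illustration only.
example_csv = (
    "sample-id,absolute-filepath,direction\n"
    "S1,/data/S1_R1.fastq.gz,forward\n"
    "S1,/data/S1_R2.fastq.gz,reverse\n"
)
print(v1_csv_to_v2_tsv(example_csv))
```

The sketch works on text rather than files so it can be tried without touching real data; writing the returned string to a .tsv file would give a manifest in the V2 layout.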

I would also like to note that the FASTQ files contain primer sequences, which I don't think matters for the quality scores.

And again, is it right that the Dada2 in Qiime2 produces only two representative Q plots for forward and reverse reads?

Thank you for your help!

ChrisKeefe · June 30, 2021, 3:34pm

There was a typo, in the name of the format mentioned in the error message.
PairedEndFastqManifestPhred33v2 != PairedEndFastqManifestPhred33V2
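
Format names are matched exactly, including letter case, so a lowercase "v" is enough to produce "No format". A minimal Python sketch of catching that kind of slip before running the import is below. The list of names is a small hand-picked subset rather than the full QIIME 2 registry, `suggest_format` and `build_import_command` are hypothetical helpers (not part of QIIME 2), and `manifest.tsv` is an assumed file name.

```python
import difflib
import shlex

# A small, hand-picked subset of manifest format names (not the full registry).
KNOWN_FORMATS = [
    "PairedEndFastqManifestPhred33",
    "PairedEndFastqManifestPhred33V2",
    "PairedEndFastqManifestPhred64",
    "PairedEndFastqManifestPhred64V2",
    "SingleEndFastqManifestPhred33V2",
]


def suggest_format(name, known=KNOWN_FORMATS):
    """Return `name` if it is known; otherwise the closest known name, or None."""
    if name in known:  # exact, case-sensitive match
        return name
    # Prefer a name that differs only in letter case.
    for candidate in known:
        if candidate.lower() == name.lower():
            return candidate
    matches = difflib.get_close_matches(name, known, n=1)
    return matches[0] if matches else None


def build_import_command(manifest_path, output_path, input_format):
    """Build (but do not run) the import command used earlier in this thread."""
    return [
        "qiime", "tools", "import",
        "--type", "SampleData[PairedEndSequencesWithQuality]",
        "--input-path", manifest_path,
        "--input-format", input_format,
        "--output-path", output_path,
    ]


typed = "PairedEndFastqManifestPhred33v2"
fixed = suggest_format(typed)
print(f"typed: {typed}")
print(f"fixed: {fixed}")
# shlex.join quotes the --type value so the printed line is safe to paste into a shell.
print(shlex.join(build_import_command("manifest.tsv", "demux-paired-end.qza", fixed)))
```

The command is only printed, not executed, so the sketch runs without a QIIME 2 environment.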

Yes, as is described below the plots you screenshotted, that visualization randomly subsamples reads from all of your samples to summarize your data's quality.

Not sure what you mean by this. I suspect that if there are still primers in your sequences, their bases will also have quality scores assigned to them.

Without a key, it's hard to interpret the plots you shared, but it looks like the middle row of samples all have steep drops in data quality for some of the reads. Those are your deepest samples, and are likely to be well-represented in the subsampled plots provided by demux summarize. Unless I'm misreading, these plots are saying roughly the same thing.
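
To illustrate why deep samples tend to dominate a pooled subsample, here is a toy Python sketch. It is not how demux summarize is implemented; it assumes reads from all samples are pooled before sampling, uses made-up quality strings, and the function names are hypothetical.

```python
import random
from statistics import median


def phred33_scores(quality_line):
    """Decode one FASTQ quality line (Phred+33 encoding) into integer Q scores."""
    return [ord(ch) - 33 for ch in quality_line]


def pooled_subsample(reads_by_sample, n, seed=0):
    """Pool reads from all samples, then draw up to n of them at random."""
    pool = [(sample, q) for sample, quals in reads_by_sample.items() for q in quals]
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


def per_position_median(quality_lines):
    """Median Q score at each position, over the reads that reach that position."""
    longest = max(len(q) for q in quality_lines)
    medians = []
    for pos in range(longest):
        column = [ord(q[pos]) - 33 for q in quality_lines if len(q) > pos]
        medians.append(median(column))
    return medians


# Toy data: one deep sample whose quality drops, one shallow sample that stays high.
reads = {
    "deep": ["IIII##"] * 90,
    "shallow": ["IIIIII"] * 10,
}
picked = pooled_subsample(reads, n=50)
share_deep = sum(1 for sample, _ in picked if sample == "deep") / len(picked)
print(f"share of subsampled reads from 'deep': {share_deep:.2f}")
print(per_position_median([q for _, q in picked]))
```

Because most of the pooled reads come from the deep sample, its late-position drop shows up in the summary even though the shallow sample has no drop at all.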

ChrisKeefe · June 30, 2021, 3:45pm

@Zentoo , I think you have a valid General discussion question here (which I don't know how to answer), and that is "Why does my sequence quality drop suddenly around 135 bp?" That question is obscured in this topic by a) the error message and b) your claim that DADA2 and q2-dada2 are showing different results.

I'd be happy to continue discussing plot interpretation and/or error messages here, but I'd like to encourage you to open a new topic in General Discussion that is focused on that core question. If you do, please include information about sequencing technology used, amplicons targeted, sample type, etc. that will help the community understand the relevant context. Hopefully someone here has seen your situation before and has useful feedback.

best of luck!
Chris

Zentoo · June 30, 2021, 5:43pm

Thank you very much for the answers. Now I found what I misunderstood.

If you don't mind, I would like to ask one more question.
Given the quality scores of my sequence data, what would you recommend for the trunc values?
The sequences need to be trimmed to remove the primer sequences, which are 17 bp long, and the expected amplicon size is 320 bp.

Many thanks for your kind help!

ChrisKeefe · June 30, 2021, 6:50pm

@Zentoo, you're the only person who can make the right choices for your study about trimming and truncation parameters. There are lots of great discussions of how to set them on this forum, and I'd encourage you to take advantage of the search function if you're looking for more information on how others make those choices. The DADA2 paper is also a useful resource.
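
One piece of arithmetic that comes up in those discussions is whether truncated forward and reverse reads still overlap enough to be merged. The Python sketch below is only a rough check under stated assumptions, not a recommendation: it assumes primers are removed from the 5' end of both reads, that truncation positions are counted on the untrimmed reads, that the 320 bp amplicon length includes both 17 bp primers, and that at least 12 bases of overlap are needed (the default `minOverlap` of DADA2's `mergePairs`). The function name and the candidate truncation pairs are hypothetical.

```python
MIN_OVERLAP = 12  # assumed requirement; DADA2's mergePairs uses minOverlap=12 by default


def merge_overlap(trunc_len_f, trunc_len_r, amplicon_len, primer_len=17,
                  amplicon_includes_primers=True):
    """Bases of overlap left between the truncated, primer-trimmed reads.

    A negative result means the two reads no longer meet in the middle.
    """
    if amplicon_includes_primers:
        insert_len = amplicon_len - 2 * primer_len
    else:
        insert_len = amplicon_len
    kept_f = trunc_len_f - primer_len  # forward bases left after trimming the primer
    kept_r = trunc_len_r - primer_len  # reverse bases left after trimming the primer
    return kept_f + kept_r - insert_len


# Hypothetical truncation pairs, chosen only to show the arithmetic.
for trunc_f, trunc_r in [(135, 135), (180, 160), (200, 140)]:
    overlap = merge_overlap(trunc_f, trunc_r, amplicon_len=320)
    verdict = "enough" if overlap >= MIN_OVERLAP else "too little"
    print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r} overlap={overlap} ({verdict})")
```

If the 320 bp figure excludes the primers, passing `amplicon_includes_primers=False` shifts every result down by 34 bases, which is why it matters to pin that detail down first.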

If, after researching, you have some more specific questions, please create new topics for them. We try to keep our discussions here to one question per topic.

Thanks, and good luck!
Chris

Zentoo · July 2, 2021, 2:14pm

Thank you so much for your answers!

Good luck!
