Does anyone know how to work with pyrosequencing reads in QIIME 2?

kevin_SalOrt · August 31, 2022, 8:42pm

Hello friends!

I'm working through an exercise with sequences obtained by pyrosequencing. My first problem was importing them. I used a manifest file with the SingleEndFastqManifestPhred33V2 format, and the import seems to have succeeded.
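For readers following along, here is a minimal sketch of that import step. It builds the manifest text and the `qiime tools import` command without running anything. The sample IDs, file paths, and output names are hypothetical placeholders, and the semantic type assumes single-end reads that are already demultiplexed into one FASTQ file per sample.

```python
import os
import shlex


def manifest_text(samples):
    """Return a tab-separated single-end manifest (V2 layout) as a string.

    `samples` maps sample IDs to FASTQ paths; paths are made absolute
    because the manifest column is named `absolute-filepath`.
    """
    lines = ["sample-id\tabsolute-filepath"]
    for sample_id, path in sorted(samples.items()):
        lines.append(f"{sample_id}\t{os.path.abspath(path)}")
    return "\n".join(lines) + "\n"


def import_command(manifest_path, output_path):
    """Build (but do not run) the import command as an argument list."""
    return [
        "qiime", "tools", "import",
        "--type", "SampleData[SequencesWithQuality]",
        "--input-path", manifest_path,
        "--input-format", "SingleEndFastqManifestPhred33V2",
        "--output-path", output_path,
    ]


if __name__ == "__main__":
    # Hypothetical sample IDs and paths, for illustration only.
    samples = {
        "sample-1": "reads/sample-1.fastq.gz",
        "sample-2": "reads/sample-2.fastq.gz",
    }
    print(manifest_text(samples), end="")
    print(shlex.join(import_command("manifest.tsv", "demux.qza")))
```

Saving the printed manifest as `manifest.tsv` and running the printed command in a QIIME 2 environment should give a `demux.qza` that `qiime demux summarize` can visualize.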

Then, when I generated the visualization, I noticed that the quality scores are displayed differently from Illumina data, and this is my problem: for denoising and then generating the ASVs (the next step, as I understand it, is dada2 denoise-pyro), I need to choose the truncation points based on quality, and I can't make them out in the plot.
My quality scores (from the qzv file):

But that doesn't look like this one (found on the web):

Does anyone know whether the problem comes from how I imported my data?
Or maybe my process has errors?
Does anyone know another way?

thanks

colinbrislawn · September 4, 2022, 4:59pm

Yes, denoise-pyro is the next step!

The quality plot looks like normal Roche 454 / Pyrosequencing data to me; the quality is more variable than Illumina, it's always single end, and the forward read length can be > 300 bp long.

Because that quality plot is not super useful, you may try running with several different settings for --p-max-ee and --p-trunc-len and see what works best.
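A rough sketch of such a sweep is below. It only builds and prints one `qiime dada2 denoise-pyro` command per combination, so nothing runs until you paste the commands into a shell. The truncation lengths, max-ee values, and output names are arbitrary placeholders, not recommendations.

```python
import itertools
import shlex


def denoise_pyro_commands(demux, trunc_lens, max_ees, out_dir="denoise-sweep"):
    """Build one denoise-pyro command per (trunc_len, max_ee) pair.

    Commands are returned as argument lists and are not executed.
    `out_dir` is a hypothetical directory that must exist before running.
    """
    commands = []
    for trunc_len, max_ee in itertools.product(trunc_lens, max_ees):
        tag = f"len{trunc_len}-ee{max_ee}"
        commands.append([
            "qiime", "dada2", "denoise-pyro",
            "--i-demultiplexed-seqs", demux,
            # Reads shorter than the truncation length are discarded.
            "--p-trunc-len", str(trunc_len),
            # Reads with more expected errors than this are discarded.
            "--p-max-ee", str(max_ee),
            "--o-table", f"{out_dir}/table-{tag}.qza",
            "--o-representative-sequences", f"{out_dir}/rep-seqs-{tag}.qza",
            "--o-denoising-stats", f"{out_dir}/stats-{tag}.qza",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder grid; pick values that suit your own read lengths.
    for cmd in denoise_pyro_commands("demux.qza", [250, 300, 350], [2.0, 3.0, 5.0]):
        print(shlex.join(cmd))
```

Comparing the denoising stats from each run (for example with `qiime metadata tabulate`) shows how many reads survive filtering under each setting.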

gregcaporaso · September 6, 2022, 6:30pm

Hi @kevin_SalOrt,
Here's a quality plot from the Human Microbiome Project 454 data - I've recently been trying to work out a 454 workflow for this.

It has the features that @colinbrislawn notes: more variability, longer reads (but pretty low quality beyond about 300 bases), and single end (I didn't show the reverse reads section of the viz in that screenshot since nothing is there).

It's hard to tell what's going on in your plot - it'll be more informative if you can zoom in a bit (click and drag on the plot to select an area to zoom in on). Whether your workflow will work will depend a lot on how the data was prepared prior to QIIME 2. How did you get from the 454 files (e.g., sff or fasta/qual) to fastq? There are various conversion tools out there. It looks like your scores are in the appropriate range though, so you may be good here, but I recommend inspecting those distributions a little more by zooming in and making sure that your pre-QIIME steps are right.
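If zooming in the visualization is awkward, the same distributions can also be checked directly from the converted FASTQ files. The sketch below uses only the Python standard library. It assumes Phred+33 encoding and unwrapped four-line FASTQ records, and the function names are made up for this example. The expected-error sum is the standard per-read quantity that a max-ee style threshold is compared against.

```python
import gzip
import statistics


def read_quality_strings(path):
    """Yield the quality line of each record in a 4-line-per-record FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:
                yield line.rstrip("\n")


def phred33(quality_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def expected_errors(scores):
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores)


def median_quality_by_window(quality_strings, window=50):
    """Median score per window of read positions, keyed by window start."""
    buckets = {}
    for qs in quality_strings:
        for pos, q in enumerate(phred33(qs)):
            buckets.setdefault(pos // window, []).append(q)
    return {b * window: statistics.median(v) for b, v in sorted(buckets.items())}


if __name__ == "__main__":
    # Tiny in-memory demo; use read_quality_strings("your.fastq") on real data.
    demo = ["IIII5555", "II55"]
    print(median_quality_by_window(demo, window=4))
    print(expected_errors(phred33("II55")))
```

Scores far outside roughly 0 to 40 would suggest the sff or fasta/qual conversion used a different encoding than the one assumed at import.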

Also, if you have primers or adapters in your reads, don't forget to trim those. I forgot to do that at first, since I'm used to Illumina protocols that don't sequence these.
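As a sketch of what that trimming step might look like, the snippet below builds a `qiime cutadapt trim-single` command for a 5' primer without running it. The primer sequence and file names are placeholders; substitute the primer actually used for your amplicon.

```python
import shlex


def trim_primer_command(demux, primer, trimmed):
    """Build (but do not run) a cutadapt command that trims a 5' primer."""
    return [
        "qiime", "cutadapt", "trim-single",
        "--i-demultiplexed-sequences", demux,
        # --p-front removes the primer and anything preceding it.
        "--p-front", primer,
        "--o-trimmed-sequences", trimmed,
    ]


if __name__ == "__main__":
    # Placeholder primer sequence, for illustration only.
    cmd = trim_primer_command("demux.qza", "GTGCCAGCMGCCGCGGTAA", "demux-trimmed.qza")
    print(shlex.join(cmd))
```

The trimmed artifact would then take the place of `demux.qza` as the input to denoising.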

I attached a provenance replay generated script from my QIIME 2 analysis of the HMP data so you can roughly see what I did, in case it's helpful. I was comparing against a pre-existing analysis result that used closed-reference OTU picking against Greengenes, so I took those steps in this workflow - that may or may not be something you want to do (avoid it, if you can - ASVs are generally better, but the QC provided by closed-reference OTU picking may be helpful on your data). My 454 workflow is still a work in progress, so definitely just treat this as a potentially useful reference, not an established protocol.

replay.bash (6.4 KB)

kevin_SalOrt · October 7, 2022, 11:49pm

Thanks, it was very helpful.
