Using QIIME2 to view 454 pyrosequencing data

perdita · November 18, 2020, 6:28am

Hi everyone, thanks for taking the time to read this.

I have some 454 pyrosequencing data from years ago that was processed by Research and Testing. I would really like to take a look at it in QIIME 2 on an EC2 instance, but I am having a hard time. I converted the .fna and .qual files into a .fastq in QIIME 1 using convert_fastaqual_fastq.py, hoping to use that as a starting point. Before attempting to run anything, though, I want to make sure that my methodology makes sense, and I'm hoping that someone more familiar with these file types could tell me what I'm looking at here.
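For anyone reading along without a QIIME 1 install, here is a minimal sketch of what that conversion step does, in plain Python. It is not the convert_fastaqual_fastq.py implementation; the function names are made up for illustration, and it assumes the .fna and .qual files list the same reads in the same order with identical headers and Phred scores written as space-separated integers.

```python
from itertools import zip_longest


def read_fasta(lines):
    """Yield (header, body) pairs; wrapped body lines are joined with a space."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, " ".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, " ".join(chunks)


def fna_qual_to_fastq(fna_lines, qual_lines, offset=33):
    """Pair FASTA and QUAL records in order and return FASTQ text (Phred+33 by default)."""
    out = []
    pairs = zip_longest(read_fasta(fna_lines), read_fasta(qual_lines))
    for seq_rec, qual_rec in pairs:
        if seq_rec is None or qual_rec is None:
            raise ValueError("the .fna and .qual inputs have different record counts")
        (seq_id, seq), (qual_id, quals) = seq_rec, qual_rec
        if seq_id != qual_id:
            raise ValueError(f"header mismatch: {seq_id!r} vs {qual_id!r}")
        seq = seq.replace(" ", "")  # undo the join used for wrapped sequence lines
        scores = [int(q) for q in quals.split()]
        if len(scores) != len(seq):
            raise ValueError(f"{seq_id}: {len(seq)} bases but {len(scores)} scores")
        qual_string = "".join(chr(s + offset) for s in scores)
        out.append(f"@{seq_id}\n{seq}\n+\n{qual_string}\n")
    return "".join(out)
```

Passing open file handles works as well as lists of lines, for example fna_qual_to_fastq(open("reads.fna"), open("reads.qual")), where both file names are placeholders.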

This is an overview of the RaT (Research and Testing) processing that was done before data delivery, taken from the document that arrived with the samples.

When I look at the raw data I see lines like this: A-03-[Primer 1]::M02233:62:000000000-A9GLW:1:1116:21355:24484 in the sequence identifier / description fields. The same sample identifier shows up in several entries, but with different numbers at the end. It looks like 8 bp barcodes are still present at the 5' end of the sequences (and in the corresponding quality data), while the primers appear to have been removed. Is there an easy way to see whether this is single-end or paired-end? I'm trying to figure out whether it's demultiplexed, and whether I can import this kind of data into QIIME 2 and have any hope of getting meaningful interpretations out of it. From what I can see this isn't EMP or Casava 1.8 data, so should the generated .fastq file (from .fna + .qual) be imported as multiplexed FASTQ data, or with a manifest file?
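One rough way to check the barcode and demultiplexing question is to tally the sample label in each header against the first 8 bases of each read. The sketch below is only illustrative: the function name is hypothetical, it assumes a strict 4-line-per-record FASTQ, and it assumes the text before "::" in the header is the sample label, which is a guess based on the example header above.

```python
from collections import Counter


def summarize_fastq(lines, barcode_len=8, sep="::"):
    """Count header labels (text before `sep`) and leading sequence prefixes."""
    labels = Counter()
    prefixes = Counter()
    records = [line.rstrip("\n") for line in lines]
    # Step through the file one 4-line FASTQ record at a time.
    for i in range(0, len(records) - 3, 4):
        header, seq = records[i], records[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} does not look like a FASTQ header")
        labels[header[1:].split(sep, 1)[0]] += 1
        prefixes[seq[:barcode_len]] += 1
    return labels, prefixes
```

If each label lines up with one dominant prefix, that would suggest the reads have already been assigned to samples and the barcodes were simply left in the sequences. Printing labels.most_common(10) and prefixes.most_common(10) is usually enough to get a feel for it.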

Any thoughts or advice on this would be very welcome. I really appreciate your time.

andrewsanchez · November 18, 2020, 11:19pm

Hi, @perdita!

Please correct me if I am mistaken, but it seems to me that the only relevant portion of this diagram is the upper left corner, which shows that the 454 Sequencer generated .fna/.qual files. Given that you have the .fna/.qual files, and not the results of the rest of that analysis process (right?), I don't think the rest of the diagram is relevant. Does that make sense, or am I missing something?

It appears that the question is not so straightforward. Please see this discussion on biostars.

I need to do a bit more research to answer your other questions confidently. But elsewhere on the forum, people have discussed importing this type of data using a manifest file. For example, see this post: Importing 454 data to run dada2 denoise-pyro - #3 by Mdavrandi

You might also look into using q2-cutadapt for demultiplexing if required.

Also, depending on what you need to do, you might have other options.

perdita · November 19, 2020, 12:01am

Thank you for the reply! You are correct that I am picking up at the .fna/.qual node of the flowchart. However, other versions of this lab's charts include "FASTA / Qual Prepared for QIIME" as an end step from the "Quality Checking / Demultiplexing" node, so I am trying to figure out what I am working with.

From Importing and Demultiplexing Sequence Data Quick Reference, and the QIIME2 Import Tutorial, it appears that I need to know if the data are multiplexed or not.

Where I am at:
Barcodes in Sequence (If they are only at the beginning, does that always imply single-end?)
Multiplexed? I think it is multiplexed, as several sequences share the same Barcode?
Need to strip out Barcodes to a metadata file for QIIME2 import?

Thanks for your time and input.

andrewsanchez · November 20, 2020, 11:46pm

Sounds like you are on the right track, @perdita!

That sounds right to me.

The metadata file can be used to map barcodes to sample IDs, whereas the manifest file maps file paths to sample IDs.
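To make the manifest side of that concrete, here is a small sketch that writes a tab-separated manifest from a sample-to-file mapping. The helper name and the example path are hypothetical, and it assumes you already have one single-end .fastq file per sample. The column names follow the V2 single-end manifest format as I understand it, so please check them against the import tutorial for your QIIME 2 release.

```python
import csv
import io
import os


def build_manifest(sample_to_path):
    """Return manifest text: one row per sample, with an absolute file path."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed header for a single-end V2 manifest; verify for your release.
    writer.writerow(["sample-id", "absolute-filepath"])
    for sample_id, path in sorted(sample_to_path.items()):
        writer.writerow([sample_id, os.path.abspath(path)])
    return buf.getvalue()
```

For example, build_manifest({"A-03": "per_sample/A-03.fastq"}) returns a header row followed by one row for A-03, which can be written to a .tsv file and passed to the import step.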

Hope that helps. Let me know if you still need a hand going forward. Feel free to share the data you're working with (here or in a DM).
