Hi everyone, thanks for taking the time to read this.
I have some 454 pyrosequencing data from years ago that was processed by Research and Testing. I would really like to take a look at it in QIIME 2 on an EC2 instance, but I am having a hard time. I converted the .fna and .qual files into a .fastq file in QIIME 1 using convert_fastaqual_fastq.py, hoping to use that as a starting point. Before attempting to run anything, though, I want to make sure that my methodology makes sense, and I'm hoping that someone more familiar with these file types can tell me what I'm looking at here.
This is an overview of the Research and Testing (RaT) processing that was done before data delivery, taken from the document that arrived with the samples.
When I look at the raw data I see lines like this: A-03-[Primer 1]::M02233:62:000000000-A9GLW:1:1116:21355:24484 in the sequence identifier / description fields. The same sample identifier shows up in several entries, but with different numbers at the end. It looks like 8 bp barcodes are still present at the 5' end of the sequences (and in the matching quality data), and that the primers were removed; is that right? Is there an easy way to see whether this is single-end or paired-end? I'm trying to figure out whether it's demultiplexed, and whether I can import this kind of data into QIIME 2 and have any hope of getting meaningful interpretations out of it. This isn't EMP or Casava 1.8 data from what I can see, so is the generated fastq file (from fna + qual) best imported as multiplexed fastq data, or with a manifest file?
Any thoughts or advice on this would be very welcome. I really appreciate your time.
Please correct me if I am mistaken, but it seems to me that the only relevant portion of this diagram is the upper left corner, which shows that the 454 Sequencer generated .fna/.qual files. Given that you have the .fna/.qual files, and not the results of the rest of that analysis process (right?), I don't think the rest of the diagram is relevant. Does that make sense, or am I missing something?
It appears that the question is not so straightforward. Please see this discussion on biostars.
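In the meantime, one rough way to probe this from the converted file itself is to look at the headers. The sketch below is only that, a sketch: it assumes the .fastq has plain 4-line records and that "::" separates a sample label from the read ID, as in the example header you posted. The helper names (read_fastq_headers, split_header, summarize_headers) are made up for illustration and are not part of QIIME.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fastq_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines (without the leading '@') from 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            line = line.rstrip("\n")
            yield line[1:] if line.startswith("@") else line


def split_header(header: str) -> Tuple[str, str]:
    """Split 'LABEL::READ_ID' into (label, read_id).

    Assumption: '::' is the delimiter, as in the header quoted in this thread.
    Headers without '::' are returned with an empty label.
    """
    label, sep, read_id = header.partition("::")
    if not sep:
        return "", header
    return label, read_id


def summarize_headers(headers: Iterable[str]) -> Tuple[Counter, int]:
    """Count reads per sample label and read IDs that occur more than once."""
    per_sample: Counter = Counter()
    read_ids: Counter = Counter()
    for header in headers:
        label, read_id = split_header(header)
        per_sample[label] += 1
        read_ids[read_id] += 1
    repeated = sum(1 for n in read_ids.values() if n > 1)
    return per_sample, repeated


if __name__ == "__main__":
    import sys

    # Usage: python summarize_headers.py converted.fastq
    with open(sys.argv[1]) as fh:
        counts, repeated_ids = summarize_headers(read_fastq_headers(fh))
    for label, n in counts.most_common():
        print(f"{label}\t{n}")
    print(f"read IDs seen more than once: {repeated_ids}")
```

If every read already carries a sample label, that would suggest the reads were assigned to samples before delivery. Read IDs that appear more than once could hint that both mates of a pair ended up in the same file, but I would treat either result as a clue rather than proof.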
I need to do a bit more research to answer your other questions confidently. But elsewhere on the forum, people have discussed importing this type of data using a manifest file. For example, see this post: Importing 454 data to run dada2 denoise-pyro - #3 by Mdavrandi
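To make the manifest idea a bit more concrete, here is a minimal sketch of how one might split the converted file into one FASTQ per sample and write a manifest. It rests on several assumptions: that the label before "::" in each header really is the sample, that records are exactly 4 lines, and that the manifest is the two-column, tab-separated layout (sample-id, absolute-filepath) that I understand the single-end V2 manifest formats to use. Please check the import documentation before relying on it. The function names and output paths are hypothetical.

```python
import os
import re
from typing import Dict, Iterable


def sanitize(label: str) -> str:
    """Turn a header label such as 'A-03-[Primer 1]' into a filename-safe sample ID."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", label).strip("-")


def split_and_write_manifest(
    fastq_lines: Iterable[str], out_dir: str, manifest_path: str
) -> Dict[str, str]:
    """Write one FASTQ per sample label plus a tab-separated manifest.

    Returns a mapping of sample ID to the absolute path of its FASTQ file.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # sample ID -> (absolute path, open file handle)
    try:
        record = []
        for line in fastq_lines:
            record.append(line if line.endswith("\n") else line + "\n")
            if len(record) == 4:
                # Assumption: the sample label is everything before '::'.
                header = record[0].rstrip("\n")[1:]
                sample_id = sanitize(header.partition("::")[0])
                if sample_id not in handles:
                    path = os.path.abspath(
                        os.path.join(out_dir, sample_id + ".fastq")
                    )
                    handles[sample_id] = (path, open(path, "w"))
                handles[sample_id][1].writelines(record)
                record = []
    finally:
        for _, fh in handles.values():
            fh.close()

    with open(manifest_path, "w") as manifest:
        manifest.write("sample-id\tabsolute-filepath\n")
        for sample_id in sorted(handles):
            manifest.write(f"{sample_id}\t{handles[sample_id][0]}\n")
    return {sample_id: path for sample_id, (path, _) in handles.items()}
```

This keeps one file handle open per sample, which is fine for a plate's worth of samples but would need rethinking for thousands. Note that it does not trim the barcodes, so those would still need to be removed before or after import.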
You might also look into using q2-cutadapt for demultiplexing if required.
Also, depending on what you need to do, you might have other options.
Thank you for the reply! You are correct that I am picking up at the .fna/.qual node of the flowchart. However, other versions of this lab's charts include "FASTA / Qual Prepared for QIIME" as an end step from the "Quality Checking / Demultiplexing" node, so I am trying to figure out what I am working with.
Where I am at:
Barcodes in Sequence (If they are only at the beginning, does that always imply single-end?)
Multiplexed? I think it is multiplexed, as several sequences share the same Barcode?
Need to strip out Barcodes to a metadata file for QIIME2 import?
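To check the first two points I am thinking of tallying the first 8 bases of every read against the sample label in its header. This is a rough sketch only: it assumes 4-line FASTQ records, an 8 bp barcode at the very start of each read, and "::" as the separator between sample label and read ID. The names iter_fastq and barcodes_by_label are just ones I made up.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Iterator, Tuple

BARCODE_LEN = 8  # assumption: 8 bp barcode at the 5' end of each read


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from 4-line FASTQ records."""
    header = ""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            header = line[1:] if line.startswith("@") else line
        elif i % 4 == 1:
            yield header, line


def barcodes_by_label(
    lines: Iterable[str], barcode_len: int = BARCODE_LEN
) -> DefaultDict[str, Counter]:
    """Map each sample label to a Counter of the read prefixes seen under it.

    If a header has no '::', the whole header is used as the label.
    """
    table: DefaultDict[str, Counter] = defaultdict(Counter)
    for header, seq in iter_fastq(lines):
        label = header.partition("::")[0]
        table[label][seq[:barcode_len]] += 1
    return table


if __name__ == "__main__":
    import sys

    # Usage: python barcode_tally.py converted.fastq
    with open(sys.argv[1]) as fh:
        for label, prefixes in sorted(barcodes_by_label(fh).items()):
            top = ", ".join(f"{bc}:{n}" for bc, n in prefixes.most_common(3))
            print(f"{label}\t{top}")
```

If each label maps to essentially one prefix, and different labels map to different prefixes, I would read that as the reads already being labeled by sample, with the barcodes just left on. If the prefixes are all over the place within a label, then maybe what I am seeing at the 5' end is not a barcode at all.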