I am working with Ion Torrent amplicon (V3-V4) sequencing data that is already demultiplexed. The data is in FASTQ format, and the sequences look like this:
My goal is to convert these FASTQ files into PICRUSt2-supported formats:
.fna: A FASTA file with sequences.
.biom: A feature table in BIOM format.
So far, I’ve tried the following:
Converting the FASTQ file to FASTA format using the FASTX-toolkit.
Running qiime tools import for downstream processing.
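For reference, the FASTQ-to-FASTA conversion the FASTX-toolkit performs essentially drops the quality lines and keeps names and sequences. Below is a minimal Python sketch of that step. It assumes plain four-line FASTQ records with no wrapped sequence lines, and the function names are hypothetical. Note that this gives you every raw read, not denoised representative sequences, so on its own it is probably not the .fna that PICRUSt2 expects.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta(fastq: TextIO, fasta: TextIO) -> int:
    """Copy each record's name and sequence to FASTA, dropping qualities; return the count."""
    count = 0
    for name, seq, _qual in read_fastq(fastq):
        fasta.write(f">{name}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    demo = "@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n"
    out = io.StringIO()
    print(fastq_to_fasta(io.StringIO(demo), out))  # prints 2
    print(out.getvalue(), end="")
```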
However, I’m not sure how to:
Generate the required ASVs (amplicon sequence variants) for a feature table.
Build a compatible .biom table for PICRUSt2 analysis.
Could anyone guide me through the process or share a recommended workflow? Any pointers on using Qiime2 for these steps, or alternative tools/methods, would be greatly appreciated!
From what I understand, you would like to use the picrust2 software and you need 2 things:
a .fna file
a .biom file
I think this is a pretty simple process in qiime2!
First, import your data into qiime2 using either the manifest format or the Casava format, depending on how your files are organized. The qiime2 importing documentation describes when to use each format if you have questions.
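If you go the manifest route with single-end Ion Torrent reads, the manifest is a tab-separated file with sample-id and absolute-filepath columns. Here is a minimal Python sketch that writes one for a folder of per-sample FASTQ files. The one-file-per-sample layout and the rule that the sample ID is the file name minus its FASTQ extension are assumptions, so adjust sample_id_from_path to match your own naming. The resulting file could then be imported with something like qiime tools import --type 'SampleData[SequencesWithQuality]' --input-format SingleEndFastqManifestPhred33V2 --input-path manifest.tsv --output-path demux.qza, assuming your quality scores are Phred+33 encoded.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict

FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_id_from_path(path: Path) -> str:
    """Hypothetical naming rule: the sample ID is the file name minus its FASTQ suffix."""
    for suffix in FASTQ_SUFFIXES:
        if path.name.endswith(suffix):
            return path.name[: -len(suffix)]
    raise ValueError(f"not a FASTQ file name: {path.name}")


def build_manifest(fastq_dir: Path) -> Dict[str, Path]:
    """Map each sample ID to the absolute path of its single-end FASTQ file."""
    manifest: Dict[str, Path] = {}
    for path in sorted(fastq_dir.iterdir()):
        if not path.is_file() or not path.name.endswith(FASTQ_SUFFIXES):
            continue  # skip subdirectories and non-FASTQ files
        sample_id = sample_id_from_path(path)
        if sample_id in manifest:
            raise ValueError(f"duplicate sample ID {sample_id!r}")
        manifest[sample_id] = path.resolve()
    return manifest


def write_manifest(manifest: Dict[str, Path], out_path: Path) -> None:
    """Write a tab-separated manifest with 'sample-id' and 'absolute-filepath' columns."""
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample_id, path in manifest.items():
            writer.writerow([sample_id, str(path)])


if __name__ == "__main__":
    # Demo on empty placeholder files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("S1.fastq.gz", "S2.fastq.gz", "notes.txt"):
            (root / name).touch()
        write_manifest(build_manifest(root), root / "manifest.tsv")
        print((root / "manifest.tsv").read_text())
```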
Then run your imported data through DADA2 denoise-pyro. DADA2 denoise-pyro is specifically for IonTorrent, so make sure you are using denoise-pyro and not another action. After DADA2 denoise-pyro, you should end up with 2 things:
a rep-seqs.qza, which contains your dereplicated, quality-controlled sequences. You will need to unzip the .qza and go into the data directory, where there should be a .fasta file (dna-sequences.fasta). I think this will be good enough for PICRUSt2, because from what I understand these file types are basically the same, but I bet there are some third-party tools that can convert .fasta to .fna if it gives you issues.
a feature table .qza. Again, if you unzip this .qza and go into the data directory, you will find a feature-table.biom. I believe this is the .biom file you are trying to get.
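If you prefer not to unzip by hand, qiime tools export --input-path rep-seqs.qza --output-path exported does the same job. For illustration, here is a minimal Python sketch of the manual route. It relies only on the fact that a .qza is a zip archive whose payload sits under <uuid>/data/. The names rep-seqs.qza, table.qza, rep-seqs.fna and picrust2-input are placeholders, and the sketch assumes PICRUSt2 will accept the unchanged FASTA content under an .fna name.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import BinaryIO, Union

QzaSource = Union[str, Path, BinaryIO]


def read_qza_data_file(qza: QzaSource, filename: str) -> bytes:
    """Return the bytes of <uuid>/data/<filename> from a .qza, which is a zip archive."""
    with zipfile.ZipFile(qza) as zf:
        for member in zf.namelist():
            parts = PurePosixPath(member).parts
            # Expected layout inside the archive: <uuid>/data/<filename>
            if len(parts) == 3 and parts[1] == "data" and parts[2] == filename:
                return zf.read(member)
    raise FileNotFoundError(f"{filename} not found under data/ in {qza!r}")


def export_for_picrust2(rep_seqs_qza: QzaSource, table_qza: QzaSource, out_dir: Path) -> None:
    """Write rep-seqs.fna and feature-table.biom (placeholder output names) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # The FASTA content is copied unchanged; only the file extension becomes .fna.
    fasta = read_qza_data_file(rep_seqs_qza, "dna-sequences.fasta")
    (out_dir / "rep-seqs.fna").write_bytes(fasta)
    biom = read_qza_data_file(table_qza, "feature-table.biom")
    (out_dir / "feature-table.biom").write_bytes(biom)
```

For example, export_for_picrust2(Path("rep-seqs.qza"), Path("table.qza"), Path("picrust2-input")) would leave both files in picrust2-input/.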
I am not quite sure what you mean by this. Could you clarify what steps you took? It helps if you include any commands you ran!
Low Read Counts After DADA2 Denoising – Need Help with SILVA for V3-V4 Region OTU Assignment
Dear Qiime2 Community,
I am working with Qiime2 (qiime2-amplicon-2024.10) for amplicon sequencing analysis of the V3-V4 region. However, after running DADA2 denoising, I encountered an issue where my output FASTA file is significantly smaller than expected (original input ~30MB, but output is only 275 bytes with a single read). Below is my workflow:
Steps Taken
The exported rep-seqs.fasta file is only 275 bytes, containing a single read.
The input FASTQ files were large (~30MB), so I expected more sequences in my output.
Possible Cause & Next Steps
I believe I need to use the SILVA reference database for OTU assignment. I would like guidance on:
How to incorporate SILVA 138 reference files specifically for the V3-V4 region in my Qiime2 pipeline.
Whether the low read count issue might be related to DADA2 filtering parameters (e.g., --p-trunc-len 240).
Any insights or suggestions would be greatly appreciated.
You need help with how to use a classifier to classify your data.
My take parrots @colinbrislawn. We need to slow down and take this one step at a time. There is no point in classifying your reads (what you are calling OTU assignment) if you don't have confidence in the output that came from DADA2.
So let's first focus on your concerns about your DADA2 results, and once we have tackled that, let's circle back around to classifying your reads. I think this is the best approach so we are not juggling too many things at once.
The first and most obvious issue that I see is what Colin pointed out. I specified in my first post that you would need to use qiime dada2 denoise-pyro because it is specifically for IonTorrent sequencing data, but according to your code you used qiime dada2 denoise-single.
You should re-run dada2 with the appropriate method for your sequencing data.
I think the primary issue is that you are not using the right method (denoise-pyro vs. denoise-single). However, your parameters might also be affecting how many sequences make it through DADA2.
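One way trunc-len in particular can cost you reads: the q2-dada2 documentation states that reads shorter than --p-trunc-len are discarded, and Ion Torrent reads vary in length. Below is a minimal Python sketch, assuming plain four-line FASTQ records, that reports what fraction of reads in a file are at least as long as a few candidate truncation lengths. It models only that length rule, not quality filtering, so treat the numbers as an upper bound on what survives the filter step.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable, List, TextIO


def read_lengths(handle: TextIO) -> List[int]:
    """Collect read lengths from a FASTQ stream with four lines per record."""
    lengths: List[int] = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            lengths.append(len(line.rstrip("\r\n")))
    return lengths


def open_fastq(path: Path) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.suffix == ".gz" else path.open()


def fraction_long_enough(lengths: List[int], trunc_lens: Iterable[int]) -> Dict[int, float]:
    """For each candidate trunc-len, the fraction of reads at least that long.

    A trunc-len of 0 means no truncation or length filtering, so every read counts.
    Quality filtering happens on top of this, so these are upper bounds.
    """
    total = len(lengths)
    return {
        t: (sum(1 for n in lengths if n >= t) / total if total else 0.0)
        for t in trunc_lens
    }
```

For example, with open_fastq(Path("sample1.fastq.gz")) as fh: print(fraction_long_enough(read_lengths(fh), [0, 200, 240])), where the file name is a placeholder. If only a small fraction of reads reach 240, that alone would explain large losses at the filtering step.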
The best way to learn more about what happened during DADA2 is to check the denoising stats that DADA2 outputs.
Did you look at this file? Did it tell you anything about which step in the process filtered out your reads? This is a very important file to check every time you run DADA2, so you know what percentage of your sequences made it through each step of the DADA2 process.
Without seeing the denoising stats table, it's basically impossible for me to suggest modifications to your trim parameters.
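Viewing the stats with qiime metadata tabulate is the usual route, but if it helps to read them programmatically, here is a minimal Python sketch. It assumes you have exported denoising-stats.qza (for example with qiime tools export) to a tab-separated file whose first column is the sample ID, which may contain a "#q2:types" row, and whose count columns are named input, filtered, denoised and non-chimeric (plus merged for paired-end runs). Those column names are an assumption about the file, so check them against your own export.

```python
import csv
from typing import Dict, List, TextIO

# Assumed count columns, in pipeline order; columns missing from the file are skipped.
STEPS: List[str] = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def retention_by_sample(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Percentage of each sample's input reads still present after each step."""
    # Drop blank lines and comment/directive lines such as "#q2:types".
    lines = [line for line in handle if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    result: Dict[str, Dict[str, float]] = {}
    for row in reader:
        sample_id = row[reader.fieldnames[0]]  # first column holds the sample ID
        total = float(row["input"])
        result[sample_id] = {
            step: (100.0 * float(row[step]) / total if total else 0.0)
            for step in STEPS[1:]
            if step in row
        }
    return result


def biggest_drop(percentages: Dict[str, float]) -> str:
    """Return the step at which the largest share of input reads was lost."""
    previous, worst, worst_loss = 100.0, "", -1.0
    for step in STEPS[1:]:
        if step not in percentages:
            continue
        loss = previous - percentages[step]
        if loss > worst_loss:
            worst, worst_loss = step, loss
        previous = percentages[step]
    return worst
```

For each sample, biggest_drop names the step worth looking at first; a large drop at filtered points at the trimming, truncation and quality-filtering parameters.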
NEXT STEPS:
Re-run DADA2 using qiime dada2 denoise-pyro.
Check the denoising stats and see what percentage of reads made it through. Here is our high level video on denoising and here is our more detailed video on running qiime2 dada2 and how to interpret the outputs. I think these could be helpful, but remember that your data/pipeline will be a little different since you have IonTorrent sequencing data and the tutorial does not.
Let us know if you have any questions about parameter selections, but remember that at the end of the day you are the data scientist here, so you will have to make the final decisions about what parameters make sense for your data.
Come back once you have fine-tuned your DADA2 parameters and are happy with the outputs. I'd be happy to answer any classifier questions once you have representative sequences that you are happy with.