Converting Ion Torrent Amplicon Data (V3-V4) to PICRUSt2-Compatible Files (.fna and .biom)

Hi Qiime2 Community,

I am working with Ion Torrent amplicon (V3-V4) sequencing data that is already demultiplexed. The data is in FASTQ format, and the sequences look like this:

@27D1N:01366:11662
CCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGCTTTCGGGTCGTAAAACTCTGTTGTTAGGGAAGAACAAGTACGAGAGTAACTGCTCGTACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAAT
+
:699;<>7::6::;;999=??4:6::<6<6<=>7<;7;:555*599;<===7=<==7:::689;<:886...54044<==8==@8>>===>?7?>==::4:<6;;=7<8<<8<=9====8898:<<8>>===<;=<<8=7>:::..-;68:49289:9-674:><67788;;;<<==<:=<<;9:6;83/--)-
@27D1N:01374:11722
CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAGAAGCGAAAGTGACGGTCCTAGCAGAAAAAAGCGCCGGCTAACTACGTGACCAGCAGCCGCGGTAAT
+
B9>==?C=<<8<<:<<===?@5>9<<<8=>===:===7<988.6;5=<777:<<<?9<=<9<<<<<=<>7===;;;582:5::9397<;;<4;6:8888)8:;<<>7=9>>8<<888*7<::9:4774...7:777777%7;8728387837-,,,,,,,2478::95944-43-3

My goal is to convert these FASTQ files into PICRUSt2-supported formats:

.fna: A FASTA file with sequences.
.biom: A feature table in BIOM format.

So far, I’ve tried the following:

Converting the FASTQ file to FASTA format using the FASTX-toolkit.
Running qiime tools import for downstream processing.
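For reference, the FASTQ → FASTA conversion step can also be done with plain awk, without FASTX-toolkit. This is only a sketch on a toy file (the read names are made up), equivalent to what fastq_to_fasta does:

```shell
# Toy FASTQ with two reads (4 lines per read: header, sequence, '+', qualities).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > toy.fastq

# Keep lines 1 and 2 of every 4-line record; turn '@' headers into '>' headers.
awk 'NR % 4 == 1 {sub(/^@/, ">"); print} NR % 4 == 2 {print}' toy.fastq > toy.fna

cat toy.fna
```
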

However, I’m not sure how to:

Generate the required ASVs (amplicon sequence variants) for a feature table.
Build a compatible .biom table for PICRUSt2 analysis.

Could anyone guide me through the process or share a recommended workflow? Any pointers on using Qiime2 for these steps, or alternative tools/methods, would be greatly appreciated!

Thank you in advance for your help! :blush:

Hi @Sudipto_Biswas,

Welcome to the :qiime2_square: forum!

From what I understand, you would like to use the picrust2 software and you need 2 things:

  1. a .fna file
  2. a .biom file

I think this is a pretty simple process in qiime2!

First, you import your data into qiime2 using the manifest format or casava format instructions depending on your data. The above links describe when to use each format if you have questions.

Then run your imported data through DADA2 denoise-pyro. DADA2 denoise-pyro is specifically for IonTorrent, so make sure you are using denoise-pyro and not another action. After DADA2 denoise-pyro, you should end up with 2 things:

  1. a rep-seqs.qza, which contains your dereplicated, quality-controlled sequences. You will need to unzip the qza and go into the data directory. In the data directory there should be a .fasta file. I think this will be good enough for picrust2 because these file types are basically the same from what I understand, but I bet there are some third-party tools that can convert .fasta to an .fna if it gives you issues.
  2. a table.qza. Again, if you unzip this qza and go into the data directory, you will find a feature-table.biom. I believe this is what you are trying to get here:
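For reference, a .qza is just an ordinary zip archive, so any unzip tool (or Python's zipfile module) can pull those files out. A quick sketch below — the UUID-style directory name is made up, a real artifact uses its own UUID, and `qiime tools export` does the same job more cleanly:

```shell
# Build a toy artifact mimicking the qza layout: <uuid>/data/<files>.
mkdir -p toy-artifact/0123-fake-uuid/data
printf '>ASV1\nCCTACGGGAGGCAGCAG\n' > toy-artifact/0123-fake-uuid/data/dna-sequences.fasta
( cd toy-artifact && python3 -m zipfile -c ../rep-seqs.qza 0123-fake-uuid )

# Extract it the same way you would a real rep-seqs.qza.
python3 -m zipfile -e rep-seqs.qza extracted/
find extracted -name 'dna-sequences.fasta'
```
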

I am not quite sure what you mean by this. Could you clarify what steps you did? It helps if you include any commands you ran!

I hope this helps!


Low Read Counts After DADA2 Denoising – Need Help with SILVA for V3-V4 Region OTU Assignment

Dear Qiime2 Community,

I am working with Qiime2 (qiime2-amplicon-2024.10) for amplicon sequencing analysis of the V3-V4 region. However, after running DADA2 denoising, I encountered an issue where my output FASTA file is significantly smaller than expected (original input ~30MB, but output is only 275 bytes with a single read). Below is my workflow:
Steps Taken

Importing Sequences

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /home/sudiptobiolab/Qiime2_trialset/manifest.tsv \
  --output-path demux.qza \
  --input-format SingleEndFastqManifestPhred33

Demultiplexing Summary

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

DADA2 Denoising

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 240 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Exporting Outputs

qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
qiime tools export --input-path table.qza --output-path exported-table

Issue

The exported rep-seqs.fasta file is only 275 bytes, containing a single read.
The input FASTQ files were large (~30MB), so I expected more sequences in my output.
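As a sanity check, the read counts on both sides can be compared directly: FASTQ stores 4 lines per read, and FASTA has one '>' header per record. A sketch with toy files standing in for the real input and output:

```shell
# Toy input: three FASTQ reads; toy output: one FASTA record.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' > input.fastq
printf '>ASV1\nACGT\n' > rep-seqs.fasta

echo "FASTQ reads: $(( $(wc -l < input.fastq) / 4 ))"
echo "FASTA records: $(grep -c '^>' rep-seqs.fasta)"
```

(rep-seqs.fasta contains unique ASVs rather than raw reads, so it is expected to be smaller than the input, but a single record from ~30MB of input is still far too few.)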

Possible Cause & Next Steps

I believe I need to use the SILVA reference database for OTU assignment. I would like guidance on:

How to incorporate SILVA 138 reference files specifically for the V3-V4 region in my Qiime2 pipeline.
Whether the low read count issue might be related to DADA2 filtering parameters (e.g., --p-trunc-len 240).

Any insights or suggestions would be greatly appreciated.

Thanks in advance for your help!

Best regards,
Sudipto :hugs:

Hello Sudipto,

I think you made a mistake here:

Do you see the problem?


Well, you may not like this, but my advice is to slow down.

The solution to your problem has already been posted!


Where is the problem? I didn't find it. Actually, I need to train my feature classifier.

Hi @Sudipto_Biswas,
Seems like you have 2 issues here:

  1. You are concerned about your DADA2 Results
  2. You need help with how to use a classifier to classify your data.

My take parrots @colinbrislawn. We need to slow down and take this one step at a time. There is no point in classifying your reads (what you are calling OTU assignment), if you don't have confidence in the output that came from DADA2.

So let's first focus on your concerns about your DADA2 results, and once we have tackled that, let's circle back around to classifying your reads. I think this is the best approach so we are not juggling too many things at once.


The first and most obvious issue that I see is what Colin pointed out. I specified in my first post that you would need to use qiime dada2 denoise-pyro because it is specifically for IonTorrent sequencing data, but according to your code you used qiime dada2 denoise-single.

You should re-run dada2 with the appropriate method for your sequencing data.

I think the primary issue is that you are not using the right method (denoise-pyro vs denoise-single). However, your parameters might also be affecting how many sequences make it through DADA2.

The best way to know more about what happened during DADA2 is to check the stats that you get from DADA2.

Did you look at this file? Did it tell you anything about which step in the process filtered out your reads? This is a very important file to check every time you run DADA2, so you know what percentage of your sequences made it through each step in the DADA2 process.
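If it helps, once stats.qza is exported (qiime tools export gives you a stats.tsv), the per-sample pass-through percentage can be pulled out with awk. A sketch against a made-up table — the column layout here is assumed, so check the header line of your actual file:

```shell
# Made-up stats.tsv mimicking the exported DADA2 denoising-stats layout.
printf 'sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\n' > stats.tsv
printf 'SM1\t10000\t39\t35\t35\n' >> stats.tsv
printf 'SM2\t20000\t18000\t17500\t17000\n' >> stats.tsv

# Percentage of input reads surviving the filtering step, per sample.
awk -F'\t' 'NR > 1 { printf "%s\t%.1f%%\n", $1, 100 * $3 / $2 }' stats.tsv
```
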

Without seeing the denoising-stats table, it's basically impossible for me to suggest modifications to your trim parameters.

NEXT STEPS:

  1. Re-run DADA2 using qiime dada2 denoise-pyro.
  2. Check the denoising stats and see what percentage of reads made it through. Here is our high-level video on denoising and here is our finer-detailed video on running qiime dada2 and how to interpret the outputs. I think these could be helpful, but remember that your data/pipeline will be a little different since you have IonTorrent sequencing data and the tutorial does not.
  3. Let us know if you have any questions about parameter selection, but remember that at the end of the day you are the data scientist here, so you will have to make the final decisions about what parameters make sense for your data.
  4. Come back once you have fine-tuned your DADA2 parameters and are happy with the outputs. I'd be happy to answer any classifier questions once you have representative sequences that you are happy with.

I imported a single-end V3 .fastq file from Ion Torrent sequencing using a manifest file (.tsv).
file description :
sample-id,absolute-filepath,direction
SM1,/home/sudiptobiolab/Qiime2_trialset/Test 1150/SM2.fastq,forward

It imported successfully, and I was able to run cutadapt trim-single.

Then I checked it in QIIME 2 View.

looks like this

Then I used this command to get the representative-sequences and table.qza files:

qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs trimmed_demux.qza \
  --p-trunc-len 140 \
  --p-trim-left 0 \
  --p-max-len 150 \
  --p-max-ee 4 \
  --p-pooling-method independent \
  --p-chimera-method consensus \
  --o-table dada2out/table.qza \
  --o-representative-sequences dada2out/rep-seqs-dada2.qza \
  --o-denoising-stats dada2out/stats.qza

The input .fastq file is about 32MB in size, but the output files are only about 20kb each.

Is that normal?

I tried opening these in a text editor, and it's mostly lots of 0000000 in there.

rep-seqs-dada2.qza (18.9 KB)
table.qza (20.2 KB)
stats.qza (18.3 KB)

(I used trunc-len 180)

How can I fix this issue? Please give me some suggestions to make it correct.

My requirement is to get the reads classified with the SILVA database, plus:

a feature table which contains the abundance of ASVs (biom table)

a QIIME artifact of type FeatureData[Sequence] which contains the sequence of each ASV in FASTA format


Hi @Sudipto_Biswas,

This is not normal. I took a look through your stats.qza (which I would recommend looking at whenever you are trying to pick appropriate parameters for DADA2), and it looks like only 0.39% of your reads are making it through the filtering steps. This is not normal or ideal.

I believe that your issue is the --p-max-len parameter.

the help text for this parameter says:

Remove reads prior to trimming or truncation which
are longer than this value. If 0 is provided no
reads will be removed based on length.

So you are removing all reads longer than your max-len before truncation, and looking at your demux, most of your sequences are longer than 140 or 180.

I would try running this command again. You could use the default value of 0, or if you have an opinion about what read length would be too long for your targeted region, you could select that value, but I would increase it from 180, since most of your reads are longer than that!
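To see the effect concretely, here is a sketch of what a "remove reads longer than max-len before truncation" filter does, on a toy FASTQ (plain awk standing in for the filter, not the actual DADA2 code):

```shell
# Toy FASTQ: one 10 bp read and one 200 bp read.
printf '@short\n%s\n+\n%s\n' ACGTACGTAC IIIIIIIIII > toy.fastq
long=$(printf 'A%.0s' $(seq 200)); qual=$(printf 'I%.0s' $(seq 200))
printf '@long\n%s\n+\n%s\n' "$long" "$qual" >> toy.fastq

# Keep only records whose sequence is <= max_len, mimicking --p-max-len 150.
awk -v max=150 'NR%4==1 {h=$0} NR%4==2 {s=$0} NR%4==3 {p=$0}
     NR%4==0 { if (length(s) <= max) print h "\n" s "\n" p "\n" $0 }' toy.fastq
```

With max=150 and most reads longer than that, nearly every record fails this check before truncation ever happens, which matches the 0.39% pass rate above.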


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.