Problem with capillary FASTA data handling

sree · July 23, 2025, 4:40pm

Dear all,
I am analysing fungal data obtained from a capillary sequencer. I converted the FASTA files to FASTQ files using seqtk. After importing them into QIIME2 as paired-end sequences, the qzv file reports only 1 sequence per sample. So I separated the forward and reverse sequences and analysed them individually. The QZV files still report only one sequence. Could you please let me know how to resolve this issue? Also, if you can suggest an alternative method for analysing FASTA files from a capillary sequencer, that would be very helpful.
fungi_forward-demux.qzv (303.1 KB)
fungi_reverse-demux.qzv (303.1 KB)

colinvwood · July 23, 2025, 5:19pm

Hello @sree,

Which command did you use to import the sequences? How many sequences do you expect to have per sample? Do you have one fastq file per sample? Does capillary sequencing produce two read directions?

sree · July 24, 2025, 6:52am

Dear @colinvwood
I created a manifest file to import the data. Using the grep command, I get around 400 to 600 bp per sample. No, I have forward and reverse FASTA files, not FASTQ files, so I converted all the forward and reverse FASTA files to FASTQ files using seqtk. Yes, the data were generated on a capillary sequencer, and each sample has two files, one forward and one reverse.

colinvwood · July 24, 2025, 6:29pm

Hello @sree,

I meant how many sequences per sample, not base pairs per sequence. Would you mind attaching one of the fastq files that you have, if it's not too large? If it is too large, could you take the first 100 records or so and attach those?
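One quick way to check the number of sequences per file is to count the FASTQ records directly. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name and the example record are hypothetical and not taken from the attached files.

```python
import io


def count_fastq_records(handle):
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be wrapped or truncated")
    return n_lines // 4


# Hypothetical single-record FASTQ, standing in for one per-sample file.
example = io.StringIO("@example_read\nACGT\n+\nIIII\n")
print(count_fastq_records(example))
```

For a file on disk, the same function can be called on an open file handle, e.g. `with open(path) as fh: count_fastq_records(fh)`.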

sree · July 26, 2025, 8:14pm

PCR_14_sample1R1.txt (325 Bytes)
PCR_14_sample1R2.txt (574 Bytes)

PCR_14_ITS_1_F02.fastq (604 Bytes)
PCR_14_ITS_4_F02.fastq (1.1 KB)
I am unable to upload the FASTA files, so I attached them as text. I have also attached the converted FASTQ files.

cherman2 · July 28, 2025, 5:11pm

Hi @sree,
Looking at the fastq files you uploaded, it looks like they only have one sequence per sample. If this is not what you are expecting, I think you will have to talk to the individuals that generated the data.

sree · July 29, 2025, 5:35am

Dear @cherman2
These are actually the forward and reverse reads of sample PCR 14, generated on a capillary sequencer.
I am stuck with the analysis because most of the samples are going after creating the table file. So now I am individually analysing the samples and planning to merge the rep.seqs at the end. Is this a good approach?

colinvwood · July 29, 2025, 5:58pm

Hello @sree,

I'm unsure what you mean by "the samples are going after creating the table file". I think that the nature of your data is possibly such that it doesn't make much sense to analyze with qiime2, given that you only have one sequence per sample and thus will have only one feature per sample in your feature table.

sree · July 30, 2025, 1:12pm

Dear @colinvwood
I understand that. By samples I meant sequences. Since I have one sequence per sample, many samples are lost after quality control, and out of 19 samples I am left with only one or two files to take forward to taxonomy profiling. If Qiime2 is not a good option for my analysis could you please let me know what is a better way. As of now I am analysing the samples one by one and then trying to merge all the table.qza and rep.qza files. Would that be a good method?

colinvwood · July 30, 2025, 5:21pm

Hello @sree,

If Qiime2 is not a good option for my analysis could you please let me know what is a better way.

It depends on what your goal is. Diversity analyses are going to be difficult and not very meaningful with such a small number of features.

yangyue · July 31, 2025, 1:52am

Will QIIME2 be developed to include a function for single-sequence testing or annotation in the future, @colinvwood? I think the author of this post needs a single-sequence service from the QIIME2 platform; otherwise, another bioinformatics platform may be better suited to resolving this situation. @sree, I just want to ask why you converted the fasta files to fastq files. If you can persuade me or give me a full explanation, I may be able to help you with what is troubling you.

sree · July 31, 2025, 7:57am

Dear @yangyue ,
Thank you for the response. These sequence files are generated by the capillary sequencer, which directly outputs a FASTA file. Since qiime2 or dada2 only takes fastq files, I did the conversion using seqtk. I have both forward and reverse reads for each sample, so I have only two sequences per sample, and I thought it would be good to do a paired-end analysis. That is one of the main reasons I converted the fasta files to fastq files, so that there would not be any import issue with qiime2. When I tried to import the FASTA files without quality information for the sequences, the pipeline showed an error. I imported the files by creating a manifest file.
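For readers following along, the kind of FASTA-to-FASTQ conversion described here can be sketched as below. This is an illustration only, assuming the conversion assigns one constant placeholder quality character to every base; the exact seqtk options used are not given in the thread, and the function names and the example record are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(text, qual_char="I"):
    """Convert FASTA text to FASTQ text using a constant placeholder quality.

    The quality string carries no real information: every base gets qual_char.
    """
    records = []
    for header, seq in parse_fasta(text):
        records.append(f"@{header}\n{seq}\n+\n{qual_char * len(seq)}\n")
    return "".join(records)


# Hypothetical wrapped FASTA record.
fasta = ">example_R1\nACGTAC\nGGT\n"
print(fasta_to_fastq(fasta), end="")
```

Note that one FASTA record in gives exactly one FASTQ record out, so a file holding a single Sanger read still yields a single sequence per sample after import.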

sree · July 31, 2025, 7:57am

Dear @colinvwood ,
This is a small exploratory study; we want to know whether the species of interest are present in the samples or not. That is why the sample size is so small. We are also planning to collect more than 150 samples based on these results. But I am stuck at this initial analysis because, after the table file is created, the majority of my samples are removed.

SoilRotifer · July 31, 2025, 3:02pm

Hi @sree,

I come from an age of merging and assembling reads from Sanger data, specifically using the electropherograms / trace files of many paired reads to merge them, as you are currently doing, as well as assembling a portion of a mitochondrial genome a couple of decades ago, mainly using commercial tools like DNASTAR (SeqMan), Sequencher, and Geneious. I might have some insight...

First, some of the above mentioned tools offer free trials that you may be able to use to merge your reads...

However, I think there are some free and open source tools you can use to properly merge your reads. These make use of the electropherogram files, which are similar to fastq in the sense that they carry quality measurements that can be used to guide the merging of the reads.
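To make the comparison with fastq concrete, here is a minimal sketch of how a per-base Phred quality score relates to an error probability and how it is written in a FASTQ quality string. It assumes the common Phred+33 ASCII encoding; the function names are hypothetical, and this does not read trace files itself.

```python
import math


def error_prob_to_phred(p):
    """Phred score Q = -10 * log10(p), rounded to the nearest integer."""
    return round(-10 * math.log10(p))


def encode_quality(phred_scores, offset=33):
    """Encode integer Phred scores as a FASTQ quality string (Phred+33 assumed)."""
    return "".join(chr(q + offset) for q in phred_scores)


print(error_prob_to_phred(0.001))        # 30
print(encode_quality([10, 20, 30, 40]))  # +5?I
```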

I've come across merge_sanger_sequences, TraceTrack, and Tracy, which may be able to help you merge your *.seq files from the Sanger sequencer into a single sequence.

There are likely many other tools or packages in Python or R that I am not aware of. But the key point is to use the proper tools to merge/assemble Sanger reads rather than making up fake quality scores to merge your reads.

I hope this helps!

sree · August 6, 2025, 6:20am

Dear @SoilRotifer ,

Thank you for the suggestion. But even after merging the fasta files, I have to convert them into fastq files to import into qiime2. Then, after quality control using dada2, the majority of my samples are not passing the filtering criteria, and I am left with only one or two samples to go further for alpha, beta, and taxonomic assignment and moreover this is a fungal dataset. Could you please help me to resolve this issue

SoilRotifer · August 6, 2025, 2:45pm

Hi @sree,

As I implied, converting your data to FASTQ and then running it through DADA2 is incorrect. You are essentially violating the assumptions of DADA2 by using fake quality scores. Remember, the point of DADA2 is to denoise, i.e. error correct, raw FASTQ data as generated by the Illumina, 454, and PacBio platforms.

EDIT (2025-08-11):
Using converted quality scores might lead to erroneous DADA2 output, mainly because the sequences are not derived from one of the above platforms and so might have a different error profile from the ones DADA2 is built to work with.

I suggest just sticking with traditional OTU clustering at 97-99% similarity, following this approach as outlined in the older documentation. I am unsure if the new documentation has been updated to include this.
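To illustrate the idea behind clustering at a similarity threshold, here is a toy sketch of greedy centroid-based clustering. It is a deliberate simplification and not the algorithm used by any particular tool: identity is computed position by position and assumes equal-length, already aligned sequences, and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length in this simplified sketch")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each (name, sequence) to the first centroid it matches at >= threshold."""
    centroids = []  # representative sequence of each OTU
    clusters = []   # member names, parallel to centroids
    for name, seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(name)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(seq)
            clusters.append([name])
    return clusters


base = "ACGT" * 25  # 100 bases
seqs = [
    ("s1", base),
    ("s2", "T" + base[1:]),  # one mismatch, 99% identity to s1
    ("s3", "TTTT" * 25),     # 25% identity to s1
]
print(greedy_cluster(seqs))  # [['s1', 's2'], ['s3']]
```

With one sequence per sample, each sample contributes at most one member to one OTU, which is why the resulting feature table stays very sparse.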

FYI, I’ll be intermittently offline for a few days, so hopefully others can help you in the interim.

-Cheers!

sree · August 7, 2025, 8:36am

Dear @SoilRotifer ,

I am not using any fake quality scores; these are quality scores from the AB1 files obtained from the capillary sequencer. I use seqtk and a custom script to assign the quality scores from these AB1 files. If there is any way I can simply import the fasta files, do the alignment and taxonomy mapping, and then perform the alpha and beta diversity analyses without losing my samples, that would be convenient. Does anyone know of such a way?

SoilRotifer · August 7, 2025, 12:25pm

When I was speaking of the quality scores, I was referring to using DADA2, which you mentioned you were going to use... Denoising tools like DADA2 require FASTQ files.

Yes, follow the link I previously posted for taking your merged fasta files (generated from your ABI files) and constructing OTUs. Once you work through that short tutorial you can enter any of the other tutorials after the DADA2 steps. You can skip to the portions you need.
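As a rough sketch of preparing the merged sequences for that tutorial: to my understanding, the FASTA import used there expects all samples in one file, with each record ID written as the sample ID, an underscore, and a sequence number. Treat that naming convention as an assumption to check against the importing documentation. The function name and sample IDs below are hypothetical, and the IDs deliberately contain no underscores to keep the split unambiguous.

```python
def to_combined_fasta(merged_reads):
    """Build one FASTA text from a dict of sample ID -> merged sequence.

    Each record ID is written as <sample-id>_1, since there is one sequence per sample.
    Sequences are written on a single line (whitespace removed).
    """
    lines = []
    for sample_id, seq in merged_reads.items():
        lines.append(f">{sample_id}_1")
        lines.append("".join(seq.split()))
    return "\n".join(lines) + "\n"


# Hypothetical merged reads, one per sample.
merged = {"PCR14": "ACGTACGT", "PCR15": "GGCCTTAA"}
print(to_combined_fasta(merged), end="")
```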

See the Docs page, scroll down to the Dataset focused tutorials.

sree · August 8, 2025, 5:44am

Thank you very much. I will work through it and post an update in this forum soon.

SoilRotifer · September 2, 2025, 5:28pm

An off-topic reply has been split into a new topic: Joined PE reads with Ns classification

Please keep replies on-topic in the future.