ABI reads with Ns: OTUs & classification

I worked through the pipeline and found that the import feature did not accept any of my files in FASTA format, so I had to convert them to FNA format. Since I had both forward and reverse files, instead of merging them I joined each forward and reverse sequence with an "NNNN" spacer between them. After converting these to FNA format, I concatenated them into a single file and used that as input for QIIME 2. Using the following commands, I was able to import the sequences and generate the table.qza and rep.qza files. I then classified them with a classifier trained on the UNITE database in QIIME 2.

qiime tools import \
--input-path seqs.fna \
--output-path fungi_seqs.qza \
--type 'SampleData[Sequences]'

qiime vsearch dereplicate-sequences \
--i-sequences fungi_seqs.qza \
--o-dereplicated-table fungi_table.qza \
--o-dereplicated-sequences fungi_rep-seqs.qza
Next I wanted to do the phylogeny, so I went straight to the Moving Pictures tutorial and used the following commands:
qiime feature-classifier classify-sklearn \
--i-classifier unite-ref-seqs.qza \
--i-reads fungi_rep-seqs.qza \
--o-classification fungi_taxonomy.qza

I want to know whether this is the right approach, or whether I have to complete the entire vsearch pipeline described in the "Clustering sequences into OTUs using q2-vsearch" tutorial (QIIME 2 2018.4.0 documentation). My aim here is to find the fungal phylogenetic diversity in these samples. Please let me know whether I am doing the analysis correctly. If it is wrong, please also explain why the OTU-based approach is better than this one.

Hi @sree,

Did you try merging the reads using the tools I suggested in this post? From later posts in that thread it appeared that you did. If so, there is no need to join the forward and reverse reads with Ns. In fact, joining with Ns is likely incorrect: a properly constructed capillary sequencing experiment should generate overlapping reads that you can merge/assemble with the tools I referenced.

In fact, I reverse complemented the sequence in the PCR_14_ITS_4_F02.fastq file and was able to roughly align it by hand to the sequence in PCR_14_ITS_1_F02.fastq:

It's not great, but I assume these reads are from two different isolates, with slight sequence variation (or errors?). So, the alignment is not perfect. Anyway, these look highly overlapping.
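For anyone who wants to repeat the reverse-complement step on the command line, here is a minimal sketch. The `revcomp` helper is a hypothetical name, not part of QIIME 2 or any tool mentioned in this thread, and it assumes the sequence contains only A, C, G, T and N (no other IUPAC ambiguity codes).

```shell
# Reverse complement a DNA string: reverse it, then swap each base for its partner.
# Assumption: only A/C/G/T/N characters are present; N is left unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Example call on a made-up sequence (not taken from the attached reads).
revcomp "AACCGGTTN"
```

The reverse-complemented reverse read can then be compared against the forward read in any alignment viewer to see whether the two overlap.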

Again, once you have these merged reads you can follow the OTU tutorial I linked. Can you provide me with the forward and reverse reads for a couple of isolates?

sequence.7z (1.2 KB)
Dear @SoilRotifer,
Sorry for the delayed response. I didn't use the merger programs you mentioned. I am not very experienced with scripting, so I used this alternate approach. If the merger programs are the only solution, I will find a way to work with them. The only problem is that these are FASTA files, so there are no quality scores available for proper quality control in QIIME 2. If there is an alternative method, please let me know.

The tools I mentioned are standard graphical interface tools. No scripting knowledge required there.

I've thought about how you sent the ABI files to seqtk to generate FASTQ files based on the ABI output. I think this is sensible. Remember, though, that you will not be able to denoise this data; you can only cluster it into OTUs. You can try the following approach:

Copy / paste all the "forward" FASTQ seqs into one file, and all of the "reverse" FASTQ seqs into another file. Make sure each read is labeled the same and appears in the same order in both files. Then you should be able to import them into QIIME 2. From here, merge with qiime vsearch merge-pairs ... Then you should be able to follow this approach.
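The label and order check described above can be scripted rather than done by eye. This is only a rough sketch: `fastq_ids` and `same_ids` are hypothetical helper names, the file name patterns in the commented usage are guesses based on the file names in this thread, and the code assumes unwrapped FASTQ with exactly four lines per read.

```shell
# Print the read ID of every record in a FASTQ file.
# Assumption: each record is exactly 4 lines, so headers are lines 1, 5, 9, ...
fastq_ids() {
  awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1"
}

# Succeed only if both files list the same read IDs in the same order.
same_ids() {
  [ "$(fastq_ids "$1")" = "$(fastq_ids "$2")" ]
}

# Hypothetical usage (file name patterns are assumptions, adjust to your data):
#   cat *_ITS_1_*.fastq > forward.fastq
#   cat *_ITS_4_*.fastq > reverse.fastq
#   same_ids forward.fastq reverse.fastq && echo "IDs match" || echo "IDs differ"
```

If the check reports a mismatch, the read headers need to be renamed or reordered before importing, since paired reads are matched by label and position.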

The type of data you are generating might be best analyzed by another suite of tools. Perhaps others have better ideas.
