There was a problem importing seqs.fna: seqs.fna is not a(n) QIIME Demux Format file

Rosie · May 6, 2021, 1:36pm

Hello everyone!

I would like to cluster sequences into OTUs using QIIME 2.
I am following the tutorial described here.

I have sequences in .fastq.gz format (2 files per sample, R1 and R2), but I need them to be in a .fna format (like the ones generated by the QIIME 1 split_libraries*.py command) to import them into QIIME 2.

First of all, I trimmed the sequences to remove primers, then merged the read pairs and converted the fastq files to fasta with the following commands:

for R1 in *_R1_*.fastq.gz; do vsearch --fastq_mergepairs "${R1}" --reverse "${R1/_R1/_R2}" --fastaout "${R1/R1_*/merged.fasta}" --relabel "${R1}"; done
cat *.fasta > seqs.fna
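
To see which file names the loop builds, here is a minimal sketch of the two bash substitutions it relies on, using one file name from the output further down. The variable names `R2` and `OUT` are only for illustration, and the `${var/pattern/replacement}` syntax assumes bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# Illustrative only: show how the substitutions in the loop resolve
# for a single forward-read file name.
R1="RunCDSC01_S1_L001_R1_001.fastq.gz"

# Replace the first "_R1" with "_R2" to get the reverse-read file name.
R2="${R1/_R1/_R2}"

# Replace everything from "R1_" to the end with "merged.fasta".
OUT="${R1/R1_*/merged.fasta}"

echo "reverse reads: ${R2}"
echo "merged output: ${OUT}"
```

Because `--relabel` is given the whole file name, every merged read is labelled with that name followed by a running number, which is what the headers in seqs.fna show.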

When I try to import the seqs.fna file as an artifact (otu-seqs.qza), the script gives me the following error:

qiime tools import --input-path seqs.fna --output-path otu-seqs.qza --type 'SampleData[Sequences]'

There was a problem importing seqs.fna: seqs.fna is not a(n) QIIME Demux Format file

Of course this file was not generated by QIIME 1, but I still need to import my data as an artifact. I have also tried to modify the --type, but this did not solve my problem.
Besides, split_libraries*.py is helpful to perform demultiplexing, but I don't need it, since MiSeq has already generated demultiplexed sequences.
Is there another way to perform OTU clustering using QIIME 2?
Any help would be much appreciated!!!
All the best,

Rosie

PS: This is what the seqs.fna file looks like:

head seqs.fna 
RunCDSC01_S1_L001_R1_001.fastq.gz1
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATACC
GGTGCTTAACACCGGAACTGCCTCTAATACTGTTGAACTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTG
AAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGCACGAAAGCGTG
GGGAGCAAACAGG
RunCDSC01_S1_L001_R1_001.fastq.gz2
TACAGAGGTCTCAAGCGTTGTTCGGAATCACTGGGCGTAAAGCGTGCGTAGGCTGTTTCGTAAGTCGTGTGTGAAAGGCG
CGGGCTCAACCCGCGGACGGCACATGATACTGCGAGACTAGAGTAATGGAGGGGGAACCGGAATTCTCGGTGTAGCAGTG
AAATGCGTAGATATCGAGAGGAACACTCGTGGCGAAGGCGGGTTCCTGGACATTAACTGACGCTGAGGCACGAAGGCCAG
GGGAGCGAAAGGG

tail seqs.fna 
RunCDSM20_S50_L001_R1_001.fastq.gz63490
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTTGAGTACAGTAGAGGCAGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACTGTAACTGACGCTGATGCTCGAAAGTGTG
GGTATCAAACAGG
RunCDSM20_S50_L001_R1_001.fastq.gz63491
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGACTGGTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGTCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTG
GGTATCAAACAGG

llenzi · May 6, 2021, 1:53pm

Hi @Rosie,

As I see it, the tutorial you point to is just one example. It should work just as well if you import the sequences as pairs right at the beginning and then do the primer removal and merging within QIIME 2 (the vsearch join-pairs action uses vsearch's mergepairs option too; see "join-pairs: Join paired-end reads" in the QIIME 2 2021.4.0 documentation).
Then you could go straight to the vsearch dereplication in QIIME 2.
That way you avoid going through the .fna file altogether!
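
A rough sketch of that route, assuming a paired-end artifact called `demux-paired-end.qza` already exists and a QIIME 2 2021.4 environment is active. The output file names are placeholders, and `run` is only a hypothetical dry-run helper that prints each command so it can be checked first (set `DRY_RUN=0` to actually execute).

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Join (merge) the paired-end reads inside QIIME 2.
run qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --o-joined-sequences demux-joined.qza

# Dereplicate the joined reads into a feature table and sequences.
run qiime vsearch dereplicate-sequences \
  --i-sequences demux-joined.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza
```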
Hope it helps
Luca

Rosie · May 7, 2021, 10:19am

Thanks a lot for your advice!!

God, I don't know why I thought I needed to create a new file in order to generate OTUs! Of course I can use the demux-paired-end.qza I already have; I just removed the primers before creating it!
This is the very first time I am performing closed-reference OTU clustering and I am doing it using SILVA as the reference database.
Do you think I can do it with a Naïve Bayes classifier trained on SILVA?
Thanks again!

Rosie

llenzi · May 7, 2021, 10:38am

Hi @Rosie,
Glad to hear you already have what you need!
For the closed-reference OTU process you don't need the trained classifier, just the original SILVA representative sequences file as a .qza artifact!
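
For illustration, a closed-reference run could look roughly like the sketch below. It assumes a dereplicated `table.qza` and `rep-seqs.qza` (as produced by `qiime vsearch dereplicate-sequences`) and a SILVA reference imported as `silva-ref-seqs.qza`. Those file names, the 0.97 identity value, and the `run` and `cluster_closed_ref` helpers are all assumptions made for the example.

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper.
# $1: identity threshold (e.g. 0.97), $2: label used in output names (e.g. 97)
cluster_closed_ref() {
  run qiime vsearch cluster-features-closed-reference \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --i-reference-sequences silva-ref-seqs.qza \
    --p-perc-identity "$1" \
    --o-clustered-table "table-cr-$2.qza" \
    --o-clustered-sequences "rep-seqs-cr-$2.qza" \
    --o-unmatched-sequences "unmatched-cr-$2.qza"
}

cluster_closed_ref 0.97 97
```

Calling the wrapper again with a different threshold (for example `cluster_closed_ref 0.99 99`) writes a second set of outputs without overwriting the first.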

Hope it helps
Luca

Rosie · May 11, 2021, 8:51am

Hello Luca,

Thanks for your kind advice. I did 97% and 99% closed-reference OTU clustering and it worked!!
As I already said, this is the first time I am trying to obtain OTU tables, and I did not apply any filtering or quality-control step; I only removed the primers with cutadapt.
Is there anything else I should be doing?
I just want to be sure I am applying the pipeline in the right way.
Best,

Rosie

llenzi · May 11, 2021, 2:38pm

Hi @Rosie,
I see. You could apply a quality filter to the joined sequence set if you like:

qiime quality-filter q-score \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

Or, probably ideally, apply a filter to remove low-frequency OTUs after the clustering.

I have never really done closed-reference clustering, but when I used de novo clustering, another important filtering step was chimera detection, as described in:
https://docs.qiime2.org/2021.2/tutorials/chimera/
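
As a minimal sketch of those two steps: the file names below are placeholders, the minimum frequency of 10 is an arbitrary example value rather than a recommendation, and `run` is a hypothetical dry-run helper that only prints the commands unless `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Drop OTUs whose total count across all samples is below a threshold.
# The value 10 is only a placeholder; choose one that suits the data.
run qiime feature-table filter-features \
  --i-table table-cr-97.qza \
  --p-min-frequency 10 \
  --o-filtered-table table-cr-97-filtered.qza

# De novo chimera detection on the dereplicated data,
# in the spirit of the chimera tutorial linked above.
run qiime vsearch uchime-denovo \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --output-dir uchime-dn-out
```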

Hope it helps
Luca

Rosie · May 12, 2021, 7:42am

Hello @llenzi,

Every piece of advice you gave me helped a lot!
I appreciated your valuable input about OTU clustering.
All the best,

Rosie
