Hello everyone!
I would like to cluster sequences into OTUs using QIIME 2.
I am following the tutorial described here.
I have sequences in .fastq.gz format (2 files per sample, R1 and R2), but I need them to be in .fna format (like the files generated by the QIIME 1 split_libraries*.py command) to import them into QIIME 2.
First, I trimmed the sequences to remove primers, then merged the read pairs and converted the fastq files into fasta files with the following commands:
for R1 in *_R1_*.fastq.gz ; do vsearch --fastq_mergepairs "${R1}" --reverse "${R1/_R1/_R2}" --fastaout "${R1/R1_*/merged.fasta}" --relabel "${R1}" ; done
cat *.fasta > seqs.fna
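To make the loop easier to follow, here is how its two parameter expansions resolve for one of my file names. This is only an illustration and does not run vsearch. It assumes bash, since the ${var/pattern/replacement} syntax is a bash feature rather than plain POSIX sh.

```shell
# Illustration only: one R1 file name, taken from the sequence labels in seqs.fna.
R1="RunCDSC01_S1_L001_R1_001.fastq.gz"

# Reverse-read file name: the first "_R1" is replaced with "_R2".
R2="${R1/_R1/_R2}"

# Merged output name: everything from "R1_" to the end of the name
# is replaced with "merged.fasta".
OUT="${R1/R1_*/merged.fasta}"

echo "$R2"
echo "$OUT"
```

So each sample produces one *_merged.fasta file, and because --relabel is given the R1 file name, every sequence label in that file starts with the full R1 file name followed by a running number.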
When I try to import the seqs.fna file as an artifact (otu-seqs.qza), the command gives me the following error:
qiime tools import --input-path seqs.fna --output-path otu-seqs.qza --type 'SampleData[Sequences]'
There was a problem importing seqs.fna: seqs.fna is not a(n) QIIME Demux Format file
Of course this file was not generated by QIIME 1, but I still need to import my data as an artifact. I have also tried to modify the --type, but this did not solve my problem.
Besides, split_libraries*.py is helpful for demultiplexing, but I don’t need it, since the MiSeq has already generated demultiplexed sequences.
Is there another way to perform OTU clustering using QIIME 2?
Any help would be much appreciated!!!
All the best,
Rosie
PS: This is what the seqs.fna file looks like:
head seqs.fna
RunCDSC01_S1_L001_R1_001.fastq.gz1
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATACC
GGTGCTTAACACCGGAACTGCCTCTAATACTGTTGAACTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTG
AAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGCACGAAAGCGTG
GGGAGCAAACAGG
RunCDSC01_S1_L001_R1_001.fastq.gz2
TACAGAGGTCTCAAGCGTTGTTCGGAATCACTGGGCGTAAAGCGTGCGTAGGCTGTTTCGTAAGTCGTGTGTGAAAGGCG
CGGGCTCAACCCGCGGACGGCACATGATACTGCGAGACTAGAGTAATGGAGGGGGAACCGGAATTCTCGGTGTAGCAGTG
AAATGCGTAGATATCGAGAGGAACACTCGTGGCGAAGGCGGGTTCCTGGACATTAACTGACGCTGAGGCACGAAGGCCAG
GGGAGCGAAAGGG
tail seqs.fna
RunCDSM20_S50_L001_R1_001.fastq.gz63490
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTTGAGTACAGTAGAGGCAGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACTGTAACTGACGCTGATGCTCGAAAGTGTG
GGTATCAAACAGG
RunCDSM20_S50_L001_R1_001.fastq.gz63491
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGACTGGTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGTCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTG
GGTATCAAACAGG