Hello everyone!
I would like to cluster sequences into OTUs using QIIME 2.
I am following the tutorial described here.
I have sequences in .fastq.gz format (2 files per sample, R1 and R2), but I need them in .fna format (like the files generated by the QIIME 1 split_libraries*.py commands) to import them into QIIME 2.
First, I trimmed the sequences to remove primers. Then I merged the read pairs and converted the fastq files into fasta files with the following commands:
for R1 in *_R1_*.fastq.gz ; do vsearch --fastq_mergepairs "${R1}" --reverse "${R1/_R1/_R2}" --fastaout "${R1/R1_*/merged.fasta}" --relabel "${R1}" ; done
cat *.fasta > seqs.fna
When I try to import the seqs.fna file as an artifact (otu-seqs.qza), the script gives me the following error:
qiime tools import --input-path seqs.fna --output-path otu-seqs.qza --type 'SampleData[Sequences]'
There was a problem importing seqs.fna: seqs.fna is not a(n) QIIME Demux Format file
Of course this file was not generated by QIIME 1, but I still need to import my data as an artifact. I have also tried changing the --type, but this did not solve my problem.
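In case it helps to pin down the problem: as far as I understand, QIIME 1 split_libraries output labels each record as `<sample-id>_<sequence-number>`, whereas my labels are the full file name followed by a counter. Below is a minimal Python sketch of how the headers could be rewritten into that shape. The helper names (`relabel_header`, `relabel_fasta`) are hypothetical, and taking the sample ID as the text before the first underscore of the file name is my own assumption, not something QIIME 2 prescribes.

```python
import re

# Assumed label shape produced by `--relabel ${R1}`: <file name><counter>,
# e.g. "RunCDSC01_S1_L001_R1_001.fastq.gz1".
# Assumption: the sample ID is everything before the first underscore.
LABEL = re.compile(r"^(?P<sample>[^_]+)_.*\.fastq\.gz(?P<count>\d+)$")


def relabel_header(label):
    """Turn a vsearch label into '<sample>_<count>' (hypothetical helper)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return f"{match.group('sample')}_{match.group('count')}"


def relabel_fasta(lines):
    """Yield FASTA lines with headers rewritten and sequence lines unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + relabel_header(line[1:])
        else:
            yield line
```

Applied to seqs.fna, this would mean opening the file, passing the file object to `relabel_fasta`, and writing each yielded line to a new file. I have not confirmed that the relabelled file imports cleanly.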
Besides, split_libraries*.py is helpful to perform demultiplexing, but I don't need it, since MiSeq has already generated demultiplexed sequences.
Is there another way to perform OTU clustering using QIIME 2?
Any help would be much appreciated!
All the best,
Rosie
PS: This is what the seqs.fna file looks like:
head seqs.fna
>RunCDSC01_S1_L001_R1_001.fastq.gz1
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATACC
GGTGCTTAACACCGGAACTGCCTCTAATACTGTTGAACTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTG
AAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGCACGAAAGCGTG
GGGAGCAAACAGG
>RunCDSC01_S1_L001_R1_001.fastq.gz2
TACAGAGGTCTCAAGCGTTGTTCGGAATCACTGGGCGTAAAGCGTGCGTAGGCTGTTTCGTAAGTCGTGTGTGAAAGGCG
CGGGCTCAACCCGCGGACGGCACATGATACTGCGAGACTAGAGTAATGGAGGGGGAACCGGAATTCTCGGTGTAGCAGTG
AAATGCGTAGATATCGAGAGGAACACTCGTGGCGAAGGCGGGTTCCTGGACATTAACTGACGCTGAGGCACGAAGGCCAG
GGGAGCGAAAGGG
tail seqs.fna
>RunCDSM20_S50_L001_R1_001.fastq.gz63490
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTTGAGTACAGTAGAGGCAGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACTGTAACTGACGCTGATGCTCGAAAGTGTG
GGTATCAAACAGG
>RunCDSM20_S50_L001_R1_001.fastq.gz63491
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGACTGGTAAGTCAGTTGTGAAAGTTT
GCGGCTCAACCGTAAAATTGCAGTTGATACTGTCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTG
AAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGACTGCAACTGACACTGATGCTCGAAAGTGTG
GGTATCAAACAGG