Hello everyone,
I am currently analyzing MinION sequencing data with QIIME2. Most of my workflow is based on the q2ONT pipeline, except for the taxonomic assignment of the OTUs, which I carried out outside QIIME2 using blastn from the BLAST+ command-line applications.
However, when I tried to link each OTU's taxonomy to its frequency, I found that the OTU identifiers differ between the representative-sequence (rep-seqs) file and the feature-table file produced by the vsearch clustering command.
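The linking step amounts to a join on the OTU ID. A minimal sketch in shell, assuming the BLAST results were written in tabular format (`-outfmt 6`, query ID in column 1, subject ID in column 2) and the feature table is the tab-separated file written by `biom convert --to-tsv`. The function name `join_taxonomy` and the file `blast-results.tsv` are hypothetical. The join only finds matches when the BLAST query IDs (taken from the rep-seqs FASTA headers) are the same strings as the table's `#OTU ID` column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append each OTU's first BLAST hit to its row of counts.
# Assumptions: BLAST tabular output (-outfmt 6) with qseqid in column 1 and
# sseqid in column 2; the table is the TSV produced by `biom convert --to-tsv`.
join_taxonomy() {
  local blast_tsv="$1" table_tsv="$2"
  awk -F '\t' -v OFS='\t' '
    # First file (BLAST output): keep only the first hit listed per query ID.
    FILENAME == ARGV[1] { if (!($1 in hit)) hit[$1] = $2; next }
    # Drop the "# Constructed from biom file" comment line.
    /^# Constructed/ { next }
    # Keep the "#OTU ID" header row and add a column name for the hit.
    /^#OTU ID/ { print $0, "blast_hit"; next }
    # Data rows: look up the OTU ID; OTUs with no matching query get "NA".
    { print $0, (($1 in hit) ? hit[$1] : "NA") }
  ' "$blast_tsv" "$table_tsv"
}
```

Usage, with the hypothetical BLAST output file: `join_taxonomy blast-results.tsv table-op_ref-85.tsv > table-with-hits.tsv`. A column full of `NA` values is a quick sign that the two ID sets do not overlap.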
Here is the clustering command I ran:
qiime vsearch cluster-features-open-reference \
--i-table uchime_ref_out/table-nonchimeric-wo-borderline.qza \
--i-sequences uchime_ref_out/rep-seqs-nonchimeric-wo-borderline.qza \
--i-reference-sequences Reference_sequences.qza \
--p-perc-identity 0.85 \
--p-threads 46 \
--o-clustered-table table-op_ref-85.qza \
--o-clustered-sequences rep-seqs-op_ref-85.qza \
--o-new-reference-sequences new-ref-seqs-op_ref-85.qza
I then exported the table and rep-seqs .qza files:
qiime tools export \
--input-path rep-seqs-op_ref-85.qza \
--output-path .
mv dna-sequences.fasta rep-seqs-op_ref-85.fasta
qiime tools export \
--input-path table-op_ref-85.qza \
--output-path .
biom convert -i feature-table.biom -o table-op_ref-85.tsv --to-tsv
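However, when I inspected these files, I noticed that the OTU identifiers differ between them: the FASTA headers are 40-character hexadecimal hashes, whereas the table rows are labeled with reference accession numbers (e.g. MK820720.1). This prevents me from linking OTU taxonomies to their per-sample frequencies: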
head -n 4 rep-seqs-op_ref-85.fasta
>000002d61b9b7a0ef325434391c0158d966ebfc7
GTGCGAAGGTAGCATAATCATTGGATTTTAATTGAAAGCTGGTATGAATGGTTTGATGAAAAATTAACTGTCTCATTTTAATTTTATTAGAATTTTATTTTTAAGTTAAAATGCTTAAATGTTTTATAAAGGCAAGAAGACCCTATAGAGTTTAATATTATAATAATTTATTTATTTTATGTTTTTAATTTAGATTTTTTGTTTTGGTATTTGCTGGGGCGGTTAGAGAAATTTATTTAACTTTTCTTTTATTTTTACATTTATTTTTGAGTTTATGATCCTTTTATTGATTTTAAGATTAAATTACCTTAGGGATAACAGCGTAATTTTTGGAAAGTTCATATTTATAAAAAGTTTGCGACCCCGATGTTGAAC
>000027f5b9a1cd16cf6dbe82b2f5829f02a3071f
GTTCAACATCGGGGTCGCAAACTTTTATAAATATGAACTTTCCAAATTACGCTGTTATCCCTAAAGATGACCCAATCTTAAAATCCAATAAAAAGGATCATAAACTCAAAAATAAATGTAAAAATAAAGAAAAGTTAAATAAATTTTCTATAACCGCCCCAGCAAAATACACCAAAACAAAAAAATCTAAATTAAAAAACATAAAATAAATAAGTATTATAATATTAAACTCTATAGGGTCTTCTCGTCTTTATAAAACATTTAAAAGCATTTTAACTTAAAATAAAATTCTAATAAAATTAAAATGAGACAGTTAATTTTTCATCAAACCATTCATACCAGCTTTCAATTAAAAAACTAATGATTATGCTACCTTCG
head table-op_ref-85.tsv
# Constructed from biom file
#OTU ID output_reads_barcode1 output_reads_barcode2 output_reads_barcode3 output_reads_barcode4
MK820720.1 9.0 5.0 41454.0 59611.0
KT425071.1 1.0 0.0 0.0 0.0
KT272776.1 7.0 3.0 44193.0 63891.0
MG584727.1 3.0 2.0 0.0 1.0
KX461803.1 38.0 19.0 0.0 1.0
MK614510.1 3.0 1.0 0.0 0.0
JX412842.1 4.0 2.0 1.0 1.0
KX087316.1 7.0 1.0 0.0 2.0
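Since `head` only shows the first few records of each file, the full ID sets can also be compared directly. A minimal sketch, assuming the exported file names used above, a tab-separated table whose comment and header lines start with `#`, and FASTA headers whose ID is the text before the first whitespace. The helper name `compare_otu_ids` is hypothetical, and the 40-character hex pattern is only inferred from the IDs visible in the FASTA excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare OTU IDs between an exported rep-seqs FASTA
# and a biom-converted TSV, and count hash-like (40 hex character) IDs.
compare_otu_ids() {
  local fasta="$1" table="$2" tmp
  tmp=$(mktemp -d)
  # FASTA headers: drop the leading ">" and anything after the first whitespace.
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$tmp/fasta_ids"
  # Table: skip lines starting with "#", keep the first (tab-separated) column.
  grep -v '^#' "$table" | cut -f1 | LC_ALL=C sort -u > "$tmp/table_ids"
  # comm needs both inputs sorted in the same collation order (C locale here).
  echo "shared: $(LC_ALL=C comm -12 "$tmp/fasta_ids" "$tmp/table_ids" | wc -l | tr -d '[:space:]')"
  echo "fasta_only: $(LC_ALL=C comm -23 "$tmp/fasta_ids" "$tmp/table_ids" | wc -l | tr -d '[:space:]')"
  echo "table_only: $(LC_ALL=C comm -13 "$tmp/fasta_ids" "$tmp/table_ids" | wc -l | tr -d '[:space:]')"
  # Count IDs that look like 40-character lowercase hex hashes in each file.
  echo "hash_like_in_fasta: $(grep -Ec '^[0-9a-f]{40}$' "$tmp/fasta_ids")"
  echo "hash_like_in_table: $(grep -Ec '^[0-9a-f]{40}$' "$tmp/table_ids")"
  rm -r "$tmp"
}
```

Usage: `compare_otu_ids rep-seqs-op_ref-85.fasta table-op_ref-85.tsv`. A non-zero `shared` count would mean at least part of the two files uses the same identifiers, even if the first lines of each file do not show it.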
Is there a way to get around this problem and obtain rep-seqs and feature-table files with identical OTU identifiers?
Thank you in advance!
Ben