Importing a FASTA file from PacBio

Hello, I have a dataset sequenced with PacBio Single Molecule Real-Time (SMRT) sequencing, delivered as a FASTA file.
I want to analyze it with QIIME 2. Based on my search here, dada2 1.10, which is bundled with qiime2-2019.10, should support these full-length reads. Please correct me if I am wrong.

The problem I have is importing the data. I know QIIME 2 supports many formats, but I don’t know which one applies to this file. The head of the file looks like this:

U1A_1
GACGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAACTCTGGTATTGATTGGTGCTTGCATCATGATTTACATTTGAGTGAGTGGCGAACTGGTGAGTAACACGTGGGAAACCTGCCCAGAAGCGGGGGATAACACCTGGAAACAGATGCTAATACCGCATAACAACTTGGACCGCATGGTCCGAGTTTGAAAGATGGCTTCGGCTATCACTTTTGGATGGTCCCGCGGCGTATTAGCTAGATGGTGAGGTAACGGCTCACCATGGCAATGATACGTAGCCGACCTGAGAGGGTAATCGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACATATCTGAGAGTAACTGTTCAGGTATTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCTTCGGCTCAACCGAAGAAGTGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGTATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATACCGTAAACGATGAATGCTAAGTGTTGGAGGGTTTCCGCCCTTCAGTGCTGCAGCTAACGCATTAAGCATTCCGCCTGGGGAGTACGGCCGCAAGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCTACGCGAAGAACCTTACCAGGTCTTGACATACTATGCAAATCTAAGAGATTAGACGTTCCCTTCGGGGACATGGATACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATTATCAGTTGCCAGCATTAAGTTGGGCACTCTGGTGAGACTGCCGGTGACAAACCGGAGGAAGGTGGTGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGATGGTACAACGAGTTGCGAACTCGCGAGAGTAAGCTAATCTCTTAAAGCCATTCTCAGTTCGGATTGTAGGCTGCAACTCGCCTACATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGAGAGTTTGTAACACCCAAAGTCGGTGGGGTAACCTTTTAGGAACCAGCCGCCTAAGGTGGGACAGATGATTAGGGTG
L5B_2
ATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGTTTCGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTCAGAGATGGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTG
U1D_3
AGCCCCAGTTACCACTTTCGCCCTAGGCCGCTCCTCTCGGTCACGGACTTCAGGCGCCCGCGGCTTCCATGGCTTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGCCATGGCTGATGCGCGATTACTAGCGAATCCAACTTCGTGGAGTCGGGTTGCAGACTCCAGTCCGAACTGGGACGGCTTTTCGAGATCCGCATCCCGTCGCCGGGTAGCCTCCCTCTGTGGCCGCCATTGTAACACGTGTGTCGCCCGGACGTAAGGGCCGTGCTGATTTGACGTCATCCCCCCCCTCCTCACACCTTGCGGTGGCAGTCCCGCCAGAGTGCCCAGCTTCACCTGATGGCAACTGGCGGCAGGGTTGCGCTCGTTATGGCACTTGAGCCGACACCTCACGGCACGAGCTGACGACAACCATGCAGCACCTCGGCACCTGTCCGAAGACCCACCCGTCTCTGGGTGGTTCAGGCGCCGTTCGAGCCCGGGTAAGGTTCCTCGCGTATCATCGAATTAAACCACATGTTCCTCCGCTTGTGCGGGCCCCGTCAATTCCTTTGAGTTTCATCGTTGCCGACGTACTCCCCAGGTGGATGGCTTATCGCTTTCGCTTGGCCACCGACAGTGTGTCGCCGGCGGTTAGCCATCATCGTTTACGGCGTGGAATACCAGGGTATCTAATCCTGTTCTATCCCCACGCTTTCGTGCCTCAGCGTCAGTTACGGATTCGCCAGATGCCTTCGCAATCGGTGTTCTGAGTGATATCTAAGCATTTCACCGCTACACCACTCATTCCTCCGGCGGCATCCGCACTCCAGCGCGACAGTATCAAGGGCAGCCCCGGAGTTGAGCCCCGGAATTTCACCCCTGACTTGACGCACAGCCTACGCACCCTTTAAACCCAATGAATCCGGATAACGCTCGCATCCCCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGATGCTTATTCCCCTGGTACTCTCATCGGACGTGCGCGACGCCCTTATTGCCCCAGGCAAAAGCGGTTCACAGCCCATAGGGCCTCCTTCCCGCACGCGGCATGGCTGGTTCAGGCTTCCGCCCATTGACCAATATTCCTCACTGCTGCCTCCCGTAGGAGTATGGACCGTGTCTCAGTTCCATTGTGGGGGACCTTCCTCTCAGAACCCTACCGATCGTAGCCTTGGTGGGCCGTTACCCCGCCAACAAGCTAATCGGACGCGAGCCAATCCGTCGCCGCCGTAACTTTCAACAGAGACCCATGAGGGCCTCCGTCCCATCGGGGATTAGTCGGCGTTTCCACCGGTTGTCCCCGGGCAACGGGCATGTCGCTCACGCGTTACGCACCCTTCCGCCGGTCGCCGCCAGGACGTTGCCGCCCCGCGCTGCCCTCGACTTGCATGTGTTAAGCCTGCCGCTAGCGTTCATC
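
Before choosing an import format, it can help to confirm what the file actually contains. The following is a minimal sketch, not an official QIIME 2 or dada2 tool: `read_fasta` and `summarize` are hypothetical helper names, the toy sequences are shortened stand-ins for the reads above, and it assumes header lines start with `>` as in standard FASTA (the excerpt above shows the IDs without that prefix).

```python
from io import StringIO


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif record_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def summarize(handle):
    """Return a dict mapping each record ID to its sequence length."""
    return {record_id: len(seq) for record_id, seq in read_fasta(handle)}


# Toy input (assumption): shortened stand-ins for the reads shown above.
example = StringIO(">U1A_1\nGACGAACGCT\nGGCGGCGTGC\n>L5B_2\nATTGAACG\n")
lengths = summarize(example)
for record_id, length in lengths.items():
    print(record_id, length)
```

For a real file, the same functions could be called on `open("reads.fasta")`, where the file name is only a placeholder.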

Thanks for your time!

Good morning @binzuo

Welcome to the forums! :qiime2:

I have some bad news for you: long PacBio reads are supported by dada2 1.10, but they are not yet supported by the QIIME 2 plugin. :crying_cat_face:

But we are working on it! You can keep track of our progress here:

For now, you would have to process your data directly with dada2 using R.

Colin


Thanks, Colin.

I used conda list to check the version of dada2 and got this:
bioconductor-dada2 1.10.0 r351hf484d3e_0 bioconda

Doesn’t this mean that the dada2 in qiime2-2019.10 is version 1.10? I am a rookie with Linux and the QIIME 2 environment, so forgive me if I am wrong.
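
The conda list row quoted above can be split into its fields to read off the version. This is a minimal sketch under stated assumptions: `parse_conda_list_line` is a hypothetical helper, not part of conda, and it assumes the row has the whitespace-separated layout shown in the post (name, version, build, channel).

```python
def parse_conda_list_line(line):
    """Split one `conda list` row into name, version, build and channel."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"not a conda list row: {line!r}")
    name, version, build = fields[:3]
    # The channel column can be absent for some packages.
    channel = fields[3] if len(fields) > 3 else None
    return {"name": name, "version": version, "build": build, "channel": channel}


# Row copied from the post above.
row = parse_conda_list_line("bioconductor-dada2 1.10.0 r351hf484d3e_0 bioconda")

# Compare versions as integer tuples: as plain strings, "1.10" sorts before "1.9".
major_minor = tuple(int(part) for part in row["version"].split(".")[:2])
print(row["name"], row["version"], major_minor >= (1, 10))
```

Knowing the installed dada2 version only says what the R package can do; it does not say which options the QIIME 2 plugin exposes.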

Yes, you have dada2 1.10.0 installed, so you could open up R and use the settings for long reads.

But even though dada2 can use PacBio long reads, the QIIME 2 plugin does not have an option for them. If you look at the documentation for the dada2 QIIME 2 plugin, there are methods for single-end, paired-end, and pyrosequencing reads (denoise-single, denoise-paired, and denoise-pyro), but none for long reads.

Hopefully a fourth option for long PacBio reads will be added in 2020, but until then you can do this analysis in R.

Colin


OK, I get it. Thanks for your detailed and patient explanation.
