How to combine metagenome-assembled genomes and amplicon sequences

Hi, I am looking for help with a conceptual question: how to combine metagenome-assembled genomes (MAGs) with amplicon sequences.
I read this issue but the topic is not exactly what I want.

  1. Here is my first question:
    I used 'barrnap' to extract the 16S rRNA sequences from the MAGs I assembled in my metagenome analysis. Surprisingly, I found multiple 16S rRNA sequences within the same MAG, which appears to contradict my understanding of 16S amplicon data. In this situation, how should I determine the appropriate 16S rRNA sequence for each MAG?
  2. Here is my last question:
    I need guidance on incorporating the 16S sequences from the obtained MAGs into a QIIME 2 reference database. Specifically, I intend to add these sequences to the SILVA 138 database. Could you please advise me on the appropriate steps to achieve this, or should I try RESCRIPt?

Looking forward to your reply!

Hello @Rainjie,

I used 'barrnap' to extract the 16S rRNA sequences from the MAGs I assembled in my metagenome analysis. Surprisingly, I found multiple 16S rRNA sequences within the same MAG, which appears to contradict my understanding of 16S amplicon data. In this situation, how should I determine the appropriate 16S rRNA sequence for each MAG?

Are these multiple 16S sequences taxonomically distinct, or are they just copies of one another? Some microbes carry more than one copy of the 16S gene in their genome.

I need guidance on incorporating the 16S sequences from the obtained MAGs into a QIIME 2 reference database. Specifically, I intend to add these sequences to the SILVA 138 database. Could you please advise me on the appropriate steps to achieve this, or should I try RESCRIPt?

What do you hope to accomplish by incorporating the MAG-extracted 16S sequences into a classification database? It would be possible to add your own sequences to the SILVA 138 database sequences and re-train a classifier, but I'm not sure how useful that would be.

Thank you for your response. I apologize for the unclear description in my previous question. Here's the additional information to clarify my question:

For example, I observed four 16S sequences within the same MAG (named SRR5275395_maxbin2_bin.10_sub.fa). My understanding is that these four sequences may collectively represent SRR5275395_maxbin2_bin.10_sub.fa, but I am not sure whether this interpretation is accurate or whether this situation can really occur. Do you have any suggestions for screening these sequences to obtain a more precise 16S sequence for each MAG, that is, how to polish the 16S sequences in MAGs?

>SRR5275395_maxbin2_bin.10_sub.fa_16S_1
TTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGTTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCAGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTT
>SRR5275395_maxbin2_bin.10_sub.fa_16S_6
ATGCTAATACCGCATAAGACCACAGTGTCGCATGGCACAGGGGTCAAAGGATTTATCCGCTGAAAGATGGGCTCGCGTCCGATTAGCTAGATGGTGAGGTAACGGCCCACCATGGCGACGATCGGTAGCCGGACTGAGAGGTTGAACGGCCACATTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGGAGGAAGAAGGTCTTCGGATTGTAAACTCCTGTCCCAGGGGACGATAATGACGGTACCCTGGGAGGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGGTGCAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGCAGGCGGATTGGCAAGTTGGGAGTGAAATCTATGGGCTCAACCCATAAATTGCTTTCAAAACTGTCAGTCTTGAGTGGTGTAGAGGTAGGCGGAATTCCCGGTGTAGCGGTGGAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGGCCTACTGGGCACTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGATGATTACTAGGTGTGGGAGGATTGACCCCTTCCGTGCCGCAGTTAACACAATAAGTAATCCACCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCAGTGGAGTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCGGATGCATACCTAAGAGATTAGGGAAGTCCTTCGGGACATCCAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCGTTAGTTACTACGCAAGAGGACTCTAACGAGACTGCCGTTGACAAAACGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCTTTATGACCTGGGCTACACACGTACTACAATGGCTATTAACAGAGAGAAGCGATACCGCGAGGTGGAGCAAACCTCACAAAAATAGTCTCAGTTCGGATCGCAGGCTGCAACCCGCCTGCGTGAAGCCGGAATTGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGAGAGCCGGGGGGACCCGAAGTCGGTAGTCTAACCGTAAGGAGGACGCCGCCGAAGGTAAAACTGGTGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTT
>SRR5275395_maxbin2_bin.10_sub.fa_16S_2
ATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGAAGTTTCGAGGAAGCTTGCTTCCAAAGAGACTTAGTGGCGAACGGGTGAGTAACACGTAGGTAACCTGCCCATGTGCCCGGGATAACTGCTGGAAACGGTAGCTAAAACCGGATAGGTATACAGAGCGCATGCTCAGTATATTAAAGCGCCCATCAAGGCGTGAACATGGATGGACCTGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCAATGATGCGTAGCCGGCCTGAGAGGGTAAACGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATTTTCGTCAATGGGGGAAACCCTGAACGAGCAATGCCGCGTGAGTGAAGAAGGTCTTCGGATCGTAAAGCTCTGTTGTAAGTGAAGAACGGCTCATAGAGGAAATGCTATGGGAGTGACGGTAGCTTACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATCATTGGGCGTAAAGGGTGCGTAGGTGGCACGATAAGTCTGAAGTAAAAGGCAACAGCTCAACTGTTGTATGCTTTGGAAACTGTCGAGCTAGAGTGCAGAAGAGGGCGATGGAATTCCATGTGTAGCGGTAAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGTCGCCTGGTCTGTAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAATAGGATTAGATACCCTAGTAGTCCACGCCGTAAACGATGAGAACTAAGTGTTGGAGGAATTCAGTGCTGCAGTTAACGCAATAAGTTCTCCGCCTGGGGAGTATGCACGCAAGTGTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGCCTTGACATGGATATAAATGTTCTAGAGATAGAAAGATAGCTATATATCACACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCTTCTGTTACCAGCATTAGGTTGGGGACTCAGGAGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGGCCTGGGCTACACACGTACTACAATGGCGCCTACAAAGAGCAGCGACACCGCGAGGTGGAGCGAATCTCATAAAGGGCGTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGTCGGAATCGCTAGTAATCGC
>SRR5275395_maxbin2_bin.10_sub.fa_16S_7
GAAAGCCTGATGCAGCAACGCCGCGTGAGCGATGAAGGCCTTCGGGTCGTAAAGCTCTGTCCTCAAGGAAGATAATGACGGTACTTGAGGAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCTAGCGTTATCCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTTCTTAAGTCAGAGGTGAAAGGCTACGGCTCAACCGTAGTAAGCCTTTGAAACTGGGAAACTTGAGTGCAGGAGAGGAGAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTTGCGAAGGCGGCTCTCTGGACTGTAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTACTAGCTGTCGGAGGTTACCCCCTTCGGTGGCGCAGCTAACGCATTAAGTACTCCGCCTGGGAAGTACGCTCGCAAGAGTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTAAGCTTGACATCCTTTTGACCGATGCCTAATCGCATCTTTCCCTTCGGGGACAGAAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCTTTAGTTGCCAGCATTAAGTTGGGCACTCTAGAGGGACTGCCAGGGATAACCTGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGCTTAGGGCTACACACGTGCTACAATGGGTGGTACAGAGGGCAGCCAAGTCGTGAGGCGGAGCTAATCCCTTAAAGCCATTCTCAGTTCGGATTGTAGGCTGAAACTCGCCTACATGAAGCTGGAGTTACTAGTAATCGCAGATCAGAATGCTGCGGTGAATGCGTTCCCGGGTCTTGTACACACCGCCCGTCACACCACGGGAGTTGGGGGCGCCCGAAGCCGGATTGCTAACCTTTTGGAAGCGTCCGTCGAAGGTGAAATCAATAACTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTT

I intend to incorporate the bacteria I recovered from metagenomic sequencing into the SILVA database. This will allow me to check whether these bacteria were also present in my previous research, which used only 16S sequencing due to limited funding. Building a new, personalized SILVA database is crucial to achieving this objective.

Could you tell me how to add my own 16S sequences to the SILVA database? I will try my best to do it.

By the way, I have already begun following the tutorial and obtained some results: I built a database from the MAG-extracted 16S sequences, but many sequences came out as 'Unassigned'. So I am wondering how to add the MAG-extracted 16S sequences to the SILVA database in order to minimize the number of Unassigned sequences. Could you please help?

Thank you very much!
Rainjie

I have a new idea, and I'm uncertain whether it is theoretically viable. My approach involves extracting the sequences and taxonomy data from the SILVA 138.1 database, merging them with the 16S sequences from my MAGs, and using the combined data to build a new SILVA-based reference library. Do you think this can work?

Hello @Rainjie,

For example, I observed four 16S sequences within the same MAG (named SRR5275395_maxbin2_bin.10_sub.fa). My understanding is that these four sequences may collectively represent SRR5275395_maxbin2_bin.10_sub.fa, but I am not sure whether this interpretation is accurate or whether this situation can really occur.

These 16S sequences may be copies from a single species, and thus correctly grouped into a MAG, or they may be 16S sequences from different taxonomic groups, showing you that MAG construction performed poorly.
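
One rough way to tell the two cases apart is to compare the extracted sequences with each other. The sketch below is only a heuristic, not a substitute for a proper alignment or for classifying each sequence separately: it scores every pair by the 8-mers they share, relative to the shorter sequence, so partial 16S fragments are not penalised for their length. The function names, the k-mer size, and the assumption that all sequences are written on the same strand are mine and are not something barrnap or QIIME 2 provides.

```python
from itertools import combinations


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def kmers(seq, k=8):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(a, b, k=8):
    """Shared k-mers divided by the smaller k-mer set (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def pairwise_similarity(records, k=8):
    """Containment score for every pair of sequences, keyed by (name, name)."""
    return {
        (x, y): containment(records[x], records[y], k)
        for x, y in combinations(sorted(records), 2)
    }
```

Scores close to 1.0 for every pair would be consistent with copies of one gene. Clearly lower scores suggest the sequences come from different organisms, which is worth confirming by classifying each sequence on its own.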

Do you have any suggestions for screening these sequences to obtain a more precise 16S sequence for each MAG, that is, how to polish the 16S sequences in MAGs?

See the above answer for an explanation for why the concept of "a more precise 16S sequence" doesn't really make sense.

I intend to incorporate the bacteria I recovered from metagenomic sequencing into the SILVA database. This will allow me to check whether these bacteria were also present in my previous research, which used only 16S sequencing due to limited funding.

This sounds to me more like you want to classify your MAGs with the SILVA database rather than incorporate the sequences into the database. Is that correct? Since you have metagenomic data, I would recommend other methods of taxonomic classification, which will have much higher accuracy because you have much more data to work with.

If you want to add your extracted sequences to the SILVA sequences and train a new classifier, you would first need to know the taxonomic classification of those sequences anyway.
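
To make that requirement concrete, here is a minimal sketch of the merge step, assuming the reference and the MAG-derived records have already been loaded into plain dictionaries keyed by sequence ID. It refuses to merge any new sequence that has no taxonomy string. The function names are hypothetical, and the 'Feature ID' / 'Taxon' header is my assumption about the tab-separated taxonomy layout that QIIME 2 imports, so check it against the files you actually export from SILVA.

```python
def merge_reference(ref_seqs, ref_tax, new_seqs, new_tax):
    """Combine reference and MAG-derived 16S records.

    *_seqs map sequence ID -> sequence, *_tax map sequence ID -> taxonomy.
    Raises ValueError if a new sequence lacks taxonomy or reuses a reference ID.
    """
    missing = sorted(set(new_seqs) - set(new_tax))
    if missing:
        raise ValueError(f"no taxonomy for: {', '.join(missing)}")
    clashes = sorted(set(new_seqs) & set(ref_seqs))
    if clashes:
        raise ValueError(f"IDs already in reference: {', '.join(clashes)}")
    seqs = {**ref_seqs, **new_seqs}
    tax = {**ref_tax, **{i: new_tax[i] for i in new_seqs}}
    return seqs, tax


def to_fasta(seqs):
    """Serialise {id: sequence} as FASTA text, one sequence per line."""
    return "".join(f">{i}\n{s}\n" for i, s in seqs.items())


def to_taxonomy_tsv(tax):
    """Serialise {id: taxonomy} as a two-column TSV with a header row."""
    lines = ["Feature ID\tTaxon"]
    lines += [f"{i}\t{t}" for i, t in tax.items()]
    return "\n".join(lines) + "\n"
```

The merged FASTA and taxonomy files would then still need to be imported into QIIME 2 and used to train a new classifier.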

I have a new idea, and I'm uncertain whether it is theoretically viable. My approach involves extracting the sequences and taxonomy data from the SILVA 138.1 database, merging them with the 16S sequences from my MAGs, and using the combined data to build a new SILVA-based reference library. Do you think this can work?

This is what I was suggesting originally. This is certainly possible, but my above responses hopefully explain why this probably isn't what you want to do in this case.

Many thanks for your thoughtful reply. I think I will try to retrain on the SILVA 138.1 database first.
