16S full length nanopore to ASVs (new approach?)

LucasArg00 · April 18, 2024, 12:57pm

Hi everybody, I am new here and I have what might be a crazy idea. I am not sure it is possible, given the nature of the sequencing data, but could I map the ASV reference sequences produced by QIIME2 onto full-length 16S nanopore sequences?

Both the Illumina (MiSeq 2 × 250 paired-end) and the nanopore sequencing were performed on soil samples.

My final goal is to produce an ASV ID abundance table from the nanopore data, so that I can "compare" the samples sequenced with Illumina with the new samples sequenced with nanopore. (Of course, with this approach I could miss a new microorganism if it is not present in the Illumina sequences, but I am not worried about that.)

I have read other posts about obtaining ASVs from nanopore sequences, but I did not find one that uses ASVs already created from Illumina data to find them in the nanopore sequences. Maybe I could map them with minimap2, although I am not familiar with the parameters I should use for this case. Should I use the ASVs as the reference or as the query?
Sorry for these general questions.

These are the statistics for the ASVs already generated from the Illumina data:
Sequence Count: 72912
Min Length: 181
Max Length: 314
Mean Length: 253.24
Range: 133
Standard Deviation: 2.79

The nanopore sequences were produced by Seqcoast with their 16S full-length service (25k reads). One of my colleagues used the wf-metagenome workflow to process them. The reads have a mean length of ~1300 bp and a median of ~1500 bp, and the number of reads per sample is between 2.5e+5 and 4.2e+5.
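If it helps to recompute summaries like the ASV statistics and read lengths above from the raw files, here is a minimal sketch using only the Python standard library. It assumes FASTQ records are plain 4-line records and lets FASTA sequences wrap over several lines; it reports the population standard deviation, and it does not claim to match exactly how QIIME 2 computes its own summary.

```python
import gzip
import statistics


def read_lengths(path):
    """Yield sequence lengths from a FASTA or FASTQ file, gzipped or not.

    Assumes FASTQ records are plain 4-line records (sequence on one line);
    FASTA sequences may be wrapped over several lines.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = (line.rstrip("\r\n") for line in handle)
        first = next(lines, "")
        if first.startswith("@"):
            # FASTQ: header, sequence, '+', quality; the sequence is line 1 of 4.
            for i, line in enumerate(lines, start=1):
                if i % 4 == 1:
                    yield len(line)
        elif first.startswith(">"):
            # FASTA: add up the lengths of all lines between two headers.
            length = 0
            for line in lines:
                if line.startswith(">"):
                    yield length
                    length = 0
                else:
                    length += len(line.strip())
            yield length


def summarize(lengths):
    """Count, min, max, range, mean, median and SD of sequence lengths."""
    values = list(lengths)
    if not values:
        raise ValueError("no sequences found")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation (pstdev), an assumption of this sketch.
        "stdev": statistics.pstdev(values),
    }
```

For example, `summarize(read_lengths("barcode0.fastq.gz"))` for one nanopore sample. A mean well below the median, as reported above, usually points to a tail of shorter reads.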

Please ask if you need more information. Thanks in advance.

timanix · April 18, 2024, 1:10pm

Hello!

Welcome to the forum!

That is how science works!

Technically it is possible, so why not try it?

You can import your nanopore reads into qiime2 and dereplicate them with VSEARCH. That will create a feature abundance table and a rep-seqs file. But due to the nature of nanopore reads, most of the features will be "unique".
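To see why dereplication of noisy reads yields mostly "unique" features, here is a toy sketch of exact, full-length dereplication (roughly the idea behind VSEARCH dereplication, not its actual implementation). Identical sequences collapse into one feature with per-sample counts; any single base difference creates a new feature. The MD5-based feature IDs are an illustrative choice of this sketch.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(samples):
    """Collapse identical sequences into features and count them per sample.

    samples: mapping of sample ID -> list of read sequences.
    Returns (table, rep_seqs): table[feature_id][sample_id] is a read count,
    rep_seqs[feature_id] is the feature's sequence.
    """
    table = defaultdict(Counter)
    rep_seqs = {}
    for sample_id, reads in samples.items():
        for read in reads:
            seq = read.upper()
            # Illustrative feature ID: MD5 hash of the exact sequence.
            fid = hashlib.md5(seq.encode("ascii")).hexdigest()
            rep_seqs[fid] = seq
            table[fid][sample_id] += 1
    return dict(table), rep_seqs


def singleton_fraction(table):
    """Fraction of features seen exactly once across all samples."""
    totals = [sum(counts.values()) for counts in table.values()]
    if not totals:
        return 0.0
    return sum(1 for total in totals if total == 1) / len(totals)
```

With error-prone 1,500-base reads, almost every read differs somewhere from every other, so `singleton_fraction` of the resulting table tends towards 1.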

That said, I am not aware of any examples of this either.

I would use the nanopore reads as the reference and the Illumina ASVs as the query. Yes, minimap2 can do the job. A FASTA file with the nanopore sequences can be extracted from the rep-seqs.qza file produced by dereplicating the nanopore reads with VSEARCH. I know minimap2 has a special mode for mapping nanopore reads to a reference, but I am not sure that mode would be appropriate for mapping short reads to a nanopore reference.
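The supported way to get that FASTA out is `qiime tools export --input-path rep-seqs.qza --output-path exported/`. As a sketch of what that does, assuming (as for FeatureData[Sequence] artifacts) that a .qza is a zip archive laid out as `<uuid>/data/dna-sequences.fasta`, the file can also be copied out with Python's `zipfile`. The function name and output path are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_fasta_from_qza(qza_path, out_path):
    """Copy dna-sequences.fasta out of a FeatureData[Sequence] .qza archive.

    Assumes the artifact is a zip file containing <uuid>/data/dna-sequences.fasta.
    """
    with zipfile.ZipFile(qza_path) as zf:
        members = [
            name for name in zf.namelist()
            if name.endswith("/data/dna-sequences.fasta")
        ]
        if len(members) != 1:
            raise ValueError(
                f"expected one dna-sequences.fasta in {qza_path}, found {len(members)}"
            )
        Path(out_path).write_bytes(zf.read(members[0]))
    return Path(out_path)
```

The extracted file could then be given to minimap2 as the target, with the Illumina ASV FASTA as the query.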

Good luck with the analyses!

Best,

LucasArg00 · April 18, 2024, 1:26pm

Wow, thanks for such a fast answer. I do appreciate it.

I did not know I could dereplicate nanopore reads in qiime2. I have read about so many bad experiences with nanopore data and qiime2 that I was apprehensive about importing the data. Now I will give it a try.

I will try to update the post with my progress and the problems I face along the way. Maybe it will come in handy for someone else in this community (awesome community, by the way).

Again, thanks a lot for taking the time to answer.

timanix · April 18, 2024, 1:31pm

Yes, you can dereplicate them in Qiime2, but the problem is that nanopore reads are long and contain numerous sequencing errors. Most of the reads are effectively "unique", found only once in one sample, which makes them not very useful at the "ASV" level.

I would appreciate the updates, since I am also playing with nanopore data and trying different approaches to get it working for me. I am definitely curious about the results you will obtain.
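A quick back-of-the-envelope calculation shows why length and error rate together make exact duplicates so rare. Assuming independent per-base errors at a fixed rate e (a simplification; real nanopore errors cluster, e.g. in homopolymers), a read of length L is error-free with probability (1 - e)^L. The error rates below are hypothetical round numbers, not measurements from this data.

```python
import math


def prob_error_free(length, error_rate):
    """P(no errors in a read), assuming independent errors at a fixed per-base rate."""
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    # (1 - e) ** L, computed via log1p for numerical stability.
    return math.exp(length * math.log1p(-error_rate))


def expected_error_free_reads(n_reads, length, error_rate):
    """Expected number of error-free reads among n_reads reads of equal length."""
    return n_reads * prob_error_free(length, error_rate)
```

Under these hypothetical rates, about 78% of 250-base reads at 0.1% error would be error-free, versus roughly 3 in 10 million 1,500-base reads at 1% error, i.e. fewer than one error-free read expected among 4.2e+5 reads.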

colinbrislawn · April 18, 2024, 8:22pm

Hi Lucas,

Welcome to the forums! This is a great first post!

Have you already done a lit review? You are not the first to attempt this, which means you can ~~steal~~ remix what others have done:

This is one of my favorite pieces of software, as it works great with long and noisy reads. For historical reasons, lots of stuff in this ecosystem is designed for short and accurate reads. Good luck!

LucasArg00 · April 18, 2024, 10:05pm

Hi Colin, thank you!!!

I did a review, but I could not find a solution (or at least I could not see how to ~~steal~~ remix others' work for my use case, hehehehe).

I saw the MetONTIIME2 repository; it solves problems I have read about in the forum before. However, my real intention is to "track" the ASV sequences already created from the Illumina data in my nanopore sequences. As far as I understand (not much, because my training was mainly in wet-lab experiments), MetONTIIME2 cannot do that. But on a second look, maybe if I pass the ASV FASTA file to --dbSequencesFasta, the workflow could classify my nanopore reads against the ASV sequences.

I had seen the q2ONT repository before; thanks for the links. And thanks for the good luck too... I will need it.

So far, I have tried importing my nanopore samples into qiime2 and running qiime vsearch dereplicate-sequences but, as Timur said, most of the sequences are unique due to the error rate (and I have a lot of sequences... 3,933,168 to be exact). I have been struggling to create the rep-seq.qzv (maybe because of the number of sequences, I am not sure).

I also tried minimap2 with the data I already have (a FASTA file with the ASV sequences and one fastq.gz file per sample from the nanopore run), using this command: `minimap2 -x map-ont --secondary=no -t 6 rep_seqs.fasta barcode0.fastq.gz > barcode0_mapped.paf`
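In the PAF written by the command above, each line is one alignment: the nanopore read (query) is column 1, the ASV (target) is column 6, column 10 is the number of matching bases, column 11 the alignment block length (column 10 divided by column 11 approximates identity), and column 12 is the MAPQ. Even with `--secondary=no`, a read may still appear more than once, so this hypothetical sketch keeps one best hit per read (most matching bases; ties go to the first line seen). The `min_mapq` and `min_identity` thresholds are placeholders to tune, not recommended values. If the roles were swapped (ASVs as query), columns 1 and 6 would swap too.

```python
def best_hits(paf_lines, min_mapq=0, min_identity=0.0):
    """Map each read ID to the ASV ID of its best alignment in a PAF file.

    Assumes reads are the query (column 1) and ASVs the target (column 6).
    """
    best = {}  # read ID -> (ASV ID, matching bases)
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        read_id, asv_id = cols[0], cols[5]
        matches, block_len, mapq = int(cols[9]), int(cols[10]), int(cols[11])
        identity = matches / block_len if block_len else 0.0
        if mapq < min_mapq or identity < min_identity:
            continue
        # Keep the alignment with the most matching bases for this read.
        if read_id not in best or matches > best[read_id][1]:
            best[read_id] = (asv_id, matches)
    return {read_id: asv_id for read_id, (asv_id, _) in best.items()}
```

For example, `with open("barcode0_mapped.paf") as f: hits = best_hits(f, min_mapq=10)`, after which `collections.Counter(hits.values())` gives reads per ASV for that sample.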

Then, with a Python script, I count how many times each ASV ID was mapped and build an abundance table for that sample. I am not sure how to check whether it is right... maybe I could import the abundance table into qiime2, classify it, and look at the abundance of phyla or classes in each sample (I have that data from the original nanopore sequences).
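The script itself is not shown, so the following is only a hypothetical sketch of the last step: merging per-sample ASV counts into one ASV × sample table written as TSV. The `#OTU ID` first header cell is the layout I believe `biom convert` accepts for tab-separated tables; please check it against your biom version.

```python
import csv
from collections import Counter


def write_abundance_table(per_sample, out_path):
    """Write an ASV x sample count table as tab-separated text.

    per_sample: {sample_id: Counter({asv_id: read_count})}.
    Returns the (sorted) sample IDs and ASV IDs used as columns and rows.
    """
    samples = sorted(per_sample)
    asvs = sorted(set().union(*(counts.keys() for counts in per_sample.values())))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["#OTU ID", *samples])
        for asv in asvs:
            # ASVs not detected in a sample get an explicit zero.
            writer.writerow([asv, *(per_sample[s].get(asv, 0) for s in samples)])
    return samples, asvs
```

Assuming the TSV is accepted, something like `biom convert -i table.tsv -o table.biom --table-type="OTU table" --to-hdf5` followed by `qiime tools import --input-path table.biom --type 'FeatureTable[Frequency]' --input-format BIOMV210Format --output-path table.qza` should bring it into qiime2.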

Sorry for writing so much. I like to share what I am thinking and what I have tried, so maybe someone can relate to it.

colinbrislawn · April 18, 2024, 10:30pm

Well, I would start here.

Having a 'smoke test' or a 'sanity check' is good
Having unit tests is better
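One possible sanity check along the lines Lucas suggested: compare phylum-level relative abundances from the ASV-mapping approach with those from the original nanopore classification for the same sample. Bray-Curtis dissimilarity is one common way to summarize that (0 means identical profiles, 1 means no overlap); the choice of metric and the example phylum names are assumptions of this sketch. A unit test would go further, for instance checking that reads with a known ASV origin map back to that ASV.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into proportions summing to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total if total else 0.0
```

Low dissimilarities between the two methods across samples would be reassuring; large ones would suggest looking at which phyla differ.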