How to import paired-end sequencing files and full-length sequencing files together into QIIME 2?

Yuhang_Wu · November 20, 2024, 5:10pm

Hi everyone,

I have learned the QIIME 2 amplicon pipeline for paired-end sequencing files, and the official tutorials are really helpful. My research includes both paired-end sequencing files and full-length amplicon files. I want to analyze them with the QIIME 2 pipeline, but I have no idea how to import them together. I have come up with two possible solutions, but I haven't tried either of them yet.

Solution 1: From each full-length read, extract a slice containing the target variable region together with its primer and barcode sequences, using awk or a Python script. Then split the slice into a forward read and a reverse read, and import these paired reads into QIIME 2.

Solution 2: First, merge the forward and reverse reads after removing the barcode and primer sequences. Second, extract the target variable region from the full-length reads. Finally, import all of these as single-end sequences into QIIME 2.

Are there any tutorials available for importing both paired-end files and full-length single-end files?

colinbrislawn · November 20, 2024, 5:21pm

Hello Yuhang,

Welcome to the forums! :qiime2:

This is a great question! I think it depends on how the full length sequences were made and how you plan to use them...

Like, were these assembled from shotgun reads to be used as a reference database?
Or were they sequenced with PacBio or Nanopore and you want to denoise them to make ASVs?

Yuhang_Wu · November 21, 2024, 2:06am

Hi Colin,

Thank you for your warm reply.

My full-length 16S rRNA gene data were produced by PacBio sequencing (primers 27F and 1492R). I want to bring these single-end data into QIIME 2 alongside my earlier short paired-end files, but I don't know how to do it.

colinbrislawn · November 21, 2024, 6:16pm

You may have found this already, but if not, here are the docs for DADA2 denoise-ccs (for PacBio CCS reads):
https://docs.qiime2.org/2024.10/plugins/available/dada2/denoise-ccs/
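Since the two datasets are denoised with different DADA2 methods, one option is to import them as two separate artifacts. Here is a minimal sketch that writes one manifest per dataset. It assumes the `PairedEndFastqManifestPhred33V2` and `SingleEndFastqManifestPhred33V2` manifest formats; the function names, sample IDs, output file names, and file paths are all hypothetical placeholders.

```python
import csv
from pathlib import Path

# Column headers expected by the V2 manifest formats (assumed here:
# PairedEndFastqManifestPhred33V2 and SingleEndFastqManifestPhred33V2).
PAIRED_HEADER = ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
SINGLE_HEADER = ["sample-id", "absolute-filepath"]


def write_manifest(path, header, rows):
    """Write a tab-separated manifest with one header line and one row per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def build_manifests(out_dir, illumina, pacbio):
    """Create one manifest for paired-end Illumina reads and one for PacBio CCS reads.

    illumina: dict mapping sample ID -> (forward fastq path, reverse fastq path)
    pacbio:   dict mapping sample ID -> CCS fastq path
    All IDs and paths are supplied by the caller; nothing is checked on disk.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paired_path = out_dir / "illumina-manifest.tsv"  # hypothetical file name
    single_path = out_dir / "pacbio-manifest.tsv"    # hypothetical file name
    write_manifest(
        paired_path,
        PAIRED_HEADER,
        [(sample, fwd, rev) for sample, (fwd, rev) in sorted(illumina.items())],
    )
    write_manifest(single_path, SINGLE_HEADER, sorted(pacbio.items()))
    return paired_path, single_path
```

Each manifest can then be passed to `qiime tools import` with the matching `--input-format`, so the Illumina reads go to a paired-end denoising step and the PacBio reads go to `denoise-ccs`.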

Once you have long PacBio ASVs and short Illumina ASVs, trimming all of them to be short is the easiest way forward. Consider the RESCRIPt plugin for this (GitHub: bokulich-lab/RESCRIPt, REference Sequence annotation and CuRatIon Pipeline).

See also the tutorial for qiime rescript extract-seq-segments: "Using RESCRIPt's 'extract-seq-segments' to extract reference sequences without PCR primer pairs."

Zooming out, multi-region analysis and variable-length analysis are huge unsolved problems. See this discussion from a few years ago.

The method I'm proposing is basically 'turn everything into short reads so they match' which is the cleanest option, but totally removes the advantage of your longer reads!
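To make the "trim everything to short" idea concrete, here is a small pure-Python sketch that cuts the region between two primer sites out of a longer sequence. This is only an illustration of the concept using primer matching; it is not how RESCRIPt's extract-seq-segments works, and the function names and the toy primers in the usage note are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate bases.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(seq, fwd_primer, rev_primer):
    """Return the sequence between the forward primer and the reverse primer site.

    Both primers are given 5'->3' as ordered; the reverse primer is
    reverse-complemented before searching. Exact matching only (no
    mismatches or indels). Returns None if either primer site is missing.
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(fwd_primer), seq)
    if fwd is None:
        return None
    rev_pattern = re.compile(primer_regex(reverse_complement(rev_primer)))
    rev = rev_pattern.search(seq, fwd.end())  # only look downstream of the forward primer
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]
```

For example, with the toy primers `GTGYCAGC` (forward) and `GGACTAC` (reverse), calling `extract_region` on a long ASV returns only the stretch between the two primer sites, which can then be compared with short-read ASVs from the same region. Real data would need mismatch tolerance, which is one reason to prefer an existing tool over a script like this.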

Yuhang_Wu · November 25, 2024, 6:28am

Thank youuuuuuu, Colin!

Sorry for the late response. I will try these methods.