ITS analysis for PacBio-generated data sets

divyaprince321 · January 26, 2023, 4:57pm

Hi all,
I am analyzing a fungal metagenome for the first time. My dataset was generated on a PacBio instrument.
My first question is which pipeline I should follow.
Secondly, I do not understand the data file, because the forward and reverse reads are in the same file. How should I process my files? Do I need to split the data into forward and reverse reads? If so, how do I split them?
Thanks and regards in advance

Keegan-Evans · February 1, 2023, 4:58pm

@divyaprince321,

Any time a question about workflows and pipelines comes up, there are a couple of great resources on our website that often answer basic questions and help you get oriented to the problem: QIIME 2 for Experienced Microbiome Researchers and Overview of QIIME 2 plugin workflows.

That being said, your case is somewhat complicated by a few things that are a little outside of typical QIIME 2 workflows, namely your PacBio sequencing data and your targeting of ITS.

For quality control and denoising of PacBio sequencing data, the best tool that we have right now is denoise-ccs, found in q2-DADA2 (docs). As far as I am aware, there are not really "forward" or "reverse" reads; rather, the genetic material is sequenced continuously in a loop until the quality scores rise to an acceptable level. So what you are seeing in your raw sequencing data makes a lot of sense, and the DADA2 method dereplicates the reads as needed. You will want to import your raw sequencing data with the type SampleData[SequencesWithQuality] to feed it into q2-DADA2.
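One way to prepare that import is with a manifest file. The sketch below is only an illustration: it assumes you have one demultiplexed FASTQ file per sample in a single directory, takes the sample ID from the file name, and writes the two-column, tab-separated manifest (`sample-id`, `absolute-filepath`) used for single-end imports. The function name `write_manifest` and the directory layout are assumptions, not something taken from your data.

```python
import tempfile
from pathlib import Path


def write_manifest(fastq_dir, manifest_path):
    """Write a single-end manifest listing every FASTQ file in fastq_dir.

    Assumes one file per sample, named <sample-id>.fastq or <sample-id>.fastq.gz.
    Returns the list of sample IDs that were written.
    """
    fastq_dir = Path(fastq_dir).resolve()
    sample_ids = []
    lines = ["sample-id\tabsolute-filepath"]
    for path in sorted(fastq_dir.iterdir()):
        name = path.name
        for suffix in (".fastq.gz", ".fastq"):
            if name.endswith(suffix):
                sample_id = name[: -len(suffix)]
                sample_ids.append(sample_id)
                lines.append(f"{sample_id}\t{path}")
                break
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return sample_ids


if __name__ == "__main__":
    # Demo with two empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for sample in ("soilA", "soilB"):
            (Path(tmp) / f"{sample}.fastq.gz").touch()
        manifest = Path(tmp) / "manifest.tsv"
        print(write_manifest(tmp, manifest))
        print(manifest.read_text())
```

The resulting file can then be passed to `qiime tools import` with `--type 'SampleData[SequencesWithQuality]'` and a single-end manifest input format such as `SingleEndFastqManifestPhred33V2`; please check the importing tutorial for the format that matches your files.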

After running through q2-DADA2, you should be left with a FeatureTable[Frequency] and a FeatureData[Sequence] that you can use for any of the typical downstream QIIME 2 workflows. To look at ITS specifically, check out the ITS Fungal Analysis tutorial and the q2-ITSxpress tutorial.

divyaprince321 · February 17, 2023, 8:27am

Thank you, Keegan, for the timely reply.
Actually, I have both the forward and the reverse reads in a single file, so I am confused about how to proceed. Do I need to separate the reads? If so, how do I do it?
For your convenience, I tried to attach a file, but it exceeded the size limit.

Keegan-Evans · February 27, 2023, 5:01pm

@divyaprince321,

Were your sequences read in CLR instead of CCS mode?

divyaprince321 · February 27, 2023, 7:40pm

Thank you, Keegan.
So how do I proceed with these files, and which pipeline should I follow?
Thanks in advance

lizgehret · March 6, 2023, 11:18pm

Hi @divyaprince321,

Please take a look at @Keegan-Evans's response: he is asking whether your sequences were read in CLR or CCS mode. It is important that you read and respond to your posts with any information requested from you so that we can assist you further.

divyaprince321 · March 7, 2023, 7:37am

Hi @lizgehret and @Keegan-Evans,
Sorry for the inconvenience.
I am not able to tell which mode my sequences were read in.
For your reference, I have shared one file.
Thanks and regards

Keegan-Evans · March 10, 2023, 9:20pm

@divyaprince321,
It sounds like the next step is to contact your sequencing center and confirm the mode of your sequences. If they were read as CLR, you will not be able to use q2-DADA2 to perform your denoising. If this is the case, you will need to denoise (if necessary) using a PacBio-recommended tool (I do not know what their recommended processing for CLR reads is), after which you can import your data into QIIME 2 following the guidance in the importing tutorial and proceed with your analysis.
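
While you wait to hear back, the read names in the FASTQ file may give a hint. This is only a rough sketch, assuming your file still carries the original PacBio read names, where CCS reads usually end in `/ccs` and CLR subreads end in a `start_end` coordinate pair, and assuming plain four-line FASTQ records. The helper names `classify_read_name` and `guess_mode` are made up for illustration, and reads that were renamed during demultiplexing will simply be counted as unknown.

```python
import gzip
import re
from collections import Counter

# Assumed PacBio naming conventions:
#   <movie>/<zmw>/ccs (optionally followed by /fwd or /rev) for CCS reads
#   <movie>/<zmw>/<start>_<end> for CLR subreads
CCS_NAME = re.compile(r"^[^/\s]+/\d+/ccs(/(fwd|rev))?$")
SUBREAD_NAME = re.compile(r"^[^/\s]+/\d+/\d+_\d+$")


def classify_read_name(header):
    """Return 'ccs', 'clr', or 'unknown' for one FASTQ header line."""
    fields = header.strip().lstrip("@").split()
    name = fields[0] if fields else ""
    if CCS_NAME.match(name):
        return "ccs"
    if SUBREAD_NAME.match(name):
        return "clr"
    return "unknown"


def guess_mode(fastq_path, max_reads=1000):
    """Count how many of the first max_reads headers look like CCS or CLR."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each 4-line record
                counts[classify_read_name(line)] += 1
                if sum(counts.values()) >= max_reads:
                    break
    return counts
```

Calling `guess_mode("your_reads.fastq.gz")` (the file name is a placeholder) returns counts per category. Please treat the result as a hint only; your sequencing center is still the authoritative source.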