How to import PacBio seq data

baehsung · June 27, 2023, 7:30pm

Hi,

I'm going to start analyzing seqs from PacBio sequencing. The seq size is ~1.0 kb and the files are in FASTQ format. I would appreciate any comments on preparing the manifest file and importing the data (e.g., SingleEndFastqManifestPhred33V2 or 64V2 or something else), with the appropriate commands. Once the import succeeds, I plan to filter the reads with dada2.

Thanks,

Hee-Sung

lizgehret · June 27, 2023, 9:27pm

Hi @baehsung,

You'll want to import your data as SingleEndFastqManifestPhred33V2, and then use denoise-ccs within DADA2 for your filtering. You can see an example of how to import PacBio data here, which includes a reference to our denoise-ccs documentation as well.
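As a rough illustration of the manifest step, here is a minimal Python sketch that writes a tab-separated manifest and builds (without running) an import command. It assumes the V2 manifest header is `sample-id` and `absolute-filepath` and that the reads are imported as `SampleData[SequencesWithQuality]`; the sample IDs and file names are hypothetical, so check everything against the import documentation for your QIIME 2 version.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_manifest(samples, manifest_path):
    """Write a tab-separated manifest with one row per sample."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Assumed header for the single-end V2 manifest format.
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample_id, fastq in sorted(samples.items()):
            # The manifest expects absolute paths, so resolve relative ones.
            writer.writerow([sample_id, str(Path(fastq).resolve())])


def import_command(manifest_path, output_path):
    """Build (but do not run) the import command as an argument list."""
    return [
        "qiime", "tools", "import",
        "--type", "SampleData[SequencesWithQuality]",
        "--input-path", str(manifest_path),
        "--output-path", str(output_path),
        "--input-format", "SingleEndFastqManifestPhred33V2",
    ]


if __name__ == "__main__":
    # Hypothetical sample IDs and file names, for illustration only.
    samples = {
        "sample-1": "reads/sample-1.ccs.fastq.gz",
        "sample-2": "reads/sample-2.ccs.fastq.gz",
    }
    with tempfile.TemporaryDirectory() as workdir:
        manifest = os.path.join(workdir, "manifest.tsv")
        write_manifest(samples, manifest)
        print(open(manifest).read())
    print(" ".join(import_command("manifest.tsv", "demux.qza")))
```

Building the command as a list keeps the sketch free of side effects; the printed line can be pasted into a shell once the paths are real.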

Hope this helps! Cheers

baehsung · June 28, 2023, 5:12pm

Thanks, Lizgehret, for the quick reply.

I'll try to import and filter seqs using dada2 as you suggested, and will get back with results soon.

Hee-Sung

baehsung · June 29, 2023, 8:01pm

Hi Lizgehret,

I successfully imported my seq data using SingleEndFastqManifestPhred33V2.

Now I'm going to run denoise-ccs, and I have a few questions about the command line, namely --p-front TEXT and --p-adapter TEXT, which require the sequences of the adapters ligated to the 5' end and 3' end, respectively.

The seqs I received from the sequencing lab start with either the F or the R primer (see the examples below, indicated in brackets), with no other sequence preceding those primers. The adapters were probably removed by a pre-filtering step at the sequencing lab.

In my case, can I use those primers instead of adapters?

@m64219e_230411_210514/114/ccs
[GGGAAAGGAACTTCGGCAC]GAACGAACTGGTCAAGCGAATAGAATCCACGGGGTTAAAGGACCTCGTGAACACCCGGGTGTTGATCCTGCCCCAACTAGGCGCCACCGGGGTGATGGCACACATTGTGAAGAAGCGCACCGGCTTCAAGGTGGA

@m64219e_230411_210514/100/ccs
[CAGGCGCCGCATTCGATACA]GGCGTCACGGTTCCCGATTTCGACCCTGCCATTTGTGGGCCTCAGAACAGCATGAGGGCAGACGAACAGACACATCCCACAACCCACGCATTTTTCTTCGTCAAGTGTTAAGGTTACAACAT

Thanks,

Hee-Sung

colinvwood · June 29, 2023, 10:46pm

Hello @baehsung,

Passing a sequence to the --p-adapter option means that the sequence will be searched for at the 3' end and all preceding bases will be trimmed. This is probably not what you want, since your primer sequences both seem to be at the 5' end. Instead, use the --p-front argument for both primer sequences.

But yes, the tool does not care if the sequence was an adapter sequence or a primer sequence--it just removes it.
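To make the "it just removes it" point concrete, here is a small Python sketch of 5'-end trimming. It is only an illustration of the idea, not how denoise-ccs is implemented: the function name `trim_front` is made up for this example, it uses exact string matching, and a real tool may tolerate mismatches. The sequences are the starts of the two example reads quoted above.

```python
def trim_front(read, front):
    """Remove `front` and anything before it; return None if it is absent.

    Whether `front` is biologically an adapter or a primer makes no
    difference here: it is just a string to find and cut.
    """
    start = read.find(front)
    if start == -1:
        return None
    return read[start + len(front):]


if __name__ == "__main__":
    # First bases of the two example reads from the thread.
    read_f = "GGGAAAGGAACTTCGGCACGAACGAACTGG"
    read_r = "CAGGCGCCGCATTCGATACAGGCGTCACGG"

    print(trim_front(read_f, "GGGAAAGGAACTTCGGCAC"))
    print(trim_front(read_r, "CAGGCGCCGCATTCGATACA"))
    # A read that lacks the sequence is reported as None.
    print(trim_front(read_r, "GGGAAAGGAACTTCGGCAC"))
```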

baehsung · June 30, 2023, 8:26pm

Thanks for your kind explanation.

It sounds like I can use the following option:
--p-front GGGAAAGGAACTTCGGCAC CAGGCGCCGCATTCGATACA

But what about --p-adapter if I do not have adapter seqs?

Best

Hee-Sung

colinvwood · July 1, 2023, 7:55pm

Hello @baehsung,

I did not realize that --p-adapter was a required parameter, sorry for the confusion. I'm not sure why that value is required, and there's no obvious way to make it an empty sequence or something similar. I'm going to reach out to one of the developers who implemented this command to see what the reasoning was, and they or I will get back to you once it's cleared up.

Thanks.

baehsung · July 5, 2023, 6:54pm

Thanks Colin,

Another question:
PacBio ccs reads are mixtures of forward and reverse directed seqs as indicated by starting F primer and R primer seqs. Do I need to put these reads into a single orientation before running dada2 in qiime2?

Hee-Sung

colinvwood · July 5, 2023, 11:04pm

Hello @baehsung,

PacBio ccs reads are mixtures of forward and reverse directed seqs as indicated by starting F primer and R primer seqs.

Could you explain what you mean by this?

Are these forward and reverse primers from your 16S amplicons? Are these even 16S amplicons? Could you give us a rundown of the library preparation used?

If these are 16S amplicons, do you also have consensus reads that contain both amplicon primers? Or was your amplicon size too large to be covered by 1kb reads?

Regarding the --p-adapter option, I'm still looking into that. Many of us are still inexperienced with analyzing ccs data, so we're trying to understand the full context here.

Thanks.

baehsung · July 6, 2023, 3:20pm

Sorry for the unclear question.

These primers are not from 16S amplicons but from a functional gene related to a geochemical process. The gene was amplified with the F (GGNAARGGVACYTTYGGVAC) and R (CADGCGCCRCAYTCVATRCA) primers, and the amplicons were then subjected to SMRTbell library preparation for PacBio sequencing.

As mentioned above, some of our reads start with GGNAARGGVACYTTYGGVAC (forward direction) and the others with CADGCGCCRCAYTCVATRCA (reverse direction), indicating that they are oriented differently.

I wonder if dada2 can recognize the direction of the reads. If not, dada2 may treat reads that are oriented differently but actually have the same sequence as two different reads. If dada2 is able to recognize the orientation of each read, that's great, and we don't need an additional step before running dada2. What do you think?
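Since both primers contain IUPAC ambiguity codes (N, R, V, Y, D), a short Python sketch may help show what they stand for. It expands each code into a regular-expression character class and counts how many concrete sequences a primer covers. The helper names `primer_to_regex` and `degeneracy` are invented for this illustration and are not part of QIIME 2 or DADA2.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex over plain A/C/G/T."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def degeneracy(primer):
    """Count the concrete sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base].strip("[]"))
    return count


if __name__ == "__main__":
    f_primer = "GGNAARGGVACYTTYGGVAC"
    r_primer = "CADGCGCCRCAYTCVATRCA"
    print(degeneracy(f_primer), degeneracy(r_primer))

    # Start of the second example read quoted earlier in the thread.
    read_start = "CAGGCGCCGCATTCGATACAGGCGTCACGG"
    print(bool(primer_to_regex(r_primer).match(read_start)))
```

`re.match` anchors at the start of the string, so the last line checks whether the read begins with a sequence that the degenerate R primer can represent.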

Thanks,

Hee-Sung

colinvwood · July 6, 2023, 6:40pm

Hello @baehsung,

I see. Yes, dada2 will orient the reads for you.
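For readers wondering what "orienting" means in practice, here is a simplified Python sketch: a read that starts with the reverse primer is reverse-complemented so that every read ends up in the forward direction. This is an assumption-laden illustration rather than DADA2's actual procedure; it uses exact prefix checks on the concrete primer sequences seen in the example reads, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(read, fwd_start, rev_start):
    """Return the read in forward orientation, or None if neither
    primer sequence is found at its start."""
    if read.startswith(fwd_start):
        return read
    if read.startswith(rev_start):
        # The read came off the opposite strand: flip it.
        return reverse_complement(read)
    return None


if __name__ == "__main__":
    fwd = "GGGAAAGGAACTTCGGCAC"   # start of the first example read
    rev = "CAGGCGCCGCATTCGATACA"  # start of the second example read

    read_a = fwd + "GAACGAACTGG"
    read_b = rev + "GGCGTCACGG"

    print(orient(read_a, fwd, rev))
    # After flipping, the reverse primer's complement sits at the 3' end.
    print(orient(read_b, fwd, rev))
```

Once all reads share one orientation, identical amplicons from opposite strands collapse to the same sequence instead of being counted as two different ones.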

This still doesn't solve your --p-adapter issue. Is the gene targeted by your primers too long to be covered by one 1kb ccs read? I'm still trying to understand if you would expect both primers in a read, regardless of whether the consensus sequence is in reverse complement orientation or not.

baehsung · July 6, 2023, 7:12pm

Hi Colin,

I successfully finished dada2, with --p-front set to the F primer and --p-adapter set to the R primer.
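For anyone landing here later, the combination described above can be sketched as a small Python helper that assembles the command line without running it. This is a guess at the shape of the call, not the exact command that was run: it assumes the degenerate primers quoted earlier in the thread were the values passed, the input and output file names are placeholders, and any further options (such as length filters) are left out. Please confirm the options with `qiime dada2 denoise-ccs --help`.

```python
import shlex


def denoise_ccs_command(demux, front, adapter,
                        table="table.qza",
                        rep_seqs="rep-seqs.qza",
                        stats="denoising-stats.qza"):
    """Build (but do not run) a denoise-ccs command as an argument list."""
    return [
        "qiime", "dada2", "denoise-ccs",
        "--i-demultiplexed-seqs", demux,
        "--p-front", front,       # F primer, as reported in the thread
        "--p-adapter", adapter,   # R primer, as reported in the thread
        "--o-table", table,
        "--o-representative-sequences", rep_seqs,
        "--o-denoising-stats", stats,
    ]


if __name__ == "__main__":
    cmd = denoise_ccs_command(
        "demux.qza",              # placeholder artifact name
        "GGNAARGGVACYTTYGGVAC",
        "CADGCGCCRCAYTCVATRCA",
    )
    # shlex.join quotes arguments safely for pasting into a shell.
    print(shlex.join(cmd))
```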

Thanks for all your discussion.

Hee-Sung
