qiime dada2 denoise-ccs without primers or adapters

Hello,

I received clean, merged PacBio reads from a sequencing company and successfully imported them into QIIME 2. Now I am trying to use DADA2 to denoise them and create the outputs needed for further analysis. I found this command specifically for PacBio reads, but unfortunately it requires primer and adapter inputs even though my reads have already been trimmed. Are there any other options for this analysis?

qiime dada2 denoise-ccs \
  --i-demultiplexed-seqs demux.qza \
  --p-front \
  --p-adapter \
  --p-max-len 1600 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats stats.qza

Thank you,
Jackie

Hi @jmetz2015
It seems like QIIME 2 currently does not support this functionality. I will open a GitHub issue about it so that we can better support data like yours! :qiime2:

Before I open the issue, can you tell me a little about how you got this clean data? Is this something you requested from the sequencing provider, or is this just how they returned it to you?

For more context, I am trying to better understand why those are required parameters if PacBio sequences can be delivered without primers.

Hi Chloe,

Thank you so much for your help. This is the only sequence data the company provided; for some reason they will not provide the raw reads with the primers and adapters attached. The reads have been filtered for barcodes and primers and merged. Unfortunately, when I run the entire pipeline after using qiime dada2 denoise-single, half of the reads in each sample end up unclassified because they are reverse complements.

I think that because I cannot use qiime dada2 denoise-ccs with the --p-front option, I cannot orient the reads correctly before classification.

Are there any ways around this? I was starting to look into RESCRIPt's orient-seqs command. Additionally, doesn't the SILVA database include both forward and reverse-complement reference sequences? And doesn't --p-read-orientation both in qiime feature-classifier extract-reads account for both orientations?
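Conceptually, orienting reads against a reference means checking whether each read, or its reverse complement, looks more like the reference sequences, and flipping it if the reverse complement matches better. The plain-Python sketch below illustrates only that idea; it is not RESCRIPt's implementation, and the helper names, the k-mer size of 8, and the min_shared threshold are hypothetical choices.

```python
# Minimal sketch (hypothetical helpers): orient each read by comparing its
# k-mers, and those of its reverse complement, against the reference k-mers.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_reads(reads: dict[str, str], references: list[str], k: int = 8,
                 min_shared: int = 1) -> tuple[dict[str, str], list[str]]:
    """Return (oriented reads, IDs of reads that matched neither strand)."""
    ref_kmers: set[str] = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)

    oriented: dict[str, str] = {}
    unmatched: list[str] = []
    for read_id, seq in reads.items():
        rc = reverse_complement(seq)
        fwd_hits = len(kmers(seq, k) & ref_kmers)
        rev_hits = len(kmers(rc, k) & ref_kmers)
        if max(fwd_hits, rev_hits) < min_shared:
            unmatched.append(read_id)   # no evidence for either strand
        elif fwd_hits >= rev_hits:
            oriented[read_id] = seq     # already in reference orientation
        else:
            oriented[read_id] = rc      # flipped into reference orientation
    return oriented, unmatched
```

Reads that share no k-mers with the reference on either strand are reported separately rather than guessed, which mirrors the idea of keeping "unmatched" sequences apart.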

Thank you,
Jackie

Hi @jmetz2015
Thank you for the info!

I have started an issue here! Please feel free to add any information you think would be helpful. This will also let you follow progress on the issue.

As for your second question, I am not sure of the best way to deal with this. @SoilRotifer, do you have any ideas?

This is my personal opinion, but it is probably worth avoiding this company in the future, as this practice does not support FAIR standards. Also, many online repositories require that the raw data be shared. I'd push them to provide you with the raw data.

It sounds like they did some of the work already. Perhaps denoising is not required? It is still very unclear what they did to these data. Are the data already demultiplexed? Did they provide you with a detailed methods outline?

The only thing I can think of is preparing a reverse-complemented SILVA database. That is, you'd run qiime rescript orient-seqs ... without the --i-reference-sequences parameter. This will simply reverse complement all of the SILVA sequences, which you can then use to train a classifier. Then you can classify twice, once against the regularly oriented database and again against the reverse-oriented database. This will get messy, though, as you'll have to filter the data that did or did not classify, and make sure no duplicate data are retained when you merge the results back together...
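The bookkeeping step of merging the two classification runs, so that each feature keeps exactly one assignment, could look roughly like this. It is a sketch under stated assumptions: each run is a dict of feature ID to (taxon, confidence), loosely mirroring the Feature ID / Taxon / Confidence columns of an exported taxonomy table; unclassified features are assumed to be labeled "Unassigned"; and the tie-breaking rule (classified beats unassigned, then higher confidence wins) is a hypothetical choice, not a QIIME 2 feature.

```python
# Hypothetical merge of two classification runs of the same features:
# one against the normally oriented reference, one against the
# reverse-complemented reference. Each run maps feature ID -> (taxon, confidence).
UNASSIGNED = "Unassigned"  # assumed label for unclassified features

Classification = dict[str, tuple[str, float]]


def merge_classifications(forward: Classification,
                          reverse: Classification) -> Classification:
    """Keep exactly one assignment per feature ID.

    Assumed rule: a classified result beats "Unassigned"; if both runs
    classified the feature, keep the one with the higher confidence
    (ties go to the forward run).
    """
    merged: Classification = {}
    for feature_id in forward.keys() | reverse.keys():
        candidates = [run[feature_id] for run in (forward, reverse)
                      if feature_id in run]
        classified = [c for c in candidates if c[0] != UNASSIGNED]
        pool = classified or candidates  # fall back to "Unassigned" entries
        merged[feature_id] = max(pool, key=lambda c: c[1])
    return merged
```

Because the result is keyed by feature ID, duplicates cannot survive the merge; the open question is only which of the two assignments to trust, and that rule deserves a sanity check against your own data.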

But before you do this, try using qiime feature-classifier classify-consensus-vsearch ..., as this approach does not care about read orientation. If you get reasonable classifications, you can go ahead with that output, or use the approach above if you'd like to continue with the naïve Bayes approach. Obviously, do not use downstream phylogenetic approaches, as the read orientation will be mixed, which will mess up sequence alignment and phylogenetic reconstruction.
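Why a both-strand search is insensitive to read orientation can be sketched as follows: each query is scored against every reference on both strands, and the better strand wins. As I understand it, classify-consensus-vsearch exposes this through a strand option passed to vsearch (--p-strand, defaulting to both); the k-mer Jaccard score here is only a hypothetical stand-in for vsearch's alignment identity, and the function names and the one-sequence-per-taxon reference dict are illustrative assumptions.

```python
# Sketch of a strand-agnostic top-hit search (NOT vsearch's algorithm):
# score each query and its reverse complement against every reference
# using k-mer Jaccard similarity as a stand-in for alignment identity.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int = 8) -> set[str]:
    """All overlapping k-mers of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_hit_both_strands(query: str, reference: dict[str, str],
                          k: int = 8) -> tuple[str, str, float]:
    """Return (taxon, strand, score) of the best hit; reference maps taxon -> sequence."""
    strands = {"plus": kmer_set(query, k),
               "minus": kmer_set(reverse_complement(query), k)}
    best = ("Unassigned", "plus", 0.0)
    for taxon, ref_seq in reference.items():
        ref_kmers = kmer_set(ref_seq, k)
        for strand, q_kmers in strands.items():
            score = jaccard(q_kmers, ref_kmers)
            if score > best[2]:  # keep the strictly better hit
                best = (taxon, strand, score)
    return best
```

A reverse-complemented query simply finds its best match on the "minus" strand, so it is classified just as well as a forward-oriented one.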

Thank you very much for this information, Mike. Yes, I will definitely be avoiding this company in the future. I tried to get the raw reads, but they have already deleted them from their system, even though they never provided them to me.

The data are already demultiplexed and denoised using DADA2. However, they must not have used qiime dada2 denoise-ccs, as the reads are in mixed orientation. So qiime rescript orient-seqs can't be used to reorient the reads themselves, only the SILVA database? Is there no way to merge the SILVA database so that it includes both the forward and reverse sequences, so that reads in both directions can be classified?

qiime feature-classifier classify-consensus-vsearch sounds like a great option. What are some examples of the downstream phylogenetic approaches I will not be able to use with this method?

Thank you so much again,
Jackie

That is unfortunate. :frowning:

You can use any database: Greengenes2, SILVA, GTDB, etc.

Not that I'm aware of, as you'd have duplicate IDs, etc.

You would not be able to use any phylogenetic methods that are based on traditional sequence alignment. I think this would also include fragment insertion. Again, this assumes there is no other step checking for orientation. There might be some methods out there that do this, or that use a k-mer-based approach able to handle mixed-orientation reads, but I am unaware of what these might be.
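One general reason a k-mer-based approach could tolerate mixed orientation is the use of canonical k-mers: each k-mer is replaced by the lexicographically smaller of itself and its reverse complement, so a read and its reverse complement produce identical k-mer profiles. This is a generic technique, not a specific QIIME 2 method, and the function names and default k below are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count canonical k-mers: min(k-mer, reverse complement of the k-mer)."""
    seq = seq.upper()
    profile: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Each k-mer and its reverse complement collapse onto one key,
        # so the profile no longer depends on which strand was sequenced.
        profile[min(kmer, reverse_complement(kmer))] += 1
    return profile


read = "ATGCGTACCTTGAGCAATCGG"
print(canonical_kmer_profile(read) == canonical_kmer_profile(reverse_complement(read)))
```

A plain (non-canonical) k-mer count of the same two strings would generally differ, which is why methods without this step are sensitive to orientation.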
