QIIME2 Demux and Dada2 with LC Sciences Results

Hey all,

I have a question about using "LC Sciences 16S rRNA Sequencing Clean Data".

I have no trouble running QIIME2 on data with separate forward and reverse reads; however, the data I was given had already been processed as follows:

  1. Trimming the barcode and sequencing adaptor sequences from the raw reads

  2. Merging paired-end reads into single tags

  3. Excluding tags with more than 5% of ambiguous bases (N)

  4. Excluding tags with more than 20% of low quality bases (Phred score < 1

I am not sure how to proceed to the next step to get the OTU counts; there are no barcodes and no separate R1/R2 files. Any suggestions?


Hello @Gordon_Zheng,

Are the four steps that you've listed steps you've already performed? If so, have you used qiime2 to do them?

The dada2 software is one of the most commonly used methods to go from raw sequencing data to OTUs, although the outputs of dada2 are not OTUs in the traditional percent-identity sense (they are amplicon sequence variants, or ASVs).

If instead you're starting from scratch and want to perform the steps you've listed, the first thing you should do is import your sequences into a demux artifact--see this tutorial for guidance. I'm assuming that your sequences are already demultiplexed, but if not you can import them as multiplexed sequences and perform the demultiplexing in qiime2 as well.
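If you have many already-demultiplexed FASTQ files, writing the manifest by hand gets tedious. Below is a minimal Python sketch that builds a tab-separated manifest with the `sample-id` and `absolute-filepath` columns used by the V2 single-end manifest formats. The function name, the `suffix` parameter, and the rule of taking the sample ID from the file name are assumptions for illustration, not part of qiime2.

```python
from pathlib import Path


def write_single_end_manifest(fastq_dir, manifest_path, suffix=".fastq"):
    """Write a tab-separated manifest with one row per FASTQ file.

    The sample ID is the file name minus `suffix`. That naming rule is an
    assumption for this sketch, so adjust it to match your own files.
    Returns the number of FASTQ files listed.
    """
    fastq_files = sorted(Path(fastq_dir).glob(f"*{suffix}"))
    lines = ["sample-id\tabsolute-filepath"]
    for fastq in fastq_files:
        sample_id = fastq.name[: -len(suffix)]
        # Manifests need absolute paths, so resolve each file.
        lines.append(f"{sample_id}\t{fastq.resolve()}")
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return len(fastq_files)
```

The resulting file can then be passed to `qiime tools import` through `--input-path`. For gzipped files you could call it with `suffix=".fastq.gz"`.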


Hi,

The procedure above had already been done for me. I am starting with the data combined into one fastq file that has been "cleaned".

The following is the protocol I followed:


qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path se-33-manifest \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

qiime dada2 denoise-single \
  --i-demultiplexed-seqs QZA/demux-single-end.qza \
  --p-trim-left 0 \
  --p-trunc-len 428 \
  --o-representative-sequences QZA/rep-seqs.qza \
  --o-table QZA/table.qza \
  --o-denoising-stats QZA/stats-dada2.qza
Saved FeatureTable[Frequency] to: QZA/table.qza
Saved FeatureData[Sequence] to: QZA/rep-seqs.qza
Saved SampleData[DADA2Stats] to: QZA/stats-dada2.qza

Hello @Gordon_Zheng,

It looks like everything ran successfully--are there any other questions you have?

Something worth mentioning is that we usually don't recommend that people perform quality-based filtering on the sequencing data before using dada2 because dada2 takes the quality score information into account in its model. It's possible that the same logic applies to ambiguous base filtering. I don't think it's a big deal if your results still look acceptable, just something to be aware of.


Thanks for the reply;

Based on your suggestion, should I use this instead, since the data has already been filtered?

qiime vsearch dereplicate-sequences \
  --i-sequences demux-single-end.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza

Hello @Gordon_Zheng,

I believe that that action only dereplicates sequences, meaning only amplicons with exact sequence identity are grouped into features. OTU clustering using one of the cluster-features-* actions in the vsearch plugin would be preferable.
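To make the distinction concrete, here is a toy Python illustration of what dereplication alone does. It is not how vsearch is implemented, and the reads are made up for the example; it only shows that grouping by exact identity leaves a read with a single mismatch as its own feature.

```python
from collections import Counter


def dereplicate(reads):
    """Group reads by exact sequence identity and count each unique sequence."""
    return Counter(reads)


# Toy reads (made up for illustration): the last one differs by one base.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
features = dereplicate(reads)

for sequence, count in features.items():
    print(sequence, count)
```

Clustering at a percent-identity threshold would instead be expected to merge near-identical sequences like these into one OTU.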

That said, these methods are the older (arguably even outdated) approach. I think that dada2 will likely produce better results, even given the upstream steps that you've performed.

Hi,

Thank you for your assistance. My only concern is that since the company sent me the "cleaned data" with joined reads, I am unable to use DADA2. I ran dada2 denoise-single but recovered only about half of the reads I expected. The format of the data provided by the company also doesn’t match what’s required for Deblur.
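One possible contributor to the read loss, offered as a hypothesis rather than a diagnosis: dada2 discards any read shorter than the truncation length, and merged reads vary in length, so `--p-trunc-len 428` drops every joined read under 428 bases. Other filters also remove reads, and the denoising stats show where the losses actually happen. The Python sketch below estimates how many reads survive a given truncation length; the function names are hypothetical, and it assumes an uncompressed FASTQ with four lines per record.

```python
def read_lengths(fastq_path):
    """Return the length of every read in an uncompressed FASTQ file.

    Assumes the common four-lines-per-record layout.
    """
    lengths = []
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence is the 2nd line of each record
                lengths.append(len(line.strip()))
    return lengths


def fraction_kept(lengths, trunc_len):
    """Fraction of reads that are at least `trunc_len` bases long."""
    if not lengths:
        return 0.0
    return sum(length >= trunc_len for length in lengths) / len(lengths)
```

Calling `fraction_kept(read_lengths(path), 428)` for a few candidate values gives a rough idea of how much the truncation length alone costs before any other filtering.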

Do you think I should instead process the files as paired-end reads and analyze both the forward and reverse sequences? (I tried importing them as joined reads, without success.)

sample-id, absolute-filepath, direction
C1, /users/g/o/Plan1R2/CleanData/LC_C1.fastq, forward
C1, /users/g/o/Plan1R2/CleanData/LC_C1.fastq, reverse

When I tried the manifest above, I got different lengths for the forward and reverse reads, which caused dada2 denoise-paired to fail. Is there a workaround?

Again, thank you so much for your responses.

Hello @Gordon_Zheng,

Using dada2 on unjoined paired end reads is the most common approach.

Did importing give you an error, or did dada2? Could you attach the error?

The manifest snippet you attached shows identical paths for both the forward and reverse read files--this shouldn't be the case. If you were sent only joined reads from the people who performed sequencing there's no way of undoing this joining to get the separate read directions (there's also no way to treat the joined reads as separate read directions).
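A quick way to catch this kind of manifest problem before importing is sketched below in Python. It assumes the comma-separated layout from the snippet (`sample-id`, `absolute-filepath`, `direction`, with optional spaces after the commas); the function name is hypothetical and not part of qiime2.

```python
import csv
from collections import defaultdict


def samples_with_reused_files(manifest_path):
    """Return sample IDs whose forward and reverse rows point at the same file."""
    paths = defaultdict(dict)
    with open(manifest_path, newline="") as handle:
        # skipinitialspace tolerates "C1, /path, forward" style rows.
        for row in csv.DictReader(handle, skipinitialspace=True):
            paths[row["sample-id"]][row["direction"]] = row["absolute-filepath"]
    return [
        sample
        for sample, by_direction in paths.items()
        if "forward" in by_direction
        and by_direction.get("forward") == by_direction.get("reverse")
    ]
```

For the manifest shown in the post, this would report `C1`, because both directions name `LC_C1.fastq`.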


Hi,

Below is my attempt to import the joined sequences, followed by the dada2 error I got; I tried this with several different manifests.

qiime tools import \
  --type 'SampleData[JoinedSequencesWithQuality]' \
  --input-path manifest-joined-fixed.csv \
  --output-path merged-seqs.qza \
  --input-format SingleEndFastqManifestPhred33
  1. Filtering Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
    Mismatched forward and reverse sequence files: 77214, 72540.

The sequencing company provided the following details:

  • Paired-end reads were merged using FLASH.
  • Quality filtering was performed using fqtrim (v0.94) under specific conditions to generate high-quality clean tags.
  • Chimeric sequences were filtered with Vsearch (v2.3.4).
  • Dereplication and ASV generation were done using DADA2, yielding a feature table and feature sequences.

Questions:

  1. My understanding is that DADA2 typically cannot process pre-joined paired-end sequences. Is this correct?
  2. Could the company have used Deblur instead? Deblur can handle joined reads and produce an ASV table, but they did not elaborate further (possibly due to proprietary protocols).

Any insights would be appreciated!

Hello @Gordon_Zheng,

It's true that it's not recommended to pass pre-joined reads to dada2, but there's nothing stopping one from doing so as far as I am aware--in this case dada2 would treat the submitted reads as single-end. In qiime2 we guard against this behavior with the types of input that are allowed to the dada2 denoising actions.