All 16s unique sequences are being removed in deblur for having less than 10 reads

Hello. I have been trying to process 16s mouse gut microbiome sequences that were collected on a novaseq 6000 illumina sequencer with 100bp paired-end reads. I am trying to use deblur on merged and filtered reads with the following commands:

qiime deblur denoise-16S
--i-demultiplexed-seqs demux-joined-filtered.qza
--p-trim-length 85
--p-sample-stats
--p-jobs-to-start 1
--o-representative-sequences rep-seqs-malessri.qza
--o-table tablemalessri.qza
--o-stats deblur-stats-malessri.qza

After this is run this I get this visualization:
deblur-stats-malessri.qzv (204.5 KB)

As you can see no unique sequences are hitting artifact or reference. I believe this is due to the fact that most unique sequences are being removed by the "min-reads" parameter being set to 10. This is exemplified in the deblur.log:

INFO(46912496356992)2023-10-31 08:42:03,722:for output biom table loaded 1 samples, 146182 unique sequences
INFO(46912496356992)2023-10-31 08:42:03,818:keeping 2 (out of 146182 sequences) with >=10 reads

So my question is why are all of my unique sequences only being detected in less than 10 reads? Is this an issue with trimming the adaptors? Am I using the incorrect reference here? I believe this should be using greengenes 16s database so I'm not sure why that wouldn't work. Please let me know if you have any suggestions for me. Attached below is the full deblur log as well as stats from the joined fastq file. I have 56 samples total but am only running 1 at the moment to try and troubleshoot more quickly.

deblur.txt (5.6 KB)
demux-joined-test.qzv (294.0 KB)

Hi @dkropp2,
Welcome to the QIIME 2 Forum!

I notice a couple of things looking at this.

First, how many samples are you expecting to have? You currently only have one, so I'm wondering if there should be a demultiplexing step included in your workflow that assigns reads to their sample of origin. If demultiplexing is supposed to have been performed but wasn't, depending on where the barcodes are that could cause this (e.g., if the barcodes are still in the reads, they may not map to the reference because the barcode prevents alignment to reference sequences). Alternatively, do you know if adapters and barcodes have already been removed from your sequences? That could also be the problem, as you note. QIIME 2's cutadapt plugin can help you with this if you still need to remove adapters/barcodes.

Second, you're using a very old version of QIIME 2. I recommend upgrading to the latest version (QIIME 2023.9) before proceeding with your analysis. You can find the install instructions here.

1 Like

Hello @gregcaporaso,

I will be running 56 samples total, I will likely be braking these up into groups of 4 to process at a time as my HPC does not have enough power to handle all samples at once. I would prefer to use DADA2 here however I am not able to complete processing in time and combining tables after DADA2 seems statistically innapropriate given variable error rates so they would all need to be done at once.

When I received the fastq files they had already been demultiplexed so I have not been demultiplexing them following input into the program. I am not sure if the adapters have been removed and that is one thing I saw elsewhere in the forum. I know that a nextera dna prep kit was used on the samples but I am not sure what to do with that information in order to remove primers/adapters. I have found in the kits documentation what primers and adapters were used but that is as far as I can figure what to do there.

Additionally, yes I am running qiime2 version 2020.6. Unfortunately I require my campuses HPC clusters in order to have adequate processing power. On the cluster QIIME2 has been installed but it is this 2020.6 version and I do not have read/write permissions. Unless there is a workaround for this I believe that I am stuck using this legacy version.

Any advice you have for processing these samples would be greatly appreciated. Thanks so much for your time.

Hi @dkropp2,
I have a few questions for you before I'm able to help:

  • What issues are you running into when trying to apply DADA2 to these data?
  • Do you know why there is only one sample in the demux-joined-test.qzv file that you shared above - is that intentional?
  • Where was this sequencing done? Can you get in touch with whoever did it and ask if the adapters were removed, and if not what the sequence of the adapters that should be removed are?
  • Have you reached out to your cluster administrator to see if they would be willing to install a new version of QIIME 2 for you to use?

Greg

@gregcaporaso

Thank you very much for all of your help. I have solved the issue. I inherited these data from a student who had previously graduated and was told that they were 16s rRNA sequences. However after reaching out to the sequencing core at my institution I discovered that these were in fact whole genome sequencing samples, not amplicon. This would explain a lot. I am going to continue by either integrating the shotgun plugin of qiime, or if I cannot get that with the permissions available on my HPC, I will build a pipeline in rstudio. Thanks again for your help and this issue has been solved.

2 Likes

Glad you got it sorted out @dkropp2! Good luck!

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.