High chimeras reads

maysa1 · May 19, 2020, 2:08pm

Hi qiime team,

I am new to QIIME. I continue to have chimera issues and have tried different truncation and trimming lengths to increase the number of non-chimeric reads (~90% of my reads are chimeras).
Here is my script; I am using the Graham system:

#!/bin/bash
#SBATCH --account=def-XXXX
#SBATCH --ntasks=16 # Request 16 tasks
#SBATCH --mem-per-cpu=64000 # requested memory (in MB)
#SBATCH --time=72:05:00 # Time limit hrs:min:sec
#SBATCH --output=serial_test_%j.log # Standard output and error log

module load miniconda3
conda activate qiime2-2020.2
qiime dada2 denoise-paired --i-demultiplexed-seqs ~/paired.demux.qza \
  --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 187 --p-trunc-len-r 124 \
  --p-n-threads 0 --p-min-fold-parent-over-abundance 8 \
  --o-table ~/table4.qza --o-representative-sequences ~/rep-seqs4.qza \
  --o-denoising-stats ~/denoising-stats4.qza --verbose

I got the following error when adding “--p-min-fold-parent-over-abundance” to my code.

DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4

Filtering …
Learning Error Rates
187091770 total bases in 1120310 reads from 5 samples will be used for learning the error rates.
116512240 total bases in 1120310 reads from 5 samples will be used for learning the error rates.
Denoise remaining samples …Duplicate sequences in merged output.
…
Duplicate sequences detected and merged.
Remove chimeras (method = consensus)
Write output
Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
results = action(**arguments)
File "</opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-459>", line 2, in denoise_paired
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 411, in callable_executor
spec.qiime_type, output_view, spec.view_type, prov)
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/result.py", line 273, in _from_view
provenance_capture=provenance_capture)
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 316, in from_data
Format.write(rec, type, format, data_initializer, provenance_capture)
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/format/v5.py", line 21, in write
provenance_capture)
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/format/v1.py", line 26, in write
prov_dir, [root / cls.METADATA_FILE, archive_record.version_fp])
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/a:
File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/tzlocal/utils.py", line 38, in assert_tz_offset
raise ValueError(msg)
ValueError: Timezone offset does not match system offset: 0 != -14400. Please, check your config files.

Plugin error from dada2:

Timezone offset does not match system offset: 0 != -14400. Please, check your config files.

See above for debug info.
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpk052titw/forward /tmp/tmpk052titw/reverse /tmp/tmpk052titw/output.tsv.biom /tmp/tmpk052titw/track.tsv /tmp/tmpk052titw/filt_f /tmp/tmpk052titw/filt_r 187 124 20 20 2.0 2.0 2 consensus 4.0 0 1000000.

My files are attached. Can anyone tell me what caused this high chimera rate and how I can improve it?

Regards,
Maysa
manifest.csv (8.8 KB)
demux.qzv (295.8 KB)
denoising-stats.qzv (1.2 MB)
table.qzv (1020.8 KB)

jwdebelius · May 19, 2020, 2:37pm

Hi @maysa1,

When I look at your DADA2 stats, I see an issue in merging, not chimera removal. I don't know what region you're sequencing, but your 150 nt trim length may be a bit aggressive. You need at least 20 nt of overlap in DADA2. Maybe try relaxing your trimming parameters. You may lose more reads in denoising, but you'll likely retain more overall.

Best,
Justine

maysa1 · May 19, 2020, 7:51pm

Hi @jwdebelius,

Thanks for your reply. I tried relaxing the parameters by setting --p-trunc-len-f 187 --p-trunc-len-r 124 and --p-trunc-len-f 240 --p-trunc-len-r 160, but I still have the same issue. I also used --p-min-fold-parent-over-abundance. Did you get a chance to look at the feature counts of the forward and reverse reads? They are identical; is this normal?

Maysa

jwdebelius · May 19, 2020, 7:55pm

Hi @maysa1,

What region are you covering? What's your expected amplicon length?

Best,
Justine

maysa1 · May 19, 2020, 7:59pm

Hi @jwdebelius
I am targeting 16S V4, amplicon length is 250bp.

jwdebelius · May 19, 2020, 9:06pm

Hi @maysa1,

I'm not sure about your primers, but with EMP 515-806, I get about a 290 nt amplicon... which is just slightly too short for your trimming parameters.
Perhaps try relaxing them more to see if that helps.

Best,
Justine

maysa1 · May 19, 2020, 9:12pm

Hi @jwdebelius,

I will check this again and try more relaxed parameters and let you know if that works!

Thanks for your reply.
Best,
Maysa

andrewsanchez · May 20, 2020, 4:54am

Hi, @maysa1!

WRT this mysterious timezone error: a little @thermokarst bird told me that this was addressed in a newer version of QIIME 2 than the one you happen to be running. Please go ahead and run the latest version of QIIME 2 and you should be all good on that front.

That being said, I'm a bit confused as to why the traceback showed a reference to qiime2-2019.10 while your script seems to have at least attempted to activate the current version of QIIME 2 with conda activate qiime2-2020.2. You might have to ping your sys admin about this one.

I recommend verifying you are running the version you think you are running with the following commands:

conda deactivate
conda activate qiime2-2020.2
qiime --version
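
If the job runs through a batch scheduler, a guard along these lines could make the script stop early when the `qiime` on the PATH is not inside the environment you meant to activate. This is only a sketch: `path_in_env` is a hypothetical helper (not part of QIIME 2 or conda), and the example path assumes the usual conda layout (`.../envs/<name>/bin/`), which may differ on your cluster.

```shell
# Hypothetical helper (not part of QIIME 2 or conda): succeeds only when
# the given executable path sits inside a conda env with the expected name.
path_in_env() {
    exe_path="$1"
    env_name="$2"
    case "$exe_path" in
        */envs/"$env_name"/bin/*) return 0 ;;
        *) return 1 ;;
    esac
}

# The example path is an assumption based on the usual conda directory layout.
if path_in_env "/opt/conda/envs/qiime2-2019.10/bin/qiime" "qiime2-2020.2"; then
    echo "match"
else
    echo "mismatch: qiime is not in qiime2-2020.2"
fi
```

In a real batch script you would pass `"$(command -v qiime)"` as the first argument, right after `conda activate`, and exit on a mismatch.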

Good luck!

maysa1 · May 20, 2020, 7:26am

Hi @andrewsanchez,

You are correct. Conda environments are not currently supported for installation on SHARCNET, though this is the installation I had success with when installing QIIME 2 2020.2 on Graham, which has qiime2-2019.10. I will try your commands and see if this works!
Best,
Maysa

maysa1 · May 20, 2020, 1:36pm

Hi @andrewsanchez
Many thanks for your help. I updated my QIIME version and it now runs without error, but the majority of my reads are still chimeras, even with --p-min-fold-parent-over-abundance 8!
Can anyone look at my denoising stats and tell me what is going wrong?
denoising-stats7.qzv (1.2 MB)

jwdebelius · May 20, 2020, 2:05pm

Hi @maysa1,

Your issue isn't in the chimera removal; your issue is in merging. Please check the denoising stats and the "percentage of reads merged" column.

I think you may need to either keep adjusting the truncation parameters so that you use longer reads, or decide to use single-end reads and get the best you can out of those.

Best,
Justine

maysa1 · May 20, 2020, 2:14pm

Hi @jwdebelius,

I think you are right. I tried longer reads (--p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 200 --p-trunc-len-r 175). Do I need to go further?
Also, what do you mean by checking the “percentage of reads merged” column? And from the table.qzv, I feel like my forward and reverse reads are identical: they have the same number of feature counts. Any help on that?

BW,
Maysa

Zach_Burcham · May 20, 2020, 2:36pm

Hi @maysa1, if you look at the “percentage of reads merged” column of your denoising stats you will see, for example, that sample F10H1 only has 0.18% of reads merging. Sample F12H2 has the highest merge rate, at ONLY 36%. That is where the problem and loss of data is occurring. It might be helpful if you post your demux.qzv so that we could see your interactive quality plot.
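
For readers who prefer the command line, here is a rough sketch of how a merge percentage like the ones above can be recomputed from raw counts. It assumes a tab-separated stats table with a header row containing `input` and `merged` count columns; that layout is an assumption, so check your own file. `merged_percent` is a hypothetical helper, and the sample names and counts below are made up for illustration, not taken from the attached files.

```shell
# Hypothetical helper: reads a tab-separated stats table on stdin and prints
# "<sample-id><TAB><merged as % of input>" for every sample row.
merged_percent() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header names to column numbers
        /^#/    { next }                                          # skip directive lines such as "#q2:types"
        $(col["input"]) > 0 {
            printf "%s\t%.2f\n", $1, 100 * $(col["merged"]) / $(col["input"])
        }
    '
}

# Made-up counts for illustration only.
{
    printf 'sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n'
    printf '#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n'
    printf 'sampleA\t10000\t9000\t8500\t18\t18\n'
    printf 'sampleB\t20000\t18000\t17000\t7200\t7000\n'
} | merged_percent
```

To use it on real data, you would first export the denoising stats artifact with `qiime tools export --input-path ... --output-path ...` and then feed the resulting table to the function on stdin.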

maysa1 · May 20, 2020, 10:28pm

Hi @Zach_Burcham,

Sounds like I lost most of my data. Here is my demux.qzv, in case anyone can come up with better trimming parameters.

Regards,
Maysa
demux.qzv (295.8 KB)

Zach_Burcham · May 20, 2020, 11:15pm

@maysa1, thank you for sharing. Can I suggest relaxing your quality threshold a little bit? It might be worth trying to just keep your quality above a Phred score of 25. It might seem low, but this isn't too bad in reality. I suggest you use 0 for your trim-left parameters and truncate your forward reads around 250 and your reverse reads around 220. This should leave you enough overlap for merging. If you could post an updated denoising stats file after running with the above suggestions, it would be useful!
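
As a rough sketch, the settings suggested above could be tried like this. The file names are placeholders rather than the actual paths from the original script, and the `run` wrapper with its `DRY_RUN` variable is a hypothetical convenience that prints the command for review instead of executing it.

```shell
# Truncation lengths suggested above; file names below are placeholders.
TRUNC_F=250
TRUNC_R=220

# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run qiime dada2 denoise-paired \
    --i-demultiplexed-seqs paired.demux.qza \
    --p-trim-left-f 0 --p-trim-left-r 0 \
    --p-trunc-len-f "$TRUNC_F" --p-trunc-len-r "$TRUNC_R" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
```

Setting `DRY_RUN=0` before the call runs the command for real once the printed version looks right.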

maysa1 · May 20, 2020, 11:28pm

Sure @Zach_Burcham, I will give it a try and update the denoising stats soon!

Thanks,
Maysa

maysa1 · May 21, 2020, 5:25pm

Hi @Zach_Burcham,

I tried (--p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 260 --p-trunc-len-r 222) and it works. Attached are the denoising stats results. In the “percentage of reads merged” column, the highest merge rate is now 60%.
I also need to clarify that I am targeting V3-V4, so my expected product is about 460 bp, which gives a 22 bp overlap (260 + 222 - 460). @jwdebelius, sorry for confusing you last time!
Do I need to adjust the trim parameters more? Looking at the demux.qzv, I went below a Phred score of 20!
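
For anyone checking their own settings, the overlap arithmetic above can be sketched as a small shell function. `expected_overlap` is a hypothetical helper, and it assumes the two truncation lengths and the amplicon length are all measured on the same sequence (for example, all including primers). The calls reuse the 460 bp product and truncation lengths mentioned in this thread.

```shell
# Hypothetical helper: expected overlap (in nt) between the truncated
# forward and reverse reads, i.e. trunc_f + trunc_r - amplicon_len.
# A negative result means the reads cannot reach each other at all.
expected_overlap() {
    trunc_f="$1"
    trunc_r="$2"
    amplicon_len="$3"
    echo $(( trunc_f + trunc_r - amplicon_len ))
}

expected_overlap 260 222 460   # prints 22
expected_overlap 187 124 460   # prints -149
expected_overlap 200 175 460   # prints -85
```

Under these assumptions, only the 260/222 setting gets past the roughly 20 nt of overlap mentioned earlier in the thread, which fits the low merge rates seen with the shorter truncation lengths.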

Best,
Maysa
denoising-stats8.qzv (1.2 MB)

Zach_Burcham · May 22, 2020, 1:40pm

Hi @maysa1, I am glad to hear it worked out! The denoising stats look a lot better. I would just continue analyzing your data with the current settings. Any more tweaking will likely cost you the overlap between your reads. Your current trimming leaves your reads at a median Phred score of around 25. (You can zoom in on the box-and-whisker plots in the demux file to see it clearly.)