denoise-ccs filtering all reads

Venkatakrishna_Jala · April 5, 2023, 1:10pm

Thank you Greg.
It ran for almost 24 h, but no luck.

However, I was able to do it through PowerShell (Ubuntu) on Windows.

Secondly, I am processing Nanopore sequences (full-length 16S, 1530 bp) with an average Q score of 27. The run is failing at the DADA2 denoising step. Any ideas for resolving it would be great.

(qiime2-2023.2) jvrao001@BCC-A92859:~$ qiime dada2 denoise-ccs --i-demultiplexed-seqs reads_qza/rawSeq.qza --p-min-len 1200 --p-max-len 1500 --p-front GCATCAGRRTTYGATYHTGGYTYAG --p-adapter GCATCRGYTACCTTGTTAYGACTT --o-table dada2_output/table.qza --o-representative-sequences dada2_output/representative_sequences.qza --o-denoising-stats dada2_output/stats.qza --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/qiime2/jvrao001/data/9e39f766-f9f1-4ba0-b153-e6321c5ff58f/data --output_path /tmp/tmpb7pkmji8/output.tsv.biom --output_track /tmp/tmpb7pkmji8/track.tsv --removed_primer_directory /tmp/tmpb7pkmji8/nop --filtered_directory /tmp/tmpb7pkmji8/filt --forward_primer GCATCAGRRTTYGATYHTGGYTYAG --reverse_primer GCATCRGYTACCTTGTTAYGACTT --max_mismatch 2 --indels False --truncation_length 0 --trim_left 0 --max_expected_errors 2.0 --truncation_quality_score 2 --min_length 1200 --max_length 1500 --pooling_method independent --chimera_method consensus --min_parental_fold 3.5 --allow_one_off False --num_threads 1 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 32

R version 4.2.2 (2022-10-31)
Loading required package: Rcpp
DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6

Removing Primers
Multiple matches to the primer(s) in some sequences. Using the longest possible match.
9763 sequences out of 24842 are being reverse-complemented.
Read in 24842, output 15080 (60.7%) filtered sequences.
11440 sequences out of 30184 are being reverse-complemented.
Read in 30184, output 18684 (61.9%) filtered sequences.
11863 sequences out of 29921 are being reverse-complemented.
Read in 29921, output 19308 (64.5%) filtered sequences.
7763 sequences out of 21784 are being reverse-complemented.
Read in 21784, output 12471 (57.2%) filtered sequences.
11106 sequences out of 28244 are being reverse-complemented.
Read in 28244, output 17582 (62.3%) filtered sequences.
14352 sequences out of 35740 are being reverse-complemented.
Read in 35740, output 22722 (63.6%) filtered sequences.
8965 sequences out of 23520 are being reverse-complemented.
Read in 23520, output 14370 (61.1%) filtered sequences.
11119 sequences out of 29175 are being reverse-complemented.
Read in 29175, output 17806 (61%) filtered sequences.
8845 sequences out of 23824 are being reverse-complemented.
Read in 23824, output 14375 (60.3%) filtered sequences.
7465 sequences out of 20226 are being reverse-complemented.
Read in 20226, output 12340 (61%) filtered sequences.
6194 sequences out of 17519 are being reverse-complemented.
Read in 17519, output 9937 (56.7%) filtered sequences.
11922 sequences out of 29422 are being reverse-complemented.
Read in 29422, output 19417 (66%) filtered sequences.
11115 sequences out of 28489 are being reverse-complemented.
Read in 28489, output 17486 (61.4%) filtered sequences.
13090 sequences out of 32819 are being reverse-complemented.
Read in 32819, output 20881 (63.6%) filtered sequences.
10105 sequences out of 26629 are being reverse-complemented.
Read in 26629, output 16127 (60.6%) filtered sequences.
14275 sequences out of 34658 are being reverse-complemented.
Read in 34658, output 23565 (68%) filtered sequences.
9087 sequences out of 23609 are being reverse-complemented.
Read in 23609, output 14270 (60.4%) filtered sequences.
10518 sequences out of 26983 are being reverse-complemented.
Read in 26983, output 17086 (63.3%) filtered sequences.
8339 sequences out of 22789 are being reverse-complemented.
Read in 22789, output 13355 (58.6%) filtered sequences.
6231 sequences out of 16397 are being reverse-complemented.
Read in 16397, output 10042 (61.2%) filtered sequences.
5912 sequences out of 16829 are being reverse-complemented.
Read in 16829, output 9639 (57.3%) filtered sequences.
4402 sequences out of 12716 are being reverse-complemented.
Read in 12716, output 7219 (56.8%) filtered sequences.
12786 sequences out of 31226 are being reverse-complemented.
Read in 31226, output 20184 (64.6%) filtered sequences.
7242 sequences out of 19459 are being reverse-complemented.
Read in 19459, output 11908 (61.2%) filtered sequences.
17643 sequences out of 44509 are being reverse-complemented.
Read in 44509, output 29346 (65.9%) filtered sequences.
.........................
Filtering The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA1_17_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA2_18_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA3_19_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA4_20_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA5_21_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA6_22_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA7_23_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_UroA8_24_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH1_9_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH2_10_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH3_11_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH4_12_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH5_13_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH6_14_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH7_15_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/As_VEH8_16_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/UroA1_4_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/UroA2_5_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/UroA3_6_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/UroA4_7_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/UroA5_8_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/VEH1_0_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/VEH2_1_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/VEH3_2_L001_R1_001.fastq.gz not written.
The filter removed all reads: /tmp/tmpb7pkmji8/filt/VEH4_3_L001_R1_001.fastq.gz not written.
xxxxxxxxxxxxxxxxxxxxxxxxx
Error: No reads passed the filter (was truncLen longer than the read length?)
Traceback (most recent call last):
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 440, in denoise_ccs
run_commands([cmd])
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/qiime2/jvrao001/data/9e39f766-f9f1-4ba0-b153-e6321c5ff58f/data', '--output_path', '/tmp/tmpb7pkmji8/output.tsv.biom', '--output_track', '/tmp/tmpb7pkmji8/track.tsv', '--removed_primer_directory', '/tmp/tmpb7pkmji8/nop', '--filtered_directory', '/tmp/tmpb7pkmji8/filt', '--forward_primer', 'GCATCAGRRTTYGATYHTGGYTYAG', '--reverse_primer', 'GCATCRGYTACCTTGTTAYGACTT', '--max_mismatch', '2', '--indels', 'False', '--truncation_length', '0', '--trim_left', '0', '--max_expected_errors', '2.0', '--truncation_quality_score', '2', '--min_length', '1200', '--max_length', '1500', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '3.5', '--allow_one_off', 'False', '--num_threads', '1', '--learn_min_reads', '1000000', '--homopolymer_gap_penalty', 'NULL', '--band_size', '32']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in denoise_ccs
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/jvrao001/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 443, in denoise_ccs
raise ValueError(
ValueError: No reads passed the filter. trunc_len (0) may be longer than read lengths, or other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

Plugin error from dada2:

No reads passed the filter. trunc_len (0) may be longer than read lengths, or other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

See above for debug info.

Thanks Greg.

gregcaporaso · April 5, 2023, 8:23pm

Hi @Venkatakrishna_Jala, I'm glad you have a working installation now.

One thing I notice immediately is that your length parameters are more restrictive than the recommendation for 16S sequences. Have you tried with --p-min-len 1000 --p-max-len 1600? I don't think this would cause an issue this extreme, but I wanted to note that what you're doing is different from what is recommended in the help text for this command.

Also, have you run qiime demux summarize on your reads_qza/rawSeq.qza? It will be helpful to see that to assist with debugging this. Could you run that (if you haven't already) and share the result here?
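
For reference, a minimal sketch of a rerun with the wider length bounds suggested above. It only assembles the argument list in Python; every flag and path is copied from the original command in this thread, and only the --p-min-len and --p-max-len values change. The helper name build_denoise_ccs_cmd is hypothetical.

```python
import shlex


def build_denoise_ccs_cmd(min_len=1000, max_len=1600):
    """Assemble the denoise-ccs call from this thread with new length bounds.

    build_denoise_ccs_cmd is a hypothetical helper; the flags and paths are
    the ones used in the original post.
    """
    return [
        "qiime", "dada2", "denoise-ccs",
        "--i-demultiplexed-seqs", "reads_qza/rawSeq.qza",
        "--p-min-len", str(min_len),
        "--p-max-len", str(max_len),
        "--p-front", "GCATCAGRRTTYGATYHTGGYTYAG",
        "--p-adapter", "GCATCRGYTACCTTGTTAYGACTT",
        "--o-table", "dada2_output/table.qza",
        "--o-representative-sequences",
        "dada2_output/representative_sequences.qza",
        "--o-denoising-stats", "dada2_output/stats.qza",
        "--verbose",
    ]


cmd = build_denoise_ccs_cmd()
# Print a copy-pasteable shell line. To execute it from Python instead,
# pass cmd to subprocess.run(cmd, check=True) inside the QIIME 2 environment.
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to try several length windows in a loop without retyping the primers.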

Venkatakrishna_Jala · April 11, 2023, 4:10pm

Sorry for the late reply. My Ubuntu installation crashed after a Windows update.

Can I send the result as an attachment?

Here are the results:
$ qiime demux summarize

Demultiplexed sequence counts summary

	forward reads
Minimum	12716
Median	26629.0
Mean	26060.52
Maximum	44509
Total	651513

Forward Reads Frequency Histogram

Reverse Reads Frequency Histogram

No reads in this direction

Per-sample sequence counts

Total Samples: 25 (forward)

sample ID	forward sequence count
VEH4	44509
As_UroA6	35740
As_VEH8	34658
As_VEH6	32819
VEH2	31226
As_UroA2	30184
As_UroA3	29921
As_VEH4	29422
As_UroA8	29175
As_VEH5	28489
As_UroA5	28244
UroA2	26983
As_VEH7	26629
As_UroA1	24842
As_VEH1	23824
UroA1	23609
As_UroA7	23520
UroA3	22789
As_UroA4	21784
As_VEH2	20226
VEH3	19459
As_VEH3	17519
UroA5	16829
UroA4	16397
VEH1	12716

visualization.qzv (447.3 KB)

Venkatakrishna_Jala · April 13, 2023, 7:28pm

Hi Greg,
Is there any method for denoising Nanopore sequencing data in QIIME 2? I'm having a hard time getting beyond this step.

ebolyen · April 14, 2023, 4:31pm

I'm not aware of any such method for QIIME 2 at the moment, although perhaps others know of something?

@jwdebelius, does q2-sidle use long reads that were sequenced, or is more of a reference genome thing?

Keegan-Evans · April 14, 2023, 6:00pm

@Venkatakrishna_Jala,
As @ebolyen said, there are no methods in QIIME 2 for processing Nanopore sequencing data right now.

Yeah, something looks weird about those quality scores, at least for PacBio CCS, which should pretty consistently produce Q scores closer to 30 or 35. This looks much more like what I would expect from an old-school PacBio CLR read. (They have something else they call CLR now, but it has much higher accuracy than the error-prone Continuous Long Reads they used to offer.) We also don't currently have a dedicated method for either of these formats.

The denoising of long-read sequences is not nearly as well studied as the denoising of targeted, short-read technologies, but there is some progress. Adding more techniques for working with this data to QIIME 2 has been a topic of discussion. Unfortunately, it is not a top priority in our current round of development, but it will likely become much more important after the release of our shotgun metagenomics plugin suite sometime this year.

For your situation in particular, it does look like the filtering issues may be related to length, trimming, or truncation. It might be worth dropping the --p-max-len argument and retrying. How long were you expecting your reads to be?
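
One way to answer the read-length question concretely is to tabulate read lengths straight from a FASTQ file before choosing --p-min-len and --p-max-len. The sketch below is an illustration under stated assumptions, not part of QIIME 2: read_lengths and fraction_in_range are hypothetical helpers, the parser assumes plain four-line FASTQ records, and the three toy reads stand in for real data. The log earlier in this thread shows primer removal running before filtering, so the lengths that matter are those of the reads after the primers have been trimmed.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def fraction_in_range(lengths, min_len, max_len):
    """Return the fraction of reads with min_len <= length <= max_len."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if min_len <= n <= max_len)
    return kept / len(lengths)


# Toy data standing in for a real file. For a gzipped FASTQ, open it with
# gzip.open(path, "rt") and pass that handle instead of io.StringIO.
toy_fastq = io.StringIO(
    "@r1\n" + "A" * 1450 + "\n+\n" + "I" * 1450 + "\n"
    "@r2\n" + "A" * 1530 + "\n+\n" + "I" * 1530 + "\n"
    "@r3\n" + "A" * 900 + "\n+\n" + "I" * 900 + "\n"
)
lengths = list(read_lengths(toy_fastq))
print(fraction_in_range(lengths, 1200, 1500))  # bounds from the original command
print(fraction_in_range(lengths, 1000, 1600))  # wider bounds suggested earlier
```

If most real reads fall outside the chosen window, the length bounds alone could explain an empty result; if they fall inside it, another filter is the more likely cause.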

Venkatakrishna_Jala · April 14, 2023, 6:47pm

Thanks @Keegan-Evans and @ebolyen for replies.
I have been trying different combinations. No luck yet. It may have to do with the quality scores (again, I think). The reads are still all being dropped.
I see an average quality score of around 27-30, but the reads still don't make it through the script.
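
One way to test the quality-score hypothesis is to look at expected errors, which is what the max_ee argument named in the error message limits. DADA2 computes a read's expected errors as the sum of its per-base error probabilities, 10^(-Q/10), and discards reads whose total exceeds the limit. The sketch below assumes, purely for illustration, that every base in a 1500 nt read has the same Phred score. Real Nanopore reads vary along their length, and a reported mean Q score may be calculated differently, so treat the numbers as rough. expected_errors is a hypothetical helper, and 2.0 is the --max_expected_errors value shown in the log.

```python
def expected_errors(phred_scores):
    """Sum of per-base error probabilities, EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-q / 10) for q in phred_scores)


MAX_EE = 2.0        # --max_expected_errors value from the log in this thread
READ_LENGTH = 1500  # assumed read length, for illustration only

for q in (27, 30, 35):
    ee = expected_errors([q] * READ_LENGTH)  # assumes uniform quality
    verdict = "passes" if ee <= MAX_EE else "is filtered"
    print(f"Q{q}: EE = {ee:.2f}, read {verdict} at max_ee = {MAX_EE}")
```

Under these assumptions a uniform Q27 read of that length carries roughly three expected errors and would be dropped at the default limit, while a uniform Q30 read carries about 1.5 and would pass. If the real data look similar, --p-max-ee is the parameter to experiment with.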

jwdebelius · April 16, 2023, 3:48pm

Hi @ebolyen,

Sidle is definitely a short read thing, and unfortunately wouldn't help here.

Best,
Justine

Keegan-Evans · April 24, 2023, 5:36pm

@Venkatakrishna_Jala,

I am still looking into this. We have had a lot going on ahead of the upcoming QIIME 2 release, and I have not yet come up with an answer that I am entirely happy with, but I am still working on it.