q2-sidle reconstruct-counts pluging error

Hi everyone!!

I am playing with the great q2-sidle to try to analyze five amplicons of five hipervariable regions of the 16S rRNA bacterial gen. However, when I run qiime sidle reconstruct-counts command, I get the next error message:

 qiime sidle reconstruct-counts --p-region AmpliconR1_GG --i-kmer-map /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/green_database/db_ampliconR1/ggenes-13-8-db-Amplicon-R1-100nt-map.qza --i-regional-alignment R1_CCR16_6samples-align-map.qza --i-regional-table /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS/table-R1-TRIM-CA-CCR16-6samples-100nt.qza --p-region AmpliconR2_GG --i-kmer-map /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/green_database/db_ampliconR2/ggenes-13-8-db-Amplicon-R2-100nt-map.qza --i-regional-alignment R2_CCR16_6samples-align-map.qza --i-regional-table /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS/table-R2-TRIM-CA-CCR16-6samples-100nt.qza --p-region AmpliconR4_GG --i-kmer-map /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/green_database/db_ampliconR4/ggenes-13-8-99-db-Amplicon-R4-100nt-map.qza --i-regional-alignment R4_CCR16_6samples-align-map.qza --i-regional-table /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS/table-R4-TRIM-CA-CCR16-6samples-100nt.qza --p-region AmpliconR5_GG --i-kmer-map /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/green_database/db_ampliconR5/ggenes-13-8-99-db-Amplicon-R5-100nt-map.qza --i-regional-alignment R5_CCR16_6samples-align-map.qza --i-regional-table /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS/table-R5-TRIM-CA-CCR16-6samples-100nt.qza --p-n-workers 2 --o-reconstructed-table reconstruction-6samples-noR3/CCR16_6samples-table.qza --o-reconstruction-summary reconstruction-6samples-noR3/CCR16_6samples-summary.qza --o-reconstruction-map reconstruction-6samples-noR3/CCR16_6samples-map.qza --verbose
Traceback (most recent call last):
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-219>", line 2, in reconstruct_counts
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_sidle/_reconstruct.py", line 101, in reconstruct_counts
    aligned_kmers = _get_unique_kmers(align_map['kmer'])
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_sidle/_reconstruct.py", line 525, in _get_unique_kmers
    kmers = np.hstack([[a.split("@")[0] for a in kmer.split('|')]
  File "<__array_function__ internals>", line 5, in hstack
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

Plugin error from sidle:

  need at least one array to concatenate

See above for debug info.

As you can see, I have indicated that I want to study five regions and I only include four in the command. The reason why I did not add the R3 region it is because I can't denoise it after trimming with dada2 as I did with the others.

I thought that maybe it could be an out of memory problem, but I reanalyzed the data in other server and still got the same error:

    qiime dada2 denoise-paired --i-demultiplexed-seqs R3-trimmed-CA-CCR16-6samples.qza --p-trim-left-f 4 --p-trim-left-r 2 --p-trunc-len-f 131 --p-trunc-len-r 136 --o-table table-R3-trimmed-CA-CCR16-6samples.qza --o-representative-sequences rep-seqs-R3-trimmed-CA-CCR16-6samples.qza --o-denoising-stats stats-R3-trimmed-CA-CCR16-6samples.qza
Plugin error from dada2:

  An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /tmp/qiime2-q2cli-err-a8ocf3ae.log
(qiime2-2021.4) [email protected]:~/MICROBIOTA/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS$ cat /tmp/qiime2-q2cli-err-a8ocf3ae.log
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp6pzax_r6/forward /tmp/tmp6pzax_r6/reverse /tmp/tmp6pzax_r6/output.tsv.biom /tmp/tmp6pzax_r6/track.tsv /tmp/tmp6pzax_r6/filt_f /tmp/tmp6pzax_r6/filt_r 131 136 4 2 2.0 2.0 2 12 independent consensus 1.0 1 1000000

R version 4.0.3 (2020-10-10) 
Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.6 / RcppParallel: 5.1.2 
1) Filtering The filter removed all reads: /tmp/tmp6pzax_r6/filt_f/CCR-16-1-FFPE-Gmet_S56_L001_R1_001.fastq.gz and /tmp/tmp6pzax_r6/filt_r/CCR-16-1-FFPE-Gmet_S56_L001_R2_001.fastq.gz not written.
The filter removed all reads: /tmp/tmp6pzax_r6/filt_f/CCR-16-1-FFPE-GsinMet_S57_L001_R1_001.fastq.gz and /tmp/tmp6pzax_r6/filt_r/CCR-16-1-FFPE-GsinMet_S57_L001_R2_001.fastq.gz not written.
The filter removed all reads: /tmp/tmp6pzax_r6/filt_f/CCR-16-1-FFPE-TumGrasa_S55_L001_R1_001.fastq.gz and /tmp/tmp6pzax_r6/filt_r/CCR-16-1-FFPE-TumGrasa_S55_L001_R2_001.fastq.gz not written.
The filter removed all reads: /tmp/tmp6pzax_r6/filt_f/CCR-16-1-FFPE-Tumor-A_S53_L001_R1_001.fastq.gz and /tmp/tmp6pzax_r6/filt_r/CCR-16-1-FFPE-Tumor-A_S53_L001_R2_001.fastq.gz not written.
The filter removed all reads: /tmp/tmp6pzax_r6/filt_f/CCR-16-1-FFPE-Tumor-B_S54_L001_R1_001.fastq.gz and /tmp/tmp6pzax_r6/filt_r/CCR-16-1-FFPE-Tumor-B_S54_L001_R2_001.fastq.gz not written.
Some input samples had no reads pass the filter.
xx.xxx
2) Learning Error Rates
127 total bases in 1 reads from 1 samples will be used for learning the error rates.
134 total bases in 1 reads from 1 samples will be used for learning the error rates.
3) Denoise samples .
.
4) Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) : 
  Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
Ejecución interrumpida
Traceback (most recent call last):
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 266, in denoise_paired
    run_commands([cmd])
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp6pzax_r6/forward', '/tmp/tmp6pzax_r6/reverse', '/tmp/tmp6pzax_r6/output.tsv.biom', '/tmp/tmp6pzax_r6/track.tsv', '/tmp/tmp6pzax_r6/filt_f', '/tmp/tmp6pzax_r6/filt_r', '131', '136', '4', '2', '2.0', '2.0', '2', '12', 'independent', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-616>", line 2, in denoise_paired
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/elsa/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 279, in denoise_paired
    raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

So, I think that, perphas, these two errors are interrelated.

Thank you for your help!

Elsa

Hi @elsamdea,

The error is telling you that there's a problem with the aligned sequences and the mapping.

could you please try running qiime metadata tabulate passing in all your alignment artifacts and sharing the results?

Best,
Justine

1 Like

Hi @jwdebelius,

You are right, there's a problem with the aligned sequences. When I run qiime metadata tabulate in the five files, I get:

It is the same message in the five, so I only share it once.

May I try to rerun the qiime sidle align-regional-kmers command again with each file?

Oh, and here it is the visualization of the map file for this region:

Thank you!

Elsa

Thanks @elsamdea,

Could you attach your table summary (qiime feature-table summarize) as the artifact, so I can peak at the provenance? What are your alignment parameters?

Can you show me your commands for building the database?

Best,
Justine

Sure @jwdebelius!

Here it is the table summary of the first region (I can upload the others too, if you want):


image
(upload://iV4Dx2hETav68NukU1cgslToxQK.png)

The commands were the following:

  1. Download the database greengenes:

wget ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz

  1. File Descompresion
    tar xzvf gg_13_8_otus.tar.gz

  2. Import to x

    TAXONOMY
    qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path taxonomy/99_otu_taxonomy.txt --output-path green-genes-99-otu-taxonomy.qzaImported taxonomy/99_otu_taxonomy.txt as HeaderlessTSVTaxonomyFormat to green-genes-99-otu-taxonomy.qza

    REP-SEQS
    qiime tools import --type 'FeatureData[Sequence]' --input-path rep_set/99_otus.fasta --output-path not-aligned-rep-seq-green-genes-99-otu.qza

  3. Visualization

qiime feature-table tabulate-seqs --i-data not-aligned-rep-seq-green-genes-99-otu.qza --o-visualization not-aligned-rep-seq-green-genes-99-otu.qzv

  1. Filtering the database: degenerated sequences and taxas exclusion:

qiime sidle filter-degenerate-sequences --i-sequences not-aligned-rep-seq-green-genes-99-otu.qza --p-max-degen 3 --o-filtered-sequences sidle-rep-seq-notaligned-GGenes-99-otu.qza

The original SMURF tool suggested the value of 3 in the parameter --p-max-degen (the maximum number of degenerate sequences for a sequence to be retained).

Taxas exclusion:

qiime taxa filter-seqs --i-sequences sidle-rep-seq-notaligned-GGenes-99-otu.qza --i-taxonomy taxonomy-green-genes-99-otu.qza --p-include __p --p-exclude mitochondria,chloroplast,metazoa --p-mode contains --o-filtered-sequences ggenes-13-8-db-99-otu-filtered-by-phylum-tax-exclude-mitochondria-chloroplast-metazoa.qza

  1. Regional Database Preparation

6.1. Extract reads

`qiime feature-classifier extract-reads --i-sequences /home/elsa/Documentos/Doctorado/QIIME2/SMURF-CCR/v2smurf_11062021-271140879/FASTQ_Generation_2021-06-14_14_52_34Z-427957897/CCR16_6TIPOS/ggenes-13-8-db-99-otu-filtered-by-phylum-tax-exclude-mitochondria-chloroplast-metazoa.qza 
--p-f-primer TGGCGAACGGGTGAGTAA --p-r-primer CCGTGTCTCAGTCCCARTG --o-reads ggenes-13-8-99-filt-F1-R1-set1-primers.qza

6.2. Prepare extracted region

qiime sidle prepare-extracted-region --i-sequences silva-138-99-filt-F5-R5-set5-primers.qza --p-region "AmpliconR1_GG" --p-fwd-primer TGGCGAACGGGTGAGTAA --p-rev-primer CCGTGTCTCAGTCCCARTG --p-trim-length 100 --o-collapsed-kmers ggenes-13-8-db-Amplicon-R1-100nt-kmers.qza --o-kmer-map ggenes-13-8-db-Amplicon-R1-100nt-map.qza

Thanks!

Elsa

Hi @elsamdea,

I need the artifact file (the qvz), not just screenshoots to be able to trouble shoot. However, it looks like you're having a problem with denoising. The issue isn't so much Sidle as that few sequences survived denosing (you've got 176 on average which is very low) and only 14 features. So, you need to focus on working through denoising before you can run Sidle.

Best,
Justine

Hi @jwdebelius,

Sorry, I missunderstood you.

Here are qzv files of the five regions analized:
table-R1-TRIM-CA-CCR16-6samples-100nt.qzv (414.2 KB)
table-R2-trimmed-CA-CCR16-6samples.qzv (401.7 KB)
table-R3n-trimmed-CA-CCR16-6samples.qzv (474.3 KB)
table-R4-trimmed-CA-CCR16-6samples.qzv (403.6 KB)
table-R5-trimmed-CA-CCR16-6samples.qzv (406.0 KB)

But yes, I have some troubles with the denoising. I have tried with dada2 and deblur, but I still keep few sequences and low number of features. I have tried with only forward reads, but the problem still persists. Maybe somebody could say other suggestion?

Again, thank you so much for your help! When I fix the denoising problem, I will continue with Sidle.

Regards,

Elsa

Hi @jwdebelius,

First of all, thank you for your patience.

I followed your adviced and worked on the denoising steps. Right now, I have good results (I think) after the denoising and dada2-trim-posthoc steps. (I found out the reason after my sequence lost: I didn't realize the --p-trunc-len-f/r "value" parameter trimmed the sequences which sequence length were inferior to the "value", I though it trimmed after this position of the sequence length). Also, I found out that the --p-indels parameter in cutadapt wasn't working good with my sequences.

Well, when I got these results, I though I could go to the next level: the Sequence Reconstruction Phase. However, this is resisting me.

So, I still have troubles with the alignment steps. In the moment I try to visualize the results with the command qiime metadata tabulate, I only obtain the error message: :"Metadata must contain at least one ID".

Maybe it is a problem related to the database step? I checked the kmer and map files and they look normal. Also, I re-did this initial phase of database preparation with two different databases: silva-138 y greengenes-13-8 and neither worked. I don't know where I could be failed.

Any help will be great!!

Best,

Elsa

Hi @elsamdea,

Thanks for being patient with me as we work through this!

I'm glad that you figured out your demultiplexing and the environment looks good. There is definitely a problem with alignment, and I don't think this time it's due to low sequence counts. (We're making progress :tada:!)

There are human or mammalian associated samples, right and 16S sequencing? I'd expect the database to cover them, so I don't think its a database matching isssue. Could you share an alignment file, even if it fails the metadata summary? You can dm it to me if you don't want to post it here.

Best,
Justine

1 Like