Error while reconstructing a table via sidle plugin

lea.k · March 15, 2023, 4:25pm

Hi everybody,

I´m analyzing my first microbial dataset and ran into an issue trying to reconstruct a table using the sidle plugin command "qiime sidle reconstruct-counts".

I followed the very clear and useful sidle tutorial!
I have demultiplexed my dataset containing reads from the bacterial V45 and V6 region succesfully and everything worked well until I encountered an error after using this code:

qiime sidle reconstruct-counts \
--p-region V45 \
--i-regional-alignment ${alignment_V45} \
--i-regional-table ${table_V45} \
--p-region V6 \
--i-regional-alignment ${alignment_V6} \
--i-regional-table ${table_V6} \
--i-database-map ${bacterial_map} \
--i-database-summary ${bacterial_summary} \
--p-n-workers ${cores} \
--o-reconstructed-table ${table_bacterial_reconst} \
--verbose

The error was:

There are 32 samples with fewer than 1000 total sequences. Please check your minimum counts and make sure your representative sequences are aligned with the database.

I assume filtering out the 32 samples with fewer than 1000 total sequences would solve the error partly but I don´t know which file I would have to filter and could not find a method to filter by sequence counts.
Could you tell me how or in which file I have to check the minimum counts since the visualized "table_V45" and "table_V6" only show frequencies?
How do I check if my representative sequences are aligned with the database? Is there a way to visualize the region specific alignments?

I am kind of lost on what file or parameter used in previous code, which was all according to the tutorial, could cause the problem or help solve the error.

Thanks in advance.
Kind regards,
Lea

jwdebelius · March 16, 2023, 2:18pm

Hi @lea.k,

I'm glad to see people using Sidle!

This is an issue where we've gone back and forth on whether to error or let things pass. There were a lot of people frustrated they weren't getting an error and just losing a bunch of samples (which was the original implementation).

I have good news and bad news for solving this. The good news is that there is code. The bad news is that while it's been tested, it hasn't been reviewed and there's no documentation. (The documentation is half complete, but unfortunately, Sidle is currently lower priority and I have less free time than I did a few years ago.) The branch is called "accounting", and I think you can install the code from that branch with

pip install git+https://github.com/jwdebelius/q2-sidle.git@accouting

(Although you may need to do a little bit more reserach into how to install a branch, but I think this should work. )

The function that will let you count yoru features is called track-aligned-counts and it takes your regional alignment(s), feature table(s) and produces an artifact that you can then summarize wtih qiime metadata tabulate. (Hopefully the help documentation explains it well). If it works as intended, it will give you a list of the aligned sequencing depths, and you could use that to filter your tables.

I typically summarize the regional alignment artifact (again qiime metadata tabulate and look at how many reads get aligned per region. If I dont see enough, I tend to relax the parameters slightly. You're working in a region that's got fairly good database coverage, so you may want to consider allowing ASVs that don't match perfectly to align.

Sorry I dont have a perfect, out of the box solution.

Best,
Justine

lea.k · April 12, 2023, 9:05am

Hi @jwdebelius,
thank you for the quick and detailed response and sorry for my late answer.

I have visualized the region specific alignment and it turns out, that the regional alignment_V6 artifact is empty. It has only the header in the tsv table but no data. I do not get an error while running the align-regional kmers lines:

qiime sidle align-regional-kmers \
--i-kmers ${db_V6_kmers} \
--i-rep-seq ${rep_seqs_trimmed_V6_MR} \
--p-region V6 \
--p-max-mismatch 5 \
--p-n-workers ${cores} \
--o-regional-alignment ${alignment_V6}

only as soon as i want to visualize the artifact it gives the error that it obviously can´t visualize it since there is no data in the tsv table.
I have played around with the denoising trunc length parameters (as suggested in other discussions) and the final trim length to create the "rep_seqs_trimmed_V6" atrifact but nothing changes.
My final trim length for the V6 region is 351 which is the median sequence length of the untrimmed rep_seqs_V6 maybe this is too long or there is a better way to determine this length?
These are the primers used for extracting region from pre filtered database and preparing extracted region.
V6_for=GCACAAGCRGHGGARCATG
V6_rev=CCGYCAATTYMTTTRAGTTT
I use the Silva 138-99 database downloaded from the Qiime2 data resources homepage and filter out mitochondrial and chloroplast sequences.

The workflow for the V45 region works well and without any errors.

Any kind of tip on how to solve that would be greatly appreciated!!

Thank you, regards
Lea

jwdebelius · April 12, 2023, 2:26pm

Hi @lea.k,

Could you walk me through exactly what commands you're running? There are a lot of directions to trouble shoot this more, and it would be helpful to understand your full pipeline.

I tend to go shorter: I typically target keeping 90+% of my ASVs (remember you discard anything below the minimum trim length)

This is okay, but you won't be able to build a phylogenetic tree because your silva version isn't compatible with tree building. It's unfortunately, but the current issue with the system.

Best,
Justine

lea.k · April 14, 2023, 9:19am

Hi @jwdebelius,

first of all i rerun my workflow with a shorter final trim length in order to keep about 90% of the sequences which did not change the error.
These are the commands I used from import until the error in the regional alignment in V6 occurred. To determine trunc-length i viewed the demultiplexed-seqs artifact and after 180 and 200 the sequence quality drops. I have tried smaller values but the same error occured later on.
I am not quite sure if the cutadapt demux-paired command "--p-mixed-orientation" is necessary or not but again i ran the analysis with TRUE and FALSE and it still led to the error in the regional alignment.
I ran the exact same commands for the V45 region exept adapted trunc lenghts, final trim length and of course primer sequences leading to a viewable alignment_vis_V45 artifact.

Import

qiime tools import \
--type MultiplexedPairedEndBarcodeInSequence \
--input-path $dirData/sequences/MB-Lib1/ \
--output-path $dirOut/multiplexed-seqs.qza

Demultiplex by region (same parameters for all regions; due to the multiplexing regime the BC(9nt) plus forward Primer(19nt) act as Barcode so the whole "Barcode" Sequence is 28nt long)

qiime cutadapt demux-paired \
--i-seqs $dirOut/multiplexed-seqs.qza \
--m-forward-barcodes-file ${metadata_V6} \
--m-forward-barcodes-column BC+Primer-Sequenz \
--p-mixed-orientation TRUE \
--o-per-sample-sequences $dirOut/demultiplexed-seqs-V6-0.25.qza \
--o-untrimmed-sequences $dirOut/untrimmed-V6-0.25.qza \
--p-error-rate 0.25 \
--p-cores 23 \
--verbose

Read Preparation
a) Denoise

qiime dada2 denoise-paired \
--i-demultiplexed-seqs $dirOut/demultiplexed-seqs-V6-0.25.qza \
--p-trunc-len-f 180 \
--p-trunc-len-r 200 \
--o-table ${table_V6_MR} \
--o-representative-sequences ${rep_seqs_V6_MR} \
--p-n-threads 23 \
--o-denoising-stats ${den_stats_V6_MR}

   b) trim to uniform length

qiime sidle trim-dada2-posthoc \
--i-table ${table_V6_MR} \
--i-representative-sequences ${rep_seqs_V6_MR} \
--p-trim-length 220 \
--o-trimmed-table ${table_trimmed_V6_MR} \
--o-trimmed-representative-sequences ${rep_seqs_trimmed_V6_MR}

Database preparation
a) filter database

qiime rescript cull-seqs \
--i-sequences ${ref_sequences_bacterial_in} \
--p-num-degenerates 5 \
--o-clean-sequences ${ref_sequences_bacterial_f}

qiime taxa filter-seqs \
--i-sequences ${ref_sequences_bacterial_f} \
--i-taxonomy ${ref_taxonomy_bacterial} \
--p-exclude mitochondria,chloroplast \
--p-mode contains \
--o-filtered-sequences ${ref_sequences_bacterial_taxf}

b) Prepare regional database for each region

qiime feature-classifier extract-reads \
--i-sequences ${ref_sequences_bacterial_taxf} \
--p-f-primer GCACAAGCRGHGGARCATG \
--p-r-primer CCGYCAATTYMTTTRAGTTT \
--o-reads ${ref_sequences_bacterial_taxf_V6}

qiime sidle prepare-extracted-region \
--i-sequences ${ref_sequences_bacterial_taxf_V6} \
--p-region "V6" \
--p-fwd-primer GCACAAGCRGHGGARCATG \
--p-r-primer CCGYCAATTYMTTTRAGTTT \
--p-trim-length 220 \
--o-collapsed-kmers ${db_V6_kmers} \
--o-kmer-map ${db_V6_kmers_map}

Sequence reconstruction
a) regional alignment

qiime sidle align-regional-kmers \
--i-kmers ${db_V6_kmers} \
--i-rep-seq ${rep_seqs_trimmed_V6_MR} \
--p-region V6 \
--p-max-mismatch 5 \
--p-n-workers 23 \
--o-regional-alignment ${alignment_V6}

qiime metadata tabulate \
--m-input-file ${alignment_V6} \
--o-visualization ${alignment_vis_V6}

Attached the demux-seqs, the rep_seqs, the rep_seqs_trimmed and the db_kmers_map visualization and the empty alignment.qza for V6 region.
demultiplexed-seqs-V6.qzv (314.2 KB)
rep_seqs_V6.qzv (1.6 MB)
rep_seqs_trimmed220_V6.qzv (1.3 MB)
silva-V6-kmers_map.qzv (1.3 MB)
alignment_V6_220.qza (109.7 KB)

jwdebelius · May 5, 2023, 2:52pm

Hi @lea.k,

I am sorry it has taken me so long to follow up with you. Looking things through, it seems like the pipeline is reasonable. The only thing I'm wondering about is the starting positiona nd extraction, and if there's a position where the sequences align at a different position, but Im not sure where that would shift.

Best,
Justine

system · June 5, 2023, 10:37pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.