Losing samples after reconstruction step in q2-sidle

elsamdea · December 3, 2021, 9:37am

Hi everyone!!

I am working with q2-sidle in QIIME 2 version 2021.4, and every step runs without errors. However, I have noticed that some samples disappear from the reconstructed table (the output of the qiime sidle reconstruct-counts command).

My first thought was to look at the dada2 table and at the dada2 table trimmed to x nt, but all the initial samples were there. Maybe these samples disappeared during the alignment step? But in all regions? That didn't seem right to me. I also tried another database, and even more samples disappeared.

I ran this step previously with the same database and parameters, and I kept all the samples in the final reconstructed table.

Any suggestions?

Thank you all for keeping up this amazing forum to resolve doubts like mine!!

Best,

Elsa

jwdebelius · December 6, 2021, 9:45am

Hi @elsamdea,

There is a minimum sample size (min_counts) parameter which removes samples with fewer than 1000 counts in the reconstructed features. You could try lowering this parameter and see if you retain more samples.

However, if this is happening, you should be getting a warning that says something like,

There are 3 samples with fewer than 1000 total reads. These samples will be discarded.
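
For illustration, here is a minimal sketch of what a depth filter like this does. It is not sidle's actual code: the function name `drop_shallow_samples` and the toy totals are hypothetical, and only the 1000-count default and the wording of the warning come from this thread.

```python
import warnings


def drop_shallow_samples(sample_totals, min_counts=1000):
    """Keep only the samples whose total count is at least min_counts.

    sample_totals maps a sample ID to its total count over the
    reconstructed features (hypothetical input format).
    """
    shallow = [s for s, total in sample_totals.items() if total < min_counts]
    if shallow:
        # Same wording as the warning quoted in this thread.
        warnings.warn(
            "There are %i samples with fewer than %i total reads. "
            "These samples will be discarded." % (len(shallow), min_counts)
        )
    return {s: t for s, t in sample_totals.items() if t >= min_counts}


# Hypothetical per-sample totals, for illustration only.
totals = {"sample-a": 15200, "sample-b": 640, "sample-c": 999, "sample-d": 1000}

kept = drop_shallow_samples(totals)                        # default threshold
kept_lenient = drop_shallow_samples(totals, min_counts=500)  # lowered threshold
```

With the default threshold only `sample-a` and `sample-d` survive; lowering the threshold to 500 keeps all four, which is the effect lowering min_counts should have on the real table.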

Have you tried running with the verbose flag?

Best,
Justine

elsamdea · December 10, 2021, 10:03am

Hi @jwdebelius!

Sorry for answering late.

In fact, I had run this command with the same number of samples before and didn't have this problem. Also, I always add the verbose flag because I like to follow the progress. As for the warning message, it didn't show up... I feel a bit confused.

If I find the reason behind this problem, I will share it!

Thank you for your help!!

Best,

Elsa

jwdebelius · December 10, 2021, 4:14pm

Hi @elsamdea,

Which version of sidle are you using? Have you checked that the samples survive denoising? I am working (slowly) on code to do some accounting, but it's still very much a work in progress.

Best,
Justine

elsamdea · December 13, 2021, 11:33am

Hi @jwdebelius,

I am using the version qiime2-2021.4. Should I update qiime2, maybe?

Also, I have checked the dada2 output and the output of trim-dada2-posthoc. All the samples appeared there.

This is amazing!! If I can help in any way, please let me know :).

Best,

Elsa

nandreani · December 15, 2021, 10:48am

Hello,
I am having the same problem.
As I had already analysed the 2 datasets independently, I used the existing rep-seqs and tables. I start with 480 samples (x2) and end up with 112.

Do you have any idea why this is happening?

I could of course start over from the raw data, but I am not sure that would work either.

Thanks

Nadia

jwdebelius · December 15, 2021, 10:49am

Hi @elsamdea,

The dada2 trim posthoc doesn't care about the depth; the filtering happens during reconstruction. If you merge the tables and summarize them, what do the counts look like?
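
In QIIME 2 this check would typically be done with qiime feature-table merge followed by qiime feature-table summarize. As a rough, self-contained sketch of the same idea (the dict-of-dicts table format, the function name `merged_sample_depth`, and the toy counts are all assumptions made for illustration):

```python
from collections import Counter


def merged_sample_depth(*regional_tables):
    """Sum counts per sample across any number of regional tables.

    Each table is a dict of {sample_id: {asv_id: count}}
    (a hypothetical stand-in for a real feature table).
    """
    depth = Counter()
    for table in regional_tables:
        for sample_id, asv_counts in table.items():
            depth[sample_id] += sum(asv_counts.values())
    return dict(depth)


# Hypothetical toy tables for two regions.
v1v2 = {"s1": {"asv1": 700, "asv2": 500}, "s2": {"asv1": 300}}
v3v4 = {"s1": {"asv9": 900}, "s2": {"asv9": 200}, "s3": {"asv8": 50}}

depth = merged_sample_depth(v1v2, v3v4)

# Samples that would fall under a 1000-count threshold.
below = sorted(s for s, d in depth.items() if d < 1000)
```

Here `s1` has a merged depth of 2100, while `s2` and `s3` fall below 1000. Note that this only reports the depth going in; the filter during reconstruction counts only reads in features that were actually reconstructed, so the depth after reconstruction can be lower.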

Best,
Justine

jwdebelius · December 15, 2021, 10:52am

Hi @nandreani,

I'm sorry you're having issues! It's hard to troubleshoot without more details about your process and possibly the data.

Could you describe the process (how do you get the tables going in) and check their depth as well? Are you using the version from the main branch on github?

Or, you could try changing the --p-min-counts parameter.

Best,
Justine

nandreani · December 15, 2021, 11:08am

Dear Justine,

Thanks for the quick reply.

I am using sidle in qiime2-2020.8 and following the tutorial here.
https://q2-sidle.readthedocs.io/en/latest/read_preparation.html

The tables were created independently with dada2 (for V1V2 and V3V4).

qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 290 --p-trunc-len-r 290 --p-trunc-q 3 --o-table ./table.qza --o-representative-sequences ./rep-seqs.qza --o-denoising-stats denoising-stats.qza --p-n-threads 20

This is just an example.

Everything runs smoothly until the qiime sidle reconstruct-counts command. The tables (including the 110nt ones) have 480 samples, but the reconstructed one has only 112.

qiime sidle reconstruct-counts \
--p-region V1V2 \
--i-kmer-map /home/andreani/nadiadata/AN/sidle/myDB/sidle-db-V1V2-100nt-map.qza \
--i-regional-alignment ./V1V2/alignment/V1V2-align-map.qza \
--i-regional-table ./V1V2/table-100nt.qza \
--p-region V3V4 \
--i-kmer-map /home/andreani/nadiadata/AN/sidle/myDB/sidle-db-V3V4-100nt-map.qza \
--i-regional-alignment ./V3V4/alignment/V3V4-align-map.qza \
--i-regional-table ./V3V4/table-100nt.qza \
--p-n-workers 32 \
--o-reconstructed-table ./reconstruction2/sidle_table.qza \
--o-reconstruction-summary reconstruction2/sidle_summary.qza \
--o-reconstruction-map reconstruction2/sidle_map.qza --verbose

This is the command I use.

The summary shows only 112 samples and, judging by the second column, they were found in only one region.

Sorry for the long comment!

Nadia

jwdebelius · December 15, 2021, 12:07pm

Hi @nandreani,

Could you send me the alignment maps for both regions, please? I want to know if that's maybe where the issue is?

Best,
Justine

jwdebelius · December 16, 2021, 11:35am

I guess, as a general question for both of you: would you prefer an error instead of a warning if samples are going to drop out?

nandreani · December 16, 2021, 2:44pm

Dear Justine,

I also ran it in verbose mode and got the following warning:

Database map summarized
/home/data/anaconda3/envs/q2-sidle/lib/python3.8/site-packages/distributed/worker.py:4325: UserWarning: Large object of size 808.06 MiB detected in task graph:
([['AAQK01003909.1492.2988', 'AAQK01003909.1492.29 ... 672.1.1491']],)
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and
keep data on workers

future = client.submit(func, big_data)    # bad

big_future = client.scatter(big_data)     # good
future = client.submit(func, big_future)  # good

warnings.warn(
Alignment map constructed
/home/data/anaconda3/envs/q2-sidle/lib/python3.8/site-packages/q2_sidle/_reconstruct.py:180: UserWarning: There are 854 samples with fewer than 1000 total reads. These samples will be discarded.
warnings.warn("There are %i samples with fewer than %i total"
counts loaded
Relative abundance calculated.

So I guess this is why I lose so many.

Any suggestions on why this is happening? When I analyse the 2 regions independently, I have no quality problems at all, and I rarefied at >10,000 reads.

I have attached the 2 mapping files.

Thanks a lot,

Nadia
V1V2-align-map.qza (57.1 KB) V3V4-align-map.qza (161.3 KB)

nandreani · December 16, 2021, 2:45pm

Hi Justine,

I would say yes! Otherwise, it looks as if everything was OK when it was not!

Thanks a lot,

Nadia

elsamdea · December 21, 2021, 3:41pm

Hi @jwdebelius,

I would say yes too! With an error, the command stops and I can see what happened right away.

jwdebelius · December 21, 2021, 8:22pm

Okay @nandreani and @elsamdea,

I will work on adding that as an update. I'm sorry the error isn't showing up! I've opened an issue on github, and I'll see if I can add it to the pull request that does sample accounting.
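
As a rough illustration of the difference being discussed (a hypothetical sketch, not sidle's code or its planned interface; the function name `check_sample_depth` and the `on_shallow` switch are invented here), a depth check can either warn and carry on or stop the run:

```python
import warnings


def check_sample_depth(sample_totals, min_counts=1000, on_shallow="warn"):
    """Report samples below min_counts, either as a warning or as an error.

    on_shallow="warn" lets the run continue; on_shallow="error" stops it.
    Both the function and the switch are hypothetical.
    """
    shallow = sorted(s for s, t in sample_totals.items() if t < min_counts)
    if not shallow:
        return shallow
    message = "There are %i samples with fewer than %i total reads: %s" % (
        len(shallow), min_counts, ", ".join(shallow)
    )
    if on_shallow == "error":
        raise ValueError(message)
    warnings.warn(message)
    return shallow


# Hypothetical totals, for illustration only.
totals = {"s1": 15000, "s2": 0, "s3": 400}

flagged = check_sample_depth(totals)  # warns, then carries on

try:
    check_sample_depth(totals, on_shallow="error")
    stopped = False
except ValueError:
    stopped = True  # the run halts as soon as samples would be lost
```

The warning variant is easy to miss in long verbose output, while the error variant makes the loss impossible to overlook at the cost of halting the command.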

As far as your specific issue goes @nandreani, you only had 1 ASV align in the V1V2 region and none in the V3V4. (You can view the sequences that aligned within each region using qiime metadata tabulate. Most of the alignment/kmer map/reconstruction database files can be coerced to look like metadata.) I might try primer trimming through cutadapt before you generate your ASV table.

So, I think this is a symptom of an earlier problem. I've got a pull request to look at counts retained after alignment; I'll keep you updated about when I can get it merged. (Sorry for the slow updates, sidle is a side project for a lot of the development team.)
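
To show why so few aligned ASVs lead to so many dropped samples, here is a minimal sketch under stated assumptions: the dict-of-dicts table, the function name `retained_after_alignment`, and the toy counts are hypothetical, and real reconstruction is more involved than simply subsetting the table.

```python
def retained_after_alignment(table, aligned_asvs):
    """Per-sample (retained, total) counts when only aligned ASVs are kept.

    table is {sample_id: {asv_id: count}}; aligned_asvs is the set of
    ASV IDs that aligned to the database (hypothetical input formats).
    """
    summary = {}
    for sample_id, asv_counts in table.items():
        total = sum(asv_counts.values())
        retained = sum(c for asv, c in asv_counts.items() if asv in aligned_asvs)
        summary[sample_id] = (retained, total)
    return summary


# Hypothetical toy table: every sample is deep before alignment.
table = {
    "s1": {"asv1": 12000, "asv2": 3000},
    "s2": {"asv2": 14000, "asv3": 800},
    "s3": {"asv1": 400, "asv3": 11000},
}
aligned = {"asv1"}  # only one ASV aligned

summary = retained_after_alignment(table, aligned)

# Samples that would fall under a 1000-count threshold after alignment.
dropped = sorted(s for s, (kept, _) in summary.items() if kept < 1000)
```

Every sample starts with more than 10,000 reads, yet `s2` and `s3` retain fewer than 1000 once only the aligned ASV is counted, so a 1000-count threshold would discard them.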

Best,
Justine

nandreani · January 11, 2022, 10:47am

Thank you very much, Justine. I am trying to start over with only 4 samples to see if I can solve the issue. It doesn't seem that I have adapters in my sequences.

Thank you again for your help,

Nadia
