Duplicate records in q2-sidle reconstruct-counts output

Hello @jwdebelius,

I was exploring the reconstructed feature table produced by qiime sidle reconstruct-counts and noticed that it contains duplicated records. Specifically, there are groups of two or more records with identical counts in the same samples. They are generally assigned to the same taxa, apart from small differences at the species level, but they have different database IDs from sidle reconstruct-database.

I have attached a screenshot showing some of these cases. I suspect these are ASVs that map to multiple sequences in the SILVA database, all of which are assigned to the same taxonomy level. However, I believe there should be only one record per ASV in my feature table before downstream analysis.
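For anyone wanting to check their own table for the same pattern, here is a minimal sketch of how groups of records with identical counts could be found. It assumes the table has already been exported into a plain mapping of feature ID to per-sample counts; the IDs, the counts, and the helper name `find_identical_profiles` are all made up for illustration and are not part of q2-sidle.

```python
from collections import defaultdict

# Hypothetical toy table: feature ID -> counts per sample (made-up values).
table = {
    "id_1": [10, 0, 5],
    "id_2": [10, 0, 5],
    "id_3": [3, 7, 0],
    "id_4": [10, 0, 5],
    "id_5": [0, 1, 2],
}


def find_identical_profiles(table):
    """Group feature IDs whose per-sample counts are exactly equal."""
    groups = defaultdict(list)
    for feature_id, counts in table.items():
        # A tuple of counts is hashable, so it can serve as the grouping key.
        groups[tuple(counts)].append(feature_id)
    # Keep only count profiles shared by two or more features.
    return [ids for ids in groups.values() if len(ids) > 1]


duplicate_groups = find_identical_profiles(table)
print(duplicate_groups)  # [['id_1', 'id_2', 'id_4']]
```

This compares counts for exact equality, so it would miss records that differ only slightly; it only flags the groups and does not decide which record to keep.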

Additionally, I am attaching the .qza file to allow you to check the parameters used in both the reconstruct-database and reconstruct-counts actions.

I’m not sure whether this is the expected behavior, in which case I should do some preprocessing to retain only one record per ASV, or whether I am misunderstanding a parameter in these two q2-sidle actions that is causing the duplicated output.

Please let me know if you need additional information.

Andrés

MB_ATBMM_feature_table_recons.qza (527.9 KB)

Hi @andresarroyo,

This sounds like somewhat expected behavior. Reconstruction of the table is taxonomy agnostic: it only looks at the match between the ASV and the sequence, and then at the sequence IDs.

Would you be willing to DM me your database map and database summary? My guess is that there's one region that splits those taxa into clusters (because they look like they're mostly clusters with at least one reference sequence) and that region might have been missing, in which case you'd get splitting.

Sidle has been on my list to revisit for a while, so I will see if I can find time to poke more.

Best,
Justine
