Multiple sequences assigned to the same organism: copy number issue or best guess?

I've got a set of about 16 fecal, vomit, and soil samples relating to dogs and
their play-area environment, all obtained from the same source over about 5
years. For a variety of reasons, I wanted to do a sequence-level merge.
Allowing for a base or two of trimming, the sequences were easy to equate, even
with various taxonomic issues over the years. What I hadn't appreciated, however,
is how many sequences were assigned to the same organism: in some cases
20 of them, IIRC, to Fusobacterium mortiferum. Several organisms had in excess of 10.
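
For anyone wanting to reproduce the idea, here is a minimal sketch of what "equate up to a base or two of trimming" and "count sequences per organism" could look like. It is not my actual code: the function names, the `max_trim` parameter, and the containment rule (the shorter read must sit inside the longer one) are all assumptions made for illustration.

```python
from collections import Counter

def same_up_to_trim(a, b, max_trim=2):
    """True if the shorter sequence is contained in the longer one and
    the two differ in length by at most max_trim bases (assumed rule)."""
    short, long_ = sorted((a, b), key=len)
    return len(long_) - len(short) <= max_trim and short in long_

def merge_sequences(seqs, max_trim=2):
    """Map every sequence to a representative (the longest equivalent one)."""
    representatives = []
    mapping = {}
    # Longest first, ties broken alphabetically so the result is deterministic.
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        for rep in representatives:
            if same_up_to_trim(seq, rep, max_trim):
                mapping[seq] = rep
                break
        else:
            representatives.append(seq)
            mapping[seq] = seq
    return mapping

def sequences_per_taxon(assignments):
    """assignments: dict of representative sequence -> taxon label.
    Returns a Counter of how many distinct sequences share each label."""
    return Counter(assignments.values())
```

This only handles the case where one read is a trimmed version of the other; two reads trimmed at opposite ends would need an overlap check instead.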

The party line is copy number, with an organism having a different sequence
in each copy. However, the variations don't appear to correlate well.
BLASTing these indicates some assignments are a bit questionable, and I have
just patched those myself, but the remaining ones still suggest large copy numbers.

What is the current thinking on this? Do they tend to have this kind of variation?
I could post sequences if anyone is interested. In at least one case the
sequence assigned to Fusobacterium was actually a perfect match to an organism
that, IIRC, is associated with aquatic environments.

Thanks.

Hello @marchywka,

Welcome back to the forums!

Neat! Looks like you have sub-species variation within F. mortiferum.

I think the hot new thing is ASVs, which matches well with your finding! I have not seen folks working on copy numbers lately, though that may just be my corner of the field.

That's... unexpected. Could be an issue with the database, or perhaps a novel finding.

You can totally post the sequence!

Thanks. My first thought was subspecies, but I found more literature on copy
number, and it turns out the ID involved some mismatch in most cases.
And in at least one case, there was a better or exact match to an unexpected
organism. I was also talking to an author who looks for problems in
16S publications and recently found batch effects in low-count biological
samples; he suggested library errors are very common.
If there is interest I can post some stuff on GitHub, but it will take a little
while to sort out. "final.fasta" may include the species and you can see
some bar plots. I'm not using QIIME for any of this; it's all my own stuff :)

16s stuff

I guess, as a rule, it may be a good idea to BLAST before publishing...

Here is the sequence. I have code to merge projects (biom + fastaq) and take the
most recent assignment, plus a "patch" file using assignments from my own BLAST searches.
This was originally assigned to Fusobacterium but is a better match to
g__Alkalihalobacillus s__bogoriensis from NCBI nucleotide:

AGTGGGGAATATTGGACAATGGACCAAAAGTCTGATCCAGCAATTCTGTGTGCACGATGACGGTCTTAGGATTGTAAAGTGCTTTCAATCGGGAAAAAGAAAGTGATGGTACCGATAGAAGAAGCGACGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGTCGCAAGCGTTATCCGGATTTATTGGGCGTAAAGCGCGTCTAGGCGGTCTGGTAAGTCTGATGTGGAAATGCGGGGCTCAACTCCGTATTGCGTTGGAAACTGCCAGACTAGAGTACTGGAGAGGTGGGCGGAACTACAAGTGTAGAGGTGAAATTCGTAGATATTTGTAGGAATGCCGATAGAGAAGTCAGCTCACTGGACAGATACTGACGCTGAAGCGCGAAAGCATGGGGAGCAAA

final.fasta

mjmseq0 k__Bacteria p__Bacillota c__Bacilli o__Bacillales f__Bacillaceae g__Alkalihalobacillus s__bogoriensis aka=2 ../zymo/data/2018-03-09 5 k__Bacteria p__Fusobacteria c__Fusobacteriia o__Fusobacteriales f__Fusobacteriaceae g__Fusobacterium s__NA
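
The "most recent assignment plus patch" rule could be sketched roughly as below. This is an illustration under assumptions, not the code that produced the record above: the record layout `(sequence, iso_date, taxonomy)`, the function name, and the patch being a plain dict are all hypothetical.

```python
def resolve_assignments(records, patch=None):
    """Pick one taxonomy per sequence.

    records: iterable of (sequence, iso_date, taxonomy) tuples, one per
             project in which the sequence appeared (assumed layout).
    patch:   optional dict of sequence -> taxonomy from manual BLAST checks;
             it overrides the pipeline assignment for known sequences.
    """
    latest = {}
    for seq, date, taxonomy in records:
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if seq not in latest or date > latest[seq][0]:
            latest[seq] = (date, taxonomy)

    resolved = {seq: taxonomy for seq, (_, taxonomy) in latest.items()}
    for seq, taxonomy in (patch or {}).items():
        if seq in resolved:
            resolved[seq] = taxonomy
    return resolved
```

Keeping the patch separate from the merged table means the original pipeline assignment is still recoverable, which is why the record above lists both.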
