This might be common knowledge already, but I wanted to articulate why I think closed-reference OTU picking could be a good fit for multi-region studies.
Three high-level strategies for defining OTUs... are canonically described as de novo, closed-reference, and open-reference OTU picking... Each of these methods has benefits and drawbacks.
...
In closed-reference OTU picking, input sequences are aligned to pre-defined cluster centroids in a reference database. If the input sequence does not match any reference sequence at a user-defined percent identity threshold, that sequence is excluded. (PeerJ, 2014)
This is essentially 'counting database hits', so:
- resulting OTUs are 100% biased by the database
- resulting OTUs are 100% consistent with the database
- resulting OTUs are literally just the ones from the database
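Modern ASV methods aim to be just as consistent without introducing database bias, but for this project we are knowingly using this strong bias to normalize across regions: reads from different amplicon regions all get counted against the same reference OTUs, so the resulting tables share one set of feature IDs.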
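
If it helps to see 'counting database hits' spelled out, here is a toy Python sketch. Everything in it is made up for illustration: the function names (`best_identity`, `closed_ref_pick`), the two 40 bp "references", and the reads. It also uses ungapped sliding-window matching, which is a simplification; real tools use proper gapped alignment against a real database, so treat this as a sketch of the idea and not how any actual pipeline is implemented.

```python
from collections import Counter

def best_identity(read, ref):
    """Highest ungapped identity of `read` against any same-length window of `ref`.

    Simplifying assumption: no gaps, just slide the read along the reference.
    """
    n = len(read)
    if n == 0 or n > len(ref):
        return 0.0
    best = 0.0
    for start in range(len(ref) - n + 1):
        window = ref[start:start + n]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches / n)
    return best

def closed_ref_pick(reads, references, threshold=0.97):
    """Count reads per reference ID; reads below the threshold are dropped."""
    counts = Counter()
    discarded = 0
    for read in reads:
        ref_id, identity = max(
            ((rid, best_identity(read, seq)) for rid, seq in references.items()),
            key=lambda pair: pair[1],
        )
        if identity >= threshold:
            counts[ref_id] += 1      # a database hit
        else:
            discarded += 1           # no hit: the read is excluded
    return counts, discarded

# Hypothetical "full-length" references (toy data, not real 16S sequences).
references = {
    "ref_1": "ACGTACGTTAGCCGATAGCTAGGCTTAACCGGTTAACGTA",
    "ref_2": "TTGACCAGTCAGGATCCATGCAAGTCGGACTTGACCATGA",
}

# Study A sequenced the first 20 bases, study B the last 20 bases.
study_a = [
    references["ref_1"][:20],
    references["ref_1"][:20],
    references["ref_2"][:20],
    "G" * 20,                        # something not in the database
]
study_b = [
    references["ref_1"][-20:],
    references["ref_2"][-20:],
    references["ref_2"][-20:],
]

counts_a, discarded_a = closed_ref_pick(study_a, references)
counts_b, discarded_b = closed_ref_pick(study_b, references)

print("study A:", dict(counts_a), "discarded:", discarded_a)
print("study B:", dict(counts_b), "discarded:", discarded_b)
```

The two studies never share a single base of sequence, but both tables end up indexed by the same reference IDs, which is the normalization we are after. The novel read in study A is simply thrown away, which is the database bias in action.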
Let us know what you find!
Colin
P.S. You could get some popcorn and read this flaming review of closed-ref clustering, or don't because we use ASVs now!