Unit of measurement for sequence length: base pairs (bp) versus nucleotides (nt)

Hi y'all

I hope everyone is doing well. I have two sets of sequences, labeled Read 1 and Read 2. After merging, my rep-seq summary shows sequences around 252 to 254 nucleotides long, with a narrow range and a small standard deviation, which I believe matches the expected ~254 bp length of the 16S V4 region.

The primers used were the Caporaso et al. (2011) 515F/806R pair.

I have these questions:

  1. When we merge, the sequences in rep-seqs.qza become a single series of nucleotides. Since they are consensus sequences, is it only appropriate to report their length in nucleotides (nt)?

  2. The literature says the V4 length is approximately 254 bp, and many papers use base pairs (bp) as the unit for read length. Is it safe to say that my merged 254 nt reads fall within the expected 16S V4 length of 254 bp? (Basically, I'm asking whether 254 bp = 254 nt. Technically they differ because one of them counts base pairs, but I'm asking in the context of ~length~.)

  3. Also, are the sequences in the Read 1 and Read 2 files technically in nucleotides and not in base pairs?

Sorry if this seems like a somewhat poop question. I just want to make sure I am using the right units. The resources I've read give read length in base pairs, but the statistics in rep-seq.qzv are in nucleotides.

Thanks a lot! :heartpulse: :blush:

Hi @rmbn,

The two terms/units are pretty much interchangeable. Remember that a nucleotide consists of a sugar with a "base" (like acid/base) attached, plus one or more phosphate groups. A single nucleotide is sometimes referred to as a "base" since we're interested in that base bit that sticks out. When you have a double-stranded nucleic acid (DNA, dsRNA), the bases pair off: base pair. In terms of sequences, whether you have a single base (nt) or both bases (bp), the length will remain the same.
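
To make the "same length" point concrete, here is a minimal Python sketch. The sequence and the `reverse_complement` helper are made up for illustration (they are not from any real dataset or from QIIME 2), so treat it as a toy rather than a reference implementation.

```python
# Toy illustration: a duplex of N base pairs has N nucleotides on each strand.
# The sequence below is hypothetical, not taken from real data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


forward = "ATGCGTACCTGA"
reverse = reverse_complement(forward)

# Pair each base on the forward strand with its partner on the other strand.
base_pairs = list(zip(forward, reverse[::-1]))

print(len(forward))     # nucleotides on one strand
print(len(reverse))     # nucleotides on the other strand
print(len(base_pairs))  # base pairs in the duplex
```

All three numbers come out the same, which is why 254 nt and 254 bp describe the same length.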

I think because the R1 and R2 files are technically single-stranded (Illumina does sequencing by synthesis, which copies a single-stranded template one base at a time), you could say they're nucleotides instead of base pairs. In reality, since nucleotides and base pairs measure the exact same thing from a sequencing perspective, pick one abbreviation that you like and stick with it. (I'm rapidly coming to the conclusion that people don't care as long as it's consistent.)

Trust me, we're a microbiome forum that does a lot of gut work. You can come back and ask us more questions about poop!

Best,
Justine

I think this is a great question, as it lets us zoom in on the units we are using!

Justine did a really good job summarizing why these terms are biologically equivalent.

We could also frame this question in terms of measurement theory.

Get this: there is no 'unit of measurement' here because we are not measuring these things, we are counting them. We could report the length of our sequences in nanometers (a metric unit). Instead, we count the number of letters in the sequence.

And counts are unitless! :exploding_head:
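
Here is a rough Python sketch of the difference between counting and measuring. It assumes the textbook figure of roughly 0.34 nm of rise per base pair in B-form DNA, which is my assumption and not something from this thread; the function names and the placeholder sequence are made up too.

```python
# Assumption: ~0.34 nm rise per base pair (approximate value for B-form DNA).
NM_PER_BASE_PAIR = 0.34


def count_length(seq: str) -> int:
    """Counting: the number of letters, a plain integer with no unit."""
    return len(seq)


def physical_length_nm(n_base_pairs: int) -> float:
    """Measuring: an approximate physical length in nanometers."""
    return n_base_pairs * NM_PER_BASE_PAIR


# Placeholder sequence of 254 letters, standing in for a merged V4 read.
amplicon = "A" * 254

n = count_length(amplicon)
print(n)                                 # a count
print(round(physical_length_nm(n), 2))  # a measurement, in nm
```

The first number is the one we normally report; calling it nt or bp just labels what was counted.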

Thank you so much Justine @jwdebelius and Colin @colinbrislawn!

In terms of sequences, whether you have a single base (nt) or both bases (bp), the length will remain the same.

Thanks for confirming this. The vast majority of the literature I've read reports lengths in bp, with a few exceptions for WGS contigs, where a couple of papers used bases to refer to length (e.g. 10 kBases). Some don't mention units at all and just give numbers.

Get this: there is no 'unit of measurement' here because we are not measuring these things, we are counting them. We could report the length of our sequences in nanometers (a metric unit). Instead, we count the number of letters in the sequence.

Interesting. I also saw a paper that gives only a number and says the length of this gene is INTEGER. No units at all. Now it makes sense.

Trust me, we're a microbiome forum that does a lot of gut work. You can come back and ask us more questions about poop!

We love poop; the communities it hosts are an important part of animals' health. And as I said, this qualifies as a poop question because it comes from the 16S rDNA analysis of the gut contents I'm doing. :wink: :joy:

Thanks to both of you again for clarifying my issues about :dna: length! :straight_ruler:

Stay safe and best regards!
