mikefc

uint8 - An 8-bit unsigned integer type for R

  • An 8-bit unsigned integer type for R based upon the raw type.
  • Each value is stored in a single byte (whereas integer values in R are stored in 4 bytes), so there can be memory savings.
  • Stores only values 0-255
  • No NA, NaN, or Inf allowed
  • Basic arithmethic
  • The aim is to have similar behaviour to the uint8_t type in C.

Rationale

  • An experiment to see if there’s more useful data type for working with 8-bit data (without dropping down to the level of Rcpp).

Installation

devtools::install_github('coolbutuseless/uint8')

Features

  • Each uint8 is stored in a single byte (but there is, of course, overhead associated with atomic vector metainfo).
  • Using uint8 can result in 25% of the memory use of R integer values. e.g.
    • pryr::object_size(1:10000) => 40 kB
    • pryr::object_size(uint8(1:10000)) => 10.2 kB
  • Specific S3 methods written for:
    • uint8() (data creation)
    • +, -, *, /
    • %%, %/%
    • : (sequence creation)
    • any(), all(), as.bitstring(), abs(), order(), sign()
  • A lot of things come for free because of the use of the built-in R raw type
    • as.integer(), as.numeric(), as.character()
    • %in%
    • <, <=, ==, !=, >=, >

Limitations

  • Can’t use a uint8 for subscripts (similar limitation in the bit64 package for integer64)
  • The : operator only dispatches on its first argument, so uint8(1):10 will work, but 1:uint8(10) won’t.

Comparison to raw

  • raw allows for 8-bit data, but doesn’t support any arithmetic operators
  • uint8 support basic arithmetic
  • raw accepts values outside the range 0-255 by setting them to zero
  • uint8 accepts values outside the range 0-255 by calculating the value modulo 256. e.g. uint8(-10) is 246
  • raw accepts NA, NaN, Inf by setting them to zero.
  • uint8 disallows NA, NaN and Inf

Examples

b02 <- as.uint8(  2)
b0a <- as.uint8( 10)
bff <- as.uint8(255)


# Addition is modulo 256
bff + b0a   # (255 + 10) %% 256 => 9
[1] 9
# Division result trucated to integer value
b0a/b02   # 10/2 => 5
[1] 5
bff/b0a   # 255/10 => 25.5 => 25
[1] 25
# exponentiation is modulo 256
b02 ^ b0a   # 2^10 => 1024.    1024 %% 256 => 0
[1] 0
# initialisation is modulo 256
as.uint8(seq(200, 300, 5))
 [1] 200 205 210 215 220 225 230 235 240 245 250 255   4   9  14  19  24
[18]  29  34  39  44

Future

  • Flesh out the supported functions as needed
  • Investigate if there’s any point to an actual Rcpp backend for speed or accuracy of behaviour.