Pack a double into the lower bits of a 32-bit integer, using the given number of bits for sign, exponent and mantissa.

dbl_to_lofi(dbl, float_name = "bfloat16", float_bits = NULL)

lofi_to_dbl(lofi, float_name = "bfloat16", float_bits = NULL)

Arguments

dbl

64-bit R double

float_name

Name of the floating-point format: one of 'single', 'half' or 'bfloat16'. Default: 'bfloat16'

float_bits

Number of bits used for the sign, exponent and mantissa. Default: NULL. If this value is not NULL, it overrides float_name.

lofi

Low-fidelity (low-bit) integer representation, such as a value returned by dbl_to_lofi()
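For orientation, the three named formats correspond to the standard sign/exponent/mantissa widths listed in the sketch below. This is not code from the package: `named_formats` and `resolve_float_bits()` are hypothetical names, and the sketch assumes that float_bits is a length-3 vector in the order sign, exponent, mantissa. It only illustrates the documented rule that a non-NULL float_bits takes precedence over float_name.

```r
# Standard bit widths (sign, exponent, mantissa) of the named formats.
named_formats <- list(
  single   = c(sign = 1L, exponent = 8L, mantissa = 23L),
  half     = c(sign = 1L, exponent = 5L, mantissa = 10L),
  bfloat16 = c(sign = 1L, exponent = 8L, mantissa = 7L)
)

# Hypothetical helper: mimic the documented precedence of float_bits over
# float_name. Assumes float_bits is c(sign, exponent, mantissa).
resolve_float_bits <- function(float_name = "bfloat16", float_bits = NULL) {
  if (!is.null(float_bits)) {
    stopifnot(length(float_bits) == 3L)
    bits <- as.integer(float_bits)
    names(bits) <- c("sign", "exponent", "mantissa")
    return(bits)
  }
  float_name <- match.arg(float_name, names(named_formats))
  named_formats[[float_name]]
}

sum(resolve_float_bits("half"))                       # total width: 16 bits
resolve_float_bits("half", float_bits = c(1, 4, 3))   # float_bits wins: 1/4/3
```

Whatever the chosen widths, their sum has to fit into the 32 bits of an R integer.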

Value

32-bit integer whose lower bits hold the quantized floating-point value

Details

Packing into a low-fidelity bit representation loses precision: in general, converting back to a full 64-bit double will not give you back the number you started with.
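The precision loss can be seen with a small base-R sketch for the bfloat16 case. It relies on the fact that a bfloat16 value has the same layout as the upper 16 bits of an IEEE single. The helpers `dbl_to_bf16_sketch()` and `bf16_to_dbl_sketch()` are hypothetical and are not the package's implementation; in particular they truncate the mantissa, whereas the package may round differently.

```r
# Hypothetical helper: keep only the top 16 bits of the IEEE single-precision
# encoding (truncation, no rounding) and return them in the low bits of an
# R integer.
dbl_to_bf16_sketch <- function(x) {
  bytes  <- writeBin(as.double(x), raw(), size = 4L, endian = "little")
  halves <- readBin(bytes, what = "integer", n = 2L * length(x),
                    size = 2L, signed = FALSE, endian = "little")
  halves[c(FALSE, TRUE)]  # every second 16-bit word is the high half
}

# Hypothetical inverse: put the 16 bits back into the high half of a single,
# zero-fill the low half, and widen to a double.
bf16_to_dbl_sketch <- function(bits) {
  halves <- as.integer(rbind(0L, bits))  # interleave: low half 0, high half bits
  bytes  <- writeBin(halves, raw(), size = 2L, endian = "little")
  readBin(bytes, what = "numeric", n = length(bits), size = 4L,
          endian = "little")
}

packed <- dbl_to_bf16_sketch(c(1.5, pi))
packed                      # 16320 16457, i.e. 0x3FC0 and 0x4049
bf16_to_dbl_sketch(packed)  # 1.5 3.140625
```

Here 1.5 needs only one mantissa bit and survives the round trip, while pi comes back as 3.140625 because only 7 mantissa bits are kept.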

Packing a double into a low-fidelity format has no explicit support for special values such as NaN, NA or Inf. These values may be converted to other numeric values or to other special values; the result is undefined. Operate on special values at your own risk.
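One way to avoid undefined results is to reject special values before packing. The wrapper below is a minimal sketch, not part of the package: `pack_finite_only()` is a hypothetical name, and the packing function is passed in as an argument so that the example runs without the package installed.

```r
# Hypothetical wrapper: refuse NaN, NA and +/-Inf before calling a packing
# function such as dbl_to_lofi.
pack_finite_only <- function(x, pack_fn) {
  if (!all(is.finite(x))) {
    stop("input contains NaN, NA or Inf; low-fidelity packing is undefined for these")
  }
  pack_fn(x)
}

# Stand-in packing function for demonstration only.
pack_finite_only(c(1, 2.5), identity)

# An input with a special value raises an error instead of being packed.
res <- tryCatch(pack_finite_only(c(1, NaN), identity), error = function(e) e)
inherits(res, "error")  # TRUE
```

With the package loaded, the same wrapper could be called as pack_finite_only(x, dbl_to_lofi).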