Pack a double floating point value into fewer than 32 bits.

Pack a double into the lower bits of an int32, using the given number of bits for the sign, exponent and mantissa.

dbl_to_lofi(dbl, float_name = "bfloat16", float_bits = NULL)

lofi_to_dbl(lofi, float_name = "bfloat16", float_bits = NULL)

Arguments

dbl	64 bit R double
float_name	Name of the target format: one of 'single', 'half' or 'bfloat16'. Default: 'bfloat16'
float_bits	Lengths (in bits) of the sign, exponent and mantissa. Default: NULL. If this value is not NULL, it overrides `float_name`
lofi	low-bit representation, as produced by dbl_to_lofi()
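
For orientation, the three named formats correspond to the following standard field widths. This is a minimal sketch in base R: the `float_layouts` list is a hypothetical helper written for illustration, not part of the package, and this page does not state the exact form in which `float_bits` must be supplied.

```r
# Standard field widths (sign, exponent, mantissa) for the named formats.
# 'float_layouts' is an illustrative name, not an object from the package.
float_layouts <- list(
  single   = c(sign = 1L, exponent = 8L, mantissa = 23L),
  half     = c(sign = 1L, exponent = 5L, mantissa = 10L),
  bfloat16 = c(sign = 1L, exponent = 8L, mantissa = 7L)
)

# Total number of bits occupied by each format.
total_bits <- vapply(float_layouts, sum, integer(1))
print(total_bits)
```

'half' and 'bfloat16' both occupy 16 bits, but bfloat16 trades mantissa bits for a wider exponent, so it keeps the range of 'single' at lower precision.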

Value

dbl_to_lofi() returns a 32 bit integer with the lower bits set to represent the quantized floating point value. lofi_to_dbl() converts such an integer back to a double.
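
A round trip might look like the sketch below. It assumes these functions are exported by an installed package named `lofi` (the package name is an assumption, as it is not given on this page), and it only runs the calls when that package is available.

```r
# Assumption: the functions documented here live in a package called 'lofi'.
roundtrip_available <- requireNamespace("lofi", quietly = TRUE)

if (roundtrip_available) {
  x <- pi
  packed   <- lofi::dbl_to_lofi(x, float_name = "bfloat16")    # packed integer
  restored <- lofi::lofi_to_dbl(packed, float_name = "bfloat16")  # back to double
  print(packed)
  print(restored - x)  # difference introduced by quantization
}
```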

Details

Packing into a low fidelity bit representation loses precision: unless a value happens to be exactly representable in the target format, converting back to full 64 bit precision will not give you back the number you started with.
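
The size of the loss can be seen with a small base R sketch that imitates bfloat16 by keeping only the top 16 bits of a value's single-precision encoding. `bf16_truncate` is a hypothetical helper for illustration; it truncates rather than rounds, and it is not necessarily how dbl_to_lofi() quantizes.

```r
# Illustrative only: reduce a finite double to bfloat16 precision by truncation.
bf16_truncate <- function(x) {
  stopifnot(length(x) == 1, is.finite(x))
  # Encode as a 4-byte IEEE 754 single, most significant byte first.
  bytes <- writeBin(as.double(x), raw(), size = 4, endian = "big")
  # Zero the low 16 bits, leaving sign (1), exponent (8) and mantissa (7).
  bytes[3:4] <- as.raw(0)
  # Decode the modified bytes back into an R double.
  readBin(bytes, what = "numeric", size = 4, endian = "big")
}

bf16_truncate(1)   # 1 is exactly representable, so it survives unchanged
bf16_truncate(pi)  # 3.140625, not 3.141593
```

With only 7 mantissa bits, roughly two to three significant decimal digits survive.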

Packing a double into low-fidelity format has no explicit support for special values such as NaN, NA or Inf. These values may be converted to other numeric values or to other special values; the result is undefined, so operate on special values at your own risk.
