Number Representation
Value representation
Computers have a finite number of bits to represent numbers (e.g. $2^{32}$ distinct values for 32-bit).
- Overflow: When the result of an operation is too large to be represented in the number of bits available.
- Underflow: When the result of an operation is too small to be represented in the number of bits available.
A positional number system: each digit is multiplied by the base raised to the power of its position.
Integer part: Repeatedly divide by 2 and record the remainder.
Example
Fractional part: Repeatedly multiply by 2 and record the integer part.
Example
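The two procedures above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names `int_to_binary` and `frac_to_binary` and the `max_bits` cut-off are assumptions introduced here, since many fractions (such as 0.1) have no finite binary expansion.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    # Remainders are produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def frac_to_binary(x: float, max_bits: int = 16) -> str:
    """Convert a fraction in [0, 1) to binary by repeated multiplication by 2."""
    digits = []
    while x > 0 and len(digits) < max_bits:
        x *= 2
        bit = int(x)          # the integer part is the next binary digit
        digits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(digits) or "0"


print(int_to_binary(13))      # 1101
print(frac_to_binary(0.375))  # 011
```

Reading the two results together gives $13.375_{10} = 1101.011_2$.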
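Conversion to another base $b$ can be done by replacing division by 2 with division by $b$ (and, for the fractional part, multiplication by 2 with multiplication by $b$).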
Sign representation
A function represents the value of a binary number based on the sign representation.
Uses a sign bit: 0 for positive, 1 for negative. In excess-$K$ notation, a fixed excess $K$ is added to the value, and the sum is stored as an unsigned number.
- Range: $-K$ to $2^n - 1 - K$, where $n$ is the number of bits.
- Top half is positive, bottom half is negative.
Example
- Excess 127: is represented as .
- Excess 128: is represented as .
One's complement saves the leftmost bit for the sign. Every bit of the binary representation is inverted for negative numbers.
- Range: $-(2^{n-1} - 1)$ to $2^{n-1} - 1$.
Two's complement saves the leftmost bit for the sign. For negative numbers, the binary representation is inverted and 1 is added.
- Range: $-2^{n-1}$ to $2^{n-1} - 1$.
Of these representations, only two's complement can add numbers of different signs with plain unsigned addition and no special cases.
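As a rough sketch of how the three encodings differ, the helpers below produce the bit pattern of a signed integer in each scheme. The function names and the explicit excess parameter `k` are assumptions made for illustration.

```python
def ones_complement(value: int, n: int) -> str:
    """n-bit one's complement pattern: invert every bit for negatives."""
    limit = (1 << (n - 1)) - 1
    if not -limit <= value <= limit:
        raise ValueError("value out of range")
    mask = (1 << n) - 1
    pattern = value if value >= 0 else (-value) ^ mask
    return format(pattern, f"0{n}b")


def twos_complement(value: int, n: int) -> str:
    """n-bit two's complement pattern: invert and add 1 for negatives."""
    if not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        raise ValueError("value out of range")
    # Masking a negative Python int yields exactly "invert and add 1".
    return format(value & ((1 << n) - 1), f"0{n}b")


def excess(value: int, n: int, k: int) -> str:
    """n-bit excess-k pattern: store value + k as an unsigned number."""
    stored = value + k
    if not 0 <= stored < (1 << n):
        raise ValueError("value out of range")
    return format(stored, f"0{n}b")


print(ones_complement(-5, 8))   # 11111010
print(twos_complement(-5, 8))   # 11111011
print(excess(-5, 8, 127))       # 01111010
```

Adding the two's complement patterns of 5 and -3 as unsigned numbers and keeping only the low 8 bits gives the pattern of 2, which is the property the last sentence above refers to.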
Floating-point representation
Represent a floating-point binary number in a fixed number of bits.
First convert the binary number to scientific notation with a single nonzero digit to the left of the radix point (normalized):
$$(-1)^{S} \times 1.M \times 2^{E - B}$$
where $S$ is the sign bit, $M$ is the significand, $E$ is the (biased) exponent, and $B$ is the bias.
The binary representation is:
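- Sign bit: $0$ for positive, $1$ for negative
- Exponent: the true exponent plus the bias, stored as an unsigned number
- Significand: Add trailing zeros to fill the bits of available length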
The following table shows the parameters for multiple IEEE 754 formats:
| Parameter | Binary32 | Binary64 | Binary128 |
|---|---|---|---|
| Storage | 32 bits | 64 bits | 128 bits |
| Exponent | 8 bits | 11 bits | 15 bits |
| Bias | 127 | 1023 | 16383 |
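The bias values in the table follow from the exponent width: for an exponent field of $k$ bits, the IEEE 754 binary formats use a bias of $2^{k-1} - 1$. A small check, where the helper name `ieee_bias` is an assumption made for illustration:

```python
def ieee_bias(exponent_bits: int) -> int:
    """Bias of an IEEE 754 binary format with the given exponent width."""
    return 2 ** (exponent_bits - 1) - 1


for name, k in (("Binary32", 8), ("Binary64", 11), ("Binary128", 15)):
    print(name, ieee_bias(k))
```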
Converting 13.375 to IEEE Binary32
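In binary, $13.375_{10} = 1101.011_2$. For Binary32:
- Sign bit: $0$ (positive)
- Normalize: $1101.011_2 = 1.101011_2 \times 2^{3}$
- Fraction: $101011$ (ignore leading 1)
- Significand: $10101100000000000000000$ (add trailing zeros to fill 23 bits)
- Exponent: $3$
- Biased Exponent: $3 + 127 = 130 = 10000010_2$
Representation: $0\ 10000010\ 10101100000000000000000$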
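One way to check a hand conversion like this is to let Python's standard `struct` module pack the value as a 32-bit float and then split the bits into the three fields. The helper name `binary32_fields` is an assumption made for illustration.

```python
import struct


def binary32_fields(x: float):
    """Return (sign, exponent, significand) bit strings of x as Binary32."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


print(binary32_fields(13.375))
# ('0', '10000010', '10101100000000000000000')
```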
Converting Binary32 to decimal
Consider :
- Sign bit: (negative)
- Exponent:
- Significand:
- Remove trailing 0s and add leading "1."
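The decoding steps can be sketched as follows. This is a minimal version that assumes a normalized number (it rejects the reserved exponent values), and the bit pattern in the usage line is a hypothetical input chosen for illustration, not necessarily the one from the example above.

```python
def decode_binary32(bits: str) -> float:
    """Decode a normalized Binary32 bit string into its decimal value."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = int(bits[0])
    biased_exponent = int(bits[1:9], 2)
    if biased_exponent in (0, 255):
        raise ValueError("reserved exponent: not a normalized number")
    # The 23 stored bits are the digits after the implicit leading "1."
    fraction = int(bits[9:], 2) / 2 ** 23
    return (-1) ** sign * (1 + fraction) * 2.0 ** (biased_exponent - 127)


# Hypothetical input pattern, used only for illustration.
print(decode_binary32("1 10000010 10101100000000000000000"))  # -13.375
```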
Note that the range of the biased exponent is $1$ to $254$ for Binary32. $0$ and $255$ are reserved for special cases.

!!! info "Representable numbers"
The range of floating point numbers is limited by the number of bits used for the significand and exponent.
For a given format with exponent bits, significand bits and :
- Range:
- Bias value:
- All values:
- Use smallest values:
- Use largest values:
- Spacing:
- Representable numbers are unevenly spaced because the spacing depends on the exponent.
- Spacing grows as we move further from 0: it doubles each time the exponent increases by 1 (a scaling factor of 2).
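The growth in spacing can be observed directly with `math.ulp` (available from Python 3.9), which returns the gap between a float and the next larger representable one. Note the assumption that Python's `float` is IEEE Binary64, which holds on common platforms; the wrapper name `spacing` is introduced here for illustration.

```python
import math


def spacing(x: float) -> float:
    """Gap between x and the next larger representable float."""
    return math.ulp(x)


for x in (1.0, 2.0, 4.0, 1024.0):
    print(f"{x:>8}: spacing = {spacing(x)}")
```

Each doubling of `x` doubles the spacing, so large values are represented much more coarsely than values near 0.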
Operations
2's complement
- Addition: Add the numbers as unsigned and discard the carry out of the leftmost bit.
- Subtraction: Add the 2's complement of the number to be subtracted.
- Multiplication: Shift-and-add unsigned numbers, and adjust sign.
For each bit in the multiplier, if it is 1, add the multiplicand shifted by the position of the bit.
To compute :
Result is negative:
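The three operations can be sketched on raw n-bit patterns as below. This is an illustrative sketch: the function names are assumptions, range overflow is not detected, and `multiply` returns a 2n-bit pattern so that the full product fits.

```python
def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    return pattern - (1 << n) if pattern & (1 << (n - 1)) else pattern


def add(a: int, b: int, n: int) -> int:
    """Add two n-bit patterns as unsigned and discard the carry out."""
    return (a + b) & ((1 << n) - 1)


def negate(a: int, n: int) -> int:
    """Two's complement of a pattern: invert all bits, then add 1."""
    mask = (1 << n) - 1
    return ((a ^ mask) + 1) & mask


def subtract(a: int, b: int, n: int) -> int:
    """a - b, computed as a plus the two's complement of b."""
    return add(a, negate(b, n), n)


def multiply(a: int, b: int, n: int) -> int:
    """Shift-and-add on the magnitudes, then adjust the sign."""
    sa, sb = to_signed(a, n), to_signed(b, n)
    multiplicand, multiplier = abs(sa), abs(sb)
    product, position = 0, 0
    while multiplier:
        if multiplier & 1:                 # this multiplier bit is 1
            product += multiplicand << position
        multiplier >>= 1
        position += 1
    if (sa < 0) != (sb < 0):               # signs differ: result is negative
        product = -product
    return product & ((1 << (2 * n)) - 1)  # 2n-bit result pattern


three, minus_five = 0b00000011, 0b11111011
print(to_signed(add(three, minus_five, 8), 8))        # -2
print(to_signed(subtract(three, minus_five, 8), 8))   # 8
print(to_signed(multiply(three, minus_five, 8), 16))  # -15
```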
Floating point
Basic steps for addition and subtraction:
- Align exponents
- Add / subtract the significands (excluding the exponents)
- Normalize result
- Round to fit significand
Addition example
To compute :
-
Align exponents:
-
Add numbers:
-
Normalize:
-
Result:
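The align, add, normalize, and round steps can be sketched with a toy format in which a positive value is stored as an integer significand `m` and an exponent `e`, meaning $m \times 2^{e}$, with `p` significand bits available. Everything here is an assumption made for illustration: the function name, the toy format, the use of round-to-nearest with ties rounded up, and aligning to the smaller exponent (exact with Python integers, whereas hardware typically shifts the smaller operand right).

```python
def fp_add(m1: int, e1: int, m2: int, e2: int, p: int):
    """Add two positive values m * 2**e, keeping a p-bit significand."""
    # 1. Align exponents: rewrite both operands with the smaller exponent.
    e = min(e1, e2)
    m1 <<= e1 - e
    m2 <<= e2 - e
    # 2. Add the significands; the exponent is not part of the addition.
    m = m1 + m2
    # 3. Normalize and 4. round so that m fits in exactly p bits.
    extra = m.bit_length() - p
    if extra > 0:
        m = (m + (1 << (extra - 1))) >> extra  # round to nearest, ties up
        e += extra
        if m.bit_length() > p:                 # rounding carried into a new bit
            m >>= 1
            e += 1
    elif extra < 0:
        m <<= -extra                           # pad with trailing zeros
        e += extra
    return m, e


# 1.5 + 0.75 with 4 significand bits: 1100b * 2**-3 + 1100b * 2**-4
print(fp_add(0b1100, -3, 0b1100, -4, 4))  # (9, -2), i.e. 1001b * 2**-2 = 2.25
```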
Basic steps for multiplication and division:
- Add / subtract exponents (subtract / add bias if working with IEEE)
- Multiply / divide the significands
- Normalize result
- Round to fit significand
- Determine sign
Multiplication example
To compute :
- Add exponents:
- Multiply numbers:
- To normalize, use exponent:
- Normalize: (unchanged)
- Result:
Note that we skipped subtracting the bias as we're working with real binary numbers instead of an IEEE representation.
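The multiplication steps without a bias can be sketched with a toy format in which a value is $(-1)^{s} \times m \times 2^{e}$, with `m` a positive integer significand limited to `p` bits. The function name, the toy format, and the rounding mode (round to nearest, ties up) are assumptions made for illustration.

```python
def fp_mul(s1: int, m1: int, e1: int, s2: int, m2: int, e2: int, p: int):
    """Multiply two values (-1)**s * m * 2**e, keeping a p-bit significand."""
    e = e1 + e2                  # add exponents (no bias involved)
    m = m1 * m2                  # multiply significands
    extra = m.bit_length() - p   # bits beyond the p that are available
    if extra > 0:                # normalize and round
        m = (m + (1 << (extra - 1))) >> extra
        e += extra
        if m.bit_length() > p:   # rounding carried into a new bit
            m >>= 1
            e += 1
    s = s1 ^ s2                  # determine sign: negative if signs differ
    return s, m, e


# 1.5 * -2.5 with 4 significand bits: 1100b * 2**-3 times -(1010b * 2**-2)
print(fp_mul(0, 0b1100, -3, 1, 0b1010, -2, 4))  # (1, 15, -2), i.e. -3.75
```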
Multiplication example (IEEE)
To compute :
- Extract exponents: ,
- Add exponents and subtract bias:
- Extract numbers by removing trailing zeros and multiply:
- Normalize:
- Result:
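For the IEEE case, the sketch below multiplies two Binary32 bit patterns field by field, which shows why the bias is subtracted once: each stored exponent already includes it, so their sum includes it twice. This is a simplified illustration under stated assumptions: both inputs are normalized, low-order product bits are truncated rather than rounded, and exponent overflow or underflow is not handled. The helper names are introduced here; only `struct` is a real library.

```python
import struct

BIAS = 127


def bits_of(x: float) -> int:
    """Binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def float_of(bits: int) -> float:
    """Value of a Binary32 bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def mul_binary32(a: int, b: int) -> int:
    """Multiply two normalized Binary32 patterns (truncating sketch)."""
    sign = (a >> 31) ^ (b >> 31)
    # Extract the stored exponents, add them, and subtract the bias once.
    exponent = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - BIAS
    # Restore the implicit leading 1 to get 24-bit significands.
    sig_a = (a & 0x7FFFFF) | 0x800000
    sig_b = (b & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b              # scaled by 2**46
    if product & (1 << 47):              # product in [2, 4): normalize
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF  # drop leading 1, truncate the rest
    return (sign << 31) | (exponent << 23) | fraction


print(float_of(mul_binary32(bits_of(1.5), bits_of(-2.5))))  # -3.75
print(float_of(mul_binary32(bits_of(3.0), bits_of(3.0))))   # 9.0
```

Because of the truncation, the result matches hardware rounding only when the exact product fits in 24 significand bits, as it does in both usage lines.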