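For a 64-bit double, the largest integer n such that every integer from 0 to n is exactly representable is 2^53 (9007199254740992); for a 32-bit float it is 2^24 (16777216). See the significand sizes on
the Wikipedia page for IEEE floating point numbers.
Verifying this in Lua is straightforward: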
local maxdouble = 2^53
-- one less than the maximum can be represented precisely
print (string.format("%.0f",maxdouble-1)) --> 9007199254740991
-- the maximum itself can be represented precisely
print (string.format("%.0f",maxdouble)) --> 9007199254740992
-- one more than the maximum gets rounded down
print (string.format("%.0f",maxdouble+1)) --> 9007199254740992 again
The same limits can be checked in C:
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#define min(a, b) ((a) < (b) ? (a) : (b))
#define bits(type) (sizeof(type) * 8)
/* Find the first power p for which 2^p + 1 does not survive a round trip through test_t. */
#define testimax(test_t) { \
    uintmax_t in = 1, out = 2; \
    size_t pow = 0, limit = min(bits(test_t), bits(uintmax_t)); \
    while (pow < limit && out == in + 1) { \
        in = in << 1; \
        /* convert in + 1 to test_t and back; the casts force rounding to test_t */ \
        out = (uintmax_t)(test_t)(in + 1); \
        ++pow; \
    } \
    if (pow == limit) \
        puts(#test_t " is as precise as longest integer type"); \
    else printf(#test_t " conversion imprecise for 2^%zu+1:\n" \
        " in: %ju\n out: %ju\n\n", pow, in + 1, out); \
}
int main(void)
{
testimax(float);
testimax(double);
return 0;
}
The output of the above code:
float conversion imprecise for 2^24+1:
in: 16777217
out: 16777216
double conversion imprecise for 2^53+1:
in: 9007199254740993
out: 9007199254740992
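Of course, because of the way floating point works, a 64-bit double can represent numbers far larger than 2^64, since the exponent can grow well past the width of the significand; it just cannot represent every integer in that range. The Wikipedia page on double-precision floating-point describes it: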
Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2^53 to 2^54, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2^51 to 2^52, the spacing is 0.5, etc.
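The spacing described in that quote can be measured directly. The following is a minimal sketch, not part of the original answer: it assumes that double is IEEE 754 binary64, and the helper names next_double and spacing_above are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Next representable double above x.
   Assumption: double is IEEE 754 binary64 and x is positive and finite.
   For such values, adding 1 to the bit pattern gives the next double. */
double next_double(double x)
{
    uint64_t bits;

    assert(sizeof(double) == sizeof(uint64_t));
    assert(x > 0.0);

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the double as an integer */
    ++bits;                           /* step to the adjacent bit pattern */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Gap between x and the next representable double above it. */
double spacing_above(double x)
{
    return next_double(x) - x;
}
```

Under those assumptions, spacing_above(0x1p51) is 0.5, spacing_above(0x1p52) is 1.0, and spacing_above(0x1p53) is 2.0, matching the ranges in the quote. At 0x1p64 the gap is already 4096, which is why a double can reach past 2^64 without being able to hold every integer there.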