Understanding Usual Arithmetic Conversions and Integral Promotions

Last Updated: 2024-01-04 10:48:15 Thursday

When writing C/C++ code, always keep an eye on type conversions! Some of them are performed implicitly by the compiler, as in this buggy code:

#include <stdio.h>

int main(void) {
    unsigned int a=1, b=2;

    if ((a-b)>0)  // should be if(a>b)
        printf("WTF...\n");

    return 0;
}

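a-b ought to be -1, which is not greater than 0, yet the code prints WTF.... (For unsigned types, compare the values directly instead of adding or subtracting first.)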

Why it goes wrong: a and b are both unsigned, so the result of a-b is unsigned too! It can never be less than 0; here it wraps around to UINT_MAX.

Standard C has a term for the kind of conversion shown in the code above: Usual Arithmetic Conversions.

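General rule: when the operands on the two sides of an operator have different types, they are automatically converted in the direction that loses no information, e.g. float-->double, char-->int, int-->long. Type conversions are almost everywhere, because what C deals with is memory blocks of various sizes!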

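The conversion behind this kind of bug is signed-->unsigned: when an unsigned int meets an int in an operation (the literal 0 in the comparison above is an int), the signed operand is treated as unsigned and the result is unsigned as well. A negative signed value becomes a very large number after the conversion: the standard defines the result modulo UINT_MAX+1, which on a two's complement machine is simply the same bit pattern read as unsigned.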

The following code has a bug too:

#include <stdio.h>

int main(void) {
    unsigned int a=1;
    int b=2;

    if ((a-b)>0)
        printf("WTF...\n");

    return 0;
}

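Running it prints WTF.... as well: b is converted to unsigned before the subtraction, so the result is not -1 but UINT_MAX, which is greater than 0!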

One more bug:

#include <stdio.h>

int main(void) {
    unsigned int a=1;
    int b=-1;

    if (a < b)
        printf("WTF...\n");

    return 0;
}

It still prints WTF...., because b is converted to unsigned.

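The fix for these bugs: do not mix signed and unsigned in the same calculation unless you know what you are doing! Compiling with -Wall -Wextra produces a warning for mixed-sign comparisons such as a < b above (GCC's -Wsign-compare).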

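Next, a passage copied from the book Expert C Programming (《C专家编程》), which quotes the C standard's description of the Usual Arithmetic Conversions. The passage also mentions another term, Integral Promotions, another common source of bugs that you need to understand.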

Section 6.2.1.1 Characters and Integers (the integral promotions)

A char, a short int, or an int bit-field, or their signed or unsigned varieties, or an enumeration type, may be used in an expression wherever an int or unsigned int may be used. If an int can represent all the values of the original type, the value is converted to an int; otherwise it is converted to an unsigned int. These are called the integral promotions.

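Integer data smaller than int, such as [unsigned] char, is converted to int when it is used in a calculation! Converting to int loses no information, so there is usually no need to go all the way to unsigned int.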

Section 6.2.1.5 Usual Arithmetic Conversions

Many binary operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions.

First, if either operand has type long double, the other operand is converted to long double. Otherwise, if either operand has type double, the other operand is converted to double. Otherwise, if either operand has type float, the other operand is converted to float. Otherwise the integral promotions are performed on both operands. Then the following rules are applied.

If either operand has type unsigned long int, the other operand is converted to unsigned long int. Otherwise, if one operand has type long int and the other has type unsigned int, if a long int can represent all values of an unsigned int the operand of type unsigned int is converted to long int; if a long int cannot represent all the values of an unsigned int, both operands are converted to unsigned long int. Otherwise, if either operand has type long int, the other operand is converted to long int. Otherwise, if either operand has type unsigned int, the other operand is converted to unsigned int. Otherwise, both operands have type int.

The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

For me, the key piece of information in there is when the integral promotions are applied.

Now in plain language:

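Operands with different types get converted when you do arithmetic. Everything is converted to the type of the floatiest, longest operand, signed if possible without losing bits. (The book's word here is "signed", and that matches the rules quoted above: a signed type is chosen whenever it can represent every value of the original type, and unsigned is used only when it cannot.)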

One more example:

#include <stdio.h>


int main(void) {
    // Even at -O0 the compiler folds these constant ifs away at the assembly level,
    // so this test is not perfect!

    if (-1 < 1)
        printf("yes\n");

    if (-1 < (unsigned char)1) // integral promotion to int
        printf("yes\n");

    if (-1 < (unsigned short)1) // integral promotion to int
        printf("yes\n");

    if (-1 < (unsigned int)1)
        printf("yes\n");
    else
        printf("what...\n");

    //------------------------------------

    if ((char)-1 < (unsigned char)1)  // integral promotion to int
        printf("yes\n");
    else
        printf("what...\n");

    if ((short)-1 < (unsigned short)1)  // integral promotion to int
        printf("yes\n");
    else
        printf("what...\n");

    //-----------------------------------

    if (-1 < (long)1)
        printf("yes\n");
    else
        printf("what...\n");

    if (-1 < (unsigned long)1)
        printf("yes\n");
    else
        printf("what...\n");

    return 0;
}

Running it prints what... twice:

$ gcc test.c
$ ./a.out
yes
yes
yes
what...
yes
yes
yes
what...

Both what... lines come from -1 being converted to an unsigned type.

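There is a related concept: assignment with = also converts implicitly, turning the value on the right into the type of the operand on the left. Even when done on purpose, these are called implicit conversions.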

Looking for Answers in the Assembly

The CPU has only a faint notion of signed versus unsigned, and almost all of it lives in how the rFlags bits are set. Take the sub instruction:

This instruction evaluates the result for both signed and unsigned data types and sets the OF and CF flags to indicate a borrow in a signed or unsigned result, respectively. It sets the SF flag to indicate the sign of a signed result.

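Here is how to read this passage from the AMD64 manual: the CPU performs an ordinary binary subtraction. If the two numbers are read as signed, OF reports that the signed result overflowed; if they are read as unsigned, CF reports a borrow. SF is a copy of the MSB of the result, so if the result is read as signed, SF=1 means negative.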

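Once C code is compiled, all type information is gone; only the compiler has it. So the compiler picks which rFlags bits to test according to the types, to get the logic right. Put simply, for signed types the compiler tests OF (together with SF, as in jl/jg), and for unsigned types it tests CF (as in jb/ja): different types, different jcc instructions.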

Link to this article: https://cs.pynote.net/sf/c/202112113/
