Inline Assembly (inline asm)

Last Updated: 2023-12-30 12:38:12 Saturday


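There are two ways to mix C/C++ with assembly:

  1. Assemble the ASM source into its own object file and link it with the objects built from C; symbols defined on either side can be referenced from the other;
  2. Embed ASM code directly in the C/C++ source, which is called inline asm. This is the topic of this article.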


Basic Inline Assembly

Basic format:

__asm__ __volatile__("Instruction List");
__asm__("movl %esp,%eax");

/* the spelling without double underscores also works: */
asm volatile(
    "movl $1,%eax   \n\t"
    "xor  %ebx,%ebx \n\t"
    "int  $0x80"
);

/* one instruction per statement */
asm("addl %edi, %esi");
asm("movl %esi, %eax");

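__asm__ behaves exactly like asm. Prefer __asm__: asm is a GNU extension keyword that is not recognized under strict modes such as -std=c99 or -ansi, where asm may well be an ordinary variable or function name in your code. The Instruction List may be empty: __asm__ __volatile__(""); and __asm__(""); are both legal inline assembly statements, although they emit no instructions.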

When there is more than one instruction, write one instruction per line, each in its own double-quoted string, and end every string except the last with \n\t. GCC pastes the template text into the assembly file it hands to as (GAS), so the newline and tab make each instruction land on its own correctly formatted line.

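__volatile__ is GCC's alternate spelling of the keyword volatile, in the same way that __asm__ is the alternate spelling of asm. It behaves as if it were defined by:

#define __volatile__ volatile

__volatile__ (or volatile) is optional. Using it tells GCC that the instructions must be kept as written and must not be optimized away. Without it, when an optimization option (-O) is enabled, GCC may delete the asm statement if its outputs are unused, or hoist it out of a loop. GCC treats basic inline assembly as implicitly volatile, so the qualifier mainly matters for the extended form described later in this article.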

For example:

$ cat test2.c
int main(void) {
    __asm__ __volatile__("movl $77, %ebx \n\t"
                         "movl $1,  %eax \n\t"
                         "int  $0x80");
    return 0;
}
$ gcc test2.c -o test2
$ ./test2
$ echo $?
77

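This code calls the exit system call directly from main with asm code and sets the return code to 77; the final return is never executed.

The code of a basic inline asm statement is visible in the .s file that gcc produces at the compile-to-assembly stage: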

$ gcc -S test2.c
$ cat test2.s
    .file   "test2.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
#APP
# 2 "test2.c" 1
    movl $77, %ebx 
    movl $1,  %eax 
    int  $0x80
# 0 "" 2
#NO_APP
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long    1f - 0f
    .long    4f - 1f
    .long    5
0:
    .string  "GNU"
1:
    .align 8
    .long    0xc0000002
    .long    3f - 2f
2:
    .long    0x3
3:
    .align 8
4:

The inline code sits between the #APP and #NO_APP markers.

Extended Inline Assembly

In basic inline assembly, we had only instructions.

In extended assembly, we can also specify the operands. It allows us to specify the input registers, output registers and a list of clobbered registers. It is not mandatory to name the registers to use; we can leave that headache to GCC, which probably fits into GCC's optimization scheme better. Anyway, the basic format is:

asm ( assembler template 
        : output operands                  /* optional */
        : input operands                   /* optional */
        : list of clobbered registers      /* optional */
        );

"constraint" (C expression) is the general form for operands! For output operands an additional modifier will be there.

The parts separated by colons may each be empty. With no colon at all, the statement degenerates into basic inline assembly.

#include <stdio.h>

int main(){
    int input = 9;
    int output = 0;
    __asm__ __volatile__ ("movl %1,%0"
                            :"=r"(output)
                            :"r"(input));
    printf("%d\n",output);
    return 0;
}

Run it:

$ gcc test3.c -o test3
$ ./test3
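9

A second example, test5.c, uses the write and exit system calls through int $0x80:

char *str = "hello asm!\n";

void myprint(void) {
    asm volatile("movl $11, %%edx \n\t"  // string length
                 "movl %0,  %%ecx \n\t"  // address of str
                 "movl $1,  %%ebx \n\t"  // stdout
                 "movl $4,  %%eax \n\t"  // syscall 4 write
                 "int  $0x80"
                 :
                 : "r"(str)
                 : "eax", "ebx", "ecx", "edx", "memory");
}

void myexit(void) {
    asm volatile("movl $79, %ebx \n\t"  // return code
                 "movl $1,  %eax \n\t"  // syscall 1 exit
                 "int  $0x80");
}

int main(void) {
    myprint();
    myexit();
    return 0;
}

Execution result:

$ gcc -m32 test5.c -o test5
$ ./test5
hello asm!

In myprint, which uses the extended form, every register name needs a double percent sign (%%); myexit is basic inline assembly, so a single % is enough.

The clobber list "eax", "ebx", "ecx", "edx" tells the compiler that the code modifies these registers, so it must not keep live values in them or assign the input operand to one of them. GCC does not parse the instructions in the template, so it cannot discover on its own that eax is used; eax has to be listed as well. "memory" is listed because the system call reads the string through the pointer.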

A smaller example copies a to b through eax:

int a=10, b;
asm ("movl %1, %%eax\n\t"
     "movl %%eax, %0"
     :"=r"(b)        /* output */
     :"r"(a)         /* input */
     :"eax"          /* clobbered register */
     );

Assembler Template

The assembler template contains the set of assembly instructions that get inserted into the C program.

The format: either each instruction is enclosed in its own double quotes, or the entire group of instructions is enclosed in one pair of double quotes. Every instruction except the last must end with a delimiter. The valid delimiters are newline (\n) and semicolon (;); \n may be followed by a tab (\t) so that the generated assembly stays neatly indented. Operands corresponding to the C expressions are represented by %0, %1, and so on.
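The two delimiter styles can be compared side by side. The sketch below is only an illustration: the function names are hypothetical, the #if guard with a plain C fallback is an addition for non-x86 targets, and the "=&r" (early-clobber) modifier is needed here because the output is written before the input is read for the last time.

```c
/* Hypothetical helpers: the same three-instruction template written twice. */
int triple_newline(int x)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0\n\t"   /* out = x         */
            "addl %1, %0\n\t"   /* out = x + x     */
            "addl %1, %0"       /* out = x + x + x */
            : "=&r"(out)        /* & keeps out and x in different registers */
            : "r"(x)
            : "cc");
#else
    out = 3 * x;                /* plain C fallback on non-x86 targets */
#endif
    return out;
}

int triple_semicolon(int x)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    /* one string for the whole group, instructions separated by ';' */
    __asm__("movl %1, %0; addl %1, %0; addl %1, %0"
            : "=&r"(out)
            : "r"(x)
            : "cc");
#else
    out = 3 * x;                /* plain C fallback on non-x86 targets */
#endif
    return out;
}
```

Both functions should assemble to the same instructions; the \n\t form only makes the generated .s file easier to read.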

Operands

If we use more than one operand, they are separated by commas.

In the assembler template, each operand is referenced by number. Numbering is done as follows: if there are a total of n operands (inputs and outputs together), the first output operand is numbered 0, numbering continues in increasing order, and the last input operand is numbered n-1. GCC limits the total number of operands to 30.
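The numbering rule can be seen with two outputs and two inputs, so n = 4 and the operands are %0 to %3. This is a minimal sketch: sum_and_diff is a hypothetical name, and the #if guard with a plain C fallback is an addition for non-x86 targets.

```c
/* Hypothetical helper: stores a + b in *sum and a - b in *diff. */
void sum_and_diff(int a, int b, int *sum, int *diff)
{
    int s, d;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %2, %0\n\t"       /* s = a  */
            "addl %3, %0\n\t"       /* s += b */
            "movl %2, %1\n\t"       /* d = a  */
            "subl %3, %1"           /* d -= b */
            : "=&r"(s), "=&r"(d)    /* outputs: %0 and %1 (early-clobber) */
            : "r"(a), "r"(b)        /* inputs:  %2 and %3 = n-1 */
            : "cc");
#else
    s = a + b;                      /* plain C fallback on non-x86 targets */
    d = a - b;
#endif
    *sum = s;
    *diff = d;
}
```

Outputs are counted first, then inputs, regardless of the order in which the template uses them.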

Output operand expressions must be lvalues. Input operands are not restricted in this way; they may be arbitrary expressions. The extended asm feature is most often used for machine instructions that the compiler itself does not know about ;-). If the output expression cannot be directly addressed (for example, it is a bit-field), the constraint must allow a register. In that case, GCC uses the register as the output of the asm and then stores the register's contents into the output.

Ordinary output operands (modifier =) must be write-only; GCC assumes that the values in these operands before the instruction are dead and need not be generated. Extended asm also supports input-output (read-write) operands, which are written with the + modifier.
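A read-write operand fits instructions that update a register in place. The sketch below uses bswap as an example of such an instruction; swap_bytes is a hypothetical name, and the non-x86 branch falls back to GCC's __builtin_bswap32.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value. */
uint32_t swap_bytes(uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("bswap %0"
            : "+r"(v));     /* %0 is both read and written by bswap */
#else
    v = __builtin_bswap32(v);   /* fallback on non-x86 targets */
#endif
    return v;
}
```

With "=r" instead of "+r", GCC would be free to assume the old value of v is dead and skip loading it into the register.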

Case One:

asm ("leal (%1,%1,4), %0"
             : "=r" (five_times_x)
             : "r" (x) 
             );

The constraints do not name a register, so gcc chooses one automatically. %0 is five_times_x and %1 is x.

Case Two:

asm ("leal (%0,%0,4), %0"
            : "=r" (five_times_x)
            : "0" (x) 
            );

five_times_x is %0, and the "0" constraint makes x use the same register as %0, which gcc picks.

Case Three:

asm ("leal (%%ecx,%%ecx,4), %%ecx"
             : "=c" (x)
             : "c" (x) 
             );

This forces the use of the ecx register!

+------------+--------------------+
| Constraint |    Register(s)     |
+------------+--------------------+
| a          |   %eax, %ax, %al   |
| b          |   %ebx, %bx, %bl   |
| c          |   %ecx, %cx, %cl   |
| d          |   %edx, %dx, %dl   |
| S          |   %esi, %si        |
| D          |   %edi, %di        |
+------------+--------------------+

None of the three cases above lists a clobbered register. Why? In the first two cases, GCC chooses the registers itself, so it knows what changes. In the last one, we do not have to put ecx on the clobber list, because gcc knows its value goes into x. Since GCC knows the value of ecx after the asm, the register is not considered clobbered.
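Register-specific constraints matter most when an instruction has fixed operands. As an illustration, divl always divides edx:eax and leaves the quotient in eax and the remainder in edx, so the sketch below pins operands with "a" and "d" and needs no register clobbers. divmod_u32 is a hypothetical name, and the #if guard with a plain C fallback is an addition for non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns n / d and stores n % d in *rem.
   d must be non-zero, otherwise divl raises a divide error. */
uint32_t divmod_u32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q, r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("divl %4"                   /* edx:eax / %4 */
            : "=a"(q), "=d"(r)          /* %0 = eax (quotient), %1 = edx (remainder) */
            : "a"(n), "d"(0u), "r"(d)   /* %2 = eax, %3 = edx (high half 0), %4 = any other register */
            : "cc");
#else
    q = n / d;                          /* plain C fallback on non-x86 targets */
    r = n % d;
#endif
    *rem = r;
    return q;
}
```

Because eax and edx both appear as operands, GCC already knows they change, which is the same reasoning as in Case Three.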

Clobber List

Some instructions clobber some hardware registers. We have to list those registers in the clobber list, i.e. the field after the third ':' in the asm statement. This informs gcc that we will use and modify them ourselves, so gcc will not assume that the values it loaded into these registers remain valid. We shouldn't list the input and output registers here, because gcc already knows that the asm uses them (they are specified explicitly as constraints). If the instructions use any other registers, implicitly or explicitly (registers that appear in neither the input nor the output constraints), those registers have to be specified in the clobber list.

If our instruction can alter the condition code register, we have to add "cc" to the clobber list.

If our instruction modifies memory in an unpredictable fashion, add "memory" to the clobber list. This causes GCC not to keep memory values cached in registers across the asm statement. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm.
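One common use of the "memory" clobber is an asm statement with an empty template, which emits no instruction but still stops the compiler from caching or reordering memory accesses across it. The sketch below is an illustration with hypothetical names (compiler_barrier, publish); it constrains only the compiler and is not a CPU memory fence.

```c
/* Empty template plus "memory" clobber: no instruction is emitted, but GCC
   must assume that any memory may have been read or written here. */
void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Hypothetical usage: write the payload first, then the flag. */
int publish(int *data, int *ready)
{
    *data = 42;
    compiler_barrier();     /* the compiler will not sink the store to *data below this line */
    *ready = 1;
    return *data + *ready;  /* *data is re-read from memory after the barrier */
}
```

The volatile qualifier is there because the statement has no outputs that would otherwise keep it alive.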

Memory Operand Constraint (m)

When the operands are in the memory, any operations performed on them will occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location.

But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don’t want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc:

asm("sidt %0" : "=m"(loc));   /* sidt writes to loc, so loc is an output operand */
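sidt may be unavailable to user-space code, so a read-modify-write on an ordinary variable is easier to try out. The sketch below is an assumption-laden illustration: bump_counter is a hypothetical name, "+m" combines the memory constraint with the read-write modifier, and the #if guard with a plain C fallback is an addition for non-x86 targets.

```c
/* Hypothetical helper: increments *counter directly in memory. */
int bump_counter(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("incl %0"
            : "+m"(*counter)    /* operand is the memory location itself, no register copy */
            :
            : "cc");
#else
    ++*counter;                 /* plain C fallback on non-x86 targets */
#endif
    return *counter;
}
```

Since the affected memory is listed as an operand, no "memory" clobber is required.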

Other Constraints

Following constraints are x86 specific:

Constraint Modifiers

Cases

/* decrement my_var in memory; cond becomes 1 if the result is zero */
__asm__ __volatile__("decl %0; sete %1"
                      : "=m"(my_var), "=q"(cond)
                      : "m"(my_var) 
                      : "memory"
                      );

/* set bit number pos in the word at ADDR */
__asm__ __volatile__("btsl %1,%0"
                      : "=m" (ADDR)
                      : "Ir" (pos)
                      : "cc"
                      );

/* copy src to dest byte by byte, up to and including the terminating NUL */
static inline char * strcpy(char * dest,const char *src)
{
    int d0, d1, d2;
    __asm__ __volatile__(  "1:\tlodsb\n\t"
                           "stosb\n\t"
                           "testb %%al,%%al\n\t"
                           "jne 1b"
                           : "=&S" (d0), "=&D" (d1), "=&a" (d2)
                           : "0" (src),"1" (dest) 
                           : "memory");
    return dest;
}

Inline Assembly on Windows

The keyword is __asm (or _asm).

#include <stdio.h>

int main(void) {
    int i = 10;
    printf("i = %d\n", i);

    // The asm statement below changes the value of i in memory
    __asm {
        //mov dword ptr [i], 20h
        mov dword ptr [i], ebp  // copy ebp into i, for comparison with the address of i
    }

    printf("ebp = %X\n", i);
    printf("&i = %p\n", &i);
    return 0;
}

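Compile the code above with VC in x86 mode (MSVC does not support inline assembly for x64 targets) and the output shows ebp and the address of i.

To view the disassembly in VC, run the program to a breakpoint, then choose Debug --> Windows --> Disassembly.

Case Study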

$ cat casm2.c
#include <stdio.h>

int sum(int a, int b)
{
  asm("addl %edi, %esi");
  asm("movl %esi, %eax");  // return value in eax
}

int main()
{
  printf("%d\n", sum(5, 3));
  return 0;
}
$ gcc casm2.c -o casm2
$ ./casm2
8
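The listing above works only because, in an unoptimized x86-64 build, the arguments happen to sit in edi and esi and the missing return statement leaves eax untouched; GCC guarantees none of this. A sketch of the same function in extended form, which lets GCC pick the registers, might look as follows. sum_ext is a hypothetical name, and the #if guard with a plain C fallback is an addition for non-x86 targets.

```c
/* Hypothetical rewrite of sum() that stays correct when optimization is on. */
int sum_ext(int a, int b)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("leal (%1,%2), %0"      /* result = a + b, flags untouched */
            : "=r"(result)
            : "r"(a), "r"(b));
#else
    result = a + b;                 /* plain C fallback on non-x86 targets */
#endif
    return result;
}
```

lea reads both inputs before it writes the output, so no early-clobber modifier and no clobber list are needed here.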

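Checking Whether the CPU Supports AVX

#include <stdio.h>

int main(void){
    unsigned int r[4] = {0};
    __asm__ __volatile__ (
        "cpuid"
        : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
        : "a"(1));
    printf("%X\n", r[0]);
    printf("%X\n", r[1]);
    printf("%X\n", r[2]);
    printf("%X\n", r[3]);
    printf("----\n");
    // != binds tighter than &, so the mask test needs its own parentheses
    printf("AVX: %d\n", (r[2] & (1u << 28)) != 0);
    return 0;
}

It is still simplest to call __builtin_cpu_supports("avx") directly... The AMD manual devotes an entire appendix (Volume 3, Appendix E) to the use of the cpuid instruction.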
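For comparison, the same check can be written without hand-written asm. The sketch below assumes GCC or Clang: __get_cpuid comes from the compiler's <cpuid.h> header, the two function names are hypothetical, and both functions simply return 0 on non-x86 targets.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: 1 if CPUID leaf 1 sets ECX bit 28 (AVX), else 0. */
int cpuid_reports_avx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* leaf 1 not available */
    return (ecx & (1u << 28)) != 0;
#else
    return 0;                           /* not an x86 target */
#endif
}

/* Hypothetical helper: 1 if the compiler's runtime check reports AVX, else 0. */
int builtin_reports_avx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;                           /* not an x86 target */
#endif
}
```

The two answers can differ in one direction: the builtin may report 0 even when the CPUID bit is set, for example when the operating system has not enabled the AVX register state.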

Permalink: https://cs.pynote.net/hd/asm/202302141/
