post

This article is a ‘quick-n-dirty’ introduction to the AT&T assembly language syntax, as implemented in the GNU Assembler as(1). For the first timer the AT&T syntax may seem a bit confusing, but if you have any kind of assembly language programming background, it’s easy to catch up once you have a few rules in mind. I assume you have some familiarity to what is commonly referred to as the INTEL-syntax for assembly language instructions, as described in the x86 manuals. Due to its simplicity, I use the NASM (Netwide Assembler) variant of the INTEL-syntax to cite differences between the formats.

The GNU assembler is a part of the GNU Binary Utilities (binutils), and a back-end to the GNU Compiler Collection. Although as is not the preferred assembler for writing reasonably big assembler programs, its a vital part of contemporary Unix-like systems, especially for kernel-level hacking. Often criticised for its cryptic AT&T-style syntax, it is argued that as was written with an emphasis on being used as a back-end to GCC, with little concern for “developer-friendliness”. If you are an assembler programmer hailing from an INTEL-Syntax background, you’ll experience a degree of stifling with regard to code-readability and code-generation. Nevertheless, it must be stated that, many operating systems’ code-base depend on as as the assembler for generating low-level code.

The Basic Format

The structure of a program in AT&T-syntax is similar to any other assembler-syntax, consisting of a series of directives, labels, instructions – composed of a mnemonic followed by a maximum of three operands. The most prominent difference in the AT&T-syntax stems from the ordering of the operands.

For example, the general format of a basic data movement instruction in INTEL-syntax is,

mnemonic	destination, source

whereas, in the case of AT&T, the general format is

mnemonic	source, destination

To some (including myself), this format is more intuitive. The following sections describe the types of operands to AT&T assembler instructions for the x86 architecture.

Registers

All register names of the IA-32 architecture must be prefixed by a ‘%’ sign, eg. %al,%bx, %ds, %cr0 etc.

mov	%ax, %bx

The above example is the mov instruction that moves the value from the 16-bit register AX to 16-bit register BX.

Literal Values

All literal values must be prefixed by a ‘$’ sign. For example,

 		
mov	$100,	%bx
mov	$A,	%al

The first instruction moves the the value 100 into the register AX and the second one moves the numerical value of the ascii A into the AL register. To make things clearer, note that the below example is not a valid instruction,

mov	%bx,	$100

as it just tries to move the value in register bx to a literal value. It just doesn’t make any sense.

Memory Addressing

In the AT&T Syntax, memory is referenced in the following way,

segment-override:signed-offset(base,index,scale)

parts of which can be omitted depending on the address you want.

%es:100(%eax,%ebx,2)

Please note that the offsets and the scale should not be prefixed by ‘$’. A few more examples with their equivalent NASM-syntax, should make things clearer,

GAS memory operand			NASM memory operand
------------------			-------------------

100					[100]
%es:100					[es:100]
(%eax)					[eax]
(%eax,%ebx)				[eax+ebx]
(%ecx,%ebx,2)				[ecx+ebx*2]
(,%ebx,2)				[ebx*2]
-10(%eax)				[eax-10]
%ds:-10(%ebp)				[ds:ebp-10]

Example instructions,

mov	%ax,	100
mov	%eax,	-100(%eax)

The first instruction moves the value in register AX into offset 100 of the data segment register (by default), and the second one moves the value in eax register to [eax-100].

Operand Sizes

At times, especially when moving literal values to memory, it becomes neccessary to specify the size-of-transfer or the operand-size. For example the instruction,

mov	$10,	100

only specfies that the value 10 is to be moved to the memory offset 100, but not the transfer size. In NASM this is done by adding the casting keyword byte/word/dword etc. to any of the operands. In AT&T syntax, this is done by adding a suffix – b/w/l – to the instruction. For example,

movb	$10,	%es:(%eax)

moves a byte value 10 to the memory location [ea:eax], whereas,

movl	$10,	%es:(%eax)

moves a long value (dword) 10 to the same place.

A few more examples,

movl	$100, %ebx
pushl	%eax
popw	%ax

Control Transfer Instructions

The jmp, call, ret, etc., instructions transfer the control from one part of a program to another. They can be classified as control transfers to the same code segment (near) or to different code segments (far). The possible types of branch addressing are – relative offset (label), register, memory operand, and segment-offset pointers.

Relative offsets, are specified using labels, as shown below.

label1:
	.
	.
  jmp	label1

Branch addressing using registers or memory operands must be prefixed by a ‘*’. To specify a “far” control tranfers, a ‘l’ must be prefixed, as in ‘ljmp’, ‘lcall’, etc. For example,

GAS syntax			NASM syntax
==========			===========

jmp	*100			jmp  near [100]
call	*100			call near [100]
jmp	*%eax			jmp  near eax
jmp	*%ecx			call near ecx
jmp	*(%eax)			jmp  near [eax]
call	*(%ebx)			call near [ebx]
ljmp	*100			jmp  far  [100]
lcall	*100			call far  [100]
ljmp	*(%eax)			jmp  far  [eax]
lcall	*(%ebx)			call far  [ebx]
ret				retn
lret				retf
lret $0x100			retf 0x100

Segment-offset pointers are specified using the following format:

jmp	$segment, $offset

For example:

jmp	$0x10, $0x100000

If you keep these few things in mind, you’ll catch up real soon. As for more details on the GNU assembler, you could try the documentation.