Style Guidelines for Assembly Language Programmers
1.0 - Introduction
1.2 - Graphics Example
1.3 - S.COM Example
1.4 - Intended Audience
1.5 - Readability Metrics
1.6 - How to Achieve Readability
1.7 - How This Document is Organized
1.8 - Guidelines, Rules, Enforced Rules, and Exceptions
1.9 - Source Language Concerns
2.0 - Program Organization
2.1 - Library Functions
2.2 - Common Object Modules
2.3 - Local Modules
2.4 - Program Make Files
3.0 - Module Organization
3.1 - Module Attributes
3.1.1 - Module Cohesion
3.1.2 - Module Coupling
3.1.3 - Physical Organization of Modules
3.1.4 - Module Interface
4.0 - Program Unit Organization
4.1 - Routine Cohesion
4.1.1 - Routine Coupling
4.1.2 - Routine Size
4.2 - Placement of the Main Procedure and Data
5.0 - Statement Organization
6.0 - Comments
6.1 - What is a Bad Comment?
6.2 - What is a Good Comment?
6.3 - Endline vs. Standalone Comments
6.4 - Unfinished Code
6.5 - Cross References in Code to Other Documents
7.0 - Names, Instructions, Operators, and Operands
7.1 - Names
7.1.1 - Naming Conventions
7.1.2 - Alphabetic Case Considerations
7.1.3 - Abbreviations
7.1.4 - The Position of Components Within an Identifier
7.1.5 - Names to Avoid
7.2 - Instructions, Directives, and Pseudo-Opcodes
7.2.1 - Choosing the Best Instruction Sequence
7.2.2 - Control Structures
7.2.3 - Instruction Synonyms
8.0 - Data Types
8.1 - Defining New Data Types with TYPEDEF
8.2 - Creating Array Types
8.3 - Declaring Structures in Assembly Language
8.4 - Data Types and the UCR Standard Library

1.0 Introduction

Most people consider assembly language programs difficult to read. While there are a multitude of reasons why people feel this way, the primary reason is that assembly language does not make it easy for programmers to write readable programs. This doesn't mean it's impossible to write readable programs, only that it takes an extra effort on the part of an assembly language programmer to produce readable code.

To demonstrate some common problems with assembly language programs, consider the following programs or program segments. These are actual programs written in assembly language taken from the Internet. Each example demonstrates a separate problem. (By the way, the choice of these examples is not intended to embarrass the original authors. These programs are typical of assembly language source code found on the Internet.)


%TITLE "Sums TWO hex values"

        IDEAL
        MODEL   small
        STACK   256

        DATASEG

exitCode        db      0
prompt1         db      'Enter value 1: ', 0
prompt2         db      'Enter value 2: ', 0
string          db      20 DUP (?)


        CODESEG

        EXTRN   StrLength:proc
        EXTRN   StrWrite:proc, StrRead:proc, NewLine:proc
        EXTRN   AscToBin:proc, BinToAscHex:proc

Start:
        mov     ax,@data
        mov     ds,ax
        mov     es,ax
        mov     di, offset prompt1
        call    GetValue
        push    ax
        mov     di, offset prompt2
        call    GetValue
        pop     bx
        add     ax,bx
        mov     cx,4
        mov     di, offset string
        call    BinToAscHex
        call    StrWrite
        mov     ah,04Ch
        mov     al,[exitCode]
        int     21h

PROC    GetValue
        call    StrWrite
        mov     di, offset string
        mov     cl,4
        call    StrRead
        call    NewLine
        call    StrLength
        mov     bx,cx
        mov     [word bx + di], 'h'
        call    AscToBin
        ret
ENDP    GetValue

        END     Start

Well, the biggest problem with this program should be fairly obvious - it has absolutely no comments other than the title of the program. Another problem is the fact that strings that prompt the user appear in one part of the program and the calls that print those strings appear in another. While this is typical assembly language programming, it still makes the program harder to read. Another, relatively minor, problem is that it uses TASM's "less-than" IDEAL syntax[1].

This program also uses the MASM/TASM "simplified" segment directives. How typically Microsoft to name a feature that adds complexity to a product "simplified." It turns out that programs that use the standard segmentation directives will be easier to read[2].

Before moving on, it is worthwhile to point out two good features about this program (with respect to readability). First, the programmer chose a reasonable set of names for the procedures and variables this program uses (I'll assume the author of this code segment is also the author of the library routines it calls). Another positive aspect to this program is that the mnemonic and operand fields are nicely aligned.

Okay, after complaining about how hard this code is to read, how about a more readable version? The following program is, arguably, more readable than the version above. Arguably, because this version uses the UCR Standard Library v2.0 and it assumes that the reader is familiar with features of that particular library.

; AddHex-
; This simple program reads two integer values from
; the user, computes their sum, and prints the
; result to the display.
; This example uses the "UCR Standard Library for
; 80x86 Assembly Language Programmers v2.0"
; Randall Hyde
; 12/13/96

                title           AddHex
                include         ucrlib.a
                includelib      ucrlib.lib

cseg            segment para public 'code'
                assume  cs:cseg

; GetInt-
; This function reads an integer value from the keyboard and
; returns that value in the AX register.
; This routine traps illegal values (either too large or
; incorrect digits) and makes the user re-enter the value.

GetInt          textequ <call GetInt_p>
GetInt_p        proc
                push    dx              ;DX holds error code.

GetIntLoop:     mov     dx, false       ;Assume no error.
                try                     ;Trap any errors.

                FlushGetc               ;Force input from a new line.
                geti                    ;Read the integer.

                except  $Conversion     ;Trap if bad characters.
                print   "Illegal numeric conversion, please re-enter", nl
                mov     dx, true

                except  $Overflow       ;Trap if # too large.
                print   "Value out of range, please re-enter.",nl
                mov     dx, true

                endtry                  ;End of the exception-handling block.

                cmp     dx, true
                je      GetIntLoop
                pop     dx
                ret
GetInt_p        endp

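Main            proc

                print   'Enter value 1: '
                GetInt                  ;Read the first value into AX.
                mov     bx, ax

                print   'Enter value 2: '
                GetInt                  ;Read the second value into AX.
                print   cr, lf, 'The sum of the two values is '
                add     ax, bx
                puti                    ;Print the sum in AX.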

Quit:           CleanUpEx
                ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

It is well worth pointing out that this code does quite a bit more than the original AddHex program. In particular, it validates the user's input, something the original program did not do. If one were to exactly simulate the original program, the program could be simplified to the following:

                print   nl, 'Enter value 1: '
                mov     bx, ax

                print   nl, 'Enter value 2: '
                add     ax, bx

In this example, the two sample solutions improved the readability of the program by adding comments, formatting the program a little bit better, and by using the high-level features of the UCR Standard Library to simplify the coding and keep output string literals with the statements that print them.

1.2 Graphics Example

The following program segment comes from a much larger program named "MODEX.ASM" on the net. It deals with setting up the color graphics display.

;SET_POINT (Xpos%, Ypos%, ColorNum%)
; Plots a single Pixel on the active display page
; ENTRY: Xpos     = X position to plot pixel at
;        Ypos     = Y position to plot pixel at
;        ColorNum = Color to plot pixel with
; EXIT:  No meaningful values returned

                DW  ?,? ; BP, DI
                DD  ?   ; Caller
    SETP_Color  DB  ?,? ; Color of Point to Plot
    SETP_Ypos   DW  ?   ; Y pos of Point to Plot
    SETP_Xpos   DW  ?   ; X pos of Point to Plot



    PUSHx   BP, DI              ; Preserve Registers
    MOV     BP, SP              ; Set up Stack Frame

    LES     DI, d CURRENT_PAGE  ; Point to Active VGA Page

    MOV     AX, [BP].SETP_Ypos  ; Get Line # of Pixel
    MUL     SCREEN_WIDTH        ; Get Offset to Start of Line

    MOV     BX, [BP].SETP_Xpos  ; Get Xpos
    MOV     CX, BX              ; Copy to extract Plane # from
    SHR     BX, 2               ; X offset (Bytes) = Xpos/4
    ADD     BX, AX              ; Offset = Width*Ypos + Xpos/4

    MOV     AX, MAP_MASK_PLANE1 ; Map Mask & Plane Select Register
    AND     CL, PLANE_BITS      ; Get Plane Bits
    SHL     AH, CL              ; Get Plane Select Value
    OUT_16  SC_Index, AX        ; Select Plane

    MOV     AL,[BP].SETP_Color  ; Get Pixel Color
    MOV     ES:[DI+BX], AL      ; Draw Pixel

    POPx    DI, BP              ; Restore Saved Registers
    RET     6                   ; Exit and Clean up Stack


Unlike the previous example, this one has lots of comments. Indeed, the comments are not bad. However, this particular routine suffers from its own set of problems. First, most of the instructions, register names, and identifiers appear in upper case. Upper case characters are much harder to read than lower case letters. Considering the extra work involved in entering upper case letters into the computer, it's a real shame to see this type of mistake in a program[3]. Another big problem with this particular code segment is that the author didn't align the label field, the mnemonic field, and the operand field very well (it's not horrible, but it's bad enough to affect the readability of the program).

Here is an improved version of the program:

;SetPoint (Xpos%, Ypos%, ColorNum%)
; Plots a single Pixel on the active display page
; ENTRY: Xpos     = X position to plot pixel at
;        Ypos     = Y position to plot pixel at
;        ColorNum = Color to plot pixel with
;        ES:DI    = Screen base address (??? I added this without really
;                                        knowing what is going on here)
; EXIT:  No meaningful values returned
dp              textequ <dword ptr>

Color           textequ <[bp+6]>
YPos            textequ <[bp+8]>
XPos            textequ <[bp+10]>

                public  SetPoint
SetPoint        proc    far
                push    bp
                mov     bp, sp
                push    di
                les     di, dp CurrentPage      ;Point at active VGA Page

                mov     ax, YPos                ;Get line # of Pixel
                mul     ScreenWidth             ;Get offset to start of line

                mov     bx, XPos                ;Get offset into line
                mov     cx, bx                  ;Save for plane
                shr     bx, 2                   ;X offset (bytes)= XPos/4
                add     bx, ax                  ;Offset=Width*YPos + XPos/4

                mov     ax, MapMaskPlane1       ;Map mask & plane select reg
                and     cl, PlaneBits           ;Get plane bits
                shl     ah, cl                  ;Get plane select value
                out_16  SCIndex, ax             ;Select plane

                mov     al, Color               ;Get pixel color
                mov     es:[di+bx], al          ;Draw pixel

                pop     di
                pop     bp
                ret     6
SetPoint        endp

Most of the changes here were purely mechanical: reducing the number of upper case letters in the program, spacing the program out better, adjusting some comments, etc. Nevertheless, these small, subtle changes have a big impact on how easy the code is to read (at least, to an experienced assembly language programmer).

1.3 S.COM Example

The following code sequence came from a program labelled "S.COM" that was also found in an archive on the internet.

;Get all file names matching filespec and set up tables
    mov dx, OFFSET DTA          ;Set up DTA
    mov ah, 1Ah
    int 21h
    mov dx, FILESPEC            ;Get first file name
    mov cl, 37h
    mov ah, 4Eh
    int 21h
    jnc FileFound               ;No files.  Try a different filespec.
    mov si, OFFSET NoFilesMsg
    call Error
    jmp NewFilespec
FileFound:
    mov di, OFFSET fileRecords  ;DI -> storage for file names
    mov bx, OFFSET files        ;BX -> array of files
    sub bx, 2
StoreFileName:
    add bx, 2                   ;For all files that will fit,
    cmp bx, (OFFSET files) + NFILES*2
    jb @@L1
    sub bx, 2
    mov [last], bx
    mov si, OFFSET tooManyMsg
    jmp DoError
@@L1:
    mov [bx], di                ;Store pointer to status/filename in files[]
    mov al, [DTA_ATTRIB]        ;Store status byte
    and al, 3Fh                 ;Top bit is used to indicate file is marked
    mov si, OFFSET DTA_NAME     ;Copy file name from DTA to filename
    call CopyString
    inc di
    mov si, OFFSET DTA_TIME     ;Copy time, date and size
    mov cx, 4
    rep movsw
    mov ah, 4Fh                 ;Next filename
    int 21h
    jnc StoreFileName
    mov [last], bx              ;Save pointer to last file entry
    mov al, [keepSorted]        ;If returning from EXEC, need to resort
    or al, al
    jz DisplayFiles
    jmp Sort0

The primary problem with this program is the formatting. The label fields overlap the mnemonic fields (in almost every instance), the operand fields of the various instructions are not aligned, there are very few blank lines to organize the code, the programmer makes excessive use of "local" label names, and, although not prevalent, there are a few items that are all uppercase (remember, upper case characters are harder to read). This program also makes considerable use of "magic numbers," especially with respect to the function codes passed on to DOS.

Another subtle problem with this program is the way it organizes control flow. At a couple of points in the code it checks to see if an error condition exists (file not found and too many files processed). If no error exists, the code branches around some error handling code that the author places in the middle of the routine. Unfortunately, this interrupts the flow of the program. Most readers will want to see a straight-line version of the program's typical operation without having to worry about details concerning error conditions. Unfortunately, the organization of this code is such that the reader must skip over seldom-executed code in order to follow what is happening with the common case[4].

Here is a slightly improved version of the above program:

;Get all file names matching filespec and set up tables

GetFileRecords: mov     dx, offset DTA          ;Set up DTA
                DOS     SetDTA

; Get the first file that matches the specified filename (that may
; contain wildcard characters).  If no such file exists, then
; we've got an error.

                mov     dx, FileSpec
                mov     cl, 37h
                DOS     FindFirstFile
                jc      FileNotFound

; As long as there are more files matching our file spec (that contains
; wildcard characters), get the file information and place it in the
; "files" array.  Each time through the "StoreFileName" loop we've got
; a new file name via a call to DOS' FindNextFile function (FindFirstFile
; for the first iteration).  Store the info concerning the file away and
; move on to the next file.

                mov     di, offset fileRecords  ;DI -> storage for file names
                mov     bx, offset files        ;BX -> array of files
                sub     bx, 2                   ;Special case for 1st iteration
StoreFileName:  add     bx, 2
                cmp     bx, (offset files) + NFILES*2
                jae     TooManyFiles

; Store away the pointer to the status/filename in files[] array.
; Note that the H.O. bit of the status byte indicates that the file
; is marked.

                mov     [bx], di                ;Store pointer in files[]
                mov     al, [DTAattrib]         ;Store status byte
                and     al, 3Fh                 ;Clear file is marked bit

; Copy the filename from the DTA storage area to the space we've set aside.

                mov     si, offset DTAname
                call    CopyString
                inc     di                      ;Skip zero byte (???).

                mov     si, offset DTAtime      ;Copy time, date and size
                mov     cx, 4
        rep     movsw

; Move on to the next file and try again.

                DOS     FindNextFile
                jnc     StoreFileName

; After processing the last file entry, do some clean up.
; (1) Save pointer to last file entry.
; (2) If returning from EXEC, we may need to resort and display the files.

                mov     [last], bx
                mov     al, [keepSorted]
                or      al, al
                jz      DisplayFiles
                jmp     Sort0

; Jump down here if there were no files to process.

FileNotFound:   mov     si, offset NoFilesMsg
                call    Error
                jmp     NewFilespec

; Jump down here if there were too many files to process.

TooManyFiles:   sub     bx, 2
                mov     [last], bx
                mov     si, offset tooManyMsg
                jmp     DoError

This improved version dispenses with the local labels and formats the code better by aligning all the statement fields and inserting blank lines into the code. It also eliminates many of the uppercase characters appearing in the previous version. Another improvement is that this code moves the error handling code out of the main stream of this code segment, allowing the reader to follow the typical execution in a more linear fashion.
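
The improved listing relies on a "DOS" macro and on symbolic names for the INT 21h function codes, neither of which appears in the excerpt. The following is only a sketch of how they might be defined. The macro body and its parameter name (dosFunc) are assumptions made for illustration; the three function numbers are the ones the original S.COM code loads into AH.

```asm
; Symbolic names for the INT 21h function codes used in the listing
; above (values taken from the original S.COM code).

SetDTA          equ     1Ah             ;Set disk transfer address.
FindFirstFile   equ     4Eh             ;Find first matching file.
FindNextFile    equ     4Fh             ;Find next matching file.

; DOS-
; Hypothetical macro (an assumption, not shown in the source):
; loads the requested function code into AH and calls DOS.

DOS             macro   dosFunc
                mov     ah, dosFunc
                int     21h
                endm
```

With definitions along these lines, a statement such as "DOS FindNextFile" expands to the same two instructions the original program wrote with the literal 4Fh, but the name tells the reader which DOS service is being requested.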

1.4 Intended Audience

Of course, an assembly language program is going to be nearly unreadable to someone who doesn't know assembly language. This is true for almost any programming language. In the examples above, it's doubtful that the "improved" versions are really any more readable than the original version if you don't know 80x86 assembly language. Perhaps the improved versions are more aesthetic in a generic sense, but if you don't know 80x86 assembly language it's doubtful you'd make any more sense of the second version than the first. Other than burying a tutorial on 80x86 assembly language in a program's comments, there is no way to address this problem[5].

In view of the above, it makes sense to define an "intended audience" that we intend to have read our assembly language programs. Such a person should:

  • Be a reasonably competent 80x86 assembly language programmer.
  • Be reasonably familiar with the problem the assembly language program is attempting to solve.
  • Fluently read English[6].
  • Have a good grasp of high level language concepts.
  • Possess appropriate knowledge for someone working in the field of Computer Science (e.g., understands standard algorithms and data structures, understands basic machine architecture, and understands basic discrete mathematics).

1.5 Readability Metrics

One has to ask "What is it that makes one program more readable than another?" In other words, how do we measure the "readability" of a program? The usual metric, "I know a well-written program when I see one" is inappropriate; for most people, this translates to "If your programs look like my better programs then they are readable, otherwise they are not." Obviously, such a metric is of little value since it changes with every person.

To develop a metric for measuring the readability of an assembly language program, the first thing we must ask is "Why is readability important?" This question has a simple (though somewhat flippant) answer: Readability is important because programs are read (furthermore, a line of code is typically read ten times more often than it is written). To expand on this, consider the fact that most programs are read and maintained by other programmers (Steve McConnell claims that up to ten generations of maintenance programmers work on a typical real-world program before it is rewritten; furthermore, they spend up to 60% of their effort on that code simply figuring out how it works). The more readable your programs are, the less time these other people will have to spend figuring out what your program does. Instead, they can concentrate on adding features or correcting defects in the code.

For the purposes of this document, we will define a "readable" program as one that has the following trait:

  • A "readable" program is one that a competent programmer (one who is familiar with the problem the program is attempting to solve) can pick up, without ever having seen the program before, and fully comprehend the entire program in a minimal amount of time.

That's a tall order! This definition doesn't sound very difficult to achieve, but few non-trivial programs ever really achieve this status. This definition suggests that an appropriate programmer (i.e., one who is familiar with the problem the program is trying to solve) can pick up a program, read it at their normal reading pace (just once), and fully comprehend the program. Anything less is not a "readable" program.

Of course, in practice, this definition is unusable since very few programs reach this goal. Part of the problem is that programs tend to be quite long and few human beings are capable of managing a large number of details in their head at one time. Furthermore, no matter how well-written a program may be, "a competent programmer" does not suggest that the programmer's IQ is so high they can read a statement and fully comprehend its meaning without expending much thought. Therefore, we must define readability, not as a boolean entity, but as a scale. Although truly unreadable programs exist, there are many "readable" programs that are less readable than other programs. Therefore, perhaps the following definition is more realistic:

  • A readable program is one that consists of one or more modules. A competent programmer should be able to pick a given module in that program and achieve an 80% comprehension level by expending no more than an average of one minute for each statement in the program.

An 80% comprehension level means that the programmer can correct bugs in the program and add new features to the program without making mistakes due to a misunderstanding of the code at hand.

1.6 How to Achieve Readability

The "I'll know one when I see one" metric for readable programs provides a big hint concerning how one should write programs that are readable. As pointed out earlier, the "I'll know it when I see it" metric suggests that an individual will consider a program to be readable if it is very similar to (good) programs that this particular person has written. This suggests an important trait that readable programs must possess: consistency. If all programmers were to write programs using a consistent style, they'd find programs written by others to be similar to their own, and, therefore, easier to read. This single goal is the primary purpose of this paper - to suggest a consistent standard that everyone will follow.

Of course, consistency by itself is not good enough. Consistently bad programs are not particularly easy to read. Therefore, one must carefully consider the guidelines to use when defining an all-encompassing standard. The purpose of this paper is to create such a standard. However, don't get the impression that the material appearing in this document appears simply because it sounded good at the time or because of some personal preferences. The material in this paper comes from several software engineering texts on the subject (including Elements of Programming Style, Code Complete, and Writing Solid Code), nearly 20 years of personal assembly language programming experience, and a set of generic programming guidelines developed for Information Management Associates, Inc.

This document assumes consistent usage by its readers. Therefore, it concentrates on a lot of mechanical and psychological issues that affect the readability of a program. For example, uppercase letters are harder to read than lower case letters (this is a well-known result from psychology research). It takes longer for a human being to recognize uppercase characters; therefore, an average human being will take more time to read text written all in upper case. Hence, this document suggests that one should avoid the use of uppercase sequences in a program. Many of the other issues appearing in this document are in a similar vein; they suggest minor changes to the way you might write your programs that make it easier for someone to recognize some pattern in your code, thus aiding in comprehension.

1.7 How This Document is Organized

This document follows a top-down discussion of readability. It starts with the concept of a program. Then it discusses modules. From there it works its way down to procedures. Then it talks about individual statements. Beyond that, it talks about components that make up statements (e.g., instructions, names, and operators). Finally, this paper concludes by discussing some orthogonal issues.

Section Two discusses programs in general. It primarily discusses documentation that must accompany a program and the organization of source files. It also discusses, briefly, configuration management and source code control issues. Keep in mind that figuring out how to build a program (make, assemble, link, test, debug, etc.) is important. If your reader fully understands the "heapsort" algorithm you are using, but cannot build an executable module to run, they still do not fully understand your program.

Section Three discusses how to organize modules in your program in a logical fashion. This makes it easier for others to locate sections of code and organizes related sections of code together so someone can easily find important code and ignore unimportant or unrelated code while attempting to understand what your program does.

Section Four discusses the use of procedures within a program. This is a continuation of the theme in Section Three, although at a lower, more detailed, level.

Section Five discusses the program at the level of the statement. This (large) section provides the meat of this proposal. Most of the rules this paper presents appear in this section.

Section Six discusses those items that make up a statement (labels, names, instructions, operands, operators, etc.). This is another large section that presents a large number of rules one should follow when writing readable programs. This section discusses naming conventions, appropriateness of operators, and so on.

Section Seven discusses data types and other related topics.

Section Eight covers miscellaneous topics that the previous sections did not cover.

1.8 Guidelines, Rules, Enforced Rules, and Exceptions

Not all rules are equally important. For example, a rule that you check the spelling of all the words in your comments is probably less important than suggesting that the comments all be in English[7]. Therefore, this paper uses three designations to keep things straight: Guidelines, Rules, and Enforced Rules.

A Guideline is a suggestion. It is a rule you should follow unless you can verbally defend why you should break the rule. As long as there is a good, defensible reason, you should feel no apprehension about violating a Guideline. Guidelines exist in order to encourage consistency in areas where there are no good reasons for choosing one methodology over another. You shouldn't violate a Guideline just because you don't like it -- doing so will make your programs inconsistent with respect to other programs that do follow the Guideline (and, therefore, harder to read) -- however, you shouldn't lose any sleep because you violated a Guideline.

Rules are much stronger than Guidelines. You should never break a rule unless there is some external reason for doing so (e.g., making a call to a library routine forces you to use a bad naming convention). Whenever you feel you must violate a rule, you should verify that it is reasonable to do so in a peer review with at least two peers. Furthermore, you should explain in the program's comments why it was necessary to violate the rule. Rules are just that -- rules to be followed. However, there are certain situations where it may be necessary to violate the rule in order to satisfy external requirements or even make the program more readable.

Enforced Rules are the toughest of the lot. You should never violate an enforced rule. If there is ever a true need to do this, then you should consider demoting the Enforced Rule to a simple Rule rather than treating the violation as a reasonable alternative.

An Exception is exactly that, a known example where one would commonly violate a Guideline, Rule, or (very rarely) Enforced Rule. Although exceptions are rare, the old adage "Every rule has its exceptions..." certainly applies to this document. The Exceptions point out some of the common violations one might expect.

Of course, the categorization of Guidelines, Rules, Enforced Rules, and Exceptions herein is one man's opinion. At some organizations, this categorization may require reworking depending on the needs of that organization.

1.9 Source Language Concerns

This document will assume that the entire program is written in 80x86 assembly language. Although this organization is rare in commercial applications, this assumption will, in no way, invalidate these guidelines. Other guidelines exist for various high level languages (including a set written by this paper's author). You should adopt a reasonable set of guidelines for the other languages you use and apply these guidelines to the 80x86 assembly language modules in the program.
