Zilog Z80 assembly code optimization

In general, programs can be made to run substantially faster only by first determining where they spend their time. This requires determining which loops (other than delay routines) the processor executes most often. Reducing the execution time of a frequently executed loop has a major effect because of the multiplying factor. It is thus critical to determine how often instructions are executed and to work on loops in the order of their frequency of execution.

Once you have determined which loops the processor executes most frequently, reduce their execution time with the following techniques.

Eliminate redundant operations. These may include a constant that is added during each iteration or a special case that is tested repeatedly. Another example is a constant value or a memory address that is fetched from memory each time rather than kept in a register or register pair.

Reorganize the loop to reduce the number of jump instructions. You can often eliminate branches by changing the initial conditions, inverting the order of operations, or combining operations. In particular, you may find it helpful to initialize everything one step back, thus making the first iteration the same as all the others. Inverting the order of operations can help when numerical comparisons are involved, since the equality case may not have to be handled separately. Reorganization may also combine condition checking inside the loop with the overall loop control.

If you call a function only once, use inline code rather than a function. This saves a CALL and a RET. Also, write very small functions as macros rather than as normal functions.

Try to take maximum advantage of specialized instructions such as LD HL,(ADDR); LD (ADDR),HL; EX DE,HL; EX (SP),HL; DJNZ; and the block move/compare instructions by organizing the registers in the right way.
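As a minimal sketch of the first techniques (eliminating a redundant fetch and letting DJNZ handle the loop control), here is a hypothetical routine that adds the byte stored at `offset` to each of B bytes starting at HL, written once naively and once optimized. The labels `add_slow`, `add_fast`, and `offset` are made up for illustration, the syntax is a common Zilog-style assembler syntax that may need adjusting for your assembler, and the T-state counts are for iterations in which the branch is taken.

```asm
; Hypothetical task: add the byte at 'offset' to each of B bytes starting
; at HL. B = 1..255; B = 0 processes 256 bytes (true for both versions).

; Before: the constant is re-fetched from memory on every iteration,
; and the loop control needs a separate DEC B and JR NZ.
add_slow:
    ld   a,(offset)     ; 13  redundant fetch inside the loop
    add  a,(hl)         ;  7
    ld   (hl),a         ;  7
    inc  hl             ;  6
    dec  b              ;  4
    jr   nz,add_slow    ; 12  (49 T-states per byte)
    ret

; After: the constant is loaded once into C, and DJNZ combines the
; decrement of B with the branch.
add_fast:
    ld   a,(offset)     ; load the constant once, outside the loop
    ld   c,a
add_fast_loop:
    ld   a,(hl)         ;  7
    add  a,c            ;  4
    ld   (hl),a         ;  7
    inc  hl             ;  6
    djnz add_fast_loop  ; 13  (37 T-states per byte)
    ret

offset: db 3            ; hypothetical constant, kept in memory
```

Both versions follow the register convention of B as the counter and HL as the pointer, which is what makes DJNZ usable here.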
To get the most out of these instructions, always use B or BC for a counter, HL for an indirect address, and DE for a second indirect address if needed.

Use the block move, block compare, and block I/O instructions to handle blocks of data. These instructions can replace an entire program sequence, since they combine counting and updating of pointers with the actual data manipulation or transfer operations. Note, in particular, that the block move and block I/O instructions transfer data to or from memory without using the accumulator.

Use the 16-bit instructions whenever possible to manipulate 16-bit data. These instructions are ADC, ADD, DEC, EX, INC, LD, POP, PUSH, and SBC.

Use instructions that operate directly on data in user registers or in memory to avoid having to save and restore the accumulator, HL, or an index register. These instructions include DEC, EX, INC, LD, POP, PUSH, and the bit manipulation and shift instructions.

Minimize the use of the index registers, since they always require extra execution time and memory. The index registers are generally used only as backups to HL and in handling data structures that involve many fixed offsets.

Minimize the use of special Z80 instructions that require a 2-byte operation code. These always require extra execution time and memory. Examples are BIT, RES, SET, SLA, SRA, and SRL, as well as some load instructions such as LD DE,(ADDR), LD (ADDR),BC, and LD SP,(ADDR).

Take advantage of specialized short instructions such as the accumulator rotates (RLA, RLCA, RRA, and RRCA) and DJNZ.

Use absolute jumps (JP) rather than relative jumps (JR). The absolute jumps take less time if a branch actually occurs: JP always takes 10 T-states, while JR takes 12 when the branch is taken (7 when it is not). JR, however, is one byte shorter.

Organize sequences of conditional jumps to minimize average execution time.
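To illustrate the block-move advice, here is a sketch that copies BC bytes from (HL) to (DE), first with an explicit loop and then with LDIR. The routine names are hypothetical. The loop version uses JP rather than JR, as recommended above, and shows why the explicit version is so costly: a 16-bit DEC BC sets no flags, so BC has to be tested through A.

```asm
; Copy BC bytes from (HL) to (DE). BC must be non-zero:
; both versions treat BC = 0 as 65536 bytes.

; Before: explicit loop; it uses A and tests BC by hand.
copy_slow:
    ld   a,(hl)         ;  7
    ld   (de),a         ;  7
    inc  hl             ;  6
    inc  de             ;  6
    dec  bc             ;  6  16-bit DEC does not affect the flags...
    ld   a,b            ;  4
    or   c              ;  4  ...so test B OR C explicitly
    jp   nz,copy_slow   ; 10  (50 T-states per byte)
    ret

; After: one block-move instruction performs the load, the store,
; both pointer increments, and the counter test, without touching A.
copy_fast:
    ldir                ; 21 T-states per byte (16 for the last byte)
    ret
```

Besides being more than twice as fast per byte, LDIR leaves A untouched, which matters when the accumulator holds a live value.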
In a sequence of conditional jumps, put the branches that are often taken before those that are seldom taken. For example, check for a result being negative (true 50% of the time if the value is random) before checking for it being zero (true less than 1% of the time if the value is random).

Test for conditions under which a sequence has no effect, and branch around it if the conditions hold. This is profitable if the sequence is long and it frequently does not change the result. A typical example is the propagation of carries through higher-order bytes. If a carry seldom occurs, it will be faster on average to test for it than to always propagate a carry that is usually 0.

A general way to reduce execution time is to replace long sequences of instructions with tables. A single table lookup can perform the same operation as a sequence of instructions if there are no special exits or program logic involved. The cost is extra memory, but that may be justified if the memory is available. If enough memory is available, a lookup table may be a reasonable approach even if many of its entries are repetitive, that is, even if many inputs produce the same output. In addition to its speed, table lookup is also general, easy to program, and easy to change.

Now for an even more practical approach: the fewer bytes an instruction uses, the faster it generally executes. So always look for a better way to do things. Note, however, that the shorter form may come with some disadvantages. Here are some examples:

Instead of ...      You write ...        Disadvantages?
ld a,0              sub a  (or xor a)    flags are modified
cp 0                and a  (or or a)     none
cp 1                dec a                A is modified, carry not affected
cp 255              inc a                A is modified, carry not affected
srl a               rrca                 not exactly the same effect (bit 0 is
                                         rotated into bit 7 instead of a 0)

ld hl,...           ld hl,...            zero flag not affected
ld de,...           ld de,-...
or a                add hl,de
sbc hl,de

dec bc              cpi                  increments HL
ld a,b              ret po
or c
ret z

Try, if possible, to use the shadow registers in frequently used loops. You can access them with the instructions EXX and EX AF,AF'.
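A sketch of a loop that works entirely in the shadow registers, with interrupts disabled around it, might look like this. It sums a buffer into a 16-bit total while the caller's BC, DE, and HL stay intact. It assumes interrupts were enabled on entry, that `buffer`, `buf_len`, and `total` are hypothetical names located in RAM, and an assembler that accepts `equ`, `db`, and `dw`. Because EXX swaps only BC, DE, and HL, the routine still changes A and F; add EX AF,AF' if those must survive as well.

```asm
; Sum buf_len bytes starting at 'buffer' into the 16-bit word 'total',
; using BC', DE', HL' so the caller's BC, DE, HL are left untouched.
; All names are hypothetical. A and F are NOT preserved (EXX skips AF).
sum_buffer:
    di                  ; an interrupt handler may use the shadow set
    exx                 ; switch to BC', DE', HL'
    ld   de,buffer      ; DE' = source pointer
    ld   hl,0           ; HL' = running total
    ld   b,buf_len      ; B' = byte count (1..255; 0 means 256)
sum_loop:
    ld   a,(de)         ; next byte
    add  a,l            ; add it to the low byte of the total
    ld   l,a
    jp   nc,no_carry    ; branch around INC H when no carry occurs
    inc  h
no_carry:
    inc  de
    djnz sum_loop       ; DJNZ uses the current B, which is now B'
    ld   (total),hl     ; store HL' before switching back
    exx                 ; restore the caller's BC, DE, HL
    ei                  ; assumes interrupts were enabled on entry
    ret

buf_len equ 4
buffer: db 10, 20, 250, 5
total:  dw 0
```

Keeping the pointer and the total in the alternate set means the main registers can hold, for example, values that the caller still needs afterwards, without any PUSH/POP inside the loop.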
When using the shadow registers, execute DI to disable interrupts before the routine and EI afterwards, since an interrupt handler may use the shadow registers itself. Avoid keeping interrupts switched off all the time.

Pass arguments to functions in registers, NOT by PUSHing/POPing them or, worse, through variables in memory!

Keep frequently used variables, like the position of the main character, in registers (ideally in the registers you have to pass to the draw function as coordinates).

Avoid excessive mode switching with the IY register. Each switch costs 4 bytes!

Take advantage of ROM/built-in functions as often as possible. Not only does this save you from coding them, but these routines also save storage and are likely to be highly optimized.

If possible, use self-modifying code (only possible when the code is in RAM) for smaller code and faster execution.

In certain circumstances (where you can disable all interrupts and don't have to make any calls or other use of the stack), an extremely quick way of retrieving data from memory is to set the stack pointer SP to the start of the data and then POP bytes into a register pair. POP HL takes 10 T-states and fetches 2 bytes at once, while LD HL,(nnn) takes 16 T-states, so POP is about 1.6 times as fast, and it works especially well for reading contiguous buffer data. Compared with using LD HL,(...) and then also having to update the load address, POP is several times faster, because it advances SP by 2 on its own. Save SP beforehand and restore it afterwards (for example with LD (ADDR),SP and LD SP,(ADDR)). (This of course also works with PUSH to store data; since PUSH decrements SP first, it fills memory downward from the end of the buffer.)

Author: unknown TI-calculator coder