Tumblr Syntax Highlighting

The code examples in my previous post look horrible, even allowing for the fact it’s assembly language. I’ll investigate getting better layout and syntax highlighting on Tumblr for next time.

Arduino and AVR Assembly Language

Frustrated by vague timing problems using the standard libraries, I started playing around with AVR assembly language on the Nanode today. Because the Arduino toolchain is gcc-based, inline assembly turns out to be quite easy to use:

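void unio_set_scio(bool value) {
  // 8 cycles for low or high, including the register load.
  // 4 cycles before the bit change, 2 for the change itself, 2 after.
  __asm__ __volatile__ (
    "sbrs %[value],0"            "\n\t" // skip the next instruction if bit 0 of value is set
    "rjmp low_%="                "\n\t" // value is 0: go and clear the pin
    "nop"                        "\n\t" // value is 1: pad so both paths reach the pin change together
    "sbi %[port],7"              "\n\t" // drive bit 7 of the port high
    "rjmp done_%="               "\n\t"
    "low_%=: cbi %[port],7"      "\n\t" // drive bit 7 of the port low
    "cbi %[port],7"              "\n\t" // second time because it's smaller than 2 nops
    "done_%=:"                   "\n\t"
    : // no outputs
    : [value] "r" (value),
      [port] "I" (_SFR_IO_ADDR(PORTD))
  );
}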

In search of symmetry

What I was after was a function with symmetrical timing: that is, it takes exactly the same time whether it’s setting an I/O line high or low. It always takes exactly 8 CPU cycles: the I/O line changes after 6 cycles, and 2 more cycles pass before execution continues with whatever code follows.

If you use the typical code:

if (value) {
  digitalWrite(7, HIGH);
} else {
  digitalWrite(7, LOW);
}

you get lots of wasted time while digitalWrite() looks up the I/O address for the port, etc. If you eliminate that by accessing the port directly:

if (value) {
  PORTD |= (1<<7);
} else {
  PORTD &= ~(1<<7);
}

you get much smaller code, but the timing is iffy. The compiler produces something like this (variable setup and comparison omitted here as it’s the same for both paths):

breq .+4    ; [2/1] if zero, jump to clear bit
sbi 0x0b, 7 ; [2]   set DIG7 high
rjmp .+2    ; [2]   jump to continue code
cbi 0x0b, 7 ; [2]   set DIG7 low
...(code continues)

When the line is to be set low, it takes 2+2 = 4 cycles (branch true, cbi). When the line is to be set high, it takes 1+2+2 = 5 cycles (branch false, sbi, jump). This imbalance is annoying when you’re trying to time signals for bit-banging, as the error becomes cumulative and depends on the data. The duration of each path in the code above could be made to match by inserting one extra cycle with a nop:

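breq .+4    ; [2/1] if zero, jump to the nop
sbi 0x0b, 7 ; [2]   set DIG7 high
rjmp .+4    ; [2]   jump to continue code
nop         ; [1]   pad the low path by one cycle
cbi 0x0b, 7 ; [2]   set DIG7 low
...(code continues)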

Now, when the line is to be set high: no branch[1]+sbi[2]+jump[2] = 5 cycles. When it is to be set low: branch[2]+nop[1]+cbi[2] = 5 cycles.

However, there’s still a problem! While the code now takes the same time for both high and low, the signal timing is still flawed. The I/O line changes on completion of the cbi or sbi instruction. When setting high, this is after 3 cycles (no branch[1]+sbi[2]), but when setting low it’s after 5 (branch[2]+nop[1]+cbi[2]). We could swap the nop and cbi instructions to bring that down to 4, but it’s still off by a cycle.

The low path always spends 2 cycles on the taken branch before its cbi can start, so the high path needs to spend 2 cycles before its sbi as well: we add a nop in front of the sbi. We must also add one to the low path to keep the two totals equal:

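breq .+6    ; [2/1] if zero, jump to clear bit
nop         ; [1]   pad so sbi starts after 2 cycles
sbi 0x0b, 7 ; [2]   set DIG7 high
rjmp .+6    ; [2]   jump to continue code
cbi 0x0b, 7 ; [2]   set DIG7 low
nop         ; [1]   pad to match the rjmp on the high path
nop         ; [1]
...(code continues)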

Now, the high path is: no branch[1]+nop[1]+sbi[2]+jump[2] = 6 cycles, with the I/O line changing after 4. The low path is: branch[2]+cbi[2]+nop[1]+nop[1] = 6 cycles, with the I/O line changing after 4. The pair of nop instructions in the low path could be replaced with another cbi instruction: clearing a bit that is already clear changes nothing, so the second cbi acts as a 2-cycle nop, but it’s one instruction word instead of two and saves two bytes of precious program memory! :)

If you’re interested in this, the Inline Assembler Cookbook and the AVR Instruction Set Manual are both very useful.

Nanode MAC addresses

I spent a while today playing with code for my new Nanode to read the MAC address from the tiny EEPROM on the underside of the board.

The EEPROM is a Microchip 11AA02E48 and fits 256 bytes into a SOT23 package. That’s tiny, as you can see below.

The EEPROM uses the UNI/O bus, which I hadn’t used before. I couldn’t find any sample code for the Arduino, so I wrote something simple myself. This code basically works but I’m not happy with the timing. Next on the list: delve into AVR assembly language. :)

Tumblr.

I’ve not been posting anything anywhere lately. I intend to change that… :)