The commonly published approach to using the bit banding feature of the Cortex Mx family of processors is to use macros – see Bit Banding in the STM32. This post describes an alternate implementation that uses a dedicated RAM section for bit banding.
When working on the diagonal solver, it became apparent that to make the solver run quickly, and to save RAM, on the Cortex M3 I needed to use bit banding with boolean variables.
The way bit banding works is that there are two address ranges that access the same memory location. In one address range, you access it “32 bits at a time”. Through the second address range, the bit banding “alias”, you access the same memory location one bit at a time and with a 32 bit stride. By 32 bit stride I mean that the addresses of sequential bits are separated by 32 bits (4 bytes), so each bit appears as its own 32 bit word.
This picture from the Cortex Mx manual graphically shows what is going on.
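The address mapping described above can be sketched in a few lines of C. This is only an illustration: the helper name `bb_alias_addr` and the macro names are made up for this example, and the base addresses assume the standard Cortex M3 SRAM bit band region (0x20000000) and alias (0x22000000) used throughout this post.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base addresses of the SRAM bit band region and its alias. */
#define SRAM_BB_BASE    0x20000000u
#define SRAM_ALIAS_BASE 0x22000000u
#define SRAM_BB_SIZE    0x00100000u  /* the bit band region covers 1 MB */

/* Hypothetical helper: return the alias address of bit `bit` (0..7)
 * of the byte at `byte_addr` inside the SRAM bit band region. */
static uint32_t bb_alias_addr(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_BASE);
    assert(byte_addr - SRAM_BB_BASE < SRAM_BB_SIZE);
    assert(bit < 8u);

    uint32_t byte_offset = byte_addr - SRAM_BB_BASE;

    /* Each byte holds 8 bits and each bit gets one 32 bit word in the
     * alias, so a byte spans 32 alias bytes and a bit spans 4. */
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For example, bit 1 of the byte at 0x20000000 lands at 0x22000004, one word after bit 0.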
Typical C compilers and linkers only know how to access memory from the “32 bits at a time” address range. So, what we have to do is tell the C compiler about the second way to access the memory i.e. through the bit band alias.
A C compiler calls various portions of a program – code, uninitialized data, initialized data – sections. The linker is invoked with a specification file which it uses to place these sections at the specified locations. To get more details on this, do a search on GCC, linker and section.
I use “CrossWorks for ARM” and they do the section work using two files. The first file is processor specific and called “
So, to create the bit band alias, in the MemoryMap file, we reduce the RAM 32 bit access range by the amount we are going to use for the bit band alias.
The original RAM statement was:
<memorysegment size="0x10000" name="RAM" start="0x20000000"> </memorysegment>
After adding the bit band alias, it looks like this:
<memorysegment size="0x00100" name="BBRAM" start="0x20000000" access="Read/Write"> </memorysegment>
<memorysegment size="0x02000" name="BBALIAS" start="0x22000000" access="Read/Write"> </memorysegment>
<memorysegment size="0x0ff00" name="RAM" start="0x20000100"> </memorysegment>
This creates a 256 byte segment (BBRAM) at the start of physical RAM for the bit band variables. The 256 byte segment results in a 256 * 32 = 8,192 byte BBALIAS segment because each bit in the BBRAM segment maps to 32 bits in the BBALIAS segment.
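As a sanity check on the numbers in the MemoryMap entries above, here is a small sketch that recomputes them. The macro and function names are hypothetical and exist only for this example; the values are copied from the three segments.

```c
#include <stdint.h>

/* Values copied from the BBRAM, BBALIAS and RAM segments above. */
#define BBRAM_START   0x20000000u
#define BBRAM_SIZE    0x00100u
#define BBALIAS_START 0x22000000u
#define BBALIAS_SIZE  0x02000u
#define RAM_START     0x20000100u
#define RAM_SIZE      0x0ff00u
#define TOTAL_RAM     0x10000u   /* size of the original RAM segment */

/* Alias bytes needed for `region_bytes` of bit band RAM:
 * 8 bits per byte, one 4 byte word per bit. */
static uint32_t alias_bytes_for(uint32_t region_bytes)
{
    return region_bytes * 8u * 4u;
}

/* Returns 1 when the three segments are consistent with each other. */
static int memory_map_is_consistent(void)
{
    return alias_bytes_for(BBRAM_SIZE) == BBALIAS_SIZE   /* 256 * 32 = 8192 */
        && BBRAM_START + BBRAM_SIZE == RAM_START         /* RAM follows BBRAM */
        && BBRAM_SIZE + RAM_SIZE == TOTAL_RAM;           /* nothing lost */
}
```

If you change the size of BBRAM, the BBALIAS size and the RAM start and size all have to change with it.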
NOTE: One thing to keep in mind with putting the bit band alias at the start of RAM is that if you relocate the vector table to RAM, the vector table address has very specific alignment requirements that must be met – don’t ask how I know.
Next, we add the following statements to the placement file:
<memorysegment name="BBRAM">
  <programsection name="bbram"></programsection>
</memorysegment>
<memorysegment name="BBALIAS">
  <programsection name="bbalias"></programsection>
</memorysegment>
To place a variable in the bit band alias section, declare it as follows:
volatile bool NewBBFlagBit1 __attribute__ ((section ("bbalias")));
By using the attribute command, we tell the C compiler to place the variable NewBBFlagBit1 in the section called bbalias. The linker then resolves the bbalias to the BBALIAS address range.
NOTE: This scheme does have one limitation. The GCC compiler system requires that any variable placed in a section using the attribute schema have static storage, so it cannot be an ordinary (automatic) local variable. If you need the variable only within a routine, you can make it a static local variable. The example below shows how to do this.
NOTE: Since this is a bit band variable, you can access it either through the BBALIAS address range or through the BBRAM address range.
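To work out which BBRAM byte and bit a given alias variable shares, you can run the mapping in reverse. This is a minimal sketch; the `bb_location` type and the `bb_location_of` helper are hypothetical names for this example, and the base addresses are the BBRAM and BBALIAS start addresses from the MemoryMap entries above.

```c
#include <assert.h>
#include <stdint.h>

#define BBRAM_START   0x20000000u
#define BBALIAS_START 0x22000000u

/* Hypothetical result type: a byte address in BBRAM plus a bit number. */
typedef struct {
    uint32_t byte_addr;
    unsigned bit;        /* 0..7 within the byte */
} bb_location;

/* Map a word aligned alias address back to the bit it controls. */
static bb_location bb_location_of(uint32_t alias_addr)
{
    assert(alias_addr >= BBALIAS_START);
    assert((alias_addr & 3u) == 0u);   /* alias entries are 32 bit words */

    /* One alias word per bit, eight bits per BBRAM byte. */
    uint32_t word_index = (alias_addr - BBALIAS_START) / 4u;

    bb_location loc = { BBRAM_START + word_index / 8u, word_index % 8u };
    return loc;
}
```

With this mapping, an alias variable placed at 0x22000000 is bit 0 of the byte at 0x20000000, and one placed at 0x22000004 is bit 1 of that same byte.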
Now, let’s look at a simple example where we have a global bit flag, a local bit flag and look at the resultant code.
Here is the macro approach sample code:
typedef unsigned int u32;
typedef volatile unsigned int vu32;

#define RAM_BASE    0x20000000
#define RAM_BB_BASE 0x22000000

#define Var_ResetBit_BB(VarAddr, BitNumber) (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 0)
#define Var_SetBit_BB(VarAddr, BitNumber) (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 1)
#define Var_GetBit_BB(VarAddr, BitNumber) (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)))

#define varSetBit(var,bit) (Var_SetBit_BB((u32)&var,bit))
#define varGetBit(var,bit) (Var_GetBit_BB((u32)&var,bit))
#define varResetBit(var,bit) (Var_ResetBit_BB((u32)&var,bit))

vu32 OldBBFlags;

void vOldBitBand(void)
{
    static vu32 OldBBFlagsLocal;

    varResetBit(OldBBFlags,0);
    OldBBFlags = 0x01;
    if (varGetBit(OldBBFlags,0))
        varSetBit(OldBBFlagsLocal,0);
    else
        varResetBit(OldBBFlagsLocal,0);
    OldBBFlags = 0x02;
    return;
}
Here is the section approach sample code:
typedef enum {FALSE = 0, TRUE = !FALSE} bool;
typedef volatile unsigned int vu32;

volatile bool NewBBFlagBit1 __attribute__ ((section ("bbalias")));
vu32 NewBBFlags __attribute__ ((section ("bbram")));

void vNewBitBand(void)
{
    static volatile bool NewBBFlagBit2 __attribute__ ((section ("bbalias")));

    NewBBFlagBit1 = 0;
    NewBBFlags = 0x01;
    NewBBFlagBit2 = NewBBFlagBit1;
    NewBBFlags = 0x02;
    return;
}
When the macro code is compiled using GCC with optimization level 3, we get:
void vOldBitBand(void) {
static vu32 OldBBFlagsLocal;
    4B0D      ldr r3, 0x08000330
    2000      movs r0, #0
    0159      lsls r1, r3, #5
    F0415208  orr r2, r1, #0x22000000
    2101      movs r1, #1
varResetBit(OldBBFlags,0);
    6010      str r0, [r2]
OldBBFlags = 0x01;
    6019      str r1, [r3]
if (varGetBit(OldBBFlags,0))
    6812      ldr r2, [r2]
    B93A      cbnz r2, 0x0800031C
varSetBit(OldBBFlagsLocal,0);
    4909      ldr r1, 0x08000334
    0148      lsls r0, r1, #5
    F0405108  orr r1, r0, #0x22000000
    600A      str r2, [r1]
OldBBFlags = 0x02;
    2202      movs r2, #2
    601A      str r2, [r3]
return; }
    4770      bx lr
0x0800031C:
else varResetBit(OldBBFlagsLocal,0);
    4805      ldr r0, 0x08000334
    0142      lsls r2, r0, #5
    F0425C08  orr r12, r2, #0x22000000
    2202      movs r2, #2
    F8CC1000  str.w r1, [r12, #0]
OldBBFlags = 0x02;
    601A      str r2, [r3]
return; }
    4770      bx lr
    BF00      nop
0x08000330:
    0104      lsls r4, r0, #4
    2000      movs r0, #0
0x08000334:
    0100      lsls r0, r0, #4
    0000      movs r0, r0
When the section approach code is compiled using the same settings as the macro code above, we get:
--- BitBandNew.c -- 9 --------------------------------------
void vNewBitBand(void) {
static volatile bool NewBBFlagBit2 __attribute__ ((section ("bbalias")));
    F2400000  movw r0, #0
    F2400200  movw r2, #0
NewBBFlagBit1 = 0;
    F2C22000  movt r0, #0x2200
NewBBFlags = 0x01;
    F2C20200  movt r2, #0x2000
    2100      movs r1, #0
    2301      movs r3, #1
NewBBFlagBit1 = 0;
    6001      str r1, [r0]
NewBBFlags = 0x01;
    6013      str r3, [r2]
NewBBFlagBit2 = NewBBFlagBit1;
    6801      ldr r1, [r0]
    2302      movs r3, #2
NewBBFlagBit2 = NewBBFlagBit1;
    6041      str r1, [r0, #4]
NewBBFlags = 0x02;
    6013      str r3, [r2]
return; }
    4770      bx lr
    BF00      nop
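The section approach code is 34 bytes (not counting the trailing nop) vs. 64 bytes for the macro based code, which includes its two literal pool words. It should also run faster: in the listings above, the macro version computes each alias address at run time with an lsls/orr pair and has a branch, while the section version uses addresses already resolved by the linker and runs straight through.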