FAQ-0645 How do I embed ARM assembler in a THUMB executable?


Classification:	C++	Category:	Development
Created:	10/11/99	Modified:	09/19/2001
Number:	FAQ-0645
Platform:	Symbian OS v6.0

Question:
I have some code which contains ARM assembler and I need it to compile in the THUMB build. I don't have the time or
expertise to convert the algorithm to use THUMB assembler, so is there some way to switch into ARM mode for the
duration of the function?

Answer:
The GCC assembler can operate in both THUMB mode and ARM mode, so the main trick is to make the processor change mode, which is done with the BX instruction. BEWARE: the ARM documentation says that you can't use "BX PC" to achieve this!

Consider this example function:

__declspec(naked) int sysThreadCheckStack(long /* red_zone*/)
{
__asm("and r0,sp,#0xff000"); // extract bits 12-19 (0x00ff000)
#ifdef __MARM_ARMI__
__asm("bx lr"); // and return, allowing for interworking
#else
__asm("mov pc,lr"); // and return
#endif
}
This particular example is so simple it would be better to convert it to THUMB directly, but ignoring that, the changes needed are:

__declspec(naked) int sysThreadCheckStack(long /* red_zone*/)
{
#ifdef __MARM_THUMB__
__asm(" push {r0}"); // 1
__asm(" add r0,pc,#4"); // 2
__asm(" bx r0"); // 3
__asm(" nop"); // 4
__asm(" .align 2"); // 5
__asm(" .code 32"); // 6
__asm(" ldr r0,[sp],#4"); // 7
#endif

__asm("and r0,sp,#0xff000"); // extract bits 12-19 (0x00ff000)
#if defined(__MARM_ARMI__)||defined(__MARM_THUMB__)
__asm("bx lr"); // and return, allowing for interworking
#else
__asm("mov pc, lr");
#endif

#ifdef __MARM_THUMB__
__asm(" .code 16"); // 8
#endif
}
The __MARM_THUMB__ define is used by MAKMAKE to indicate a THUMB build, so the new code is protected by a suitable #ifdef section, and the conditional use of BX for the return is extended to include THUMB as well as ARMI. The additional lines do the following things, not all of which are particularly obvious:

1. Save r0 on the stack, so that we can use it as a scratch register for the BX
2. Compute the address of the first ARM instruction: the pc is currently pointing at the following "nop" instruction.
3. Transfer to the ARM code, switching into ARM mode as we go. The low 2 bits of R0 will be ignored.
4. Padding so that the rounding of R0 has the right effect
5. Directive to align the following instruction to a multiple of 4 bytes: this either does nothing or puts out 2 zero bytes
6. Directive to the assembler that it should now process ARM instructions rather than THUMB instructions
7. Pop R0 from the stack - this is the first instruction executed in ARM mode.
8. Directive to the assembler that it should process THUMB instructions, so that subsequent functions are compiled correctly.
The really tricky bit is lines 3-5. There are two cases to consider, distinguished by whether or not line 1 falls on a multiple of 4 bytes:

Case 1: instruction 1 is at address 4N+0

4N+0 push {r0}
4N+2 add r0,pc,#4 // pc=4N+6, so r0 = 4N+A
4N+4 bx r0 // jumps to ARM mode at 4N+8 because the low 2 bits are ignored
4N+6 nop
4N+8 .align 2 // already aligned, so does nothing
4N+8 ldr r0,[sp],#4
4N+C
Case 2: instruction 1 is at address 4N+2

4N+2 push {r0}
4N+4 add r0,pc,#4 // pc=4N+8, so r0 = 4N+C
4N+6 bx r0 // jumps to ARM mode at 4N+C
4N+8 nop
4N+A .align 2 // not aligned, so 2 bytes of zeros
4N+C ldr r0,[sp],#4
Thus it works in both cases, and would even be suitable for use in mixed C++ and assembler functions.
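The two cases can also be checked mechanically. The following is a minimal host-side sketch in plain C++, not Symbian code: it only models the address arithmetic described above and executes no ARM or THUMB instructions. The names SwitchLayout, align4, model_switch and check_both_cases are hypothetical helpers invented for this illustration, and the start addresses 0x8000 and 0x8002 are arbitrary examples of 4N+0 and 4N+2.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the mode-switch sequence; hypothetical helper names.
struct SwitchLayout {
    std::uint32_t bx_target;  // address that "bx r0" transfers to in ARM mode
    std::uint32_t arm_entry;  // address of "ldr r0,[sp],#4" after ".align 2"
};

// Round addr up to the next multiple of 4, which is what ".align 2" does.
inline std::uint32_t align4(std::uint32_t addr) {
    return (addr + 3u) & ~3u;
}

// start is the address of "push {r0}"; every THUMB instruction is 2 bytes.
inline SwitchLayout model_switch(std::uint32_t start) {
    const std::uint32_t add_addr = start + 2u;    // "add r0,pc,#4"
    const std::uint32_t pc = add_addr + 4u;       // pc points at the "nop"
    const std::uint32_t r0 = pc + 4u;             // result of the add
    const std::uint32_t after_nop = start + 8u;   // push, add, bx, nop
    SwitchLayout layout;
    layout.bx_target = r0 & ~3u;                  // low 2 bits ignored by bx
    layout.arm_entry = align4(after_nop);         // padding inserted if needed
    return layout;
}

// Check that the branch lands on the first ARM instruction in both cases.
inline void check_both_cases() {
    const SwitchLayout case1 = model_switch(0x8000u);  // 4N+0
    const SwitchLayout case2 = model_switch(0x8002u);  // 4N+2
    assert(case1.bx_target == case1.arm_entry);
    assert(case2.bx_target == case2.arm_entry);
}
```

Calling check_both_cases() from a test program passes only if the computed branch target equals the address of the first ARM instruction for both alignments.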

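Note that the THUMB instruction "add rN, pc, #immediate" also clears the low 2 bits of the PC before adding the immediate, so the value in r0 is already a multiple of 4 in both cases. This does not change the outcome: the target address is the same one that the masking implied in the BX instruction would produce.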
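To see why the masking in the add makes no difference to the result, the two ways of computing r0 can be compared side by side. This is again only a host-side arithmetic sketch in plain C++, with hypothetical function names, assuming that the THUMB pc reads as the instruction address plus 4 and that BX ignores the low 2 bits as stated in step 3.

```cpp
#include <cassert>
#include <cstdint>

// "add r0,pc,#4" as the case tables treat it: pc used without masking.
inline std::uint32_t add_pc_unmasked(std::uint32_t add_addr) {
    return (add_addr + 4u) + 4u;
}

// The same instruction with the low 2 bits of pc cleared before the add.
inline std::uint32_t add_pc_masked(std::uint32_t add_addr) {
    return ((add_addr + 4u) & ~3u) + 4u;
}

// Target of "bx r0" into ARM mode, with the low 2 bits of r0 ignored.
inline std::uint32_t bx_arm_target(std::uint32_t r0) {
    return r0 & ~3u;
}

// True if both models reach the same ARM address for this add address.
inline bool same_target(std::uint32_t add_addr) {
    return bx_arm_target(add_pc_unmasked(add_addr)) ==
           bx_arm_target(add_pc_masked(add_addr));
}

// Check the add at 4N+2 (case 1) and at 4N+4 (case 2).
inline void check_masking_is_harmless() {
    assert(same_target(0x8002u));
    assert(same_target(0x8004u));
}
```

In this model the masked form already yields a word-aligned r0, so the rounding done by BX has nothing left to do.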

In circumstances where it's appropriate to include \epoc32\include\kernel\u32std.h, there are macros __SWITCH_TO_ARM and __END_ARM which provide the above code, including the appropriate #ifdef protection.