
Don’t be scared of hardware-only bugs

Everyone has horror stories
 full of embellished gory details.
They are like veterans talking about war wounds!
Lack of documentation about tools and techniques
further mystifies the black art.
There will be some difficult hardware-only bugs,
but the majority are quite easy to make progress on using systematic methods.

So

Let's start with what you know already
The emulator and Metrowerks CodeWarrior

Using the emulator

What happens when a thread panics?
A breakpoint is hit, causing the emulator to stop at the line that caused the failure.
A source-level call stack is shown.
Objects and variables in all functions of the call stack can be examined.
You are spoilt! Always try to reproduce problems on the emulator first - it is a good debugging environment.

What does a panic look like?

Find the line of code which calls User::Panic

What does a panic look like?

Or an access violation,
for example trying to call Cancel() on a NULL pointer

Tips

Make sure Just In Time debugging is enabled
Set the following registry value:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
"UserDebuggerHotKey"=dword:00000000
"Debugger"="\"C:\\apps\\Metrowerks\\bin\\IDE.exe\" -p %ld -e %ld"
"Auto"="0"
Also ensure that the following setting is removed from \epoc32\data\epoc.ini:

JustInTime 0
Debug messages also appear in %Temp%\epocwind.out

Tips

Enable Logging of System messages…
From the "Target Settings" panel, go to the "Debugger | Debugger Settings" options and tick the box labelled "Log System Messages"

What is a panic?

A panic is a Symbian term used to denote an unexpected exit of a thread
A thread is the unit of execution on Symbian OS
Processes must have at least one thread to begin executing code
A panic denotes a serious coding error:
either the caller of a function has violated an API contract (e.g. calling a function with invalid parameters),
or an object or memory structure has moved into a bad internal state, violating an invariant.
Panics are helpful
 They aim to inform you about the exact nature of the problem during development

What does a panic look like?

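TReal PercentageToDecimal(TInt aPercentage)
    {
    // Precondition, checked in all builds. Panic() is assumed to be a local helper that calls User::Panic().
    __ASSERT_ALWAYS(aPercentage >= 0 && aPercentage <= 100, Panic(EInvalidInput));
    // Divide by a real value; integer division would truncate the result to 0 or 1.
    TReal result = aPercentage / 100.0;
    // Postcondition, checked in debug builds only.
    ASSERT(result >= 0.0 && result <= 1.0);
    return result;
    }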

Call Stack

What is a call stack?

The chain of functions calling functions that resulted in the panic.
Shows some history of the current operation.
This often gives a pretty good idea of the chain of events leading to a panic.
Essential for tracking down problems and knowing where to put breakpoints.
Also a good way of identifying duplicate defects.

Debugging Memory Leaks

Using Hook Logger

Provides logging for:
memory allocations
process and thread creation
leaves
more in the future?
Main use is pinpointing the source of leaked memory
To use this tool you need to:
Install it on your machine: download from Symbian DevNet
Attach the hooks to EUSER.DLL
Run HookLogger.EXE
Run the code to be hooked

1. Attach the Hooks

Run “HookEUser.cmd” from the “emulator” drive.
x:\> HookEUser WINSCW
‘x’ is the drive containing the epoc32 folder
Replaces EUSER with a hook "parasite" DLL
Undo by using the “-r” (remove) option

2. Start the UI

Run “HookLogger.exe”
Connection status shown in title bar
Set the options for monitoring heaps and threads

3. Reproduce the leak

Start the emulator and reproduce the memory leak
The emulator will panic
Break into CodeWarrior
Walk back up the stack to User::__DbgMarkEnd
Take a note of the leaked memory location (badCell) and the thread id.

4. Find the bad thread

Go to the Threads tab in the hook logger
find the thread that leaked memory

5. Show heap allocations

Right-click and select “Show allocations”
may take 10 to 20 seconds to respond

6. Find the bad allocation

Order list by “Ptr”
Find address indicated by “badCell” in part 3
Double click to get a nice callstack

Panics on Hardware

Hardware situation

What happens when a thread panics?
Either a panic dialog appears
or device reboots
No context is stored
Oh dear - No wonder it’s scary.
So you need to use tools to get the same information that the emulator gives so easily.

Why are there two kinds?

Marking a thread or process as “system critical” means that it is an integral and essential part of the system
e.g. the file server
The thread or process is being declared necessary for correct functioning of the device
If a system critical thread exits or panics the device will reboot
This is why panics in some threads cause the device to reset

Here’s where it happens

\src\cedar\generic\base\e32\kernel\sthread.cpp
void DThread::Exit()
    {
    if (iExitType!=EExitKill && (iFlags & (KThreadFlagSystemPermanent|KThreadFlagSystemCritical)))
        K::Fault(K::ESystemThreadPanic);
    <snip>
    }

Need some more information!

The most important information to get hold of:
Which thread panicked or caused an access violation?
What was the panic reason and number?
What was the call stack of the thread when it panicked?

Hardware Panics

There are two kinds
Application panic
Where a Panic dialog appears
not critical - device carries on working
System thread panic
critical - the device halts and resets.
Or the device may enter a special debug mode (called the crash debugger or debug monitor)

Application panic

Dialog will tell you
Thread which panicked
Panic reason
What else do we need?
The call stack.
A tool called D_EXC can provide the call stack.

System panic

Must enable a tool called the debug monitor (or crash debugger) to get more info
The crash debugger tells you
which thread panicked
the category and number of the panic
where the stack for the panicked thread is located in memory
The crash debugger can be coaxed to dump the call stack

Tackling a hardware panic

Use the OS Library to look up Panic codes

E.g. if the dialog says “KERN-EXEC 3”,
type “KERN-EXEC panic” into the search.
This will help you understand what to look for in the code.

Useful call stacks from Hardware

To get a useful call stack, two things are always needed:
A hex dump of the memory used by the stack of the thread which panicked
A ROM symbol file for the software flashed onto the device
With this information a Symbian Perl script can decode a human-readable call stack (similar to the call stack seen in the emulator).

How do I get a call stack?
Application panic

Run the d_exc tool on the device first.
Reproduce the panic.
The d_exc dialog pops up,
telling you some information about the panic. Press OK to save the stack to disk.
d_exc will have dumped 2 files to disk:
a binary .stk file containing the thread’s stack
a .txt file detailing the panic code and category
Get those files onto a PC and have your symbol file at hand.

Stack.txt

Open the output in notepad.
Do a find for “>>>>”.
This takes you to the top of the decoded stack!
 >>>> current stack pointer >>>>
r00=80007204 00000000 80000368 80000003
r04=00801bb0 00000001 00000000 00802bc4
r08=00000002 50340f15 00802bc4 00000000
r12=8041b36c 00801bb0 50160ff8 5000b34c
PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c
 >>>> current stack pointer >>>>

What next

Scroll down the text. Sometimes you may see this familiar fingerprint for a panic:
 >>>> current stack pointer >>>>
r00=80007204 00000000 80000368 80000003
r04=00801bb0 00000001 00000000 00802bc4
r08=00000002 50340f15 00802bc4 00000000
r12=8041b36c 00801bb0 50160ff8 5000b34c
PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c
 >>>> current stack pointer >>>>
1bb0  80000001 ....
1bb4  00000082 ....
1bb8  50161018 ...P  SvSendReceiveCheck(int, void *) + 0x8
1bbc  5016594c LY.P  RThread::Panic(TDesC16 const &, int) + 0x24
1bc0  ffff8001 ....
1bc4  00000082 ....
1bc8  00801bdc ....  Stack + 0x1bdc
1bcc  0000003c <...
1bd0  50162024 $ .P  User::Panic(TDesC16 const &, int) + 0x24
1bd4  ffff8001 ....
1bd8  5016ce20  ..P  Panic(TCdtPanic) + 0x24
1bdc  10000004 ....
1be0  50178000 ...P  TUnicode::CjkWidthFoldTable + 0x5408

And then?

Look at all the functions that follow
In my case, after cutting out the lines that looked garbled, I got:
1e18  50650031 1.eP  CBaLockChangeNotifier::DoRunL(void) + 0x5d
1e24  5064fdef ..dP  RBaBackupSession::GetBacukupOperationEvent(..
1e5c  5064ff65 e.dP  CBaLockChangeNotifier::RunL(void) + 0x19
That was enough to tell me to look at the code for DoRunL(), and to put some logging in there to see what is going on.
That’s the basics for d_exc

But what about system panics?

Same idea - we want to get the panic reason and call stack:
First, get the base porting people to show you how to enable a crash debugger build.
Reproduce the problem. If the device enters the crash debugger, you can get more information.
You use a terminal program on the PC to “talk” to the crashed device.

What do I ask it?

Same as always
Which thread caused a panic or access violation
What is the panic reason and number
What is the callstack of the thread

Connect the crash debugger

Launch a terminal emulator (e.g. hyperterm) on your PC
Connect the PC serial port to the device serial port which provides debug tracing
The terminal window should show a “password” prompt
Type in “replacement” and you have entered the debug monitor prompt
The kernel is frozen, allowing you to interrogate its current state

Find the fault

Type ‘f’ into the crash debugger to get the Fault information
If the category is KERN 4 then you are in business.
KERN 4 simply says that a panic happened in a system thread
The actual panic, such as KERN-EXEC 3, is hidden
Type ‘i’ into the crash debugger to get information about the real panic reason
Sometimes even this doesn’t work: if the thread which crashes is not system critical but is process critical (e.g. the main thread), its exit causes the whole process to exit.
If another thread in that process is marked as system critical, this will take down the platform.
A foolproof method of finding the real panic is to look at the output from the KPanic debug tracing

Find the fault

Type ‘r’ to get the values of all the registers
The ones to look at depend on the processor mode!
Type ‘c0’ to get the details of all the threads
‘C0’ will pause after each screenful

Some background - APCS

The ARM Procedure Calling Standard (APCS)
Imposes conventions on the use of registers
So we always know the important registers to look at

Finding the panicked thread

If you have KPanic debug tracing enabled use that to identify the panic
RLibrary::Load - aFileName: BMPANSRV.DLL, -aPath:  threadName: Wserv
RLibrary::Load - OK
RLibrary::Load ......1
RLibrary::Load ......2
RLibrary::Load ......3
RLibrary::Load ......4
RLibrary::Load ......5
RLibrary::Load Init() - OK
Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
 R0=00614a40  R1=806a21d7  R2=006029c4  R3=006029c4
 R4=0060f448  R5=0060f4c8  R6=006126b8  R7=0060f484
 R8=00000012  R9=00000040 R10=c8087d78 R11=00000000
R12=8009fced R13=004060e0 R14=8108b0f8 R15=8144710c
R13Svc=c924c000 R14Svc=80020108 SpsrSvc=00000010
Thread 37, KernCSLocked=0
FAULT: KERN 00000004
Password: replacement

What do all those numbers mean?

The type of exception or why the processor was unhappy
Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801

What do all those numbers mean?

The processor mode or which registers are valid
Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801

What do all those numbers mean?

The Fault Address Register (FAR) indicates the dodgy address that was accessed
Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
The least significant 4 bits of the Fault Status Register (FSR) indicate the type of MMU fault
Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
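
The Cpsr and FSR fields can also be decoded mechanically. The following is a minimal sketch, not part of any Symbian tool: the function names are hypothetical, the mode values are the standard ARM CPSR mode encodings, and the fault table assumes the ARMv4/v5 MMU fault status encoding, so check the manual for the CPU in your device.

```cpp
#include <cstdint>
#include <string>

// Decode the processor mode held in the low 5 bits of the CPSR.
std::string CpsrMode(std::uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
        case 0x10: return "User";
        case 0x11: return "FIQ";
        case 0x12: return "IRQ";
        case 0x13: return "Supervisor";
        case 0x17: return "Abort";
        case 0x1B: return "Undefined";
        case 0x1F: return "System";
        default:   return "Unknown";
    }
}

// Bit 5 of the CPSR is the T bit: set when the core was running Thumb code.
bool CpsrThumb(std::uint32_t cpsr) {
    return (cpsr & 0x20u) != 0;
}

// Decode the low 4 bits of the FSR.
// Assumption: ARMv4/v5 MMU encoding; other cores may differ.
std::string FsrStatus(std::uint32_t fsr) {
    switch (fsr & 0xFu) {
        case 0x1:
        case 0x3: return "Alignment fault";
        case 0x5: return "Translation fault (section)";
        case 0x7: return "Translation fault (page)";
        case 0x9: return "Domain fault (section)";
        case 0xB: return "Domain fault (page)";
        case 0xD: return "Permission fault (section)";
        case 0xF: return "Permission fault (page)";
        default:  return "Other (see CPU manual)";
    }
}
```

For the trace above, CpsrMode(0x20000010) gives "User" and FsrStatus(0x00000801) gives "Alignment fault", which fits the odd address in FAR=0060fa6d.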

Finding the panicked thread

The “i” command gives you a lot more information, but all you are interested in is finding a fingerprint.
<snip>
THREAD at c8084ef0 VPTR=00000000 AccessCount=6 Owner=c80848a8
Full name apprun.exe::Calcsoft
Thread MState READY
Default priority 16 WaitLink Priority 16
ExitInfo 2,3,KERN-EXEC
Flags 00000002, Handles c8084a70
Supervisor stack base c9208000 size 4000
User stack base 00402000 size 5000
Id=29, Alctr=00600000, Created alctr=00600000, Frame=00406e1c
<snip>
R13_USR 8005e414 R14_USR 000002a8 SPSR_SVC c8084ef0
 R4 c8085198  R5 00000000  R6 00000000  R7 00000001
 R8 00000000  R9 8005e834 R10 000002a8 R11 c8085198
 PC 8005e81c
TheCurrentProcess=c80848a8
PROCESS at c80848a8 VPTR=00000000 AccessCount=7 Owner=00000000
Full name apprun.exe
ExitInfo 3,0,
<snip>

What do all those numbers mean?

The Exit Type
ExitInfo 2,3,KERN-EXEC
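
The three ExitInfo fields are the exit type, the reason number and the category. A minimal sketch of pulling them apart is shown below. ParseExitInfo and ExitTypeName are hypothetical helper names, and the numeric values are assumed to follow the order of the Symbian TExitType enum (EExitKill, EExitTerminate, EExitPanic, EExitPending).

```cpp
#include <cstdio>
#include <string>

struct ExitInfo {
    int type = -1;         // exit type number
    int reason = 0;        // panic or exit reason number
    std::string category;  // e.g. "KERN-EXEC"; empty if none was printed
};

// Parse a line such as "ExitInfo 2,3,KERN-EXEC".
// Returns false if the type and reason could not be read.
bool ParseExitInfo(const std::string& line, ExitInfo& out) {
    char category[64] = "";
    const int fields = std::sscanf(line.c_str(), " ExitInfo %d,%d,%63[^\r\n]",
                                   &out.type, &out.reason, category);
    if (fields < 2) return false;
    out.category = category;
    return true;
}

// Map the exit type number to a readable name.
// Assumption: values follow the order of the TExitType enum.
std::string ExitTypeName(int type) {
    switch (type) {
        case 0:  return "Kill";
        case 1:  return "Terminate";
        case 2:  return "Panic";
        case 3:  return "Pending";
        default: return "Unknown";
    }
}
```

Under that assumption, "ExitInfo 2,3,KERN-EXEC" reads as a panic with category KERN-EXEC and reason 3, and the process line "ExitInfo 3,0," reads as still pending.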

What next? - The Program counter

From the same kind of information as on the previous slide,
 look at R15 and copy that number.
 This is the PC: the address of the last instruction to execute in the thread which panicked.
Be careful to get the right version depending on the “mode” of the processor.
Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
 R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
 R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
 R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
Thread 37, KernCSLocked=0
Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
Exec::ThreadId
Exec::SemaphoreWait
Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
FAULT: KERN 00000004

Program counter

Look up the PC to find the “top” of the callstack
Either look at the symbol file directly or use printsym to decode the address
You were probably half way through a function
so you may have to look for the closest match, e.g. where R15=80728562
80728498    0000    CAknViewAppUi::~CAknViewAppUi__sub_object()  avkon.in(.text)
80728584    0010    CAknViewAppUi::~CAknViewAppUi__deallocating()  avkon.in(.text)
R14 (the link register – lr) may also give you a clue
e.g. where R14=810b1b59
810b1b56    001c    CCoeEnv::CreateResourceReaderLC(TResourceReader&, int) const  CONE.in(.text)
810b1b72    0074    CCoeEnv::ReadResourceAsDes16(TDes16&, int) const  CONE.in(.text)
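
The “closest match” is the symbol with the highest start address that is not above the address being looked up. A minimal sketch of that lookup follows. It is not how printsym is implemented, and SymbolTable and NearestSymbol are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Symbol start address -> symbol name, kept sorted by address.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Find the symbol that starts at or before 'address'.
// Returns the symbol name and the offset of 'address' into it,
// or an empty name if 'address' lies below every symbol in the table.
std::pair<std::string, std::uint32_t> NearestSymbol(const SymbolTable& table,
                                                    std::uint32_t address) {
    auto it = table.upper_bound(address);  // first symbol starting after 'address'
    if (it == table.begin()) {
        return {std::string(), 0};
    }
    --it;                                  // last symbol starting at or before it
    return {it->second, address - it->first};
}
```

With the two avkon entries above in the table, looking up R15=80728562 returns CAknViewAppUi::~CAknViewAppUi__sub_object() with offset 0xca. With the two CONE entries, R14=810b1b59 returns CCoeEnv::CreateResourceReaderLC with offset 0x3.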

What next? - The call stack

From the same kind of information as on the previous slide,
 look at R13 and copy that number.
 This is the stack pointer: the address of the current top of the thread's stack.
Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
 R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
 R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
 R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
Thread 37, KernCSLocked=0
Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
Exec::ThreadId
Exec::SemaphoreWait
Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
FAULT: KERN 00000004

Yuk! Hex

All you need to do now is
type the ‘m’ command into the crash debugger with the address of the stack from R13
and dump about 200 bytes of stack - that should be plenty.
You can dump the stacks of all threads by using the ‘S’ command
The command is:
m 00405ee0+200

More hex!

That will dump some HEX and text to your terminal:
00405ee0: 00 00 00 00 00 00 00 00 44 04 77 80 c6 56 00 10 ........D.w..V..
00405ef0: 4c 01 cc 08 e8 9a 61 00 40 04 77 80 50 2a 60 00 L.....a.@.w.P*`.
00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....

Decoding the data using printsym

Type the following into a Windows command prompt:

Warning: stack overflow == KE3

Always check whether you’re suffering from a stack overflow.
A stack overflow will cause otherwise inexplicable KERN-EXEC 3 errors.
If you can’t get hold of the stack (you see a line like that shown below), it may indicate a stack overflow:
.m 00414ff8 00415fff
Exception: Type 1 Code 80073280 Data 00414ff8 Extra 00000007

Checking for stack overflow

You need the value of R13 and the thread id
Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
 R0=00415190  R1=00415190  R2=800e19bc  R3=00000038
 R4=7fffffff  R5=00000000  R6=00415028  R7=00000100
 R8=00000000  R9=00000040 R10=c808a2e8 R11=00000000
R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
R13Svc=c931c000 R14Svc=8002014c SpsrSvc=08000010
Thread 58, KernCSLocked=0
Look up the details of the thread from the output of the ‘i’ or ‘c0’ commands to get the stack base
THREAD at c809ae88 VPTR=00000000 AccessCount=3 Owner=c8089e30
Full name eiksrvs.exe::KeySoundServerThread
<snip>
Supervisor stack base c9318000 size 4000
User stack base 00415000 size 1000
Id=58, Alctr=00600000, Created alctr=00600000, Frame=00415bd4
<snip>

Checking for stack overflow

The stack has overflowed if R13 < stack base
Plus: the exception id will indicate a data abort
Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
User stack base 00415000 size 1000
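
The comparison is simple enough to do by eye, but when triaging many traces it can be written down once. A minimal sketch with hypothetical function names, assuming the stack grows downwards from (base + size) towards base:

```cpp
#include <cstdint>

// True if the stack pointer has dropped below the lowest address of the stack.
bool StackOverflowed(std::uint32_t sp, std::uint32_t stackBase) {
    return sp < stackBase;
}

// Bytes between the stack pointer and the high end of the stack.
// Assumption: the stack grows downwards from (stackBase + stackSize).
// A result larger than stackSize is another way of seeing the overflow.
std::uint32_t StackBytesUsed(std::uint32_t sp, std::uint32_t stackBase,
                             std::uint32_t stackSize) {
    return (stackBase + stackSize) - sp;
}
```

For the thread above, R13=00414ff8 is below the user stack base 00415000, so StackOverflowed returns true, and StackBytesUsed gives 0x1008 against a stack size of 0x1000.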

Decoding the stack dump output

The output will be similar to the decoded d_exc stack, except:
The top of printout represents the function which called Panic() (with d_exc you have to find the top)
So your output may start with something like this:
00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....
= ffffffff ....
= 807278e3 .xr.  CAknNoteAttributes::ConstructFromResourceL(TResourceReader&)  avkon.in(.text) + 0x1a1
= 00600000 ..`.
= 00000018 ....
00405f10: 00 00 60 00 a1 40 0d 80 a8 60 40 00 ce 7b 61 00 ..`..@...`@..{a.
= 00600000 ..`.
= 800d40a1 .@..  RHeap::Alloc(int)                         euser.in(.text) + 0x8b
= 004060a8 .`@.
= 00617bce .{a.
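
Each “=” line above is one 32-bit word, built from four bytes of the dump in little-endian order. A minimal sketch of that conversion follows. DumpLineToWords is a hypothetical name, and the sketch assumes every dump line has an address, a colon and then 16 hex bytes, as in the output shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Convert one line of memory dump output into 32-bit little-endian words.
// Assumption: the line looks like "address: 16 hex bytes ASCII-column".
std::vector<std::uint32_t> DumpLineToWords(const std::string& line) {
    std::vector<std::uint32_t> words;
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) return words;

    std::istringstream in(line.substr(colon + 1));
    std::string token;
    std::uint32_t word = 0;
    int byteCount = 0;
    // Stop after 16 bytes so the trailing ASCII column is never parsed.
    while (byteCount < 16 && (in >> token)) {
        if (token.size() != 2 ||
            !std::isxdigit(static_cast<unsigned char>(token[0])) ||
            !std::isxdigit(static_cast<unsigned char>(token[1]))) {
            break;
        }
        const std::uint32_t byte =
            static_cast<std::uint32_t>(std::stoul(token, nullptr, 16));
        // The first byte on the line is the least significant byte of the word.
        word |= byte << (8 * (byteCount % 4));
        if (++byteCount % 4 == 0) {
            words.push_back(word);
            word = 0;
        }
    }
    return words;
}
```

Feeding it the 00405f00 line gives ffffffff, 807278e3, 00600000 and 00000018, matching the decoded output above. Each word can then be looked up in the symbol file.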

What if the program is not in ROM?

D_EXC can also decode RAM-based symbols.
The D_EXC .txt file lists any DLLs which were loaded into RAM (and hence are not present in the ROM symbol file).
You have to place the .MAP file of every RAM DLL you are interested in into the same directory as the d_exc trace.
Run printstk.pl as usual, and it should pick up the addresses correctly.

Anything else?

Now that you know the thread, panic code and have the stack for both application panics and system panics:
It gives you a good idea of what functions to put logging in
It quickly allows you to see if a defect is a duplicate (if the callstack has already been posted on a previous defect)

Debugging on hardware is hard?

But with practice it’s a systematic method - not a black art
Application and System thread panics cover 90% of application side hardware crashes
So learn how to diagnose these first, then worry about more advanced debugging facilities
Print out the documentation as you work; it will help you with other kinds of problems and is a good reference
Read the other debugging guides to help you understand what is going on under the hood

Tips

Try these techniques out on a panic you put in the code yourself
so that you are confident about the result
make sure the call stack matches what you know to be the problem
When looking into a defect, it is often enough to find the component which panicked
This is what triage may do
The component owner may then be able to take over and apply some knowledge, logging, etc.
Sometimes it can be helpful to make every thread a system thread
so all panics go to the debug monitor

ROM Symbol file format

5055ced4    004c    CMmPhoneTsy::UpdatePhoneIndicator(RMobileCall::TMobileCallStatus
5055cf20    0054    CMmPhoneTsy::UpdatePhoneIndicator(RMobilePhone::TMobilePhoneRegistr
5055cf74    001c    CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)
5055cf90    0038    CMmPhoneTsy::CompleteReadNamData(int, TPtrC8)
5055cfc8    0058    CMmPhoneTsy::CompleteProductInfoNumId(TBuf8<50> &)
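
Each line holds a start address, a length and the symbol name. The first two are hexadecimal, and each address plus its length equals the next address in the listing above. A minimal sketch of parsing one line is shown below. Symbol and ParseSymbolLine are hypothetical names, and the three-column layout is inferred from the sample lines only.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct Symbol {
    std::uint32_t address = 0;  // start address of the function
    std::uint32_t size = 0;     // length of the function in bytes
    std::string name;           // rest of the line
};

// Parse a line such as "5055cf74    001c    CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)".
// Returns false if the line does not have the three expected columns.
bool ParseSymbolLine(const std::string& line, Symbol& out) {
    out.name.clear();
    std::istringstream in(line);
    in >> std::hex >> out.address >> out.size;
    if (!in) return false;
    // The name may contain spaces, so take everything left on the line.
    std::getline(in >> std::ws, out.name);
    return !out.name.empty();
}
```

For the GetSubscriberIdL line this yields address 5055cf74 and size 1c, which ends at 5055cf90, the start of CompleteReadNamData.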