
Trap Dump Reference                                        2012-05-02
--------------------                                 Steven H. Levine
                                                steve53@earthlink.net


0.  Introduction.
    -------------

This note explains how to set up and record trap dumps and how to do basic
dump analysis with PMDF.

1.  Preparing the Dump Volume/Partition
    -----------------------------------

The trap dump facility writes the dump files to a dedicated FAT partition.

This must be a FAT formatted partition with the volume name SADUMP. The partition
must be at least as large as the installed memory.

WARNING - os2dump effectively formats the volume and overwrites any existing 
volume content.  Don't store anything you don't want to lose on the volume.

Use LVM or dfsee to create a volume for the file system.  Verify that the
partition size is no more than 2047MiB.  LVM and dfsee round requests up to
the next cylinder boundary and os2dump cannot handle partitions larger than
2047MiB.

On some systems with less than 2GiB of RAM, it appears that a maximum size
partition is required.

Given the typical logical geometry of 255 heads/cylinder and 63
sectors/track, the number of cylinders in a 2GiB partition is

  2*1024**3 / (255 * 63 * 512) = 261.083348

or rounding down, 261 cylinders.
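
The same arithmetic can be sketched in Python.  This is only an illustration
of the calculation above.  The 255 x 63 x 512 geometry is the typical case
and is an assumption here, so substitute the values reported for your drive.
The function names are made up for this example.

```python
# Sketch: largest whole-cylinder partition that does not exceed 2GiB.
# The geometry below is the typical one; check your own drive first.
SECTORS_PER_CYLINDER = 255 * 63   # typical logical geometry (assumption)
BYTES_PER_SECTOR = 512
MIB = 1024 ** 2
LIMIT_BYTES = 2 * 1024 ** 3       # 2GiB


def max_cylinders(sectors_per_cylinder=SECTORS_PER_CYLINDER,
                  bytes_per_sector=BYTES_PER_SECTOR):
    """Return the number of whole cylinders that fit in 2GiB."""
    cylinder_bytes = sectors_per_cylinder * bytes_per_sector
    return LIMIT_BYTES // cylinder_bytes


def cylinders_to_mib(cylinders,
                     sectors_per_cylinder=SECTORS_PER_CYLINDER,
                     bytes_per_sector=BYTES_PER_SECTOR):
    """Return the size in MiB of a partition of the given cylinder count."""
    return cylinders * sectors_per_cylinder * bytes_per_sector / MIB


if __name__ == "__main__":
    count = max_cylinders()
    print(count, "cylinders,", round(cylinders_to_mib(count), 1), "MiB")
```

With the typical geometry this reports 261 cylinders, or about 2047.3MiB.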

Your drives probably use this geometry, but you should check with the
dfsee GEO command or menus or some other method.

If you are using dfsee to create the volume, specify the size as 261,c,
where c indicates cylinders.  If you must specify the size in MiB, 2046.5MiB
should result in a 261 cylinder partition.

The dfsee confirmation dialog will be similar to

  CREATE logical FAT 2047,  @10

  Freespace ID 10 :   8738.5 MiB disk 3
  FAT32-Ext  = 0c :   2047.3 MiB Logical

Format the volume with

  format X: /fs:FAT /v:SADUMP

where X: is the drive letter you assigned to the volume.

Test the volume with the command

  dir X:\

The volume name should be SADUMP.  Since the volume is empty, the shell may 
display a sys0002 error message.


2.  Enabling the Trap Dump Facility.
    --------------------------------

There are several ways to enable and configure the trap dump facility.  Pick
the one that best suits your needs.

To permanently enable trap dumps, add the following line to config.sys

  TRAPDUMP=R0,x:

where x is the volume where the dumps will be stored.

WARNING - os2dump effectively formats the volume and overwrites any existing 
volume content.  Don't store anything you don't want to lose on the volume.

On LVM aware systems, volume letters may or may not match BIOS assigned drive
letters.  The kernel maps the LVM volume letter to the SADUMP partition when
invoking the dump routines.  This works correctly in current kernels.

Some older kernels had problems with this mapping.  If you are working with
an older Warp4 kernel and the trap dump facility cannot find the SADUMP
volume, try to position the SADUMP volume so that the LVM volume letter
matches the drive letter assigned by the BIOS scan.  There are lots of
different volume types and setups, so it may take some experimentation to
determine the correct volume letter for some setups.

WARNING: Do not use the os2dump binary delivered with mcp2/acp2.  It had a 
defect and can possibly OVERWRITE your volume(s).

After editing CONFIG.SYS, reboot to activate the feature.

For more information on this command, type

  view cmdref trapdump

from the command line.  This information is accurate but incomplete.  Recent
FixPaks have added many new features.  See

  \os2\install\readme.dbg

for the details.

One new command is PDUMPSYS, which can control the level of detail included
in the system dump.  These settings do not apply to kernel trap dumps, but
can be used to control the detail of ring 0 process dumps.  Also added is the
capability to set up the dump configuration from the command line using the
TRAPDUMP command. For more information, see

  \os2\system\ras\procdump.doc

If a ring 0 system dump seems to be missing information needed to analyze
your problem, try

  pdumpsys paddr(all)

If needed, this command and others can be invoked from config.sys.  For
example

  run=z:\os2\system\ras\pdumpsys.exe paddr(all)

where z: is your boot volume.


3.  Recording Trap Dumps.
    ---------------------

With the above configuration, this is usually automatic.  The trap dump file
will be created as the traps occur and will be written to the volume you
chose.  Any existing data on the volume will be erased, including any prior
dump file.  Most often the trap will occur at the same cs:eip repeatedly.  If
not, you may need to save the trap dump files to a temp directory to allow
you to analyze the common factors.

Those using non-US keyboards might have trouble responding to the dump prompt
with the Y key.  If so, use F1 instead.

If you are trying to capture a trap dump file for a hang condition, try
Ctrl-Alt-NumLock-NumLock from the keyboard.  On some keyboards, you may need
to use Ctrl-Alt-F10-F10.

To turn off the Trap Dump Facility, REM out the TRAPDUMP statement in
config.sys and reboot.


4.  Preparing to Use the PM Dump Facility - Overview.
    -------------------------------------------------

The PM Dump Facility (PMDF) is a generic tool which needs to be configured
to understand the dump files generated by a specific kernel version.

PMDF is configured with a set of files known as Dump Symbols which are a
combination of programs and data files.

The program files (DF_RET.EXE and DF_DEB.EXE) are invoked by PMDF to retrieve
data from the dump file. These programs understand the layout of the kernel
data structures and are typically specific to a range of kernel revisions.

The data files include System Definition (.sdf) files and Symbol (.sym)
files.  These are called a Symbol File Set.

The System Definition files configure df_ret and df_deb and are typically
specific to a single kernel revision.

The Symbol files contain data used to translate binary addresses within an
executable to symbolic labels.  These files are typically specific to a
single version of the executable.

This means that a set of Dump Symbols is typically kernel version specific,
FixPak specific and application version specific.

Assuming a standard install of the Dump Facility, the Dump Symbols sets are
stored in subdirectories of \os2\pdpsi\pmdf on your boot volume and
identified by the index file \os2\pdpsi\pmdf\pmdfvers.lst.

The pmdfvers.lst index file cross references kernel version to subdirectory
names.

Note that the Dump Facility and the Kernel Debugger use the same set of
symbol files, but the requirements differ as to where the symbol files must
be located.  The Dump Facility requires that the symbol files be placed in a
directory that can be found via pmdfvers.lst.

The Kernel Debugger requires that the symbols be stored in the same
directory as the associated executable.  For example, os2krnl.sym must be
located in the root of the boot volume along with os2krnl.

The reason for this is that the Kernel Debugger runs on the Machine Under
Test (MUT) and may need access to the symbol files before the file system
drivers are loaded.  This places some limits on where the symbol files can be
located and the easiest solution was to require the symbol files to be in
the same directory as the associated executable.

PMDF is often run on a system other than the MUT which generated the dump.
Using pmdfvers.lst to locate the correct symbols for the dump allows PMDF to
be used to analyze dumps captured on systems other than the system running
PMDF.  Pmdfvers.lst will have a line defining where to find the symbols for
each specific kernel revision to be analyzed.


5.  Preparing to Use the PM Dump Facility - pmdfvers.lst.
    -----------------------------------------------------

Each pmdfvers.lst entry is a single line of the form

  directory;version;comment

or

  directory:version:comment

Directory is the data directory that contains the df_ret.exe and symbol
files for the indicated kernel version.  This directory must be a
subdirectory of \os2\pdpsi\pmdf.

A typical entry is

  warp45_s;14.105_SMP;eComStation 2.1 standard install

which states that warp45_s is the data directory that contains the
df_ret.exe and symbol data files for the 14.105 SMP kernel.  The version
string is case sensitive and must match the kernel revision exactly.
Anything after the second delimiter is a comment to help you remember what
the other two values mean.
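
As an illustration of the entry format, here is a small Python sketch that
splits one entry into its three fields.  It is not how PMDF itself reads the
file.  The names PmdfEntry and parse_entry are invented for this example, and
treating a missing comment as empty is an assumption of the sketch.

```python
import re
from typing import NamedTuple


class PmdfEntry(NamedTuple):
    directory: str   # data directory under \os2\pdpsi\pmdf
    version: str     # kernel revision, case sensitive
    comment: str     # free text for the reader


def parse_entry(line):
    """Split 'directory;version;comment' (or the ':' form) into fields."""
    # Only the first two delimiters separate fields; anything after the
    # second one belongs to the comment, delimiters included.
    parts = re.split(r"[;:]", line.rstrip("\r\n"), maxsplit=2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected directory;version;comment: %r" % line)
    comment = parts[2] if len(parts) == 3 else ""   # assumption: optional
    return PmdfEntry(parts[0], parts[1], comment)


if __name__ == "__main__":
    print(parse_entry("warp45_s;14.105_SMP;eComStation 2.1 standard install"))
```

Because the version string must match exactly, any lookup built on this
should compare entry.version with == and not fold case.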

If you are not sure of what kernel revision you are running, use your
favorite hex editor and search for the string "Internal revision" without
the quotes.  The string that follows is the value PMDF uses to match the
version string in pmdfvers.lst. You can find similar information with

  bldlevel \os2krnl

but the revision number reported in the build level string is often not in
exactly the format that PMDF is looking for.
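
If you do not have a hex editor handy, the search can be sketched in Python.
This only automates the manual search described above.  How the text that
follows "Internal revision" is laid out in the kernel file is an assumption
here (the sketch simply takes the run of printable characters that follows),
so compare the result with what a hex editor shows.  The function name is
made up for this example.

```python
import re

MARKER = b"Internal revision"


def find_revision_text(data, max_len=40):
    """Return the printable text right after the marker, or None."""
    pos = data.find(MARKER)
    if pos < 0:
        return None                     # marker not present in this data
    start = pos + len(MARKER)
    tail = data[start:start + max_len]
    # Assumption: the revision is the run of printable ASCII that follows.
    match = re.match(rb"[\x20-\x7e]+", tail)
    return match.group(0).decode("ascii").strip() if match else ""


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as kernel:   # e.g. a copy of os2krnl
        print(find_revision_text(kernel.read()))
```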

Check if \os2\pdpsi\pmdf\pmdfvers.lst exists.  If it does not, create it
using the following as a template

  warp45_s;14.105_SMP;WSeB/ACP SMP
  warp45_u;14.105_UNI;WSeB/ACP UNI
  warp45;14.105_W4;Warp 4/MCP

Replace the version string with one that matches the installed kernel.

Check that the data directory for your kernel revision exists.  If not,
create it as a subdirectory of \os2\pdpsi\pmdf.

Check the data directory for an existing Symbol File set.  For a retail
kernel, at a minimum, the set will include

  df_ret.exe
  os2krnlr.sym
  os2krnl.sdf
  doscall1.sym

Depending on the issue, you may need additional symbol files.
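
A quick way to check a data directory for the minimum retail set is sketched
below in Python.  The file list is taken from the list above.  The function
name is invented for this example, and comparing names without regard to case
is an assumption that suits FAT and HPFS volumes.

```python
import os

# Minimum Symbol File Set for a retail kernel, from the list above.
MINIMUM_RETAIL_SET = ("df_ret.exe", "os2krnlr.sym", "os2krnl.sdf",
                      "doscall1.sym")


def missing_files(data_dir, required=MINIMUM_RETAIL_SET):
    """Return the required files not found directly in data_dir."""
    # Only the data directory itself is examined, not its subdirectories,
    # since PMDF only searches the data directory.
    present = {name.lower() for name in os.listdir(data_dir)
               if os.path.isfile(os.path.join(data_dir, name))}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    import sys
    for name in missing_files(sys.argv[1]):
        print("missing:", name)
```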


6.  Preparing to Use the PM Dump Facility - eComStation Details.
    ------------------------------------------------------------

If you are running eComStation 1.2 or newer, the dump symbols may already be
installed.  Check the data directory named in the pmdfvers.lst entry for your
kernel version.

If not, they are available on your installation CD in the
\os2image\debug and the \os2image\fi\sysmgmt directories.  They are also
available from your account at the eComStation website (www.ecomstation.com).

If you need to copy files to the PMDF data directory, copy the files to the data
directory and not a subdirectory of the data directory.  This is
important.  PMDF only searches the data directory.

Continue with step 7.


7.  Preparing to Use the PM Dump Facility - Warp4/WSeB details.
    -----------------------------------------------------------

Depending on what components you installed when you installed Warp or WSeB on
the box where the trap occurred, you might already have the files needed to
examine the dump installed.  If not, the following examples describe how to
get the files you need and how to install the files so that PMDF can use
them.

The examples that follow assume

  FixPak 15
  Kernel version 14.062
  Boot volume c:

Be sure to replace the values shown in the examples with values that match
your specific system.  This applies to kernel revision numbers, boot volume
letters and other values that are specific to your system.  Add pathname
prefixes as needed to match where you have stored the files.

If you are not sure of what kernel revision you are running, use your
favorite hex editor and search for the string "Internal revision" without the
quotes.  The string that follows is the value PMDF uses to match the kernel
to the symbols.  You can find the same information with

  bldlevel c:\os2krnl

but the revision number reported in the build level string is usually not in
exactly the format that PMDF is looking for.

Go to

  ftp.software.ibm.com/ps/products/os2/fixes/debug

and download m015dmp.zip. This is the symbols zip file for FP15 and kernel
version 14.062.

If you are using a testcase kernel, the dump symbols will be at

  testcase.boulder.ibm.com/ps/fromibm/os2

and will be named

  dfyyyymmdd.zip

where yyyymmdd matches the testcase kernel date.  When you download a
testcase kernel, be sure to download the symbols file at the same time.
Otherwise, it might be gone when you need it.

There are a few sites that archive copies of the testcase kernels.  One
such site is

  http://www.os2site.com/sw/upgrades/kernel/

Create the subdirectory

  c:\os2\pdpsi\pmdf\warp4.15

Unzip the contents of m015dmp.zip into this subdirectory with

  unzip -j m015dmp.zip -d c:\os2\pdpsi\pmdf\warp4.15

This will put all the files in m015dmp.zip into the data directory.  This is
important.  If you do not use the -j option or its equivalent, PMDF will
not be able to find the symbols.

The zip files contain subdirectories because they are structured for use with
either the Kernel Debugger or PMDF.  As explained previously, the Kernel
Debugger requires that the symbol files be in the same directory as the
associated executable.  The zip files contain subdirectory information so
that the files will be placed in the correct directory when unzipped to the
root of the boot volume.  For PMDF, the symbol files must be in a
data directory named in pmdfvers.lst.
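
If your unzip tool has no option to discard the stored paths, the same
flattening can be sketched with Python's zipfile module.  This is a rough
equivalent for illustration only.  The function name is invented, and when
two archive members share a file name the later one silently replaces the
earlier one, which may not be what your unzip tool does.

```python
import os
import posixpath
import zipfile


def extract_flat(zip_path, dest_dir):
    """Extract every file in zip_path directly into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue                      # skip directory entries
            # Drop the stored path and keep only the file name.
            name = posixpath.basename(info.filename.replace("\\", "/"))
            if not name:
                continue
            target = os.path.join(dest_dir, name)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(name)
    return written


if __name__ == "__main__":
    import sys
    for name in extract_flat(sys.argv[1], sys.argv[2]):
        print("extracted:", name)
```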

8.  Preparing to Use the PM Dump Facility - Other Applications.
    -----------------------------------------------------------

If your application came with map files and no symbol files, the mapsym
utility may be able to create symbol files from the map files.  Mapsym is
available with most compilers.  If you need usage help, just type

  mapsym

from the command line.  Copy the symbol files to the PMDF data directory.

If you don't have either map or symbol files for your application, it may be
a bit more difficult to analyze the dump file, but PMDF will still work.

9.  Analyzing Trap Dumps.
    ---------------------

This is a bare bones overview.

Start PMDF.

For eComStation, the object should be in the Utilities folder.  For Warp4,
you should have an object in the Problem Determination Tools folder. If you
do not have an object, you can start PMDF from the command line.

If you did not install PMDF, you will need to run Selective Install and
install the Problem Determination Tools and reapply the last FixPak you
installed.

Open the dump file from the PMDF File menu.  PMDF should find the matching
Symbol File Set using the data in pmdfvers.lst.

If for some reason PMDF cannot match up the dump file with a Symbol File
Set, it will prompt you to select a Symbol File Set from the sets defined in
pmdfvers.lst.  Since you have configured PMDF for your specific kernel
version, this should not occur.  If it does, you should request help.

PMDF will allow you to select one of the available Symbol File Sets, but the
results will be unpredictable.  PMDF is almost sure to misinterpret the dump
file content.  It may even trap.

If PMDF appears to trap before selecting a symbol set, check pmdfvers.lst
and make sure it contains no blank lines.
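
Blank lines are easy to miss in an editor, especially at the end of the
file.  The Python sketch below reports their line numbers.  The function name
is invented for this example, and counting whitespace-only lines as blank is
an assumption.

```python
def blank_line_numbers(text):
    """Return the 1-based numbers of blank lines in the given text."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if not line.strip()]   # assumption: whitespace-only is blank


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as listing:   # path to your pmdfvers.lst
        for number in blank_line_numbers(listing.read()):
            print("blank line at", number)
```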

If you are trying to open a system dump for an SMP kernel and df_ret aborts,
try the 14.100d df_ret.exe.

If you continue to have trouble opening your dump file with PMDF, request help.

To verify the dump file and view some of the available data, try the
following:

  Select Synopsis -> Trap Screen Info from the Analyze menu.

  Select Thread -> Call Gate from the Analyze menu.

  Select Thread -> Ring 0 Stack Trace from the Analyze menu.

  Select Thread -> Ring 2 Stack Trace from the Analyze menu.

  Select Thread -> Ring 3 Stack Trace from the Analyze menu.

  Select Process -> Open Files from the Analyze menu.

  Select Synopsis -> Process Synopsis from the Analyze menu.

  Select Synopsis -> System Synopsis from the Analyze menu.

  Select Process -> Module Table from the Analyze menu.

Enter the commands

    r
    ln
    u eip-20 eip
    k
    dw bp

in the command line at the bottom of the PMDF window.  Press the Enter key
after each command.

If the r command does not report the same cs:eip as shown on the Trap
Screen, repeat the above commands substituting the numeric cs:eip value from
the Trap Screen for eip and the numeric ss:ebp value from the Trap Screen
for ebp.

Select Save Output from the File menu and save the window contents to a
file.

If you don't understand what you are seeing, you will have to find someone to
help you interpret the content of the dump file.  Ask questions and let your
helper guide you.  Be prepared to spend some time working with your helper
to understand the cause of the trap.

The bare bones information you generated is just a starting point.  It may
or may not be sufficient to identify the source of the trap.  Unless you are
lucky, your helper will request additional output and may ask you to generate
another dump file using different settings.

Often, your helper will want you to send a copy of the dump file and the
debug symbols.

It is a good idea to keep notes describing how each dump file was generated
and what you were doing when the dump file was generated.  This is especially
important for intermittent failures where one is looking for a pattern.

Save the dump file, the debug symbols and your notes until analysis is
complete.


10. Interpreting Trap Dumps.
    ------------------------

This too is just the bare bones overview.

The first goal is to decide what the code is trying to do when it traps.

Start with the Ring 0 Trap Screen Dump.

If CSLIM is all F's, the trap is in the kernel.  Experienced developers can
sometimes derive a module name from the cs:eip value, but this is beyond the
scope of this note.

If the CSLIM is not all F's, the trap is either in a device driver or 16-bit
code within the kernel.  Scan the device driver list.  Look for a matching CS
value in the strategy entry point column or a matching DS value in the Device
Header column.

If SS is E8 or 15E8, the trap is within an interrupt handler.
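
The rules of thumb above can be written down as a small Python sketch.  It
only restates the three rules and is no substitute for reading the dump.  The
function name is invented, and taking "all F's" to mean the 32-bit value
FFFFFFFF is an assumption.

```python
KERNEL_CSLIM = 0xFFFFFFFF            # "all F's" (assumed 32-bit limit)
INTERRUPT_SS_VALUES = (0x00E8, 0x15E8)


def describe_trap(cslim, ss):
    """Return first-pass notes for the CSLIM and SS trap screen values."""
    notes = []
    if cslim == KERNEL_CSLIM:
        notes.append("trap is in the kernel")
    else:
        notes.append("trap is in a device driver or 16-bit kernel code")
    if ss in INTERRUPT_SS_VALUES:
        notes.append("trap is within an interrupt handler")
    return notes


if __name__ == "__main__":
    # Values are typed in by hand from the trap screen, in hex.
    print(describe_trap(cslim=0xFFFFFFFF, ss=0x0030))
```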


11. E-mailing Trap Dumps.
    ---------------------

In general, don't e-mail a dump file to someone without asking them if they
want you to send it.

If you do need to e-mail the file, zip it up first.  This will save transmit
time and protect the dump file from corruption.  It's always a good idea to
give the zip file a useful name.  Something like DavesTrapE_20111221.zip
will help everyone remember what the zip file contains.  It's also a good
idea to include a note in the zip file describing how and why the trap
occurred along with the bldlevel output.  The zip file may get separated from
the e-mail message.  You should include the output of

  procdump query

in your note.  This will describe the type of data recorded in the dump file.

If the zip file is over 5MB or so, check with your helper before sending the
e-mail.  You may need to use a file splitter and send each chunk in a
separate e-mail, giving your helper a chance to delete each e-mail from the
server before you send the next chunk.  Most ISPs limit e-mail inboxes to
10MB and unless you send the zip file in chunks, it will never get to your
helper.

If you can arrange to FTP the zip file to your helper, this is often a
better solution.

Be careful to send the e-mail containing the dump file only to the
intended addressee.  Sending a large, unexpected e-mail to all the members of
a mailing list, some of whom may still be on dial-up, is sure to upset
someone.

12. Known Restrictions.
    -------------------

The dump volume must be visible to the BIOS.  This usually means it must
be on the first or second drive and must be below the 1024 cylinder boundary
unless the BIOS supports the Int13 extensions.

In a mixed IDE/SCSI system, the driver for the boot volume must load first.
This applies even when booting from IDE and when the SCSI drive contains no
bootable devices.  Otherwise, OS2DUMP will hang.

The standard trap dump facility is limited by the 2GiB FAT volume limit. If 
you have more than 2GiB of RAM, you will need to install and configure the 
DUMPFS IFS.

There are issues with AHCI capable controllers.  Depending on the BIOS and 
AHCI controller, the trap dump facility may not be compatible with the 
os2ahci.add driver.  The BIOS must be able to reset the controller to a 
state where real-mode int13 disk IO can work.  If this is not possible, the 
workaround is to use the danis506 driver, if the system supports it.

Good luck.

Steven
