
Process Dump Reference                                    26 Jan 2012
----------------------                               Steven H. Levine
                                                steve53@earthlink.net


0.  Introduction.
    -------------

This note explains how to set up and record process dumps and how to do
basic dump analysis.


1.  Setup.
    ------

There are several ways to enable and configure the process dump facility.
Pick the one that best suits your needs.

To enable the process dump facility permanently, add the following line to
CONFIG.SYS

  DUMPPROCESS=x

where x is the drive where you want to store the dump files.  After you
reboot, the kernel will generate a dump file for every program that traps.
The file will be stored in the root of the drive.

Depending on how the program uses memory, dump files can be very large so
pick a drive with sufficient free space.  For best results pick the JFS or
HPFS drive with the most free space.

For more information on the DUMPPROCESS command, type

  view cmdref dumpprocess

from the command line.

To activate the process dump facility from the command line issue the
command

  procdump on /l:pathname

where pathname is the drive or directory where you want to store the dump
files.

After you enter this command, the kernel will generate a dump file for every
program that traps.  This setting will stay in effect until you turn it off
or until you reboot.

To create a one-time dump of a running program, issue the commands

  procdump on /l:pathname
  procdump force /proc:processname
  procdump off

where pathname is the drive or directory where you want to store the dump
file and processname is the name of the process to be dumped.  After you
enter these commands, the kernel will generate a single dump file for the
named process.

The default settings generate a relatively small dump file, but might not
include sufficient detail for analysis.  If you need all the process memory
included in the dump file, enter the command

  pdumpusr private,update

before generating the dump file.

If you need all the shared memory included in the dump, enter the command

  pdumpusr private,shared,update

before generating the dump file.

There are cases where the dump facility will hang or trap when attempting to 
dump shared memory.  If this occurs, try the alternate command

  pdumpusr paddr(all),update

This will generate a dump file slightly larger than the quantity of physical 
memory, but it is usually sufficient to generate a usable dump file.

For more information on the pdumpusr options, see

  \os2\system\ras\procdump.doc


2.  Recording Process Dumps.
    ------------------------

Once you have enabled the dump facility, the kernel will generate a dump
file for each trap that occurs or whenever you use the FORCE option.  If you
are getting frequent traps, turn off the dump facility as soon as possible
to avoid filling up your disk.

To turn off the dump facility from the command line enter the command

  procdump off

If you enabled the dump facility from CONFIG.SYS, don't forget to REM out the
line in CONFIG.SYS when you no longer need to collect the dump files.


3.  Preparing to Use the PM Dump Facility - Overview.
    -------------------------------------------------

The PM Dump Facility (PMDF) is a generic tool which needs to be configured
to understand the dump files generated by a specific kernel version.

PMDF is configured with a set of files known as Dump Symbols which are a
combination of programs and data files.

The program files (DF_RET.EXE and DF_DEB.EXE) are invoked by PMDF to retrieve
data from the dump file. These programs understand the layout of the kernel
data structures and are typically specific to a range of kernel revisions.

The data files include System Definition (.sdf) files and Symbol (.sym)
files.  These are called a Symbol File Set.

The System Definition files configure df_ret and df_deb and are typically
specific to a single kernel revision.

The Symbol files contain data used to translate binary addresses within an
executable to symbolic labels. These files are typically specific to one
version of the executable.

This means that a set of Dump Symbols is typically kernel version specific,
FixPak specific and application version specific.

Assuming a standard install of the Dump Facility, the Dump Symbols sets are
stored in subdirectories of \os2\pdpsi\pmdf on your boot volume and
identified by the index file \os2\pdpsi\pmdf\pmdfvers.lst.

The pmdfvers.lst index file cross references kernel version to subdirectory
names.

Note that the Dump Facility and the Kernel Debugger use the same set of
symbol files, but the requirements differ as to where the symbol files must
be located.  The Dump Facility requires that the symbol files be placed in a
directory that can be found via pmdfvers.lst.

The Kernel Debugger requires that the symbols be stored in the same
directory as the associated executable.  For example, os2krnl.sym must be
located in the root of the boot volume along with os2krnl.

The reason for this is that the Kernel Debugger runs on the Machine Under
Test (MUT) and may need access to the symbol files before the file system
drivers are loaded.  This places some limits on where the symbol files can be
located and the easiest solution was to require the symbol files to be in
the same directory as the associated executable.

PMDF is often run on a system other than the MUT which generated the dump.
Using pmdfvers.lst to locate the correct symbols for the dump allows PMDF to
be used to analyze dumps captured on systems other than the system running
PMDF.  Pmdfvers.lst will have a line defining where to find the symbols for
each specific kernel revision to be analyzed.


4.  Preparing to Use the PM Dump Facility - pmdfvers.lst.
    -----------------------------------------------------

Each pmdfvers.lst entry is a single line of the form

  directory;version;comment

or

  directory:version:comment

Directory is the data directory that contains the df_ret.exe and symbol
files for the indicated kernel version.  This directory must be a
subdirectory of \os2\pdpsi\pmdf.

A typical entry is

  warp45_s;14.105_SMP;eComStation 2.1 standard install

which states that warp45_s is the data directory that contains the
df_ret.exe and symbol data files for the 14.105 SMP kernel.  The version
string is case sensitive and must match the kernel revision exactly.
Anything after the second delimiter is a comment to help you remember what
the other two values mean.
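
As an illustration only, the entry format can be parsed with a few lines of
Python.  This is a sketch, not part of PMDF.  The function name parse_entry
is hypothetical, and the sketch assumes that the delimiter is whichever of
';' or ':' appears first in the line.

```python
def parse_entry(line):
    """Split one pmdfvers.lst line into (directory, version, comment)."""
    line = line.rstrip("\r\n")
    # Assumption: the delimiter is whichever of ';' or ':' occurs first.
    positions = [p for p in (line.find(";"), line.find(":")) if p >= 0]
    if not positions:
        raise ValueError("no delimiter in entry: %r" % line)
    delim = line[min(positions)]
    # Split at most twice so the comment may contain the delimiter.
    parts = line.split(delim, 2)
    while len(parts) < 3:
        parts.append("")          # the comment field is optional here
    return tuple(parts)
```

The version field is returned unchanged because the match against the
kernel revision is case sensitive.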

If you are not sure of what kernel revision you are running, use your
favorite hex editor and search for the string "Internal revision" without
the quotes.  The string that follows is the value PMDF uses to match the
version string in pmdfvers.lst. You can find similar information with

  bldlevel \os2krnl

but the revision number reported in the build level string is often not in
exactly the format that PMDF is looking for.
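
If you do not have a hex editor handy, the search can be sketched in
Python.  This is an illustration under assumptions, not a supported tool.
The function name revision_after_marker is hypothetical, and the sketch
assumes the revision is the run of printable ASCII characters that
immediately follows the marker string.  Compare the result by eye with what
a hex editor shows.

```python
import string

MARKER = b"Internal revision"
# Printable ASCII, excluding tabs, line ends and other control whitespace.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")

def revision_after_marker(data):
    """Return the printable text that follows the marker, or None."""
    start = data.find(MARKER)
    if start < 0:
        return None
    pos = start + len(MARKER)
    end = pos
    # Assumption: the revision ends at the first non-printable byte.
    while end < len(data) and data[end] in PRINTABLE:
        end += 1
    return data[pos:end].decode("ascii").strip()
```

To use it, read the kernel file in binary mode, for example with
open(r"\os2krnl", "rb"), and pass the bytes to the function.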

Check if \os2\pdpsi\pmdf\pmdfvers.lst exists.  If it does not, create it
using the following as a template

  warp45_s;14.105_SMP;WSeB/ACP SMP
  warp45_u;14.105_UNI;WSeB/ACP UNI
  warp45;14.105_W4;Warp 4/MCP

Replace the version string with one that matches the installed kernel.

Check that the data directory for your kernel revision exists.  If not,
create it as a subdirectory of \os2\pdpsi\pmdf.

Check the data directory for an existing Symbol File set.  For a retail
kernel, at a minimum, the set will include

  df_ret.exe
  os2krnlr.sym
  os2krnl.sdf
  doscall1.sym

Depending on the issue, you may need additional symbol files.
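
The check for the minimum retail set can be sketched in Python as follows.
This is only an illustration.  The names REQUIRED_RETAIL and missing_files
are hypothetical, and names are compared without regard to case because
HPFS and JFS do not distinguish case.

```python
# Minimum Symbol File Set for a retail kernel, as listed above.
REQUIRED_RETAIL = ("df_ret.exe", "os2krnlr.sym", "os2krnl.sdf", "doscall1.sym")

def missing_files(present):
    """Return the required names that are absent, ignoring case."""
    have = {name.lower() for name in present}
    return [name for name in REQUIRED_RETAIL if name not in have]
```

Pass it a directory listing, for example the result of
os.listdir(r"c:\os2\pdpsi\pmdf\warp45_s").  An empty list means the
minimum set is present.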


5.  Preparing to Use the PM Dump Facility - eComStation Details.
    ------------------------------------------------------------

If you are running eComStation 1.2 or newer, the dump symbols may already be
installed.  Check the data directory named in the pmdfvers.lst entry for your
kernel version.

If not, they are available on your installation CD in the
\os2image\debug and the \os2image\fi\sysmgmt directories.  They are also
available from your account at the eComStation website (www.ecomstation.com).

If you need to copy files to the PMDF data directory, copy the files to the data
directory and not a subdirectory of the data directory.  This is
important.  PMDF only searches the data directory.

Continue with step 7.


6.  Preparing to Use the PM Dump Facility - Warp4/WSeB details.
    -----------------------------------------------------------

Depending on what components you installed when you installed Warp or WSeB on
the box where the trap occurred, you might already have the files needed to
examine the dump installed.  If not, the following examples describe how to
get the files you need and how to install the files so that PMDF can use
them.

The examples that follow assume

  FixPak 15
  Kernel version 14.062
  Boot volume c:

Be sure to replace the values shown in the examples with values that match
your specific system.  This applies to kernel revision numbers, boot volume
letters and other values that are specific to your system.  Add pathname
prefixes as needed to match where you have stored the files.

If you are not sure of what kernel revision you are running, use your
favorite hex editor and search for the string "Internal revision" without the
quotes.  The string that follows is the value PMDF uses to match the kernel
to the symbols.  You can find the same information with

  bldlevel c:\os2krnl

but the revision number reported in the build level string is usually not in
exactly the format that PMDF is looking for.

Go to

  ftp.software.ibm.com/ps/products/os2/fixes/debug

and download m015dmp.zip. This is the symbols zip file for FP15 and kernel
version 14.062.

If you are using a testcase kernel, the dump symbols will be at

  testcase.boulder.ibm.com/ps/fromibm/os2

and will be named

  dfyyyymmdd.zip

where yyyymmdd matches the testcase kernel date.  When you download a testcase
kernel, be sure to download the symbols file at the same time.  Otherwise,
it might be gone when you need it.

There are a few sites that archive copies of the testcase kernels.  One
such site is

  http://www.os2site.com/sw/upgrades/kernel/

Create the subdirectory

  c:\os2\pdpsi\pmdf\warp4.15

Unzip the contents of m015dmp.zip into this subdirectory with

  unzip -j m015dmp.zip -d c:\os2\pdpsi\pmdf\warp4.15

This will put all the files in m015dmp.zip into the data directory.  This is
important.  If you do not use the -j option or its equivalent, PMDF will
not be able to find the symbols.

The zip files contain subdirectories because they are structured for use with
either the Kernel Debugger or PMDF.  As explained previously, the Kernel
Debugger requires that the symbol files be in the same directory as the
associated executable.  The zip files contain subdirectory information so
that the files will be placed in the correct directory when unzipped to the
root of the boot volume.  For PMDF, the symbol files must be in a
data directory named in pmdfvers.lst.
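
If your unzip tool has no equivalent of the -j option, the same flattening
can be sketched with the Python zipfile module.  This is an illustration,
not a replacement for unzip.  The function names flat_name and extract_flat
are hypothetical.  As with unzip -j, two members with the same file name
end up as one file, with the later member overwriting the earlier one.

```python
import os
import zipfile

def flat_name(member):
    """Return the file name of a zip member without its directory part."""
    # Zip archives use '/' separators; accept '\\' as well to be safe.
    return member.replace("\\", "/").rsplit("/", 1)[-1]

def extract_flat(zip_path, dest_dir):
    """Extract every file in zip_path directly into dest_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = flat_name(info.filename)
            if not name:              # directory entry, nothing to write
                continue
            target = os.path.join(dest_dir, name)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
```

The destination directory must already exist.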

7.  Preparing to Use the PM Dump Facility - Other Applications.
    -----------------------------------------------------------

If your application came with map files and no symbol files, the mapsym
utility may be able to create symbol files from the map files.  Mapsym is
available with most compilers.  If you need usage help, just type

  mapsym

from the command line.  Copy the symbol files to the PMDF data directory.

If you don't have either map or symbol files for your application, it may be
a bit more difficult to analyze the dump file, but PMDF will still work.


8.  Analyzing Process Dumps.
    ------------------------

This is the bare bones.  Start PMDF.  For Warp4, you should have an object in
the Problem Determination Tools folder.  For eCS, the object should be in
the Utilities folder. If you don't have an object, you can start PMDF from
the command line.  If you didn't install PMDF, you will need to run Selective
Install and install the Problem Determination Tools and reapply the last
FixPak you installed (for example, FP15).

Open the dump file from the PMDF File menu.  PMDF should find the matching 
.sym files, using the data in pmdfvers.lst.

If for some reason PMDF cannot match up the dump file and the .sym files,
it will prompt you to select a .sym file set from the sets defined in
pmdfvers.lst.  Since you configured PMDF above, this should not occur.  You
can try selecting one of the available .sym file sets, but the results will
be unpredictable.  PMDF may misinterpret the dump file content or it may even
trap.

If PMDF appears to trap before selecting a .sym file set, check pmdfvers.lst 
and make sure it contains no blank lines.
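
A quick way to look for blank lines is sketched below in Python.  This is
an illustration only.  The function name blank_line_numbers is
hypothetical, and the sketch treats a line containing only spaces or tabs
as blank, which is an assumption about what upsets PMDF.

```python
def blank_line_numbers(text):
    """Return the 1-based numbers of lines that are empty or whitespace."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if not line.strip()]
```

Read pmdfvers.lst as text and pass the content to the function.  An empty
list means no blank lines were found.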

Select Synopsis -> Trap Screen from the Analyze menu.

Select Thread -> Call Gate from the Analyze menu.

If the thread is in a Call Gate, select Thread -> Ring 0 Stack Trace from 
the Analyze menu.

Select Thread -> Ring 3 Stack Trace from the Analyze menu.

Enter the commands

  r
  ln
  u eip-20 eip
  k
  dd ebp
  dd esp
  db esp

in the PMDF command line at the bottom of the window.  Press the Enter key 
after each command.

If the r command does not report the same cs:eip as shown on the Trap 
Screen, repeat the above commands substituting the numeric cs:eip value from 
the Trap Screen for eip, the numeric ss:ebp value from the Trap Screen 
for ebp and the numeric ss:esp value from the Trap Screen for esp.

Select Process -> Open Files from the Analyze menu.

Select Process -> Module Table from the Analyze menu.

Select Save Output from the File menu and save the window contents to a 
file.

If you don't understand what you are seeing, you will have to find someone to
help you interpret the content of the dump file.  Ask questions and let your
helper guide you.  Be prepared to spend some time working with
your helper to understand the cause of the trap.

The bare bones information you generated is just a starting point.  It may
or may not be sufficient to identify the source of the trap.  Unless you are
lucky, your helper will request additional output and may ask you to generate
another dump file using different settings.

Often, your helper will want you to send a copy of the dump file and the
debug symbols.

It is a good idea to keep notes describing how each dump file was generated
and what you were doing when the dump file was generated.  This is especially
important for intermittent failures where one is looking for a pattern.

Save the dump file, the debug symbols and your notes until analysis is
complete.


9.  Interpreting Process Dumps.
    ---------------------------

This too is just the bare bones.

The goal is to understand what the code was trying to do when the exception
occurred, the data it was operating on and what went wrong.

Start with the Call Gate information.  If the code is in a Call Gate, the 
kernel was processing some request when the exception occurred.  This may or 
may not represent a kernel defect.  The kernel attempts to validate any data
it is passed, but in practice complete validation is impossible.  There are
just too many variables.

If the code is not in a Call Gate, the problem is within the application.  
The most common causes of application exceptions are buffer overruns or 
indexing errors.

If the exception is in the kernel, look first at the Ring 0 Stack Trace for 
useful clues.  The function names are often a good hint as to what the code 
is trying to do.

If the exception is in the application, look first at the Ring 3 Stack Trace
for useful clues.

Sometimes seeing the stack content in other formats can be useful.  The dd
command displays memory content as double words.  The db command displays
memory content as bytes and characters.  The ds command displays memory
content as 0 terminated strings.  Look for recognizable strings.

At times you might want to display larger amounts of data.  For example

  dd esp l400

displays 400 hex (i.e. 1024) double words starting at esp.
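
The arithmetic behind the length value can be checked with a short Python
sketch.  The function name dump_span is hypothetical.  The sketch assumes,
as described above, that the length is a hexadecimal count of double words
and that a double word is 4 bytes.

```python
DWORD_BYTES = 4                      # a double word is 32 bits

def dump_span(length_hex):
    """Return (dword_count, byte_count) for a hex length such as '400'."""
    count = int(length_hex, 16)
    return count, count * DWORD_BYTES
```

For the example above, dump_span("400") gives 1024 double words, which is
4096 bytes of memory.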

10.  E-mailing Process Dumps.
     ------------------------

In general, don't e-mail a dump file to someone without asking them if they
want you to send it.

If you do need to e-mail the file, zip it up first.  This will save transmit
time and protect the dump file from corruption.  It's always a good idea to
give the zip file a useful name.  Something like DavesTrapE_20120126.zip
will help everyone remember what the zip file contains.  It's also a good
idea to include a note in the zip file describing how and why the trap
occurred along with the bldlevel output.  The zip file may get separated from 
the e-mail message.  You should include the output of

  procdump query

in your note.  This will describe the type of data recorded in the dump file.

If the zip file is over 5MB or so, check with your helper before sending the
e-mail.  You may need to use a file splitter and send each chunk in a
separate e-mail giving your helper a chance to delete each e-mail from the
server before you send the next chunk.  Most ISPs limit e-mail inboxes to
10MB and unless you send the zip file in chunks, it will never get to your
helper.
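
If you do not have a file splitter, the idea can be sketched in Python.
This is an illustration, not a recommendation of a particular tool.  The
function names split_chunks and join_chunks are hypothetical, and the chunk
size is something to agree on with your helper.

```python
def split_chunks(data, chunk_size):
    """Split bytes into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[start:start + chunk_size]
            for start in range(0, len(data), chunk_size)]

def join_chunks(chunks):
    """Rebuild the original bytes from the pieces, in order."""
    return b"".join(chunks)
```

The helper must join the pieces in the same order they were produced, so
number the chunk files when you write them out.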

If you can arrange to FTP the zip file to your helper, this is often a
better solution.

Be careful to send the e-mail containing the dump file only to the
intended addressee.  Sending a large, unexpected e-mail to all the members of
a mailing list, some of whom may still be on dial-up, is sure to upset
someone.


Good luck.

Steven
