How to DEBUG your code? (Key Laboratory of Regional Climate-Environment for East Asia, Institute of Atmospheric Physics)

II.  SUGGESTIONS FOR DEBUGGING FOR SERIAL EXECUTION OF SCIENTIFIC PROGRAMS.

     This section offers suggestions for debugging programs written for scientific applications,
although many of the suggestions also apply to other types of applications. It deals only with
programs that run serially, not in parallel.

     A program can appear to have no bugs with some sets of data because not every path through the
program is executed. Test your program with a variety of different data sets so that, as far as
possible, every path in the program is checked for errors.
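
     As an illustration of a path that typical data never reaches, the following C sketch uses a
hypothetical routine, mean_of_positive, that is not part of the original text. A data set containing
only positive values exercises the common path; only a data set with no positive values reaches the
guarded branch, which would otherwise divide by zero.

```c
#include <stddef.h>

/* Hypothetical example: mean of the strictly positive entries of x[0..n-1].
 * The count == 0 branch only runs when the data contains no positive values,
 * so a test set made up of positive numbers alone never exercises it. */
double mean_of_positive(const double *x, size_t n, int *ok)
{
    double sum = 0.0;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0) {
            sum += x[i];
            count++;
        }
    }
    if (count == 0) {   /* rare path: an unguarded division would fail here */
        *ok = 0;
        return 0.0;
    }
    *ok = 1;            /* common path */
    return sum / (double)count;
}
```

     Calling the routine once with {1.0, 2.0, 3.0} and once with {-1.0, -2.0} covers both paths;
the first call alone would leave the guarded branch untested.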
   
     Now assume that your program compiles and produces an executable, but execution either does not
complete or completes with wrong answers. Going carefully through the following steps will help you
find many commonly occurring bugs. All compiler options mentioned in this section are valid for the
Fortran/77, Fortran/90, C and C++ compilers unless indicated otherwise.


     STEP 1. Using lint 

     If your program is written in C, use the lint utility, which helps identify problems with your
code at the compile step. If your C program is in the file prog.c, invoke lint with:

          lint prog.c

     The output from the above invocation will be directed to your screen.

     There is a public domain version of lint for Fortran77 called ftnchek that can be obtained from
the /pub directory at the anonymous ftp site ftp.dsm.fordham.edu at Fordham University.


     STEP 2.  Check for out-of-bounds array accesses

     A common programming error is the use of array indices outside their declared limits. For help
finding these errors in your Fortran 90 program, compile as follows (currently there is no such
support for the C/C++ compilers):

          f90 -g -DEBUG:subscript_check:trap_uninitialized:conform_check prog.f90

     Then run the generated executable by itself or under dbx or cvd; under cvd, click on the RUN
button. The "subscript_check" option enables bounds checking. The "trap_uninitialized" option
causes the program to detect when the value of a variable is used before it has been set. The
"conform_check" option enables conformance checking of array operands in array expressions.
(See the DEBUG_group man page for more information on this option.)

          (a)  If you are running a C or C++ program, your program will stop at the first occurrence
of an array index going out of bounds. You can now examine the value of the index that caused the
problem using any of the methods described in section I part C. Compiling with the -g option causes
the compiler to generate symbolic debugging information so that your program can be debugged under
cvd. It also disables optimization. Sometimes disabling optimization will cause the bug to disappear.
If this happens, you should still go through each of these steps as carefully as you can.

          (b)  If you are using the Fortran/90 compiler, compile with the above options, start the
generated executable under cvd, and enter the following in the cvd pane:

                    cvd>  stop in __f90_bounds_check

               Now click on the RUN button. Next click on VIEWS, select Call Stack, and double click
on the function/subroutine immediately below __f90_bounds_check. This displays the source code for
this function/subroutine in the Main View Window, with the line where cvd has stopped highlighted.
You can now find the value of the index that caused the out-of-bounds problem.

          (c)  If you are using the Fortran/77 compiler, compile with the above options, start the
generated executable under cvd, and enter the following in the cvd pane:

                    cvd>  stop in s_rnge

               Now click on the RUN button. Next click on VIEWS, select Call Stack, and double click
on the function/subroutine immediately below s_rnge. This displays the source code for this
function/subroutine in the Main View Window, with the line where cvd has stopped highlighted. You
can now find the value of the index that caused the out-of-bounds problem.

               Note:
               For Fortran programs, bounds checking cannot be done in a subprogram if the arrays
passed to it are declared with extents of "1" or "*". Instead, pass the array sizes in as arguments
and use them in the declarations. An example of declarations that allow bounds checking is:

                    SUBROUTINE SUB(A,LDA,N, ...)
                    INTEGER LDA,N
                    REAL A(LDA,N)
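
     Since the text above states that the C/C++ compilers offer no comparable bounds checking, one
option in C is to check indices by hand. The following is a minimal sketch under that assumption;
checked_get and its parameter names are hypothetical, and the array is assumed to be stored by rows
in a flat buffer. As in the Fortran declarations above, the extents are passed in explicitly so that
the check has something to compare against.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores element (i, j) in *out when the
 * indices lie inside a rows-by-cols array stored by rows in a flat buffer;
 * returns 0 and reports the offending indices otherwise. */
int checked_get(const double *a, size_t rows, size_t cols,
                size_t i, size_t j, double *out)
{
    if (i >= rows || j >= cols) {
        fprintf(stderr, "index (%zu,%zu) outside %zu x %zu array\n",
                i, j, rows, cols);
        return 0;
    }
    *out = a[i * cols + j];
    return 1;
}
```

     Setting a breakpoint on the line that prints the message gives the same effect as stopping in
the bounds-check routine: the call stack shows which caller supplied the bad index.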


   
     STEP 3.  Check for uninitialized variables being used in calculations

     To find uninitialized REAL variables being used in floating point calculations, compile your program with

          -g -DEBUG:trap_uninitialized=ON

     This forces all uninitialized stack, automatic and dynamically allocated variables to be
initialized with 0xFFFA5A5A. When this value is used as an operand in a floating point calculation,
it is treated as a floating point NaN and causes a floating point trap. When it is used as a pointer
or as an address, a segmentation violation may occur. For example, if x and y are real variables and
the program is compiled as above,

          x = y

     will not be detected when y is uninitialized, since no floating point calculation is done.
However, the following will be detected:

          x = y + 1.0

     After compiling your program with the above options, enter

          cvd <executable>

     and then click the RUN button. To find out where your program has stopped, click on VIEWS and
select Call Stack, where you will see that many system routines have been called. Double click on
the highest routine in the call stack that is clearly not a system routine. This brings up the
source code for this routine and the line where the first uninitialized variable (subject to the
above-mentioned conditions) was used. You can now examine the values of the variables that caused
the problem using any of the methods described in section I part C.

     At the present time, it is not possible to use cvd to detect the use of uninitialized INTEGER variables. 
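
     To see why the fill value behaves as described, the following C sketch decodes the bit pattern
0xFFFA5A5A as a single-precision number. It assumes that float is a 32-bit IEEE 754 type; the
function names are hypothetical. The sketch only confirms that the pattern is a NaN and that the NaN
survives both a plain copy and an addition. Whether the addition actually traps depends on the
platform and on the compiler options, which this code does not set.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision value.
 * Assumes float is a 32-bit IEEE 754 type. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined way to reinterpret bits */
    return f;
}

/* Returns 1 if the fill pattern quoted in the text decodes to a NaN that is
 * still a NaN after a copy and after an addition. */
int fill_pattern_is_nan(void)
{
    float y = float_from_bits(0xFFFA5A5Au);
    float x = y;           /* plain copy: no arithmetic is performed */
    float z = y + 1.0f;    /* arithmetic: the NaN propagates to the result */
    return isnan(x) && isnan(z);
}
```

     The exponent field of the pattern is all ones and the fraction field is non-zero, which is the
IEEE 754 encoding of a NaN.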


     STEP 4.  Finding Divisions by Zero and Overflows

     A. To find floating point divisions by zero and overflows, first enter

             setenv TRAP_FPE ON

        if you are using the csh or tcsh shell.  For other shells, see their man pages.

        Next compile your program with -g and link with -lfpe:

             -g -lfpe
  
        and then enter

             cvd <executable>

        In the cvd command/message pane enter

             cvd> stop in __catch

        Click on the RUN button, select Call Stack from VIEWS, and then double click on the highest
routine that is not a system routine. The line where execution stopped will now be highlighted in
the source code display area of the cvd Main View window. You may now use any of the methods in
section I part C to find variable values and discover why the divide by zero or overflow occurred.
For more information on handling floating point exceptions, see the man page for handle_sigfpes.

     B. To find integer divisions by zero, compile your program with

             -g -DEBUG:div_check=1

        and enter

             cvd <executable>

        Click the RUN button and the program will automatically stop at the first line where an
integer divide by zero occurred. You may now use any of the methods of section I part C to find
variable values and discover why the divide by zero occurred.
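
     Where trapping through the compiler and debugger is not available, the same conditions can be
checked explicitly in the code. The following C sketch is one way to do that and is not taken from
the original text; checked_idiv and checked_fdiv are hypothetical helpers, and the floating point
version assumes IEEE 754 arithmetic with traps disabled, so that a division by zero or an overflow
yields an infinity rather than stopping the program.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper: integer division that refuses a zero divisor and the
 * one overflowing case, INT_MIN / -1. Returns 1 on success and stores the
 * quotient in *q; returns 0 otherwise and leaves *q unchanged. */
int checked_idiv(int num, int den, int *q)
{
    if (den == 0 || (num == INT_MIN && den == -1))
        return 0;
    *q = num / den;
    return 1;
}

/* Hypothetical helper: floating point division whose result is checked.
 * Returns 1 when the quotient is finite; returns 0 when it is infinite
 * (division by zero or overflow) or NaN (for example 0.0 / 0.0). */
int checked_fdiv(double num, double den, double *q)
{
    *q = num / den;
    return isfinite(*q) ? 1 : 0;
}
```

     A breakpoint on either failure return gives a stopping point with the offending operands still
in scope.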


     STEP 5.  A core file is produced

     Sometimes a core file is produced during program execution and the program does not complete.
This file is placed in your working directory with the file name 'core'. You can find the place in
your program where execution stopped and the core file was produced by entering

          cvd <executable> core

     where <executable> is the executable that you were running. The cvd Main View window will come
up and the source line where execution stopped may be highlighted in green. If it is not highlighted
in green, select Call Stack under VIEWS and double-click on the highest routine that is not a system
routine. This brings up the source code for this routine with the last line executed highlighted in
green. If the executable was compiled with the -g option, you can view the values of program
variables at the point where execution stopped. You can find the assembly instruction where
execution stopped by clicking on VIEWS and selecting Disassembly View. Remember that this is only
the last statement executed before the core file was produced; the bug in your program is not
necessarily in this line of code. For example, a program variable may have been initialized
incorrectly, but the core was not produced until the variable was used later in the program.

     Some machines are configured not to produce a core file. To find out whether this is the case
on the machine you are using, enter (under the csh or tcsh shell)

          limit

     If the limit on coredumpsize is zero, no core file will be produced. If the limit on
coredumpsize is not large enough to hold the program's memory image, the core file produced will not
be usable. To change the configuration so that useful core files are produced, enter

          unlimit coredumpsize
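
     The same limit can also be inspected and raised from inside a C program through the POSIX
getrlimit and setrlimit calls with the RLIMIT_CORE resource. The following is a minimal sketch; the
two function names are hypothetical. An unprivileged process can raise its soft limit only as far as
the hard limit, so the second function requests exactly that.

```c
#include <sys/resource.h>

/* Returns 1 if the soft limit on core file size is non-zero, 0 if core
 * files are disabled, and -1 if the limit could not be read. */
int core_dumps_enabled(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0 ? 1 : 0;
}

/* Raises the soft core size limit to the hard limit.
 * Returns 0 on success and -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

     If the hard limit itself is zero, raise_core_limit succeeds but core files remain disabled, and
the limit has to be changed by the system administrator.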


     STEP 6. Incorrect answers are being produced

     Assume that the above steps have been taken and that all problems they can detect have been
corrected. Your program now completes execution but produces incorrect answers. What you do at this
point will depend on your circumstances. The following is a list of commonly used debugging
procedures that may or may not apply to your situation.

     1.  Try running your program on a very small problem size for which you can easily obtain
intermediate results. Run your program under cvd on this small problem and compare with the known
correct results.

     2.  If you know that a certain answer being calculated is not correct, set breakpoints in your
program so you can monitor the value of the answer at various points in your program.

     3.  You may want to set breakpoints on each call to a selected function/subroutine where you
suspect there may be problems; see section I part C.

     4.  Debugging COMMON blocks and EQUIVALENCE statements in Fortran.
         Variables used in these statements must have exactly the same type and dimension everywhere
they appear, and they must occur in exactly the same order. For Fortran/77 programs, ftnchek will
normally find these errors. However, for Fortran/77 programs it is best to use an include statement
for each COMMON block. For Fortran/90 programs, it is best to use a module for each COMMON block. It
is best not to use EQUIVALENCE statements.

     5.  Local data not saved. In Fortran, values of local variables are not guaranteed to be saved
from one execution of the subprogram to the next unless they are either initialized in their
declarations or declared to have the SAVE attribute. Some compilers/machines automatically give all
local variables the SAVE attribute, so moving a working program from such a compiler/machine to one
that does not do this may introduce this kind of bug. You should give local variables the SAVE
attribute if you would like their values saved.
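
     Item 1 of the list above can be sketched in C as follows. The routine under test,
trapezoid_x2, is a hypothetical stand-in for your own code: it applies the composite trapezoidal
rule to f(x) = x*x, for which the result on a few subintervals is easy to work out by hand. The
comparison uses a tolerance because floating point results rarely match a hand calculation bit for
bit.

```c
#include <stddef.h>

/* Hypothetical routine under test: composite trapezoidal rule for
 * f(x) = x*x on [a, b] with n equal subintervals (n >= 1). */
double trapezoid_x2(double a, double b, size_t n)
{
    double h = (b - a) / (double)n;
    double sum = 0.5 * (a * a + b * b);   /* end points carry weight 1/2 */

    for (size_t i = 1; i < n; i++) {
        double x = a + (double)i * h;
        sum += x * x;                     /* interior points carry weight 1 */
    }
    return sum * h;
}

/* Returns 1 when computed agrees with expected to within tol, else 0. */
int close_enough(double computed, double expected, double tol)
{
    double diff = computed - expected;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tol;
}
```

     On [0, 1] the hand-computed values are 0.5 for one subinterval, 0.375 for two and 0.34375 for
four. If the program disagrees with any of these, the loop can be stepped through under the debugger
and each partial sum compared with the hand calculation.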

Source: http://andrew.ait.iastate.edu/HPC
