How to DEBUG your code?
II.  SUGGESTIONS FOR DEBUGGING SCIENTIFIC PROGRAMS THAT RUN SERIALLY.

     The following section is designed to provide suggestions for debugging programs written for scientific 
applications.  However, many of the suggestions will apply to debugging other types of applications. This 
section deals only with debugging programs that are running serially and not in parallel.

     Programs can appear to be bug-free with some sets of data because not every path through the program 
is necessarily executed.  To debug your program, test it with a variety of different data sets so that 
(ideally) every path in your program is exercised and checked for errors.
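     As an illustration (a hypothetical C function, not part of the original text), the routine below has three 
paths.  Test data that always produces a positive discriminant would never exercise the other two, so a bug in 
either of them would stay hidden.

```c
/* Hypothetical example: number of distinct real roots of a*x^2 + b*x + c = 0,
   assuming a != 0. Each branch is a separate path through the code, so the
   test data must include cases that reach all three. */
int count_real_roots(double a, double b, double c)
{
    double disc = b * b - 4.0 * a * c;   /* discriminant */

    if (disc > 0.0)
        return 2;                        /* path 1: two distinct real roots */
    else if (disc == 0.0)
        return 1;                        /* path 2: one repeated root */
    else
        return 0;                        /* path 3: no real roots */
}
```

     The inputs (1, -3, 2), (1, 2, 1) and (1, 0, 1) reach the two-root, one-root and no-real-root paths 
respectively; a data set covering only the first would leave two thirds of the routine untested.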
   
     Now assume that your program compiles and produces an executable, but when it runs it either does not 
complete or completes with wrong answers.  Working carefully through the following steps will help you find 
many commonly occurring bugs.  Unless indicated otherwise, all compiler options mentioned in this section 
are valid for the SGI MIPSpro Fortran/77, Fortran/90, C and C++ compilers.


     STEP 1. Using lint 

     If your program is written in C, run the lint utility, which reports suspicious constructs and likely errors 
in your code that the compiler may accept silently.  If your C program is in the file prog.c, invoke lint with:

          lint prog.c

     The output from the above invocation will be directed to your screen.

     There is a public domain version of lint for Fortran77 called ftncheck that can be obtained from 
the /pub directory at the anonymous ftp site ftp.dsm.fordham.edu at Fordham University. 


     STEP 2.  Check for out-of-bounds array accesses

     A common programming error is the use of array indices outside their declared limits.  For help 
finding these errors in your Fortran 90 program, compile as follows (currently there is no such support 
for the C/C++ compilers):

          f90 -g -DEBUG:subscript_check:trap_uninitialized:conform_check prog.f90

     Then run the generated executable by itself or under dbx or cvd (in cvd, click on the RUN button).  The 
"subscript_check" option enables bounds checking.  The "trap_uninitialized" option causes the program to 
detect when the value of a variable is used before it has been set.  The "conform_check" option enables 
conformance checking of array operands in array expressions (that is, checking that they have the same shape).
     (See the DEBUG_group man page for more information on these options.)

          (a)  If you are running a C or C++ program, your program will stop at the first occurrence of an 
array index going out of bounds.  You can now examine the value of the index that caused the problem 
using any of the methods described in section I, part C.  Compiling with the -g option causes the compiler 
to generate symbolic debugging information so that your program can be run under cvd.  It also disables 
optimization, and sometimes disabling optimization makes the bug disappear.  If this happens, you 
should still go through each of these steps as carefully as you can.

          (b)  If you are using the Fortran/90 compiler, compile with the above options, start the generated 
executable under cvd, and enter the following in the cvd command pane:

                    cvd>  stop in __f90_bounds_check

               Now click on the RUN button.  Next click on VIEWS and select Call Stack and double click 
on the function/subroutine immediately below __f90_bounds_check. This will cause the source code 
for this function/subroutine to be displayed in the Main View Window and the line where cvd has stopped 
will be highlighted.  You can now find the value of the index which caused the out-of-bounds problem.  

          (c)  If you are using the Fortran/77 compiler, compile with the same options (using f77 in place of 
f90), start the generated executable under cvd, and enter the following in the cvd command pane:

                    cvd>  stop in s_rnge

               Now click on the RUN button.  Next click on VIEWS and select Call Stack and double click on 
the function/subroutine immediately below s_rnge. This will cause the source code for this function/subroutine 
to be displayed in the Main View Window and the line where cvd has stopped will be highlighted.  You can 
now find the value of the index which caused the out-of-bounds problem.  

               Note:
               For Fortran programs, bounds checking cannot be done in subprograms if arrays passed to 
a subprogram are declared with extents of "1" or "*" instead of passing in their sizes and using this 
information in their declarations.  An example of how the declarations should be written to allow 
for bounds checking is:

                    SUBROUTINE SUB(A,LDA,N, ...)
                    INTEGER LDA,N
                    REAL A(LDA,N)
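     When the compiler cannot insert bounds checks for you (the text above notes this for C/C++), the check 
can be written by hand.  The C sketch below mirrors the Fortran declaration REAL A(LDA,N): it assumes 
column-major storage and 1-based indices and, as in the Fortran example, it needs LDA and N to be passed in.  
The names in_bounds, checked_elem and A_ELEM are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Nonzero if the 1-based subscript (i,j) lies inside an LDA-by-N array. */
int in_bounds(int lda, int n, int i, int j)
{
    return i >= 1 && i <= lda && j >= 1 && j <= n;
}

/* Address of A(i,j) for a column-major array declared like REAL A(LDA,N).
   An out-of-bounds subscript is reported with the caller's file and line,
   and the program aborts at that first bad access. */
float *checked_elem(float *a, int lda, int n, int i, int j,
                    const char *file, int line)
{
    if (!in_bounds(lda, n, i, j)) {
        fprintf(stderr, "%s:%d: subscript (%d,%d) outside (1:%d,1:%d)\n",
                file, line, i, j, lda, n);
        abort();
    }
    /* Column j starts (j-1)*lda elements in; row i is i-1 further on. */
    return &a[(size_t)(j - 1) * (size_t)lda + (size_t)(i - 1)];
}

/* A_ELEM(a, lda, n, i, j) behaves like the Fortran element A(i,j). */
#define A_ELEM(a, lda, n, i, j) \
    (*checked_elem((a), (lda), (n), (i), (j), __FILE__, __LINE__))
```

     For example, with float a[2 * 3], A_ELEM(a, 2, 3, 2, 3) = 5.0f stores into a[5], while 
A_ELEM(a, 2, 3, 3, 1) aborts with a message instead of silently touching the wrong element.  Running the 
aborted program under a debugger then shows the offending call on the call stack.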


   
     STEP 3.  Check for uninitialized variables being used in calculations

     To find uninitialized REAL variables being used in floating point calculations, compile your program with

          -g -DEBUG:trap_uninitialized=ON

     This forces all uninitialized stack, automatic and dynamically allocated variables to be initialized 
with 0xFFFA5A5A.  When this value is used as an operand in a floating point calculation, it is interpreted 
as a floating point NaN and causes a floating point trap.  When it is used as a pointer or as an address, 
a segmentation violation may occur.  For example, if x and y are real variables and the program is compiled 
as above,

          x = y

     will not be detected when y is uninitialized, because the assignment only copies the value and performs 
no floating point calculation.  However, the following will be detected:

          x = y + 1.0

     After compiling your program with the above options, enter

          cvd <executable>

     and then click the RUN button.  To find out where your program has stopped, click on VIEWS and 
select Call Stack, where you will see that many system routines have been called.  Double click on the 
highest routine in the call stack that is clearly not a system routine.  This will display the source code 
for that routine, with the line where the first uninitialized variable (subject to the conditions above) was 
used.  You can now examine the values of the variables involved using any of the methods described in 
section I, part C.

     This technique cannot detect the use of uninitialized INTEGER variables, because integer arithmetic 
does not trap on the fill value.
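     The idea behind trap_uninitialized can be imitated in C, as a rough sketch assuming IEEE 754 single 
precision: fill storage with the 0xFFFA5A5A pattern described above and later look for NaNs.  Whether arithmetic 
on this NaN actually traps depends on the hardware and its floating point settings (on many current processors 
this pattern is a quiet NaN, which propagates silently), so the sketch tests with isnan rather than relying on a 
trap.  The function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* The fill pattern 0xFFFA5A5A read as a float: sign bit set, exponent bits
   all ones, nonzero fraction, so it is a NaN. */
float poison_float(void)
{
    uint32_t bits = 0xFFFA5A5Au;
    float f;
    memcpy(&f, &bits, sizeof f);   /* copy the bits; no numeric conversion */
    return f;
}

/* Fill n floats with the pattern, as the compiler option does for
   uninitialized storage. */
void poison_fill(float *v, size_t n)
{
    for (size_t k = 0; k < n; k++)
        v[k] = poison_float();
}

/* Count entries that are NaN, e.g. ones never assigned after poison_fill
   or computed from such an entry. */
size_t count_nan(const float *v, size_t n)
{
    size_t bad = 0;
    for (size_t k = 0; k < n; k++)
        if (isnan(v[k]))
            bad++;
    return bad;
}
```

     As in the x = y example above, copying a poisoned value just passes the NaN along.  The difference is 
that nothing stops the program here, so count_nan (or isnan on a single result) has to be called at points 
where every value should already have been set.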


     STEP 4.  Finding Divisions by Zero and Overflows

     A. To find floating point divisions by zero and overflows, first enter

             setenv TRAP_FPE ON

        if you are using the csh or tcsh shell (in sh, ksh or bash, enter TRAP_FPE=ON; export TRAP_FPE).  
        Next compile your program with -g and link with -lfpe:

             -g -lfpe
  
        and then enter

             cvd <executable>

        In the cvd command/message pane enter

             cvd> stop in __catch

        Click on the RUN button, select Call Stack from VIEWS, and then double click on the highest routine 
that is not a system routine.  The line where execution stopped will now be highlighted in the source code 
display area of the cvd Main View window.  You may now use any of the methods in section I, part C to 
examine variable values and discover why the division by zero or overflow occurred.  For more information on 
handling floating point exceptions, see the handle_sigfpes man page.
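     Where trapping is not available, a program can also check a suspicious division after the fact.  The C 
sketch below (checked_div and its status codes are hypothetical names) relies on IEEE 754 default behavior, in 
which dividing by zero or overflowing produces an infinity or NaN instead of stopping the program, so a 
non-finite result from finite operands signals one of the two problems.  C99's <fenv.h> (feclearexcept and 
fetestexcept) offers a more precise, flag-based alternative.

```c
#include <math.h>

enum div_status { DIV_OK = 0, DIV_BY_ZERO = 1, DIV_OVERFLOW = 2 };

/* Compute a / b and classify a non-finite result from finite operands.
   With IEEE 754 defaults, x/0 gives +-Inf (0/0 gives NaN) and an
   overflowing quotient gives +-Inf; neither stops the program. */
enum div_status checked_div(double a, double b, double *result)
{
    *result = a / b;
    if (isfinite(a) && isfinite(b) && !isfinite(*result))
        return (b == 0.0) ? DIV_BY_ZERO : DIV_OVERFLOW;
    return DIV_OK;
}
```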

     B. To find integer divisions by zero, compile your program as

             -g -DEBUG:div_check=1

        and enter

             cvd <executable>

        Click the RUN button and the program will automatically stop at the first line where an integer divide 
by zero occurs.  You may now use any of the methods of section I, part C to examine variable values and 
discover why the division by zero occurred.
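     Conceptually, the effect of -DEBUG:div_check is a test of the divisor before each integer division.  The 
same guard can be written by hand.  In the C sketch below (checked_idiv is a hypothetical name), INT_MIN / -1 is 
also rejected, because that quotient overflows; it is the other case in which C integer division is undefined.

```c
#include <limits.h>

/* Guarded integer division: stores a / b in *q and returns 0, or returns -1
   without dividing when the division would be undefined in C. */
int checked_idiv(int a, int b, int *q)
{
    if (b == 0)
        return -1;                 /* integer division by zero */
    if (a == INT_MIN && b == -1)
        return -1;                 /* quotient would overflow int */
    *q = a / b;                    /* C truncates toward zero */
    return 0;
}
```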


     STEP 5.  A core file is produced

     Sometimes a program stops before completing execution and produces a core file, which is written to 
your working directory under the name 'core'.  You can find the place in your program where execution 
stopped and the core file was produced by entering

          cvd <executable> core

     where <executable> is the executable that you were running.  The cvd Main View window will come up 
and the source line where execution stopped may be highlighted in green.  If it is not, select Call Stack 
under VIEWS and double-click on the highest routine that is not a system routine.  This will bring up the 
source code for that routine with the last line executed highlighted in green.  If the executable was built 
with the -g option, you can also view the values of program variables at the point where execution stopped.  
You can find the assembly instruction where execution stopped by clicking on VIEWS and selecting Disassembly 
View.  Remember that this is only the last statement executed before the core file was produced; the bug in 
your program is not necessarily on this line.  For example, a program variable may have been initialized 
incorrectly, but the core file was not produced until that variable was used later in the program.

     Some machines are configured not to produce core files.  To find out whether this is the case on the 
machine you are using, enter the following in csh or tcsh (in ksh or bash, use ulimit -c instead):

          limit

     If the limit on coredumpsize is zero, no core file will be produced.  If it is too small to hold the 
program's memory image, the core file will be truncated and will not be usable.  To allow useful core files 
to be produced, enter the following in csh or tcsh (in ksh or bash, use ulimit -c unlimited):

          unlimit coredumpsize
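     The limit and unlimit commands act on the shell's resource limits, which a C program can also query and 
change for itself through the POSIX getrlimit and setrlimit calls with RLIMIT_CORE.  A minimal sketch 
(core_limit and raise_core_limit are hypothetical names); as with unlimit, an ordinary user can raise the soft 
limit only as far as the hard limit.

```c
#include <sys/resource.h>

/* Current soft limit on core file size in bytes (RLIM_INFINITY means
   unlimited). A value of 0 means no core file will be written; an error
   is also reported as 0. */
rlim_t core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}

/* Raise the soft core-size limit to the hard limit, roughly what
   "unlimit coredumpsize" does for an ordinary user. Returns 0 on success. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;     /* never above the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

     A program could call raise_core_limit at startup so that a crash leaves a usable core file without 
depending on the user's shell settings.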


     STEP 6. Incorrect answers are being produced

     Assume that the above steps have been taken and that all problems they can detect have been corrected.  
Your program now completes execution but obtains incorrect answers.  What you do at this point will depend 
on the particulars of your program.  The following are some commonly used debugging procedures that may 
or may not apply to your situation.

     1.  Try running your program on a very small problem for which you can easily work out intermediate 
results.  Run your program under cvd on this small problem and compare its results with the known correct ones.

     2.  If you know that a certain answer being calculated is not correct, set breakpoints in your program 
so you can monitor the value of the answer at various points in your program.

     3.  You may want to set breakpoints on each call to a selected function/subroutine where you suspect 
there may be problems (see section I, part C).

     4.  Debugging COMMON blocks and EQUIVALENCE statements in Fortran.
         The variables in a COMMON block must have exactly the same type and dimension, in exactly the same 
order, everywhere the block appears.  For Fortran/77 programs, ftncheck will normally find violations of this 
rule.  Even so, for Fortran/77 programs it is best to put each COMMON block in a file brought in with an include 
statement, so that every program unit sees the same declaration.  For Fortran/90 programs, it is best to replace 
each COMMON block with a module.  It is best not to use EQUIVALENCE statements at all.

     5.  Local data not saved.  In Fortran, the values of local variables are not guaranteed to be kept from one 
execution of a subprogram to the next unless they are initialized in their declarations or declared with the 
SAVE attribute.  Some compilers/machines automatically give all local variables the SAVE attribute, so a program 
that works with such a compiler may fail when moved to one that does not.  If a local variable must keep its 
value between calls, give it the SAVE attribute explicitly.
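     The SAVE behavior in item 5 has a close C counterpart: a local variable declared static keeps its value 
between calls, while an ordinary (automatic) local does not.  A minimal sketch with a hypothetical function name:

```c
/* Counts how many times it has been called. Like a Fortran local with the
   SAVE attribute, count is initialized once and keeps its value between
   calls. Without "static", count would be set to 0 again on every call. */
int calls_so_far(void)
{
    static int count = 0;
    count++;
    return count;
}
```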

Source: http://andrew.ait.iastate.edu/HPC

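Copyright © Key Laboratory of Regional Climate-Environment for East Asia, Chinese Academy of Sciences; Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, 100029
ICP registration: 京ICP备14024088号  Tel: 010-82995160  E-mail: webmaster@tea.ac.cn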