From 56adf5d91a3e106096a612a6340ba5f527be0d16 Mon Sep 17 00:00:00 2001 From: Douglas Guptill <douglas.guptill@dal.ca> Date: Mon, 8 Jun 2009 20:56:29 +0000 Subject: [PATCH] first report --- doc/report-01 | 256 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 doc/report-01 diff --git a/doc/report-01 b/doc/report-01 new file mode 100644 index 00000000..14161293 --- /dev/null +++ b/doc/report-01 @@ -0,0 +1,256 @@ +Title: Porting douar +Author: Douglas Guptill +Date: 2009-06-04 + + +Porting to p690 +--------------- + - frustrating, time-costly + - douar behaviour varies with + + compiler (xlf 8.1, xlf 10.1) + + compiler options + + changes in noctreemax + - I wonder if the execution time and/or results are being affected + by the type mismatches at link time; see below for more. + - on the p690, using 16 processors, douar is running at 1/300 of the + speed of grace. (David Whipp ran the input.txt file on grace, and + sent me the stdout file.) + - no run yet has passed the point where douar calls wsmp. 
+ + +======== (start) output from grace =================================== + start of non-linear iteratio 1 168.0293 + ----------------------------------- + build system 168.0294 0.0001 0.0001 + nelem per proc (min/max) 27059 42683 + viscosity range 1.00000E-04 3.59904E+02 + viscosity capped in 212992 + wsmp solve 215.1212 47.0919 47.0918 +======== (end) output from grace ===================================== + + +======== (start) output from p690 (np=16) ============================ + start of non-linear iteratio 1 57001.9888 + ----------------------------------- + build system 57001.9889 0.0001 0.0001 + nelem per proc (min/max) 13522 28627 + viscosity range 1.00000E-04 3.59904E+02 + viscosity capped in 212992 + wsmp solve 61747.2775 4745.2886 4745.2886 +======== (end) output from p690 (np=16) ============================== + + + + +Porting to mahone.ace-net.ca +---------------------------- + +For more about mahone, its hardware and software, see here: + https://wiki.ace-net.ca/index.php/Main_Page + + - PGI compilers + - quick and easy + - runs about 10% faster (using 4 processes on the head node) than grace. + - no run yet has passed the point where it calls wsmp. 
+ - no successful run yet; MPI problems, as yet un-diagnosed: + [clhead:03784] *** An error occurred in MPI_Send + [clhead:03784] *** on communicator MPI_COMM_WORLD + [clhead:03784] *** MPI_ERR_COUNT: invalid count argument + [clhead:03784] *** MPI_ERRORS_ARE_FATAL (goodbye) + + +======== (start) output from mahone (np=4) =========================== + start of non-linear iteratio 1 137.5392 + ----------------------------------- + build system 137.5393 0.0000 0.0000 + nelem per proc (min/max) 54619 68563 + viscosity range 1.00000E-04 3.59904E+02 + viscosity capped in 196608 + wsmp solve 179.2639 41.7246 41.7246 +======== (end) output from mahone (np=4) ============================= + + + +Type mismatches +--------------- + +The xlf Fortran compiler on the p690 will, if asked, check for +mismatches between calling sequences and subroutine definitions. + +What do I mean by a type mismatch? An example: parameter #3 in the +calling list is a scalar, parameter #3 in the dummy argument list of +the subroutine code is an array. + +The xlf compiler found type mismatches in douar. Some I could correct +easily; others will require a much closer examination of the code to +determine a fix that doesn't break douar. There are type mismatches +in the calls to these routines: + nn2d_setup + nn2d + octree_interpolate_many + octree_interpolate_many_derivative + show + delaun + indexx + fluvial_erosion + diffusion_erosion + update_time_step + +There are also mismatches for mpi_ routines; this is common on the +p690; I believe they can be ignored. See below for the complete list. + + +The C code in NN +---------------- +(this section is for detail fanatics) + +The files stack.c, stackpair.c and volume.c had copies with a .cc +extension: + stack.c and stack.cc + stackpair.c and stackpair.cc + volume.c and volume.cc + +The actual code in each pair of files (stack.c and stack.cc) was +identical, except that in one copy there was an underscore at the end +of some routine names. 
+
+The reason for these duplicates appears to be the variation in how
+Fortran compilers link to non-Fortran routines: some append an
+underscore to the name of a non-Fortran routine, some don't.  This
+results in two possibilities at link time: a reference to, for example,
+  "stackinit"
+or to
+  "stackinit_"
+
+I believe that the duplicated code is a maintenance problem and a
+potential source of nasty bugs.  So I removed the duplicate files and
+modified the remaining copy by adding, for each routine, a stub that
+calls the other.  For example, the code for "stackinit" now looks like
+the snippet below.
+
+The C routines that had no declared return type caused link failures
+on the p690.  The cure was to declare those entry points void.
+
+================= (start) stackinit ======================
+/* prototypes */
+ void stackinit_();
+ void stackinit();
+
+/* provide an entry point with "_" at the end */
+ void stackinit_() {stackinit();}
+
+/* the code */
+ void stackinit()
+ {
+  head = (struct node *) malloc(sizeof *head);
+  z = (struct node *) malloc(sizeof *z);
+  head->next = z; head->key=0;
+  z->next = z;
+  z->key = 0;
+ }
+================= (end) stackinit ======================
+
+
+============ (start) type mismatches from the p690 ====================
+(ld): mismatch
+ld: 0711-189 ERROR: Type mismatches were detected.
+ The following symbols are in error: + Symbol Hash Inpndx TY CL Source-File(Object-File) OR Import-File{Shared-object} + ------------------------- ---------------------- ------- -- -- ------------------------------------------------------ + .mpi_reduce ** No Hash ** [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]} + ** References ** [249] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o]) + ** References Without Matching Definitions ** + Fort 1C031446 20202020 [499] ER PR (do_leaf_measurements.o) + [497] ER PR (compute_divergence.o) + Fort 883D2E6F 883D2E6F [520] ER PR (build_system_wsmp.o) + + .mpi_allreduce ** No Hash ** [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]} + ** References ** [60] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o]) + [26] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o]) + [496] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o]) + [167] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o]) + [1437] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[parsymb.o]) + ** References Without Matching Definitions ** + Fort A0C85910 20202020 [577] ER PR (update_cloud_fields.o) + [393] ER PR (move_cloud.o) + [490] ER PR (move_surface.o) + [390] ER PR (interpolate_velocity_on_surface.o) + [477] ER PR (interpolate_ov_on_osolve.o) + [538] ER PR (improve_osolve.o) + [540] ER PR (erosion.o) + [476] ER PR (compute_pressure.o) + [528] ER PR (build_system_wsmp.o) + Fort 7300E28E 20202020 [413] ER PR (refine_surface.o) + [402] ER PR (check_delaunay.o) + + .mpi_bcast ** No Hash ** [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]} + ** References ** [376] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o]) + [295] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o]) + [24] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[porder.o]) + [244] ER PR 
(/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[plda.o]) + ** References Without Matching Definitions ** + Fort D8DDC68E 20202020 [519] ER PR (solve_with_pwgsmp.o) + [528] ER PR (solve_with_pwssmp.o) + [524] ER PR (erosion.o) + Fort 4955A997 4955A997 [354] ER PR (read_input_file.o) + [272] ER PR (read_controlling_parameters.o) + [450] ER PR (create_surfaces.o) + + .nn2d_setup Fort 415F1726 DC32B44C [47] LD PR nn.f(NN/libnn_f-q64.a[nn.o]) + ** References Without Matching Definitions ** + Fort 5064C7F3 20202020 [536] ER PR (erosion.o) + [466] ER PR (create_surfaces.o) + + .nn2d Fort DAF314E0 477683E6 [182] LD PR nn.f(NN/libnn_f-q64.a[nn.o]) + ** References Without Matching Definitions ** + Fort 14D0B906 20202020 [538] ER PR (erosion.o) + + .octree_interpolate_many Fort 56666C28 C1BA756A [1853] LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o]) + ** References Without Matching Definitions ** + Fort 5C3279C7 20202020 [591] ER PR (update_cloud_fields.o) + [391] ER PR (move_cloud.o) + [480] ER PR (move_surface.o) + [388] ER PR (interpolate_velocity_on_surface.o) + Fort 8D36C986 20202020 [473] ER PR (interpolate_ov_on_osolve.o) + + .octree_interpolate_many_derivative Fort 8F7B9C4E 70645C45 [2013] LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o]) + ** References Without Matching Definitions ** + Fort 7989C8EA 20202020 [482] ER PR (move_surface.o) + + .mpi_wtime ** No Hash ** [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]} + ** References Without Matching Definitions ** + Fort D298D1B5 D298D1B5 [272] ER PR (toolbox.o) + Fort D8CCB3E4 D8CCB3E4 [501] ER PR (solve_with_pwgsmp.o) + [508] ER PR (solve_with_pwssmp.o) + + .show Fort 4955A997 0C4AC181 [738] LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o]) + ** References Without Matching Definitions ** + Fort F55DBCFA 20202020 [566] ER PR (CASCADE/libcascade-q64.a[cascade.o]) + + .delaun Fort FCC34CBE F539DC8A [34] LD PR delaun.f(NN/libnn_f-q64.a[delaun.o]) + ** 
References Without Matching Definitions ** + Fort FCC34CBE 20202020 [516] ER PR (CASCADE/libcascade-q64.a[nn_remove.o]) + [121] ER PR (NN/libnn_f-q64.a[nn.o]) + Fort 61BF4539 20202020 [210] ER PR (CASCADE/libcascade-q64.a[check_mesh.o]) + Fort F6CF495C 20202020 [126] ER PR (CASCADE/libcascade-q64.a[find_neighbours.o]) + + .indexx Fort B1C0D0F4 BDD1E49E [2404] LD PR nn.f(NN/libnn_f-q64.a[nn.o]) + ** References Without Matching Definitions ** + Fort 410FD164 20202020 [122] ER PR (CASCADE/libcascade-q64.a[find_neighbours.o]) + Fort B1C0D0F4 20202020 [1847] ER PR (NN/libnn_f-q64.a[nn.o]) + [1637] ER PR (NN/libnn_f-q64.a[nn.o]) + [1451] ER PR (NN/libnn_f-q64.a[nn.o]) + + .fluvial_erosion Fort 05493BBF 38B5F94F [15] LD PR fluvial_erosion.f(CASCADE/libcascade-q64.a[fluvial_erosion.o]) + ** References Without Matching Definitions ** + Fort 6E2E3C6A 20202020 [546] ER PR (CASCADE/libcascade-q64.a[cascade.o]) + + .diffusion_erosion Fort 9BE4E4A5 B1655465 [12] LD PR diffusion_erosion.f(CASCADE/libcascade-q64.a[diffusion_erosion.o]) + ** References Without Matching Definitions ** + Fort A199E5A4 20202020 [548] ER PR (CASCADE/libcascade-q64.a[cascade.o]) + + .update_time_step Fort B1E8249C FE6F92E6 [14] LD PR update_time_step.f(CASCADE/libcascade-q64.a[update_time_step.o]) + ** References Without Matching Definitions ** + Fort 949623C9 20202020 [532] ER PR (CASCADE/libcascade-q64.a[cascade.o]) +MISMATCH: The return code is 8. +============ (end) type mismatches from the p690 ==================== -- GitLab