Title:  Porting douar
Author: Douglas Guptill
Date:   2009-06-04

Porting to p690
---------------

- frustrating, time-costly
- douar behaviour varies with
  + the compiler (xlf 8.1, xlf 10.1)
  + compiler options
  + changes in noctreemax
- I wonder if the execution time and/or the results are being
  affected by the type mismatches at link time; see below for more.
- on the p690, using 16 processors, douar is running at 1/300 of the
  speed of grace.  (David Whipp ran the input.txt file on grace, and
  sent me the stdout file.)
- no run yet has passed the point where douar calls wsmp.

======== (start) output from grace ===================================
start of non-linear iteratio 1    168.0293
-----------------------------------
build system                      168.0294     0.0001     0.0001
nelem per proc (min/max)          27059     42683
viscosity range   1.00000E-04   3.59904E+02
viscosity capped in   212992
wsmp solve                        215.1212    47.0919    47.0918
======== (end) output from grace =====================================

======== (start) output from p690 (np=16) ============================
start of non-linear iteratio 1    57001.9888
-----------------------------------
build system                      57001.9889    0.0001     0.0001
nelem per proc (min/max)          13522     28627
viscosity range   1.00000E-04   3.59904E+02
viscosity capped in   212992
wsmp solve                        61747.2775  4745.2886  4745.2886
======== (end) output from p690 (np=16) ==============================

Porting to mahone.ace-net.ca
----------------------------

For more about mahone, its hardware and software, see:
  https://wiki.ace-net.ca/index.php/Main_Page

- PGI compilers
- quick and easy
- runs about 10% faster than grace (using 4 processes on the head
  node).
- no run yet has passed the point where it calls wsmp.
- no successful run yet; MPI problems, as yet undiagnosed:

  [clhead:03784] *** An error occurred in MPI_Send
  [clhead:03784] *** on communicator MPI_COMM_WORLD
  [clhead:03784] *** MPI_ERR_COUNT: invalid count argument
  [clhead:03784] *** MPI_ERRORS_ARE_FATAL (goodbye)

======== (start) output from mahone (np=4) ===========================
start of non-linear iteratio 1    137.5392
-----------------------------------
build system                      137.5393     0.0000     0.0000
nelem per proc (min/max)          54619     68563
viscosity range   1.00000E-04   3.59904E+02
viscosity capped in   196608
wsmp solve                        179.2639    41.7246    41.7246
======== (end) output from mahone (np=4) =============================

Type mismatches
---------------

The xlf Fortran compiler on the p690 will, if asked, check for
mismatches between calling sequences and subroutine definitions.

What do I mean by a type mismatch?  An example: argument #3 in the
calling list is a scalar, while dummy argument #3 in the subroutine
definition is an array.  (A minimal Fortran sketch of such a
mismatch appears at the end of this section.)

The xlf compiler found type mismatches in douar.  Some I could
correct easily; others will require a much closer examination of the
code to determine a fix that doesn't break douar.  There are type
mismatches in the calls to these routines:

  nn2d_setup
  nn2d
  octree_interpolate_many
  octree_interpolate_many_derivative
  show
  delaun
  indexx
  fluvial_erosion
  diffusion_erosion
  update_time_step

There are also mismatches for the mpi_ routines; this is common on
the p690, and I believe they can be ignored.  See the end of this
file for the complete linker listing.
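As an illustration, below is a minimal Fortran sketch of the
scalar-versus-array mismatch described above.  The routine names are
hypothetical, not from douar; on xlf, I believe this checking is
enabled with the -qextchk option, which generates the type hashes
that appear in the linker listing at the end of this file.

================= (start) mismatch sketch ======================
c     dummy argument #2 is an array
      subroutine work(n, x)
      integer n
      double precision x(n)
      x(1) = 0.0d0
      end

      program demo
      integer n
      double precision y
      n = 10
c     actual argument #2 is a scalar: a type mismatch.  If work()
c     only ever touches x(1), the program may still appear to run
c     correctly, which is why these bugs are easy to miss.
      call work(n, y)
      end
================= (end) mismatch sketch ======================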
The C code in NN
----------------

(this section is for detail fanatics)

The files stack.c, stackpair.c and volume.c had copies with a .cc
extension:

  stack.c     and stack.cc
  stackpair.c and stackpair.cc
  volume.c    and volume.cc

The actual code in each pair of files (stack.c and stack.cc) was
identical, except that in one copy there was an underscore at the
end of some routine names.  The reason for these duplicates appears
to be the variation in how Fortran compilers link to non-Fortran
routines: some append an underscore to the name of a non-Fortran
routine, some don't.  This results in two possibilities at link
time: a reference to, for example, "stackinit", or to "stackinit_".

I believe that the duplicated code is a maintenance problem and a
potential source of nasty bugs.  So I removed the duplicate files,
and modified the remaining copy by adding, for each routine, a stub
with the trailing underscore that calls the un-underscored version.
For example, the code for "stackinit" now looks like the snippet
below.

The C routines which had no declared return type caused link
failures on the p690.  The cure for this was to add a type (void)
for those entry points.

================= (start) stackinit ======================
/* prototypes */
void stackinit_();
void stackinit();

/* provide an entry point with "_" at the end */
void stackinit_() {stackinit();}

/* the code; "head" and "z" are file-scope variables in stack.c */
void stackinit()
{
   head = (struct node *) malloc(sizeof *head);
   z = (struct node *) malloc(sizeof *z);
   head->next = z;
   head->key = 0;
   z->next = z;
   z->key = 0;
}
================= (end) stackinit ======================
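For reference, the Fortran side needs no change at all; whether the
reference becomes "stackinit" or "stackinit_" is decided entirely by
the Fortran compiler.  A minimal sketch (hypothetical caller, not
from douar):

================= (start) Fortran caller ======================
      program demo
c     This same source produces a reference to "stackinit" under
c     compilers that do not append an underscore (e.g. xlf) and to
c     "stackinit_" under those that do (e.g. pgf90, gfortran).
c     With the stub above, both references resolve to the same code.
      call stackinit
      end
================= (end) Fortran caller ======================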
============ (start) type mismatches from the p690 ====================
(ld): mismatch
ld: 0711-189 ERROR: Type mismatches were detected.
        The following symbols are in error:
 Symbol                    Hash                   Inpndx  TY CL Source-File(Object-File) OR Import-File{Shared-object}
 ------------------------- ---------------------- ------- -- -- ------------------------------------------------------
 .mpi_reduce               ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
   ** References **
     [249]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o])
   ** References Without Matching Definitions **
     Fort 1C031446 20202020
       [499]  ER PR (do_leaf_measurements.o)
       [497]  ER PR (compute_divergence.o)
     Fort 883D2E6F 883D2E6F
       [520]  ER PR (build_system_wsmp.o)
 .mpi_allreduce            ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
   ** References **
     [60]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o])
     [26]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o])
     [496]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
     [167]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
     [1437] ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[parsymb.o])
   ** References Without Matching Definitions **
     Fort A0C85910 20202020
       [577]  ER PR (update_cloud_fields.o)
       [393]  ER PR (move_cloud.o)
       [490]  ER PR (move_surface.o)
       [390]  ER PR (interpolate_velocity_on_surface.o)
       [477]  ER PR (interpolate_ov_on_osolve.o)
       [538]  ER PR (improve_osolve.o)
       [540]  ER PR (erosion.o)
       [476]  ER PR (compute_pressure.o)
       [528]  ER PR (build_system_wsmp.o)
     Fort 7300E28E 20202020
       [413]  ER PR (refine_surface.o)
       [402]  ER PR (check_delaunay.o)
 .mpi_bcast                ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
   ** References **
     [376]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
     [295]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o])
     [24]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[porder.o])
     [244]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[plda.o])
   ** References Without Matching Definitions **
     Fort D8DDC68E 20202020
       [519]  ER PR (solve_with_pwgsmp.o)
       [528]  ER PR (solve_with_pwssmp.o)
       [524]  ER PR (erosion.o)
     Fort 4955A997 4955A997
       [354]  ER PR (read_input_file.o)
       [272]  ER PR (read_controlling_parameters.o)
       [450]  ER PR (create_surfaces.o)
 .nn2d_setup               Fort 415F1726 DC32B44C [47]    LD PR nn.f(NN/libnn_f-q64.a[nn.o])
   ** References Without Matching Definitions **
     Fort 5064C7F3 20202020
       [536]  ER PR (erosion.o)
       [466]  ER PR (create_surfaces.o)
 .nn2d                     Fort DAF314E0 477683E6 [182]   LD PR nn.f(NN/libnn_f-q64.a[nn.o])
   ** References Without Matching Definitions **
     Fort 14D0B906 20202020
       [538]  ER PR (erosion.o)
 .octree_interpolate_many  Fort 56666C28 C1BA756A [1853]  LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
   ** References Without Matching Definitions **
     Fort 5C3279C7 20202020
       [591]  ER PR (update_cloud_fields.o)
       [391]  ER PR (move_cloud.o)
       [480]  ER PR (move_surface.o)
       [388]  ER PR (interpolate_velocity_on_surface.o)
     Fort 8D36C986 20202020
       [473]  ER PR (interpolate_ov_on_osolve.o)
 .octree_interpolate_many_derivative
                           Fort 8F7B9C4E 70645C45 [2013]  LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
   ** References Without Matching Definitions **
     Fort 7989C8EA 20202020
       [482]  ER PR (move_surface.o)
 .mpi_wtime                ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
   ** References Without Matching Definitions **
     Fort D298D1B5 D298D1B5
       [272]  ER PR (toolbox.o)
     Fort D8CCB3E4 D8CCB3E4
       [501]  ER PR (solve_with_pwgsmp.o)
       [508]  ER PR (solve_with_pwssmp.o)
 .show                     Fort 4955A997 0C4AC181 [738]   LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
   ** References Without Matching Definitions **
     Fort F55DBCFA 20202020
       [566]  ER PR (CASCADE/libcascade-q64.a[cascade.o])
 .delaun                   Fort FCC34CBE F539DC8A [34]    LD PR delaun.f(NN/libnn_f-q64.a[delaun.o])
   ** References Without Matching Definitions **
     Fort FCC34CBE 20202020
       [516]  ER PR (CASCADE/libcascade-q64.a[nn_remove.o])
       [121]  ER PR (NN/libnn_f-q64.a[nn.o])
     Fort 61BF4539 20202020
       [210]  ER PR (CASCADE/libcascade-q64.a[check_mesh.o])
     Fort F6CF495C 20202020
       [126]  ER PR (CASCADE/libcascade-q64.a[find_neighbours.o])
 .indexx                   Fort B1C0D0F4 BDD1E49E [2404]  LD PR nn.f(NN/libnn_f-q64.a[nn.o])
   ** References Without Matching Definitions **
     Fort 410FD164 20202020
       [122]  ER PR (CASCADE/libcascade-q64.a[find_neighbours.o])
     Fort B1C0D0F4 20202020
       [1847] ER PR (NN/libnn_f-q64.a[nn.o])
       [1637] ER PR (NN/libnn_f-q64.a[nn.o])
       [1451] ER PR (NN/libnn_f-q64.a[nn.o])
 .fluvial_erosion          Fort 05493BBF 38B5F94F [15]    LD PR fluvial_erosion.f(CASCADE/libcascade-q64.a[fluvial_erosion.o])
   ** References Without Matching Definitions **
     Fort 6E2E3C6A 20202020
       [546]  ER PR (CASCADE/libcascade-q64.a[cascade.o])
 .diffusion_erosion        Fort 9BE4E4A5 B1655465 [12]    LD PR diffusion_erosion.f(CASCADE/libcascade-q64.a[diffusion_erosion.o])
   ** References Without Matching Definitions **
     Fort A199E5A4 20202020
       [548]  ER PR (CASCADE/libcascade-q64.a[cascade.o])
 .update_time_step         Fort B1E8249C FE6F92E6 [14]    LD PR update_time_step.f(CASCADE/libcascade-q64.a[update_time_step.o])
   ** References Without Matching Definitions **
     Fort 949623C9 20202020
       [532]  ER PR (CASCADE/libcascade-q64.a[cascade.o])
MISMATCH: The return code is 8.
============ (end) type mismatches from the p690 ======================
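A closing note on the mpi_ entries in the listing above.  The
Fortran 77 MPI interface treats its buffer arguments as "choice"
arguments: the same routine is legitimately called with integer,
real, or other buffer types, so each call site produces a different
type hash and the checker reports a mismatch.  A minimal sketch (not
from douar) of two such calls:

================= (start) mpi_bcast sketch ======================
      program mpidemo
      include 'mpif.h'
      integer ierr, i
      double precision x
      i = 42
      x = 3.14d0
      call mpi_init(ierr)
c     the same routine, called with two different buffer types;
c     both calls are valid MPI, but each gets its own type hash
      call mpi_bcast(i, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
      call mpi_bcast(x, 1, MPI_DOUBLE_PRECISION, 0,
     &     MPI_COMM_WORLD, ierr)
      call mpi_finalize(ierr)
      end
================= (end) mpi_bcast sketch ======================

This is why I believe the mpi_ mismatches can be ignored.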