From 56adf5d91a3e106096a612a6340ba5f527be0d16 Mon Sep 17 00:00:00 2001
From: Douglas Guptill <douglas.guptill@dal.ca>
Date: Mon, 8 Jun 2009 20:56:29 +0000
Subject: [PATCH] first report

---
 doc/report-01 | 256 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100644 doc/report-01

diff --git a/doc/report-01 b/doc/report-01
new file mode 100644
index 00000000..14161293
--- /dev/null
+++ b/doc/report-01
@@ -0,0 +1,256 @@
+Title: Porting douar
+Author: Douglas Guptill
+Date: 2009-06-04
+
+
+Porting to p690
+---------------
+  - frustrating and time-consuming
+  - douar behaviour varies with:
+     + compiler (xlf 8.1, xlf 10.1)
+     + compiler options
+     + changes in noctreemax
+  - the type mismatches reported at link time may be affecting the
+    execution time and/or the results; see "Type mismatches" below.
+  - on the p690, using 16 processors, douar runs at roughly 1/300 of
+    its speed on grace. (David Whipp ran the input.txt file on grace
+    and sent me the stdout file.)
+  - no run yet has passed the point where douar calls wsmp.
+
+
+======== (start) output from grace ===================================
+ start of non-linear iteratio      1    168.0293
+ -----------------------------------
+                        build system    168.0294      0.0001      0.0001
+                                                                        nelem per proc (min/max)       27059       42683
+                                                                        viscosity range   1.00000E-04  3.59904E+02
+                                                                        viscosity capped in    212992
+                          wsmp solve    215.1212     47.0919     47.0918
+======== (end) output from grace =====================================
+
+
+======== (start) output from p690 (np=16) ============================
+ start of non-linear iteratio      1  57001.9888
+ -----------------------------------
+                        build system  57001.9889      0.0001      0.0001
+                                                                        nelem per proc (min/max)       13522       28627
+                                                                        viscosity range   1.00000E-04  3.59904E+02
+                                                                        viscosity capped in    212992
+                          wsmp solve  61747.2775   4745.2886   4745.2886
+======== (end) output from p690 (np=16) ==============================
+
+
+
+
+Porting to mahone.ace-net.ca
+----------------------------
+
+For more about mahone, its hardware and software, see here:
+  https://wiki.ace-net.ca/index.php/Main_Page
+
+  - PGI compilers
+  - quick and easy
+  - runs about 10% faster than grace (using 4 processes on the head node).
+  - no run yet has passed the point where it calls wsmp.
+  - no successful run yet; MPI problems, as yet undiagnosed:
+    [clhead:03784] *** An error occurred in MPI_Send
+    [clhead:03784] *** on communicator MPI_COMM_WORLD
+    [clhead:03784] *** MPI_ERR_COUNT: invalid count argument
+    [clhead:03784] *** MPI_ERRORS_ARE_FATAL (goodbye)
+
+
+======== (start) output from mahone (np=4) ===========================
+ start of non-linear iteratio      1    137.5392
+ -----------------------------------
+                        build system    137.5393      0.0000      0.0000
+                                                                        nelem per proc (min/max)       54619       68563
+                                                                        viscosity range   1.00000E-04  3.59904E+02
+                                                                        viscosity capped in    196608
+                          wsmp solve    179.2639     41.7246     41.7246
+======== (end) output from mahone (np=4) =============================
+
+
+
+Type mismatches
+---------------
+
+The xlf Fortran compiler on the p690 will, if asked, check for
+mismatches between calling sequences and subroutine definitions.
+
+What do I mean by a type mismatch?  An example: the third actual
+argument in a call is a scalar, but the third dummy argument in the
+subroutine definition is an array.
+
+The xlf compiler found type mismatches in douar.  Some I could correct
+easily; others will require a much closer examination of the code to
+determine a fix that doesn't break douar.  There are type mismatches
+in the calls to these routines:
+  nn2d_setup
+  nn2d
+  octree_interpolate_many
+  octree_interpolate_many_derivative
+  show
+  delaun
+  indexx
+  fluvial_erosion
+  diffusion_erosion
+  update_time_step
+
+There are also mismatches for the mpi_ routines.  This is common on the
+p690, and I believe they can be ignored.  See below for the complete list.
+
+
+The C code in NN
+----------------
+(this section is for detail fanatics)
+
+The files stack.c, stackpair.c and volume.c had copies with a .cc
+extension:
+  stack.c      and stack.cc
+  stackpair.c  and stackpair.cc
+  volume.c     and volume.cc
+
+The actual code in each pair of files (e.g. stack.c and stack.cc) was
+identical, except that in one copy there was an underscore at the end
+of some routine names.
+
+The reason for these duplicates appears to be the variation in how
+Fortran compilers link to non-Fortran routines; some append an underscore
+to the name of a non-Fortran routine, some don't.  This results in two
+possibilities at link time: a reference to, for example,
+  "stackinit" 
+or to 
+  "stackinit_"
+
+I believe that the duplicated code is a maintenance problem and a
+potential source of nasty bugs.  So I removed the duplicate files, and
+modified the remaining copy by adding a stub for each routine which
+calls the other.  For example, the code for "stackinit" now looks like
+the snippet below.
+
+The C routines which had no declared return type caused link failures on
+the p690.  The cure was to add a return type (void) to those entry points.
+
+================= (start) stackinit ======================
+/* prototypes */
+        void stackinit_();
+        void stackinit();
+
+/* provide an entry point with "_" at the end */
+        void stackinit_() {stackinit();}
+
+/* the code */
+        void stackinit()
+	   {
+	     head = (struct node *) malloc(sizeof *head);
+	     z = (struct node *) malloc(sizeof *z);
+	     head->next = z; head->key=0;
+	     z->next = z;
+	     z->key = 0;
+	   }
+================= (end) stackinit ======================
+
+
+============ (start) type mismatches from the p690 ====================
+(ld): mismatch
+ld: 0711-189 ERROR: Type mismatches were detected.
+	The following symbols are in error:
+ Symbol                    Hash                   Inpndx  TY CL Source-File(Object-File) OR Import-File{Shared-object}
+ ------------------------- ---------------------- ------- -- -- ------------------------------------------------------
+ .mpi_reduce               ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
+       ** References **                           [249]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o])
+       ** References Without Matching Definitions **
+                           Fort 1C031446 20202020 [499]   ER PR (do_leaf_measurements.o)
+                                                  [497]   ER PR (compute_divergence.o)
+                           Fort 883D2E6F 883D2E6F [520]   ER PR (build_system_wsmp.o)
+
+ .mpi_allreduce            ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
+       ** References **                           [60]    ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o])
+                                                  [26]    ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgcomm.o])
+                                                  [496]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
+                                                  [167]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
+                                                  [1437]  ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[parsymb.o])
+       ** References Without Matching Definitions **
+                           Fort A0C85910 20202020 [577]   ER PR (update_cloud_fields.o)
+                                                  [393]   ER PR (move_cloud.o)
+                                                  [490]   ER PR (move_surface.o)
+                                                  [390]   ER PR (interpolate_velocity_on_surface.o)
+                                                  [477]   ER PR (interpolate_ov_on_osolve.o)
+                                                  [538]   ER PR (improve_osolve.o)
+                                                  [540]   ER PR (erosion.o)
+                                                  [476]   ER PR (compute_pressure.o)
+                                                  [528]   ER PR (build_system_wsmp.o)
+                           Fort 7300E28E 20202020 [413]   ER PR (refine_surface.o)
+                                                  [402]   ER PR (check_delaunay.o)
+
+ .mpi_bcast                ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
+       ** References **                           [376]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwgsmp.o])
+                                                  [295]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[pwssmp.o])
+                                                  [24]    ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[porder.o])
+                                                  [244]   ER PR (/home/beaumnt1/software/wsmp/lib/Power4/libpwsmp64.a[plda.o])
+       ** References Without Matching Definitions **
+                           Fort D8DDC68E 20202020 [519]   ER PR (solve_with_pwgsmp.o)
+                                                  [528]   ER PR (solve_with_pwssmp.o)
+                                                  [524]   ER PR (erosion.o)
+                           Fort 4955A997 4955A997 [354]   ER PR (read_input_file.o)
+                                                  [272]   ER PR (read_controlling_parameters.o)
+                                                  [450]   ER PR (create_surfaces.o)
+
+ .nn2d_setup               Fort 415F1726 DC32B44C [47]    LD PR nn.f(NN/libnn_f-q64.a[nn.o])
+       ** References Without Matching Definitions **
+                           Fort 5064C7F3 20202020 [536]   ER PR (erosion.o)
+                                                  [466]   ER PR (create_surfaces.o)
+
+ .nn2d                     Fort DAF314E0 477683E6 [182]   LD PR nn.f(NN/libnn_f-q64.a[nn.o])
+       ** References Without Matching Definitions **
+                           Fort 14D0B906 20202020 [538]   ER PR (erosion.o)
+
+ .octree_interpolate_many  Fort 56666C28 C1BA756A [1853]  LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
+       ** References Without Matching Definitions **
+                           Fort 5C3279C7 20202020 [591]   ER PR (update_cloud_fields.o)
+                                                  [391]   ER PR (move_cloud.o)
+                                                  [480]   ER PR (move_surface.o)
+                                                  [388]   ER PR (interpolate_velocity_on_surface.o)
+                           Fort 8D36C986 20202020 [473]   ER PR (interpolate_ov_on_osolve.o)
+
+ .octree_interpolate_many_derivative Fort 8F7B9C4E 70645C45 [2013]  LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
+       ** References Without Matching Definitions **
+                           Fort 7989C8EA 20202020 [482]   ER PR (move_surface.o)
+
+ .mpi_wtime                ** No Hash **          [IMPORT] -- PR {/usr/lpp/ppe.poe/lib/libmpi_r.a[mpifort64_r.o]}
+       ** References Without Matching Definitions **
+                           Fort D298D1B5 D298D1B5 [272]   ER PR (toolbox.o)
+                           Fort D8CCB3E4 D8CCB3E4 [501]   ER PR (solve_with_pwgsmp.o)
+                                                  [508]   ER PR (solve_with_pwssmp.o)
+
+ .show                     Fort 4955A997 0C4AC181 [738]   LD PR OctreeBitPlus.f90(OCTREE/libOctree-q64.a[OctreeBitPlus.o])
+       ** References Without Matching Definitions **
+                           Fort F55DBCFA 20202020 [566]   ER PR (CASCADE/libcascade-q64.a[cascade.o])
+
+ .delaun                   Fort FCC34CBE F539DC8A [34]    LD PR delaun.f(NN/libnn_f-q64.a[delaun.o])
+       ** References Without Matching Definitions **
+                           Fort FCC34CBE 20202020 [516]   ER PR (CASCADE/libcascade-q64.a[nn_remove.o])
+                                                  [121]   ER PR (NN/libnn_f-q64.a[nn.o])
+                           Fort 61BF4539 20202020 [210]   ER PR (CASCADE/libcascade-q64.a[check_mesh.o])
+                           Fort F6CF495C 20202020 [126]   ER PR (CASCADE/libcascade-q64.a[find_neighbours.o])
+
+ .indexx                   Fort B1C0D0F4 BDD1E49E [2404]  LD PR nn.f(NN/libnn_f-q64.a[nn.o])
+       ** References Without Matching Definitions **
+                           Fort 410FD164 20202020 [122]   ER PR (CASCADE/libcascade-q64.a[find_neighbours.o])
+                           Fort B1C0D0F4 20202020 [1847]  ER PR (NN/libnn_f-q64.a[nn.o])
+                                                  [1637]  ER PR (NN/libnn_f-q64.a[nn.o])
+                                                  [1451]  ER PR (NN/libnn_f-q64.a[nn.o])
+
+ .fluvial_erosion          Fort 05493BBF 38B5F94F [15]    LD PR fluvial_erosion.f(CASCADE/libcascade-q64.a[fluvial_erosion.o])
+       ** References Without Matching Definitions **
+                           Fort 6E2E3C6A 20202020 [546]   ER PR (CASCADE/libcascade-q64.a[cascade.o])
+
+ .diffusion_erosion        Fort 9BE4E4A5 B1655465 [12]    LD PR diffusion_erosion.f(CASCADE/libcascade-q64.a[diffusion_erosion.o])
+       ** References Without Matching Definitions **
+                           Fort A199E5A4 20202020 [548]   ER PR (CASCADE/libcascade-q64.a[cascade.o])
+
+ .update_time_step         Fort B1E8249C FE6F92E6 [14]    LD PR update_time_step.f(CASCADE/libcascade-q64.a[update_time_step.o])
+       ** References Without Matching Definitions **
+                           Fort 949623C9 20202020 [532]   ER PR (CASCADE/libcascade-q64.a[cascade.o])
+MISMATCH: The return code is 8.
+============ (end) type mismatches from the p690 ====================
-- 
GitLab