/usr/share/doc/mrmpi-doc/Interface_python.html is in mrmpi-doc 1.0~20140404-1.
<HTML>
<CENTER><A HREF = "http://mapreduce.sandia.gov">MapReduce-MPI WWW Site</A> - <A HREF = "Manual.html">MapReduce-MPI Documentation</A>
</CENTER>
<HR>
<H3>Python interface to the MapReduce-MPI Library
</H3>
<P>A Python wrapper for the MR-MPI library is included in the
distribution. The advantage of using Python is how concise the
language is, enabling rapid development and debugging of MapReduce
programs. The disadvantage is speed, since Python is slower than a
compiled language. Using the MR-MPI library from Python incurs two
additional overheads, discussed in the <A HREF = "Technical.html">Technical
Details</A> section.
</P>
<P>Before using MR-MPI from a Python script, you need to do two things.
You need to build MR-MPI as a dynamic shared library, so it can be
loaded by Python. And you need to tell Python how to find the library
and the Python wrapper file python/mrmpi.py. Both these steps are
discussed below. If you wish to run MR-MPI in parallel from Python,
you also need to extend your Python with MPI. This is also discussed
below.
</P>
<P>The Python wrapper for MR-MPI uses the amazing and magical (to me)
"ctypes" package in Python, which auto-generates the interface code
needed between Python and a set of C interface routines for a library.
Ctypes is part of standard Python for versions 2.5 and later. You can
check which version of Python you have installed by typing "python"
at a shell prompt; the version is shown in the startup banner.
</P>
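<P>As a non-interactive alternative, a minimal sketch like the
following reports the interpreter version and whether ctypes can be
imported. The function name ctypes_available is only illustrative and
is not part of MR-MPI.
</P>

```python
import sys

def ctypes_available():
    """Return True if the ctypes module can be imported by this Python."""
    try:
        import ctypes  # part of the standard library since Python 2.5
    except ImportError:
        return False
    return True

print("Python %d.%d.%d" % sys.version_info[:3])
print("ctypes available: %s" % ctypes_available())
```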
<P>The following sub-sections cover the rest of the Python discussion:
</P>
<UL><LI><A HREF = "#py_1">Building MR-MPI as a shared library</A>
<LI><A HREF = "#py_2">Installing the Python wrapper into Python</A>
<LI><A HREF = "#py_3">Extending Python with MPI to run in parallel</A>
<LI><A HREF = "#py_4">Testing the Python/MR-MPI interface</A>
<LI><A HREF = "#py_5">Using the MR-MPI library from Python</A>
</UL>
<HR>
<HR>
<A NAME = "py_1"></A><B>Building MR-MPI as a shared library</B>
<P>Instructions on how to build MR-MPI as a shared library are given in
the <A HREF = "Start.html">Start section</A>. A shared library is one that is
dynamically loadable, which is what Python requires. On Linux this is
a library file that ends in ".so", not ".a".
</P>
<P>From the src directory, type
</P>
<PRE>make -f Makefile.shlib foo
</PRE>
<P>where foo is the machine target name, such as linux or g++ or serial.
This should create the file libmrmpi_foo.so in the src directory, as
well as a soft link libmrmpi.so, which is what the Python wrapper will
load by default. Note that if you are building multiple machine
versions of the shared library, the soft link is always set to the
most recently built version.
</P>
<P>If this fails, see the <A HREF = "Start.html">Start section</A> for more details.
</P>
<HR>
<A NAME = "py_2"></A><B>Installing the Python wrapper into Python</B>
<P>For Python to invoke MR-MPI, there are 2 files it needs to know about:
</P>
<UL><LI>python/mrmpi.py
<LI>src/libmrmpi.so
</UL>
<P>Mrmpi.py is the Python wrapper on the MR-MPI library interface.
Libmrmpi.so is the shared MR-MPI library that Python loads, as
described above.
</P>
<P>You can ensure Python can find these files in one of two ways:
</P>
<UL><LI>set two environment variables
<LI>run the python/install.py script
</UL>
<P>If you set the paths to these files as environment variables, you only
have to do it once. For the csh or tcsh shells, add something like
this to your ~/.cshrc file, one line for each of the two files:
</P>
<PRE>setenv PYTHONPATH $<I>PYTHONPATH</I>:/home/sjplimp/mrmpi/python
setenv LD_LIBRARY_PATH $<I>LD_LIBRARY_PATH</I>:/home/sjplimp/mrmpi/src
</PRE>
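<P>To check whether the two environment variables actually point at
the files, something like the following sketch may help. It only
inspects the directories listed in PYTHONPATH and LD_LIBRARY_PATH, not
the default system locations, and the helper name find_on_path is
hypothetical.
</P>

```python
import os

def find_on_path(var_name, file_name, environ=None):
    """Return the first directory listed in environ[var_name] that
    contains file_name, or None if no listed directory has it."""
    environ = os.environ if environ is None else environ
    for directory in environ.get(var_name, "").split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, file_name)):
            return directory
    return None

print("mrmpi.py found in: %s" % find_on_path("PYTHONPATH", "mrmpi.py"))
print("libmrmpi.so found in: %s" % find_on_path("LD_LIBRARY_PATH", "libmrmpi.so"))
```

<P>A result of None for either file suggests the corresponding
variable is unset or does not include the right directory.
</P>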
<P>If you use the python/install.py script, you need to invoke it every
time you rebuild MR-MPI (as a shared library) or make changes to the
python/mrmpi.py file.
</P>
<P>You can invoke install.py from the python directory as
</P>
<PRE>% python install.py [libdir] [pydir]
</PRE>
<P>The optional libdir is where to copy the MR-MPI shared library to; the
default is /usr/local/lib. The optional pydir is where to copy the
mrmpi.py file to; the default is the site-packages directory of the
version of Python that is running the install script.
</P>
<P>Note that libdir must be a location that is in your default
LD_LIBRARY_PATH, like /usr/local/lib or /usr/lib. And pydir must be a
location that Python looks in by default for imported modules, like
its site-packages dir. If you want to copy these files to
non-standard locations, such as within your own user space, you will
need to set your PYTHONPATH and LD_LIBRARY_PATH environment variables
accordingly, as above.
</P>
<P>If the install.py script does not allow you to copy files into system
directories, prefix the python command with "sudo". If you do this,
make sure that the Python that root runs is the same as the Python you
run. E.g. you may need to do something like
</P>
<PRE>% sudo /usr/local/bin/python install.py [libdir] [pydir]
</PRE>
<P>You can also invoke install.py from the make command in the src
directory as
</P>
<PRE>% make install-python
</PRE>
<P>In this mode you cannot append optional arguments. Again, you may
need to prefix this with "sudo". In this mode you cannot control
which Python is invoked by root.
</P>
<P>Note that if you want Python to be able to load different versions of
the MR-MPI shared library (see <A HREF = "#py_5">this section</A> below), you will
need to manually copy files like libmrmpi_g++.so into the appropriate
system directory. This is not needed if you set the LD_LIBRARY_PATH
environment variable as described above.
</P>
<HR>
<A NAME = "py_3"></A><B>Extending Python with MPI to run in parallel</B>
<P>If you wish to run MR-MPI in parallel from Python, you need to extend
your Python with an interface to MPI. This also allows you to make
MPI calls directly from Python in your script, if you desire.
</P>
<P>There are several Python packages available that purport to wrap MPI
as a library and allow MPI functions to be called from Python.
</P>
<P>These include
</P>
<UL><LI><A HREF = "http://pympi.sourceforge.net/">pyMPI</A>
<LI><A HREF = "http://code.google.com/p/maroonmpi/">maroonmpi</A>
<LI><A HREF = "http://code.google.com/p/mpi4py/">mpi4py</A>
<LI><A HREF = "http://nbcr.sdsc.edu/forum/viewtopic.php?t=89&sid=c997fefc3933bd66204875b436940f16">myMPI</A>
<LI><A HREF = "http://code.google.com/p/pypar">Pypar</A>
</UL>
<P>All of these except pyMPI work by wrapping the MPI library and
exposing (some portion of) its interface to your Python script. This
means Python cannot be used interactively in parallel, since they do
not address the issue of interactive input to multiple instances of
Python running on different processors. The one exception is pyMPI,
which alters the Python interpreter to address this issue, and (I
believe) creates a new alternate executable (in place of "python"
itself) as a result.
</P>
<P>In principle any of these Python/MPI packages should work to invoke
MR-MPI in parallel and MPI calls themselves from a Python script which
is itself running in parallel. However, when I downloaded and looked
at a few of them, their documentation was incomplete and I had trouble
with their installation. It's not clear if some of the packages are
still being actively developed and supported.
</P>
<P>The one I recommend, since I have successfully used it with MR-MPI, is
Pypar. Pypar requires the ubiquitous <A HREF = "http://numpy.scipy.org">Numpy
package</A> be installed in your Python. After
launching python, type
</P>
<PRE>import numpy
</PRE>
<P>to see if it is installed. If not, here is how to install it (version
1.3.0b1 as of April 2009). Unpack the numpy tarball and from its
top-level directory, type
</P>
<PRE>python setup.py build
sudo python setup.py install
</PRE>
<P>The "sudo" is only needed if required to copy Numpy files into your
Python distribution's site-packages directory.
</P>
<P>To install Pypar (version pypar-2.1.4_94 as of Aug 2012), unpack it
and from its "source" directory, type
</P>
<PRE>python setup.py build
sudo python setup.py install
</PRE>
<P>Again, the "sudo" is only needed if required to copy Pypar files into
your Python distribution's site-packages directory.
</P>
<P>If you have successfully installed Pypar, you should be able to run
Python and type
</P>
<PRE>import pypar
</PRE>
<P>without error. You should also be able to run python in parallel
on a simple test script
</P>
<PRE>% mpirun -np 4 python test.py
</PRE>
<P>where test.py contains the lines
</P>
<PRE>import pypar
print("Proc %d out of %d procs" % (pypar.rank(),pypar.size()))
</PRE>
<P>and see one line of output for each processor you run on.
</P>
<P>IMPORTANT NOTE: To use Pypar and MR-MPI in parallel from Python, you
must ensure both are using the same version of MPI. If you only have
one MPI installed on your system, this is not an issue, but it can be
if you have multiple MPIs. Your MR-MPI build is explicit about which
MPI it is using, since you specify the details in your low-level
src/MAKE/Makefile.foo file. Pypar uses the "mpicc" command to find
information about the MPI it uses to build against. And it tries to
load "libmpi.so" from the LD_LIBRARY_PATH. This may or may not find
the MPI library that MR-MPI is using. If you have problems running
both Pypar and MR-MPI together, this is an issue you may need to
address, e.g. by moving other MPI installations so that Pypar finds
the right one.
</P>
<HR>
<A NAME = "py_4"></A><B>Testing the Python/MR-MPI interface</B>
<P>To test if MR-MPI is callable from Python in serial, launch Python
interactively and type:
</P>
<PRE>>>> from mrmpi import mrmpi
>>> mr = mrmpi()
</PRE>
<P>If you get no errors, you're ready to use MR-MPI from Python. If the
2nd command fails, the most common error to see is
</P>
<PRE>OSError: Could not load MR-MPI dynamic library
</PRE>
<P>which means Python was unable to load the MR-MPI shared library. This
typically occurs if the system can't find the MR-MPI shared library,
or if something about the library is incompatible with your Python.
The error message should give you an indication of what went wrong.
</P>
<P>You can also test the load directly in Python as follows, without
first importing from the mrmpi.py file:
</P>
<PRE>>>> from ctypes import CDLL
>>> CDLL("libmrmpi.so")
</PRE>
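<P>The same check can be wrapped so that the loader's own error
message is printed rather than a traceback. This is a minimal sketch;
try_load is a hypothetical helper, not part of mrmpi.py.
</P>

```python
from ctypes import CDLL

def try_load(library_name="libmrmpi.so"):
    """Try to load a shared library; return (True, "") on success
    or (False, reason) if the dynamic loader reports an error."""
    try:
        CDLL(library_name)
    except OSError as err:
        return False, str(err)
    return True, ""

ok, reason = try_load()
print("loaded" if ok else "could not load libmrmpi.so: %s" % reason)
```

<P>The reason string comes from the system's dynamic loader, so it
usually says whether the file was not found or was found but could not
be loaded.
</P>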
<P>If an error occurs, carefully go through the steps in <A HREF = "Start.html">Start</A>
and above about building a shared library and about ensuring Python
can find the two files it needs.
</P>
<H5><B>Test MR-MPI and Python in parallel:</B>
</H5>
<P>To run MR-MPI in parallel, assuming you have installed the
<A HREF = "http://datamining.anu.edu.au/~ole/pypar">Pypar</A> package as discussed
above, create a test.py file containing these lines:
</P>
<PRE>import pypar
from mrmpi import mrmpi
mr = mrmpi()
print("Proc %d out of %d procs has %s" % (pypar.rank(),pypar.size(),mr))
pypar.finalize()
</PRE>
<P>You can then run it in parallel as:
</P>
<PRE>% mpirun -np 4 python test.py
</PRE>
<P>Note that if you leave out the 3 lines from test.py that specify Pypar
commands you will instantiate and run MR-MPI independently on each of
the P processors specified in the mpirun command. In this case you
should get 4 sets of output, each showing that a MR-MPI was
initialized on a single processor, instead of one set of output
showing MR-MPI was initialized on 4 processors. If the 1-processor
outputs occur, it means that Pypar is not working correctly.
</P>
<P>Also note that once you import the Pypar module, Pypar initializes MPI
for you, and you can use MPI calls directly in your Python script, as
described in the Pypar documentation. The last line of your Python
script should be pypar.finalize(), to ensure MPI is shut down
correctly.
</P>
<H5><B>Running Python scripts:</B>
</H5>
<P>Note that any Python script (not just for MR-MPI) can be invoked in
one of several ways:
</P>
<PRE>% python foo.script
% python -i foo.script
% foo.script
</PRE>
<P>The last command requires that the first line of the script be
something like one of these:
</P>
<PRE>#!/usr/local/bin/python
#!/usr/local/bin/python -i
</PRE>
<P>where the path points to where you have Python installed, and that you
have made the script file executable:
</P>
<PRE>% chmod +x foo.script
</PRE>
<P>Without the "-i" flag, Python will exit when the script finishes.
With the "-i" flag, you will be left in the Python interpreter when
the script finishes, so you can type subsequent commands. As
mentioned above, you can only run Python interactively when running
Python on a single processor, not in parallel.
</P>
<HR>
<HR>
<A NAME = "py_5"></A><B>Using the MR-MPI library from Python</B>
<P>The Python interface to MR-MPI consists of a Python "mrmpi" module,
the source code for which is in python/mrmpi.py, which creates a
"mrmpi" object, with a set of methods that can be invoked on that
object. The sample Python code below assumes you have first imported
the "mrmpi" module in your Python script, as follows:
</P>
<PRE>from mrmpi import mrmpi
</PRE>
<P>Some of the methods defined by the mrmpi module take
callback functions as arguments, e.g. <A HREF = "map.html">map()</A> and
<A HREF = "reduce.html">reduce()</A>. These are Python functions you define
elsewhere in your script. When you register "keys" and "values" with
the library, they can be simple quantities like strings or ints or
floats. Or they can be Python data structures like lists or tuples.
</P>
<P>These are the class methods defined by the mrmpi module. Their
functionality and arguments are described in the <A HREF = "Interface_c++.html">C++ interface
section</A>.
</P>
<PRE>mr = mrmpi() # create a MR-MPI object using the default libmrmpi.so library
mr = mrmpi(mpi_comm) # ditto, but with a specified MPI communicator
mr = mrmpi(0.0) # ditto, and the library will finalize MPI
mr = mrmpi(None,"g++") # create a MR-MPI object using the libmrmpi_g++.so library
mr = mrmpi(mpi_comm,"g++") # ditto, but with a specified MPI communicator
mr = mrmpi(0.0,"g++") # ditto, and the library will finalize MPI
</PRE>
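<P>The comments above imply a naming convention for the shared
library file: libmrmpi.so by default, or libmrmpi_NAME.so when a name
such as "g++" is given. The sketch below only mirrors that convention
for illustration. It is not taken from mrmpi.py, and library_filename
is a hypothetical helper.
</P>

```python
def library_filename(name=None):
    """Return the shared-library file name implied by an optional
    machine-target suffix, following the convention described above."""
    if name is None:
        return "libmrmpi.so"
    return "libmrmpi_%s.so" % name

print(library_filename())
print(library_filename("g++"))
```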
<PRE>mr2 = mr.copy() # copy mr to create mr2
</PRE>
<PRE>mr.destroy() # destroy an mrmpi object, freeing its memory
# this will also occur if Python garbage collects
</PRE>
<PRE>mr.add(mr2)
mr.aggregate()
mr.aggregate(myhash) # if specified, myhash is a hash function
# called back from the library as myhash(key)
# myhash() should return an integer (a proc ID)
mr.broadcast(root)
mr.clone()
mr.close()
mr.collapse(key)
mr.collate()
mr.collate(myhash) # if specified, myhash is the same function
# as for aggregate()
</PRE>
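<P>A hash callback for aggregate() or collate() might look like the
sketch below. It assumes keys are strings and that the number of
processors is known to the script; NPROCS is a hypothetical constant
used only for illustration. A checksum from zlib is used instead of
Python's built-in hash(), since in Python 3 the built-in string hash is
randomized per process by default and so may differ between
processors.
</P>

```python
import zlib

NPROCS = 4  # hypothetical processor count, for illustration only

def myhash(key):
    """Map a string key to an integer in the range 0..NPROCS-1.
    The same key gives the same result in every Python process."""
    return zlib.crc32(key.encode("utf-8")) % NPROCS

print(myhash("apple"), myhash("banana"))
```

<P>With the library it would be passed as mr.aggregate(myhash) or
mr.collate(myhash).
</P>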
<PRE>mr.compress(mycompress) # mycompress is a function called back from the
# library as mycompress(key,mvalue,mr,ptr)
# where mvalue is a list of values associated
# with the key, mr is the MapReduce object,
# and you (optionally) provide ptr (see below)
# your mycompress function should typically
# make calls like mr.add(key,value)
mr.compress(mycompress,ptr) # if specified, ptr is any Python datum
# and is passed back to your mycompress()
# if not specified, ptr = None
</PRE>
<PRE>mr.convert()
mr.gather(nprocs)
</PRE>
<PRE>mr.map(nmap,mymap) # mymap is a function called back from the
# library as mymap(itask,mr,ptr)
# where mr is the MapReduce object,
# and you (optionally) provide ptr (see below)
# your mymap function should typically
# make calls like mr.add(key,value)
mr.map(nmap,mymap,ptr) # if specified, ptr is any Python datum
# and is passed back to your mymap()
# if not specified, ptr = None
mr.map(nmap,mymap,ptr,addflag) # if addflag is specified as a non-zero int,
# new key/value pairs will be added to the
# existing key/value pairs
</PRE>
<PRE>mr.map_file(files,self,recurse,readfile,mymap)
# files is a list of filenames and dirnames
# mymap is a function called back from the
# library as mymap(itask,filename,mr,ptr)
# as above, ptr and addflag are optional args
mr.map_file_char(nmap,files,recurse,readfile,sepchar,delta,mymap)
# files is a list of filenames and dirnames
# mymap is a function called back from the
# library as mymap(itask,str,mr,ptr)
# as above, ptr and addflag are optional args
mr.map_file_str(nmap,files,recurse,readfile,sepstr,delta,mymap)
# files is a list of filenames and dirnames
# mymap is a function called back from the
# library as mymap(itask,str,mr,ptr)
# as above, ptr and addflag are optional args
mr.map_mr(mr2,mymap) # pass key/values in mr2 to mymap
# mymap is a function called back from the
# library as mymap(itask,key,value,mr,ptr)
# as above, ptr and addflag are optional args
</PRE>
<PRE>mr.open()
mr.open(addflag)
mr.print_screen(proc,nstride,kflag,vflag)
mr.print_file(file,fflag,proc,nstride,kflag,vflag)
</PRE>
<PRE>mr.reduce(myreduce) # myreduce is a function called back from the
# library as myreduce(key,mvalue,mr,ptr)
# where mvalue is a list of values associated
# with the key, mr is the MapReduce object,
# and you (optionally) provide ptr (see below)
# your myreduce function should typically
# make calls like mr.add(key,value)
mr.reduce(myreduce,ptr) # if specified, ptr is any Python datum
# and is passed back to your myreduce()
# if not specified, ptr = None
</PRE>
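<P>As an illustration of the mymap() and myreduce() callback
signatures described above, here is a minimal word-count sketch. So
that it can be run without the library, the callbacks are exercised by
hand against RecordingMR, a hypothetical stand-in class that only
records add() calls. It is not part of MR-MPI, and the sample text is
made up.
</P>

```python
class RecordingMR(object):
    """Hypothetical stand-in for an mrmpi object; it only records add() calls."""
    def __init__(self):
        self.pairs = []

    def add(self, key, value):
        self.pairs.append((key, value))

def mymap(itask, mr, ptr):
    """Emit (word, 1) for each word in the itask-th text chunk held in ptr."""
    for word in ptr[itask].split():
        mr.add(word, 1)

def myreduce(key, mvalue, mr, ptr):
    """Emit (word, count), where count is the number of values for the key."""
    mr.add(key, len(mvalue))

# Exercise the callbacks by hand, without the library.
chunks = ["a rose is a rose", "is a rose"]
mapped = RecordingMR()
for itask in range(len(chunks)):
    mymap(itask, mapped, chunks)

reduced = RecordingMR()
myreduce("rose", [1, 1, 1], reduced, None)
print(len(mapped.pairs), reduced.pairs)
```

<P>With the real library, the same callbacks would presumably be used
in a sequence such as mr.map(len(chunks),mymap,chunks), then
mr.collate(), then mr.reduce(myreduce), following the method list
above.
</P>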
<PRE>mr.scan_kv(myscan) # myscan is a function called back from the
# library as myscan(key,value,ptr)
# for each key/value pair
# and you (optionally) provide ptr (see below)
mr.scan_kv(myscan,ptr) # if specified, ptr is any Python datum
# and is passed back to your myscan()
# if not specified, ptr = None
</PRE>
<PRE>mr.scan_kmv(myscan) # myscan is a function called back from the
# library as myscan(key,mvalue,ptr)
# where mvalue is a list of values associated
# with the key,
# and you (optionally) provide ptr (see below)
mr.scan_kmv(myscan,ptr) # if specified, ptr is any Python datum
# and is passed back to your myscan()
# if not specified, ptr = None
</PRE>
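<P>Since the scan callbacks are not passed the MapReduce object, the
ptr argument is a natural place to collect results. The sketch below
assumes ptr is a list for scan_kv() and a dict for scan_kmv(); the
function names and sample data are illustrative only.
</P>

```python
def myscan_kv(key, value, ptr):
    """Append each key/value pair to the list passed as ptr."""
    ptr.append((key, value))

def myscan_kmv(key, mvalue, ptr):
    """Store the number of values seen for each key in the dict passed as ptr."""
    ptr[key] = len(mvalue)

# Call the callbacks by hand, as the library would for one key.
seen = []
myscan_kv("rose", 3, seen)

sizes = {}
myscan_kmv("rose", [1, 1, 1], sizes)
print(seen, sizes)
```

<P>With the library these would be passed as mr.scan_kv(myscan_kv,seen)
and mr.scan_kmv(myscan_kmv,sizes).
</P>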
<PRE>mr.scrunch(nprocs,key)
mr.sort_keys(mycompare)
mr.sort_values(mycompare)
mr.sort_multivalues(mycompare) # mycompare is a function called back from the
# library as mycompare(a,b) where
# a and b are two keys or two values
# your mycompare() should compare them
# and return a -1, 0, or 1
# if a < b, or a == b, or a > b
mr.sort_keys_flag(flag)
mr.sort_values_flag(flag)
mr.sort_multivalues_flag(flag)
</PRE>
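<P>A comparison callback of the kind described above can be written
without the cmp() built-in, which Python 3 no longer provides. The
following is a minimal sketch; the sorted() call only demonstrates the
ordering it produces and does not involve the library.
</P>

```python
import functools

def mycompare(a, b):
    """Return -1, 0, or 1 if a < b, a == b, or a > b."""
    return (a > b) - (a < b)

# Check the ordering with Python's own sort.
words = ["pear", "apple", "fig"]
print(sorted(words, key=functools.cmp_to_key(mycompare)))
```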
<PRE>mr.kv_stats(level)
mr.kmv_stats(level)
</PRE>
<PRE>mr.mapstyle(value) # set mapstyle to value
mr.all2all(value) # set all2all to value
mr.verbosity(value) # set verbosity to value
mr.timer(value) # set timer to value
mr.memsize(value) # set memsize to value
mr.minpage(value) # set minpage to value
mr.maxpage(value) # set maxpage to value
</PRE>
<PRE>mr.add(key,value) # add single key and value
mr.add_multi_static(keys,values) # add list of keys and values
# all keys are assumed to be same length
# all values are assumed to be same length
mr.add_multi_dynamic(keys,values) # add list of keys and values
# each key may be different length
# each value may be different length
</PRE>
<HR>
<P>Note that you can create multiple MR-MPI objects in your Python
script, and coordinate the data stored in each and moved between them,
just as you can from a C or C++ program.
</P>
<P>The class methods above correspond one-to-one with the C++ methods
described <A HREF = "Interface_c++.html">here</A>, except that for C++ methods with
multiple interfaces (e.g. <A HREF = "map.html">map()</A>), there are multiple Python
methods with slightly different names, similar to the <A HREF = "Interface_c.html">C
interface</A>.
</P>
<P>There is no set function for the <I>keyalign</I> and <I>valuealign</I>
<A HREF = "settings.html">settings</A>. These are hard-wired to 1 for
the Python interface, since no other values make sense, due to
the pickling/unpickling that is performed on key and value data.
</P>
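<P>The effect of pickling can be seen with Python's standard pickle
module. The sketch below shows that a tuple key and a list value each
become a byte string of arbitrary length and are recovered unchanged.
Whether the wrapper uses exactly these calls or this pickle protocol is
an assumption, and the sample key and value are made up.
</P>

```python
import pickle

key = ("rose", 3)            # a tuple used as a key
value = [1.5, "petal", 42]   # a list used as a value

# Each object becomes an opaque byte string of arbitrary length.
key_bytes = pickle.dumps(key)
value_bytes = pickle.dumps(value)
print(len(key_bytes), len(value_bytes))

# Unpickling recovers objects equal to the originals.
print(pickle.loads(key_bytes) == key, pickle.loads(value_bytes) == value)
```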
<P>See the Python scripts in the examples directory for
<A HREF = "Examples.html">examples</A> of how these calls are made from a Python
program. They are conceptually identical to the C++ and C programs in
the same directory.
</P>
</HTML>