Module Parmap
Efficient parallel map, fold and mapfold on lists and arrays on multicores.
All the primitives let you control the granularity of the parallelism via an optional parameter chunksize: if chunksize is omitted, the input sequence is split evenly among the available cores; if chunksize is specified, the input data is split into chunks of size chunksize and dispatched to the available cores using an on-demand strategy that ensures automatic load balancing.
A specific primitive, array_float_parmap, is provided for fast operations on float arrays.
Configuring available cores
Setting and getting the default value for ncores
Getting ncores being used during parallel execution
Enabling/disabling processes core pinning
Setting and getting an explicit mapping from processes to cores
Parallel map and folds
Generic operations
Parallel mapfold
val parmapfold : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ('a -> 'b) -> 'a sequence -> ('b -> 'c -> 'c) -> 'c -> ('c -> 'c -> 'c) -> 'c
parmapfold ~ncores:n f (L l) op b concat computes List.fold_right op (List.map f l) b by forking n processes on a multicore machine. You need to provide the extra concat operator to combine the partial results of the fold computed on each core. If 'b = 'c, then concat may simply be op. The order of computation in parallel differs from sequential execution, so this function is only correct if op and concat are associative and commutative. If the optional chunksize parameter is specified, the processes compute the result in an on-demand fashion on blocks of size chunksize.
parmapfold ~ncores:n f (A a) op b concat similarly computes Array.fold_right op (Array.map f a) b.
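As a minimal sketch of the call shape described above, assuming the Parmap library is installed and linked, the sum of the squares of 1..100 can be computed with integer addition serving as both op and concat. The input data and the choice of two cores are illustrative only.

```ocaml
(* Illustrative input: the integers 1..100. *)
let inputs = List.init 100 (fun i -> i + 1)

(* Map each element to its square, then fold with (+).
   (+) is associative and commutative, so it can serve as both op and
   concat; 0 is its neutral element, so it is a safe initial value. *)
let sum_of_squares =
  Parmap.parmapfold ~ncores:2 (fun x -> x * x) (Parmap.L inputs) ( + ) 0 ( + )

let () = Printf.printf "%d\n" sum_of_squares
```

Here 'b and 'c are both int, which is the case where concat can simply be op.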
Parallel fold
val parfold : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ('a -> 'b -> 'b) -> 'a sequence -> 'b -> ('b -> 'b -> 'b) -> 'b
parfold ~ncores:n op (L l) b concat computes List.fold_right op l b by forking n processes on a multicore machine. You need to provide the extra concat operator to combine the partial results of the fold computed on each core. If 'a = 'b, then concat may simply be op. The order of computation in parallel differs from sequential execution, so this function is only correct if op and concat are associative and commutative. If the optional chunksize parameter is specified, the processes compute the result in an on-demand fashion on blocks of size chunksize.
parfold ~ncores:n op (A a) b concat similarly computes Array.fold_right op a b.
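A short sketch of parfold on an array, assuming the Parmap library is available: the maximum of an integer array, where max is used both as op and as concat because the element and accumulator types coincide. The data and the core count are made up for illustration.

```ocaml
(* Illustrative data. *)
let data = [| 3; 41; 7; 19; 23; 5 |]

(* max is associative and commutative, and min_int is its neutral
   element, so the result does not depend on how the array is split. *)
let maximum = Parmap.parfold ~ncores:2 max (Parmap.A data) min_int max
```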
Parallel map
val parmap : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ?keeporder:bool -> ('a -> 'b) -> 'a sequence -> 'b list
parmap ~ncores:n f (L l) computes List.map f l by forking n processes on a multicore machine.
parmap ~ncores:n f (A a) computes Array.map f a by forking n processes on a multicore machine. If the optional chunksize parameter is specified, the processes compute the result in an on-demand fashion on blocks of size chunksize; this provides automatic load balancing for unbalanced computations, preserving the order of the results if keeporder is set to true.
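The following sketch, which assumes the Parmap library is installed, combines chunksize with keeporder so that work is handed out in blocks of five elements while the results still come back in input order. The function slow_square is a hypothetical stand-in for an expensive computation.

```ocaml
(* Illustrative input: 0..49. *)
let inputs = List.init 50 (fun i -> i)

(* Hypothetical stand-in for an expensive function. *)
let slow_square x = x * x

(* Blocks of 5 elements are dispatched on demand; keeporder:true asks
   for the results in the same order as the input. *)
let squares =
  Parmap.parmap ~ncores:2 ~chunksize:5 ~keeporder:true
    slow_square (Parmap.L inputs)
```

Note that the result is a list even when the input is wrapped with the A constructor, as the signature shows.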
Parallel iteration
val pariter : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ('a -> unit) -> 'a sequence -> unit
pariter ~ncores:n f (L l) computes List.iter f l by forking n processes on a multicore machine.
pariter ~ncores:n f (A a) computes Array.iter f a by forking n processes on a multicore machine. If the optional chunksize parameter is specified, the processes perform the computation in an on-demand fashion on blocks of size chunksize; this provides automatic load balancing for unbalanced computations.
Parallel mapfold, indexed
val parmapifold : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> (int -> 'a -> 'b) -> 'a sequence -> ('b -> 'c -> 'c) -> 'c -> ('c -> 'c -> 'c) -> 'c
Like parmapfold, but the map function receives the index of the mapped element as an extra first argument.
Parallel map, indexed
val parmapi : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ?keeporder:bool -> (int -> 'a -> 'b) -> 'a sequence -> 'b list
Like parmap, but the map function receives the index of the mapped element as an extra first argument.
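A small sketch of the indexed variant, assuming the Parmap library is installed and that indices are zero-based positions in the input sequence, as with List.mapi. The word list is illustrative.

```ocaml
(* Illustrative input. *)
let words = [ "a"; "b"; "c"; "d" ]

(* Hypothetical labelling function: prefixes each word with its index. *)
let label i w = Printf.sprintf "%d:%s" i w

(* The mapped function receives the index as its first argument. *)
let labelled = Parmap.parmapi ~ncores:2 label (Parmap.L words)
```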
Parallel iteration, indexed
val pariteri : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> (int -> 'a -> unit) -> 'a sequence -> unit
Like pariter, but the iterated function receives the index of the sequence element as an extra first argument.
Parallel map on arrays
val array_parmap : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ?keeporder:bool -> ('a -> 'b) -> 'a array -> 'b array
array_parmap ~ncores:n f a computes Array.map f a by forking n processes on a multicore machine. If the optional chunksize parameter is specified, the processes compute the result in an on-demand fashion on blocks of size chunksize; this provides automatic load balancing for unbalanced computations, preserving the order of the results if keeporder is set to true.
Parallel map on arrays, indexed
Float array operations
init_shared_buffer a creates a new memory-mapped shared buffer big enough to hold a float array of the size of a. This buffer can be reused in a series of calls to array_float_parmap, avoiding the cost of reallocating it each time.
Parallel map on float arrays
val array_float_parmap : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ?result:float array -> ?sharedbuffer:buf -> ('a -> float) -> 'a array -> float array
array_float_parmap ~ncores:n f a computes Array.map f a by forking n processes on a multicore machine and preallocating the resulting array as shared memory, which allows significantly more efficient computation than calling the generic array_parmap function. If the optional chunksize parameter is specified, the processes compute the result in an on-demand fashion on blocks of size chunksize; this provides automatic load balancing for unbalanced computations, and the order of the result is guaranteed to be preserved.
If you already have an array in which to store the result, you can save some more CPU cycles by passing it as the optional parameter result: this avoids the creation of a result array, which can be costly for very large data sets. Raises WrongArraySize if result is too small to hold the data.
It is possible to share the same preallocated shared memory space across calls, by initialising the space with init_shared_buffer a and passing the result as the optional sharedbuffer parameter to each subsequent call to array_float_parmap. Raises WrongArraySize if sharedbuffer is too small to hold the input data.
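The two optional parameters can be sketched together as follows, assuming the Parmap library is installed. The array size, the mapped functions and the core count are illustrative; the buffer is allocated once and passed to two successive calls, and the first call also writes into a preallocated result array.

```ocaml
let n = 1_000

(* Illustrative input: 0.0, 1.0, ..., 999.0. *)
let input = Array.init n float_of_int

(* Allocate the shared buffer once, sized for [input]. *)
let buffer = Parmap.init_shared_buffer input

(* Preallocated destination array, as large as the input. *)
let out = Array.make n 0.0

(* First call: store the result in [out] and use the shared buffer. *)
let squares =
  Parmap.array_float_parmap ~ncores:2 ~result:out ~sharedbuffer:buffer
    (fun x -> x *. x) input

(* Second call: reuse the same shared buffer instead of reallocating it. *)
let roots =
  Parmap.array_float_parmap ~ncores:2 ~sharedbuffer:buffer sqrt input
```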
Parallel map on float arrays, indexed
val array_float_parmapi : ?init:(int -> unit) -> ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int -> ?result:float array -> ?sharedbuffer:buf -> (int -> 'a -> float) -> 'a array -> float array
Like array_float_parmap, but the map function receives the index of the mapped element as an extra first argument.
Debugging and Helpers
val redirect : ?path:string -> id:int -> unit
Helper function that redirects stdout and stderr to files located in the directory path, carrying names of the shape stdout.NNN and stderr.NNN, where NNN is the id of the used core. Useful when writing initialisation functions to be passed as the init argument to the parallel combinators. The default value for path is /tmp/.parmap.PPPP, with PPPP the process id of the main program.
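A possible way to use redirect as an init function is sketched below, assuming the Parmap and unix libraries are linked. The directory name parmap_redirect_demo is hypothetical, and the sketch creates it up front rather than relying on redirect to do so. The path is passed explicitly so that the call to redirect is a full application returning unit.

```ocaml
(* Hypothetical log directory for this demo. *)
let log_dir =
  Filename.concat (Filename.get_temp_dir_name ()) "parmap_redirect_demo"

let () =
  (* Create the directory before any worker tries to open files in it. *)
  if not (Sys.file_exists log_dir) then Unix.mkdir log_dir 0o755;
  Parmap.pariter ~ncores:2
    (* Each worker redirects its own stdout and stderr on startup. *)
    ~init:(fun id -> Parmap.redirect ~path:log_dir ~id)
    (* Output printed by a worker goes to that worker's file. *)
    (fun x -> Printf.printf "processing %d\n%!" x)
    (Parmap.L [ 1; 2; 3; 4 ])
```

After the run, the per-core output files can be inspected in log_dir.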