Parallel.js

Easy multi-core processing with JavaScript

Parallel.js is a tiny library for multi-core processing in JavaScript. It was created to take full advantage of the ever-maturing web workers API. JavaScript is fast, no doubt, but it lacks the parallel computing capabilities of its peer languages because of its single-threaded computing model. In a world where the number of cores on a CPU is increasing faster than the speed of the cores themselves, isn't it a shame that we can't take advantage of this raw parallelism?

Parallel.js solves that problem by giving you high level access to multi-core processing using web workers. It runs on node and in your browser.

Unminified: parallel.js

Minified (1490 Bytes gzipped): parallel.min.js

Source: github

Include parallel.js in your web projects like so:

            		<script src="parallel.js"></script>
        		

This will give you access to the global variable, Parallel.

Parallel can also be included in node:

            		$ npm install paralleljs
        		
            		var Parallel = require('paralleljs');
        		

Parallel(data, opts)

This is the constructor. Use it to create a new parallel job. The constructor takes the data you want to operate on, usually an array. This data will be held in memory until you finish your job, and can be accessed via the .data attribute of your job.

The object returned by the Parallel constructor is meant to be chained, so you can produce a chain of operations on the provided data.

Arguments

data
This is the data you wish to operate on. It will often be an array, but the only restriction is that your values are serializable as JSON.
options (optional): Some options for your job
  • evalPath (optional): The path to the file eval.js. It is required when running in node, and also when requiring files in browser environments (to work around cross-domain restrictions for web workers in IE 10). Defaults to the same location as parallel.js in node environments, and to null in the browser.
  • maxWorkers (optional): The maximum number of permitted worker threads. Defaults to 4, or to the number of CPUs on your computer if you're running node.
  • synchronous (optional): If web workers are not available, whether or not to fall back to synchronous processing using setTimeout. Defaults to true.
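
The defaults listed above can be summarised in a small sketch. The names `resolveOptions` and `cpuCount` are hypothetical and used only for illustration; they are not part of Parallel.js, and the library's own option handling may differ.

```javascript
// Hypothetical helper (not part of Parallel.js): fill in the documented defaults.
// `cpuCount` stands for the CPU count reported by node; omit it for a browser.
function resolveOptions(opts, cpuCount) {
  opts = opts || {};
  return {
    // null stands for "no eval.js configured", the documented browser default
    evalPath: opts.evalPath !== undefined ? opts.evalPath : null,
    // documented default: 4, or the number of CPUs when running in node
    maxWorkers: opts.maxWorkers !== undefined ? opts.maxWorkers : (cpuCount || 4),
    // documented default: fall back to synchronous processing
    synchronous: opts.synchronous !== undefined ? opts.synchronous : true
  };
}

console.log(resolveOptions({ maxWorkers: 2 }));
console.log(resolveOptions({}, 8).maxWorkers); // logs 8
```

Passing the CPU count in as a parameter keeps the sketch deterministic; in node it could be obtained from `require('os').cpus().length`.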

Examples

Let's construct a new Parallel.js job:

                		var p = new Parallel([1, 2, 3, 4, 5]);
                		console.log(p.data); // prints [1, 2, 3, 4, 5]
            		

spawn

This function will spawn a new process on a worker thread. Pass it the function you want to call. Your function will receive one argument, which is the current data. The value returned from your spawned function will update the current data.

Arguments

fn
A function to execute on a worker thread. Receives the wrapped data as an argument. The value returned will be assigned to the wrapped data.

Examples

Let's start with something simple, reversing the letters of a string:

                            var p = new Parallel('forwards');
                            // Spawn a remote job (we'll see more on how to use then later)
                            p.spawn(function (data) {
                                data = data.split('').reverse().join('');
                                return data;
                            }).then(function (data) {
                                console.log(data) // logs sdrawrof
                            });
                        

This example reverses the letters in the string forwards. First, we construct a new Parallel job, passing in the argument, `'forwards'`. We then spawn a job, passing in an anonymous function. This function receives whatever the currently stored data is, and returns what we want it to be. Finally, we call `then` to log out the result when we're finished.
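
To make the idea concrete outside the library, here is a minimal sketch of the same pattern using node's built-in worker_threads module. The helper name `spawnOnWorker` is hypothetical, and this is not a description of how Parallel.js works internally; it only illustrates sending a function's source and some data to a worker thread and receiving the return value back.

```javascript
var Worker = require('worker_threads').Worker;

// Hypothetical helper: run fn(data) on a worker thread and resolve with the result.
function spawnOnWorker(fn, data) {
  // The worker only receives text, so the function travels as source code.
  var source =
    "var wt = require('worker_threads');\n" +
    'var fn = ' + fn.toString() + ';\n' +
    'wt.parentPort.postMessage(fn(wt.workerData));\n';
  return new Promise(function (resolve, reject) {
    var worker = new Worker(source, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

spawnOnWorker(function (data) {
  return data.split('').reverse().join('');
}, 'forwards').then(function (result) {
  console.log(result); // logs sdrawrof
});
```

Because the function is rebuilt from its source text, it cannot see variables from the scope where it was defined; everything it needs has to arrive through its argument.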

What might it look like if we spawn a longer running job?

                            var slowSquare = function (n) { 
                                var i = 0; 
                                while (++i < n * n) {}
                                return i;
                            };
                            // Create a job
                            var p = new Parallel(100000);
                            // Spawn our slow function
                            p.spawn(slowSquare).then(yourCallback);
                        

map

map will apply the supplied function to every element in the wrapped data. Parallel will spawn one worker for each array element in the data, up to the limit set by the maxWorkers option. The values returned will be stored for further processing.

map takes one required argument.

Arguments

fn
A function to apply. Receives one element of the wrapped data as its argument. The returned values are collected and become the new wrapped data.

Examples

Let's start by computing numbers in the Fibonacci sequence:

                                var p = new Parallel([0, 1, 2, 3, 4, 5, 6]),
                                log = function () { console.log(arguments); };
                                // One gotcha: anonymous functions cannot be serialized
                                // If you want to do recursion, make sure the function
                                // is named appropriately
                                function fib(n) {
                                    return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
                                };
                                p.map(fib).then(log);
                                // Logs the first 7 Fibonacci numbers, woot!
                            

We start by creating a new Parallel job, this time passing in a sequence of numbers. We then define the Fibonacci function. Make sure that your function is named, so that it can be serialized properly. This is only an issue if you reference that function, which we do, since it's recursive. Alternatively, we can share this function with the workers using require.

We then call map, which automagically spawns one worker for each item in our list, unless we've specified a max number of workers. When our job is complete, we'll log out the first 7 Fibonacci numbers.
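
The naming requirement is easier to see in isolation. The sketch below rebuilds a function from its source text, which is roughly the situation a worker is in when it receives a serialized function. The helpers `rebuild` and `runDemo` are hypothetical names for this illustration and are not part of Parallel.js.

```javascript
// Rebuild a function from its source text alone. The rebuilt function has no
// access to the scope in which the original was defined.
function rebuild(fn) {
  return new Function('return (' + fn.toString() + ')')();
}

function runDemo() {
  // Named function: the name is part of the source, so recursion still works.
  function fib(n) {
    return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
  }

  // Anonymous function: the variable name `anonFib` is not part of the source.
  var anonFib = function (n) {
    return n < 2 ? 1 : anonFib(n - 1) + anonFib(n - 2);
  };

  var anonError = null;
  try {
    rebuild(anonFib)(6);
  } catch (err) {
    anonError = err.name; // the recursive call cannot find `anonFib`
  }
  return { named: rebuild(fib)(6), anonError: anonError };
}

var demoResult = runDemo();
console.log(demoResult); // logs { named: 13, anonError: 'ReferenceError' }
```

The named version also confirms the last value of the example above: with this definition, fib(6) is 13.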

Now let's try a longer running job:

                                var p = new Parallel([40, 41, 42]),
                                log = function () { console.log(arguments); };
                                // One gotcha: anonymous functions cannot be serialized
                                // If you want to do recursion, make sure the function
                                // is named appropriately
                                function fib(n) {
                                    return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
                                };
                                p.map(fib).then(log);
                            

reduce

reduce applies an operation to every member of the wrapped data and returns a single value produced by the operation. Use it to combine the results of a map operation, for example by summing numbers. It takes a reducing function, which receives one argument, data: a two-element array containing the stored value and the current element.

reduce takes one required argument.

Arguments

fn
A function to apply. Receives the stored value and the current element as a single array argument. The value returned will be stored as the current value for the next iteration. Finally, the current value will be assigned to the current data.
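
The calling convention differs from Array.prototype.reduce, so a single-threaded stand-in may help. `pairReduce` is a hypothetical helper written for this illustration, not a Parallel.js function. A parallel implementation may combine pairs in a different order, so treat the equivalence as exact only for operations such as addition, where order does not matter.

```javascript
// Single-threaded stand-in for the calling convention: the reducer gets ONE
// argument, a two-element array [storedValue, currentElement].
function pairReduce(data, fn) {
  return data.reduce(function (stored, current) {
    return fn([stored, current]);
  });
}

function add(d) { return d[0] + d[1]; }

console.log(pairReduce([1, 2, 3, 4], add)); // logs 10
```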

Examples

Let's compute e^10 from its series definition:

                                // Use underscore's range function to generate the series 0..49
                                var p = new Parallel(_.range(50));
                                function add(d) { return d[0] + d[1]; }
                                function factorial(n) { return n < 2 ? 1 : n * factorial(n - 1); }
                                function log() { console.log(arguments); }
                                p.require(factorial);
                                // Approximate e^10
                                p.map(function (n) { return Math.pow(10, n) / factorial(n); }).reduce(add).then(log);
                            

We start by creating a new Parallel job, again, passing in a sequence of numbers. We then define the add function, which will be used to reduce our values. Then we define the factorial function, and use the require method to share it with all workers.

Finally, we construct a job pipeline. It starts with a map operation, where we compute the series value for each index. The data is then passed to our reduce operation, where we sum the values in the list. The sum of this series converges to e^10 as the number of terms approaches infinity.
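
As a sanity check on the arithmetic, the same 50 terms can be summed sequentially without any workers and compared against Math.exp(10). This is only a reference computation in plain JavaScript; it does not use Parallel.js.

```javascript
function factorial(n) { return n < 2 ? 1 : n * factorial(n - 1); }

// Terms 10^n / n! for n = 0..49, computed one by one.
var terms = [];
for (var n = 0; n < 50; n++) {
  terms.push(Math.pow(10, n) / factorial(n));
}

// Sum the terms sequentially.
var approx = terms.reduce(function (sum, term) { return sum + term; }, 0);

console.log(approx, Math.exp(10));
console.log(Math.abs(approx - Math.exp(10)) < 1e-6); // logs true
```

Fifty terms are plenty here because the terms shrink rapidly once n exceeds 10.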

require

require is used to share state between your workers. It can be used to import libraries and functions into your worker threads.

require takes any number of arguments, either functions or strings. If the argument is a function it will be converted into a string and included in your worker.

Important: If you pass functions into require they must be named functions. Anonymous functions will not work. If you wish to pass anonymous functions, you may do so by declaring them with an object literal of the form, { fn: myAnonFn, name: 'myAnonFn' }.
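
The reason for this rule can be sketched without the library. If a function is shared as source text, a named function declares itself, an anonymous one declares nothing, and the object literal form supplies the missing name. The helpers `toWorkerSource`, `definesFunction` and `requireDemo` are hypothetical and only approximate the idea; they are not the library's actual implementation.

```javascript
// Hypothetical helper: turn one require() argument into source text for a worker.
function toWorkerSource(req) {
  if (typeof req === 'function') {
    return req.toString(); // relies on the name contained in the source
  }
  return 'var ' + req.name + ' = ' + req.fn.toString() + ';';
}

// Evaluate the source in a fresh function scope and report whether `name`
// ends up bound to a function there.
function definesFunction(source, name) {
  try {
    return new Function(source + '\nreturn typeof ' + name + ';')() === 'function';
  } catch (err) {
    return false; // for example a SyntaxError: a function statement needs a name
  }
}

function requireDemo() {
  var wontWork = function (n) { return n * n; };
  function worksGreat(n) { return n * n; }
  return {
    anonymous: definesFunction(toWorkerSource(wontWork), 'wontWork'),
    named: definesFunction(toWorkerSource(worksGreat), 'worksGreat'),
    wrapped: definesFunction(toWorkerSource({ fn: wontWork, name: 'wontWork' }), 'wontWork')
  };
}

console.log(requireDemo()); // logs { anonymous: false, named: true, wrapped: true }
```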

Example:

                            var wontWork = function (n) { return n * n; };
                            function worksGreat(n) { return n * n; }
                            var r = new Parallel(3).require(wontWork).spawn(function (a) { return 2 * wontWork(a); }, 3);  // throws an error
                            var r = new Parallel(3).require(worksGreat).spawn(function (a) { return 2 * worksGreat(a); }, 3); // returns 18
                            var r = new Parallel(3).require({ fn: wontWork, name: 'wontWork' }).spawn(function (a) { return 2 * wontWork(a); }, 3); // returns 18
                        

Passing files as arguments to require

`require` also accepts files as requirements. These should be passed as strings. The string may be either the URL of the file you want to include or an absolute path.

Important: When requiring files in the browser or node, you will need to include eval.js and provide its path via the evalPath option in the constructor.

Examples

Include path to eval.js:
p = new Parallel([2, 3, 3], { evalPath: 'js/eval.js' });
Absolute URL:
p.require('http://mydomain.com/js/script.js')
Absolute path (assuming my document lives in http://mydomain.com/index.html):
p.require('js/script.js')
Does not work (yet):
p.require('../js/script.js')

Important: browser security restrictions prevent loading files over the file protocol, so you will need to run an http server in order to load local files.

Personally, I like the npm package, http-server. This can be installed and run pretty easily:

                                $ npm install http-server -g
                                $ cd myproject
                                $ http-server .
                            

Passing environment to functions

You can pass data to threads that will be global to that worker, and therefore available in each called function. Provide the data as the env option to the Parallel constructor. It will be available under the global.env namespace; the namespace can be changed by passing the envNamespace option to the Parallel constructor.

Important: Globals cannot be mutated between threads.
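
One way to picture this behaviour is a single-threaded stand-in in which each run receives its own serialized copy of the environment. The helper `mapWithEnv` below is hypothetical and assumes node, where `global` is the global object; it is not how Parallel.js is implemented, only a sketch of the copy semantics.

```javascript
// Hypothetical stand-in: expose a private copy of `env` under global[namespace]
// while `fn` runs over the data, then restore the previous state.
function mapWithEnv(data, fn, env, envNamespace) {
  var ns = envNamespace || 'env';
  var hadPrevious = Object.prototype.hasOwnProperty.call(global, ns);
  var previous = global[ns];
  // A JSON round trip gives the callee a copy, never the caller's object.
  global[ns] = JSON.parse(JSON.stringify(env));
  try {
    return data.map(function (d) { return fn(d); });
  } finally {
    if (hadPrevious) { global[ns] = previous; } else { delete global[ns]; }
  }
}

var sharedEnv = { a: 10 };
console.log(mapWithEnv([1, 2, 3], function (d) { return d * global.env.a; }, sharedEnv)); // logs [ 10, 20, 30 ]

// Writes made inside the callee touch only the copy.
mapWithEnv([1], function (d) { global.env.a = 99; return d; }, sharedEnv);
console.log(sharedEnv.a); // logs 10
```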

Examples

                                var p = new Parallel([1, 2, 3], {
                                    env: {
                                        a: 10
                                    }
                                });
                                // returns 10, 20, 30
                                p.map(function (d) {
                                    return d * global.env.a;
                                });
                                // Configure the namespace
                                p = new Parallel([1, 2, 3], {
                                    env: {
                                        a: 10
                                    },
                                    envNamespace: 'parallel'
                                });
                                p.map(function (d) {
                                    return d * global.parallel.a;
                                });