
Tutorial: Build The Ultimate PC


You can never have enough processing power, especially if you enjoy working with 3D graphics or compiling your own software. Luckily, it's easy to use any spare machines you may have to create a single homogeneous computing mega-matrix and calculation engine just by wiring them all together and running the right software. Even modest hardware can make a significant contribution to your net computing power, and in this tutorial from PC Plus 278, we show you how...


The shopping list

In theory, you can use any old PC. The minimum requirement is that it must be able to run Linux, which narrows the choice down to almost any PC from the last 10 years. In practice, though, the cluster works best if the machines that you’re linking together are relatively close in specification, especially once you take running costs into consideration. A 1GHz Athlon machine, for example, could cost you over £50 a year in electricity. You’d be much better off spending that money on a processor upgrade for a more efficient machine. A similar platform for each computer also makes configuration considerably easier.
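The running-cost figure is simple arithmetic: average power draw in kilowatts, multiplied by hours of use, multiplied by your tariff per kWh. The sketch below makes that explicit. The 60W average draw and £0.10/kWh tariff are assumptions chosen for illustration, not measured figures, so substitute your own meter readings and electricity price.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the machine is left switched on around the clock

def annual_cost(watts, price_per_kwh, hours=HOURS_PER_YEAR):
    """Yearly electricity cost for a machine with the given average power draw."""
    kwh = watts / 1000 * hours  # convert watts to kilowatts, then to kWh used
    return kwh * price_per_kwh

# Assumed figures: an old PC averaging 60W on a tariff of £0.10 per kWh
print(f"£{annual_cost(60, 0.10):.2f} a year")
```

If a node only runs while jobs are queued, pass a smaller `hours` value to see how much switching it off between jobs saves.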

For our cluster, we used four identical powerful machines. You only need powerful machines if you’re making a living from something computer-based – 3D animation, for instance – where you can weigh the extra cost against increased performance. We’re also going to assume that you have a main machine you can use as the master. This will be the eyes and ears of the cluster, and it’s from here that you’ll be able to set up jobs and control the other machines.

Hardware compatibility

Linux has come a long way in terms of hardware compatibility, but you don’t want to be troubleshooting three different network adaptors if you can help it. And it’s the network adaptors that are likely to be the weakest link. Cluster computing depends on each machine having access to the same data, which means that data has to be shuffled continually between the machines in the cluster.

Because of this heavy data requirement, wireless is too slow for almost any task, so you’re going to need a wired connection to each machine. The faster your networking hardware, the better. To start with, you should be fine with your standard Ethernet ports as long as they’re capable of Fast Ethernet speeds (100Mbps). If you need greater capacity, gigabit Ethernet cards are also cheap. Just stick them into a free PCI or PCI Express slot, and then disable the slower onboard port if you can.
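Back-of-the-envelope arithmetic shows why link speed matters: a file’s size in megabytes, times 8 bits per byte, divided by the link speed in megabits per second, gives the best-case transfer time. The 500MB scene size below is an assumption, and real transfers will be slower than this ideal once protocol overhead is included, which the optional `efficiency` factor models only crudely.

```python
def transfer_seconds(size_mb, link_mbps, efficiency=1.0):
    """Best-case time to move size_mb megabytes over a link_mbps link.

    efficiency < 1.0 roughly models protocol overhead and contention;
    1.0 is the theoretical ideal."""
    megabits = size_mb * 8  # 1 byte = 8 bits (decimal megabytes assumed)
    return megabits / (link_mbps * efficiency)

# Hypothetical 500MB scene file, textures included, sent to one node
for name, mbps in [("Fast Ethernet", 100), ("Gigabit Ethernet", 1000)]:
    print(f"{name}: {transfer_seconds(500, mbps):.0f}s")
```

If every node pulls its own copy from the master over the master’s single link, multiply the result by the number of nodes.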

You’ll also need a way of connecting the machines together. A gigabit switch can be bought for around £20 to £30, and this will let you connect every machine at full gigabit speed. You may be able to use the Ethernet ports on an Internet or wireless router, but most will only be able to handle 100Mbps. It might be a good idea to start out using your standard Ethernet ports and your Internet or wireless router, and upgrade to a dedicated switch and gigabit ports later on.

General PC requirements depend on how you plan to use your cluster. Memory is more important than processing speed, because you’re likely to be using your combined computing power for large, memory-intensive jobs. That means lots of images and textures in the case of 3D rendering, or lots of libraries, objects and source files in the case of distributed compilation.

Each node in a cluster will normally operate independently of the rest. If one machine has a lower specification, it shouldn’t be too much of a bottleneck, but this depends on how you use your cluster. If you’re rendering a 300-frame animation and the slow machine happens to render only a handful of frames, then it has still made some contribution without holding any of the other machines back. But if the same machine is used to help render a single complex frame, it’s likely that you’ll need to wait for the slower machine to complete its work after the faster machines have finished.
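To see why frame-by-frame work tolerates a slow node while a split single frame doesn’t, the small simulation below compares the two cases. The timings (three nodes at 10 seconds per frame, one at 40) are hypothetical, and the single-frame case assumes the frame can be cut into equal-sized pieces, which real renderers only approximate.

```python
import heapq

def render_animation(num_frames, secs_per_frame):
    """Each node grabs the next unrendered frame whenever it becomes free.

    secs_per_frame[i] is how long node i takes to render one frame.
    Returns (seconds until the last frame is finished,
             number of frames each node rendered)."""
    # Min-heap of (time the node becomes free, node index)
    free_at = [(0.0, i) for i in range(len(secs_per_frame))]
    heapq.heapify(free_at)
    frames_done = [0] * len(secs_per_frame)
    finish = 0.0
    for _ in range(num_frames):
        t, node = heapq.heappop(free_at)  # the earliest-free node takes the frame
        t += secs_per_frame[node]
        frames_done[node] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, node))
    return finish, frames_done

def render_split_frame(secs_per_frame):
    """One frame cut into equal pieces, one piece per node.

    The frame is only finished when the slowest node's piece is done."""
    pieces = len(secs_per_frame)
    return max(secs / pieces for secs in secs_per_frame)

nodes = [10.0, 10.0, 10.0, 40.0]  # hypothetical: three fast nodes, one slow
print(render_animation(300, nodes))
print(render_split_frame(nodes), render_split_frame(nodes[:3]))
```

With these figures, the slow node still renders 23 of the 300 frames and the animation finishes after 930 seconds, against 1,000 seconds for the three fast nodes alone. Splitting one frame four ways, however, takes 10 seconds with the slow node included, compared with about 3.3 seconds across the three fast nodes alone.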

Re-using hardware

If you’re building these machines from scratch, you can cut corners. Other than for installing the OS, you won’t need a keyboard, mouse, optical drive or screen. You can even avoid using an optical drive for installation by using a tool called UNetbootin, which creates a bootable Ubuntu installation USB stick from a standard CD image. However, this only works if each machine’s BIOS can boot from USB. BIOS settings are normally accessed by pressing a certain key when you boot your machine. This key is most commonly [F2], but it could also be [F1], [Delete] or [Escape]. Look for any on-screen messages at boot for a clue.

From the BIOS, you should be able to discern whether your system can boot off external USB devices. You should also disable ‘halt on all errors’ if the option exists, so that your machine will boot even when errors or problems are detected. That’s what you need if the machine complains that no keyboard, mouse or screen has been detected.

Internal storage needs are minimal. You need around 10GB for the operating system, and it’s going to be difficult to find a hard drive that small these days if you haven’t got one or two handy. For storing your project data, we’d recommend using either a large drive (or array) on the master machine or an external NAS device connected to the same switch as the other machines in the cluster. Linux can mount remote drives onto the local file system, so that apps treat the remote storage as if it were local.
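Before starting a job, it’s worth confirming on each node that the shared project directory really is the remote drive, and not an empty local folder left behind by a mount that failed. This is a minimal sketch, assuming the share is mounted at a hypothetical `/mnt/cluster`: it uses Python’s `os.path.ismount` to check for a mount point, then tries to create (and immediately remove) a temporary file there.

```python
import os
import socket
import tempfile

def check_shared_dir(path):
    """Report whether path exists, is a mount point and is writable by this node.

    The write test creates a temporary file prefixed with this machine's
    hostname; NamedTemporaryFile deletes it again when the block exits."""
    report = {"exists": os.path.isdir(path),
              "is_mount": os.path.ismount(path),
              "writable": False}
    if report["exists"]:
        try:
            with tempfile.NamedTemporaryFile(dir=path,
                                             prefix=socket.gethostname() + "-"):
                report["writable"] = True
        except OSError:
            pass  # permission denied, read-only file system, and so on
    return report

# Hypothetical mount point for the shared project storage
print(check_shared_dir("/mnt/cluster"))
```

Run it on every node. If `is_mount` comes back False, that node would be writing its output to its own local disk rather than to the shared store.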

Each machine needs to be connected to the switch that we mentioned earlier using a standard Ethernet cable. You also need to connect the switch to your Internet or wireless router. A router is needed to assign a network address to each machine in the cluster; this should happen automatically, thanks to the DHCP server running on the router. If you don’t want to use a router (or connect your cluster to the Internet), run a DHCP server on your master machine instead. This will assign IP addresses to each machine on the switch.
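Whichever device hands out the addresses, every node should end up on the same subnet as the master, or the machines won’t be able to reach each other directly. This is a small sketch using Python’s standard `ipaddress` module; the 192.168.1.x addresses are hypothetical examples of a typical home-router range.

```python
import ipaddress

def off_subnet(master_interface, node_addresses):
    """Return the node addresses that are NOT on the master's subnet.

    master_interface is the master's address plus prefix or netmask,
    e.g. "192.168.1.2/24" or "192.168.1.2/255.255.255.0"."""
    network = ipaddress.ip_interface(master_interface).network
    return [a for a in node_addresses
            if ipaddress.ip_address(a) not in network]

# Hypothetical addresses handed out by the DHCP server
print(off_subnet("192.168.1.2/24", ["192.168.1.10", "192.168.1.11", "10.0.0.5"]))
```

An empty list means every node is on the master’s network; anything listed points to a machine plugged into the wrong network or picking up an address from a different DHCP server.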

Installing the software

With the hard stuff out of the way, it’s time to install the software. We recommend Ubuntu as your Linux distribution of choice because there is such a wide variety of packages available, and these can be installed through the default package manager without any further configuration. This makes installing apps like Blender across your cluster as easy as typing a single command on each machine, and it also means that you can install support utilities such as the DrQueue rendering queue manager just as easily.
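On Ubuntu, that single command is `apt-get install`. Once each node runs the openssh-server package, you can issue it from the master instead of visiting every machine. This is a minimal sketch: the node names (`node1` to `node3`) and the `render` account are hypothetical, it assumes that account can use sudo on each node, and it relies on `ssh -t` allocating a terminal so sudo can prompt for the password. By default it only builds and prints the commands (a dry run).

```python
import subprocess

def install_everywhere(hosts, user, packages, dry_run=True):
    """Install the same Ubuntu packages on every node over SSH.

    Returns the list of ssh commands; with dry_run=False each one is run
    in turn, stopping at the first node where the install fails."""
    remote = "sudo apt-get install -y " + " ".join(packages)
    commands = [["ssh", "-t", f"{user}@{host}", remote] for host in hosts]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands

# Hypothetical node names and user account; prints the commands only
for cmd in install_everywhere(["node1", "node2", "node3"], "render", ["blender"]):
    print(" ".join(cmd))
```

Running the nodes one after another keeps each sudo password prompt readable; parallel execution would be faster but would interleave the prompts.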

We opted for the Desktop rather than the Server edition because we needed a desktop on each node for setting up our various tests. We could have done the same with the Server edition (which has no desktop by default), but configuration is easier with a desktop.

Installing Ubuntu is easy but a little tedious, as you need to go through the following routine for each machine. After burning the distro’s ISO image to a CD, boot each machine in turn with this disc in the optical drive. If it doesn’t boot, check the boot order in your BIOS. Next, choose the ‘Install Ubuntu’ option from the boot menu and answer the resulting questions.

On the Partition Options page, use the entire disk for the installation (unless you’re sharing the machine with a Windows OS), and create the same user account on each machine. This makes setting up the shared storage device easier. You should also give each computer within the cluster its own name (on the Who Are You page). Installation can take up to 45 minutes, depending on the speed of each node. When finished, your machine will restart and you’ll need to log in to your new desktop.

It’s likely that you will also have to install a few security updates; a small yellow balloon message will open if these are necessary. The next step is to install the control software.

Doing more with software

New applications and software can be installed through the Synaptic package manager, which can be found in the ‘System | Administration’ menu. First, search for a package called ‘openssh-server’. Click its checkbox, choose ‘Mark for Installation’, then click ‘Apply’ to download and install it. Do the same for a package called ‘tightvncserver’. Both of these tools are used for remote administration, so you won’t need them if you plan to keep a keyboard, mouse and screen attached to each of your nodes.

To connect to each node from your master machine, whether it’s running Windows or Linux, you need an SSH client. The most popular application for Windows is the freely available PuTTY, while Linux users can simply type ‘ssh -l username ipaddress’ at the command line. To find the IP address of each node, either check your router’s web interface for a list of connected devices (if it has one) or right-click on the Network icon on the Ubuntu desktop and select ‘Connection Information’. If you’re using the command line, type ‘ifconfig’.
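If your router has no list of connected devices, another way to find the nodes is to look for machines accepting connections on the standard SSH port (22), which is where openssh-server listens. This is a crude, minimal sketch: a successful connection only shows that something is listening on that port, not that it’s one of your cluster machines, and you should only scan networks you own.

```python
import socket

def hosts_with_ssh(addresses, port=22, timeout=0.3):
    """Return the addresses that accept a TCP connection on port."""
    found = []
    for addr in addresses:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)  # don't hang on addresses with no machine behind them
            try:
                if s.connect_ex((str(addr), port)) == 0:  # 0 means the connection succeeded
                    found.append(str(addr))
            except OSError:
                pass  # treat timeouts and other socket errors as "not found"
    return found
```

For a typical home network, `hosts_with_ssh(ipaddress.ip_network("192.168.1.0/24").hosts())` (after `import ipaddress`) tries each of the 254 host addresses in turn; with the 0.3-second timeout, that can take over a minute.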

From an SSH connection, you can launch a remote desktop session by typing ‘vncserver :1’; the first time you do this, you’ll be asked to set a password. When the remote session is active, you can connect to a desktop session on each machine using a VNC client such as TightVNC Viewer for Windows or Vinagre for Ubuntu. When specifying the address to connect to, make sure that you follow the IP address with ‘:1’, as this specifies which display session to connect to.
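The ‘:1’ is a VNC display number rather than a TCP port. By convention, display :N listens on TCP port 5900 + N, so a session started with ‘vncserver :1’ accepts connections on port 5901, which is worth knowing if a firewall sits between the master and a node. The helper below just illustrates that mapping; it assumes IPv4 addresses or plain hostnames.

```python
VNC_BASE_PORT = 5900  # VNC display :N listens on TCP port 5900 + N

def vnc_endpoint(address):
    """Split a VNC address such as "192.168.1.10:1" into (host, tcp_port).

    A bare host with no display number is treated as display :0."""
    host, sep, display = address.rpartition(":")
    if not sep:  # no colon at all, e.g. "192.168.1.10"
        return address, VNC_BASE_PORT
    return host, VNC_BASE_PORT + int(display)

print(vnc_endpoint("192.168.1.10:1"))  # ('192.168.1.10', 5901)
```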

Testing the cluster

Now that everything is running as it should, let’s test the combined power of the cluster. The easiest method that we’ve found (and the one that provides immediate results) is to use Blender, a professional-quality open-source 3D animation system whose rendering is particularly computationally intensive. It’s designed to work with more than one instance running on different machines.

You can easily run a job on your server before farming it out to the entire cluster. All you need to do is open an instance of Blender on each machine, get them to read the same ‘.blend’ animation file, switch to the Output page and make sure that every machine is writing its output to the same directory. You then need to enable both the ‘Touch’ and ‘No Overwrite’ options. When rendering is started, each Blender instance will grab the next available frame that hasn’t been created by another. It’s an effective and pretty quick way of spreading the load.
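The ‘Touch’ and ‘No Overwrite’ trick works because an empty placeholder file in the shared output directory marks a frame as taken. The sketch below is not Blender’s code; it illustrates the same file-based claiming idea in Python, using an exclusive create (`O_CREAT | O_EXCL`) so that two instances can’t claim the same frame. The `0001.png` naming scheme is hypothetical. According to the Linux open(2) documentation, `O_EXCL` is only supported on NFS with NFSv3 or later on kernel 2.6 or later, so check your share if it is NFS-mounted.

```python
import os
import tempfile

def claim_next_frame(output_dir, start, end):
    """Claim the first frame in start..end that no other instance has claimed.

    Returns the frame number, or None when every frame in the range is taken."""
    for frame in range(start, end + 1):
        # Hypothetical naming scheme: 0001.png, 0002.png, ...
        path = os.path.join(output_dir, f"{frame:04d}.png")
        try:
            # Fails with FileExistsError if the placeholder is already there
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another instance already has this frame
        os.close(fd)
        return frame  # the caller would now render over the empty placeholder
    return None

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as out:
    print([claim_next_frame(out, 1, 3) for _ in range(4)])  # [1, 2, 3, None]
```

Each render instance would loop: claim a frame, render it over the placeholder, and stop when the function returns None.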

There are two downsides to this option. Firstly, you’ll need to have a screen connected to each node, because Blender won’t run over VNC. Secondly, this method only works with animations: single-frame images can’t be distributed. Alternative solutions are more complicated, but happily, both of these problems can be overcome by installing a render farm queue manager.
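Blender also has a command-line background mode (`-b`) that renders without opening its interface, which may avoid the need for a screen on each node, depending on your Blender version and scene. The sketch below splits an animation into contiguous frame ranges, one per node, and builds a background-render command for each using `-b` (background), `-s`/`-e` (start and end frame) and `-a` (render the animation). The node names and scene path are hypothetical. Unlike the Touch/No Overwrite approach, this static split gives every node roughly the same number of frames regardless of its speed.

```python
def split_frames(start, end, nodes):
    """Divide frames start..end into contiguous, near-equal ranges, one per node."""
    total = end - start + 1
    base, extra = divmod(total, len(nodes))
    ranges, first = {}, start
    for i, node in enumerate(nodes):
        count = base + (1 if i < extra else 0)  # the first `extra` nodes get one more
        if count:  # skip nodes when there are fewer frames than nodes
            ranges[node] = (first, first + count - 1)
        first += count
    return ranges

def blender_commands(blend_file, start, end, nodes):
    """Build one background-render command per node.

    -s and -e must come before -a, because Blender processes its
    command-line options in order."""
    return {node: ["blender", "-b", blend_file, "-s", str(s), "-e", str(e), "-a"]
            for node, (s, e) in split_frames(start, end, nodes).items()}

# Hypothetical scene on the shared storage and four nodes
for node, cmd in blender_commands("/mnt/cluster/scene.blend", 1, 50,
                                  ["node1", "node2", "node3", "node4"]).items():
    print(node, " ".join(cmd))
```

Each command would then be run on its own node (over SSH, for example), with the scene’s output path pointing at the shared directory.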

The best tool that we found was Distriblend. This suite of free Java tools automatically distributes Blender jobs between the various nodes of your cluster. You use a client to add Blender jobs to a distributor that then sends the jobs on to each node. We used Distriblend for our distributed rendering benchmarks, and we were able to almost quadruple our processing speed across our four machines. The render time for a 120-frame, 640x480 resolution character animation example from Blender Nation was cut from two minutes to less than 30 seconds. On a more complex fluid dynamics job (‘Vector blurred fluid’ by Matt Ebb), we saved ourselves over two hours by distributing the 50 frames of animation across our cluster.
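The client, distributor and node arrangement described here is a central work queue. The sketch below is not Distriblend’s code or protocol. It is a hypothetical single-machine stand-in in which threads play the part of nodes, each pulling the next frame number from a shared `queue.Queue` until nothing is left. A real render farm manager does the same over the network, usually with extra bookkeeping such as re-queuing frames from nodes that drop out.

```python
import queue
import threading
import time

def distribute(frames, node_names, render):
    """Hand frames out one at a time to whichever simulated node asks next.

    render(node, frame) stands in for the real rendering work.
    Returns a dict mapping each node name to the frames it processed."""
    jobs = queue.Queue()
    for frame in frames:
        jobs.put(frame)
    done = {name: [] for name in node_names}

    def node(name):
        while True:
            try:
                frame = jobs.get_nowait()  # ask the distributor for the next job
            except queue.Empty:
                return  # queue drained: this node is finished
            render(name, frame)
            done[name].append(frame)  # each thread only appends to its own list

    threads = [threading.Thread(target=node, args=(name,)) for name in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

# Demo: 120 frames, four simulated nodes, each "render" sleeping for 10ms
result = distribute(range(1, 121), ["node1", "node2", "node3", "node4"],
                    lambda name, frame: time.sleep(0.01))
print({name: len(frames) for name, frames in result.items()})
```

Because the nodes pull work rather than being assigned fixed ranges, a faster node simply comes back for more frames more often.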


Very clear instructions.

This is very helpful.

Great Stuff!

Thank you, Paul

Posted by Dr. Paul

Thanks for your comments and encouragement!

Cheers

M

Posted by Martin Cooper

Great post. Learned a lot, keep up the good work. Thank you.

Posted by ubu-fan

I don't see how this is a cluster? All you are doing is distributed processing. I don't see any cluster software (Wolfpack?). A cluster is seen as one machine by the outside world (internally it can be many machines in many places). So how does putting Ubuntu on a few machines and getting Blender to use them make it a cluster? Aren't you just doing what SETI etc. do? That's not really a cluster!

Posted by AlanA

How on earth is this a cluster? I kept looking for a page 2.

No doubt, the ultimate PC. LOL

Posted by Anonymous

Considering this is an article for complete beginners, how about other examples of apps whose processing can be distributed? Blender is fine, but are there any other examples/applications? Good article, anyway (apart from calling it a cluster).

Posted by Anu de Deus

This is a good example of a compute cluster: it splits a single job, distributes it to the nodes and merges the results when complete.

There are versions of SPICE that can take advantage of parallelism in calculating electronic designs... and finally there are libraries for programming parallel tasks/processes.

Tore

Posted by Tore

This one was a complete, but welcome, surprise. Although the title is a trifle misleading, the content is becoming quite relevant these days. We hope to install a 64-node cluster at work this year. I hope to see you write a bit more on clustering, as your writing style is quite a pleasure to read, unlike some of the more jargon-laden write-ups around.

Posted by friend

I use pecon to parallelize MATLAB programs. It's very easy to use: besides a few minor requirements, it just needs each computer to have MATLAB, SSH and access to the same project files. http://www.cs.wlu.edu/~levy/software/pecon/

Posted by Anonymous

There is a mistake: in Linux you DO NOT use ipconfig, which is only for Windows machines; use the "ifconfig" command instead. Simple mistake, but it should be noted.

Posted by Anonymous

Hello all. If I assembled a cluster (let's say Mini-ITX boards with Core 2 Quads and PCIe x16 slots) and ran something like Wine, could I play Windows FPS games online any faster?

Posted by rob m

I think when people think of clusters they immediately think of an SSI (Single System Image): a virtual system where processes operate transparently over multiple nodes. Look up Kerrighed, which is based on Linux; it's a system that does just that. www.kerrighed.org

Posted by Daniel Waterworth
