Setting up an isolated cluster, help please.

Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623349 - Posted: 20 Aug 2007, 16:00:28 UTC
Last modified: 20 Aug 2007, 16:03:31 UTC

I'm setting up a cluster, but the cluster will not have a direct internet connection. The main workstation is connected both to the internet and to the cluster LAN, but the connections won't be bridged. What I would like to do is have a shared directory on the workstation that contains all the WUs, and have the nodes of the cluster run those WUs. Is this possible, or is there some other way I would have to set up an off-line cluster? I'd like the workstation to automatically retrieve new WUs for the entire cluster and submit finished WUs from the entire cluster, without the nodes having to make actual contact with the project.
ID: 623349
Dotsch
Volunteer tester
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623374 - Posted: 20 Aug 2007, 16:14:19 UTC

I think it would be better and easier to install a proxy server on the workstation. Then the nodes on the internal cluster LAN can do their uploads and downloads through the proxy server on the workstation.
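
For illustration, here is a minimal sketch of such a forwarding proxy in Python (plain HTTP only, which is all the project traffic needs; the listen port 8118 is an arbitrary choice, and a hardened proxy like Squid would be the sensible real-world option):

    # Toy HTTP forwarding proxy for the gateway workstation.
    # Each cluster node's BOINC client would be pointed at this
    # host/port in its HTTP proxy settings. No CONNECT/HTTPS support.
    import socket
    import threading
    from urllib.parse import urlsplit

    LISTEN_ADDR = ("0.0.0.0", 8118)  # cluster-facing interface, arbitrary port

    def pump(src, dst):
        # Copy bytes one way until the source closes.
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        except OSError:
            pass

    def handle(client):
        upstream = None
        try:
            # Read the request head (request line + headers, maybe some body).
            head = b""
            while b"\r\n\r\n" not in head:
                chunk = client.recv(4096)
                if not chunk:
                    return
                head += chunk
            request_line, rest = head.split(b"\r\n", 1)
            method, url, version = request_line.decode("latin-1").split()
            parts = urlsplit(url)  # proxy-style requests carry an absolute URI
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            upstream = socket.create_connection((parts.hostname, parts.port or 80))
            # Rewrite the absolute URI to a plain path for the origin server.
            upstream.sendall(f"{method} {path} {version}\r\n".encode("latin-1") + rest)
            threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
            pump(upstream, client)
        finally:
            client.close()
            if upstream:
                upstream.close()

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN_ADDR)
    srv.listen(64)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()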
ID: 623374
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623393 - Posted: 20 Aug 2007, 16:30:02 UTC - in response to Message 623374.  
Last modified: 20 Aug 2007, 16:40:02 UTC

I think it would be better and easier to install a proxy server on the workstation. Then the nodes on the internal cluster LAN can do their uploads and downloads through the proxy server on the workstation.

There are logistical reasons why I would prefer to have the WU pool centralized. Some of the crunching we use the cluster for can saturate one or several nodes for weeks or months at a time. I don't want WUs going stale sitting in the queue of a busy node. Most of the jobs we submit are small and just need to be finished quickly (hence the cluster), so most of the time most of the cluster is idle.
ID: 623393
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 623408 - Posted: 20 Aug 2007, 16:37:02 UTC

I know of no one who's successfully configured a cluster to work. The BOINC client needs to be able to call home on its own. Each WU is assigned by SETI to a specific host, and this is recorded in their database. It also has to be returned by the host it was sent to, or it won't be accepted.

There are ways to transfer WUs from one host to another, but they're time-consuming and hardly worth the bother, IMO.
ID: 623408
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 623412 - Posted: 20 Aug 2007, 16:41:37 UTC - in response to Message 623393.  

I think it would be better and easier to install a proxy server on the workstation. Then the nodes on the internal cluster LAN can do their uploads and downloads through the proxy server on the workstation.

There are logistical reasons why I would prefer to have the WU pool centralized. Some of the crunching we use the cluster for can saturate one or several nodes for weeks or months at a time. I don't want WUs going stale sitting in the queue of a busy node.

Unfortunately the current BOINC client does not support pooling work to be distributed to other computers. I understand that in the past there was such a program available (for Classic SETI) that allowed caching of work units, but it is not available for BOINC.

The problem is that the BOINC client connects to the project server and asks for work, and the work is assigned to that client only. It would take a complete rewrite of the client code to enable what you are asking. I'm no programming expert, but I have investigated the use of clusters myself and gave up on it, since I don't know enough about programming to rewrite the code to enable clustering. I will still set up something similar, but each machine will connect to the project through a router and will get work and report individually.
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 623412
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 623431 - Posted: 20 Aug 2007, 16:55:42 UTC - in response to Message 623393.  

I think it would be better and easier to install a proxy server on the workstation. Then the nodes on the internal cluster LAN can do their uploads and downloads through the proxy server on the workstation.

There are logistical reasons why I would prefer to have the WU pool centralized. Some of the crunching we use the cluster for can saturate one or several nodes for weeks or months at a time. I don't want WUs going stale sitting in the queue of a busy node. Most of the jobs we submit are small and just need to be finished quickly (hence the cluster), so most of the time most of the cluster is idle.

I know there can be logistical reasons why you'd want to do this, but BOINC isn't designed to work this way.

What you might be able to do is give each machine a separate data directory, and then (carefully) shuffle the contents of those directories between the connected machine(s) and the non-connected machines.
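
As a rough sketch of that shuffle (all paths here are hypothetical, and the BOINC client must be fully stopped on both machines before any copy, or the state files will be left inconsistent):

    # Move a whole per-node BOINC data directory between the connected
    # workstation's share and a node's local disk. Hypothetical layout:
    # one complete data directory per node under \\workstation\boinc_pool.
    import shutil
    from pathlib import Path

    POOL = Path(r"\\workstation\boinc_pool")   # share on the connected machine
    LOCAL = Path(r"C:\BOINC-data")             # data directory the node runs from

    def pull(node_name: str) -> None:
        # Fetch this node's data directory (with freshly downloaded work).
        if LOCAL.exists():
            shutil.rmtree(LOCAL)               # discard the stale local copy
        shutil.copytree(POOL / node_name, LOCAL)

    def push(node_name: str) -> None:
        # Return the data directory (with finished results) for upload.
        dst = POOL / node_name
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(LOCAL, dst)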

At the same time, I wouldn't worry that much about work going stale. It's part of the cost of doing business.
ID: 623431
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623439 - Posted: 20 Aug 2007, 17:02:48 UTC - in response to Message 623431.  
Last modified: 20 Aug 2007, 17:26:41 UTC

I would imagine that a customised BOINC client acting as a master, injecting workunits into a miniature BOINC server setup running its own special mini-project, plus BOINC clients on each node 'attaching' to that project, could be a way of doing it effectively. Of course that would require a massive amount of effort, and the daily quota restrictions would quickly become a problem. After all, distributed computing is what BOINC is designed for ... just thoughts.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623439
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623450 - Posted: 20 Aug 2007, 17:17:43 UTC - in response to Message 623408.  
Last modified: 20 Aug 2007, 17:20:54 UTC

I know of no one who's successfully configured a cluster to work. The BOINC client needs to be able to call home on its own. Each WU is assigned by SETI to a specific host, and this is recorded in their database. It also has to be returned by the host it was sent to, or it won't be accepted.

There are ways to transfer WUs from one host to another, but they're time-consuming and hardly worth the bother, IMO.


Hmmm, that's a shame then. Guess I'll just have to configure the cluster to go into hibernation mode when it's idle. I was really looking forward to crunching with 'my' 64 dual Core 2 Quads.
ID: 623450
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 623451 - Posted: 20 Aug 2007, 17:21:20 UTC - in response to Message 623450.  

I know of no one who's successfully configured a cluster to work. The BOINC client needs to be able to call home on its own. Each WU is assigned by SETI to a specific host, and this is recorded in their database. It also has to be returned by the host it was sent to, or it won't be accepted.

There are ways to transfer WUs from one host to another, but they're time-consuming and hardly worth the bother, IMO.


Hmmm, that's a shame then. Guess I'll just have to configure the cluster to go into hibernation mode when it's idle. I was really looking forward to crunching with my 64 dual Core 2 Quads.

Maybe other non-BOINC DC projects would work? There are a lot out there, like Folding@home or distributed.net.
Dublin, California
Team: SETI.USA
ID: 623451
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 623457 - Posted: 20 Aug 2007, 17:30:34 UTC - in response to Message 623450.  
Last modified: 20 Aug 2007, 17:31:21 UTC


Hmmm, that's a shame then. Guess I'll just have to configure the cluster to go into hibernation mode when it's idle. I was really looking forward to crunching with 'my' 64 dual Core 2 Quads.


Well, unless the fact that BOINC was running on a node was causing a problem for its 'paying' job, I would just set it up so you aren't carrying a lot of cached work and then not worry about it. In other words, 'No Reply' is an occupational hazard for all participants and part of the game, so I don't see why you should withhold your resources from the project just because of that.

If you were concerned about it going idle when SAH has problems, you could always run a different project as a backup for it.

Alinator
ID: 623457
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 623460 - Posted: 20 Aug 2007, 17:33:37 UTC - in response to Message 623450.  

I know of no one who's successfully configured a cluster to work. The BOINC client needs to be able to call home on its own. Each WU is assigned by SETI to a specific host, and this is recorded in their database. It also has to be returned by the host it was sent to, or it won't be accepted.

There are ways to transfer WUs from one host to another, but they're time-consuming and hardly worth the bother, IMO.


Hmmm, that's a shame then. Guess I'll just have to configure the cluster to go into hibernation mode when it's idle. I was really looking forward to crunching with my 64 dual Core 2 Quads.

How committed are you to SETI@home as your distributed computing project? One other possibility would be something like CPDN, which doesn't actually need to contact the internet more than once every six weeks or so (and even then, that's mainly just to reassure the server that they're still alive; the WUs themselves take about 4 months on my 1.86GHz Xeon quads, running 24/7).
They have fairly hefty disk storage requirements, but the new ones don't thrash the disk so hard once they're running.

If you could transfer each set of models to the gateway node, say, once a month, and let them send the trickle uploads and intermediate result files... You might even be able to run them briefly on the gateway node from a network share on their usual home node: enable networking for long enough to 'phone home', then disable networking, shut down processing on the gateway node, and restart processing on the 'host' node. If the "run from network share" concept works (and I haven't tried it myself), you could even automate the process with a carefully choreographed set of scheduled scripts.
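
One sketch of how that choreography might look (the node names, share arrangement, and RPC password are hypothetical; --set_run_mode and --set_network_mode are real boinccmd commands on current clients, but remote RPC has to be enabled on each node, and the "run the data directory from the share" step is the untested part):

    # Orchestrate "phone home from the gateway" for each node in turn.
    import subprocess
    import time

    NODES = ["node01", "node02"]  # hypothetical cluster node names

    def boinccmd(host, *args):
        # Drive a client over BOINC's GUI RPC; the target client needs the
        # caller listed in remote_hosts.cfg and a gui_rpc_auth password.
        subprocess.run(["boinccmd", "--host", host, "--passwd", "secret",
                        *args], check=True)

    for node in NODES:
        boinccmd(node, "--set_run_mode", "never")    # pause crunching on the node
        # ... here, start a gateway-side client against the node's data
        # directory on the network share (the unproven step) ...
        boinccmd("localhost", "--set_network_mode", "always")
        time.sleep(300)                              # let it trickle/upload
        boinccmd("localhost", "--set_network_mode", "never")
        boinccmd(node, "--set_run_mode", "auto")     # resume on the node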
ID: 623460
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623461 - Posted: 20 Aug 2007, 17:33:52 UTC - in response to Message 623457.  
Last modified: 20 Aug 2007, 17:45:08 UTC

No, the issue is that the nodes will not have a direct internet connection, either through a bridge or through a proxy server. They either have to run from a networked directory or not at all; they simply cannot have direct access to the internet. It's a security issue, you see. I know this makes it a bit of a pain to implement, but we won't risk our cluster.

As for my commitment to SETI being my idle project, it's either that or PrimeGrid.

ID: 623461
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 623463 - Posted: 20 Aug 2007, 17:36:32 UTC

Yep, I saw that 'little' detail in your other post. That's a horse of a different color, and a tough nut to crack given the way BOINC is currently designed to work. ;-)

Alinator

ID: 623463
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 623468 - Posted: 20 Aug 2007, 17:44:07 UTC

I've seen several requests for this kind of thing. Perhaps if you figure out a way to set up a cluster, then others (including the project) might benefit from it.

I hate giving out bad news. (Generally.)
ID: 623468
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623472 - Posted: 20 Aug 2007, 17:50:07 UTC - in response to Message 623468.  
Last modified: 20 Aug 2007, 18:03:38 UTC

Out of interest in the topic: if they were Linux machines, some of the options in A survey of open source cluster management systems (http://www.linux.com/articles/57073) look promising. Even BOINC is mentioned, for long-distance remote clustering.

[Later: From reading that, I would imagine some of those "Single System Image" clustering tools would present to BOINC on the master as a single large computer with many CPUs, and share the load around as load dictates ... from what I'm reading, this would not require a modified BOINC program or applications on at least a few of those setups.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623472
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623483 - Posted: 20 Aug 2007, 18:16:02 UTC - in response to Message 623472.  
Last modified: 20 Aug 2007, 18:22:51 UTC

Out of interest in the topic: if they were Linux machines, some of the options in A survey of open source cluster management systems (http://www.linux.com/articles/57073) look promising.


That would be a nice route to explore for dedicated BOINC clusters. I'm sort of locked into using Windows, though, since we have to run the 'money' jobs using an in-house DC API. What we need is a 'passive' client that only runs WUs from a particular directory, and a 'cluster manager' with a central WU cache that it uses to fill one or several subdirectories with WUs and to submit the finished WUs it finds in those directories.
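
A sketch of what that hypothetical manager's main loop could look like (none of this is a real BOINC interface: the pool layout, the *.wu / *.result naming, and the passive client that consumes whatever lands in its subdirectory are all assumptions):

    # Fill per-node subdirectories from a central WU cache and sweep
    # finished results back for the workstation to submit.
    import shutil
    import time
    from pathlib import Path

    CACHE = Path(r"C:\wu_cache")    # central pool of fetched WUs (hypothetical)
    NODES = Path(r"C:\node_dirs")   # one subdirectory per passive client

    def fill_idle_nodes():
        # Hand one cached WU to every node directory with nothing to do.
        pending = sorted(CACHE.glob("*.wu"))
        for node_dir in sorted(p for p in NODES.iterdir() if p.is_dir()):
            if not pending:
                break
            if not any(node_dir.glob("*.wu")):
                wu = pending.pop(0)
                shutil.move(str(wu), str(node_dir / wu.name))

    def collect_finished():
        # Sweep results back to the cache for upload.
        for result in NODES.glob("*/*.result"):
            shutil.move(str(result), str(CACHE / result.name))

    while True:
        fill_idle_nodes()
        collect_finished()
        time.sleep(60)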

One question that comes to mind is: how do multiple copies of SETI sync with each other when they are running? Do they use mutexes to access the WU cache, or do they use file locking?
ID: 623483
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623487 - Posted: 20 Aug 2007, 18:20:53 UTC - in response to Message 623483.  
Last modified: 20 Aug 2007, 18:25:52 UTC

Have you looked into Windows Cluster 2003? (I haven't; i.e. 'Windows Compute Cluster Server 2003' / 'Windows Server 2003 R2' etc.)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623487
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623495 - Posted: 20 Aug 2007, 18:31:16 UTC - in response to Message 623487.  

Have you looked into Windows Cluster 2003? (I haven't; i.e. 'Windows Compute Cluster Server 2003' / 'Windows Server 2003 R2' etc.)


Seems like a workable solution, but not at $470 per node.
ID: 623495
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623496 - Posted: 20 Aug 2007, 18:32:01 UTC - in response to Message 623483.  
Last modified: 20 Aug 2007, 18:33:18 UTC

I have seen some people using a hazardous workunit-transplant scheme to move workunits from one BOINC installation to another machine, then move them back again for upload... If there could be multiple BOINC installations on the master, this would still present as multiple machines, but could probably be scripted.... Splicing workunits out of an installed BOINC setup posing as a single BOINC client would be tricky, but theoretically possible, I suppose [there'd be many issues to find and work around]. It'd be easier to find a secure means of tunnelling through the network...
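
As a sketch of the "multiple installations on the master" half of that (the install path and instance layout are hypothetical; --dir, --gui_rpc_port and --allow_multiple_clients are options on recent BOINC clients, so check what the version in use actually supports):

    # Launch several BOINC client instances on the master, each with its
    # own data directory and RPC port, so each presents as its own host.
    import subprocess
    from pathlib import Path

    BOINC_EXE = r"C:\Program Files\BOINC\boinc.exe"  # hypothetical install path
    BASE = Path(r"C:\boinc_instances")

    for i in range(4):  # four virtual "hosts" on the master
        data_dir = BASE / f"instance{i}"
        data_dir.mkdir(parents=True, exist_ok=True)
        subprocess.Popen([BOINC_EXE,
                          "--dir", str(data_dir),
                          "--gui_rpc_port", str(31416 + i),
                          "--allow_multiple_clients"])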

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623496
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623499 - Posted: 20 Aug 2007, 18:48:54 UTC - in response to Message 623496.  

I have seen some people using a hazardous workunit-transplant scheme to move workunits from one BOINC installation to another machine, then move them back again for upload... If there could be multiple BOINC installations on the master, this would still present as multiple machines, but could probably be scripted.... Splicing workunits out of an installed BOINC setup posing as a single BOINC client would be tricky, but theoretically possible, I suppose [there'd be many issues to find and work around]. It'd be easier to find a secure means of tunnelling through the network...


Yes, I suppose that would be the only way, since otherwise the daily WU quota will only keep about 6 machines busy. Even if the project gives back a WU quota point for each one submitted, it would still only let me feed 100 CPUs off a single account, which would be a mere 12 out of 64 systems.
ID: 623499