Setting up an isolated cluster, help please.

Message boards : Number crunching : Setting up an isolated cluster, help please.

Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625119 - Posted: 23 Aug 2007, 21:49:06 UTC - in response to Message 625108.  
Last modified: 23 Aug 2007, 21:51:32 UTC


What about a "truly part time proxy" with a very high security and highly limited access control? I have configured a cluster like that with squid running part-time (once a day for ~1/2 hour is enough) on the master workstation and the Boinc clients on rackmounted cluster nodes configured for this proxy. Squid is very easy to configure so that it only connects to setiathome.berkeley.edu sites and no other sites, and it services only the cluster nodes.


I think you missed his point. It's not that he isn't aware of secure solutions to the problem. The problem is that he's not the final arbiter of the restrictive policy, even if that policy doesn't make a whole lot of sense once they said he could run something not in-house in the first place. IOW, no matter what you do, something is acting as a 'proxy' to the outside, by definition.

Alinator
ID: 625119
Anthony Q. Bachler
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 625172 - Posted: 23 Aug 2007, 23:04:08 UTC

OK, I think I might have a solution. I'm going to try crunching some WU's from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WU's from my workstation over to a stick, then copy the files from the stick to my home system, then when the WU's at home are finished I'll copy them back into the workstation's directory.
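
For anyone trying the same stick exercise, here is a minimal Python sketch of the copy step that also checks the files arrived unchanged. The directory layout and the function names (`sha256_of`, `manifest`, `carry`) are made up for illustration and are not part of BOINC. Note that a matching checksum only shows the copy was faithful; it says nothing about whether the work itself can be trusted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def carry(src: Path, dst: Path) -> bool:
    """Copy the tree at src to dst (dst must not exist yet).

    Returns True only if every file in the copy has the same digest
    as the original.
    """
    shutil.copytree(src, dst)
    return manifest(src) == manifest(dst)
```

You would call `carry()` once for workstation to stick, once for stick to home system, and again in the other direction when the work is finished.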
ID: 625172
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625178 - Posted: 23 Aug 2007, 23:11:55 UTC - in response to Message 625172.  
Last modified: 23 Aug 2007, 23:13:58 UTC

OK, I think I might have a solution. I'm going to try crunching some WU's from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WU's from my workstation over to a stick, then copy the files from the stick to my home system, then when the WU's at home are finished I'll copy them back into the workstation's directory.


I think you're seeing my point, though: even if you collect the work at home and then transplant it to the cluster from inside its firewall, the net result is the same. You have acted as a human proxy for the cluster and are still trusting that the work it's going to run (both application and data) is 'kosher'.

At that point it really doesn't make a lot of difference how it got there if it turned out to be malicious. IOW, Sneakernet or Internet is pretty much the same here.

Alinator
ID: 625178
doublechaz
Joined: 17 Nov 00
Posts: 89
Credit: 76,455,865
RAC: 735
United States
Message 625208 - Posted: 24 Aug 2007, 0:06:02 UTC
Last modified: 24 Aug 2007, 0:06:48 UTC

Yes, but seemingly his policy boss doesn't understand that packets and the payload of those packets are the same thing WRT data delivery to a network of computers. The policy is 'no active direct or proxy connection to the net. But any *data* delivered to the cluster is fine as long as it doesn't slow production jobs, or compromise security.' It's a subtle wording difference that we don't see, but that boss does. Doesn't matter if it is a real difference or a semantic difference. The boss said.

I think you can create a share on the head for each node with an instance of BOINC in it. The head can run each in turn to get a machine ID and cache (lying about the host name so that they are all separate machines) and then stop. Then each node can run in that share to process. Every 8 or 12 hours the nodes all shut down the BOINC process. The head runs each in turn to return/fetch WUs. Then the nodes pick up again. Should be a fairly simple scripting job apart from getting unique machine IDs for each instance.
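
To make the ordering concrete, here is a rough Python sketch that only builds the list of steps for one cycle. It does not start or stop anything. The share path `/export/boinc` and the action names `stop`, `return_and_fetch` and `start` are placeholders I made up, not BOINC commands, and the sketch does nothing about the unique machine ID problem.

```python
from typing import List, Tuple

# (who acts, what it does, which per-node instance share it touches)
Step = Tuple[str, str, str]


def build_cycle(nodes: List[str], share_root: str = "/export/boinc") -> List[Step]:
    """Return the ordered steps for one fetch/return cycle.

    1. Every node stops its client so nothing is writing to the share.
    2. The head runs each instance in turn to return and fetch work.
    3. Every node starts crunching again from its own share.
    """
    steps: List[Step] = []
    for node in nodes:
        steps.append((node, "stop", f"{share_root}/{node}"))
    for node in nodes:
        steps.append(("head", "return_and_fetch", f"{share_root}/{node}"))
    for node in nodes:
        steps.append((node, "start", f"{share_root}/{node}"))
    return steps
```

A cron job or similar on the head could walk this list every 8 or 12 hours and map each placeholder action to whatever command actually does the job on your systems.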
ID: 625208
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 625234 - Posted: 24 Aug 2007, 0:45:08 UTC - in response to Message 623349.  

I'm setting up a cluster but the cluster will not have a direct internet connection. The main workstation is connected to the internet and to the cluster LAN, but the connections won't be bridged. What I would like to do is have a shared directory on the workstation that contains all the WU's and have the nodes of the cluster run those WU's. Is this possible or is there some other way I have to set up an off-line cluster? I'd like the workstation to automatically retrieve new WU's for the entire cluster and submit finished WU's from the entire cluster without the nodes having to make actual contact.


Sounds like what you really want is a cluster of N nodes with one controlling workstation. N+1 computers. It's a very interesting challenge and one I expect others have run into before. Try doing a Google search on "BOINC cluster".

I see no reason why it couldn't be done, however, doing it elegantly may require a BOINC Cluster Client to be written in C++. You could trick it by manually moving files to each computer and editing the client_state.xml files on each computer...but that's way too tedious and easy to mess up.

I studied the architecture on the BOINC wiki http://www.boinc-wiki.info/BOINC_System_Architecture and decided how I would architect this in order to make a cluster work with a minimal amount of coding.

You would need the following configuration to start with:

N nodes, each with:
- a copy of BOINC

1 controlling node with:
- a copy of BOINC & BOINC Manager
- two network cards (one for each subnet)
- internet connection
- Apache web server with PHP module installed
- lots of disk space to hold all work units & finished output files

Essentially, the idea is the controlling node (a.k.a. "controller" from now on) would have a web server that would act as your own personal BOINC Project Server. All the nodes would access the controller's web server to download files and get new work. When done, they would upload the finished file to that same node.

The tricky stuff is this:
- Write PHP pages that "fool" the nodes to think it's a project server.
- Join this fake project with BOINC client on each node (or just copy the same BOINC folder to each node once you've done this).
- Run BOINC with Suspend always.

Once the nodes complete their work unit and return the result, you'll need to "Run Always" BOINC on the controller. Since the finished file will already be there, it will say completed 100% immediately (if all works right) and upload the file to the real project's server. After all the work units have been uploaded and new work is downloaded, set the controller to Suspend. This cycle of getting a bunch of work units and uploading all at once can repeat as long as you switch them at the right time. The trick is all the work units must have finished files on the controller, else the controller will try to crunch the WU itself.

I can't swear that this will work, but it's the best architecture I can come up with. Writing a few PHP pages is nothing compared to writing a custom BOINC client in C++.
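
The one condition that must hold before switching the controller to "Run Always" (every work unit already has a finished file) is easy to check with a script. Here is a minimal Python sketch of that check. The flat results directory and the `.out` suffix are assumptions made for illustration; the real output file names would have to come from the project's own files.

```python
from pathlib import Path
from typing import Iterable, List


def missing_results(work_units: Iterable[str], results_dir: Path,
                    suffix: str = ".out") -> List[str]:
    """Return the work units that have no non-empty result file yet.

    The naming scheme <work unit name> + suffix is hypothetical.
    """
    missing = []
    for name in work_units:
        candidate = results_dir / f"{name}{suffix}"
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing


def safe_to_resume_controller(work_units: Iterable[str], results_dir: Path) -> bool:
    """True only when every work unit already has a finished file."""
    return not missing_results(work_units, results_dir)
```

If `safe_to_resume_controller()` returns False, leave the controller suspended; otherwise it would start crunching the unfinished WU's itself.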
ID: 625234
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 625245 - Posted: 24 Aug 2007, 1:00:49 UTC - in response to Message 625208.  

I think you can create a share on the head for each node with an instance of BOINC in it. The head can run each in turn to get a machine ID and cache (lying about the host name so that they are all separate machines) and then stop. Then each node can run in that share to process. Every 8 or 12 hours the nodes all shut down the BOINC process. The head runs each in turn to return/fetch WUs. Then the nodes pick up again. Should be a fairly simple scripting job apart from getting unique machine IDs for each instance.


Could you elaborate on: "Then each node can run in that share to process."

How will each node know which work unit to crunch? Won't you have to write each node's client_state.xml file yourself from the controller node?
ID: 625245
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625287 - Posted: 24 Aug 2007, 2:52:51 UTC - in response to Message 625234.  

I can't swear that this will work, but it's the best architecture I can come up with. Writing a few PHP pages is nothing compared to writing a custom BOINC client in C++.


Well, there is a "blessed" architecture described here: http://boinc.berkeley.edu/proxy_server.php

ID: 625287
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625295 - Posted: 24 Aug 2007, 3:04:44 UTC - in response to Message 625172.  

OK, I think I might have a solution. I'm going to try crunching some WU's from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WU's from my workstation over to a stick, then copy the files from the stick to my home system, then when the WU's at home are finished I'll copy them back into the workstation's directory.


I have an interesting question. Can the master workstation launch jobs on the cluster nodes via cluster lan?

If yes, here is my 'alternative' architecture for your cluster.

1. Share boinc directory from master workstation to all cluster nodes.
2. Create a custom "application" which launches a job on a randomly selected computer. This is not that difficult since modern BOINC versions create a slot/<NUMBER> directory and create a "Unix-like soft link" to the job.
For example, the workunit file is called work_unit.sah with contents:
<soft_link>../../projects/setiathome.berkeley.edu/02mr07ah.32435.11524.12.5.191</soft_link>

3. Create app_info.xml in projects/setiathome.berkeley.edu for your custom application, which is not really an application, but more like a remote job launcher.
4. Make a copy of modified boinc which thinks you have 128 CPUs.

That should be it!
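
A remote job launcher along these lines would first have to turn the soft-link file into a real path. Here is a small Python sketch of just that step. It assumes the link target is relative to the slot directory, which is my reading of the example above rather than something taken from the BOINC source, and the function name `resolve_soft_link` is made up.

```python
import os
import re
from typing import Optional

# Matches the single tag shown in the work_unit.sah example.
SOFT_LINK = re.compile(r"<soft_link>(.*?)</soft_link>", re.DOTALL)


def resolve_soft_link(slot_dir: str, link_text: str) -> Optional[str]:
    """Return the normalized path the soft link points to.

    Returns None if the text holds no <soft_link> tag, for example
    because the file is real data rather than a link.
    """
    match = SOFT_LINK.search(link_text)
    if match is None:
        return None
    target = match.group(1).strip()
    # Assumption: the target is relative to the slot directory.
    return os.path.normpath(os.path.join(slot_dir, target))
```

Because the BOINC directory is shared, the resolved path is the same on every node, which is what would let the launcher hand the job to any of them.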


ID: 625295
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 625311 - Posted: 24 Aug 2007, 3:47:12 UTC - in response to Message 625295.  

Can the master workstation launch jobs on the cluster nodes via cluster lan?


No, like you said, if it did, the cluster solution would be trivial.

Unfortunately, science applications are designed to be spawned by the BOINC daemon. A sneaky programmer would have to write a simple loader that interfaces with the science apps in the manner described by the BOINC API.
ID: 625311
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 625312 - Posted: 24 Aug 2007, 3:51:32 UTC - in response to Message 625287.  

Well, there is a "blessed" architecture described here: http://boinc.berkeley.edu/proxy_server.php


That would essentially be a more complicated version of what I proposed. I still think my solution would be easier to implement.
ID: 625312
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625326 - Posted: 24 Aug 2007, 5:08:43 UTC
Last modified: 24 Aug 2007, 5:16:05 UTC

You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal.

The fact Anthony is permitted to make that call for an outside the house application doesn't change the fact it's still an exception to the rule, no matter how it arrives at the cluster.

That's why I was saying that methods exist to prevent even man in the middle attacks in the BOINC context in this scenario, therefore the restrictive policy is kind of silly once you make the exception in the first place.

Alinator
ID: 625326
michael37
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625328 - Posted: 24 Aug 2007, 5:41:38 UTC - in response to Message 625326.  

You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal.

The fact Anthony is permitted to make that call for an outside the house application doesn't change the fact it's still an exception to the rule, no matter how it arrives at the cluster.

That's why I was saying that methods exist to prevent even man in the middle attacks in the BOINC context in this scenario, therefore the restrictive policy is kind of silly once you make the exception in the first place.

Alinator


Let me quote Anthony from his initial post:
I'd like the workstation to automatically retrieve new WU's for the entire cluster and submit finished WU's from the entire cluster without the nodes having to make actual contact.

We are simply devising a strategy for nodes to run the applications without making any outside calls (only to the master workstation). The master workstation will handle external communications with the project.

ID: 625328
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625330 - Posted: 24 Aug 2007, 5:50:11 UTC - in response to Message 625328.  
Last modified: 24 Aug 2007, 6:09:52 UTC



Let me quote Anthony from his initial post:
I'd like the workstation to automatically retrieve new WU's for the entire cluster and submit finished WU's from the entire cluster without the nodes having to make actual contact.

We are simply devising a strategy for nodes to run the applications without making any outside calls (only to the master workstation). The master workstation will handle external communications with the project.


Yes, I understand that fully. However, if you stop to think about what the security policy means as described, no matter what solution we devise to meet the 'letter of the law' in terms of access, it defeats the fundamental purpose of the restriction in the first place.

Like I said, an exception is an exception, regardless of the elegance of the workaround. Therefore all bets are off when it comes to security, unless you take personal responsibility for it.

Alinator

<edit> In any event, I love these kind of debates, and you guys have given me at least six months of new things to think about regarding this topic for situations where I am the final arbiter of what is and isn't allowed on a given LAN. ;-)

Alinator
ID: 625330
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 625782 - Posted: 24 Aug 2007, 17:41:28 UTC - in response to Message 625326.  

You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal.

I don't think that is the policy; the policy is "no internet connectivity."

We both understand that the policy should be "nothing untrusted" but his pointy-haired-boss didn't say that.

ID: 625782
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 625785 - Posted: 24 Aug 2007, 17:43:43 UTC - in response to Message 625328.  

You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal.

The fact Anthony is permitted to make that call for an outside the house application doesn't change the fact it's still an exception to the rule, no matter how it arrives at the cluster.

That's why I was saying that methods exist to prevent even man in the middle attacks in the BOINC context in this scenario, therefore the restrictive policy is kind of silly once you make the exception in the first place.

Alinator


Let me quote Anthony from his initial post:
I'd like the workstation to automatically retrieve new WU's for the entire cluster and submit finished WU's from the entire cluster without the nodes having to make actual contact.

We are simply devising a strategy for nodes to run the applications without making any outside calls (only to the master workstation). The master workstation will handle external communications with the project.

... and you could do that by "shuffling around" unmodified BOINC directories between an internet-connected machine and the nodes in the cluster.
ID: 625785
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 625790 - Posted: 24 Aug 2007, 17:53:06 UTC
Last modified: 24 Aug 2007, 17:54:06 UTC

I'm not going to quibble about the semantics of it. In reading through the whole thread I think it's pretty obvious from what's been said:

Internet Access = Untrustworthy + Outside Apps + Security Risk

LAN Access = Trusted + In House applications + Low Risk

My main point is once you've made the exception to one of the factors in the LAN equation, then you have dropped your pants (so to speak) for that particular case. Therefore, how it gets to the cluster physically is somewhat moot at that point, don't you think?

Alinator
ID: 625790
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 625889 - Posted: 24 Aug 2007, 20:15:47 UTC - in response to Message 625790.  
Last modified: 24 Aug 2007, 20:16:07 UTC

I'm not going to quibble about the semantics of it. In reading through the whole thread I think it's pretty obvious from what's been said:

Internet Access = Untrustworthy + Outside Apps + Security Risk

LAN Access = Trusted + In House applications + Low Risk

My main point is once you've made the exception to one of the factors in the LAN equation, then you have dropped your pants (so to speak) for that particular case. Therefore, how it gets to the cluster physically is somewhat moot at that point, don't you think?

Alinator

Oh, I completely agree, but I'm not the one who will get fired if the cluster gets hosed.

There are several machines around here that COULD run BOINC, but don't -- and I can't possibly be fired for mis-using company resources.

... and it appears that he's been given permission in this post so it is ultimately his call.
ID: 625889
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20140
Credit: 7,508,002
RAC: 20
United Kingdom
Message 628762 - Posted: 29 Aug 2007, 10:37:20 UTC

One thing to try might be to use one of the various Virtual Machine systems that allow you to aggregate multiple machines as though you had one multi-cpu NUMA machine. Some will transparently migrate entire processes around a network of machines for load balancing.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 628762


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.