Setting up an isolated cluster, help please.
Author | Message |
---|---|
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
I think you missed his point. It's not that he isn't aware of secure solutions to the problem. The problem is that he's not the final arbiter of the restrictive policy, even if that policy doesn't make a whole lot of sense once they said he could run something not in-house in the first place. IOW, no matter what you do, something is acting as a 'proxy' to the outside, by definition. Alinator |
Anthony Q. Bachler Send message Joined: 3 Jul 07 Posts: 29 Credit: 608,463 RAC: 0 |
OK, I think I might have a solution. I'm going to try crunching some WUs from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WUs from my workstation over to a stick, then copy the files from the stick to my home system; then, when the WUs at home are finished, I'll copy them back into the workstation's directory. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
OK, I think I might have a solution. I'm going to try crunching some WUs from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WUs from my workstation over to a stick, then copy the files from the stick to my home system; then, when the WUs at home are finished, I'll copy them back into the workstation's directory. I think you're seeing my point, though: even if you collect the work at home and then transplant it to the cluster from inside its firewall, the net result is the same. You have acted as a human proxy for the cluster and are still trusting that the work it's going to run (both application and data) is 'kosher'. At that point it really doesn't make a lot of difference how it got there if it turned out to be malicious. IOW, Sneakernet or Internet is pretty much the same here. Alinator |
doublechaz Send message Joined: 17 Nov 00 Posts: 89 Credit: 76,455,865 RAC: 735 |
Yes, but seemingly his policy boss doesn't understand that packets and the payload of those packets are the same thing WRT data delivery to a network of computers. The policy is 'no active direct or proxy connection to the net. But any *data* delivered to the cluster is fine as long as it doesn't slow production jobs or compromise security.' It's a subtle wording difference that we don't see, but that boss does. It doesn't matter whether it is a real difference or a semantic difference. The boss said so. I think you can create a share on the head for each node with an instance of BOINC in it. The head can run each in turn to get a machine ID and cache (lying about the host name so that they all appear as separate machines) and then stop. Then each node can run in that share to process. Every 8 or 12 hours the nodes all shut down the BOINC process. The head runs each in turn to return/fetch WUs. Then the nodes pick up again. Should be a fairly simple scripting job apart from getting unique machine IDs for each instance. |
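To make the ordering in doublechaz's proposal concrete, here is a minimal Python sketch of one stop/fetch/restart cycle. It only plans the steps; it does not invoke BOINC. The node names, the share path, and the action labels (`stop-client`, `report-and-fetch`, `start-client`) are hypothetical placeholders, and the unsolved part of the proposal (getting a unique machine ID per instance) is deliberately left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class NodeInstance:
    node: str                # cluster node name (hypothetical)
    data_dir: PurePosixPath  # that node's BOINC directory inside the head's share


def make_instances(share_root: PurePosixPath, nodes: list[str]) -> list[NodeInstance]:
    """One BOINC directory per node, all under a share exported by the head."""
    return [NodeInstance(n, share_root / n) for n in nodes]


def cycle_steps(instances: list[NodeInstance]) -> list[tuple[str, str, str]]:
    """Return (where, action, data_dir) steps for one cycle.

    Order matters: every node stops its client before the head touches any
    directory, the head then handles the directories one at a time, and only
    afterwards do the nodes start crunching again.
    """
    steps = []
    for inst in instances:
        steps.append((inst.node, "stop-client", str(inst.data_dir)))
    for inst in instances:
        steps.append(("head", "report-and-fetch", str(inst.data_dir)))
    for inst in instances:
        steps.append((inst.node, "start-client", str(inst.data_dir)))
    return steps


if __name__ == "__main__":
    for step in cycle_steps(make_instances(PurePosixPath("/srv/boinc"), ["node01", "node02"])):
        print(step)
```

Whatever actually runs each step (ssh, a cron job, a cluster scheduler) is a separate choice; the sketch only pins down that a directory is never used by the head and a node at the same time.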
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
I'm setting up a cluster, but the cluster will not have a direct internet connection. The main workstation is connected to the internet and to the cluster LAN, but the connections won't be bridged. What I would like to do is have a shared directory on the workstation that contains all the WUs and have the nodes of the cluster run those WUs. Is this possible, or is there some other way I have to set up an off-line cluster? I'd like the workstation to automatically retrieve new WUs for the entire cluster and submit finished WUs from the entire cluster without the nodes having to make actual contact. Sounds like what you really want is a cluster of N nodes with one controlling workstation: N+1 computers. It's a very interesting challenge and one I expect others have run into before. Try doing a Google search on "BOINC cluster". I see no reason why it couldn't be done; however, doing it elegantly may require a BOINC Cluster Client to be written in C++. You could trick it by manually moving files to each computer and editing the client_state.xml files on each computer... but that's way too tedious and easy to mess up. I studied the architecture on the BOINC wiki http://www.boinc-wiki.info/BOINC_System_Architecture and decided how I would architect this in order to make a cluster work with a minimal amount of coding. You would need the following configuration to start with: N nodes, each with a copy of BOINC; and 1 controlling node with a copy of BOINC & BOINC Manager, two network cards (one for each subnet), an internet connection, an Apache web server with the PHP module installed, and lots of disk space to hold all work units & finished output files. Essentially, the idea is that the controlling node (a.k.a. "controller" from now on) would have a web server that would act as your own personal BOINC Project Server. All the nodes would access the controller's web server to download files and get new work. When done, they would upload the finished file to that same node. 
The tricky stuff is this: - Write PHP pages that "fool" the nodes into thinking the controller is a project server. - Join this fake project with the BOINC client on each node (or just copy the same BOINC folder to each node once you've done this). - Run BOINC on the controller with Suspend always. Once the nodes complete their work units and return the results, you'll need to set BOINC on the controller to "Run Always". Since the finished file will already be there, it will say completed 100% immediately (if all works right) and upload the file to the real project's server. After all the work units have been uploaded and new work is downloaded, set the controller to Suspend. This cycle of getting a bunch of work units and uploading them all at once can repeat as long as you switch them at the right time. The trick is that all the work units must have finished files on the controller, or else the controller will try to crunch the WU itself. I can't swear that this will work, but it's the best architecture I can come up with. Writing a few PHP pages is nothing compared to writing a custom BOINC client in C++. |
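The "switch them at the right time" condition above can be checked mechanically before the controller is taken out of Suspend. The following Python sketch assumes the controller keeps a mapping from each work unit to the result file the nodes are expected to upload; that mapping, the file names, and the `"suspend"` / `"run-always"` labels are hypothetical and only mirror the two Manager settings named in the post.

```python
from pathlib import Path


def missing_results(expected: dict[str, str], upload_dir: Path) -> list[str]:
    """Return the work units whose result file is absent or empty.

    expected maps a work unit name to the result file name the nodes
    should have uploaded into upload_dir (names are assumptions).
    """
    missing = []
    for wu, result_name in sorted(expected.items()):
        result = upload_dir / result_name
        if not result.is_file() or result.stat().st_size == 0:
            missing.append(wu)
    return missing


def controller_mode(expected: dict[str, str], upload_dir: Path) -> str:
    """Stay suspended until every work unit has a finished file,
    so the controller never starts crunching a WU itself."""
    return "suspend" if missing_results(expected, upload_dir) else "run-always"
```

A file that exists but is still being written would pass this check, so a real version would probably also want the nodes to upload under a temporary name and rename on completion.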
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
I think you can create a share on the head for each node with an instance of BOINC in it. The head can run each in turn to get a machine ID and cache (lying about the host name so that they all appear as separate machines) and then stop. Then each node can run in that share to process. Every 8 or 12 hours the nodes all shut down the BOINC process. The head runs each in turn to return/fetch WUs. Then the nodes pick up again. Should be a fairly simple scripting job apart from getting unique machine IDs for each instance. Could you elaborate on: "Then each node can run in that share to process." How will each node know which work unit to crunch? Won't you have to write each node's client_state.xml file yourself from the controller node? |
michael37 Send message Joined: 23 Jul 99 Posts: 311 Credit: 6,955,447 RAC: 0 |
I can't swear that this will work, but it's the best architecture I can come up with. Writing a few PHP pages is nothing compared to writing a custom BOINC client in C++. Well, there is a "blessed" architecture described here: http://boinc.berkeley.edu/proxy_server.php |
michael37 Send message Joined: 23 Jul 99 Posts: 311 Credit: 6,955,447 RAC: 0 |
OK, I think I might have a solution. I'm going to try crunching some WUs from my workstation on a computer I have at home, and then see if the workstation will register them and upload them. The actual exercise will be: I'll copy the WUs from my workstation over to a stick, then copy the files from the stick to my home system; then, when the WUs at home are finished, I'll copy them back into the workstation's directory. I have an interesting question. Can the master workstation launch jobs on the cluster nodes via the cluster LAN? If yes, here is my 'alternative' architecture for your cluster. 1. Share the BOINC directory from the master workstation to all cluster nodes. 2. Create a custom "application" which launches a job on a randomly selected computer. This is not that difficult, since modern BOINC versions create a slot/<NUMBER> directory and create a "Unix-like soft link" to the job. For example, the workunit file is called work_unit.sah with contents: <soft_link>../../projects/setiathome.berkeley.edu/02mr07ah.32435.11524.12.5.191</soft_link> 3. Create app_info.xml in projects/setiathome.berkeley.edu for your custom application, which is not really an application but more like a remote job launcher. 4. Make a copy of a modified BOINC which thinks you have 128 CPUs. That should be it! |
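Step 2 above hinges on two small pieces: reading the `<soft_link>` file in the slot directory to find the real work unit, and picking a node to run it on. Here is a hedged Python sketch of just those two pieces. It assumes the link target is relative to the slot directory (as the `../../projects/...` example suggests) and that the share is mounted at the same POSIX path everywhere; the `/shared/boinc` path and node names are made up, and actually launching the remote job is not shown.

```python
import random
import re
from pathlib import PurePosixPath

SOFT_LINK = re.compile(r"<soft_link>(.*?)</soft_link>", re.DOTALL)


def resolve_soft_link(link_text: str, slot_dir: PurePosixPath) -> PurePosixPath:
    """Extract the target of a <soft_link> file and resolve it against slot_dir."""
    match = SOFT_LINK.search(link_text)
    if match is None:
        raise ValueError("no <soft_link> element found")
    parts = list(slot_dir.parts)
    for part in PurePosixPath(match.group(1).strip()).parts:
        if part == "..":
            if len(parts) > 1:  # never pop the filesystem root
                parts.pop()
        else:
            parts.append(part)
    return PurePosixPath(*parts)


def pick_node(nodes: list[str], rng: random.Random) -> str:
    """Choose the node that will run the job, uniformly at random."""
    return rng.choice(nodes)


if __name__ == "__main__":
    text = ("<soft_link>../../projects/setiathome.berkeley.edu/"
            "02mr07ah.32435.11524.12.5.191</soft_link>")
    print(resolve_soft_link(text, PurePosixPath("/shared/boinc/slots/0")))
    print(pick_node(["node01", "node02", "node03"], random.Random()))
```

Random selection matches the post's wording; a launcher that tracked which nodes are busy would spread the load more evenly.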
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Can the master workstation launch jobs on the cluster nodes via the cluster LAN? No; like you said, if it did, the cluster solution would be trivial. Unfortunately, science applications are designed to be spawned by the BOINC daemon. A sneaky programmer would have to write a simple loader that interfaces with the science apps in the manner described by the BOINC API. |
DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
Well, there is a "blessed" architecture described here: http://boinc.berkeley.edu/proxy_server.php That would essentially be a more complicated version of what I proposed. I still think my solution would be easier to implement. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal. The fact that Anthony is permitted to make that call for an outside application doesn't change the fact that it's still an exception to the rule, no matter how it arrives at the cluster. That's why I was saying that methods exist to prevent even man-in-the-middle attacks in the BOINC context in this scenario; therefore the restrictive policy is kind of silly once you make the exception in the first place. Alinator |
michael37 Send message Joined: 23 Jul 99 Posts: 311 Credit: 6,955,447 RAC: 0 |
You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal. Let me quote Anthony from his initial post: I'd like the workstation to automatically retrieve new WUs for the entire cluster and submit finished WUs from the entire cluster without the nodes having to make actual contact. We are simply devising a strategy for nodes to run the applications without making any outside calls (only to the master workstation). The master workstation will handle external communications with the project. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Yes, I understand that fully. However, if you stop to think about what the security policy means as described, no matter what solution we devise to meet the 'letter of the law' in terms of access, it defeats the fundamental purpose of the restriction in the first place. Like I said, an exception is an exception, regardless of the elegance of the workaround. Therefore all bets are off when it comes to security, unless you take personal responsibility for it. Alinator <edit> In any event, I love these kinds of debates, and you guys have given me at least six months of new things to think about regarding this topic for situations where I am the final arbiter of what is and isn't allowed on a given LAN. ;-) Alinator |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal. I don't think that is the policy; the policy is "no internet connectivity." We both understand that the policy should be "nothing untrusted", but his pointy-haired boss didn't say that. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
You guys are splitting hairs here. If the policy is that nothing 'untrusted' is to be run on the protected cluster, then by definition running a BOINC project is illegal. ... and you could do that by "shuffling around" unmodified BOINC directories between an internet-connected machine and the nodes in the cluster. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
I'm not going to quibble about the semantics of it. In reading through the whole thread I think it's pretty obvious from what's been said: Internet Access = Untrustworthy + Outside Apps + Security Risk; LAN Access = Trusted + In-House Applications + Low Risk. My main point is that once you've made the exception to one of the factors in the LAN equation, then you have dropped your pants (so to speak) for that particular case. Therefore, how it gets to the cluster physically is somewhat moot at that point, don't you think? Alinator |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I'm not going to quibble about the semantics of it. In reading through the whole thread I think it's pretty obvious from what's been said: Oh, I completely agree, but I'm not the one who will get fired if the cluster gets hosed. There are several machines around here that COULD run BOINC, but don't -- and I can't possibly be fired for mis-using company resources. ... and it appears that he's been given permission in this post so it is ultimately his call. |
ML1 Send message Joined: 25 Nov 01 Posts: 20147 Credit: 7,508,002 RAC: 20 |
One thing to try might be to use one of the various Virtual Machine systems that allow you to aggregate multiple machines as though you had one multi-cpu NUMA machine. Some will transparently migrate entire processes around a network of machines for load balancing. Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |