Setting up an isolated cluster, help please.


Previous · 1 · 2 · 3 · Next

Author · Message
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623501 - Posted: 20 Aug 2007, 18:51:54 UTC - in response to Message 623499.  
Last modified: 20 Aug 2007, 18:55:50 UTC

On a side issue I wonder if anyone's tried suspending a given result, then running it on a different machine with the science app directly...[sans Boinc]... don't know if that's feasible, but if so it could allow batch runs on remote machines....
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623501 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623507 - Posted: 20 Aug 2007, 19:05:48 UTC - in response to Message 623496.  
Last modified: 20 Aug 2007, 19:06:24 UTC

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.

ID: 623507 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623511 - Posted: 20 Aug 2007, 19:14:00 UTC - in response to Message 623499.  

I looked into Windows Cluster 2003, and basically it's a non-starter. It costs three times as much and can't run non-HPC programs. I just don't see where it fills any need that can't be filled by plain Windows XP 64-bit at a third of the cost. I think MS completely fails to comprehend the whole point of clusters, which is maximizing the price/performance ratio. Adding an extra $300 per node simply makes their solution not cost-effective, especially when you consider that's nearly 50% of the per-node cost in most installations. I just couldn't see where Cluster 2003 adds any value to my deployment.
ID: 623511 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623516 - Posted: 20 Aug 2007, 19:18:03 UTC - in response to Message 623507.  

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.


Adding ANY kind of bypass to the separation of the cluster from the Internet is simply not going to happen. It's too much of a security risk; I'd rather let the nodes go idle.
ID: 623516 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623519 - Posted: 20 Aug 2007, 19:19:11 UTC - in response to Message 623511.  

I looked into Windows Cluster 2003, and basically it's a non-starter. It costs three times as much and can't run non-HPC programs. I just don't see where it fills any need that can't be filled by plain Windows XP 64-bit at a third of the cost. I think MS completely fails to comprehend the whole point of clusters, which is maximizing the price/performance ratio. Adding an extra $300 per node simply makes their solution not cost-effective, especially when you consider that's nearly 50% of the per-node cost in most installations. I just couldn't see where Cluster 2003 adds any value to my deployment.

You must distinguish between an HA (high availability) cluster and an HPC cluster.
HPC is what you're doing: shifting tasks out to different systems to get more CPU power and faster processing. HA clustering is about keeping a mission-critical application running if hardware or applications fail, for example in enterprise environments for Oracle, SAP, ...
As far as I know, the W2K3 cluster is HA clustering software.

ID: 623519 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623521 - Posted: 20 Aug 2007, 19:22:56 UTC - in response to Message 623516.  

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.


Adding ANY kind of bypass to the separation of the cluster from the Internet is simply not going to happen. It's too much of a security risk; I'd rather let the nodes go idle.

Hm, but I think you should also consider, given such high security requirements, whether it is OK to install an application that is not certified by your company on the cluster nodes.
ID: 623521 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623526 - Posted: 20 Aug 2007, 19:30:39 UTC - in response to Message 623521.  
Last modified: 20 Aug 2007, 19:35:17 UTC

Well, for all intents and purposes, the nodes are isolated. Assuming that you can (are allowed to) run BOINC on them, I can think of one way to avoid the danger of messing up BOINC installations.

Basically, have a flash disk per machine with its BOINC install on it, created on the master machine. With a large multi-day cache crunching away on each drive and network activity suspended, unplug each drive and run it on the master [with network enabled] every now and then. That's a lot of running around for 64 machines :D


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623526 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623528 - Posted: 20 Aug 2007, 19:34:11 UTC - in response to Message 623526.  

Well, for all intents and purposes, the nodes are isolated. Assuming that you can (are allowed to) run BOINC on them, I can think of one way to avoid the danger of messing up BOINC installations.

Basically, have a flash disk per machine with its BOINC install on it, created on the master machine. With a large multi-day cache crunching away on each drive and network activity suspended, unplug each drive and run it on the master [with network enabled] every now and then. That's a lot of running around for 64 machines :D


Not to mention the cost of 64 flash drives.
ID: 623528 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623530 - Posted: 20 Aug 2007, 19:35:36 UTC - in response to Message 623528.  
Last modified: 20 Aug 2007, 19:38:16 UTC


Not to mention the cost of 64 flash drives.

What about 64 separate BOINC installs running in a shared area [on the master]? If the nodes have no connectivity [to the outside world], then all you'd need to do is run each BOINC instance on the master every now and then with the network enabled, ensuring that the remote node wasn't running it at the time... [coordinated with] scheduled tasks? Batch files?

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623530 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 623534 - Posted: 20 Aug 2007, 19:39:58 UTC

Thinking about this some, out of curiosity: do the computing nodes do anything else besides running the in-house applications? IOW, are we talking about a cluster of GP workstations here, or dedicated crunching iron?

Alinator
ID: 623534 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 623563 - Posted: 20 Aug 2007, 20:52:22 UTC - in response to Message 623499.  

I have seen some people using a hazardous work-unit transplant scheme to move work units from one BOINC installation to another machine, then move them back again for upload... If there could be multiple BOINC installations on the master, this would still present as multiple machines, but could probably be scripted... Splicing work units out of an installed BOINC setup as a single BOINC client would be tricky, but theoretically possible, I suppose [there'd be many issues to find and work around]. It'd be easier to find a secure means of tunnelling through the network...


Yes, I suppose that would be the only way, since otherwise the daily WU quota will only keep about 6 machines busy. Even if the project gives back a WU quota point for each one submitted, it would still only let me feed 100 CPUs off a single account, which would be a mere 12 out of 64 systems.

... only if SETI thinks it is one computer, not 64.

So, you install BOINC on your gateway machine, load it up with work, and shuffle that directory out to one of the "cluster" machines. This machine has an ID.

Zap the BOINC directory, and reinstall. You now have a new machine ID. Shuffle the same directory out to a different machine.

Do that 62 more times.

Set up an automated task that periodically stops crunching on one cluster machine, pulls the directory back to the original location, and restarts BOINC on the "connected" machine, so it can have an hour or so of "network connectivity."

This will work because the "computer" is the directory, not the physical hardware.
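The rotation step described above could be automated. Below is a hedged Python sketch of just the scheduling logic; the node names, slot length, and the sync step are hypothetical stand-ins (the real work would be stopping BOINC on a node, copying its directory back with robocopy/rsync, running the client on the master with networking enabled, then pushing the refilled cache out again — details you'd have to verify against an actual install):

```python
import itertools

# Hypothetical host names for the 64 isolated cluster nodes.
NODES = [f"node{i:02d}" for i in range(64)]

def rotation_schedule(nodes, slots_per_day=24):
    """Yield (day, node) pairs, giving every node a periodic turn on the
    networked master. With 24 one-hour slots a day and 64 nodes, each
    node gets a "network window" roughly every 2.7 days."""
    cycle = itertools.cycle(nodes)
    for day in itertools.count():
        for _ in range(slots_per_day):
            yield day, next(cycle)

def give_network_window(node):
    """Placeholder for the real work: stop BOINC on `node`, pull its
    BOINC directory back to the master, run the client there with
    networking enabled, then push the directory back out."""
    return f"rotated {node}"

if __name__ == "__main__":
    # Dry-run the first day's schedule.
    for day, node in itertools.islice(rotation_schedule(NODES), 24):
        print(day, give_network_window(node))
```

A scheduled task on the master could invoke one slot of this rotation per hour, so no node's cache ever runs completely dry.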
ID: 623563 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623571 - Posted: 20 Aug 2007, 21:10:34 UTC - in response to Message 623534.  
Last modified: 20 Aug 2007, 21:12:15 UTC

Thinking about this some, out of curiosity: do the computing nodes do anything else besides running the in-house applications? IOW, are we talking about a cluster of GP workstations here, or dedicated crunching iron?

Alinator


Dedicated rackmount crunchers
ID: 623571 · Report as offensive
Christoph
Volunteer tester

Send message
Joined: 21 Apr 03
Posts: 76
Credit: 355,173
RAC: 0
Germany
Message 623732 - Posted: 21 Aug 2007, 1:14:56 UTC

Back in school, the IT teacher made all the new PCs dual-boot machines. Windows was not allowed to connect to the Internet, but Linux was. I guess the server in the room was running Linux. Is this an option for you?

Happy crunching, Christoph
Christoph
ID: 623732 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 624375 - Posted: 22 Aug 2007, 17:02:59 UTC

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.
ID: 624375 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 624387 - Posted: 22 Aug 2007, 17:19:14 UTC - in response to Message 624375.  
Last modified: 22 Aug 2007, 17:31:40 UTC

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.


Well then: since no matter which way you go you end up having to trust and/or verify that any outside app is safe to run on the cluster, the real issue is that you have to prove that to whoever is the ultimate authority behind the no-outside-access rule. If you can argue a convincing case for that, then the trusted proxy gateway would be the best way to go, since it ensures you get a chance to look over the comm stream to the project before it actually reaches the cluster nodes. I don't see where that would present much, if any, additional risk over the 'agenting' strategy you've been considering.

That's why I was asking about the nodes themselves. If they were GP workstations, I could see the problem with allowing any outside access on port 80; but since it's dedicated 'iron' behind its own firewall, you can get really picky about where it can go beyond the LAN (I'm assuming you're not 'batching' jobs to it and that users on the LAN have access). If you use the trusted proxy gateway, then even a man in the middle would not have much of a clue what the connections are really about, and would still have to get through two checkpoints/roadblocks to gain any access.

Alinator
ID: 624387 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 624486 - Posted: 22 Aug 2007, 21:16:34 UTC - in response to Message 624387.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.
ID: 624486 · Report as offensive
Christoph
Volunteer tester

Send message
Joined: 21 Apr 03
Posts: 76
Credit: 355,173
RAC: 0
Germany
Message 624489 - Posted: 22 Aug 2007, 21:21:27 UTC - in response to Message 624375.  

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.

I understand that, but why not an additional Linux installation? When you need it for your stuff, boot Windows; when that's done, boot Linux and crunch for SETI@home. Ah, do you mean the server must also be Windows? OK, I don't know whether it's possible to let the Linux side through to the network while still blocking Windows. But if so, and if you make sure that Linux is not able to access the XP partition, then why not dual-boot? No risk to your installation, since there's no access.

Happy crunching, Christoph
Christoph
ID: 624489 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 624495 - Posted: 22 Aug 2007, 21:31:02 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.


I hear ya! Don't you just hate showstoppers like that? It looks like you've got an interesting challenge ahead of you. ;-)

In any event, what you need to do to accommodate the security policy pretty much amounts to the proxying 'agent' for clusters (which has another thread currently going), so if you published your solution on your own website or some other vehicle later on, I'm sure there'd be a fair amount of interest in it.

Alinator


ID: 624495 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 624526 - Posted: 22 Aug 2007, 22:55:28 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.

I think it can be done with batch files.
ID: 624526 · Report as offensive
Profile michael37
Avatar

Send message
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625108 - Posted: 23 Aug 2007, 21:36:12 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.


What about a "truly part-time proxy" with very high security and highly limited access control? I have configured a cluster like that, with squid running part-time (once a day for ~half an hour is enough) on the master workstation and the BOINC clients on the rackmounted cluster nodes configured to use this proxy. Squid is very easy to configure so that it connects only to the setiathome.berkeley.edu hosts and no other sites, and services only the cluster nodes.
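A locked-down setup along those lines might look roughly like the squid.conf fragment below. This is a sketch only: the subnet, port, and domain are hypothetical placeholders and would have to match the real LAN layout and the project's current server names.

```
# squid.conf sketch (hypothetical addresses; verify against your LAN)
http_port 3128
acl cluster_nodes src 192.168.1.0/24            # the rackmount nodes' subnet
acl seti_hosts dstdomain setiathome.berkeley.edu
acl http_only port 80
# Allow only the cluster nodes, only to the project hosts, only over HTTP.
http_access allow cluster_nodes seti_hosts http_only
http_access deny all                            # refuse everything else
```

A scheduled task on the master could start squid for the daily half-hour window and stop it afterwards, so the path to the outside exists only while the clients are reporting and fetching work.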

ID: 625108 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.