Setting up an isolated cluster, help please.


Previous · 1 · 2 · 3 · Next

Author · Message
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623501 - Posted: 20 Aug 2007, 18:51:54 UTC - in response to Message 623499.  
Last modified: 20 Aug 2007, 18:55:50 UTC

On a side issue I wonder if anyone's tried suspending a given result, then running it on a different machine with the science app directly...[sans Boinc]... don't know if that's feasible, but if so it could allow batch runs on remote machines....
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623501 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623507 - Posted: 20 Aug 2007, 19:05:48 UTC - in response to Message 623496.  
Last modified: 20 Aug 2007, 19:06:24 UTC

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.

ID: 623507 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623511 - Posted: 20 Aug 2007, 19:14:00 UTC - in response to Message 623499.  

I looked into Windows Cluster 2003, and basically it's a non-starter. It costs three times as much and can't run non-HPC programs. I just don't see where it fills any need that can't be filled by plain Windows XP 64-bit at a third of the cost. I think MS completely fails to comprehend the whole point of clusters, which is maximizing the price/performance ratio. Adding an extra $300 per node simply makes their solution not cost-effective, especially when you consider that's nearly 50% of the per-node cost in most installations. I just couldn't see where Cluster 2003 adds any value to my deployment.
ID: 623511 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623516 - Posted: 20 Aug 2007, 19:18:03 UTC - in response to Message 623507.  

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.


Adding ANY kind of bypass to the separation of the cluster from the Internet is simply not going to happen. It's too much of a security risk; I'd rather let the nodes go idle.
ID: 623516 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623519 - Posted: 20 Aug 2007, 19:19:11 UTC - in response to Message 623511.  

I looked into Windows Cluster 2003, and basically it's a non-starter. It costs three times as much and can't run non-HPC programs. I just don't see where it fills any need that can't be filled by plain Windows XP 64-bit at a third of the cost. I think MS completely fails to comprehend the whole point of clusters, which is maximizing the price/performance ratio. Adding an extra $300 per node simply makes their solution not cost-effective, especially when you consider that's nearly 50% of the per-node cost in most installations. I just couldn't see where Cluster 2003 adds any value to my deployment.

You must distinguish between an HA (high availability) cluster and an HPC cluster.
HPC is what you're doing: shifting tasks out to different systems to get more CPU power and faster processing. HA clustering is about keeping a mission-critical application running if hardware or applications fail, for example in enterprise environments for Oracle, SAP, ...
As far as I know, the W2K3 cluster is HA clustering software.

ID: 623519 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 623521 - Posted: 20 Aug 2007, 19:22:56 UTC - in response to Message 623516.  

It'd be easier to find a secure means of tunnelling through the network...

I think so, too.
I see absolutely no benefit in having the cluster software handle the BOINC instances; there are too many problems, as the other posters have written.
If you configure a firewall at the client and/or the workstation to restrict traffic to the outside, and configure ACLs within the proxy to restrict connections from the clients to the SETI servers' IPs for HTTP requests only, it would be safe.


Adding ANY kind of bypass to the separation of the cluster from the Internet is simply not going to happen. It's too much of a security risk; I'd rather let the nodes go idle.

Hm, but I think you should also consider, given such high security requirements, whether it is OK to install an application that is not certified by your company on the cluster nodes.
ID: 623521 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623526 - Posted: 20 Aug 2007, 19:30:39 UTC - in response to Message 623521.  
Last modified: 20 Aug 2007, 19:35:17 UTC

Well, for all intents and purposes, the nodes are isolated. Assuming that you can (are allowed to) run BOINC on them, I can think of one way to avoid the danger of messing up BOINC installations.

Basically, have a flash disk per machine with its BOINC install on it, created on the master machine. With a large multi-day cache crunching away on each drive and network activity suspended, unplug each drive and run it on the master [with network enabled] every now and then. That's a lot of running around for 64 machines :D


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623526 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623528 - Posted: 20 Aug 2007, 19:34:11 UTC - in response to Message 623526.  

Well, for all intents and purposes, the nodes are isolated. Assuming that you can (are allowed to) run BOINC on them, I can think of one way to avoid the danger of messing up BOINC installations.

Basically, have a flash disk per machine with its BOINC install on it, created on the master machine. With a large multi-day cache crunching away on each drive and network activity suspended, unplug each drive and run it on the master [with network enabled] every now and then. That's a lot of running around for 64 machines :D


Not to mention the cost of 64 flash drives.
ID: 623528 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 623530 - Posted: 20 Aug 2007, 19:35:36 UTC - in response to Message 623528.  
Last modified: 20 Aug 2007, 19:38:16 UTC


Not to mention the cost of 64 flash drives.

What about 64 separate BOINC installs running in a shared area [on the master]? If the nodes have no connectivity [to the outside world], then all you'd need to do is run each BOINC instance on the master every now and then with the network enabled, ensuring that the remote node wasn't running it at the time... [coordinated with] scheduled tasks? Batch files?

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 623530 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 623534 - Posted: 20 Aug 2007, 19:39:58 UTC

Thinking about this some, out of curiosity: do the computing nodes do anything else besides running the in-house applications? IOW, are we talking about a cluster of GP workstations here, or dedicated crunching iron?

Alinator
ID: 623534 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 623563 - Posted: 20 Aug 2007, 20:52:22 UTC - in response to Message 623499.  

I have seen some people using a hazardous work-unit transplant scheme to move work units from one BOINC installation to another machine, then move them back again for upload... If there could be multiple BOINC installations on the master, this would still present as multiple machines, but could probably be scripted... Splicing work units out of an installed BOINC setup as a single BOINC client would be tricky, but theoretically possible, I suppose [there'd be many issues to find and work around]. It'd be easier to find a secure means of tunnelling through the network...


Yes, I suppose that would be the only way, since otherwise the daily WU quota will only keep about 6 machines busy. Even if the project gives back a WU quota point for each one submitted, it would still only let me feed 100 CPUs off a single account, which would be a mere 12 out of 64 systems.

... only if SETI thinks it is one computer, not 64.

So, you install BOINC on your gateway machine, load it up with work, and shuffle that directory out to one of the "cluster" machines. This machine has an ID.

Zap the BOINC directory, and reinstall. You now have a new machine ID. Shuffle the same directory out to a different machine.

Do that 62 more times.

Set up an automated task that periodically stops crunching on one cluster machine, pulls the directory back to the original location, and restarts BOINC on the "connected" machine, so it can have an hour or so of "network connectivity."

This will work because the "computer" is the directory, not the physical hardware.
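The rotation step described above could be automated. Below is a hedged Python sketch of just the scheduling logic; the node names, slot length, and the sync step are hypothetical stand-ins (the real work would be stopping BOINC on a node, copying its directory back with robocopy/rsync, running the client on the master with networking enabled, then pushing the refilled cache out again — details you'd have to verify against an actual install):

```python
import itertools

# Hypothetical host names for the 64 isolated cluster nodes.
NODES = [f"node{i:02d}" for i in range(64)]

def rotation_schedule(nodes, slots_per_day=24):
    """Yield (day, node) pairs, giving every node a periodic turn on the
    networked master. With 24 one-hour slots a day and 64 nodes, each
    node gets a "network window" roughly every 2.7 days."""
    cycle = itertools.cycle(nodes)
    for day in itertools.count():
        for _ in range(slots_per_day):
            yield day, next(cycle)

def give_network_window(node):
    """Placeholder for the real work: stop BOINC on `node`, pull its
    BOINC directory back to the master, run the client there with
    networking enabled, then push the directory back out."""
    return f"rotated {node}"

if __name__ == "__main__":
    # Dry-run the first day's schedule.
    for day, node in itertools.islice(rotation_schedule(NODES), 24):
        print(day, give_network_window(node))
```

A scheduled task on the master could invoke one slot of this rotation per hour, so no node's cache ever runs completely dry.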
ID: 623563 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 623571 - Posted: 20 Aug 2007, 21:10:34 UTC - in response to Message 623534.  
Last modified: 20 Aug 2007, 21:12:15 UTC

Thinking about this some, out of curiosity: do the computing nodes do anything else besides running the in-house applications? IOW, are we talking about a cluster of GP workstations here, or dedicated crunching iron?

Alinator


Dedicated rackmount crunchers
ID: 623571 · Report as offensive
Christoph
Volunteer tester

Send message
Joined: 21 Apr 03
Posts: 76
Credit: 355,173
RAC: 0
Germany
Message 623732 - Posted: 21 Aug 2007, 1:14:56 UTC

Back in school, the IT teacher made all the new PCs dual-boot machines. Windows was not allowed to connect to the Internet, but Linux was. I guess the server in the room was running Linux. Is this an option for you?

Happy crunching, Christoph
Christoph
ID: 623732 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 624375 - Posted: 22 Aug 2007, 17:02:59 UTC

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.
ID: 624375 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 624387 - Posted: 22 Aug 2007, 17:19:14 UTC - in response to Message 624375.  
Last modified: 22 Aug 2007, 17:31:40 UTC

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.


Well then: since no matter which way you go you end up having to trust and/or verify that any outside app is safe to run on the cluster, the real issue is that you have to prove that to whoever is the ultimate authority behind the no-outside-access rule. If you can argue a convincing case for that, then the trusted proxy gateway would be the best way to go, since it ensures you get a chance to look over the comm stream to the project before it actually reaches the cluster nodes. I don't see where that would present much, if any, additional risk over the 'agenting' strategy you've been considering.

That's why I was asking about the nodes themselves. If they were GP workstations, I could see the problem with allowing any outside access on port 80; but since it's dedicated 'iron' behind its own firewall, you can get really picky about where it can go beyond the LAN (I'm assuming you're not 'batching' jobs to it and that users on the LAN have access). If you use the trusted proxy gateway, then even a man in the middle would not have much of a clue what the connections are really about, and would still have to get through two checkpoints/roadblocks to gain any access.

Alinator
ID: 624387 · Report as offensive
Profile Anthony Q. Bachler

Send message
Joined: 3 Jul 07
Posts: 29
Credit: 608,463
RAC: 0
United States
Message 624486 - Posted: 22 Aug 2007, 21:16:34 UTC - in response to Message 624387.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.
ID: 624486 · Report as offensive
Christoph
Volunteer tester

Send message
Joined: 21 Apr 03
Posts: 76
Credit: 355,173
RAC: 0
Germany
Message 624489 - Posted: 22 Aug 2007, 21:21:27 UTC - in response to Message 624375.  

Since we use this cluster to run a proprietary API, I can't really change the OS. I'm stuck with 64-bit XP.

I understand that, but why not an additional Linux installation? When you need it for your stuff, boot Windows; when that's done, boot Linux and crunch for SETI@home. Ah, do you mean the server must also be Windows? OK, I don't know whether it's possible to let the Linux side through to the network while still blocking Windows. But if so, and if you make sure that Linux is not able to access the XP partition, then why not dual-boot? No risk to your installation, since there's no access.

Happy crunching, Christoph
Christoph
ID: 624489 · Report as offensive
Alinator
Volunteer tester

Send message
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 624495 - Posted: 22 Aug 2007, 21:31:02 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.


I hear ya! Don't you just hate showstoppers like that? It looks like you've got an interesting challenge ahead of you. ;-)

In any event, what you need to do to accommodate the security policy pretty much amounts to the proxying 'agent' for clusters (which has another thread currently going), so if you published your solution on your own website or some other vehicle later on, I'm sure there'd be a fair amount of interest in it.

Alinator


ID: 624495 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 624526 - Posted: 22 Aug 2007, 22:55:28 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.

I think it can be done with batch files.
ID: 624526 · Report as offensive
Profile michael37
Avatar

Send message
Joined: 23 Jul 99
Posts: 311
Credit: 6,955,447
RAC: 0
United States
Message 625108 - Posted: 23 Aug 2007, 21:36:12 UTC - in response to Message 624486.  

Well, ultimately the issue boils down to the fact that I can't do a proxy. I just don't have any flexibility on that point. The policy is that the IT guy (that's me) can run whatever low-priority tasks he chooses, as long as they don't compromise security, access outside networks, or interfere with the execution of primary workloads.

I may actually have to muck around in the BOINC manager source files and write a custom version that cycles through multiple accounts and splits them into multiple folders. Alternatively, I may have to write a custom SS that auto-links to the BOINC SS on the workstation/head node and directly allocates work units from that SS. That of course means writing two new versions of the BOINC SS, one for the workstation and one for the nodes.


What about a "truly part-time proxy" with very high security and highly limited access control? I have configured a cluster like that, with squid running part-time (once a day for ~half an hour is enough) on the master workstation and the BOINC clients on the rackmounted cluster nodes configured to use this proxy. Squid is very easy to configure so that it connects only to the setiathome.berkeley.edu hosts and no other sites, and services only the cluster nodes.
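A locked-down setup along those lines might look roughly like the squid.conf fragment below. This is a sketch only: the subnet, port, and domain are hypothetical placeholders and would have to match the real LAN layout and the project's current server names.

```
# squid.conf sketch (hypothetical addresses; verify against your LAN)
http_port 3128
acl cluster_nodes src 192.168.1.0/24            # the rackmount nodes' subnet
acl seti_hosts dstdomain setiathome.berkeley.edu
acl http_only port 80
# Allow only the cluster nodes, only to the project hosts, only over HTTP.
http_access allow cluster_nodes seti_hosts http_only
http_access deny all                            # refuse everything else
```

A scheduled task on the master could start squid for the daily half-hour window and stop it afterwards, so the path to the outside exists only while the clients are reporting and fetching work.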

ID: 625108 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.