Panic Mode On (20) Server problems

Sutaru Tsureku Volunteer tester Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 918167 - Posted: 15 Jul 2009, 18:43:41 UTC AFAIK, SETI@home Enhanced (MB) has a min. 7-day deadline, so there is normally enough time for UL and REPORT. @ Richard: Have a good evening in the pub, and cheers! ;-) ID: 918167 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 918168 - Posted: 15 Jul 2009, 18:45:58 UTC - in response to Message 918164. So far, I've waited since Sunday. Don't know why that should be the case. From what I can recall, on Sat, Sun & most of Monday there were no problems with uploads. Late Monday (for reasons unknown) the upload server went offline & it only came back online a couple of hours ago (if that). Yes, I understand about traffic - just want to know what the average number of days one should wait for a job to upload. Even when things are congested, most uploads will go through in a few hours. If it's really bad it might take 12-24 hrs. Usually they go through on the first attempt. Grant Darwin NT ID: 918168 ·

arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 918174 - Posted: 15 Jul 2009, 18:58:12 UTC - in response to Message 918156. I wish I could turn off networking, but I have another project that needs networking active. Hopefully soon they add in per-project networking options. ID: 918174 ·

Bill Walker Send message Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0	Message 918178 - Posted: 15 Jul 2009, 19:11:55 UTC - in response to Message 918174. Last modified: 15 Jul 2009, 19:13:15 UTC I wish I could turn off networking, but I have another project that needs networking active. Hopefully soon they add in per-project networking options. I'll second that. I didn't want to interrupt my other projects, especially when it looked like I might run out of SETI. When I had the time over the last few days, I would turn off network activity until one of my non-SETI jobs had some uploading or downloading to do, and then turn it back on long enough for the other project to get new work. It would be nicer if BOINC could handle this without my attention. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). ID: 918178 ·

BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0	Message 918197 - Posted: 15 Jul 2009, 20:12:14 UTC - in response to Message 918174. Agreed -- SETI isn't the only project that could use a project specific network off switch. My own read with the upload server (opinion only) is that with the large backlog of upload requests from the past 48 to 72 hours, even if it is running well at the server level, it is simply getting swamped big time. Sort of like the typical Tuesday outage congestion (with a typical server outage of 4 to 6 hours, I expect a congestion problem of 8 to 12 hours), only since the upload outage was so long, recovery will be proportional to that. Perhaps it will clear by the weekend, perhaps by Monday in time for the next outage. I wish I could turn off networking, but I have another project that needs networking active. Hopefully soon they add in per-project networking options. ID: 918197 ·
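The proportional-recovery estimate in the post above can be worked through as a rough sketch. The function name `estimate_recovery_hours` and the fixed 2x ratio are assumptions for illustration: the ratio is read off the poster's own figures (a 4 to 6 hour outage followed by 8 to 12 hours of congestion), and it is an opinion-based rule of thumb, not a measured property of the SETI@home servers.

```python
def estimate_recovery_hours(outage_hours: float, ratio: float = 2.0) -> float:
    """Scale an outage length by a congestion ratio.

    The default ratio of 2.0 is an assumption taken from the post's
    figures (4-6 h outage -> 8-12 h congestion); it is not measured.
    """
    return outage_hours * ratio


# Typical Tuesday outage quoted in the post: 4 to 6 hours.
tuesday_low = estimate_recovery_hours(4)
tuesday_high = estimate_recovery_hours(6)

# Backlog window quoted in the post: 48 to 72 hours.
backlog_low = estimate_recovery_hours(48)
backlog_high = estimate_recovery_hours(72)

print(f"Tuesday outage: {tuesday_low:.0f} to {tuesday_high:.0f} hours of congestion")
print(f"Long outage: {backlog_low / 24:.0f} to {backlog_high / 24:.0f} days of congestion")
```

Under that assumption, a 48 to 72 hour backlog would take roughly 4 to 6 days to clear, which is in the same range as the "by the weekend, perhaps by Monday" guess.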

rebest Volunteer tester Send message Joined: 16 Apr 00 Posts: 1296 Credit: 45,357,093 RAC: 0	Message 918202 - Posted: 15 Jul 2009, 20:15:37 UTC - in response to Message 918156. I'm going to turn networking off and go out to the pub. Now that's the smartest idea I've seen all day! :) Join the PACK! ID: 918202 ·

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49	Message 918203 - Posted: 15 Jul 2009, 20:18:48 UTC - in response to Message 918197. Last modified: 15 Jul 2009, 20:19:45 UTC Agreed -- SETI isn't the only project that could use a project specific network off switch. My own read with the upload server (opinion only) is that with the large backlog of upload requests from the past 48 to 72 hours, even if it is running well at the server level, it is simply getting swamped big time. Sort of like the typical Tuesday outage congestion (with a typical server outage of 4 to 6 hours, I expect a congestion problem of 8 to 12 hours), only since the upload outage was so long, recovery will be proportional to that. Perhaps it will clear by the weekend, perhaps by Monday in time for the next outage. I wish I could turn off networking, but I have another project that needs networking active. Hopefully soon they add in per-project networking options. That would be a nice feature for Boinc, Then I could tell S@H to stop wasting My other projects bandwidth instead of suspending the project(a waste of time that could be better spent crunching) which amounts to using a sledgehammer to swat a fly with. Doing NNT doesn't really cut It. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 918203 ·

BarryAZ Send message Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0	Message 918207 - Posted: 15 Jul 2009, 20:31:08 UTC - in response to Message 918178. Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). ID: 918207 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 918212 - Posted: 15 Jul 2009, 20:39:20 UTC - in response to Message 918178. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). One of the incredibly cool things about BOINC is that it costs almost nothing to do distributed computing. So it is entirely possible for an individual to run a fair sized project, funded out of their own pocket, on a home internet connection. If you do that, on small servers with a cheap DSL line, it is very possible to run into exactly the same corner, just on a dramatically smaller scale. ID: 918212 ·

[AF>Libristes] erik Volunteer tester Send message Joined: 30 Jul 07 Posts: 19 Credit: 4,016,114 RAC: 0	Message 918213 - Posted: 15 Jul 2009, 20:42:55 UTC - in response to Message 918202. I think I understand the SETI upload problems. We must give it time. Basically, how can I interrupt SETI network communications without interrupting all BOINC network communications? I do that manually, but now I want to go to bed. just a poet ID: 918213 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 918226 - Posted: 15 Jul 2009, 21:19:34 UTC - in response to Message 918207. Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). Hic! Being a little dis-inhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac [trac]#139[/trac] to be actioned. It really helps. ID: 918226 ·

Bill Walker Send message Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0	Message 918229 - Posted: 15 Jul 2009, 21:26:28 UTC - in response to Message 918226. Last modified: 15 Jul 2009, 21:26:58 UTC I would like to hereby officially lobby for Trac 139 to be actioned. ID: 918229 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 918234 - Posted: 15 Jul 2009, 21:36:09 UTC - in response to Message 918104. Samdani 15 Jul 2009 16:52:34 UTC It certainly makes sense but how about relaxing the linking rule under special circumstances like this. Would that be a feasible option? Samdani 15 Jul 2009 17:14:34 UTC Ok. I give up. There seems no room for any creative ideas :) Don't give up so quickly or easily. The responses you got all assumed by "relaxing" you meant "rescinding". There are certainly reasons to think about that 2X CPUs uploading threshold which was hard coded into BOINC 5.8.6 and hasn't changed since. Now BOINC supports CUDA and multithreaded processing, 5.8.6 didn't. Even without that, 2X is a compromise measure which probably doesn't fit any project particularly well. For S@H, upload bandwidth needed is about 5% of download bandwidth (but work requests also use the pipe into SSL). Since I've never seen more than 30 MBits/sec used going to SSL, any problems on that side have to be ascribed to transaction rates rather than bandwidth. Making the 2X adjustable per project might be useful, though it could also allow a user willing to edit client_state.xml a way around any restriction. Using a more complex formula which counts GPUs as well as CPUs, perhaps even checks the result sizes against the WU sizes in the project folder, might be another way to improve. Any way it is modified it will still impact hosts which normally are most productive before the older, slower hosts. Joe ID: 918234 ·
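As a minimal sketch of the threshold Joe describes, the check below compares the number of results waiting to upload against a multiple of the CPU count. The function names, the strict `>` comparison, and the way GPUs are simply added to the CPU count are all assumptions for illustration; this is not the actual BOINC client code, and the second function is only one possible reading of "a more complex formula which counts GPUs as well as CPUs".

```python
def too_many_uploads(pending_uploads: int, ncpus: int, factor: int = 2) -> bool:
    """Return True when pending uploads exceed factor * ncpus.

    factor=2 mirrors the hard-coded "2X CPUs" rule mentioned in the post.
    Whether the real client uses > or >= is an assumption here.
    """
    return pending_uploads > factor * ncpus


def too_many_uploads_with_gpus(
    pending_uploads: int, ncpus: int, ngpus: int, factor: int = 2
) -> bool:
    """Hypothetical variant that counts GPUs alongside CPUs.

    Treating each GPU like one extra CPU is an illustrative assumption.
    """
    return pending_uploads > factor * (ncpus + ngpus)


# A quad-core host with one GPU and 9 results stuck in upload:
print(too_many_uploads(9, ncpus=4))                     # CPU-only rule
print(too_many_uploads_with_gpus(9, ncpus=4, ngpus=1))  # GPU-aware variant
```

Making `factor` a parameter corresponds to the "adjustable per project" idea in the post. Any such change would still affect the most productive hosts first, since they accumulate pending uploads fastest.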

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49	Message 918236 - Posted: 15 Jul 2009, 21:46:14 UTC - in response to Message 918226. Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). Hic! Being a little dis-inhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac [trac]#139[/trac] to be actioned. It really helps. Thank You, Thank You, Thank You, Thank You. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 918236 ·

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49	Message 918239 - Posted: 15 Jul 2009, 21:49:28 UTC - in response to Message 918229. I would like to hereby officially lobby for Trac 139 to be actioned. I second the motion. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 918239 ·

Pappa Volunteer tester Send message Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 918240 - Posted: 15 Jul 2009, 21:49:39 UTC - in response to Message 918213. erik Just let it run (if you normally do overnight). It will recover with time. I think I understand the SETI upload problems. We must give it time. Basically, how can I interrupt SETI network communications without interrupting all BOINC network communications? I do that manually, but now I want to go to bed. Regards Please consider a Donation to the Seti Project. ID: 918240 ·

# Bob Ahlers # Send message Joined: 30 Mar 01 Posts: 18 Credit: 10,209,954 RAC: 0	Message 918245 - Posted: 15 Jul 2009, 22:02:03 UTC - in response to Message 918239. Last modified: 15 Jul 2009, 22:05:21 UTC I would suggest that Seti contacts Pacific Internet Exchange: Pacific Internet Exchange 200 Paul Ave. Ste. M-200 San Francisco, CA 94124 It's a 35-minute drive from Seti. Arrange hosting and data at that location (could even be free) and put their name on the Seti site. Set up a VPN or VLAN between Seti and PIE and host the big data-eating servers there. This way a lot of the problems disappear. They could even drive over to that datacenter and bring project on an external HD for $150 or so. Of course I don't know the details about the Seti system, but this cannot continue as it is now, for Seti and the Crunchers. Just an idea. ID: 918245 ·

Blurf Volunteer tester Send message Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0	Message 918247 - Posted: 15 Jul 2009, 22:06:05 UTC - in response to Message 918245. I would suggest that Seti contacts Pacific Internet Exchange: Pacific Internet Exchange 200 Paul Ave. Ste. M-200 San Francisco, CA 94124 It's a 35-minute drive from Seti. Arrange hosting and data at that location (could even be free) and put their name on the Seti site. Set up a VPN or VLAN between Seti and PIE and host the big data-eating servers there. This way a lot of the problems disappear. They could even drive over to that datacenter and bring project on an external HD for $150 or so. Of course I don't know the details about the Seti system, but this cannot continue as it is now, for Seti and the Crunchers. Just an idea. Thank you for the idea. I've brought this to the attention of Matt and asked him to comment directly when he's able. ID: 918247 ·

Westsail and Pyxey Volunteer tester Send message Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0	Message 918251 - Posted: 15 Jul 2009, 22:17:00 UTC Last modified: 15 Jul 2009, 22:35:03 UTC Did my part to ease upload traffic late last night... ;) NOTE: Hopefully someone can learn from my mistake.. I've heard it said a smart man can learn from his mistakes while a genius learns from others.. Anyways, so I was trying to get some video stuff working right on my Tesla workstation for someone else; it was late and I was in a hurry.. I shut down the BOINC core client to install a different driver version. Nvidia tells me the drivers are older than those currently installed, do I want to reboot..yes yes whatever, hurry up..etc etc So the machine reboots (I can hear folks groaning, knowing what's coming), drivers install, all is fine. About this time I load up BM, only to realize all at once that I needed to disable the service from restarting. When the puter rebooted with no vid drivers I lost ~600 completed WUs instantly. This machine shares a single ethernet cable with another machine that is always plugged in, for now. So I have to manually plug it in and up/report/download 1-2 times a day. With recent difficulties it had been accumulating WUs for a number of days. It was my whole cache that was pending upload with network suspended. Currently the CPU is still working on its ~100 MBs. lol Oops!?!....Kinda forgot, as I wasn't really thinking about BOINC; I was trying to make a rendering program work correctly. Oh well, operator error, we live to crunch another day. Speedy and pleasant recovery to everyone. Keep crunching! edit to add: Just catching up on the thread...Hear, hear Richard, good show; best idea I have heard yet! Chalk us up! "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov ID: 918251 ·

B-Man Volunteer tester Send message Joined: 11 Feb 01 Posts: 253 Credit: 147,366 RAC: 0	Message 918256 - Posted: 15 Jul 2009, 22:46:20 UTC - in response to Message 918236. Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project. Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;). Hic! Being a little dis-inhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac [trac]#139[/trac] to be actioned. It really helps. Thank You, Thank You, Thank You, Thank You. I agree that is a great thing. 2nd topic. Just found out that I will be going away for the weekend tomorrow. I had not planned to be away, and now I will have stacked up downloads from this week that I can only hope to get in before next Tuesday's outage. I have some shorties that time out on the 21st. Oh well. I will be shutting down at 7 AM EDT, so I have about 12h to upload them. ID: 918256 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.