Panic Mode On (20) Server problems

Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 918167 - Posted: 15 Jul 2009, 18:43:41 UTC


AFAIK, SETI@home Enhanced (MB) tasks have a minimum 7-day deadline.
So normally that is enough time to upload and report.


@ Richard

Have a good evening in the pub, and cheers! ;-)

ID: 918167 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 918168 - Posted: 15 Jul 2009, 18:45:58 UTC - in response to Message 918164.  

So far, I've waited since Sunday.

Don't know why that should be the case.
From what I can recall, on Sat, Sun & most of Monday there were no problems with uploads. Late Monday (for reasons unknown) the upload server went offline & it only came back online a couple of hours ago (if that).


Yes, I understand about traffic - just want to know what the average number of days one should wait for a job to upload.

Even when things are congested, most uploads will go through in a few hours. If it's really bad it might take 12-24 hrs. Usually they go through on the first attempt.
Grant
Darwin NT
ID: 918168 · Report as offensive
Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 918174 - Posted: 15 Jul 2009, 18:58:12 UTC - in response to Message 918156.  

I wish I could turn off networking, but I have another project that needs networking active.

Hopefully soon they add in per-project networking options.

ID: 918174 · Report as offensive
Profile Bill Walker
Avatar

Send message
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918178 - Posted: 15 Jul 2009, 19:11:55 UTC - in response to Message 918174.  
Last modified: 15 Jul 2009, 19:13:15 UTC

I wish I could turn off networking, but I have another project that needs networking active.

Hopefully soon they add in per-project networking options.


I'll second that. I didn't want to interrupt my other projects, especially when it looked like I might run out of SETI work.

When I had the time over the last few days, I would turn off network activity until one of my non-SETI jobs had some uploading or downloading to do, and then turn it back on long enough for the other project to get new work. It would be nicer if BOINC could handle this without my attention.

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).

ID: 918178 · Report as offensive
BarryAZ

Send message
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 918197 - Posted: 15 Jul 2009, 20:12:14 UTC - in response to Message 918174.  

Agreed -- SETI isn't the only project that could use a project specific network off switch.

My own read with the upload server (opinion only) is that with the large backlog of upload requests from the past 48 to 72 hours, even if it is running well at the server level, it is simply getting swamped big time. Sort of like the typical Tuesday outage congestion (with a typical server outage of 4 to 6 hours, I expect a congestion problem of 8 to 12 hours), only since the upload outage was so long, recovery will be proportional to that. Perhaps it will clear by the weekend, perhaps by Monday in time for the next outage.
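To put rough numbers on that guess, here is a minimal sketch. It assumes congestion lasts about twice as long as the outage, which is just the ratio implied by the Tuesday figures above (4 to 6 hours of outage, 8 to 12 hours of congestion). The function name and the fixed ratio are illustrative assumptions, not anything measured at the server.

```python
# Rough recovery estimate. Assumption: congestion lasts about twice as long
# as the outage, the ratio implied by the "4 to 6 hours outage, 8 to 12 hours
# congestion" rule of thumb. This is a guess, not a measured value.
RECOVERY_RATIO = 8 / 4  # same ratio as 12 / 6


def estimated_congestion_hours(outage_hours, ratio=RECOVERY_RATIO):
    """Return the guessed congestion duration (hours) for a given outage length."""
    return outage_hours * ratio


if __name__ == "__main__":
    for outage in (48, 72):
        hours = estimated_congestion_hours(outage)
        print(f"{outage} h backlog: roughly {hours:.0f} h ({hours / 24:.0f} days) of congestion")
```

With a 48 to 72 hour backlog, that simple proportion gives roughly four to six days, which is where the "by the weekend, perhaps by Monday" guess comes from.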


I wish I could turn off networking, but I have another project that needs networking active.

Hopefully soon they add in per-project networking options.


ID: 918197 · Report as offensive
Profile rebest Project Donor
Volunteer tester
Avatar

Send message
Joined: 16 Apr 00
Posts: 1296
Credit: 45,357,093
RAC: 0
United States
Message 918202 - Posted: 15 Jul 2009, 20:15:37 UTC - in response to Message 918156.  


I'm going to turn networking off and go out to the pub.


Now that's the smartest idea I've seen all day! :)


Join the PACK!
ID: 918202 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 918203 - Posted: 15 Jul 2009, 20:18:48 UTC - in response to Message 918197.  
Last modified: 15 Jul 2009, 20:19:45 UTC

Agreed -- SETI isn't the only project that could use a project specific network off switch.

My own read with the upload server (opinion only) is that with the large backlog of upload requests from the past 48 to 72 hours, even if it is running well at the server level, it is simply getting swamped big time. Sort of like the typical Tuesday outage congestion (with a typical server outage of 4 to 6 hours, I expect a congestion problem of 8 to 12 hours), only since the upload outage was so long, recovery will be proportional to that. Perhaps it will clear by the weekend, perhaps by Monday in time for the next outage.


I wish I could turn off networking, but I have another project that needs networking active.

Hopefully soon they add in per-project networking options.


That would be a nice feature for BOINC. Then I could tell S@H to stop wasting my other projects' bandwidth instead of suspending the project (a waste of time that could be better spent crunching), which amounts to using a sledgehammer to swat a fly. Setting No New Tasks (NNT) doesn't really cut it.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 918203 · Report as offensive
BarryAZ

Send message
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 918207 - Posted: 15 Jul 2009, 20:31:08 UTC - in response to Message 918178.  

Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project.

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).


ID: 918207 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918212 - Posted: 15 Jul 2009, 20:39:20 UTC - in response to Message 918178.  

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).

One of the incredibly cool things about BOINC is that it costs almost nothing to do distributed computing.

So it is entirely possible for an individual to run a fair sized project, funded out of their own pocket, on a home internet connection.

If you do that, on small servers with a cheap DSL line, it is very possible to run into exactly the same corner, just on a dramatically smaller scale.
ID: 918212 · Report as offensive
Profile [AF>Libristes] erik
Volunteer tester
Avatar

Send message
Joined: 30 Jul 07
Posts: 19
Credit: 4,016,114
RAC: 0
France
Message 918213 - Posted: 15 Jul 2009, 20:42:55 UTC - in response to Message 918202.  

I think I understand the SETI upload problems now. We just have to give it time.
Basically, how can I interrupt SETI network communications without interrupting all BOINC network communications?
I do that manually, but now I want to go to bed.
just a poet
ID: 918213 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918226 - Posted: 15 Jul 2009, 21:19:34 UTC - in response to Message 918207.  

Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project.

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).


Hic!

Being a little disinhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac #139 to be actioned. It really helps.
ID: 918226 · Report as offensive
Profile Bill Walker
Avatar

Send message
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918229 - Posted: 15 Jul 2009, 21:26:28 UTC - in response to Message 918226.  
Last modified: 15 Jul 2009, 21:26:58 UTC

I would like to hereby officially lobby for Trac 139 to be actioned.

ID: 918229 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 918234 - Posted: 15 Jul 2009, 21:36:09 UTC - in response to Message 918104.  

Samdani 15 Jul 2009 16:52:34 UTC
It certainly makes sense but how about relaxing the linking rule under special circumstances like this. Would that be a feasible option?

Samdani 15 Jul 2009 17:14:34 UTC
Ok. I give up. There seems no room for any creative ideas :)

Don't give up so quickly or easily. The responses you got all assumed by "relaxing" you meant "rescinding".

There are certainly reasons to think about that 2X CPUs uploading threshold, which was hard-coded into BOINC 5.8.6 and hasn't changed since. BOINC now supports CUDA and multithreaded processing; 5.8.6 didn't.

Even without that, 2X is a compromise measure which probably doesn't fit any project particularly well. For S@H, upload bandwidth needed is about 5% of download bandwidth (but work requests also use the pipe into SSL). Since I've never seen more than 30 MBits/sec used going to SSL, any problems on that side have to be ascribed to transaction rates rather than bandwidth.

Making the 2X adjustable per project might be useful, though it could also allow a user willing to edit client_state.xml a way around any restriction. Using a more complex formula which counts GPUs as well as CPUs, perhaps even checks the result sizes against the WU sizes in the project folder, might be another way to improve.

However it is modified, it will still hit the hosts that are normally most productive before it hits the older, slower hosts.
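A minimal sketch of what an adjustable, GPU-aware threshold might look like. All names here are hypothetical and nothing is taken from the BOINC source; it also assumes new work is held back only once pending uploads exceed the threshold, which may not match the client's exact comparison.

```python
# Hypothetical sketch of an adjustable upload threshold.
# None of these names come from the BOINC source code.


def upload_threshold(n_cpus, n_gpus=0, factor=2, count_gpus=False):
    """Maximum pending uploads before new work is held back.

    factor=2 with count_gpus=False mimics the fixed "2X CPUs" rule;
    a per-project factor or GPU counting are the variations discussed above.
    """
    processors = n_cpus + (n_gpus if count_gpus else 0)
    return factor * processors


def may_fetch_work(pending_uploads, threshold):
    # Assumption: work fetch is blocked only once uploads exceed the threshold.
    return pending_uploads <= threshold


if __name__ == "__main__":
    # A quad-core host with one CUDA card, under both variants.
    print(upload_threshold(4))
    print(upload_threshold(4, n_gpus=1, count_gpus=True))
```

Counting the GPU raises the limit only slightly on a host like that, so a fast CUDA host would still reach it long before a slow CPU-only one.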
                                                                 Joe
ID: 918234 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 918236 - Posted: 15 Jul 2009, 21:46:14 UTC - in response to Message 918226.  

Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project.

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).


Hic!

Being a little disinhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac #139 to be actioned. It really helps.

Thank You, Thank You, Thank You, Thank You.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 918236 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Avatar

Send message
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 918239 - Posted: 15 Jul 2009, 21:49:28 UTC - in response to Message 918229.  

I would like to hereby officially lobby for Trac 139 to be actioned.

I second the motion.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 918239 · Report as offensive
Profile Pappa
Volunteer tester
Avatar

Send message
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 918240 - Posted: 15 Jul 2009, 21:49:39 UTC - in response to Message 918213.  

erik

Just let it run (if you normally do overnight). It will recover with time.


I think I understand the SETI upload problems now. We just have to give it time.
Basically, how can I interrupt SETI network communications without interrupting all BOINC network communications?
I do that manually, but now I want to go to bed.


Regards

Please consider a Donation to the Seti Project.

ID: 918240 · Report as offensive
Profile # Bob Ahlers #

Send message
Joined: 30 Mar 01
Posts: 18
Credit: 10,209,954
RAC: 0
Netherlands
Message 918245 - Posted: 15 Jul 2009, 22:02:03 UTC - in response to Message 918239.  
Last modified: 15 Jul 2009, 22:05:21 UTC

I would suggest that Seti contacts Pacific Internet Exchange:

Pacific Internet Exchange
200 Paul Ave. Ste. M-200
San Francisco, CA 94124

It's a 35-minute drive from SETI.
Arrange hosting and data at that location (it could even be free) and put their name on the SETI site. Set up a VPN or VLAN between SETI and PIE and host the big data-eating servers there. This way a lot of the problems disappear.

They could even drive over to that data center and bring project data on an external HD for $150 or so.

Of course I don't know the details of the SETI system, but this cannot continue as it is now, for SETI and the crunchers.

Just an idea.
ID: 918245 · Report as offensive
Profile Blurf
Volunteer tester

Send message
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 918247 - Posted: 15 Jul 2009, 22:06:05 UTC - in response to Message 918245.  

I would suggest that Seti contacts Pacific Internet Exchange:

Pacific Internet Exchange
200 Paul Ave. Ste. M-200
San Francisco, CA 94124

It's a 35-minute drive from SETI.
Arrange hosting and data at that location (it could even be free) and put their name on the SETI site. Set up a VPN or VLAN between SETI and PIE and host the big data-eating servers there. This way a lot of the problems disappear.

They could even drive over to that data center and bring project data on an external HD for $150 or so.

Of course I don't know the details of the SETI system, but this cannot continue as it is now, for SETI and the crunchers.

Just an idea.


Thank you for the idea. I've brought this to the attention of Matt and asked him to comment directly when he's able.



ID: 918247 · Report as offensive
Profile Westsail and *Pyxey*
Volunteer tester
Avatar

Send message
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 918251 - Posted: 15 Jul 2009, 22:17:00 UTC
Last modified: 15 Jul 2009, 22:35:03 UTC

Did my part to ease upload traffic late last night... ;)

NOTE: Hopefully someone can learn from my mistake..

I've heard it said that a smart man learns from his own mistakes while a genius learns from others'.

Anyway, I was trying to get some video stuff working right on my Tesla workstation for someone else; it was late and I was in a hurry.
I shut down the BOINC core client to install a different driver version.
NVIDIA tells me the drivers are older than the ones currently installed, do I want to reboot... yes, yes, whatever, hurry up, etc.

So the machine reboots (I can hear folks groaning, knowing what's coming), the drivers install, and all is fine. About this time I load up BM, only to realize all at once that I needed to disable the service from restarting. When the computer rebooted with no video drivers, I lost ~600 completed WUs instantly. This machine shares a single Ethernet cable with another machine that is always plugged in, for now, so I have to manually plug it in and upload/report/download 1-2 times a day. With the recent difficulties it had been accumulating WUs for a number of days. That was my whole cache, all pending upload with the network suspended. Currently the CPU is still working on its ~100 MBs. lol

Oops!?! ... Kinda forgot, as I wasn't really thinking about BOINC; I was trying to make a rendering program work correctly.

Oh well, operator error, we live to crunch another day. Speedy and pleasant recovery to everyone. Keep crunching!

edit to add:
Just catching up on the thread... Hear, hear, Richard, good show; best idea I have heard yet! Chalk us up!
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 918251 · Report as offensive
B-Man
Volunteer tester

Send message
Joined: 11 Feb 01
Posts: 253
Credit: 147,366
RAC: 0
United States
Message 918256 - Posted: 15 Jul 2009, 22:46:20 UTC - in response to Message 918236.  

Actually, other projects have run into problems for which a project specific network activity off switch would have reduced connect error messages at the user level and reduced network traffic to the specific project. I've seen outages over the years at Climate, Einstein, Rosetta, Spinhenge and others. These are outages where completed work (or in Climate's case trickles), is stuck in transfer mode and suspending the project won't stop the client from trying to connect. Further there are cases where you want to continue processing for a project (you still have a queue) but don't want to generate traffic to a specific project.

Note to those who run BOINC - this may look like a SETI specific request today, but the other projects could get into the same situation as SETI someday, if they are lucky ;).


Hic!

Being a little disinhibited, I'm going to reveal that when I turned off networking before going to the pub, I turned off networking for this project only: the code exists (no, I didn't write it) and I'm testing it. I found another (small, cosmetic-only) bug this week, which I haven't reported to the author yet, but apart from that I believe it's nearly ready to submit to BOINC as a ready-made patch. When that time comes, I hope you'll all lobby for trac #139 to be actioned. It really helps.

Thank You, Thank You, Thank You, Thank You.

I agree that is a great thing.
2nd topic. Just found out that I will be going away for the weekend tomorrow. I had not planned to be away, and now I will have stacked-up downloads from this week that I can only hope to get in before next Tuesday's outage. I have some shorties that time out on the 21st. Oh well. I will be shutting down at 7 AM EDT, so I have about 12 hours to upload them.
ID: 918256 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.