Panic Mode On (110) Server Problems?

Message boards : Number crunching : Panic Mode On (110) Server Problems?
Profile Stargate Project Donor
Volunteer tester
Joined: 4 Mar 10
Posts: 478
Credit: 140,621
RAC: 1,258
Australia
Message 1914752 - Posted: 24 Jan 2018, 8:35:38 UTC

I've got 1 GPU and the CPU now out of work.
ID: 1914752
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9359
Credit: 120,216,103
RAC: 46,928
Australia
Message 1914753 - Posted: 24 Jan 2018, 8:36:36 UTC - in response to Message 1914751.  
Last modified: 24 Jan 2018, 8:49:51 UTC

I have about 50 downloads stalled and going into 4- and 5-hour backoffs.

I've tried both 208.68.240.127 and 208.68.240.119 with no luck.
Anyone else?

EDIT- tried 208.68.240.119 again, and managed to download stalled work.
Grant
Darwin NT
ID: 1914753
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9359
Credit: 120,216,103
RAC: 46,928
Australia
Message 1914763 - Posted: 24 Jan 2018, 8:59:40 UTC
Last modified: 24 Jan 2018, 9:00:32 UTC

The 900 AP ready-to-send backlog has cleared.
Back to "Project has no tasks available": whatever was in the ready-to-send buffer is all gone now, and it generally takes a couple of hours for the splitters to get going after an outage.

One interesting point to note: received-in-last-hour hit 1.01 million. The previous best was 666k, and it's usually only around 400k. Something Eric's done has more than doubled performance (at least for the processing of received WUs). Hopefully this will carry over into general validation, assimilation & deletion, which will then allow the splitters to maintain their best speed.
*fingers crossed*
Grant
Darwin NT
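
As a quick check on the "more than a doubling" remark, the figures quoted in the post above can be compared directly. This is only a minimal sketch: the three numbers come from the post, while the variable and function names are made up for illustration.

```python
# Results received in the last hour, as quoted in the post above.
new_peak = 1_010_000      # "1.01 million"
previous_best = 666_000   # "666k"
typical_peak = 400_000    # "around 400k"


def ratio(new: int, old: int) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


vs_typical = ratio(new_peak, typical_peak)
vs_best = ratio(new_peak, previous_best)

print(f"Compared with the usual peak:   {vs_typical:.2f}x")
print(f"Compared with the previous best: {vs_best:.2f}x")
```

So the new peak is roughly 2.5 times the usual figure, but only about 1.5 times the previous best; the "doubling" holds against the typical 400k, not against the old record.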
ID: 1914763
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 3580
Credit: 213,264,901
RAC: 294,706
United States
Message 1914764 - Posted: 24 Jan 2018, 9:03:07 UTC

I saw that in the graphs. System high returns reached. Don't know whether that was because of anything that was done to the servers or just the fact that all hosts were empty after a day and had lots to report.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1914764
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9359
Credit: 120,216,103
RAC: 46,928
Australia
Message 1914768 - Posted: 24 Jan 2018, 9:12:12 UTC - in response to Message 1914764.  

Don't know whether that was because of anything that was done to the servers or just the fact that all hosts were empty after a day and had lots to report.

All the work has already been returned, as the upload servers run throughout the outage, so I figure that there will be more hosts than usual reporting more work than usual. But in the past, with similar outages, the received-last-hour numbers have never been that high; they've just stayed at their 400k-or-so peak for a few hours.
This time the system was able to handle it all in a much shorter period of time (even with the Scheduler issues earlier on), which bodes well for likely performance once the splitters get going again & the backlog of returned work is validated, assimilated & deleted (at least that's my hope looking at the new record peak).
Grant
Darwin NT
ID: 1914768
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 3580
Credit: 213,264,901
RAC: 294,706
United States
Message 1914769 - Posted: 24 Jan 2018, 9:17:25 UTC - in response to Message 1914768.  

Don't know. Could be. Crossing fingers that the servers were in fact updated or configured to handle a larger load of reported tasks. The site was very sluggish after it came back, with lots of scheduler timeouts and connections refused. Plus, once you got any work assigned to you, you couldn't download it.

Seems to be straightening itself out though, and quickly. We shall see once we've reached equilibrium again.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1914769
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6935
Credit: 43,255,705
RAC: 1,662
Sweden
Message 1914771 - Posted: 24 Jan 2018, 9:32:33 UTC
Last modified: 24 Jan 2018, 9:33:03 UTC

Heh, one Arecibo file is being split/splat/splut :-)
It's a fresh one 15ja18aa
ID: 1914771
Cruncher-American Special Project $75 donor

Joined: 25 Mar 02
Posts: 1428
Credit: 246,916,565
RAC: 133,479
United States
Message 1914774 - Posted: 24 Jan 2018, 9:37:28 UTC
Last modified: 24 Jan 2018, 9:39:16 UTC

And here's my reward for waiting:

Task: 6342366749
Work unit: 2801396702
Sent: 23 Jan 2018, 12:52:07 UTC
Reported: 24 Jan 2018, 9:08:50 UTC
Status: Completed and validated
Run time (sec): 3,178.04
CPU time (sec): 3,170.55
Credit: 27.21
Application: SETI@home v8, Anonymous platform (CPU)

Makes a fella proud to be a soldier, don't it (with thanks to Tom Lehrer).
ID: 1914774
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11738
Credit: 111,707,009
RAC: 48,593
United Kingdom
Message 1914775 - Posted: 24 Jan 2018, 9:48:30 UTC - in response to Message 1914771.  

Heh, one Arecibo file is being split/splat/splut :-)
It's a fresh one 15ja18aa
That'll be the one that fell down the back of the sofa.
ID: 1914775
Profile KWSN Ekky Ekky Ekky Project Donor
Joined: 25 May 99
Posts: 943
Credit: 27,747,836
RAC: 31,900
United Kingdom
Message 1914779 - Posted: 24 Jan 2018, 10:29:16 UTC
Last modified: 24 Jan 2018, 10:29:36 UTC

Having raised every possible criterion to the maximum, I still only ever get enough work to last for less than a single day on my main machine. The other two can keep going OK, being rather slower. The consequence is that these days, during what has become a "normal" weekly outage, I run out of work. Is there any way of persuading the gnomes of Berkeley that I can easily handle much more than they are giving me?

ID: 1914779
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9359
Credit: 120,216,103
RAC: 46,928
Australia
Message 1914781 - Posted: 24 Jan 2018, 10:34:04 UTC - in response to Message 1914779.  

Is there any way of persuading the gnomes of Berkeley that I can easily handle much more than they are giving me?

No, because the 100 WU limits are there to stop the servers from falling over completely. There have been suggestions to make it possible to increase the limits to some degree, but it would never be enough to see even reasonably fast systems through the weekly outage without running out of work, let alone the high-end ones.
And due to the current server performance issues (which may have been fixed during this outage; we'll have to wait and see), the post-outage recoveries have been more protracted than usual.
Grant
Darwin NT
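
To put a rough number on why a 100 WU limit cannot carry a fast machine through a long outage, here is a back-of-the-envelope sketch. The 100-task limit is from the post above, and the per-task time borrows the 3,178.04-second figure from the task result posted earlier in this thread (assumed here to be run time on one CPU core of a different host). The count of 8 tasks running at once is purely hypothetical, as is the assumption that the limit covers the whole CPU queue.

```python
def hours_of_work(task_limit: int, seconds_per_task: float, parallel_tasks: int) -> float:
    """Estimate how long a full cache lasts, in hours.

    Assumes every task takes the same time and that `parallel_tasks`
    tasks run concurrently until the cache is empty.
    """
    if parallel_tasks <= 0:
        raise ValueError("parallel_tasks must be positive")
    total_seconds = task_limit * seconds_per_task
    return total_seconds / parallel_tasks / 3600


# 100 tasks (limit from the post), about 3,178 s each (figure from the
# thread), 8 tasks at a time (hypothetical host).
cpu_hours = hours_of_work(task_limit=100, seconds_per_task=3178.04, parallel_tasks=8)
print(f"A full CPU cache lasts about {cpu_hours:.1f} hours")
```

Under those assumptions the cache lasts around 11 hours, which is consistent with the complaint that a main machine holds less than a day of work. A GPU, finishing tasks far faster, would empty its queue sooner still.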
ID: 1914781
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2995
Credit: 57,798,720
RAC: 83,355
Australia
Message 1914787 - Posted: 24 Jan 2018, 10:50:56 UTC - in response to Message 1914745.  

Longer the party, the worse the hangover ...

Yeah, but even given the length of this last outage, this is a rockier than usual recovery.
If the failed drives were part of a RAID array (highly likely), then that's going to take ages to rebuild, and really hit on performance while it occurs.


EDIT- AP ready-to-send up to 900, but in-progress is still falling.
Looks like they're being split, but nothing is being sent out yet...
I haven't picked up anything since that one off WU- usually that doesn't happen till the ready-to-send buffer is empty & the splitters still haven't fired up.

EDIT- C2D has picked up 2 downloads, but they just sit there for 20sec or so & then go in to project backoff.


. . SETI is not well ....

Stephen

:(
ID: 1914787
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2995
Credit: 57,798,720
RAC: 83,355
Australia
Message 1914788 - Posted: 24 Jan 2018, 10:52:59 UTC - in response to Message 1914752.  

I got 1 gpu and cpu now out of work


. . I have had a handful of tasks for the GPU, which is far fewer than I am returning, but nothing yet for the CPU :(

Stephen

:(
ID: 1914788
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2995
Credit: 57,798,720
RAC: 83,355
Australia
Message 1914790 - Posted: 24 Jan 2018, 10:57:00 UTC - in response to Message 1914771.  

Heh, one Arecibo file is being split/splat/splut :-)
It's a fresh one 15ja18aa


. . I don't think I have ever seen a tape that fresh before.

Stephen

:)
ID: 1914790
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1062
Credit: 98,391,837
RAC: 87,321
United States
Message 1914809 - Posted: 24 Jan 2018, 12:04:27 UTC

Looks like something threw a wobblie. Saw this about this time yesterday as well.
From the SSP:

Database/file status
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 602
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 604
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 606
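
The warnings above all report the same thing: a formatting function was handed text where it expected a number. What the status page actually passed in is not shown, so the cause is a guess; an empty or non-numeric result from a failed lookup would produce exactly this kind of message. The sketch below illustrates the general defensive pattern in Python rather than PHP, and `format_count` is a hypothetical helper, not anything from the project's code.

```python
def format_count(raw) -> str:
    """Format a status value with thousands separators, tolerating text input.

    Numeric strings are converted first; anything that cannot be read as
    a number (empty string, None, stray text) is shown as "n/a".
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return "n/a"
    return f"{value:,.0f}"


print(format_count("1010000"))  # numeric text is converted, then formatted
print(format_count(666000))     # real numbers pass straight through
print(format_count(""))         # a missing value no longer triggers an error
```

Converting and validating the value before formatting keeps the page readable when a backend query comes back empty, instead of leaking warnings and file paths into the output.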
ID: 1914809
Profile KWSN Ekky Ekky Ekky Project Donor
Joined: 25 May 99
Posts: 943
Credit: 27,747,836
RAC: 31,900
United Kingdom
Message 1914828 - Posted: 24 Jan 2018, 14:11:22 UTC - in response to Message 1914781.  

Ho hum, many thanks anyway :(
Is there any way of persuading the gnomes of Berkeley that I can easily handle much more than they are giving me?

No, because the 100 WU limits are there to stop the servers from falling over completely. There have been suggestions to make it possible to increase the limits to some degree, but it would never be enough to see even reasonably fast systems through the weekly outage without running out of work, let alone the high-end ones.
And due to the current server performance issues (which may have been fixed during this outage; we'll have to wait and see), the post-outage recoveries have been more protracted than usual.
ID: 1914828
Profile JaundicedEye Project Donor
Joined: 14 Mar 12
Posts: 3657
Credit: 28,692,045
RAC: 15,995
United States
Message 1914847 - Posted: 24 Jan 2018, 15:24:24 UTC - in response to Message 1914735.  

And a 6 pack

Speaking of which, Today, January 24th, is the 83rd birthday of......The Beer Can.

http://greenmon.com/first_beer_cans.htm



Drink a toast with your favorite brew.....

"Sour Grapes make a bitter Whine." <(0)>
ID: 1914847
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 3580
Credit: 213,264,901
RAC: 294,706
United States
Message 1914861 - Posted: 24 Jan 2018, 17:45:14 UTC - in response to Message 1914809.  

Looks like something threw a wobblie. Saw this about this time yesterday as well.
From the SSP:

Database/file status
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 602
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 604
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 606

I was hoping that the /carolyn disk was one of the ones they said they replaced. I am still seeing the /carolyn disk error on my Donations page from the special project fund cycle. From your post . . . . I guess not.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1914861
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2995
Credit: 57,798,720
RAC: 83,355
Australia
Message 1914931 - Posted: 24 Jan 2018, 21:30:33 UTC - in response to Message 1914847.  

And a 6 pack

Speaking of which, Today, January 24th, is the 83rd birthday of......The Beer Can.

http://greenmon.com/first_beer_cans.htm



Drink a toast with your favorite brew.....


. . Cheers! :)

Stephen

:)
ID: 1914931
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 3580
Credit: 213,264,901
RAC: 294,706
United States
Message 1914973 - Posted: 25 Jan 2018, 0:02:26 UTC

I am having an issue with my account now. Once I go to https://setiathome.berkeley.edu/home.php I can no longer access any of the elements on the page. None of the menu choices work when clicked on. My friends list has disappeared. The disk error message, which has been present since November but which I previously only saw when I selected my Donation history, is now present on my Home account page. The error message has changed since this morning, after I commented on it in another post.

OK, they must be playing. The previous error message has reverted back to what it was. I have access again to the forums and my friends have reappeared.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1914973


 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.