Panic Mode On (107) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897580 - Posted: 26 Oct 2017, 16:24:38 UTC
Last modified: 26 Oct 2017, 16:36:39 UTC

Great, getting "internal server error failure" when requesting tasks after waking the server up with a kick.
[Edit] Got the same message on another machine after kicking the servers. The last machine got ONE task.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897580 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1897612 - Posted: 26 Oct 2017, 18:37:05 UTC - in response to Message 1897580.  
Last modified: 26 Oct 2017, 18:39:55 UTC

Dang! I guess I'm going to have to keep an eye on my boxes today. I just discovered that my #1 cruncher has been out of GPU work for about an hour and a half. Moved everything left on the CPUs, except for Arecibo VLARs, over to the GPUs (66 tasks). That'll tide the GPUs over for a little while, at least.

EDIT: My #2 and #3 boxes are down just a bit, but nothing out of the ordinary if there's an Arecibo VLAR storm. Only the #1 box has taken the big hit.
ID: 1897612 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1897614 - Posted: 26 Oct 2017, 18:46:26 UTC

I am getting "Project has no tasks available" on every work request on both of my crunchers, and they are each now down to about half of their max number of tasks. But the server page shows about 700K available, and a reasonable creation rate*.
Is anybody at the datacenter to kick these machines?

*Whoops, rate is ~1/sec, WAY too low.
ID: 1897614 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22526
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1897620 - Posted: 26 Oct 2017, 20:08:49 UTC

Ready to send should be sitting around 600k, so if that's hit 700k then it's OK not to be producing any more tasks.
What is more concerning is that the delivery rate appears to be pathetically low; it's taking quite a few requests to get anything.
Call in the tyre kicker to do his thing.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1897620 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1897644 - Posted: 26 Oct 2017, 21:18:37 UTC - in response to Message 1897620.  

Another way of looking at that: there is work that needs doing, but the people helping the project aren't asking for that kind of work. Until people come forward to do the work that needs doing (my guess is Arecibo VLAR), it blocks the pipelines and gets in the way.
ID: 1897644 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1897647 - Posted: 26 Oct 2017, 21:27:55 UTC - in response to Message 1897644.  

Another way of looking at that: there is work that needs doing, but the people helping the project aren't asking for that kind of work. Until people come forward to do the work that needs doing (my guess is Arecibo VLAR), it blocks the pipelines and gets in the way.
Don't think it's Arecibo VLARs this time. I ended up moving everything (except running tasks, of course) from the CPU queue to the GPU queue on my #1 box, Arecibo VLARs included. That left 96 places available for anything the scheduler wanted to send to the CPU, so any Arecibo VLARs would have been welcome. Sadly, nothing arrived, and once the GPUs ran out for the final time, I just shut it down (a couple hours earlier than would normally happen on a weekday afternoon). Something at the server end definitely seems to be constipated today.
ID: 1897647 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897651 - Posted: 26 Oct 2017, 21:52:06 UTC - in response to Message 1897647.  

[quote]... Arecibo VLARs included. That left 96 places available for anything the scheduler wanted to send to the CPU, so any Arecibo VLARs would have been welcome. Sadly, nothing arrived, and once the GPUs ran out for the final time, I just shut it down (a couple hours earlier than would normally happen on a weekday afternoon). Something at the server end definitely seems to be constipated today.[/quote]


. . Yep, I awoke to find empty machines twiddling their thumbs ...

Stephen

:(
ID: 1897651 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1897652 - Posted: 26 Oct 2017, 21:57:28 UTC - in response to Message 1897647.  

Don't think it's Arecibo VLARs this time.
Really? I've just kicked a CPU-only cruncher, and got 53 new tasks at the first time of asking. Now I have to re-balance the cache with other projects...

Sorry, I've had a particularly bad intrusion of real life this week (up to and including police interaction), and I'm feeling in a particularly forensic state of mind this evening.
ID: 1897652 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1897660 - Posted: 26 Oct 2017, 22:20:20 UTC - in response to Message 1897652.  

Don't think it's Arecibo VLARs this time.
Really? I've just kicked a CPU-only cruncher, and got 53 new tasks at the first time of asking. Now I have to re-balance the cache with other projects...
Heh, heh. Mamma Berkeley likes you best! ;^)

The queues on my other 2 Linux boxes are steadily draining but should survive until their normal shutdown in about 45 minutes. Even so, I just went ahead and moved over 50 CPU tasks to the GPUs on one of them, to see if it might goose the scheduler in some way. No joy, though. My Win Vista box got 8 new tasks about 25 minutes ago, but that's been the only recent burp directed this way.
ID: 1897660 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897661 - Posted: 26 Oct 2017, 22:24:16 UTC - in response to Message 1897652.  

Don't think it's Arecibo VLARs this time.
Really? I've just kicked a CPU-only cruncher, and got 53 new tasks at the first time of asking. Now I have to re-balance the cache with other projects...

Sorry, I've had a particularly bad intrusion of real life this week (up to and including police interaction), and I'm feeling in a particularly forensic state of mind this evening.


. . Hi Richard

. . Even if the problem is a glut of VLAR work, the fact that you had to kick the servers for a CPU-only rig does not bode well. It should be overflowing, with the servers trying to give it work.

. . On the second issue, I am sorry to hear that. RL has a way of intruding ... :(

Stephen

:(
ID: 1897661 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1897664 - Posted: 26 Oct 2017, 22:44:22 UTC - in response to Message 1897661.  

By 'kick', I meant preventing other projects requesting work, and increasing the cache request fourfold - just to see what happened. I got what I asked for.

About to try the same for my GPU crunchers, now I've worked through the glut of VHAR shorties.
ID: 1897664 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897674 - Posted: 26 Oct 2017, 23:38:14 UTC

It's possible the server's constipation might have abated. I've received new work across all machines and am closing back in on full caches.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897674 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897676 - Posted: 26 Oct 2017, 23:54:38 UTC - in response to Message 1897664.  

By 'kick', I meant preventing other projects requesting work, and increasing the cache request fourfold - just to see what happened. I got what I asked for.

About to try the same for my GPU crunchers, now I've worked through the glut of VHAR shorties.


. . OK, fair enough. I have had no work for a couple of hours, but I'm getting work again now.

Stephen

..
ID: 1897676 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897685 - Posted: 27 Oct 2017, 1:05:34 UTC

I wish I had a snapshot of the SSP page from Monday, before the outrage. Does the SSP look like the project has started more validators and assimilators to you? And for the first time in memory, all my machines have more validated tasks than pending tasks. I know there is always a bump after the outrage, but I don't remember it hanging on for two days past the outrage.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897685 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897822 - Posted: 28 Oct 2017, 4:43:58 UTC - in response to Message 1897685.  

Down to 10 GPU tasks on the Linux machine. "No work available" messages for the past couple of hours. Time to kick the servers upside the head. This is getting tiresome.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897822 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1897844 - Posted: 28 Oct 2017, 9:27:29 UTC - in response to Message 1897822.  

Down to 10 GPU tasks on the Linux machine. "No work available" messages for the past couple of hours. Time to kick the servers upside the head. This is getting tiresome.

28/10/2017 09:58:27 | SETI@home | Scheduler request completed: got 139 new tasks
... about 30 minutes ago, my time zone.

My current practice is to batch fetch SETI work every few hours, to cover the situation where BOINC doesn't provide the tools to set a long cache on Tuesdays (to cover maintenance) and a short cache at all other times (so that urgent GPUGrid tasks can fit their 15 hours of computing into a 24-hour window, as that project would like). Keeping Einstein's (steady and easily refilled) queue down below 50 or so also helps me to see what's going on.

So I switch my cache between 0.25 days (steady running) and maybe 1.25 days (fully cache SETI to my 200 GPU limit). It normally works first time, as it did today. I don't normally need 139 tasks, but GPUGrid had a problem overnight and their card switched to help out SETI. If you fully understand BOINC's work fetch strategy (nobody does, but the basics are fairly easy), and work within it, you can usually follow your chosen project needs.
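
[Editor's sketch] The post does not say how the cache is actually switched, so the following is only one possible way to script the two settings described above (0.25 days for steady running, 1.25 days to fill the SETI cache). It assumes the BOINC client reads local preference overrides from a file named global_prefs_override.xml using the tags work_buf_min_days and work_buf_additional_days; the function name build_override and the constants STEADY and FULL are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET


def build_override(min_days: float, additional_days: float = 0.0) -> str:
    """Return the text of a preferences override holding the two cache settings.

    Assumption: the client accepts a <global_preferences> root element with
    <work_buf_min_days> and <work_buf_additional_days> children.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:.2f}"
    return ET.tostring(root, encoding="unicode")


# The two cache sizes mentioned in the post.
STEADY = build_override(0.25)  # normal running, leaves room for short-deadline work
FULL = build_override(1.25)    # batch fetch, e.g. ahead of Tuesday maintenance

if __name__ == "__main__":
    # Print rather than write: where the file lives depends on the installation.
    print(STEADY)
    print(FULL)
```

Whichever text is chosen would still have to be saved into the client's data directory, and the client told to re-read its preferences, before the new cache size takes effect.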
ID: 1897844 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897888 - Posted: 28 Oct 2017, 17:13:21 UTC - in response to Message 1897844.  

Not sure what your reply has to do with the project refusing to send my machine work when it asks for it, even though the project has tons available. The machine only crunches SETI, so no other project is interfering. SETI repeatedly fails to keep my machine's project-limited allotment of tasks topped up, and the machine runs out of work. I have to use the ghost task recovery protocol to wake the servers up to my machine's need for work. That would be needless if the servers would just feed it work when it asks for it every 305 seconds.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897888 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1897909 - Posted: 28 Oct 2017, 18:03:30 UTC - in response to Message 1897888.  

Keith? Did you badmouth the servers?? You know how sensitive they are.... ;)
ID: 1897909 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897915 - Posted: 28 Oct 2017, 18:30:43 UTC - in response to Message 1897909.  

LOL, no I didn't ..... at least recently. I always held the servers in high respect up until last December, when they decided to see my machines as "second-class citizens" for some reason.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897915 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897970 - Posted: 28 Oct 2017, 22:29:19 UTC - in response to Message 1897915.  

LOL, no I didn't ..... at least recently. I always held the servers in high respect up until last December, when they decided to see my machines as "second-class citizens" for some reason.


. . That was the "upgrade" at Berkeley. They became flaky ever after.

Stephen

:(
ID: 1897970 · Report as offensive


 