Panic Mode On (110) Server Problems?

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51470
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1925184 - Posted: 18 Mar 2018, 16:29:30 UTC

Hmmm....the kitties are detecting a bit of a lack of nVidia GPU work for the servers to send.
All 6 of my rigs have been slowly losing cache.
Getting some here and there.
Just a heads up from the kitty crew.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1925184
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925187 - Posted: 18 Mar 2018, 16:33:16 UTC

Woke this morning to find all machines severely down on GPU work. It appears the old problem of only Arecibo tasks coming out of the scheduler has returned, and thus there is a shortage of work sent to Nvidia machines, which only pick up a few Arecibo non-VLARs. I don't see a single BLC task in the last 4 hours in any of my machines' Event Logs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1925187
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51470
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1925192 - Posted: 18 Mar 2018, 17:05:37 UTC - in response to Message 1925187.  

That's what I was seeing. All Arecibo or nothing at all.
I did just get a hit on this rig of some 30 tasks and it was almost all BLC work.
So maybe the Arecibo splitting is almost over and caches will start to recover.

ID: 1925192
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1925199 - Posted: 18 Mar 2018, 17:57:02 UTC
Last modified: 18 Mar 2018, 17:59:22 UTC

I just received a lot of blc01 WUs.

What is strange is the GPU temperature when crunching these WUs: even though GPU usage is still at almost 98-100%, the temps have dropped about 5 C, from the normal range of 48-55 C to only 43-51 C now. Only blc01 tasks are crunching on the GPUs at this time. CPU temps are normal.
ID: 1925199
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925203 - Posted: 18 Mar 2018, 18:10:55 UTC

Yes, I have received a mix of Arecibo/Green Bank tasks now and the caches are refilling. I don't know whether we just hit a pocket of exclusively Arecibo work in the RTS buffer or whether the elves were in the lab this morning to massage the scheduler.
ID: 1925203
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925209 - Posted: 18 Mar 2018, 18:27:57 UTC

Still having issues getting full caches. Seeing a lot of "This computer has reached a limit on tasks in progress" messages. I know it has to do with the Project Preferences settings. I started getting that when I set the toggle for AP work, and now that there is no AP work, it is preventing me from getting MB work. Can't seem to get the schedulers to ignore that request, even though I have tried the settings update and Triple Update tricks.
ID: 1925209
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1925212 - Posted: 18 Mar 2018, 18:45:00 UTC
Last modified: 18 Mar 2018, 18:46:05 UTC

Try....

Run only the selected applications:
    AstroPulse v7: yes
    SETI@home v8: yes
If no work for selected applications is available, accept work from other applications? no

This works normally here.
ID: 1925212
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925217 - Posted: 18 Mar 2018, 19:41:06 UTC - in response to Message 1925212.  

That's what I normally use. When the issue crops up for me, I have to keep toggling "accept other work" off and on to wake up the schedulers, or turn off AP work.
ID: 1925217
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925218 - Posted: 18 Mar 2018, 19:44:13 UTC

I see that the Haveland site renewed its security certificate. It had expired yesterday and the site wouldn't load.

I see that others must be having issues getting work too, as the Tasks in Progress count is steadily dropping.
ID: 1925218
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1925230 - Posted: 18 Mar 2018, 20:41:34 UTC

Caches dropping again. Nothing but Arecibo work.
ID: 1925230
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51470
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1925231 - Posted: 18 Mar 2018, 20:45:00 UTC - in response to Message 1925230.  

Well, there's no more splitting as of right now.
This rig actually ran out of GPU work.

ID: 1925231
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1925239 - Posted: 18 Mar 2018, 21:44:31 UTC
Last modified: 18 Mar 2018, 21:48:12 UTC

My requests for new work end with:

Sun 18 Mar 2018 04:41:47 PM EST | SETI@home | This computer has reached a limit on tasks in progress


The same thing happened last week. It seems to be a pattern when the Arecibo tapes are done.

On my host:

Nvidia WU
Max tasks per day: 2190
Number of tasks today: 851

CPU WU
Max tasks per day: 734
Number of tasks today: 36

So the message makes no sense.
ID: 1925239
JohnDK (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1925243 - Posted: 18 Mar 2018, 21:56:42 UTC

I've seen that message many times before, not just now.
ID: 1925243
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1925244 - Posted: 18 Mar 2018, 22:01:19 UTC - in response to Message 1925243.  

> I've seen that message many times before, not just now.


Could be, but that is not normal on a host like mine that crunches thousands of WUs per day with almost no errored WUs.

The next work request picked up 54 Arecibo WUs (no BLC?); on the following one the message returned.

Something is happening on the server side.
ID: 1925244
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1925246 - Posted: 18 Mar 2018, 22:08:41 UTC - in response to Message 1925184.  

> Hmmm....the kitties are detecting a bit of a lack of nVidia GPU work for the servers to send.
> All 6 of my rigs have been slowly losing cache.
> Getting some here and there.
> Just a heads up from the kitty crew.
>
> Meow.


. . It seems to be another Arecibo VLAR logjam stuffing up the schedulers. I am getting 100% Arecibo VLARs on my CPUs and having trouble getting work for the GPUs. When I do get some, it is Arecibo work, despite the fact that the Arecibo tapes finished splitting a while ago. It seems the splitters are hung again on the Green Bank tapes, as the number of channels left to do is not decreasing very fast.

Stephen

ID: 1925246
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1925247 - Posted: 18 Mar 2018, 22:20:44 UTC - in response to Message 1925209.  

> Still having issues getting full caches. Seeing a lot of "This computer has reached a limit on tasks in progress" messages. I know it has to do with the Project Preferences settings. I started getting that when I set the toggle for AP work, and now that there is no AP work, it is preventing me from getting MB work. Can't seem to get the schedulers to ignore that request, even though I have tried the settings update and Triple Update tricks.


. . I don't think that is the culprit in this case. Because "Bertie" can no longer run the AP app, I have turned off AP in preferences, so I should not be requesting any AP work, yet I am having the same symptoms as you. GPU cache is 47/100, but the scheduler response is "you have reached your limit". I give the schedulers a kick and then I get some work, but always less than a full allotment. Ain't life grand :)

Stephen

ID: 1925247
Wiggo
Joined: 24 Jan 00
Posts: 35532
Credit: 261,360,520
RAC: 489
Australia
Message 1925248 - Posted: 18 Mar 2018, 22:24:16 UTC
Last modified: 18 Mar 2018, 22:24:47 UTC

At least you know that your CPU caches are full. ;-)

Cheers.
ID: 1925248
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14660
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1925250 - Posted: 18 Mar 2018, 22:29:03 UTC - in response to Message 1925248.  

If your CPU caches are full before your GPU caches, you're probably asking for too much work.
ID: 1925250
Wiggo
Joined: 24 Jan 00
Posts: 35532
Credit: 261,360,520
RAC: 489
Australia
Message 1925251 - Posted: 18 Mar 2018, 22:33:03 UTC

Nah, they're just burning through their GPU caches too fast Richard. :-D

Cheers.
ID: 1925251
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14660
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1925253 - Posted: 18 Mar 2018, 22:43:53 UTC - in response to Message 1925251.  

Yeah, there are both VLAR Arecibo tasks and VHAR tasks in the batch that has just finished splitting. They do mess with your cache calculations.
ID: 1925253
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.