Panic Mode On (108) Server Problems?

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1898222 - Posted: 30 Oct 2017, 5:31:49 UTC

While the RTS (ready-to-send) count says there are 600,000-plus tasks available, the GPUs are getting nothing.

This and much more in the next thread in the series......

(I can't complain as I have plenty of work for my two 1070s, but it is all Collatz)

ID: 1898222
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1898224 - Posted: 30 Oct 2017, 5:57:32 UTC
Last modified: 30 Oct 2017, 6:01:20 UTC

My theory is there must be around 600k Arecibo VLARs in the RTS. At that count level the creation rate drops off, meaning very few tasks are being sent. The extra 100,000 were sent out quickly, which would imply they weren't VLARs. Solution: temporarily raise the RTS to 1 million and hope most of the new additions aren't VLARs. Once the VLARs drop off, lower the RTS to normal.
Problem solved...for now.
ID: 1898224
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1898229 - Posted: 30 Oct 2017, 6:58:10 UTC - in response to Message 1898224.  

. . Well, that theory hasn't been borne out. I have re-configured (changed location) this rig to take CPU work, since there is no GPU work, but still nada, zip, zilch! No flood of Arecibo VLAR tasks that I was hoping for. I can't get work by begging for it ...

Stephen

:(
ID: 1898229
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1898230 - Posted: 30 Oct 2017, 7:08:27 UTC - in response to Message 1898229.  
Last modified: 30 Oct 2017, 7:41:38 UTC

As usual, what works for most people doesn't seem to work for you.
https://setiathome.berkeley.edu/results.php?hostid=8097309
All I did was increase the cache setting a little and I instantly received 30 VLARs on the next contact.
If I needed GPU tasks I would simply reassign them to the GPUs. Arecibo VLARs run about the same as a BLC5 on the Special App.
Watch as the first two appear as finished in about 13 minutes...

Hmmm, looks as though my clanged-together Maxwell zi3xs3 is a little faster on the Arecibo VLARs than zi3v:
29ja07ad.16591.13571.10.37.63.vlar_1 Run time = 10 min 31 sec
29ja07ad.16591.13571.10.37.69.vlar_1 Run time = 10 min 30 sec

Look at that. I downloaded 30 VLARs and the Server rewarded me with some real GPU tasks for that machine. Nice.
ID: 1898230
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1898239 - Posted: 30 Oct 2017, 8:24:11 UTC

I never get affected by this, but tonight for some reason I was.
I changed my profile ("school" vs. "home") twice and updated, and am now refilling.
:shrugs:
ID: 1898239
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1898242 - Posted: 30 Oct 2017, 9:35:25 UTC - in response to Message 1898241.  

"If no work for selected applications is available, accept work from other applications?: Yes, does not work any longer.

That's basically all I'm toggling when I swap from Home to School. I did have to do it a couple times...
ID: 1898242
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1898247 - Posted: 30 Oct 2017, 10:51:04 UTC

That's interesting. The SSP (server status page) says that the number 'ready to send' has dropped by 400K+ this morning (over the last five hours), and the number of tasks in progress has gone up by about the same amount. I'd say somebody is getting work - I got over 100 of them.

I don't, as yet, have any explanation for that. But an explanation is what is needed first, before you can ask somebody to break into their other busy work schedules and fix it.
ID: 1898247
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1898250 - Posted: 30 Oct 2017, 11:28:59 UTC - in response to Message 1898230.  

. . When I eventually got some work for the CPU I started getting work for the GPU as well. But after that the GPU filled right up so I suspect that was when the problem was sorted out. But ironically the first batch of 50 tasks for CPU had not a single Arecibo VLAR. Mind you, after that there were a whole lot of them :(

Stephen

<shrug>
ID: 1898250
dwhirl17
Joined: 19 Feb 17
Posts: 3
Credit: 24,532,548
RAC: 1
United States
Message 1898283 - Posted: 30 Oct 2017, 16:36:13 UTC

Just to put in my two cents as well, my 1060s ran out of work last night around 9:00 pm EST. I tried a couple of manual updates and a restart - no change. Tasks started trickling out at around 10:30 pm EST and were back to normal this morning.
Some info on the situation/issue would be appreciated, as would knowing whether there is anything we can do to resolve it from the host side.
Regards, Doug.
ID: 1898283
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1898285 - Posted: 30 Oct 2017, 16:46:26 UTC

I would say a lot of people were affected. The dropoff in tasks in progress was real and plotted on the Haveland graphs. That parameter started climbing fast as soon as someone got to the lab this morning and fixed the issue, whatever it was. Yes, it would be nice to get an explanation of what happened. I see that the RTS buffer has fallen all the way down to the 200k level from the 600-700k level it was at all day yesterday, when nobody was getting any work on request.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1898285
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1898453 - Posted: 1 Nov 2017, 0:57:16 UTC

Back to not being able to get any work after the outage because of "internal server error" messages. I've been dry on GPU tasks on the Linux cruncher since the project came back online. I wonder if it has anything to do with the number of tasks trying to be reported. I played with that cc_config parameter last week, and dropping the maximum reported to 40 didn't have any effect. Wonder if I should try again or just wait it out.
ID: 1898453
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1898454 - Posted: 1 Nov 2017, 1:14:18 UTC - in response to Message 1898453.  

All my machines have reloaded, looking good now.
The big question is, WTH is a blc24, https://setiathome.berkeley.edu/show_server_status.php ? Hopefully it isn't 5 times worse than a blc05? Or is it just a blc2.4?
I think we are about to find out.
ID: 1898454
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1898456 - Posted: 1 Nov 2017, 1:27:33 UTC - in response to Message 1898454.  

Haha. Hopefully not the first you mentioned. I think it is just the catalog number in a prescribed search the BLC staff has developed. I don't think it has anything to do with the Hipparcos star catalog number. Maybe a map catalog number.
ID: 1898456
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1898457 - Posted: 1 Nov 2017, 1:36:13 UTC - in response to Message 1898454.  
Last modified: 1 Nov 2017, 1:46:32 UTC

. . The only theory I can offer is that it is a progression. GBT tasks were originally blc followed by 1-7 (I didn't actually see any 0 tasks but they may have existed, nor did I see any 8s, so I'm not sure at which point the series starts, 0-7 or 1-8??). Then they revised them, adding an extra digit, and they became 01-07, though I have yet to see anything outside 02-05. The current batch of blc04s are remarkable in that, while still VLAR tasks, they run as quickly as Arecibo tasks on GPUs and yet are even faster than blc02s on CPUs. Perhaps then this new number series of blc24 is an identifier for this new variation? As a reference, I believe the numbers correspond to the recorder channels at Green Bank. Interesting though that there was no 1x series. There had been talk of doubling the number of recorders, so perhaps they were to be 1x.

. . There is always the possibility it is a change of designation to identify 4 bit tasks or something completely different again :)

Stephen

??
ID: 1898457
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1898458 - Posted: 1 Nov 2017, 1:48:57 UTC - in response to Message 1898457.  

Do you have a link to this information about doubling the recorders? I must have missed the news.
ID: 1898458
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1898459 - Posted: 1 Nov 2017, 1:50:59 UTC
Last modified: 1 Nov 2017, 2:17:27 UTC

How quickly things change. Suddenly the RTS is Empty and the creation rate is in the toilet. Trying to grab the last few netted 33 out of 250.
I hope you got them while you could. I did get a couple 24s and even a 25, I'll see how they run.

Well, the 24s & 25s appear to be about the same as the old blc3s...so, not that bad.
blc25_2bit_guppi_57895_36299_KIC8462852_0002.32486.0.23.46.204.vlar_0
Run time : 4 min 40 sec
WU true angle range is : 0.008985

blc24_2bit_guppi_57895_36299_KIC8462852_0002.5027.0.24.47.174.vlar_1
Run time : 7 min 10 sec
WU true angle range is : 0.009079

Of course it depends on your machine, note the above difference between a GTX 1060 & 1050Ti.
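
For anyone who wants to compare numbers like these across machines, here is a minimal Python sketch. It is only an illustration: the field layout of the task names (`blc<NN>_<B>bit_..._<replica>`, with an optional `.vlar` marker) is an assumption inferred from the two names quoted above, not an official naming spec, and the helper names `parse_task_name` and `run_time_seconds` are made up for this example.

```python
import re


def parse_task_name(name):
    """Pull the blc number, bit depth, VLAR flag and replica suffix out of a
    task name. The layout is assumed from the names quoted in this thread."""
    base, _, replica = name.rpartition("_")
    match = re.match(r"blc(\d+)_(\d+)bit_", base)
    if match is None or not replica.isdigit():
        return None  # not a blc-style name (e.g. an Arecibo task)
    return {
        "blc": int(match.group(1)),
        "bits": int(match.group(2)),
        "vlar": base.endswith(".vlar"),
        "replica": int(replica),
    }


def run_time_seconds(text):
    """Convert a run time written as 'M min S sec' into seconds."""
    match = re.fullmatch(r"\s*(\d+) min (\d+) sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognised run time: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


if __name__ == "__main__":
    # The two tasks and run times quoted above.
    tasks = {
        "blc25_2bit_guppi_57895_36299_KIC8462852_0002.32486.0.23.46.204.vlar_0": "4 min 40 sec",
        "blc24_2bit_guppi_57895_36299_KIC8462852_0002.5027.0.24.47.174.vlar_1": "7 min 10 sec",
    }
    for name, run_time in tasks.items():
        print(parse_task_name(name), run_time_seconds(run_time), "s")
```

With the two run times above this gives 280 s and 430 s, so the slower run took roughly 1.5 times as long. Bear in mind that is one blc25 against one blc24 on different cards, so it is a rough indication rather than a like-for-like benchmark.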
ID: 1898459
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1898462 - Posted: 1 Nov 2017, 2:14:29 UTC - in response to Message 1898459.  

Doesn't help that the Haveland graphs have gone missing too. 100% packet loss.
ID: 1898462
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1898463 - Posted: 1 Nov 2017, 2:21:40 UTC - in response to Message 1898458.  
Last modified: 1 Nov 2017, 2:28:03 UTC

. . It was ages ago, some time before they changed the designator to 0x. I can't help with the link :( There was chat in one thread about what the number in the blc designator meant, and someone (might have been Zalster) posted the link to an article which gave the information about the recorders and the numbers associated with them. That article also mentioned plans to add another bank of eight recorders. So my first reaction when the designator became 0x was that it was preparation for that.

. . But I have now run about a dozen of the blc24 tasks and the run times are closest to blc05. Oh well, scratch theory a).

Stephen

:(
ID: 1898463
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1898464 - Posted: 1 Nov 2017, 2:32:50 UTC - in response to Message 1898463.  

Eric explained what it meant in the News section

https://setiathome.berkeley.edu/forum_thread.php?id=79411&postid=1778453
ID: 1898464
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1898465 - Posted: 1 Nov 2017, 2:41:24 UTC - in response to Message 1898459.  

. . Again, variations between boxes. I have run a couple of dozen through Bertie with 2 x 970s and 1 x 1050. Run times are just over 5 mins on the 970s and 8 mins on the 1050, which puts them right in the middle of blc05 territory on that machine (running 3v).

. . Oh well, please give me more of the blc04s.

. . More? You want more?

. . Yes please sir!

Stephen

:)
ID: 1898465