Panic Mode On (108) Server Problems?

arkayn (Project Donor)
Volunteer tester
Joined: 14 May 99
Posts: 4184
Credit: 53,049,088
RAC: 28
United States
Message 1898222 - Posted: 30 Oct 2017, 5:31:49 UTC

While the RTS (ready-to-send) count says there are 600,000-plus tasks available, the GPUs are getting nothing.

This and much more on the next thread in the series......

(I can't complain, as I have plenty of work for my two 1070s, but it is all Collatz.)

ID: 1898222
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3806
Credit: 187,234,549
RAC: 239,614
United States
Message 1898224 - Posted: 30 Oct 2017, 5:57:32 UTC
Last modified: 30 Oct 2017, 6:01:20 UTC

My theory is that there must be around 600k Arecibo VLARs in the RTS. At that count the creation rate drops off, meaning very few tasks are being sent. The extra 100,000 were sent out quickly, which would imply they weren't VLARs. Solution: temporarily raise the RTS to 1 million and hope most of the new additions aren't VLARs. Once the VLARs drop off, lower the RTS to normal.
Problem solved...for now.
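
A toy model of the theory above, assuming (none of this is confirmed in the thread) that the splitters stop creating work once the ready-to-send queue reaches a target size and that Arecibo VLARs are not handed out to GPU requests. The function names and the tiny queue sizes are hypothetical and chosen only for illustration:

```python
from collections import deque


def refill(queue, target, incoming):
    """Add new tasks only while the queue is below the target size (assumed behaviour)."""
    while len(queue) < target and incoming:
        queue.append(incoming.popleft())


def send_to_gpu(queue, wanted):
    """Hand out up to `wanted` non-VLAR tasks; VLARs stay in the queue (assumed rule)."""
    sent = []
    kept = deque()
    while queue:
        task = queue.popleft()
        if task != "vlar" and len(sent) < wanted:
            sent.append(task)
        else:
            kept.append(task)
    queue.extend(kept)
    return sent


# A queue already full of VLARs: nothing new is created, nothing goes to GPUs.
rts = deque(["vlar"] * 6)
new_work = deque(["normal", "normal", "vlar"])
refill(rts, 6, new_work)
blocked = send_to_gpu(rts, 2)

# Raising the target lets new tasks in, and the non-VLARs go out at once.
refill(rts, 10, new_work)
unblocked = send_to_gpu(rts, 2)
```

In this sketch `blocked` is empty while `unblocked` holds the two non-VLAR tasks, which is the effect the proposed temporary increase of the RTS target is hoping for.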
ID: 1898224
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,698,364
RAC: 137,391
Australia
Message 1898229 - Posted: 30 Oct 2017, 6:58:10 UTC - in response to Message 1898224.  

My theory is that there must be around 600k Arecibo VLARs in the RTS. At that count the creation rate drops off, meaning very few tasks are being sent. The extra 100,000 were sent out quickly, which would imply they weren't VLARs. Solution: temporarily raise the RTS to 1 million and hope most of the new additions aren't VLARs. Once the VLARs drop off, lower the RTS to normal.
Problem solved...for now.


. . Well, that theory hasn't been borne out. I have re-configured (changed location) this rig to take CPU work, since there is no GPU work, but still nada, zip, zilch! No flood of Arecibo VLAR tasks that I was hoping for. I can't get work by begging for it ...

Stephen

:(
ID: 1898229
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3806
Credit: 187,234,549
RAC: 239,614
United States
Message 1898230 - Posted: 30 Oct 2017, 7:08:27 UTC - in response to Message 1898229.  
Last modified: 30 Oct 2017, 7:41:38 UTC

My theory is that there must be around 600k Arecibo VLARs in the RTS. At that count the creation rate drops off, meaning very few tasks are being sent. The extra 100,000 were sent out quickly, which would imply they weren't VLARs. Solution: temporarily raise the RTS to 1 million and hope most of the new additions aren't VLARs. Once the VLARs drop off, lower the RTS to normal.
Problem solved...for now.


. . Well, that theory hasn't been borne out. I have re-configured (changed location) this rig to take CPU work, since there is no GPU work, but still nada, zip, zilch! No flood of Arecibo VLAR tasks that I was hoping for. I can't get work by begging for it ...

Stephen

:(

As usual, what works for most people doesn't seem to work for you,
https://setiathome.berkeley.edu/results.php?hostid=8097309
All I did was increase the cache setting a little and I instantly received 30 VLARs on the next contact.
If I needed GPU tasks I would simply reassign them to the GPUs. Arecibo VLARs run about the same as a BLC5 on the Special App.
Watch as the first two appear as finished in about 13 minutes...

Hmmm, looks as though my clanged-together Maxwell zi3xs3 is a little faster on the Arecibo VLARs than zi3v:
29ja07ad.16591.13571.10.37.63.vlar_1 Run time = 10 min 31 sec
29ja07ad.16591.13571.10.37.69.vlar_1 Run time = 10 min 30 sec

Look at that. I downloaded 30 VLARs and the Server rewarded me with some real GPU tasks for that machine. Nice.
ID: 1898230
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1003
Credit: 91,129,881
RAC: 88,941
United States
Message 1898239 - Posted: 30 Oct 2017, 8:24:11 UTC

I never get affected by this, but tonight for some reason I did.
I changed my profile ("school" vs. "home") twice and updated, and am now refilling.
:shrugs:
ID: 1898239
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6708
Credit: 42,283,871
RAC: 11,407
Sweden
Message 1898241 - Posted: 30 Oct 2017, 9:01:11 UTC

Even the function "If no work for selected applications is available, accept work from other applications?" stopped working as it used to when they made some changes last December.
I used to have only "AstroPulse v7: yes" ticked, and "SETI@home v8: no", and then "If no work for selected applications is available, accept work from other applications?: Yes" ticked.
Then I would get SETI@home v8 tasks if there weren't any APs available.

That doesn't work any longer. With that setting I will not get any tasks at all if there aren't any APs available.
So simply put: "If no work for selected applications is available, accept work from other applications?: Yes" does not work any longer.
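
The behaviour described above, as it used to work, can be sketched in a few lines of Python. This is only a model of what the post describes, not the actual server logic; `pick_tasks` and its arguments are hypothetical names:

```python
def pick_tasks(available, selected_apps, accept_other):
    """Model of the preference as described in the post (not the real scheduler code).

    available     : dict mapping application name -> number of tasks ready
    selected_apps : set of application names ticked "yes"
    accept_other  : the "accept work from other applications?" setting
    """
    chosen = {app: n for app, n in available.items()
              if app in selected_apps and n > 0}
    if chosen or not accept_other:
        return chosen
    # Nothing for the selected applications: fall back to whatever else exists.
    return {app: n for app, n in available.items() if n > 0}


no_ap = {"AstroPulse v7": 0, "SETI@home v8": 50}
old_behaviour = pick_tasks(no_ap, {"AstroPulse v7"}, accept_other=True)
```

Under this model `old_behaviour` contains the SETI@home v8 tasks. What the post reports since last December is the equivalent of an empty result in that same situation.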
ID: 1898241
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1003
Credit: 91,129,881
RAC: 88,941
United States
Message 1898242 - Posted: 30 Oct 2017, 9:35:25 UTC - in response to Message 1898241.  

"If no work for selected applications is available, accept work from other applications?: Yes" does not work any longer.

That's basically all I'm toggling when I swap from Home to School. I did have to do it a couple times...
ID: 1898242
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11516
Credit: 106,439,295
RAC: 71,425
United Kingdom
Message 1898247 - Posted: 30 Oct 2017, 10:51:04 UTC

That's interesting. The SSP says that the number 'ready to send' has dropped by 400K+ this morning (over the last five hours), and the number of tasks in progress has gone up by about the same amount. I'd say somebody is getting work - I got over 100 of them.

I don't, as yet, have any explanation for that. But an explanation is what is needed first, before you can ask somebody to break into their other busy work schedules and fix it.
ID: 1898247
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,698,364
RAC: 137,391
Australia
Message 1898250 - Posted: 30 Oct 2017, 11:28:59 UTC - in response to Message 1898230.  

Hmmm, looks as though my clanged-together Maxwell zi3xs3 is a little faster on the Arecibo VLARs than zi3v:
29ja07ad.16591.13571.10.37.63.vlar_1 Run time = 10 min 31 sec
29ja07ad.16591.13571.10.37.69.vlar_1 Run time = 10 min 30 sec

Look at that. I downloaded 30 VLARs and the Server rewarded me with some real GPU tasks for that machine. Nice.


. . When I eventually got some work for the CPU I started getting work for the GPU as well. But after that the GPU filled right up so I suspect that was when the problem was sorted out. But ironically the first batch of 50 tasks for CPU had not a single Arecibo VLAR. Mind you, after that there were a whole lot of them :(

Stephen

<shrug>
ID: 1898250
dwhirl17
Joined: 19 Feb 17
Posts: 3
Credit: 16,595,884
RAC: 91,242
United States
Message 1898283 - Posted: 30 Oct 2017, 16:36:13 UTC

Just to put in my two cents as well: my 1060s ran out of work last night around 9:00 pm EST. I tried a couple of manual updates and a restart, with no change. Tasks started trickling out at around 10:30 pm EST and were back to normal this morning.
Some info on the situation/issue would be appreciated, and if there is anything we can do to resolve it from the host side, that would also be helpful.
Regards, Doug.
ID: 1898283
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2447
Credit: 185,478,061
RAC: 367,384
United States
Message 1898285 - Posted: 30 Oct 2017, 16:46:26 UTC

I would say a lot of people were affected. The dropoff in tasks in progress was real and is plotted on the Haveland graphs. That number has been climbing fast since someone got to the lab this morning and fixed the issue, whatever it was. Yes, it would be nice to get an explanation of what happened. I see that the RTS buffer has fallen all the way down to the 200K level from the 600-700K level it was at all day yesterday, when nobody was getting any work when requested.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898285
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2447
Credit: 185,478,061
RAC: 367,384
United States
Message 1898453 - Posted: 1 Nov 2017, 0:57:16 UTC

Back to not being able to get any work after the outage because of "internal server error" messages. I've been dry on GPU tasks on the Linux cruncher since the project came back online. I wonder if it has anything to do with the number of tasks it is trying to report. I played with that cc_config parameter last week, and dropping the maximum reported to 40 didn't have any effect. Wonder if I should try again or just wait it out.
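
The post does not name the option. Assuming it is the BOINC client's `max_tasks_reported` setting in `cc_config.xml` (an assumption; check the client configuration documentation for your version), a minimal Python sketch that generates the contents of such a file could look like this. `build_cc_config` is a hypothetical helper, and 40 is the value mentioned above:

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported):
    """Build a minimal cc_config document with one option set.

    The option name is assumed, not taken from the thread.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(40)
print(config_text)
```

The client normally has to re-read its config files (or be restarted) before a change like this takes effect.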
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898453
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3806
Credit: 187,234,549
RAC: 239,614
United States
Message 1898454 - Posted: 1 Nov 2017, 1:14:18 UTC - in response to Message 1898453.  

All my machines have reloaded, looking good now.
The big question is, WTH is a blc24, https://setiathome.berkeley.edu/show_server_status.php ? Hopefully it isn't 5 times worse than a blc05? Or is it just a blc2.4?
I think we are about to find out.
ID: 1898454
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2447
Credit: 185,478,061
RAC: 367,384
United States
Message 1898456 - Posted: 1 Nov 2017, 1:27:33 UTC - in response to Message 1898454.  

Haha. Hopefully not the first you mentioned. I think it is just the catalog number in a prescribed search the BLC staff has developed. Don't think it has anything to do with the Hipparcos star catalog number. Maybe a map catalog number.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898456
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,698,364
RAC: 137,391
Australia
Message 1898457 - Posted: 1 Nov 2017, 1:36:13 UTC - in response to Message 1898454.  
Last modified: 1 Nov 2017, 1:46:32 UTC

All my machines have reloaded, looking good now.
The big question is, WTH is a blc24, https://setiathome.berkeley.edu/show_server_status.php ? Hopefully it isn't 5 times worse than a blc05? Or is it just a blc2.4?
I think we are about to find out.


. . The only theory I can offer is that it is a progression. GBT tasks were originally blc followed by 1-7 (I didn't actually see any 0 tasks but they may have existed, nor did I see any 8s, so I'm not sure at which point the series starts, 0-7 or 1-8??). Then they revised them, adding an extra digit, and they became 01-07, though I have yet to see anything outside 02-05. The current batch of blc04s are remarkable in that, while still VLAR tasks, they run as quickly as Arecibo tasks on GPUs and yet are even faster than blc02s on CPUs. Perhaps then this new number series of blc24 is an identifier for this new variation? As a reference, I believe the numbers correspond to the recorder channels at Green Bank. Interesting though that there was no 1x series. There had been talk of doubling the number of recorders so perhaps they were to be 1x.

. . There is always the possibility it is a change of designation to identify 4 bit tasks or something completely different again :)

Stephen

??
ID: 1898457
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2447
Credit: 185,478,061
RAC: 367,384
United States
Message 1898458 - Posted: 1 Nov 2017, 1:48:57 UTC - in response to Message 1898457.  

There had been talk of doubling the number of recorders so perhaps they were to be 1x.

. . There is always the possibility it is a change of designation to identify 4 bit tasks or something completely different again :)

Stephen

??

Do you have a link to this information about doubling the recorders? I must have missed the news.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898458
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3806
Credit: 187,234,549
RAC: 239,614
United States
Message 1898459 - Posted: 1 Nov 2017, 1:50:59 UTC
Last modified: 1 Nov 2017, 2:17:27 UTC

How quickly things change. Suddenly the RTS is empty and the creation rate is in the toilet. Trying to grab the last few netted 33 out of 250.
I hope you got them while you could. I did get a couple 24s and even a 25, I'll see how they run.

Well, the 24s & 25s appear to be about the same as the old blc3s...so, not that bad.
blc25_2bit_guppi_57895_36299_KIC8462852_0002.32486.0.23.46.204.vlar_0
Run time : 4 min 40 sec
WU true angle range is : 0.008985

blc24_2bit_guppi_57895_36299_KIC8462852_0002.5027.0.24.47.174.vlar_1
Run time : 7 min 10 sec
WU true angle range is : 0.009079

Of course it depends on your machine, note the above difference between a GTX 1060 & 1050Ti.
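
For anyone wanting to compare runs like the two above, here is a small Python sketch that pulls the leading fields out of a task name and converts the run time to seconds. The meaning given to each field is an assumption based on this thread (recorder designator, bit depth, target name); the helper names are hypothetical:

```python
import re


def parse_task_name(name):
    """Split a blc-style task name into a few fields (field meanings are assumed)."""
    head = name.split(".", 1)[0]           # e.g. blc25_2bit_guppi_57895_36299_KIC8462852_0002
    parts = head.split("_")
    return {
        "recorder": parts[0],              # e.g. "blc25"
        "bits": int(parts[1].replace("bit", "")),
        "target": "_".join(parts[5:-1]),   # target names may contain underscores
        "is_vlar": ".vlar" in name,
    }


def run_seconds(text):
    """Convert a string such as '4 min 40 sec' to seconds."""
    match = re.fullmatch(r"\s*(\d+) min (\d+) sec\s*", text)
    if match is None:
        raise ValueError("unrecognised run time: " + repr(text))
    return int(match.group(1)) * 60 + int(match.group(2))


first = parse_task_name("blc25_2bit_guppi_57895_36299_KIC8462852_0002.32486.0.23.46.204.vlar_0")
second = parse_task_name("blc24_2bit_guppi_57895_36299_KIC8462852_0002.5027.0.24.47.174.vlar_1")
ratio = run_seconds("7 min 10 sec") / run_seconds("4 min 40 sec")
```

Here `ratio` comes out at roughly 1.5, the gap between the two cards on these particular tasks.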
ID: 1898459
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2447
Credit: 185,478,061
RAC: 367,384
United States
Message 1898462 - Posted: 1 Nov 2017, 2:14:29 UTC - in response to Message 1898459.  

Doesn't help that the Haveland graphs have gone missing too. 100% packet loss.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898462
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,698,364
RAC: 137,391
Australia
Message 1898463 - Posted: 1 Nov 2017, 2:21:40 UTC - in response to Message 1898458.  
Last modified: 1 Nov 2017, 2:28:03 UTC

There had been talk of doubling the number of recorders so perhaps they were to be 1x.

. . There is always the possibility it is a change of designation to identify 4 bit tasks or something completely different again :)

Stephen

??

Do you have a link to this information about doubling the recorders? I must have missed the news.


. . It was ages ago, some time before they changed the designator to 0x. I can't help with the link :( There was chat in one thread about what the number in the blc designator meant, and someone (might have been Zalster) posted the link to an article which gave the information about the recorders and the number being associated. That article also mentioned plans to add another bank of eight recorders. So my first reaction when the designator became 0x was that it was preparation for that.

. . But I have now run about a dozen of the blc24 tasks and the run times are closest to blc05. Oh well, scratch theory a).

Stephen

:(
ID: 1898463
Zalster (Project Donor)
Volunteer tester
Joined: 27 May 99
Posts: 3993
Credit: 208,963,059
RAC: 38,925
United States
Message 1898464 - Posted: 1 Nov 2017, 2:32:50 UTC - in response to Message 1898463.  

Eric explained what it meant in the News section

https://setiathome.berkeley.edu/forum_thread.php?id=79411&postid=1778453
ID: 1898464
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.