Panic Mode On (93) Server Problems?

Message boards : Number crunching : Panic Mode On (93) Server Problems?

Previous · 1 . . . 5 · 6 · 7 · 8 · 9 · 10 · 11 . . . 24 · Next

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1615452 - Posted: 17 Dec 2014, 20:54:42 UTC - in response to Message 1615251.  


I have picked up roughly 52 VLARs. Is there any way, without running the tasks, of knowing which ones are going to exit early and become overflows? The reason I am asking is so that I can return the tasks in the fastest possible way.

Normally tasks that end in 0 or 255 overflow early; tasks ending in 1, 2, 254 and 253 may also overflow early. If the progress gets to 0.300% then they probably won't overflow early.

If you use the Shift key you can block-select a bunch of unstarted WUs, then use the Ctrl key to deselect the WUs you do want to run, then hit 'Suspend' to suspend them. Next suspend one or more of the running tasks and wait until the potential overflow task gets to 0.300%, then suspend it to try another; wait until the next WU gets to 0.300%, then repeat.
Once you've tested all the WUs you want to try, unsuspend them all; any WUs that have been started will run first, before the client returns to FIFO order.
If you have 'Leave tasks in memory while suspended?' set to Yes, you don't lose any time when you get around to crunching them fully; you just use memory until you do.

Claggy

Thanks for the tips. I tried it; I had one task ending in 255, however it did not complete early after 0.300%.
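Claggy's name-ending rule of thumb could be sketched as a quick filter. This is a hypothetical helper, not project code; it assumes the workunit's sequence number is the last dot-separated field of the task name, before the "_N" replication suffix:

```python
# Hypothetical filter for the overflow rule of thumb above. Assumes
# (not documented behaviour) that a task name looks like
# "26au14ab.31108.876.10.255_1", with the sequence number as the last
# dot-separated field before the "_N" replica suffix.
LIKELY_OVERFLOW_ENDINGS = {"0", "1", "2", "253", "254", "255"}

def likely_overflow(task_name):
    """True if the name ends in a number flagged as overflow-prone."""
    wu_name = task_name.rsplit("_", 1)[0]   # drop the "_N" replica suffix
    return wu_name.rsplit(".", 1)[-1] in LIKELY_OVERFLOW_ENDINGS

tasks = ["26au14ab.31108.876.10.255_1", "26au14ab.31108.876.10.37_0"]
suspects = [t for t in tasks if likely_overflow(t)]   # only the .255 task
```

The suspects would then be the ones worth suspend-testing to 0.300% first.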
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1615596 - Posted: 18 Dec 2014, 1:25:22 UTC - in response to Message 1615452.  

Looks like the AP splitters are running again :).
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11358
Credit: 29,581,041
RAC: 66
United States
Message 1615628 - Posted: 18 Dec 2014, 3:01:43 UTC

I just got a fresh AP on the machine which was down to MB. Most cool.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1615662 - Posted: 18 Dec 2014, 7:28:50 UTC

Must be something special about 26au14ab, there are now 4 AP splitters working on it.
Grant
Darwin NT
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1615673 - Posted: 18 Dec 2014, 8:23:02 UTC - in response to Message 1615150.  

The problem with the profile images and avatars seems to have solved itself. Magically. But thank you Matt for checking, nonetheless. :)
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1615782 - Posted: 18 Dec 2014, 14:44:01 UTC

Waiting for APs is like waiting for molasses to flow... still no GPU work, just see2 (and those are sparse).

"Sour Grapes make a bitter Whine." <(0)>
Profile ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1615784 - Posted: 18 Dec 2014, 14:46:44 UTC
Last modified: 18 Dec 2014, 14:50:05 UTC

My main PC used to say "In progress" (400), or at least close to it.
Now it's down to 250. Is this something I need to worry about?
All CPU work units say vlar, if that means anything...
I should have gotten one AP but can't find it... mmmm
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1615831 - Posted: 18 Dec 2014, 17:07:21 UTC - in response to Message 1615826.  

Still pathetic delivery of AP WU's. Something is not right with the AP part of SETI.

Molasses flows like water, compared to AP delivery....

On average I'm getting about 1 per hour... across ~30 machines.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615846 - Posted: 18 Dec 2014, 17:30:06 UTC - in response to Message 1615841.  

I had to switch to SETI Beta last night just to keep the room warm. Slept like a baby with the 150+ APs in queue. Letting them drain down now. Probably go back tonight unless there is a big change on production. I also learned how to set my command lines with the text instructions that downloaded with Beta. That was a big plus.

Did you read http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2213, where the request is

Because it's a test of the stock app (stock mode), it's important to see how the app works without any additional tweaking - that's what "stock mode" is about.
"Stock mode" doesn't necessarily mean best performance; additional tweaking can improve performance, but in "stock mode" the app should:
1) work
2) interfere with the user as little as possible (to encourage users to leave the app running while they are active, instead of disabling GPU usage in such cases).

and

My feeling is that having ~100 completed and validated tasks per device (including overflows, zero-blanked and highly blanked ones) is enough for this testing.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615865 - Posted: 18 Dec 2014, 18:02:55 UTC - in response to Message 1615855.  

Thank you Richard,

I ran 80% pure stock and then tried the instructions given on the command line to see if there was any difference. I am letting the queue drain (No new tasks).

That's well within the guidelines - thank you. I hope you have some useful observations to pass back - like, did the app meet the stock target of interfering with general computer use as little as possible?
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615882 - Posted: 18 Dec 2014, 18:33:53 UTC
Last modified: 18 Dec 2014, 18:49:10 UTC

I'm still having trouble receiving work for my Mac. It seems it will just sit there, not even asking for work, even though the GPUs are idle. I tried older BOINC versions; same thing. When it does ask for work it shows:
Thu Dec 18 13:15:00 2014 | SETI@home | Sending scheduler request: To fetch work.
Thu Dec 18 13:15:00 2014 | SETI@home | Requesting new tasks for ATI
Thu Dec 18 13:15:00 2014 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Dec 18 13:15:00 2014 | SETI@home | [sched_op] ATI work request: 2604960.00 seconds; 3.00 devices
Thu Dec 18 13:15:03 2014 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Dec 18 13:15:03 2014 | SETI@home | [sched_op] Server version 705
Thu Dec 18 13:15:03 2014 | SETI@home | No tasks sent

Hitting the Update button receives the same response.

I stumbled across what appears to be a server delay that my Windows machines don't have. Under Projects / Properties of project SETI@home it says:
AMD/ATI GPU work fetch deferral interval 00:20:00

So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BTW, here is my Win 8.1 machine that isn't showing any "interval":
12/18/2014 13:28:30 | SETI@home | Sending scheduler request: To report completed tasks.
12/18/2014 13:28:30 | SETI@home | Reporting 1 completed tasks
12/18/2014 13:28:30 | SETI@home | Requesting new tasks for ATI
12/18/2014 13:28:32 | SETI@home | Scheduler request completed: got 13 new tasks...
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1615895 - Posted: 18 Dec 2014, 18:50:00 UTC - in response to Message 1615882.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed responses. This is to avoid DOSing the servers. It's the same no matter which BOINC version or OS is used.
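The incremental delay described here behaves like a capped exponential backoff. A minimal sketch, assuming a 5-minute base that doubles on each consecutive empty reply and the 24-hour cap mentioned later in the thread (the exact BOINC schedule also adds randomization, so this is illustrative only):

```python
# Hypothetical sketch of BOINC-style incremental work-fetch backoff.
# Assumptions: 5-minute base delay, doubling per failed request,
# hard cap at 24 hours. The real client also jitters these values.
BASE_DELAY = 5 * 60        # seconds (the "normal 5 minutes")
MAX_DELAY = 24 * 60 * 60   # the 24-hour cap

def next_deferral(failed_requests):
    """Seconds to wait after `failed_requests` consecutive empty replies."""
    return min(BASE_DELAY * 2 ** failed_requests, MAX_DELAY)

# 2 failures -> 20 minutes, 3 failures -> 40 minutes (the intervals TBar
# observed); from about 9 failures onward the delay pins at 24 hours.
```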
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615896 - Posted: 18 Dec 2014, 18:53:51 UTC - in response to Message 1615895.  
Last modified: 18 Dec 2014, 19:00:28 UTC


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work, you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my best machine being punished? Why is the normal 5 minutes not enough? The 5-minute delay works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers". I believe the 5-minute delay was intended to prevent DOSing, so why should a machine have to wait 40 minutes?
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1615897 - Posted: 18 Dec 2014, 19:00:51 UTC - in response to Message 1615896.  
Last modified: 18 Dec 2014, 19:07:21 UTC


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24-hour limit to the backoff, which is a lot better than DA's originally intended limit of 2 weeks.
edit: 5 min is not much if you've got 100,000 computers banging on the server.
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1615899 - Posted: 18 Dec 2014, 19:02:22 UTC - in response to Message 1615895.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

Pre-v7 BOINC doesn't have the same high delay. It's just silly.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615902 - Posted: 18 Dec 2014, 19:05:51 UTC - in response to Message 1615897.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24 hr limit to the backoff. Which is a lot better than DA original intended limit of 2 weeks.

So tell me why the CPUs don't have this Delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40 minute delay on the GPUs but Not the CPUs?
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615907 - Posted: 18 Dec 2014, 19:10:48 UTC - in response to Message 1615902.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24 hr limit to the backoff. Which is a lot better than DA original intended limit of 2 weeks.

So tell me why the CPUs don't have this Delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40 minute delay on the GPUs but Not the CPUs?

Both CPUs and GPUs have the same backoff rules, as can be seen by turning on <work_fetch_debug>. But you would need to study a stable configuration over time, to see when backoffs are applied and when they are cleared.
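For anyone wanting to try this, the flag Richard mentions goes in cc_config.xml in the BOINC data directory. A minimal sketch (sched_op_debug is optional; it produces the [sched_op] log lines quoted earlier in the thread):

```xml
<!-- cc_config.xml in the BOINC data directory. Apply with
     "Read config files" in the manager, or restart the client. -->
<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```

With work_fetch_debug on, the event log shows per-resource backoff timers for CPU and each GPU type, which is how the claim above can be checked.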
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615911 - Posted: 18 Dec 2014, 19:15:53 UTC - in response to Message 1615907.  
Last modified: 18 Dec 2014, 19:16:45 UTC


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24 hr limit to the backoff. Which is a lot better than DA original intended limit of 2 weeks.

So tell me why the CPUs don't have this Delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40 minute delay on the GPUs but Not the CPUs?

Both CPUs and GPUs have the same backoff rules, as can be seen by turning on <work_fetch_debug>. But you would need to study a stable configuration over time, to see when backoffs are applied and when they are cleared.

My CPUs are not showing any work fetch deferral interval; the GPUs are. I'll bet if I increase the cache setting I will receive CPU tasks with mixed VLARs and non-VLARs; been there, done that. But the server is refusing to send those same non-VLARs to my GPUs.
Why?
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1615915 - Posted: 18 Dec 2014, 19:23:00 UTC - in response to Message 1615902.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24 hr limit to the backoff. Which is a lot better than DA original intended limit of 2 weeks.

So tell me why the CPUs don't have this Delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40 minute delay on the GPUs but Not the CPUs?

If you look at the project properties you will find that there is a separate backoff for GPU and CPU.

BTW, I can't find the date I got DA to reduce the delays to 24 hr, but I'm fairly certain they were introduced in the mid-V6 releases. I don't know when the splitting of delays for CPU/GPU occurred.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615917 - Posted: 18 Dec 2014, 19:24:56 UTC - in response to Message 1615911.  


So, why is my out of work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again, now it's up to a 40 minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed response. This is to avoid DOSing the servers. It the same no matter which BOINC version or OS used.

So if your machine runs out of work you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my Best machine being punished? Why is the normal 5 minutes not enough? The 5 minute delay Works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24 hr limit to the backoff. Which is a lot better than DA original intended limit of 2 weeks.

So tell me why the CPUs don't have this Delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40 minute delay on the GPUs but Not the CPUs?

Both CPUs and GPUs have the same backoff rules, as can be seen by turning on <work_fetch_debug>. But you would need to study a stable configuration over time, to see when backoffs are applied and when they are cleared.

My CPUs are Not showing any Work Fetch Deferral Interval. The GPUs are. I'll bet if I increase the cache setting I will receive CPU tasks with mixed VLARS & non-VLARS, been there done that. But the server is refusing to send those same non-VLARs to my GPUs.
Why?

To be honest, I don't know. But then again, I'm not the systems analyst responsible for designing a system that distributes viable workunits across a mixed fleet of ~150,000 active computers. I simply observe that the current processing rate (returned results) is very much in line with the long term average - so the project as a whole is working as required.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.