Panic Mode On (76) Server Problems?



Message boards : Number crunching : Panic Mode On (76) Server Problems?

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,063,564
RAC: 596
United Kingdom
Message 1274167 - Posted: 23 Aug 2012, 0:06:46 UTC

I accept that VLAR WUs do take longer than `normal` ones,
though the longest I have seen one take on a 7970 is 25 minutes (after I got it set up right).
I would be willing to crunch nothing but VLARs if it were possible.
I have not seen any here, though, since the latest problem was fixed.

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,957,103
RAC: 13,873
United States
Message 1274170 - Posted: 23 Aug 2012, 0:14:21 UTC - in response to Message 1274154.


Even though Fermi and later don't seem to have too many problems with them, their increased computation times (3-5 times longer than average) really throw the DCF out a lot, so I can't see the point in allowing them back ATM.

Cheers.


I don't think anyone, including myself, was advocating allowing them "back in" at the moment. It would be nice if they could be allowed by choice, like the APs.

As for screwing up the DCF to the point where it causes problems, that could probably be gotten around with a bit of programming. OTOH, if it remains that VLARs never get sent to Nvidia GPUs, I can live with that too; they'll get done eventually.
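
To put a rough number on the DCF concern, here is a toy model, not BOINC's actual code. It assumes a single duration correction factor per project that jumps up at once when a task overruns its estimate and then drifts back down a little with each task that runs on time. The function name `update_dcf`, the 0.9/0.1 weights and the 4x VLAR slowdown (picked from the 3-5x range quoted above) are all assumptions made for illustration.

```python
def update_dcf(dcf, ratio):
    """Return the new correction factor after one finished task.

    ratio = actual runtime / estimated runtime for that task.
    Assumed behaviour (illustrative only): rise immediately on an
    overrun, otherwise decay slowly towards the observed ratio.
    """
    if ratio > dcf:
        return ratio
    return 0.9 * dcf + 0.1 * ratio


# Three normal tasks that match their estimate, one VLAR that takes
# 4x the estimate, then five more normal tasks.
ratios = [1.0] * 3 + [4.0] + [1.0] * 5

dcf = 1.0
history = []
for ratio in ratios:
    dcf = update_dcf(dcf, ratio)
    history.append(round(dcf, 3))

print(history)
```

Under these assumptions one slow task inflates every runtime estimate by 4x, and five normal tasks later the factor is still well above 2.5, so the client would keep believing its cache is fuller than it really is. A per-application factor, or excluding known-slow tasks from the update, is the kind of "bit of programming" that would sidestep this.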

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,371
RAC: 257
Germany
Message 1274259 - Posted: 23 Aug 2012, 6:56:40 UTC - in response to Message 1274164.

A .vlar WU would take roughly 10.8x longer than a normal-AR WU. Wasted performance.

bill - Can't agree. The vlar will take even longer to work on a cpu than a gpu.
To me crunching a work unit faster is better. If your gpu is not busy doing any other work units, why not let it do vlars if they cause no problems.

Sure, a GPU doing a VLAR might be better than an idle GPU (depending on your point of view), but if for a CPU it pretty much doesn't matter whether it's crunching a VLAR or a normal-AR task, while the GPU is x times slower on VLARs, you are wasting performance if you send VLARs to a GPU. It's better for the project that a GPU does a few 0.44 WUs instead of 1 VLAR. And it's up to the user to set his cache high enough that his card never idles (OK, not easy with the current load on the servers, but that's another thing).
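
The arithmetic behind this argument, and behind bill's idle-GPU objection, can be sketched for a host with one GPU and one CPU core. The 10.8x slowdown is the figure claimed above; the 12 minutes per normal-AR task on the GPU and 90 minutes per task on the CPU are assumed round numbers, not measurements, and the function names are made up for this example.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(minutes_per_task):
    """Tasks one device finishes in a day when running back to back."""
    return MINUTES_PER_DAY / minutes_per_task


def host_throughput(gpu_minutes, cpu_minutes):
    """Total tasks per day for one GPU plus one CPU core.

    Pass gpu_minutes=None to model a GPU that sits idle.
    """
    gpu = tasks_per_day(gpu_minutes) if gpu_minutes is not None else 0.0
    return gpu + tasks_per_day(cpu_minutes)


GPU_NORMAL_MIN = 12.0     # assumed: normal-AR task on the GPU
CPU_TASK_MIN = 90.0       # assumed: any task on the CPU (VLAR or normal AR)
VLAR_GPU_SLOWDOWN = 10.8  # slowdown on VLAR claimed in the post above

normal_on_gpu = host_throughput(GPU_NORMAL_MIN, CPU_TASK_MIN)
vlar_on_gpu = host_throughput(GPU_NORMAL_MIN * VLAR_GPU_SLOWDOWN, CPU_TASK_MIN)
idle_gpu = host_throughput(None, CPU_TASK_MIN)

print(f"GPU on normal AR, CPU on VLAR: {normal_on_gpu:.1f} tasks/day")
print(f"GPU on VLAR, CPU on normal AR: {vlar_on_gpu:.1f} tasks/day")
print(f"GPU idle, CPU on anything:     {idle_gpu:.1f} tasks/day")
```

With these numbers both sides of the argument hold: a GPU on VLARs still beats an idle GPU, but keeping it fed with normal-AR work beats both by a wide margin.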
____________
.

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 678
Credit: 5,921,941
RAC: 4,101
New Zealand
Message 1274286 - Posted: 23 Aug 2012, 8:36:18 UTC
Last modified: 23 Aug 2012, 8:55:32 UTC

As I type, parts of the SSP are 206 hours behind. [As of 23 Aug 2012 | 8:30:04 UTC] The SSP was behind before this week's outage. How are the As of* times set?
____________

Live in NZ y not join Smile City?

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5394
Credit: 305,609,137
RAC: 319,941
Brazil
Message 1274332 - Posted: 23 Aug 2012, 10:36:35 UTC - in response to Message 1274259.

A .vlar WU would take roughly 10.8x longer than a normal-AR WU. Wasted performance.

bill - Can't agree. The vlar will take even longer to work on a cpu than a gpu.
To me crunching a work unit faster is better. If your gpu is not busy doing any other work units, why not let it do vlars if they cause no problems.

Sure, a GPU doing a VLAR might be better than an idle GPU (depending on your point of view), but if for a CPU it pretty much doesn't matter whether it's crunching a VLAR or a normal-AR task, while the GPU is x times slower on VLARs, you are wasting performance if you send VLARs to a GPU. It's better for the project that a GPU does a few 0.44 WUs instead of 1 VLAR. And it's up to the user to set his cache high enough that his card never idles (OK, not easy with the current load on the servers, but that's another thing).


In theory that's OK, but when crunching a VLAR on an NVIDIA GPU your entire system becomes difficult to use: the cursor flickers and a lot of weird problems start to appear with the video interface, so you almost lose the host for any other use until the WU is processed. Processing VLARs on an NVIDIA GPU in a non-dedicated crunching host really is a waste of resources. Let the VLARs run on the CPUs and all the others on the GPUs.

____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,481,226
RAC: 48,546
United Kingdom
Message 1274337 - Posted: 23 Aug 2012, 10:48:19 UTC - in response to Message 1274332.

In theory that's OK, but when crunching a VLAR on an NVIDIA GPU your entire system becomes difficult to use: the cursor flickers and a lot of weird problems start to appear with the video interface, so you almost lose the host for any other use until the WU is processed. Processing VLARs on an NVIDIA GPU in a non-dedicated crunching host really is a waste of resources. Let the VLARs run on the CPUs and all the others on the GPUs.

Exactly.

From the project's point of view, they couldn't care less if the tasks run fast or slow. We're probably (collectively) supplying more processing power than they want or need at the moment - provided the work comes back, it's been processed accurately, and it doesn't hang around for too long (i.e. weeks), that'll be fine for them.

The screen lag, and not being able to use the machine for anything else, is the big no-no for a volunteer project. If that happens, 99% of volunteers just uninstall BOINC and walk away, cursing. Not only is their resource lost to SETI, it's lost to all the other BOINC projects too - and the person behind the computer is lost to science and scientific research. That's what SETI can't be seen to do.

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,371
RAC: 257
Germany
Message 1274368 - Posted: 23 Aug 2012, 12:32:50 UTC - in response to Message 1274337.

From the project's point of view, they couldn't care less if the tasks run fast or slow.

Well, a project like that would never get me to crunch for them. Such thinking from the project staff was one reason why I didn't join Milkyway earlier (in the days when they had highly inefficient applications themselves, didn't accept anonymous platform, and all the other things they did back then). I want my resources to be used as efficiently as possible by the project (hence I use opt apps). If I have more than my main project can use, there are other projects that are happy to get what's left over.
____________
.

tullio (Project donor)
Joined: 9 Apr 04
Posts: 3754
Credit: 388,028
RAC: 123
Italy
Message 1274388 - Posted: 23 Aug 2012, 13:51:08 UTC

I am crunching both a SETI vlar and a BOINC_VM Virtual Machine from CERN on this CPU, an AMD APU E-450 at 1.67 GHz, which is not a speed champion but uses only 18 W, so it can take the heat wave we have in Italy (33 C now, no AC) while I had to shut down the SUN WS with its fans going full speed.
Tullio
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5394
Credit: 305,609,137
RAC: 319,941
Brazil
Message 1274390 - Posted: 23 Aug 2012, 13:56:14 UTC - in response to Message 1274368.
Last modified: 23 Aug 2012, 13:58:09 UTC

From the project's point of view, they couldn't care less if the tasks run fast or slow.

Well, a project like that would never get me to crunch for them. Such thinking from the project staff was one reason why I didn't join Milkyway earlier (in the days when they had highly inefficient applications themselves, didn't accept anonymous platform, and all the other things they did back then). I want my resources to be used as efficiently as possible by the project (hence I use opt apps). If I have more than my main project can use, there are other projects that are happy to get what's left over.


Please don't misunderstand what Richard and I are saying. SETI is a project that is expected to run for DECADES before any success can be expected (unless of course our little green men give us a hand), so "fast or slow" is relative to that. The difference between a 12 min WU (GPU) and a 1 1/2 hour one (CPU) matters little to a project lasting 50 years or maybe more.

Any help is wanted; you just don't need to worry about that particular point. We are only warning you because the video lag could be a serious problem if the host is not a crunching-only one.
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5394
Credit: 305,609,137
RAC: 319,941
Brazil
Message 1274403 - Posted: 23 Aug 2012, 14:23:12 UTC - in response to Message 1274396.

I agree with you Mark, this was an unintended error and is fixed now.

For now... "VLARs to NVIDIA GPUs - Never Again"... unless someone finds a way to bypass the problem, maybe with some black magic... A task for our Master Guru Jason and his team...
____________

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,371
RAC: 257
Germany
Message 1274431 - Posted: 23 Aug 2012, 15:38:48 UTC - in response to Message 1274390.

Please don't misunderstand what Richard and I are saying. SETI is a project that is expected to run for DECADES before any success can be expected (unless of course our little green men give us a hand), so "fast or slow" is relative to that. The difference between a 12 min WU (GPU) and a 1 1/2 hour one (CPU) matters little to a project lasting 50 years or maybe more.

I wasn't talking about the progress of SETI@Home; as Richard pointed out, we donate more resources to them than they can use ATM. I was talking about efficient usage of our resources by all projects, and SETI is one that can easily use nVidia GPUs more efficiently by not assigning VLAR tasks to them. The more the projects care about efficient usage of our resources, the more science we get done, not necessarily for SETI (since they just can't send out more WUs than they are already doing now) but for other projects out there.
____________
.

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,957,103
RAC: 13,873
United States
Message 1274434 - Posted: 23 Aug 2012, 15:53:15 UTC - in response to Message 1274259.

A .vlar WU would take roughly 10.8x longer than a normal-AR WU. Wasted performance.

bill - Can't agree. The vlar will take even longer to work on a cpu than a gpu.
To me crunching a work unit faster is better. If your gpu is not busy doing any other work units, why not let it do vlars if they cause no problems.


Sure, a GPU doing a VLAR might be better than an idle GPU (depending on your point of view),

bill - You would have to prove that an idle gpu is better than a working gpu.

but if for a CPU it pretty much doesn't matter whether it's crunching a VLAR or a normal-AR task, while the GPU is x times slower on VLARs, you are wasting performance if you send VLARs to a GPU.

bill - Not if the gpu is sitting idle.

It's better for the project that a GPU does a few 0.44 WUs instead of 1 VLAR.

bill - So you missed the part about idle gpu.

And it's up to the user to set his cache high enough that his card never idles (OK, not easy with the current load on the servers, but that's another thing).

bill - There's that pesky idle gpu again.


Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8631
Credit: 51,481,226
RAC: 48,546
United Kingdom
Message 1274452 - Posted: 23 Aug 2012, 16:38:20 UTC - in response to Message 1274431.

Please don't misunderstand what Richard and I are saying. SETI is a project that is expected to run for DECADES before any success can be expected (unless of course our little green men give us a hand), so "fast or slow" is relative to that. The difference between a 12 min WU (GPU) and a 1 1/2 hour one (CPU) matters little to a project lasting 50 years or maybe more.

I wasn't talking about the progress of SETI@Home; as Richard pointed out, we donate more resources to them than they can use ATM. I was talking about efficient usage of our resources by all projects, and SETI is one that can easily use nVidia GPUs more efficiently by not assigning VLAR tasks to them. The more the projects care about efficient usage of our resources, the more science we get done, not necessarily for SETI (since they just can't send out more WUs than they are already doing now) but for other projects out there.

I don't think we're disagreeing here. Fortunately, not sending VLARs to NVidia is a win-win problem. The solution satisfies both the efficiency and the volunteer satisfaction criteria.

I was merely saying that, from the project's point of view, not alienating volunteers is the stronger argument. I don't think we'd have got the "don't send" solution coded in the first place, if we'd had to argue on the efficiency criterion alone.

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,371
RAC: 257
Germany
Message 1274464 - Posted: 23 Aug 2012, 17:11:40 UTC - in response to Message 1274434.
Last modified: 23 Aug 2012, 17:18:43 UTC

Sure, a GPU doing a VLAR might be better than an idle GPU (depending on your point of view),

bill - You would have to prove that an idle gpu is better than a working gpu.

The GPU does not need to be idle; it's just a matter of BOINC configuration: a large enough cache and, if that does not help, a backup project (probably necessary anyway with the current load on S@H's internet connection). Your idle GPU issue is something that the user can fix. Also, such a GPU might do more for the project if it's idle for a while and then gets suitable WUs to work on again. Blocking it for hours or even days with a bunch of VLARs might indeed lead to less work done at the end of the day/month/year/whatever. So yes, a GPU that is idle for a while might be better, unless the servers would send just 1 VLAR when they have nothing else and the GPU is idle.
____________
.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5864
Credit: 60,483,153
RAC: 47,393
Australia
Message 1274497 - Posted: 23 Aug 2012, 18:09:50 UTC - in response to Message 1274464.


I've been getting a lot of "No tasks sent" messages lately, and haven't been able to get any work for about an hour and a half.
____________
Grant
Darwin NT.

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4139
Credit: 33,519,319
RAC: 24,114
United Kingdom
Message 1274501 - Posted: 23 Aug 2012, 18:17:11 UTC - in response to Message 1274497.
Last modified: 23 Aug 2012, 18:18:00 UTC


I've been getting a lot of "No tasks sent" messages lately, and haven't been able to get any work for about an hour and a half.

Lots of VLARs out there (as in, there are tasks that can't be sent to my Nvidia GPU but could be sent to my CPU if I chose to accept them):

23/08/2012 18:36:45 | SETI@home | [sched_op] Starting scheduler request
23/08/2012 18:36:45 | SETI@home | Sending scheduler request: To fetch work.
23/08/2012 18:36:45 | SETI@home | Reporting 1 completed tasks
23/08/2012 18:36:45 | SETI@home | Requesting new tasks for NVIDIA
23/08/2012 18:36:45 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
23/08/2012 18:36:45 | SETI@home | [sched_op] NVIDIA work request: 254142.76 seconds; 0.00 devices
23/08/2012 18:36:55 | SETI@home | Scheduler request completed: got 0 new tasks
23/08/2012 18:36:55 | SETI@home | [sched_op] Server version 701
23/08/2012 18:36:55 | SETI@home | No tasks sent
23/08/2012 18:36:55 | SETI@home | No tasks are available for AstroPulse v6
23/08/2012 18:36:55 | SETI@home | No tasks are available for the applications you have selected.
23/08/2012 18:36:55 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
23/08/2012 18:36:55 | SETI@home | Project requested delay of 303 seconds
23/08/2012 18:36:55 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 08au12ab.8651.6772.6.10.83_2
23/08/2012 18:36:55 | SETI@home | [sched_op] Deferring communication for 5 min 3 sec
23/08/2012 18:36:55 | SETI@home | [sched_op] Reason: requested by project


After about another 5 requests, I did get 58 CUDA tasks.
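
For anyone who wants to track how often this happens, a log in the format above can be summarised with a few lines of Python. This is only a sketch: it assumes every line follows the `date time | project | message` layout shown in the excerpt, and the names `parse_line` and `summarize` are made up for this example rather than being part of BOINC.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "DD/MM/YYYY HH:MM:SS | project | message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]+) \| (.*)$")


def parse_line(line):
    """Split one log line into (timestamp, project, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, project, message = match.groups()
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project.strip(), message


def summarize(lines):
    """Pull the key numbers out of a single scheduler request."""
    summary = {"tasks_received": None,
               "gpu_seconds_requested": None,
               "backoff_seconds": None}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        got = re.search(r"got (\d+) new tasks", message)
        asked = re.search(r"NVIDIA work request: ([\d.]+) seconds", message)
        delay = re.search(r"Project requested delay of (\d+) seconds", message)
        if got:
            summary["tasks_received"] = int(got.group(1))
        elif asked:
            summary["gpu_seconds_requested"] = float(asked.group(1))
        elif delay:
            summary["backoff_seconds"] = int(delay.group(1))
    return summary


sample = [
    "23/08/2012 18:36:45 | SETI@home | [sched_op] NVIDIA work request: 254142.76 seconds; 0.00 devices",
    "23/08/2012 18:36:55 | SETI@home | Scheduler request completed: got 0 new tasks",
    "23/08/2012 18:36:55 | SETI@home | Project requested delay of 303 seconds",
]
result = summarize(sample)
print(result)
```

Run over the three sample lines copied from the log above, it reports a request for about 254,000 seconds of GPU work, zero tasks received and a 303 second backoff.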

Claggy

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,571,371
RAC: 257
Germany
Message 1274506 - Posted: 23 Aug 2012, 18:36:57 UTC - in response to Message 1274452.

I don't think we're disagreeing here. Fortunately, not sending VLARs to NVidia is a win-win problem. The solution satisfies both the efficiency and the volunteer satisfaction criteria.

I was merely saying that, from the project's point of view, not alienating volunteers is the stronger argument. I don't think we'd have got the "don't send" solution coded in the first place, if we'd had to argue on the efficiency criterion alone.

Sure, fixing issues like unusable systems or even driver crashes is more important than efficiency; however, I think efficiency is one of the volunteer satisfaction criteria. Just think about all the complaints about falling RAC caused by pendings, which have already led to several discussions about shorter deadlines. And there nothing is lost; the credit is just awarded a little later. So complaints about GPUs doing just a small fraction of what they are capable of would probably happen more often, and I'm pretty sure many would leave the project because of that.
____________
.


Copyright © 2014 University of California