| Author |
Message |
|
|
Do you think these slow-running tasks may be the same issue I reported to you by PM ('Task_Hang_on_ATI', 'Addendum__Task_Hang_on_ATI'), which you "copied to the r390 thread at Lunatics"?
("task 'hang' on ATI and auto-continued after a long time (but validated OK)")
Do you think what I noted in the second letter ('Addendum__Task_Hang_on_ATI') has any merit?:
"
About the pause in GPU processing during BOINC usage:
This may be some driver glitch -
it appears to me that if I start some GPU monitoring tool it kicks the ATI driver and the GPU processing continues.
(I may be wrong, this may be just a coincidence, I wasn't watching for this behaviour specifically,
but I can say that the GPU processing continued within +- 2 minutes of the start of the GPU monitoring program.
)
It is probably not the GPU monitoring as such that does the 'kick' (as SIV and TThrottle run all the time).
I suppose it has to be the start (initialization phase) of the GPU monitoring tool.
It was GPU-Z in the case of 21fe12ad.19149.9502.8.10.160.vlar
It was ATI MemoryViewer in the case of 25ap12aa.26506.476.13.10.96
"
I was thinking the same, but could not find any 'real evidence' while the errors
Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED kept happening,
although to a lesser extent since I started using 2 threads to feed the GPUs.
A few minutes ago I changed period_iterations from 20 to 10, which
decreased runtime, increased GPU load and also decreased lag.
I will let it run with this setting. By the way, I am running 2 instances_per_device
for MB work.
(Win 7 64-bit, BOINC 7.0.28 64-bit, Lunatics rev. 390 app for MB, CPU = i7-2600,
GPUs 2x AMD/ATI EAH5870.) All stock settings.
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
That's what I expected.
Your DCF is too high; it should be around 1.
This, combined with low GPU usage, is causing these errors.
Have you set flops in app_info?
Try changing the DCF in client_state.xml.
Actually, a high DCF is good in his particular case and could prevent further -197 errors, even though it messes with the cache.
The long-running task I linked will have pulled DCF up, so the estimates AND, more crucially, the 10x estimate abort limit are high (provided I'm getting my logic right). If further tasks run long but inside the new margin, they will keep DCF up even though the majority of tasks pull DCF back to 1.
That at least might allow error-free processing until the underlying cause of the slow-running WUs is found.
I had exactly the same issue last week on my 5850.
I already said how to fix it.
____________
|
|
|
LadyL Volunteer tester
Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 14
|
To digress:
-Windows 7 Ultimate, fully updated.
-BOINC 7.0.28
-ATI 12.3 drivers installed without errors (uninstalled drivers, used Driver sweep, rebooted, installed 12.3)
I allowed 1 CPU core to remain idle and changed from the HD5 to the standard GPU app, and I'm still getting errors.
Are there any other ATI 7970 users that have this problem or use a different driver?
Would changing period_iterations in the app_info change anything?
It's worth a try at least, if it's something app related. If you still get errors with something like 100, you know it's not that...
____________
I'm not the Pope. I don't speak Ex Cathedra! |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
To digress:
-Windows 7 Ultimate, fully updated.
-BOINC 7.0.28
-ATI 12.3 drivers installed without errors (uninstalled drivers, used Driver sweep, rebooted, installed 12.3)
I allowed 1 CPU core to remain idle and changed from the HD5 to the standard GPU app, and I'm still getting errors.
Are there any other ATI 7970 users that have this problem or use a different driver?
Would changing period_iterations in the app_info change anything?
It's worth a try at least, if it's something app related. If you still get errors with something like 100, you know it's not that...
No, it won't change anything.
____________
|
|
|
LadyL Volunteer tester
Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 14
|
That's what I expected.
Your DCF is too high; it should be around 1.
This, combined with low GPU usage, is causing these errors.
Have you set flops in app_info?
Try changing the DCF in client_state.xml.
Actually, a high DCF is good in his particular case and could prevent further -197 errors, even though it messes with the cache.
The long-running task I linked will have pulled DCF up, so the estimates AND, more crucially, the 10x estimate abort limit are high (provided I'm getting my logic right). If further tasks run long but inside the new margin, they will keep DCF up even though the majority of tasks pull DCF back to 1.
That at least might allow error-free processing until the underlying cause of the slow-running WUs is found.
I had exactly the same issue last week on my 5850.
I already said how to fix it.
Yes, I was wrong; it happens.
I'll leave this in your capable hands then. If your suggestions don't help, we can go back to the drawing board.
____________
I'm not the Pope. I don't speak Ex Cathedra! |
|
|
LadyL Volunteer tester
Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 14
|
I had exactly the same issue last week on my 5850.
I already said how to fix it.
Probably best if you repeat how to fix it, Mike.
I find it's rather hidden and skildude may have missed it.
____________
I'm not the Pope. I don't speak Ex Cathedra! |
|
|
|
|
On an FX you need to free 2 cores to get full GPU utilisation.
Is that the fix?
I'll free another CPU core and see what happens.
I've reduced my usage to 6 cores.
I'm now wondering if I could up my instances to 4 on the GPU if this actually works.
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
|
|
A few minutes ago I changed period_iterations from 20 to 10, which
decreased runtime, increased GPU load and also decreased lag.
Are you sure about the lag?
You feel the lag most with VLARs; if you are now running non-VLARs, you will feel less lag.
I run with -period_iterations_num 80, and even with this higher value I feel a small lag (especially when scrolling) if a VLAR is running.
(With -period_iterations_num 10 the lag is very big.)
This makes me ask Raistmer - Is it possible to have some option that sets -period_iterations_num at different values depending on AR?
e.g.:
-period_iterations_num 20 -period_iterations_num_VLAR 100
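To make the request above concrete, here is a minimal Python sketch of what "a different value depending on AR" could look like if it were done outside the app, for example in a wrapper script. It is only an illustration: the names `PeriodIterationsPolicy`, `choose_period_iterations` and `build_cmdline` are hypothetical, the 0.12 VLAR cutoff is an assumed placeholder, and `-period_iterations_num_VLAR` is the proposed option from this post, not something the sketch relies on. It only ever emits the existing `-period_iterations_num` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodIterationsPolicy:
    """Hypothetical policy: one value for normal tasks, one for VLAR tasks."""
    default: int = 20            # value proposed for non-VLAR work in the post
    vlar: int = 100              # value proposed for VLAR work in the post
    vlar_threshold: float = 0.12  # ASSUMPTION: angle ranges below this count as VLAR


def choose_period_iterations(angle_range: float,
                             policy: PeriodIterationsPolicy = PeriodIterationsPolicy()) -> int:
    """Pick a period_iterations value from the task's angle range (AR)."""
    if angle_range < 0:
        raise ValueError("angle range cannot be negative")
    if angle_range < policy.vlar_threshold:
        return policy.vlar
    return policy.default


def build_cmdline(angle_range: float,
                  policy: PeriodIterationsPolicy = PeriodIterationsPolicy()) -> str:
    """Build the command-line fragment using only the existing flag."""
    return f"-period_iterations_num {choose_period_iterations(angle_range, policy)}"


if __name__ == "__main__":
    for ar in (0.01, 0.42, 2.7):
        print(ar, "->", build_cmdline(ar))
```

A higher value for VLARs trades some runtime for less screen lag, which matches the experience described above; the actual cutoff and values would have to be tuned per card.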
____________
- ALF - "Find out what you don't do well ..... then don't do it!" :)
|
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
On an FX you need to free 2 cores to get full GPU utilisation.
Is that the fix?
I'll free another CPU core and see what happens.
I've reduced my usage to 6 cores.
I'm now wondering if I could up my instances to 4 on the GPU if this actually works.
Yes.
And watch your DCF.
Tell me your GPU usage please.
____________
|
|
|
|
|
A few minutes ago I changed period_iterations from 20 to 10, which
decreased runtime, increased GPU load and also decreased lag.
Are you sure about the lag?
You feel the lag most with VLARs; if you are now running non-VLARs, you will feel less lag.
I run with -period_iterations_num 80, and even with this higher value I feel a small lag (especially when scrolling) if a VLAR is running.
(With -period_iterations_num 10 the lag is very big.)
This makes me ask Raistmer - Is it possible to have some option that sets -period_iterations_num at different values depending on AR?
e.g.:
-period_iterations_num 20 -period_iterations_num_VLAR 100
I'm a little confused too; I expected to see an increase in lag with a lower
period_iterations_for_pulsefind. Probably each card/GPU has its own
'best' setting for period_iterations_for_pulsefind...
The difference in runtime is small compared to 20, but I'll keep 10.
The biggest difference came from freeing up 2 threads instead of 1; that's 1 i7-2600
core.
That's what Mike suggested in the first place, too.
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
|
|
|
I seem to be a bit late getting to the party.
I have set `period_iterations 2` and the lag is only a problem when a workunit is starting and being loaded into the GPU.
This thing is a crunch box, so I am willing to tolerate quite a bit of lag.
The CPU is only a P4 3.6 GHz (Prescott 660) with HT. The CPU only crunches one FreeHAL NCI task so as to keep its load down.
I find that the P4 is often overwhelmed by the demands of two 7970s; during a shorty storm it cannot cope with servicing the GPUs and stays at 100% load for several minutes at a time, which makes the computer unusable for me.
Though if it is `busy`, it is up to me to leave it alone to get on with it and go play with one of the other computers.
I did `borrow` my Q6600 from another rig to see how it fared, and in that short test found that I had to keep one core free to feed each GPU, though I was not using -pi2 or -hp at that time.
When crunching on all four CPU cores I was getting Maximum_Time_Exceeded errors; these stopped with two cores free for the GPUs to use.
I am only running two WUs per card because the PSU can't cope with any more. It is only a Corsair HX620W and this box is eating about 500 W, so I have to get another PSU before adding the third card.
Edit - OS Win7 Home 64, BM 7.0.28, CCC 12.4 |
|
|
|
|
On an FX you need to free 2 cores to get full GPU utilisation.
Is that the fix?
I'll free another CPU core and see what happens.
I've reduced my usage to 6 cores.
I'm now wondering if I could up my instances to 4 on the GPU if this actually works.
Yes.
And watch your DCF.
Tell me your GPU usage please.
95-100%
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
|
I don't see any more errors.
Your times have stabilized as well.
Nice card IMHO.
____________
|
|
|
|
|
|
Yet I have a 5850 that isn't having this problem.
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
Yet I have a 5850 that isn't having this problem.
I don't see a problem anymore on your 7970.
____________
|
|
|
|
|
Yet I have a 5850 that isn't having this problem.
I don't see a problem anymore on your 7970.
Well, the Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED errors surfaced
again...
I tried running 1 WU per GPU to see if that helps.
That gave a very low load, so I went back to 2 per GPU, with period_iterations 40
instead of 10.
I am still using 1 i7-2600 thread of 8 for each GPU (ATI HD5870).
CPU load during the first 10 seconds is 100% per (idle) thread.
Errors appear on both the 1st and 2nd GPU, which have about the same load, 85% on average.
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
|
Don't confuse me please, Fred.
What's your DCF?
Have you included flops in your app_info?
What are the estimated times on the GPUs?
How many CPU cores are in use?
____________
|
|
|
|
|
|
I still don't get why it needs 2 cores to load.
____________
Proud member of TSWB.
End terrorism by building a school
|
|
|
|
|
Don't confuse me please, Fred.
What's your DCF?
Have you included flops in your app_info?
What are the estimated times on the GPUs?
How many CPU cores are in use?
Why should I confuse you?
Task duration correction factor: 3.61424
No FLOPS included. (Never had them on this rig.)
3 cores, 6 threads are in use. 1 core or 2 threads (HT=ON) to feed the GPUs.
Estimated times are, of course, too high: 1.5 x runtime.
____________
Knight Who Says Ni N!, OUT numbered................. |
|
|
Mike Volunteer tester
Joined: 17 Feb 01 Posts: 19487 Credit: 21,136,679 RAC: 25,804

|
|
First of all, you quoted my reply to skildude,
so I got confused.
Anyway.
I fear you need to free 1 physical core per GPU,
not one thread.
Please try it to see if this helps.
It certainly should.
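For anyone who wants to work out the numbers for their own rig, here is a small Python sketch of the arithmetic behind "one physical core per GPU, not one thread". The function name `boinc_cpu_budget` and its parameters are made up for this illustration. The percentage it returns is only a candidate value for BOINC's "use at most N% of the processors" preference, and how BOINC rounds that setting internally is not modelled here.

```python
def boinc_cpu_budget(physical_cores: int,
                     threads_per_core: int,
                     gpus: int,
                     cores_per_gpu: int = 1) -> tuple[int, float]:
    """Return (threads left for CPU tasks, percentage of all logical CPUs).

    Freeing a *physical* core means freeing all of its hardware threads,
    so with Hyper-Threading one reserved core costs two logical CPUs.
    """
    if min(physical_cores, threads_per_core) < 1 or gpus < 0 or cores_per_gpu < 0:
        raise ValueError("counts must be positive (gpus and cores_per_gpu may be 0)")

    logical_cpus = physical_cores * threads_per_core
    reserved_threads = gpus * cores_per_gpu * threads_per_core
    if reserved_threads >= logical_cpus:
        raise ValueError("reservation leaves no CPU threads for CPU tasks")

    usable_threads = logical_cpus - reserved_threads
    percent = 100.0 * usable_threads / logical_cpus
    return usable_threads, percent


if __name__ == "__main__":
    # i7-2600 (4 cores, HT on) feeding 2 GPUs, one physical core each
    print(boinc_cpu_budget(physical_cores=4, threads_per_core=2, gpus=2))
    # 8-core FX (no HT) with 2 cores freed for a single GPU
    print(boinc_cpu_budget(physical_cores=8, threads_per_core=1, gpus=1, cores_per_gpu=2))
```

With the i7-2600 and two GPUs this gives 4 threads for CPU work rather than the 6 currently in use, and for the 8-core FX case it gives the 6 cores skildude mentioned.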
____________
|
|
|