Lunatics Windows Installer v0.40 release notes


Message boards : Number crunching : Lunatics Windows Installer v0.40 release notes

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4332
Credit: 1,113,358
RAC: 1,079
United States
Message 1211215 - Posted: 28 Mar 2012, 20:39:40 UTC - in response to Message 1211202.

So just to confirm, when I get to this, which may be a week from now, i7-2 will be fastest with SSE3, correct? Or should I try SSE4.x as it's an i7-2?


Any i7 should like the SSE3 better. Both the SSSE3x and SSE4.1 builds are really meant for Intel Core 2 architecture and are suboptimal for later Intel developments.
Joe

S@NL - John van Gorsel
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 190
Credit: 137,729,293
RAC: 6,635
Netherlands
Message 1211216 - Posted: 28 Mar 2012, 20:40:03 UTC - in response to Message 1211159.


Hmm, definitely weird. Well we'll find out one way or another. With those things eliminated, if after updating driver you still get weirdness, we will get you testing x41u. Can't very well push out an updated release next week if there is some unknown latent problem lurking.

Jason


Some additional info:
Did a clean install of the latest nVidia driver (296.10), and reinstalled v0.40, and only changed the <count> tag to 0.5. I immediately noticed the intermittent load on the GPU, and both instances of the exe in the TaskManager confirmed this. Screenshot of Afterburner:



Just to be sure, I installed v0.40 again and left the <count> tag at 1. When I looked at the Afterburner graph I remembered having seen the same thing on my Linux pc's after I installed x41g a couple of weeks ago. All I needed to do was free up one core (set the "Use at most ...% of the processors" to 75% or 90%). In the graph below you can see where I set the processor use to 75%:



The Linux pc's were running GTX260 cards so it seems unrelated to the GTX560Ti. I have one other pc running a GTX580 and x38g (3 tasks simultaneously, GPU load is a straight line at 96%). I can install x41g on that pc and see what happens.

Also no problem to test the x41u on the Q9660/GTX560Ti. This pc is only used for testing and running Boinc. I will send a PM to Richard.
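
For anyone following along, the two settings changed above can be sketched as simple arithmetic. This is a rough Python sketch, not BOINC's actual code: the function names are made up for illustration, and the rounding rules (a fractional <count> lets floor(1/count) tasks share one GPU; the processor percentage is rounded down with a minimum of one core) are assumptions about how the client behaves.

```python
import math

def tasks_per_gpu(count):
    """Assumed rule: each task claims `count` of a GPU, so floor(1/count) fit."""
    if not 0 < count <= 1:
        raise ValueError("count must be in the range (0, 1]")
    # Small epsilon guards against floating-point error in 1/count.
    return math.floor(1.0 / count + 1e-9)

def usable_cores(total_cores, percent):
    """Assumed rule: round down, but never go below one core."""
    return max(1, math.floor(total_cores * percent / 100.0))

print(tasks_per_gpu(0.5))    # 2
print(usable_cores(4, 75))   # 3
print(usable_cores(4, 90))   # 3
```

Under these assumptions, both 75% and 90% come out as three cores on a quad, which fits either setting freeing up one core.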

JLConawayII
Joined: 2 Apr 02
Posts: 186
Credit: 2,762,491
RAC: 0
United States
Message 1211218 - Posted: 28 Mar 2012, 20:45:06 UTC
Last modified: 28 Mar 2012, 21:16:32 UTC

That looks identical to my ATI utilization when it's trying to run a VLAR.

edit: And, apparently, some normally marked WU's as well. Except CPU load has no effect on mine, it stays low.

http://setiathome.berkeley.edu/result.php?resultid=2373271846 This one, for instance, showed low utilization, where this one http://setiathome.berkeley.edu/result.php?resultid=2373271874 ran at 89% utilization and completed 5x faster

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 249
Credit: 29,154,962
RAC: 8,070
Canada
Message 1211228 - Posted: 28 Mar 2012, 21:13:25 UTC

This app works as I crunched 5 last night

ap_6.00_win_x86_SSE3_OpenCL_NV_r540.exe

AstroPulse_Kernels_r540.cl

The application file is named v6, and it is called for as 601 in the app_info.xml

1 hour 15 minutes approx / task

http://setiathome.berkeley.edu/results.php?userid=8612202&offset=0&show_names=0&state=0&appid=12


Michael Miles





This app validated
http://setiathome.berkeley.edu/result.php?resultid=2370788573

http://setiathome.berkeley.edu/result.php?resultid=2371083839

These were the 540 app

Curious: if the validation code was not in, then it should not have validated?

I am running 555 app now

Thanks for the app

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8755
Credit: 52,703,170
RAC: 31,292
United Kingdom
Message 1211234 - Posted: 28 Mar 2012, 21:24:04 UTC - in response to Message 1211228.

v6.00 with r540 should validate against v6.01 with r555.

The difference between them is not in task validation, but another part of the code which helps to keep the project running smoothly, with sensible runtime estimates.

That bit affects us all, so it would be a great help if people could ensure that they are running at least r555 as in the installer, not any earlier beta version - as Michael is now doing. Thank you.

JohnDK
Project donor
Volunteer tester
Joined: 28 May 00
Posts: 873
Credit: 48,309,354
RAC: 39,055
Denmark
Message 1211237 - Posted: 28 Mar 2012, 21:32:44 UTC

OK I know next to nothing about it, but shouldn't my new notebook use AVX with the new AP app?

Seems it didn't or I'm not reading it correctly ;)

http://setiathome.berkeley.edu/result.php?resultid=2371596165

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4205
Credit: 34,454,749
RAC: 22,305
United Kingdom
Message 1211241 - Posted: 28 Mar 2012, 21:37:21 UTC - in response to Message 1211237.
Last modified: 28 Mar 2012, 21:43:51 UTC

OK I know next to nothing about it, but shouldn't my new notebook use AVX with the new AP app?

Seems it didn't or I'm not reading it correctly ;)

http://setiathome.berkeley.edu/result.php?resultid=2371596165

It is the FFTW library that uses AVX, not the app itself.

Edit: if you copy wisdom.dat from your project folder to the desktop, and open it with notepad, you'll see what codelets were used.
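
If you'd rather not eyeball the file, here is a small Python sketch that does the same check. It assumes the wisdom file is plain text and that AVX codelets have "avx" somewhere in their names; the function name is made up for illustration.

```python
import re

def avx_codelets(wisdom_text):
    """Return the distinct name-like tokens in the text that mention 'avx'."""
    tokens = re.findall(r"[A-Za-z0-9_]+", wisdom_text)
    return sorted({t for t in tokens if "avx" in t.lower()})

# Example use on a copy of the file (the path is up to you):
#   with open("wisdom.dat", "r", errors="replace") as f:
#       print(avx_codelets(f.read()))
```

An empty list would suggest no AVX codelets were picked, at least under the naming assumption above.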

Claggy

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8755
Credit: 52,703,170
RAC: 31,292
United Kingdom
Message 1211243 - Posted: 28 Mar 2012, 21:39:36 UTC - in response to Message 1211237.

OK I know next to nothing about it, but shouldn't my new notebook use AVX with the new AP app?

Seems it didn't or I'm not reading it correctly ;)

http://setiathome.berkeley.edu/result.php?resultid=2371596165

You'll have to ask Jason how AVX usage is flagged in the task output. You meet all the criteria we know about - running the r557 app, second-generation i7, Windows 7 with SP1.

But as we said a couple of days ago, we just run the delivery service: we're not responsible for the contents, I'm afraid.

JohnDK
Project donor
Volunteer tester
Joined: 28 May 00
Posts: 873
Credit: 48,309,354
RAC: 39,055
Denmark
Message 1211248 - Posted: 28 Mar 2012, 21:47:45 UTC - in response to Message 1211241.


Edit: if you copy wisdom.dat from your project folder to the desktop, and open it with notepad, you'll see what codelets were used.

Claggy

Yes wondered about that file, and it does contain some AVX stuff.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211250 - Posted: 28 Mar 2012, 21:50:25 UTC - in response to Message 1211243.

OK I know next to nothing about it, but shouldn't my new notebook use AVX with the new AP app?

Seems it didn't or I'm not reading it correctly ;)

http://setiathome.berkeley.edu/result.php?resultid=2371596165

You'll have to ask Jason how AVX usage is flagged in the task output. You meet all the criteria we know about - running the r557 app, second-generation i7, Windows 7 with SP1.

But as we said a couple of days ago, we just run the delivery service: we're not responsible for the contents, I'm afraid.


AP 6.01 was a fairly hasty release all around to ensure the project's sudden move to introduce changes was covered. Rush releases like that aren't typically friendly to cosmetic issues ;)

With hard wired feature prints derived straight from r409, it doesn't indicate AVX anywhere there. I'm in the process of further optimisation & changing those features lists to 'true' detections for future CPU based dispatch, so won't likely stay like that long.

For now, if in doubt that AVX is enabled internally, just compare runtimes against older r409 builds (v505). Typical AVX-enabled runtime is about 20-25% faster than non-AVX, so it should be noticeable.
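
As a rough way to do that comparison, here is a small Python sketch. The function name is made up, the runtimes in the usage line are invented numbers rather than measurements, and it reads "faster" as the reduction in runtime relative to the older build, which is an assumption about how the 20-25% figure is meant.

```python
def percent_faster(old_seconds, new_seconds):
    """Runtime reduction of the new build, as a percentage of the old runtime."""
    if old_seconds <= 0 or new_seconds < 0:
        raise ValueError("old runtime must be positive, new runtime non-negative")
    return (old_seconds - new_seconds) * 100.0 / old_seconds

# Hypothetical runtimes: 10000 s on the old build, 7800 s on the new one.
print(percent_faster(10000, 7800))   # 22.0
```

Compare tasks of similar type where possible, since runtimes vary from task to task.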

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4205
Credit: 34,454,749
RAC: 22,305
United Kingdom
Message 1211252 - Posted: 28 Mar 2012, 21:53:59 UTC - in response to Message 1211248.


Edit: if you copy wisdom.dat from your project folder to the desktop, and open it with notepad, you'll see what codelets were used.

Claggy

Yes wondered about that file, and it does contain some AVX stuff.

The stock apps also generate wisdom files, and so did AP r409, but they are generated in the slot directory every time a WU first starts.
Now the wisdom.dat file is generated in the project directory once, and the app refers to that copy and updates it as needed.

Claggy

S@NL - John van Gorsel
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 190
Credit: 137,729,293
RAC: 6,635
Netherlands
Message 1211254 - Posted: 28 Mar 2012, 21:56:00 UTC - in response to Message 1211216.
Last modified: 28 Mar 2012, 22:04:24 UTC


Also no problem to test the x41u on the Q9660/GTX560Ti. This pc is only used for testing and running Boinc. I will send a PM to Richard.


The pc is now running x41u_cuda32, 1 task at a time. When all 4 cores are used, there are dropouts where the GPU load is close to 0 for about 20-30 seconds. The progress bar in Boinc is then still moving, but only at around 0.001% per second.
With 1 core free, the GPU load slightly increases, but there are no more dropouts.

I checked both the Linux pc's running x41g for the difference when 1 thread or core is left free:
Q9650 + GTX260 running Seti Cuda and 2xSeti AK_v8 + 1xLHC T4T on the CPU:
"Top" shows a 6-8% cpu utilization from x41g
With 3x Seti and 1 x LHC T4T, the cpu load drops to 3-6% and Cuda tasks run at around half the speed

i7-920 + GTX570 448 running Seti Cuda (3 tasks) and 6xWCG + 1xLHC T4T on the CPU
"Top" shows a 4-9% cpu load for each of the three instances of x41g
There is no difference with 7x WCG + 1xLHC T4T but the WCG application may not be as demanding as the Seti AK_v8. Tomorrow I will try the same with 7xAK_v8

[edit] the first tasks on the i7 at "full load" took about 15% longer than with one thread free. The difference is smaller than with the Q9650 but there is a difference.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211256 - Posted: 28 Mar 2012, 21:59:02 UTC - in response to Message 1211254.
Last modified: 28 Mar 2012, 21:59:21 UTC

Victor is also reporting issues, & has given up testing in a snit.

Right now I have no idea what's causing either of your issues. Will look at some of your results...

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46726
Credit: 36,997,322
RAC: 4,237
United States
Message 1211258 - Posted: 28 Mar 2012, 22:00:59 UTC - in response to Message 1211256.

Victor is also reporting issues, & has given up testing in a snit.

Right now I have no idea what's causing either of your issues. Will look at some of your results...

Do you have a link to x38G Jason?
____________
My Facebook, War Commander, 2015

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211260 - Posted: 28 Mar 2012, 22:03:01 UTC - in response to Message 1211258.

Victor is also reporting issues, & has given up testing in a snit.

Right now I have no idea what's causing either of your issues. Will look at some of your results...

Do you have a link to x38G Jason?


I don't, no. You'll have to ask Lunatics for that.

Having been rude to me by PM, you'll not get any more help from me ever, and I'm putting you on ignore. Best of luck with your problems then.

Jason

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46726
Credit: 36,997,322
RAC: 4,237
United States
Message 1211264 - Posted: 28 Mar 2012, 22:04:46 UTC - in response to Message 1211260.
Last modified: 28 Mar 2012, 22:06:55 UTC

Victor is also reporting issues, & has given up testing in a snit.

Right now I have no idea what's causing either of your issues. Will look at some of your results...

Do you have a link to x38G Jason?


I don't, no. You'll have to ask Lunatics for that.

Having been rude to me by PM, you'll not get any more help from me ever, and I'm putting you on ignore. Best of luck with your problems then.

Jason

Thanks a lot. I wasn't trying to be rude to you, Jason.

Here's x41u. It isn't any better than x41g. But like I said, I wasn't trying to be rude, and I'm sorry if I came across that way.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211267 - Posted: 28 Mar 2012, 22:11:28 UTC - in response to Message 1211254.
Last modified: 28 Mar 2012, 22:11:44 UTC

The pc is now running x41u_cuda32, 1 task at a time. When all 4 cores are used, there are dropouts where the GPU load is close to 0 for about 20-30 seconds. The progress bar in Boinc is then still moving, but only at around 0.001% per second.
With 1 core free, the GPU load slightly increases, but there are no more dropouts.

I checked both the Linux pc's running x41g for the difference when 1 thread or core is left free:
Q9650 + GTX260 running Seti Cuda and 2xSeti AK_v8 + 1xLHC T4T on the CPU:
"Top" shows a 6-8% cpu utilization from x41g
With 3x Seti and 1 x LHC T4T, the cpu load drops to 3-6% and Cuda tasks run at around half the speed

i7-920 + GTX570 448 running Seti Cuda (3 tasks) and 6xWCG + 1xLHC T4T on the CPU
"Top" shows a 4-9% cpu load for each of the three instances of x41g
There is no difference with 7x WCG + 1xLHC T4T but the WCG application may not be as demanding as the Seti AK_v8. Tomorrow I will try the same with 7xAK_v8

[edit] the first tasks on the i7 at "full load" took about 15% longer than with one thread free. The difference is smaller than with the Q9650 but there is a difference.


I'm having trouble finding any x41u reported results, can you direct me to some tasks ? thanks.

Jason

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211277 - Posted: 28 Mar 2012, 22:27:06 UTC
Last modified: 28 Mar 2012, 22:27:42 UTC

Figured it out, I think/hope.

Some time back, around x38g, I detected many programs, including OpenCL & CPU apps, throttling the Cuda programs, which was killing the threads, so I raised the app priorities to normal so they'd stay alive.

Unfortunately, in doing so, I was bombarded by demands to put it back to below normal, which I did for x41 series.

Please try using something like either Fred's eFMer priority tool or Process Lasso to jack up the Cuda process priorities. I'll consider making the default priorities configurable in the near future. Both x41g & x41u should operate normally when not being throttled by other programs running at higher priorities.
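
To picture why priority matters here, a toy Python model is sketched below. It is not how the Windows scheduler or the app actually works: it assumes strict priority scheduling where the highest-priority runnable tasks take the available cores, and all task names and priority numbers are made up for illustration.

```python
def tasks_on_cores(tasks, cores):
    """Pick which tasks run under a toy strict-priority rule.

    tasks: list of (name, priority) pairs; a larger number means higher priority.
    Ties keep their list order because sorted() is stable.
    """
    ranked = sorted(tasks, key=lambda t: t[1], reverse=True)
    return [name for name, _ in ranked[:cores]]

cpu_apps = [("cpu%d" % i, 2) for i in range(4)]   # four busy CPU apps

# Cuda feeder below the CPU apps: it gets no core on a fully loaded quad.
print("cuda" in tasks_on_cores(cpu_apps + [("cuda", 1)], 4))      # False
# Feeder raised above them: it runs, and one CPU app waits instead.
print("cuda" in tasks_on_cores(cpu_apps + [("cuda", 3)], 4))      # True
# Or leave one core free: three CPU apps plus the feeder all fit.
print("cuda" in tasks_on_cores(cpu_apps[:3] + [("cuda", 1)], 4))  # True
```

In this toy picture, raising the Cuda priority and leaving a core free both keep the feeder running, which is the behaviour to compare against on a real host.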

This will slightly delay next release, but hopefully not too much

Thanks for the help tracking this down. I'll think harder next time before reversing design decisions based on angry dudes' comments.

Jason

S@NL - John van Gorsel
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 190
Credit: 137,729,293
RAC: 6,635
Netherlands
Message 1211278 - Posted: 28 Mar 2012, 22:27:22 UTC - in response to Message 1211256.



Right now I have no idea what's causing either of your issues. Will look at some of your results...


The pc is now running 2 cuda tasks at a time and with 1 core free. I checked a number of stderr outputs but I could not find any difference. All tasks seem to validate fine so I'm not too worried.
Could be worth adding a remark to the readme file about leaving 1 core free.

Here's another Afterburner screenshot (1 task on x41u_cuda32):



The first circle indicates where there was 1 core free. Then I added the fourth core and the dropouts begin. The circle on the right side indicates the loading of the next task.
I have not yet found any correlation between the dropouts and any other occurrence on the system, but since the dropouts are 10 seconds or more this should be easy to find.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5079
Credit: 74,105,639
RAC: 7,195
Australia
Message 1211279 - Posted: 28 Mar 2012, 22:31:33 UTC - in response to Message 1211278.
Last modified: 28 Mar 2012, 22:32:11 UTC

Thanks,
Can you jack up the cuda process priorities (with one of the tools mentioned) & compare ?

BTW: for some reason I'm having trouble getting to your images.

Jason


Copyright © 2014 University of California