Lunatics Help

Message boards : Number crunching : Lunatics Help

Previous · 1 · 2 · 3 · 4 · 5 . . . 10 · Next

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1773236 - Posted: 22 Mar 2016, 11:58:45 UTC - in response to Message 1773232.  

Any change you make will require a restart of the BOINC client - the application versions are defined in a file called app_info.xml, and BOINC only reads the contents of this file once during start-up.

It is technically possible to extract the necessary payload contents from the Installer using an LZMA-compliant archive tool like 7-zip, and assemble a new app_info.xml file while the old one is still running: then a client restart is enough to activate it. In pre-GPU days, with BOINC installed as a service, I used to find that 'restart service' was the fastest one-click solution - but those days are gone.

Just re-run the installer. That will handle the shutdown and restart for you (of the only bit that's necessary - far less than "everything"). But if you have tweaked anything - I don't understand what you mean by "let GPU tasks run stock" - you will have to re-tweak it.
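For anyone curious what BOINC actually reads at startup, here is a heavily abbreviated sketch of the app_info.xml structure. The file and application names are placeholders for illustration, not the installer's real output:

```xml
<!-- Sketch of the app_info.xml that BOINC parses once at client startup.
     Executable and app names below are placeholders, not real installer files. -->
<app_info>
    <app>
        <name>setiathome_v8</name>
    </app>
    <file_info>
        <name>MB8_win_SSE3.exe</name>   <!-- hypothetical executable name -->
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_v8</app_name>
        <version_num>800</version_num>
        <file_ref>
            <file_name>MB8_win_SSE3.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

Because the client only parses this file once, editing it (or swapping the executables it points to) has no effect until the next client restart, which is why re-running the installer is the simpler route.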
ID: 1773236
Profile Rich (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 27 Oct 14
Posts: 4
Credit: 49,285,910
RAC: 0
United States
Message 1773239 - Posted: 22 Mar 2016, 12:25:23 UTC

When I first started up my FX-8350 (C1 stepping) computer I overclocked my processor and memory a little, and when I started running SETI I was getting all sorts of errors. Returning the settings to stock reduced the errors to 3 or 4 inconclusives a day. I noticed that his CPU voltage is a little high and his speed multiplier is 21; mine is at 20 and boosts to 20.5 to get to ~4.2 GHz.

Rich
ID: 1773239
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773240 - Posted: 22 Mar 2016, 12:39:46 UTC - in response to Message 1773236.  
Last modified: 22 Mar 2016, 12:43:21 UTC

. . Hi,
. . Thanks for that help, I will shut down BOINC and rerun the installer then.



Just re-run the installer. That will handle the shutdown and restart for you (of the only bit that's necessary - far less than "everything"). But if you have tweaked anything - I don't understand what you mean by "let GPU tasks run stock" - you will have to re-tweak it.



. . I have found that my GPU (Intel HD530) tasks ran slightly faster before installing Lunatics, so I wanted them to go back to that level. I was hoping to tell Lunatics to let BOINC run them unassisted. And MBs are running only marginally faster under AVX than before, while on my C2D machine (Win 7) under SSE4.1 they are running better than 50% faster. So I wanted to try SSE4.2 or SSE4.1 on this i5 machine.
ID: 1773240
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1773264 - Posted: 22 Mar 2016, 15:18:24 UTC - in response to Message 1773240.  

. . I have found that my GPU (Intel HD530) tasks ran slightly faster before installing Lunatics, so I wanted them to go back to that level. I was hoping to tell Lunatics to let BOINC run them unassisted. And MBs are running only marginally faster under AVX than before, while on my C2D machine (Win 7) under SSE4.1 they are running better than 50% faster. So I wanted to try SSE4.2 or SSE4.1 on this i5 machine.

There's not much logic in that.

The HD530 application in the Lunatics installer is built from exactly the same sources as the stock application, r3330. If anything, Raistmer is more likely to have de-tuned the stock version slightly, so that it runs consistently with less user configuration (though I don't know that for sure).

A C2D can't run AVX apps, so you can't make a direct comparison with your i5. As I said earlier in this thread (message 1771847), the developers were unable to supply me with true SSSE3, SSE4.1, or SSE4.2 applications in time for the installer this time: if you make any of those selections, you will in fact be running an SSE3 application for all setiathome v8 tasks (true SSSE3/4.1/4.2 apps were used for v7 tasks, but they've pretty much finished now).

I can only imagine that when you are running the AVX application and the HD530 application together, you may be pushing the total thermal/power envelope of your CPU package harder - your CPU may be reducing boost frequency or even throttling back. Just possibly, dropping back to SSE3 only may allow you to get some GPU speed back, but it won't be a lot - the variation between tasks at different ARs will quite likely mask any effect like that.

But it's worth the experiment, and may help you gain a further insight into the complexity of the various factors in play.
ID: 1773264
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1773268 - Posted: 22 Mar 2016, 15:23:16 UTC - in response to Message 1771847.  

...
We also didn't have time before releasing the installer to specifically test which application - SSE3 or AVX - would be best on an AMD FX-8350. It may be that AVX beats SSE3 for v8, even though we found that SSE4.2 beat AVX for v7. Perhaps, now that the dust has settled, another FX-8350 user could advise?
...

Since I wrote that, an interesting post has appeared on the Einstein Wish List. Quoting in full:

If you implement AVX, make sure you have a way to deny 256-bit wide AVX to AMD Bulldozer and Piledriver processors and instead serve either SSE3 or 128-bit wide AVX plus FMA4 to those processors unless you prove that the 256-bit AVX meets a special case. See http://www.agner.org/optimize/ on why this should be done in most cases.

The only advantage I can see to sending 256-bit AVX to those processors is if the programmer can fit the entire working set in the 256-bit registers and not in the 128-bit registers. If neither fit, 128-bit AVX and SSE3 are faster than 256-bit AVX due to some horrendous performance of the 256-bit registers when they need to be written out to memory, especially in Piledriver. If both fit, then the 128-bit AVX or SSE is better because a 256-bit instruction takes two of the four shared decoders to decode while the 128-bit instruction uses just one.

Bulldozer's set of four shared instruction decoders also has problems when handling 256-bit AVX instructions that must be split into two 128-bit instructions each, because this set can only split one of these instructions per clock cycle, so a second 256-bit instruction could stall the decoder set.

Steamroller fixes these problems, so you should serve 256-bit wide AVX with optional FMA4 to this processor with no problem. I would expect the same for Excavator.
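The gist of that advice can be sketched as a dispatch check. This is a hypothetical illustration only, assuming Linux's /proc/cpuinfo format; the family/model boundaries (AMD family 0x15 for the Bulldozer line, with Steamroller models starting around 0x30) are approximations of my own, not vendor-verified values:

```python
def avoid_256bit_avx(cpuinfo_text):
    """Heuristic gate in the spirit of the quoted advice: return True for
    AMD Bulldozer/Piledriver (family 0x15, pre-Steamroller models), where
    128-bit AVX or SSE3 paths tend to beat 256-bit AVX.
    Model boundaries here are assumptions, not vendor-verified."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, val = line.partition(':')
        if sep:
            fields[key.strip()] = val.strip()
    if fields.get('vendor_id') != 'AuthenticAMD':
        return False
    family = int(fields.get('cpu family', '0'))
    model = int(fields.get('model', '0'))
    # Family 0x15 (21): Bulldozer/Piledriver models sit below ~0x30;
    # Steamroller (which fixes the decoder issue) starts around 0x30.
    return family == 21 and model < 0x30

# Synthetic /proc/cpuinfo excerpt for an FX-8350 (Piledriver: family 21, model 2)
sample = "vendor_id\t: AuthenticAMD\ncpu family\t: 21\nmodel\t\t: 2\n"
print(avoid_256bit_avx(sample))  # -> True
```

A real dispatcher would of course use CPUID directly rather than parse a text file, but the decision logic is the same.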
ID: 1773268
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1773302 - Posted: 22 Mar 2016, 21:45:00 UTC - in response to Message 1773264.  

. . I have found that my GPU (Intel HD530) tasks ran slightly faster before installing Lunatics, so I wanted them to go back to that level. I was hoping to tell Lunatics to let BOINC run them unassisted. And MBs are running only marginally faster under AVX than before, while on my C2D machine (Win 7) under SSE4.1 they are running better than 50% faster. So I wanted to try SSE4.2 or SSE4.1 on this i5 machine.

There's not much logic in that.

The HD530 application in the Lunatics installer is built from exactly the same sources as the stock application, r3330. If anything, Raistmer is more likely to have de-tuned the stock version slightly, so that it runs consistently with less user configuration (though I don't know that for sure).

A C2D can't run AVX apps, so you can't make a direct comparison with your i5. As I said earlier in this thread (message 1771847), the developers were unable to supply me with true SSSE3, SSE4.1, or SSE4.2 applications in time for the installer this time: if you make any of those selections, you will in fact be running an SSE3 application for all setiathome v8 tasks (true SSSE3/4.1/4.2 apps were used for v7 tasks, but they've pretty much finished now).

I can only imagine that when you are running the AVX application and the HD530 application together, you may be pushing the total thermal/power envelope of your CPU package harder - your CPU may be reducing boost frequency or even throttling back. Just possibly, dropping back to SSE3 only may allow you to get some GPU speed back, but it won't be a lot - the variation between tasks at different ARs will quite likely mask any effect like that.

But it's worth the experiment, and may help you gain a further insight into the complexity of the various factors in play.

Haswell CPUs have the effect of significantly slowing CPU processing when using the iGPU and CPU for SETI@home tasks. Perhaps Skylake resolves that issue or has changed whatever the limitation was.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1773302
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773315 - Posted: 22 Mar 2016, 22:39:17 UTC - in response to Message 1773264.  
Last modified: 22 Mar 2016, 22:48:13 UTC



There's not much logic in that.

The HD530 application in the Lunatics installer is built from exactly the same sources as the stock application, r3330. If anything, Raistmer is more likely to have de-tuned the stock version slightly, so that it runs consistently with less user configuration (though I don't know that for sure).



. . It wasn't a matter of logic, merely observation. The run times on GPU tasks running under Lunatics were marginally higher than before. But on that point, is there any chance these processes are adaptive? That is, can they become more efficient the more they run? Because the run times over the last 24 hours are somewhat shorter than the first 36 hours of running Lunatics. The CPU tasks under AVX are also getting slightly quicker. It may be pure coincidence, but the pattern seems consistent rather than spasmodic. I might wait a few days to get a larger result sample for comparison. Though I am committed to trying out the SSE option, even if it is an ersatz version of SSE4.1/4.2, I will probably wait until the end of the week to do so. Just to get more empirical data to compare against.



A C2D can't run AVX apps, so you can't make a direct comparison with your i5. As I said earlier in this thread (message 1771847), the developers were unable to supply me with true SSSE3, SSE4.1, or SSE4.2 applications in time for the installer this time: if you make any of those selections, you will in fact be running an SSE3 application for all setiathome v8 tasks (true SSSE3/4.1/4.2 apps were used for v7 tasks, but they've pretty much finished now).



. . It is certainly true about the C2D not running AVX so I cannot make a comparison between the two processes on the same platform. But I find it quite significant that running SSE4.1 on a C2D can reduce run times from around 4 hours to around 2.25 hours, while AVX on the i5 can only manage a reduction of a few minutes. If it is actually only running SSE3 then all the more so. The proof of the pudding, as they say, so I am itching to run the SSE option on the i5 to find out how it performs. I will keep you advised.



I can only imagine that when you are running the AVX application and the HD530 application together, you may be pushing the total thermal/power envelope of your CPU package harder - your CPU may be reducing boost frequency or even throttling back. Just possibly, dropping back to SSE3 only may allow you to get some GPU speed back, but it won't be a lot - the variation between tasks at different ARs will quite likely mask any effect like that.



. . OK, I am not a programmer and I do not know what ARs in fact are. But the GPU is running at 40 deg C (using 5 W) according to GPU-Z. I do not have an app to read the temperature sensors for the CPU (presuming they exist), but the case and unit are running quite cool; there does not appear to be any thermal issue here, so I don't think that is the cause. But I am trying to be objective about relative performance, so I want a wide base of result times to compare. I will be itching to make the change and see what effect it has.



But it's worth the experiment, and may help you gain a further insight into the complexity of the various factors in play.



. . Whenever there are unexpected or varying results experimentation is always worthwhile. :)

. . Just as an example, while CUDA 5.0 is supposedly faster on NVIDIA cards than the earlier versions, I was running a borrowed GT730 (DDR3 RAM) in the C2D for 5 days (I wish I had it for longer) and it produced better results under CUDA 4.2 than CUDA 5.0. So as Mulder would say, "trust no-one and believe nothing". :)

. . Thanks for your help
ID: 1773315
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1773319 - Posted: 22 Mar 2016, 22:57:45 UTC - in response to Message 1773315.  

Are you also looking at the Angle Range of the tasks and comparing like ARs or just looking at the run times?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1773319
Al (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1773327 - Posted: 22 Mar 2016, 23:14:52 UTC - in response to Message 1771847.  

... With Joe Segur's continuing absence, and the rush to get v8 out of the door as quickly as possible, we were short of development resources. With that set of options, the installer will actually have deployed an ordinary SSE3 application for MB v8.

We also didn't have time before releasing the installer to specifically test which application - SSE3 or AVX - would be best on an AMD FX-8350. It may be that AVX beats SSE3 for v8, even though we found that SSE4.2 beat AVX for v7. Perhaps, now that the dust has settled, another FX-8350 user could advise?

We're getting close to the point where we can retire the legacy v7 applications, and reissue the installer with a smaller payload. Maybe some developer could step into the breach left by Joe, and re-create the full range of CPU apps (hint, hint)? ;-)

Richard, quick question from someone who wasn't really involved in SETI when all that v7 -> v8 business happened, but from your statement and similar sentiments I've read in other threads, it looks like it was Really Necessary to get v7 done & dead, and v8 out. And Pronto.

Could you fill me in on what the gist of it was all about, and why it had to be such a rush job? Or if you know of one, point me at a thread that covers it fairly well? Thanks!

ID: 1773327
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1773331 - Posted: 22 Mar 2016, 23:58:23 UTC - in response to Message 1773327.  

Richard, quick question from someone who wasn't really involved in SETI when all that v7 -> v8 business happened, but from your statement and similiar sentiments I've read in other threads, it looks like it was Really Necessary to get v7 done & dead, and v8 out. And Pronto.

Could you fill me in on what the gist of it was all about, and why it had to be such a rush job? Or if you know of one, point me at a thread that covers it fairly well? Thanks!

The best I can do is point you to the news item posted by Eric Korpela on New Year's Day (a public holiday in California, as it is in much of the rest of the world):

News : SETI@home Version 8 has been released.

The key feature was the need to have a multi-source analysis chain in place - one capable of handling recordings made at other telescopes, apart from the Arecibo telescope which has been SETI@Home's only data source for the first 17 years of its life.

I'm not privy to the decision-making process which deemed it essential to have that capability precisely in the middle of the holiday season - my understanding is that the deadline was fixed a long way further up the politico-financial feeding chain. And before you ask, I don't know either why, having bust a gut to get the analysis ready in time, we're still not processing data from other telescopes, nearly three months later.

The 'big ticket' for the v8 release was that multi-telescope capability, but they took the opportunity to slightly tweak and improve the analysis done for Arecibo tapes, too. It was probably simpler to switch all work generation over to the new v8 format as soon as it was working, rather than create new work for v7 and v8 at the same time. I'm not aware of any other reason for shutting down v7 so abruptly.
ID: 1773331
Al (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1773335 - Posted: 23 Mar 2016, 0:14:13 UTC - in response to Message 1773331.  

Thank you. And yes, I won't ask you that "why". But I would think that now all the heavy lifting of getting it out there is complete, the task begins of adding all the bits and pieces that either got missed or were put off during the big rollout.

It sure would be cool to have those tapes from other telescopes integrated into the mix, now that we can do it. Have you heard of any that might have made it into production, to see how they worked out in the wild? Or haven't we received any yet? Or possibly have and are just holding off?

ID: 1773335
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1773348 - Posted: 23 Mar 2016, 1:27:10 UTC

Haswell CPUs have the effect of significantly slowing CPU processing when using the iGPU and CPU for SETI@home tasks. Perhaps Skylake resolves that issue or has changed whatever the limitation was.
____________

Hal that was my observation with my Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
ID: 1773348
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1773363 - Posted: 23 Mar 2016, 4:37:39 UTC - in response to Message 1772300.  
Last modified: 23 Mar 2016, 4:39:22 UTC

Coming late to this discussion. I too am running the same motherboard as you, but with an 8370 to replace my failed 8350. It was of the same ilk as yours, with a very high VID (~1.39 V), and didn't respond well to overclocking without very high VRM and CPU temps. Your AIO is up to the task of keeping the CPU cool, but how well are the VRMs being cooled?

Also, I found that the CPU runs best under load when you have adjusted the CPU LLC and CPU/NB LLC to prevent any voltage sag, but not so strong as to bump the VID even higher under load. I also notice that your integer benchmarks are very low for that chip at that clock speed. Is the memory being clocked at the correct speed and timings? My integer bench is just under 12K; yours is at 8K, which doesn't seem correct at all.

My reference to my failed 8350 is notable because over time my slight overclock (4.2 GHz) eventually caused my Core #5 to fail ONLY on math. The computer was always stable; it just produced errors for the CPU tasks that occupied that core. A bunch of people here in the forum spent a few weeks diagnosing the problem with me, for which I am forever grateful. If you want to read the entire thread and all the nitty-gritty details about just what is reported in the stderr.txt output file and how results are compared to each other, the thread is Why this CPU task invalid so soon?

I also have found that with these Bulldozer/Piledriver chips, because of their architecture and shared FPUs, it is best to run CPU tasks on only 50% of the cores. You should be reserving cores anyway to feed any GPUs. If you want to discuss my settings for our motherboard, we can continue the discussion privately and offline.

Cheers, Keith
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1773363
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773424 - Posted: 23 Mar 2016, 11:25:14 UTC - in response to Message 1773319.  

Are you also looking at the Angle Range of the tasks and comparing like ARs or just looking at the run times?



. . OK, from that question I take it that the AR is the radial coverage of the signal file being analysed. No, I have not been taking that into account, as I have not mastered translating the co-ordinates of the file (I have the screen saver graphic turned off to save CPU time). But I would need to log the AR for each WU and then its run time to understand how they relate. I have worked on the presumption that most of the WUs covered a similar AR to each other, with some obvious exceptions (when the run times drop or increase dramatically from the average). Unless there is an app to run to automatically log those details I can't see myself managing that. It would be revealing though, I am sure.

. . Thanks for taking the time to answer me, I am learning slowly :)
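For what it's worth, the bookkeeping described above doesn't need much of an app. A minimal sketch (function names are my own, and the AR and run time must still be collected by hand or parsed from each task's Stderr output):

```python
import csv

def log_result(path, task_id, angle_range, run_seconds):
    """Append one completed task's AR and run time to a CSV file, so that
    like-for-like ARs can be compared later. This is just the bookkeeping;
    the values themselves come from the task pages."""
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([task_id, angle_range, run_seconds])

def mean_runtime(path, ar_low, ar_high):
    """Average run time over logged tasks whose AR falls in [ar_low, ar_high].
    Returns None if no task matched."""
    times = []
    with open(path, newline='') as f:
        for task_id, ar, secs in csv.reader(f):
            if ar_low <= float(ar) <= ar_high:
                times.append(float(secs))
    return sum(times) / len(times) if times else None
```

Comparing, say, mean_runtime(path, 0.3, 0.6) before and after a configuration change would give a like-for-like comparison of "normal" AR tasks.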
ID: 1773424
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1773428 - Posted: 23 Mar 2016, 11:33:10 UTC - in response to Message 1773335.  

It sure would be cool to have those tapes from other telescopes integrated into the mix, now that we can do it. Have you heard of any that might have made it into production, to see how they worked out in the wild? Or haven't we received any yet? Or possibly have and are just holding off?

I've just linked the YouTube video of Andrew Siemion's talk about SETI projects in general, and Breakthrough Listen in particular, in the News area - not a lot of detail, I'm afraid.

The SETI recording equipment at Green Bank telescope saw 'first light' on 30 December 2015, and should have started recording in earnest sometime in January - no confirmation whether that actually happened, or whether the usual teething troubles intervened. The Parkes telescope in Australia isn't scheduled to come on stream until October 2016.

I don't think we've even seen any test tapes from GBT here on the main project (the tape/task names are very distinctive, and somebody would have noticed), but some were put through the Beta project to test that the new v8 application was working properly.
ID: 1773428
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1773487 - Posted: 23 Mar 2016, 15:49:26 UTC - in response to Message 1773424.  

Are you also looking at the Angle Range of the tasks and comparing like ARs or just looking at the run times?



. . OK, from that question I take it that the AR is the radial coverage of the signal file that is being analysed. No I have not been taking that into account as I have not mastered translating the co-ordinates of the file (I have the screen saver graphic turned off to save CPU time). But I would need to log the AR for each WU and then it's run time to understand how they relate. I have worked on the presumption most of the WU's covered a similar AR to each other with some obvious exceptions (when the run times drop or increase dramatically from the average). Unless there is an app to run to automatically log those details I can't see myself managing that. It would be revealing though I am sure.

. . Thanks for taking the time to answer me, I am learning slowly :)


If you look at any of your completed tasks (Validation pending, Validation inconclusive, Valid, Invalid, or Error), you can select the Task ID and then, within the Stderr output, find the line WU true angle range is :
A normal angle range (AR) is ~0.42. These are ideally what you want to compare.
Values that are much higher are often referred to as VHAR (very high angle range), or shorties. They start from ~1.127 & go higher. They tend to take less time than normal to complete.
Values that are much lower are often referred to as VLAR (very low angle range). They start from ~0.12 & go lower. The task will also have VLAR in the name. They tend to take more time than normal to complete.
When I am looking for normal AR tasks to compare, I generally start with the Valid tasks, then look for ones with ~100 credit. Tasks with 40-60 credit are likely to be in the VHAR range.
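Those rough thresholds are easy to turn into a small helper. A sketch, assuming the approximate cut-offs quoted above (~0.12 for VLAR, ~1.127 for VHAR) and the "WU true angle range is :" line from the Stderr output:

```python
def classify_ar(ar):
    """Bucket a task's angle range using the rough thresholds from the post:
    VLAR below ~0.12, VHAR ('shorties') above ~1.127, normal around 0.42."""
    if ar <= 0.12:
        return 'VLAR'
    if ar >= 1.127:
        return 'VHAR'
    return 'normal'

def extract_ar(stderr_text):
    """Pull the AR out of a task's Stderr output, which contains a line like
    'WU true angle range is :  0.423107'. Returns None if absent."""
    for line in stderr_text.splitlines():
        if 'WU true angle range is' in line:
            return float(line.rsplit(':', 1)[1])
    return None
```

With that, comparing "like ARs" is just a matter of grouping tasks by classify_ar(extract_ar(stderr)) before averaging run times.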

You would also want to look at the signal count section, as it can also have a measurable effect on the run time.
Spike count:    0
Autocorr count: 0
Pulse count:    0
Triplet count:  0
Gaussian count: 0


One more thing to watch for are tasks that have the message:
SETI@Home Informational message -9 result_overflow
That message can indicate there was a lot of noise in the signal, probably from a satellite pass over the telescope. A large number of results can also be a clue to a processing device that is not operating normally, such as a GPU overheating or other host PC weirdness.
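A sketch of pulling the signal counts and the overflow flag out of a task's Stderr output, assuming the field layout shown above:

```python
def parse_signal_counts(stderr_text):
    """Collect the per-signal-type counts from a task's Stderr output and
    flag the '-9 result_overflow' condition described in the post.
    The field names mirror the sample output quoted above."""
    counts = {}
    for line in stderr_text.splitlines():
        line = line.strip()
        for name in ('Spike', 'Autocorr', 'Pulse', 'Triplet', 'Gaussian'):
            if line.startswith(name + ' count:'):
                counts[name.lower()] = int(line.split(':')[1])
    overflow = 'result_overflow' in stderr_text
    return counts, overflow
```

A task with an unusually large total count, or with the overflow flag set, is probably not worth including in a run-time comparison.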
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1773487
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773655 - Posted: 24 Mar 2016, 7:31:43 UTC - in response to Message 1773315.  
Last modified: 24 Mar 2016, 7:45:46 UTC

. . I have reinstalled Lunatics. The first time selecting SSE4.2 instead of AVX and still running GPU tasks. I was using 3 CPU cores out of the four.

. . Results: No greatly significant change but seemingly slightly longer run times (by a few minutes).

. . Then I decided to compare apples with apples and mimicked the environment on the C2D. I turned off GPU tasks and re-installed Lunatics again. This time I turned all 4 cores over to WUs and selected SSE4.1, exactly as on the C2D. The results have been repeated with huge success. Not only did I see the sort of gain I got on the C2D, but with gusto. WUs that were taking 3.5 hours and more are now running in 1.25 hours. Some of the first tasks had ARs of 7 plus and took only 35 mins. But tasks with an AR of 0.42 are running in about 75 mins. HOORAY!

. . Now I have to let it run for a day or so and confirm the consistency of the speed.

. . I am too impatient (and kind of excited) to let this run for very long. If I can't hold out until Saturday I will probably reconfigure again tomorrow, again selecting SSE4.1 but restoring GPU tasks and see if that is the performance killer.

. . It will be hard to wait.
ID: 1773655
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1773725 - Posted: 24 Mar 2016, 16:57:27 UTC - in response to Message 1773655.  

. . I have reinstalled Lunatics. The first time selecting SSE4.2 instead of AVX and still running GPU tasks. I was using 3 CPU cores out of the four.

. . Results: No greatly significant change but seemingly slightly longer run times (by a few minutes).

. . Then I decided to compare apples with apples and mimicked the environment on the C2D. I turned off GPU tasks and re-installed Lunatics again. This time I turned all 4 cores over to WUs and selected SSE4.1, exactly as on the C2D. The results have been repeated with huge success. Not only did I see the sort of gain I got on the C2D, but with gusto. WUs that were taking 3.5 hours and more are now running in 1.25 hours. Some of the first tasks had ARs of 7 plus and took only 35 mins. But tasks with an AR of 0.42 are running in about 75 mins. HOORAY!

. . Now I have to let it run for a day or so and confirm the consistency of the speed.

. . I am too impatient (and kind of excited) to let this run for very long. If I can't hold out until Saturday I will probably reconfigure again tomorrow, again selecting SSE4.1 but restoring GPU tasks and see if that is the performance killer.

. . It will be hard to wait.

Based on your initial information it seems like Skylake has the same issue as Haswell & Ivy Bridge CPUs, in that using the iGPU for processing nearly doubles the CPU run times when running SETI@home tasks on both.
I brought this issue up back in 2014.
http://setiathome.berkeley.edu/forum_thread.php?id=75215&postid=1543505#1543505
Then in 2015 Raistmer was investigating the issue.
http://setiathome.berkeley.edu/forum_thread.php?id=77119&postid=1664472#1664472
I think the investigation stopped after Intel developers were contacted & they were not able to provide any useful information.
Also it has been observed that when running Einstein iGPU tasks were slowed instead of the CPU.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1773725
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773824 - Posted: 25 Mar 2016, 2:17:30 UTC - in response to Message 1773725.  
Last modified: 25 Mar 2016, 2:24:02 UTC

. . I have reinstalled Lunatics. The first time selecting SSE4.2 instead of AVX and still running GPU tasks. I was using 3 CPU cores out of the four.

. . Results: No greatly significant change but seemingly slightly longer run times (by a few minutes).

. . Then I decided to compare apples with apples and mimicked the environment on the C2D. I turned off GPU tasks and re-installed Lunatics again. This time I turned all 4 cores over to WUs and selected SSE4.1, exactly as on the C2D. The results have been repeated with huge success. Not only did I see the sort of gain I got on the C2D, but with gusto. WUs that were taking 3.5 hours and more are now running in 1.25 hours. Some of the first tasks had ARs of 7 plus and took only 35 mins. But tasks with an AR of 0.42 are running in about 75 mins. HOORAY!

. . Now I have to let it run for a day or so and confirm the consistency of the speed.

. . I am too impatient (and kind of excited) to let this run for very long. If I can't hold out until Saturday I will probably reconfigure again tomorrow, again selecting SSE4.1 but restoring GPU tasks and see if that is the performance killer.

. . It will be hard to wait.

Based on your initial information it seems like Skylake has the same issue as Haswell & Ivy Bridge CPUs, in that using the iGPU for processing nearly doubles the CPU run times when running SETI@home tasks on both.
I brought this issue up back in 2014.
http://setiathome.berkeley.edu/forum_thread.php?id=75215&postid=1543505#1543505
Then in 2015 Raistmer was investigating the issue.
http://setiathome.berkeley.edu/forum_thread.php?id=77119&postid=1664472#1664472
I think the investigation stopped after Intel developers were contacted & they were not able to provide any useful information.
Also it has been observed that when running Einstein iGPU tasks were slowed instead of the CPU.



. . I took another intermediate step and reconfigured with the 4 cores running AVX. That got a further improvement of maybe 5% to 10%. I am now convinced that the GPU use is indeed the performance killer. In fact the syndrome may be worse with Skylake than with the earlier versions. Run times now are approximately 1 hour 10-15 mins per WU; when using the GPU, run times were 3 hours 30-plus mins for CPU WUs. I have settled on using 3 cores running AVX and no GPU; the net results will be much better. But I do want to try a short run with the GPU and 3 cores running SSE to see if there is less penalty from running the GPU than with AVX. I am ever inquisitive.

. . Thanks muchly for your input.
ID: 1773824
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1773830 - Posted: 25 Mar 2016, 2:51:28 UTC - in response to Message 1773725.  

. . Scanning over those threads I notice you mentioned variations in clocking. This reminded me of a funny thing I observed. When running a mix of CPU and GPU tasks, the timer in BOINC Manager was running 33% faster for the GPU task than for the 3 MB tasks. After restarting, all four WUs kicked off together. The timers for the MBs stayed in sync, but the timer for the GPU task kept moving further and further ahead. After observing this I watched it for a while; at any given point the elapsed time for the GPU task was 33% ahead of the MB tasks. That made me wonder how that timing is performed. I presumed it would be sampling the system clock, but now I assume each task runs its own timer routines.
ID: 1773830


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.