Posts by RueiKe

1) Message boards : Number crunching : User achievements thread......... (Message 1924842)
Posted 6 days ago by Profile RueiKe Special Project $250 donor
Just passed 125M on SETI.
2) Message boards : Number crunching : LLC on the Zenith Extreme (Message 1922653)
Posted 18 days ago by Profile RueiKe Special Project $250 donor
Most of the OCN Zenith TR thread posts I have read that have something to do with recommended LLC settings for the Zenith say that the LLC behaviour has varied somewhat among the various BIOS releases for that motherboard. That said, I see most people have settled on LLC4 or LLC5 for fully loaded systems, and almost everyone is running at 4 or 4.1 GHz.

Even Buildzoid recommends Level 4 or 5 for ROG (specifically for Intel) for best results, but in the same video he suggests a flat to slightly drooping curve for Vcore vs load, LLC Video. My results show no droop even at the lowest levels of LLC. I am wondering if this is part of a marketing strategy of having a "more over-clockable" MB. Most people won't be aware that the actual Vcore will be 50mV higher than what they set and will be satisfied with the great results they achieve at such "low voltages". Since my work was done in Linux, I don't have good access to software voltages for comparison. Also, I am using the ProbeIT terminals instead of reading a power pin on the back of the socket, so maybe there is additional voltage drop before the CPU.
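For reference, the droop in question is the usual VRM load-line behaviour (a general relationship, not anything specific to this board): the core voltage sags in proportion to load current, and higher LLC levels reduce the effective load-line resistance,

\[ V_{\mathrm{core}} \approx V_{\mathrm{set}} - R_{\mathrm{LL}} \cdot I_{\mathrm{load}} \]

A flat curve corresponds to an effective R_LL of zero, so a Vcore that measures above the set point under load, as described above, is overshoot rather than droop.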

I remember watching a video a while ago on LLC curves for various MBs, probably from der8auer, but I can't find it. It would be good to know if my observations are similar to those of others. I am running BIOS 0902 and plan to try this again on the next BIOS release.
3) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922294)
Posted 19 days ago by Profile RueiKe Special Project $250 donor
I think the issue mentioned on the 17.50 page concerns a screen corruption issue (a minor problem that I have seen). It seems like changes in the kernel break the 17.50 uninstall script. Also, there could be other compatibility issues. Hopefully the 1Q18 release comes out soon. I have not tried any open source drivers. Not enough time to investigate.

Interesting, the notes on the 17.50 driver page said this was expected to be fixed in 16.04.4. Guess that didn't happen. I assume that is with the open source AMDGPU-Pro drivers. Have you looked at the additional open source Vulkan driver AMDVLK yet? AMD Open-Source Driver For Vulkan "AMDVLK" Is Now Available
4) Message boards : Number crunching : LLC on the Zenith Extreme (Message 1922291)
Posted 19 days ago by Profile RueiKe Special Project $250 donor
Spent some time investigating the impact of LLC on Vcore vs loading on the Zenith Extreme MB with a 1950X CPU. I am running a mild OC in order to manage temps and power usage, so my Vcore setting is 1.2125V @ 3.725GHz. I used a Fluke 87V multimeter and took stable average measurements at the ProbeIT terminal. I adjusted loading using the % CPUs used setting in BOINC. Here are my results:

Based on these results, I will continue to use the Auto setting in the BIOS. My reasoning is that since this system is continuously 100% loaded and running a fixed clock frequency, a flat Vcore curve would be ideal. That does not seem possible with this BIOS, but Auto appears to be the most appropriate curve for this use case. Does this reasoning seem correct?
5) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922244)
Posted 19 days ago by Profile RueiKe Special Project $250 donor
Hi Keith, I had done a quick search when I first encountered the problem, but I quickly went with the brute-force approach of just reinstalling the drivers in recovery mode, as I did not want to spend my entire Friday night working on it. Hopefully my approach doesn't leave any latent issues.

I also found that the 17.50 drivers are not compatible with 16.04.4. One way to avoid the issue is to uninstall the drivers before a kernel upgrade and reinstall them after. AMD driver releases for Linux are quarterly, so the release for this quarter should be compatible. I am considering doing a clean install of 16.04.4 from the iso before installing the new drivers when they are available. I will also retry the 4th GPU card install afterward.

That is a common problem with lots of posts in the Linux help forums. I believe the most common issue is with permissions on the .Xauthority file and is caused by the removal of video drivers and the reset of the X server configuration. You might want to search on "endless login loop"
[Edit] Did the work for you. Stuck in login loop (Ubuntu 16.04)

Basically what you already discovered.
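For anyone hitting the same loop, here is a minimal sketch of the usual fix from those threads: reset ownership of ~/.Xauthority back to the logged-in user. It assumes the file was taken over by root during the driver removal, which is the commonly reported cause; the account name is a placeholder.

#!/usr/bin/env python3
# Reset ownership of ~/.Xauthority back to the target user.
# Run with sudo from a text console (Ctrl+Alt+F1) when the desktop login loops.
import os
import pwd

USERNAME = "rueike"  # hypothetical account name -- change to yours

user = pwd.getpwnam(USERNAME)
xauth = os.path.join(user.pw_dir, ".Xauthority")

if not os.path.exists(xauth):
    print(xauth, "not found")
elif os.stat(xauth).st_uid != user.pw_uid:
    os.chown(xauth, user.pw_uid, user.pw_gid)
    print("Reset owner of", xauth, "to", USERNAME)
else:
    print(xauth, "is already owned by", USERNAME, "- look elsewhere")

The same thing can of course be done with a single chown from the console; the script just makes the check-then-fix logic explicit.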
6) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922090)
Posted 19 days ago by Profile RueiKe Special Project $250 donor
Latest update: after reinstalling the drivers, everything is running fine. I probably need to see it run for a week to be certain all is well. I did have an issue after uninstalling the AMD GPU drivers that I had never seen with 16.04.3: the system would not enter the desktop after login, it would just cycle back to the login screen. After I reinstalled the drivers while in recovery mode, the problem was resolved.

I also attempted to install a 4th graphics card by adding a Nano in addition to the 3 ProDuo cards. This was not successful; it gave an error about graphics before even getting to the login screen. Given the new problem I had with the uninstall in 16.04.4, I have to wonder if it would work in 16.04.3.
7) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922040)
Posted 19 days ago by Profile RueiKe Special Project $250 donor
Good to know that I am not alone! I am running the latest GPU driver, so maybe I will just try a reinstall. Seems like I am running the same kernel release as you, 4.13.0-36-generic in my case. I was planning to test out adding a 7th GPU to the system, but the instability I am facing will complicate that.

The day Ubuntu released the Meltdown/Spectre fixes in a new kernel, concomitant with Nvidia driver 384.111, both Juan and I had issues with crashes and corrupted tasks. If you hadn't updated since then, you likely got bit too. Both Juan and I are up to the HWE release kernel 4.13.0-36 now, and have moved on to Nvidia driver 390.25.
8) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922029)
Posted 20 days ago by Profile RueiKe Special Project $250 donor
Has anyone seen stability issues since upgrading to 16.04.4? I have had 2 system crashes and 2 cases of GPU compute hanging or giving computation errors in the few days since I upgraded. This is my Threadripper/ProDuo system 8365846. Perhaps it is related to the Meltdown/Spectre fixes, or maybe I need to reinstall the GPU drivers after the upgrade. I did the upgrade during the Tuesday downtime, and since then there has been a lot more Arecibo work, which could also be a different stress on my OC.
9) Message boards : Number crunching : Main Computer Down...again (Message 1914614)
Posted 22 Jan 2018 by Profile RueiKe Special Project $250 donor
I am happy with the results I am getting from my XSPC Raystorm Pro CPU waterblock.

Forgot to mention that the coverage issue is specific to Threadripper. EK just used their original design, which doesn't give good coverage of the multi-die configuration of Threadripper.
10) Message boards : Number crunching : Main Computer Down...again (Message 1914611)
Posted 22 Jan 2018 by Profile RueiKe Special Project $250 donor
I'm 2 days into a 2 week trip to the US and see that my main system is down. I suspect the weather warmed up a bit, the room I moved this machine to got too warm, and the system became unstable. I probably need to upgrade the CPU waterblock, as the one from EK doesn't have good coverage. Also, auto reboot and restart of BOINC would be good. Does anyone have a setup to reboot, auto-login, and restart BOINC in Linux?
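For the BOINC side of that, here is a minimal watchdog sketch that could run from cron every few minutes and restart the client if it has died. It assumes the stock Ubuntu boinc-client package, where the client process is named boinc and is managed by the boinc-client service; auto-login and power-on-after-power-loss are separate settings in the desktop user settings and the BIOS.

#!/usr/bin/env python3
# Restart the BOINC client if it is no longer running.
# Example root crontab entry:  */5 * * * * /usr/local/bin/boinc_watchdog.py
import subprocess

def boinc_running():
    # pgrep exits with 0 if at least one process named "boinc" exists
    return subprocess.run(["pgrep", "-x", "boinc"],
                          stdout=subprocess.DEVNULL).returncode == 0

if not boinc_running():
    print("BOINC client not running, restarting the boinc-client service")
    subprocess.run(["systemctl", "restart", "boinc-client"], check=False)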
11) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913454)
Posted 17 Jan 2018 by Profile RueiKe Special Project $250 donor
If somebody has some hard data showing just what the impact of rescheduling is on granted credits, or can run some new tests to generate a comparison, I think it would be very useful. When I first experimented with rescheduling in June of 2016, there were some people who said it did affect credit and others who said that was a myth that had already been put to rest long before.

So, just to make sure that my own rescheduling wasn't messing up other people's credit, I did some fairly extensive comparisons. My results were posted in Message 1799300. The conclusion I reached, based on those results, was that rescheduling had "no more impact to the credits than is caused by the random number generator that assigns them in the first place."

Rescheduling at that time simply meant moving Guppi VLARs that were originally assigned to the GPUs over to the CPUs, and moving non-VLAR Arecibo tasks that were originally assigned to the CPUs over to the GPUs. So, yes, tasks were being run on a different device than what they were originally assigned to, which is the issue that is being raised again here.

Now, perhaps things have changed in some way in the last year and a half, such that my previous conclusion is no longer valid. If so, I think new testing and documented results would be needed to demonstrate it.

My results also show that rescheduled work moved from GPU to CPU gets normal, if not higher, credit. See this example 6317203128. My observation is that non-rescheduled WUs that ran after the rescheduling event get lower credit. This could be because the WUs after the outage are very different from the WUs before, but I was concerned that something is going on with the credit calculation after the rescheduling. Did the rescheduled work somehow change the reference for the credit calculation of new WUs? Can information be extracted for the 2 WUs I referenced to verify this?
12) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913394)
Posted 17 Jan 2018 by Profile RueiKe Special Project $250 donor
No that is NOT the cause. CreditScrew is the cause. Think about it. You moved tasks assigned originally to the gpus on your system. The scheduler took into account the APR for the gpu application. You moved the gpu tasks temporarily to the cpu for bunkering. The scheduler and server have no knowledge of this action. You then move your gpu tasks temporarily stored in the cpu cache back to the gpu cache, where you process them during the outage.

What has changed? Nothing. You processed the originally assigned gpu tasks on the gpu as intended. You get 50 credits per task. Thank you CreditScrew.

In my case, I am not doing "bunkering". I moved a bunch of WUs to the CPU and left them there. I was not trying to get more tasks, only to keep my CPU fully loaded during the outage. I only run SETI and LHC, and LHC doesn't have GPU tasks, so my plan was to move tasks from GPU to CPU to keep the CPU loaded during the outage and use the GPUs for mining. But if this is messing up credit calculations for work done, then I won't do it.

Consistently low credit is not an issue for me. What I like about LHC is that credit there is also very low, perhaps even harder to earn than on SETI, which makes the competitive computing aspect even more meaningful. I only raised the concern in this thread because some of the observations after rescheduling seemed extreme. Some tasks even came in below 20 credits, so I am still concerned that rescheduling is a factor. In this case there is also a shift in work unit types, so I am still uncertain what happened.
13) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913341)
Posted 16 Jan 2018 by Profile RueiKe Special Project $250 donor
If you go through your results, you'll see quite a few like that.
It's all to do with Credit New & the way it determines Credit.

My WAG (Wild Arse Guess)- your GPU APR (Average Processing Rate) is only 183.57 GFLOPS; I suspect the theoretical value for the GPU is much, much higher (check out the first few startup lines in the log to see what the claimed FLOPS for that particular card is). For that WU your device peak FLOPS is 8,192.00 GFLOPS.
Yet your actual processing time is very quick, much faster than your APR would indicate.
Credit New considers your cards to be extremely inefficient (a big discrepancy between APR & benchmark FLOPS): a very high device peak FLOPS with such a low APR means they are either really, really inefficient, or the numbers have been fudged (in its interpretation).
Credit New makes all sorts of assumptions, and if your figures don't meet those assumptions then it considers your figures to be a result of poor efficiency, or cheating, or both.
Final end result- bugger all credit for work done. By design.

I watch my machines quite closely and have not noticed sub-50 credit awards like this in the past. I did try something different in the latest work shortage: I decided to try the rescheduler for the first time. I used it to move several hundred GPU tasks to the CPU. Since I knew the machine would completely run out of work, I decided on a strategy to keep it fully loaded. I moved enough GPU tasks to the CPU to make sure the CPU would be fully loaded while I slept Sunday evening, and then enabled mining on the GPUs. In the morning, I stopped mining and un-suspended the GPUs. Everything looked normal. The only thing that doesn't make sense is that it is not the rescheduled work that is getting the lower credit; it is the work that actually ran afterward on the GPUs. Does anyone think this is the cause?
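For what it's worth, the gap described in the quote above works out to the card being treated as if it delivered only about 2% of its benchmark figure:

\[ \frac{\mathrm{APR}}{\mathrm{peak\ FLOPS}} = \frac{183.57\ \mathrm{GFLOPS}}{8192.00\ \mathrm{GFLOPS}} \approx 0.022 \]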
14) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913307)
Posted 16 Jan 2018 by Profile RueiKe Special Project $250 donor
This example of credit granted for a normal WU definitely looks off:
Only 26 credits for 360s of GPU work, or 3,935s of CPU work from the wingman, seems well beyond a Credit New effect. Perhaps there is a problem somewhere?
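For scale, those numbers work out to roughly:

\[ \frac{26}{360\ \mathrm{s}} \times 3600 \approx 260\ \mathrm{credits/hour\ (GPU)} \qquad \frac{26}{3935\ \mathrm{s}} \times 3600 \approx 24\ \mathrm{credits/hour\ (CPU)} \]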
15) Message boards : Number crunching : ubuntu Install (Message 1912722)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
Hi RueiKe!

Message 1896117 on the previous page in this thread.

Perhaps you meant to say "but I can not get to advanced view" here?

Otherwise it becomes the same word twice, but still perhaps not what you meant.

That was so long ago that I cannot remember the details of what I was doing. But definitely all of the issues I was having were fixed by Tbar's latest Linux build of BOINCmgr.
16) Message boards : Number crunching : ubuntu Install (Message 1912721)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
Guess we should be running the SSE41 app on the BLC05 cpu tasks.

It's a wash between the r3345 AVX app and the r3711 SSE41 app. But the r3345 AVX app is 23% faster than the r3712 AVX2 app.

I have confirmed that my actual performance improved when I switched from AVX to SSE42, as my benchmarks indicated. It is probably a good idea to re-validate the optimization when WU characteristics change.
17) Message boards : Number crunching : ubuntu Install (Message 1912720)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
From my experience it depends on the tasks and the CPU load.
On my benches AVX2 was slower than SSE4.1 in most cases on my Ryzen 1800X.

I have definitely found that to be the case. My approach now is to free up 1 core on my machine for the benchmark runs and make sure it doesn't stop BOINCmgr from continuing to run tasks. I am still concerned that results may be influenced by what app type is running on the rest of the cores (will testing AVX be influenced by other cores running SSE42?).
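Along those lines, here is a rough sketch of how a timing run could be pinned to the freed-up core so that what the other cores are doing matters as little as possible. The file names follow the r3345/r3711/r3712 builds mentioned in this thread, but the exact names, the core number, and the assumption that each standalone app picks up the test WU from its own working directory are placeholders to adjust.

#!/usr/bin/env python3
# Time each MB CPU app on a single pinned core (core 31 left free by BOINC).
import subprocess
import time

APPS = [
    "avx/MBv8_8.05r3345_avx_linux64",      # r3345 AVX build
    "sse41/MBv8_8.05r3711_sse41_linux64",  # hypothetical r3711 SSE41 file name
    "avx2/MBv8_8.05r3712_avx2_linux64",    # hypothetical r3712 AVX2 file name
]
FREE_CORE = "31"  # the thread left idle for benchmarking

for app in APPS:
    workdir, binary = app.rsplit("/", 1)
    start = time.perf_counter()
    # taskset pins the run to the free core so the loaded cores
    # influence the result as little as possible
    subprocess.run(["taskset", "-c", FREE_CORE, "./" + binary],
                   cwd=workdir, check=True)
    print(binary, ":", round(time.perf_counter() - start, 1), "s")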
18) Message boards : Number crunching : ECC Memory Correction Rate (Message 1912134)
Posted 10 Jan 2018 by Profile RueiKe Special Project $250 donor
During the downtime I increased the memory voltage from 1.2V to 1.21V, and after 25 hours I only have 8 CE. I will bump it up another 10mV the next time I reboot.
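For anyone tracking the same thing, here is a minimal sketch that reads the corrected and uncorrected error counters the kernel EDAC driver exposes under sysfs (it assumes the EDAC module for the memory controller is loaded; the path is the standard EDAC location, nothing Threadripper specific). Run it periodically to watch the CE rate after a voltage change.

#!/usr/bin/env python3
# Print ECC corrected/uncorrected error counts from the EDAC sysfs tree.
import glob
import os

for mc in sorted(glob.glob("/sys/devices/system/edac/mc/mc*")):
    counts = {}
    for name in ("ce_count", "ue_count"):
        path = os.path.join(mc, name)
        if os.path.exists(path):
            with open(path) as f:
                counts[name] = f.read().strip()
    print(os.path.basename(mc),
          "corrected =", counts.get("ce_count", "?"),
          "uncorrected =", counts.get("ue_count", "?"))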
19) Message boards : Number crunching : ubuntu Install (Message 1912005)
Posted 9 Jan 2018 by Profile RueiKe Special Project $250 donor
Since I recently found out about the availability of Linux AVX2 apps, I decided to revisit the work I did in this posting on performance comparisons of the different app versions and testing methods. Here are the new results. Previous results are lower in this thread.

My approach was to use the same single-core Linux VM on my 1950X Win10 desktop that I had previously used. The only difference is that this time the CPU is only 70% loaded. This showed AVX2 was best, but strangely invalidated my previous conclusion that AVX was better than SSE41. I suspect this is the result of running with the system 70% utilized. I got similar results using the new Win10 Linux subsystem. I released AVX2 on Eos and found that processing times increased about 10%. Then I decided to test directly on Eos, my main contributor to SETI. I set CPU usage to 97%, which left only 1 thread idle. I then ran one of the original test WUs and found that the results were not that far off from Nemesis, with the AVX2 app significantly faster than AVX. I then used one of the current WUs (second column of data) and found that AVX2 was worse and SSE42 was best. I released SSE42 on my system and processing times decreased by 10%.

Seems like the WUs coming in now are a bit different from what we received previously, and the previous optimizations no longer apply.
20) Message boards : Number crunching : Postponed: Waiting to acquire lock (Message 1911743)
Posted 8 Jan 2018 by Profile RueiKe Special Project $250 donor
Thanks again. Just wanted to be certain. So, MBv8_8.05r3345_avx_linux64 needs to go under the microscope.

One more item to point out: even though those 3 tasks were still active, I did not observe the "Waiting to acquire lock" error. Actually, I have only observed that error the one time I posted here. I was only raising these observations as potentially relevant.
