Posts by RueiKe

1) Message boards : Number crunching : Radeon Vega Frontier Edition (Message 1930485)
Posted 2 days ago by Profile RueiKe Special Project $250 donor
Post:
Try installing it in "Safe Mode"

I could install the driver in Safe Mode through Device Manager, but after restarting, the PC went into an endless BSOD-restart loop. I had to get back into Safe Mode and remove the driver. Now I am searching for a possible solution to this issue :( . Hope someone out there has a solution.

Edit: The issue is caused by the atikmdag.sys file


Which driver version are you installing? There is a specific section for the Frontier Edition on the AMD driver download page, and from there you have 2 options.
2) Message boards : Number crunching : Radeon Vega Frontier Edition (Message 1928936)
Posted 10 days ago by Profile RueiKe Special Project $250 donor
Post:
Heatkiller seems like a good alternative to EK. I will still try to see how well the card performs with the built-in air cooler first.
I have never run more than one component on a water cooling loop at a time, so just to get more info... can two or more GPUs be connected in series on a single cooling system? How will the temperatures be on the last card in the loop? Or is there an option to connect all the GPUs in parallel? Won't the pump also need to be of higher capacity to handle the extra tubing?


I have 3 systems with multiple GPUs and the CPU in the same cooling loop. I prefer to use parallel terminal blocks when I can, but in cases where the card slots don't align with standard parallel terminal blocks available, I use the semi-parallel x7 terminal block and block the unused slots. This terminal block has 4 parallel slots in series with 3 parallel slots. I have 2 systems using the EK D5 pump and 420mm radiator and 1 system using a 240mm radiator with a DDC pump. In my previous post I gave temps of dedicated loop vs. shared loop. At steady state, individual GPU temp differences are probably more the result of variable power/performance of each die.

Here is a photo of my most extreme build, with 3x ProDuo and a Threadripper 1950X all in the same loop. Temps are definitely limited by the cooling loop, with the CPU running at 60-65C and the GPUs all in the mid 40s. The GPUs are 3 cards in parallel, with each card having 2 GPUs in series. I thought about running the CPU and GPU loops in parallel, but I think at steady state it would not make much difference.
3) Message boards : Number crunching : Radeon Vega Frontier Edition (Message 1928759)
Posted 11 days ago by Profile RueiKe Special Project $250 donor
Post:
I am happy with EK's Vega waterblocks. I have the air-cooled VegaFE with the EK block and a complete-overkill 560mm Nemesis radiator. It pulls a max power of 210W and max HBM/GPU temps of 42C/37C while running SETI. I also have a quad RX Vega64 system with the same water blocks but a much smaller 240mm radiator, running a max temp of ~54C. I found that on Vega I can run with an sbs size of 2048, vs. 1024 on Fiji cards.
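If anyone wants to try the larger buffer, it is just a command-line switch for the OpenCL app. Depending on the build it goes in the app's mb_cmdline txt file or in the <cmdline> element of app_info.xml (file names vary by build, so check the readme that came with yours), along the lines of:

-sbs 2048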
4) Message boards : Number crunching : User achievements thread......... (Message 1924842)
Posted 16 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Just passed 125M on SETI.
5) Message boards : Number crunching : LLC on the Zenith Extreme (Message 1922653)
Posted 4 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Most of the OCN Zenith TR thread posts I have read that have something to do with recommended LLC settings for the Zenith say that the LLC behavior has varied somewhat among the various BIOS releases for that motherboard. That said, I see most people have settled on LLC4 or LLC5 for fully loaded systems, and most everyone is running at 4 or 4.1 GHz.


Even Buildzoid recommends Level 4 or 5 for ROG boards (specifically for Intel) for best results, but in the same video he suggests a flat-to-slight-droop curve for Vcore vs. load, LLC Video. My results show no droop even at the lowest levels of LLC. I am wondering if this is part of a marketing strategy of having a "more overclockable" MB: most people won't be aware that the actual Vcore will be 50mV higher than what they set, and will be satisfied with the great results they achieve at such "low voltages". Since my work was done in Linux, I don't have good access to software voltage readings for comparison. Also, I am using the ProbeIT terminals instead of reading a power pin on the back of the socket, so maybe there is additional voltage drop before the CPU.

I remember watching a video a while ago on LLC curves for various MBs, probably from der8auer, but I can't find it. It would be good to know if my observations are similar to those of others. I am running BIOS 0902 and plan to try this again on the next BIOS release.
6) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922294)
Posted 3 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
I think the issue mentioned on the 17.50 page concerns a screen corruption issue (a minor problem that I have seen). It seems like changes in the kernel break the 17.50 uninstall script. Also, there could be other compatibility issues. Hopefully the 1Q18 release comes out soon. I have not tried any open source drivers; not enough time to investigate.

Interesting; the notes on the 17.50 driver page said this was expected to be fixed in 16.04.4. Guess that didn't happen. I assume that is with the open source AMDGPU-Pro drivers. Have you looked at the additional open source Vulkan driver AMDVLK yet? AMD Open-Source Driver For Vulkan "AMDVLK" Is Now Available
7) Message boards : Number crunching : LLC on the Zenith Extreme (Message 1922291)
Posted 3 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
I spent some time investigating the impact of LLC on Vcore vs. loading on the Zenith Extreme MB with a 1950X CPU. I am running a mild OC in order to manage temps and power usage, so my Vcore setting is 1.2125V @ 3.725GHz. I used a Fluke 87V multimeter and took stable average measurements at the ProbeIT terminal. I adjusted loading using the % of CPUs setting in BOINC. Here are my results:

Based on these results, I will continue to use the Auto setting in BIOS. My reasoning is that since this system is continuously 100% loaded and is using a fixed clock frequency, a flat Vcore curve would be ideal. Seems like that is not possible with this BIOS, but auto seems to be the most appropriate curve for this use case. Does this reasoning seem correct?
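For anyone repeating the measurement, the load steps can be scripted instead of clicked through the manager. Here is a rough Python sketch; the data directory path and the load steps are assumptions for a stock Ubuntu boinc-client install, and it needs write permission in that directory:

#!/usr/bin/env python3
# Sketch: step BOINC's CPU usage so Vcore can be logged at each load level.
import subprocess, time

BOINC_DIR = "/var/lib/boinc-client"   # assumption; adjust to your data directory

OVERRIDE = """<global_preferences>
  <max_ncpus_pct>{pct}</max_ncpus_pct>
</global_preferences>
"""

for pct in (25, 50, 75, 100):         # load steps for the multimeter readings
    with open(BOINC_DIR + "/global_prefs_override.xml", "w") as f:
        f.write(OVERRIDE.format(pct=pct))
    # boinccmd looks for gui_rpc_auth.cfg in its working directory
    subprocess.run(["boinccmd", "--read_global_prefs_override"],
                   cwd=BOINC_DIR, check=True)
    print("Running at {}% of CPUs; take the Vcore reading now.".format(pct))
    time.sleep(300)                   # let the load settle before measuring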
8) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922244)
Posted 3 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Hi Keith, I did a quick search when I first encountered the problem, but I quickly went with the brute-force approach of just reinstalling the drivers in recovery mode, as I did not want to spend my entire Friday night working on it. Hopefully my approach doesn't leave any latent issues.

I also found that the 17.50 drivers are not compatible with 16.04.4. One way to avoid the issue is to uninstall the drivers before a kernel upgrade and reinstall them after. AMD driver releases for Linux are quarterly, so the release for this quarter should be compatible. I am considering doing a clean install of 16.04.4 from the ISO before installing the new drivers when they become available. I will also retry the 4th GPU card install afterward.
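The sequence I have in mind is roughly the following (the script names are the ones I believe AMD ships with the 17.x packages, so double-check against the release notes for your version):

sudo amdgpu-pro-uninstall                  # remove the current driver first
sudo apt update && sudo apt dist-upgrade   # take the new kernel, then reboot
cd amdgpu-pro-17.50-*/ && ./amdgpu-pro-install   # reinstall from the extracted package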

That is a common problem, with lots of posts in the Linux help forums. I believe the most common cause is permissions on the .Xauthority file, triggered by the removal of video drivers and the reset of the X.org server configuration. You might want to search on "endless login loop".
[Edit] Did the work for you. Stuck in login loop (Ubuntu 16.04)

Basically what you already discovered.
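For anyone who hits the loop and doesn't want to reinstall drivers, the usual check from those threads is whether ~/.Xauthority ended up owned by root (I didn't try this myself; I went the driver-reinstall route):

ls -l ~/.Xauthority
sudo chown $USER:$USER ~/.Xauthority

then restart the display manager or reboot.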
9) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922090)
Posted 2 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Latest update: after reinstalling the drivers, everything is running fine. I probably need to see it run for a week to be certain all is well. I did have an issue after uninstalling the AMD GPU drivers that I had never seen with 16.04.3: the system would not enter the desktop after login, it would just cycle back to the login screen. After I reinstalled the drivers while in recovery mode, the problem was resolved.

I also attempted to install a 4th graphics card by adding a Nano in addition to the 3 ProDuo cards. This was not successful; it gave an error about graphics before even getting to the login screen. Given the new problem I had with the uninstall in 16.04.4, I have to wonder if it would work in 16.04.3.
10) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922040)
Posted 2 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Good to know that I am not alone! I am running the latest GPU driver, so maybe I will just try a reinstall. Seems like I am running the same kernel release as you, 4.13.0-36-generic in my case. I was planning to test out adding a 7th GPU to the system, but the instability I am facing will complicate that.

The day Ubuntu released the Meltdown/Spectre fixes in a new kernel, concomitant with Nvidia driver 384.111, both Juan and I had issues with crashes and corrupted tasks. If you hadn't updated since then, you likely got bit too. Both Juan and I are on the HWE release kernel 4.13.0-36 now, and have moved on to Nvidia driver 390.25.
11) Message boards : Number crunching : Ubuntu 16.04.4 (Message 1922029)
Posted 2 Mar 2018 by Profile RueiKe Special Project $250 donor
Post:
Has anyone seen stability issues since upgrading to 16.04.4? I have had 2 system crashes and 2 cases of GPU compute hanging or giving computation errors in the few days since I upgraded. This is my Threadripper/ProDuo system 8365846. Perhaps it is related to the Meltdown/Spectre fixes, or maybe I need to reinstall the GPU drivers after the upgrade. I did the upgrade during the Tuesday downtime, and since then there has been a lot more Arecibo work, which could also be a different stress on my OC.
12) Message boards : Number crunching : Main Computer Down...again (Message 1914614)
Posted 22 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
I am happy with the results I am getting from my XSPC Raystorm Pro CPU waterblock.


Forgot to mention that the coverage issue is Threadripper-specific. EK just used their original design, which doesn't give good coverage of Threadripper's multi-die configuration.
13) Message boards : Number crunching : Main Computer Down...again (Message 1914611)
Posted 22 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
I'm 2 days into a 2-week trip to the US and see that my main system is down. I suspect the weather warmed up a bit, the room I moved this machine to got too warm, and the system became unstable. I probably need to upgrade the CPU waterblock, as the one from EK doesn't have good coverage. Also, auto reboot and restart of BOINC would be good. Anyone have a setup to reboot, auto-login, and restart BOINC in Linux?
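I don't have this automated yet, but the pieces I have in mind for a stock Ubuntu 16.04 install look roughly like this ("youruser" is a placeholder, and the sysctl only covers kernel panics; a hard hang would still need a hardware watchdog or the BIOS auto-power-on setting):

# /etc/lightdm/lightdm.conf -- auto-login to the desktop session
[Seat:*]
autologin-user=youruser
autologin-user-timeout=0

# start the BOINC client service at boot
sudo systemctl enable boinc-client

# reboot automatically 30s after a kernel panic
echo "kernel.panic = 30" | sudo tee -a /etc/sysctl.conf

If anyone already has a more complete setup, I would like to hear about it.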
14) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913454)
Posted 17 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
If somebody has some hard data showing just what the impact of rescheduling is on granted credits, or can run some new tests to generate a comparison, I think it would be very useful. When I first experimented with rescheduling in June of 2016, there were some people who said it did affect credit and others who said that was a myth that had already been put to rest long before.

So, just to make sure that my own rescheduling wasn't messing up other people's credit, I did some fairly extensive comparisons. My results were posted in Message 1799300. The conclusion I reached, based on those results, was that rescheduling had "no more impact to the credits than is caused by the random number generator that assigns them in the first place."

Rescheduling at that time simply meant moving Guppi VLARs that were originally assigned to the GPUs over to the CPUs, and moving non-VLAR Arecibo tasks that were originally assigned to the CPUs over to the GPUs. So, yes, tasks were being run on a different device than what they were originally assigned to, which is the issue that is being raised again here.

Now, perhaps things have changed in some way in the last year and a half, such that my previous conclusion is no longer valid. If so, I think new testing and documented results would be needed to demonstrate it.

My results also show that rescheduled work moved from GPU to CPU gets normal if not higher credit; see this example, 6317203128. My observation is that non-rescheduled WUs that ran after the rescheduling event get lower credit. This could be because the WUs after the outage are very different from the WUs before, but I am concerned that something is going on with the credit calculation after the rescheduling. Did the rescheduled work somehow change the reference for credit calculation of new WUs? Can information be extracted for the 2 WUs I referenced to verify this?
15) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913394)
Posted 17 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
No, that is NOT the cause. CreditScrew is the cause. Think about it. You moved tasks assigned originally to the GPUs on your system. The scheduler took into account the APR for the GPU application. You moved the GPU tasks temporarily to the CPU for bunkering. The scheduler and server have no knowledge of this action. You then move your GPU tasks, temporarily stored in the CPU cache, back to the GPU cache, where you process them during the outage.

What has changed? Nothing. You processed the originally assigned GPU tasks on the GPU, as intended. You get 50 credits per task. Thank you, CreditScrew.

In my case, I am not doing "bunkering". I moved a bunch of WUs to the CPU and left them there. I was not trying to get more tasks, only to keep my CPU fully loaded during the outage. I only run SETI and LHC, and LHC doesn't have GPU tasks, so my plan was to move tasks from GPU to CPU to keep the CPU loaded during the outage and use the GPUs for mining. But if this is messing up credit calculations for work done, then I won't do it.

Consistently low credit is not an issue for me. What I like about LHC is that credit is also very low, perhaps even harder to earn than on SETI, which makes the competitive computing aspect of it even more meaningful. I only raised the concern in this thread because some of the observations after rescheduling seemed extreme. Some tasks even came in below 20 credits, so I am still concerned that rescheduling is a factor. In this case there is also a shift in work unit types, so I am still uncertain what happened.
16) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913341)
Posted 16 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
If you go through your results, you'll see quite a few like that.
It's all to do with Credit New & the way it determines Credit.

My WAG (Wild Arse Guess): your GPU APR (Average Processing Rate) is only 183.57 GFLOPS; I suspect the theoretical value for the GPU is much, much higher (check out the first few startup lines in the log to see what the claimed FLOPS for that particular card is). For that WU your device peak FLOPS is 8,192.00 GFLOPS.
Yet your actual processing time is very quick, much faster than your APR would indicate.
Credit New considers your cards to be extremely inefficient (big discrepancy between APR & benchmark FLOPS): a very high device peak FLOPS with such a low APR means either it's really, really inefficient, or the numbers have been fudged (in its interpretation).
Credit New makes all sorts of assumptions, and if your figures don't meet those assumptions then it considers your figures to be a result of poor efficiency, or cheating, or both.
Final end result: bugger all credit for work done. By design.


I watch my machines quite closely and have not noticed sub-50 credit awards like this in the past. I did try something different during the latest work shortage: I decided to try the rescheduler for the first time. I used it to move several hundred GPU tasks to the CPU. Since I knew the machine would completely run out of work, I decided on a strategy to keep it fully loaded. I moved enough GPU tasks to the CPU to make sure the CPU would be fully loaded while I slept Sunday evening, and then enabled mining on the GPUs. In the morning, I stopped mining and un-suspended the GPUs. Everything looked normal. The only thing that doesn't make sense is that it is not the rescheduled work that is getting the lower credit; it is the work that actually ran afterward on the GPUs. Anyone think this is the cause?
17) Message boards : Number crunching : A very steep decline in Average Credits!!! (Message 1913307)
Posted 16 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
This example of credit granted for a normal WU definitely looks off:
2819745779
Only 26 credits for 360s of GPU work, or 3,935s of CPU work from the wingman, seems well beyond a Credit New effect. Perhaps there is a problem somewhere?
18) Message boards : Number crunching : ubuntu Install (Message 1912722)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
Hi RueiKe!

Message 1896117 on the previous page in this thread.

Perhaps you meant to say "but I can not get to advanced view" here?

Except for that, it becomes the same word twice, but still perhaps not the same.


That was so long ago that I cannot remember the details of what I was doing, but definitely all of the issues I was having were fixed by Tbar's latest Linux build of BOINCmgr.
19) Message boards : Number crunching : ubuntu Install (Message 1912721)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
Guess we should be running the SSE41 app on the BLC05 cpu tasks.

It's a wash between the r3345 AVX app and the r3711 SSE41 app, but the r3345 AVX app is 23% faster than the r3712 AVX2 app.


I have confirmed that my actual performance improved when I switched from AVX to SSE42, as my benchmarks indicated. It is probably a good idea to re-validate optimization choices when WU characteristics change.
20) Message boards : Number crunching : ubuntu Install (Message 1912720)
Posted 13 Jan 2018 by Profile RueiKe Special Project $250 donor
Post:
From my experience it depends on the tasks and the CPU load.
On my benches AVX2 was slower than SSE4.1 in most cases on my Ryzen 1800X.


I have definitely found that to be the case. My approach now is to free up 1 core on my machine for the benchmark runs and make sure the benchmark doesn't stop BOINCmgr from continuing to run tasks. I am still concerned that results may be influenced by what app type is running on the rest of the cores (will testing AVX be influenced by other cores running SSE42?).
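One way to make that question testable is to pin the benchmark run to the core that was freed up, so the only variable left is what the other cores are doing. A rough Python sketch; the benchmark binary, work unit file, and core number are placeholders for your own setup:

#!/usr/bin/env python3
# Sketch: pin a benchmark run to the one core left free, while BOINC keeps
# the remaining cores busy with whichever app type is being compared.
import os, subprocess, time

SPARE_CORE = 15                          # the core freed up in BOINC preferences (placeholder)

os.sched_setaffinity(0, {SPARE_CORE})    # this process and its children stay on that core

start = time.time()
subprocess.run(["./MBv8_bench_app", "test_workunit.wu"], check=True)  # placeholder command
print("Elapsed: {:.1f}s on core {}".format(time.time() - start, SPARE_CORE))

Running the same pinned benchmark once with the other cores on SSE42 tasks and once with them on AVX tasks should show directly whether the neighbours matter.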

