PC Build for my Dad
Author | Message |
---|---|
Jeyl Joined: 22 Sep 17 Posts: 19 Credit: 1,888,105 RAC: 0 |
Sigh. I tried installing Lunatics and now Boinc doesn't want to open. After re-installing it, it's back to its old slow self. I hope everyone knows that I am truly thankful for all the time you've spent replying to my questions on how to run this software. It's been great! I just think that I'm in way over my head on this one. To reiterate, the only purpose these machines serve is running Boinc. That's it. We've got no other apps running, with the exception of the ones you recommended like SIV64X. My Dad truly believes that one machine (16-core) is not pulling its weight. It's easy to see, since the 8-core machine finishes a task in 5 to 8 hours while the 16-core machine takes 12-15 hours. Problem is, neither of us understands most of what's being posted on this thread. Ghosts? Crippling the database? How are we crippling the database? If I can't get the answers or help on this thread, where else should I go? We've got machines built for Boinc! Why is it so hard to run the darn thing? |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Ghosts are work units that the server "thinks" you have even though they do not appear on your computer's hard drive. Example: say you start to download 64 work units and something, anything, happens that prevents the download from finishing, or your antivirus for whatever reason decides they are trojans, quarantines them and then deletes them. I've had it happen when 2 computers were sharing the same network and one sent a request while the other was downloading, interrupting the download. The server "thinks" you still have those work units even though you don't have them anywhere on your computer anymore. So now it thinks you have X+64 work units when you really only have X. Trouble is, that "ghost" number can grow into the hundreds or thousands, depending on how many computers you have. I remember having 400+ ghosts at one time. There is a trick to getting rid of them, but it's time consuming. Usually we just wait for them to time out (reach their deadlines). Unfortunately, that means they end up as errors and count against you at some point in the future. Now we need to get back to what is wrong with your TR machine. |
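To make the X versus X+64 bookkeeping concrete, here is a minimal Python sketch of the arithmetic described in the post above. The function name `ghost_count` and the example numbers are illustrative assumptions; this is not how BOINC itself reports ghosts.

```python
def ghost_count(server_in_progress: int, tasks_on_disk: int) -> int:
    """Tasks the server believes are assigned to a host but that are not on it.

    Both inputs are counts you would read off by hand: the "in progress"
    total shown for the host on the project website, and the number of
    tasks actually listed in the local BOINC Manager.
    """
    # Never report a negative number of ghosts if the two counts are out of sync.
    return max(server_in_progress - tasks_on_disk, 0)


# Hypothetical example: 100 tasks are really on disk (X = 100), and a
# 64-task download was lost, so the server counts X + 64 in progress.
tasks_on_disk = 100
lost_download = 64
print(ghost_count(tasks_on_disk + lost_download, tasks_on_disk))  # 64
```

If the two counts agree, the function returns 0 and there is nothing to clean up.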
betreger Joined: 29 Jun 99 Posts: 11362 Credit: 29,581,041 RAC: 66 |
A couple of questions: when you say only Boinc is being run on the machine, which Boinc projects is it running? Only S@H, or others such as E@H as well? As I understand it, this is a triple-card machine. I think I would learn to walk before trying to run, so to speak. I would run only one project with only one card for a while, at least a week, in order to get a baseline. The simpler things are, the easier they are to debug. The Seti output from the CPU is going to be pretty trivial compared to the (IIRC) GTX 1080 Ti cards you are using. An easy and useful monitor is GPU-Z. If the results are reasonable, then try 2 cards and see what happens. I would expect almost twice as much throughput with 2. |
Wiggo Joined: 24 Jan 00 Posts: 34947 Credit: 261,360,520 RAC: 489 |
Are you making sure that you are reserving 1 CPU core for each GPU task that is running? Running 1 task per GPU means that you need 3 free CPU cores to support those 3 GPUs, running 2 tasks per GPU requires 6 free CPU cores, etc. Cheers. |
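The rule of thumb in the post above (one free CPU core per running GPU task) can be written as a one-line calculation. This is only a sketch; the function name `cpu_cores_to_reserve` is a made-up helper, not a BOINC setting.

```python
def cpu_cores_to_reserve(num_gpus: int, tasks_per_gpu: int) -> int:
    """CPU cores to keep free so that every running GPU task has its own core."""
    return num_gpus * tasks_per_gpu


# The two cases from the post: a 3-GPU machine running 1 or 2 tasks per GPU.
print(cpu_cores_to_reserve(3, 1))  # 3
print(cpu_cores_to_reserve(3, 2))  # 6
```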
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"Are you making sure that for each GPU task running that you are reserving 1 CPU core?" That's not the problem; there is a hardware issue with the non-performing system. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"A couple of questions, when you say only Boinc is being run on the machine, what Boinc projects is it running? Only S@H or others such as E@H also?" He needs to fix the hardware issue on the problem system before trying to get anything else to run on it. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"Sigh." Only install Lunatics on the system that is presently working, the 1900X. There is no point trying to install it on the faulty 1950X until you fix whatever is wrong with that system. Use a programme such as Hardware Info to find out the clock speed & the temperature of the 1950X system. That system is completely broken and not working as it should. Edit- I would also suggest a programme such as Process Explorer to make sure nothing other than BOINC & Seti is running on the system. And I'd do as betreger suggested & have just the one Video Card in the system till you get its problems sorted out. For the 1900X system that is working: Get the Lunatics Beta 6 programme. Exit BOINC. Run the installer (make sure you use the Win64 one). Make sure you select AVX for the CPU and SoG for the Nvidia GPU. Once it is installed, restart BOINC and use Task Manager to check that the new applications are running (they will be MB8_win_x86_SSE3_OpenCL_NV_SoG_r3557.exe and MB8_win_x64_AVX_VS2010_r3330.exe). Once they are running, in the C:\ProgramData\BOINC\projects\setiathome.berkeley.edu folder there will be a file mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt (it's different from the stock application file name). Put the following command line in it and save it (use Notepad only, not Word or WordPad): -hp -period_iterations_num 1 -high_perf -high_prec_timer -sbs 2048 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock In the same folder, put a text file named app_config.xml with the following contents: <app_config> <app> <name>setiathome_v8</name> <gpu_versions> <gpu_usage>1.00</gpu_usage> <cpu_usage>1.00</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <gpu_versions> <gpu_usage>0.5</gpu_usage> <cpu_usage>1.0</cpu_usage> </gpu_versions> </app> </app_config> In the BOINC Manager, select Options, Read config files for the settings to take effect. 
That should more than double the amount of work the 1900X is presently doing. Grant Darwin NT |
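Because the forum flattens the app_config.xml from the post above onto one line, here is a hedged Python sketch that writes the same contents out with normal indentation and checks that the result is well-formed XML. The element names and values are copied from the post; the helper name `write_app_config` and the use of a temporary folder in the demo are assumptions for illustration. On a real machine you would point it at the project folder named in the post.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Same contents as in the post, only re-indented for readability.
APP_CONFIG = """<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1.00</gpu_usage>
      <cpu_usage>1.00</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def write_app_config(project_dir: Path) -> Path:
    """Write app_config.xml into project_dir and return the file path."""
    # Parsing first raises xml.etree.ElementTree.ParseError if a tag is
    # unbalanced, so a broken file is never written.
    ET.fromstring(APP_CONFIG)
    target = project_dir / "app_config.xml"
    target.write_text(APP_CONFIG, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a throwaway folder instead of the live BOINC project folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_app_config(Path(tmp))
        for app in ET.parse(path).getroot().findall("app"):
            print(app.findtext("name"), app.findtext("gpu_versions/gpu_usage"))
```

After the file is in place, the Options, Read config files step from the post is still needed for BOINC to pick it up.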
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
".... It's easy to see since the 8-core machine finishes a task in 5 to 8 hours while the 16-core machine is up to 12-15 hours." Five to 15 hours??? My AMD6400 is 8 hours. Both of them should be under 1.5 hours for multibeam tasks. Serious setup issues. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"Five to 15 hours??? My AMD6400 is 8 hours. Both of them should be under 1.5 hours for multibeam tasks. Serious setup issues." The 1900X CPU run times should certainly be better, but at least its GPU run times are OK. And at least those CPU run times are still way better than the 1950X CPU & GPU run times. Grant Darwin NT |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Sigh." Is your Dad the one that likes the Macs? If so, tell him to ditch Windows, buy a couple of 2009 Mac Pro 4,1 machines, place two 1080Ti cards apiece in them, and run the CUDA Special App under Sierra. A 1080Ti running the Special App is good for almost 100k. That means those two Macs with two 1080Ti cards apiece would be in the Top 10 SETI machines very quickly, probably in the Top 5 at near 200k each. That should make him happy. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, some confusion in the posts. Right now the 16 core TR is reported as having only the expected number of tasks on board, at 400. Other than running the stock applications that the project delivered, it is running somewhat OK. Still, some gpu tasks are running overlong, likely because the system is overcommitted on cpu tasks and each gpu task isn't getting the dedicated cpu core it requires. The cpu tasks are taking much longer than they should. Again, it looks like an overcommitted cpu with too many running cpu tasks. I would use a <project_max_concurrent>16</project_max_concurrent> statement in app_config.xml. That would give you 13 cpu tasks and 3 gpu tasks running concurrently. That reserves a couple of cpu cores for general desktop housekeeping and prevents the cpu from being overloaded. Look at the run_time versus cpu_time in the stderr.txt output for a cpu task and try to get the two times equal. That runs the cpu app most efficiently. You need to use the <avg_ncpus>1</avg_ncpus> <ngpus>1</ngpus> statements in app_config to dedicate the required cpu core to each gpu task. This is all predicated on using the latest Lunatics Installer to get the Anonymous platform installed with the optimized apps running. Now for the 8 core TR system. That system needs to just reset the project to start from scratch and get rid of the +11,000 'ghosts' on the system. Each 'ghost' on a host reserves a space in the server database and contributes to a bloated condition. Our past issues with the project have been database related. Let's help the servers out as much as possible and not contribute to the problem. By resetting the host, you will return the 'ghosts' to the general pool of tasks for delivery to other hosts. Then monitor that host to make sure that it never has more than 400 tasks in progress. If it starts building up more than that, you need to figure out why you are creating 'ghost' tasks. 
Only after you stop it from creating ghosts would I worry about installing the optimized app from the Lunatics Installer for the Anonymous platform. [edit] Just to clear up the confusion: the 8 core TR 1900X system is Host 8389828 and the 16 core TR 1950X system is Host 8371071. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
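The two numbers in the post above (how a limit of 16 concurrent tasks splits into 13 CPU and 3 GPU tasks, and how close cpu_time is to run_time) can be checked with a short Python sketch. The function names and the sample times are hypothetical; only the 16 / 13 / 3 figures come from the post.

```python
from typing import Tuple


def split_concurrent(max_concurrent: int, num_gpus: int,
                     tasks_per_gpu: int = 1) -> Tuple[int, int]:
    """Return (cpu_tasks, gpu_tasks) for a given project_max_concurrent value.

    Assumes the GPU tasks are counted first and whatever is left of the
    limit goes to CPU tasks, as the post describes.
    """
    gpu_tasks = num_gpus * tasks_per_gpu
    cpu_tasks = max(max_concurrent - gpu_tasks, 0)
    return cpu_tasks, gpu_tasks


def cpu_time_ratio(cpu_time_s: float, run_time_s: float) -> float:
    """cpu_time divided by run_time for one finished CPU task.

    A value near 1.0 means the task had a core to itself; a much lower
    value suggests the CPU is overcommitted.
    """
    return cpu_time_s / run_time_s


print(split_concurrent(16, 3))         # (13, 3)
print(cpu_time_ratio(3600.0, 7200.0))  # 0.5, made-up times for illustration
```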
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"You need to use the <avg_ncpus>1</avg_ncpus> <ngpus>1</ngpus> statements in app_config to dedicate the required cpu core to each gpu task." The app_config.xml I posted does that also. "Now for the 8 core TR system. That system needs to just reset the project to start from scratch and get rid of the +11,000 'ghosts' on the system." I wouldn't even do that at this stage; it's just going to create more ghosts. They need to sort out just what is wrong with the system. Low clock speed, overheating? Whatever it is, I suspect that it is what is causing the ghosts, along with the ridiculously long CPU & GPU run times. Then they can sort out the long CPU run times on the system that is presently running (mostly) OK. The one question I asked before, and which has yet to be answered, is how many memory modules make up the 16GB of RAM? I suspect it's only 2*8GB modules, not 4*4GB, and that would explain the long run times, since Threadripper is a quad channel CPU. I remember someone with a multi socket system running with memory in only a couple of banks & the long runtimes it had. Once there was some RAM in at least 1 slot of each bank, the crunch times improved hugely. Grant Darwin NT |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Exactly. If they have not reserved enough CPU cores for GPU support, that would create a bottleneck on the CPU and poor performance of the GPUs. To keep it simple, try going to 'Computer preferences', then 'Processor settings', and where it says 'use at most xx% of processor cores' make sure it is NOT 100%. I suggest trying 75% to see if that makes a difference. Stephen |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Good point, I hadn't noticed the RAM on the 1950X system. You don't HAVE to run Threadripper in quad channel as long as you configure the memory access correctly in the BIOS. After all, Threadripper is made of two Ryzen 1800X dies, and Ryzen only has a dual channel memory controller in it. We need to find out exactly how many memory sticks are installed and in which memory slots for each system. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeyl Joined: 22 Sep 17 Posts: 19 Credit: 1,888,105 RAC: 0 |
"We need to find out exactly how many memory sticks are installed and in which memory slots for each system." Two 8 GB sticks in each machine. |
Jeyl Joined: 22 Sep 17 Posts: 19 Credit: 1,888,105 RAC: 0 |
Wait one second! Are these the 1080 video cards that I've got in my two machines right now? I thought Macs needed video cards that were specifically built for the Mac in order to function. Is that what you're getting at? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"We need to find out exactly how many memory sticks are installed and in which memory slots for each system." Read your motherboard manuals for the recommended memory slot population and FOLLOW the instructions. Unless you configure your BIOS for NUMA and have a dual-ranked memory stick installed for each TR die, you are going to have the issues you have experienced. The best thing you could do would be to buy two more memory sticks for each machine and install them in the manufacturer-specified memory slots. You have not properly built the machines in their current configuration. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, since around 2012, yes 2012, most nVidia cards will work in the Mac Pro by just installing the nVidia Webdriver. According to the posts at MacRumors there shouldn't be any trouble running a 1080Ti in a Mac Pro 4,1. I haven't had any trouble running nVidia GPUs in my Mac Pro 3,1, which is currently ranked #16 running just two 1060s and a 1050Ti. I wish I could afford to pop in a couple of 1080Ti cards. If you have a standard card it should work; I usually use EVGA, GigaByte, or Powercolor. The only thing you need a card from Apple for is to display the EFI system chooser, which you can easily live without. If you have room, it's best to keep the Apple card installed, or at least available, in case you need to boot to the system chooser or update the Webdriver after a Security update. It would be best to stay with Sierra, since there are only Security updates now and the CUDA App works well with Sierra. You can look at the thread at MacRumors and check it out: Frequently Asked Questions About NVIDIA PC (non-EFI) Graphics Cards. Remember, you want at least a Mac Pro 4,1 and Sierra. Do NOT try it with High Sierra. BTW, tell your Dad I'm the one that compiled the GPU Apps his Macs are running....there are better ones available. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
"We need to find out exactly how many memory sticks are installed and in which memory slots for each system." I would suggest taking the memory from the 1950X and putting it in the 1900X to see if the CPU times improve. Threadripper is a quad channel CPU, and for the best performance it needs memory in all 4 banks, not just 2 as with most dual channel CPUs. I would also check with HWInfo or similar that the memory timings are correct. Ryzen CPUs are still very particular about the type of memory they use, and the more modules you use the more particular they are. As Keith said, you also need to check the manual and make sure you are using the right memory slots, and have the BIOS configured correctly for the memory slots you are using. If this results in a boost in CPU processing, then you have that system set up as well as it can be. Then you still need to determine what is wrong with the 1950X to make it (and its video cards) perform so poorly. As I have asked 3 times previously: what is the CPU clock speed, and what is the CPU temperature? And to add to that, what are the memory timings and clock speed (and what are they supposed to be)? EDIT- I just had a look at a manual for a ROG STRIX X399-E GAMING ThreadRipper motherboard. The memory slots you use if there are fewer modules than slots are not the ones you would think to use. It used to be that you used the slot(s) closest to the CPU. Not on this motherboard. Grant Darwin NT |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I see that both systems are now reporting 32GB of RAM. Let's hope that is with at least 4 sticks of RAM installed in the correct slots for each motherboard. The CPU completion times are still out to lunch though, even allowing for the low performance of the stock CPU app. Could we please get some memory clock speeds and also some cpu core clock speeds for each system? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.