Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Haha, I was about to buy a 1060...but then I forgot the mobo only supported pcie 2.0. This was one of the "better" cards I could find at microcenter. https://www.msi.com/Graphics-card/GT-710-2GD3H-LP.html Nice and cheap...and thankfully they still had some variety to choose from that worked with pcie2.0! Only bad thing...this one is passive. I made sure to upgrade the case fans. Hopefully that's all I need to do. Will find out when the temps rise due to GPU tasks. . . The bad news is that the GTX1060, like most of the nVidia cards, is backwards compatible and will run OK on the PCIe gen 2.0 mobo, but only as gen 2.0 obviously :) Stephen :) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, the beta r3602 app doesn't seem much faster than the stock r3584 8.22 Linux SoG app. I have a hunch the zi3v app will be much faster even if you are forced to run with unroll 1 or 2 because of only 1GB of memory. This is what TBar wrote in his README_x41p_zi3v.txt file in the /DOCS directory of the archive. It seems to say you can get away with unroll 6 for a 2GB card, though he does mention "testing is required". So it might be best to start by removing the unroll and letting it use the default autotune, then proceed from there with an unroll override of 2 - 6 and see what works. For best use; Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I wouldn't worry about running a gpu in PCIe Gen 2.0 mobos, just like I don't worry about how many lanes the gpu gets. I have cards running in X16, X8 and X4 slots and it makes nary a difference in compute times. Our Seti tasks just don't use or need much bandwidth. If you are actually going to use a card for video, like gaming, then it does make a difference. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I set up my host https://setiathome.berkeley.edu/show_host_detail.php?hostid=8702456 which has a GT 720 (1GB) and a GT 730 (2GB) to run the x41p_zi3v special app with the following results: The 720 gets computation errors while the 730 works fine. So now I am running just one GPU. When the 720 was erroring out I set unroll to 1 and I also tried it without an unroll command. I am not sure why I was getting computational errors on the 720. The x41p_zi3v app seems to have cut off about a half hour of computational time versus the Beta SOG app (45 minutes versus 75 minutes). This is a very small sample size and the blc63 WUs seem to take an hour and a half. Any suggestions on how to get the 720 to work? Is it possible to have the 720 running the Beta SOG app while the 730 runs the x41p_zi3v app? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
I am not sure why I was getting computational errors on the 720. The error message is Cuda error 'cudaMalloc((void**) &dev_tmp_potP2' in file 'cuda/cudaAcceleration.cu' in line 644 : out of memory. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I set up my host https://setiathome.berkeley.edu/show_host_detail.php?hostid=8702456 which has a GT 720 (1GB) and a GT 730 (2GB) to run the x41p_zi3v special app with the following results: The 720 gets computation errors while the 730 works fine. So now I am running just one GPU. Thanks for the update, Loren. Happy to see some results from that app finally. The reason why the 720 always gets errors is printed right in each stderr.txt. Cuda error 'cudaMalloc((void**) &dev_tmp_potP2' in file 'cuda/cudaAcceleration.cu' in line 644 : out of memory. So the app cannot work in just 1 GB of video memory but requires 2GB. Looks like you are not using the -nobs parameter. If you reduced your cpu usage to free up a cpu core, you could try the parameter and see if it makes any difference. It should . . . . . by how much . . . . who knows? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I see that ThePHX264's host has completed gpu tasks also. He used the cmdline I originally put into the app_info of -unroll 6 and -nobs with no issues, it seems. He too had to crunch the long running BLC63, 53 and 43 tasks. Curious how the app would handle the shorter tasks we see regularly like the BLC34 series or some VHAR Arecibo tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I set up my host https://setiathome.berkeley.edu/show_host_detail.php?hostid=8702456 which has a GT 720 (1GB) and a GT 730 (2GB) to run the x41p_zi3v special app with the following results: The 720 gets computation errors while the 730 works fine. So now I am running just one GPU. Thanks Keith and Richard for your responses. I am using the -nobs parameter. When I run top the CPU usage is 100%. Here is the relevant part of my app_info.xml file. I changed to -unroll 6 when just using the 730. <app> <name>setiathome_v8</name> </app> <file_info> <name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda60</name> <executable/> </file_info> <file_info> <name>libcudart.so.6.0</name> </file_info> <file_info> <name>libcufft.so.6.0</name> </file_info> <app_version> <app_name>setiathome_v8</app_name> <platform>x86_64-pc-linux-gnu</platform> <version_num>802</version_num> <plan_class>cuda60</plan_class> <cmdline>-nobs</cmdline> <coproc> <type>NVIDIA</type> <count>1</count> </coproc> <avg_ncpus>0.1</avg_ncpus> <max_ncpus>0.1</max_ncpus> <file_ref> <file_name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda60</file_name> <main_program/> </file_ref> <file_ref> <file_name>libcudart.so.6.0</file_name> </file_ref> <file_ref> <file_name>libcufft.so.6.0</file_name> </file_ref> </app_version> I noticed that the CPU runs the blc63 WUs faster than the GPUs, so I am probably better off running three cores of the i5. I have been able to find the GTX 1050 Ti for around $50-$60 on eBay and craigslist. But for this computer I need the mini (low profile) version, as this is a SFF computer and the one I use daily. Right now they are going for twice that price on eBay and none are to be found on craigslist. The hunt continues... |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, you need to reserve a cpu core for the gpu task if you use the -nobs parameter. If you don't, the gpu task will get starved for time slices and the crunch time will get extended. You could simply bump up the cpu resource allocation to <avg_ncpus>1.0</avg_ncpus> <max_ncpus>1.0</max_ncpus> in the gpu section of your app_info or in an app_config, and you can reduce your cpu usage to 75% in Local Preferences to only use 3 of your 4 cpu cores. I saw that for some task types the cpu outperforms the gpu. But I don't know if that was before the addition of the -nobs or after. If you are running at 100% cpu usage, both task types are going to struggle to get enough time slices. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
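For anyone following along, the app_config route mentioned here could look roughly like the sketch below. It is only an illustration: it assumes the app name setiathome_v8 from the app_info.xml posted earlier in the thread, and it uses the gpu_usage and cpu_usage elements of BOINC's app_config.xml rather than editing avg_ncpus in app_info.xml directly. The file would go in the SETI@home project directory.

```xml
<!-- app_config.xml: sketch only, not a tested configuration -->
<app_config>
  <app>
    <!-- app name taken from the app_info.xml posted in this thread -->
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- run one task per GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- budget a whole CPU core for each GPU task, as suggested for -nobs -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

With a full core budgeted per GPU task, BOINC should schedule one fewer CPU task for each running GPU task, which is the effect the advice above is after.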
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
Yes, you need to reserve a cpu core for the gpu task if you use the -nobs parameter. If you don't, the gpu task will get starved for time slices and the crunch time will get extended. You could simply bump up the cpu resource allocation to <avg_ncpus>1.0</avg_ncpus> <max_ncpus>1.0</max_ncpus> in the gpu section of your app_info or in an app_config, and you can reduce your cpu usage to 75% in Local Preferences to only use 3 of your 4 cpu cores. I have set the computing preferences in the boinc manager to use at most 50% of the CPUs. This results in the following CPU core usage: Seti GPU app = 100% Seti CPU app = 100% Einstein GPU app = 100% The remaining CPU is at less than 5%. I think I am OK and am not starving any of the apps. |
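The "use at most 50% of the CPUs" preference set in BOINC Manager is normally stored in global_prefs_override.xml in the BOINC data directory. A minimal sketch of what that file might contain, assuming the standard max_ncpus_pct element and nothing else overridden; on the 4-core i5 discussed here, 50% corresponds to a budget of 2 cores:

```xml
<!-- global_prefs_override.xml: sketch only; other local overrides omitted -->
<global_preferences>
  <!-- use at most 50% of the CPUs (2 of 4 cores on this host) -->
  <max_ncpus_pct>50.0</max_ncpus_pct>
</global_preferences>
```

Setting the value through the Manager's computing preferences dialog, as described above, should have the same effect as editing the file by hand.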
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I'm still seeing the host with both the 720 and 730 cards installed in your latest reported gpu tasks. I thought you said you pulled the 720 so it wouldn't error out any more gpu tasks. There is a way to exclude the 720 from Seti in cc_config so it won't be used. That way you could use it for other projects or maybe use the 720 to drive the monitor and have the 730 just crunch. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
I am using the 720 card for Einstein now. Previously I set the <use_all_gpus> to 0 and it stopped using the 720 so there was no need to pull it. |
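The use_all_gpus option mentioned here lives in the options section of cc_config.xml. A minimal sketch, assuming no other options are set; with the value 0, BOINC uses only the GPU it considers most capable, which on this host was presumably the GT 730:

```xml
<!-- cc_config.xml: sketch only -->
<cc_config>
  <options>
    <!-- 0 = use only the most capable GPU; 1 = use every GPU BOINC detects -->
    <use_all_gpus>0</use_all_gpus>
  </options>
</cc_config>
```

Note that running the 720 on Einstein alongside the 730 on Seti needs both cards visible to BOINC, so this setting would have to go back to 1 for that arrangement.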
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Thanks Keith and Richard for your responses. . . Hi, like you my machine in this case is an HP SFF 8000 elite so I had difficulty sourcing cards for it. I found only 2 manufacturers supporting this format in GTX1050ti cards, MSI and Gigabyte, and I went with MSI both for the GT730 and the GTX1050ti that replaced it. The times I quoted previously for tasks were for the then plentiful 'normal' Arecibo WUs. I used them as benchmarks to compare apples with apples. At the time the 'old' Blc04 tasks were taking about 10 mins longer, ie around 37 to 38 mins, and I would expect the typical examples of the current 'new' GBT formats to take no longer except for the occasional super long runners. BUT, and I probably should have mentioned this in the other message, my GT730 is a "smarter than the average bear" version. It is factory clocked to 1006MHz instead of the Nvidia norm of 902MHz, plus it has 2GB of GDDR5 ram, not the average DDR3 ram. Overall this gave it about a 20 to 30% speed advantage over its more typical cousins. But at the time (and we are talking almost 2 years ago) more run of the mill 730s were taking about 45 to 50 mins per task. Based on this I would have expected you to get run times of no more than that. . . My HP is only a Core2 Duo so I was running with NO CPU crunching and -nobs set. I had -unroll set to 2 and that was about it. In your case that would translate to 'Number of CPUS' equals 50% and turn -unroll back to 2. Before you abandon GPU crunching you might try those settings to see if it does any good for you. It would be interesting to find out. . . For reference that machine now with the 1050ti and still using those settings on Cuda90 is here ... https://setiathome.berkeley.edu/show_host_detail.php?hostid=8222433 Stephen |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You can also just exclude the GT 720 from Seti with an exclude_gpu statement in cc_config.xml. You would use whatever device number BOINC labels the GT 720 in the Event Log at startup to identify the device. <exclude_gpu> <url>http://setiathome.berkeley.edu/</url> <device_num>1</device_num> <type>NVIDIA</type> </exclude_gpu> Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
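In a complete cc_config.xml, the exclude_gpu block sits inside the options section. A minimal sketch follows; the device number 1 is simply the value from the example above and is an assumption for any particular host, so check which number the Event Log assigns to the GT 720 at startup:

```xml
<!-- cc_config.xml: sketch only -->
<cc_config>
  <options>
    <exclude_gpu>
      <!-- project the exclusion applies to -->
      <url>http://setiathome.berkeley.edu/</url>
      <!-- device number BOINC reports for the GT 720 in the Event Log (assumed 1 here) -->
      <device_num>1</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```

Because the exclusion is tied to one project URL, the excluded card stays available to other projects such as Einstein.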
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
You can also just exclude the GT 720 from Seti with an exclude_gpu statement in cc_config.xml. You would use whatever device number BOINC labels the GT 720 in the Event Log at startup to identify the device. That is what I have done. |
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
Thanks Keith and Richard for your responses. Hi, my computer is an HP 8200 Elite SFF with an i5 CPU. Right now I have unroll set to 6. It is hard to compare how well the GPU is running because of all the different WUs we are getting lately. I will have to give it some time and see how things shake out. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . My HP is only a Core2 Duo so I was running with NO CPU crunching and -nobs set. I had -unroll set to 2 and that was about it. In your case that would translate to 'Number of CPUS' equals 50% and turn -unroll back to 2. Before you abandon GPU crunching you might try those settings to see if it does any good for you. It would be interesting to find out. . . Yep, the number of different tape series and the variability within tape series make it hard to establish a 'norm' at the moment. But please, indulge me, try setting -unroll to 2 and see if there is any change/improvement in the card's performance. Just to answer the question. :) Stephen |
Loren Datlof Send message Joined: 24 Jan 14 Posts: 73 Credit: 19,652,385 RAC: 0 |
You have been indulged. -unroll has been set to 2. |
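In app_info.xml terms, that change amounts to editing the cmdline element of the GPU app_version. A sketch, assuming the rest of the app_version block stays as posted earlier in the thread; both flags come from this discussion, and their order here is arbitrary:

```xml
<!-- inside the cuda60 app_version block of app_info.xml; sketch only -->
<cmdline>-nobs -unroll 2</cmdline>
```

Edits to app_info.xml generally only take effect after the BOINC client is restarted.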
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Very few examples so far with unroll = 2. But what I have observed in the limited data set is there is no difference between unroll = 6 and unroll = 2. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
elec999 Send message Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
What kind of results are you guys getting from the 1050 Ti? I saw these cards on eBay for $28.99. Too good to be real? |