V8 CUDA for Linux?

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1783432 - Posted: 29 Apr 2016, 8:56:37 UTC - in response to Message 1783430.  

Stock Boincapi and the client don't cope extraordinarily well with buffered IO and threading. You can try renicing to a more normal process priority (which may help).

Hmmm, that could help. I'm using the <no_priority_change>1</no_priority_change> option on my Ubuntu 16.04 machine, and I'm not getting those errors. I ran the cuda42 App for a while but switched to the "Special" App after seeing how hard the credits are falling again. tazzduke is getting that error on both of his Ubuntu 16.04 machines.
ID: 1783432
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1783433 - Posted: 29 Apr 2016, 9:01:15 UTC - in response to Message 1783432.  

OK, after this Cuda 8 testing is done, I will do a little research to see what might have changed in the Linux kernel and/or C libraries. My money is on the threading/IO, and that could mean a lot of problems on the horizon for Boinc.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1783433
Mr. Kevvy
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1783434 - Posted: 29 Apr 2016, 9:29:05 UTC - in response to Message 1783183.  
Last modified: 29 Apr 2016, 9:30:29 UTC

Yes, it looks as though Modprobe is back in the repository with 16.04; as we know, it's not in the Mint repository.


It just appeared for me last night (Mint 17.3); the packages are cuda-drivers, nvidia-graphics-drivers-352, nvidia-modprobe and nvidia-settings in the Mint Update Manager, all at 352.93. I installed them on this machine with no known issues thus far. I'll probably do the rest whenever I have my next scheduled "patch Tuesday" clean day.
ID: 1783434
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783438 - Posted: 29 Apr 2016, 10:04:28 UTC

Greetings All

Thanks for the heads-up, TBar. I just went and had a look and, yep, I got an error still awaiting upload, and as soon as I get one, BOINC locks up. I checked the other machine and it did the same. Here comes a reboot.

MINT 17.3 - NVIDIA 361.42 - KERNEL 4.4.0 - BOINC 7.6.2. Only started getting them today.

Regards
ID: 1783438
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1783441 - Posted: 29 Apr 2016, 10:24:40 UTC - in response to Message 1783434.  
Last modified: 29 Apr 2016, 10:46:38 UTC

Last night? Hmmm, I don't see a 352.93 at nVidia, so I suppose this is a repository 'special'. You think they are reading this thread? Is the driver in the Package Manager? If it is, is it listed as something other than 0ubuntu0.14.04.1?

@tazzduke 7.6.2? That's a new version too. I'm using the old 7.4.25 from Berkeley that only updates the top two tasks; to update anything else you have to click on it or change the tabs. I haven't had any lock-ups though.
You're running Mint with the 4.4 kernel? That could be the problem.
ID: 1783441
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783443 - Posted: 29 Apr 2016, 10:48:04 UTC

Greets

Yeah, I got that through the PPA from loctusborg; haven't had one for a while now though. I think I mentioned it in a previous post.

Regards
ID: 1783443
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783444 - Posted: 29 Apr 2016, 10:50:14 UTC

Hey, that could be it, but it's the same kernel version that ships with Ubuntu 16.04 LTS.

PS. How do you edit your previous posts?
ID: 1783444
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783447 - Posted: 29 Apr 2016, 10:56:43 UTC - in response to Message 1783444.  

PPS. Going to wait for my queue of SETI tasks to empty before I do anything at the moment.
ID: 1783447
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1783451 - Posted: 29 Apr 2016, 11:12:37 UTC - in response to Message 1783447.  

There's an Edit button at the top of your posts. You have an hour to edit each post.

You could try the <no_priority_change>1</no_priority_change> option in your cc_config.xml file. With it set, the client leaves task priority alone, so all the BOINC tasks end up running at nice level zero instead of at idle priority. I leave at least one core free and I haven't noticed any machine lag.

<cc_config>
 <log_flags>
  <sched_op_debug>1</sched_op_debug>
 </log_flags>
 <options>
  <dont_contact_ref_site>1</dont_contact_ref_site>
  <use_all_gpus>1</use_all_gpus>
  <save_stats_days>365</save_stats_days>
  <no_priority_change>1</no_priority_change>
 </options>
</cc_config>
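
For anyone who would rather script the change than edit the file by hand, here is a minimal Python sketch that adds or updates one entry under <options> in cc_config.xml text. The helper name set_cc_option is made up for illustration and is not part of BOINC. It assumes the file is well-formed XML with a <cc_config> root, as in the example above, and it does not preserve XML comments.

```python
import xml.etree.ElementTree as ET


def set_cc_option(xml_text: str, name: str, value: str) -> str:
    """Return cc_config XML text with <options>/<name> set to value.

    Hypothetical helper for illustration; not a BOINC API.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create the <options> section if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    # Update the existing entry, or append a new one.
    node = options.find(name)
    if node is None:
        node = ET.SubElement(options, name)
    node.text = value

    return ET.tostring(root, encoding="unicode")
```

Typical use would be to read cc_config.xml from the BOINC data directory, pass its text through set_cc_option(text, "no_priority_change", "1"), and write the result back. The client still has to re-read the file afterwards, for example on its next restart.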
ID: 1783451
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783452 - Posted: 29 Apr 2016, 11:21:18 UTC

Well, there you go, it was staring me straight in the face, lol. I will give that a go, but then I haven't had a bad WU or lockup for an hour now.

Thank you
ID: 1783452
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783606 - Posted: 29 Apr 2016, 23:38:57 UTC

Greetings All

Nil lockups or errors since yesterday afternoon.

Regards
ID: 1783606
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1783608 - Posted: 29 Apr 2016, 23:51:58 UTC - in response to Message 1783606.  

Was that with the no_priority_change flag set, or just left as it was?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1783608
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783632 - Posted: 30 Apr 2016, 2:36:51 UTC
Last modified: 30 Apr 2016, 3:24:51 UTC

Morning All

Well yes it was, but then I let the cache run dry and haven't been able to get tasks for a while to do some more testing on both NVIDIA/MINT 17.3 computers.

I hope to give more feedback during the course of the day; I have work now.

Have enabled the following flags as well

<cpu_sched_debug>: problems involving the choice of applications to run.
<work_fetch_debug>: problems involving work fetch (which projects are asked for work, and how much).
<rr_simulation>: problems involving jobs being run in high-priority mode.
<sched_op_debug>: problems involving scheduler operations and other low level information.

Talk to you soon
Mark
ID: 1783632
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1783642 - Posted: 30 Apr 2016, 4:37:35 UTC
Last modified: 30 Apr 2016, 4:41:46 UTC

Skimming 4.x Kernel changelogs, looks like multiple queue block layers (software and hardware level) were added to better support NVM(e) storage and scaling to multiple cores (spreading IO/driver load). That means functionally the same race conditions now exist as on Windows (with different semantics/buffer-structure/delays), as the symptoms and apparent workaround suggested.

Most likely, different applications running at reduced priority will trigger this particular finish-file-present-too-long issue, depending on system contention and the kinds of devices involved.

Down the road before recommending Boinc/api fixes, will probably have to attempt to replicate here using a newer Kernel and/or devices.
In the meantime I'd suggest the no_priority_change workaround should help in the majority of cases where it occurs on Linux. Probably a more complete fix to allow running at idle priorities (as intended) will involve more than one of the following:

- restoring normal priority for exit [very brief]
- writing the finish file last (instead of before boinc diagnostics shutdown) [not a complete fix, but minimises risk]
- making the client a bit less fussy about finished files.
[As per Juha's suggestions, probably should care less about them, if at all, e.g. for slot management]
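
The second item in the list above (writing the finish file last) can be sketched generically as follows. This is not BOINC API code: the helper finish_task and the file names result.out and finish_marker are hypothetical, chosen only for illustration. The point is the ordering: flush and fsync the real output first, then create the marker and rename it into place, so the marker only appears once the result has been pushed to disk.

```python
import os
import tempfile


def finish_task(work_dir: str, result_text: str,
                result_name: str = "result.out",
                marker_name: str = "finish_marker") -> None:
    """Write the result durably, then create the completion marker last.

    Hypothetical sketch; names are illustrative, not taken from BOINC.
    """
    result_path = os.path.join(work_dir, result_name)
    with open(result_path, "w") as f:
        f.write(result_text)
        f.flush()
        os.fsync(f.fileno())  # push buffered data to disk before signalling

    # Build the marker under a temporary name in the same directory, then
    # rename it into place so an observer never sees a half-written marker.
    fd, tmp_path = tempfile.mkstemp(dir=work_dir)
    with os.fdopen(fd, "w") as f:
        f.write("0\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, os.path.join(work_dir, marker_name))
```

As noted in the list, this ordering only narrows the window; it does not address the client-side handling of finish files.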

So seems to me we have some technological convergence going on between Windows and Linux :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1783642
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783650 - Posted: 30 Apr 2016, 5:38:31 UTC

Afternoon

Okay I have had no further error from the 2 MINT 17.3 machines.

Another thing I have noted is that the virtual memory size for some of these units is 20 GB. Any clues as to why that is, compared to the Windows version running at 117.6 MB?

Regards
ID: 1783650
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1783655 - Posted: 30 Apr 2016, 5:54:15 UTC - in response to Message 1783650.  

I'd say the virtual size just represents the address space visible to the process, rather than any physical figure (certainly there are no allocations anything near that large). Most likely the Windows figure represents committed memory.
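
To see the difference on a Linux host, one can compare VmSize (mapped address space) with VmRSS (pages actually resident in RAM) in /proc/<pid>/status. A small Python sketch, assuming a Linux /proc filesystem; the function names are illustrative only.

```python
def parse_vm_fields(status_text: str) -> dict:
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:   123456 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


def read_vm_fields(pid: str = "self") -> dict:
    """Read the two fields for a process; Linux only."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Calling read_vm_fields() with no argument inspects the calling process; passing a PID as a string looks at a running task instead. A large VmSize next to a small VmRSS would fit the explanation above.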
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1783655
tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 1783662 - Posted: 30 Apr 2016, 6:24:38 UTC - in response to Message 1783655.  

Oh okay understand now.
ID: 1783662
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1783674 - Posted: 30 Apr 2016, 7:10:05 UTC - in response to Message 1783662.  
Last modified: 30 Apr 2016, 7:10:29 UTC

Looking around at more changes, it looks like the meaning of that virtual memory figure may have changed not all that long ago as well, as a lot seems to point to virtualisation and memory management having had major overhauls.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1783674
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1783771 - Posted: 30 Apr 2016, 15:10:03 UTC - in response to Message 1783642.  

...So seems to me we have some technological convergence going on between Windows and Linux :)

So Jason, in your opinion, is this a good thing or a bad thing? Are they 'dumbing down' (for lack of a better way of describing it) Linux, which might reduce its performance and implied (to me) advantages over Windows? Or is it good in that it allows programmers to work with more of a single skill set? Or something completely different?

ID: 1783771
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1783778 - Posted: 30 Apr 2016, 15:18:36 UTC - in response to Message 1781787.  

I believe it's a matter of no money, no time. Compiling it yourself isn't hard....
~Juha

This doesn't appear to be entirely correct. A long time ago it may have been. In fact, a couple of years ago I did actually compile a version of BOINC that worked in Ubuntu 10.04. Currently though, one can spend hours installing the needed packages only to build BOINC Apps that refuse to open any windows. Everything builds without any errors using the same two boinc-masters I used to build all the Apps I've ever built. One is 7.5, the other is 7.7, and I even tried a source package for 7.6.12 I found on some PPA website. Everything would appear to run normally in the terminal right up to the end, when it would give two errors about video, or graphics, or something obviously dealing with why the window wasn't opening. I didn't write them down because I wasn't expecting the errors to just disappear shortly thereafter. I think the second error was 180...or maybe not.

After a few tries it decided not to give any errors and just ran normally without opening the window. I tried it in Ubuntu 14.04, 16.04, and 12.04, with the same result in all. Now it will just start and print the following:
tbar@TBar-iSETI:~/BOINC$ ./boincmgr
30-Apr-2016 09:57:40 [---] cc_config.xml not found - using defaults
30-Apr-2016 09:57:40 [---] Starting BOINC client version 7.6.12 for x86_64-pc-linux-gnu
30-Apr-2016 09:57:40 [---] log flags: file_xfer, sched_ops, task
30-Apr-2016 09:57:40 [---] Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
30-Apr-2016 09:57:40 [---] Data directory: /home/tbar/BOINC
30-Apr-2016 09:57:42 [---] CAL: ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1848, 2048MB, 1892MB available, 6758 GFLOPS peak)
30-Apr-2016 09:57:42 [---] CAL: ATI GPU 1: AMD Radeon HD 6790/6850/6870 series (Barts) (CAL version 1.4.1848, 1024MB, 1004MB available, 3942 GFLOPS peak)
30-Apr-2016 09:57:42 [---] OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (driver version 1526.3 (VM), device version OpenCL 1.2 AMD-APP (1526.3), 2048MB, 1892MB available, 6758 GFLOPS peak)
30-Apr-2016 09:57:42 [---] OpenCL: AMD/ATI GPU 1: AMD Radeon HD 6790/6850/6870 series (Barts) (driver version 1526.3, device version OpenCL 1.2 AMD-APP (1526.3), 1024MB, 1004MB available, 3942 GFLOPS peak)
30-Apr-2016 09:57:42 [---] OpenCL CPU: Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1526.3 (sse2), device version OpenCL 1.2 AMD-APP (1526.3))
30-Apr-2016 09:57:42 [---] Creating new client state file
30-Apr-2016 09:57:42 [---] Host name: TBar-iSETI
30-Apr-2016 09:57:42 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz [Family 6 Model 23 Stepping 10]
30-Apr-2016 09:57:42 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
30-Apr-2016 09:57:42 [---] OS: Linux: 3.13.0-77-generic
30-Apr-2016 09:57:42 [---] Memory: 1.95 GB physical, 0 bytes virtual
30-Apr-2016 09:57:42 [---] Disk: 97.92 GB total, 81.17 GB free
30-Apr-2016 09:57:42 [---] Local time is UTC -4 hours
30-Apr-2016 09:57:42 [---] No general preferences found - using defaults
30-Apr-2016 09:57:42 [---] Preferences:
30-Apr-2016 09:57:42 [---] max memory usage when active: 1000.73MB
30-Apr-2016 09:57:42 [---] max memory usage when idle: 1801.31MB
30-Apr-2016 09:57:42 [---] max disk usage: 81.10GB
30-Apr-2016 09:57:42 [---] don't use GPU while active
30-Apr-2016 09:57:42 [---] suspend work if non-BOINC CPU load exceeds 25%
30-Apr-2016 09:57:42 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
30-Apr-2016 09:57:42 [---] This computer is not attached to any projects
30-Apr-2016 09:57:42 [---] Visit http://boinc.berkeley.edu for instructions
30-Apr-2016 09:57:42 Initialization completed
30-Apr-2016 09:57:42 [---] Suspending GPU computation - computer is in use

If you try it in a copy of a BOINC folder with actual tasks it will start the tasks and run them, just with no window; you have to watch it in top. And yes, all the video packages are installed and the compiler completes without any errors, just no GUI. The configure lines were very simple: ./configure --disable-server --enable static --enable-manager and variations with the same 3, sometimes just configure.
Anyone have any ideas?
ID: 1783778