V8 CUDA for Linux?
Author | Message |
---|---|
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Stock Boincapi and the client don't cope extraordinarily well with buffered IO and threading. You can try renicing to a more normal process priority (which may help). Hmmm, that could help. I'm using the <no_priority_change>1</no_priority_change> option on my Ubuntu 16.04 machine, and I'm not getting those errors. I ran the cuda42 App for a while but switched to the "Special" App after seeing how hard the credits are falling again. tazzduke is getting that error on both of his Ubuntu 16.04 machines. |
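To check whether a task is actually running at a reduced priority before and after renicing, the nice value can be read directly. The following is only a minimal sketch for Linux/Unix using Python's standard `os.getpriority`; the helper names `nice_of` and `is_normal_priority` are made up for illustration and are not part of BOINC.

```python
import os


def nice_of(pid: int = 0) -> int:
    """Return the nice value of a process; pid 0 means the calling process."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def is_normal_priority(pid: int = 0) -> bool:
    """True when the process runs at nice 0, i.e. it has not been deprioritised."""
    return nice_of(pid) == 0


if __name__ == "__main__":
    # Replace 0 with the PID of a science app as shown by top or ps.
    print("nice value:", nice_of(0))
    print("normal priority:", is_normal_priority(0))
```

A task started with the no_priority_change option set should report a nice value of 0 here; anything higher means the task is still being run at a lowered priority.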
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Stock Boincapi and the client don't cope extraordinarily well with buffered IO and threading. You can try renicing to a more normal process priority (which may help). OK, after this Cuda 8 testing is done, I will do a little research to see what might have changed in the Linux Kernel and/or C Libraries. My money's on it being the threading/IO, and that could mean a lot of problems on the horizon for Boinc. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3799 Credit: 1,114,826,392 RAC: 3,319 |
Yes, it looks as though Modprobe is back in the repository with 16.04, as we know, it's Not in the Mint repository. It just appeared for me last night (Mint 17.3); packages are cuda-drivers, nvidia-graphics-drivers-352, nvidia-modprobe and nvidia-settings in the Mint Update Manager all 352.93. I installed them on this machine and no known issues thus far. I'll probably do the rest whenever I have my next scheduled "patch Tuesday" clean day. |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings All. Thanks for the heads up, TBar. I just went and had a look and, yep, I've got an error still awaiting upload. As soon as I get one, BOINC locks up; I checked the other machine and it did the same. Here comes a reboot. MINT 17.3 - NVIDIA 361.42 - KERNEL 4.4.0 - BOINC 7.6.2. Only started getting them today. Regards |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yes, it looks as though Modprobe is back in the repository with 16.04, as we know, it's Not in the Mint repository. Last night? Hmmm, I don't see a 352.93 at nVidia, I suppose this is a Repository 'special'. You think they are reading this thread? Is the driver in the Package Manager? If it is, is it listed as something other than 0ubuntu0.14.04.1? @tazzduke 7.6.2? That's a new version too. I'm using the old 7.4.25 from Berkeley that only updates the top two tasks, to update anything else you have to click on it or change the tabs. I haven't had any lock-ups though. You're running Mint with the 4.4 kernel? That could be the problem. |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greets. Yeah, I got that through the PPA from loctusborg; haven't had one for a while now though. I think I mentioned it in a previous post. Regards |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Hey, that could be it, but it's the same kernel version that ships with Ubuntu 16.04 LTS. PS. How do you edit your previous posts? |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Hey, that could be it, but it's the same kernel version that ships with Ubuntu 16.04 LTS. PPS. I'm going to wait for my queue of SETI tasks to empty before I do anything at the moment. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There's an Edit button at the top of your posts. You have an hour to edit each post. You could try the <no_priority_change>1</no_priority_change> option in your cc_config.xml file. It will result in all the BOINC tasks running at nice level Zero. I leave at least one core free and I haven't noticed any machine lag.

<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <dont_contact_ref_site>1</dont_contact_ref_site>
    <use_all_gpus>1</use_all_gpus>
    <save_stats_days>365</save_stats_days>
    <no_priority_change>1</no_priority_change>
  </options>
</cc_config> |
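For anyone who already has a cc_config.xml and would rather not hand-edit it, the option can be merged in programmatically. This is a rough sketch using Python's standard `xml.etree.ElementTree`; the function name `set_option` is hypothetical, and it works on the XML text only, so reading and writing the file in your own BOINC data directory is left to you.

```python
import xml.etree.ElementTree as ET


def set_option(config_xml: str, name: str, value: str) -> str:
    """Return the cc_config XML text with <options><name>value</name> set.

    Existing options and log flags are preserved; an empty input
    produces a fresh <cc_config> document.
    """
    if config_xml.strip():
        root = ET.fromstring(config_xml)
    else:
        root = ET.Element("cc_config")
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create the <options> section if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    # Update the option in place, or append it if it is missing.
    elem = options.find(name)
    if elem is None:
        elem = ET.SubElement(options, name)
    elem.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    existing = (
        "<cc_config><options><use_all_gpus>1</use_all_gpus>"
        "</options></cc_config>"
    )
    print(set_option(existing, "no_priority_change", "1"))
```

The client only picks up the change after it rereads its configuration, for example after a restart of the client.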
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Well, there you go, it was staring straight at me, lol. I will give that a go, but then I haven't had a bad WU or lockup for an hour now. Thank you |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings All Nil lockups or errors since yesterday afternoon. Regards |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Greetings All Was that with the no_priority_change flag set, or just left as it was? |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Morning All. Well, yes it was, but then I let the cache run dry and haven't been able to get tasks for a while to do some more testing on both NVIDIA/MINT 17.3 computers. I hope to give more feedback during the course of the day; have got work now. I have enabled the following flags as well: <cpu_sched_debug>: problems involving the choice of applications to run. <work_fetch_debug>: problems involving work fetch (which projects are asked for work, and how much). <rr_simulation>: problems involving jobs being run in high-priority mode. <sched_op_debug>: problems involving scheduler operations and other low level information. Talk to you soon, Mark |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Skimming the 4.x Kernel changelogs, it looks like multiple queue block layers (software and hardware level) were added to better support NVM(e) storage and scaling to multiple cores (spreading IO/driver load). That means functionally the same race conditions now exist as on Windows (with different semantics/buffer-structure/delays), as the symptoms and apparent workaround suggested. Most likely, different applications running at reduced priority will trigger this particular finished-file-present-too-long issue, depending on system contention and different kinds of devices. Down the road, before recommending Boinc/api fixes, I will probably have to attempt to replicate it here using a newer Kernel and/or devices. In the meantime I'd suggest the no_priority_change workaround should help in the majority of cases where it occurs on Linux. Probably a more complete fix to allow running at idle priorities (as intended) will involve more than one of the following:
- restoring normal priority for exit [very brief]
- writing the finish file last (instead of before boinc diagnostics shutdown) [not a complete fix, but minimises risk]
- making the client a bit less fussy about finished files. [As per Juha's suggestions, probably should care less about them, if at all, e.g. for slot management]

So it seems to me we have some technological convergence going on between Windows and Linux :) |
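The second suggestion in that list, writing the finish file last, can be sketched roughly as below. This is not the BOINC API's code, only an illustration of the ordering in Python. The file name `boinc_finish_called`, its contents, and the function and parameter names are all assumptions made for the example.

```python
import os
import tempfile

# Assumed name of the marker file the client looks for in the slot directory.
FINISH_FILE = "boinc_finish_called"


def finish(slot_dir, exit_status, shutdown_steps):
    """Run every shutdown step first, then publish the finish file last.

    The marker is written to a temporary file, flushed to disk, and renamed
    into place, so the client never sees it before the slow work is done.
    """
    for step in shutdown_steps:
        step()  # e.g. flush results, shut down diagnostics, close logs

    fd, tmp_path = tempfile.mkstemp(dir=slot_dir, prefix=".finish-")
    with os.fdopen(fd, "w") as f:
        f.write(f"{exit_status}\n")
        f.flush()
        os.fsync(f.fileno())  # push the data through any buffered IO

    final_path = os.path.join(slot_dir, FINISH_FILE)
    os.replace(tmp_path, final_path)  # atomic rename within one directory
    return final_path
```

A real application would exit immediately after `finish` returns, which keeps the window between the marker appearing and the process disappearing as short as possible, even when the process is running at idle priority.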
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Afternoon. Okay, I have had no further errors from the 2 MINT 17.3 machines. Another thing I have noted is that the virtual memory size for some of these units is 20 GB. Any clues as to why that is, compared to the Windows version running at 117.6 MB? Regards |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Afternoon I'd say the virtual size just represents the address space visible to the process, rather than any physical figure (certainly there are no allocations anywhere near that large). Most likely the Windows figure represents committed memory. |
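On Linux the two figures can be compared directly: `VmSize` in `/proc/<pid>/status` is the virtual address space, while `VmRSS` is what is actually resident in physical memory. A small sketch for reading both follows; the function names are made up for the example, and it assumes a Linux `/proc` filesystem.

```python
def parse_status(text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:\t 20971520 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


def memory_of(pid="self"):
    """Read the status file of a process (Linux only) and parse it."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


if __name__ == "__main__":
    usage = memory_of()  # pass a science app's PID to inspect a task instead
    print("virtual  (VmSize):", usage.get("VmSize"), "kB")
    print("resident (VmRSS): ", usage.get("VmRSS"), "kB")
```

If the resident figure for a task is in the same range as the Windows number while the virtual figure is in the tens of gigabytes, that would be consistent with the explanation above.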
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Oh okay understand now. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Oh okay understand now. Looking around at more changes, it looks like the meaning of that figure may have changed not all that long ago as well, as a lot seems to point to virtualisation and memory management having had major overhauls. |
Al Send message Joined: 3 Apr 99 Posts: 1682 Credit: 477,343,364 RAC: 482 |
...So seems to me we have some technological convergence going on between Windows and Linux :) So Jason, in your opinion, is this a good thing or a bad thing? Are they 'dumbing down' (for lack of a better way of describing it) Linux, which might reduce its performance and implied (to me) advantages over Windows? Or is it good in that it allows programmers to work with more of a single skill set? Or something completely different? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I believe it's a matter of no money, no time. Compiling it yourself isn't hard.... This doesn't appear to be entirely correct. A long time ago it may have been. In fact, a couple of years ago I did actually compile a version of BOINC that worked in Ubuntu 10.04. Currently though, one can spend hours installing the needed packages only to build BOINC Apps that refuse to open any windows. Everything builds without any errors using the same two boinc-masters I used to build All the Apps I've ever built. One is 7.5, the other is 7.7, and I even tried a source package for 7.6.12 I found on some PPA website. Everything would appear to run normally in the terminal right up to the end, when it would give two errors about video, or graphics, or something obviously dealing with why the window wasn't opening. I didn't write them down because I wasn't expecting the errors to just disappear shortly thereafter. I think the second error was 180...or maybe not. After a few tries it decided not to give any errors, just run normally without opening the window. I tried it in Ubuntu 14.04, 16.04, and 12.04, same in all. Now it will just start and print the following: tbar@TBar-iSETI:~/BOINC$ ./boincmgr If you try it in a copy of a BOINC folder with actual tasks it will start the tasks and run them, just No window; you have to watch it in top. And yes, All the video packages are installed and the compiler completes without any errors, just No GUI. The configure lines were very simple, ./configure --disable-server --enable static --enable-manager and variations with the same 3, sometimes just configure. Anyone have any ideas? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.