Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)
Author | Message |
---|---|
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I'm glad that you are conservative. That delay you refer to is the lag between you completing a WU and your wingman doing the same. Many jobs are not validated until much longer than 30 hours later, but most do not take so long. The fact that you have 43 validated tasks since the last invalid indicates a high validation rate within 24 hours, and I feel that you have at least reduced the number of invalids. . . An HP Core 2 Quad with a 450W PSU and an i7 with only a 300W PSU, that is the opposite of what I had imagined. And no, a 300W PSU is probably not sufficient for the GTX560, more's the pity; it would have been a good way to pin the issue on either the GPU itself or the HP rig it is in. Not sure where to go from here, it seems it would only be going over what has already been covered. Stephen . . Maybe one thing: you mentioned the adapters. I take it you are using two adapters getting power from 4 Molex connectors to feed the two six-pin sockets on the GPU. Have you checked that they are not running hot? I have read about an issue where a bad connection in such an adapter caused it to overheat and melt the plastic. A bit scary. The reason you need two Molex per 6-pin is that the PCIe connector is rated at a higher current than the Molex is. If you are using a single Molex per 6-pin then resistance may be causing a voltage drop at that amperage. Just trying to think of anything that might cause problems. Stephen . |
robertmiles Send message Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
[snip] The adapters are supplied from two separate Molexes. They are not melting the plastic. I'll check for overheating tomorrow. I'm using more than two adapters - the type needed to do it with fewer was not available. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Jim, . . OK, I have the picture on your card setup. Personally I am too timid to use risers to access the other slots. But I was actually referring to external PCIe PSU connectors; the PSU in my 8000 has none, and I was wondering how many you have. And I suspect the 750ti's crunching at full steam might be using 80% of TDP or a little more (my GTX950 is running at 85% TDP). So 3 x 60W at 80% is about 150W. Considering PSUs are good for about 80% of their nominal value under sustained loads, that means 325W @ 80% less 150W for the 750ti's leaves just over 100W to run the C2Q and peripherals. I'd call that pushing the limits a bit :). My Pent-D rig with the 2 970s has a 650W PSU and is running at 450W +/- 30W crunching; it runs fairly warm and I would not want to push it very much harder. But I like being conservative. Stephen . . . |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Now I am confused, not sure what type of adapters you are using. But the main thing is they are there and not running hot :) Stephen . |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
". . OK, I have the picture on your card setup. Personally I am too timid to use risers to access the other slots. But I was actually referring to external PCIe PSU connectors; the PSU in my 8000 has none, and I was wondering how many you have. And I suspect the 750ti's crunching at full steam might be using 80% of TDP or a little more (my GTX950 is running at 85% TDP). So 3 x 60W at 80% is about 150W. Considering PSUs are good for about 80% of their nominal value under sustained loads, that means 325W @ 80% less 150W for the 750ti's leaves just over 100W to run the C2Q and peripherals. I'd call that pushing the limits a bit :). My Pent-D rig with the 2 970s has a 650W PSU and is running at 450W +/- 30W crunching; it runs fairly warm and I would not want to push it very much harder. But I like being conservative." The EVGA 750tiSCs do not have an external power connector; they take what they need off the mobo (or, in the case of one using a riser, off a single Molex plugged into the riser card). That's one reason I really like that card and got so many. A PCIE slot is rated to supply 75W, so that's another place where I feel confident I'm not pushing the boundary any with this. At this point, the PS fan runs slowly, and the exhaust air doesn't even seem very warm, especially compared to the other boxes, which tend to cook. Interestingly enough, my other box in this class (Gigabyte mb, Xeon that's basically a C2Q, and 3x GTX750tiSCs) keeps its OCX 700W PS blowing pretty warm air. Makes me wonder just how over-rated that OCX PS is, and/or how under-rated this HP PS is. As far as I can remember, the 8000CMT PS does not have any PCIE power connectors; too old for that. I think we drifted a bit OT here. Sorry, ... |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Worth checking whether the GPU shows any visual artefacts in GPU tests. Maybe there is an issue with a GPU memory chip. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Has anyone seen this showing up on the notices tab before? It has happened a couple of times, but I can find no other info on it. Everything is running nicely and all of a sudden this pops up. It would be nice to at least see what task was causing the problem. SETI@home: Notice from BOINC Task postponed: CL file build failure 09/15/2016 23:59:21 This is what is in the event log -- 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure I don't buy computers, I build them!! |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure Which machine/project was this on? There is a known problem with r3525 (only), but 1) That was only ever released (briefly) at the SETI Beta project, and has never been released to the main project, or via a Lunatics Installer, beta or otherwise. 2) The problem was confined to GTX 6xx and earlier GPUs: the only machine you have attached to the Main project (here) has dual GTX 750Ti GPUs, which should be unaffected by this problem. So, more details, please. |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure Hi Richard, This is happening on SETI main on my 4770K, Win 7 x64 machine, 2 x GTX750Ti @ 2GB each, running Lunatics 0.45 beta -4 opencl_nvidia_SoG (r3500). The GPUs are running 3 tasks each at .5 CPU for each task. cmd_line.txt (-use_sleep -sbs 512 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp) I don't buy computers, I build them!! |
Mike Send message Joined: 17 Feb 01 Posts: 34354 Credit: 79,922,639 RAC: 80 |
09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure Hi Cliff, you need to change to -use_sleep -sbs 512 -spike_fft_thresh 2048 -tune 1 32 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32 -hp Removing -hp is another thing you should try. With each crime and every kindness we birth our future. |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
Hi Cliff you need to change to Will try the new settings including removing the -hp and will get back to you. I don't buy computers, I build them!! |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Hi Cliff you need to change to Cliff, thanks for providing the host and application details. One thing that's still perplexing me: how long ago did you deploy r3500, and did the build error message start immediately? My understanding is that the 'CL file build' process only has to happen once, the first time the application is run. Looking at my own machine, I have an r3500 BIN file dated 02 September, and an r3528 BIN file dated 11 September - no sign of any rebuilding since then. So I was wondering what might have triggered these new messages? |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
"Hi Cliff you need to change to" R3500 was first deployed on 15 August. It ran for a couple of weeks with the cmd_line.txt supplied above before the message first showed up, but I didn't pay that close attention to it. It happened once or twice after that, but when it showed up last night, I decided to report it. What is r3528, and does it have anything to do with CUDA, as I don't run CUDA apps? I don't buy computers, I build them!! |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"but when it showed up last night, I decided to report it" Next time, also record (copy/paste) the task name so we can search for it and look at stderr. "What is r3528" The same "thing" as r3500 but newer. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
but when it showed up last night, I decided to report it Can't copy/paste what isn't there, as there is no indication what task was involved. I noticed the error message in the notice tab this morning, by then the suspected task was already gone. Will r3528 come out in beta -5 or do I need to do a stand-alone install? I don't buy computers, I build them!! |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Will r3528 come out in beta -5 or do I need to do a stand-alone install? Raistmer has found something 'worth a deeper look', which suggests r3528 won't be the end of the line. So it's not worth hanging on for a full final release - I'll try and get a Beta5 out tomorrow, in the hope we can catch all the bugs in one go if we all combine forces. (A bit late to start that on a Friday night, this side of the pond) |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"Can't copy/paste what isn't there, as there is no indication what task was involved." You can search your stdoutdae.txt and stdoutdae.old for: CL file build failure This will find the lines you already posted: 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | task postponed 30.000000 sec: 09/15/2016 23:59:21 | SETI@home | Task postponed: CL file build failure Maybe a few lines above is the "[SETI@home] Starting task ..." - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Seems like it might be Task 5157268689. I have a utility that can retrieve all task details for a host, which can then be searched. That's the only currently listed task of his that has anything with "CL file build" in it, and the time frame looks to be about right. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl CL file build log on device GeForce GTX 750 Ti INFO: can't build program from binary kernels, code 0 , recompiling from source... Error : Building Program (binary, clBuildProgram):main kernels: not OK code -6 CL file build log on device GeForce GTX 750 Ti - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl But followed at the next attempt by CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX OpenCL-kernels filename : MultiBeam_Kernels_r3500.cl ar=0.012665 NumCfft=117119 NumGauss=0 NumPulse=47842204544 NumTriplet=60817138848 Currently allocated 585 MB for GPU buffers In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 and from there on, it completed and was validated - with no sign of re-compiling, so the binary file was there all along. It would be interesting if Cliff could search the log files BilBg suggested for blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar and post the whole history, from the first attempt at running to final completion. One possible thought: since Cliff has two identical GPUs in the host, if two copies of the app tried to start at nearly the same instant, might one suffer an access problem? |
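
The log search suggested above (look in stdoutdae.txt and stdoutdae.old for "CL file build failure" or for the task name, then read the lines just before each hit) could be scripted roughly as follows. This is only a sketch: the helper names `find_with_context` and `search_logs` are made up for illustration, the script assumes it is run from the BOINC data directory where those two files live, and it does plain substring matching without assuming any particular timestamp format.

```python
from pathlib import Path


def find_with_context(lines, needle, before=5):
    """Return (line_number, context) for every line that contains `needle`.

    `context` is the matching line preceded by up to `before` earlier lines,
    so a "Starting task ..." entry shortly above the error stays visible.
    """
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - before)
            hits.append((i + 1, lines[start:i + 1]))
    return hits


def search_logs(paths, needle, before=5):
    """Print every hit for `needle` in each log file that exists."""
    for path in paths:
        p = Path(path)
        if not p.exists():
            # stdoutdae.old is only present after a log rotation, so skip quietly.
            continue
        lines = p.read_text(errors="replace").splitlines()
        for lineno, context in find_with_context(lines, needle, before):
            print(f"--- {p.name}, line {lineno}")
            for entry in context:
                print(entry)


if __name__ == "__main__":
    # File names as given in the thread; run from the BOINC data directory.
    logs = ["stdoutdae.txt", "stdoutdae.old"]
    # First the error text, then the task name Richard asked about.
    search_logs(logs, "CL file build failure")
    search_logs(logs, "blc4_2bit_guppi_57449_43932_HIP78775_0013.24448.0.18.27.37.vlar")
```

Raising `before` widens the window if the "Starting task" line sits further up; searching for the task name instead of the error text gives the full history from first attempt to completion, which is what was asked for here.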