GPU stalling
Author | Message |
---|---|
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Rom Walton wrote: A number of people have reported that the CC starts failing to assign work to GPUs after a period of time. Email Rom at this address. |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I know that the DNETC folks had to install a command to end a WU if the WU wasn't showing progress for more than 20 minutes. This apparently only affected ATI card WUs. This might be a problem for SETI because of its non-standard WU sizes. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
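A minimal sketch of the kind of no-progress watchdog described in the post above, assuming the task exposes a fraction-done value that can be polled. The class name `ProgressWatchdog`, the 20-minute default, and the injectable clock are illustrative assumptions; this is not DNETC's or BOINC's actual code.

```python
import time


class ProgressWatchdog:
    """Flags a task as stalled when its progress has not advanced for `timeout` seconds.

    Hypothetical helper for illustration only; not part of BOINC or DNETC.
    """

    def __init__(self, timeout=20 * 60, clock=time.monotonic):
        self.timeout = timeout          # seconds without progress before giving up
        self.clock = clock              # injectable so the logic can be tested
        self.last_progress = None
        self.last_change = clock()

    def update(self, progress):
        """Record the latest fraction done; return True if the task looks stalled."""
        now = self.clock()
        if self.last_progress is None or progress > self.last_progress:
            # Progress advanced (or first sample): restart the stall timer.
            self.last_progress = progress
            self.last_change = now
            return False
        return (now - self.last_change) >= self.timeout
```

A caller would poll `update()` periodically and end the work unit when it returns `True`. With work units of very different sizes, a fixed timeout like this may need tuning, which is the concern raised in the post.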
Paul D. Buck Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
skildude wrote: "I know that the DNETC folks had to install a command to end a WU if the WU wasn't showing progress for more than 20 minutes This apparently only affected ATI card WU's. This might be a problem for seti because of its non standard WU sizes." A different problem ... In the case of BOINC and what Rom is asking for, most, if not all, of the 6.10.45+ versions would stop assigning work to one or more GPUs in a system. The system might still process work on other installed GPUs, but the one that "failed" was essentially fail-silent: there would be no crash, and the only positive indicator was that in some cases you could see that the memory size detection code did not get a valid memory size... There is now the question of whether this code should be pulled (I voted yes) or not ... the whole point was to try to protect tasks from cards with limited GPU memory ... |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
skildude wrote: "I know that the DNETC folks had to install a command to end a WU if the WU wasn't showing progress for more than 20 minutes This apparently only affected ATI card WU's. This might be a problem for seti because of its non standard WU sizes." They may have found the answer. 6.10.56 would appear to address the GPUs going idle, and .55 removed the memory checking. BOINC blog |
Paul D. Buck Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
MarkJ wrote: "They may have found the answer. 6.10.56 would appear to address the GPU's going idle and .55 removed the memory checking." .55 was actually the fix, and .56 was a fix to the fix ... :) Yes, I am running .56 as we speak on all 5 systems ... one has been up for 2 days now and still going strong ... (well, one install upgrade in there as well) ... UCB does not subscribe to chaos theory which says that even if you do the "same thing" over and over again you do not necessarily get the same results even with the same inputs ... one of the reasons that there is so much instability in running BOINC is that many internal functions are done far more often than they really need to be done ... The memory testing of the GPU was just another example of a good idea gone bad because they went overboard on the number of times they tested to see if there was enough memory ... Theory says you should be able to ask as often as you would like, but, reality said otherwise .. so this thread can be unpinned ... and allowed to die .... |
Paul D. Buck Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0 |
"Out of 52 projects I am currently running with 2x GTX 480s enabled in non-SLI, I narrowed down, 1 by 1, that SETI is the ONLY project out of all 52 that actually freezes my OS to the point of having to do a hard power-down to recover. The only question I have is will I have to unattach from SETI and reattach in the future or will I have to uninstall BOINC and reinstall once the GTX 480 crash/freeze is finally fixed." Generally speaking ... no ... You may have to run a particular set of optimized applications to get the cards to work correctly. For the moment, you should be able to just set NNT (No New Tasks), or turn off the use of CUDA, for SaH ... Usually one only needs to re-install BOINC if really bad things happen... aside from migration of versions (up or down) I cannot remember the last time that I had to re-install BOINC to clear up an issue... |
T-Armstrong Send message Joined: 2 Feb 10 Posts: 9 Credit: 312,965 RAC: 0 |
Is it a GPU problem or a server shutdown? Please help me! My GPU is running with 1024 MB. I had a problem: all work units run on my computers, but not SETI. Is it a server problem? My Toshiba PC will not run with 8 CPUs (look in my profile -> "Show computers" to see it). I had to return all the work because it would not run. I waited 48 hours, but nothing ran. Have a nice day today |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
It's not entirely clear to me what your problem is. You are probably talking about this computer, since the other two don't have eight processors and are also detached. The i7 machine currently has eight tasks active, all for the CUDA card. Is any of them running? I don't know anything about the GTS 360M card. If you are wondering why no SETI tasks run on the CPU, you should check the long-term debt of the other projects (especially AQUA). BOINC is probably not requesting any CPU jobs from SETI at all. You can check all of this in the log file (stdoutdae.txt), but you would probably need to enable a few more options for that (cc_config.xml). Gruß, Gundolf Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
T-Armstrong Send message Joined: 2 Feb 10 Posts: 9 Credit: 312,965 RAC: 0 |
OK, thanks. Armstrong |
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hi Angeless, I installed BOINC 6.10.56, but it can't connect. Then I uninstalled it and installed my previously used BOINC 6.10.18, but now none of the BOINC clients can connect. Any idea what to do? Edit: After waiting an hour it connected. Now running BOINC 6.10.56. D5400XS V8-Xeon |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
_heinz, I answered you in the other thread. |
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hi Angeless, thanks, it works now. heinz |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Yay! Glad to be of help. |
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hi, last night I had to restart my machine, and now I have the same issue as yesterday: BOINC cannot sync. I have been waiting for 6 hours now, but it cannot connect. My network looks OK. I can ping Berkeley and get an answer. Ping wird ausgeführt für boinc2.ssl.berkeley.edu [208.68.240.18] mit 32 Bytes Daten: Antwort von 208.68.240.18: Bytes=32 Zeit=213ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=214ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=214ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=212ms TTL=50 Ping-Statistik für 208.68.240.18: Pakete: Gesendet = 4, Empfangen = 4, Verloren = 0 (0% Verlust), Ca. Zeitangaben in Millisek.: Minimum = 212ms, Maximum = 214ms, Mittelwert = 213ms Any help appreciated D5400XS V8-Xeon |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
_heinz wrote: "Any help appreciated" I think the other half of boinc2.ssl.berkeley.edu is the problem. Wasn't that 208.68.240.12? And I'm just curious: how come you have a German operating system? I had expected a French one. Gruß, Gundolf Edit: I know that 6.10.56 should take care of that, but who knows? |
Fred W Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
.13 F. |
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
.12 does not work: Ping wird ausgeführt für 208.68.240.12 mit 32 Bytes Daten: Zeitüberschreitung der Anforderung. Zeitüberschreitung der Anforderung. Zeitüberschreitung der Anforderung. Zeitüberschreitung der Anforderung. Ping-Statistik für 208.68.240.12: Pakete: Gesendet = 4, Empfangen = 0, Verloren = 4 (100% Verlust), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .13 does: Ping wird ausgeführt für 208.68.240.13 mit 32 Bytes Daten: Antwort von 208.68.240.13: Bytes=32 Zeit=215ms TTL=50 Antwort von 208.68.240.13: Bytes=32 Zeit=214ms TTL=50 Antwort von 208.68.240.13: Bytes=32 Zeit=210ms TTL=50 Antwort von 208.68.240.13: Bytes=32 Zeit=213ms TTL=50 Ping-Statistik für 208.68.240.13: Pakete: Gesendet = 4, Empfangen = 4, Verloren = 0 (0% Verlust), Ca. Zeitangaben in Millisek.: Minimum = 210ms, Maximum = 215ms, Mittelwert = 213ms ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .18 does: Ping wird ausgeführt für 208.68.240.18 mit 32 Bytes Daten: Antwort von 208.68.240.18: Bytes=32 Zeit=210ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=218ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=211ms TTL=50 Antwort von 208.68.240.18: Bytes=32 Zeit=215ms TTL=50 Ping-Statistik für 208.68.240.18: Pakete: Gesendet = 4, Empfangen = 4, Verloren = 0 (0% Verlust), Ca. Zeitangaben in Millisek.: Minimum = 210ms, Maximum = 218ms, Mittelwert = 213ms ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is in my etc/hosts: 208.68.240.18 boinc2.ssl.berkeley.edu 208.68.240.13 boinc2.ssl.berkeley.edu ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ So it should always work. The curious thing is that it has been connected for more than 9 hours now, yet BOINC does not sync. boincmgr_image boincmgr_threads boincmgr_nosync boinsmgr_tcpip I believe BOINC 6.10.56 has some issues. heinz D5400XS V8-Xeon |
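A successful ping only shows that an address answers ICMP; it does not show that a TCP connection can be opened. The sketch below tries a TCP connection to every IPv4 address a hostname resolves to. The function name `reachable_addresses` and the default port 80 are assumptions made for illustration; the thread does not state which port the BOINC client uses here.

```python
import socket


def reachable_addresses(host, port=80, timeout=5.0):
    """Return {ip: True/False} for each IPv4 address that `host` resolves to.

    True means a TCP connection to (ip, port) could be opened within `timeout`.
    Hypothetical helper; port 80 is an assumed default.
    """
    results = {}
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    for _family, _socktype, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((ip, port), timeout=timeout):
                results[ip] = True
        except OSError:
            results[ip] = False
    return results
```

Calling `reachable_addresses("boinc2.ssl.berkeley.edu")` would report each resolved address separately, which helps when one of several addresses behind a name is down.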
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hi Gundolf, my mother tongue is German :-) but I live in France |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
_heinz wrote: "this is in my etc/hosts" If both addresses are working (as they currently are), you should comment out these lines in your hosts file. There should be only one of them active anyway. To be certain: are we speaking of the client connecting to the server, or of the manager connecting to the client? Gruß, Gundolf |
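The point about having only one active entry per name can be checked mechanically. The following is a minimal sketch, assuming the usual hosts file layout (an IP address followed by one or more names, with `#` starting a comment); the function name `duplicate_host_entries` is hypothetical.

```python
from collections import defaultdict


def duplicate_host_entries(hosts_text):
    """Return {hostname: [ip, ...]} for names mapped to more than one address."""
    mapping = defaultdict(list)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # blank or fully commented-out line
        ip, *names = line.split()
        for name in names:
            key = name.lower()  # hostnames are case-insensitive
            if ip not in mapping[key]:
                mapping[key].append(ip)
    return {name: ips for name, ips in mapping.items() if len(ips) > 1}
```

Fed the two lines quoted from the hosts file in this thread, it would list `boinc2.ssl.berkeley.edu` with both addresses; commenting out one of the lines makes the result empty.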
_heinz Send message Joined: 25 Feb 05 Posts: 744 Credit: 5,539,270 RAC: 0 |
Hmm... I have seen: isaac.ssl.berkeley.edu:http 207.46.209.243:http C:\Users\heinz>ping isaac.ssl.berkeley.edu Ping wird ausgeführt für isaac.ssl.berkeley.edu [128.32.18.189] mit 32 Bytes Daten: Antwort von 128.32.18.189: Bytes=32 Zeit=212ms TTL=45 Antwort von 128.32.18.189: Bytes=32 Zeit=212ms TTL=45 Antwort von 128.32.18.189: Bytes=32 Zeit=213ms TTL=45 Antwort von 128.32.18.189: Bytes=32 Zeit=216ms TTL=45 Ping-Statistik für 128.32.18.189: Pakete: Gesendet = 4, Empfangen = 4, Verloren = 0 (0% Verlust), Ca. Zeitangaben in Millisek.: Minimum = 212ms, Maximum = 216ms, Mittelwert = 213ms C:\Users\heinz>ping 207.46.209.243 Ping wird ausgeführt für 207.46.209.243 mit 32 Bytes Daten: Antwort von 207.46.39.45: Zielnetz nicht erreichbar. Zeitüberschreitung der Anforderung. Zeitüberschreitung der Anforderung. Antwort von 207.46.39.45: Zielnetz nicht erreichbar. Ping-Statistik für 207.46.209.243: Pakete: Gesendet = 4, Empfangen = 2, Verloren = 2 (50% Verlust), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There must be some problem. boinc_exe_tcpip I bought driver-cleaner and ran it, as Angeless recommended, but my connection problems are not solved. Any other ideas? D5400XS V8-Xeon |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.