Panic Mode On (105) Server Problems?

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
Message 1860412 - Posted: 8 Apr 2017, 20:02:19 UTC Last modified: 8 Apr 2017, 20:03:03 UTC

The work mix has been rather odd for the last day or two. Usually it's a fairly steady mix of Arecibo and GBT (generally more Arecibo WUs, since they've left the extra PFB splitters running lately). But for the last couple of days it's been big batches of Arecibo work, then a batch or two of GBT for a few downloads, then Arecibo for the next couple of hours, then a bit of GBT, then back to all Arecibo for a few hours. Odd.

Grant
Darwin NT

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156
Message 1860419 - Posted: 8 Apr 2017, 20:52:59 UTC

Hi,

The varying task generation rate and type, and the resulting outages for NVIDIA GPUs ('No work to be found' even though there are over 500 000 work units 'available'), made me do some rescheduling. Thanks to the author from whom I received the software to do just what I want.

Yes, I'm strongly against rescheduling work units from GPU to CPU. But I feel that I must cache some work for the GPUs, so I'm moving work from the CPU to the GPUs. This is just a test phase, and if it succeeds I'll do it every Tuesday before the outage.

I'm aware that this will skew the GFLOPS figures per CPU type. After the first test my CPU reported almost twice the GFLOPS (see the host's Details, Application details). It just makes me wonder how the creditScrew handles the situation.

One good thing: while my GPU cache is full, I'm not downloading any NVIDIA GPU tasks. And yes, I know there is no such thing as a GPU task, but since NVIDIA GPUs do not receive VLAR work and I'm running those tasks by rescheduling them from the CPU, I'm not eating from the table of NV-allowed work.

Petri

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
Message 1860422 - Posted: 8 Apr 2017, 21:50:43 UTC - in response to Message 1860419.

>> the resulting outages for NVIDIA GPUs 'No work to be found' even though there are over 500 000 work units 'available',

Do you do AP work at all? Ever since December, when something broke in the Scheduler, I had been having issues getting GPU work. To receive work I had to change the application settings "Run only the selected applications" to accept AP work and "If no work for selected applications is available, accept work from other applications?" to yes (even though I didn't have an AP application installed). Then I'd have to change them back again after a few days (or a few hours) to keep the work flowing.

A couple of weeks ago I followed someone's suggestion and installed the AP application. And guess what? Asking for AP work, even though there is none available, results in getting v8 work if you have the AP application installed. Yes, there have been a few times during the day when the cache has run down slightly (5-10 WUs), but nothing like the cache almost emptying several times a day (30 or fewer WUs left), as was happening with just the v8 application installed.

I really wish they would either:
1. fix the Scheduler so the application settings work as they used to, or
2. make a note that if you wish to receive v8 work, you must also have the AP application installed and selected to be sure of getting any.

There's not much point in having the option to do or not do certain types of work if you have to have both of them selected to get any work anyway.

Grant
Darwin NT

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156
Message 1860432 - Posted: 8 Apr 2017, 23:01:59 UTC - in response to Message 1860422.

>> Do you do AP work at all? (...) A couple of weeks ago I followed someone's suggestion & I ended up installing the AP application. And guess what? Asking for AP work, even though there is none available, results in getting v8 work if you have the AP application installed. (...)

Hi Grant,

I do have the AP application, and I sometimes have to change the settings to get v8 work.

Petri

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
Message 1860927 - Posted: 11 Apr 2017, 10:48:06 UTC Last modified: 11 Apr 2017, 10:49:58 UTC

Is it just me? No one else has mentioned that uploads/downloads halted right on 10 AM UTC.

EDIT: It's not just me; Haveland is showing the drops in data too.

Mr. Kevvy Volunteer moderator Volunteer tester Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319
Message 1860930 - Posted: 11 Apr 2017, 11:14:16 UTC - in response to Message 1860927.

>> Is it just me? No one else has mentioned that uploads/downloads halted right on 10 AM UTC.

Yup... I see about two hours' worth of uploads on all my machines giving "Project communication failed: attempting access to reference site", but bruno and everything else required appears OK. Ah well, it's Tuesday, so it should come back up again after the outrage.

Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489
Message 1860932 - Posted: 11 Apr 2017, 11:20:56 UTC

My old Athlon X4 is connecting fine, and my other two rigs have no problems with downloads, but they can't get a connection to report.

Cheers.

Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50
Message 1860938 - Posted: 11 Apr 2017, 12:12:15 UTC

I also have hours' worth of uploads "in progress". Don't know about downloads... "Not requesting tasks: too many uploads in progress". Doesn't bode well for the outage.

Humans may rule the world...but bacteria run it...

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
Message 1860944 - Posted: 11 Apr 2017, 23:51:40 UTC - in response to Message 1860927.

>> Is it just me? No one else has mentioned that uploads/downloads halted right on 10 AM UTC.
>> EDIT: It's not just me; Haveland is showing the drops in data too.

. . Same for me on all my rigs...

Stephen

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
Message 1860945 - Posted: 11 Apr 2017, 23:53:53 UTC - in response to Message 1860932.

>> My old Athlon X4 is connecting fine, and my other two rigs have no problems with downloads, but they can't get a connection to report.

. . Umm, if you cannot report, how are you getting downloads??

Stephen

JaundicedEye Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1
Message 1860951 - Posted: 12 Apr 2017, 0:47:03 UTC

4/11/2017 6:42:34 PM | SETI@home | update requested by user
4/11/2017 6:42:38 PM | SETI@home | Sending scheduler request: Requested by user.
4/11/2017 6:42:38 PM | SETI@home | Reporting 124 completed tasks
4/11/2017 6:42:38 PM | SETI@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
4/11/2017 6:42:50 PM | SETI@home | Scheduler request completed: got 0 new tasks
4/11/2017 6:42:50 PM | SETI@home | Project has no tasks available

I had 124 task reports stuck; a manual update fixed it all.

"Sour Grapes make a bitter Whine." <(0)>
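The event-log excerpts quoted in this thread share one shape: a timestamp, a project name (blank for client-wide messages) and a message, separated by `|`. As a minimal sketch, assuming exactly that `timestamp | project | message` layout, the Python below pulls the reported-task and new-task counts out of such lines; `parse_line` and `summarize` are hypothetical helpers, not part of BOINC.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class LogLine(NamedTuple):
    timestamp: str
    project: str  # empty string for client-wide messages
    message: str


# Counts that appear in scheduler-request messages (singular or plural "task")
REPORTING = re.compile(r"Reporting (\d+) completed tasks?")
GOT_NEW = re.compile(r"got (\d+) new tasks?")


def parse_line(line: str) -> Optional[LogLine]:
    """Split 'timestamp | project | message'; return None for any other shape."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return LogLine(*parts)


def summarize(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (completed tasks reported, new tasks received) summed over the lines."""
    reported = received = 0
    for raw in lines:
        entry = parse_line(raw)
        if entry is None:
            continue
        m = REPORTING.search(entry.message)
        if m:
            reported += int(m.group(1))
            continue
        m = GOT_NEW.search(entry.message)
        if m:
            received += int(m.group(1))
    return reported, received


if __name__ == "__main__":
    sample = [  # lines taken from the log in the post above
        "4/11/2017 6:42:38 PM | SETI@home | Reporting 124 completed tasks",
        "4/11/2017 6:42:50 PM | SETI@home | Scheduler request completed: got 0 new tasks",
        "4/11/2017 6:42:50 PM | SETI@home | Project has no tasks available",
    ]
    print(summarize(sample))  # (124, 0)
```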

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1860963 - Posted: 12 Apr 2017, 2:04:18 UTC - in response to Message 1860951.

>> I had 124 task reports stuck, manual update fixed it all.

No amount of manual updating could get my tasks to report on any of my three machines. Just errors. What did work was setting some Log options. Try ticking http_debug and network_status_debug along with work_fetch_debug, and let it run through the standard 305-second reconnect period. Among all the messages you will eventually get:

4/11/2017 5:41:24 PM | SETI@home | [http] [ID#1] Info: We are completely uploaded and fine

And !Voila! all my 300 tasks did report. Once your uploads complete and your finished tasks report, the logjam clears and you will start trickling in new work.

Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)
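The "Log options" Keith mentions are the client's event-log flags. In BOINC Manager they are ticked under Options, Event Log options, and they should end up in `cc_config.xml` in the BOINC data directory, under `<log_flags>`. As a minimal sketch of what that file looks like with the three flags he names, the Python below builds it with the standard library; `build_cc_config` and `write_cc_config` are hypothetical helpers, and the data-directory path is deliberately left to the caller because it differs between installs.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# The three event-log flags named in the post; each becomes <flag>1</flag>.
DEBUG_FLAGS = ("http_debug", "network_status_debug", "work_fetch_debug")


def build_cc_config(flags: Iterable[str] = DEBUG_FLAGS) -> str:
    """Return a minimal cc_config.xml document that enables the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


def write_cc_config(path: str, flags: Iterable[str] = DEBUG_FLAGS) -> None:
    """Write the config to `path`.

    Caution: this OVERWRITES an existing file, so merge by hand if you already
    keep <options> or other flags in your cc_config.xml.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_cc_config(flags) + "\n")


if __name__ == "__main__":
    print(build_cc_config())
```

After the file is in place, Options, Read config files in the Manager (or `boinccmd --read_cc_config`) should make a running client pick it up. http_debug in particular is very chatty, so it is worth unticking the flags once the uploads are moving again.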

ReiAyanami Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174
Message 1861074 - Posted: 12 Apr 2017, 17:21:49 UTC Last modified: 12 Apr 2017, 17:24:52 UTC

4/12/2017 1:17:58 PM | SETI@home | Sending scheduler request: To report completed tasks.
4/12/2017 1:17:58 PM | SETI@home | Reporting 20 completed tasks
4/12/2017 1:17:58 PM | SETI@home | Not requesting tasks: too many uploads in progress
4/12/2017 1:18:00 PM | SETI@home | Scheduler request failed: Couldn't resolve host name
4/12/2017 1:18:23 PM | | Project communication failed: attempting access to reference site
4/12/2017 1:18:25 PM | | Internet access OK - project servers may be temporarily down.

I have had all 400 completed tasks stuck since yesterday. Could anyone help me, please?

Ivan Joined: 26 Jun 06 Posts: 1 Credit: 89,550,084 RAC: 72
Message 1861078 - Posted: 12 Apr 2017, 18:19:20 UTC - in response to Message 1861074. Last modified: 12 Apr 2017, 18:21:26 UTC

>> I have had all 400 completed tasks stuck since yesterday.
>> Could anyone help me? Please.

Same for me... :(

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
Message 1861079 - Posted: 12 Apr 2017, 18:21:27 UTC - in response to Message 1861074. Last modified: 12 Apr 2017, 18:22:18 UTC

>> 4/12/2017 1:18:00 PM | SETI@home | Scheduler request failed: Couldn't resolve host name
>> (...)
>> I have had all 400 completed tasks stuck since yesterday. Could anyone help me? Please.

Manually click Retry a few times on the Transfers tab. Do any of them upload now? If not, what error are you getting? Have you tried rebooting the computer? Rebooting the modem/router?

Grant
Darwin NT
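Rei's log shows the client failing at the name-lookup step ("Couldn't resolve host name") while its own reference-site check reports "Internet access OK". One way to narrow that down outside BOINC is to ask the operating system's resolver directly. A minimal Python sketch follows; the project's server names are not given in this thread, so none are hard-coded, and `resolves`/`check_hosts` are hypothetical helpers.

```python
import socket
from typing import Dict, Iterable


def resolves(host: str, port: int = 80) -> bool:
    """True if the system resolver returns at least one address for `host`.

    This tests name resolution only: no connection is opened, so True says
    nothing about whether the server itself is up.
    """
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False


def check_hosts(hosts: Iterable[str]) -> Dict[str, bool]:
    """Map each host name to whether it currently resolves."""
    return {host: resolves(host) for host in hosts}
```

A name that fails here while other sites resolve points at DNS (the local machine, the router or the ISP) rather than at BOINC itself, which fits Grant's suggestion to reboot the modem/router.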

ReiAyanami Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174
Message 1861080 - Posted: 12 Apr 2017, 18:27:09 UTC - in response to Message 1861079. Last modified: 12 Apr 2017, 18:29:16 UTC

4/12/2017 2:22:59 PM | SETI@home | Started upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0
4/12/2017 2:22:59 PM | SETI@home | Started upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0
4/12/2017 2:23:00 PM | SETI@home | Temporarily failed upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0: can't resolve hostname
4/12/2017 2:23:00 PM | SETI@home | Backing off 00:09:17 on upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0
4/12/2017 2:23:00 PM | SETI@home | Temporarily failed upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0: can't resolve hostname
4/12/2017 2:23:00 PM | SETI@home | Backing off 00:09:06 on upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0
4/12/2017 2:23:01 PM | | Project communication failed: attempting access to reference site
4/12/2017 2:23:03 PM | | Internet access OK - project servers may be temporarily down.

Manually clicking Retry gives me the above. I have rebooted a few times, and that has not resolved the issue. Any ideas?
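Rei's two uploads failed in the same second but were given slightly different back-offs (00:09:17 and 00:09:06), which suggests the client adds some randomness to the retry delay. With hundreds of stuck uploads it can be handy to pull those delays out of a saved event log. A minimal Python sketch, assuming the `Backing off HH:MM:SS on upload of <name>` wording shown above; lines in any other wording are simply skipped, and `upload_backoffs` is a hypothetical helper.

```python
import re
from datetime import timedelta
from typing import Dict, Iterable

# e.g. "Backing off 00:09:17 on upload of <file name>"
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def upload_backoffs(lines: Iterable[str]) -> Dict[str, timedelta]:
    """Return {file name: back-off} from 'Backing off ...' log lines.

    If the same file appears more than once, the latest line wins.
    Lines that do not match the assumed wording are ignored.
    """
    backoffs: Dict[str, timedelta] = {}
    for line in lines:
        m = BACKOFF.search(line)
        if m:
            hours, minutes, seconds, name = m.groups()
            backoffs[name] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs
```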

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380
Message 1861081 - Posted: 12 Apr 2017, 18:29:07 UTC

Fairly simple (assuming you are using the BOINC Manager GUI):

In the "Advanced" view, select the "Transfers" tab.
You will see the list of tasks being uploaded. In the "Status" column a large number of them will probably say something like "postponed for xx minutes", then "project back-off xx hrs: xx minutes".
Select one of the tasks and click the "Retry Now" button. This will clear the "project back-off" flag, and with luck tasks will start to be uploaded.
Select all (or at least a fair number) of the tasks that still have "postponed" times and retry them; this should clear them so they can be transferred.
If you have a lot of tasks stalled, it may take a few attempts to get them all moved.
Once you get down to a small number of tasks (below ten, I think), you should find that downloads start automatically. Otherwise, just wait for the next "new" task to complete; this should prod the servers into sending you new work, without the impolite message about having too many stalled uploads.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
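The "postponed" and "project back-off" timers rob describes act like a retry back-off: each failure pushes the next attempt further out, and "Retry Now" throws the accumulated delay away. The sketch below shows a generic randomized exponential back-off with a reset, as an illustration of the technique only; it is not BOINC's own code, and the base and cap values are made up.

```python
import random
from typing import Optional


class Backoff:
    """Generic randomized exponential back-off (illustrative; not BOINC's code).

    base_s and cap_s are hypothetical: the first delay ceiling and the
    maximum delay, in seconds.
    """

    def __init__(self, base_s: float = 60.0, cap_s: float = 4 * 3600.0,
                 rng: Optional[random.Random] = None):
        self.base_s = base_s
        self.cap_s = cap_s
        self.failures = 0
        self.rng = rng if rng is not None else random.Random()

    def on_failure(self) -> float:
        """Record a failed attempt and return the delay before the next one."""
        self.failures += 1
        ceiling = min(self.cap_s, self.base_s * 2 ** (self.failures - 1))
        # Pick a delay between half and all of the ceiling, so that many files
        # (or many clients) do not all retry at the same instant.
        return self.rng.uniform(ceiling / 2, ceiling)

    def retry_now(self) -> None:
        """Analogue of the Manager's 'Retry Now': forget the past failures."""
        self.failures = 0
```

Doubling the ceiling keeps a struggling server from being hammered, while the random spread is one plausible reason two uploads that failed together can end up with different waits.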

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Message 1861082 - Posted: 12 Apr 2017, 18:30:38 UTC - in response to Message 1861074.

Try the steps I described in my message earlier in the thread, using the Log Options.

Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

ReiAyanami Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174
Message 1861083 - Posted: 12 Apr 2017, 18:34:42 UTC - in response to Message 1861081. Last modified: 12 Apr 2017, 18:39:27 UTC

>> Select one of the tasks, click the "retry now" button, this will clear the "project back-off" flag, and with luck tasks will start to be uploaded. (...)

I have done what you described more than 100 times by now. I know it usually solves the problem, but not this time. What next?

ReiAyanami Joined: 6 Dec 05 Posts: 116 Credit: 222,900,202 RAC: 174
Message 1861086 - Posted: 12 Apr 2017, 18:36:40 UTC - in response to Message 1861082. Last modified: 12 Apr 2017, 18:38:56 UTC

>> Try my steps that I used in the message earlier in the thread and use the Log Options.

I read your earlier post, but I need more detailed instructions; I have no idea what you were talking about. Could you point me to where I can learn about what you mentioned? I appreciate your help.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.