Panic Mode On (105) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1860412 - Posted: 8 Apr 2017, 20:02:19 UTC
Last modified: 8 Apr 2017, 20:03:03 UTC

Work mix has been rather odd for the last day or 2.
Usually it's a fairly steady mix of Arecibo and GBT (generally more Arecibo WUs since they've left the extra PFB splitters running lately). But the last couple of days it's been big batches of Arecibo work, then a batch or 2 of GBT for a few downloads, then Arecibo for the next couple of hours, then a bit of GBT, then back to all Arecibo for a few hours.
Odd.
Grant
Darwin NT
ID: 1860412 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1860419 - Posted: 8 Apr 2017, 20:52:59 UTC

Hi,

The varying task generation rate and type, and the resulting outages for NVIDIA GPUs 'No work to be found' even though there are over 500 000 work units 'available', made me do some rescheduling. Thanks to the author from whom I received the software to do just what I want.

Yes. I'm strongly against rescheduling work units from GPU to CPU.

But I feel that I must cache some work for the GPUs. So I'm moving work from CPU to the GPUs. This is just a test phase, and if it succeeds I'll do that every Tuesday before the outage.

I'm aware of the side effect: the GFLOPS figure per CPU type goes wrong. The first test made my CPU report almost twice the GFLOPS (see Details, Application details). It just makes me wonder how the creditScrew handles the situation.

One good thing: while my GPU cache is full, I'm not downloading any NVIDIA GPU tasks. And yes, I know, there is no such thing as a GPU task, but since NVIDIA GPUs do not receive VLAR tasks and I'm getting them through CPU rescheduling, I'm not eating from the table of NV-allowed work.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1860419 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1860422 - Posted: 8 Apr 2017, 21:50:43 UTC - in response to Message 1860419.  

the resulting outages for NVIDIA GPUs 'No work to be found' even though there are over 500 000 work units 'available',

Do you do AP work at all?

Ever since December, when something broke in the Scheduler, I had been having issues getting GPU work. Changing the application settings "Run only the selected applications" to accept AP work & "If no work for selected applications is available, accept work from other applications?" (even though I didn't have an AP application installed) was necessary to receive work. Then I'd have to change it back again after a few days (or a few hours) to keep the work flowing.
A couple of weeks ago I followed someone's suggestion & I ended up installing the AP application.
And guess what? Asking for AP work, even though there is none available, results in getting v8 work if you have the AP application installed. Yes, there have been a few times during the day where the cache might run down slightly (5-10 WUs), but nothing like the cache almost emptying several times a day (30 or fewer WUs left) as was happening with just the v8 application installed.


I really wish they would either
1 Fix the Scheduler so the application settings work as they used to, or
2 Make a note that if you wish to receive v8 work it is required you also have the AP application installed & selected to be sure of getting any.
There's not much point in having the option to do or not to do certain types of work, if you have to have both of them selected to get any work anyway.
Grant
Darwin NT
ID: 1860422 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1860432 - Posted: 8 Apr 2017, 23:01:59 UTC - in response to Message 1860422.  

the resulting outages for NVIDIA GPUs 'No work to be found' even though there are over 500 000 work units 'available',

Do you do AP work at all?

Ever since December, when something broke in the Scheduler, I had been having issues getting GPU work. Changing the application settings "Run only the selected applications" to accept AP work & "If no work for selected applications is available, accept work from other applications?" (even though I didn't have an AP application installed) was necessary to receive work. Then I'd have to change it back again after a few days (or a few hours) to keep the work flowing.
A couple of weeks ago I followed someone's suggestion & I ended up installing the AP application.
And guess what? Asking for AP work, even though there is none available, results in getting v8 work if you have the AP application installed. Yes, there have been a few times during the day where the cache might run down slightly (5-10 WUs), but nothing like the cache almost emptying several times a day (30 or fewer WUs left) as was happening with just the v8 application installed.


I really wish they would either
1 Fix the Scheduler so the application settings work as they used to, or
2 Make a note that if you wish to receive v8 work it is required you also have the AP application installed & selected to be sure of getting any.
There's not much point in having the option to do or not to do certain types of work, if you have to have both of them selected to get any work anyway.


Hi Grant,
I do have the AP application and I sometimes have to change the settings to get v8 work.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1860432 · Report as offensive
Brent Norman
Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1860927 - Posted: 11 Apr 2017, 10:48:06 UTC
Last modified: 11 Apr 2017, 10:49:58 UTC

Is it just me? No one else has mentioned that uploads/downloads halted right at 10 AM UTC.

EDIT: It's not just me, Haveland is showing the drops in data too.
ID: 1860927 · Report as offensive
Mr. Kevvy
Crowdfunding Project Donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1860930 - Posted: 11 Apr 2017, 11:14:16 UTC - in response to Message 1860927.  

Is it just me? No one else has mentioned that uploads/downloads halted right at 10 AM UTC.


Yup... I see about two hours worth of uploads on all my machines giving "Project communication failed: attempting access to reference site" but bruno and everything else required appears OK. Ah well, it's Tuesday so it should come back up again after the outrage.
ID: 1860930 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1860932 - Posted: 11 Apr 2017, 11:20:56 UTC

My old Athlon X4 is connecting fine and my other 2 rigs have no problems with downloads, but they can't get a connection to report.

Cheers.
ID: 1860932 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1860938 - Posted: 11 Apr 2017, 12:12:15 UTC

I also have hours' worth of uploads "in progress".
Don't know about downloads..."not requesting tasks too many uploads in progress". Doesn't bode well for the outage.
Humans may rule the world...but bacteria run it...
ID: 1860938 · Report as offensive
Stephen "Heretic"
Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1860944 - Posted: 11 Apr 2017, 23:51:40 UTC - in response to Message 1860927.  

Is it just me? No one else has mentioned that uploads/downloads halted right at 10 AM UTC.

EDIT: It's not just me, Haveland is showing the drops in data too.


. . Same for me on all my rigs ...

Stephen

ID: 1860944 · Report as offensive
Stephen "Heretic"
Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1860945 - Posted: 11 Apr 2017, 23:53:53 UTC - in response to Message 1860932.  

My old Athlon X4 is connecting fine and my other 2 rigs have no problems with downloads, but they can't get a connection to report.

Cheers.


. . Umm, if you cannot report how are you getting downloads??

Stephen

ID: 1860945 · Report as offensive
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1860951 - Posted: 12 Apr 2017, 0:47:03 UTC

4/11/2017 6:42:34 PM | SETI@home | update requested by user
4/11/2017 6:42:38 PM | SETI@home | Sending scheduler request: Requested by user.
4/11/2017 6:42:38 PM | SETI@home | Reporting 124 completed tasks
4/11/2017 6:42:38 PM | SETI@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
4/11/2017 6:42:50 PM | SETI@home | Scheduler request completed: got 0 new tasks
4/11/2017 6:42:50 PM | SETI@home | Project has no tasks available

I had 124 task reports stuck; a manual update fixed it all.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1860951 · Report as offensive
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1860963 - Posted: 12 Apr 2017, 2:04:18 UTC - in response to Message 1860951.  

4/11/2017 6:42:34 PM | SETI@home | update requested by user
4/11/2017 6:42:38 PM | SETI@home | Sending scheduler request: Requested by user.
4/11/2017 6:42:38 PM | SETI@home | Reporting 124 completed tasks
4/11/2017 6:42:38 PM | SETI@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
4/11/2017 6:42:50 PM | SETI@home | Scheduler request completed: got 0 new tasks
4/11/2017 6:42:50 PM | SETI@home | Project has no tasks available

I had 124 task reports stuck; a manual update fixed it all.

No amount of manual updating could get my tasks to report. On all three machines. Just errors. What did work was setting some Event Log options. Try ticking http_debug and network_status_debug along with work_fetch_debug, and let it run through the standard 305-second reconnect period. Among all the messages you will eventually get:

4/11/2017 5:41:24 PM | SETI@home | [http] [ID#1] Info: We are completely uploaded and fine

And !Voila! all my 300 tasks did report. By getting your uploads to complete and report your finished tasks, it will unplug the logjam and you will start trickling in new work.
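
For anyone who would rather set those three log flags in a file than tick them in the Manager, here is a minimal Python sketch that prints a cc_config.xml fragment with them switched on. The flag names are the ones mentioned above; the helper name build_cc_config is made up for illustration. It only prints the text and does not touch any existing cc_config.xml, since where the file lives (the BOINC data directory) and what else is already in it will differ per machine.

```python
# Sketch only: builds the text of a cc_config.xml that turns on the three
# log flags named in the post. build_cc_config is a hypothetical helper.
FLAGS = ["http_debug", "network_status_debug", "work_fetch_debug"]


def build_cc_config(flags):
    """Return cc_config.xml text with each given log flag set to 1."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print instead of writing, so an existing config is never overwritten.
    print(build_cc_config(FLAGS), end="")
```

If you already have a cc_config.xml, merge the log_flags lines into it by hand rather than replacing the file, then have the client re-read its config files (or restart it) so the flags take effect.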
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1860963 · Report as offensive
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1861074 - Posted: 12 Apr 2017, 17:21:49 UTC
Last modified: 12 Apr 2017, 17:24:52 UTC

4/12/2017 1:17:58 PM | SETI@home | Sending scheduler request: To report completed tasks.
4/12/2017 1:17:58 PM | SETI@home | Reporting 20 completed tasks
4/12/2017 1:17:58 PM | SETI@home | Not requesting tasks: too many uploads in progress
4/12/2017 1:18:00 PM | SETI@home | Scheduler request failed: Couldn't resolve host name
4/12/2017 1:18:23 PM | | Project communication failed: attempting access to reference site
4/12/2017 1:18:25 PM | | Internet access OK - project servers may be temporarily down.

I have had all 400 completed tasks stuck since yesterday.
Could anyone help me? Please.
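
The "Couldn't resolve host name" line in the log above suggests that a DNS lookup is failing on this machine, which is something that can be checked outside BOINC. A minimal Python sketch, assuming you pass in the server host names yourself (none are given in this thread, so none are hard-coded here); can_resolve and diagnose are illustrative names, not part of BOINC:

```python
# Sketch only: checks whether host names resolve through the system resolver.
import socket
import sys


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address."""
    try:
        return len(resolver(host, None)) > 0
    except socket.gaierror:
        # Name lookup failed (the situation BOINC reports as
        # "can't resolve hostname").
        return False


def diagnose(hosts, resolver=socket.getaddrinfo):
    """Map each host name to True (resolves) or False (does not)."""
    return {host: can_resolve(host, resolver) for host in hosts}


if __name__ == "__main__":
    # Usage: python check_dns.py <hostname> [<hostname> ...]
    for host, ok in diagnose(sys.argv[1:]).items():
        print(f"{host}: {'resolves' if ok else 'does NOT resolve'}")
```

If ordinary sites resolve but the project's hosts do not, the problem is more likely in the DNS server or router the machine is using than in the BOINC client.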
ID: 1861074 · Report as offensive
Ivan
Joined: 26 Jun 06
Posts: 1
Credit: 89,550,084
RAC: 72
Russia
Message 1861078 - Posted: 12 Apr 2017, 18:19:20 UTC - in response to Message 1861074.  
Last modified: 12 Apr 2017, 18:21:26 UTC

>> I have had all 400 completed tasks stuck since yesterday.
>> Could anyone help me? Please.

Same for me... :(
ID: 1861078 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1861079 - Posted: 12 Apr 2017, 18:21:27 UTC - in response to Message 1861074.  
Last modified: 12 Apr 2017, 18:22:18 UTC

4/12/2017 1:17:58 PM | SETI@home | Sending scheduler request: To report completed tasks.
4/12/2017 1:17:58 PM | SETI@home | Reporting 20 completed tasks
4/12/2017 1:17:58 PM | SETI@home | Not requesting tasks: too many uploads in progress
4/12/2017 1:18:00 PM | SETI@home | Scheduler request failed: Couldn't resolve host name
4/12/2017 1:18:23 PM | | Project communication failed: attempting access to reference site
4/12/2017 1:18:25 PM | | Internet access OK - project servers may be temporarily down.

I have had all 400 completed tasks stuck since yesterday.
Could anyone help me? Please.

Manually click Retry a few times on the Transfers tab. Do any of them now upload? If not, what error are you getting?
Have you tried re-booting the computer? Re-booting the modem/router?
Grant
Darwin NT
ID: 1861079 · Report as offensive
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1861080 - Posted: 12 Apr 2017, 18:27:09 UTC - in response to Message 1861079.  
Last modified: 12 Apr 2017, 18:29:16 UTC

4/12/2017 2:22:59 PM | SETI@home | Started upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0
4/12/2017 2:22:59 PM | SETI@home | Started upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0
4/12/2017 2:23:00 PM | SETI@home | Temporarily failed upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0: can't resolve hostname
4/12/2017 2:23:00 PM | SETI@home | Backing off 00:09:17 on upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0
4/12/2017 2:23:00 PM | SETI@home | Temporarily failed upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0: can't resolve hostname
4/12/2017 2:23:00 PM | SETI@home | Backing off 00:09:06 on upload of blc13_2bit_guppi_57824_84804_HIP22845_0056.2235.409.23.46.33.vlar_1_r1881844723_0
4/12/2017 2:23:01 PM | | Project communication failed: attempting access to reference site
4/12/2017 2:23:03 PM | | Internet access OK - project servers may be temporarily down.

Manually clicking Retry gives me the above.
Rebooted a few times, but that did not resolve the issue.
Any idea?
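
With 400 stalled uploads, it can help to boil the event log down to "which file, what reason, how long a back-off". A rough Python sketch, assuming the log lines keep the "time | project | message" layout and the wording seen above; the regular expressions and the summarize name are my own and only cover these two message types:

```python
# Sketch only: summarizes upload failures and back-off times from event log text.
import re
from datetime import timedelta

BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")
FAILED = re.compile(r"Temporarily failed upload of (\S+): (.+)$")


def summarize(log_text):
    """Return (backoffs, reasons), both keyed by upload file name."""
    backoffs, reasons = {}, {}
    for line in log_text.splitlines():
        # The message is the last '|'-separated field of each log line.
        message = line.split("|")[-1].strip()
        match = BACKOFF.search(message)
        if match:
            hours, minutes, seconds, name = match.groups()
            backoffs[name] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
            continue
        match = FAILED.search(message)
        if match:
            reasons[match.group(1)] = match.group(2)
    return backoffs, reasons


SAMPLE = """\
4/12/2017 2:23:00 PM | SETI@home | Temporarily failed upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0: can't resolve hostname
4/12/2017 2:23:00 PM | SETI@home | Backing off 00:09:17 on upload of 08oc08ag.25154.14387.14.41.131_1_r792457085_0
"""

if __name__ == "__main__":
    backoffs, reasons = summarize(SAMPLE)
    for name, reason in reasons.items():
        print(f"{name}: {reason}, backing off {backoffs.get(name)}")
```

If every entry shows the same reason, as here, the cause is probably common to all uploads rather than tied to individual tasks.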
ID: 1861080 · Report as offensive
rob smith
Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1861081 - Posted: 12 Apr 2017, 18:29:07 UTC

Fairly simple (assuming you are using BOINC manager's GUI interface)
In the "advanced" view select the "transfers" tab
You will see the list of tasks being uploaded, in the "status" column a large number of them will probably say something like "postponed for xx minutes", then "project back-off xx hrs: xx minutes".
Select one of the tasks and click the "retry now" button; this will clear the "project back-off" flag, and with luck tasks will start to upload. Then select all (or at least a fair number) of the tasks that still have "postponed" times and retry them; this should clear them so they can be transferred. If you have a lot of tasks stalled, it may take a few attempts to get them all moved.

Once you get down to a small number of tasks (below ten, I think) you should find that downloads start automatically; otherwise just wait for the next "new" task to complete, and this should prod the servers into sending you new work, without the impolite message about having too many stalled uploads.
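
For machines without the Manager GUI, roughly the same steps can be expressed with the boinccmd command-line tool. The Python sketch below only builds and prints the commands rather than running them. It assumes boinccmd is on the PATH, and the project URL and file name in the usage part are placeholders: the URL has to be the one your client is actually attached with, and the file names are those shown on the transfers tab. retry_commands is an illustrative name.

```python
# Sketch only: builds boinccmd command lines that mirror the GUI steps above.
import shlex


def retry_commands(project_url, filenames):
    """Return the boinccmd invocations as lists of arguments, in order."""
    # Retry deferred network communication.
    commands = [["boinccmd", "--network_available"]]
    # Retry each stalled transfer individually.
    commands += [
        ["boinccmd", "--file_transfer", project_url, name, "retry"]
        for name in filenames
    ]
    # Finally ask the client to contact the project scheduler.
    commands.append(["boinccmd", "--project", project_url, "update"])
    return commands


if __name__ == "__main__":
    # Placeholder values, not taken from this thread.
    url = "http://project.example/"
    for command in retry_commands(url, ["some_result_file_0"]):
        print(shlex.join(command))
```

Printing first and pasting the lines into a shell keeps you in control of what actually gets sent to the client.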
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1861081 · Report as offensive
Keith Myers
Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1861082 - Posted: 12 Apr 2017, 18:30:38 UTC - in response to Message 1861074.  

Try my steps that I used in the message earlier in the thread and use the Log Options.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1861082 · Report as offensive
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1861083 - Posted: 12 Apr 2017, 18:34:42 UTC - in response to Message 1861081.  
Last modified: 12 Apr 2017, 18:39:27 UTC

Fairly simple (assuming you are using BOINC manager's GUI interface)
In the "advanced" view select the "transfers" tab
You will see the list of tasks being uploaded, in the "status" column a large number of them will probably say something like "postponed for xx minutes", then "project back-off xx hrs: xx minutes".
Select one of the tasks and click the "retry now" button; this will clear the "project back-off" flag, and with luck tasks will start to upload. Then select all (or at least a fair number) of the tasks that still have "postponed" times and retry them; this should clear them so they can be transferred. If you have a lot of tasks stalled, it may take a few attempts to get them all moved.

Once you get down to a small number of tasks (below ten, I think) you should find that downloads start automatically; otherwise just wait for the next "new" task to complete, and this should prod the servers into sending you new work, without the impolite message about having too many stalled uploads.


I have done what you described more than 100 times by now.
I know it usually does solve the problem but not this time.
What next?
ID: 1861083 · Report as offensive
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1861086 - Posted: 12 Apr 2017, 18:36:40 UTC - in response to Message 1861082.  
Last modified: 12 Apr 2017, 18:38:56 UTC

Try my steps that I used in the message earlier in the thread and use the Log Options.


I read your earlier post, but I need somewhat more detailed instructions.
I have no idea what you were talking about.
Could you point me to where I can learn about what you mentioned?
I appreciate your help.
ID: 1861086 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.