Panic Mode On (107) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883394 - Posted: 12 Aug 2017, 17:12:24 UTC - in response to Message 1883393.  


Connectivity issues can be a lot more complicated than simply connection vs no connection.
If the data is malformed or truncated you can still get a response, but it likely won't be what is expected.
Not receiving a "Not sending work - last request too recent:" response after performing several updates would indicate something is being lost.

Huh?? "Not sending work - last request too recent:" is the exact correct programmed response from the servers if you hit Update before the normal project 305 second timeout between connections.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883397 - Posted: 12 Aug 2017, 17:32:00 UTC - in response to Message 1883394.  


Connectivity issues can be a lot more complicated than simply connection vs no connection.
If the data is malformed or truncated you can still get a response, but it likely won't be what is expected.
Not receiving a "Not sending work - last request too recent:" response after performing several updates would indicate something is being lost.

Huh?? "Not sending work - last request too recent:" is the exact correct programmed response from the servers if you hit Update before the normal project 305 second timeout between connections.

He said NOT receiving the response.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883398 - Posted: 12 Aug 2017, 17:39:06 UTC - in response to Message 1883397.  


He said NOT receiving the response.

Sorry, missed that somehow. I have always received the timeout response when clicking Update too soon.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1883429 - Posted: 12 Aug 2017, 21:49:15 UTC

Hi,

I had my GPUs running out of work on Tuesdays and on other weekdays too. 'Not sending work' - 'No tasks available' bla bla blah.

I took a look at the BOINC client source code and found that there was a hard-coded limit of 1000 WUs per host. I removed that.
Next I made the client tell the servers that I have four times the GPUs I actually have. Now I have 100 CPU + 4 * 4 * 100 GPU tasks in the cache constantly.

No problems any more. The cache survives most of the Tuesday outage(s) and the servers refusing to send Arecibo vlars to NVIDIA (special app) hosts.
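
As a rough illustration of the arithmetic behind that setup, here is a sketch using the 100-per-device and 1000-per-host figures discussed in this thread; the variable names and structure are invented, not taken from the BOINC client.

// Back-of-the-envelope cache arithmetic for the setup described above.
// The 100-per-device limit and the 1000-per-host client cap are the
// figures quoted in this thread; everything else is illustrative.
#include <algorithm>
#include <cstdio>

int main() {
    const int per_device_limit = 100;   // Seti server limit per CPU / per GPU
    const int client_host_cap  = 1000;  // hard-coded BOINC client cap (removed here)
    const int real_gpus        = 4;
    const int spoof_factor     = 4;     // client reports 4x the real GPU count

    int reported_gpus = real_gpus * spoof_factor;
    int cache = per_device_limit                    // CPU tasks
              + per_device_limit * reported_gpus;   // GPU tasks

    printf("Reported GPUs: %d, theoretical cache: %d tasks\n", reported_gpus, cache);
    printf("With the stock 1000-task cap it would be: %d tasks\n",
           std::min(cache, client_host_cap));
    return 0;
}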
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1883432 - Posted: 12 Aug 2017, 22:04:08 UTC - in response to Message 1883429.  

Hi,

I had my GPUs running out of work on Tuesdays and on other weekdays too. 'Not sending work' - 'No tasks available' bla bla blah.

I took a look at the BOINC client source code and found that there was a hard-coded limit of 1000 WUs per host. I removed that.
Next I made the client tell the servers that I have four times the GPUs I actually have. Now I have 100 CPU + 4 * 4 * 100 GPU tasks in the cache constantly.

No problems any more. The cache survives most of the Tuesday outage(s) and the servers refusing to send Arecibo vlars to NVIDIA (special app) hosts.


. . I wish I had your skills, it would be nice to feel buffered against maintenance days.

Stephen

:)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1883433 - Posted: 12 Aug 2017, 22:07:15 UTC
Last modified: 12 Aug 2017, 22:10:40 UTC

Woke up to find the cache running down. Did Tbar's trick; re-filled on the next automatic request.

I don't run standby projects. After adding AP to my system, the issue isn't nearly as bad as it was, but it still occurs. I have never been able to get more than 54 WUs on a single request, no matter how low the cache is.
It all started back in Dec of last year, when people who chose to run AP only, with "If no work for selected applications is available, accept work from other applications?" set to Yes, were no longer getting any MBv8 work when there was no AP available, and had to set "Run only the selected applications Seti@home v8" to Yes as well to receive any.

EDIT- the problem getting work seems to be greatest when there is no AP work being split, and when (for whatever reason) very little GBT work is available for download.
Grant
Darwin NT
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883434 - Posted: 12 Aug 2017, 22:07:48 UTC - in response to Message 1883432.  

Hi,

I had my GPUs running out of work on Tuesdays and on other weekdays too. 'Not sending work' - 'No tasks available' bla bla blah.

I took a look at the BOINC client source code and found that there was a hard-coded limit of 1000 WUs per host. I removed that.
Next I made the client tell the servers that I have four times the GPUs I actually have. Now I have 100 CPU + 4 * 4 * 100 GPU tasks in the cache constantly.

No problems any more. The cache survives most of the Tuesday outage(s) and the servers refusing to send Arecibo vlars to NVIDIA (special app) hosts.


. . I wish I had your skills, it would be nice to feel buffered against maintenance days.

Stephen

:)

I wish the project was able to support such caches for everybody and cheating the code was not required.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1883438 - Posted: 12 Aug 2017, 22:31:40 UTC - in response to Message 1883434.  
Last modified: 12 Aug 2017, 22:32:30 UTC

Hi,

I had my GPUs running out of work on Tuesdays and on other weekdays too. 'Not sending work' - 'No tasks available' bla bla blah.

I took a look at the BOINC client source code and found that there was a hard-coded limit of 1000 WUs per host. I removed that.
Next I made the client tell the servers that I have four times the GPUs I actually have. Now I have 100 CPU + 4 * 4 * 100 GPU tasks in the cache constantly.

No problems any more. The cache survives most of the Tuesday outage(s) and the servers refusing to send Arecibo vlars to NVIDIA (special app) hosts.


. . I wish I had your skills, it would be nice to feel buffered against maintenance days.

Stephen

:)

I wish the project was able to support such caches for everybody and cheating the code was not required.

Meow.


I think the same. The project should make the cache size relative to RAC. My cache now lasts 1 h 20 min for shorties and 7 h for guppi work. If I had only one CPU and no GPUs, I should have a cache of at most 2 WUs per real core.
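
A sketch of what such a RAC-proportional cache limit could look like; the constants are invented for illustration and nothing like this exists in the current scheduler code.

// Hypothetical RAC-proportional cache limit, as proposed above.
// All constants are made up for illustration.
#include <algorithm>

int cache_limit_from_rac(double rac) {
    const int    base_limit    = 200;   // floor so slow hosts still get a buffer
    const double tasks_per_rac = 0.05;  // e.g. 50,000 RAC -> +2500 tasks
    const int    ceiling       = 10000; // protect the result table from runaway caches
    int limit = base_limit + static_cast<int>(rac * tasks_per_rac);
    return std::min(limit, ceiling);
}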
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883440 - Posted: 12 Aug 2017, 22:41:36 UTC - in response to Message 1883438.  

Well, whatever the compromise, I do feel that there is some relatively simple way to set a sliding scale.
A factor of RAC might work, or average turnaround time.
An old CPU with the least capable GPU could run for many days on 100 WUs for each.
A 16-core CPU with a GTX 1080.............not so much.
And a computer running 'special sauce' needs more work to stay happy as well. And it would have a higher RAC and a shorter turnaround time.
So one of the two should be a workable multiplier to use in cache adjustment for fast crunchers.
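
And a sketch of the turnaround-time variant of the same idea; again, the numbers are invented purely for illustration and are not from any actual BOINC or Seti code.

// Hypothetical per-device limit scaled by average turnaround time.
// Illustrative constants only.
#include <algorithm>

int per_device_limit(double avg_turnaround_hours) {
    const int    base            = 100;   // current per-device limit
    const double reference_hours = 48.0;  // a "typical" host turnaround
    // Hosts that turn work around faster than the reference get a
    // proportionally larger cache, capped at 10x the base.
    double multiplier = std::clamp(reference_hours / avg_turnaround_hours, 1.0, 10.0);
    return static_cast<int>(base * multiplier);
}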

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1883445 - Posted: 12 Aug 2017, 23:04:34 UTC

Do I remember correctly that these limits were put in place back in a time when there was a dearth of work available? What with the newest types of work being added in the last year or so, hasn't that concern been alleviated? When work is in short supply, we want to parcel it out equitably to all; that is only fair to everyone who has graciously chosen SETI to crunch for, regardless of their systems' output. But when there is work aplenty, and we have systems that have the capability to really crank it out, wouldn't it make sense to upgrade the cache sizes to be optimal for whatever the system can produce? I do remember a similar discussion about this a while back, and I think one of the responses regarding it was the effect it had on the database, but maybe I am misremembering.

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883447 - Posted: 12 Aug 2017, 23:08:07 UTC - in response to Message 1883445.  

Don't want to be a downer but since Dr. A has moved on...... I don't think any of this is going to change....
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883448 - Posted: 12 Aug 2017, 23:08:23 UTC - in response to Message 1883445.  
Last modified: 12 Aug 2017, 23:11:03 UTC

No, if memory serves, the limits were put in place to cut down on the database size, which the servers were having problems maintaining, resulting in a lot of instability, crashes, and downtime. Not because work was scarce.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883449 - Posted: 12 Aug 2017, 23:10:10 UTC - in response to Message 1883447.  

And I don't think that Dr. Anderson had anything to do with the limits. Other than perhaps providing the code to implement them.
I am pretty sure they are a project specific limit, not a Boinc limit.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883452 - Posted: 12 Aug 2017, 23:19:51 UTC

The rate of transactions per second the servers have to handle is based just on how many tasks are returned, regardless of how big the caches are on every host, fast or slow. The impact would be on the size of the database tables holding the tasks out in the field, since that would increase if caches were enlarged for the fastest hosts that are running the special app or have many GPUs per machine.
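
A rough illustration of that distinction between transaction rate and table size, with every figure invented for the sake of the example.

// The returned-per-second traffic depends on throughput, while the
// result table has to hold a row for every task out in the field,
// i.e. roughly the sum of all caches. Invented numbers.
#include <cstdio>

int main() {
    const long hosts          = 150000;  // hypothetical active hosts
    const long avg_cache      = 300;     // hypothetical average tasks in progress per host
    const long big_hosts      = 500;     // hypothetical hosts given enlarged caches
    const long big_cache_gain = 5000;    // extra in-progress tasks per such host

    long rows_now   = hosts * avg_cache;
    long rows_after = rows_now + big_hosts * big_cache_gain;
    printf("in-field result rows: %ld -> %ld (+%.1f%%)\n",
           rows_now, rows_after,
           100.0 * (rows_after - rows_now) / rows_now);
    return 0;
}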
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883454 - Posted: 12 Aug 2017, 23:23:11 UTC - in response to Message 1883449.  

And I don't think that Dr. Anderson had anything to do with the limits. Other than perhaps providing the code to implement them.
I am pretty sure they are a project specific limit, not a Boinc limit.

Maybe I misunderstood Petri's post. He said he recompiled the BOINC application, NOT the SETI applications. So the hard limit is in the platform and not specific to the project.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1883456 - Posted: 12 Aug 2017, 23:26:22 UTC - in response to Message 1883454.  

And I don't think that Dr. Anderson had anything to do with the limits. Other than perhaps providing the code to implement them.
I am pretty sure they are a project specific limit, not a Boinc limit.

Maybe I misunderstood Petri's post. He said he recompiled the BOINC application, NOT the SETI applications. So the hard limit is in the platform and not specific to the project.

The 1000 total WU limit is in BOINC.
The 100 WUs per CPU/GPU limit is a Seti one.
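
Putting those two limits together, a small sketch of the effective per-host cap; illustrative code only, not the actual server or client logic.

// Effective per-host task cap, combining the Seti server limit of
// 100 tasks per CPU and per GPU with the BOINC client's 1000-task
// cap. Identifiers are illustrative only.
#include <algorithm>

int effective_host_limit(int num_gpus, bool stock_client = true) {
    const int seti_per_device = 100;
    const int boinc_host_cap  = 1000;
    int seti_limit = seti_per_device * (1 + num_gpus);  // 1 "CPU slot" + GPUs
    return stock_client ? std::min(seti_limit, boinc_host_cap) : seti_limit;
}
// effective_host_limit(4)         -> 500
// effective_host_limit(16)        -> 1000 (client cap kicks in)
// effective_host_limit(16, false) -> 1700 (cap removed, as discussed above)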
Grant
Darwin NT
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883457 - Posted: 12 Aug 2017, 23:27:45 UTC - in response to Message 1883454.  
Last modified: 12 Aug 2017, 23:29:53 UTC

And I don't think that Dr. Anderson had anything to do with the limits. Other than perhaps providing the code to implement them.
I am pretty sure they are a project specific limit, not a Boinc limit.

Maybe I misunderstood Petri's post. He said he recompiled the BOINC application, NOT the SETI applications. So the hard limit is in the platform and not specific to the project.

What I meant is that Dr. Anderson did not determine the limits or hard-wire them into Boinc, as best I know.
They are contained in Boinc, but they are set by the project administrators.
In this case, they were probably set by Eric.
And according to Grant's post whilst I was typing this, the ultimate 1000 WU limit may be hard-wired into Boinc.
So I may not have been 100% correct.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1883460 - Posted: 12 Aug 2017, 23:30:07 UTC - in response to Message 1883456.  

The 1000 total WU limit is in BOINC.
The 100 WUs per CPU/GPU limit is a Seti one.

So the 1000 limit is circumventable with the proper coding knowledge, but the 100 is on the server side and is set in stone?

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1883461 - Posted: 12 Aug 2017, 23:31:08 UTC - in response to Message 1883434.  

Hi,

I had my GPUs running out of work on Tuesdays and on other weekdays too. 'Not sending work' - 'No tasks available' bla bla blah.

I took a look at the boinc client source code ... {redacted}

No problems any more. The cache survives most of the Tuesday outage(s) and the servers refusing to send Arecibo vlars to NVIDIA (special app) hosts.


. . I wish I had your skills, it would be nice to feel buffered against maintenance days.

Stephen

:)

I wish the project was able to support such caches for everybody and cheating the code was not required.

Meow.


. . Yep, that would be a nice solution as well, even better in fact.

Stephen

<sigh>
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883462 - Posted: 12 Aug 2017, 23:31:58 UTC - in response to Message 1883454.  
Last modified: 12 Aug 2017, 23:32:54 UTC

I think there are two different places for limits: the Seti limit of 100 per processing unit (which we are all familiar with), and then the BOINC limit, which Petri pointed out is 1000 per host. Of course, this raises an interesting question:

Next I made the client tell the servers that I have four times the GPUs I actually have. Now I have 100 CPU + 4 * 4 * 100 GPU tasks in the cache constantly.
i.e. 1700, except...
In progress (3387)


lol, Oh to have those coding skills..... ;) Just pulling your leg Petri... You do great work