Panic Mode On (74) Server problems?

Message boards : Number crunching : Panic Mode On (74) Server problems?

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1229549 - Posted: 9 May 2012, 15:14:34 UTC

My guess is that we are again seeing some kind of scheduler/feeder limitation.
I agree that even with no AP using bandwidth, MB alone has shown the capability of fully saturating the bandwidth.

On the other hand, NOT saturating the bandwidth may actually be making better use of it......


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6085
Credit: 154,936,154
RAC: 46,355
United States
Message 1229581 - Posted: 9 May 2012, 16:33:38 UTC - in response to Message 1229549.

My guess is that we are again seeing some kind of scheduler/feeder limitation.
I agree that even with no AP using bandwidth, MB alone has shown the capability of fully saturating the bandwidth.

On the other hand, NOT saturating the bandwidth may actually be making better use of it......

My machines are no longer uploading/requesting tasks 1 or 2 at a time, as they seem to have filled to their cache settings. So we may be looking at a normal bandwidth graph again, which is how it often looked in the days before limits, sans AP or shorties.
That's not to say all requests are being fulfilled, just that there are not so many transfers in progress to keep the bandwidth pegged 24/7.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1230054 - Posted: 10 May 2012, 16:36:54 UTC
Last modified: 10 May 2012, 16:37:54 UTC

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!



HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6085
Credit: 154,936,154
RAC: 46,355
United States
Message 1230066 - Posted: 10 May 2012, 17:00:42 UTC - in response to Message 1230054.

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1230068 - Posted: 10 May 2012, 17:07:14 UTC - in response to Message 1230066.

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.


I don't believe this has ANYTHING to do with the Boinc client.
The host continually asks for GPU 'AND' CPU tasks, but is repeatedly sent ONLY GPU work.
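To illustrate the behavior described above (a request for both GPU and CPU work coming back with GPU tasks only), here is a toy Python model. It is NOT the actual BOINC server scheduler code: the `Request` type, both fill functions, the number of tasks on hand and the per-task runtime estimates are all hypothetical, chosen only to show how a "fill the GPU request first" policy starves the CPU whenever the available work is smaller than the GPU's shortfall, and how alternating between resources avoids that.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A client work request, in seconds of work wanted per resource."""
    gpu_seconds: float
    cpu_seconds: float


def fill_gpu_first(req, available, est_gpu, est_cpu):
    """Hypothetical greedy policy: satisfy the GPU request first and give
    the CPU only what is left over. `available` is a task count."""
    sent = {"gpu": 0, "cpu": 0}
    gpu_need, cpu_need = req.gpu_seconds, req.cpu_seconds
    while available > 0 and gpu_need > 0:
        available -= 1
        sent["gpu"] += 1
        gpu_need -= est_gpu
    while available > 0 and cpu_need > 0:
        available -= 1
        sent["cpu"] += 1
        cpu_need -= est_cpu
    return sent


def fill_round_robin(req, available, est_gpu, est_cpu):
    """Hypothetical alternative: hand out one task per resource in turn,
    so a large GPU shortfall cannot swallow the whole supply before the
    CPU gets a task."""
    sent = {"gpu": 0, "cpu": 0}
    need = {"gpu": req.gpu_seconds, "cpu": req.cpu_seconds}
    est = {"gpu": est_gpu, "cpu": est_cpu}
    while available > 0 and (need["gpu"] > 0 or need["cpu"] > 0):
        for res in ("gpu", "cpu"):
            if available > 0 and need[res] > 0:
                available -= 1
                sent[res] += 1
                need[res] -= est[res]
    return sent


# One day of work wanted for each resource; 100 tasks on hand;
# 600 s per GPU task and 3600 s per CPU task (all made-up numbers).
req = Request(gpu_seconds=86400, cpu_seconds=86400)
print(fill_gpu_first(req, 100, 600, 3600))    # {'gpu': 100, 'cpu': 0}
print(fill_round_robin(req, 100, 600, 3600))  # {'gpu': 76, 'cpu': 24}
```

In this sketch the CPU gets nothing under the GPU-first policy until the GPU's request is fully covered, which is the "CPUs twiddling their thumbs" symptom; the round-robin variant trades a little GPU work for a non-empty CPU queue.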

Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 1087
Credit: 1,950,564
RAC: 200
Greece
Message 1230070 - Posted: 10 May 2012, 17:11:00 UTC - in response to Message 1230068.

I don't believe ANYTHING that has to do with the Boinc client.


There, I fixed it:)

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1230072 - Posted: 10 May 2012, 17:13:57 UTC - in response to Message 1230068.
Last modified: 10 May 2012, 17:52:33 UTC

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem, so having V7 does not help with this server issue.

I would like to see a bigger FIFO so fewer requests are needed to replenish the cache.

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1230073 - Posted: 10 May 2012, 17:15:03 UTC - in response to Message 1230072.

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's what the scheduler logic does with the client request.

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11136
Credit: 83,468,118
RAC: 40,741
United Kingdom
Message 1230078 - Posted: 10 May 2012, 17:33:06 UTC - in response to Message 1230073.

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's the what the scheduler logic does with the client request.

And by scheduler, Mark means the scheduler that runs on the server - that is indeed where this particular problem lies.

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1230081 - Posted: 10 May 2012, 17:34:51 UTC - in response to Message 1230078.
Last modified: 10 May 2012, 17:51:47 UTC

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's the what the scheduler logic does with the client request.

And by scheduler, Mark means the scheduler that runs on the server - that is indeed where this particular problem lies.

Thank you, Richard.

Of my top 3 rigs, 2 are now running GPU only due to this bug.
The only reason the 3rd is not is that the CPU is running on cached AP work with the manually installed AP app. Otherwise, it would be in the same boat.

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1230092 - Posted: 10 May 2012, 17:57:42 UTC - in response to Message 1230081.
Last modified: 10 May 2012, 18:00:21 UTC

If you stop BOINC and set a biggish duration_correction_factor, you will just get CPU work for a while. The reason my 980X gets CPU WUs is that the DCF jumps to 6 when a slow GPU finishes, and the system then asks only for CPU WUs until it drops.

Wow, the 980X has just hit 4,000 WUs cached.
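Why a big DCF turns a mixed request into a CPU-only one, as red-ray describes, can be shown with a simplified model. The assumptions are mine, not the client's actual work-fetch code: the client asks for the gap between its cache target and the estimated run time already queued, and a single per-project DCF scales every queued estimate. A long GPU queue multiplied by a big DCF overshoots the target, so its request drops to zero, while a short CPU queue still falls short. The function names, queue sizes and estimates below are hypothetical.

```python
def shortfall(target_s, queued_base_s, dcf):
    """Seconds of work to request for one resource (simplified: ignores
    instance counts, tasks in progress and the client's other rules)."""
    return max(0.0, target_s - queued_base_s * dcf)


def work_request(target_s, queued_base_by_resource, dcf):
    """Apply the same per-project DCF to every resource's queue."""
    return {res: shortfall(target_s, base, dcf)
            for res, base in queued_base_by_resource.items()}


# Hypothetical host: 4-day cache target, 3000 queued GPU tasks estimated at
# 100 s each, 10 queued CPU tasks estimated at 3000 s each.
TARGET = 4 * 86400  # 345600 s
queues = {"gpu": 3000 * 100.0, "cpu": 10 * 3000.0}

print(work_request(TARGET, queues, 1.0))  # {'gpu': 45600.0, 'cpu': 315600.0}
print(work_request(TARGET, queues, 6.0))  # {'gpu': 0.0, 'cpu': 165600.0}
```

With the DCF at 1 both resources are requested, so a server that fills the GPU first can keep the CPU starved; at 6 the GPU request is zero and only CPU work is asked for, until the DCF drifts back down.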

kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45855
Credit: 814,512,636
RAC: 122,010
United States
Message 1230096 - Posted: 10 May 2012, 18:05:48 UTC - in response to Message 1230092.

If you stop BOINC and set a bigish duration_correction_factor you will just get CPU work for a while. The reason my 980X gets CPU WUs is the DCF jumps to 6 when a slow GPU finishes and the system just asks for CPU WUs 'till it drops.

Wow, the 980X hast just hit 4,000 WUs cached.

I have enough GPU work to last a bit, so I am going to do the 'uncheck use nvidia GPU' trick to get some CPU work flowing.

But, that is a workaround, and should not be necessary.



LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230097 - Posted: 10 May 2012, 18:06:25 UTC - in response to Message 1230072.

I would like to see a bigger fifo so fewer requests are needed to replenish the cache.


It's called the feeder.

The usual workaround is to disable, in the project preferences, the resource that is getting all the tasks until the 'slower' one has built up some sort of cache.

The other option would be to reduce the cache, allow the slower resource to catch up, and then gradually increase the cache again.

It will eventually sort itself out, but if you have a large cache to fill, it may take quite a while before you get single-resource requests again instead of double ones.
I'm not the Pope. I don't speak Ex Cathedra!

Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6081
Credit: 37,585,202
RAC: 14,685
Sweden
Message 1230136 - Posted: 10 May 2012, 19:29:33 UTC

Who stole all APs? Or, who stole the AP splitters?


This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,733
RAC: 295
United States
Message 1230235 - Posted: 10 May 2012, 21:43:44 UTC - in response to Message 1230136.

Who stole all APs? Or, who stole the AP splitters?

It may be that, since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505s comes in and they can do that DB kick thing (which may already have been done, since "awaiting validation" isn't 10k+ anymore).

Or... it could just be that so many tapes were loaded up in the first place that we're now at the point where we have to wait for the MB splitters to catch up before new tapes get loaded for AP to split.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)

arkayn (Project Donor)
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,575,896
RAC: 1,692
United States
Message 1230303 - Posted: 10 May 2012, 23:27:22 UTC - in response to Message 1230235.

Who stole all APs? Or, who stole the AP splitters?

It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore).

Or.. it could just be that there were so many tapes loaded up in the first place that we're now at that point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split.


I am down to 18, from 65 a couple of days ago.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,733
RAC: 295
United States
Message 1230426 - Posted: 11 May 2012, 6:05:02 UTC

Well, this is just starting to be almost slightly irritating. Because of the adjustments to the estimates, I ended up with something like a 22-day AP-only cache, and therefore my average turnaround time was in the high teens. As a result, most of my wingmates were waiting for me, so nearly every result I reported was validated immediately.

But since there hasn't been new work going out and my cache is now down in the ~8-day range, I'm starting to pick up more and more pendings when I report. Oh well. That's the way it goes.
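Cosmic_Ocean's observation follows from how a two-result quorum validates: the check can only run once the second result arrives, so whoever reports last sees validation straight away and whoever reports first sits in "pending". A rough sketch, assuming a minimum quorum of 2, both copies sent out at about the same time, and a made-up list of wingmate turnaround times in days (`pending_fraction` is a hypothetical helper):

```python
def pending_fraction(my_turnaround, wingmate_turnarounds):
    """Share of my results that would wait as 'pending' when reported.

    Assumes a quorum of 2 checked when the second result comes in, and
    that both copies went out at about the same time, so my result waits
    whenever the wingmate's turnaround is longer than mine.
    Ties count as not pending.
    """
    waiting = sum(1 for w in wingmate_turnarounds if w > my_turnaround)
    return waiting / len(wingmate_turnarounds)


# Hypothetical wingmate turnaround times, in days.
wingmates = [1, 2, 3, 5, 8, 13, 21, 30]

print(pending_fraction(18, wingmates))  # 0.25  (long-cache turnaround)
print(pending_fraction(8, wingmates))   # 0.375 (~8-day cache)
```

Shortening one's own turnaround moves more wingmates into the "slower than me" group, which is why a shrinking cache shows up as more pendings on report.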



Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,831,339
RAC: 44,977
Australia
Message 1230437 - Posted: 11 May 2012, 6:27:44 UTC


I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get few, if any, such messages when requesting work.


Grant
Darwin NT

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11136
Credit: 83,468,118
RAC: 40,741
United Kingdom
Message 1230442 - Posted: 11 May 2012, 6:33:22 UTC - in response to Message 1230437.


I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as i was, but i'm still getting more than i usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been i would expect to get hardly any, if any, such messages when requesting work.

Well, the tasks are going out, because we're now over 5.5 million out in the field. I don't know how big that figure can be before the database starts slowing down...

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1230458 - Posted: 11 May 2012, 8:24:49 UTC - in response to Message 1230437.
Last modified: 11 May 2012, 8:51:52 UTC

I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as i was, but i'm still getting more than i usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been i would expect to get hardly any, if any, such messages when requesting work.

Now that there are no limits, I expect many hosts are asking for, and getting, the entire feeder buffer. Getting WUs is going to be a problem until all the caches are full. I feel it would help a lot if the feeder could have a bigger buffer.

I am puzzled as to why the Result average turnaround is dropping though.
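red-ray's point about the feeder buffer can be illustrated with a toy model. Everything here is an assumption rather than the real feeder/scheduler behavior: the buffer is refilled to full between bursts of requests, each host takes as many tasks as it asks for until the buffer runs dry, and any host that finds it empty gets "Project has no tasks available". The helper name and the request sizes are invented, with two hosts trying to fill large caches in every interval.

```python
def count_empty_replies(buffer_size, intervals):
    """Count requests that find the buffer empty.

    `intervals` is a list of request bursts; each burst lists how many
    tasks successive hosts ask for between two refills of the buffer.
    """
    empty = 0
    for burst in intervals:
        available = buffer_size  # simplification: refilled to full each time
        for wanted in burst:
            if available == 0:
                empty += 1  # would see "Project has no tasks available"
                continue
            available -= min(wanted, available)
    return empty


# Two hosts with big caches to fill, then three small requests, three times.
traffic = [[100, 100, 5, 5, 5]] * 3

for size in (100, 200, 300):
    print(size, count_empty_replies(size, traffic))
# 100 12
# 200 9
# 300 0
```

In this model the big requests drain a small buffer before the small ones arrive; once caches fill and requests shrink, even the small buffer would serve everyone, which is consistent with the expectation that things ease when the caches are full.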

©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.