Panic Mode On (78) Server Problems?

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1304911 - Posted: 11 Nov 2012, 14:01:15 UTC - in response to Message 1304863.  

My RAC has been steadily declining, and I have noticed that the task lists for two of my rigs show most of the assigned tasks under Error, with the status abandoned. Could anyone tell me why this might occur? The rigs still have all the tasks and are crunching them but, obviously, are not gaining any credit for the work being done. Should I reset the rigs, or is this something that will get sorted out automatically?

The same happened to me on one computer. After thousands of scheduler timeouts, one request apparently got mangled/misinterpreted badly enough that the server thought I had reset the project and decided to abandon all tasks. At least that's what I think happened...
In any case, you should delete (abort) all those tasks with BOINC Manager, as they will not get deleted automatically and the server will just ignore them (you won't get any credit; they were already marked as abandoned for you and re-sent to other crunchers).

ID: 1304911

Bill G (Special Project $75 donor)
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1304961 - Posted: 11 Nov 2012, 16:19:17 UTC - in response to Message 1304877.  

The servers are still in recovery after the AP splitting; no doubt it'll be some time before everyone's ghosts are resent.

There is a scheduler bug fix in the works. Hopefully it'll be deployed at SETI Beta on Monday. I'm not expecting it to be a total cure, just a step in the right direction.

Claggy


Yes, it will be some time. I am now under 3000 ghosts on one computer and with the cache limit it will be a long time before they are all sent to me. But at least downloads are working as they should.

Most of the CPU tasks are being set to run at High Priority once they are scheduled to run. Only 5 of the 8 CPU tasks are High Priority at a time: they start in normal running mode and then become High Priority as a WU finishes. Just interesting to note.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1304961

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1304998 - Posted: 11 Nov 2012, 18:13:53 UTC - in response to Message 1304877.  

There is a scheduler bug fix in the works. Hopefully it'll be deployed at SETI Beta on Monday. I'm not expecting it to be a total cure, just a step in the right direction.

Will that fix the Scheduler timeout problems, or fix the increasing number of Ghosts that are created when the Scheduler keeps timing out?

Grant
Darwin NT
ID: 1304998

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1305011 - Posted: 11 Nov 2012, 18:39:04 UTC

What gets me here is that, after fixing the connection issue with a proxy server, I now have to check on this machine every four hours, as I am only getting enough work to keep it going for four hours before the LIMITS kick in.

I have been running Ghostdet, and two days ago I was getting 54% ghost tasks
Yesterday was 15%
Today 0% with 200 tasks on board.

They are all mostly shorties though.

Seems to be working itself out.

Now every time I get a work request in, the servers will only do ONE TO ONE:
Report one, get one.

I hope this gets solved really fast, as my patience is wearing thin, as is most people's.

We have built the fastest computer system in the world, let's keep it busy
ID: 1305011

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1305017 - Posted: 11 Nov 2012, 19:01:29 UTC

How does one go about finding a proxy that is safe to use?

Frank
ID: 1305017

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1305020 - Posted: 11 Nov 2012, 19:06:52 UTC - in response to Message 1304998.  

There is a scheduler bug fix in the works. Hopefully it'll be deployed at SETI Beta on Monday. I'm not expecting it to be a total cure, just a step in the right direction.

Will that fix the Scheduler timeout problems, or fix the increasing number of Ghosts that are created when the Scheduler keeps timing out?

It'll fix the bug of resending work to the wrong device, i.e. BOINC asks for CPU work only but gets resends for the GPU instead (which wasn't asking for work), and the GPU then times out any VLARs it encounters.

Claggy
ID: 1305020

Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 1305226 - Posted: 12 Nov 2012, 2:49:02 UTC - in response to Message 1304911.  

The same happened to me on one computer. After thousands of scheduler timeouts, one request apparently got mangled/misinterpreted badly enough that the server thought I had reset the project and decided to abandon all tasks. At least that's what I think happened...
In any case, you should delete (abort) all those tasks with BOINC Manager, as they will not get deleted automatically and the server will just ignore them (you won't get any credit; they were already marked as abandoned for you and re-sent to other crunchers).

Thanks Khangollo, I shall do that.
ID: 1305226

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1305282 - Posted: 12 Nov 2012, 5:52:14 UTC - in response to Message 1305226.  


While I was at work, for some reason my internet connection died.
When I was able to reconnect and upload all the work that had piled up, naturally the Scheduler timed out on all requests for work & reporting.
Even with No New Tasks set, it took several attempts to get a response from the Scheduler.
And even now, with only one task to report on one system and a couple on the other, all that I'm getting are Scheduler timeout errors.

Few more hours & i'll be completely out of work, before the weekly outage where i was expecting to run out of GPU work at least.
Grant
Darwin NT
ID: 1305282

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1305351 - Posted: 12 Nov 2012, 13:47:33 UTC

While I was at work, for some reason my internet connection died.
When I was able to reconnect and upload all the work that had piled up, naturally the Scheduler timed out on all requests for work & reporting.
Even with No New Tasks set, it took several attempts to get a response from the Scheduler.
And even now, with only one task to report on one system and a couple on the other, all that I'm getting are Scheduler timeout errors.

Few more hours & i'll be completely out of work, before the weekly outage where i was expecting to run out of GPU work at least.

Also getting timeouts, even on NNT. Dropping my max reported setting. I'll also run out because I've got mostly shorties. Low limits and shorties = cruelty to crunchers!

I crunched 3 tasks this morning with 60 day deadlines - 10 Jan. I don't remember seeing that before.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1305351

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1305380 - Posted: 12 Nov 2012, 15:25:16 UTC

I haven't put my hands on my i7 for a few days, but I will have to when I get home from work today. What concerns me, though, is that my account page says it only has 83 in progress, well below its limit of 200. Before all the trouble started, it typically ran in the 1100-1600 range. I know it just downloaded some new units from Einstein, but I don't know what the cause/effect relationship is. Is it getting Einstein because it can't get Seti, or is it feeling debt to Einstein and favoring it for now? My other two machines each reported one unit back to Einstein over the weekend without asking for more, leaving one of them with only Seti work on board. And I just slid back a position in my joining date class. :-(

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1305380

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1305408 - Posted: 12 Nov 2012, 17:16:32 UTC

Media alert......
The kitties have inbound WUs!!!!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1305408

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1305533 - Posted: 12 Nov 2012, 20:29:04 UTC - in response to Message 1305408.  
Last modified: 12 Nov 2012, 20:31:31 UTC

Media alert......
The kitties have inbound WUs!!!!

Purrrrr......

[edit]Purr also for the fact that, according to the weather thing in my signature at the time I posted this, we actually got up to 33F here. Woo hoo.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1305533

Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1305604 - Posted: 12 Nov 2012, 23:03:11 UTC - in response to Message 1305408.  

You do seem to have a metric buttload of GPU tasks even though your CPU pile finally dropped below 100 on two kitties.

I think the all tasks web page still needs a filter for CPU vs GPU tasks, or at least a count.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1305604

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1305632 - Posted: 13 Nov 2012, 0:13:01 UTC - in response to Message 1305604.  

You do seem to have a metric buttload of GPU tasks even though your CPU pile finally dropped below 100 on two kitties.

I think the all tasks web page still needs a filter for CPU vs GPU tasks, or at least a count.

Well, this does it for me, but for a particular machine with particular software. On a Windows machine you'd need either cygwin or some replacement for wc (word count); ISTR there's a DOS equivalent of grep -- find? Bug: the grep for 'fermi' returns a line not associated with jobs in progress, so the third line overcounts by one.

[eesridr:BOINC] > cat showjobs 
date
grep 'received_time' client_state.xml|wc
grep 'fermi' client_state.xml|wc

[eesridr:BOINC] > . showjobs 
Tue Nov 13 00:06:50 GMT 2012
    687     687   36411
    589     589   23560
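
A possible refinement of the showjobs idea, offered as a sketch rather than a tested tool: `grep -c` prints the number of matching lines directly, so `wc` is not needed, and the one stray 'fermi' line mentioned above can be subtracted. The helper name `count_lines`, the `STATE_FILE` variable, and the assumption that every non-fermi task is a CPU task are illustrative and not from the original post.

```shell
#!/bin/sh
# Sketch only: count_lines is a hypothetical helper, not part of BOINC.
# It prints how many lines of file $2 match pattern $1.
count_lines() {
    # grep -c prints the count of matching lines; "|| true" stops a
    # zero-match result (grep exit status 1) from being treated as failure.
    grep -c -- "$1" "$2" || true
}

# STATE_FILE is an assumed override; the default is the file used above.
state_file="${STATE_FILE:-client_state.xml}"

if [ -r "$state_file" ]; then
    total=$(count_lines 'received_time' "$state_file")
    fermi=$(count_lines 'fermi' "$state_file")
    # Assumption taken from the post: exactly one 'fermi' line does not
    # belong to a job in progress, so drop it from the GPU count.
    if [ "$fermi" -gt 0 ]; then
        fermi=$((fermi - 1))
    fi
    date
    echo "all tasks:         $total"
    echo "GPU (fermi) tasks: $fermi"
    # Assumption: anything that is not a fermi task is a CPU task.
    echo "CPU tasks:         $((total - fermi))"
fi
```

Run it from the BOINC data directory, or point `STATE_FILE` at the file. On a machine with other GPU plan classes the 'fermi' pattern would need changing.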

ID: 1305632

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1305640 - Posted: 13 Nov 2012, 0:29:02 UTC - in response to Message 1305604.  

You do seem to have a metric buttload of GPU tasks even though your CPU pile finally dropped below 100 on two kitties.

I think the all tasks web page still needs a filter for CPU vs GPU tasks, or at least a count.

Those were all part of my caches before the problems started.
I had about 75,000 WUs in play before the trashing began.
It's down to 32,000 now.
9 rigs, remember. 20 GPUs.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1305640

Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1305645 - Posted: 13 Nov 2012, 0:56:52 UTC - in response to Message 1305640.  

I was just talking about one of the rigs that recently got CPU units. You still had around 1500 GPU units for the 3 GPUs. At 500 seconds per GPU unit, that's nearly 3 days' worth left. Even if you get down to 100 per GPU, that's still half a day's worth. What did you normally run your queue at? 10 days?

It doesn't make a difference in bandwidth usage in the long run once the whole SETI@home ecosystem hits steady state. It'll just mean that when a super cruncher's nVidia card goes off the rails, it can only shaft at most 100 wingmen per GPU as opposed to thousands. (Please check your results daily to catch when your system starts to produce mostly inconclusive/error/invalid GPU results. This is not directed at you, msattler, just at nVidia users in general.)
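
For anyone who wants to check the queue arithmetic, here is a rough sketch. It assumes each GPU runs one unit at a time and that every unit takes the quoted 500 seconds; `cache_hours` is a made-up helper name.

```shell
#!/bin/sh
# cache_hours TASKS SECONDS_PER_TASK GPUS
# Hours of work in a queue, assuming one task at a time per GPU and a
# fixed run time per task (both are simplifying assumptions).
cache_hours() {
    awk -v n="$1" -v s="$2" -v g="$3" \
        'BEGIN { printf "%.1f\n", n * s / g / 3600 }'
}

cache_hours 1500 500 3   # 1500 units over 3 GPUs: 69.4 hours
cache_hours 300 500 3    # 100 units per GPU:      13.9 hours
```

69.4 hours is about 2.9 days, and 13.9 hours is a little over half a day.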
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1305645

juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1305773 - Posted: 13 Nov 2012, 12:12:47 UTC - in response to Message 1305645.  

I was just talking about one of the rigs that recently got CPU units. You still had around 1500 GPU units for the 3 GPUs. At 500 seconds per GPU unit, that's nearly 3 days' worth left. Even if you get down to 100 per GPU, that's still half a day's worth. What did you normally run your queue at? 10 days?

It doesn't make a difference in bandwidth usage in the long run once the whole SETI@home ecosystem hits steady state. It'll just mean that when a super cruncher's nVidia card goes off the rails, it can only shaft at most 100 wingmen per GPU as opposed to thousands. (Please check your results daily to catch when your system starts to produce mostly inconclusive/error/invalid GPU results. This is not directed at you, msattler, just at nVidia users in general.)

Each 690 crunches a WU in less than 7 minutes running 3 WUs at a time on each GPU (it has 2), about 48 per hour or more, so on a big cruncher (3x690) a 100 WU cache is simply ridiculous; it would not last 1 hour. I have 2x690 sleeping on a bed, waiting for them to raise the limits. With the current limits it is a waste of time/resources to put them to work; they simply will not receive the WUs they need.
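
The throughput claim can be checked with a little arithmetic. This sketch assumes exactly 7 minutes per WU (the post says "less than 7", so real throughput would be higher), 3 WUs at a time per GPU, 2 GPUs per 690, and a 100 WU cache for the whole host; the function names are made up for illustration.

```shell
#!/bin/sh
# wus_per_hour MINUTES_PER_WU WUS_AT_A_TIME_PER_GPU GPUS
# WUs finished per hour when every GPU keeps that many WUs running.
wus_per_hour() {
    awk -v m="$1" -v c="$2" -v g="$3" \
        'BEGIN { printf "%.1f\n", g * c * 60 / m }'
}

# cache_minutes CACHE_SIZE WUS_PER_HOUR
# Minutes until a cache of that size is empty at the given rate.
cache_minutes() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r * 60 }'
}

wus_per_hour 7 3 2                          # one 690 (2 GPUs): 51.4
wus_per_hour 7 3 6                          # 3x690 (6 GPUs):   154.3
cache_minutes 100 "$(wus_per_hour 7 3 6)"   # 100 WU cache:     38.9
```

Under those assumptions one card does a little over 51 WUs per hour, in line with the "about 48 per hour or more" above, and a 100 WU cache on a 3x690 host is gone in roughly 39 minutes.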
ID: 1305773

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1305804 - Posted: 13 Nov 2012, 19:55:14 UTC

Yay! Back from the normal Tuesday time-out. (btw, people in the lab are really morning people...)

Let's see what comes next... Cricket on top now, AP splitting disabled... Let's see and hope for better...
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1305804

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1305811 - Posted: 13 Nov 2012, 20:09:03 UTC

Yay! Back from the normal Tuesday time-out. (btw, people in the lab are really morning people...)

Let's see what comes next... Cricket on top now, AP splitting disabled... Let's see and hope for better...


Yes, they took it down just before 6am California time. Doesn't look any better to me. Had timeouts on work requests, so I went NNT and some of those timed out, but I finally reported my completions. First work request generated some new ghosts and I haven't got them yet. Down to 1.5 hrs of gpu work.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1305811

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1305822 - Posted: 13 Nov 2012, 20:30:01 UTC - in response to Message 1305811.  

Yes, they took it down just before 6am California time. Doesn't look any better to me. Had timeouts on work requests, so I went NNT and some of those timed out, but I finally reported my completions. First work request generated some new ghosts and I haven't got them yet. Down to 1.5 hrs of gpu work.


My computer's logs say that they took it down before 5:25; that is when the first "project in maintenance" message arrived. I don't like to wake up that early...

Well, timeouts are normal after a maintenance period, IMHO.

Let's wait about 12 hours and hope for better.
ID: 1305822

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.