Panic Mode On (38) Server problems

Message boards : Number crunching : Panic Mode On (38) Server problems

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1034445 - Posted: 19 Sep 2010, 11:56:25 UTC - in response to Message 1034388.  

[quote]...and had to be emailed to Dr. A for manual upload.[/quote]

When David asks for such a file to be sent to him, he won't manually upload its contents into the database or anything. He'll use it to test what happens to it (checking the backend, seeing what it does) when he tries to send such a file.

At the end of the day, the user who has the problem will still need to report all those tasks himself, or at least his BOINC client will.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1034482 - Posted: 19 Sep 2010, 14:31:42 UTC - in response to Message 1034352.  

[quote]Yes, limits of 40/320 are in place, but I'm wondering more about this:

SETI@home 19.09.2010 08:38:50 Scheduler request completed: got 58 new tasks

I thought there was also a limit of about 20 per request? It seems this doesn't apply any more.[/quote]

Yes, the project's config.xml used to have max_wus_to_send set to 20, and that value used to be applied directly. But changeset [trac]changeset:18255[/trac] revised it so the number is now scaled by the number of CPUs and GPUs, with each GPU counted gpu_multiplier times. The 20 is probably still there, but it is being treated as 20 per CPU and 160 per GPU.

There are only 100 feeder slots, so that's the absolute maximum that could be sent for one request. For new work there are pairs of tasks for each WU, so getting more than 50 means some are unpaired reissues (_2 and up).
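The arithmetic above can be sketched in a few lines. This is a minimal illustration, not the BOINC scheduler source: the function names are hypothetical, gpu_multiplier = 8 is inferred from "20 per CPU and 160 per GPU", and it assumes a host never receives both tasks of the same WU, which is why one host can use at most 50 of the 100 feeder slots when they hold only fresh pairs:

```python
# Hypothetical sketch of the per-request cap; names are illustrative,
# not the actual BOINC scheduler code.
FEEDER_SLOTS = 100  # tasks the feeder holds at once, per the post above


def scaled_request_limit(max_wus_to_send: int, ncpus: int, ngpus: int,
                         gpu_multiplier: int) -> int:
    """Scale the config.xml value by device count; each GPU counts gpu_multiplier times."""
    return max_wus_to_send * (ncpus + gpu_multiplier * ngpus)


def effective_limit(max_wus_to_send: int, ncpus: int, ngpus: int,
                    gpu_multiplier: int, unpaired_reissues: int = 0) -> int:
    """Upper bound on tasks one request can return.

    Assumption: a host gets at most one task of each WU, so every fresh
    _0/_1 pair in the feeder yields at most one task, while each unpaired
    reissue (_2 and up) yields one task.
    """
    fresh_pairs = (FEEDER_SLOTS - unpaired_reissues) // 2
    available = fresh_pairs + unpaired_reissues
    scaled = scaled_request_limit(max_wus_to_send, ncpus, ngpus, gpu_multiplier)
    return min(scaled, available)


if __name__ == "__main__":
    print(scaled_request_limit(20, 1, 0, 8))  # 20  (one CPU)
    print(scaled_request_limit(20, 0, 1, 8))  # 160 (one GPU)
    print(effective_limit(20, 4, 1, 8))       # 50  (only fresh pairs in the feeder)
    print(effective_limit(20, 4, 1, 8, unpaired_reissues=16))  # 58
```

For example, 16 unpaired reissues in the feeder would be enough to produce a 58-task reply like the one in the log quoted above; the actual mix of tasks at that moment is unknown.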
Joe

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13947
Credit: 208,696,464
RAC: 304
Australia
Message 1034549 - Posted: 19 Sep 2010, 18:47:10 UTC


Probably time to raise the server-side limits.
Network traffic has finally eased off (apart from the AP bursts), but inbound traffic has been climbing steadily, most likely from hosts that have reached the server-side limit and are still asking for work.
Grant
Darwin NT

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51540
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1034550 - Posted: 19 Sep 2010, 18:48:59 UTC - in response to Message 1034549.  


[quote]Probably time to raise the server-side limits.
Network traffic has finally eased off (apart from the AP bursts), but inbound traffic has been climbing steadily, most likely from hosts that have reached the server-side limit and are still asking for work.[/quote]

I said that hours ago before I passed out, my friend.

"Time is simply the mechanism that keeps everything from happening all at once."

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1034644 - Posted: 20 Sep 2010, 0:20:44 UTC

I remember the first time I noticed I could get more than 20 tasks in one request. It was nice back when I was doing MB work and needed 150-225 tasks to fill the cache. That used to take a ton of requests, 10+, but after the change it only took 4-6.

Presently, my record for most APs in one request is 23. Almost all requests get 1-3, though every now and then I get 5-15 at a time.
Linux laptop:
record uptime: 1511d 20h 19m (ended when the power brick gave up)

Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1034647 - Posted: 20 Sep 2010, 0:31:13 UTC

Looks like the limits have been raised, as I have picked up 250-ish WUs in the last few hours. I had been bumping against the limit all weekend.

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1034713 - Posted: 20 Sep 2010, 3:09:30 UTC
Last modified: 20 Sep 2010, 3:09:55 UTC

New message from the server!!

20/09/2010 12:21:27 SETI@home Message from server: Resent lost task 24ap10ae.5264.1058.11.10.50_2

Looks like something has finally been done about the ghost problem. Yeehaa!
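A "ghost" here is a task the server has recorded as sent to a host but which never arrived, for example because the scheduler reply was lost in a request timeout. A minimal sketch of how a server could find them, assuming the client lists the tasks it actually holds in each scheduler request and the server resends whatever it has on record but is missing from that list (the function and task names are hypothetical, not BOINC source):

```python
from __future__ import annotations


# Hypothetical sketch of lost-task ("ghost") detection; not BOINC source.
def find_lost_tasks(assigned_on_server: set[str],
                    reported_by_client: set[str]) -> set[str]:
    """Tasks the server believes the host holds but the client did not list."""
    return assigned_on_server - reported_by_client


if __name__ == "__main__":
    server_view = {"task_a", "task_b", "task_c"}  # hypothetical task names
    client_view = {"task_a"}                      # task_b and task_c never arrived
    for name in sorted(find_lost_tasks(server_view, client_view)):
        print("Resent lost task", name)
```

The real scheduler presumably applies further checks on top of this (for example, skipping tasks already past their deadline); those are left out of the sketch.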

T.A.

Zeus Fab3r
Joined: 17 Jan 01
Posts: 649
Credit: 275,335,635
RAC: 597
Serbia
Message 1034717 - Posted: 20 Sep 2010, 3:14:53 UTC - in response to Message 1034713.  
Last modified: 20 Sep 2010, 4:01:41 UTC

YES indeed !!!!!!!

20-Sep-10 05:12:27	SETI@home	Scheduler request completed: got 20 new tasks
20-Sep-10 05:12:27	SETI@home	[sched_op_debug] Server version 611
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task D%my10adep23.24198.12.10.222_2
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.90_3
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.102_2


Edit: But something is very wrong... I was tracking 26 ghosts after a scheduler request timeout when I saw that message, but instead of those 26 I just got 20 tasks that are not on my rig's list at all?! So will I be crunching someone else's WUs in the near future?

Edit 2: Things are going from bad to worse. I just got a bunch of fresh .vlars on my GPU ;-< Task 1712880409

Who the hell is General Failure and why is he reading my harddisk?¿

Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1034829 - Posted: 20 Sep 2010, 13:45:44 UTC - in response to Message 1034717.  
Last modified: 20 Sep 2010, 13:47:08 UTC

[quote]YES indeed !!!!!!!

20-Sep-10 05:12:27	SETI@home	Scheduler request completed: got 20 new tasks
20-Sep-10 05:12:27	SETI@home	[sched_op_debug] Server version 611
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task D%my10adep23.24198.12.10.222_2
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.90_3
20-Sep-10 05:12:27	SETI@home	Message from server: Resent lost task 31dc09af.17708.8411.12.10.102_2

Edit: But something is very wrong... I was tracking 26 ghosts after a scheduler request timeout when I saw that message, but instead of those 26 I just got 20 tasks that are not on my rig's list at all?! So will I be crunching someone else's WUs in the near future?

Edit 2: Things are going from bad to worse. I just got a bunch of fresh .vlars on my GPU ;-< Task 1712880409[/quote]


Oh well... a new feature was enabled and, unintentionally, another old feature was disabled at the same time. :-(

Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034831 - Posted: 20 Sep 2010, 13:56:16 UTC - in response to Message 1034647.  

[quote]Looks like the limits have been raised, as I have picked up 250-ish WUs in the last few hours. I had been bumping against the limit all weekend.[/quote]

Yes, fine. But as always for me, as a power cruncher, it comes very late. I'm not sure that I can download >4,000 workunits in the next 24 hours. :-|

Helli
A loooong time ago: First Credits after SETI@home Restart

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6662
Credit: 121,090,076
RAC: 0
United States
Message 1034848 - Posted: 20 Sep 2010, 14:48:46 UTC - in response to Message 1034831.  

[quote][quote]Looks like the limits have been raised, as I have picked up 250-ish WUs in the last few hours. I had been bumping against the limit all weekend.[/quote]

Yes, fine. But as always for me, as a power cruncher, it comes very late. I'm not sure that I can download >4,000 workunits in the next 24 hours. :-|

Helli[/quote]

I have that problem too. It is very difficult to download a lot of work at what turns out to be the last minute. You really have to stay with your rig and babysit it. I hope the new server will allow things to run with higher limits earlier on.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034850 - Posted: 20 Sep 2010, 14:54:11 UTC - in response to Message 1034848.  
Last modified: 20 Sep 2010, 14:54:33 UTC

You said it right, Steve. I'm sitting here (maybe for the next 4 hours) pushing the Retry button, because I can't wait 11:32:45 hours for the next automatic connection. ;-)

Helli
A loooong time ago: First Credits after SETI@home Restart

Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1034865 - Posted: 20 Sep 2010, 15:41:37 UTC - in response to Message 1034850.  

My machines will keep downloading until tomorrow, right up to the outage... and with bad luck they will still have a lot of backlogged downloads during the outage.

I only have DSL light (384/64 kbit/s)... :-(

For example, after every 3-day outage my GPU machine needs at least 5 hours to upload all the results from those 3 days, and it makes no work requests during that time.

Only after the 940 BE has finished uploading do I enable the network for the E7600 machine.

Everyone has his own special problems. ;-)
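These figures put a ceiling on how much is being uploaded after an outage. A minimal sketch of that arithmetic, assuming the 64 kbit/s uplink stays saturated for the whole 5 hours (real throughput is lower, so this is an upper bound; the function name is hypothetical):

```python
def max_upload_mb(uplink_kbit_s: float, hours: float) -> float:
    """Upper bound on data sent: line rate times time, in decimal megabytes."""
    bits = uplink_kbit_s * 1000 * hours * 3600  # kbit/s -> bit/s, hours -> seconds
    return bits / 8 / 1_000_000                 # bits -> bytes -> MB


if __name__ == "__main__":
    print(max_upload_mb(64, 5))  # 144.0
```

So a 5-hour upload window corresponds to at most about 144 MB of results. Because the downlink is 6 times faster (384 vs 64 kbit/s), the same volume would come down in about 50 minutes.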

Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034872 - Posted: 20 Sep 2010, 16:09:15 UTC - in response to Message 1034865.  
Last modified: 20 Sep 2010, 16:10:03 UTC

[quote]...Everyone has his own special problems. ;-)[/quote]


Yupp.

And I don't think that Oscar makes any difference to this situation. IMHO the 100 Mbit line, in combination with this 3-days-in-7 outage, is the bottleneck. Maybe a 5-days-in-14 outage would be a better choice for us.
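This proposal can be put into rough numbers. Over a 14-day span the current schedule leaves 8 days of uptime and two post-outage rushes, while 5 days out of 14 leaves 9 days of uptime and only one rush. A minimal sketch of that arithmetic, assuming the 100 Mbit line is the only limit and is kept full whenever the project is up (an idealized ceiling, not a measured figure; the function names are hypothetical):

```python
def uptime_days(outage_days: int, cycle_days: int, span_days: int) -> float:
    """Days of uptime within span_days for an outage of outage_days every cycle_days."""
    cycles = span_days / cycle_days
    return cycles * (cycle_days - outage_days)


def max_transfer_tb(link_mbit_s: float, up_days: float) -> float:
    """Ceiling on data the link can carry in up_days, in decimal terabytes."""
    bits = link_mbit_s * 1e6 * up_days * 86_400  # Mbit/s -> bit/s, days -> seconds
    return bits / 8 / 1e12                        # bits -> bytes -> TB


if __name__ == "__main__":
    current = uptime_days(3, 7, 14)    # 8.0 days up, 2 outages
    proposed = uptime_days(5, 14, 14)  # 9.0 days up, 1 outage
    print(current, max_transfer_tb(100, current))
    print(proposed, max_transfer_tb(100, proposed))
```

At the line's ceiling that comes to roughly 8.64 TB versus 9.72 TB per 14 days, about 12.5% more capacity, on top of having one recovery rush instead of two.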

Helli
A loooong time ago: First Credits after SETI@home Restart

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6662
Credit: 121,090,076
RAC: 0
United States
Message 1034887 - Posted: 20 Sep 2010, 16:42:49 UTC - in response to Message 1034872.  

[quote][quote]...Everyone has his own special problems. ;-)[/quote]

Yupp.

And I don't think that Oscar makes any difference to this situation. IMHO the 100 Mbit line, in combination with this 3-days-in-7 outage, is the bottleneck. Maybe a 5-days-in-14 outage would be a better choice for us.

Helli[/quote]

I was thinking that after they shuffle the servers around and add Oscar, the need for an extended downtime might be somewhat alleviated. I am thinking that 96 GB of RAM can process a whole lot of science. I am most likely wrong, but if the outage is there so that NTPC can work without being bombarded with new files, then the ability to handle the files and do science may reduce the need for downtime. Actually, I don't really know.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

X-Files 27
Joined: 17 May 99
Posts: 104
Credit: 111,191,433
RAC: 0
Canada
Message 1034888 - Posted: 20 Sep 2010, 16:43:21 UTC

Two things are broken:

1st) .VLAR tasks are being sent to GPUs.

2nd) Hosts with the preference "Use CPU" = No are getting CPU tasks.

Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1034896 - Posted: 20 Sep 2010, 17:20:10 UTC - in response to Message 1034887.  
Last modified: 20 Sep 2010, 17:39:18 UTC



[quote]I was thinking that after they shuffle the servers around and add Oscar, the need for an extended downtime might be somewhat alleviated.
...[/quote]


If that were possible, our (special) donations would be a good investment for us as well.


[Edit:]

Woah..

Not bad man, not bad.

I got lucky, found a hole in the wall and downloaded >1,800 workunits in the past 40 minutes. Yippee!

Helli
A loooong time ago: First Credits after SETI@home Restart

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51540
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1034912 - Posted: 20 Sep 2010, 17:46:08 UTC - in response to Message 1034887.  

[quote][quote][quote]...Everyone has his own special problems. ;-)[/quote]

Yupp.

And I don't think that Oscar makes any difference to this situation. IMHO the 100 Mbit line, in combination with this 3-days-in-7 outage, is the bottleneck. Maybe a 5-days-in-14 outage would be a better choice for us.

Helli[/quote]

I was thinking that after they shuffle the servers around and add Oscar, the need for an extended downtime might be somewhat alleviated. I am thinking that 96 GB of RAM can process a whole lot of science. I am most likely wrong, but if the outage is there so that NTPC can work without being bombarded with new files, then the ability to handle the files and do science may reduce the need for downtime. Actually, I don't really know.

Steve[/quote]

Oscar should be quite a beast after the generous response I got in my donation thread...
It should be interesting to see if those in charge of actually setting the specs agree with the upgrades I proposed, especially the CPU upgrade to add more cores and speed and allow the RAM to run at its rated speed. That should all add up to a nice increase in processing power.

More bandwidth would be nice at some point, but the weakest link has been the processing power of the server base. With mighty Oscar taking over some of the chores, the tasks running on the other servers might be spread around a bit, and Matt intends to retire some of the less reliable ones. Of course, if their load is lessened, that might not have to happen. He would be the judge of that.

If the servers can hold up their end, we can saturate the bandwidth as we have since coming back up last Thursday. And if future downtimes are 3 days or less, I think we can get by nicely with what is on tap right now.

"Time is simply the mechanism that keeps everything from happening all at once."

James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1034923 - Posted: 20 Sep 2010, 18:00:24 UTC

I've been jumping all over the threads, so I can't find which one said that they have run the new line up the hill. I take it that it hasn't been connected yet? If not, any idea when it will be?

Oscar and a new line should be a WOW moment right there.

Old James

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51540
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1034928 - Posted: 20 Sep 2010, 18:08:33 UTC - in response to Message 1034923.  

[quote]I've been jumping all over the threads, so I can't find which one said that they have run the new line up the hill. I take it that it hasn't been connected yet? If not, any idea when it will be?

Oscar and a new line should be a WOW moment right there.[/quote]

I could be wrong, but if I recall correctly, the discussion was that the new fiber link was NOT going to increase Seti's bandwidth.
"Time is simply the mechanism that keeps everything from happening all at once."

©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.