Panic Mode On (21) Server problems

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 919668 - Posted: 20 Jul 2009, 10:53:56 UTC - in response to Message 919653.  

Well, it's almost the same here: I'm down to my last WU, with four waiting to be uploaded and a few ready to report. I too would like more work.
ID: 919668
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 919670 - Posted: 20 Jul 2009, 11:06:33 UTC - in response to Message 919655.  

Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.

If you would post the relevant messages from your log it might help to figure out what's going on.
My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.
Grant
Darwin NT
ID: 919670
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919671 - Posted: 20 Jul 2009, 11:11:28 UTC - in response to Message 919670.  

Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.

If you would post the relevant messages from your log it might help to figure out what's going on.
My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.


Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks
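
For anyone reading along, those two lines boil down to: asked for 0 seconds of work, reported 18 tasks, received 0. A minimal Python sketch that pulls those numbers out of log lines of this shape. The function name `summarize` is made up for illustration, and the patterns assume only the wording and the `timestamp|project|message` layout visible in the two lines above; other BOINC versions may word things differently.

```python
import re

# Patterns assume the exact wording seen in the two quoted log lines.
REQUEST_RE = re.compile(
    r"Requesting (\d+) seconds of work, reporting (\d+) completed tasks"
)
REPLY_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def summarize(lines):
    """Collect the work requested, tasks reported and tasks received."""
    summary = {
        "seconds_requested": None,
        "tasks_reported": None,
        "tasks_received": None,
    }
    for line in lines:
        # Assumed layout: "<timestamp>|<project>|<message>"
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a line in the assumed layout
        message = parts[2]
        request = REQUEST_RE.search(message)
        if request:
            summary["seconds_requested"] = int(request.group(1))
            summary["tasks_reported"] = int(request.group(2))
        reply = REPLY_RE.search(message)
        if reply:
            summary["tasks_received"] = int(reply.group(1))
    return summary
```

A request for 0 seconds that comes back with 0 tasks means the client never asked for work, which is a different problem from the server having none to give.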


Hope that's helpful.
ID: 919671
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 919672 - Posted: 20 Jul 2009, 11:25:33 UTC - in response to Message 919671.  


What version of the BOINC manager are you using?
Wild guess: when older versions (I think 6.6.28 & earlier) requested work, they didn't distinguish between CPU & GPU when working out how much work was on hand.
The GPU still has enough work to keep it busy, so it's not actually requesting any new work even though the CPU is out.
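
To make that guess concrete, here is a toy Python model of the two behaviours. It is not BOINC's actual work-fetch code: the function names, the pooled-queue logic and the numbers in the usage note are all assumptions made up to illustrate the idea.

```python
def combined_request(cpu_queue_s, gpu_queue_s, target_s):
    """Guessed older behaviour: one pooled figure for work on hand.

    All arguments are seconds of work. Returns seconds of work to request.
    """
    on_hand = cpu_queue_s + gpu_queue_s
    return max(0, target_s - on_hand)


def per_resource_request(cpu_queue_s, gpu_queue_s, target_s):
    """Hypothetical alternative: each resource is topped up separately."""
    return {
        "cpu": max(0, target_s - cpu_queue_s),
        "gpu": max(0, target_s - gpu_queue_s),
    }
```

With an empty CPU queue, a full GPU queue of, say, 600000 seconds and a hypothetical cache target of 259200 seconds (3 days), `combined_request(0, 600000, 259200)` asks for nothing, while `per_resource_request` would still ask for 259200 seconds of CPU work. That would match a log line saying "Requesting 0 seconds of work".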

With a bit of luck a GPU cruncher will be along shortly & will have more of an idea of what's going on.
It's way past my bed time. Time to crash.
Grant
Darwin NT
ID: 919672
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919673 - Posted: 20 Jul 2009, 11:29:48 UTC - in response to Message 919672.  

6.4.5, the stable Linux version (6.6.* doesn't seem to work with CUDA; it assigns two units to one GPU).
ID: 919673
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 919675 - Posted: 20 Jul 2009, 11:38:14 UTC - in response to Message 919671.  

Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.

If you would post the relevant messages from your log it might help to figure out what's going on.
My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.


Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks

Hope that's helpful.

You will not be allocated CPU work as long as you have to run BOINC v6.4.5

You have either to find out why BOINC v6.6.36 for Linux x64 allocates all CUDA work to the first of your two graphics cards, or manufacture CPU work yourself out of the CUDA work you've been allocated - the process we've christened "rebranding".

To manufacture CPU work (this is for v6.4.5/7 only - users of v6.6.xx don't follow this advice):

Shut down BOINC, and take whatever backup precautions you feel necessary. Find client_state.xml, and open it with a plain text editor (whatever your equivalent of NotePad is): don't use a standard XML editor.

For each task in your cache, you will see a <workunit> section and a <result> section. Both of them have the line

<version_num>608</version_num>

in them. You need to change that to

<version_num>603</version_num>

in matching workunit/result blocks. Of course, you could do that with a global search/replace, but you already have enough to keep 2 CUDA cards busy for 7.5 days each. Dumping all of that onto the CPU, even an i7, is probably going to overwhelm it: so you need to change them individually. After you've done the first three, you'll be so bored and cross-eyed that it'll seem easier to write a script to do it (been there, done that). Best of luck.
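
A minimal sketch of what such a script might look like, in Python rather than anything actually posted in this thread. It treats client_state.xml as plain text, as advised above, and rebrands only the first `limit` tasks. Two things are assumptions on my part: that each `<workunit>` block carries its own `<name>` tag, and that each `<result>` block points back at its workunit through a `<wu_name>` tag. Check both against your own file before trusting it. The function name `rebrand` is invented for the example.

```python
import re

OLD = "<version_num>608</version_num>"  # CUDA application version
NEW = "<version_num>603</version_num>"  # CPU application version

WU_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
RESULT_RE = re.compile(r"<result>.*?</result>", re.DOTALL)
NAME_RE = re.compile(r"<name>(.*?)</name>")
WU_NAME_RE = re.compile(r"<wu_name>(.*?)</wu_name>")  # assumed tag name


def rebrand(state_text, limit):
    """Switch up to `limit` tasks from 608 to 603.

    Returns the edited text and the set of workunit names that were changed.
    """
    chosen = set()

    def fix_workunit(match):
        block = match.group(0)
        name = NAME_RE.search(block)
        if name and OLD in block and len(chosen) < limit:
            chosen.add(name.group(1))
            return block.replace(OLD, NEW)
        return block

    def fix_result(match):
        block = match.group(0)
        wu_name = WU_NAME_RE.search(block)
        # Only touch results whose workunit was rebranded above,
        # so the workunit/result pair always stays in step.
        if wu_name and wu_name.group(1) in chosen:
            return block.replace(OLD, NEW)
        return block

    text = WU_RE.sub(fix_workunit, state_text)
    text = RESULT_RE.sub(fix_result, text)
    return text, chosen


# Usage sketch (BOINC shut down and the file backed up first):
#   with open("client_state.xml") as f:
#       text = f.read()
#   new_text, moved = rebrand(text, limit=10)
#   with open("client_state.xml", "w") as f:
#       f.write(new_text)
```

The limit is the point of the exercise: it moves a handful of tasks to the CPU instead of the whole CUDA cache.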
ID: 919675
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 919677 - Posted: 20 Jul 2009, 12:14:40 UTC

I am using 6.6.20 because I do not have a CUDA device. Yesterday I got no new work, and no Astropulse either, because I did not have the correct version in my XML file. I have now updated it, so hopefully I will get some work either way. My new XML file still processed the MBs, so I know that works; now I will just sit and wait and see.
ID: 919677
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919683 - Posted: 20 Jul 2009, 13:19:41 UTC - in response to Message 919675.  

Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.

If you would post the relevant messages from your log it might help to figure out what's going on.
My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.


Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks

Hope that's helpful.

You will not be allocated CPU work as long as you have to run BOINC v6.4.5

You have either to find out why BOINC v6.6.36 for Linux x64 allocates all CUDA work to the first of your two graphics cards, or manufacture CPU work yourself out of the CUDA work you've been allocated - the process we've christened "rebranding".

To manufacture CPU work (this is for v6.4.5/7 only - users of v6.6.xx don't follow this advice):

Shut down BOINC, and take whatever backup precautions you feel necessary. Find client_state.xml, and open it with a plain text editor (whatever your equivalent of NotePad is): don't use a standard XML editor.

For each task in your cache, you will see a <workunit> section and a <result> section. Both of them have the line

<version_num>608</version_num>

in them. You need to change that to

<version_num>603</version_num>

in matching workunit/result blocks. Of course, you could do that with a global search/replace, but you already have enough to keep 2 CUDA cards busy for 7.5 days each. Dumping all of that onto the CPU, even an i7, is probably going to overwhelm it: so you need to change them individually. After you've done the first three, you'll be so bored and cross-eyed that it'll seem easier to write a script to do it (been there, done that). Best of luck.


You da man! After screwing up once and losing 25 WU I got it right (for some reason my AK_V8.. file poofed). Now the question, which I assume is no, is there any way to tell (or make a decent guess) if a WU is a VLAR from that data?

The info you gave above should be enough for me to write a Perl script for 6.4.5 Linux to do it all quickly.

Thanks.
ID: 919683
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 919691 - Posted: 20 Jul 2009, 13:42:02 UTC - in response to Message 919683.  

... Now the question, which I assume is no, is there any way to tell (or make a decent guess) if a WU is a VLAR from that data?

The info you gave above should be enough for me to write a Perl script for 6.4.5 Linux to do it all quickly.

Thanks.

There are two answers to that.

1) In client_state.xml, in the <workunit> section for each task, there's a line

<rsc_fpops_est> ... </rsc_fpops_est>

(just below the <version_num> line in the one I checked)

For the vast majority of VLARs, the value of <rsc_fpops_est> is exactly 80360000000000.000000

For VHAR (the 7-day deadline ones), <rsc_fpops_est> is exactly 23780000000000.000000

2) For each workunit, you can find the matching datafile in the SETI project folder, open it (again as plain text), and read the <true_angle_range> tag directly. That's hard work, and probably not necessary for your Mark One script. But Raistmer did it (in Perl) in CPU <-> GPU rebranding, so you could use his code as a starting-point. Just ignore anything relating to <plan_class> tags - that's for BOINC 6.6 users.
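
Answer 1 is easy to sketch. The Python below is a rough illustration, not Raistmer's code: it reads client_state.xml as plain text and labels each workunit by its `<rsc_fpops_est>` value, using the two exact figures quoted above. The function name `classify` and the label strings are invented for the example, and it assumes each `<workunit>` block has a `<name>` tag. Since the VLAR figure only holds for "the vast majority", treat the labels as a guess.

```python
import re

VLAR_FPOPS = 80360000000000.0  # value quoted above for most VLARs
VHAR_FPOPS = 23780000000000.0  # value quoted above for VHARs

WU_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
NAME_RE = re.compile(r"<name>(.*?)</name>")
FPOPS_RE = re.compile(r"<rsc_fpops_est>\s*([0-9.eE+-]+)\s*</rsc_fpops_est>")


def classify(state_text):
    """Map each workunit name to 'likely VLAR', 'VHAR' or 'other'."""
    labels = {}
    for block in WU_RE.findall(state_text):
        name = NAME_RE.search(block)
        fpops = FPOPS_RE.search(block)
        if not name or not fpops:
            continue  # block does not look the way this sketch assumes
        value = float(fpops.group(1))
        if value == VLAR_FPOPS:
            labels[name.group(1)] = "likely VLAR"
        elif value == VHAR_FPOPS:
            labels[name.group(1)] = "VHAR"
        else:
            labels[name.group(1)] = "other"
    return labels
```

Comparing floats for equality is safe here only because both figures are whole numbers that a double represents exactly. Reading `<true_angle_range>` from the data file, as in answer 2, remains the reliable check.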
ID: 919691
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919692 - Posted: 20 Jul 2009, 13:45:42 UTC - in response to Message 919691.  

Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?

I'll probably have some time this weekend to play around with scripting something. If I get it working I'll be sure to share it, as I know there are others using 6.4.5 on Linux with CUDA.
ID: 919692
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919693 - Posted: 20 Jul 2009, 13:47:41 UTC - in response to Message 919692.  

Just did a quick search for 80360000000000.000000... WOW that's a lot of them.
ID: 919693
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 919694 - Posted: 20 Jul 2009, 13:49:54 UTC - in response to Message 919683.  

... (for some reason my AK_V8.. file poofed) ...

BOINC will tend to do that every time you run entirely out of 603 CPU tasks. It thinks you've finished with the 'outdated' application, and garbage-collects the wasted space by deleting it.

One way to preserve it is to add a redundant <file_ref> for AK_V8 into the 608 <app_version> section of app_info.xml - not as the main_program, obviously: the other is to ensure that you never let your CPU cache run dry.
ID: 919694
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 919697 - Posted: 20 Jul 2009, 13:58:30 UTC - in response to Message 919692.  

Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?

No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.

F.
ID: 919697
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 919698 - Posted: 20 Jul 2009, 14:02:32 UTC - in response to Message 919697.  

Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?

No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.

F.

Hi Fred,

Since he's using v6.4.5 without a <plan_class>, his DCF is going to be bouncing up and down like a ballet dancer on a trampoline anyway. Not much more to lose.....
ID: 919698
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 919703 - Posted: 20 Jul 2009, 14:13:43 UTC - in response to Message 919698.  

Hi Fred,

Since he's using v6.4.5 without a <plan_class>, his DCF is going to be bouncing up and down like a ballet dancer on a trampoline anyway. Not much more to lose.....

True.

F.
ID: 919703
Joseph Monk
Joined: 31 Mar 07
Posts: 150
Credit: 1,181,197
RAC: 0
Korea, South
Message 919705 - Posted: 20 Jul 2009, 14:34:06 UTC - in response to Message 919697.  

Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?

No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.

F.


Even better! I was able to make a couple of quick modifications to the script mentioned and it worked... then I tried the file ref thing and it didn't work: it deleted all of my CUDA WU only minutes after I deleted my backups...
ID: 919705
Marius
Volunteer tester
Joined: 11 Mar 00
Posts: 12
Credit: 16,655,085
RAC: 0
Netherlands
Message 919723 - Posted: 20 Jul 2009, 15:59:09 UTC - in response to Message 919697.  

Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?

No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.

F.


AFAIK, WUs that have already been started will continue with the same application after rescheduling (because the application information is already in the slot subdirectory). Or is the application "shortcut" there just for compatibility reasons? That's at least the reason why I leave the running WUs alone in the reschedule tool.

Greetings,
Marius
ID: 919723
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919733 - Posted: 20 Jul 2009, 16:24:05 UTC - in response to Message 919615.  

I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind that if all active users transferred only 1 dollar or euro to SETI, the server issues, including bandwidth, would be gone!


I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay!

Such arrogance from a regular poster prevents me from donating.

I would hope that everyone understands that everything here is optional.

I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.

... but there is a huge difference between should and must.

I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.

I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.

But it's still optional, no matter what anyone says.
ID: 919733
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 919746 - Posted: 20 Jul 2009, 16:42:09 UTC - in response to Message 919733.  

I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind that if all active users transferred only 1 dollar or euro to SETI, the server issues, including bandwidth, would be gone!


I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay!

Such arrogance from a regular poster prevents me from donating.

I would hope that everyone understands that everything here is optional.

I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.

... but there is a huge difference between should and must.

I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.

I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.

But it's still optional, no matter what anyone says.


I would also hope that people would be more rational and not withhold funds from SETI simply because of personality conflicts on these boards, or because they read something they disliked. Or use them as excuses not to donate when they really have no intention otherwise.
ID: 919746
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 919749 - Posted: 20 Jul 2009, 16:45:26 UTC - in response to Message 919746.  

I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind that if all active users transferred only 1 dollar or euro to SETI, the server issues, including bandwidth, would be gone!


I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay!

Such arrogance from a regular poster prevents me from donating.

I would hope that everyone understands that everything here is optional.

I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.

... but there is a huge difference between should and must.

I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.

I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.

But it's still optional, no matter what anyone says.


I would also hope that people would be more rational and not withhold funds from SETI simply because of personality conflicts on these boards, or because they read something they disliked. Or use them as excuses not to donate when they really have no intention otherwise.


I really do want to donate, but I can't for a few reasons:

1. Out of money :(

2. I don't make the cash decisions in my house

3. My money would just go to a salary; I really can't force mine to go into a pool for the gigabit link.

Now if 3 can be fixed, then YES, all of us donating a dollar might work, because if we don't raise enough the first time, we all toss in two more, or something. IDK, just thinking aloud.
ID: 919749


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.