Panic Mode On (21) Server problems

[B^S] madmac Volunteer tester Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0	Message 919668 - Posted: 20 Jul 2009, 10:53:56 UTC - in response to Message 919653.
Well, almost the same here: I'm down to my last WU, with four waiting to be uploaded and a few ready to report. I too would like more work.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304	Message 919670 - Posted: 20 Jul 2009, 11:06:33 UTC - in response to Message 919655.
> Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.
If you would post the relevant messages from your log it might help to figure out what's going on. My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.
Grant
Darwin NT

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919671 - Posted: 20 Jul 2009, 11:11:28 UTC - in response to Message 919670.
>> Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.
> If you would post the relevant messages from your log it might help to figure out what's going on. My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.
Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks
Hope that's helpful.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304	Message 919672 - Posted: 20 Jul 2009, 11:25:33 UTC - in response to Message 919671.
What version of the BOINC manager are you using?
Wild guess: when older versions (I think 6.6.28 & earlier) requested work, they didn't distinguish between CPU & GPU when working out how much work was on hand. The GPU still has enough work to keep it busy, so the client isn't actually requesting any new work even though the CPU is out.
With a bit of luck a GPU cruncher will be along shortly & will have more of an idea of what's going on. It's way past my bedtime. Time to crash.
Grant
Darwin NT

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919673 - Posted: 20 Jul 2009, 11:29:48 UTC - in response to Message 919672.
6.4.5, the stable Linux version (6.6.* doesn't seem to work with CUDA; it assigns two units to one GPU).

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 919675 - Posted: 20 Jul 2009, 11:38:14 UTC - in response to Message 919671.
>>> Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.
>> If you would post the relevant messages from your log it might help to figure out what's going on. My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.
> Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
> Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks
> Hope that's helpful.
You will not be allocated CPU work as long as you have to run BOINC v6.4.5.
You have either to find out why BOINC v6.6.36 for Linux x64 allocates all CUDA work to the first of your two graphics cards, or manufacture CPU work yourself out of the CUDA work you've been allocated - the process we've christened "rebranding".
To manufacture CPU work (this is for v6.4.5/7 only - users of v6.6.xx don't follow this advice):
Shut down BOINC, and take whatever backup precautions you feel necessary.
Find client_state.xml, and open it with a plain text editor (whatever your equivalent of NotePad is): don't use a standard XML editor.
For each task in your cache, you will see a <workunit> section and a <result> section. Both of them have the line <version_num>608</version_num> in them.
You need to change that to <version_num>603</version_num> in matching workunit/result blocks.
Of course, you could do that with a global search/replace, but you already have enough to keep 2 CUDA cards busy for 7.5 days each. Dumping all of that onto the CPU, even an i7, is probably going to overwhelm it: so you need to change them individually. After you've done the first three, you'll be so bored and cross-eyed that it'll seem easier to write a script to do it (been there, done that).
Best of luck.
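
The manual edit described above (changing <version_num>608</version_num> to <version_num>603</version_num> in a task's <workunit> block and in its matching <result> block, a few tasks at a time) can be sketched in Python. This is only a sketch under assumptions, not the script anyone in the thread used: it assumes each <result> block names its workunit in a <wu_name> tag, the helper names (tag, rebrand), the limit parameter and the sample data are all made up for illustration. It treats the file as plain text, as advised, and should only be tried on a backed-up copy with BOINC shut down.

```python
import re

WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
RESULT_RE = re.compile(r"<result>.*?</result>", re.DOTALL)
OLD = "<version_num>608</version_num>"  # CUDA app version, per the post
NEW = "<version_num>603</version_num>"  # CPU app version, per the post


def tag(block, name):
    """Return the text of the first <name>...</name> in block, or None."""
    m = re.search(r"<%s>(.*?)</%s>" % (name, name), block, re.DOTALL)
    return m.group(1).strip() if m else None


def rebrand(text, limit):
    """Switch at most `limit` tasks from 608 to 603.

    Returns (new_text, names_of_rebranded_workunits). Only blocks that
    contain the 608 line are touched; everything else is left as-is.
    """
    chosen = []

    def fix_workunit(m):
        block = m.group(0)
        if len(chosen) < limit and OLD in block:
            chosen.append(tag(block, "name"))
            return block.replace(OLD, NEW, 1)
        return block

    text = WORKUNIT_RE.sub(fix_workunit, text)
    picked = set(chosen)

    def fix_result(m):
        block = m.group(0)
        # ASSUMPTION: the result points at its workunit via <wu_name>.
        if tag(block, "wu_name") in picked and OLD in block:
            return block.replace(OLD, NEW, 1)
        return block

    return RESULT_RE.sub(fix_result, text), chosen


# Invented sample standing in for a fragment of client_state.xml.
SAMPLE = """
<workunit>
    <name>wu_a</name>
    <version_num>608</version_num>
</workunit>
<workunit>
    <name>wu_b</name>
    <version_num>608</version_num>
</workunit>
<result>
    <name>wu_a_0</name>
    <wu_name>wu_a</wu_name>
    <version_num>608</version_num>
</result>
<result>
    <name>wu_b_1</name>
    <wu_name>wu_b</wu_name>
    <version_num>608</version_num>
</result>
"""

new_text, names = rebrand(SAMPLE, limit=1)
print(names)
```

The limit argument is there because of the warning above about dumping the whole CUDA cache onto the CPU: it caps how many tasks move per run instead of doing a global search/replace.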

[B^S] madmac Volunteer tester Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0	Message 919677 - Posted: 20 Jul 2009, 12:14:40 UTC
I am using 6.6.20 because I do not have a CUDA device. Yesterday I got no new work and also no Astropulse, because I did not have the correct version in my xml file. So I got it updated; hopefully I will get some work either way. My new xml file still processed the MBs, so I know that works. Now I will just sit and wait and see.

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919683 - Posted: 20 Jul 2009, 13:19:41 UTC - in response to Message 919675.
>>>> Make that uploaded everything, still no new WU... my GPU still has some work, but CPU is empty... would really like some more work.
>>> If you would post the relevant messages from your log it might help to figure out what's going on. My results have uploaded bit by bit over the last 12 hours or so since the upload server came back online. And after the last couple uploaded it started downloading more work, a few at a time.
>> Mon 20 Jul 2009 08:10:17 PM KST|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 18 completed tasks
>> Mon 20 Jul 2009 08:10:22 PM KST|SETI@home|Scheduler request completed: got 0 new tasks
>> Hope that's helpful.
> You will not be allocated CPU work as long as you have to run BOINC v6.4.5.
> You have either to find out why BOINC v6.6.36 for Linux x64 allocates all CUDA work to the first of your two graphics cards, or manufacture CPU work yourself out of the CUDA work you've been allocated - the process we've christened "rebranding".
> To manufacture CPU work (this is for v6.4.5/7 only - users of v6.6.xx don't follow this advice):
> Shut down BOINC, and take whatever backup precautions you feel necessary.
> Find client_state.xml, and open it with a plain text editor (whatever your equivalent of NotePad is): don't use a standard XML editor.
> For each task in your cache, you will see a <workunit> section and a <result> section. Both of them have the line <version_num>608</version_num> in them.
> You need to change that to <version_num>603</version_num> in matching workunit/result blocks.
> Of course, you could do that with a global search/replace, but you already have enough to keep 2 CUDA cards busy for 7.5 days each. Dumping all of that onto the CPU, even an i7, is probably going to overwhelm it: so you need to change them individually. After you've done the first three, you'll be so bored and cross-eyed that it'll seem easier to write a script to do it (been there, done that).
> Best of luck.
You da man! After screwing up once and losing 25 WU I got it right (for some reason my AK_V8.. file poofed).
Now the question, which I assume is no: is there any way to tell (or make a decent guess) if a WU is a VLAR from that data? The info you gave above should be enough for me to write a Perl script for 6.4.5 Linux to do it all quickly.
Thanks.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 919691 - Posted: 20 Jul 2009, 13:42:02 UTC - in response to Message 919683.
> ... Now the question, which I assume is no, is there anyway to tell (or make a decent guess) if a WU is a VLAR from that data? The info you gave above should be enough for me to write a perl script for 6.4.5 Linux to do it all quickly. Thanks.
There are two answers to that.
1) In client_state.xml, in the <workunit> section for each task, there's a line <rsc_fpops_est> ... </rsc_fpops_est> (just below the <version_num> line in the one I checked).
For the vast majority of VLARs, the value of <rsc_fpops_est> is exactly 80360000000000.000000
For VHAR (the 7-day deadline ones), <rsc_fpops_est> is exactly 23780000000000.000000
2) For each workunit, you can find the matching datafile in the SETI project folder, open it (again as plain text), and read the <true_angle_range> tag directly. That's hard work, and probably not necessary for your Mark One script. But Raistmer did it (in Perl) in CPU <-> GPU rebranding, so you could use his code as a starting-point. Just ignore anything relating to <plan_class> tags - that's for BOINC 6.6 users.
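
The first answer lends itself to a quick check. Below is a minimal Python sketch, assuming the two <rsc_fpops_est> values quoted above and a <name> tag inside each <workunit> block. The function name classify_workunits and the sample data (including the third estimate value) are hypothetical. Since the VLAR value is said to hold only for "the vast majority" of VLARs, treat the result as a decent guess, not a guarantee.

```python
import re

# Values quoted in the post above.
VLAR_FPOPS = 80360000000000.0
VHAR_FPOPS = 23780000000000.0

WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)


def classify_workunits(text):
    """Map workunit name -> "VLAR", "VHAR" or "other" from <rsc_fpops_est>."""
    kinds = {}
    for block in WORKUNIT_RE.findall(text):
        name = re.search(r"<name>(.*?)</name>", block)
        est = re.search(r"<rsc_fpops_est>(.*?)</rsc_fpops_est>", block)
        if not name or not est:
            continue  # skip blocks that lack either tag
        value = float(est.group(1))
        if value == VLAR_FPOPS:
            kind = "VLAR"
        elif value == VHAR_FPOPS:
            kind = "VHAR"
        else:
            kind = "other"
        kinds[name.group(1).strip()] = kind
    return kinds


# Invented sample standing in for a fragment of client_state.xml.
SAMPLE = """
<workunit>
    <name>wu_vlar</name>
    <version_num>608</version_num>
    <rsc_fpops_est>80360000000000.000000</rsc_fpops_est>
</workunit>
<workunit>
    <name>wu_vhar</name>
    <version_num>608</version_num>
    <rsc_fpops_est>23780000000000.000000</rsc_fpops_est>
</workunit>
<workunit>
    <name>wu_mid</name>
    <version_num>608</version_num>
    <rsc_fpops_est>27930000000000.000000</rsc_fpops_est>
</workunit>
"""

kinds = classify_workunits(SAMPLE)
print(kinds)
```

Comparing floats for exact equality is safe here only because both reference values are whole numbers well within double precision; reading <true_angle_range> from the data file, the second answer, would be the more direct test.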

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919692 - Posted: 20 Jul 2009, 13:45:42 UTC - in response to Message 919691.
Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?
I'll probably have some time this weekend to play around with scripting something; if I get it working I'll be sure to share it, as I know there are others using 6.4.5 on Linux with CUDA.

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919693 - Posted: 20 Jul 2009, 13:47:41 UTC - in response to Message 919692.
Just did a quick search for 80360000000000.000000... WOW that's a lot of them.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 919694 - Posted: 20 Jul 2009, 13:49:54 UTC - in response to Message 919683.
> ... (for some reason my AK_V8.. file poofed) ...
BOINC will tend to do that every time you run entirely out of 603 CPU tasks. It thinks you've finished with the 'outdated' application, and garbage-collects the wasted space by deleting it.
One way to preserve it is to add a redundant <file_ref> for AK_V8 into the 608 <app_version> section of app_info.xml - not as the main_program, obviously: the other is to ensure that you never let your CPU cache run dry.

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 919697 - Posted: 20 Jul 2009, 13:58:30 UTC - in response to Message 919692.
> Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?
No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is, and so sends the DCF haywire.
F.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 919698 - Posted: 20 Jul 2009, 14:02:32 UTC - in response to Message 919697.
>> Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?
> No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.
> F.
Hi Fred,
Since he's using v6.4.5 without a <plan_class>, his DCF is going to be bouncing up and down like a ballet dancer on a trampoline anyway. Not much more to lose.....

Fred W Volunteer tester Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 919703 - Posted: 20 Jul 2009, 14:13:43 UTC - in response to Message 919698.
> Hi Fred,
> Since he's using v6.4.5 without a <plan_class>, his DCF is going to be bouncing up and down like a ballet dancer on a trampoline anyway. Not much more to lose.....
True.
F.

Joseph Monk Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0	Message 919705 - Posted: 20 Jul 2009, 14:34:06 UTC - in response to Message 919697.
>> Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?
> No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.
> F.
Even better! I was able to make a couple of quick modifications to the script mentioned and it worked... then I tried the file ref thing and it didn't work: it deleted all of my CUDA WU, only minutes after I deleted my backups...

Marius Volunteer tester Joined: 11 Mar 00 Posts: 12 Credit: 16,655,085 RAC: 0	Message 919723 - Posted: 20 Jul 2009, 15:59:09 UTC - in response to Message 919697.
>> Sounds good. I'm assuming if you try to rebrand something that has already started you'll end up with a computation error, right?
> No. If you re-brand something that has already started, it will continue from where it last checkpointed but on the CPU (mine do at any rate). However, it does tend to screw up the estimate of how fast the CPU is so sends the DCF haywire.
> F.
AFAIK, WUs that have already been started will continue with the same application after rescheduling (because the application information is already in the slot subdirectory). Or is the application "shortcut" there just for compatibility reasons? That's at least the reason why I leave the running WUs alone in the reschedule tool.
Greetings,
Marius

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 919733 - Posted: 20 Jul 2009, 16:24:05 UTC - in response to Message 919615.
>> I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind if all active users transfer only 1 dollar or euro to Seti, server issues incl. bandwidth are gone!
> I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay! Such arrogance from a regular poster prevents me from donating.
I would hope that everyone understands that everything here is optional.
I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.
... but there is a huge difference between should and must.
I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.
I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.
But it's still optional, no matter what anyone says.

OzzFan Volunteer tester Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 919746 - Posted: 20 Jul 2009, 16:42:09 UTC - in response to Message 919733.
>>> I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind if all active users transfer only 1 dollar or euro to Seti, server issues incl. bandwidth are gone!
>> I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay! Such arrogance from a regular poster prevents me from donating.
> I would hope that everyone understands that everything here is optional.
> I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.
> ... but there is a huge difference between should and must.
> I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.
> I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.
> But it's still optional, no matter what anyone says.
I would also hope that people would be more rational and not withhold funds from SETI simply because of personality conflicts on these boards, or because they read something they disliked. Or use them as excuses not to donate when they really have no intention otherwise.

Vistro Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0	Message 919749 - Posted: 20 Jul 2009, 16:45:26 UTC - in response to Message 919746.
>>>> I know that a lot of users just want to donate their CPUs and nothing extra. Just keep in mind if all active users transfer only 1 dollar or euro to Seti, server issues incl. bandwidth are gone!
>>> I do not donate because on one of these forums I was told to pay up or stop, on the basis that if I could afford to crunch I could afford to pay! Such arrogance from a regular poster prevents me from donating.
>> I would hope that everyone understands that everything here is optional.
>> I've often said that everyone who runs a cruncher just for SETI should donate -- to help the project feed work.
>> ... but there is a huge difference between should and must.
>> I would also hope that people can consider the source. Forum members are volunteers, and do not officially represent the project.
>> I'm told that there are roughly 8,800 people who have posted to the forums at least once. If half of those kicked in $20, that'd likely fund the gigabit upgrade, or help a lot with better hardware, or both.
>> But it's still optional, no matter what anyone says.
> I would also hope that people would be more rational and not withhold funds from SETI simply because of personality conflicts on these boards, or because they read something they disliked. Or use them as excuses not to donate when they really have no intention otherwise.
I really do want to donate, but I can't for a few reasons:
1. Out of money :(
2. I don't make the cash decisions in my house
3. My money would just go to a salary; I really can't force mine to go into a pool for the gigabit link.
Now if 3 can be fixed, then YES, all of us donating a dollar might work, because if we don't raise enough the first time, we all toss in two more, or something. IDK, just thinking aloud.


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.