Panic Mode On (90) Server Problems?

Message boards : Number crunching : Panic Mode On (90) Server Problems?
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1585365 - Posted: 11 Oct 2014, 18:07:07 UTC - in response to Message 1585348.  

Richard stated that the new installer may be released in the next 48 hours or so.


Indeed good news, I will work on getting my old workstation back on line and happily crunch S@H 7 until the release. I really didn't want to edit code, it gives me a headache.


It is not editing code at all, you drag and drop the new app, then double click on the aimerge.cmd.

Then start BOINC back up.

ID: 1585365 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1585489 - Posted: 11 Oct 2014, 22:04:56 UTC - in response to Message 1585365.  
Last modified: 11 Oct 2014, 22:07:19 UTC

Looks like the AP validator backlog has finally cleared, so the AP crunchers should all have received their RAC bonus by now.
AP results: 44,000 in progress, 77,000 awaiting validation (down from a peak of 327,000).
Grant
Darwin NT
ID: 1585489 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1585496 - Posted: 11 Oct 2014, 22:12:24 UTC - in response to Message 1585489.  

Looks like the AP validator backlog has finally cleared, so the AP crunchers should all have received their RAC bonus by now.
AP results: 44,000 in progress, 77,000 awaiting validation (down from a peak of 327,000).

Yep, all the ones that I had waiting are done and today have received a load of re-sends. I've got _2s, a couple of _4s and a _5 :)

Member of the People Encouraging Niceness In Society club.

ID: 1585496 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1585622 - Posted: 12 Oct 2014, 4:28:01 UTC - in response to Message 1585365.  

It is not editing code at all, you drag and drop the new app, then double click on the aimerge.cmd.

Then start BOINC back up.


Thanks Arkayn, I didn't think it could be that simple. Followed the instructions and it worked the first time. Now getting AP 7 work.
ID: 1585622 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1585714 - Posted: 12 Oct 2014, 9:25:20 UTC - in response to Message 1585622.  

It is not editing code at all, you drag and drop the new app, then double click on the aimerge.cmd.

Then start BOINC back up.


Thanks Arkayn, I didn't think it could be that simple. Followed the instructions and it worked the first time. Now getting AP 7 work.

Be warned that you will lose the tasks in progress if/when you run the next Lunatics installer:
http://setiathome.berkeley.edu/forum_thread.php?id=75810&postid=1585689#1585689

The .aistub uses <version_num>701</version_num> and the Lunatics installer will use what is appropriate (704, 705, ...) from the stock list:
http://setiathome.berkeley.edu/apps.php
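
To see which version number a local app_info.xml currently declares, something like the following rough sketch could be used. Only `<version_num>701</version_num>` comes from the post above; the surrounding XML layout, the app name `astropulse_v7`, and the installer value 704 used in the comparison are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical app_info.xml fragment. Only the 701 value is taken from the
# thread; the element layout and app name are assumed for illustration.
SAMPLE = """
<app_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>701</version_num>
  </app_version>
</app_info>
"""

def app_versions(xml_text):
    """Return (app_name, version_num) pairs found in an app_info.xml document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for av in root.iter("app_version"):
        name = av.findtext("app_name", default="?").strip()
        num = int(av.findtext("version_num", default="0"))
        pairs.append((name, num))
    return pairs

def mismatched(pairs, installer_versions):
    """Return the local entries whose version_num differs from the installer's."""
    return [(n, v) for n, v in pairs if installer_versions.get(n, v) != v]

local = app_versions(SAMPLE)
print(local)
# 704 is one of the example values mentioned above, not a confirmed choice.
print(mismatched(local, {"astropulse_v7": 704}))
```

Any entry reported by `mismatched` is one where tasks fetched under the old version number might not carry over after the installer rewrites the file.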
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1585714 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1585802 - Posted: 12 Oct 2014, 15:12:06 UTC

Hey, I've only just noticed that we have 2014 data on the server status page, and we're finally splitting the last of the 2013 tapes loaded back in April. More work for me at the end of the month :)
ID: 1585802 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1585818 - Posted: 12 Oct 2014, 16:10:04 UTC

I just checked the machine actually processing AP 7 and found repeated instances of "exited with zero status but no 'finished' file". Following is my initialization log and the first few "exits". I have suspended work and would appreciate any help available.

10/12/2014 6:52:54 AM | | cc_config.xml not found - using defaults
10/12/2014 6:52:54 AM | | Starting BOINC client version 7.2.42 for windows_x86_64
10/12/2014 6:52:54 AM | | log flags: file_xfer, sched_ops, task
10/12/2014 6:52:54 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
10/12/2014 6:52:54 AM | | Data directory: D:\boinc
10/12/2014 6:52:54 AM | | Running under account Don
10/12/2014 6:52:54 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version 344.11, CUDA version 6.5, compute capability 5.0, 2048MB, 1915MB available, 2208 GFLOPS peak)
10/12/2014 6:52:54 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version 344.11, device version OpenCL 1.1 CUDA, 2048MB, 1915MB available, 2208 GFLOPS peak)
10/12/2014 6:52:54 AM | | OpenCL CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10878, device version OpenCL 1.2 (Build 76413))
10/12/2014 6:52:54 AM | SETI@home | Found app_info.xml; using anonymous platform
10/12/2014 6:52:54 AM | | Host name: CAPNCRUNCH
10/12/2014 6:52:54 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3]
10/12/2014 6:52:54 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes syscall nx lm vmx tm2 pbe
10/12/2014 6:52:54 AM | | OS: Microsoft Windows 8.1: Professional x64 Edition, (06.03.9600.00)
10/12/2014 6:52:54 AM | | Memory: 15.84 GB physical, 18.22 GB virtual
10/12/2014 6:52:54 AM | | Disk: 1.82 TB total, 1.38 TB free
10/12/2014 6:52:54 AM | | Local time is UTC -6 hours
10/12/2014 6:52:54 AM | | VirtualBox version: 4.2.16
10/12/2014 6:52:54 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7246450; resource share 100
10/12/2014 6:52:54 AM | SETI@home | General prefs: from SETI@home (last modified 18-Sep-2014 09:00:07)
10/12/2014 6:52:54 AM | SETI@home | Computer location: home
10/12/2014 6:52:54 AM | | General prefs: using separate prefs for home
10/12/2014 6:52:54 AM | | Reading preferences override file
10/12/2014 6:52:54 AM | | Preferences:
10/12/2014 6:52:54 AM | | max memory usage when active: 8110.59MB
10/12/2014 6:52:54 AM | | max memory usage when idle: 14599.06MB
10/12/2014 6:52:54 AM | | max disk usage: 500.00GB
10/12/2014 6:52:54 AM | | max download rate: 2000005 bytes/sec
10/12/2014 6:52:54 AM | | max upload rate: 499999 bytes/sec
10/12/2014 6:52:54 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
10/12/2014 6:52:54 AM | | Not using a proxy
10/12/2014 6:52:58 AM | SETI@home | Sending scheduler request: To fetch work.
10/12/2014 6:52:58 AM | SETI@home | Requesting new tasks for NVIDIA
10/12/2014 6:52:59 AM | SETI@home | Scheduler request completed: got 0 new tasks
10/12/2014 6:52:59 AM | SETI@home | No tasks sent
10/12/2014 6:52:59 AM | SETI@home | No tasks are available for AstroPulse v6
10/12/2014 6:52:59 AM | SETI@home | No tasks are available for AstroPulse v7
10/12/2014 6:56:00 AM | SETI@home | Task ap_19mr14aa_B3_P0_00170_20141011_16647.wu_3 exited with zero status but no 'finished' file
10/12/2014 6:56:00 AM | SETI@home | If this happens repeatedly you may need to reset the project.
10/12/2014 6:56:00 AM | SETI@home | Task ap_27fe14aa_B3_P1_00235_20141011_24075.wu_0 exited with zero status but no 'finished' file
10/12/2014 6:56:00 AM | SETI@home | If this happens repeatedly you may need to reset the project.
10/12/2014 6:56:00 AM | SETI@home | Task ap_27fe14aa_B3_P1_00241_20141011_24075.wu_0 exited with zero status but no 'finished' file
10/12/2014 6:56:00 AM | SETI@home | If this happens repeatedly you may need to reset the project.
10/12/2014 6:56:00 AM | SETI@home | Task ap_28au10ac_B3_P0_00350_20141011_20908.wu_0 exited with zero status but no 'finished' file
10/12/2014 6:56:00 AM | SETI@home | If this happens repeatedly you may need to reset the project.
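
For a longer log, it may help to count how often each task hits this message. The sketch below assumes the log lines have the same layout as the ones pasted above; the function name and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches the client message quoted in the log above and captures the task name.
PATTERN = re.compile(
    r"\| SETI@home \| Task (?P<task>\S+) exited with zero status "
    r"but no 'finished' file"
)

def count_restarts(lines):
    """Count how often each task appears in a 'no finished file' message."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts

# Three lines copied from the log above.
sample = [
    "10/12/2014 6:56:00 AM | SETI@home | Task ap_19mr14aa_B3_P0_00170_20141011_16647.wu_3 exited with zero status but no 'finished' file",
    "10/12/2014 6:56:00 AM | SETI@home | If this happens repeatedly you may need to reset the project.",
    "10/12/2014 6:56:00 AM | SETI@home | Task ap_27fe14aa_B3_P1_00235_20141011_24075.wu_0 exited with zero status but no 'finished' file",
]

for task, n in count_restarts(sample).most_common():
    print(n, task)
```

A task that shows up many times is restarting repeatedly rather than failing once.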
ID: 1585818 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1585830 - Posted: 12 Oct 2014, 16:28:40 UTC - in response to Message 1585802.  

Hey, I've only just noticed that we have 2014 data on the server status page, and we're finally splitting the last of the 2013 tapes loaded back in April. More work for me at the end of the month :)

We can finally stop burning a candle for the '13 data sets! It's a good thing too. The budget for candles was starting to get a bit nuts.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1585830 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30651
Credit: 53,134,872
RAC: 32
United States
Message 1585836 - Posted: 12 Oct 2014, 16:45:54 UTC - in response to Message 1585818.  

Exit with zero status and no finished file is usually a heartbeat issue. BOINC expects the science application, AP in this case, to send a message to it every minute telling it that it is still working. If anything upsets that, such as the science application crashing, then BOINC writes a diagnostic message in the log file and restarts the job from the last checkpoint. Several things can happen to that message along the way. One is that the system may run out of resources to pass messages. Another is that the science app, which runs at dead zero priority, isn't getting enough time slices to send a message. The latter can happen when things like high-priority virus scanners perform full scans, as they tend to use all available resources for many minutes until they finish reading every bit of all the disks on a machine. Of course, playing games which load up the GPU can also do it in much the same way, which is one reason there is an exclusive option in BOINC to shut it down when other specific things run.

You may have to think a bit like your computer's O/S to see if there is something else that may be blocking BOINC from getting those heartbeat messages.
ID: 1585836 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1585839 - Posted: 12 Oct 2014, 16:58:56 UTC - in response to Message 1585836.  

No scans were run or games played during the processing. The system processed AP 6 with no problems. After I pasted in the app from Crunchers Anonymous and edited the count parameters to 0.33, the errors started (no work was run before the edit). I have another, identically configured machine that has not started processing AP 7 yet (it is waiting for its load of S@H WUs to complete), so I don't know yet whether this is specific to the one machine or affects both.

I got the same errors last night and reset the project, but I guess "doing the same and expecting different results"... etc. Would reloading the standard (stock) BOINC app cure the issue?
ID: 1585839 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1585846 - Posted: 12 Oct 2014, 17:24:40 UTC - in response to Message 1585839.  

All these errors are occurring on your CPU. Just a suggestion: make a copy of your app.xml and save it someplace other than the boinc file. Then try uninstalling Lunatics or returning to stock and see if it works. If you continue to get the errors, then I would think something has been corrupted. As a last course of action, uninstall BOINC, then reinstall, reattach and run, and see if the error is gone. I know that may sound extreme, but it might be necessary. Hope it's something simple.



Zalster
ID: 1585846 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1585853 - Posted: 12 Oct 2014, 17:36:56 UTC - in response to Message 1585836.  

Exit with zero status and no finished file is usually a heartbeat issue. BOINC expects the science application, AP in this case, to send a message to it every minute telling it that it is still working. If anything upsets that, such as the science application crashing, then BOINC writes a diagnostic message in the log file and restarts the job from the last checkpoint. Several things can happen to that message along the way. One is that the system may run out of resources to pass messages. Another is that the science app, which runs at dead zero priority, isn't getting enough time slices to send a message. The latter can happen when things like high-priority virus scanners perform full scans, as they tend to use all available resources for many minutes until they finish reading every bit of all the disks on a machine. Of course, playing games which load up the GPU can also do it in much the same way, which is one reason there is an exclusive option in BOINC to shut it down when other specific things run.

You may have to think a bit like your computer's O/S to see if there is something else that may be blocking BOINC from getting those heartbeat messages.

It's actually the other way around: the BOINC API built into science applications checks whether the BOINC client is still running, and the app exits if it is not. In older versions of the BOINC API, the heartbeat message from BOINC was checked, and if it was missing for more than 30 seconds the app shut itself down. Newer versions check whether the process ID for the BOINC client still exists and exit if it can't be found for 10 seconds.
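
As a rough illustration of that timeout logic (a toy model, not the actual BOINC API code), the check can be thought of as a small watchdog. The class name, method names and simulated timeline below are all hypothetical; only the 30-second and 10-second limits come from the description above.

```python
class ClientWatchdog:
    """Toy model of the app-side check described above; not BOINC API code."""

    def __init__(self, timeout_s):
        # 30 for the older heartbeat scheme, 10 for the newer process-ID scheme.
        self.timeout_s = timeout_s
        self.last_seen = None  # time of the last check that found the client

    def poll(self, client_alive, now):
        """Record one check; return True if the app should shut itself down."""
        if self.last_seen is None or client_alive:
            # The first check starts the clock; a successful check resets it.
            self.last_seen = now
            return False
        return (now - self.last_seen) > self.timeout_s


# Simulated timeline: the client is visible for 5 seconds, then disappears.
wd = ClientWatchdog(timeout_s=10)
for t in range(0, 30):
    alive = t < 5
    if wd.poll(alive, now=float(t)):
        print("app would exit at t =", t)
        break
```

In this model the app, not the client, decides to exit, so a science app that exits this way leaves no 'finished' file and the client reports the message seen in the log.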
ID: 1585853 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1586069 - Posted: 13 Oct 2014, 3:49:24 UTC

Well.. I've just started my offline testing of the two CPU apps for APv7.. x86_AVX and x64_SSE3. Seeing as the same WU was just finished by r557 AVX and reported, I now have a time to compare it to. 5.87% blanking on that WU. I guess one with zero blanking should have been used, but I'm still interested to see which version finishes faster on my CPU. I'll find out in 11-12 hours (or more, maybe).
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1586069 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1586079 - Posted: 13 Oct 2014, 4:07:05 UTC - in response to Message 1585802.  

Hey, I've only just noticed that we have 2014 data on the server status page

Looks like I got AP v7 tasks from 10 of the 11 2014 tapes. Only the 26fe14aa (a short 3.89 GB file) was missed.
ID: 1586079 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1586082 - Posted: 13 Oct 2014, 4:16:39 UTC - in response to Message 1585846.  

All these errors are occurring on your CPU. Just a suggestion. Make a copy of your app.xml and save it someplace other than the boinc file. Then try uninstalling the lunatics or return to stock and see if it works.

Zalster


I followed your suggestion, then edited the <count>1</count> line to 0.33 to improve GPU usage. It seems to be processing S@H V7 fine (which has never been a problem). I've changed the fetch to AP only, so when I run out of S@H I can test the results of the changed app file. I left AdmiralCrunch as it was to see if it throws up like CapnCrunch.

Probably all moot, as the revised Lunatics installer will most likely be ready by tomorrow.
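
For anyone wanting to script the count edit instead of doing it by hand, here is a minimal sketch. The `<coproc>` layout, the app name and both helper functions are assumptions for illustration. The rule of thumb that a count of 0.33 lets three tasks share one GPU is the commonly cited behaviour, not something confirmed in this thread.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical <app_version> fragment; the element layout and app name are
# assumed for illustration and may differ from a real app_info.xml.
SAMPLE = """
<app_version>
  <app_name>astropulse_v7</app_name>
  <coproc>
    <type>NVIDIA</type>
    <count>1</count>
  </coproc>
</app_version>
"""

def set_gpu_count(xml_text, new_count):
    """Return the XML with the <count> inside every <coproc> set to new_count."""
    root = ET.fromstring(xml_text)
    for coproc in root.iter("coproc"):
        count = coproc.find("count")
        if count is not None:
            count.text = str(new_count)
    return ET.tostring(root, encoding="unicode")

def tasks_per_gpu(count):
    """Number of tasks that fit on one GPU if each reserves `count` of it."""
    # The small epsilon guards against floating-point rounding in 1 / count.
    return math.floor(1.0 / count + 1e-9)

print(set_gpu_count(SAMPLE, 0.33))
print(tasks_per_gpu(0.33), "tasks per GPU")
```

Stop BOINC before changing the real file and keep a backup copy of it, as suggested earlier in the thread.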
ID: 1586082 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1586165 - Posted: 13 Oct 2014, 8:55:43 UTC - in response to Message 1586082.  

AP assimilator backlog appears to be growing.
Grant
Darwin NT
ID: 1586165 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1586267 - Posted: 13 Oct 2014, 15:16:20 UTC

Anybody else see error messages in Top GPU models?

Notice: unserialize(): Error at offset 32736 of 39399 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 188
etc...
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1586267 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1586269 - Posted: 13 Oct 2014, 15:23:27 UTC - in response to Message 1586267.  

Anybody else see error messages in Top GPU models?

Notice: unserialize(): Error at offset 32736 of 39399 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 188
etc...

Yes, I see them. Reported to the lab. Thanks.
ID: 1586269 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1586272 - Posted: 13 Oct 2014, 15:28:38 UTC - in response to Message 1586270.  

Anybody else see error messages in Top GPU models?

Notice: unserialize(): Error at offset 32736 of 39399 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 188
etc...

Of course, and there are other places where DA has been playing with the code, like this one for example:

http://setiweb.ssl.berkeley.edu/beta/status.php

A little bit worse eh? :-)

That's funny - that's one they fixed in the first round. Now who's gone and un-fixed it again?
ID: 1586272 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1586283 - Posted: 13 Oct 2014, 15:48:06 UTC - in response to Message 1586272.  

Anybody else see error messages in Top GPU models?

Notice: unserialize(): Error at offset 32736 of 39399 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 188
etc...

Of course, and there are other places where DA has been playing with the code, like this one for example:

http://setiweb.ssl.berkeley.edu/beta/status.php

A little bit worse eh? :-)

That's funny - that's one they fixed in the first round. Now who's gone and un-fixed it again?

http://setiweb.ssl.berkeley.edu/beta/server_status.php is the fixed version, and it is linked from the front page at SETI Beta. I also had a local bookmark to the old page and had to edit it after others said the status had been fixed.
Joe
ID: 1586283 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.