Posts by HAL9000


21) Message boards : Number crunching : The current top cruncher (Message 1589517)
Posted 4 days ago by Profile HAL9000
Even though Charles' RAC has dropped a lot lately, he's still the 1st to pass the 1,000,000,000 cobblestone mark (not bad for just under 15 months of crunching).

Congratz Charles.

Cheers.

Wow! Let's see, at my current rate, I might reach that milestone sometime in the year 2040....that is, if they let me keep running my machines in the old f[olks|ogeys|ellas|arts] home!

According to allprojectstats.com I would get there a little sooner, but only just.
Your current goal is: 1,000,000,000
Will be reached in x days 6,479.36
Date July 17, 2032

Sometimes I like to set goals for myself to see if I can eke out a bit more oomph from my machines. Right now I have a goal set for 125,000,000 with an estimated date of December 7th. So I'm trying to get there before December 1st. I have more machines I could simply throw at the project, but I would rather do it through tuning and tweaking.
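For anyone who wants to play with the same kind of goal arithmetic, here is a minimal sketch. It assumes a constant daily credit rate, which real RAC never is, and the function names days_to_goal and required_rate are made up for illustration. The numbers in the usage lines are placeholders, not anyone's real stats.

```python
def days_to_goal(current_credit, goal_credit, credit_per_day):
    """Days needed to reach goal_credit at a constant daily rate."""
    if credit_per_day <= 0:
        raise ValueError("credit_per_day must be positive")
    return max(goal_credit - current_credit, 0) / credit_per_day


def required_rate(current_credit, goal_credit, days_available):
    """Daily credit needed to reach goal_credit within days_available days."""
    if days_available <= 0:
        raise ValueError("days_available must be positive")
    return max(goal_credit - current_credit, 0) / days_available


if __name__ == "__main__":
    # Placeholder figures only: 120M credit now, 125M goal.
    print(days_to_goal(120_000_000, 125_000_000, 100_000))
    print(required_rate(120_000_000, 125_000_000, 40))
```

Pulling a goal date forward by a few days comes down to comparing the current daily rate against what required_rate reports for the shorter window.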
22) Message boards : Number crunching : Panic Mode On (91) Server Problems? (Message 1589450)
Posted 4 days ago by Profile HAL9000
Assimilation & purging for AP still seems to be down. Maybe it will resume after maintenance tomorrow?
23) Message boards : Number crunching : Oddly long AP7 wu's? (Message 1589328)
Posted 4 days ago by Profile HAL9000
I have noticed that tasks with many pulses take longer than ones with only a few or none. Could it be that the 30/30 pulse exit wasn't done until late in the task instead of early?
24) Message boards : Number crunching : Oddly long AP7 wu's? (Message 1589259)
Posted 4 days ago by Profile HAL9000
Ah, version 6... That's why I shouldn't be trying to solve "mysteries" late at night haha...

Thanks!

Chris

That still doesn't explain why your AP v7 tasks lost the command line parameters you were using at the date I mentioned previously.
25) Message boards : Number crunching : Panic Mode On (90) Server Problems? (Message 1589244)
Posted 4 days ago by Profile HAL9000
Well, I got to my 11th "completed task" for APv7.. time to start filling the cache with realistic ETA'ed tasks:
2014-10-20 00:23:57 SETI@home [sched_op_debug] CPU work request: 2116467.91 seconds; 0.00 CPUs

Got to love the use of "completed" there.
It always throws me off when I see something like:
Number of tasks completed 127 Consecutive valid tasks 389

They might consider changing "Number of tasks completed" to "Number of tasks applied to APR", since most of the world has a very different definition of what "completed" means from how it is being used here.
26) Message boards : Number crunching : Yosemite and Seti (Message 1589112)
Posted 5 days ago by Profile HAL9000
So lots of complaints about this new OSX from apple. One thing I did see is that there appears to be a problem with the permissions. I rebuilt mine, and am running an AP over in Beta to see if it completes normal now. I still don't like the sluggishness but it seems to be better after the rebuild. I'll know in the morning.

One of our Mac devs at work was telling me about how they smurfed up the permissions again with the new OS. We had more support calls on Friday than we normally get in a month for Mac issues. I haven't had a chance to install the OS on my test system yet.
27) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1589071)
Posted 5 days ago by Profile HAL9000
There is an issue with science apps not stopping when they should. However, BOINC & most definitely Windows Explorer should not crash when tasks start.
If I were having crashes that coincided with tasks starting, I would look for some bit of hardware that is having issues. If this last happened 6 months ago, when the weather was changing from cold to warm, and now it is occurring when it is going from warm to cold, I am thinking of something thermal expanding/contracting.

I think the key words there are "should not", rather than "could not". On my daily driver running Vista, Windows Explorer is actually the most crash-prone application I have. Often, simply right-clicking on a folder in the tree will cause a lockup and crash. Other times, I have no idea what triggers it. In any event, in this case, it appears to be BOINC that crashes first, taking its parent process, explorer.exe down with it. It would be helpful if the Windows logs provided more info, but they don't (or perhaps I just don't know where to find it).

As to weather, if only it were that easy to correlate the occurrences with something external. However, it's happened once each in December, February, March, and now October on my #1 cruncher. It also happened once on my #2 cruncher in March. And here on the central coast of California, ambient temperatures are more likely to fluctuate dramatically from day to day or from hour to hour, depending on the influence of the marine layer, than from season to season. I certainly can't rule anything out, but the fact that the one consistent element is the start of an AP GPU task, I have to think that that's one piece of the puzzle. (Although I run both MB and AP tasks, I still run hundreds of AP tasks through those two boxes over time, and with the crashes being fairly rare, I'm sure there must be another element required, but at this point I haven't any other clues.)

With the low occurrence of 5 times between 2 machines in a year, it does make it hard to troubleshoot. If you wanted to, you could install some debug logging software. Then you would have very detailed crash logs.

Having logs that only state "whatever.exe has stopped unexpectedly" can drive you bonkers. My old i7-860 would sometimes shut down or restart without warning. The only log was "windows shutdown unexpectedly at time stamp". I dealt with that for years before finding out, just about a week ago, that it was a faulty chassis power switch, which would sometimes close when not pushed. I only found out that's what it was when it failed completely.
28) Message boards : Number crunching : Oddly long AP7 wu's? (Message 1589064)
Posted 5 days ago by Profile HAL9000
None of your GPU tasks returned after 17 Oct 2014, 14:48:52 UTC have this section in their stderr_txt:
DATA_CHUNK_UNROLL set to:24
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used

Your tasks prior to that do. You may want to see if your tuning settings are still in place or if you have a blank file. Losing your tuning settings could cause the tasks to run longer.
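If you want to check a pile of results quickly, something like the sketch below would do it. It assumes you have already saved the stderr text of a result as a string; the marker strings are copied from the section quoted above, and the function name missing_tuning_markers is made up for illustration.

```python
# Marker lines copied from the stderr section quoted above.
TUNING_MARKERS = (
    "DATA_CHUNK_UNROLL set to:",
    "FFA thread block override value:",
    "FFA thread fetchblock override value:",
)


def missing_tuning_markers(stderr_text):
    """Return the tuning markers that do NOT appear in stderr_text."""
    return [marker for marker in TUNING_MARKERS if marker not in stderr_text]


if __name__ == "__main__":
    sample = (
        "DATA_CHUNK_UNROLL set to:24\n"
        "FFA thread block override value:8192\n"
        "FFA thread fetchblock override value:4096\n"
    )
    print(missing_tuning_markers(sample))  # all markers present
    print(missing_tuning_markers(""))      # nothing present
```

An empty list means the tuning lines were all reported; anything else suggests the command line settings were not picked up for that task.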
29) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1589039)
Posted 5 days ago by Profile HAL9000
Okay, finally got around to looking at those Windows logs and, as I suspected, the only relevant entry I could find is this one:

It doesn't indicate why Explorer stopped "unexpectedly", just that it was restarted. That event occurs approximately 2 seconds after the last entries in the BOINC log prior to the apparent BOINC crash:
18-Oct-2014 02:51:43 [SETI@home] Computation for task 24se14aa.26295.17654.438086664195.12.91_0 finished
18-Oct-2014 02:51:43 [SETI@home] [cpu_sched] Preempting 04se14aa.7333.4157.438086664199.12.169_0 (removed from memory)
18-Oct-2014 02:51:43 [SETI@home] Starting task ap_21no10ab_B0_P0_00325_20141016_19978.wu_2
18-Oct-2014 02:51:43 [SETI@home] [cpu_sched] Starting task ap_21no10ab_B0_P0_00325_20141016_19978.wu_2 using astropulse_v7 version 705 (opencl_nvidia_100) in slot 4
18-Oct-2014 08:14:06 [---] Starting BOINC client version 7.2.42 for windows_x86_64

As you can see, the last entries shown are for the start of an AP task. This has been the case for every one of these BOINC crashes that I've encountered.
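The pattern described above (a last log entry, then silence until the client starts again) can be pulled out of a longer log with a rough sketch like this one. It assumes every line begins with a timestamp in the same format as the excerpt and that month abbreviations are English; the function names are made up for illustration.

```python
from datetime import datetime

STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"   # e.g. 18-Oct-2014 02:51:43
STAMP_LENGTH = 20                    # characters taken up by the timestamp
CLIENT_START = "Starting BOINC client version"


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def entries_before_restart(lines):
    """For each client start line, return (previous line, gap before the start)."""
    results = []
    previous = None  # (timestamp, text) of the last line that parsed
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if CLIENT_START in line and previous is not None:
            results.append((previous[1], stamp - previous[0]))
        previous = (stamp, line)
    return results
```

Run over a full log, each tuple shows what the client was doing just before it went quiet and how long it stayed down, which makes it easier to see whether every restart really follows the start of an AP task.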


There is an issue with science apps not stopping when they should. However, BOINC & most definitely Windows Explorer should not crash when tasks start.
If I were having crashes that coincided with tasks starting, I would look for some bit of hardware that is having issues. If this last happened 6 months ago, when the weather was changing from cold to warm, and now it is occurring when it is going from warm to cold, I am thinking of something thermal expanding/contracting.
30) Message boards : Number crunching : Panic Mode On (90) Server Problems? (Message 1588901)
Posted 5 days ago by Profile HAL9000
Carolyn is running again:)

It was shown as running a couple hours ago, but "replica seconds behind master" is still reporting "offline." queries/sec is unusually high (over 3000), and the RTS buffer has flatlined with creation rate being over 30/sec.

Lot of hungry caches and rigs out there.

I'm just glad the "Workunits waiting for validation" has started going down instead of up.


I'm just glad Carolyn shows as running on the SSP.

Everyone being present and accounted for is always a good thing.
31) Message boards : Number crunching : Panic Mode On (90) Server Problems? (Message 1588891)
Posted 5 days ago by Profile HAL9000
Carolyn is running again:)

It was shown as running a couple hours ago, but "replica seconds behind master" is still reporting "offline." queries/sec is unusually high (over 3000), and the RTS buffer has flatlined with creation rate being over 30/sec.

Lot of hungry caches and rigs out there.

I'm just glad the "Workunits waiting for validation" has started going down instead of up.
32) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1588888)
Posted 5 days ago by Profile HAL9000
The science apps are child processes of BOINC. If you look at the properties for the science apps in Process Explorer you can see the instance of BOINC they are associated with on the Image tab. Look for Parent: boinc.exe(nnnn), where nnnn is the PID (process ID).
When that Parent app is gone it is gone. The apps shouldn't look for or attach to an instance of BOINC they find. That would be very problematic on machines that run multiple instances of BOINC.

Yes, under normal circumstances, I expect to see all the running tasks shown as children of boinc.exe and, since by default I run BOINC Manager from bootup to shutdown, boinc.exe as a child of boincmgr.exe. However, with the zombie tasks, that association goes away and never comes back. I agree that it could be problematic if it did, which is why it makes sense to me that the science apps should shut down if their parent shuts down, and in my tests, almost all of them do.

I notice in the Process Explorer image you posted that the AP apps appear to be children of Process Explorer itself, so I'm curious as to what happens when you restart the BOINC client? If it doesn't reconnect to the running AP tasks, what happens to those tasks when they finish? Does BOINC successfully recognize the "finish" file when it's created?

The image is a bit misleading. In that image the science apps are not actually associated with Process Explorer. If they were, they would be offset in the tree. AP7_r2692_x64_AVX_CPU is being displayed at the same level as the other root processes.
An issue with Process Explorer is that sometimes it shows the - instead of the + for the tree. Under procexp.exe there is procexp64.exe.

Once the task is finished being processed I would expect the science app to close. However, I didn't want to wait a few hours to verify that. I was more interested in creating a reproducible test scenario.
33) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1588870)
Posted 5 days ago by Profile HAL9000
I normally run BOINC without even using BOINC Manager. Instead I choose to start & stop BOINC from a command line.
When you exit BOINC Manager & leave the BOINC client running, you will see boinc.exe still running. In my image you can clearly see that neither boinc.exe nor boincmgr.exe is running. It also shows the command line I used to kill boinc.exe using taskkill.

Well, that was weird. I was in the middle of editing my earlier post and I apparently got logged off. In any event, what I was planning to add was that I hadn't scrolled down in your image and therefore hadn't noticed that you used the "taskkill" command, rather than a normal exit from BOINC Manager. I also missed that boinc.exe wasn't still running.

In your scenario, when you restart the BOINC client, does it successfully pick up the orphaned tasks? In my tests (see "Zombie" AP tasks - still alive when BOINC should have killed them), even if I restarted BOINC while the zombie tasks were still running, BOINC wouldn't pick them up. Rebooting the machine was necessary to bring them back under BOINC's control.

The science apps are child processes of BOINC. If you look at the properties for the science apps in Process Explorer you can see the instance of BOINC they are associated with on the Image tab. Look for Parent: boinc.exe(nnnn), where nnnn is the PID (process ID).
When that Parent app is gone it is gone. The apps shouldn't look for or attach to an instance of BOINC they find. That would be very problematic on machines that run multiple instances of BOINC.
34) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1588854)
Posted 5 days ago by Profile HAL9000
Do you have the "Manager exit dialog" enabled in the BOINC Manager options? If not, and if the "Stop running tasks when exiting the BOINC Manager" box wasn't checked the last time the Exit Confirmation dialog was displayed, the tasks will keep running because the BOINC client is still running. Then, when you restart the BOINC Manager, it will pick up the client and the running tasks. In the case of the zombie AP tasks, they keep running standalone, without benefit of the BOINC client, and if the BOINC Manager is restarted while they're still running, it won't actually see them.

I normally run BOINC without even using BOINC Manager. Instead I choose to start & stop BOINC from a command line.
When you exit BOINC Manager & leave the BOINC client running, you will see boinc.exe still running. In my image you can clearly see that neither boinc.exe nor boincmgr.exe is running. It also shows the command line I used to kill boinc.exe using taskkill.
35) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1588844)
Posted 5 days ago by Profile HAL9000
I can easily reproduce the BOINC crashing issue by simply killing BOINC.
http://www.hal6000.com/seti/images/stuck_sci_apps.png

It has been about 12 min since I killed BOINC & the apps are still humming right along. I did this several times, & for this last instance I told BOINC to suspend GPU processing prior to killing BOINC to see if it had any effect. It did not. The CPU & GPU apps look like they would run to completion in this scenario.
Perhaps when the processes are orphaned they are getting a heartbeat source from another application or the mechanism stops when they are orphaned. That would be an issue for the devs to wrestle. I just get paid to break things, not fix them.

The apps not stopping when BOINC is told to shut down would, I think, be a separate, but possibly related, issue.
36) Message boards : Number crunching : First batch of GPU AP tasks, only one task running on GPU? (Message 1588825)
Posted 5 days ago by Profile HAL9000
I just started crunching the first round of AP tasks allocated to my cruncher and noticed my GPU was running only one AP (opencl_nvidia_100) task. I have other GPU tasks downloaded, both opencl_nvidia_100 and cuda50. The running task indicated it is using 0.282 CPUs and 1 GPU.

I have my GPU set to run two tasks and under AP6, I was consistently running 2, either MB or AP prior to the newer tasks.

I'm assuming it is something within my app_config (pasted below) that needs updating to allow running two of the newer tasks again?

<app_config>
<app>
<name>astropulse_v6</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>
<app>
<name>setiathome_v7</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>
</app_config>


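Well, if you add an AP v7 section it could help. Your file only has entries for astropulse_v6 and setiathome_v7, so nothing in it applies to the new astropulse_v7 tasks.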

<app_config>
<app>
<name>astropulse_v6</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>
<app>
<name>astropulse_v7</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>

<app>
<name>setiathome_v7</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.04</cpu_usage>
</gpu_versions>
</app>
</app_config>
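A quick way to spot this kind of gap is to list which app names an app_config actually covers. The sketch below uses Python's standard XML parser; the function names apps_in_config and missing_apps are made up for illustration, and the list of wanted app names is whatever you expect to be running.

```python
import xml.etree.ElementTree as ET


def apps_in_config(xml_text):
    """Return the <name> of every <app> section in an app_config document."""
    root = ET.fromstring(xml_text)
    return [app.findtext("name") for app in root.findall("app")]


def missing_apps(xml_text, wanted):
    """Return the app names from wanted that have no <app> section."""
    present = set(apps_in_config(xml_text))
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    # The original file from the question, reduced to the app names.
    original = """<app_config>
    <app><name>astropulse_v6</name></app>
    <app><name>setiathome_v7</name></app>
    </app_config>"""
    print(missing_apps(original, ["astropulse_v7", "setiathome_v7"]))
```

Anything the second function returns is an app that would fall back to the default of one task per GPU.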
37) Message boards : Number crunching : The GTX750(Ti) Thread (Message 1588818)
Posted 5 days ago by Profile HAL9000
CU = Compute Units

The 750 has 4, the 750Ti has 5



That's a new one for me.. Where do you find that in the specifications? I have several 780s and would like to know how to configure that unroll

http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_700_Series
SMX count column.
Or look in a completed result for:
OpenCL Platform Name: NVIDIA CUDA
Number of devices: 3
Max compute units: 12
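If you would rather not eyeball the result page, a small sketch like this can pull the value out of saved stderr text. It assumes the line is formatted exactly as in the excerpt above; the function name max_compute_units is made up for illustration.

```python
import re

# Matches lines such as "Max compute units: 12".
COMPUTE_UNITS_PATTERN = re.compile(r"Max compute units:\s*(\d+)")


def max_compute_units(stderr_text):
    """Return every 'Max compute units' value found in stderr_text, in order."""
    return [int(value) for value in COMPUTE_UNITS_PATTERN.findall(stderr_text)]


if __name__ == "__main__":
    sample = (
        "OpenCL Platform Name: NVIDIA CUDA\n"
        "Number of devices: 3\n"
        "Max compute units: 12\n"
    )
    print(max_compute_units(sample))
```

A result that lists several devices may report the value more than once, which is why the function returns a list instead of a single number.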
38) Message boards : Number crunching : CPU heat/ Lunatics Application (Message 1588790)
Posted 5 days ago by Profile HAL9000
I have not seen a change in CPU temps on my machines that run AP only.
39) Message boards : Number crunching : "Zombie" AP tasks - still alive in AP v7 (Message 1588757)
Posted 6 days ago by Profile HAL9000
As it's been over 6 months since I last saw this problem pop up, my original "Zombie" AP tasks - still alive when BOINC should have killed them is locked. But since I've now had this happen with AP v7, a new thread is probably called for anyway.

The circumstances are pretty much the same. BOINC crashed on my T7400 at 2:51 AM local time, immediately after starting to process AP v7 task 3789149958. That task continued to run, without benefit of BOINC, until 3:23 AM, when the results were posted and the "finish" file created. Another AP v7 task, 3789137125, was running on a different GPU at the time of the BOINC crash and continued running until 2:56 AM. All MB tasks shut down cleanly when contact with BOINC was lost.

As it had been more than 6 months since I experienced one of these, and since my brain was still in first gear at ~8:00 AM when I discovered the chilly room with the low noise level, I forgot to delete the "finish" files for the 2 AP tasks before I restarted BOINC. Naturally, the tasks errored out with the "finish file present too long" message.

In any event, it appears that whatever changes and enhancements were made to AP for v7, none of them managed to fix the problem of AP tasks not noticing when the BOINC client goes AWOL.

Given this is a BOINC or OS issue with child processes not getting correctly killed, I don't see why changing a science app would solve this.

I most commonly see this happen on a machine where BOINC has been running for a few weeks. I will close BOINC, or restart the system, & the science apps will not have been killed. I have a batch file to run taskkill /IM appname.exe /T /F for when this happens.
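The same cleanup can be scripted outside a batch file. This is only a sketch: it builds the taskkill command line quoted above for each leftover app and runs it, so it is Windows only. The image names you pass in are up to you, and the function names are made up for illustration.

```python
import subprocess


def taskkill_command(image_name):
    """Build the taskkill call: /IM image name, /T include child processes, /F force."""
    return ["taskkill", "/IM", image_name, "/T", "/F"]


def kill_leftover_apps(image_names, run=subprocess.run):
    """Run taskkill for each image name and return (name, return code) pairs.

    The run parameter exists so the function can be exercised without
    actually killing anything; by default it is subprocess.run.
    """
    results = []
    for name in image_names:
        completed = run(taskkill_command(name), capture_output=True, text=True)
        results.append((name, completed.returncode))
    return results


if __name__ == "__main__":
    # Placeholder name, same as in the post; replace with the real app names.
    print(taskkill_command("appname.exe"))
```

A nonzero return code for a given name usually just means no process with that image name was running.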
40) Message boards : Number crunching : NVIDIA tasks (Message 1588624)
Posted 6 days ago by Profile HAL9000
So - now there are no more NVIDIA tasks for my GPU.
Will they be back?
18-10-2014 09:28:04 | SETI@home | Requesting new tasks for NVIDIA
18-10-2014 09:29:34 | SETI@home | Scheduler request completed: got 0 new tasks
18-10-2014 09:29:34 | SETI@home | Project has no tasks available

There are no such things as NV tasks, ATI tasks, or CPU tasks. All tasks are the same and are simply assigned to a device when you make a request. Sometimes when you ask for work the feeder has no tasks in it & responds with "Project has no tasks available".
It looks like on your next work request, 6 minutes after this one, you received 34 new NV tasks. Two have already errored out, & you are generating mostly errors on that machine.
http://setiathome.berkeley.edu/results.php?hostid=6764842&offset=0&show_names=0&state=6&appid=11



Copyright © 2014 University of California