FIFO??

Message boards : Number crunching : FIFO??

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1709471 - Posted: 6 Aug 2015, 21:12:56 UTC - in response to Message 1709420.

I haven't seen any sign of the next alpha build yet, but we'll need that and a new round of testing before we start to consider a public release again.

I'm waiting for a new tagged release so that the ARM CPU model identification fix can hopefully be passed on to the repository builds. Still waiting.

Claggy
ID: 1709471
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1709420 - Posted: 6 Aug 2015, 19:31:15 UTC - in response to Message 1709409.
Last modified: 6 Aug 2015, 19:31:37 UTC

Agreed. We got a code fix

fix job scheduling bug that starves CPU instances

and a private drop (hosted by Rom, because he built it while the BOINC server was out of action)

http://www.romwnet.org/files/boinc.030815.x64.zip

I haven't seen any sign of the next alpha build yet, but we'll need that and a new round of testing before we start to consider a public release again.
ID: 1709420
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1709409 - Posted: 6 Aug 2015, 19:05:37 UTC - in response to Message 1708174.
Last modified: 6 Aug 2015, 19:07:48 UTC

Take 2 on the event logs has provided the requisite smoking gun:

[cpu_sched_debug] using 2.00 out of 6 CPUs

With Cliff's permission, I've forwarded the logs on to David and Rom: I think we'll have to wait for feedback when they get their server(s) working again.


A change was made and I tested it successfully, but I don't know how many more testers are needed or when it will be available for beta testing. In my opinion this is a show stopper, so I wouldn't be in a hurry to release the 7.6.6 beta to the general public.


I don't buy computers, I build them!!
ID: 1709409
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1709404 - Posted: 6 Aug 2015, 18:56:49 UTC - in response to Message 1708127.

We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived.

You can check back in the 7.6 head to see what changesets were applied:

https://github.com/BOINC/boinc/commits/client_release/7/7.6

Claggy

Or look at the tagged release:

https://github.com/BOINC/boinc/commits/client_release/7.6/7.6.3

Claggy
ID: 1709404
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1708174 - Posted: 3 Aug 2015, 19:55:48 UTC

Take 2 on the event logs has provided the requisite smoking gun:

[cpu_sched_debug] using 2.00 out of 6 CPUs

With Cliff's permission, I've forwarded the logs on to David and Rom: I think we'll have to wait for feedback when they get their server(s) working again.
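
For anyone sifting through their own event log, here is a minimal Python sketch of pulling the numbers out of a line like the one quoted above. The regular expression and the helper name idle_cpus are illustrative assumptions based only on that single line, not on any documented log format.

```python
import re

# Pattern inferred from the one "[cpu_sched_debug] using ..." line quoted in the thread.
USING_RE = re.compile(
    r"\[cpu_sched_debug\] using (?P<used>\d+(?:\.\d+)?) out of (?P<total>\d+) CPUs"
)

def idle_cpus(line):
    """Return how many CPUs the scheduler left unused, or None if the line does not match."""
    match = USING_RE.search(line)
    if match is None:
        return None
    return int(match.group("total")) - float(match.group("used"))

line = "[cpu_sched_debug] using 2.00 out of 6 CPUs"
print(idle_cpus(line))  # 4.0
```

On the quoted line this reports 4.0 CPUs left unused, which is the "smoking gun": the client had 6 CPUs available but was only scheduling 2 of them.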
ID: 1708174
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1708140 - Posted: 3 Aug 2015, 17:58:44 UTC - in response to Message 1708132.

Well, Cliff has sent me the debug logs I requested, and they're showing the expected "reserving 0.500000 of coproc NVIDIA" - so we're still in 'multiple apps per GPU' territory, not the 'multiple GPUs per app' that commit was designed to handle. Still, the law of unintended consequences is very widely applicable...

Argh, just noticed a finger-fumble on his logs - they're identical, even to the 12:56:09 timestamp on both logs. Must have failed to copy the second one before pasting it into a text file (we've all done that). Back to the drawing board.
ID: 1708140
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1708132 - Posted: 3 Aug 2015, 17:34:15 UTC - in response to Message 1708127.
Last modified: 3 Aug 2015, 17:35:49 UTC

This one looks like a possible candidate:

https://github.com/BOINC/boinc/commit/eab76dc245ed88bcce3e787b9315cd3c440d4f97

client: fix bug when app version uses > 1 GPU instance

Note: the code wasn't written with multi-GPU apps in mind.
There may be other bugs with multi-GPU apps.


Claggy
ID: 1708132
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1708127 - Posted: 3 Aug 2015, 17:25:46 UTC - in response to Message 1708108.
Last modified: 3 Aug 2015, 17:27:28 UTC

We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived.

You can check back in the 7.6 head to see what changesets were applied:

https://github.com/BOINC/boinc/commits/client_release/7/7.6

Claggy
ID: 1708127
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1708108 - Posted: 3 Aug 2015, 16:23:06 UTC - in response to Message 1708079.

4) Did the same with 7.6.3 and that is when the problem starts. AP CPU tasks will not run concurrently with AP GPU tasks.

That's very odd. Both v7.6.2 and v7.6.3 were created because of bugs I reported:

v7.6.2 was to fix the problem deleting files over 4 GB
v7.6.3 was to fix a delay doing the task cleanup at the end of a job (GPUGrid, using optional renamed output files)

Neither of those should have changed GPU/CPU scheduling, though of course other development work will have been pottering along beside the specific bugs I remember.

We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived.

So, even though it's a bit of a chore, could you possibly save and send me one cycle of <cpu_sched_debug> log for the working v7.6.2 client, and one cycle for v7.6.3 (or v7.6.6, doesn't really matter: either of the broken ones), so I can try and work out what went wrong? Unless we can send them some evidence like that, it's unlikely they'll bother to build a new test version - and then we'd be stuck with v7.6.6 for ever.
ID: 1708108
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1708079 - Posted: 3 Aug 2015, 15:27:39 UTC - in response to Message 1708028.

Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other.

Two (and only two) simple possibilities come to mind:

You may have overdone the much-touted "free at least one core when running OpenCL tasks on NVidia" advice. The Lunatics default setting is 0.04 avg / 0.2 max CPUs per task (so five tasks would need to be running concurrently to free an extra core - maybe we should rethink that). Any additional restriction would be down to your local choices/edits for the number of CPUs BOINC schedules, or any app_config.xml file settings.

The other possibility is a <max_concurrent> setting in app_config.xml

Beyond that, you would be into the realms of resource exhaustion - not enough memory, perhaps - but that's less likely.

The evidence would be found using the <cpu_sched_debug> event log flag (which includes GPU scheduling, despite the name) - but that's complex and verbose. You probably wouldn't want to run it continually, but it might be worth a peek.


I've been running Lunatics with .5 CPU & .5 GPU for several years with no problems. I know the current wisdom is to allocate 1 core for each GPU task, but I've found that my settings have worked very well since my Fermi days.

I did some regression testing this morning and found the following:

1) Uninstalled 7.6.6 and reverted to 7.4.42, recycled the machine, and the CPU tasks had no problem running concurrently with the GPU tasks.

2) Installed 7.6.1 over 7.4.42 and recycled machine with the same results once BOINC started.

3) Installed 7.6.2 over 7.6.1 with the same results as #1 & #2.

4) Did the same with 7.6.3 and that is when the problem starts. AP CPU tasks will not run concurrently with AP GPU tasks.

5) Did the same with 7.6.6 with the same results as #4.
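
The regression test above boils down to finding the first version in the sequence where concurrent CPU/GPU running breaks. A minimal Python sketch of that bookkeeping, using the results reported in the numbered steps (the function name first_bad and the data layout are illustrative assumptions):

```python
# (version, cpu_and_gpu_tasks_ran_together) in the order the versions were installed.
RESULTS = [
    ("7.4.42", True),
    ("7.6.1", True),
    ("7.6.2", True),
    ("7.6.3", False),
    ("7.6.6", False),
]

def first_bad(results):
    """Return (last_good, first_bad) versions; first_bad is None if nothing failed."""
    last_good = None
    for version, ok in results:
        if not ok:
            return last_good, version
        last_good = version
    return last_good, None

print(first_bad(RESULTS))  # ('7.6.2', '7.6.3')
```

That narrows the change to whatever went in between 7.6.2 and 7.6.3.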

There is definitely something, either in Lunatics or in BOINC (whether the client or the manager), that is preventing the two from working concurrently.

Right now I'm running 7.6.2 so that both apps run together until this is fixed or another version is ready for beta testing.


I don't buy computers, I build them!!
ID: 1708079
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1708028 - Posted: 3 Aug 2015, 11:27:15 UTC - in response to Message 1708026.

Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other.

Two (and only two) simple possibilities come to mind:

You may have overdone the much-touted "free at least one core when running OpenCL tasks on NVidia" advice. The Lunatics default setting is 0.04 avg / 0.2 max CPUs per task (so five tasks would need to be running concurrently to free an extra core - maybe we should rethink that). Any additional restriction would be down to your local choices/edits for the number of CPUs BOINC schedules, or any app_config.xml file settings.

The other possibility is a <max_concurrent> setting in app_config.xml

Beyond that, you would be into the realms of resource exhaustion - not enough memory, perhaps - but that's less likely.

The evidence would be found using the <cpu_sched_debug> event log flag (which includes GPU scheduling, despite the name) - but that's complex and verbose. You probably wouldn't want to run it continually, but it might be worth a peek.
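
As a quick check of the "five tasks to free an extra core" arithmetic above, here is a small Python sketch. It assumes the simple model implied by the post, in which per-task CPU reservations just add up until they reach a whole core; the function name tasks_per_core is hypothetical, and this is not the client's actual scheduling code.

```python
import math
from fractions import Fraction

def tasks_per_core(cpu_per_task):
    """Smallest number of concurrent GPU tasks whose CPU reservations sum to one full core.

    cpu_per_task is passed as a string so Fraction can represent it exactly.
    """
    return math.ceil(Fraction(1) / Fraction(cpu_per_task))

print(tasks_per_core("0.2"))  # 5, the Lunatics default max figure quoted above
print(tasks_per_core("0.5"))  # 2, the 0.5 CPU per GPU task setting mentioned elsewhere in the thread
```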
ID: 1708028
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1708026 - Posted: 3 Aug 2015, 10:51:22 UTC

Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other.


I don't buy computers, I build them!!
ID: 1708026
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1707936 - Posted: 3 Aug 2015, 1:59:06 UTC
Last modified: 3 Aug 2015, 1:59:59 UTC

All MB CPU tasks have been completed and the MB (cuda50) tasks are now running. There are NO CPU tasks running at this time.


I don't buy computers, I build them!!
ID: 1707936
betreger
Joined: 29 Jun 99
Posts: 10359
Credit: 29,581,041
RAC: 66
United States
Message 1707934 - Posted: 3 Aug 2015, 1:40:03 UTC - in response to Message 1707626.
Last modified: 3 Aug 2015, 1:40:48 UTC

Frankly, with the very relaxed deadlines at this project (most other BOINC projects set 7-day or 14-day deadlines), it would actually take considerable effort and ingenuity to get BOINC to worry about deadlines (invoke 'high priority' or EDF processing) for a SETI-only cruncher, and I've not seen any evidence of it yet in this thread.

Richard, I have 2 examples of high priority. My XP box does it on its GT430 when it runs AP only and they are plentiful, as they have been recently, as does my W7 box on its CPU. I'll admit I try to run a 20-day cache, but since nothing ever times out and everything validates I figure that's OK.
ID: 1707934
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1707931 - Posted: 3 Aug 2015, 1:25:51 UTC
Last modified: 3 Aug 2015, 1:27:01 UTC

From one of your CPU tasks ...

Build features: Non-graphics BLANKIT TWINDECHIRP USE_LRINT FFTW USE_INCREASED_PRECISION USE_AVX x64
CPUID: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 FMA3 SSE4.1 SSE4.2 AVX


You are running AVX. Mine also shows SSE2, but it's AVX as well.


Simple solution if you are running into priority issues ... Back off your cache to 4 days, that should still keep you full.
ID: 1707931
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1707925 - Posted: 3 Aug 2015, 0:32:30 UTC - in response to Message 1707626.

The 4 CPU work units are all VLARs, which take a lot longer to process than an AP, as the APs are using the Lunatics app to run.

So to me, since the VLARs don't have any special app to speed them along, they will take a lot longer to run and, having a shorter deadline, they would move to the front of the cache to be processed.

Sorry, but absolutely none of that is true. VLARs are a special case for NVidia GPUs only, but on CPUs:

They take less time to process than APs
If Lunatics apps are installed, they will be run with a Lunatics MB CPU app.
They have a longer deadline than AP - 53 days. VLAR work issued today won't be required back until 24 or 25 September.

Frankly, with the very relaxed deadlines at this project (most other BOINC projects set 7-day or 14-day deadlines), it would actually take considerable effort and ingenuity to get BOINC to worry about deadlines (invoke 'high priority' or EDF processing) for a SETI-only cruncher, and I've not seen any evidence of it yet in this thread.


All of the MB CPU VLAR tasks had a receive date of 08/01 and a deadline of 09/23, with an est. run time of 5.07 hrs. each; 3 are left, all "running". There are 42 MB (cuda50) tasks with a deadline of 08/21 and 15 with deadlines beginning 09/11, all d/l'ed on 08/01 w/ est. run time of .13 hrs. each, "ready to start". I have 130 AP GPU tasks remaining (4 "running" & 126 "ready to start") with an earliest deadline of 08/21, w/ est. run time of 1.28 hrs. each.

There are the 6 AP CPU tasks, d/l'ed on 07/31 w/ deadlines between 08/21 & 08/25 and est. run times of 5.07 hrs. each, "waiting to run". These are the CPU tasks that started this thread. (These tasks are in this status because I suspended all GPU tasks to make sure that they were able to run. Since I allowed MB tasks to be processed, they have not resumed running.)

There are 72 AP CPU tasks, d/l'ed on 08/01 with an earliest deadline of 08/26 and est. run time of 5.07 hrs. each. None of the AP CPU tasks have resumed running at this time, and I see no apparent security issues stopping them from running.

BTW, the AP GPU tasks have been running since the start of this thread.

I will point out that some of the AP GPU tasks were already in the queue prior to the failure of the Win 10 upgrade and the complete re-install of Win 7, BOINC 7.6.6 & Lunatics 0.43b on 07/31. The data directory is not on the system drive (C:), but on the data drive (D:) which allows me to fully retain all SETI data.

Something else I've noticed is that all of the AP CPU tasks came in as SSE2, but AVX was the default in the Lunatics install.


I don't buy computers, I build them!!
ID: 1707925
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1707626 - Posted: 2 Aug 2015, 8:04:34 UTC - in response to Message 1707511.

The 4 CPU work units are all VLARs, which take a lot longer to process than an AP, as the APs are using the Lunatics app to run.

So to me, since the VLARs don't have any special app to speed them along, they will take a lot longer to run and, having a shorter deadline, they would move to the front of the cache to be processed.

Sorry, but absolutely none of that is true. VLARs are a special case for NVidia GPUs only, but on CPUs:

They take less time to process than APs
If Lunatics apps are installed, they will be run with a Lunatics MB CPU app.
They have a longer deadline than AP - 53 days. VLAR work issued today won't be required back until 24 or 25 September.

Frankly, with the very relaxed deadlines at this project (most other BOINC projects set 7-day or 14-day deadlines), it would actually take considerable effort and ingenuity to get BOINC to worry about deadlines (invoke 'high priority' or EDF processing) for a SETI-only cruncher, and I've not seen any evidence of it yet in this thread.
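
The 53-day figure is easy to sanity-check with a couple of lines of Python. This is just date arithmetic on the deadline length stated in the post; the constant and function names are made up for the example.

```python
from datetime import date, timedelta

VLAR_DEADLINE_DAYS = 53  # deadline length stated in the post

def due_date(issued, days=VLAR_DEADLINE_DAYS):
    """Return the report deadline for work issued on the given date."""
    return issued + timedelta(days=days)

print(due_date(date(2015, 8, 2)))  # 2015-09-24
print(due_date(date(2015, 8, 1)))  # 2015-09-23
```

Work issued on 2 Aug comes out at 24 Sep, in line with the "24 or 25 September" above, and work issued on 1 Aug comes out at 23 Sep, matching the 08/01 receive date and 09/23 deadline Cliff reports for his VLAR tasks.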
ID: 1707626
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5445
Credit: 528,817,460
RAC: 242
United States
Message 1707511 - Posted: 2 Aug 2015, 1:26:07 UTC - in response to Message 1707504.

If you have it set at 80% CPU usage ... you are saying that there are 2 cores BOINC can't touch.

Then 4 AP tasks at 0.5 CPU is 2 more cores that really don't do much other than feed the GPU, at about 3% CPU usage .... but you can't run CPU tasks on a reserved core.

So you would have 4 cores that are reserved, and not allowed to do BOINC CPU processing.


08/01/2015 16:37:10 | | max CPUs used: 6
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ad.11509.22323.438086664197.12.103.vlar_0 using setiathome_v7 version 700 in slot 11
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.159.vlar_0 using setiathome_v7 version 700 in slot 13
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.94.vlar_1 using setiathome_v7 version 700 in slot 10
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15aa.9767.21335.438086664200.12.125.vlar_1 using setiathome_v7 version 700 in slot 12
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ag_B0_P0_00070_20150731_00353.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 3
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ab_B4_P0_00157_20150731_07482.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 1
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15aa_B4_P0_00361_20150731_13562.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 0

That's 4 GPU tasks @ .5 CPU & GPU each, plus 4 CPU tasks, sorry I didn't make that clear in the previous post.



I'm guessing that line is a duplicate? Otherwise you have 5 GPU work units.
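
One way to test the duplicate-line guess is to count distinct task names rather than log lines. The Python sketch below does that on the nine "Restarting task" lines quoted above. The regular expression is inferred from those lines only, and treating a plan class in parentheses as the sign of a GPU app version is an assumption that happens to hold for this excerpt.

```python
import re

# Pattern inferred from the "[cpu_sched] Restarting task ..." lines in the excerpt.
RESTART_RE = re.compile(
    r"\[cpu_sched\] Restarting task (?P<task>\S+) using (?P<app>\S+) "
    r"version (?P<ver>\d+)(?: \((?P<plan>[^)]+)\))? in slot (?P<slot>\d+)"
)

def count_distinct_tasks(lines):
    """Return (cpu_tasks, gpu_tasks), counting each task name only once."""
    plan_by_task = {}
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            plan_by_task[match.group("task")] = match.group("plan")
    # Assumption: a plan class in parentheses marks a GPU app version here.
    gpu = sum(1 for plan in plan_by_task.values() if plan is not None)
    return len(plan_by_task) - gpu, gpu

PREFIX = "08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task "
LOG = [PREFIX + rest for rest in [
    "09my15ad.11509.22323.438086664197.12.103.vlar_0 using setiathome_v7 version 700 in slot 11",
    "09my15ac.31563.143519.438086664198.12.159.vlar_0 using setiathome_v7 version 700 in slot 13",
    "09my15ac.31563.143519.438086664198.12.94.vlar_1 using setiathome_v7 version 700 in slot 10",
    "09my15aa.9767.21335.438086664200.12.125.vlar_1 using setiathome_v7 version 700 in slot 12",
    "ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2",
    "ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2",
    "ap_12my15ag_B0_P0_00070_20150731_00353.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 3",
    "ap_12my15ab_B4_P0_00157_20150731_07482.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 1",
    "ap_12my15aa_B4_P0_00361_20150731_13562.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 0",
]]

print(count_distinct_tasks(LOG))  # (4, 4)
```

Nine lines, but only eight distinct tasks: the slot 2 AP task appears twice with the same name, so the excerpt shows 4 CPU tasks and 4 GPU tasks.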

The 4 CPU work units are all VLARs, which take a lot longer to process than an AP, as the APs are using the Lunatics app to run.

So to me, since the VLARs don't have any special app to speed them along, they will take a lot longer to run and, having a shorter deadline, they would move to the front of the cache to be processed.

Now I have to go back and read what was happening prior to the MB download.
ID: 1707511
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1707504 - Posted: 2 Aug 2015, 1:05:34 UTC - in response to Message 1707455.

If you have it set at 80% CPU usage ... you are saying that there are 2 cores BOINC can't touch.

Then 4 AP tasks at 0.5 CPU is 2 more cores that really don't do much other than feed the GPU, at about 3% CPU usage .... but you can't run CPU tasks on a reserved core.

So you would have 4 cores that are reserved, and not allowed to do BOINC CPU processing.


08/01/2015 16:37:10 | | max CPUs used: 6
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ad.11509.22323.438086664197.12.103.vlar_0 using setiathome_v7 version 700 in slot 11
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.159.vlar_0 using setiathome_v7 version 700 in slot 13
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.94.vlar_1 using setiathome_v7 version 700 in slot 10
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15aa.9767.21335.438086664200.12.125.vlar_1 using setiathome_v7 version 700 in slot 12
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ag_B0_P0_00070_20150731_00353.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 3
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ab_B4_P0_00157_20150731_07482.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 1
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15aa_B4_P0_00361_20150731_13562.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 0

That's 4 GPU tasks @ .5 CPU & GPU each, plus 4 CPU tasks, sorry I didn't make that clear in the previous post.


I don't buy computers, I build them!!
ID: 1707504
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1707455 - Posted: 1 Aug 2015, 23:16:34 UTC - in response to Message 1707334.

If you have it set at 80% CPU usage ... you are saying that there are 2 cores BOINC can't touch.

Then 4 AP tasks at 0.5 CPU is 2 more cores that really don't do much other than feed the GPU, at about 3% CPU usage .... but you can't run CPU tasks on a reserved core.

So you would have 4 cores that are reserved, and not allowed to do BOINC CPU processing.
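
The core-count reasoning above can be written out as a few lines of Python. This is only a rough model of the arithmetic in this post, not the BOINC client's actual scheduler: the function name cpu_budget, the 8 logical CPUs (the i7-4770K reported elsewhere in the thread, with Hyper-Threading) and the round-down behaviour are assumptions for illustration.

```python
import math

def cpu_budget(logical_cpus, use_pct, gpu_tasks, cpu_per_gpu_task):
    """Return (usable_cpus, cpus_reserved_for_gpu_tasks, cpus_left_for_cpu_tasks)."""
    # Assumption: the "use at most N% of the CPUs" preference rounds down.
    usable = math.floor(logical_cpus * use_pct / 100)
    reserved = gpu_tasks * cpu_per_gpu_task
    left = math.floor(usable - reserved)
    return usable, reserved, left

print(cpu_budget(8, 80, 4, 0.5))  # (6, 2.0, 4)
```

Under these assumptions, 80% of 8 logical CPUs leaves 6 for BOINC, four GPU tasks at 0.5 CPU each reserve 2 of those, and 4 remain for CPU tasks. So of the 8 logical CPUs, 4 are not doing BOINC CPU processing.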
ID: 1707455
 