finish file present too long

Message boards : Number crunching : finish file present too long

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744835 - Posted: 25 Nov 2015, 19:00:35 UTC - in response to Message 1744820.

And is that just the actual VMs, or also the tasks they run? And what about the hypervisor?

If by hypervisor, you mean vboxheadless.exe, that was running at Normal priority, and with I/O priority normal, Memory priority 5. That's at the thread level - there are lots of them, but I checked all the busiest ones, and they were all the same.

VBox Wrapper did run at Low priority (the same as normal CPU tasks under BOINC), but also with I/O priority normal, Memory priority 5.

I don't think I can even see, let alone control, the niceness of the threads inside the Linux VM.
ID: 1744835
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15170
Credit: 4,362,181
RAC: 3
Netherlands
Message 1744827 - Posted: 25 Nov 2015, 18:34:09 UTC - in response to Message 1744826.

I never said it was an RPC, I was alluding to the slowness of having to read all that data from a disk drive in the middle of the mess of Windows boot-up. :)
ID: 1744827
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744826 - Posted: 25 Nov 2015, 18:29:05 UTC - in response to Message 1744820.

Now, just imagine that your VM has to load 1.4GB of data at Windows start-up on a client that is running at low priority and has update RPCs running at the will of the operating system (ref. Richard's test). No, you do the imagining, my headache is nasty enough already.

The details of that particular delay are:

1) BOINC downloads a 500 MB compressed file to the project directory.
2) BOINC unpacks the 500 MB to a 1.4 GB virtual machine image, also in the project directory.

I'd been through both those stages months ago, and still had the files, so no delay there.

3) BOINC copies the 1.4 GB vmi from the project directory to the slot directory, at the start of each new task. That's not an RPC, that's BOINC's own asynchronous file copy routine. That's the one which will have occupied most of the four minutes.
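As a rough back-of-the-envelope check, the implied copy rate can be worked out as below. This is only a sketch: it assumes decimal units (1 GB = 1000 MB) and that the copy accounted for the whole four minutes, neither of which was measured.

```python
# Rough estimate only. Assumptions (not measured):
#   - 1 GB = 1000 MB
#   - the copy took the full four minutes
IMAGE_SIZE_MB = 1.4 * 1000   # size of the unpacked VM image
COPY_TIME_S = 4 * 60         # approximate duration of the copy


def effective_rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput of a file copy, in MB/s."""
    return size_mb / seconds


rate = effective_rate_mb_per_s(IMAGE_SIZE_MB, COPY_TIME_S)
print(f"Effective copy rate: {rate:.1f} MB/s")
```

Under those assumptions the copy averaged a little under 6 MB/s, far below what even a 7,200 rpm drive can normally sustain for a sequential read and write.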
ID: 1744826
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15170
Credit: 4,362,181
RAC: 3
Netherlands
Message 1744820 - Posted: 25 Nov 2015, 17:55:44 UTC

I checked around a bit, but can only find a reference to being able to set the call priority on RPCs on a Windows 2000 RPC server, and not much after that. Unless MSMQ works that way on later Windows as well. It probably doesn't.

Which then means that the OS dictates the RPC priority, not the client/application.

Now, aside from that, these slow start-ups/refreshes are of course also an effect of where the data loaded into the client is coming from. In my case, when I want to start BOINC at Windows start-up, I must take into account that all data is read from (relatively) slow HDDs (even though they spin at 7,200 rpm). Is the data read from the outer tracks of the platters (faster) or the inner tracks (slower)? How fragmented is the data (all over the platter)?

And of course, at Windows start-up it's not just BOINC that starts up, but also Windows and all of its services and stuff, plus loads of other programs.
I prefer to let my computer hibernate, write all data in memory to a file on disk and then close down. This speeds up the starting of the computer, but only if:
1. I don't allow BOINC to run at that time.
2. I close down my BitTorrent client (you don't want to know how much speed you lose on opening 150 ports).
3. I have no AV running.

All this would probably go faster if read from an SSD. But alas, I haven't seen a 2TB SSD yet that's cheap enough to get and test with. ;-)

Now, just imagine that your VM has to load 1.4GB of data at Windows start-up on a client that is running at low priority and has update RPCs running at the will of the operating system (ref. Richard's test). No, you do the imagining, my headache is nasty enough already.

And I forget, what priority do the VMs run in?
And is that just the actual VMs, or also the tasks they run? And what about the hypervisor?
ID: 1744820
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1744818 - Posted: 25 Nov 2015, 17:32:23 UTC - in response to Message 1744802.

Well, my only contention remains that in a state of overcommit, the only thing you can do to improve things is reduce the workload. Tinkering beyond the point of contention will only ever be a triage situation.

As I said, you need to cut those VM tasks when the load goes up.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1744818
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1744816 - Posted: 25 Nov 2015, 17:29:13 UTC

You guys give me a headache :(
ID: 1744816
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744813 - Posted: 25 Nov 2015, 17:15:29 UTC - in response to Message 1744812.

Can we agree that VM integration is suboptimal? :D

Yes.
ID: 1744813
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1744812 - Posted: 25 Nov 2015, 17:12:29 UTC

Can we agree that VM integration is suboptimal? :D
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1744812
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744809 - Posted: 25 Nov 2015, 17:01:19 UTC - in response to Message 1744806.

Yeah, RPCs IMO should run at the highest possible priority, on the understanding that they are short and sweet. Prompt service is good service.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744809
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744808 - Posted: 25 Nov 2015, 16:59:38 UTC - in response to Message 1744807.

Yes, that's a tenuous limb, but I'll hold onto that straw, knowing that each layer of processing is about 10x slower than the previous one, so smaller is still better.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744808
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744807 - Posted: 25 Nov 2015, 16:55:21 UTC - in response to Message 1744803.

I would even go out on a limb to say that if BOINC overnight doubled the default CPU requirement for each task, then the overall throughput might increase by freeing cores on individual hosts. Hard to justify without an actual experiment of some sort, of course.

Unconvinced. I think that might need testing on a wide variety of BOINC projects. It might be true here or at Einstein, with tightly optimised CPU applications - but there are a lot of other projects with simpler and more relaxed apps.
ID: 1744807
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744806 - Posted: 25 Nov 2015, 16:53:16 UTC - in response to Message 1744802.

Well, my only contention remains that in a state of overcommit, the only thing you can do to improve things is reduce the workload. Tinkering beyond the point of contention will only ever be a triage situation.

Yes, I just had a very vivid demonstration of that. My screenshot in the edited post shows that the VM had been running for 33 minutes. BOINC Manager on the local machine showed that the task had been running for 20 minutes - and it continued to show 20 minutes for at least the next half hour. I eventually posted from a different machine, and shut down all foreground tasks on that one.

Meanwhile, my central BoincView monitoring machine upstairs had failed to contact that client for even longer. When I completely closed the BOINC Manager which had been trying to update every second, the remote machine (which is on a 30-second update cycle) finally caught up.

Which I think is a problem Jord put his finger on - RPC priority. With the client in background mode and the machine under stress, an RPC every second is too much - the client couldn't answer the previous request before the next one arrived. I was testing on my 8-core dual Xeon with GTX 470 - so there's lots to report with each RPC. With everything closed, BoincView's (configurable) 30-second refresh is happily keeping pace - but BOINC Manager's RPC rate isn't configurable.
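The overload described here can be illustrated with a toy model. This is only a sketch: the function name `backlog_after` and the 1.5-second service time are made up for illustration, not measured values or anything taken from BOINC's code.

```python
def backlog_after(duration_s: float, poll_interval_s: float,
                  service_time_s: float) -> int:
    """Requests still unanswered after duration_s seconds.

    Toy single-server model: one request arrives every poll_interval_s
    seconds, and the client answers them one at a time, taking
    service_time_s seconds each.
    """
    arrived = int(duration_s // poll_interval_s)
    answered = int(duration_s // service_time_s)
    # The client cannot answer requests that have not arrived yet.
    return max(0, arrived - answered)


# Hypothetical figure: a stressed client needing 1.5 s per reply.
SERVICE_TIME_S = 1.5

every_second = backlog_after(60, 1, SERVICE_TIME_S)
every_30_seconds = backlog_after(60, 30, SERVICE_TIME_S)
print("1-second polling backlog after a minute:", every_second)
print("30-second polling backlog after a minute:", every_30_seconds)
```

With these made-up numbers the one-second poller is 20 requests behind after a minute and keeps falling further behind, while the 30-second poller never builds a queue. The point is only that once the reply time exceeds the polling interval, the backlog grows without limit.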
ID: 1744806
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744803 - Posted: 25 Nov 2015, 16:49:14 UTC - in response to Message 1744802.
Last modified: 25 Nov 2015, 16:50:18 UTC

I would even go out on a limb to say that if BOINC overnight doubled the default CPU requirement for each task, then the overall throughput might increase by freeing cores on individual hosts. Hard to justify without an actual experiment of some sort, of course.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744803
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744802 - Posted: 25 Nov 2015, 16:39:11 UTC - in response to Message 1744801.

Well, my only contention remains that in a state of overcommit, the only thing you can do to improve things is reduce the workload. Tinkering beyond the point of contention will only ever be a triage situation.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744802
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744801 - Posted: 25 Nov 2015, 16:28:08 UTC

VBox + photobucket between them slowed that machine to a crawl, but I just got there in time to edit.

Conclusion - David's tinkerings so far have done nothing whatsoever to address issue #1392 as presented - i.e. that the IO and memory demands of VM tasks under BOINC are too disruptive to normal foreground use. If anything, they're much worse than when I last tried this - but I did deliberately test the machine showing most stress already.
ID: 1744801
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744796 - Posted: 25 Nov 2015, 15:37:10 UTC - in response to Message 1744795.

I don't know what that means...

Nor do I - ask Ivan how it works.


Dr Ivan's tapped into an alternate source with bosons and stuff. Too busy to send me toys and ideas to play with at the moment I expect :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744796
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744795 - Posted: 25 Nov 2015, 15:35:29 UTC - in response to Message 1744794.

I don't know what that means...

Nor do I - ask Ivan how it works.
ID: 1744795
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744794 - Posted: 25 Nov 2015, 15:32:16 UTC - in response to Message 1744793.
Last modified: 25 Nov 2015, 15:32:30 UTC

I don't know what that means, but it sounds cool and I eagerly await results.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744794
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1744793 - Posted: 25 Nov 2015, 15:23:46 UTC
Last modified: 25 Nov 2015, 16:22:37 UTC

OK, I've set two VBox experiments running, both using the CERN CMS-dev virtual machines.

The first uses VBox version 5.0.2 with CMS-dev's current BOINC wrapper (26178); the second uses VBox 4.3.26 with the latest BOINC wrapper (v26179). Both are running under BOINC v7.6.16, which is running in David's background mode.

The first has been running for long enough to get through all its startup phases, and has settled down to the usual "one full CPU core" running mode. The BOINC Wrapper is running at 'Low' priority (like any normal BOINC CPU app), but VBox(headless) is running at Normal priority. I can see no sign at all that either the Wrapper or VBox is running any of their various threads at anything other than Normal memory and IO priority.

The second hasn't got up to speed yet, but I should be able to perform the same checks while I can still edit this post.

Edit 1: it took bloody ages for the Wrapper to start up - about 4 minutes for the Client to copy 1.4 GB at minimal IO priority.

Edit 2: That took bloody hours, but I got there.


cmsRun is the key one.

Same result as test 1 - Wrapper running low priority, VBoxheadless running normal, all threads memory and IO at normal priority.
ID: 1744793
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1744792 - Posted: 25 Nov 2015, 15:23:42 UTC - in response to Message 1744783.

Maybe oddly, the higher-priority client seems the best for our purposes so far. That's on the proviso that the client does things quickly and efficiently. Maybe there is still a place for optimisation in this world after all.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1744792


 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.