Posts by ivan


21) Message boards : Technical News : Colo! (Mar 28 2013) (Message 1353588)
Posted 43 days ago by Profile ivan
Upward and onward goes the Seti search.

Meows and thanks to all who made this happen.

+1
22) Message boards : Number crunching : Don't forget about the Server Relocation on 1st of April ! (Message 1353180)
Posted 44 days ago by Profile ivan
Hm, most of my clients can at least communicate with the servers. However, my Linux box running BOINC 7.1.0 gets stuck at "Fetching scheduler list":

Thu 04 Apr 2013 09:25:54 AM CEST | SETI@home | update requested by user
Thu 04 Apr 2013 09:25:57 AM CEST | SETI@home | Fetching scheduler list
Thu 04 Apr 2013 09:26:14 AM CEST | Project communication failed: attempting access to reference site
Thu 04 Apr 2013 09:26:15 AM CEST | Internet access OK - project servers may be temporarily down.

Try stopping boinc, flushing your DNS cache (perhaps /etc/init.d/nscd restart) and restarting boinc.
23) Message boards : Number crunching : Don't forget about the Server Relocation on 1st of April ! (Message 1352954)
Posted 45 days ago by Profile ivan
And it looks like we have twice the link capacity now, unless the scale goes higher.

I'd imagine it's auto-scaling (remember it does zero-suppression when the lowest point is greater than zero) so it may have just upped to 200 Mbps once a point over 100 Mbps was plotted.
24) Message boards : Number crunching : If they don't want it .... (Message 1352947)
Posted 45 days ago by Profile ivan
Getting rid of the heat in a controlled manner is a bit more fun - there's at least one data centre that exports its waste heat to other buildings in the area - great in the winter, but a bit on the warm side in the summer (if we ever get one that is)

Cheer up, Rob, locally Northolt is predicting 9 C for the weekend -- but sleet tomorrow...
Must check to see what woollies I need for CERN next week.
25) Message boards : Number crunching : Case Fans - High CFM (Message 1352431)
Posted 48 days ago by Profile ivan
While not a direct reply, I hope y'all can get to this presentation without accreditation. While its main concern is computer centre efficiency, one of its messages is that modern computing machinery can operate effectively with input temperatures of 28 C or more without invalidating the warranty. So you needn't overchill your computer room (especially if you can duct the warm air outside the cooled area).
26) Message boards : Number crunching : CUDA Processing (Message 1350112)
Posted 55 days ago by Profile ivan
This Lenovo laptop has 2 video cards, the Nvidia GTX 660M for gaming and the Intel Graphics HD 4000 for everyday use, so it may be that the Nvidia card is just no longer running for SETI now with that variable set.

I am not certain, but I am guessing that the Intel is an onboard GPU and you may have to disable it to get the 660m to work properly for crunching.

It should be OK in Windows. I have a similar set-up with my Dell laptop and it finds the Nvidia card quite easily, for both seti and my digital hologram reconstructions. My problem is when I dual-boot into Linux, then I can only use the onboard Intel graphics.
27) Message boards : Number crunching : Panic Mode On (82) Server Problems? (Message 1349549)
Posted 57 days ago by Profile ivan
Blue line is sunk

Edit: looks like it just went down for a byte ( LOL ) and came back up for air again...

Yes, looks like part of the 'net took a little nap. Remember, though, that that is a suppressed-zero graph; the blue line was at the bottom only because that was the lowest it had been in the graph's interval -- it only got down to about 5 Mbps.
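
A minimal sketch of how a suppressed-zero axis behaves, assuming the graph takes its lower bound from the smallest sample in the interval and rounds its upper bound up to the next 100 Mbps. The function names and the 100 Mbps step are illustrative assumptions, not the actual behaviour of the graphing software.

```python
import math


def axis_range(samples_mbps, step=100.0):
    """Return (low, high) axis bounds for a suppressed-zero plot.

    Assumption: the lower bound is the smallest sample (zero is suppressed)
    and the upper bound is the next multiple of `step` at or above the peak.
    """
    low = min(samples_mbps)
    high = max(step, math.ceil(max(samples_mbps) / step) * step)
    return low, high


def plot_height(value, low, high):
    """Fraction of the plot height at which `value` is drawn (0 = bottom)."""
    return (value - low) / (high - low)


low, high = axis_range([5.0, 60.0, 95.0])
# A 5 Mbps dip sits on the bottom edge even though traffic never hit zero.
print(plot_height(5.0, low, high))
```

With these assumptions a single sample above 100 Mbps is enough to move the top of the axis to 200 Mbps, which would look like a doubled scale without any change in link capacity.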
28) Message boards : Number crunching : Limits and 19/20 March Outage (Message 1348641)
Posted 60 days ago by Profile ivan
Wiggo, I can appreciate your point of view however I only run this one project (as do others), hence our slightly differing views I suppose. All I am asking is that they consider it as a stop gap measure for the period of the outage which in turn would help people like myself keep putting the work through even though the project is down.

The download link is already maxed out. Even if they rushed around like mad things changing stable configurations, how are you going to download 14 (I make it 15) hours of work in the next 64 minutes?

Either they haven't cut the link yet, or we seem to have a back-channel...
(A slow one for incoming, it would seem.)
29) Message boards : Number crunching : gtx680 lightning (Message 1347428)
Posted 63 days ago by Profile ivan

I don't know where this myth comes from -- it seems it might be necessary with some AMD/ATI co-processors, but I've never seen the need with my Nvidia rigs.

It's not a myth, but it's not a requirement either. At least not on Intel/nVidia rigs.
Leaving a processor idle can slightly improve GPU output by reducing the preprocessing and loading of WUs into the card, especially on rigs that have several GPU cores running.
I am not sure it would provide much benefit on a rig with only a single GPU.

"myth" may not have been exactly the right word, but in my career I have come across many cases where some workaround had become embedded in the mythos (there's that word again...) of a workgroup, long after the need had been remedied. E.g. one case where I was working at UCL and I found my workstation grinding to a halt; on examination I found several instances of grad students running ROOT on the machine, exhausting CPU time and virtual memory -- they'd been told that "the fastest ROOT machine is xxx".
30) Message boards : Number crunching : gtx680 lightning (Message 1347416)
Posted 63 days ago by Profile ivan
on monday my video card will arrive. before that happens i would like to know a few things. i hear that you can crunch more than one wu on the card at the same time, how is this achieved?

In your projects\setiathome.berkeley.edu\app_info.xml file, wherever you see
<coproc>
<type>CUDA</type>
<count>0.4999</count>
</coproc>
make sure that the count is .4999 (or .3333 if you want to try three at a time)
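
A rough sketch of making that edit with a script rather than by hand, assuming the file is well-formed XML and that every CUDA <coproc> block should get the same count. The function names and the sample XML are made up for illustration; back up the real file and stop BOINC before trying anything like this.

```python
import xml.etree.ElementTree as ET


def set_cuda_count(xml_text, count):
    """Return xml_text with every CUDA <coproc><count> set to `count`."""
    root = ET.fromstring(xml_text)
    for coproc in root.iter("coproc"):
        if coproc.findtext("type", "").strip() == "CUDA":
            node = coproc.find("count")
            if node is not None:
                node.text = f"{count:.4f}"
    return ET.tostring(root, encoding="unicode")


def tasks_per_gpu(count):
    """How many tasks fit on one GPU if each one claims `count` of it."""
    return int(1.0 // count)


# Hypothetical, heavily trimmed stand-in for an app_info.xml file.
sample = (
    "<app_info><app_version>"
    "<coproc><type>CUDA</type><count>1</count></coproc>"
    "</app_version></app_info>"
)
print(set_cuda_count(sample, 0.4999))
print(tasks_per_gpu(0.4999), tasks_per_gpu(0.3333))
```

Using .4999 rather than .5 leaves a little slack, so two tasks always fit within one whole GPU.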

also i have been reading that if you use a gpu to crunch you need to leave 1 core free if so is this just done by changing use at most x processors in multi processor systems to equal in my case 2 less than what my system shows since i have hyperthreading? and if i do crunch more than 1 wu on the card do i have to free up one core per wu running?

I don't know where this myth comes from -- it seems it might be necessary with some AMD/ATI co-processors, but I've never seen the need with my Nvidia rigs.
31) Message boards : Number crunching : Remaining (Estimated) Time Column Counting Upwards (Message 1346856)
Posted 64 days ago by Profile ivan
Yep! ... My Rig just needed a Good Swift Kick in The Rear Tyre ...LOL :-)

Cheers.

Ooh, you don't want to do that! Here's what can happen if you do...
(see http://www.brunel.ac.uk/~eesridr/pt.html for the full story.)
32) Message boards : Number crunching : Ubuntu client_state.xml file help (Message 1345633)
Posted 68 days ago by Profile ivan
Hello,

Workunit #1 <checkpoint_fraction_done> shows 0.667525
Workunit #1 GUI shows 77% done

The fraction shown should be 67% done, no? I'm at a loss trying to figure out what's causing the discrepancy. Perhaps getting the % done is not as easy as reading the value stored in the xml file?

Any ideas?

A checkpoint is when a program writes its entire internal data state to external storage so that the calculation can be recovered/restarted from that point. Execution then continues until the next time a checkpoint occurs, and so on. If the program is interrupted it can be restarted at the last checkpoint but obviously calculations done since that point are lost and must be redone. How often checkpointing is done is a trade-off between the resources needed to create the checkpoint and the risk of lost calculations. If the programmer decides that a checkpoint need only be done, say, every 1/6th of the estimated time to complete, then the 4th checkpoint will occur at 66.6% (plus or minus the time between decision points as to whether to checkpoint this iteration or not) and the 5th will occur at about 83%. External snapshot monitoring can then quite easily report that the program is 77% finished, and the last checkpoint was at 66%.
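
The arithmetic can be sketched in a few lines, assuming checkpoints fall at exact multiples of a fixed fraction of the run (real applications will drift a little either side). The function name is hypothetical.

```python
import math


def last_checkpoint(progress, interval):
    """Fraction done at the most recent checkpoint.

    progress: current fraction done, as a live monitor would report it.
    interval: assumed spacing between checkpoints, as a fraction of the run.
    """
    return math.floor(progress / interval) * interval


# Live progress of 77% with a checkpoint every 1/6th of the run.
print(f"{last_checkpoint(0.77, 1 / 6):.3f}")  # prints 0.667
```

That matches the figures in the question: the GUI shows live progress, while the XML file holds the fraction recorded at the last checkpoint.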
33) Message boards : Number crunching : Panic Mode On (82) Server Problems? (Message 1345140)
Posted 69 days ago by Profile ivan
So, what are the theories as to why the cricket graph incoming (blue line) has started to creep upward? Something to do with the current lack of AP workunits to send?
34) Message boards : Number crunching : Errors with new GTX 560's (Message 1345047)
Posted 69 days ago by Profile ivan
OK, so it looks like "something" happened around the time I shut down for the night, that put the display in a funny state and throttled back the computing. Then my switching resolutions fixed the problem (as GPU-Z then reported full power) but one job was so close to its time limit that it ran over, while the second managed to just scrape in.
So it might be worthwhile in future to try changing resolutions down and back to see if that's sufficient to reset the GPU engine, rather than power cycling.
There haven't been any more problems yet.


Yes, sounds like a one-off glitch so far. If a soft error occurs at any level (Application host code, GPU code, driver(s), OS-Kernel, hardware) then there's no ECC or other fault tolerant mechanisms to handle that & nVidia opt to enter failsafe operating modes. That in the case of your own glitch, a resolution reset was sufficient to recover normal operation is interesting. It could be effective in some special cases if Boinc or some other tool provided that capability automatically, though I'd certainly be hesitant to implement anything like a 'catch all safety pin puller', as a failsafe was usually entered for some intentional reason, not always video driver stack or GPU issues.

Still no repeat performance. Apologies to my wingmen, of course, but some WUs have already been validated by a third party. One or two of the rest have an interesting history...
35) Message boards : Number crunching : TDP vs. flops/performance (Message 1344988)
Posted 69 days ago by Profile ivan
Hi all.
I'm from Denmark, and our pricing on electricity is probably the most expensive in the world: 0.352 US$/kWh.

The hardware over here is also expensive, but the power bill is just as big a factor to take into account when you get new hardware.

I wish there were an easy way on this site to quickly see which product/GPU/CPU would give me the most number crunching per $ spent on electricity.

Maybe there is such a website already; in that case, please post it here in this thread. Thanks in advance.

regards Kim.

Wiki has a well-updated list comparing the NVIDIA GPUs.
36) Message boards : Number crunching : Errors with new GTX 560's (Message 1344932)
Posted 69 days ago by Profile ivan
My home machine (Athlon 64X2 + GTX 560) has just shown some weirdness. When I sat down to it this morning the display was in a funny state -- not showing the full display but panning around in it when I moved the mouse to the edge of the screen. The display setup still showed 1440x900, so I changed it to a lower resolution and back again and all was well.
Then I looked at boincmgr and realised that the two GPU jobs were taking much longer than normal, well over an hour.

The 11 errored tasks are all

Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

BOINC has killed them, because they ran slow - taking more than 10 times longer than expected. So, an effect, rather than a cause.

It's a known design feature of newer nVidia cards that they enter a protective low-power and low-speed state when "something funny happens", and can only be brought back to full speed by a power cycle (computer reboot).

The trouble is, "something funny" covers a multitude of potential sins - Jason probably has the fullest list, but I doubt even that is exhaustive. Searching for the cause is probably more trouble than it's worth for a single event, but if you start experiencing this frequently, you'll be looking at heat, power, PCIe bus stability, RAM timings, and a whole lot more.

OK, so it looks like "something" happened around the time I shut down for the night, that put the display in a funny state and throttled back the computing. Then my switching resolutions fixed the problem (as GPU-Z then reported full power) but one job was so close to its time limit that it ran over, while the second managed to just scrape in.
So it might be worthwhile in future to try changing resolutions down and back to see if that's sufficient to reset the GPU engine, rather than power cycling.
There haven't been any more problems yet.
37) Message boards : Number crunching : Errors with new GTX 560's (Message 1344907)
Posted 69 days ago by Profile ivan
My home machine (Athlon 64X2 + GTX 560) has just shown some weirdness. When I sat down to it this morning the display was in a funny state -- not showing the full display but panning around in it when I moved the mouse to the edge of the screen. The display setup still showed 1440x900, so I changed it to a lower resolution and back again and all was well.
Then I looked at boincmgr and realised that the two GPU jobs were taking much longer than normal, well over an hour. I checked with GPU-Z and the compute load was 99%. Just then one of them completed and reported "Computing Error" so I uploaded it and went to my tasks page to check its status. I found 11 errored jobs, all from the time I'd been away from the machine, and apparently all being unhandled exceptions. Then the second long job finished, but it reported normally. Subsequently two more jobs have finished normally -- make that three now.
So, it appears that there was some sort of glitch, possibly when I was switching off (I just turn off the monitor and speakers and the wireless mouse when I leave the PC) that affected both the display and the computing. Another possibility is the UPS which is currently showing "new battery needed" so it ran a self-test some time recently -- I'm not sure if its minute-long beeping session when it fails is enough to wake me from the next room so I'm not sure when that happened.
If anyone can extract any useful information from the debug dumps, please feel free to do so.
38) Message boards : Number crunching : Windows TCP Settings - Follow up - Help with server communication (Message 1344151)
Posted 71 days ago by Profile ivan
In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters you need to change Tcp1323Opts to the value 3, or add it if it's not already there.


Mkay. The Tcp1323Opts isn't actually there. Do I need to add it as a new DWORD?

Yes.
39) Message boards : Number crunching : Windows TCP Settings - Follow up - Help with server communication (Message 1344146)
Posted 71 days ago by Profile ivan
I'll try some regediting on Win 8 if you want, as TCP Optimizer definitely doesn't work. It does have regedit and a command prompt (although I think the command prompt has been stripped of a lot of useful things, if I remember correctly). Might need babystepping on what to input in regedit though, as it has been a loooong while.


In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters you need to change Tcp1323Opts to the value 3, or add it if it's not already there.
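
As a rough guide to what the value 3 means, here is a small sketch assuming this registry value is a two-bit field: bit 0 for RFC 1323 window scaling and bit 1 for RFC 1323 timestamps. The helper name is made up for illustration and nothing here touches the registry.

```python
WINDOW_SCALING = 0x1  # bit 0: RFC 1323 window scaling
TIMESTAMPS = 0x2      # bit 1: RFC 1323 timestamps


def decode_rfc1323_opts(value):
    """Split the DWORD into the two options it is assumed to control."""
    return {
        "window_scaling": bool(value & WINDOW_SCALING),
        "timestamps": bool(value & TIMESTAMPS),
    }


for value in range(4):
    print(value, decode_rfc1323_opts(value))
```

Under that reading, 3 turns both options on, 1 gives window scaling only, and 0 disables both.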
40) Message boards : Number crunching : Windows TCP Settings - Follow up - Help with server communication (Message 1344141)
Posted 71 days ago by Profile ivan
Now to go see if this can be tweaked on Linux...

We think Linux has it enabled by default, but an actual observation would be interesting.

See here.
tcp_window_scaling (Boolean; default: enabled; since Linux 2.2)
Enable RFC 1323 TCP window scaling.

[server01] /home/ireid > cat /proc/sys/net/ipv4/tcp_window_scaling
1
[server01] /home/ireid > cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304

tcp_timestamps (Boolean; default: enabled; since Linux 2.2)
Enable RFC 1323 TCP timestamps.

[server01] /home/ireid > cat /proc/sys/net/ipv4/tcp_timestamps
1


So, it looks like they are set by default (this is my most veteran Linux box, I keep it in sync with the OS we use at CERN which is basically RHEL5).
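
For anyone wanting to check several Linux boxes, the same observation can be scripted. This is a minimal sketch that assumes the settings live under /proc/sys/net/ipv4 as in the output above; the function names are hypothetical.

```python
from pathlib import Path

PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_flag(name, base=PROC_IPV4):
    """True if the boolean setting `name` reads as 1."""
    return (Path(base) / name).read_text().split()[0] == "1"


def read_wmem(base=PROC_IPV4):
    """Return tcp_wmem as (minimum, default, maximum) in bytes."""
    fields = (Path(base) / "tcp_wmem").read_text().split()
    minimum, default, maximum = (int(field) for field in fields)
    return minimum, default, maximum


if __name__ == "__main__" and (PROC_IPV4 / "tcp_window_scaling").exists():
    print("window scaling:", read_flag("tcp_window_scaling"))
    print("timestamps:", read_flag("tcp_timestamps"))
    print("tcp_wmem:", read_wmem())
```

The `base` argument is only there so the functions can be pointed at a copy of the files for testing.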



Copyright © 2013 University of California