WU Marked As Abandoned, Still Running On Computer
Author | Message |
---|---|
Joined: 28 Oct 11 Posts: 15 Credit: 9,888,738 RAC: 10,806 |
WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to "In Progress"? Steve |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Best guess: It's not really doing anything useful, so get rid of it. Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer. And I'll bet THAT starts lots of arguments. |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
"Best guess: It's not really doing anything useful, so get rid of it." That's just silly. |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
"WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to 'In Progress'?" A task usually shows as abandoned when a computer is detached from cpdn: any tasks running at the time are marked as abandoned so that another task from that work unit can be sent to some other computer. Les is right about that. I would abort the ones designated as "Abandoned". |
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
"Best guess: It's not really doing anything useful, so get rid of it." No arguments from my side, but I have been reading that statement for quite a few days. Les, have you observed something? Let us know, please. Which might start an argument. No guarantees. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Just observing the number of failures and thinking. It's not possible to answer that question without a direct experiment, such as was done back in 2004/5 regarding overclocking. |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
"Just observing the number of failures and thinking." Besides running cpdn and other projects in both Linux and Windows for 17 years, I've run boinc in a Linux VM guest in Windows for the last 10 years at various times when Windows cpdn tasks weren't available, or on other projects where Linux provides a considerable speed advantage over Windows. On cpdn, I've seen no difference in failure percentage for Linux VM tasks vs. native Linux tasks that I've run. If anything, a VM is safer for controlled shutdowns because all you have to do is pause the VM, then resume it when you want to start again. No oddities. There can be a problem with the usual Windows automatic reboots when updates occur, but I always pause Windows updates until I want to do them manually. That's really the biggest problem I see with running in a VM and crashes: if boinc isn't shut down, or the VM suspended, before an unplanned Windows reboot or a power outage, there can be problems with the models on restarting boinc. As for the "Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer." speculation, well that's just silly. While we don't compare two model tasks from the same work unit in cpdn to validate the results, the climate experiment at WCG does. I've never had an invalid result over there (ARP climate model or other sub-project) from any task running in a VM. For validation, the ARP results are compared with those of tasks run on other computers with the same processor manufacturer, of the same or similar processor generation, and the same type of OS. So, for example, an AMD Ryzen task in Linux will be compared to the results of another task from the same work unit run on another Ryzen in Linux. The only invalid tasks I've had over there were with a native Linux PC which had memory go bad. ARP is a great memory tester. 
Now I'm speaking of experience with VMs with VMware Workstation Player as the hypervisor. I've tried VirtualBox and WSL, but didn't find them as easy to configure as VMware Player. Also, for whatever reason, the tasks in the Linux guests under those did not run as fast as under VMware Player. YMMV. I always set up the VM with an 80 GB fixed-size virtual disk and a fixed 8 or 16 GB of memory, depending on the PC RAM size and task type. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, that seems to rule that out, which is good. We've been in lockdown for two months, and that may be getting to me. And never mind hypothetical weather, this area has just had two days of severe weather, with a "weather bomb" now forming, although it looks like it is moving out to sea a bit. But it's still freezing, and windy. BOM says rapidly deepening storm off coast of New South Wales brings Sydney's coldest day in 37 years |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
"And never mind hypothetical weather, this area has just had two days of severe weather, with a 'weather bomb' now forming, although it looks like moving out to sea a bit." Cool article. Very interesting for this retired meteorologist. The storm is similar to the bomb cyclones we sometimes get, even in the interior of the U.S. And their "east coast low" seems similar to the northeast U.S. "nor'easter". He took quite the approach in defining the "east coast low" and explaining why this storm wasn't quite it. |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to 'In Progress'?" No way, no easy way, no way that will not violate the auditability of the work. If/when there's a foxtrot software fail, no researcher will waste time trying to resurrect the lost wu. No way. Try fix software edge case, maybe. Try re-submit particular work-unit maybe. Report as unspecified failure in multilevel software stack, sure. Kill it if you can. Otherwise wait until the wu times out next year. Keep on crunching. e |
©2024 cpdn.org