climateprediction.net home page
WU Marked As Abandoned, Still Running On Computer

Steve Dodd
Joined: 28 Oct 11
Posts: 15
Credit: 9,888,738
RAC: 10,806
Message 64388 - Posted: 20 Aug 2021, 5:09:28 UTC

WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to "In Progress"?

Steve
ID: 64388
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64389 - Posted: 20 Aug 2021, 5:27:23 UTC - in response to Message 64388.  

Best guess: It's not really doing anything useful, so get rid of it.

Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer.

And I'll bet THAT starts lots of arguments.
ID: 64389
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 64392 - Posted: 20 Aug 2021, 15:04:06 UTC - in response to Message 64389.  

Best guess: It's not really doing anything useful, so get rid of it.

Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer.

And I'll bet THAT starts lots of arguments.

That's just silly.
ID: 64392
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 64393 - Posted: 20 Aug 2021, 15:12:34 UTC - in response to Message 64388.  

WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to "In Progress"?

Steve

A task usually shows up as abandoned when a computer is detached from cpdn: any tasks running at the time are then marked as abandoned so that another task from that work unit can be sent to some other computer. Les is right about that. I would abort the ones designated as "Abandoned".
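
For anyone who would rather abort the task from the command line than from BOINC Manager, here is a minimal sketch. It assumes `boinccmd` is installed and on the PATH inside the VM. The project URL and task name passed in are placeholders, not values confirmed in this thread: use whatever `boinccmd --get_project_status` and `boinccmd --get_tasks` report on that host (the task name is normally the work unit name plus a suffix such as `_0`). The helper names `abort_command` and `abort_task` are made up for illustration.

```python
import subprocess


def abort_command(project_url: str, task_name: str) -> list[str]:
    """Build the boinccmd argument list that aborts one task."""
    # boinccmd syntax: --task <project URL> <task name> <operation>
    return ["boinccmd", "--task", project_url, task_name, "abort"]


def abort_task(project_url: str, task_name: str, dry_run: bool = True) -> str:
    """Return the command line; only run it when dry_run is False."""
    cmd = abort_command(project_url, task_name)
    if not dry_run:
        # Raises CalledProcessError if boinccmd reports a failure.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Both arguments are assumed placeholders; check them on your own host.
    print(abort_task("https://www.cpdn.org/", "hadam4_a16s_201310_6_914_012099105_0"))
```

The dry run is the default so that the command can be checked against the task list before anything is actually aborted.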
ID: 64393
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64397 - Posted: 24 Aug 2021, 8:10:25 UTC - in response to Message 64389.  

Best guess: It's not really doing anything useful, so get rid of it.

Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer.

And I'll bet THAT starts lots of arguments.

_______________________________

No arguments from my side but I have been reading that statement for quite a few days.
Les, have you observed something? Let us know, please.
Which might start an argument. No guarantees.
ID: 64397
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64399 - Posted: 24 Aug 2021, 8:18:42 UTC - in response to Message 64397.  

Just observing the number of failures and thinking.

It's not possible to answer that question without a direct experiment, such as was done back in 2004/5 regarding overclocking.
ID: 64399
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 64400 - Posted: 24 Aug 2021, 20:50:50 UTC - in response to Message 64399.  

Just observing the number of failures and thinking.

It's not possible to answer that question without a direct experiment, such as was done back in 2004/5 regarding overclocking.

Besides running cpdn and other projects in both Linux and Windows for 17 years, I've run boinc in a Linux VM guest in Windows for the last 10 years at various times when Windows cpdn tasks weren't available, or on other projects where Linux provides a considerable speed advantage over Windows.

On cpdn, I've seen no difference in failure percentage between the Linux VM tasks and the native Linux tasks that I've run. If anything, a VM is safer for controlled shutdowns, because all you have to do is pause the VM, then resume it when you want to start again. No oddities. There can be a problem with the usual Windows automatic reboots when updates occur, but I always pause Windows updates until I want to do them manually. That's really the biggest problem I see with running in a VM and crashes: if boinc isn't shut down, or the VM suspended, before an unplanned Windows reboot or a power outage, there can be problems with the models on restarting boinc.

As for the "Climate models were never intended for VMs and the like, and any results produced may not be what would be produced from the same starting data set, as just running it on a plain computer." speculation, well that's just silly. While we don't compare two model tasks from the same work unit in cpdn to validate the results, the climate experiment at WCG does. I've never had an invalid result over there (ARP climate model or other sub-project) from any task running in a VM. For validation, the ARP results are validated with tasks run on other computers with the same processor manufacturer, of the same or similar processor generation, and the same type of OS. So, for example, an AMD Ryzen task in Linux will be compared to the results of another task from the same work unit run on another Ryzen in Linux. The only invalid tasks I've had over there were with a native Linux PC which had memory go bad. ARP is a great memory tester.
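
To make the validation idea concrete, here is a minimal sketch of comparing results only within a class of similar hosts. This is not WCG's actual validator: the `Result` fields, the checksum stand-in for model output, and the "at least two matching results" rule are all assumptions for illustration. The point it shows is that a Linux VM guest on a Ryzen lands in the same class as a native Linux Ryzen host, so the two are compared against each other.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Result(NamedTuple):
    host: str
    cpu_vendor: str
    cpu_generation: str
    os_type: str
    checksum: str  # stand-in for the real model output


def validate(results: list[Result]) -> dict[str, Optional[bool]]:
    """Map host -> True (agrees with a peer), False (disagrees), None (no peer yet)."""
    # Group results by host class: same vendor, generation, and OS type.
    classes = defaultdict(list)
    for r in results:
        classes[(r.cpu_vendor, r.cpu_generation, r.os_type)].append(r)

    verdict: dict[str, Optional[bool]] = {}
    for members in classes.values():
        if len(members) < 2:
            # Nothing comparable has been returned for this class yet.
            for r in members:
                verdict[r.host] = None
            continue
        counts = defaultdict(int)
        for r in members:
            counts[r.checksum] += 1
        for r in members:
            # Valid if at least one other result in the class matches.
            verdict[r.host] = counts[r.checksum] >= 2
    return verdict


if __name__ == "__main__":
    sample = [
        Result("native-linux", "AMD", "Ryzen", "Linux", "abc"),
        Result("linux-vm", "AMD", "Ryzen", "Linux", "abc"),
        Result("bad-memory", "AMD", "Ryzen", "Linux", "xyz"),
        Result("lonely", "Intel", "Core", "Windows", "abc"),
    ]
    print(validate(sample))
```

In the sample data the host with bad memory is the only one flagged as disagreeing, while the lone Windows host simply waits for a comparable partner.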

Now I'm speaking of experience with VMs with VMware Workstation Player as the hypervisor. I've tried VirtualBox and WSL, but didn't find them as easy to configure as VMware Player. Also, for whatever reason, the tasks in Linux guests under those did not run as fast as under VMware Player. YMMV. I always set up the VM with an 80 GB fixed-size virtual disk and a fixed 8 or 16 GB of memory, depending on the PC RAM size and task type.
ID: 64400
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64401 - Posted: 24 Aug 2021, 22:08:41 UTC
Last modified: 25 Aug 2021, 4:35:44 UTC

OK, that seems to rule that out, which is good.

We've been in lockdown for two months, and that may be getting to me.
And never mind hypothetical weather, this area has just had two days of severe weather, with a "weather bomb" now forming, although it looks like it is moving out to sea a bit.
But it's still freezing, and windy.

BOM says rapidly deepening storm off coast of New South Wales brings Sydney's coldest day in 37 years
ID: 64401
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 64403 - Posted: 25 Aug 2021, 3:25:36 UTC - in response to Message 64401.  

And never mind hypothetical weather, this area has just had two days of severe weather, with a "weather bomb" now forming, although it looks like it is moving out to sea a bit.
But it's still freezing, and windy.

BOM says rapidly deepening storm off coast of New South Wales brings Sydney's coldest day in 37 years

Cool article. Very interesting for this retired meteorologist. The storm is similar to the bomb cyclones we sometimes get, even in the interior of the U.S. And their "east coast low" seems similar to the northeast U.S. "nor'easter". He took quite the approach to defining the "east coast low" and explaining why this storm wasn't quite it.
ID: 64403
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 64415 - Posted: 28 Aug 2021, 8:51:14 UTC - in response to Message 64388.  

WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to "In Progress"?

Steve


No way, no easy way, no way that will not violate the auditability of the work.
If /when/ there's a foxtrot software fail, no researcher will waste time trying to resurrect the lost wu. No way.
Try fix software edge case, maybe. Try re-submit particular work-unit maybe. Report as unspecified failure in multilevel software stack, sure.

Kill it if you can. Otherwise wait until the wu times out next year.
Keep on crunching.

e
ID: 64415


©2024 cpdn.org