climateprediction.net (CPDN) home page
Thread 'Notice: Problems with PNW 'd' series Weather at Home models issued on Feb 22'

Message boards : Number crunching : Notice: Problems with PNW 'd' series Weather at Home models issued on Feb 22

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 45572 - Posted: 22 Feb 2013, 22:56:31 UTC

These models appear to have multiple issues: missing download files, and missing files within the zip files that are present and do get downloaded.

See this thread in the phpBB forums.
ID: 45572
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 45573 - Posted: 22 Feb 2013, 23:41:12 UTC
Last modified: 22 Feb 2013, 23:45:38 UTC

I can second that. I received 10 of them on 22 Feb, and all failed. Most of my wingmen's copies have also errored out or haven't been reported yet.

I can't view the task details on the website for any of them. When I click the TaskId links, the website just displays the CPDN logo at the top and a spinning circle indicating it's waiting. I'm using IE9 under Win7.

Links to some wu:
One
Two
Three

Looks like the whole batch is stuffed. I wonder if they could check 1 or 2 in a batch before sending them out? Maybe they could generate one, confirm that it works, and only then generate the rest.
ID: 45573
The Ancient One
Joined: 5 Sep 04
Posts: 21
Credit: 2,502,662
RAC: 2,171
Message 45574 - Posted: 22 Feb 2013, 23:54:51 UTC

I'm getting similar problems: 3 WUs, all with computation errors within 1 to 3 minutes of starting.
"All man born has a right to life and no man born has the right to take that life"
ID: 45574
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 45575 - Posted: 23 Feb 2013, 7:07:32 UTC

The links are now opening. Looking at the first link, the tasks show two different errors. The machine running Darwin has a segmentation violation; the other two both show something very similar. This is from the first one:

Signal 11 received, exiting...
Called boinc_finish
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1460, selfPID=1460, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1460, selfPID=2428, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Called boinc_finish

ID: 45575
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 45576 - Posted: 23 Feb 2013, 10:07:51 UTC

I've had 6 PNWs fail this morning. Running on Windows 7 and Intel.
ID: 45576
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45577 - Posted: 23 Feb 2013, 11:04:57 UTC

I've just had two PNW failures on Win7 and geophi's had several, probably on Linux. There's clearly something wrong with this batch. At least they don't spend much time crunching.
ID: 45577
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 45579 - Posted: 23 Feb 2013, 17:00:18 UTC
Last modified: 23 Feb 2013, 17:08:55 UTC

I've also had one PNW fail this morning, within 1 minute of starting, running on Windows 7 and Intel: hadam3p_pnw_df38_2046_1_008313166
ID: 45579
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45580 - Posted: 24 Feb 2013, 0:31:21 UTC

I've just had another PNW crash. I knew it would probably crash at about 25 seconds, so I started the graphics to try to see what was happening. It showed just a completely black window; the model didn't appear to have started crunching.
ID: 45580
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 45592 - Posted: 1 Mar 2013, 23:21:25 UTC - in response to Message 45580.  

I had 3 fail on one computer and 1 fail on another, in all cases a few hours after they had started.

ID: 45592
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 45593 - Posted: 1 Mar 2013, 23:49:18 UTC

Occasionally, some ranges of models are released with wrong values in some of the supporting files. This causes the affected models to fail when they reach the incorrect part.

ID: 45593
old_user671679
Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 45594 - Posted: 2 Mar 2013, 0:06:09 UTC

I had all 25 fail on my machines. Are they going to rework these WUs and send them back out?
ID: 45594
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 45595 - Posted: 2 Mar 2013, 1:02:55 UTC

candido, I can't see your models because your computers are hidden.

Flashawk, your crashed PNW models all seem to be from the problematic batch created on 22 February. This is a nuisance, but they do crash very quickly after starting and don't use much processing time.

I expect this batch of models will indeed be reworked and reissued, as this is usually done when a batch doesn't run as expected.
ID: 45595
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 45596 - Posted: 2 Mar 2013, 2:34:11 UTC

The usual way of 'reworking' a large batch of faulty WUs is to inform the external suppliers of the data in question and leave it up to them to sort it out. This can take time.
In this case, they're from the University of Oregon in the USA.

ID: 45596
old_user671679
Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 45597 - Posted: 2 Mar 2013, 22:23:37 UTC

Thanks guys, that's where my youngest son goes to school.
ID: 45597
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 45612 - Posted: 6 Mar 2013, 23:46:17 UTC

Currently I have 14 WUs running. Some have passed 20% completion, but several others failed during download or in the first few hours. Let's see how these 14 do...
ID: 45612

©2024 cpdn.org