Thread 'Error while computing'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,008,987
RAC: 21,524
Message 53498 - Posted: 22 Feb 2016, 20:42:12 UTC
Last modified: 22 Feb 2016, 21:01:28 UTC

I found it was necessary to use the most recent development Wine versions, the earlier releases didn't work on my system. (Using Ubuntu 15.04/15.10 on stock hardware.)


Interesting. I am also running Ubuntu 15.10 on pretty stock hardware, and Wine 1.6.2, the standard offering, seems to work just fine with no messing about with settings whatsoever.

Edit: I am running the latest BOINC. Don't know if that makes a difference?
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53499 - Posted: 22 Feb 2016, 21:38:02 UTC
Last modified: 22 Feb 2016, 21:43:38 UTC

My Haswell is running Mint 16, Wine 1.6.1, and BOINC 7.6.9
The Windows version is XP Pro, as it's the one that I'm most familiar with.

The hard parts were finding where the Wine stuff was installed, and remembering to right-click and install with the Wine option, rather than the usual double left-click.
Everything else was a bit of a let down, as it all was "just there".
The only 2 failures have been due to model problems.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53500 - Posted: 22 Feb 2016, 21:51:21 UTC

jrapdx

I only looked at one of your failed models, and that had a long list of Suspends in it, indicating that your "Suspend work if CPU usage is above" option is probably set at the default of 25%, which isn't a good idea with climate models.
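As a rough sketch of how to check for this, the suspend lines in a task's stderr text can simply be counted. The marker string is the one that shows up in CPDN stderr output quoted further down this thread; the function name and the sample text are made up for illustration.

```python
# Marker taken from CPDN stderr lines such as
# "Suspended CPDN Monitor - Suspend request from BOINC..."
SUSPEND_MARKER = "Suspend request from BOINC"


def count_suspends(stderr_text: str) -> int:
    """Count the lines in a task's stderr output that record a suspend request."""
    return sum(1 for line in stderr_text.splitlines() if SUSPEND_MARKER in line)


# Illustrative sample only, not a real task log.
sample = """\
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Leaving CPDN_Main::Monitor...
"""

print(count_suspends(sample))
```

A high count over a short run would suggest the CPU-usage threshold is being tripped often, though it does not prove that the suspends caused the failure.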

I'm also starting to get suspicious of how Windows 10 interacts with BOINC and the tasks it runs. Although that may not matter with Wine, as it's a rebuild and not the Real Deal.


Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,750,791
RAC: 3,898
Message 53502 - Posted: 23 Feb 2016, 15:46:50 UTC

After two minutes, two WUs crashed. This is the stderr:


<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Signal 11 received, exiting...
16:16:49 (39024): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=36120, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=39024, selfPID=40312, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
16:16:53 (40312): called boinc_finish(0)
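
For anyone comparing several failed tasks, a small sketch like the following can pull the signal number and the boinc_finish exit codes out of a stderr dump such as the one above. The regular expressions are inferred only from these log lines, and the function name is made up for illustration. Signal 11 is conventionally a segmentation fault.

```python
import re

# Patterns inferred from the log lines in this post; other CPDN apps may differ.
SIGNAL_RE = re.compile(r"Signal (\d+) received")
FINISH_RE = re.compile(r"\((\d+)\): called boinc_finish\((-?\d+)\)")


def summarize(stderr_text):
    """Return (signal numbers, [(pid, exit code), ...]) found in the stderr text."""
    signals = [int(n) for n in SIGNAL_RE.findall(stderr_text)]
    finishes = [(int(pid), int(code)) for pid, code in FINISH_RE.findall(stderr_text)]
    return signals, finishes


log = """\
Signal 11 received, exiting...
16:16:49 (39024): called boinc_finish(193)
16:16:53 (40312): called boinc_finish(0)
"""

signals, finishes = summarize(log)
print(signals)
print(finishes)
```

Here the model process (PID 39024) exits with 193 after the signal, while the controller (PID 40312) finishes with 0.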

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 53503 - Posted: 23 Feb 2016, 17:01:50 UTC - in response to Message 53502.  

Batch 341 was a misconfigured batch. So the three recent failures from that PC have nothing to do with your PC, just that week-old bad batch.
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53506 - Posted: 23 Feb 2016, 20:41:10 UTC - in response to Message 53500.  
Last modified: 23 Feb 2016, 20:46:33 UTC

Les Bayliss wrote: "Suspend work if CPU usage is above" is probably set at the default of 25%, which isn't a good idea with climate models.

It was set to 25%. On my other computer, a setting of 60% seems OK, and I reset this one to 60% too. However, the computer wasn't being used for much except Wine/BOINC, without which CPU usage was well below 25%. I doubt the occasional OS activity exceeded 25%, but there's no harm in using the higher setting.
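
The same setting can also be changed outside the Manager. The sketch below generates the contents of a local preferences override, assuming the BOINC client reads a `global_prefs_override.xml` file from its data directory and that the element for this option is `suspend_cpu_usage`. Both are assumptions to check against the BOINC documentation for your client version, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def build_override(cpu_usage_limit: float) -> str:
    """Build override XML that sets the 'suspend if CPU usage is above' threshold.

    Assumption: the client honours <suspend_cpu_usage> inside
    <global_preferences>; verify this before relying on it.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "suspend_cpu_usage").text = f"{cpu_usage_limit:g}"
    return ET.tostring(root, encoding="unicode")


print(build_override(60))
```

The output would be saved as the override file, after which the client has to be told to re-read its local preferences or be restarted.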

Yesterday one task completed (yay!), but subsequent downloads errored out with message "couldn't start app: CreateProcess() failed - Internal error.(0x54f)". Around that time I had trouble getting Wine to run (after a system reboot), which probably accounts for these errors.

Wine does seem unreliable on my system, perhaps configuration issues but haven't found anything notable. I've considered deleting and reinstalling Wine (and BOINC), but hesitate re: losing the CPDN work underway. Maybe there's a way to save and resume it, but haven't dug into the question yet.

Could be coincidental but all the failures under Wine have been with wah2 tasks. However on the positive side two wah2 are still running and with any luck will successfully complete.
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53507 - Posted: 23 Feb 2016, 20:56:42 UTC

Terribly sorry about the multiple posts. The browser kept timing out and I didn't know the (partial) messages were sent. Maybe a moderator could delete all but the last one, I'd appreciate it.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53508 - Posted: 23 Feb 2016, 23:21:33 UTC
Last modified: 23 Feb 2016, 23:22:21 UTC

Done.
My trick with the "spinning wheel" is to open a new window and look at the forum. If the post made it, then I cancel the original post. (It seems that it's mostly the reply to the poster that gets held up.)

The present problem is a firewall issue, which looks like dragging on for a while, while whoever supplies and installs the hardware does whatever needs to be done.
And I have oodles of zips to upload. Sigh.
jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53509 - Posted: 24 Feb 2016, 5:53:02 UTC - in response to Message 53508.  

Thank you!

I've tried the trick of opening another tab to load the URL, but it doesn't always work. I mean the indicator will be spinning on that tab instead of this one. The info about the firewall is interesting, and I can well relate: it's sometimes hard getting things to work like we believe they ought to. Anyway, it sheds some light on how slow CPDN has been lately, which is obviously the problem I was recently having...
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 53511 - Posted: 24 Feb 2016, 14:14:33 UTC - in response to Message 53497.  

However I think it's worth pointing out that BOINC/CPDN under Wine is not all a bed of roses. I've experienced numerous "error while computing" task failures, some of which are likely attributable to Wine-related interruptions. Wine itself can be tricky to set up, I am still working on getting boincmgr.exe to start correctly when the computer unexpectedly reboots (as we are subject to random power failures here).

I found it was necessary to use the most recent development Wine versions, the earlier releases didn't work on my system. (Using Ubuntu 15.04/15.10 on stock hardware.) Wine 1.9.4 was just announced, with Ubuntu PPA latest is 1.9.3.
When I nail down the magic recipe for keeping all the plates spinning, I'll post the information.



While this might actually belong under the WINE discussion, I noticed that your WINE computer (ID: 1389186) is reported as Windows 10, and since Windows 10 is new, that might be a cause of your Wine troubles. Perhaps it would be better to choose a different Windows version for WINE to present to applications.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53515 - Posted: 24 Feb 2016, 18:24:05 UTC - in response to Message 53511.  

Bernard

According to the Wiki article on WINE:
It duplicates functions of Windows by providing alternative implementations of the DLLs that Windows programs call, and a process to substitute for the Windows NT kernel. This method of duplication differs from other methods that might also be considered emulation, where Windows programs run in a virtual machine. Wine is predominantly written using black-box testing reverse-engineering, to avoid copyright issues


As none of the many versions of "Wine Windows" available are Microsoft Windows, I don't think that any comparison of problems can be made.

Although this IS about computers, so who knows.

jrapdx

Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53518 - Posted: 25 Feb 2016, 10:07:02 UTC - in response to Message 53511.  

It's not really clear what the Windows version means for Wine. I've looked at the documentation, but it's difficult to sort out how it affects execution of apps like boinc*. I could change it from Win10 but my hunch is it won't matter. At this point BOINC/CPDN are running OK. Today a task completed, and after a new one started 4 tasks are going, so I'm inclined to leave things alone for now.

The main problem I've had with Wine is reliably starting BOINC. I was trying to set it up so that if the computer reboots (we're prone to random power outages in this location), BOINC would be automatically restarted. However, despite various attempts with shell scripts, etc., it hasn't worked. The only way it does work requires manually changing to the BOINC program directory to start BOINC in a terminal. I need to learn more about the intricacies of Wine; I've hardly used it up to now.
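
For what it's worth, the manual steps (change to the BOINC directory, then start the manager) can be sketched in a few lines of Python by passing the working directory to the process launcher. The install path below is hypothetical, the helper names are made up, and this is untested as an actual reboot fix: a script run at boot may also need a desktop session for the Manager window.

```python
import subprocess
from pathlib import Path

# Hypothetical location of the Windows BOINC install inside the Wine prefix;
# adjust to wherever BOINC actually lives on your system.
BOINC_DIR = Path.home() / ".wine" / "drive_c" / "Program Files" / "BOINC"


def launch_args(boinc_dir: Path) -> dict:
    """Build the Popen arguments; cwd mirrors the manual 'cd' into the BOINC directory."""
    return {"args": ["wine", "boincmgr.exe"], "cwd": str(boinc_dir)}


def start_boinc(boinc_dir: Path = BOINC_DIR) -> subprocess.Popen:
    """Start the BOINC Manager under Wine without waiting for it to exit."""
    return subprocess.Popen(**launch_args(boinc_dir))
```

Calling `start_boinc()` from a desktop autostart entry, rather than from a boot-time script, may be one way around the missing-display problem, but that is a guess.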

BTW with the Ubuntu PPA Wine version is now 1.9.4, I updated it yesterday.
Kevin

Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 54527 - Posted: 15 Jul 2016, 15:02:32 UTC

Just lost 2 tasks, both afr50, one yesterday one today, but both in the final 15 min of processing.

A Windows error message appeared on screen saying that a program had failed, one related to the file. I closed down BOINC before dismissing the message; on restarting BOINC the message reappeared and then the task failed with a computational error.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19799983
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19810807



Kevin
Kevin

Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 54535 - Posted: 17 Jul 2016, 18:33:08 UTC - in response to Message 54527.  

Just lost 2 tasks, both afr50, one yesterday one today, but both in the final 15 min of processing.

A Windows error message appeared on screen saying that a program had failed, one related to the file. I closed down BOINC before dismissing the message; on restarting BOINC the message reappeared and then the task failed with a computational error.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19799983
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19810807


And another two.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19811590
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19813607

The error message I am getting is hadam3p_afr_7.22_windows_intelx86.exe has stopped working.

ATM the error message is still on the screen and the last work unit is showing in boinc to be 100% completed and --- time remaining but is still running.

I have now shut down boinc manager and waited until all boinc programs have closed in task manager then restarted boinc. Error message re-appeared after a few seconds and the work unit has restarted at 99.723 with 11 min remaining.

Any ideas anyone??


Kevin
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 54536 - Posted: 17 Jul 2016, 21:56:55 UTC - in response to Message 54535.  

Abort
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 54537 - Posted: 17 Jul 2016, 21:57:40 UTC - in response to Message 54535.  

Kevin wrote: And another two.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19811590
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19813607

The error message I am getting is hadam3p_afr_7.22_windows_intelx86.exe has stopped working.

ATM the error message is still on the screen and the last work unit is showing in boinc to be 100% completed and --- time remaining but is still running.

I have now shut down boinc manager and waited until all boinc programs have closed in task manager then restarted boinc. Error message re-appeared after a few seconds and the work unit has restarted at 99.723 with 11 min remaining.

Any ideas anyone??

No ideas, but same error as one of your models (AFR):

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073740791 (0xc0000409)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Leaving CPDN_Main::Monitor...

</stderr_txt>
]]>
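
A note on reading that exit code: BOINC prints it as a signed 32-bit number, and masking it to 32 bits gives the hexadecimal form shown in brackets. The sketch below only does that conversion; the function name is made up. 0xC0000409 is listed in Microsoft's NTSTATUS documentation as STATUS_STACK_BUFFER_OVERRUN, which would fit a crash inside the model executable rather than a BOINC problem, though that reading is an inference and not something the log confirms.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as the unsigned value Windows documents."""
    return code & 0xFFFFFFFF


print(hex(to_unsigned32(-1073740791)))
```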
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54542 - Posted: 17 Jul 2016, 22:28:37 UTC

I've had a couple of pop-ups about Windows having a problem.
I could still get at the BOINC menus though, so I left the message alone, and uploaded all of the files, THEN clicked on the message.
I forget what happened then. Another message?

This was on the Linux machine running Wine.

Kevin

Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 54543 - Posted: 18 Jul 2016, 0:53:40 UTC

Oh well, I messed up. I tried to delete a couple of exe files within the BOINC CPDN folder and it blew out the rest of the WUs, so I have removed CPDN and then added it back. I have picked up one afr and have set NNT; I will see how this processes. They normally run for three days on this machine.


Kevin
MartinNZ

Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 54568 - Posted: 25 Jul 2016, 9:41:08 UTC - in response to Message 54542.  

I've had a couple of pop-ups about Windows having a problem.

Funny, I had some but ignored them as there was nothing in the event log at the time. Never seen pop-ups from CPDN/BOINC before.

Over the last few days had two hadam3p_afr50 tasks go down with the same stderr message -

"The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)"
e.g. task 19819095
Any thoughts? Sounds like a model-related issue to me.

Also had a wah2 go down with "The system cannot find the drive specified. (0xf) - exit code 15 (0xf)", which of course could be PC related. Task 19795175
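
One possibility worth keeping in mind: the text in these messages may just be the generic Windows description of the numeric exit code, not a diagnosis of what the model actually did. Codes 15 and 255 correspond to the standard Win32 errors ERROR_INVALID_DRIVE and ERROR_EA_LIST_INCONSISTENT. The sketch below is a tiny lookup covering only those two codes; the table and function name are made up for illustration.

```python
# Standard Win32 system error codes for the two exit codes seen in this thread.
WIN32_ERRORS = {
    15: ("ERROR_INVALID_DRIVE", "The system cannot find the drive specified."),
    255: ("ERROR_EA_LIST_INCONSISTENT", "The extended attributes are inconsistent."),
}


def describe(exit_code: int) -> str:
    """Return the Win32 name and text for an exit code, if it is in the table."""
    name, text = WIN32_ERRORS.get(exit_code, ("UNKNOWN", "not in this small table"))
    return f"exit code {exit_code} (0x{exit_code:x}): {name} - {text}"


for code in (15, 255):
    print(describe(code))
```

If that reading is right, an exit code of 255 from the model need not have anything to do with extended attributes on the PC.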
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 54570 - Posted: 25 Jul 2016, 14:02:11 UTC - in response to Message 54568.  
Last modified: 25 Jul 2016, 14:04:07 UTC

Also had a wah2 go down with "The system cannot find the drive specified. (0xf) - exit code 15 (0xf)", which of course could be PC related. Task

I have seen the same thing, when three tasks failed upon a reboot. Two of them had that error message:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19801173
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19806053

The third one had a different error message:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19808034

In fact, a fourth one failed earlier, but had already been reported about 35 minutes before the reboot. However, it did not show up in BoincTasks History, so there may have been something strange about it.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=19810870

I am not sure of cause and effect, but I have rebooted a couple of times since without any problems, so it seems to be somehow connected to the work units themselves.
I can't figure it out beyond that.