climateprediction.net (CPDN) home page
Thread 'Error while computing???'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,007,330
RAC: 21,449
Message 58502 - Posted: 1 Aug 2018, 12:46:55 UTC - in response to Message 58501.  

It has been noted that they have a high failure rate. My looking at the work units suggests failure rate is higher on AMD than Intel but there seem to be some running well past an hour on both. Will wait and see if Sihan comes up with any reason for the high failure rate.
ID: 58502
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,750,791
RAC: 3,898
Message 58504 - Posted: 1 Aug 2018, 13:28:45 UTC

Twelve work units without any problems so far on an Intel CPU.
ID: 58504
flashawk
Joined: 29 Jun 12
Posts: 31
Credit: 1,438,478
RAC: 0
Message 58506 - Posted: 1 Aug 2018, 13:50:05 UTC - in response to Message 58502.  

My looking at the work units suggests failure rate is higher on AMD than Intel



That's one thing we could have done without; an unscientific opinion like that is going to get all the fanbois stirred up and start flame wars.
ID: 58506
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,007,330
RAC: 21,449
Message 58508 - Posted: 1 Aug 2018, 14:03:07 UTC - in response to Message 58506.  
Last modified: 1 Aug 2018, 14:33:16 UTC

fanbois?

Surely the ones that would get upset over this are all on super quiet fanless liquid cooling?

(I have used both so can get upset on either account :) )
ID: 58508
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58511 - Posted: 1 Aug 2018, 15:18:39 UTC - in response to Message 58506.  

an unscientific opinion

I said that first about AMDs, and it's NOT an opinion, it's an observation, based on the 2 failures that I received a few hours ago.

Since then, more people have posted, and there's lots of Intels as well now.

Whatever the reason, a failure is a failure, and will gather no credit.
ID: 58511
flashawk
Joined: 29 Jun 12
Posts: 31
Credit: 1,438,478
RAC: 0
Message 58513 - Posted: 1 Aug 2018, 16:21:57 UTC - in response to Message 58511.  

Your statement doesn't make any sense at all to me. You said what first? You had 2 failures on your computers? I'm sorry, I don't understand. To me, these failures don't seem hardware related at all, and I'll try to do better with using the correct terminology next time.
ID: 58513
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 58625 - Posted: 17 Aug 2018, 3:37:06 UTC - in response to Message 58513.  

Just about all the wah2 tasks issued between 1-4 Aug failed - about 100 of them. Ones since then seem to be OK. Computer 1415561 (Ignore the abandoned ones - that's another thing.)
ID: 58625
Bjarke
Joined: 23 Aug 06
Posts: 6
Credit: 5,365,473
RAC: 0
Message 59105 - Posted: 27 Nov 2018, 11:09:22 UTC
Last modified: 27 Nov 2018, 12:00:43 UTC

Several failures within minutes after downloading. All wah2_safr50 tasks:

Task 21397321
Task 21390625
Task 21395215

And I get a Fortran error appearing as a warning box/window outside BOINC. The example below is for Task 21393668:

Intel(r) Visual Fortran run-time error
forrtl: severe (19): invalid reference to variable in NAMELIST input, unit 4, file C:\ProgramData\BOINC\projects\climateprediction.net\wah2_safr50_b2nw_199912_16_774_011679367\jobs\xadae.stashc, line 60, position 13
Image              PC        Routine  Line     Source
wah2am3m2_um_8.24  016B32AA  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  0165FC90  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  0165EESA  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  01641DEA  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  014F999F  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  0155F43F  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  016112E8  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  013497F4  Unknown  Unknown  Unknown
wah2am3m2_um_8.24  016989FF  Unknown  Unknown  Unknown
KERNEL32.DLL       745462C4  Unknown  Unknown  Unknown
ntdll.dll          77971F69  Unknown  Unknown  Unknown
ntdll.dll          77971F34  Unknown  Unknown  Unknown
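
For anyone collecting several of these dialogs, the first line of the run-time message carries the useful fields: severity, error number, unit, file, line and position. A minimal Python sketch for pulling them out could look like this. The function name `parse_forrtl` and the regular expression are illustrative assumptions based only on the message quoted in this post, not on any official format description, and the path in the sample is shortened to `C:\...` for readability.

```python
import re

# Assumed layout, inferred from the single message quoted in this thread.
FORRTL_RE = re.compile(
    r"forrtl:\s*(?P<level>\w+)\s*\((?P<code>\d+)\):\s*(?P<text>.+?),"
    r"\s*unit\s+(?P<unit>\d+),\s*file\s+(?P<file>.+?),"
    r"\s*line\s+(?P<line>\d+),\s*position\s+(?P<position>\d+)"
)


def parse_forrtl(message: str) -> dict:
    """Return the fields of a forrtl run-time message, or {} if it does not match."""
    # The dialog wraps long lines, so collapse all whitespace runs first.
    flat = " ".join(message.split())
    match = FORRTL_RE.search(flat)
    if match is None:
        return {}
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    for key in ("code", "unit", "line", "position"):
        fields[key] = int(fields[key])
    return fields


# Sample text taken from the post above (path shortened, wrapped on purpose).
sample = r"""forrtl: severe (19): invalid reference to variable in NAMELIST input, unit
4, file C:\...\jobs\xadae.stashc, line 60, position 13"""

info = parse_forrtl(sample)
print(info["level"], info["code"], info["line"], info["position"])
```

The line and position point into the `xadae.stashc` namelist file, so that is the place to look when comparing a failing task with a working one.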

Event log example for tasks 21397321, 21390625 and 21395215.


27/11/2018 11:46:10 | climateprediction.net | Finished download of atmos_restart_batch_741_safr50_a0sl_2004-12-01.gz
27/11/2018 11:46:10 | climateprediction.net | Started download of ic19611201_16_N96.gz
27/11/2018 11:46:11 | climateprediction.net | Finished download of ic19611201_16_N96.gz
27/11/2018 11:46:11 | climateprediction.net | Started download of final_ancil_2year_OSTIA_sst_2004-12-01_2006-12-30.gz
27/11/2018 11:46:13 | climateprediction.net | Finished download of region_restart_batch_741_safr50_a0sl_2004-12-01.gz
27/11/2018 11:46:13 | climateprediction.net | Started download of final_ancil_2year_OSTIA_ice_2004-12-01_2006-12-30.gz
27/11/2018 11:46:14 | climateprediction.net | Finished download of final_ancil_2year_OSTIA_ice_2004-12-01_2006-12-30.gz
27/11/2018 11:46:14 | climateprediction.net | Started download of so2dms_rcp45_N96_1999_2010.gz
27/11/2018 11:46:14 | climateprediction.net | Computation for task wah2_safr50_b0bx_198712_16_774_011676344_0 finished
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_1.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_2.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_3.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_4.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_5.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_6.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_7.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_8.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_9.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_10.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_11.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_12.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_13.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_14.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_15.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_16.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent
27/11/2018 11:46:14 | climateprediction.net | Output file wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_restart.zip for task wah2_safr50_b0bx_198712_16_774_011676344_0 absent

27/11/2018 11:46:15 | climateprediction.net | Finished download of final_ancil_2year_OSTIA_sst_2004-12-01_2006-12-30.gz
27/11/2018 11:46:15 | climateprediction.net | Started download of ozone_rcp45_N96_1999_2010v2.gz
27/11/2018 11:46:16 | climateprediction.net | Finished download of ozone_rcp45_N96_1999_2010v2.gz
27/11/2018 11:46:16 | climateprediction.net | Started upload of wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_out.zip
27/11/2018 11:46:16 | climateprediction.net | Computation for task wah2_safr50_b5gj_201312_16_774_011682990_0 finished
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_1.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_2.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_3.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_4.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_5.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_6.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_7.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_8.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_9.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_10.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_11.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_12.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_13.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_14.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_15.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_16.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent
27/11/2018 11:46:16 | climateprediction.net | Output file wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_restart.zip for task wah2_safr50_b5gj_201312_16_774_011682990_0 absent

27/11/2018 11:46:17 | climateprediction.net | Finished upload of wah2_safr50_b0bx_198712_16_774_011676344_0_r1505665212_out.zip
27/11/2018 11:46:18 | climateprediction.net | Finished download of so2dms_rcp45_N96_1999_2010.gz
27/11/2018 11:46:18 | climateprediction.net | Started upload of wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_out.zip
27/11/2018 11:46:19 | climateprediction.net | Finished upload of wah2_safr50_b5gj_201312_16_774_011682990_0_r162766967_out.zip
27/11/2018 11:46:20 | climateprediction.net | Starting task wah2_safr50_b3ul_200412_16_774_011680904_0

27/11/2018 11:48:25 | climateprediction.net | Computation for task wah2_safr50_b3ul_200412_16_774_011680904_0 finished
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_1.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_2.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_3.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_4.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_5.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_6.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_7.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_8.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_9.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_10.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_11.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_12.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_13.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_14.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_15.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_16.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent
27/11/2018 11:48:25 | climateprediction.net | Output file wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_restart.zip for task wah2_safr50_b3ul_200412_16_774_011680904_0 absent

27/11/2018 11:48:27 | climateprediction.net | Started upload of wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_out.zip
27/11/2018 11:48:29 | climateprediction.net | Finished upload of wah2_safr50_b3ul_200412_16_774_011680904_0_r381998839_out.zip
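
A log like this is easier to read once it is summarised per task. The sketch below is a rough Python example, assuming the `timestamp | project | message` layout seen in the lines above. It counts the "absent" output files for each task and measures the time between "Starting task" and "Computation ... finished". The function name `summarise` and the regular expressions are assumptions made for this illustration and are not part of BOINC.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the event log lines in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| (?P<project>[^|]+?) \| (?P<msg>.*)$"
)
ABSENT_RE = re.compile(r"^Output file \S+ for task (?P<task>\S+) absent$")
START_RE = re.compile(r"^Starting task (?P<task>\S+)$")
DONE_RE = re.compile(r"^Computation for task (?P<task>\S+) finished$")


def summarise(log_text: str):
    """Return (absent-file count per task, seconds from start to finish per task)."""
    absent = Counter()
    started = {}
    elapsed = {}
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if line is None:
            continue  # blank or unrecognised line
        when = datetime.strptime(line["ts"], "%d/%m/%Y %H:%M:%S")
        msg = line["msg"].strip()
        if (m := ABSENT_RE.match(msg)):
            absent[m["task"]] += 1
        elif (m := START_RE.match(msg)):
            started[m["task"]] = when
        elif (m := DONE_RE.match(msg)) and m["task"] in started:
            elapsed[m["task"]] = (when - started[m["task"]]).total_seconds()
    return absent, elapsed


TASK = "wah2_safr50_b3ul_200412_16_774_011680904_0"
sample_log = f"""
27/11/2018 11:46:20 | climateprediction.net | Starting task {TASK}
27/11/2018 11:48:25 | climateprediction.net | Computation for task {TASK} finished
27/11/2018 11:48:25 | climateprediction.net | Output file {TASK}_r381998839_1.zip for task {TASK} absent
27/11/2018 11:48:25 | climateprediction.net | Output file {TASK}_r381998839_2.zip for task {TASK} absent
"""

absent_counts, run_seconds = summarise(sample_log)
print(absent_counts[TASK], run_seconds[TASK])
```

For the one task whose start is visible in the log above, the two timestamps are 11:46:20 and 11:48:25, so it ran for about two minutes before finishing with no output files.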
ID: 59105
The Teitschs
Joined: 22 Aug 06
Posts: 1
Credit: 832,463
RAC: 0
Message 59110 - Posted: 27 Nov 2018, 14:46:06 UTC - in response to Message 59105.  
Last modified: 27 Nov 2018, 15:00:54 UTC

I started getting those this morning when I started getting work units. Going to try the settings mentioned above.

I changed the settings as recommended above. This cut the number of active tasks in half and the error message went away.
ID: 59110
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59112 - Posted: 27 Nov 2018, 15:43:04 UTC - in response to Message 59110.  

If they are batch 744, everyone is getting the run-time errors. The whole batch appears to be bad. See the “new Work” thread.
ID: 59112
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,975,898
RAC: 14,500
Message 59225 - Posted: 22 Dec 2018, 23:20:59 UTC

Getting a compute error on batch 771 model with the following error:



<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073740791 (0xc0000409)</message>
<stderr_txt>
BOINC...

Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Processing restart Year 1910 Month 12 Day 1
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...

It had previously failed on another machine with the same error code after the trickle at timestep 259272.
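
The two numbers in that message are the same value written two ways: the decimal form reads as a signed 32-bit integer, and the hexadecimal form is its unsigned reinterpretation. A short Python sketch of the conversion follows. The helper name `to_ntstatus` and the small lookup table `KNOWN_STATUS` are made up for this example and cover only two well-known Windows status codes. What in the model triggered the error cannot be told from this output.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status value."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Hand-picked subset of Windows NTSTATUS names, for illustration only.
KNOWN_STATUS = {
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

code = -1073740791  # value reported in the message above
unsigned = code & 0xFFFFFFFF
print(to_ntstatus(code), KNOWN_STATUS.get(unsigned, "unknown"))
```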
ID: 59225
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59229 - Posted: 24 Dec 2018, 16:46:32 UTC

I just got a work unit:

https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11667216

It has used 11 hours 27 minutes of CPU time so far since it started. 334 hours predicted for it to finish.

You can see that two others attempted this work unit, used up a lot of machine time, and then failed. But notice that the reported run time is fantastically longer than the CPU time. I do not know what run time measures; wall-clock time perhaps?
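
If run time is indeed wall-clock time, as guessed here, then a process that mostly waits will show a large run time and a tiny CPU time. The Python sketch below illustrates the general difference between the two clocks. It says nothing about how BOINC itself computes either column, and the helper name `measure` is invented for the example.

```python
import time


def measure(fn):
    """Run fn once and return (wall-clock seconds, CPU seconds) it took."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process only
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


# Sleeping uses almost no CPU, so wall time grows while CPU time barely moves.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```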

Those work units that I do get (very, very few, since I run Linux) seem to all be of this type: failures. And they usually complete just fine on my machine, a Dell T7600 with a 4-core 64-bit Xeon processor, 8 GBytes RAM, running Red Hat Enterprise Linux Server release 6.10 (Santiago). It makes me wonder if the other users run on unreliable hardware, or unreliable software. My machines usually run 24/7 except when I reboot after installing a new kernel.
ID: 59229
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59230 - Posted: 24 Dec 2018, 18:39:00 UTC - in response to Message 59229.  
Last modified: 24 Dec 2018, 18:39:18 UTC

It makes me wonder if the other users run on unreliable hardware, or unreliable software. My machines usually run 24/7 except when I reboot after installing a new kernel.

I think a lot of people use laptops, and they are constantly shutting them down or allowing them to go into sleep mode.
That kills the CPDN work units after too many times. The project should really ban machines that fail too often.
ID: 59230
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59231 - Posted: 24 Dec 2018, 20:16:39 UTC

You can see that two others attempted this work unit, used up a lot of machine time, and then failed.


That model is what the researchers are looking for: one whose starting parameters eventually lead to unrealistic physics.
So now they know.

Which is what the error message: ATM_DYN : INVALID THETA DETECTED. means.
I think that the first part is an abbreviation of: Atmospheric Dynamics.

So you'll probably also get that error, which will be "proof of the pudding".
ID: 59231
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59234 - Posted: 24 Dec 2018, 22:19:56 UTC - in response to Message 59231.  

That model is what the researchers are looking for: one whose starting parameters eventually lead to unrealistic physics.
So now they know.

Which is what the error message: ATM_DYN : INVALID THETA DETECTED. means.
I think that the first part is an abbreviation of: Atmospheric Dynamics.

So you'll probably also get that error, which will be "proof of the pudding".


In this case, you are probably right. On the other hand, many of the work units I have gotten in the last year or so (not a lot of them) also failed for one or two other users, and completed successfully for me.

Some of them died due to missing libraries (usually died very fast). Others died later because of missing trickle files (I think).

In the case of this work unit, do they really need it to fail on three different machines? I will let it run, but ...
ID: 59234
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59235 - Posted: 25 Dec 2018, 0:53:34 UTC

Years ago, a test was run with the same starting values on different computers.
It showed that there were slight differences between the results, enough to make some of the tests appear to have different starting values.

And this IS research. Perhaps your computer is just slightly different in a way that will mean that it WON'T fail.
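
A toy illustration of why identical starting values can still drift apart: floating-point addition depends on the order of operations, and a chaotic system amplifies even tiny differences. The Python sketch below uses the logistic map as a stand-in. It is not the climate model, and the names `trajectory` and `max_gap`, the starting value and the size of the perturbation are all chosen just for this example.

```python
# Same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # the two sums differ in the last bit


def trajectory(x, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) and return every value visited."""
    values = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


# Two runs whose starting values differ by one part in a trillion.
run_a = trajectory(0.3, 100)
run_b = trajectory(0.3 + 1e-12, 100)

# Largest gap between the two runs at any step.
max_gap = max(abs(a - b) for a, b in zip(run_a, run_b))
print(f"largest gap between the two runs: {max_gap:.3g}")
```

Different CPUs, compilers or math libraries can produce last-bit differences of the first kind, which a long simulation can then magnify in the second way.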
ID: 59235
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59236 - Posted: 25 Dec 2018, 4:55:46 UTC - in response to Message 59234.  

In the case of this work unit, do they really need it to fail on three different machines? I will let it run, but ...


I know that I have successfully completed a number of _2 WU’s that had failed on 2 other machines.
ID: 59236
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59238 - Posted: 25 Dec 2018, 12:51:21 UTC - in response to Message 59235.  

this IS research. Perhaps your computer is just slightly different in a way that will mean that it WON'T fail.


Well, it has just sent a trickle.

And unless this has been fixed (not likely), it will be the only trickle.

Work unit has used 31 hours 43 minutes so far and predicts 312 hours 19 hours to go.
ID: 59238
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59239 - Posted: 25 Dec 2018, 20:46:41 UTC - in response to Message 59238.  

predicts 312 hours 19 hours to go.


OOPS!

predicts 312 hours 19 minutes to go.
ID: 59239
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59242 - Posted: 26 Dec 2018, 13:30:00 UTC

What I find interesting about this work unit

hadcm3s_st249_190012_120_771_011667216

is the large amount of Run Time required (149,672.31 and 138,538.43 seconds) to get 30 to 60 seconds of CPU time. This is on two different machines with different CPUs, both running 64-bit Windows 10. What are they spending that time on without using a CPU?

On my machine with this same work unit, I already have over 56 hours of CPU time, have uploaded a trickle, and it is still running with 283 hours predicted to go.
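
To put the figures for the two failed attempts in proportion, here is a quick Python calculation using only the numbers quoted above, taking 60 seconds as the upper end of the CPU time. The helper name `cpu_fraction` is made up for the example.

```python
def cpu_fraction(cpu_seconds: float, run_seconds: float) -> float:
    """Share of the reported run time that was actually spent on the CPU."""
    return cpu_seconds / run_seconds


run_times = (149_672.31, 138_538.43)  # reported run times in seconds
cpu_upper = 60.0                      # upper end of the 30 to 60 s CPU time

for run in run_times:
    hours = run / 3600
    share = cpu_fraction(cpu_upper, run)
    print(f"{hours:.1f} h run time, CPU share at most {share:.4%}")
```

So each of those attempts sat for well over a day and a half of run time while using the CPU for less than a tenth of a percent of it.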
ID: 59242

©2024 cpdn.org