climateprediction.net (CPDN) home page
Thread 'No work for Linux either now.'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 52946 - Posted: 25 Nov 2015, 19:32:35 UTC

Just to let everyone know, the scientist in charge of the last batch of Linux tasks has said he does not need any more data. Keep crunching tasks already downloaded, however.
ID: 52946
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 57653 - Posted: 15 Jan 2018, 22:59:45 UTC - in response to Message 52946.  

Just to let everyone know, the scientist in charge of the last batch of Linux tasks has said he does not need any more data. Keep crunching tasks already downloaded, however.


Is he ever going to need more data, or should I just resign from climateprediction?
ID: 57653
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 57654 - Posted: 16 Jan 2018, 9:32:15 UTC

As I said in another thread, work is under way in testing to try to resolve the problem of Mac and Linux tasks crashing. At some point there are likely to be more hadcm3 tasks, but since all the work that comes out through Oxford is commissioned by scientists at universities all over the world, I don't know when that will be.

The last testing batch, which it was hoped might not crash on Linux boxes, still crashed, so it is back to the drawing board. On this machine I have resorted to using WINE, and it is running two CPDN tasks that way. I am keeping my slightly faster machine on native Linux so that it is ready should more testing batches be released.
ID: 57654
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 57805 - Posted: 19 Feb 2018, 9:04:25 UTC

Also, that big batch of work is not for Linux/Unix users.


And until the bug which causes almost everything to crash on Linux and Macs is fixed, there won't be work for those platforms. A shame, as I find using WINE a pain in the proverbial.
ID: 57805
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 57808 - Posted: 19 Feb 2018, 13:26:07 UTC - in response to Message 57805.  

And until the bug which causes almost everything to crash on Linux and Macs is fixed, there won't be work for those platforms. A shame, as I find using WINE a pain in the proverbial.


The two work-units I have received are running on my Linux machine. They have each delivered one trickle. They have each run over 121 hours of CPU time.

Two similar work units completed successfully last October.

The only "recent" work units that failed (Error while computing) were from last July. One of them failed like this, which is not a crash.

Name:                 wah2_sas50_l2nz_199512_13_617_011135004_1
Workunit:             11135004
Created:              28 Jul 2017, 16:02:10 UTC
Sent:                 28 Jul 2017, 16:02:19 UTC
Report deadline:      10 Jul 2018, 21:22:19 UTC
Received:             29 Jul 2017, 20:11:45 UTC
Server state:         Over
Outcome:              Computation error
Client state:         Compute error
Exit status:          0 (0x0)
Computer ID:          1256552
Run time:             13 hours 54 min 15 sec
CPU time:             12 hours 35 min 51 sec
Validate state:       Invalid
Credit:               0.00
Device peak FLOPS:    1.28 GFLOPS
Application version:  Weather At Home 2 (wah2) v8.25 i686-pc-linux-gnu
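
For anyone who wants to check their own task pages for the same pattern (a computation error reported together with exit status 0), here is a minimal Python sketch. The field labels are copied from the record above. The function names `parse_task_record` and `exit_code`, and the approach of matching on known labels, are my own assumptions for illustration and not anything provided by BOINC or CPDN.

```python
import re

# Field labels as they appear in the task record quoted above.
KNOWN_FIELDS = [
    "Name", "Workunit", "Created", "Sent", "Report deadline", "Received",
    "Server state", "Outcome", "Client state", "Exit status", "Computer ID",
    "Run time", "CPU time", "Validate state", "Credit",
    "Device peak FLOPS", "Application version",
]

def parse_task_record(text):
    """Turn 'Label value' (or 'Label: value') lines into a dict."""
    # Try longer labels first so that "Server state" is never mistaken for "Sent".
    labels = sorted(KNOWN_FIELDS, key=len, reverse=True)
    record = {}
    last = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        for label in labels:
            if line == label or line.startswith((label + " ", label + ":")):
                record[label] = line[len(label):].lstrip(": ").strip()
                last = label
                break
        else:
            # No label matched: treat as a continuation of the previous field,
            # e.g. the platform string under "Application version".
            if last is not None:
                record[last] = (record[last] + " " + line).strip()
    return record

def exit_code(record):
    """Return the numeric exit status, or None if it is missing."""
    match = re.match(r"-?\d+", record.get("Exit status", ""))
    return int(match.group()) if match else None

# A few lines copied from the record above.
SAMPLE = """
Server state Over
Outcome Computation error
Exit status 0 (0x0)
Credit 0.00
Application version Weather At Home 2 (wah2) v8.25
i686-pc-linux-gnu
"""

record = parse_task_record(SAMPLE)
failed_without_crash = (
    record["Outcome"] == "Computation error" and exit_code(record) == 0
)
```

With the sample lines, `failed_without_crash` comes out true, which is the case described here: the task was marked as a computation error even though the process exited with status 0.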
ID: 57808
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 57809 - Posted: 19 Feb 2018, 14:53:09 UTC

The two work-units I have received are running on my Linux machine. They have each delivered one trickle. They have each run over 121 hours of CPU time.


I note both of these are retreads. Perhaps I should have said there will be no new work for Linux/Mac until the problem is resolved, though there may be the occasional hadcm3s batch, which is not affected by the current problem with wah2 tasks.
ID: 57809
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 57854 - Posted: 24 Feb 2018, 21:51:34 UTC - in response to Message 57848.  

Jean-David Beyer

Currently there is no work for LINUX systems as all of the tasks crash. Supposedly, "they" are working on the problem although communication on this project is next to null.

I am also frustrated, as I built several LINUX systems to contribute to CPDN. I am using those for other projects now.
ID: 57854
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 57865 - Posted: 28 Feb 2018, 17:13:50 UTC - in response to Message 57854.  

Currently there is no work for LINUX systems as all of the tasks crash. Supposedly, "they" are working on the problem although communication on this project is next to null.

It would help if they just came out with a 64-bit Linux version and were done with it. But supposedly only the authors at the U.K. Meteorological Office (or some such place) can do that.

My guess is that they have had a 64-bit version sitting on the shelf for years and are surprised that no one has asked them for it yet.
ID: 57865
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 57867 - Posted: 28 Feb 2018, 17:21:09 UTC - in response to Message 57865.  

Jim1348 -

It could be "they" are working hard on resolving the LINUX issue. Or, maybe not.

It has been reported that something like 3,000,000 lines of Fortran code are involved. You can't just throw the code into a different compiler (a 64-bit version) without some serious debugging. Maybe that is not a good route to take if resources are limited.

It just would be nice to know what is going on.
ID: 57867
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 57869 - Posted: 1 Mar 2018, 9:02:17 UTC - in response to Message 57867.  

The last test batch that it was hoped would solve the problem didn't. Since then there has been the server problem. I don't know how active this work is. I do know they have just had one of the team leave to go and work at another Uni but those changes happen all the time in university life so that may not mean much.
ID: 57869
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 57870 - Posted: 1 Mar 2018, 13:49:15 UTC - in response to Message 57867.  

Jim1348 -

It could be "they" are working hard on resolving the LINUX issue. Or, maybe not.

It has been reported that something like 3,000,000 lines of Fortran code are involved. You can't just throw the code into a different compiler (a 64-bit version) without some serious debugging. Maybe that is not a good route to take if resources are limited.

It just would be nice to know what is going on.


I used to work on compilers, in particular the C compiler and optimizer for UNIX. To retarget a compiler from one machine to another is actually quite easy, because most of the compiler remains unchanged: lexical analysis, syntax analysis, building the internal program, and some optimizations are independent of the target machine. Only the code-generation (compiling to assembler level, or straight to binary) needs to be changed. And here, changing from 32-bit to 64-bit should be quite trivial (for someone used to doing it).

We even retargeted from an AT&T 32100 chipset to a SPARC and changed only the code-generator and assembly-level optimizer.
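
To make the front-end/back-end split concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the two targets (`toy32`, `toy64`), their register names, and the pseudo-assembly are assumptions, not any real instruction set, and it says nothing about how the Met Office model itself is built. The point is only that the tokenizer and parser are shared, and that retargeting touches nothing but the code generator's target table.

```python
import re

# --- Front end: shared by every target ---------------------------------
def tokenize(src):
    """Split an arithmetic expression into number and operator tokens."""
    return re.findall(r"\d+|[-+*()]", src)

def parse(tokens):
    """Recursive-descent parser producing a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = take()
        if tok == "(":
            node = expr()
            take()  # consume the closing ")"
            return node
        return ("num", int(tok))

    def term():
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    return expr()

# --- Back end: the only part that differs per target --------------------
TARGETS = {  # hypothetical machines, made up for this example
    "toy32": {"acc": "r0", "tmp": "r1", "word": 4},
    "toy64": {"acc": "x0", "tmp": "x1", "word": 8},
}
OPCODES = {"+": "add", "-": "sub", "*": "mul"}

def generate(ast, target):
    """Emit stack-style pseudo-assembly for the chosen target."""
    t = TARGETS[target]
    out = []

    def emit(node):
        if node[0] == "num":
            out.append(f"load {t['acc']}, {node[1]}")
            return
        op, left, right = node
        emit(left)                                   # left value in acc
        out.append(f"push {t['acc']}  ; {t['word']}-byte slot")
        emit(right)                                  # right value in acc
        out.append(f"move {t['tmp']}, {t['acc']}")
        out.append(f"pop {t['acc']}")                # left value back in acc
        out.append(f"{OPCODES[op]} {t['acc']}, {t['tmp']}")

    emit(ast)
    return out

tree = parse(tokenize("2+3*4"))   # parsed once
code32 = generate(tree, "toy32")  # same tree, two back ends
code64 = generate(tree, "toy64")
```

The same `tree` feeds both back ends; only the register names and slot size in the output change.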
ID: 57870
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 57871 - Posted: 1 Mar 2018, 13:58:22 UTC

I wonder if these posts could be moved to a thread under the Linux section?

While the tasks involved in re-compiling stuff may be relatively easy, my understanding is that CPDN does not have a license that allows them to do this. Even if they did have this license, it might not address the problems with Linux and Mac.

My understanding is that they still haven't managed to work out exactly what the problem is. With the last test batches they hoped they had isolated it, but clearly not.
ID: 57871
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 57877 - Posted: 3 Mar 2018, 18:22:40 UTC - in response to Message 57870.  
Last modified: 3 Mar 2018, 18:54:17 UTC

We even retargeted from an AT&T 32100 chipset to a SPARC and changed only the code-generator and assembly-level optimizer.

Good grief. I was involved in that when I worked for AT&T Bell Labs back in the good old days (not as an engineer though). I recall the discussions of CISC versus RISC, though they were somewhat beyond me.

But one of my practical concerns is that if you have to run Windows in order to do CPDN, there aren't that many good backup projects that run efficiently on Windows. That is why I have converted all of my dedicated machines to Linux, leaving me only my main PC on Windows. Then, if CPDN does not have work, I need to do something else. And that machine needs to be rebooted more often, which sometimes doesn't work on CPDN, though it is a lot better now. Of course, there are a lot more Windows machines around, so it may not matter much in the big picture, but if they need more crunching, Linux is the place to get it.

PS - Yes, by all means move the discussion if possible. It is a bit OT by now.
ID: 57877
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 57985 - Posted: 25 Mar 2018, 21:15:23 UTC - in response to Message 57877.  

Oh, well; I am getting a lot of World Community Grid work done.
ID: 57985
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 57987 - Posted: 26 Mar 2018, 15:20:44 UTC - in response to Message 57985.  

Oh, well; I am getting a lot of World Community Grid work done.

I am unfortunately in the same boat. I need to reboot my only Windows machine fairly often, and then lose the wah2 in progress. I have been getting the "fortran" error messages recently.
ID: 57987
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 57989 - Posted: 26 Mar 2018, 15:47:35 UTC - in response to Message 57987.  

What type of Windows are you running on that machine? I reboot my 3 Windows machines, all running Win7, regularly without losing WUs in progress. Do you suspend (then wait a minute or two) and then exit BOINC (again waiting a minute or two) before you reboot?
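
The suspend-then-exit routine can also be scripted. The following is a minimal Python sketch, assuming `boinccmd` (the command-line tool shipped with the BOINC client) is on the PATH and can reach the local client. The function name `stop_boinc_cleanly` and the 90-second default wait are my own choices, not anything recommended by the project.

```python
import subprocess
import time

def stop_boinc_cleanly(wait_seconds=90, run=subprocess.run, sleep=time.sleep,
                       boinccmd="boinccmd"):
    """Suspend computation, wait, then ask the BOINC client to exit.

    `run` and `sleep` are parameters only so the function can be tried
    out without a BOINC installation.
    """
    commands = [
        [boinccmd, "--set_run_mode", "never"],  # suspend all computation
        [boinccmd, "--quit"],                   # tell the client to exit
    ]
    for cmd in commands:
        run(cmd, check=True)
        sleep(wait_seconds)  # give the models time to finish writing to disk
    return commands

# Typical use before a reboot (not executed here):
#     stop_boinc_cleanly()
```

As far as I know, a run mode set this way stays in force until it is changed, so after the reboot computation may need to be resumed from the BOINC Manager or with `boinccmd --set_run_mode auto`.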
ID: 57989
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 57990 - Posted: 26 Mar 2018, 16:49:43 UTC - in response to Message 57989.  
Last modified: 26 Mar 2018, 16:53:23 UTC

I run Win7 64-bit on an i7-4771, and do not suspend before rebooting. Usually, there is no problem, but in the last batches (wah2_sam25 and _pn25) they have been erroring out.
ID: 57990
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 57991 - Posted: 26 Mar 2018, 17:45:04 UTC - in response to Message 57990.  

You've been lucky. Windows doesn't check for data transfers, so there is no guarantee that everything is saved. The error occurs during restart, when the restart set has date/time mismatch.

I get caught when Win10 restarts after updates and I'm not around to manage this new manifestation of corporate arrogance. (As far as M$ is concerned, only keyboard and/or mouse activity indicate 'activity'.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 57991
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 57992 - Posted: 26 Mar 2018, 18:16:14 UTC - in response to Message 57991.  
Last modified: 26 Mar 2018, 18:32:34 UTC

I get caught when Win10 restarts after updates and I'm not around to manage this new manifestation of corporate arrogance.

I briefly considered installing Win10 on my Ryzen+ build later this year, but quickly drew back from such insanity.

EDIT: I should point out that I use Win10 on a laptop and a second machine, but for a cruncher build where I need to control everything from drivers to reboots, it is a non-starter for me. And I don't even have control of the base OS. MS can change it out at any time and still call it "Win 10". What will they think of next?
ID: 57992
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 57993 - Posted: 26 Mar 2018, 18:59:29 UTC

Getting a little off the subject of this thread, but -

On my Windows 10 Home Version I can stop all updates and re-boots by -

Control Panel -> Administrative Tools -> Services
Scroll down to Windows Update Service
Stop it.
Disable it.

Now, no Windows 10 updates and re-boots until you reverse the process.
Remember to do this periodically.
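
The same steps can be run from a script instead of the Services panel. This is a rough Python sketch, assuming an elevated (administrator) prompt and that the Windows Update service has the internal name `wuauserv`. Restoring the start type to `demand` (Manual) when re-enabling is my assumption about the usual setting, so check what your machine had before changing it. The function names are made up for the example.

```python
import subprocess

SERVICE = "wuauserv"  # internal name of the "Windows Update" service

def update_service_commands(disable=True):
    """Return the `sc` commands that stop and disable, or re-enable, the service."""
    if disable:
        return [
            ["sc", "stop", SERVICE],
            ["sc", "config", SERVICE, "start=", "disabled"],
        ]
    return [
        ["sc", "config", SERVICE, "start=", "demand"],  # assumed original setting
        ["sc", "start", SERVICE],
    ]

def apply(commands, run=subprocess.run):
    """Run each command in order; needs administrator rights on Windows."""
    for cmd in commands:
        # check=False: `sc stop` reports an error if the service is already stopped
        run(cmd, check=False)

# Stop and disable updates (not executed here):
#     apply(update_service_commands(disable=True))
# Reverse the process later:
#     apply(update_service_commands(disable=False))
```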
ID: 57993