Questions and Answers : Unix/Linux : No work for Linux either now.
Author | Message |
---|---|
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Just to let everyone know: the scientist in charge of the last batch of Linux tasks has said he does not need any more data. Keep crunching tasks already downloaded, however. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Just to let everyone know the scientist in charge of the last batch of linux tasks has said he does not need any more data. Keep crunching tasks already downloaded however." Is he ever going to need more data, or should I just resign from climateprediction? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
As I said in another thread, there is work happening in testing to try to resolve the problem with Mac and Linux tasks crashing. At some point there are likely to be some more of the hadcm3 tasks, but as all the work that comes out through Oxford is commissioned by scientists from universities all over the world, I don't know when that will be. The last testing batch, which it was hoped might not crash on Linux boxes, still crashed, so it is back to the drawing board. On this machine I have resorted to using WINE, and it is running two tasks from CPDN that way. I am keeping my slightly faster machine on native Linux so that it is ready should more testing batches be released. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Also, that big batch of work is not for Linux/Unix users. And until the bug which causes almost everything to crash on Linux and Macs is fixed, there won't be work for those platforms. A shame, as I find using WINE a pain in the proverbial. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"And until the bug which causes almost everything to crash on Linux and Mac's there won't be work for those platforms. A shame as I find using WINE a pain the proverbial." The two work-units I have received are running on my Linux machine. They have each delivered one trickle. They have each run over 121 hours of CPU time. Two similar work units completed successfully last October. The only "recent" work units that failed (Error while computing) were from last July. One of them failed like this, which is not a crash. Name: wah2_sas50_l2nz_199512_13_617_011135004_1; Workunit: 11135004; Created: 28 Jul 2017, 16:02:10 UTC; Sent: 28 Jul 2017, 16:02:19 UTC; Report deadline: 10 Jul 2018, 21:22:19 UTC; Received: 29 Jul 2017, 20:11:45 UTC; Server state: Over; Outcome: Computation error; Client state: Compute error; Exit status: 0 (0x0); Computer ID: 1256552; Run time: 13 hours 54 min 15 sec; CPU time: 12 hours 35 min 51 sec; Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 1.28 GFLOPS; Application version: Weather At Home 2 (wah2) v8.25 i686-pc-linux-gnu |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"The two work-units I have received are running on my Linux machine. They have each delivered one trickle. They have each run over 121 hours of CPU time." I note both of these are retreads. Perhaps I should have said there will be no new work for Linux/Mac till the problem is resolved, though it is possible there may be the occasional hadcm3s batch, which is not affected by the current problem affecting wah2 tasks. |
Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Jean-David Beyer - Currently there is no work for LINUX systems as all of the tasks crash. Supposedly, "they" are working on the problem although communication on this project is next to null. I am also frustrated, as I built several LINUX systems to contribute to CPDN. I am using those for other projects now. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"Currently there is no work for LINUX systems as all of the tasks crash. Supposedly, 'they' are working on the problem although communication on this project is next to null." It would help if they just came out with a 64-bit Linux version and were done with it. But supposedly only the authors at the U.K. Meteorological Office (or some such place) can do that. My guess is that they have had a 64-bit version sitting on the shelf for years and are surprised that no one has asked them for it yet. |
Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Jim1348 - It could be "they" are working hard on resolving the LINUX issue. Or, maybe not. It has been reported there are something like 3,000,000 lines of Fortran code involved. You can't just throw the code into a different compiler (a 64-bit version) without some serious debugging. Maybe that is not a good route to take if resources are limited. It just would be nice to know what is going on. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The last test batch that it was hoped would solve the problem didn't. Since then there has been the server problem. I don't know how active this work is. I do know they have just had one of the team leave to go and work at another Uni but those changes happen all the time in university life so that may not mean much. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Jim1348 - I used to work on compilers, in particular the C compiler and optimizer for UNIX. To retarget a compiler from one machine to another is actually quite easy, because most of the compiler remains unchanged: lexical analysis, syntax analysis, building the internal program, and some optimizations are independent of the target machine. Only the code-generation (compiling to assembler level, or straight to binary) needs to be changed. And here, changing from 32-bit to 64-bit should be quite trivial (for someone used to doing it). We even retargeted from an AT&T 32100 chipset to a SPARC and changed only the code-generator and assembly-level optimizer. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I wonder if these posts could be moved to a thread under the Linux section? While the tasks involved in re-compiling stuff may be relatively easy, CPDN does not have a license that enables them to do this, or at least that is my understanding. Even if they did have this license, re-compiling may or may not address the problems with Linux and Mac. My understanding is that they still haven't managed to work out exactly what the problem is. With the last test batches, they hoped they had isolated the problem, but clearly they had not. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"We even retargeted from an AT&T 32100 chipset to a SPARC and changed only the code-generator and assembly-level optimizer." Good grief. I was involved in that when I worked for AT&T Bell Labs back in the good old days (not as an engineer though). I recall the discussions of CISC versus RISC, though they were somewhat beyond me. But one of my practical concerns is that if you have to run Windows in order to do CPDN, there aren't that many good backup projects that run efficiently on Windows. That is why I have converted all of my dedicated machines to Linux, leaving me only my main PC on Windows. Then, if CPDN does not have work, I need to do something else. And that machine needs to be rebooted more often, which sometimes doesn't work on CPDN, though it is a lot better now. Of course, there are a lot more Windows machines around, so it may not matter much in the big picture, but if they need more crunching, Linux is the place to get it. PS - Yes, by all means move the discussion if possible. It is a bit OT by now. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Oh, well; I am getting a lot of World Community Grid work done. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"Oh, well; I am getting a lot of World Community Grid work done." I am unfortunately in the same boat. I need to reboot my only Windows machine fairly often, and then lose the wah2 in progress. I have been getting the "fortran" error messages recently. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
What type of Windows are you running on that machine? I reboot my 3 Windows machines, all running Win7, regularly without losing WUs in progress. Do you suspend (then wait for a minute or two) and then exit BOINC (and again wait for a minute or two) before you reboot? |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I run Win7 64-bit on an i7-4771, and do not suspend before rebooting. Usually, there is no problem, but in the last batches (wah2_sam25 and _pn25) they have been erroring out. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
You've been lucky. Windows doesn't check for data transfers, so there is no guarantee that everything is saved. The error occurs during restart, when the restart set has a date/time mismatch. I get caught when Win10 restarts after updates and I'm not around to manage this new manifestation of corporate arrogance. (As far as M$ is concerned, only keyboard and/or mouse activity indicates 'activity'.) "We have met the enemy and he is us." -- Pogo. Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"I get caught when Win10 restarts after updates and I'm not around to manage this new manifestation of corporate arrogance." I briefly considered installing Win10 on my Ryzen+ build later this year, but quickly drew back from such insanity. EDIT: I should point out that I use Win10 on a laptop and a second machine, but for a cruncher build where I need to control everything from drivers to reboots, it is a non-starter for me. And I don't even have control of the base OS. MS can change it out at any time and still call it "Win 10". What will they think of next? |
Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Getting a little off the subject of this thread, but on my Windows 10 Home Version I can stop all updates and re-boots as follows: go to Control Panel -> Administrative Tools -> Services; scroll down to the Windows Update service; stop it; disable it. Now there are no Windows 10 updates and re-boots until you reverse the process. Remember to do this periodically. |
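
For anyone who would rather script the Services steps described in the last post than click through them, here is a minimal Python sketch. It is not something the poster supplied. It assumes the Windows Update service has the internal name `wuauserv`, that the built-in `sc` tool is on the path, and that the script runs from an administrator prompt. The function names `build_commands` and `apply` are made up for illustration. By default it only prints the commands (a dry run), so nothing changes unless `--run` is passed.

```python
import subprocess
import sys

# Assumption: "wuauserv" is the internal name of the Windows Update service.
SERVICE_NAME = "wuauserv"


def build_commands(service=SERVICE_NAME, enable=False):
    """Return the sc command lines that disable (or re-enable) a service."""
    if enable:
        # Reverse the process: back to manual ("demand") start, then start it.
        return [
            ["sc", "config", service, "start=", "demand"],
            ["sc", "start", service],
        ]
    # Same order as in the post: stop the service, then disable it.
    return [
        ["sc", "stop", service],
        ["sc", "config", service, "start=", "disabled"],
    ]


def apply(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=False: "sc stop" reports an error if already stopped.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    apply(
        build_commands(enable="--enable" in sys.argv),
        dry_run="--run" not in sys.argv,
    )
```

Run with `--run` to apply the change and with `--enable --run` to reverse it. As the post says, the setting may not stick, so it is worth re-checking periodically.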