Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
HADAM4 tasks are Linux only. And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up? |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
"HADAM4 tasks are Linux only." I am pretty certain that figure means the number of users who have returned completed tasks in the past 24 hours. |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
"HADAM4 tasks are Linux only." No, I’m here and wanting work but can’t get any :-( |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
And I must be among them, because for the first time in living memory I managed to return 2... yes, that's two successfully completed w/u in 1 day. And just to settle the nerves, I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill, unlike others that we could mention... |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
"And just to settle the nerves I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill. Unlike others that we could mention....." Yes, I lost two at some point in the past 24 hours following a hard reboot, or at least I assume that was the cause. One was a seg fault; I forget what the other error was. Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer. |
Send message Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
I got five (5) of those HadAM4h tasks, but each one is consuming ~4 GB of disk space, so together they far exceed the 10 GB max I have in BOINC settings. Seems like a bug in the task scheduler... I had to abort a few to avoid running out of disk space on the /var partition. :( |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
These tasks are BIG, as you've found. You'll need a lot more than 10 Gigs. 50-100 would be better. |
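As a rough check of the numbers in this exchange, the sketch below compares the disk space a set of tasks needs with a BOINC disk limit. The function name is made up for illustration, and the 4 GB per task is the approximate figure reported above. It is only arithmetic, not a description of how the BOINC scheduler decides what to send.

```python
def disk_shortfall_gb(n_tasks: int, gb_per_task: float, limit_gb: float) -> float:
    """Return how many GB the tasks exceed the limit by (0.0 if they fit)."""
    needed_gb = n_tasks * gb_per_task
    return max(0.0, needed_gb - limit_gb)


# Five tasks at roughly 4 GB each against a 10 GB limit (figures from the thread).
print(disk_shortfall_gb(5, 4.0, 10.0))

# The same five tasks against the lower end of the suggested 50-100 GB range.
print(disk_shortfall_gb(5, 4.0, 50.0))
```

With these figures the first call reports a shortfall and the second reports none, which matches the advice to allow far more than 10 GB.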
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I am missing out on the big hadam4h. I am running five on three machines, the last downloaded two hours ago, and just see the usual 1377 MB. If they had a way to select the big ones (right), I would take them. OK, now I see the new work unit uses 3.7 GB of disk space, not the main memory you are talking about. I have even more of that free. Send them to me. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And you should see how big the restart file is if they don't use the No setting, which was developed just for these models. |
Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225 |
I forgot myself and restarted one of my Windows machines. The WU at least started over, but now it is claiming 4 days elapsed at 0%. These poor Sandy Bridge machines (and even a few below that) are trucking along. I think some of them need to be retired soon. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that, whatever BOINC is telling you, it isn't actually running. |
Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225 |
"0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running." Sorry, it is climbing again. What I meant was that it seemed to have encountered an error that made it start over again after 4 days of work. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
"0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running." Ah, that makes more sense. If you switch off or have to reboot before a checkpoint, the task will go back to the previous checkpoint, or to the start if it hasn't reached one yet. If you highlight a task and then click on Properties, you can see the time since the last checkpoint. I looked at one just now and it was a tad over 11 minutes. On my old laptop, sitting next to the keyboard attached to my Ryzen, that would equate to well over an hour. There was one batch that went out a while ago with a much longer time between checkpoints, which I think would have been several hours even on my Ryzen. |
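The checkpoint behaviour described above can be sketched in a few lines. This is a toy model under stated assumptions, not BOINC or model code: the function names are invented, and the checkpoint spacing of 660 seconds simply mirrors the "tad over 11 minutes" mentioned in the post.

```python
from typing import Sequence


def resume_point(elapsed_s: float, checkpoints_s: Sequence[float]) -> float:
    """Return the run time (seconds) a task resumes from after an interruption.

    elapsed_s: how far the task had run when it was interrupted.
    checkpoints_s: run times at which checkpoints were written.
    """
    reached = [t for t in checkpoints_s if t <= elapsed_s]
    # If no checkpoint has been reached yet, the task restarts from the beginning.
    return max(reached, default=0.0)


def lost_work(elapsed_s: float, checkpoints_s: Sequence[float]) -> float:
    """Return the seconds of work that must be repeated after the interruption."""
    return elapsed_s - resume_point(elapsed_s, checkpoints_s)


# Assumed checkpoints every 660 s (about 11 minutes).
checkpoints = [660.0, 1320.0, 1980.0]

print(lost_work(1500.0, checkpoints))  # interrupted between checkpoints
print(lost_work(300.0, checkpoints))   # interrupted before the first checkpoint
```

In the second case everything done so far is lost, which is why a restart before the first checkpoint sends a task back to 0%.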
Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225 |
Random question, because I have forgotten over the years. Each month is a trickle, so a 12-month task would give 12 trickles in total and thus 12 bits of credit (once the credit is actually calculated in the weekly update)? Thanks! |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
"Random question because I have forgotten over the years." Yes, though the size of the bits of credit varies depending on the amount of computing that needs to go into each one. Credit is currently updated on Thursdays, but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work by the server running it, at which point we should move to daily updates. There is no news of a date for that, though; it is something he works on when he has spare work hours rather than a priority task. I am pretty sure he could have sorted it by now had it been a priority. |
Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225 |
"Random question because I have forgotten over the years." Thanks! One more question - I remember seeing something about figuring out the actual computing speed from something like t/sec, but I'm not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think. I could be mistaken, and I forget the specifics of what the T stands for. Surely not trickles. Maybe steps per second or something. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think." The speed can be represented in units of seconds per timestep (sec/TS) and, after trickles are uploaded, can be seen on each task's webpage. The lower the average sec/TS, the faster the model is running and the less CPU time a completed model will take. One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a tail -f stdout_mon.txt, and it will output to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to change to that directory and tail that file. Edit: For the same PC, the sec/TS for a given model will depend on how complex that model type is relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model, which is run at a higher resolution. |
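To make the "lower is faster" point concrete, here is a small sketch that turns an average sec/TS into an estimated total CPU time. The function name and the timestep count are assumptions chosen for illustration; real models have their own timestep totals, and nothing here reads stdout_mon.txt.

```python
def estimated_cpu_hours(sec_per_ts: float, total_timesteps: int) -> float:
    """Estimate total CPU time in hours from an average sec/TS value."""
    if sec_per_ts <= 0 or total_timesteps <= 0:
        raise ValueError("both arguments must be positive")
    return sec_per_ts * total_timesteps / 3600.0


# Hypothetical run length, used only to compare two machines on the same model.
TOTAL_TIMESTEPS = 10_000

for sec_per_ts in (20.0, 30.0):
    hours = estimated_cpu_hours(sec_per_ts, TOTAL_TIMESTEPS)
    print(f"{sec_per_ts:.0f} sec/TS -> about {hours:.1f} CPU hours")
```

For the same model, a machine averaging 20 sec/TS needs two thirds of the CPU time of one averaging 30 sec/TS, so 20 is indeed the faster figure.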
Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225 |
"One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think." Thank you so much! I'm trying to figure out if it is even worthwhile to keep running these on my struggling i7-920 and Xeon W3520. I've been trying to run them into the ground but they just won't quit. My Ryzen 3700X is absolutely flying through whatever I throw at it. I may end up going for a 5000 series of some sort. It will probably do as much work as 4 of the older i7s put together. |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
Bryn Mawr - Further to this, I have been looking through the server’s scheduler code on GitHub, and it seems to me that there are two conditions where it blocks the sending of work and logs the fact but does not appear to return an error message to the user: (1) during a work request it sets a lock file on the host id, and during the next work request it finds the lock file still exists, so it exits; (2) it receives an unrecognised code sign key. Now, obviously, I cannot check the server for an uncleared lock file, but is there any way I can change my host id, and is there any way I can resend my code sign key? |
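The first condition described above, a per-host lock file that is never cleared, can be illustrated with a generic sketch. This is not the BOINC scheduler's code: the function names, the lock file naming scheme, and the host id are all invented here, and the real server may handle locking quite differently.

```python
import os
import tempfile


def lock_path(lock_dir: str, host_id: int) -> str:
    # Hypothetical naming scheme, chosen only for this example.
    return os.path.join(lock_dir, f"host_{host_id}.lock")


def try_lock_host(lock_dir: str, host_id: int) -> bool:
    """Create the host's lock file; return False if it already exists."""
    try:
        # O_EXCL makes the creation fail if the file is already there.
        fd = os.open(lock_path(lock_dir, host_id),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock_host(lock_dir: str, host_id: int) -> None:
    """Remove the host's lock file once the request has finished."""
    os.remove(lock_path(lock_dir, host_id))


with tempfile.TemporaryDirectory() as lock_dir:
    print(try_lock_host(lock_dir, 12345))  # first request takes the lock
    print(try_lock_host(lock_dir, 12345))  # lock left behind: request refused
    unlock_host(lock_dir, 12345)
    print(try_lock_host(lock_dir, 12345))  # lock cleared: request accepted
```

The middle call shows the failure mode in question: if a lock is left behind and the refusal is only logged on the server, the client sees nothing except that no work arrives.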
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,017,270 RAC: 20,902 |
"Now, obviously, I cannot check the server for an uncleared lock file but is there any way I can change my host id and is there any way I can resend my code sign key?" Not an area I have experience in. I would try removing the project using BOINC Manager and then re-attaching, if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums. |
©2024 cpdn.org