Thread 'New work Discussion'

Author	Message
Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 63009 - Posted: 23 Nov 2020, 16:30:55 UTC - in response to Message 63007. HADAM4 tasks are Linux only. And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up? ID: 63009 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63010 - Posted: 23 Nov 2020, 16:41:04 UTC - in response to Message 63009. HADAM4 tasks are Linux only. And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up? I am pretty certain that means the number of users who have returned completed tasks in the past 24 hours. ID: 63010 ·

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 63011 - Posted: 23 Nov 2020, 23:48:28 UTC - in response to Message 63009. HADAM4 tasks are Linux only. And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up? No, I’m here and wanting work but can’t get any :-( ID: 63011 ·

nairb Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785	Message 63012 - Posted: 24 Nov 2020, 0:05:37 UTC - in response to Message 63010. Last modified: 24 Nov 2020, 0:07:48 UTC I am pretty certain that means the number of users who have returned completed tasks in the past 24 hours. And I must be among them coz for the first time in living memory I managed to return 2... yes thats two successfully completed w/u in 1 day. And just to settle the nerves I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill. Unlike others that we could mention..... ID: 63012 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63013 - Posted: 24 Nov 2020, 10:29:35 UTC - in response to Message 63012. And just to settle the nerves I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill. Unlike others that we could mention..... Yes, I lost two at some point in the past 24 hours following a hard reboot, or at least I assume that was the cause. One was a seg fault; I forget what the other error was. Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer. ID: 63013 ·

DJStarfox Send message Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370	Message 63018 - Posted: 25 Nov 2020, 13:58:45 UTC I got five (5) of those HadAM4h tasks, but each one is consuming ~4GB of disk space, which far exceeds the 10G max I have in BOINC settings. Seems like a bug in the task scheduler... I had to abort a few to avoid running out of disk space on the /var partition. :( ID: 63018 ·

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 63023 - Posted: 25 Nov 2020, 15:53:25 UTC - in response to Message 63018. These tasks are BIG, as you've found. You'll need a lot more than 10 Gigs. 50-100 would be better. ID: 63023 ·
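A rough sketch of the disk arithmetic in the two posts above, using the figures quoted there (about 4 GB per HadAM4h task, a 10 GB BOINC limit, five tasks). The function names are made up for this illustration, and the sketch ignores whatever else BOINC stores under the same limit.

```python
import math


def required_disk_gb(task_count: int, per_task_gb: float) -> float:
    """Disk space needed to hold task_count tasks at once."""
    return task_count * per_task_gb


def max_concurrent_tasks(disk_limit_gb: float, per_task_gb: float) -> int:
    """Largest number of tasks that fits under the disk limit."""
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    return math.floor(disk_limit_gb / per_task_gb)


if __name__ == "__main__":
    # Figures from the posts: five tasks at roughly 4 GB each, 10 GB allowed.
    print(required_disk_gb(5, 4.0))       # space the five tasks want
    print(max_concurrent_tasks(10, 4.0))  # tasks that fit in 10 GB
    print(max_concurrent_tasks(50, 4.0))  # tasks that fit in 50 GB
```

On these numbers five tasks want about twice the configured limit, which is consistent with the advice to allow 50-100 GB.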

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 63024 - Posted: 25 Nov 2020, 16:33:12 UTC - in response to Message 63023. Last modified: 25 Nov 2020, 16:41:07 UTC I am missing out on the big hadam4h. I am running five on three machines, the last downloaded two hours ago, and just see the usual 1377 MB. If they had a way to select the big ones (right), I would take them. OK, now I see the new work unit uses 3.7 GB disk memory, not the main memory you are talking about. I have even more of that free. Send them to me. ID: 63024 ·

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 63026 - Posted: 25 Nov 2020, 19:53:26 UTC - in response to Message 63024. And you should see how big the restart file is if they don't use the No setting, which was developed just for these models. ID: 63026 ·

wolfman1360 Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,031,919 RAC: 1,370	Message 63049 - Posted: 29 Nov 2020, 5:17:50 UTC I forgot myself and restarted one of my Windows machines. The WU at least started over but now is claiming 4 days elapsed at 0%. These poor Sandy Bridge machines (and even a few older ones) are trucking along. I think some of them need to be retired soon. ID: 63049 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63050 - Posted: 29 Nov 2020, 6:43:20 UTC - in response to Message 63049. 0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running. ID: 63050 ·

wolfman1360 Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,031,919 RAC: 1,370	Message 63051 - Posted: 29 Nov 2020, 22:14:18 UTC - in response to Message 63050. 0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running. Sorry, it is climbing again. What I meant was it seemed to have encountered an error to make it start over again after 4 days of work. ID: 63051 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63053 - Posted: 30 Nov 2020, 7:47:14 UTC - in response to Message 63051. 0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that whatever BOINC is telling you it isn't actually running. Sorry, it is climbing again. What I meant was it seemed to have encountered an error to make it start over again after 4 days of work. Ah, that makes more sense. If you switch off or have to reboot before a checkpoint, the task goes back to the previous checkpoint, or to the start if it hasn't reached one yet. If you highlight a task and then click on Properties, you can see the time since the last checkpoint. I looked at one just now and it was a tad over 11 minutes. On my old laptop sitting next to the keyboard attached to my Ryzen that would equate to well over an hour. There was one batch that went out a while ago with a much longer time between checkpoints, which I think would have been several hours even on my Ryzen. ID: 63053 ·
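The rollback behaviour described above can be sketched in a few lines. This is only an illustration: it assumes checkpoints are written at a fixed interval of model timesteps, which the thread does not confirm, and the function names and numbers are invented for the example.

```python
def resume_timestep(completed_timesteps: int, checkpoint_interval: int) -> int:
    """Timestep a task restarts from after an interruption.

    Assumes a checkpoint is written every checkpoint_interval timesteps.
    Returns 0 (the start) if no checkpoint has been reached yet.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    if completed_timesteps < 0:
        raise ValueError("completed_timesteps must not be negative")
    return (completed_timesteps // checkpoint_interval) * checkpoint_interval


def lost_timesteps(completed_timesteps: int, checkpoint_interval: int) -> int:
    """Work that has to be redone after the restart."""
    return completed_timesteps - resume_timestep(
        completed_timesteps, checkpoint_interval
    )


if __name__ == "__main__":
    # Hypothetical numbers: interrupted at timestep 950, checkpoint every 144.
    print(resume_timestep(950, 144))
    print(lost_timesteps(950, 144))
    # Interrupted before the first checkpoint: back to the start.
    print(resume_timestep(100, 144))
```

The longer the interval between checkpoints, the more work a reboot can throw away, which is why the batch with several hours between checkpoints was the painful one.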

wolfman1360 Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,031,919 RAC: 1,370	Message 63057 - Posted: 30 Nov 2020, 20:14:43 UTC Random question because I have forgotten over the years. Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)? thanks! ID: 63057 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63058 - Posted: 30 Nov 2020, 20:33:33 UTC - in response to Message 63057. Random question because I have forgotten over the years. Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)? thanks! Yes, though the size of the bits of credit varies depending on the amount of computing that goes into each one. Credit is currently updated on Thursdays, but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work from the server running it, at which point we should move to daily updates. There is no news of a date for that, though: it is something he works on when he has spare work hours rather than a priority task. I am pretty sure he could have sorted it by now had it been a priority. ID: 63058 ·

wolfman1360 Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,031,919 RAC: 1,370	Message 63059 - Posted: 30 Nov 2020, 20:36:13 UTC - in response to Message 63058. Random question because I have forgotten over the years. Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)? thanks! Yes though the size of the bits of credit varies depending on the amount of computing that needs to go into it. Credit is currently updated on Thursdays but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work by the server running it at which point we should move to daily but there is no news of a date for that to happen and it is something he works on when he has spare work hours rather than being a priority task. I am pretty sure he could have sorted it by now had that been the case. Thanks! One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think. I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something. ID: 63059 ·

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275	Message 63060 - Posted: 30 Nov 2020, 21:25:26 UTC - in response to Message 63059. Last modified: 30 Nov 2020, 21:38:16 UTC One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think. I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something. The speed can be represented in units of seconds/timestep (sec/TS) and after trickles are uploaded, can be seen on each task's webpage. The lower the average number of sec/TS, the relatively faster the model is running and the less CPU time a completed model will take. One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a tail -f stdout_mon.txt and it will output a display to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to maneuver to that directory and tail that file. Edit...For the same PC, the value of the sec/TS for a given model will be dependent on how complex one model type may be relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model which is run at a higher resolution. ID: 63060 ·
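To make the sec/TS figure concrete, here is a minimal sketch of the arithmetic. It does not parse stdout_mon.txt (the thread does not describe that file's layout); it only assumes you already know the CPU seconds used and the number of timesteps completed. The function names and sample numbers are hypothetical.

```python
def seconds_per_timestep(cpu_seconds: float, timesteps: int) -> float:
    """Average sec/TS: lower means the model is running faster."""
    if timesteps <= 0:
        raise ValueError("timesteps must be positive")
    return cpu_seconds / timesteps


def estimated_cpu_hours(sec_per_ts: float, total_timesteps: int) -> float:
    """Projected CPU time for a whole model run at a steady sec/TS."""
    return sec_per_ts * total_timesteps / 3600.0


if __name__ == "__main__":
    # Hypothetical hosts running the same model type.
    fast = seconds_per_timestep(7200, 360)   # 2 CPU hours for 360 timesteps
    slow = seconds_per_timestep(10800, 360)  # 3 CPU hours for 360 timesteps
    print(fast, slow, fast < slow)
    # Hypothetical run length of 18,000 timesteps.
    print(estimated_cpu_hours(fast, 18000))
    print(estimated_cpu_hours(slow, 18000))
```

As noted in the post, the comparison only makes sense between hosts running the same model type, since a higher-resolution model has a higher sec/TS on the same PC.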

wolfman1360 Send message Joined: 18 Feb 17 Posts: 81 Credit: 14,031,919 RAC: 1,370	Message 63094 - Posted: 4 Dec 2020, 6:58:04 UTC - in response to Message 63060. One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think. I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something. The speed can be represented in units of seconds/timestep (sec/TS) and after trickles are uploaded, can be seen on each task's webpage. The lower the average number of sec/TS, the relatively faster the model is running and the less CPU time a completed model will take. One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a tail -f stdout_mon.txt and it will output a display to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to maneuver to that directory and tail that file. Edit...For the same PC, the value of the sec/TS for a given model will be dependent on how complex one model type may be relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model which is run at a higher resolution. Thank you so much! I'm trying to figure out if it is even worthwhile to keep running these on my struggling i7-920 and Xeon W3520. I've been trying to run them into the ground but they just won't quit. My Ryzen 3700X is absolutely flying through whatever I throw at it, and I may end up going for a 5000 series of some sort. It will probably do as much work as 4 of the older i7s put together. ID: 63094 ·

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 63121 - Posted: 18 Dec 2020, 13:17:01 UTC - in response to Message 62986. Bryn Mawr - Sorry to read you can't get any work. I have 5 or 6 computers and it has worked for me every time. You have the same OS and AMD processors as me. There must be something that I haven't encountered yet. Frustrating but that’s life. Further to this, I have been looking through the server’s scheduler code on GitHub and it seems to me that there are two conditions where it blocks the sending of work and logs the fact but does not appear to return an error message to the user: (1) During a work request it sets a lock file on the host id; during the next work request it finds the lock file still exists, so it exits. (2) It receives an unrecognised code sign key. Now, obviously, I cannot check the server for an uncleared lock file, but is there any way I can change my host id, and is there any way I can resend my code sign key? ID: 63121 ·
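The first condition above, a lock file left behind by an earlier request, can be illustrated with a small sketch. This is not the BOINC scheduler's code: the file naming, the function names and the simulated crash are all assumptions made purely to show how a stale per-host lock would cause later requests to be skipped silently.

```python
import os
import tempfile


def try_lock_host(lock_dir: str, host_id: int) -> bool:
    """Create a per-host lock file; return False if one already exists."""
    path = os.path.join(lock_dir, f"host_{host_id}.lock")  # hypothetical name
    try:
        # O_EXCL makes creation fail if the file is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock_host(lock_dir: str, host_id: int) -> None:
    """Remove the per-host lock file."""
    os.remove(os.path.join(lock_dir, f"host_{host_id}.lock"))


def handle_request(lock_dir: str, host_id: int, crash: bool = False) -> str:
    """Toy work request: lock, do the work, unlock."""
    if not try_lock_host(lock_dir, host_id):
        # Logged on the server side only; the client gets no error message.
        return "skipped: lock held"
    if crash:
        # Simulated failure before cleanup: the lock file is never removed.
        return "crashed"
    unlock_host(lock_dir, host_id)
    return "work sent"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        print(handle_request(lock_dir, 42, crash=True))  # leaves a stale lock
        print(handle_request(lock_dir, 42))              # same host, blocked
        print(handle_request(lock_dir, 7))               # other host, fine
```

In this toy version the only ways out are for something on the server to remove the stale file or for the client to appear under a different host id, which is the reasoning behind the question about changing the host id.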

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 63122 - Posted: 18 Dec 2020, 14:29:29 UTC - in response to Message 63121. Now, obviously, I cannot check the server for an uncleared lock file but is there any way I can change my host id and is there any way I can resend my code sign key? Not an area I have experience in. I would try removing the project using BOINC Manager and then re-attaching, if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums. ID: 63122 ·