climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63009 - Posted: 23 Nov 2020, 16:30:55 UTC - in response to Message 63007.  

HADAM4 tasks are Linux only.

And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up?
ID: 63009 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63010 - Posted: 23 Nov 2020, 16:41:04 UTC - in response to Message 63009.  

HADAM4 tasks are Linux only.

And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up?


I am pretty certain that means the number of users who have returned completed tasks in the past 24 hours.
ID: 63010 · Report as offensive
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 63011 - Posted: 23 Nov 2020, 23:48:28 UTC - in response to Message 63009.  

HADAM4 tasks are Linux only.

And there are only 20 Linux users for them. That seems a bit low. Maybe they have given up?


No, I’m here and wanting work but can’t get any :-(
ID: 63011 · Report as offensive
nairb

Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 63012 - Posted: 24 Nov 2020, 0:05:37 UTC - in response to Message 63010.  
Last modified: 24 Nov 2020, 0:07:48 UTC


I am pretty certain that means the number of users who have returned completed tasks in the past 24 hours.


And I must be among them, because for the first time in living memory I managed to return 2... yes, that's two successfully completed w/u in 1 day.

And just to settle the nerves I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill. Unlike others that we could mention.....
ID: 63012 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63013 - Posted: 24 Nov 2020, 10:29:35 UTC - in response to Message 63012.  

And just to settle the nerves I have a bunch of ARP w/u to do for a couple of days. I do like w/u's that seem impossible to kill. Unlike others that we could mention.....


Yes, I lost two at some point in the past 24 hours following a hard reboot, or at least I assume that was the cause. One was a seg fault; I forget what the other error was.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63013 · Report as offensive
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 63018 - Posted: 25 Nov 2020, 13:58:45 UTC

I got five (5) of those HadAM4h tasks, but each one is consuming ~4 GB of disk space, so together they far exceed the 10 GB max I have in BOINC settings. Seems like a bug in the task scheduler...

I had to abort a few to avoid running out of disk space on the /var partition. :(
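
A quick back-of-the-envelope check of the numbers above, as a minimal Python sketch. It only does the arithmetic (five tasks at roughly 4 GB each against a 10 GB limit). The function names are made up for illustration, and it does not model how the BOINC scheduler actually accounts for disk space.

```python
def total_disk_needed_gb(task_count: int, per_task_gb: float) -> float:
    """Disk space needed if every task reaches its full size at once."""
    return task_count * per_task_gb


def tasks_that_fit(disk_limit_gb: float, per_task_gb: float) -> int:
    """How many full-size tasks fit under the configured disk limit."""
    return int(disk_limit_gb // per_task_gb)


# Figures taken from the post: 5 tasks, ~4 GB each, 10 GB BOINC limit.
needed = total_disk_needed_gb(5, 4.0)
fits = tasks_that_fit(10.0, 4.0)
print(f"Needed: {needed} GB, tasks that fit in 10 GB: {fits}")
```

On those figures only two of the five tasks would fit within the limit at any one time.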
ID: 63018 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63023 - Posted: 25 Nov 2020, 15:53:25 UTC - in response to Message 63018.  

These tasks are BIG, as you've found.

You'll need a lot more than 10 GB.
50-100 GB would be better.
ID: 63023 · Report as offensive
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63024 - Posted: 25 Nov 2020, 16:33:12 UTC - in response to Message 63023.  
Last modified: 25 Nov 2020, 16:41:07 UTC

I am missing out on the big hadam4h. I am running five on three machines, the last downloaded two hours ago, and just see the usual 1377 MB.
If they had a way to select the big ones (right), I would take them.

OK, now I see the new work unit uses 3.7 GB of disk space, not the main memory I thought you were talking about. I have even more of that free. Send them to me.
ID: 63024 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63026 - Posted: 25 Nov 2020, 19:53:26 UTC - in response to Message 63024.  

And you should see how big the restart file is if they don't use the No setting, which was developed just for these models.
ID: 63026 · Report as offensive
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 63049 - Posted: 29 Nov 2020, 5:17:50 UTC

I forgot myself and restarted one of my Windows machines. The WU at least started over but now is claiming 4 days elapsed at 0%.
These poor Sandy Bridge machines (and even a few older ones) are trucking along. I think some of them need to be retired soon.
ID: 63049 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63050 - Posted: 29 Nov 2020, 6:43:20 UTC - in response to Message 63049.  

0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that, whatever BOINC is telling you, it isn't actually running.
ID: 63050 · Report as offensive
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 63051 - Posted: 29 Nov 2020, 22:14:18 UTC - in response to Message 63050.  

0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that, whatever BOINC is telling you, it isn't actually running.

Sorry, it is climbing again. What I meant was it seemed to have encountered an error to make it start over again after 4 days of work.
ID: 63051 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63053 - Posted: 30 Nov 2020, 7:47:14 UTC - in response to Message 63051.  

0% after 4 days means something is seriously wrong. Have you checked how much CPU it is using in Task Manager? I suspect that, whatever BOINC is telling you, it isn't actually running.

Sorry, it is climbing again. What I meant was it seemed to have encountered an error to make it start over again after 4 days of work.


Ah, that makes more sense. If you switch off or have to reboot before a checkpoint, it will go back to the previous checkpoint, or to the start if it hasn't reached one yet. If you highlight a task and then click on Properties, you can see the time since the last checkpoint. I looked at one just now and it was a tad over 11 minutes. On my old laptop, sitting next to the keyboard attached to my Ryzen, that would equate to well over an hour. There was one batch that went out a while ago with a much longer time between checkpoints, which would, I think, have been several hours even on my Ryzen.
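
To picture the rollback described above, here is a minimal Python sketch. It assumes evenly spaced checkpoints, which is a simplification, and the function names are hypothetical rather than anything from BOINC or the model code. The 11-minute interval is the figure mentioned above.

```python
def progress_kept(elapsed_minutes: float, checkpoint_interval_minutes: float) -> float:
    """Minutes of work that survive an interruption.

    Work is only kept up to the last completed checkpoint; if no checkpoint
    has been reached yet, the task goes back to the start (0 minutes kept).
    """
    completed_checkpoints = int(elapsed_minutes // checkpoint_interval_minutes)
    return completed_checkpoints * checkpoint_interval_minutes


def progress_lost(elapsed_minutes: float, checkpoint_interval_minutes: float) -> float:
    """Minutes of work that have to be redone after an interruption."""
    return elapsed_minutes - progress_kept(elapsed_minutes, checkpoint_interval_minutes)


# Interrupted 30 minutes in, with a checkpoint roughly every 11 minutes.
print(progress_kept(30, 11), progress_lost(30, 11))
# Interrupted 5 minutes in, before the first checkpoint: everything is redone.
print(progress_kept(5, 11), progress_lost(5, 11))
```

The longer the gap between checkpoints, the more work a reboot can throw away, which is why a slow machine with an hour or more between checkpoints is hit harder.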
ID: 63053 · Report as offensive
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 63057 - Posted: 30 Nov 2020, 20:14:43 UTC

Random question because I have forgotten over the years.
Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)?
thanks!
ID: 63057 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63058 - Posted: 30 Nov 2020, 20:33:33 UTC - in response to Message 63057.  

Random question because I have forgotten over the years.
Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)?
thanks!

Yes, though the size of the bits of credit varies depending on the amount of computing that needs to go into it. Credit is currently updated on Thursdays, but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work by the server running it, at which point we should move to daily updates. There is no news of a date for that, though; it is something he works on when he has spare work hours rather than a priority task. I am pretty sure he could have sorted it by now had that been the case.
ID: 63058 · Report as offensive
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 63059 - Posted: 30 Nov 2020, 20:36:13 UTC - in response to Message 63058.  

Random question because I have forgotten over the years.
Each month is a trickle, so a 12 month task would give 12 total thus 12 bits of credit (when the credit is actually calculated and updated weekly)?
thanks!

Yes though the size of the bits of credit varies depending on the amount of computing that needs to go into it. Credit is currently updated on Thursdays but that has been a moveable feast over the years. At some point Andy plans to introduce a credit script that needs less work by the server running it at which point we should move to daily but there is no news of a date for that to happen and it is something he works on when he has spare work hours rather than being a priority task. I am pretty sure he could have sorted it by now had that been the case.

Thanks!
One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think.
I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something.
ID: 63059 · Report as offensive
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 63060 - Posted: 30 Nov 2020, 21:25:26 UTC - in response to Message 63059.  
Last modified: 30 Nov 2020, 21:38:16 UTC

One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think.
I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something.

The speed can be represented in units of seconds/timestep (sec/TS) and after trickles are uploaded, can be seen on each task's webpage. The lower the average number of sec/TS, the relatively faster the model is running and the less CPU time a completed model will take.

One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a

tail -f stdout_mon.txt

and it will output a display to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to maneuver to that directory and tail that file.
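
For anyone without tail to hand, or wanting to post-process the lines, roughly the same thing can be sketched in Python. This is only a sketch under stated assumptions: the follow function and its parameters are made up here, it polls the file rather than using file notifications, it starts at the end of the file instead of showing the last few lines first, and it assumes nothing about what the lines in stdout_mon.txt look like.

```python
import time
from typing import Iterator, Optional


def follow(path: str, poll_seconds: float = 1.0,
           max_idle_polls: Optional[int] = None,
           from_start: bool = False) -> Iterator[str]:
    """Yield lines as they are appended to a file, a bit like `tail -f`.

    max_idle_polls=None means keep waiting forever; a number makes the
    generator stop after that many consecutive polls with nothing new.
    """
    with open(path, "r", errors="replace") as handle:
        if not from_start:
            handle.seek(0, 2)  # jump to the end so only new lines are shown
        idle_polls = 0
        while max_idle_polls is None or idle_polls < max_idle_polls:
            line = handle.readline()
            if line:
                idle_polls = 0
                yield line.rstrip("\n")
            else:
                idle_polls += 1
                time.sleep(poll_seconds)
```

Used from inside the task directory, something like `for line in follow("stdout_mon.txt"): print(line)` would keep printing until interrupted with Ctrl+C. The same permission caveats apply as for tail.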

Edit...For the same PC, the value of the sec/TS for a given model will be dependent on how complex one model type may be relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model which is run at a higher resolution.
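
As a rough worked example of why a lower sec/TS means less CPU time: total CPU time is approximately sec/TS multiplied by the number of timesteps in the run. The sketch below uses the 20 versus 30 figures from the question; the timestep count is a made-up number purely for illustration and is not taken from any real model.

```python
def estimated_cpu_hours(sec_per_timestep: float, total_timesteps: int) -> float:
    """Approximate CPU hours for a run at a steady seconds-per-timestep rate."""
    return sec_per_timestep * total_timesteps / 3600.0


HYPOTHETICAL_TIMESTEPS = 1800  # illustrative only, not a real model length

faster = estimated_cpu_hours(20, HYPOTHETICAL_TIMESTEPS)
slower = estimated_cpu_hours(30, HYPOTHETICAL_TIMESTEPS)
print(f"20 sec/TS: {faster} h, 30 sec/TS: {slower} h")
```

The comparison only makes sense between machines running the same model type, since a higher-resolution model has a higher sec/TS on the same PC.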
ID: 63060 · Report as offensive
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 63094 - Posted: 4 Dec 2020, 6:58:04 UTC - in response to Message 63060.  

One more question - I remember seeing something about figuring out the actual computing speed by something like t/sec, but not sure where that's located or if it's still something to consider. E.g. 20 is faster than 30, I think.
I could be mistaken and I forget the specifics of what the T stands for. Surely not trickles. Maybe the steps per second or something.

The speed can be represented in units of seconds/timestep (sec/TS) and after trickles are uploaded, can be seen on each task's webpage. The lower the average number of sec/TS, the relatively faster the model is running and the less CPU time a completed model will take.

One can also see the sec/TS on running models by going into the .../projects/climateprediction.net/{task name} directory and looking at the file stdout_mon.txt, which is a log of the timesteps throughout the model run. In Linux, one can be in that directory and do a

tail -f stdout_mon.txt

and it will output a display to the terminal window continuously as the model runs. Depending on the Linux distribution and how it handles permissions for that directory, one might need to be a superuser to maneuver to that directory and tail that file.

Edit...For the same PC, the value of the sec/TS for a given model will be dependent on how complex one model type may be relative to another. So for the same PC, the sec/TS for a hadam4 N144 model will be lower than the sec/TS for a hadam4 N216 model which is run at a higher resolution.

Thank you so much!
I'm trying to figure out if it is even worthwhile to keep running these on my struggling i7-920 and Xeon W3520.
I've been trying to run them into the ground but they just won't quit. My Ryzen 3700X is absolutely flying through whatever I throw at it, and I may end up going for a 5000 series of some sort. It will probably do as much work as 4 of the older i7s put together.
ID: 63094 · Report as offensive
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 63121 - Posted: 18 Dec 2020, 13:17:01 UTC - in response to Message 62986.  

Bryn Mawr -

Sorry to read you can't get any work. I have 5 or 6 computers and it has worked for me every time.

You have the same OS and AMD processors as me. There must be something that I haven't encountered yet.


Frustrating but that’s life.


Further to this, I have been looking through the server’s scheduler code on GitHub and it seems to me that there are two conditions where it blocks the sending of work and logs the fact but does not appear to return an error message to the user.

During a work request it sets a lock file on the host id. During the next work request it finds the lock file still exists so exits.

It receives an unrecognised code sign key.


Now, obviously, I cannot check the server for an uncleared lock file, but is there any way I can change my host id, and is there any way I can resend my code sign key?
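
A minimal Python sketch of the stale-lock behaviour described in the first condition above. This is an illustration of the general pattern only, not the BOINC scheduler's actual code: the function names, the lock file naming scheme and the use of exclusive file creation are all assumptions made for the example.

```python
import os


def lock_path_for(lock_dir: str, host_id: int) -> str:
    # Hypothetical naming scheme, for illustration only.
    return os.path.join(lock_dir, f"host_{host_id}.lock")


def try_work_request(lock_dir: str, host_id: int) -> bool:
    """Return True if the request may proceed, False if a lock already exists."""
    try:
        # O_EXCL makes creation fail if the lock file is already there.
        fd = os.open(lock_path_for(lock_dir, host_id),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Blocked: in the scenario described, this is logged on the server
        # but nothing is reported back to the user.
        return False
    os.close(fd)
    return True


def finish_work_request(lock_dir: str, host_id: int) -> None:
    """Clear the lock once the request has been fully handled."""
    os.remove(lock_path_for(lock_dir, host_id))
```

If finish_work_request is never reached for a host (for example after a crash mid-request), every later call to try_work_request for that host returns False until someone removes the file by hand.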
ID: 63121 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,017,270
RAC: 20,902
Message 63122 - Posted: 18 Dec 2020, 14:29:29 UTC - in response to Message 63121.  

Now, obviously, I cannot check the server for an uncleared lock file, but is there any way I can change my host id, and is there any way I can resend my code sign key?


Not an area I have experience in. I would try removing the project using BOINC Manager and then re-attaching, if you haven't already tried this. If no joy, my next step would be to ask over on the BOINC forums.
ID: 63122 · Report as offensive

©2024 cpdn.org