Message boards : Number crunching : New Work Announcements 2024
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
OK. All mine are failing the same way, but I am letting them complete. Mine use a little more than 3. GB at times. top - 07:04:32 up 2 days, 19:32, 2 users, load average: 14.17, 14.43, 14.58 Tasks: 486 total, 15 running, 471 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.5 us, 0.6 sy, 86.9 ni, 11.7 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st MiB Mem : 128086.0 total, 3521.9 free, 17893.0 used, 106671.1 buff/cache MiB Swap: 15992.0 total, 15990.5 free, 1.5 used. 108432.6 avail Mem PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND 537222 537215 boinc 39 19 R 5.1g 4.1 99.3 12 97:14.89 /var/lib/boinc/slots/14/oifs_43r3_model.exe 508741 508738 boinc 39 19 R 4.6g 3.7 99.4 1 353:06.52 /var/lib/boinc/slots/3/oifs_43r3_model.exe 504560 504516 boinc 39 19 R 2.4g 1.9 99.5 13 375:49.65 /var/lib/boinc/slots/0/oifs_43r3_model.exe Computer 1511241 CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7] Number of processors 16 Operating System Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.5.1.el8_10.x86_64|libc 2.28] BOINC version 7.20.2 Memory 125.08 GB Cache 16896 KB |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
And there might even be some left for when my new machine arrives! The trick might be to let it start downloading enough for all 24 threads then while they are downloading, reduce the number of cpus BOINC can use to 6 which should leave plenty of headroom with 64GB RAM. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,131,288 RAC: 71,163 |
Ouch, I just read the "Batch 1017 Errors" post. I didn't know we'd use batch numbers across apps and thought that must be a continuation for WAH batches and skipped the post... Sorry for the duplicates. On the other hand, same as observed by Jean-David Beyer, the RSS usage is not capped at 3.5GB. This looks like the normal OIFS apps when I collected RSS every second for 10 minutes. 2311604 - 2488436: ************** (82, 13.8%) 2488437 - 2665269: **************** (15, 16.3%) 2665270 - 2842101: ******************** (19, 19.5%) 2842102 - 3018934: *********************** (20, 22.9%) 3018935 - 3195766: ************************** (21, 26.4%) 3195767 - 3372599: ****************************** (20, 29.8%) 3372600 - 3549431: ******************************** (16, 32.5%) 3549432 - 3726264: ********************************** (10, 34.2%) 3726265 - 3903097: ************************************ (11, 36.0%) 3903098 - 4079929: ********************************************************************************** (272, 81.8%) 4079930 - 4256762: *********************************************************************************** (8, 83.2%) 4256763 - 4433594: ************************************************************************************* (9, 84.7%) 4433595 - 4610427: ************************************************************************************** (8, 86.0%) 4610428 - 4787259: **************************************************************************************** (12, 88.0%) 4787260 - 4964092: ****************************************************************************************** (13, 90.2%) 4964093 - 5140925: **************************************************************************************************** (58, 100.0%) |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
If I understand your graph correctly, it seems the Working set is monotonically increasing. Now in the short run, that may be true. I read only what my top program shows and it updates every 19 seconds. In my experience, the working set increases for a while, then it drops back and rinse and repeat. I.e., the process allocates more and more RAM up to a certain point, gives some back and does another cycle. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,131,288 RAC: 71,163 |
Ah, sorry, I should have explained. It's not a time series but a histogram. It samples the RSS over 10 minutes at a rate of one sample per second and groups the samples into buckets. RSS is whatever `ps` reports. The numbers on the left are the recorded RSS values (in KiB, the unit `ps` reports), divided into equal-width buckets. The first number to the right of each bar is the number of samples that fall into that bucket; the percentage is the cumulative share of samples in that bucket and below. The stars are just visualization. You can think of this graph as a CDF rotated by 90 degrees. Yes, the actual memory allocation pattern is as you described. My goal with this little script is to figure out the range of RSS this task actually uses over time, so that I can set the number of concurrent tasks correctly. |
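The poster's script is not shown in the thread, so the following is only a minimal sketch of the approach described above, under stated assumptions: RSS is read with `ps -o rss= -p PID` (KiB on Linux), the bucket count of 16 matches the number of rows in the posted histogram, and the bar length is taken to be the rounded cumulative percentage. The function names `sample_rss_kib`, `collect`, `cumulative_histogram` and `print_histogram` are hypothetical.

```python
import subprocess
import time


def sample_rss_kib(pid):
    """Return the current RSS of `pid` in KiB, as printed by `ps -o rss=`.

    Raises subprocess.CalledProcessError if the process no longer exists.
    """
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def collect(pid, seconds=600, interval=1.0):
    """Take one RSS sample per `interval` seconds, `seconds` times."""
    samples = []
    for _ in range(seconds):
        samples.append(sample_rss_kib(pid))
        time.sleep(interval)
    return samples


def cumulative_histogram(samples, buckets=16):
    """Group samples into equal-width buckets.

    Returns a list of (low, high, count, cumulative_percent) tuples,
    one per bucket, ordered from the smallest values to the largest.
    """
    lo, hi = min(samples), max(samples)
    # Guard against a zero width when all samples are identical.
    width = max((hi - lo) / buckets, 1)
    counts = [0] * buckets
    for s in samples:
        # The maximum sample would land one past the end, so clamp it.
        idx = min(int((s - lo) / width), buckets - 1)
        counts[idx] += 1
    rows, running = [], 0
    for i, count in enumerate(counts):
        running += count
        low = lo + i * width
        rows.append((round(low), round(low + width), count,
                     100.0 * running / len(samples)))
    return rows


def print_histogram(rows):
    """Print one line per bucket; bar length is the cumulative percentage."""
    for low, high, count, cum in rows:
        print(f"{low} - {high}: {'*' * round(cum)} ({count}, {cum:.1f}%)")


# Example use (the PID is a placeholder for a running task):
#   print_histogram(cumulative_histogram(collect(537222)))
```

Because the percentage is cumulative, the row where the bar first passes roughly 90% gives a practical upper bound on RSS for most of the run, which is the figure that matters when deciding how many tasks to run side by side.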
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,131,288 RAC: 71,163 |
A different topic. Are there any criteria gating which clients can get new tasks? Most of my Linux machines are happily crunching, except one host that I've migrated from a physical disk to a VM. I've since reset the project and waited out the 1 hour update interval many times, but each time I still get a reply of no new tasks. I also tried uninstalling BOINC, clearing the data directory and installing again. That didn't help either, though the new client gets associated with the same host ID, so if it's some server-side filtering it won't make a difference anyway. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
Quoted: "A different topic. Are there any criteria gating which clients can get new tasks? Most of my Linux machines are happily crunching, except one host that I've migrated from a physical disk to a VM. I've since reset the project and waited out the 1 hour update interval many times, but each time I still get a reply of no new tasks. I also tried uninstalling BOINC, clearing the data directory and installing again. That didn't help either, though the new client gets associated with the same host ID, so if it's some server-side filtering it won't make a difference anyway." Probably because there are no more Linux tasks available, according to the server status. I have stopped resends for batch 1017, otherwise we'll be swamped by always-failing tasks. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,131,288 RAC: 71,163 |
Quoted: "Probably because there are no more Linux tasks available, according to the server status. I have stopped resends for batch 1017, otherwise we'll be swamped by always-failing tasks." Thanks. Oops, I read the wrong column and thought tasks were still available. Guess I will wait for the next batch of fun while figuring out how not to be limited by upload bandwidth next time... :-) |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
I'm told there's more Windows & Linux work on the way. Windows: More batches from Weather@Home for the New Zealand configuration (NZ25) will come first, followed by more batches for the East Asia configuration (EAS25). Note that the NZ batch will use WAH2 version 8.24, whereas the EAS25 batches will use a new WAH-RI version 8.31. Linux: There's also a rerun of the flawed 1017 batch for OpenIFS on its way. --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 18,956,646 RAC: 44,988 |
Regarding wah2 region-independent on Windows (Batch 1006 / 1015 ???), how much RAM should I allow for each task? Supporting BOINC, a great concept! |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Quoted: "Regarding wah2 region-independent on Windows (Batch 1006 / 1015 ???), how much RAM should I allow for each task?" I reckon on allowing 2GB/task normally on WAH2, which leaves some spare. In practice, on my new machine it is always going to be my upload bandwidth that limits me until my connection is upgraded. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
Quoted: "Regarding wah2 region-independent on Windows (Batch 1006 / 1015 ???), how much RAM should I allow for each task?" The WaH tasks will take no more than 500 MB of RAM. That applies to both wah2 and wah2-ri. OpenIFS tasks take much more: 5 GB. Note the change of units. --- CPDN Visiting Scientist |
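The per-task figures above (about 0.5 GB for WaH, about 5 GB for OpenIFS) turn into a simple concurrency estimate. The sketch below is illustrative only: the function name `max_concurrent_tasks` and the 4 GB reserve for the operating system and other programs are assumptions, not project guidance, and the 64 GB / 24-thread example simply reuses the machine mentioned earlier in the thread.

```python
def max_concurrent_tasks(total_ram_gb, per_task_gb, reserve_gb=4.0, cpu_threads=None):
    """Estimate how many tasks fit in RAM at once.

    reserve_gb is an assumed allowance for the OS and other programs.
    If cpu_threads is given, the result is also capped at that number.
    """
    usable = total_ram_gb - reserve_gb
    if usable <= 0 or per_task_gb <= 0:
        return 0
    by_ram = int(usable // per_task_gb)
    if cpu_threads is not None:
        return min(by_ram, cpu_threads)
    return by_ram


# 64 GB machine with 24 threads, OpenIFS at about 5 GB per task:
print(max_concurrent_tasks(64, 5, cpu_threads=24))    # RAM is the limit
# Same machine, WaH at about 0.5 GB per task:
print(max_concurrent_tasks(64, 0.5, cpu_threads=24))  # threads are the limit
```

A number worked out this way could then be used as the `max_concurrent` value in BOINC's `app_config.xml`, or as a guide when lowering the CPU percentage in the client's computing preferences.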
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
New Weather@Home batch going out today. NZ25 domain, Windows only, app version 8.24. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Quoted: "New Weather@Home batch going out today. NZ25 domain, Windows only, app version 8.24." 3150 25-month tasks. Roll up, roll up, they won't last long! |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
Some tasks were sent out as batch 995. This was a mistake. The correct batch is 1019. If you have a task from 995, it can be aborted. Don't waste time running it, as the results are not needed. It's a previously run batch. --- CPDN Visiting Scientist |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,528,638 RAC: 17,959 |
Quoted: "New Weather@Home batch going out today. NZ25 domain, Windows only, app version 8.24." Quoted: "3150 25-month tasks. Roll up, roll up, they won't last long!" Please don't download and sit on a pile of unstarted tasks though... --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quoted: "Some tasks were sent out as batch 995. This was a mistake. The correct batch is 1019. If you have a task from 995, it can be aborted." I got two of each on my pipsqueak machine. I just aborted the 995 ones. My big machine is Linux only, so I got none of these (of course). |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,385,596 RAC: 10,164 |
Quoted: "Some tasks were sent out as batch 995. This was a mistake. The correct batch is 1019. If you have a task from 995, it can be aborted. Don't waste time running it, as the results are not needed. It's a previously run batch." 28 deg C here today. I wondered why the desktop PC was making extra fan noise when I got home. Six tasks from batch 995 aborted as requested. |
Send message Joined: 14 Feb 06 Posts: 31 Credit: 4,507,116 RAC: 2,013 |
Aren't the Weather At Home 2 (wah2) v8.24 the ones that crash on restart? Or has this been solved? Just wondering whether to abort the one that I've got. It's survived 2 restarts so far, so if there is still a problem, its luck must run out soon. Thanks! |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,724,038 RAC: 7,570 |
I can answer that! I've just had a brief (1 or 2 seconds) power outage, and everything shut down. On power up (and after waiting ages for the router to restart), I can see that the four tasks I got from this batch (v8.24 app, batch 1019 data) have picked up and restarted running from the point they'd reached. |
©2024 cpdn.org