Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "They don't crash, the last batch I checked were taking 12GB of RAM each and uploads were about 550MB. Haven't tried to check on CPU cache but it hasn't been raised as an issue by other testers so I suspect not as much as the N216 tasks. Some batches have had final uploads of over 1GB so I have had them uploading while I sleep if on a day when I am doing any Zoom calls. Obviously not an issue for those with real broad as opposed to bored band." Yes, thank you. I have a relatively large amount of RAM and a fair amount of processor cache; I also have good connectivity: 75 Megabits per second up and down. The rates below show less, but I infer that is a limitation of the servers, not of my machine or local connectivity. CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16 [8 real, 8 hyperthreaded]; Operating System: Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]; BOINC version: 7.16.11; Memory: 62.4 GB <---<<<; Cache: 16896 KB <---<<<; Swap space: 15.62 GB; Total disk space: 488.04 GB; Free disk space: 475.07 GB; Measured floating point speed: 6.58 billion ops/sec; Measured integer speed: 32.05 billion ops/sec; Average upload rate: 657.97 KB/sec <---<<<; Average download rate: 33896.29 KB/sec <---<<<. Speed test (timestamp, download, upload, test server): 1/31/2022 12:07:10, 76.73 Mbps, 83.69 Mbps, New York City, NY |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "Some batches have had final uploads of over 1GB so I have had them uploading while I sleep if on a day when I am doing any Zoom calls. Obviously not an issue for those with real broad as opposed to bored band." Just for laughs, I tested my Internet speeds to various (US) locations. N.B.: these are in MegaBits per second, not MegaBytes per second. Results (date and time: download / upload, test server): 1/31/2022 12:07:10: 76.73 / 83.69 Mbps, New York City, NY; 1/31/2022 15:36:09: 79.57 / 89.00 Mbps, San Jose, CA; 1/31/2022 15:38:19: 77.54 / 88.21 Mbps, Los Angeles, CA; 1/31/2022 20:55:52: 76.30 / 88.67 Mbps, Dallas, TX; 1/31/2022 20:57:54: 80.25 / 88.86 Mbps, Miami; 1/31/2022 20:58:59: 78.69 / 89.20 Mbps, Atlanta, GA; 1/31/2022 20:59:40: 79.09 / 88.99 Mbps, Washington DC |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I finally got a winner! Name: hadcm3s_1k9d_200012_168_926_012129726_2; Workunit: 12129726; Created: 29 Jan 2022, 20:46:55 UTC; Sent: 29 Jan 2022, 20:48:05 UTC; Report deadline: 12 Jan 2023, 2:08:05 UTC; Received: 1 Feb 2022, 13:43:03 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 1511241; Run time: 2 days 10 hours 49 min 14 sec; CPU time: 2 days 10 hours 24 min 3 sec; Validate state: Valid; Credit: 0.00; Device peak FLOPS: 6.58 GFLOPS; Application version: UK Met Office HadCM3 short v8.36 i686-pc-linux-gnu; Peak working set size: 158.54 MB; Peak swap size: 206.18 MB |
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Quote: "I finally got a winner!" Time to buy a lottery ticket? |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "I finally got a winner!" I do not think so. I majored in math in college and took a course in probability and another in statistics. I learned that the only way to make money gambling was to be the house, not its client. Now what I need is more CPDN work units for Linux. And now does not seem the time to bet on getting them. Unless they release some of those OpenIFS work units. And I do not propose to bet on that either. |
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
I don't know if this is really the right place to ask, but the phrase "segmentation fault" comes up a lot in this thread, so... I rebooted my systems this morning and four units died. The command I used to reboot the system (Linux Mint, Ubuntu-based) was "sudo reboot now". I didn't directly kill any processes. On inspection, they all have logs like this:

```
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:38:10 (2793): called boinc_finish(193)
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:38:11 (2793): called boinc_finish(193)
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f02b60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7bfcee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7edeb60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7bd8ee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f59b60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7c53ee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f5eb60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7c58ee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7eefb60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7be9ee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
SIGSEGV: segmentation violation
Stack trace (10 frames):
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x84ff4f7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7efab60]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x84277ad]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x80e8e67]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8089442]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8479d6e]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8494feb]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x848be04]
/var/lib/boinc-client/projects/climateprediction.net/hadcm3s_um_8.36_i686-pc-linux-gnu[0x8496bad]
/lib32/libc.so.6(__libc_start_main+0xf5)[0xf7bf4ee5]
Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=850, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
08:40:52 (850): called boinc_finish(22)
</stderr_txt>
]]>
```

If someone could explain what I'm looking at and how to prevent this sort of thing, if it's possible, it would be greatly appreciated. |
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
The segmentation faults are at least partly due to the initial conditions for this batch, but there are other, seemingly random, factors at play as well. With CPDN tasks, the chances of tasks surviving a reboot are substantially increased if you suspend computation, wait long enough to be sure everything necessary has been written to disk, and then exit BOINC. There will be a batch based on the successes from the current batch, and it should largely avoid the initial-conditions problem, but a task may still fail on one seemingly reliable computer and succeed on another, even when CPU type and operating system are the same. I have looked at successes and failures in work units, and my only success from examining the data has been to give myself a headache! |
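For anyone who wants to script the suspend, wait, exit routine before a reboot, here is a minimal sketch. It assumes `boinccmd` is on the PATH, that the BOINC data directory is `/var/lib/boinc-client` (as in the log paths above) so `boinccmd` can read the GUI RPC password there, and that 60 seconds counts as "long enough"; none of that comes from the post, and the helper names are hypothetical. By default it only prints the plan (dry run).

```python
import subprocess
import time
from typing import List, Tuple, Union

# Assumption: Debian/Ubuntu-style BOINC data directory, as seen in the log paths.
BOINC_DIR = "/var/lib/boinc-client"

# A step is either ("run", command) or ("sleep", seconds).
Step = Tuple[str, Union[List[str], int]]


def safe_shutdown_steps(settle_seconds: int = 60) -> List[Step]:
    """Hypothetical helper: suspend computation, wait, then ask the client to exit."""
    return [
        ("run", ["boinccmd", "--set_run_mode", "never"]),  # suspend all computation
        ("sleep", settle_seconds),  # assumed time for tasks to write state to disk
        ("run", ["boinccmd", "--quit"]),  # ask the BOINC client to exit
    ]


def execute(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Describe each step and, unless dry_run is set, actually perform it."""
    log = []
    for kind, arg in steps:
        if kind == "run":
            log.append("run: " + " ".join(arg))
            if not dry_run:
                # Running from the data directory lets boinccmd find gui_rpc_auth.cfg.
                subprocess.run(arg, check=True, cwd=BOINC_DIR)
        elif kind == "sleep":
            log.append(f"sleep: {arg}s")
            if not dry_run:
                time.sleep(arg)
        else:
            raise ValueError(f"unknown step type: {kind!r}")
    return log


if __name__ == "__main__":
    for line in execute(safe_shutdown_steps()):
        print(line)
```

Calling `execute(safe_shutdown_steps(), dry_run=False)` (probably with sudo, since the RPC password file is usually not world-readable) would perform the steps before you reboot. On a systemd-managed install, `sudo systemctl stop boinc-client` is typically an alternative final step.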
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Quote: "Unless they release some of those OpenIFS work units. And I do not propose to bet on that either." Two more from testing are running on my machine at the moment. They are peaking at about 10GB memory usage, so I need to hurry up and get that upgrade to 64GB if I am to run more than two at a time! |
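The memory arithmetic behind the upgrade can be sketched roughly like this. The roughly 10 GB peak per task comes from the post, and 31.18 GB and 62.4 GB are the memory sizes BOINC reports for the two machines compared further down; the 8 GB set aside for the operating system and everything else is purely an assumed figure, and the function name is hypothetical. BOINC's own memory-usage preferences would also cap how many tasks actually run.

```python
import math


def max_concurrent_tasks(total_ram_gb: float, peak_per_task_gb: float,
                         reserve_gb: float = 8.0) -> int:
    """Hypothetical helper: how many tasks fit if all hit their peak at once.

    reserve_gb is an assumed allowance for the OS and other programs.
    """
    if peak_per_task_gb <= 0:
        raise ValueError("peak_per_task_gb must be positive")
    usable = total_ram_gb - reserve_gb
    return max(0, math.floor(usable / peak_per_task_gb))


if __name__ == "__main__":
    for ram in (31.18, 62.4):
        print(f"{ram} GB -> {max_concurrent_tasks(ram, 10.0)} tasks at ~10 GB each")
```

With those assumptions, about 31 GB allows two tasks at their 10 GB peak and about 62 GB allows five.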
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "Two more from testing running on my machine at the moment. They are peaking at about 10GB memory usage so I need to hurry up and get that upgrade to 64GB if they are to appear in more than two at a time!" While doubling your RAM is undoubtedly a good idea, it is not everything. My machine seems to return work units about five times faster than yours, even though the processor is only two or three times faster. Do you think it is because my processor chip cache is so much larger than yours? Also, I suspect that my upload and download speeds are faster than yours because my Internet connection is Verizon FiOS, with a claimed transmission rate of 75 Megabits per second, and yours is probably lower. But the rates here, while claiming KB/sec (i.e., kilobytes per second), are probably Kb/sec (kilobits per second). Comparison (Dave Jackson vs. Jean-David Beyer): Memory: 31.18 GB vs. 62.4 GB; Cache: 512 KB vs. 16896 KB; Measured floating point speed: 3.51 vs. 6.58 billion ops/sec; Measured integer speed: 12.91 vs. 32.05 billion ops/sec; Average upload rate: 43.76 vs. 385.38 KB/sec; Average download rate: 2231.03 vs. 35329.76 KB/sec; Average turnaround time: 31.68 vs. 6.53 days |
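The suspicion about units can be checked with a little arithmetic. This sketch assumes 1 KB = 1000 bytes (BOINC may use 1024, which makes the gap slightly larger) and converts the reported average download rate both ways; the function names are hypothetical. It also computes the turnaround and benchmark ratios from the comparison above.

```python
def kbytes_per_s_to_mbps(rate: float, bytes_per_kb: int = 1000) -> float:
    """Treat rate as kilobytes/second and convert to megabits/second."""
    return rate * bytes_per_kb * 8 / 1_000_000


def kbits_per_s_to_mbps(rate: float) -> float:
    """Treat rate as kilobits/second and convert to megabits/second."""
    return rate / 1000


LINE_MBPS = 75.0  # nominal FiOS rate quoted in the post
DOWNLOAD = 35329.76  # "Average download rate" as reported by BOINC

if __name__ == "__main__":
    as_bytes = kbytes_per_s_to_mbps(DOWNLOAD)
    as_bits = kbits_per_s_to_mbps(DOWNLOAD)
    print(f"read as KB/s: {as_bytes:.1f} Mbps, exceeds line: {as_bytes > LINE_MBPS}")
    print(f"read as Kb/s: {as_bits:.1f} Mbps, exceeds line: {as_bits > LINE_MBPS}")
    # Ratios from the same comparison (Jean-David's machine relative to Dave's).
    print(f"turnaround ratio: {31.68 / 6.53:.2f}")
    print(f"floating point ratio: {6.58 / 3.51:.2f}")
    print(f"integer ratio: {32.05 / 12.91:.2f}")
```

Read literally as kilobytes, 35329.76 KB/sec would be roughly 283 Mbps, well above a 75 Mbps line, while read as kilobits it is about 35 Mbps, which at least fits. The turnaround ratio comes out near 4.85 against benchmark ratios of roughly 1.9 and 2.5, matching the "five times" versus "two or three times" in the post.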
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Quote: "Two more from testing running on my machine at the moment. They are peaking at about 10GB memory usage so I need to hurry up and get that upgrade to 64GB if they are to appear in more than two at a time!" BOINC cannot determine the L3 cache size on Ryzens (which is 32 MB for his 3700X). That is not a performance problem; it only affects the ability of BOINC (not the science application) to see the L3 cache. But Ryzens are fast, and it's tough to compare speeds when the workloads of the various PCs aren't known. My Ryzen 3600X can run 4 hadam4h tasks at a time at about 13.5 sec/TS, and 5 of that type of model at less than 15 sec/TS. Dave's Ryzen 3700X should be able to perform similarly. |
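Because BOINC's cache figure can be wrong on Ryzens, one way to see what Linux itself reports is to read the cache descriptions the kernel exposes under `/sys/devices/system/cpu/cpu0/cache/index*/` (files `level`, `type`, `size`). This is a minimal, hypothetical helper, not something BOINC does. Note that these files describe the cache instances CPU 0 belongs to, so on multi-CCX Ryzens the L3 value is likely to be per CCX (for example 16 MB on a 3700X) rather than the 32 MB total. `lscpu` prints similar information.

```python
from pathlib import Path
from typing import Dict

SYSFS_CACHE = Path("/sys/devices/system/cpu/cpu0/cache")
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '32K' or '16384K' to bytes."""
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def cache_sizes(cache_dir: Path = SYSFS_CACHE) -> Dict[str, int]:
    """Map names like 'L1d', 'L1i', 'L2', 'L3' to sizes in bytes."""
    sizes: Dict[str, int] = {}
    for index in sorted(cache_dir.glob("index*")):
        size_file = index / "size"
        if not size_file.exists():
            continue  # some platforms omit the size for certain cache levels
        level = (index / "level").read_text().strip()
        kind = (index / "type").read_text().strip()  # Data, Instruction or Unified
        suffix = {"Data": "d", "Instruction": "i"}.get(kind, "")
        sizes[f"L{level}{suffix}"] = parse_size(size_file.read_text())
    return sizes


if __name__ == "__main__":
    for name, size in cache_sizes().items():
        print(f"{name}: {size // 1024} KB")
```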
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "My Ryzen 3600X can run 4 hadam4h tasks at a time at about 13.5 sec/TS and 5 of that type of model at less than 15 sec/TS. Dave's Ryzen 3700X should be able to perform similarly." My machine is running N216 models at about this rate, running up to four at a time. CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]. Time sent (UTC): 24 Jan 2022 04:24:25; Host ID: 1511241; Result ID: 22160123; Result name: hadam4h_208i_209202_4_922_012119724_1; Phase: 1; Timestep: 34,763; CPU time (sec): 551,632; Average (sec/TS): 15.8684 |
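The "Average (sec/TS)" figure is CPU time divided by the number of timesteps completed, and multiplying the number of concurrent tasks by the inverse gives a rough machine-wide throughput for comparing with the Ryzen 3600X. This sketch assumes each task gets close to a full core, so CPU time per timestep roughly tracks wall-clock time, and it treats "less than 15 sec/TS" as exactly 15; the function names are hypothetical.

```python
def sec_per_timestep(cpu_seconds: float, timesteps: int) -> float:
    """Average CPU seconds per model timestep, as in the task listing."""
    if timesteps <= 0:
        raise ValueError("timesteps must be positive")
    return cpu_seconds / timesteps


def timesteps_per_hour(concurrent_tasks: int, sec_per_ts: float) -> float:
    """Rough machine-wide throughput, assuming each task has its own core."""
    return concurrent_tasks * 3600 / sec_per_ts


if __name__ == "__main__":
    xeon = sec_per_timestep(551_632, 34_763)
    print(f"Xeon W-2245: {xeon:.4f} sec/TS")
    for label, tasks, rate in [
        ("Xeon W-2245, 4 tasks", 4, xeon),
        ("Ryzen 3600X, 4 tasks", 4, 13.5),
        ("Ryzen 3600X, 5 tasks", 5, 15.0),  # upper bound from "less than 15"
    ]:
        print(f"{label}: ~{timesteps_per_hour(tasks, rate):.0f} timesteps/hour")
```

On these assumptions the Xeon running four tasks manages roughly 900 timesteps per hour in total, against roughly 1070 for the 3600X with four tasks and at least 1200 with five.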
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Quote: "While doubling your RAM is undoubtedly a good idea, it is not everything. My machine seems to return work units about five times faster than yours even though the processor is only two or three times faster. Do you think it is because my processor chip cache is so much larger than yours?" I am sure the fact that my machine is often suspended overnight, unless it is running testing work, affects the speed at which it returns work units! Edit: I just looked at one of the tasks I am currently running, and the two zips/trickle-ups have been completed at just over and just under 15 seconds/timestep. I suspect that may improve a fraction on swapping from PC2800 to PC3200 RAM. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "The fact that my machine is often suspended overnight unless running testing work I am sure affects the speed it returns work units!" I agree on both points you have made. My main machine (the Linux one) runs pretty much 24/7 and has almost nothing to do except BOINC when I am asleep or not using the machine for something compute-intensive. My RAM is 4x16GB DDR4 2933MHz RDIMM ECC memory. My other machine is slower, has less memory, and runs Windows 10. |
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
The OpenIFS tasks, between the two of them, are producing about 650MB/hour, a bit more than the pipe to my house can handle. I am currently uploading via 4G, which is a bit (Edit: almost five times) faster, and I have enough free data left for the month to upload what these will produce. I think I need to go through the load-balancing tutorial I found online again, so I can use the bored band and 4G together. |
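To put "a bit more than the pipe can handle" in numbers: a sustained 650 MB per hour needs roughly 180 KB/s of upload bandwidth. This sketch assumes 1 MB = 1000 KB, and the link rates it checks (about 120 KB/s at best on broadband, 250 to 400 KB/s on 4G) are the ones given later in this thread; the function names are hypothetical.

```python
def required_kb_per_s(mb_per_hour: float, kb_per_mb: int = 1000) -> float:
    """Sustained upload rate (KB/s) needed to keep up with mb_per_hour of output."""
    return mb_per_hour * kb_per_mb / 3600


def keeps_up(link_kb_per_s: float, mb_per_hour: float) -> bool:
    """True if a link of the given speed drains output as fast as it is produced."""
    return link_kb_per_s >= required_kb_per_s(mb_per_hour)


if __name__ == "__main__":
    print(f"needed: {required_kb_per_s(650):.0f} KB/s")
    for name, rate in [("broadband (max)", 120), ("4G (min)", 250), ("4G (max)", 400)]:
        print(f"{name} at {rate} KB/s keeps up: {keeps_up(rate, 650)}")
```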
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "The openIFS tasks are producing between two of them about 650MB/hour or a bit more than the pipe to my house can handle. I am currently uploading via 4G which is a bit Edit: (almost five times) faster and I have enough free data left for the month to upload what these will produce. I think I need to go through the load balancing tutorial I found on-line again to use the bored band and 4G together." I notice we both get much faster download speeds than upload speeds. I do not understand that. But if you have two connections and can split the traffic between them, perhaps you should send the uploads via the faster one and accept the downloads on the slower path, if you have the choice, that is. |
Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Quote: "I notice we both get much faster download speeds than we get upload speeds. I do not understand that. But if you need two transmission speeds, and split them up, perhaps you should send the upload stuff via the faster method and accept the download stuff on the slower path. If you have the choice, that is." I haven't looked into that possibility, though I can do it by only tethering my phone when doing uploads. These tasks produce a zip of between about 550MB and 660MB every 2% of computation, so just over one an hour each. I am told that our street is getting upgraded towards the end of the year, so hopefully it will not be a problem for too much longer. My understanding of load balancing is that you can share the traffic between two or more connections. I can get up to 400KB/second from 4G (the minimum is down at about 250), and the max I get from broadband is about 120. Going up to 520 max by combining them would make a significant difference, but I do need to keep a close eye on my mobile data! |
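Using the figures above, the time to upload one zip over each link can be estimated. This sketch assumes 1 MB = 1000 KB and takes the combined 520 KB/s at face value; many load-balancing setups assign each connection to a single link, so one file transfer may not see the summed rate, which makes that line an optimistic upper bound. The function name is hypothetical.

```python
def upload_minutes(zip_mb: float, link_kb_per_s: float, kb_per_mb: int = 1000) -> float:
    """Minutes needed to upload one zip of zip_mb over a link of link_kb_per_s."""
    if link_kb_per_s <= 0:
        raise ValueError("link speed must be positive")
    return zip_mb * kb_per_mb / link_kb_per_s / 60


if __name__ == "__main__":
    links = {"broadband (max)": 120, "4G (min)": 250, "4G (max)": 400, "combined (max)": 520}
    for name, rate in links.items():
        low, high = upload_minutes(550, rate), upload_minutes(660, rate)
        print(f"{name:16} {low:5.1f} to {high:5.1f} min per zip")
```

At about 120 KB/s a 550 to 660 MB zip takes roughly 76 to 92 minutes, longer than the hour or so between zips, whereas even the 4G minimum brings it down to at most 44 minutes.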
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "I am told that our street is getting upgraded towards the end of the year so it is hopefully not a problem for too much longer." I was so glad when Verizon put their FiOS service on my street: fiber-optic all the way to the box on the side of my house, and CAT 5e from there to my router and from there to my machine(s). It was 5 Megabits/sec down and 2 Megabits/sec up to begin with, much better than the 56.6 Kilobits per second up and down I had with dial-up. This must have been in about 2004. Since then, they have slowly increased the speed, and I now have nominally 75 Megabits/second both up and down (it is usually a little faster). I could get about 10 times that if I wished to pay more (I do not). I hope that when they upgrade your street, you can get performance around this, or perhaps even more, at a reasonable price. When I upgraded, I bought enough Verizon stock that the dividend would pay my FiOS bill. ;-) |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I can't get the new HadCM3 shorts. I just get a "no work sent". It is the same machine where I have been running the HadAM4 at N216. https://www.cpdn.org/results.php?hostid=1523408 And the "Project status" page shows only 3 users. There must be something wrong. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Quote: "I can't get the new HadCM3 shorts. I just get a 'no work sent'." I get the same. I am even running a hadam4h_20ak at the moment with about a day to go, on computer 1511241. The event log reveals, in part:

```
Fri 04 Feb 2022 03:31:20 AM EST | climateprediction.net | Sending scheduler request: To fetch work.
Fri 04 Feb 2022 03:31:20 AM EST | climateprediction.net | Requesting new tasks for CPU
Fri 04 Feb 2022 03:31:22 AM EST | climateprediction.net | Scheduler request completed: got 0 new tasks
Fri 04 Feb 2022 03:31:22 AM EST | climateprediction.net | Project has no tasks available
Fri 04 Feb 2022 03:31:22 AM EST | climateprediction.net | Project requested delay of 3636 seconds
```
|
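The last log line is the scheduler asking the client to wait before its next request, so 3636 seconds after 03:31:22 puts the next automatic work request at about 04:31:58. A minimal parser for lines in the format shown above could look like this; it assumes the `date | project | message` layout, ignores the time-zone name (everything stays in local time), and relies on English day and month names. The function name is hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def next_contact(line: str) -> Optional[Tuple[str, datetime]]:
    """Return (project, earliest next request time) for a delay line, else None."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    match = DELAY_RE.search(message)
    if match is None:
        return None
    clock, _zone = stamp.rsplit(" ", 1)  # drop "EST"; strptime's %Z is unreliable
    when = datetime.strptime(clock, "%a %d %b %Y %I:%M:%S %p")
    return project, when + timedelta(seconds=int(match.group(1)))


if __name__ == "__main__":
    sample = ("Fri 04 Feb 2022 03:31:22 AM EST | climateprediction.net | "
              "Project requested delay of 3636 seconds")
    print(next_contact(sample))
```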
Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
Quote: "I can't get the new HadCM3 shorts. I just get a 'no work sent'." A note from Dave in the getting started area has the answer: these have been amended to be Mac-only, so we're out of luck. |
©2024 cpdn.org