Message boards : Number crunching : OpenIFS Discussion
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"My guess is it failed on the first machine because of lack of memory. With 11 GB RAM and 4 cores, it looks likely that the user is running all four cores at once, which would explain a relatively high failure rate."
The order came in and the four modules are installed, so I am less likely to run out of memory than ever before. OTOH, there are no CPDN tasks here, so none are running.
My first hard drive, at work, held 40 Megabytes and spun at 2400 rpm. One or two decades later, I was amazed to have a 2 Gigabyte hard drive that spun at 7200 rpm. And now I have 125 GBytes of RAM! Amazing, the progress from 1965 until now!
Computer 1511241
Created: 14 Nov 2020, 15:37:02 UTC
Total credit: 7,074,017
Average credit: 1.58
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.7 (Ootpa) [4.18.0-425.13.1.el8_7.x86_64|libc 2.28]
BOINC version: 7.20.2
Memory: 125.34 GB
Cache: 16896 KB
$ free -hw
       total   used   free   shared   buffers   cache   available
Mem:   125Gi   4.7Gi  117Gi  26Mi     13Mi      2.7Gi   119Gi
Swap:  15Gi    0B     15Gi |
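The memory argument in the quoted guess can be checked with a rough calculation. The sketch below is only an illustration: the function name `max_concurrent_tasks`, the 5 GB per-task peak, and the 2 GB reserved for the operating system are all assumptions chosen for the example, not figures given in this post.

```python
def max_concurrent_tasks(total_ram_gb, per_task_gb, reserved_gb=2.0, cores=None):
    """Estimate how many tasks fit in RAM at once.

    total_ram_gb: installed RAM in GB
    per_task_gb:  assumed peak memory of one task in GB (hypothetical value)
    reserved_gb:  assumed RAM kept back for the OS and other programs
    cores:        optional cap on the number of CPU cores
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    usable_gb = max(total_ram_gb - reserved_gb, 0.0)
    fit = int(usable_gb // per_task_gb)
    if cores is not None:
        fit = min(fit, cores)
    return fit


if __name__ == "__main__":
    # 11 GB / 4 cores, as in the quoted machine; 5 GB per task is assumed.
    print(max_concurrent_tasks(11, 5, cores=4))
    # 125.34 GB / 16 processors, as in the computer details above.
    print(max_concurrent_tasks(125.34, 5, cores=16))
```

Under these assumed numbers the 11 GB machine has room for only one task at a time, so running four at once would plausibly exhaust memory, while the 125 GB machine is limited by its core count rather than its RAM.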
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Send Personal Message to me if interested rather than reply here. If there is sufficient interest, I'll share the files on Dropbox. I'll post answers to PM'd questions here."
How do I do that? If you are still interested, I raised my RAM to 128 GBytes this afternoon.
$ free -hw
       total   used   free   shared   buffers   cache   available
Mem:   125Gi   5.8Gi  114Gi  82Mi     13Mi      5.2Gi   118Gi
Swap:  15Gi    0B     15Gi |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Send Personal Message to me if interested rather than reply here. If there is sufficient interest, I'll share the files on Dropbox. I'll post answers to PM'd questions here." Click on his name in the Author section of his post. It'll bring up an abbreviated profile page for him; then click on "Send personal message" on the right-hand side of the webpage. Or, easier, just click on the "Send Message" button under his name in the Author section. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Or, easier, just click on the "Send Message" button under his name in the Author section." I tried that and I got this: "User Glenn Carver (ID: 1560856) is not accepting private messages from you" |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Next OpenIFS batch about to go out today. Missing file fixed. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
"Next OpenIFS batch about to go out today. Missing file fixed." #993 is now out there. My first one is about 6 minutes in, so well past the time when the problem occurred. Edit: third zip has now been sent. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
I wonder if there is a problem with the tasks being uploaded. I have only got one; a request an hour later said, "project has no tasks available." That seems a bit quick for them all to go for Linux tasks. Will check again after next request. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
There were 188 unsent on the server status page at 12:45, and I got one of them at 13:01. It's running, but still in the early stages - I'll watch how it runs for a while, before switching into full multi-fetch mode. |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 18,767,744 RAC: 43,954 |
Mine seem to run fine: 40 minutes running without a failure, and zip files 0 to 5 have been uploaded. Supporting BOINC, a great concept! |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
Yikes - there are 123 upload files in all, and the first one was over 15 MB. Your band is going to get very bored, Dave! |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
"I wonder if there is a problem with the tasks being uploaded. I have only got one; a request an hour later said, "project has no tasks available." That seems a bit quick for them all to go for Linux tasks. Will check again after next request." Maybe just getting there slowly. I have a second one downloading now, and two is my limit without overloading my connection. (I could let five or six go via my phone, which would be faster and still have enough headway for other usage.) Edit: Estimate based on percentage completed rather than BOINC's guess is just over 10 hours on my Ryzen 7. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"I wonder if there is a problem with the tasks being uploaded. I have only got one; a request an hour later said, "project has no tasks available." That seems a bit quick for them all to go for Linux tasks. Will check again after next request." It's working fine. It just takes time to process all the tasks to be uploaded. They get taken up very quickly. --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
The server status page has finally updated, and both 'Unsent' and 'In progress' have gone up substantially. Looks like the workunit generator is running at about twice the speed of our demand load, which is fine. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
Someone can have the retry for my first one of this batch: it got a "double free or corruption"... The system in use is a Ryzen 3700X with 32GB RAM, and it is only using about half that under the current load (including a second OIFS task it got when reporting this one). I only run one CPDN task at a time, and none of the other BOINC stuff I'm currently running on that system (a maximum of 9 other processes) will get up to a single GB of RAM! I'll keep an eye on both this system and the other one that also has a single CPDN task in its BOINC mix (a Ryzen 5600H with 32GB RAM, currently showing about 24GB free...)... Cheers - Al. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I have only got one, a request an hour later said, "project has no tasks available." That seems a bit quick for them all to go for Linux tasks."
I have one running. Since I just doubled my RAM size to 128 GBytes, I diddled app_config.xml to run two at a time. I then got this:
Fri 24 Feb 2023 09:23:34 AM EST | | Re-reading cc_config.xml
Fri 24 Feb 2023 09:23:34 AM EST | climateprediction.net | Config: excluded GPU. Type: all. App: all. Device: all
Fri 24 Feb 2023 09:23:34 AM EST | | Config: event log limit 5000 lines
Fri 24 Feb 2023 09:23:34 AM EST | | log flags: file_xfer, sched_ops, task
Fri 24 Feb 2023 09:23:34 AM EST | climateprediction.net | Found app_config.xml
Fri 24 Feb 2023 09:23:59 AM EST | climateprediction.net | Sending scheduler request: To send trickle-up message.
Fri 24 Feb 2023 09:23:59 AM EST | climateprediction.net | Requesting new tasks for CPU
Fri 24 Feb 2023 09:24:01 AM EST | climateprediction.net | Scheduler request completed: got 0 new tasks
Fri 24 Feb 2023 09:24:01 AM EST | climateprediction.net | No tasks sent
Fri 24 Feb 2023 09:24:01 AM EST | climateprediction.net | This computer has finished a daily quota of 1 tasks
This is true enough, since I have never completed one of those before.
OpenIFS 43r3 1.21 x86_64-pc-linux-gnu
Number of tasks completed: 0
Max tasks per day: 1
Number of tasks today: 1
Consecutive valid tasks: 0
Average turnaround time: 0.00 days
The boincmgr thinks this task has about 2 1/2 days to go before it finishes.
top - 09:38:45 up 15:48, 1 user, load average: 11.36, 11.45, 11.38
Tasks: 456 total, 12 running, 444 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.2 sy, 68.1 ni, 31.2 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
MiB Mem : 128345.2 total, 72454.7 free, 10400.3 used, 45490.2 buff/cache
MiB Swap: 15992.0 total, 15992.0 free, 0.0 used. 116670.2 avail Mem
PID    PPID   USER   PR  NI  S  RES   %MEM  %CPU  P  TIME+      COMMAND
56011  56007  boinc  39  19  R  4.5g  3.6   98.9  2  131:48.65  /var/lib/boinc/slots/9/oifs_43r3_model.exe |
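For anyone wondering what "run two at a time" looks like in app_config.xml, here is a minimal sketch that writes out such a file's contents from Python. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are the standard BOINC ones; the application short name `oifs_43r3` is an assumption based on the executable name in the `top` output above, and the helper `build_app_config` is hypothetical. The actual file used in this post was not shown.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting one app to max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    # app_name must match the project's short application name (assumed here).
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "oifs_43r3" is an assumed app name, used only for illustration.
    print(build_app_config("oifs_43r3", 2))
```

Note that this only caps what the client will run concurrently. As the event log above shows, it does not change the server-side daily quota, so the scheduler can still refuse to send a second task.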
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"Someone can have the retry for my first one of this batch: it got a "double free or corruption"..." This bug is bloody annoying. We've got a development version in testing that makes yet more changes to the memory handling in the code, which may or may not fix it, but it'll clean the code up anyway. We didn't have time to use it for this batch; there will probably be some test batches going out after this one. The biggest problem is that I can't reproduce it in standalone testing, and catching one of these on my machine so I can see exactly where it went wrong is difficult. We are getting there: the error rate has dropped substantially. It's a priority to solve it. --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 18,767,744 RAC: 43,954 |
:-( My fastest and best-working cruncher recently got the definitely dead tasks, which all errored out, and now it has a daily quota of 1. :-( Supporting BOINC, a great concept! |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
":-(" I had that. I created a new BOINC client on the same machine, on a different port, and attached it to CPDN. Works. --- CPDN Visiting Scientist |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"My fastest and best-working cruncher recently got the definitely dead tasks, which all errored out, and now it has a daily quota of 1. :-(" I am not going to do that. In the last 5 1/2 hours I have completed 38% of the work, so I will just wait it out. For one thing, it will be another day, and if there are any tasks left, I would get one. And if this one completes before that, they may raise my limit to 4 or 5 (I forget just what they do, but it is something like that). |
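The "38% in 5 1/2 hours" figure gives a quick estimate of the finish time that is independent of BOINC's own guess. A minimal sketch, assuming progress is roughly linear over the run (which may not hold exactly for a real model run); the function name is made up for this example:

```python
def estimate_from_progress(elapsed_hours, fraction_done):
    """Return (total_hours, remaining_hours), assuming linear progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours, total_hours - elapsed_hours


if __name__ == "__main__":
    # 5.5 hours elapsed with 38% of the work done, as in the post above.
    total, remaining = estimate_from_progress(5.5, 0.38)
    print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h")
```

With those numbers the task would take about 14.5 hours in total, leaving roughly 9 hours to go.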
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
This task failed with a divide by zero error. Presumably this is one of those cases where the physics of the model get out of control? |
©2024 cpdn.org