Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
I found my laptops had switched on their turbo-props. The Task Manager going haywire. The temperatures on cloud nine. Checked the Task Manager: I had fifteen WUs running on a twelve-thread machine. Switched off VirtualBox, still twelve tasks? Opened up BOINC Manager and found twenty-three Windows WUs. Managed to mark the project "No further tasks" just in time. Well, I suspended the Windows tasks because the task in my VB is at 92% and has already errored out several times. Almost certainly not the only one. Lots of Windows machines have been waiting for work for a long time. This afternoon, 6,360 tasks were released. (They have now all gone.) The large number of tasks you downloaded is due to your BOINC settings. It is a problem mostly due to the nature of this project currently having only sporadic work for Windows. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I know there are no OpenIFS work units for us to process right now, but once they become available, will I automatically get some, or will I need to find out they are available and ask for them? I think I have enough RAM and processing power to run at least one of them at a time. I have both the standard 64-bit libraries and enough of the 32-bit compatibility libraries to run my other ClimatePrediction work. CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16; Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.4 (Ootpa) [4.18.0-305.7.1.el8_4.x86_64|libc 2.28 (GNU libc)]; BOINC version: 7.16.11; Memory: 62.4 GB; Cache: 16896 KB; Swap space: 15.62 GB; Total disk space: 117.21 GB; Free disk space: 93.45 GB; Measured floating point speed: 6.58 billion ops/sec; Measured integer speed: 31.66 billion ops/sec |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
I know there are no OpenIFS work units for us to process right now, but once they become available, will I automatically get some, or will I need to find out they are available and ask for them? I think I have enough RAM and processing power to run at least one of them at a time. I have both the standard 64-bit libraries and enough of the 32-bit compatibility libraries to run my other ClimatePrediction work. With 64GB of RAM before some is nicked for video etc, you would have no problems running a few of them at once. I think from memory the testing ones wouldn't get sent out to a machine with less than 5 or 6GB of RAM. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Right now, I do not seem to be using any RAM to speak of. I would run more CPDN once they start downloading again. But will I need to do anything once those new tasks become available, or will they just start coming? top - 16:55:34 up 3 days, 8:48, 1 user, load average: 8.68, 8.52, 8.49 Tasks: 448 total, 9 running, 439 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.2 us, 0.1 sy, 49.5 ni, 50.1 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st MiB Mem : 63902.3 total, 4001.0 free, 6087.4 used, 53813.9 buff/cache MiB Swap: 15992.0 total, 15972.5 free, 19.5 used. 57006.1 avail Mem PID PPID USER PR NI S RES SHR %MEM %CPU P TIME+ COMMAND 140656 140645 boinc 39 19 R 1.3g 19764 2.1 99.6 3 1318:44 /var/lib/boinc/projects/cli+ 334619 140341 boinc 39 19 R 946728 87800 1.4 99.8 2 184:49.64 ../../projects/boinc.bakerl+ 327310 140341 boinc 39 19 R 567364 76884 0.9 94.6 1 269:07.99 ../../projects/boinc.bakerl+ 346508 140341 boinc 39 19 R 350140 70540 0.5 99.5 4 38:59.93 ../../projects/boinc.bakerl+ 347621 140341 boinc 39 19 R 321620 55712 0.5 99.5 6 23:06.11 ../../projects/www.worldcom+ 343562 140341 boinc 39 19 R 153360 2676 0.2 99.7 5 73:17.28 ../../projects/www.worldcom+ 348671 140341 boinc 39 19 R 141852 49808 0.2 99.6 15 7:49.47 ../../projects/www.worldcom+ 349089 140341 boinc 39 19 R 101544 2668 0.2 99.6 0 6:25.71 ../../projects/www.worldcom+ 140341 1 boinc 30 10 S 34452 17716 0.1 0.0 14 21227:08 /usr/bin/boinc 140645 140341 boinc 39 19 S 18176 16892 0.0 0.0 13 1:50.86 ../../projects/climatepredi+ |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
We have no information about that. It depends on how the researchers / project people decide to do things. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Just had two testing tasks for Windows which may herald a new batch, but don't hold your breath: they were going to take 32 days and they both crashed. I am currently waiting for someone else on the testing site to demonstrate either that it is a problem with my BOINC running under WINE in VB or that it is a problem with the tasks. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Issue causing first batch to crash in 2 minutes (at point where it switches from global to regional model on first model day) fixed. Another tester is estimating about 20 days. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
First of my #920s crashed at the end with file size limit exceeded. I am going to fiddle and increase the limit so second will succeed. I have alerted the project. Edit: I have edited <max_nbytes>150000000.000000</max_nbytes> to <max_nbytes>600000000.000000</max_nbytes> for 4.zip on my second task. I have also turned off suspended internet access so when it gets there I can check the file size to see if the first was a one off. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
I have four 920s 'in flight' - the first couple just passed 70%. I can check the allowances now, and increase any that look low. Are you sure that it was the _4 zip that went over? Usually, all the zips are about the same size - but only the ones still active when the task finishes trip the size check. Perhaps I can trap the _3 zip at 75% and take a look. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Are you sure that it was the _4 zip that went over? Usually, all the zips are about the same size - but only the ones still active when the task finishes trip the size check. Perhaps I can trap the _3 zip at 75% and take a look. Mon 25 Oct 2021 22:30:46 BST | climateprediction.net | Output file hadam4h_h02w_200802_4_920_012115322_0_r75796790_4.zip for task hadam4h_h02w_200802_4_920_012115322_0 exceeds size limit. Seems indicative. I see that four successes are now showing for this batch so mine may have been an outlier. I had wondered if it was why no successes were showing up but I guess I was early enough in getting some that no one else was fast enough to finish before mine failed. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
OK, I see the lie of the land - six output files, four zips, an out, and a restart. All given a limit of 150 MB (decimal), 143 MB (binary). Hopefully this afternoon... |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
OK, I see the lie of the land - six output files, four zips, an out, and a restart. All given a limit of 150 MB (decimal), 143 MB (binary). Hopefully this afternoon... I think I will do a search and replace on the limits, even though most seem OK. |
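The search and replace mentioned above could be sketched roughly as follows. This is only an illustration, assuming the limits appear as `<max_nbytes>` elements in the form quoted earlier in the thread; the function name `raise_limits` and the sample XML are hypothetical, and which file the client keeps these entries in (and whether the client must be stopped before editing it) is an assumption to verify on your own installation.

```python
import re

# Matches a <max_nbytes> element and captures its numeric value.
MAX_NBYTES = re.compile(r"<max_nbytes>\s*([0-9.]+)\s*</max_nbytes>")


def raise_limits(xml_text: str, new_limit: float = 600_000_000.0) -> str:
    """Return xml_text with every <max_nbytes> below new_limit raised to it.

    Hypothetical helper: it works on a string only and never touches a file.
    """

    def _swap(match: re.Match) -> str:
        current = float(match.group(1))
        if current >= new_limit:
            # Already generous enough; leave the original text untouched.
            return match.group(0)
        # Keep the six-decimal format seen in the thread.
        return f"<max_nbytes>{new_limit:.6f}</max_nbytes>"

    return MAX_NBYTES.sub(_swap, xml_text)


# Illustrative input only; the file name here is made up.
sample = (
    "<file_info><name>example_4.zip</name>"
    "<max_nbytes>150000000.000000</max_nbytes></file_info>"
)
print(raise_limits(sample))
```

Working on a copy of the text first, and comparing it with the original before writing anything back, seems the safer way to try this.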
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
The four finished ones were mine. Looking back at the message log, I didn't see any error messages during upload or completion. Maybe I got lucky? |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
OK, here they come - and we seem to be in that horrible corridor of uncertainty. hadam4h_h15a_201102_4_920_012116704_0_r395361359_3.zip first: BOINC Manager (transfers tab) says that it is 147.75 MB, which at first sight would be OK. Linux says that it's 154.9 MB, and the file size property is said to be 154,924,888 bytes. That's not OK - if BOINC was checking these intermediate file sizes, that would be rejected. hadam4h_h1i6_201202_4_920_012117168_0_r150029775_3.zip is a little smaller - 147.12 MB (BOINC), 154.3 MB (Linux file manager), 154,271,614 bytes (file size property). So the project needs to be careful in internal communications: is a megabyte 1,000,000 bytes (as hard disk manufacturers would have you believe), or 1,048,576 bytes (1,024 x 1,024 bytes), as RAM manufacturers would have you believe? I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished? |
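The two readings of "MB" can be checked with a little arithmetic. The sketch below uses the byte counts quoted in the post and the 150,000,000 limit from the `<max_nbytes>` value earlier in the thread; that the client compares the raw byte count against that value is an assumption, and the helper name `describe` is made up for illustration.

```python
MB = 1_000_000    # decimal megabyte
MIB = 1_048_576   # binary megabyte (1,024 x 1,024 bytes)
LIMIT_BYTES = 150_000_000  # from the <max_nbytes> value quoted in the thread


def describe(size_bytes: int) -> dict:
    """Report a size in both units and whether it exceeds the byte limit."""
    return {
        "decimal_mb": round(size_bytes / MB, 2),
        "binary_mib": round(size_bytes / MIB, 2),
        # Assumption: the check is a plain comparison of byte counts.
        "over_limit": size_bytes > LIMIT_BYTES,
    }


for size in (154_924_888, 154_271_614):
    print(size, describe(size))

# The limit itself, expressed in binary megabytes.
print(round(LIMIT_BYTES / MIB, 2))
```

On these numbers, both files read as under 150 in binary megabytes yet over the 150,000,000-byte figure, which is the corridor of uncertainty described above.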
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished? I believe that is correct. We were concerned about slow uploads, or people who suspended BOINC comms until the end with several large uploads queued, or problems if an upload server was down for quite a while. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
I have a few 920s running. Should I abort them and lose a week's work now or let them fail at the end and lose a month's work? What is this catch and set a new limit you guys are talking about? Is that something we civilians can do? |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished? I am on a fast Internet connection here in the USA and my most recent uploads seem to take 20 seconds or a little more. Mon 25 Oct 2021 02:20:15 PM EDT | climateprediction.net | Started upload of hadam4h_h1bc_200602_4_920_012116922_0_r1467636988_3.zip Mon 25 Oct 2021 02:20:35 PM EDT | climateprediction.net | Finished upload of hadam4h_h1bc_200602_4_920_012116922_0_r1467636988_3.zip Mon 25 Oct 2021 04:06:11 PM EDT | climateprediction.net | Started upload of hadam4h_h0h8_201002_4_920_012115838_0_r905931088_3.zip Mon 25 Oct 2021 04:06:30 PM EDT | climateprediction.net | Finished upload of hadam4h_h0h8_201002_4_920_012115838_0_r905931088_3.zip Mon 25 Oct 2021 04:54:12 PM EDT | climateprediction.net | Started upload of hadam4h_h14m_201002_4_920_012116680_0_r1942181916_3.zip Mon 25 Oct 2021 04:54:37 PM EDT | climateprediction.net | Finished upload of hadam4h_h14m_201002_4_920_012116680_0_r1942181916_3.zip Mon 25 Oct 2021 07:13:33 PM EDT | climateprediction.net | Started upload of hadam4h_h0c7_200602_4_920_012115657_0_r77250837_3.zip Mon 25 Oct 2021 07:13:53 PM EDT | climateprediction.net | Finished upload of hadam4h_h0c7_200602_4_920_012115657_0_r77250837_3.zip I do not seem to be using a lot of available RAM. $ df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb3 122908728 21114208 95528048 19% /var/lib/boinc |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Arum, the trick with these is to stagger the completion times. Suspend all but one, give it an hour's head start, resume one and wait another hour, and so on. That way all of the files won't get bunched up waiting for a turn to upload. And make sure that nothing else wants to use your net connection at an upload time. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"The trick with these is to stagger the completion times." I already decided that I'm only going to run one CP WU per computer. So I've already got that covered. "And make sure that nothing else wants to use your net connection at an upload time." Now I'm confused. I thought the error under discussion is: "Output file hadam4h_h02w_200802_4_920_012115322_0_r75796790_4.zip for task hadam4h_h02w_200802_4_920_012115322_0 exceeds size limit." Now instead of exceeding a file size you're talking about how many files are being uploaded at the same time. I'm now running 3,201 WUs of various projects so that will be next to impossible. One of these options in one's cc_config file may be useful: <max_file_xfers>32</max_file_xfers> <max_file_xfers_per_project>32</max_file_xfers_per_project> |
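For anyone wanting to try the two transfer options above, here is a minimal sketch that builds the configuration text. The `<cc_config>` and `<options>` wrapper elements are an assumption about the usual layout of that file and are not taken from the post; the function name `build_cc_config` is hypothetical, and the value 32 is simply the one quoted. These options govern how many transfers run at once, not the output file size limit.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers: int = 32, max_xfers_per_project: int = 32) -> str:
    """Return cc_config-style XML text carrying the two transfer options."""
    # Assumption: options sit inside <cc_config><options>...</options></cc_config>.
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_xfers_per_project
    )
    # Serialise to a plain string; nothing is written to disk here.
    return ET.tostring(root, encoding="unicode")


print(build_cc_config())
```

The function only returns a string, so any existing configuration stays untouched until the result is merged in by hand.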
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
When a task finishes, it produces a large zip file, an "out" file, and a "restart" file (which contains the data needed to start the next task in the series, if the researcher is going to continue with that task). Together these add up to just enough more data than the plain zips along the way, and this can tip things over the limit. But they are created at slight time intervals, which should be long enough for the zip, created first, to get out of the way before the others show up. And one WU per computer can still mean that they all finish at the same time. |
©2024 cpdn.org