Questions and Answers : Macintosh : Signal 10, zip error
Author | Message |
---|---|
Send message Joined: 31 Aug 04 Posts: 2 Credit: 94,996 RAC: 0 |
Whenever a work unit comes close to completion, the work unit terminates with a client error. I have included the error text below. Does anybody know what this means? Thanks in advance! <core_client_version>4.13</core_client_version> <message>process got signal 10 </message> <active_task_state>3</active_task_state> <signal>10</signal> <stderr_txt> zip warning: Too many open files zip warning: could not open for reading: 2bjeba.ph35c10.x2.nc zip warning: zip file empty zip I/O error: Too many open files zip error: Temporary file failure (ziMFXzBx) </stderr_txt> |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've seen this "too many files open" before when Mac people have had trouble, but, apart from the obvious, I don't know what it means or how to fix it. And I don't recall anyone offering help. You could try <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=93"> this</a> thread, and go to the MacNN team site. Les |
Send message Joined: 12 Feb 05 Posts: 3 Credit: 143,853 RAC: 0 |
I have the same issue. My G5 always completes the unit but on the last trickle I get that error. I get the credit for the time but I don't get a valid result. It is getting annoying. |
Send message Joined: 12 Feb 05 Posts: 3 Credit: 143,853 RAC: 0 |
I wonder if it is the way the client is trying to zip the file and submit it after it is completed. I have many iterations of this error but it looks to me as if whatever my boinc_4.19_ppcG5 client is trying to do in terms of zipping is not working. Is this 4.19 client looking for a specific zip app somewhere? Here is an example of my error message. It is like Karl's. 4.19 process got signal 10 3 10 zip warning: Too many open files zip warning: could not open for reading: 24g4ba.ph34c10.x2.nc zip warning: zip file empty zip I/O error: Too many open files zip error: Temporary file failure (zixaciuv) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I don't know what it's trying to do. And the new BOINC version apparently crashes right at the start of a run for Macs. Les |
Send message Joined: 12 Feb 05 Posts: 3 Credit: 143,853 RAC: 0 |
Yes, mine does as well. I installed the 4.43 version and it downloads the packet but never starts it. I end up exceeding the number of units allowed per day. I tried resetting it on multiple days to no avail. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Rico, This problem will probably only be solved with a new version of hadsm. As Macs are only 2% of the computers running this project, and the programmer is up to his eyebrows with work, a cure may be months away. Les |
Send message Joined: 31 Jan 05 Posts: 5 Credit: 909,503 RAC: 296 |
> I wonder if it is the way the client is trying to zip the file and submit it
> after it is completed. I have many iterations of this error but it looks to me
> as if whatever my boinc_4.19_ppcG5 client is trying to do in terms of zipping
> is not working. Is this 4.19 client looking for a specific zip app somewhere?
>
> Here is an example of my error message. It is like Karl's.
>
> 4.19
> process got signal 10
>
> 3
> 10
>
> zip warning: Too many open files
> zip warning: could not open for reading: 24g4ba.ph34c10.x2.nc
> zip warning: zip file empty
> zip I/O error: Too many open files
>
> zip error: Temporary file failure (zixaciuv)

Exactly the same thing for me. This is from the terminal window that I saved when I completed my first project, about a month ago. At the end of a long list of adding files, this is how it ends:

adding: 1retba.ph32c10.x2.nc (deflated 8%)
adding: 1retba.ph33c10.x2.nc (deflated 8%)
adding: 1retba.ph34c10.x2.nc
2005-05-05 02:37:31 [climateprediction.net] Unrecoverable error for result 1ret_400103021_0 (process got signal 10)
2005-05-05 02:37:31 [climateprediction.net] Unrecoverable error for result 1ret_400103021_0 (process got signal 10)

and the stderr out says:

4.19
process got signal 10
3
10
zip warning: Too many open files
zip warning: could not open for reading: 1retba.ph34c10.x2.nc
zip warning: zip file empty
zip I/O error: Too many open files
zip error: Temporary file failure (ziFfyJtV)

The list of zipped files that are still sitting in the project's folder ends with 1retba.ph34c10.x2.nc.zip, but that file is empty (4 Kb). It is followed by 1retba.ph36c10.x2.nc (there is no .ph35c10). In the dataout folder, there is 1retaa.pc.8yac sitting on top, followed by what seems to be the next files in line for zipping: 1retba.ph37c10.x2.nc, 1retba.ph38c10.x2.nc etc.
It is clear that this is a systematic error in the zip procedure on the Mac, because it's always around the same file (###.ph34c10.x2.nc or ###.ph35c10.x2.nc) that things stop working. Is the procedure opening, but not closing, files as it zips along, hitting an upper limit on the number of open files? Going on the number of zip files produced, this must be around 240. It also surprises me that the zip procedure seemingly has deleted the original files before it was even done with all the work. Or is that because the files that were zipped were open when the error occurred, and were lost? On a more practical side: are our results still usable, without the two files that are missing (###.ph34c10.x2.nc and ###.ph35c10.x2.nc, in my case)? Could we send them manually? If all Mac users are experiencing this problem, then CP is losing 2% of its data, and lots of valuable CPU time. Personally, I don't care about getting credits, I care about supplying useful results to a useful project. If no solution is found for this problem, I could just as well stop participating, which I will now do as soon as my current project nears completion. Or I could at least make a copy of the original files in dataout before zipping starts. I'll also post a warning on the MacNN forum, maybe some geek over there can help the programmer solve this. |
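The open-but-not-closed hypothesis in the post above can be illustrated with a small sketch. This is not the project's zip code (which is not public); the function names `open_file_limit` and `zip_each_file` are hypothetical, and the per-file `.zip` layout is only inferred from file names such as 1retba.ph34c10.x2.nc.zip. On Unix-like systems, including Mac OS X, each process has a soft limit on open file descriptors, and a loop that never closes its handles eventually fails with "Too many open files". A default soft limit of 256 on the affected Macs would be in line with the estimate of around 240 files, but that figure is an assumption here, not something the thread confirms.

```python
import os
import resource  # Unix-only module; available on Mac OS X and Linux
import tempfile
import zipfile


def open_file_limit():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def zip_each_file(paths, out_dir):
    """Write one .zip per input file, closing every handle before moving on.

    The 'with' block guarantees the archive is closed after each file, so
    the number of open descriptors stays constant however long the list is.
    """
    archives = []
    for path in paths:
        name = os.path.basename(path)
        archive = os.path.join(out_dir, name + ".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(path, arcname=name)
        archives.append(archive)
    return archives


if __name__ == "__main__":
    print("open-file limit (soft, hard):", open_file_limit())
    with tempfile.TemporaryDirectory() as work:
        # Hypothetical stand-ins for the model's .nc output files.
        sources = []
        for i in range(3):
            path = os.path.join(work, "demo%d.nc" % i)
            with open(path, "w") as handle:
                handle.write("placeholder")
            sources.append(path)
        print(zip_each_file(sources, work))
```

Comparing the soft limit printed here with the number of files zipped before the failure would be one way to test the hypothesis on an affected machine.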
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Raw, It certainly needs someone looking at it. There have been lots of posts from people with a problem, who turn out to have this zip problem. Not being a Mac person, I've no idea what is normal about the number of zip files. I don't even know which is the problem: hadsm, or BOINC. Perhaps you should also post on the BOINC (SETI) site. Hope you get a cure. And please post back here if you do. Les |
Send message Joined: 18 Apr 05 Posts: 3 Credit: 7,245 RAC: 0 |
I have the exact same problem. Is anyone successfully finishing models on a Mac these days? Not sure how to check (browsed top hosts link and found none, but it was a quick look). How long has this problem been known? I fully understand that the overworked programmer can't prioritize the 2% apparently running Macs, but if it is not working, Climate Prediction should stop asking Mac users to contribute, or post a warning or something. Has anyone been in touch with them about this? Also, I have to say... if you ask a group to help you with your model and they volunteer their machines and then have major problems, you kind of have to set some time aside to help out (or you shouldn't have asked for help in the first place). Christian Hansson |
Send message Joined: 16 Mar 05 Posts: 2 Credit: 107,907 RAC: 0 |
> Raw
> It certainly needs someone looking at it. There have been lots of posts from
> people with a problem, who turn out to have this zip problem.
>
> Not being a Mac person, I've no idea what is normal about number of zip
> files.
> I don't even know which is the problem: hadsm, or BOINC.
>
> Perhaps you should also post on the BOINC (SETI), site.
> Hope you get a cure. And please post back here if you do.

It works fine with Seti Boinc, so I suspect that the problem is with the hadsm. I use Macs, and would be interested in debugging this problem. Is there a link to the source code for this particular bit? In particular the section wrapping up the files prior to sending them back. I suspect that manually duplicating the same process might reveal the issue.... Cheers, Joel |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Lupus, The hadsm is proprietary code belonging to the Met Office, so few people have access to it. It is probably something simple that could be fixed quickly if: there is/was a programmer familiar with the Mac; there is a modern Mac available for him to use; and he can justify the time to look, keeping in mind the huge size of the program and the possibility that it ISN'T a simple matter. It's possible that something changed in the Mac hardware/software a few models back, which is no longer compatible with the hadsm code. Les |
Send message Joined: 5 Aug 04 Posts: 11 Credit: 63,408 RAC: 0 |
Hey there, you should not worry too much about this error. Last November Carl Christensen posted in the thread below that this problem isn't really relevant to this project. No work or credit gets lost. Have a look here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1076 I hope that helps and explains why nobody seems to care to get this fixed. Bye! Christian |
Send message Joined: 31 Aug 04 Posts: 2 Credit: 94,996 RAC: 0 |
> Hey there,
>
> you should not worry too much about this error. Last November Carl Christensen
> posted in the thread below that this problem isn't really relevant to this
> project. No work or credit gets lost. Have a look here:
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1076

Quoting from Carl's response in that thread:

> As long as the final trickle (phase 3, timestep 259248) went through, and
> there are no "result#_[1-5].zip" files in your boinc/climateprediction.net
> directory, it's probably OK. I don't really use the BOINC status for credits,
> what "counts" are the trickles and the file uploads.

Yup, my last trickle did go through, so that answers most of my question. There's just one thing: What do I do with the old files from the completed-yet-not-completed work unit? |
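The second half of Carl's check, that no "result#_[1-5].zip" files remain in the project directory, can be sketched as a short script. This assumes "result#" stands for the result name shown in the BOINC messages (for example 1ret_400103021_0 from the log earlier in the thread); that reading, the function name `leftover_result_zips`, and the directory passed in are all assumptions rather than anything the project documents.

```python
import os
import re
import tempfile


def leftover_result_zips(project_dir, result_name):
    """List files named <result_name>_1.zip through <result_name>_5.zip.

    An empty list would match the "probably OK" case described in the
    quoted post, provided the final trickle also went through.
    """
    pattern = re.compile(re.escape(result_name) + r"_[1-5]\.zip")
    return sorted(
        name for name in os.listdir(project_dir) if pattern.fullmatch(name)
    )


if __name__ == "__main__":
    # Demo in a throwaway directory; on a real host you would pass the
    # path to your own boinc/climateprediction.net directory instead.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("1ret_400103021_0_1.zip", "1retba.ph34c10.x2.nc.zip"):
            open(os.path.join(demo_dir, name), "w").close()
        print(leftover_result_zips(demo_dir, "1ret_400103021_0"))
```

Per-file archives such as 1retba.ph34c10.x2.nc.zip do not match the pattern, so only the upload bundles Carl mentions would be reported.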
Send message Joined: 5 Aug 04 Posts: 11 Credit: 63,408 RAC: 0 |
> There's just one thing: What do I do with the old files from the
> completed-yet-not-completed work unit?

It is always a good idea to keep the files for as long as possible. I know from the pre-BOINC days of this project that only a fraction of the results produced are actually sent back at the end of the computations. It might be that the people from climateprediction.net consider your model very interesting and would like to take a closer look. Then it would be nice to have the files on your hd. I am not sure whether this is still true. :-/ Sorry! |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
> What do I do with the old files from the completed-yet-not-completed work unit?

You can put them onto CDs, then delete them from your computer if you need the room. Possibly safer there as well. It's what I've been doing. Any that haven't finished at least one phase are useless, if you have any like that. There was talk that the researchers may at some stage look through partially completed models to see why they failed, but they have so much else to do that it may never happen. Les |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Maybe fixing this "open files" problem will fix nearly all Trickle24 problems on most systems. When I had them (reproducible!) on a dual-CPU win2k system, one of my assumptions about the reason was exactly this, but the PC froze and gave no related error message. I could only see that there were 0x00 bytes in a bunch of results instead of data. In this case I'm even a bit happy about your errors, as there is a "good" error message for this problem now, which (hopefully) will lead to a really important bugfix. This one should have one of the highest priorities after having solved all database and server problems. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
> This one should have one of the highest priorities after having solved all database and server problems.

I agree with this. But how to push it? Have a look at <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_user.php?userid=20179"> this</a> user's results. I think he's given up. |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
> > This one should have one of the highest priorities after having solved
> > all database and server problems.
>
> I agree with this. But how to push it?
> ...

If this stuff is modular (and I bet it is), why not publish the source of those not science-related parts so everyone can help? I have sent in a few source changes for the BOINC client and PHP and they are working even without having compiled and tested them on my PC. So having just parts of the project sources in public might help here too. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's Fortran, which uses subroutines, so it IS modular. But it's a matter of the legal agreement between the Met Office and Oxford Uni. It might work if a few people agree to secrecy if they work on it. |