Message boards : Number crunching : Project communication failed
Author | Message |
---|---|
Send message Joined: 21 Sep 15 Posts: 8 Credit: 4,854,775 RAC: 0 |
I've been trying to upload for 2 days. The event log shows: 9/2/2016 9:17:32 AM | | Project communication failed: attempting access to reference site 9/2/2016 9:17:34 AM | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Models go to servers all over the world: England, North America, Mexico, and Australia. So you need to be more specific about which model type is having problems. If I had to guess, it may be Mexico, in which case there's a sticky post near the top of the Number crunching section called Uploading Mexico models. |
Send message Joined: 21 Sep 15 Posts: 8 Credit: 4,854,775 RAC: 0 |
I don't remember what the project was. I unfortunately had to abort after a week to clear the queue. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Indeed. 22 Sep was a bad day for that machine: nine crashes (no MEX tasks among them). However, one item in "stderr" suggests a likely problem. All nine have pages of: <stderr_txt> This indicates that the box to leave CPDN in memory when suspended is not ticked in your settings (the machine has 16 GB for 8 CPU threads). Sooner or later, all that swapping tends to bite CPDN because of the large number of incoming / outgoing files. We have long recommended leaving CPDN in memory when suspended. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 30 Aug 06 Posts: 27 Credit: 1,878,749 RAC: 1,231 |
I was wondering about those "Suspended CPDN Monitor - Suspend request from BOINC..." messages. Even with successful completions I get pages and pages of them, and I do have LAIM (leave applications in memory) checked. Why are they happening? |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Sorry, it's beyond my understanding. I was unaware of that problem with the box ticked. Anything I add would be pure conjecture. Hopefully, someone 'out there' knows that part of the BOINC code. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
There's a preference to suspend when CPU usage is above a certain percentage, or perhaps "while computer is in use", depending on the BOINC version. Either of these can cause a lot of suspends as well. |
Send message Joined: 29 Jun 15 Posts: 1 Credit: 99,611 RAC: 0 |
Hi all. I started 2 SAS50 jobs about 5.5 days ago. Several zip files have been sent back to the project servers with no obvious problem. However, 3 zips have been stalled from the beginning of the work, each retry stopping at the same percent progress (in one case, at a tantalising 98.14%). The client has made dozens of attempts at uploading these zips, backing off for roughly 1 to 4 hours between attempts. The log file has many entries reporting transient HTTP errors and project communication failures. Most zips get through with no error. The stalled zips suffer these errors several times every day. I have BOINC client 7.6.31 (x64) running on Mint Linux with kernel version 4.4.0-21-generic. Both of the SAS50 jobs are due to complete in less than 24 hours. I would like to know whether there's anything I can do to break the zip logjam, or whether there is a problem at the server end that needs to be fixed. Thanks |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Possibly a restart: Suspend BOINC. Wait a few seconds to allow the models to stop. Exit BOINC. Restart BOINC. Unsuspend BOINC. The models should restart from the previous checkpoint, and the uploads "may" start moving again. Models for different areas go to different servers, usually somewhere in the area being modelled. So it's always a good idea to give a link to the models in question, especially when there are lots of computers and lots of running models. I think that the SAMs go to South America, and the SAS to Africa, so different servers. |
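
For anyone who would rather do the restart from a terminal than from the BOINC Manager, here is a rough sketch of the same sequence in Python. It is only an illustration, not an official CPDN procedure. It assumes `boinccmd` is on the PATH and is allowed to talk to the local client (it normally needs the GUI RPC password, for example by being run from the BOINC data directory). The names `STOP_STEPS`, `RESUME_STEPS` and `run_steps` are made up for this example. How the client gets started again depends on the install, so that step is left to be done by hand.

```python
import subprocess
import time

# Steps before the client is restarted (assumed equivalent of
# "Suspend BOINC" and "Exit BOINC" in the Manager).
STOP_STEPS = [
    ["boinccmd", "--set_run_mode", "never"],  # suspend computation
    ["boinccmd", "--quit"],                   # tell the client to exit
]

# Steps after the client has been started again by hand.
RESUME_STEPS = [
    ["boinccmd", "--set_run_mode", "auto"],   # run according to preferences again
    ["boinccmd", "--network_available"],      # retry deferred network communication
    ["boinccmd", "--get_file_transfers"],     # show transfers that are still pending
]


def run_steps(steps, pause_after_first=0.0, dry_run=True):
    """Run (or, in dry-run mode, only print) each command in order.

    Returns the list of command lines, so the plan can be inspected.
    """
    issued = []
    for index, command in enumerate(steps):
        line = " ".join(command)
        issued.append(line)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(command, check=True)
            if index == 0 and pause_after_first > 0:
                # Give the models a few seconds to stop before exiting.
                time.sleep(pause_after_first)
    return issued


if __name__ == "__main__":
    run_steps(STOP_STEPS, pause_after_first=10)
    print("now start the BOINC client again, then:")
    run_steps(RESUME_STEPS)
```

By default nothing is executed; the commands are only printed. Pass `dry_run=False` once the printed plan looks right for your machine.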