Is there a problem with HADAM3P Jobs on some AMD Processors?
Author | Message |
---|---|
Joined: 3 Sep 09 Posts: 5 Credit: 509,410 RAC: 0 |
Two of my WinXP machines running AMD 64 3500+ processors are generating lots of errors with these jobs: 10 of the last 12 jobs ended in error on one (mostly with 0 credits) and 4 out of the last 6 on the other (all with 0 credit). One machine using a newer AMD 64 processor with Windows Vista and one using an Intel processor with WinXP aren't having problems. Are there known problems associated with older AMD 64 processors? Or have I just hit a stream of bad luck? (Lots of CPU cycles wasted in any event...) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The problem is more likely to be that those 2 computers (and your Intels) are light on memory. You really need a minimum of 2 gigs to ensure that all of the processes have enough to co-exist without having to swap out to the hd all of the time. And, judging by the number of BOINC suspends, you have them set to the default of 'Suspend work if CPU usage is above', so that the models are constantly being stopped and started. You can get away with this for a while, but sooner or later this stopping/starting is going to coincide with a critical point in the program and it's going to pack it in. They are supercomputer programs after all, and not designed for this behaviour. Backups: Here |
Joined: 3 Sep 09 Posts: 5 Credit: 509,410 RAC: 0 |
"The problem is more likely to be that those 2 computers (and your Intels) are light on memory. You really need a minimum of 2 gigs to ensure that all of the processes have enough to co-exist without having to swap out to the hd all of the time." The two machines in question have single processor/single core CPUs and 1GB of memory. They are dedicated BOINC machines and have no other processes running except when I log in to see what is going on. The task manager always indicates at least 200kB of free memory -- usually more. "And, judging by the number of BOINC suspends, you have them set to the default of 'Suspend work if CPU usage is above', so that the models are constantly being stopped and started." I don't understand where you are getting that information -- both machines have the "suspend if work is above..." set to 0 (which means no restriction, as I understand it...). The machine info on this web site indicates that one is running BOINC 99.8046% of the time and the other 99.7719%. "You can get away with this for a while, but sooner or later this stopping/starting is going to coincide with a critical point in the program and it's going to pack it in. They are supercomputer programs after all, and not designed for this behaviour." Maybe so, but this stuff is being farmed out to volunteers running PCs, NOT supercomputers... |
Joined: 3 Sep 09 Posts: 5 Credit: 509,410 RAC: 0 |
"You really need a minimum of 2 gigs to ensure that all of the processes have enough to co-exist without having to swap out to the hd all of the time." If that's true, then the Technical Requirements page should be updated - it indicates that the minimum memory for WinXP systems is 256MB (!) -- and does not even mention Windows Vista or Windows 7! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"The two machines in question have single processor/single core CPUs and 1GB of memory." That's not what is shown here. "They are dedicated BOINC machines and have no other processes running" Except OS related, antivirus, etc. And when you look at what the models are doing, this will use ram, which will suddenly decrease what's available for the modelling. Do they have a separate graphics card with its own display memory, or is it an on-board chip using some of the main memory? "I don't understand where you are getting that information" I'm getting it from the error messages of your models. Here, for instance. Click on the + alongside stderr. "Maybe so, but this stuff is being farmed out to volunteers running PCs, NOT supercomputers..." And most people have no problems in running the models on pcs. :) But whatever is wrong, it's NOT an AMD 'thing'. The Technical Requirements page was written way back when the only models available were the original (slab) models. I'll add 'an update needed' to the work load of the project people. Backups: Here |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
You say that no other processes are running on those PCs while cpdn is. Do they have anti-spyware/anti-virus checkers running? If so, what type of software is it? A bunch of the failed results have error messages related to a file being open while being written to. |
Joined: 3 Sep 09 Posts: 5 Credit: 509,410 RAC: 0 |
"Maybe so, but this stuff is being farmed out to volunteers running PCs, NOT supercomputers..." "And most people have no problems in running the models on pcs. :)" And I am only having problems on two of the six machines (under two different user IDs, in case you check and tell me that I am wrong...) attached to this project. So, if they keep erroring out, I will simply detach them and let them work on other projects that they can handle OK. Case closed. |
©2024 cpdn.org