The following error appears in a jobid.error file:
jmesterh@cyblue:~/cpi> cat 92.error
<Feb 13 14:28:25.561417> BRIDGE (ERROR): pm_create_partition() - Partition state can be set to
"ALLOCATED" only if partition is "FREE"
<Feb 13 14:28:25.561515> BRIDGE (ERROR): pm_create_partition() - A sequence error occurred
<Feb 13 14:28:25.561551> BE_MPI (ERROR): Error booting partition - INCOMPATIBLE_STATE
<Feb 13 14:28:25.926125> FE_MPI (ERROR): Failure list:
<Feb 13 14:28:25.926318> FE_MPI (ERROR): - 1. Failed to boot the partition (failure #35)
This error occurs when the scheduler attempts to run a job on a partition that is already allocated to another user. This is a common bug in the scheduler and is being investigated by the Cobalt developers.
The solution is to resubmit the job. If you are submitting multiple jobs at once, wait at least one minute between submissions; this helps avoid the race condition that causes the error.
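If you submit batches of jobs from a script, the one-minute spacing can be automated. The following is a minimal sketch, not an official tool: the function name submit_spaced is made up for illustration, and it assumes your submission command is wrapped in a single executable (for example, a small script of your own that contains your usual cqsub command line).

```shell
#!/bin/sh
# submit_spaced DELAY_SECONDS SUBMIT_COMMAND INPUT...
# Runs SUBMIT_COMMAND once per INPUT, sleeping DELAY_SECONDS between
# submissions. SUBMIT_COMMAND is assumed to be a single word, such as
# a hypothetical wrapper script around your own cqsub command line.
submit_spaced() {
    delay=$1
    submit=$2
    shift 2
    first=1
    for input in "$@"; do
        # Pause before every submission except the first one.
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        # Report a failed submission but keep going with the rest.
        "$submit" "$input" || echo "submission failed for $input" >&2
    done
}
```

For example, `submit_spaced 60 ./submit_one.sh run1 run2 run3` would call the (hypothetical) wrapper submit_one.sh three times, one minute apart.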
The following error appears in a jobid.error file:
<Feb 13 13:19:25.800818> BE_MPI (ERROR): Job execution failed
<Feb 13 13:19:25.800902> BE_MPI (ERROR): Job 144 is in state ERROR ('E')
<Feb 13 13:19:26.005825> BE_MPI (ERROR): The error message in the job record is as follows:
<Feb 13 13:19:26.005857> BE_MPI (ERROR): "Load failed on 172.16.1.55: Error memory mapping
executable file: No such device"
<Feb 13 13:19:26.147276> FE_MPI (ERROR): Job execution failed (error code - 50)
<Feb 13 13:19:26.310313> BE_MPI (ERROR): Job 144 is in state ERROR ('E')
<Feb 13 13:19:26.310343> BE_MPI (ERROR): The job will be moved to history table after partition
deallocation
<Feb 13 13:19:26.311686> BE_MPI (ERROR): The error message in the job record is as follows:
<Feb 13 13:19:26.311715> BE_MPI (ERROR): "Load failed on 172.16.1.55: Error memory mapping
executable file: No such device"
<Feb 13 13:19:37.343037> FE_MPI (ERROR): Failure list:
<Feb 13 13:19:37.343083> FE_MPI (ERROR): - 1. Job execution failed - job switched to an
error state (failure #50)
This error means that your application cannot be found. Make sure you are in the directory containing your application when you submit the job, or that you have specified the correct working directory using the '-C' switch to cqsub.
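A quick check before submitting can catch this mistake early. The following is a minimal sketch under stated assumptions: the function name check_app is made up for illustration, and it only verifies that the file exists and is executable as seen from the machine where you run it.

```shell
#!/bin/sh
# check_app DIR EXE
# Succeeds only if DIR exists and DIR/EXE is a regular, executable file.
# check_app is a hypothetical helper, not part of cqsub or Cobalt.
check_app() {
    dir=$1
    exe=$2
    if [ ! -d "$dir" ]; then
        echo "working directory not found: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/$exe" ] || [ ! -x "$dir/$exe" ]; then
        echo "executable not found or not executable: $dir/$exe" >&2
        return 1
    fi
    return 0
}
```

One way to use it is to run `check_app "$HOME/cpi" myapp` first (myapp is a placeholder for your executable name) and submit with cqsub, passing the same directory to '-C', only if the check succeeds.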