
Running repasthpc on slurm


Sophie Liu

Dear John,

 

Based on the zombie model, I extended the patch class and added other agent types to build my model. The code now works fine on my PC with four processes. I managed to get access to a high-performance computing cluster at our school. Since installing software on the cluster is not an option, I compiled the code against static libraries. Running in the slurm environment, the error message is as follows:

“mpirun noticed that process rank 0 with PID 41810 on node ec01b01 exited on signal 11 (Segmentation fault).”

 

At this moment, I'm not sure how to debug this problem, as the code runs smoothly on my PC (also linking static libraries, on Ubuntu with only MPI installed). Some of the printed output suggests the problem may lie in how the processes are managed.

 

I tried to print the total number of processes and the current process rank in runZombies(std::string propsFile, int argc, char ** argv) in main.cpp with std::cout << " Starting... " << world.rank() << ", " << world.size() << std::endl;

 

Results output from PC:

Starting... 0, 4

Starting... 1, 4

Starting... 2, 4

Starting... 3, 4

 

After compiling on my PC, running on slurm outputs:

Starting... 0, 1

Starting... 0, 1

Starting... 0, 1

Starting... 0, 1
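Four processes each reporting rank 0 and size 1 usually means each copy initialized MPI as an independent singleton, often because the mpirun that launches the job does not match the MPI library the binary was (statically) linked against. A few hypothetical checks, sketched here under the assumption that the cluster uses OpenMPI (as the `module load openmpi` line suggests), could be added to the job script to compare versions:

```shell
# Hypothetical diagnostics for the job script; adjust to the site's modules.
module load openmpi
which mpirun           # is this the cluster's OpenMPI, or a different MPI?
mpirun --version       # compare against the version the binary was built with
ompi_info | head -n 2  # reports the cluster's OpenMPI build details
```

If the versions differ from the build machine's, relinking against the cluster's own MPI is the usual fix.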

 

The script used to run on slurm is as follows:

#!/bin/bash

#SBATCH --job-name=zombie_mpi

#SBATCH --mail-user=[hidden email]

#SBATCH --mail-type=ALL            

#SBATCH --ntasks=4

#SBATCH --cpus-per-task=1         

#SBATCH --time=00:30:00

#SBATCH --mem-per-cpu=2048

#SBATCH --partition=debug

cd static_run_debug

module load openmpi

mpirun -n 4 ./zombie_model1 zombie1_config.props zombie1_model.props

 

In my case, why does running on slurm seem to start four processes that do not recognize each other? Is there any documentation on how to run a repasthpc model on slurm? Since I'm new to slurm, any suggestion is appreciated.

 

Regards,

Sophie


------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohodev2dev
_______________________________________________
Repast-interest mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/repast-interest

Re: Running repasthpc on slurm

srcnick
I'm not sure it applies directly to your school's setup, but our cluster, which uses slurm, doesn't require the -n option to mpirun.
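On clusters like that, the launch line can be sketched without an explicit process count, letting the Slurm allocation (--ntasks=4 in the script above) drive it. Whether mpirun or srun is the right launcher depends on how the site's MPI was built, which is an assumption worth verifying in the cluster docs:

```shell
# Variant launch lines for the same job script; pick one per the cluster docs.
# Let mpirun read the task count from the Slurm allocation:
mpirun ./zombie_model1 zombie1_config.props zombie1_model.props
# Or, where OpenMPI is built with Slurm/PMI support, launch via srun:
srun ./zombie_model1 zombie1_config.props zombie1_model.props
```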


It's probably worth poking around your school's cluster docs to see if they provide any info on running MPI jobs via slurm, in case some options are missing. As far as the cluster is concerned, repast_hpc is pretty much like any MPI-based multiprocess application.
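One way to test that point in isolation, sketched here independently of Repast HPC, is to build a trivial MPI program with the cluster's own compiler wrapper and confirm it sees all four ranks when launched the same way as the model:

```shell
# Hypothetical sanity check: compile a minimal MPI program on the cluster
# with its own toolchain, then launch it exactly like the model.
cat > mpi_check.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc mpi_check.c -o mpi_check
mpirun -n 4 ./mpi_check
```

If this also prints `rank 0 of 1` four times, the problem is in the MPI launch environment rather than in the model itself.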

Nick

On Jul 27, 2016, at 3:36 AM, Sophie Liu <[hidden email]> wrote:


