Threads on batch runs on remote computer

Mike F Nelson
Hello,

I am running a large number of individual simulations over many cores on multiple nodes, and I’ve noticed that batch runs take longer to complete than I would expect from the average run time of the individual simulations. It seems that when a group of simulations, each with a particular combination of parameter values, is sent to the cores, the next group isn’t sent out until every simulation in the first group has completed.

The running times of my individual simulations are highly variable: most take only a minute or two, but some take up to about 15 minutes. Each group seems to take roughly 15 minutes to complete, even though most of the runs within it need only a few minutes.
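To put rough numbers on that observation, here is a minimal sketch of the arithmetic. It assumes that runs are dispatched in groups of one run per core and that the next group waits for the slowest run in the current one; the durations, the core count, and the function names are made up for illustration and are not taken from Repast.

```python
def batched_makespan(durations, cores):
    """Wall time when runs go out in groups of `cores` runs and each
    group must finish completely before the next group starts."""
    total = 0.0
    for start in range(0, len(durations), cores):
        group = durations[start:start + cores]
        total += max(group)  # the slowest run in the group sets its cost
    return total


def ideal_makespan(durations, cores):
    """Lower bound on wall time: the work spread perfectly evenly,
    but never less than the single longest run."""
    return max(sum(durations) / cores, max(durations))


# Hypothetical workload in minutes: 8 runs on 4 cores,
# with one 15-minute run landing in each group of 4.
durations = [2, 1, 15, 2, 1, 2, 2, 15]
print(batched_makespan(durations, 4))  # 30.0
print(ideal_makespan(durations, 4))    # 15
```

In this made-up case the grouped schedule takes twice as long as the lower bound, because three of the four cores sit idle for most of each group.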

I’m wondering whether there is a way for another simulation to begin as soon as an individual simulation finishes on a core, rather than waiting for every simulation in the group to finish. I guess this would involve submitting the individual runs asynchronously.
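As a generic illustration of what asynchronous submission can look like (not Repast's batch mechanism), the sketch below uses Python's standard `concurrent.futures` pool. `run_simulation`, the `seconds` field, and the durations are hypothetical stand-ins; a real sweep would launch the actual model, most likely as a separate process or through the cluster's scheduler.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_simulation(params):
    """Stand-in for one simulation run; `seconds` is a made-up duration."""
    time.sleep(params["seconds"])
    return params["run_id"]


def sweep(param_sets, workers):
    """Submit every run up front. The pool starts the next queued run
    as soon as any worker becomes free, so no worker waits on a group."""
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_simulation, p) for p in param_sets]
        for future in as_completed(futures):  # yields in completion order
            finished.append(future.result())
    return finished


# One long run (id 0) and five short ones, shared by two workers.
seconds = [0.3, 0.02, 0.02, 0.02, 0.02, 0.02]
param_sets = [{"run_id": i, "seconds": s} for i, s in enumerate(seconds)]
order = sweep(param_sets, workers=2)
print(order)  # the long run is expected to finish last
```

While one worker is busy with the long run, the other works through all of the short ones, which is the behaviour asked about here.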

Is this a general limitation with running large jobs on multiple nodes, or is there a way that I can stage my runs more efficiently?

I’m not very experienced with how queueing systems work, so please excuse me if there is a simple answer that I have overlooked.

Thank you for any suggestions you may have!

P.S. I came across this exchange on the group from 2014 which seems to address a similar question:

Message: 1
Date: Tue, 06 May 2014 08:59:18 -0400
From: Nick Collier <nick.collier@...>
Subject: Re: [Repast-interest] Questions re: Running Repast Batches on Remote Computers
To: Alexander S. Mentis <asmentis@...>
Cc: repast-interest@...
Message-ID: <639860BA-EAF4-4775-9F9E-BCEA216EC8C0@...>
Content-Type: text/plain; charset=us-ascii

Alex,

We are certainly looking into load balancing and smarter parameter sweeping. It's an exciting area. I also have ideas about how to avoid the need to keep the local GUI running. This essentially means automating Jonathan's clever workaround and launching it from the GUI, and then having a remote monitoring process send you an email when the runs are finished.

Nick

On May 5, 2014, at 9:31 AM, Alexander S. Mentis wrote:

> I understand. Thanks.
>
> On a related note, that makes it sound like all of the machines are given
> their parameter combination assignments at the outset of the batch run. In
> other words, if one instance gets "lucky" and gets all the parameter
> combinations that take a short amount of time to complete and a second
> instance gets all the long simulations, then that first instance will end up
> sitting idle, waiting for the second instance to finish. I guess I had
> assumed the thread running the batch monitoring was also dynamically
> assigning parameter configurations to each instance as it was ready for more
> work. Is there any possibility of seeing a workload balancing capability
> like this in the future? It seems that a feature like this could be somewhat
> related to a capability to stop and re-start the batch monitoring console.
>
> Alex
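The difference Alex describes, between handing out all parameter combinations at the outset and assigning them as each instance becomes ready, can be sketched with a small scheduling model. This is only an illustration under assumed durations; the function names are hypothetical and nothing here reflects how Repast actually partitions a batch.

```python
import heapq


def static_makespan(durations, instances):
    """Each instance receives a fixed, contiguous share of the runs
    before anything starts; the slowest instance sets the wall time."""
    share = -(-len(durations) // instances)  # ceiling division
    loads = [sum(durations[i:i + share])
             for i in range(0, len(durations), share)]
    return max(loads)


def dynamic_makespan(durations, instances):
    """Each run is handed to whichever instance becomes free first."""
    free_at = [0] * instances  # time at which each instance is next idle
    heapq.heapify(free_at)
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


# Hypothetical durations in minutes: a "lucky" first half, an "unlucky" second half.
durations = [1, 2, 1, 2, 15, 14, 15, 13]
print(static_makespan(durations, 2))   # 57
print(dynamic_makespan(durations, 2))  # 32
```

With the fixed split, the first instance is done after 6 minutes and idles while the second works for 57; with dynamic assignment both instances finish within a minute of each other.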
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Repast-interest mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/repast-interest

Re: Threads on batch runs on remote computer

srcnick
Mike,

It sounds like you are running on a cluster. If so, we have some new tools that, given a set of parameters, can iterate over that set, launching each run as soon as a previous run completes. There’s a tutorial at:


The first use case is a sweep over a Repast simulation, so hopefully that will help.

Nick

On Mar 1, 2017, at 7:41 PM, Mike Nelson <[hidden email]> wrote:

> [original message quoted in full above]