20

I need to find an ideal number of threads for a batch program that runs in a batch framework supporting parallel execution, such as parallel steps in Spring Batch.

As far as I know, running the steps of a program on too many threads is not good and can hurt performance. Several factors can cause the degradation: context switching, contention on shared resources (locking, synchronization), and so on. Are there any other factors?

Of course, the best way to find the ideal number of threads is to run real tests while adjusting the thread count. In my situation that is not easy, because the tests need many things (people, test scheduling, test data, etc.) that are too difficult for me to prepare right now. So, before running real tests, I want to know how to make the best possible estimate of the ideal number of threads for my program. What should I consider to get the ideal number of threads (steps)? The number of CPU cores? The number of processes on the machine my program will run on? The number of database connections? Is there a rational approach, such as a formula, for a situation like this?

3
  • 2
    Usually Runtime.getRuntime().availableProcessors(); will suffice.
    – Mordechai
    Feb 3, 2017 at 2:30
  • 5
    The most important consideration is whether your application/calculation is CPU-bound or IO-bound. If it's IO-bound (a single thread spends most of its time waiting for external resources such as database connections, file systems, or other external sources of data), then you can assign (many) more threads than the number of available processors. How many also depends on how well the external resource scales, though; local file systems probably do not scale that much. If it's (mostly) CPU-bound, then slightly over the number of available processors is probably best. Feb 3, 2017 at 2:35
  • @Erwin I would upvote this. Post it as an answer.
    – Mordechai
    Feb 3, 2017 at 2:39

3 Answers 3

35

The most important consideration is whether your application/calculation is CPU-bound or IO-bound.

  • If it's IO-bound (a single thread spends most of its time waiting for external resources such as database connections, file systems, or other external sources of data), then you can assign (many) more threads than the number of available processors. How many also depends on how well the external resource scales, though; local file systems probably do not scale that much.
  • If it's (mostly) CPU-bound, then slightly over the number of available processors is probably best.
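
As a rough illustration of the two cases, here is a minimal sketch that sizes one pool for CPU-bound work and one for IO-bound work. The class name, the "+1" margin, and the `threadsPerCore` multiplier of 4 are assumptions made for the example, not recommended values; the right multiplier depends on how well the external resource scales.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizingSketch {
    // CPU-bound: "slightly over" the core count; the +1 margin is an assumption.
    static int cpuBoundThreads(int cores) {
        return cores + 1;
    }

    // IO-bound: several threads per core; threadsPerCore is a hypothetical knob
    // that has to be tuned against the database, file system, etc.
    static int ioBoundThreads(int cores, int threadsPerCore) {
        return cores * threadsPerCore;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuBoundThreads(cores));
        ExecutorService ioPool = Executors.newFixedThreadPool(ioBoundThreads(cores, 4));

        System.out.println("cores             = " + cores);
        System.out.println("CPU-bound threads = " + cpuBoundThreads(cores));
        System.out.println("IO-bound threads  = " + ioBoundThreads(cores, 4));

        // Release the pools so the JVM can exit.
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```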
2
  • Thank you for the good answer. I have one more question about it. As you said, setting the number of threads to the number of available processors is probably best when the program is CPU-bound. Is this still valid if many other programs are running on the same machine as my program? I am asking about how the CPU works; I think the CPU is already busy even without my program running.
    – ParkCheolu
    Feb 3, 2017 at 8:14
  • 4
    @thatsyou: don’t waste time thinking about things that are outside of your control anyway. The point is, having fewer threads than cores implies that you can never utilize all cores. Having more threads than (available) cores just implies that the threads have to share the cores. Unless we’re talking about hundreds or thousands of threads per core, the scheduling overhead is negligible.
    – Holger
    Feb 3, 2017 at 17:00
8

General Equation:

Number of Threads <= (Number of cores) / (1 - blocking factor)

Where 0 <= blocking factor < 1
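
A small worked sketch of the equation above. It assumes that the blocking factor is the fraction of a task's time spent waiting (on IO, locks, and so on) rather than computing, estimated as wait time / (wait time + compute time); the answer itself does not define it, so treat that definition and the helper names as assumptions.

```java
class ThreadEstimate {
    // Upper bound from the equation: cores / (1 - blockingFactor).
    static int maxThreads(int cores, double blockingFactor) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        // Written this way so that NaN is rejected as well.
        if (!(blockingFactor >= 0.0 && blockingFactor < 1.0)) {
            throw new IllegalArgumentException("blocking factor must be in [0, 1)");
        }
        return (int) Math.floor(cores / (1.0 - blockingFactor));
    }

    // Assumed estimate: share of a task's wall-clock time spent blocked.
    static double blockingFactor(double waitMillis, double computeMillis) {
        return waitMillis / (waitMillis + computeMillis);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurement: a task waits 30 ms and computes for 10 ms.
        double bf = blockingFactor(30, 10);
        System.out.println("blocking factor = " + bf);
        System.out.println("max threads     = " + maxThreads(cores, bf));
    }
}
```

With a blocking factor of 0 (pure computation) the bound is simply the number of cores; as the factor approaches 1 the bound grows without limit, which is why the external resource usually becomes the real constraint.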

Number of cores on a machine: Runtime.getRuntime().availableProcessors()

To see how much parallelism is available to you, print the result of this call:

ForkJoinPool.commonPool()

The reported parallelism is the number of cores on your machine minus 1, because one core is left for the main thread.
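
A minimal sketch that prints both numbers side by side. The class and method names are made up for the example. Note that the default can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, so the output is not guaranteed to be cores minus 1 on every setup.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    // Target parallelism of the JVM-wide common pool.
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors    = " + cores);
        System.out.println("common pool parallelism = " + commonParallelism());
        // Printing the pool itself also shows its parallelism and current state.
        System.out.println(ForkJoinPool.commonPool());
    }
}
```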

Source link

Time : 1:09:00

1
  • 4
    What is blocking factor here or how do you calculate it? Oct 22, 2019 at 5:16
3

What should I consider to get the ideal number of threads (steps) of my program? The number of CPU cores? The number of processes on the machine my program will run on? The number of database connections? Is there a rational approach, such as a formula, for a situation like this?

This is tremendously difficult to do without a lot of knowledge of the actual code that you are threading. As @Erwin mentions, knowing whether the operations are IO-bound or CPU-bound is the key piece of information needed before you can determine whether threading an application will result in any improvement at all. Even if you did manage to find the sweet spot for your particular hardware, you might boot on another server (or a different instance of a virtual cloud node) and see radically different performance numbers.

One thing to consider is changing the number of threads at runtime. ThreadPoolExecutor.setCorePoolSize(...) is designed to be called after the thread pool is in operation. You could expose some JMX hooks to do this for you manually.
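
Here is a minimal sketch of resizing a pool while it is running. The `ResizablePool` wrapper is hypothetical; its `resize` method is the kind of operation a JMX hook could call. The order of the two setters matters because the core size must never exceed the maximum size.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePool {
    private final ThreadPoolExecutor executor;

    ResizablePool(int initialThreads) {
        // Fixed-size pool (core == max) with an unbounded work queue.
        executor = new ThreadPoolExecutor(
                initialThreads, initialThreads,
                60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    // Change the pool size while it is in operation.
    void resize(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        if (threads > executor.getMaximumPoolSize()) {
            // Growing: raise the ceiling first, then the core size.
            executor.setMaximumPoolSize(threads);
            executor.setCorePoolSize(threads);
        } else {
            // Shrinking (or unchanged): lower the core size first, then the ceiling.
            executor.setCorePoolSize(threads);
            executor.setMaximumPoolSize(threads);
        }
    }

    int size() {
        return executor.getCorePoolSize();
    }

    void submit(Runnable task) {
        executor.execute(task);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```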

You could also allow your application to monitor the application or system CPU usage at runtime and tweak the values based on that feedback. You could also keep AtomicLong throughput counters and dial the threads up and down at runtime trying to maximize the throughput. Getting that right might be tricky however.
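
To give an idea of the throughput-counter approach, here is a deliberately naive sketch. Worker threads call `taskDone()`, and a periodic monitor (not shown) drains the counter and asks `nextThreadCount` for the next pool size. The one-thread-at-a-time rule is an assumption made for illustration; a real controller would need smoothing to cope with noisy measurements.

```java
import java.util.concurrent.atomic.AtomicLong;

class ThroughputTuner {
    private final AtomicLong completed = new AtomicLong();

    // Called by worker threads each time a task finishes.
    void taskDone() {
        completed.incrementAndGet();
    }

    // Called once per sampling interval: returns the count and resets it to zero.
    long drain() {
        return completed.getAndSet(0);
    }

    // Naive rule: if throughput improved since the last sample, add a thread
    // (up to maxThreads); otherwise remove one (never going below 1).
    static int nextThreadCount(int current, long previousThroughput,
                               long currentThroughput, int maxThreads) {
        if (currentThroughput > previousThroughput) {
            return Math.min(current + 1, maxThreads);
        }
        return Math.max(current - 1, 1);
    }
}
```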

I typically try to:

  • make a best guess at a thread number
  • instrument the application so I can determine the effects of different numbers of threads
  • allow it to be tweaked at runtime via JMX so I can see the effects
  • make sure the number of threads is configurable (via a system property, maybe) so I don't have to re-release to try different thread numbers
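
For the last point, a minimal sketch of reading the thread count from a system property. The property name `batch.threads` and the class name are made up for the example; the fallback used here is the number of available processors.

```java
class ThreadConfig {
    // Returns the positive integer stored in the given system property,
    // or the fallback when the property is missing, unparseable, or <= 0.
    static int configuredThreads(String propertyName, int fallback) {
        Integer value = Integer.getInteger(propertyName); // null if absent or not a number
        return (value != null && value > 0) ? value : fallback;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Try it with: java -Dbatch.threads=12 ThreadConfig
        int threads = configuredThreads("batch.threads", cores);
        System.out.println("using " + threads + " threads");
    }
}
```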
