39

I'm trying to find the fastest way to do batch inserts.

I tried inserting several batches with jdbcTemplate.update(String sql), where the SQL was built with a StringBuilder and looked like:

INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3)

Batch size was exactly 1000. I inserted nearly 100 batches. Using StopWatch, I measured the insert time:

min[38ms], avg[50ms], max[190ms] per batch
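The question doesn't show the builder code itself; a minimal sketch of how such a multi-row statement can be assembled with StringBuilder (the table and column names are from the question, the class and method names are mine). Note that splicing values into the SQL string like this is only safe for trusted numeric data; for anything user-supplied, parameter placeholders should be used instead:

```java
import java.util.List;

public class MultiRowInsertBuilder {

    // Builds "INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ..." for the given rows.
    static String buildInsert(List<int[]> rows) {
        StringBuilder sb = new StringBuilder("INSERT INTO TABLE(x, y, i) VALUES");
        for (int r = 0; r < rows.size(); r++) {
            int[] row = rows.get(r);
            if (r > 0) sb.append(", ");
            sb.append('(').append(row[0]).append(',')
              .append(row[1]).append(',')
              .append(row[2]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two rows, matching the shape shown in the question.
        System.out.println(buildInsert(List.of(new int[]{1, 2, 3}, new int[]{1, 2, 3})));
        // → INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3)
    }
}
```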

I was glad but I wanted to make my code better.

After that, I tried jdbcTemplate.batchUpdate, like this:

    jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
                       // ...
        }
        @Override
        public int getBatchSize() {
            return 1000;
        }
    });

where the SQL looked like

INSERT INTO TABLE(x, y, i) VALUES(1,2,3);

and I was disappointed! jdbcTemplate executed every single insert of the 1000-row batch separately. I looked at mysql_log and found a thousand separate inserts there. Using StopWatch, I measured the insert time:

min[900ms], avg[1100ms], max[2000ms] per Batch

So, can anybody explain why jdbcTemplate does separate inserts in this method? Why is the method named batchUpdate? Or am I using this method the wrong way?

  • 3
    I ran into a similar problem trying to use Spring Batch. I found that doing the inserts manually using a JDBC connection (PreparedStatement.addBatch(), calling executeBatch(), and committing every thousand records) was an order of magnitude faster than using Spring. I never really figured out why, though I suspected it had to do with how commits were being applied. I was using various batch sizes from 100 to 10000 in Spring.
    – reblace
    Dec 3, 2013 at 20:58
  • Might be related to the flush value of the underlying connection?
    – Taylor
    Dec 3, 2013 at 21:14
  • 4
    Are you using rewriteBatchedStatements=true in the JDBC connection string? The Spring documentation states that it "Will fall back to separate updates on a single PreparedStatement if the JDBC driver does not support batch updates." Apr 8, 2015 at 19:03
  • 7
    "&rewriteBatchedStatements=true" worked for me and improved performance.
    – Jeryl Cook
    May 7, 2015 at 16:52
  • 2
    This could be because auto-commit is true on the connection. If Spring transactions are used or the datasource has auto-commit off, this behaviour should not happen.
    – Krishna
    May 25, 2015 at 10:20

9 Answers

27

These parameters in the JDBC connection URL can make a big difference in the speed of batched statements --- in my experience, they speed things up:

?useServerPrepStmts=false&rewriteBatchedStatements=true

See: JDBC batch insert performance
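For reference, a complete MySQL connection URL carrying both flags might look like the sketch below (host, port, and database name are placeholders; rewriteBatchedStatements=true tells MySQL Connector/J to rewrite a batch of single-row INSERTs into multi-row statements on the wire):

```java
public class JdbcUrlExample {

    // Appends the two performance-related flags to a base MySQL JDBC URL.
    static String buildUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 3306, "mydb"));
        // → jdbc:mysql://localhost:3306/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true
    }
}
```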

  • This should be the accepted answer. For me, it improved performance 10-fold.
    – Garbage
    Jan 23, 2017 at 16:40
    @Community I was trying to use this for DB2, but I am getting a Connection Reset exception. If I remove this from the URL, everything works fine. Can you please tell me why I am getting this and how to fix it?
    – newcoder
    Jan 3, 2018 at 15:58
  • So what about PostgreSQL? Jun 28, 2018 at 10:31
  • For Postgres I found that the equivalent is to set prepareThreshold=0. But I'm not sure whether it hurts performance in some situations...
    – Philippe
    Jul 17, 2018 at 15:43
  • and for Oracle?
    – Saurabh
    Mar 14, 2019 at 19:35
26

I found a major improvement by setting the argTypes array in the call.

In my case, with Spring 4.1.4 and Oracle 12c, for insertion of 5000 rows with 35 fields:

jdbcTemplate.batchUpdate(insert, parameters); // Takes 7 seconds

jdbcTemplate.batchUpdate(insert, parameters, argTypes); // Takes 0.08 seconds!!!

The argTypes parameter is an int array where you set the SQL type of each field, like this:

int[] argTypes = new int[35];
argTypes[0] = Types.VARCHAR;
argTypes[1] = Types.VARCHAR;
argTypes[2] = Types.VARCHAR;
argTypes[3] = Types.DECIMAL;
argTypes[4] = Types.TIMESTAMP;
.....

I debugged org/springframework/jdbc/core/JdbcTemplate.java and found that most of the time was spent trying to determine the type of each field, and this was done for every record.

Hope this helps!
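A self-contained sketch of building such a type array for a hypothetical five-column table (the column names in the comments are mine, not from the answer; java.sql.Types is part of the JDK, while the batchUpdate call itself needs spring-jdbc on the classpath):

```java
import java.sql.Types;

public class ArgTypesExample {

    // Builds the JDBC type array for a hypothetical 5-column table; with explicit
    // types, JdbcTemplate no longer has to probe the driver for each value's type.
    static int[] buildArgTypes() {
        return new int[] {
            Types.VARCHAR,   // name
            Types.VARCHAR,   // city
            Types.VARCHAR,   // phone
            Types.DECIMAL,   // salary
            Types.TIMESTAMP  // hired_at
        };
    }

    public static void main(String[] args) {
        int[] argTypes = buildArgTypes();
        // The actual call (requires spring-jdbc):
        // List<Object[]> parameters = ...; // one Object[] per row, in column order
        // jdbcTemplate.batchUpdate(insert, parameters, argTypes);
        System.out.println(argTypes.length + " column types declared");
    }
}
```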

19

I also faced the same issue with the Spring JDBC template. Probably the statement was executed and committed on every insert, or in chunks, which slowed things down.

I replaced the jdbcTemplate.batchUpdate() code with plain JDBC batch insertion code and found a major performance improvement.

DataSource ds = jdbcTemplate.getDataSource();
Connection connection = ds.getConnection();
connection.setAutoCommit(false);
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;

for (Employee employee: employees) {

    ps.setString(1, employee.getName());
    ps.setString(2, employee.getCity());
    ps.setString(3, employee.getPhone());
    ps.addBatch();

    ++count;

    if(count % batchSize == 0 || count == employees.size()) {
        ps.executeBatch();
        ps.clearBatch(); 
    }
}

connection.commit();
ps.close();
connection.close();

Check this link as well: JDBC batch insert performance

    In my case, it reduced the time by half.
    – myuce
    Jan 11, 2018 at 11:03
  • Big performance boost for me as well (10x). For Oracle users, this seems to be the only option. @Transactional did not make any difference.
    – Saurabh
    Mar 14, 2019 at 19:34
  • @Saurabh which version of oracle db/driver did you encounter the slowdown issue?
    – yolob 21
    Apr 21, 2019 at 18:07
  • @yolob21 - Oracle 11g
    – Saurabh
    Apr 22, 2019 at 19:05
    The other topic of using a single commit for a large number of uncommitted batches needs to be explored too: if you have 100k items, and even if you keep executing ps.executeBatch() at regular intervals (say, every 1000 items) but the commit happens at the end for all the held-up statements in one go, that might still crash the app.
    – Sabir Khan
    Jul 2, 2019 at 9:42
11

Simply use a transaction. Add @Transactional to the method.

Be sure to declare the correct TX manager if you use several datasources: @Transactional("dsTxManager"). In my case, inserting 60000 records took about 15 s, with no other tweaks:

@Transactional("myDataSourceTxManager")
public void save(...) {
...
    jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {

            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                ...

            }

            @Override
            public int getBatchSize() {
                if(data == null){
                    return 0;
                }
                return data.size();
            }
        });
    }
  • Impressive, a 15-20x speedup in my case. Nov 3, 2017 at 17:23
  • 2
    I was using BatchPreparedStatementSetter and rewriteBatchedStatements=true in the connection URL as well, but the batch update was even slower than single update statements. As a last resort I gave the @Transactional annotation a shot, and it worked: the batch statement was 5 times faster. Can someone shed some light on why? I am really eager to know. Aug 24, 2018 at 13:07
7

Change your sql insert to INSERT INTO TABLE(x, y, i) VALUES(1,2,3). The framework creates a loop for you. For example:

public void insertBatch(final List<Customer> customers){

  String sql = "INSERT INTO CUSTOMER " +
    "(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";

  getJdbcTemplate().batchUpdate(sql, new BatchPreparedStatementSetter() {

    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        Customer customer = customers.get(i);
        ps.setLong(1, customer.getCustId());
        ps.setString(2, customer.getName());
        ps.setInt(3, customer.getAge() );
    }

    @Override
    public int getBatchSize() {
        return customers.size();
    }
  });
}

If you have something like this, Spring will do something like:

for(int i = 0; i < getBatchSize(); i++){
   execute the prepared statement with the parameters for the current iteration
}

The framework first creates a PreparedStatement from the query (the sql variable), then the setValues method is called and the statement is executed. That is repeated as many times as you specify in the getBatchSize() method. So the right way to write the insert statement is with only one VALUES clause. You can take a look at http://docs.spring.io/spring/docs/3.0.x/reference/jdbc.html

  • If you look at mysql_log you'll see the sequence of records: set auto_commit=0, insert into table(x,y,i) values(1,2,3), insert, more inserts and more inserts, commit, set autocommit=1. But it is NOT a "batch", it looks like a TRANSACTION. It is the SLOWEST way to do inserts. Is there any instrument which can create requests like "insert into t(x,y,i) values(),(),();"? Dec 4, 2013 at 21:44
  • 1
    While this is a good description of how to use the jdbcTemplate batch update, I don't see how this is any different than the original example.
    – netta
    Apr 12, 2016 at 7:31
  • @netta The OP executes INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3) a 1000 times, while he/she has to execute INSERT INTO TABLE(x, y, i) VALUES(1,2,3) 1000 times Apr 12, 2016 at 8:01
  • Your answer seems to be wrong in all ways. 1. Specifying multiple placeholder groups in one SQL query is a better optimization than simply sending multiple SQL queries to the DB server. 2. If the JDBC driver supports batch execution, then jdbcTemplate will never work the way you described; instead it will create a batch of SQL queries and send the whole batch to the database. See the source on GitHub; the batch update has not been changed since 2008. To understand better, read this stackoverflow.com/questions/47664889/… Nov 16, 2021 at 10:24
7

I also had a bad time with the Spring JDBC batch template. In my case it would be insane to use pure JDBC, so I used NamedParameterJdbcTemplate instead. This was a must-have in my project. But it was way too slow for inserting hundreds of thousands of rows into the database.

To see what was going on, I've sampled it with VisualVM during the batch update and, voilà:

[VisualVM screenshot showing where the time was spent]

What was slowing the process down was that, while setting the parameters, Spring JDBC was querying the database to learn the metadata of each parameter. It seemed to me that it was querying the database for each parameter of each row, every time. So I just told Spring to ignore the parameter types (as warned about in the Spring documentation on batch operations on a list of objects):

    @Bean(name = "named-jdbc-tenant")
    public synchronized NamedParameterJdbcTemplate getNamedJdbcTemplate(@Autowired TenantRoutingDataSource tenantDataSource) {
        System.setProperty("spring.jdbc.getParameterType.ignore", "true");
        return new NamedParameterJdbcTemplate(tenantDataSource);
    }

Note: the system property must be set before creating the JdbcTemplate object. It might be possible to just set it in application.properties, but this solved it and I have never touched it since.
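Per the Spring documentation, the same switch can alternatively live in a spring.properties file at the root of the classpath instead of being set as a JVM system property:

```properties
# spring.properties (classpath root)
spring.jdbc.getParameterType.ignore=true
```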

5

I don't know if this will work for you, but here's a Spring-free way that I ended up using. It was significantly faster than the various Spring methods I tried. I even tried using the JDBC template batch update method the other answer describes, but even that was slower than I wanted. I'm not sure what the deal was and the Internets didn't have many answers either. I suspected it had to do with how commits were being handled.

This approach is just straight JDBC using the java.sql packages and PreparedStatement's batch interface. This was the fastest way that I could get 24M records into a MySQL DB.

I more or less just built up collections of "record" objects and then called the below code in a method that batch inserted all the records. The loop that built the collections was responsible for managing the batch size.

I was trying to insert 24M records into a MySQL DB and it was going ~200 records per second using Spring Batch. When I switched to this method, it went up to ~2500 records per second. So my 24M record load went from a theoretical 1.5 days to about 2.5 hours.

First create a connection...

Connection conn = null;
try{
    Class.forName("com.mysql.jdbc.Driver");
    conn = DriverManager.getConnection(connectionUrl, username, password);
}catch(SQLException e){}catch(ClassNotFoundException e){}

Then create a prepared statement and load it with batches of values for insert, and then execute as a single batch insert...

PreparedStatement ps = null;
try{
    conn.setAutoCommit(false);
    ps = conn.prepareStatement(sql); // sql = "INSERT INTO TABLE(x, y, i) VALUES(?,?,?)"
    for(MyRecord record : records){
        try{
            ps.setString(1, record.getX());
            ps.setString(2, record.getY());
            ps.setString(3, record.getI());

            ps.addBatch();
        } catch (Exception e){
            ps.clearParameters();
            logger.warn("Skipping record...", e);
        }
    }

    ps.executeBatch();
    conn.commit();
} catch (SQLException e){
} finally {
    if(null != ps){
        try {ps.close();} catch (SQLException e){}
    }
}

Obviously I've removed the error handling, and the query and the Record object are notional.

Edit: Since your original question was comparing the insert into foobar values (?,?,?), (?,?,?)...(?,?,?) method to Spring batch, here's a more direct response to that:

It looks like your original method is likely the fastest way to do bulk data loads into MySQL without using something like the LOAD DATA INFILE approach. A quote from the MySQL docs (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

You could modify the Spring JDBC Template batchUpdate method to do an insert with multiple VALUES specified per 'setValues' call, but you'd have to manually keep track of the index values as you iterate over the set of things being inserted. And you'd run into a nasty edge case at the end when the total number of things being inserted isn't a multiple of the number of VALUES lists you have in your prepared statement.

If you use the approach I outline, you could do the same thing (use a prepared statement with multiple VALUES lists) and then when you get to that edge case at the end, it's a little easier to deal with because you can build and execute one last statement with exactly the right number of VALUES lists. It's a bit hacky, but most optimized things are.
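The bookkeeping described above reduces to plain arithmetic: with N rows and R VALUES lists per statement, you execute N / R full statements and, if N % R != 0, one final statement with exactly N % R lists. A runnable sketch (the class and method names are mine; the actual prepare/bind/execute calls are indicated in comments):

```java
public class MultiValuesBatching {

    // Builds "INSERT INTO TABLE(x, y, i) VALUES (?,?,?), (?,?,?), ..." with `rows` placeholder groups.
    static String buildSql(int rows) {
        StringBuilder sb = new StringBuilder("INSERT INTO TABLE(x, y, i) VALUES ");
        for (int r = 0; r < rows; r++) {
            if (r > 0) sb.append(", ");
            sb.append("(?,?,?)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int totalRows = 24;          // rows to insert
        int rowsPerStatement = 10;   // VALUES lists per prepared statement
        int fullStatements = totalRows / rowsPerStatement;  // full multi-row statements
        int tailRows = totalRows % rowsPerStatement;        // rows left for the final statement

        String mainSql = buildSql(rowsPerStatement);
        String tailSql = tailRows > 0 ? buildSql(tailRows) : null;
        // Prepare mainSql once and execute it fullStatements times, binding
        // rowsPerStatement * 3 parameters each; then prepare tailSql once for the rest.
        System.out.println(fullStatements + " full statements + " + tailRows + " tail rows");
    }
}
```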

  • Probably with Spring Batch the statement was executed and committed on every insert, or in chunks, and that slowed things down. Here you have only one commit at the end. +1 Dec 3, 2013 at 21:36
  • Unfortunately, preparedStatement.executeBatch() gets the same result; each insert was invoked separately. Dec 4, 2013 at 21:50
1

The solution given by @Rakesh worked for me: a significant improvement in performance. Earlier it took 8 minutes; with this solution it takes less than 2 minutes.

(The code is identical to the plain-JDBC batch insertion code in the answer above.)
  • which database was this for?
    – Sameer
    Jan 4, 2021 at 16:32
0

I encountered a serious performance issue with JdbcBatchItemWriter.write() (link) from Spring Batch and found that the write logic eventually delegates to JdbcTemplate.batchUpdate().

Adding the Java system property spring.jdbc.getParameterType.ignore=true fixed the performance issue entirely (from 200 records per second to ~5000). The fix was tested on both PostgreSQL and MS SQL (it might not be dialect-specific).

... and, ironically, Spring documents this behaviour in a "Note" section (link):

In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).

Alternatively, you might consider specifying the corresponding JDBC types explicitly, either through a 'BatchPreparedStatementSetter' (as shown earlier), through an explicit type array given to a 'List<Object[]>' based call, through 'registerSqlType' calls on a custom 'MapSqlParameterSource' instance, or through a 'BeanPropertySqlParameterSource' that derives the SQL type from the Java-declared property type even for a null value.
