Sunday, January 13, 2008

Hibernate vs JDBC Performance

For my first blog entry ever, I thought I'd look at the performance of Hibernate. I was interested to know how much overhead Hibernate adds over straight JDBC, hoping to allay fears about the impact on system performance.

I've run a number of tests (see description and results below) comparing the equivalent code in JDBC and Hibernate. For insertion and update operations, the overhead of Hibernate is not really significant. However, for reads/queries, Hibernate does account for a significant amount of the cost of retrieval. I'm not sure why yet and hope to do some profiling to determine the root cause.

I have to admit that the overhead of using Hibernate was larger than I was expecting (I thought it would be sub-millisecond). Still, for most applications I doubt this will prove a significant issue, with the increased development productivity more than making up for the drop in performance. Also, my PC is relatively dated now - Pentium 1.73GHz with 1 GB of memory.

The test used a database schema for a very simple running log (yes, I've been known to do the odd bit of exercise). The details of a run (distance, heart rate, etc.) are recorded against a particular runner, giving a simple one-to-many mapping from a Runner entity to a Run entity. I've put the source code here if anybody wants to try and reproduce my findings. I tested using the Derby database, but it would obviously be easy to adapt to a different database.
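For readers who want a picture of the model without downloading the source, here is a minimal sketch of a one-to-many Runner/Run pair in plain Java. The class name RunningLog and the fields (name, distanceKm, averageHeartRate) are assumptions made for illustration and are not taken from the actual test code; the Hibernate mapping itself is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical domain model for the running log; all names are assumptions.
public class RunningLog {

    /** One recorded run: the "many" side of the mapping. */
    public static class Run {
        public final double distanceKm;
        public final int averageHeartRate;
        public Runner runner; // back-reference to the owning runner

        public Run(double distanceKm, int averageHeartRate) {
            this.distanceKm = distanceKm;
            this.averageHeartRate = averageHeartRate;
        }
    }

    /** A runner: the "one" side, owning a list of runs. */
    public static class Runner {
        public final String name;
        private final List<Run> runs = new ArrayList<>();

        public Runner(String name) {
            this.name = name;
        }

        /** Adds a run and keeps both sides of the association in sync. */
        public void addRun(Run run) {
            run.runner = this;
            runs.add(run);
        }

        public List<Run> getRuns() {
            return Collections.unmodifiableList(runs);
        }
    }

    public static void main(String[] args) {
        Runner runner = new Runner("Alice");
        runner.addRun(new Run(5.0, 150));
        runner.addRun(new Run(10.0, 158));
        System.out.println(runner.name + " has " + runner.getRuns().size() + " runs");
    }
}
```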

Insertions: This test measures the time taken to insert a runner and their associated runs into the database as a single transaction. The graph below shows the insertion time against the number of runs inserted (child entities of runner) per runner.
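A measurement like this can be taken with a small timing harness. The sketch below is an assumption about how one might do it, not the harness used for these results: the class InsertTimer and the method averageMillis are hypothetical names, and the workload in main is only a stand-in for "open a transaction, insert one runner and N runs, commit".

```java
public class InsertTimer {

    /**
     * Runs the operation a few times to warm up the JVM, then returns the
     * average wall-clock time per call in milliseconds.
     */
    public static double averageMillis(Runnable operation, int warmups, int iterations) {
        for (int i = 0; i < warmups; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            operation.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real test would perform the JDBC or
        // Hibernate insert of one runner plus its runs here.
        Runnable standIn = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        };
        System.out.println("average ms per call: " + averageMillis(standIn, 10, 100));
    }
}
```

The warm-up loop matters on the JVM because early iterations include class loading and JIT compilation, which would otherwise inflate the numbers for whichever variant runs first.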

There really isn't much in it, but the performance of Hibernate does seem to improve as the number of rows inserted increases. I'd speculate that the cost of setting up the Hibernate session becomes proportionally less significant as more work is performed in each session.
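That speculation amounts to a fixed setup cost being amortised over more rows. The sketch below illustrates the arithmetic only; the figures of 2.0 ms for session setup and 0.5 ms per row are invented for the example and are not measurements from these tests.

```java
public class AmortizedOverhead {

    /** Fraction of the total time that is spent on the fixed setup cost. */
    public static double setupShare(double setupMs, double perRowMs, int rows) {
        return setupMs / (setupMs + perRowMs * rows);
    }

    public static void main(String[] args) {
        double setupMs = 2.0;  // assumed fixed cost per session (hypothetical)
        double perRowMs = 0.5; // assumed cost per inserted row (hypothetical)
        for (int rows : new int[] {1, 10, 100}) {
            double share = setupShare(setupMs, perRowMs, rows);
            System.out.printf("%d rows: setup is %.1f%% of total%n", rows, 100 * share);
        }
    }
}
```

Under this simple model the setup share falls steadily as the row count grows, which would match the shape described above if session setup really is the dominant fixed cost.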

I have also included the effect of switching on caching (using EhCache). Unsurprisingly, this does have an impact on insertion performance.

Updates: This test measures the time it takes to update a runner and all their associated runs. The graph below shows the update time against the number of runs updated. There is a bit of a mismatch between the semantics of JDBC and Hibernate here. Typically Hibernate would only update the objects that have changed, but I'm forcing it to update all the entities so I can measure the overhead of mapping the objects to the database.

Queries: This test measures the time it takes to retrieve a runner and all their associated runs.

You can see that when the cache is enabled, the performance is on a par with straight JDBC. I suspect that in a typical production system, the straight JDBC performance would be worse than Hibernate with caching, but I happen to be running the database locally so there is little cost in going to the database.

Hibernate's performance for retrieval without a cache is quite bad and I'm at a loss to explain why this should be. I've tried to optimise the performance, using a connection pool with a prepared statement cache, but it still lags the straight JDBC performance. My only explanation is that this is the time it takes to construct the object graph from the database, which the straight JDBC does not have to do, but it seems much more expensive than I expected.
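To give a feel for the extra work involved in building an object graph, here is a simplified sketch that turns flat joined rows (one per run, with the runner columns repeated) into Runner objects holding their Run children. This is not how Hibernate does it internally; the class GraphAssembly, the Row type and the column names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GraphAssembly {

    /** One flat row of a runner-joined-to-run result (hypothetical columns). */
    public static class Row {
        public final long runnerId;
        public final String runnerName;
        public final long runId;
        public final double distanceKm;

        public Row(long runnerId, String runnerName, long runId, double distanceKm) {
            this.runnerId = runnerId;
            this.runnerName = runnerName;
            this.runId = runId;
            this.distanceKm = distanceKm;
        }
    }

    public static class Run {
        public final long id;
        public final double distanceKm;
        public final Runner runner;

        public Run(long id, double distanceKm, Runner runner) {
            this.id = id;
            this.distanceKm = distanceKm;
            this.runner = runner;
        }
    }

    public static class Runner {
        public final long id;
        public final String name;
        public final List<Run> runs = new ArrayList<>();

        public Runner(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Groups flat rows by runner id, creating each Runner only once. */
    public static List<Runner> assemble(List<Row> rows) {
        Map<Long, Runner> byId = new LinkedHashMap<>();
        for (Row row : rows) {
            Runner runner = byId.get(row.runnerId);
            if (runner == null) {
                runner = new Runner(row.runnerId, row.runnerName);
                byId.put(row.runnerId, runner);
            }
            runner.runs.add(new Run(row.runId, row.distanceKm, runner));
        }
        return new ArrayList<>(byId.values());
    }
}
```

Even this toy version allocates an object per row and does a map lookup per row, on top of whatever the JDBC driver already did to produce the result set. A real ORM also tracks entity state for dirty checking, which adds further per-object cost.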

One useful thing to note is that there is no difference in performance between using an HQL query and using load. Hibernate is very good at caching the execution plan for HQL queries, and subsequent queries don't require any further Hibernate parsing/analysis.
