Thanks for the great work on the Spring Data JPA project! There is one potential important improvement. Currently, the Spring Data JPA generates queries like:
SELECT COUNT(*) > 0 FROM my_table WHERE some_column = ?
for repository methods like:
boolean existsBySomeColumn(String value);
While this works just fine, a more optimal SQL pattern exists and is supported by all major relational databases: using SELECT EXISTS (...). This allows the database engine to short-circuit and return as soon as a matching row is found, potentially improving performance, especially on very large datasets. (think about tables with rows magnitude of 10^x).
Another reason: many DB engines skip full index/table scans for EXISTS, but must scan all matching rows for COUNT(*). This can be proved using EXPLAIN ANALYZE or EXPLAIN for both queries.
Main proposal
Instead of generating:
SELECT COUNT(*) > 0 FROM my_table WHERE some_column = ?
Generate, using universal JPQL as usual (no need of nativeQuery):
SELECT EXISTS (SELECT 1 FROM my_table WHERE some_column = ?)
Compatibility
The SELECT EXISTS (...) form is ANSI SQL-compliant and has been supported by virtually all relational databases for decades. There is no risk of incompatibility. MySQL, Postgres, SQLite, SQLServer, Oracle and more.
Quick Example
Repository:
public interface UserRepository extends JpaRepository<User, Long> {
boolean existsByEmail(String email);
}
Generated SQL (suggested):
SELECT EXISTS (SELECT 1 FROM users WHERE email = ?)
Backwards Compatibility Concerns ?
If there are edge cases or dialect-specific concerns, or backwards compatibility (opting-out), perhaps this could be controllable via a Spring Data JPA property (in .yml or .properties). For example:
spring.data.jpa.repository.exists-query-strategy=exists-subquery
Thanks for the great work on the Spring Data JPA project! There is one potential important improvement. Currently, the Spring Data JPA generates queries like:
for repository methods like:
While this works just fine, a more optimal SQL pattern exists and is supported by all major relational databases: using
SELECT EXISTS (...). This allows the database engine to short-circuit and return as soon as a matching row is found, potentially improving performance, especially on very large datasets. (think about tables with rows magnitude of 10^x).Another reason: many DB engines skip full index/table scans for
EXISTS, but must scan all matching rows forCOUNT(*). This can be proved usingEXPLAIN ANALYZEorEXPLAINfor both queries.Main proposal
Instead of generating:
Generate, using universal JPQL as usual (no need of nativeQuery):
Compatibility
The
SELECT EXISTS (...)form is ANSI SQL-compliant and has been supported by virtually all relational databases for decades. There is no risk of incompatibility. MySQL, Postgres, SQLite, SQLServer, Oracle and more.Quick Example
Repository:
Generated SQL (suggested):
Backwards Compatibility Concerns ?
If there are edge cases or dialect-specific concerns, or backwards compatibility (opting-out), perhaps this could be controllable via a Spring Data JPA property (in .yml or .properties). For example:
spring.data.jpa.repository.exists-query-strategy=exists-subquery