Advanced PHP Database Patterns: Beyond ORMs for High-Performance Applications
On projects that involve heavy database lifting, working directly with PDO and MySQL often outperforms an ORM. The key insight: you can use both approaches side by side. ORMs excel at simple CRUD operations, but when you need maximum performance for complex queries, bulk operations, or memory-efficient processing, direct database access gives you fine-grained control. In most applications, the database is the real bottleneck, and these patterns help you squeeze every bit of performance from it.
Why Direct Database Access Matters
ORMs like Doctrine and Eloquent provide convenience and rapid development, but they introduce overhead that becomes significant at scale. According to 2025 performance benchmarks, direct PDO queries can be 3-5x faster than ORM-generated queries for complex operations. The difference becomes dramatic when processing millions of rows or performing bulk updates.
The patterns in this article come from production systems handling high-volume database operations. They address real-world challenges: connection failures, memory exhaustion, slow bulk updates, and brittle tests that pass despite broken SQL. These aren't theoretical patterns - they're battle-tested solutions.
The Hybrid Approach
You don't need to choose between ORMs and direct database access. Use your ORM for typical application code where developer productivity matters more than raw performance. Switch to direct PDO when you need any of the following (a short hybrid sketch follows the list):
- Bulk operations - Updating thousands of rows efficiently
- Complex queries - Multi-table joins, derived tables, or aggregations
- Memory-efficient processing - Streaming millions of rows without exhausting memory
- Maximum performance - When every millisecond counts
- Fine-grained control - Transaction isolation levels, connection management, statement caching
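For example, a single request might use both layers. This is a hypothetical sketch: User::find() stands in for any ORM call (Eloquent syntax shown), and $dbService is the PDO wrapper developed in Pattern 1 below.

<?php
// Hybrid sketch: ORM for simple CRUD, direct PDO service for the heavy query.
$user = User::find($userId); // ORM: readable, productive, fast enough

$topCustomers = $dbService->query(
    'SELECT customer_id, SUM(total) AS revenue
     FROM orders
     WHERE status = :status
     GROUP BY customer_id
     ORDER BY revenue DESC
     LIMIT 10',
    ['status' => 'completed']
); // Direct PDO: full control over the SQL for the hot path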
Pattern 1: Retry Mechanisms for Transient Failures
Database connections fail. MySQL servers restart. Networks hiccup. Long-running processes encounter "MySQL server has gone away" errors. Production systems need to handle these transient failures gracefully without crashing or requiring manual intervention.
The Problem
When your application loses its database connection mid-operation, the default behavior is catastrophic: exceptions bubble up, processes crash, and data operations fail. For batch jobs processing millions of records, a single connection timeout can waste hours of work.
The Solution: Automatic Retry with Connection Reset
Implement a PDO wrapper that detects connection failures and automatically retries operations after resetting the connection. The pattern uses PHP's callable types and exception handling to wrap database operations in retry logic.
First, define a clean interface for database operations:
<?php
declare(strict_types=1);
namespace App\Database;
use Generator;
/**
* Service for database operations with automatic retry functionality.
* Handles connection errors and transient failures gracefully.
*/
interface DatabaseServiceInterface
{
/**
* Execute a callback with the database connection, with retry on connection loss.
*
* @template T
* @param callable(\PDO): T $callback Function to execute with the PDO connection
* @return T The result from the callback
*/
public function withConnection(callable $callback): mixed;
/**
* Execute a query and return all results.
*
* @param string $sql SQL query with placeholders
* @param array<string|int,scalar|null> $params Parameters for the query
* @param bool $stmtCache Whether to use statement caching (default: true)
* @return array<int,array<string,mixed>> Query results
*/
public function query(string $sql, array $params = [], bool $stmtCache = true): array;
/**
* Execute a non-query SQL statement.
*
* @param string $sql SQL statement with placeholders
* @param array<string|int,scalar|null> $params Parameters for the statement
* @param bool $stmtCache Whether to use statement caching (default: true)
* @return int Number of affected rows
*/
public function execute(string $sql, array $params = [], bool $stmtCache = true): int;
/**
* Stream results from a query using a generator.
* This is memory-efficient for large result sets.
*
* @param string $sql SQL query with placeholders
* @param array<string|int,scalar|null> $params Parameters for the query
* @param bool $stmtCache Whether to use statement caching (default: true)
* @return Generator<int,array<string,mixed>> Generator yielding rows
*/
public function stream(string $sql, array $params = [], bool $stmtCache = true): Generator;
/**
* Execute an INSERT statement and return the last insert ID.
*
* @param string $sql INSERT SQL statement with placeholders
* @param array<string|int,scalar|null> $params Parameters for the statement
* @param bool $stmtCache Whether to use statement caching (default: true)
* @return int The last insert ID
*/
public function insert(string $sql, array $params = [], bool $stmtCache = true): int;
}
The retry mechanism implementation handles multiple connection error types and provides configurable retry behavior:
<?php
declare(strict_types=1);
namespace App\Database;
use PDO;
use PDOException;
use Psr\Log\LoggerInterface;
use Throwable;
/**
* Implements automatic retry logic for database operations.
* Handles transient connection failures gracefully.
*/
final class DatabaseService implements DatabaseServiceInterface
{
private const string MYSQL_SERVER_GONE_AWAY_CODE = '2006';
private const array CONNECTION_ERROR_MESSAGES = [
'server has gone away',
'Lost connection',
'Error while sending',
'is dead or not enabled',
'decryption failed or bad record mac',
'server closed the connection unexpectedly',
'SSL connection has been closed unexpectedly',
'Error writing data to the connection',
'Resource deadlock avoided',
'Transaction deadlock',
'Connection timed out',
];
public function __construct(
private readonly PDOFactory $pdoFactory,
private readonly DatabaseConfig $config,
private readonly LoggerInterface $logger,
) {
}
/**
* Execute a callback with retry logic for database operations.
* Automatically handles connection errors by resetting the connection and retrying.
*
* @template T
* @param callable(PDO): T $callback Function to execute with the PDO connection
* @return T The result from the callback
* @throws Throwable Re-throws any exceptions that occur after retries are exhausted
*/
public function withConnection(callable $callback): mixed
{
$maxAttempts = $this->config->getRetryAttempts();
$delay = $this->config->getRetryDelay();
$attempts = 0;
while (true) {
++$attempts;
try {
$connection = $this->pdoFactory->getConnection();
return $callback($connection);
} catch (Throwable $e) {
if ($attempts > $maxAttempts || !$this->isConnectionLostError($e)) {
throw $e; // Not a connection error or max retries exceeded
}
$this->logger->warning(
sprintf(
'Database connection lost (attempt %d/%d): %s',
$attempts,
$maxAttempts,
$e->getMessage()
),
['exception' => $e]
);
// Reset connection and wait before retry
$this->pdoFactory->resetConnection();
usleep($delay * 1000); // Convert ms to microseconds
}
}
}
/**
* Check if an exception was caused by a lost database connection.
*/
private function isConnectionLostError(Throwable $e): bool
{
// Check for PDO exception with MySQL "server has gone away" error
if ($e instanceof PDOException && str_contains($e->getMessage(), self::MYSQL_SERVER_GONE_AWAY_CODE)) {
return true;
}
foreach (self::CONNECTION_ERROR_MESSAGES as $errorMessage) {
if (str_contains($e->getMessage(), $errorMessage)) {
return true;
}
}
return false;
}
// Other interface methods use withConnection() internally...
}
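The remaining interface methods are thin wrappers around withConnection(). As an illustration (this method body is not shown in the original class), query() might delegate like this, using the statement-caching helper from Pattern 2:

/**
 * Sketch of query() delegating to withConnection(); getPreparedStatement()
 * is assumed to come from the StatementCachingTrait shown in Pattern 2.
 *
 * @return array<int,array<string,mixed>>
 */
public function query(string $sql, array $params = [], bool $stmtCache = true): array
{
    return $this->withConnection(function (PDO $pdo) use ($sql, $params, $stmtCache): array {
        $stmt = $this->getPreparedStatement($pdo, $sql, $stmtCache);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    });
}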
Key Features
- Automatic detection - Recognizes 11+ types of connection errors including deadlocks, timeouts, and SSL failures
- Configurable retries - Set maximum attempts and delay between retries based on your environment
- Connection reset - Forces PDO to establish a new connection after failures
- Logging integration - Uses PSR-3 LoggerInterface for monitoring retry patterns
- Non-retryable errors - Only retries connection errors, not SQL syntax errors or constraint violations
Real-World Impact
In production systems, this pattern eliminates manual intervention for transient failures. Batch jobs that once required monitoring and manual restarts now complete reliably. The retry logic adds negligible overhead (microseconds) while providing significant resilience.
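The PDOFactory and DatabaseConfig collaborators referenced above are not shown. A minimal factory sketch, assuming a simple DSN-based config object with getDsn()/getUsername()/getPassword() accessors, might look like this:

<?php
declare(strict_types=1);
namespace App\Database;
use PDO;
/**
 * Minimal sketch of the connection factory used by DatabaseService.
 * The DatabaseConfig accessors are assumptions for illustration.
 */
final class PDOFactory
{
    private ?PDO $connection = null;
    public function __construct(private readonly DatabaseConfig $config)
    {
    }
    public function getConnection(): PDO
    {
        // Lazily create the connection on first use (or after a reset)
        return $this->connection ??= new PDO(
            $this->config->getDsn(),
            $this->config->getUsername(),
            $this->config->getPassword(),
            [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION],
        );
    }
    public function resetConnection(): void
    {
        // Dropping the reference closes the connection; the next
        // getConnection() call reconnects with a fresh PDO instance
        $this->connection = null;
    }
}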
Pattern 2: Prepared Statement Caching
Prepared statements are essential for security and performance, but repeatedly preparing the same statement wastes resources. While MySQL caches execution plans server-side, PHP destroys PDOStatement objects between requests. Within a single request, however, you can cache prepared statements for significant performance gains.
The Problem
When executing the same query repeatedly with different parameters (common in loops and batch operations), calling $pdo->prepare() for each execution creates unnecessary overhead. Each prepare operation involves parsing SQL, allocating memory, and communicating with the database server.
The Solution: Request-Scoped Statement Cache
Implement a caching layer that reuses prepared statements within the same connection. The cache uses spl_object_hash() to ensure statements are only reused with the same PDO connection:
<?php
declare(strict_types=1);
namespace App\Database;
use PDO;
use PDOStatement;
use RuntimeException;
/**
* Implements prepared statement caching for improved performance.
* Reuses prepared statements within the same request/connection.
*/
trait StatementCachingTrait
{
/**
* Cache storage for prepared statements.
* Key is statement ID, value contains the prepared statement and connection hash.
*
* @var array<string,array{stmt: PDOStatement, connection_hash: string, useCnt: int}>
*/
private static array $statementCache = [];
/**
* Get a cached prepared statement or create a new one.
*
* @param PDO $pdo The PDO connection
* @param string $sql The SQL statement to prepare
* @param bool $stmtCache Whether to use the statement cache
* @return PDOStatement The prepared statement
*/
private function getPreparedStatement(PDO $pdo, string $sql, bool $stmtCache = true): PDOStatement
{
if (!$stmtCache) {
return $pdo->prepare($sql);
}
$statementId = $this->generateStatementId($sql);
$connectionHash = spl_object_hash($pdo);
// Check if we have a cached statement with matching connection
if (
isset(self::$statementCache[$statementId])
&& self::$statementCache[$statementId]['connection_hash'] === $connectionHash
) {
// Close any pending result set before reusing the statement
self::$statementCache[$statementId]['stmt']->closeCursor();
++self::$statementCache[$statementId]['useCnt'];
return self::$statementCache[$statementId]['stmt'];
}
// Prepare a new statement and cache it
$stmt = $pdo->prepare($sql);
self::$statementCache[$statementId] = [
'stmt' => $stmt,
'connection_hash' => $connectionHash,
'useCnt' => 1,
];
return $stmt;
}
/**
* Generate a unique, readable ID for a SQL statement for caching purposes.
* Normalizes whitespace and appends an MD5 hash for uniqueness.
*
* @param string $sql The SQL statement to generate an ID for
* @return string The generated statement ID
*/
private function generateStatementId(string $sql): string
{
// Normalize whitespace (replace multiple spaces, tabs, newlines with single space)
$normalized = preg_replace('/\s+/', ' ', trim($sql));
if (null === $normalized) {
throw new RuntimeException('Failed to normalize SQL statement');
}
// Take first 30 chars for readability + md5 for uniqueness
$prefix = substr($normalized, 0, 30);
$hash = md5($sql);
return sprintf('%s_%s', $prefix, $hash);
}
/**
* Reset the statement cache (typically called when connection is reset).
*/
private function resetStatementCache(): void
{
self::$statementCache = [];
}
}
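To see the cache in action, consider a hypothetical loop inside a class that uses StatementCachingTrait ($pdo and $events are assumed to exist in scope). Only the first iteration pays the prepare cost:

// Hypothetical usage: the same statement object is reused across iterations
$sql = 'INSERT INTO audit_log (user_id, action) VALUES (:user_id, :action)';
foreach ($events as $event) {
    // First iteration prepares; later iterations return the cached PDOStatement
    $stmt = $this->getPreparedStatement($pdo, $sql);
    $stmt->execute([
        'user_id' => $event['userId'],
        'action' => $event['action'],
    ]);
}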
Performance Characteristics
According to benchmarks on Stack Overflow, statement reuse provides 15-30% performance improvement for queries executed in loops. The gain comes from:
- Eliminated parsing - SQL only parsed once per connection
- Reduced memory allocation - Statement objects reused instead of recreated
- Less garbage collection - Fewer objects for PHP to clean up
Important Limitations
Statement caching only works within a single request. As noted in the PHP manual, PDOStatement objects cannot persist between requests because they're tied to resources that get deallocated when the script ends. Don't attempt to cache statements in sessions or APCu.
Pattern 3: Bulk Update Single Column
Updating thousands of rows individually is painfully slow. Each UPDATE statement involves a full round-trip to the database. For 10,000 rows, that's 10,000 network round-trips. The bulk update pattern uses MySQL's CASE WHEN clause to update thousands of rows in a single query.
The Problem
Standard approaches to bulk updates are inadequate:
- Individual UPDATEs - Loop with 10,000 UPDATE statements takes minutes
- UPDATE with IN clause - Can only set all rows to the same value
- Multiple separate queries - Still requires thousands of round-trips
The Solution: CASE WHEN Bulk Updates
Transform multiple updates into a single SQL statement using CASE WHEN. This pattern generates SQL like:
UPDATE products
SET price = CASE id
WHEN 101 THEN 29.99
WHEN 102 THEN 39.99
WHEN 103 THEN 49.99
-- ... thousands more
END
WHERE id IN (101, 102, 103, ...)
The PHP implementation accumulates changes and executes them in configurable chunk sizes:
<?php
declare(strict_types=1);
namespace App\Database;
use InvalidArgumentException;
use Psr\Log\LoggerInterface;
use RuntimeException;
use Throwable;
/**
* Efficiently updates a single column for multiple rows using CASE WHEN.
* Accumulates changes and executes them in optimized bulk operations.
*/
final class BulkUpdateSingleColumn
{
/**
* SQL template for bulk update using CASE WHEN.
*/
private const string SQL_BULK_UPDATE_TEMPLATE = 'UPDATE %s SET %s = CASE %s %s END WHERE %s IN (%s)';
/**
* SQL template for each CASE WHEN clause.
*/
private const string SQL_CASE_WHEN_TEMPLATE = "WHEN '%s' THEN ?";
/** @var array<string|int,scalar|null> */
private array $idsToVals = [];
private int $chunkSize = 5000;
/**
* @param DatabaseServiceInterface $dbService Database service with retry capability
* @param string $table Table name
* @param string $idColumnName Name of the ID column
* @param string $updateColumnName Name of the column to update
* @param string $extraWhereConditions Additional WHERE conditions
* @param LoggerInterface|null $logger Logger for output
*/
public function __construct(
private readonly DatabaseServiceInterface $dbService,
private readonly string $table,
private readonly string $idColumnName,
private readonly string $updateColumnName,
private readonly string $extraWhereConditions = '',
private readonly ?LoggerInterface $logger = null,
) {
}
public function __destruct()
{
// Safety net: flush any updates still pending when the object is destroyed
if ([] === $this->idsToVals) {
return;
}
try {
$this->runBulkUpdate();
} catch (Throwable) {
// Throwing from a destructor during script shutdown is fatal in PHP,
// which is why callers should flush explicitly via runBulkUpdate()
throw new RuntimeException(
"BulkUpdateSingleColumn('{$this->table}.{$this->updateColumnName}') was destroyed with pending updates. " .
'You must call runBulkUpdate() before destroying.'
);
}
}
/**
* Add an ID and value pair to the update queue.
* Automatically triggers bulk update when chunk size is reached.
*
* @param int|string $id The ID value
* @param bool|float|int|string|null $val The value to set
*/
public function addIdAndValue(int|string $id, bool|float|int|string|null $val): void
{
$this->idsToVals[$id] = $val;
if (count($this->idsToVals) >= $this->chunkSize) {
$this->runBulkUpdate();
}
}
/**
* Execute the accumulated bulk update.
*/
public function runBulkUpdate(): void
{
if ([] === $this->idsToVals) {
return;
}
$this->logger?->info(
'Running Bulk Update of ' . $this->table . '.' . $this->updateColumnName .
' on ' . count($this->idsToVals) . ' Rows'
);
$sql = $this->buildSql();
$params = array_values($this->idsToVals);
$start = microtime(true);
$affectedRows = $this->dbService->execute($sql, $params);
$this->logger?->info(
'Bulk Update of ' . $this->table . '.' . $this->updateColumnName .
' updated ' . $affectedRows . ' rows in ' . (microtime(true) - $start) . ' seconds'
);
$this->idsToVals = [];
}
public function setChunkSize(int $chunkSize): void
{
$this->chunkSize = $chunkSize;
}
/**
* Build the SQL statement for bulk update using CASE WHEN.
*/
private function buildSql(): string
{
$caseWhenClauses = '';
foreach ($this->idsToVals as $id => $val) {
$caseWhenClauses .= ' ' . sprintf(self::SQL_CASE_WHEN_TEMPLATE, $id) . "\n";
}
$inClause = "'" . implode("','", array_keys($this->idsToVals)) . "'";
$sql = sprintf(
self::SQL_BULK_UPDATE_TEMPLATE,
$this->table,
$this->updateColumnName,
$this->idColumnName,
$caseWhenClauses,
$this->idColumnName,
$inClause
);
if ('' !== $this->extraWhereConditions) {
if (!str_starts_with(strtolower(trim($this->extraWhereConditions)), 'and')) {
throw new InvalidArgumentException('Extra where condition must start with an "AND"');
}
$sql .= $this->extraWhereConditions;
}
return $sql;
}
}
Usage Example
<?php
declare(strict_types=1);
namespace App\Example;
use App\Database\BulkUpdateSingleColumn;
use App\Database\DatabaseServiceInterface;
use Psr\Log\LoggerInterface;
/**
* Example: Using BulkUpdateSingleColumn to efficiently update thousands of rows.
*/
final readonly class UpdateProductPricesService
{
public function __construct(
private DatabaseServiceInterface $dbService,
private LoggerInterface $logger,
) {
}
/**
* Update prices for thousands of products efficiently.
*
* @param array<int,float> $productPrices Map of product ID to new price
*/
public function updatePrices(array $productPrices): void
{
$bulkUpdate = new BulkUpdateSingleColumn(
dbService: $this->dbService,
table: 'products',
idColumnName: 'id',
updateColumnName: 'price',
extraWhereConditions: 'AND deleted_at IS NULL',
logger: $this->logger,
);
// Set chunk size to control batch size
$bulkUpdate->setChunkSize(5000);
// Add all price updates - bulk operations happen automatically
foreach ($productPrices as $productId => $newPrice) {
$bulkUpdate->addIdAndValue($productId, $newPrice);
}
// Flush remaining updates
$bulkUpdate->runBulkUpdate();
// The destructor would flush any remainder as a safety net, but calling
// runBulkUpdate() explicitly is clearer and avoids exceptions during shutdown
}
}
Performance Impact
In production systems, this pattern speeds up bulk updates by a factor of 100-1000:
- 10,000 individual UPDATEs - 2-5 minutes
- Single CASE WHEN query - 1-3 seconds
The chunk size parameter (default 5000) balances memory usage against network round-trips: larger chunks mean fewer queries but a bigger SQL string. For most scenarios, 5000 is a good starting point. One caveat: the IDs are interpolated directly into the SQL (only the values are bound), so this pattern should only be used with trusted, internally generated IDs.
Pattern 4: Query, PreparedStmt, and Generator Classes
Raw SQL strings scattered throughout your codebase create maintenance nightmares. Changes to table structure require hunting through hundreds of files. SQL injection vulnerabilities hide in plain sight. The solution: encapsulate SQL in dedicated classes with clear purposes.
Query Classes: Execute Once in Constructor
Query classes execute immediately when instantiated and provide strongly-typed results. They use PHPStan type annotations to guarantee result structure:
<?php
declare(strict_types=1);
namespace App\Database\Query;
use App\Database\DatabaseServiceInterface;
use LogicException;
use RuntimeException;
/**
* @phpstan-type UserArray array{id: int, email: string, name: string}
*
* Encapsulates a database query with validation and type safety.
* Query classes execute immediately in the constructor.
*/
final readonly class ActiveUsersQuery
{
/**
* SQL query to fetch all active users.
*/
private const string SQL_FETCH_ACTIVE_USERS = <<<'SQL'
SELECT
id,
email,
name
FROM users
WHERE status = 'active'
ORDER BY name
SQL;
/**
* @var array<UserArray>
*/
public array $results;
/**
* @param DatabaseServiceInterface $dbService Database service with retry capability
* @throws RuntimeException When query fails or returned data is invalid
*/
public function __construct(DatabaseServiceInterface $dbService)
{
$fetchedUsers = $dbService->query(self::SQL_FETCH_ACTIVE_USERS);
if ([] === $fetchedUsers) {
throw new RuntimeException('No active users found in database');
}
/** @var array<UserArray> $validatedUsers */
$validatedUsers = [];
foreach ($fetchedUsers as $index => $user) {
if (!isset($user['id'])) {
throw new LogicException(sprintf('User at index %d is missing "id" key', $index));
}
if (!isset($user['email'])) {
throw new LogicException(sprintf('User at index %d is missing "email" key', $index));
}
if (!isset($user['name'])) {
throw new LogicException(sprintf('User at index %d is missing "name" key', $index));
}
$id = $user['id'];
if (!is_int($id) && !is_numeric($id)) {
throw new LogicException(sprintf(
'id at index %d is not an integer or numeric value: %s',
$index,
get_debug_type($id)
));
}
$validatedUsers[] = [
'id' => (int)$id,
'email' => (string)$user['email'],
'name' => (string)$user['name'],
];
}
$this->results = $validatedUsers;
}
}
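Usage is minimal because the constructor does the work; this hypothetical caller just reads the typed results:

$query = new ActiveUsersQuery($dbService);
foreach ($query->results as $user) {
    // Each $user is guaranteed to be array{id: int, email: string, name: string}
    echo $user['name'] . ' <' . $user['email'] . '>' . PHP_EOL;
}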
PreparedStmt Classes: Reusable Parameterized Queries
Unlike Query classes, PreparedStmt classes have methods to execute with different parameters. Use them for queries called multiple times with varying inputs:
<?php
declare(strict_types=1);
namespace App\Database\PreparedStmt;
use App\Database\DatabaseServiceInterface;
/**
* @phpstan-type OrderArray array{id: int, customer_id: int, total: float, status: string}
*
* Encapsulates a prepared statement for reusable parameterized queries.
* Unlike Query classes, PreparedStmt classes have methods to execute with different parameters.
*/
final readonly class GetOrderByIdStmt
{
/**
* SQL to fetch order by ID.
*/
private const string SQL_FETCH_ORDER = <<<'SQL'
SELECT
id,
customer_id,
total,
status
FROM orders
WHERE id = :order_id
SQL;
public function __construct(
private DatabaseServiceInterface $dbService
) {
}
/**
* Execute the query and return the order, or null if not found.
*
* @param int $orderId The order ID to fetch
* @return OrderArray|null The order data, or null if not found
*/
public function getResult(int $orderId): ?array
{
$results = $this->dbService->query(
self::SQL_FETCH_ORDER,
['order_id' => $orderId]
);
if ([] === $results) {
return null;
}
$order = $results[0];
// Validate and cast types
return [
'id' => (int)$order['id'],
'customer_id' => (int)$order['customer_id'],
'total' => (float)$order['total'],
'status' => (string)$order['status'],
];
}
}
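A hypothetical caller can execute the same statement repeatedly with different IDs; with the statement cache from Pattern 2 enabled, only the first call pays the prepare cost:

$stmt = new GetOrderByIdStmt($dbService);
$order = $stmt->getResult(42); // prepares and executes
$other = $stmt->getResult(43); // reuses the cached prepared statement
if (null === $order) {
    // Handle the missing order
}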
Generator Classes: Memory-Efficient Streaming
Generator classes use PHP generators for memory-efficient processing of large result sets. According to 2025 benchmarks, generators can process millions of rows while using constant memory (typically 2-5MB regardless of result set size):
<?php
declare(strict_types=1);
namespace App\Database\Generator;
use App\Database\DatabaseServiceInterface;
use Generator;
use RuntimeException;
/**
* @phpstan-type ProductGeneratorType Generator<array{id: int, name: string, price: float}>
*
* Streams database results using PHP generators for memory efficiency.
* Ideal for processing large datasets without loading everything into memory.
*/
final readonly class ProductGenerator
{
/**
* SQL query to fetch all products.
*/
private const string SQL_FETCH_PRODUCTS = <<<'SQL'
SELECT
id,
name,
price
FROM products
WHERE deleted_at IS NULL
ORDER BY name
SQL;
/** @var ProductGeneratorType */
public Generator $generator;
public function __construct(DatabaseServiceInterface $dbService)
{
$this->generator = $this->createGenerator($dbService);
}
/**
* @return ProductGeneratorType
*/
private function createGenerator(DatabaseServiceInterface $dbService): Generator
{
$rawGenerator = $dbService->stream(self::SQL_FETCH_PRODUCTS);
foreach ($rawGenerator as $row) {
if (
!is_array($row)
|| !isset($row['id'], $row['name'], $row['price'])
|| !is_numeric($row['id'])
|| !is_string($row['name'])
|| '' === $row['name']
|| !is_numeric($row['price'])
) {
throw new RuntimeException('Failed getting valid row from query: ' . print_r($row, true));
}
yield [
'id' => (int)$row['id'],
'name' => $row['name'],
'price' => (float)$row['price'],
];
}
}
}
Usage Example: Processing Millions of Rows
<?php
declare(strict_types=1);
namespace App\Example;
use App\Database\DatabaseServiceInterface;
use App\Database\Generator\ProductGenerator;
/**
* Example: Processing large datasets with generators for memory efficiency.
*/
final readonly class ExportProductsService
{
public function __construct(
private DatabaseServiceInterface $dbService,
) {
}
/**
* Export millions of products to CSV without exhausting memory.
*
* Using generators, this can process millions of rows while using
* only a few megabytes of memory, regardless of result set size.
*/
public function exportToCsv(string $filename): void
{
$generator = new ProductGenerator($this->dbService);
$handle = fopen($filename, 'w');
if (false === $handle) {
throw new \RuntimeException("Failed to open {$filename} for writing");
}
// Write CSV header
fputcsv($handle, ['ID', 'Name', 'Price']);
// Process results one at a time - memory stays constant
foreach ($generator->generator as $product) {
fputcsv($handle, [
$product['id'],
$product['name'],
$product['price'],
]);
}
fclose($handle);
}
/**
* Alternative: Stream query results directly without separate generator class.
*/
public function exportWithDirectStream(string $filename): void
{
$sql = 'SELECT id, name, price FROM products WHERE deleted_at IS NULL ORDER BY name';
$handle = fopen($filename, 'w');
if (false === $handle) {
throw new \RuntimeException("Failed to open {$filename} for writing");
}
fputcsv($handle, ['ID', 'Name', 'Price']);
// Direct streaming - no intermediate array storage
foreach ($this->dbService->stream($sql) as $row) {
fputcsv($handle, [
$row['id'],
$row['name'],
$row['price'],
]);
}
fclose($handle);
}
}
When to Use Each Pattern
- Query class - Results needed immediately, data set fits in memory
- PreparedStmt class - Same query executed multiple times with different parameters
- Generator class - Large result sets, streaming processing, memory constraints
Buffered vs Unbuffered Queries
The PHP manual explains the difference: buffered queries (default) load all results into memory immediately, while unbuffered queries fetch rows on demand. Generators use unbuffered queries internally for memory efficiency.
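With PDO's MySQL driver, unbuffered mode is a connection attribute. The following is a sketch of how a stream() implementation might wire this up (the method name and surrounding wiring are assumptions, not the article's exact code):

/**
 * Sketch: stream rows unbuffered, restoring the default afterwards.
 *
 * @param array<string|int,scalar|null> $params
 * @return \Generator<int,array<string,mixed>>
 */
private function streamRows(PDO $pdo, string $sql, array $params): \Generator
{
    // Switch to unbuffered mode: rows are fetched from the server on demand
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    try {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        while (false !== ($row = $stmt->fetch(PDO::FETCH_ASSOC))) {
            yield $row;
        }
    } finally {
        // Restore buffered mode so later queries behave normally
        $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
    }
}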
Pattern 5: Derived Tables for Performance
Complex queries often benefit from derived tables (subqueries in the FROM clause). MySQL's query optimizer can materialize derived tables, drastically reducing the result set size before joins. According to the MySQL 9.1 documentation, derived table optimization can improve query performance by 10-100x for complex aggregations.
The Problem
When joining large tables and then aggregating, MySQL processes millions of rows unnecessarily. This query pattern is inefficient:
-- BAD: Joins first, aggregates later
SELECT c.id, c.name, SUM(o.total), COUNT(*)
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.id
The Solution: Aggregate in Derived Table
Pre-aggregate in a derived table before joining. MySQL materializes the aggregated result set (much smaller), then joins against it:
<?php
declare(strict_types=1);
namespace App\Database\Query;
use App\Database\DatabaseServiceInterface;
/**
* Example using derived tables for performance optimization.
* Derived tables allow complex filtering before joining, reducing result set size.
*/
final readonly class TopCustomersByRevenueQuery
{
/**
* Uses a derived table to pre-aggregate order totals before joining with customers.
* This is more efficient than joining first and then aggregating.
*/
private const string SQL_TOP_CUSTOMERS = <<<'SQL'
SELECT
c.id,
c.name,
c.email,
revenue.total_revenue,
revenue.order_count
FROM customers c
INNER JOIN (
-- Derived table: pre-aggregate order data
SELECT
customer_id,
SUM(total) as total_revenue,
COUNT(*) as order_count
FROM orders
WHERE status = 'completed'
AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY customer_id
HAVING total_revenue > 1000
) AS revenue ON c.id = revenue.customer_id
WHERE c.status = 'active'
ORDER BY revenue.total_revenue DESC
LIMIT 100
SQL;
/**
* @var array<array{id: int, name: string, email: string, total_revenue: float, order_count: int}>
*/
public array $results;
public function __construct(DatabaseServiceInterface $dbService)
{
$this->results = $dbService->query(self::SQL_TOP_CUSTOMERS);
}
}
Performance Impact
For a table with 1 million orders and 100,000 customers:
- Without derived table - Processes 1,000,000 rows, takes 15-30 seconds
- With derived table - Processes 50,000 aggregated rows, takes 0.5-2 seconds
The MySQL optimizer uses materialization strategies to create temporary tables for derived tables, enabling index usage and reducing memory requirements.
Pattern 6: Transaction Isolation Levels
Transaction isolation levels control how concurrent transactions interact. The default REPEATABLE READ level prevents many issues but introduces unnecessary locking for some scenarios. Choosing the right isolation level based on operation type improves both performance and correctness.
Available Isolation Levels
- READ UNCOMMITTED - Fastest, allows dirty reads (reading uncommitted changes)
- READ COMMITTED - Prevents dirty reads, good balance for most operations
- REPEATABLE READ - MySQL default, prevents non-repeatable reads
- SERIALIZABLE - Strongest level; also prevents phantom reads, at the cost of the heaviest locking
Implementation Pattern
Set the isolation level before calling beginTransaction():
<?php
declare(strict_types=1);
namespace App\Database;
use PDO;
/**
* Example: Managing transaction isolation levels for different scenarios.
*/
final readonly class TransactionService
{
public function __construct(
private DatabaseServiceInterface $dbService,
) {
}
/**
* Execute a callback within a transaction with specific isolation level.
*
* @template T
* @param callable(PDO): T $callback
* @param string $isolationLevel READ UNCOMMITTED|READ COMMITTED|REPEATABLE READ|SERIALIZABLE
* @return T
*/
public function withTransaction(callable $callback, string $isolationLevel = 'READ COMMITTED'): mixed
{
return $this->dbService->withConnection(function (PDO $pdo) use ($callback, $isolationLevel) {
// Set isolation level BEFORE starting transaction
$pdo->exec("SET TRANSACTION ISOLATION LEVEL {$isolationLevel}");
$pdo->beginTransaction();
try {
$result = $callback($pdo);
$pdo->commit();
return $result;
} catch (\Throwable $e) {
$pdo->rollBack();
throw $e;
}
});
}
/**
* Example: Transfer funds between accounts with SERIALIZABLE isolation.
* This prevents phantom reads and ensures complete isolation.
*/
public function transferFunds(int $fromAccountId, int $toAccountId, float $amount): void
{
$this->withTransaction(function (PDO $pdo) use ($fromAccountId, $toAccountId, $amount) {
// Deduct from source account
// Note: with native prepares, a named placeholder may only appear once,
// so the balance check uses a separate :minimum parameter
$stmt = $pdo->prepare('UPDATE accounts SET balance = balance - :amount WHERE id = :id AND balance >= :minimum');
$stmt->execute(['amount' => $amount, 'id' => $fromAccountId, 'minimum' => $amount]);
if ($stmt->rowCount() === 0) {
throw new \RuntimeException('Insufficient funds');
}
// Add to destination account
$stmt = $pdo->prepare('UPDATE accounts SET balance = balance + :amount WHERE id = :id');
$stmt->execute(['amount' => $amount, 'id' => $toAccountId]);
}, 'SERIALIZABLE'); // Use highest isolation level for financial transactions
}
/**
* Example: Generate reports with READ UNCOMMITTED for performance.
* Allows reading uncommitted changes for real-time dashboards where
* absolute accuracy is less critical than performance.
*/
public function generateRealtimeDashboard(): array
{
return $this->withTransaction(function (PDO $pdo) {
$stmt = $pdo->query('SELECT COUNT(*) as active_users FROM users WHERE last_seen > NOW() - INTERVAL 5 MINUTE');
return $stmt->fetchAll(PDO::FETCH_ASSOC);
}, 'READ UNCOMMITTED'); // Fastest, but may read uncommitted data
}
}
Choosing the Right Level
- Financial operations - Use SERIALIZABLE for complete isolation
- Standard updates - Use READ COMMITTED for good balance
- Reporting/analytics - Use READ UNCOMMITTED for maximum performance
- Default - READ COMMITTED recommended over REPEATABLE READ according to Drupal documentation
As noted in PHP transaction best practices, always wrap transactions in try-catch blocks to ensure rollback on failure. Keep transactions short to minimize locking.
Pattern 7: PHPStan Rules for Test Correctness
Unit tests that mock database operations provide false confidence. They pass even when SQL references non-existent tables or columns. The solution: use PHPStan rules to enforce integration tests that execute real SQL against real databases.
The Problem
This test passes but the SQL is broken:
public function testGetUsers(): void
{
// Mock database - test passes even if SQL references wrong table
$mockDb = $this->createMock(DatabaseServiceInterface::class);
$mockDb->method('query')->willReturn([
['id' => 1, 'email' => 'test@example.com']
]);
$query = new ActiveUsersQuery($mockDb);
self::assertCount(1, $query->results); // ✓ Test passes
}
// Meanwhile, the actual SQL references wrong_table_name
// This bug won't be caught until production!
Solution 1: Prevent Mocking DatabaseServiceInterface
Create a custom PHPStan rule that fails static analysis when tests mock the database service:
<?php
declare(strict_types=1);
namespace App\PHPStan\Rules;
use PhpParser\Node;
use PhpParser\Node\Arg;
use PhpParser\Node\Expr\ClassConstFetch;
use PhpParser\Node\Expr\MethodCall;
use PhpParser\Node\Expr\Variable;
use PhpParser\Node\Name;
use PHPStan\Analyser\Scope;
use PHPStan\Rules\Rule;
use PHPStan\Rules\RuleErrorBuilder;
/**
* PHPStan rule to enforce: Do not mock DatabaseServiceInterface in tests.
*
* This prevents brittle tests that mock database operations instead of
* testing against real database connections, which can hide SQL errors.
*
* @implements Rule<MethodCall>
*/
final readonly class NoMockDatabaseServiceRule implements Rule
{
private const string DB_SERVICE_INTERFACE = 'App\Database\DatabaseServiceInterface';
private const array MOCK_METHODS = ['createMock', 'getMockBuilder'];
public function getNodeType(): string
{
return MethodCall::class;
}
/**
* @param MethodCall $node
* @return list<\PHPStan\Rules\IdentifierRuleError>
*/
public function processNode(Node $node, Scope $scope): array
{
// Check if method call is on $this
if (!$node->var instanceof Variable || 'this' !== $node->var->name) {
return [];
}
// Check if method is createMock or getMockBuilder
if (!$node->name instanceof Node\Identifier) {
return [];
}
$methodName = $node->name->toString();
if (!in_array($methodName, self::MOCK_METHODS, true)) {
return [];
}
// Check if first argument is DatabaseServiceInterface::class
if ([] === $node->args) {
return [];
}
$firstArg = $node->args[0];
if (!$firstArg instanceof Arg || !$firstArg->value instanceof ClassConstFetch) {
return [];
}
$classConstFetch = $firstArg->value;
// Check if constant is 'class'
if (!$classConstFetch->name instanceof Node\Identifier || 'class' !== $classConstFetch->name->toString()) {
return [];
}
// Get the class name
if (!$classConstFetch->class instanceof Name) {
return [];
}
$className = $classConstFetch->class->toString();
// Check if it's DatabaseServiceInterface
if (self::DB_SERVICE_INTERFACE !== $className) {
return [];
}
// Build error message
return [
RuleErrorBuilder::message(
'Do not mock DatabaseServiceInterface in tests. Use a test database service with ' .
'real database connections instead. Mocking database operations hides SQL errors ' .
'and creates brittle tests that pass even when queries reference wrong tables or columns.'
)
->identifier('app.noMockDatabaseService')
->build(),
];
}
}
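Custom rules take effect once registered in your PHPStan configuration. Assuming the rule class above is autoloadable, the registration is a single entry:

# phpstan.neon
rules:
    - App\PHPStan\Rules\NoMockDatabaseServiceRule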
Solution 2: Require Integration Tests
Enforce that database test classes implement an integration test interface:
<?php
declare(strict_types=1);
namespace App\PHPStan\Rules;
use PhpParser\Node;
use PHPStan\Analyser\Scope;
use PHPStan\Node\InClassNode;
use PHPStan\Rules\Rule;
use PHPStan\Rules\RuleErrorBuilder;
/**
* Enforces that all database test classes implement DatabaseIntegrationTestInterface.
*
* This rule prevents bugs where SQL references wrong tables/columns but tests pass
* because they only use mocked data and never execute real SQL.
*
* @implements Rule<InClassNode>
*/
final class DatabaseTestClassRequiresIntegrationTestRule implements Rule
{
public function getNodeType(): string
{
return InClassNode::class;
}
/**
* @param InClassNode $node
*/
public function processNode(Node $node, Scope $scope): array
{
$classReflection = $node->getClassReflection();
// Only process test classes in tests/Database/ directory
if (!$this->isDatabaseTestClass($classReflection->getName())) {
return [];
}
// Check if class implements DatabaseIntegrationTestInterface
if (!$classReflection->implementsInterface('App\Tests\Database\DatabaseIntegrationTestInterface')) {
return [
RuleErrorBuilder::message(sprintf(
"Test class %s must implement DatabaseIntegrationTestInterface and have itWorksWithRealDb() method.\n\n" .
"Database Query/PreparedStmt/Generator test classes MUST have integration tests that execute\n" .
"SQL against real database to catch bugs like table name errors, missing columns, etc.\n\n" .
"Unit tests with mocked data are NOT sufficient - they don't validate SQL correctness.\n\n" .
"To fix:\n" .
"1. Implement App\\Tests\\Database\\DatabaseIntegrationTestInterface\n" .
"2. Add itWorksWithRealDb() method with #[Test] attribute\n" .
"3. Use real DatabaseServiceInterface from container (not mocked)\n" .
"4. Mark test class with #[Medium] or #[Large] attribute",
$classReflection->getDisplayName()
))
->identifier('databaseTest.missingIntegrationTest')
->build(),
];
}
// Check if itWorksWithRealDb() method exists
if (!$classReflection->hasMethod('itWorksWithRealDb')) {
return [
RuleErrorBuilder::message(sprintf(
"Test class %s implements DatabaseIntegrationTestInterface but is missing itWorksWithRealDb() method.\n\n" .
"The interface requires a method named 'itWorksWithRealDb' that:\n" .
"- Uses real DatabaseServiceInterface from container\n" .
"- Executes actual SQL against real database\n" .
"- Verifies query returns expected structure\n" .
"- Has #[Test] attribute",
$classReflection->getDisplayName()
))
->identifier('databaseTest.missingMethod')
->build(),
];
}
return [];
}
/**
* Check if class is a database test class in tests/Database/ directory.
*/
private function isDatabaseTestClass(string $className): bool
{
// Must be in App\Tests\Database namespace
if (!str_starts_with($className, 'App\Tests\Database\\')) {
return false;
}
// Must end with "Test"
if (!str_ends_with($className, 'Test')) {
return false;
}
// Must be in Query, PreparedStmt, or Generator subdirectory
return str_contains($className, '\Query\\')
|| str_contains($className, '\PreparedStmt\\')
|| str_contains($className, '\Generator\\');
}
}
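For reference, a test class that satisfies both rules might look like the following sketch. The KernelTestCase base class and container access assume a Symfony-style setup; adapt the wiring to your framework:

<?php
declare(strict_types=1);
namespace App\Tests\Database\Query;
use App\Database\DatabaseServiceInterface;
use App\Database\Query\ActiveUsersQuery;
use App\Tests\Database\DatabaseIntegrationTestInterface;
use PHPUnit\Framework\Attributes\Medium;
use PHPUnit\Framework\Attributes\Test;
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;
/**
 * Sketch of an integration test that satisfies both PHPStan rules.
 */
final class ActiveUsersQueryTest extends KernelTestCase implements DatabaseIntegrationTestInterface
{
    #[Test]
    #[Medium]
    public function itWorksWithRealDb(): void
    {
        // Real service from the container - never a mock
        $dbService = self::getContainer()->get(DatabaseServiceInterface::class);
        $query = new ActiveUsersQuery($dbService);
        self::assertNotEmpty($query->results);
        self::assertArrayHasKey('email', $query->results[0]);
    }
}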
Real-World Impact
These rules prevent an entire class of bugs where:
- SQL references wrong table names
- Queries reference dropped columns
- JOIN conditions use incorrect column names
- WHERE clauses have syntax errors
The rules enforce testing discipline at development time through static analysis, catching bugs before code review rather than in production.
Pattern 8: PHP Hash Lookups vs SQL Joins
Sometimes pulling data into PHP and using associative arrays for lookups is dramatically faster than complex SQL joins. This is especially true when you need to cross-check two tables with string processing, deduplication, or complex business logic that's difficult to express in SQL.
When PHP Processing Wins
SQL joins excel at set-based operations, but PHP hash lookups can be faster when:
- String processing required - Normalizing, trimming, regex matching, or case-insensitive comparisons
- Complex matching logic - Business rules that don't map cleanly to SQL WHERE clauses
- Multiple passes needed - Iterative processing where each row affects subsequent decisions
- Small-to-medium datasets - Under 100,000 rows that fit comfortably in memory
- Mixed data sources - Combining database results with API data or file system information
The Hash Lookup Pattern
PHP arrays used as sets ($array[$key] = true) provide O(1) lookup performance. According to PHP's hashtable implementation, this is one of the most optimized data structures in the language:
<?php
declare(strict_types=1);
use App\Database\DatabaseServiceInterface;
// Basic hash lookup pattern - O(1) performance
class FastUserLookup
{
/** @var array<string, true> */
private array $activeUserEmails = [];
public function __construct(DatabaseServiceInterface $db)
{
// Load all active users into a hash map
$users = $db->query('SELECT email FROM users WHERE status = ?', ['active']);
foreach ($users as $user) {
// Use email as key, value is just 'true' for memory efficiency
$this->activeUserEmails[strtolower($user['email'])] = true;
}
}
public function isActiveUser(string $email): bool
{
// O(1) lookup - instant even with 100,000 users
return isset($this->activeUserEmails[strtolower($email)]);
}
public function filterActiveUsers(array $emailList): array
{
// Process thousands of emails in milliseconds
return array_filter(
$emailList,
fn(string $email) => $this->isActiveUser($email)
);
}
}
// Usage: Fast filtering of large lists
$lookup = new FastUserLookup($db);
$active = $lookup->filterActiveUsers($potentialSpamList); // Instant filtering
Real-World Example: Deduplication with Normalization
Consider matching customer records between two systems where names might have extra whitespace, different casing, or special characters. SQL can do fuzzy matching with LOWER() and TRIM(), but complex normalization is cleaner in PHP:
<?php
declare(strict_types=1);
use App\Database\DatabaseServiceInterface;
class CustomerMatcher
{
private function normalizeCustomerName(string $name): string
{
// Complex normalization that's cleaner in PHP than SQL
$name = trim($name);
$name = preg_replace('/\s+/', ' ', $name); // Collapse whitespace
$name = preg_replace('/[^\w\s]/', '', $name); // Remove special chars
$name = strtolower($name);
// Remove common business suffixes
$name = preg_replace('/\b(inc|llc|ltd|corp|corporation)\b/', '', $name);
$name = trim($name);
return $name;
}
public function matchCustomers(DatabaseServiceInterface $db): array
{
// Load both systems into memory
$system1 = $db->query('SELECT id, name, email FROM customers_system1');
$system2 = $db->query('SELECT id, name, email FROM customers_system2');
// Build hash map from system1 with normalized keys
$system1Map = [];
foreach ($system1 as $customer) {
$normalizedName = $this->normalizeCustomerName($customer['name']);
$system1Map[$normalizedName] = $customer;
}
// Match system2 customers against system1
$matches = [];
$unmatched = [];
foreach ($system2 as $customer) {
$normalizedName = $this->normalizeCustomerName($customer['name']);
if (isset($system1Map[$normalizedName])) {
// Found a match - O(1) lookup
$matches[] = [
'system1_id' => $system1Map[$normalizedName]['id'],
'system2_id' => $customer['id'],
'matched_name' => $normalizedName,
];
} else {
$unmatched[] = $customer;
}
}
return [
'matches' => $matches,
'unmatched' => $unmatched,
'match_rate' => [] === $system2 ? 0.0 : count($matches) / count($system2),
];
}
}
// Trying this with SQL would require multiple joins with LOWER(), TRIM(),
// and REGEXP_REPLACE(), making it slow and hard to maintain.
// PHP version: simple, fast, and easy to debug.
Memory vs Performance Tradeoff
The key consideration: memory usage. Loading 100,000 rows into PHP arrays might consume 50-100MB of memory, but provides instant O(1) lookups. As discussed in PHP's garbage collection documentation, modern PHP handles large arrays efficiently.
Compare the tradeoffs:
- SQL JOIN approach - Minimal memory (streaming results), but slower for complex matching logic
- PHP hash lookup approach - Higher memory usage (load full datasets), but dramatically faster for complex operations
Performance Guidelines
Based on production experience, PHP hash lookups outperform SQL when:
- Dataset fits in available memory (check with memory_get_usage())
- Each row requires multiple string operations (regex, normalization, validation)
- Business logic is complex and would require multiple SQL passes
- You're joining more than 3-4 tables with complex conditions
When to Stick with SQL
Don't abandon SQL for everything. SQL remains superior for:
- Simple equi-joins - Straightforward foreign key relationships
- Large datasets - Millions of rows that won't fit in memory
- Aggregations - SUM, COUNT, GROUP BY operations
- Set operations - UNION, INTERSECT, EXCEPT
- Index-driven queries - When proper indexes make SQL lookups instant
According to Use The Index, Luke, a well-indexed SQL query can outperform any in-memory structure. The decision point: test both approaches with realistic data volumes.
Hybrid Approach: Best of Both Worlds
Often the optimal solution combines SQL and PHP processing:
<?php
class OrderProcessor
{
public function processHighValueOrders(DatabaseServiceInterface $db): array
{
// Step 1: Use SQL for initial filtering and aggregation
// Let the database do what it does best: filtering and joining
$sql = '
SELECT
o.id,
o.customer_id,
o.total,
c.email,
c.country,
c.vip_status
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000
AND o.status = "pending"
AND o.created_at > DATE_SUB(NOW(), INTERVAL 7 DAY)
ORDER BY o.total DESC
';
$orders = $db->query($sql);
// Step 2: Use PHP for complex business logic
// Build lookup maps for fast processing
$vipCustomers = [];
$countryTaxRates = [];
foreach ($orders as $order) {
if ($order['vip_status']) {
$vipCustomers[$order['customer_id']] = true;
}
// Complex tax calculation that would be messy in SQL
if (!isset($countryTaxRates[$order['country']])) {
$countryTaxRates[$order['country']] = $this->calculateTaxRate(
$order['country'],
$order['total']
);
}
}
// Step 3: Process orders with complex business rules
$processed = [];
foreach ($orders as $order) {
$customerId = $order['customer_id'];
// Apply VIP discount (complex tiered logic)
if (isset($vipCustomers[$customerId])) {
$order['discount'] = $this->calculateVipDiscount($order['total']);
}
// Apply country-specific tax
$order['tax'] = $order['total'] * $countryTaxRates[$order['country']];
// Check if this triggers any promotional rules
$order['promotions'] = $this->checkPromotionalRules($order);
// Calculate final total with all adjustments
$order['final_total'] = $order['total']
- ($order['discount'] ?? 0)
+ $order['tax'];
$processed[] = $order;
}
return $processed;
}
private function calculateTaxRate(string $country, float $amount): float
{
// Complex tax logic with thresholds, exemptions, etc.
// Much cleaner in PHP than trying to express in SQL
return match($country) {
'US' => $amount > 5000 ? 0.08 : 0.06,
'UK' => 0.20,
'DE' => 0.19,
default => 0.0,
};
}
private function calculateVipDiscount(float $total): float
{
// Tiered discount structure
return match(true) {
$total >= 10000 => $total * 0.15,
$total >= 5000 => $total * 0.10,
$total >= 2000 => $total * 0.05,
default => 0.0,
};
}
private function checkPromotionalRules(array $order): array
{
// Complex promotional logic that would be nightmare in SQL
$promotions = [];
if ($order['total'] > 5000 && $order['country'] === 'US') {
$promotions[] = 'FREE_SHIPPING';
}
if (!empty($order['vip_status']) && date('w') === '5') {
$promotions[] = 'FRIDAY_VIP_BONUS';
}
return $promotions;
}
}
This pattern uses SQL for initial filtering and joins, then PHP for complex business logic that's difficult or impossible to express in SQL. The result: minimal memory usage with maximum processing flexibility.
Additional Patterns and Considerations
Connection Pooling in PHP
Traditional PHP-FPM doesn't support true connection pooling due to PHP's stateless nature. However, OpenSwoole and Swoole extensions enable connection pooling in PHP. According to performance studies, connection pooling allows 10 database connections to serve 300 concurrent HTTP requests efficiently.
For traditional PHP-FPM deployments, use persistent connections via the PDO::ATTR_PERSISTENT option. While not true pooling, persistent connections reduce connection overhead because each PHP-FPM worker process keeps its connection open between requests.
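Enabling persistent connections is a single PDO option; this sketch uses placeholder credentials:

<?php
// Persistent connection: each PHP-FPM worker reuses its connection between requests
$pdo = new PDO(
    'mysql:host=db.example.com;dbname=app;charset=utf8mb4', // placeholder DSN
    'app_user',
    'secret',
    [
        PDO::ATTR_PERSISTENT => true,
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]
);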
Query Result Caching
For frequently-accessed data that changes infrequently, implement query result caching using Redis or Memcached. According to 2025 PHP best practices, caching can improve access times by more than 80% for read-heavy workloads.
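A minimal cache-aside sketch using the phpredis extension (the key naming scheme and five-minute TTL are illustrative choices):

<?php
/**
 * Cache-aside lookup: serve from Redis when possible, else query and cache.
 *
 * @param array<string|int,scalar|null> $params
 * @return array<int,array<string,mixed>>
 */
function cachedQuery(Redis $redis, DatabaseServiceInterface $db, string $sql, array $params = []): array
{
    $key = 'query:' . md5($sql . serialize($params));
    $cached = $redis->get($key);
    if (false !== $cached) {
        return unserialize($cached); // Cache hit - skip the database entirely
    }
    $results = $db->query($sql, $params);
    $redis->setEx($key, 300, serialize($results)); // Cache for 5 minutes
    return $results;
}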
Database Indexing Strategy
Proper indexing remains the foundation of database performance. Focus indexes on:
- Foreign key columns used in JOINs
- Columns frequently used in WHERE clauses
- Columns used for sorting (ORDER BY)
- Covering indexes for frequently-run queries
Use EXPLAIN to analyze query execution plans and identify missing indexes.
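For example, running EXPLAIN against the derived-table query from Pattern 5 shows whether MySQL materializes the subquery and which indexes it uses:

EXPLAIN
SELECT c.id, revenue.total_revenue
FROM customers c
INNER JOIN (
    SELECT customer_id, SUM(total) AS total_revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
) AS revenue ON c.id = revenue.customer_id;
-- Look for "DERIVED" in select_type, and check the key column for index usage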
Read Replicas and Scaling
For high-traffic applications, implement MySQL replication with read replicas. Route read queries to replicas and write queries to the primary server. This pattern, discussed in scaling strategies, distributes load and improves throughput.
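A minimal routing sketch (the class name and naive SELECT detection are illustrative; production routing must also pin transactions to the primary and account for replication lag):

<?php
// Illustrative read/write splitter: SELECTs go to a replica, writes to the primary
final class ReadWriteRouter
{
    public function __construct(
        private readonly PDO $primary,
        private readonly PDO $replica,
    ) {
    }
    public function connectionFor(string $sql): PDO
    {
        $isRead = str_starts_with(strtoupper(ltrim($sql)), 'SELECT');
        return $isRead ? $this->replica : $this->primary;
    }
}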
Conclusion
These patterns represent years of production experience handling high-volume database operations in PHP. They're not theoretical exercises - they solve real problems that emerge at scale:
- Retry mechanisms eliminate manual intervention for transient failures
- Statement caching improves loop performance by 15-30%
- Bulk updates reduce operation time by 100-1000x
- Query/Statement/Generator classes organize SQL and provide type safety
- Derived tables optimize complex queries by 10-100x
- Transaction isolation balances correctness with performance
- PHPStan rules catch SQL errors at development time
- PHP hash lookups can outperform SQL for complex string processing and business logic
The key insight: use the right tool for each job. ORMs for typical CRUD operations, direct database access for performance-critical code. The patterns in this article give you the tools to build high-performance database layers when you need them, while maintaining the productivity benefits of ORMs for standard operations.
Remember that premature optimization wastes time. Start with an ORM for rapid development. Profile your application under realistic load. When you identify database bottlenecks, apply these patterns strategically to the hot paths. The combination of thoughtful design and targeted optimization produces applications that are both maintainable and performant.
Further Reading
- PHP PDO Documentation - Official PDO reference
- MySQL Optimization Guide - Comprehensive optimization strategies
- PHPStan Custom Rules - Creating your own static analysis rules
- PHP Generators - Official generator documentation
- InnoDB Transaction Isolation - Understanding isolation levels