Introduction
Overlapping job executions in the Quartz scheduler occur when a trigger fires again before the previous execution of the same job has completed. This can lead to duplicate processing, race conditions on shared resources, database lock contention, and data corruption. By default, Quartz allows concurrent execution of the same job, which is appropriate for stateless jobs but dangerous for jobs that process queues, update shared state, or call external services that are not idempotent.
Symptoms
Database shows duplicate processing:
```sql
SELECT COUNT(*), batch_id
FROM processing_log
GROUP BY batch_id
HAVING COUNT(*) > 1;

 count | batch_id
-------+-----------
     2 | BATCH-042  -- Processed twice!
     2 | BATCH-043
```
Or logs show overlapping execution:
2024-03-15 10:00:00 INFO [quartz-1] DataSyncJob - Starting batch sync
2024-03-15 10:00:05 INFO [quartz-2] DataSyncJob - Starting batch sync <-- Second execution started before first completed
2024-03-15 10:00:10 ERROR [quartz-2] DataSyncJob - Failed: batch already in progress
2024-03-15 10:00:30 INFO [quartz-1] DataSyncJob - Batch sync complete
Or Quartz misfire warnings:
WARN org.quartz.core.ErrorLogger - MisfireHandler: Error handling misfires:
Fired trigger 'DataSyncJob.trigger' has missed its scheduled fire-time by 120000ms
Common Causes
- No @DisallowConcurrentExecution annotation: Quartz allows the same job to run concurrently by default
- Job takes longer than trigger interval: for example, a job triggered every 5 minutes that takes 7 minutes to complete
- Misfire policy fires immediately: SimpleTrigger.MISFIRE_INSTRUCTION_FIRE_NOW (or CronTrigger.MISFIRE_INSTRUCTION_FIRE_ONCE_NOW for cron triggers) causes backlog
- Clustered scheduler without proper locking: Multiple Quartz instances on different nodes pick up the same trigger
- Stateful job not using @PersistJobDataAfterExecution: Shared state becomes stale between executions
- Thread pool too small: Jobs queue up in the thread pool, causing delayed execution and misfires
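The "job takes longer than trigger interval" cause can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a fixed trigger interval and a constant job runtime; `OverlapEstimator` and `maxConcurrentRuns` are hypothetical names used for illustration and are not part of Quartz.

```java
import java.time.Duration;

/** Back-of-the-envelope estimate of overlap for a fixed-interval trigger. */
final class OverlapEstimator {

    private OverlapEstimator() {}

    /**
     * Upper bound on simultaneous executions when every run takes
     * {@code runtime} and a new run starts every {@code interval},
     * assuming nothing prevents concurrent execution.
     */
    static long maxConcurrentRuns(Duration interval, Duration runtime) {
        long intervalMs = interval.toMillis();
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be at least 1 ms");
        }
        long runtimeMs = Math.max(runtime.toMillis(), 1);
        // Ceiling division: a run is still active for every interval it spans.
        return (runtimeMs + intervalMs - 1) / intervalMs;
    }
}
```

With a 5-minute interval and a 7-minute runtime this returns 2, which matches the two overlapping executions in the log excerpt above. Any result greater than 1 means the job needs either a longer interval or protection against concurrent execution.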
Step-by-Step Fix
Step 1: Prevent concurrent execution
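```java
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@DisallowConcurrentExecution // Prevents overlapping executions
public class DataSyncJob implements Job {

    private static final Logger log = LoggerFactory.getLogger(DataSyncJob.class);

    // Hypothetical application service; assumed to be injected,
    // for example by a Spring-aware JobFactory
    private DataSyncService dataSyncService;

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        log.info("Starting batch sync");
        try {
            dataSyncService.syncAll();
        } catch (Exception e) {
            // true = refire immediately; a persistent failure will retry in a loop
            throw new JobExecutionException("Sync failed", e, true);
        }
        log.info("Batch sync complete");
    }
}
```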
If the job is also stateful:
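```java
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.PersistJobDataAfterExecution;

@DisallowConcurrentExecution
@PersistJobDataAfterExecution // Saves JobDataMap changes after execution
public class StatefulSyncJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap data = context.getJobDetail().getJobDataMap();
        // getInt throws if the key is missing, so default to 0 on the first run
        int retryCount = data.containsKey("retryCount") ? data.getInt("retryCount") : 0;

        // Update the counter - persisted after execution
        data.put("retryCount", retryCount + 1);
    }
}
```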
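To illustrate the guarantee that Step 1 is after, here is a plain-JDK sketch of a non-overlap guard. It is not how Quartz implements @DisallowConcurrentExecution, and it behaves differently in one important way: this guard skips a run that arrives while another is in progress, whereas Quartz holds the blocked trigger back until the running execution finishes. `NonOverlappingRunner` and `runIfIdle` are hypothetical names, and the guard only works inside a single JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-JDK guard that lets at most one caller run the task at a time. */
final class NonOverlappingRunner {

    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs {@code task} unless another invocation is still in progress.
     *
     * @return true if the task ran, false if it was skipped
     */
    boolean runIfIdle(Runnable task) {
        // compareAndSet flips false -> true for exactly one caller.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            running.set(false); // release the guard even if the task throws
        }
    }
}
```

A guard like this might be useful when the same work can also be started outside the scheduler (for example from an admin endpoint), where the Quartz annotation has no effect.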
Step 2: Configure misfire handling
Trigger trigger = TriggerBuilder.newTrigger()
    .withIdentity("dataSyncTrigger")
    .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?")
        .withMisfireHandlingInstructionDoNothing()) // Skip missed executions
    .build();
Misfire strategies:
- withMisfireHandlingInstructionDoNothing(): Skip the missed fire, wait for next scheduled time
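- withMisfireHandlingInstructionFireAndProceed(): Fire once immediately, then resume the normal schedule (can cause backlog); on a SimpleScheduleBuilder the equivalent is withMisfireHandlingInstructionFireNow()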
- withMisfireHandlingInstructionIgnoreMisfires(): Fire all missed executions (use with caution)
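The difference between the three strategies is easiest to see as a count of catch-up executions. The sketch below is a deliberately simplified model, not Quartz code: `MisfirePolicyModel` and `catchUpRuns` are hypothetical names, and the model ignores details such as the misfire threshold and trigger end times.

```java
/** Simplified model of how many catch-up runs each misfire strategy starts. */
enum MisfirePolicyModel {
    DO_NOTHING,       // models "skip the missed fire, wait for the next one"
    FIRE_ONCE_NOW,    // models the "fire immediately" strategy
    IGNORE_MISFIRES;  // models "fire all missed executions"

    /** Executions started right away after {@code missedFires} fire times were missed. */
    int catchUpRuns(int missedFires) {
        if (missedFires < 0) {
            throw new IllegalArgumentException("missedFires must be >= 0");
        }
        switch (this) {
            case DO_NOTHING:
                return 0;
            case FIRE_ONCE_NOW:
                return Math.min(missedFires, 1);
            case IGNORE_MISFIRES:
                return missedFires;
            default:
                throw new AssertionError(this);
        }
    }
}
```

Under this model, a scheduler that was down for three fire times starts 0, 1, or 3 executions on recovery, which is why the last strategy is the one most likely to produce a burst of work.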
Step 3: Configure clustered scheduler
# quartz.properties
# Each node in a cluster needs a unique instance ID; AUTO generates one
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 15000
org.quartz.threadPool.threadCount = 10
Prevention
- Always annotate jobs with @DisallowConcurrentExecution unless concurrent execution is explicitly required
- Set trigger intervals longer than the expected job execution time (with buffer)
- Use withMisfireHandlingInstructionDoNothing() for jobs where catching up is not useful
- Monitor job execution duration and alert when it approaches the trigger interval
- In clustered mode, ensure all nodes have synchronized clocks (use NTP)
- Use a database-backed JobStore for clustered deployments, not RAMJobStore
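The last point can be checked at application startup before the scheduler is created. This is a minimal sketch, assuming the Quartz settings are available as a java.util.Properties object; `ClusterConfigCheck` and `isSafe` are hypothetical names, and the check only looks at the two property keys shown in Step 3.

```java
import java.util.Properties;

/** Startup sanity check for the clustering-related Quartz properties. */
final class ClusterConfigCheck {

    private ClusterConfigCheck() {}

    /** Returns true when clustering is off, or on with a job store other than RAMJobStore. */
    static boolean isSafe(Properties props) {
        boolean clustered = Boolean.parseBoolean(
                props.getProperty("org.quartz.jobStore.isClustered", "false").trim());
        // Quartz falls back to RAMJobStore when no job store class is configured.
        String store = props.getProperty(
                "org.quartz.jobStore.class", "org.quartz.simpl.RAMJobStore").trim();
        // RAMJobStore lives in one JVM's memory, so nodes cannot coordinate through it.
        return !clustered || !store.endsWith("RAMJobStore");
    }
}
```

An application could call `ClusterConfigCheck.isSafe(props)` and refuse to start when it returns false, rather than discovering duplicate executions in production.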