Technical Architecture & Reference Documentation
| Item | Value |
|---|---|
| Runtime | Java 21 / Spring Boot 3.4 |
| Build | Maven / Docker Compose |
| DB Client | PostgreSQL 16 |
| Targets | PostgreSQL, MySQL 8, MongoDB 7 |
| Pattern | Strategy + Async Worker |
| Exposed Port | 8080 (REST API) |
- The Problem This Project Solves
- High-Level System Architecture
- Component Architecture
- The Polymorphic Strategy Pattern
- Asynchronous Execution Architecture
- Data Model
- Docker Infrastructure
- API Reference
- Deployment
- Key Design Decisions and Trade-offs
- Future Roadmap
Most backup setups are a cron job running pg_dump and hoping for the best. No retries, no timeouts, no audit trail — if it silently fails, you find out during a restore.
ResilientDB Engine treats backup jobs as first-class entities: every job is tracked, retried on failure, killed if it hangs, and queryable by status at any time.
| Problem | How the Engine Solves It |
|---|---|
| Silent hung processes | Every backup has a 30-second OS-level hard kill timeout. The actual pg_dump/mysqldump process is terminated via ProcessHandle, not just the Java thread wrapper. |
| No audit trail | Every job is persisted to PostgreSQL with full lifecycle timestamps (createdAt, startedAt, completedAt), status transitions, retry counts, and error messages. |
| Transient failures | The retry engine re-attempts failed backups up to 3 times with a 5-second backoff before permanently marking a job FAILED. |
| HTTP thread starvation | Backup execution is fully asynchronous. The API returns 201 immediately. A dedicated Spring thread pool handles all I/O work off the request thread. |
| Credential exposure | Passwords are never passed as shell arguments. PGPASSWORD and MYSQL_PWD are injected as process environment variables. MongoDB uses a URI with authSource. |
| Multi-database complexity | The Strategy Pattern routes each backup to the correct native tool automatically based on the registered DatabaseType, with no conditional logic in the core service. |
| Disk accumulation | A cleanup routine runs after every completed job, retaining the 5 most recent backup files and deleting older ones automatically. |
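The retention routine described in the table above can be sketched in plain Java NIO. The method name `cleanupOldBackups` follows the document; the directory argument, return value, and error handling here are illustrative assumptions, not the project's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class BackupCleanup {
    static final int RETAIN = 5; // keep the 5 most recent files, per the retention policy

    // Deletes all but the RETAIN newest regular files in the backup directory.
    // Returns the number of files deleted.
    public static int cleanupOldBackups(Path backupDir) throws IOException {
        try (Stream<Path> files = Files.list(backupDir)) {
            List<Path> newestFirst = files
                .filter(Files::isRegularFile)
                .sorted(Comparator.comparing((Path p) -> {
                    try { return Files.getLastModifiedTime(p); }
                    catch (IOException e) { throw new UncheckedIOException(e); }
                }).reversed()) // newest first
                .toList();
            int deleted = 0;
            for (Path stale : newestFirst.stream().skip(RETAIN).toList()) {
                Files.delete(stale);
                deleted++;
            }
            return deleted;
        }
    }
}
```

Running this after every completed job bounds disk usage without any scheduled cleanup daemon.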
The system is composed of three logical tiers: a Spring Boot REST layer that accepts requests and immediately returns, an asynchronous worker layer that executes the backup using native OS tools, and a PostgreSQL persistence layer that records every state transition.
Figure 1: End-to-end request and job lifecycle flow
```text
React / Vite                        Spring Boot Application (Port 8080)
Dashboard UI
     |                                          |
     |---- HTTP POST /api/backups/trigger ---->|
     |                                          |
     |                                   BackupController
     |                                          |
     |                                   BackupServiceImpl
     |                                          |
     |                      saveAndFlush(PENDING) --> PostgreSQL [commit]
     |                                          |
     |<---- 201 CREATED (job: PENDING) --------|
     |                                          |
     |                              BackupAsyncWorker (@Async)
     |                              [task enqueued in thread pool]
     |                                          |
     |                              fetchJob --> set IN_PROGRESS
     |                                          |
     |                              executeWithTimeout(30s)
     |                                          |
     |                              ProcessBuilder (OS)
     |                              pg_dump / mysqldump / mongodump
     |                                          |
     |                              set COMPLETED / FAILED
     |                              saveAndFlush()
     |                                          |
     |---- HTTP GET /api/backups (3s polling)->|
     |<---- status updates --------------------|
     |                                          |
     |                                   PostgreSQL DB
     |                                   backup_operations
     |                                   database_configs
```
The critical design decision is the separation between the HTTP thread and the backup execution thread. The REST endpoint returns a 201 CREATED with the job in PENDING state before a single byte of backup data has been written. The UI polls GET /api/backups every 3 seconds to observe status transitions.
Three controllers handle all external interaction:
- `BackupController` (`/api/backups`): Handles backup job CRUD: trigger, read by ID, list all, filter by status, update status, delete, and download. The download endpoint streams the physical backup file using `FileSystemResource` with `Content-Disposition: attachment`.
- `DatabaseConfigController` (`/api/databases`): Registers and lists database connection configurations. Configurations are stored as `DatabaseConfig` entities and referenced by ID when triggering backups, so credentials are never sent in the trigger request.
- `HealthCheckController` (`/api/health`): Exposes thread pool diagnostics at `/api/health/executor`, including active thread count, queue depth, queue capacity, and a computed health status (healthy / degraded / critical). Useful for monitoring executor saturation.
BackupServiceImpl owns the trigger flow. It deliberately has no @Transactional annotation on triggerBackup(): the saveAndFlush() call commits the PENDING row to PostgreSQL atomically before the async dispatch is made. This eliminates the read-before-commit race condition in which the async worker calls findById() before the row is visible.
Strategy resolution uses Spring's automatic Map injection. Because each strategy is annotated @Component("POSTGRESQL"), @Component("MYSQL"), and @Component("MONGODB"), Spring populates Map<String, BackupStrategy> with the bean name as key. The lookup is a single map.get(databaseType.name()) with no if/else or switch.
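Outside a Spring context, the same routing can be reproduced with a hand-built map, which makes the lookup mechanics explicit. The stub strategies and the simplified `execute()` signature below are illustrative, not the project's actual classes:

```java
import java.util.Map;

public class StrategyRouting {
    enum DatabaseType { POSTGRESQL, MYSQL, MONGODB }

    interface BackupStrategy {
        DatabaseType getSupportedType();
        String execute(); // simplified signature for illustration
    }

    // Spring populates this map automatically from @Component("POSTGRESQL") etc.;
    // here it is built by hand to show the resulting lookup.
    static final Map<String, BackupStrategy> STRATEGIES = Map.of(
        "POSTGRESQL", stub(DatabaseType.POSTGRESQL, "pg_dump"),
        "MYSQL",      stub(DatabaseType.MYSQL, "mysqldump"),
        "MONGODB",    stub(DatabaseType.MONGODB, "mongodump")
    );

    static BackupStrategy stub(DatabaseType type, String tool) {
        return new BackupStrategy() {
            public DatabaseType getSupportedType() { return type; }
            public String execute() { return tool; }
        };
    }

    // The entire routing decision: one map lookup, no if/else or switch.
    static BackupStrategy resolve(DatabaseType type) {
        return STRATEGIES.get(type.name());
    }
}
```

Adding a new database type means adding one map entry (one `@Component` bean in Spring); the resolver never changes.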
BackupAsyncWorker is a dedicated @Service class holding the @Async method. This is architecturally significant: Spring's @Async proxy only works when the method is called through a Spring-managed proxy, not via this.method() on the same bean. Extracting the async worker into its own class guarantees the Spring proxy intercepts the call and dispatches it to the backupTaskExecutor thread pool every time.
The worker is responsible for the full execution lifecycle: fetching the job, transitioning to IN_PROGRESS, running the strategy inside a timeout boundary, handling retries, and writing the final terminal state. Each saveAndFlush() inside the worker is an independent, immediately-committed transaction.
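The retry behaviour the worker implements (up to maxRetries attempts with a fixed backoff, per the overview table) can be sketched in plain Java. The method name and the `Callable`-based shape are assumptions, and whether the first attempt counts toward maxRetries may differ in the real worker:

```java
import java.util.concurrent.Callable;

public class RetryLoop {
    // Runs `task` up to maxRetries + 1 times, sleeping backoffMillis between
    // attempts; rethrows the last failure once retries are exhausted.
    public static <T> T executeWithRetry(Callable<T> task, int maxRetries,
                                         long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) Thread.sleep(backoffMillis);
            }
        }
        throw last;
    }
}
```

In the real worker the terminal failure path additionally sets the job FAILED and persists it via saveAndFlush().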
DatabaseConfigService is a thin service over DatabaseConfigRepository. It exposes registerDatabase() and getById(). The getById() method throws a RuntimeException if the ID is not found, which propagates as a 500 to the caller and prevents a backup job from being created against a non-existent configuration.
The Strategy Pattern is the architectural backbone of the backup execution layer. It implements the Open/Closed Principle: the core orchestration logic (BackupServiceImpl, BackupAsyncWorker) is closed for modification, but open for extension. Adding support for a new database type requires only creating a new class that implements BackupStrategy and annotating it with the correct @Component name.
Figure 2: Strategy routing based on DatabaseType
```text
BackupServiceImpl.triggerBackup()
        |
        | Map<String, BackupStrategy> backupStrategies
        | (Spring auto-wires by @Component bean name)
        |
        +-- config.getDatabaseType() == POSTGRESQL
        |       --> backupStrategies.get("POSTGRESQL")
        |       --> PostgresBackupStrategy.execute()
        |               ProcessBuilder: pg_dump ...
        |               env: PGPASSWORD=<secret>
        |
        +-- config.getDatabaseType() == MYSQL
        |       --> backupStrategies.get("MYSQL")
        |       --> MySQLBackupStrategy.execute()
        |               ProcessBuilder: mysqldump ...
        |               env: MYSQL_PWD=<secret>
        |
        +-- config.getDatabaseType() == MONGODB
                --> backupStrategies.get("MONGODB")
                --> MongoBackupStrategy.execute()
                        ProcessBuilder: mongodump ...
                        uri: mongodb://user:pass@host/db
```
The interface defines two members:
- `getSupportedType()`: Returns the `DatabaseType` enum value this strategy handles. Used for documentation and self-description.
- `execute(BackupJob job, ProcessHolder processHolder)`: Executes the backup. The `ProcessHolder` parameter is a thread-safe holder that the strategy populates with the live OS `Process` object (`pg_dump`, `mysqldump`, or `mongodump`) immediately after `ProcessBuilder.start()`. The timeout monitor retrieves this handle to call `destroyForcibly()` if the 30-second deadline is exceeded.
The ProcessHolder is a volatile field wrapper:
```java
class ProcessHolder {
    private volatile Process process;

    public void setProcess(Process p) { this.process = p; }
    public Process getProcess()       { return this.process; }
}
```

The `volatile` keyword ensures the process reference written by the BackupExec thread is immediately visible to the TimeoutMonitor thread without requiring explicit synchronization.
| Database | Credential Mechanism | Why |
|---|---|---|
| PostgreSQL | `PGPASSWORD` env var | `pg_dump` with `--no-password` refuses interactive prompts. `PGPASSWORD` is read by the client library, not logged by the shell. |
| MySQL | `MYSQL_PWD` env var | `mysqldump` with `--no-defaults` prevents `my.cnf` conflicts. `MYSQL_PWD` is deprecated in newer versions but remains the only non-interactive option for ProcessBuilder. |
| MongoDB | URI connection string | `mongodump --uri` embeds credentials in the URI passed to the driver, not as separate shell arguments, preventing shell history and process list exposure. |
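A sketch of the PostgreSQL case, assuming the flags listed in the table. The helper name and exact argument set are illustrative; the process is only built here, never started, so no `pg_dump` installation is needed to follow along:

```java
import java.util.List;

public class CredentialInjection {
    // Builds the pg_dump command with the password injected as an environment
    // variable instead of a command-line argument, so it never appears in the
    // process list or shell history.
    public static ProcessBuilder buildPgDump(String host, int port, String db,
                                             String user, String password, String outFile) {
        ProcessBuilder pb = new ProcessBuilder(List.of(
            "pg_dump",
            "--host=" + host,
            "--port=" + port,
            "--username=" + user,
            "--no-password",        // refuse interactive prompts
            "--file=" + outFile,
            db
        ));
        pb.environment().put("PGPASSWORD", password); // read by libpq, never shell-visible
        return pb;
    }
}
```

Because ProcessBuilder takes discrete argument tokens, no shell ever interprets the command, which also rules out injection via hostnames or database names.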
The asynchronous layer is the most technically significant part of the system. It must satisfy three competing requirements: the HTTP thread must return immediately, backup jobs must execute with bounded runtime, and failed jobs must retry without losing state.
Figure 3: Thread interaction diagram for a single backup job
```text
HTTP Thread (Tomcat)                     BackupWorker-N (Executor Pool)
─────────────────────────────────────    ────────────────────────────────────
triggerBackup() called
    |
    | saveAndFlush(job, PENDING) --> PostgreSQL commit [immediate]
    |
    | backupAsyncWorker.performBackupAsync(id, strat)
    |       |
    |       [task enqueued in LinkedBlockingQueue]
    |       |
[returns 201 to client]                  thread picks up task
                                             |
                                         fetchJobWithRetry(id) --> findById()
                                         job.status = IN_PROGRESS
                                         saveAndFlush() --> PostgreSQL commit
                                             |
                                         executeBackupWithTimeout(30s)
                                             |
                                         new Thread: BackupExec-Job#N
                                             strategy.execute(job, holder)
                                             ProcessBuilder.start()
                                             holder.setProcess(osProcess)
                                             [pg_dump / mysqldump running]
                                             job.setFilePath / setSizeBytes
                                             |
                                         [if > 30s]
                                             TimeoutMonitor fires
                                             holder.getProcess().destroyForcibly()
                                             backupFuture.completeExceptionally()
                                             job.status = FAILED
                                             saveAndFlush()
                                             |
                                         [success path]
                                             job.status = COMPLETED
                                             saveAndFlush()
                                             cleanupOldBackups()
```
The executor is configured in AsyncConfig:
```text
backupTaskExecutor
├── CorePoolSize   : 5  (always-on workers)
├── MaxPoolSize    : 10 (burst capacity)
├── QueueCapacity  : 25 (LinkedBlockingQueue)
├── RejectionPolicy: CallerRunsPolicy
├── KeepAlive      : 60s (core threads time out when idle)
├── Shutdown       : awaits termination 60s
├── Thread naming  : BackupWorker-{N}
└── Monitored via  : GET /api/health/executor

BackupTimeoutMonitor (dedicated ScheduledExecutorService)
├── Pool size : 2 daemon threads
├── Role      : schedules a 30s kill task per backup job
└── On fire   : osProcess.destroyForcibly()
                backupFuture.completeExceptionally()
```
CallerRunsPolicy is the rejection handler. If the queue is full (25 jobs waiting with 10 threads active), the next submission runs on the HTTP request thread instead of being dropped. This provides backpressure without silent data loss.
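The same pool semantics can be expressed with plain java.util.concurrent. The project presumably uses Spring's ThreadPoolTaskExecutor inside AsyncConfig; this stand-alone equivalent is a sketch that mirrors the numbers above:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSketch {
    public static ThreadPoolExecutor backupTaskExecutor() {
        AtomicInteger n = new AtomicInteger(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            5,                                   // core: always-on workers
            10,                                  // max: burst capacity
            60, TimeUnit.SECONDS,                // keep-alive for idle threads
            new LinkedBlockingQueue<>(25),       // bounded queue: 25 waiting jobs
            r -> new Thread(r, "BackupWorker-" + n.getAndIncrement()),
            new ThreadPoolExecutor.CallerRunsPolicy() // backpressure, never drop
        );
        pool.allowCoreThreadTimeOut(true);       // core threads time out when idle
        return pool;
    }
}
```

The bounded queue plus CallerRunsPolicy is what turns overload into backpressure: once 10 threads are busy and 25 jobs wait, the submitting thread does the work itself.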
The timeout mechanism kills the actual OS process, not just the Java thread. This distinction matters: interrupting a Java thread that is blocked on Process.waitFor() does not kill the pg_dump process. The child process continues running, holding a database connection and writing to disk, while the Java thread silently exits.
The correct sequence is:

1. `strategy.execute()` spawns the OS process and stores it in `ProcessHolder`.
2. A `ScheduledFuture` is submitted to the TimeoutMonitor executor with a 30-second delay.
3. If the `CompletableFuture` is not done after 30 seconds, the monitor calls `holder.getProcess().destroyForcibly()`.
4. The monitor waits up to 2 seconds for the OS process to terminate.
5. The monitor then interrupts the BackupExec thread and completes the future exceptionally with `TimeoutException`.
6. The worker catches `TimeoutException`, sets status `FAILED`, saves, and returns.
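The timeout boundary can be condensed into a self-contained sketch: a CompletableFuture standing in for the backup, and a scheduled monitor task that completes it exceptionally when the deadline passes. The class and method names are illustrative, and the real monitor additionally calls `destroyForcibly()` on the OS process, which is omitted here:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutBoundary {
    // Two daemon monitor threads, mirroring the BackupTimeoutMonitor config.
    static final ScheduledExecutorService MONITOR =
        Executors.newScheduledThreadPool(2, r -> {
            Thread t = new Thread(r, "TimeoutMonitor");
            t.setDaemon(true);
            return t;
        });

    // Arms a hard deadline on `backup`. If the future is still pending when the
    // monitor fires, it is completed exceptionally with TimeoutException; the
    // real worker also kills the OS process via ProcessHolder at this point.
    public static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> backup,
                                                       long timeout, TimeUnit unit) {
        MONITOR.schedule(() -> {
            if (!backup.isDone()) {
                backup.completeExceptionally(
                    new TimeoutException("backup exceeded " + timeout + " " + unit));
            }
        }, timeout, unit);
        return backup;
    }
}
```

A completed backup makes the scheduled task a no-op; a hung one surfaces as an ExecutionException wrapping TimeoutException, which the worker maps to the FAILED state.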
Figure 4: Backup job state transitions
```text
[PENDING]
    |
    | BackupAsyncWorker picks up
    v
[IN_PROGRESS] <──────────────────────────────+
    |                                        |
    | strategy.execute() throws              |
    v                                        |
retryCount++                                 |
    |                                        |
    +-- retryCount < maxRetries(3) -- sleep 5s (retry loop) -->+
    |
    +-- retryCount >= maxRetries
    |       |
    |       v
    |   [FAILED] (permanent)
    |
    +-- TimeoutException (30s hard kill)
    |       |
    |       v
    |   [FAILED] (timeout)
    |
    +-- success
        |
        v
    [COMPLETED]
```
The state machine has exactly one terminal success state (COMPLETED) and two terminal failure states (FAILED via retry exhaustion, FAILED via timeout). The CANCELLED state is reserved for future manual cancellation support via the PATCH /api/backups/{id}/status endpoint.
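One way to encode Figure 4 as an explicit guard. The status values come from the schema; the transition map is an illustrative reading of the diagram, not code from the project:

```java
import java.util.Map;
import java.util.Set;

public class BackupStatusMachine {
    public enum Status { PENDING, IN_PROGRESS, COMPLETED, FAILED, CANCELLED }

    // Legal transitions per Figure 4; CANCELLED is reserved for the future
    // PATCH /api/backups/{id}/status manual-cancellation endpoint.
    static final Map<Status, Set<Status>> ALLOWED = Map.of(
        Status.PENDING,     Set.of(Status.IN_PROGRESS, Status.CANCELLED),
        Status.IN_PROGRESS, Set.of(Status.COMPLETED, Status.FAILED,
                                   Status.IN_PROGRESS), // retry loops back
        Status.COMPLETED,   Set.of(),  // terminal success
        Status.FAILED,      Set.of(),  // terminal failure
        Status.CANCELLED,   Set.of()   // terminal
    );

    public static boolean canTransition(Status from, Status to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }
}
```

Such a guard makes illegal updates (e.g., reviving a COMPLETED job) fail fast instead of silently corrupting the audit trail.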
| Column | Type | Description |
|---|---|---|
| `id` | BIGINT PK | Auto-generated identity. |
| `jobName` | VARCHAR | Human-readable label provided at trigger time. |
| `status` | VARCHAR (enum) | PENDING, IN_PROGRESS, COMPLETED, FAILED, CANCELLED. |
| `databaseType` | VARCHAR (enum) | POSTGRESQL, MYSQL, MONGODB (and reserved others). |
| `database_config_id` | BIGINT FK | Foreign key to database_configs. EAGER fetched. |
| `filePath` | TEXT | Absolute path to the generated backup file on the container filesystem. |
| `sizeBytes` | BIGINT | File size in bytes, set after successful backup. |
| `errorMessage` | TEXT | Full error detail including retry context. TEXT type to handle long stack traces. |
| `retryCount` | INT | Number of attempts made. Starts at 0. |
| `maxRetries` | INT | Configured maximum retries. Defaults to 3. |
| `createdAt` | TIMESTAMP | Set by @PrePersist. Immutable after insert (updatable=false). |
| `startedAt` | TIMESTAMP | Set when the async worker transitions to IN_PROGRESS. |
| `completedAt` | TIMESTAMP | Set on COMPLETED terminal state. |
| `updatedAt` | TIMESTAMP | Set by @PreUpdate on every save. |
| Column | Type | Description |
|---|---|---|
| `id` | BIGINT PK | Auto-generated identity. |
| `databaseType` | VARCHAR (enum) | Determines which BackupStrategy is used. |
| `host` | VARCHAR | Hostname or IP. Use host.docker.internal for local machine databases. |
| `port` | INTEGER | Database port. |
| `dbName` | VARCHAR | Target database name. |
| `username` | VARCHAR | Authentication username. |
| `password` | VARCHAR | Authentication password. Stored in plaintext (intended for local/demo use). |
| `createdAt` | TIMESTAMP | Set by @PrePersist. |
Figure 5: Container topology
```text
Host Machine
┌──────────────────────────────────────────────────────────┐
│  Docker Compose Network: resilientdb_network             │
│                                                          │
│  ┌───────────────────────────────────────────────────┐   │
│  │ resilientdb_app                                   │   │
│  │ Image: custom Ubuntu layer   Port: 8080 (exposed) │<──┼──> Browser / Postman
│  │ Tools installed:                                  │   │
│  │   postgresql-client-16                            │   │
│  │   mysql-client                                    │   │
│  │   mongodb-database-tools                          │   │
│  │                                                   │   │
│  │ Volume: ./backups:/app/backups (bind mount)       │   │
│  └──────────────────────┬────────────────────────────┘   │
│                         │ JDBC / internal network        │
│                         v                                │
│  ┌───────────────────────────────────────────────────┐   │
│  │ resilientdb_postgres                              │   │
│  │ Image: postgres:16          Port: 5432 (internal) │   │
│  │ Database: resilientdb_meta  Port: 5433 (exposed)  │   │
│  │ Tables:                                           │   │
│  │   backup_operations                               │   │
│  │   database_configs                                │   │
│  │                                                   │   │
│  │ Volume: pgdata (named)                            │   │
│  └───────────────────────────────────────────────────┘   │
│                                                          │
│  External DBs (host.docker.internal):                    │
│    PostgreSQL :5432   MySQL :3306   MongoDB :27017       │
└──────────────────────────────────────────────────────────┘
```
The application container is built on a custom Ubuntu base that installs the exact CLI tool versions required:
- `postgresql-client-16`: Provides `pg_dump`. Version-pinned to 16 to match the PostgreSQL wire protocol version in common use.
- `mysql-client`: Provides `mysqldump`. The `--column-statistics=0` flag is passed explicitly to suppress the `INFORMATION_SCHEMA.COLUMN_STATISTICS` query that fails on older MySQL server versions.
- `mongodb-database-tools`: Provides `mongodump`. Installed separately from the MongoDB server package to keep the image lean.
The backups directory is bind-mounted from the host (./backups:/app/backups). This means generated backup files survive container restarts and are directly accessible on the host filesystem without docker cp.
The PostgreSQL 16 container stores job metadata and configurations only. It does not store backup data. A named Docker volume (pgdata) is used for the PostgreSQL data directory rather than a bind mount, which provides better I/O performance for database workloads.
The internal port 5432 is used for JDBC connections from the application container. Port 5433 is exposed to the host for direct inspection with psql or a GUI client during development.
| Method | Endpoint | Request Body | Response |
|---|---|---|---|
| POST | `/api/databases` | `DatabaseConfig` JSON | 201 + saved config with id |
| GET | `/api/databases` | — | 200 + list of all configs |
| Method | Endpoint | Request | Response |
|---|---|---|---|
| POST | `/api/backups/trigger` | `{ databaseConfigId, jobName }` | 201 + BackupJobResponse (PENDING) |
| GET | `/api/backups` | — | 200 + list of all jobs |
| GET | `/api/backups/{id}` | — | 200 + single job |
| GET | `/api/backups/latest` | — | 200 + top 10 by createdAt |
| GET | `/api/backups/status/{status}` | — | 200 + jobs filtered by status |
| PATCH | `/api/backups/{id}/status` | `?status=CANCELLED` | 200 + updated job |
| DELETE | `/api/backups/{id}` | — | 204 No Content, deletes file |
| GET | `/api/backups/{id}/download` | — | 200 + octet-stream (Content-Disposition: attachment) |
| Method | Endpoint | Response |
|---|---|---|
| GET | `/api/health/ping` | `{ status: ok, service, timestamp }` |
| GET | `/api/health/executor` | Thread pool stats: active_threads, pool_size, queue_size, status (healthy / degraded / critical) |
- Docker Engine 24.0+: Required for compose v2 syntax and bind mount support.
- Maven 3.9+ or included wrapper: For building the JAR before Docker image creation.
```shell
# 1. Clone the repository
git clone https://github.com/shravan606756/resilient-db-engine.git
cd resilientdb-engine

# 2. Build the application JAR
./mvnw clean package -DskipTests

# 3. Build the Docker image and start all containers
docker compose up -d --build

# 4. Verify all tools are installed inside the container
docker exec -it resilientdb_app sh -c \
  "pg_dump --version && mysqldump --version && mongodump --version"

# 5. Confirm the API is reachable
curl http://localhost:8080/api/health/ping
```

> Note: When running inside Docker, `localhost` inside the container refers to the container itself, not the host machine. To target a database running on your local machine, use `host.docker.internal` as the `host` value when registering a database configuration. Additionally, ensure the target database is configured to accept connections from external addresses (e.g., `listen_addresses = '*'` in `postgresql.conf`, and an appropriate entry in `pg_hba.conf`).
Example registration payload for a host machine PostgreSQL database:

```json
{
  "databaseType": "POSTGRESQL",
  "host": "host.docker.internal",
  "port": 5432,
  "dbName": "myapp_production",
  "username": "backup_user",
  "password": "<password>"
}
```

| Decision | Rationale | Trade-off |
|---|---|---|
| Native CLI tools over JDBC | pg_dump and mysqldump produce complete, consistent snapshots including schema, stored procedures, and triggers that are difficult to replicate via JDBC. | Requires tool installation in the container. Versions must be compatible with the target server. |
| ProcessBuilder over Runtime.exec() | ProcessBuilder allows per-process environment variable injection without shell interpretation, preventing injection attacks and credential leakage. | More verbose. Arguments must be passed as discrete tokens, not a shell string. |
| Dedicated BackupAsyncWorker class | Spring @Async only works through a proxy. Self-invocation via this. bypasses the proxy. A separate bean guarantees proxy-mediated dispatch on every call. | Additional class. Slightly less obvious code locality. |
| No @Transactional on triggerBackup() | saveAndFlush() without an enclosing transaction commits immediately. The async worker can always findById() successfully on the first attempt. | No rollback if strategy lookup fails after save. The dangling PENDING row must be cleaned up manually. |
| OS-level process kill on timeout | Java thread interruption does not kill child processes. destroyForcibly() on the OS process handle is the only reliable mechanism. | Partial backup files may be left on disk after a kill. The cleanup routine only deletes files older than the 5 most recent. |
| 3s polling over WebSocket | Simpler infrastructure. No session state. Works correctly with CORS. Sufficient for demo purposes. | Slight status display lag. Not suitable for high-frequency update scenarios. |
The backend is deployed to AWS EC2 through a GitHub Actions CD pipeline. On every push to main, the pipeline builds the JAR, packages it into a Docker image, pushes it to Amazon ECR, and SSH-deploys it to the EC2 instance.
Pipeline overview (`.github/workflows/deploy.yml`):

```yaml
name: Deploy to AWS EC2
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up JDK 21
        uses: actions/setup-java@v3
        with:
          java-version: '21'
          distribution: 'temurin'
      - name: Build JAR
        run: ./mvnw clean package -DskipTests
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Login to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build and push Docker image
        run: |
          docker build -t ${{ secrets.ECR_REGISTRY }}/resilientdb-engine:latest .
          docker push ${{ secrets.ECR_REGISTRY }}/resilientdb-engine:latest
      - name: Deploy to EC2 via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.EC2_HOST }}
          username: ec2-user
          key: ${{ secrets.EC2_SSH_KEY }}
          script: |
            aws ecr get-login-password --region us-east-1 | \
              docker login --username AWS --password-stdin ${{ secrets.ECR_REGISTRY }}
            docker pull ${{ secrets.ECR_REGISTRY }}/resilientdb-engine:latest
            docker stop resilientdb-engine || true
            docker rm resilientdb-engine || true
            docker run -d --name resilientdb-engine \
              -p 8080:8080 \
              -e SPRING_DATASOURCE_URL=${{ secrets.DB_URL }} \
              -e SPRING_DATASOURCE_USERNAME=${{ secrets.DB_USER }} \
              -e SPRING_DATASOURCE_PASSWORD=${{ secrets.DB_PASS }} \
              ${{ secrets.ECR_REGISTRY }}/resilientdb-engine:latest
```

Required GitHub secrets: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `ECR_REGISTRY`, `EC2_HOST`, `EC2_SSH_KEY`, `DB_URL`, `DB_USER`, `DB_PASS`.
The EC2 instance should have Docker installed, the ECR registry accessible via IAM role, and port 8080 open in its security group. Use Amazon RDS (PostgreSQL) as the managed database to replace the local resilientdb_postgres container.
> Alternative — GCP Compute Engine: If deploying to GCP instead, replace ECR with Google Artifact Registry and the EC2 SSH step with a `gcloud compute ssh` command or Cloud Run deployment. The GitHub Actions structure remains identical; only the registry login and deploy target change.
The React/Vite dashboard deploys to Vercel. Point the API base URL at the EC2 instance's public IP or domain.
```shell
# Install Vercel CLI and deploy
npm i -g vercel
vercel deploy
```

In your Vercel project settings, add:

```shell
VITE_API_BASE_URL=http://<your-ec2-public-ip>:8080
```

Then reference it in your API client:

```javascript
const BASE_URL = import.meta.env.VITE_API_BASE_URL;
```

Vercel handles CI/CD automatically on every push to main; no further configuration is needed.
ResilientDB Engine | Java 21 / Spring Boot 3.4 | github.com/shravan606756/resilient-db-engine