DevOps Story: How We Fixed 92 Blocked Migrations in One Day

Date: November 03, 2025

Journey: From Legacy System to Updated Version

Duration: 730 Days Without Updates

Prologue: The Wake-Up Call

The first alert came through at 3:47 AM. The notification buzzed on our phones. Then another. And another. Red warning lights flashed on the monitoring dashboard, each one telling us something was wrong.

The old server was showing signs of age. Memory leaks were slowing things down. Response times were getting longer. Hardware warnings appeared more frequently. We had to move everything to a new server. We had to restore from a backup.

The good news was we had backups. We had been backing up our database to S3 every single day for the past two years. Automated backups ran every night at 1 AM. We had hundreds of backup files sitting in our S3 bucket, ready to use.

The bad news was we had never updated the application code. Not once in two years.

The application was open source. The developers released new versions constantly. Version 1.0 came out in 2023. Then 1.1, 1.2, 1.3, all the way to version 1.12.3. Each version had bug fixes, security patches, new features, and database migrations.

But we never updated our code. We kept running the old version. The old server ran the same code for two years. Same application. Same database schema. Frozen in time while the world moved forward.

Now we were restoring the most recent of those daily backups, taken on October 29, 2025, just days before the failure, to a brand new server. The data inside it was current, but the schema was the same one we had been running for two years. And this time, we had downloaded the latest code. Version 1.12.3. The newest version with all the latest features.

We thought it would work. We thought the backup would be compatible with the new code.

We were wrong.

The Restoration

The new server was clean and ready. We set it up with the latest version of the application code. Version 1.12.3. All the newest features. All the latest security patches. Everything modern and up to date.

Then we downloaded the backup from S3. We had one for every day, and we grabbed the most recent to start the restoration process:

docker compose -f production.yml run --rm awscli download backup_2025_10_29T01_00_02.sql.gz

The command executed. The file downloaded from S3. Progress bars showed the transfer was successful. The backup file was there, ready to restore.

We restored it to the database. The restoration process completed without errors. The database was populated. All the old data was there. Users, documents, teams—everything from the old server was restored.

We started the application. The containers came online. The services started. Everything looked good.

We checked the logs. No errors. Everything green. The system was running.

We thought we were done. We thought everything was working.

Then someone tried to log in.

The Login Failure

The first user tried to sign in. They entered their email and password. They clicked the login button. Nothing happened.

They tried again. This time the page reloaded with an error message: “Authentication failed.”

Another user tried. Same thing. Couldn’t log in.

Another one. Same problem.

We tried our own test accounts. None of them worked. The login page loaded fine. The form was there. But when we clicked the login button, nothing happened. No user could sign in.

We checked the application logs. That’s when we saw the problem.

Error messages filled the screen. Red text scrolling down, one error after another:

Column "Session.ipAddress" does not exist
Error: Cannot write to column that doesn't exist

The application was trying to save the user’s IP address when they logged in. But the column didn’t exist in the database.

The new application code—version 1.12.3—expected the Session table to have an ipAddress column. This column was supposed to store the IP address of each user’s login session. It was added in a migration that ran months ago.

But our database schema was two years out of date. It didn’t have this column. The old version of the application didn’t track IP addresses.

Every time someone tried to log in, the application tried to write the IP address to the database. But the column didn’t exist, so the database rejected the query. The session never saved. The login failed.
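
A quick way to confirm a mismatch like this is to ask the database what columns the table actually has. A minimal check using the standard information_schema catalog (the table name matches the error above):

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'Session'
ORDER BY column_name;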

“Why can’t they log in?” someone asked.

“The new code expects columns that don’t exist in the old database,” I replied. “The application code has been updated, but the database schema hasn’t. They don’t match anymore.”

The Blocking Migration

We needed to run the migrations to update the database schema. The application code had all the migration files included. We just needed to apply them to our database.

So we ran the migration command:

npx prisma migrate deploy

But it failed immediately. The error message was clear:

Migration "20240408142543_add_recipient_document_delete" failed.
Cannot proceed with other migrations until this one is resolved.

One of the migrations had failed. And because it failed, Prisma wouldn’t let us run any other migrations. It was a safety feature—Prisma prevented us from applying new migrations when an old one was in a failed state.
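
Prisma keeps its bookkeeping in a _prisma_migrations table in the same database, so you can see the failed entry for yourself. A query along these lines lists any migration that started but never finished:

SELECT migration_name, started_at, finished_at, rolled_back_at
FROM _prisma_migrations
WHERE finished_at IS NULL
ORDER BY started_at;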

The failed migration was from April 2024. It was supposed to:

  1. Add a documentDeletedAt column to the Recipient table
  2. Update existing recipients to mark deleted documents
  3. Delete old PENDING documents that were soft-deleted

The migration had tried to delete documents, but foreign key constraints prevented it. The documents had relationships with other tables. You couldn’t delete them without breaking those relationships. So the migration failed halfway through.
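
To illustrate why the DELETE failed (the actual constraint names and options in the schema may differ), a foreign key like the following makes the database reject any attempt to delete a Document that a Recipient still references:

-- Simplified illustration; real constraint names may differ.
ALTER TABLE "Recipient"
  ADD CONSTRAINT "Recipient_documentId_fkey"
  FOREIGN KEY ("documentId") REFERENCES "Document"("id");

-- With that in place, deleting a referenced document fails with something like:
--   ERROR: update or delete on table "Document" violates foreign key constraint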

And because it failed, every migration after it was blocked. We couldn’t apply any new migrations until we fixed this one.

We were stuck. The database couldn’t be updated. The application couldn’t run properly. Users couldn’t log in.

We had to fix the failed migration first.

The Email Exposure

While we were investigating the login problem, we noticed something else. Something that made us pause.

When users tried to access the application, their email addresses appeared in the browser’s URL bar. Clear as day. Visible to anyone who looked at the screen.

For example, if a user with the email john.doe@company.com tried to log in, the URL would show:

https://ourserver.com/auth?email=john.doe@company.com

This was a security problem. User emails shouldn’t be in URLs. They shouldn’t be visible in the browser address bar. They shouldn’t be logged in server access logs for everyone to see.

We traced the problem back. The old application code had an authentication flow that used query parameters to pass user emails. It was a quick way to handle authentication, but it wasn’t secure.

The new code had fixed this. The new version used secure tokens instead of emails. But when we restored the old database, something caused the application to fall back to the old authentication method. The old database structure triggered old code paths.

So even though we were running the new code, the old database made it behave like the old code. User emails were exposed in URLs.

Now we had two problems:

  1. Users couldn’t log in (missing database columns)
  2. When they could log in, their emails were exposed (old authentication flow)

We needed to fix both. But we couldn’t run migrations until we fixed the failed one first.

Fixing the Failed Migration

We had to resolve the failed migration before we could do anything else. We couldn’t just skip it. We had to mark it as completed or rolled back, then manually apply the safe parts.

First, we marked it as rolled back in Prisma’s tracking system:

npx prisma migrate resolve --rolled-back 20240408142543_add_recipient_document_delete

This told Prisma that we were handling this migration ourselves. It cleared the path for other migrations to run.

But we couldn’t just mark it and move on. The migration had tried to do important things. The documentDeletedAt column needed to be added. Recipients needed to be updated. We had to manually apply the safe parts of the migration.

We connected directly to the database. We ran SQL commands one by one, carefully:

-- Add the column
ALTER TABLE "Recipient" ADD COLUMN IF NOT EXISTS "documentDeletedAt" TIMESTAMP(3);

The column was added. The database accepted the change. Good.

Next, we updated the recipients to mark which documents were deleted:

UPDATE "Recipient"
SET "documentDeletedAt" = "Document"."deletedAt"FROM "Document"WHERE "Recipient"."documentId" = "Document"."id"AND "Document"."deletedAt" IS NOT NULLAND "Recipient"."documentDeletedAt" IS NULL;

This updated all recipients whose documents had been deleted. We skipped the DELETE operation that had caused the migration to fail. We didn’t need to delete old documents right now—we just needed the column and the data updates.
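
For the record, the step we skipped was the cleanup DELETE. We didn’t need it that day, but any later attempt would have to remove the dependent rows before the parent documents. A hypothetical sketch, not the migration’s actual SQL (the status value and column names are assumptions):

-- Hypothetical cleanup; verify what references these rows before running anything like it.
DELETE FROM "Recipient"
WHERE "documentId" IN (
  SELECT "id" FROM "Document"
  WHERE "status" = 'PENDING' AND "deletedAt" IS NOT NULL
);

DELETE FROM "Document"
WHERE "status" = 'PENDING' AND "deletedAt" IS NOT NULL;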

Finally, we updated Prisma’s migration tracking table to mark this migration as complete:

UPDATE _prisma_migrations
SET
  started_at = COALESCE(started_at, CURRENT_TIMESTAMP),
  finished_at = CURRENT_TIMESTAMP,
  rolled_back_at = NULL
WHERE migration_name = '20240408142543_add_recipient_document_delete';

The failed migration was resolved. The path forward was clear. Now we could run the other migrations.

Running All Migrations

With the failed migration resolved, we ran the migration deployment command again:

npx prisma migrate deploy

This time it worked. The migrations began to apply, one after another.

Each migration appeared on the screen:

  • Migration 1: Creating new table…
  • Migration 2: Adding new column…
  • Migration 3: Creating index…
  • Migration 4: Adding foreign key…

On and on they went. Migration after migration. Each one updating the database schema, bringing it closer to what the new application code expected.

Ninety-two migrations in total. Each one applying changes:

  • Creating 12 new tables
  • Adding 47 missing columns
  • Creating 23 new indexes
  • Establishing 8 foreign key relationships

One important migration was 20250217120859_add_session_fields. This one added the ipAddress column to the Session table—the column that was causing the login failures.
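
We didn’t dig into that migration file that day, but the heart of the fix is a plain ADD COLUMN. A sketch of what it likely contains (the real migration may include additional fields):

ALTER TABLE "Session" ADD COLUMN "ipAddress" TEXT;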

Another important one was 20250522054050_add_organisations. This one added the entire organization management system. New tables for organizations, teams, members, settings. Everything needed for the multi-organization features.

The migrations ran for what felt like hours. Each one completing successfully. The database schema transforming from the old version to the new one.

Finally, the last migration completed. All 92 migrations were applied. The database schema was now up to date. It matched what version 1.12.3 expected.
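
As a final check, Prisma’s own bookkeeping table can confirm the count: every migration should now have a finish timestamp, and none should be marked rolled back:

SELECT COUNT(*) AS applied
FROM _prisma_migrations
WHERE finished_at IS NOT NULL
  AND rolled_back_at IS NULL;
-- Expected: 92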

Testing the Login

With all migrations applied, we restarted the application. The services came back online. The containers restarted. Everything initialized.

We tried to log in again. We entered our test credentials. We clicked the login button. We waited.

It worked.

The login succeeded. The session was created. The IP address was saved. We were logged in.

We tested with other accounts. They all worked. Users could sign in again. The authentication system was functional.

But our celebration was short-lived. When users logged in and tried to access their documents, they found nothing.

The Missing Documents

Users could log in now. That was good. But when they looked for their documents, they found empty lists.

“I had twenty documents here,” one user said. “Where are they?”

Another user: “My signed documents are gone. They were here before.”

A third user: “I can’t see any of my work. Everything is missing.”

We checked the database. The documents were definitely there. We ran queries. The data existed. The records were intact. Users had documents. Documents had recipients. Everything was connected properly.
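
The checks were simple counts joining documents to their owners. A sketch of the kind of query we ran (the User table and userId column are assumptions about the schema):

-- Hypothetical ownership check; table and column names are assumptions.
SELECT u.email, COUNT(d.id) AS document_count
FROM "Document" d
JOIN "User" u ON u.id = d."userId"
GROUP BY u.email
ORDER BY document_count DESC;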

But the application couldn’t find them. The queries were returning empty results.

We examined the application code. The document listing queries filtered results using the documentDeletedAt column. The query looked something like:

SELECT * FROM "Document" d
JOIN "Recipient" r ON r."documentId" = d.idWHERE r."documentDeletedAt" IS NULLAND d."deletedAt" IS NULL

This query only returned documents where documentDeletedAt was NULL—meaning the document wasn’t deleted for that recipient. When we checked the database, we found that a number of recipients carried a non-NULL documentDeletedAt even though their documents were still active. Somewhere between the restore and our manual backfill, rows had been stamped with deletion timestamps that shouldn’t apply.

In PostgreSQL, a column added without a default reads as NULL for every existing row, so the rows our backfill never touched were fine. The problem was the rows that had been stamped incorrectly.

We needed to clear the column for every recipient whose document was still active. We ran another update query:

UPDATE "Recipient"
SET "documentDeletedAt" = NULLWHERE "documentDeletedAt" IS NULLAND EXISTS (
  SELECT 1 FROM "Document" d
  WHERE d.id = "Recipient"."documentId"
  AND d."deletedAt" IS NULL);

This cleared documentDeletedAt for every recipient whose document was still active, wiping out the stray timestamps. The listing queries would now return those rows.
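
A quick count confirmed there was nothing left for the filter to hide. With the same tables as above, this should return zero:

SELECT COUNT(*) AS still_hidden
FROM "Recipient" r
JOIN "Document" d ON d.id = r."documentId"
WHERE d."deletedAt" IS NULL
  AND r."documentDeletedAt" IS NOT NULL;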

We tested again. The documents began to appear. Users could see their work again. The document lists were complete.

Fixing the Email Exposure

With login working and documents visible, we turned our attention back to the email exposure problem. User emails shouldn’t be in URLs. It was a security flaw.

The problem was in the authentication routing code. The old code path used query parameters:

// Old code (insecure):
redirect(`/auth?email=${user.email}`);

The new code should use secure tokens instead:

// New code (secure):
redirect(`/auth?token=${sessionToken}`);

But the old database structure was causing the application to fall back to the old method. We needed to fix the authentication routing to always use tokens, never emails.

We updated the code. We changed the authentication flow. We made sure emails were never passed in URLs. We switched everything to use secure session tokens.

We deployed the fix. We tested the login flow. The email disappeared from the URL. Instead of ?email=john.doe@company.com, it now showed ?token=abc123xyz..., an opaque token that reveals nothing about the user to anyone who sees the URL.

The security flaw was fixed. User emails were no longer exposed.

The Certificate Problem

While we were working through all these issues, we discovered another problem. One that had been hiding in the code the whole time.

The Dockerfile had a syntax error. The certificate generation code had an extra RUN statement with wrong indentation. It looked like this:

# Broken code:
RUN mkdir -p /app/certs
    RUN cd /app/certs && \   # <- extra RUN, wrong indentation
    openssl genrsa ...

This caused the certificates to generate incorrectly. They were being created, but with wrong permissions and paths. The document signing feature wasn’t working properly.

We fixed it by combining everything into a single RUN command with proper indentation:

# Fixed code:
RUN mkdir -p /app/certs && \
    cd /app/certs && \
    openssl genrsa -out private.key 2048 && \
    openssl req -new -x509 -key private.key -out certificate.crt -days 365 \
    -subj "/C=US/ST=State/L=City/O=Organization/CN=documenso" && \
    openssl pkcs12 -export -out cert.p12 \
    -inkey private.key \
    -in certificate.crt \
    -legacy \
    -passout pass:documenso && \
    chown 1001:1001 cert.p12 private.key certificate.crt

The certificates would now generate correctly. But we had to rebuild the Docker image for this fix to take effect. We couldn’t do that right now—we had to finish fixing the other issues first.

The Pending Status Problem

As we worked through the issues, users reported another problem. Documents were showing as “pending” when they shouldn’t be.

Documents that had been completed and signed months ago appeared as if they were still waiting for signatures. Users were confused. Workflows looked incomplete.

The problem was related to the same migration we had fixed earlier—the one that added documentDeletedAt. But the status tracking system depended on multiple columns working together:

  • documentDeletedAt – Whether the recipient’s document was deleted
  • signedAt – When the recipient signed
  • readStatus – Whether the document was opened
  • sendStatus – Whether the document was sent
  • signingStatus – The current signing status

Several of these columns were missing or improperly initialized. Some had been added in migrations we just ran. Others existed but had wrong values because they were created two years ago with old logic.

We had to update all these columns to ensure status tracking worked correctly. We ran update queries to fix the status values. We synchronized recipients with their documents. We updated workflow states.
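
The exact statements depended on the schema, but each pass followed the same pattern: derive the status a row should have from the data that exists, then write it back. One hypothetical example (the SIGNED value and the column semantics are assumptions):

-- Hypothetical resync: recipients with a signature on record are marked signed.
UPDATE "Recipient"
SET "signingStatus" = 'SIGNED'
WHERE "signedAt" IS NOT NULL
  AND "signingStatus" <> 'SIGNED';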

Slowly, the pending statuses began to resolve. Documents showed their correct states. Completed documents showed as completed. Pending documents showed as pending. Workflows unblocked.

The Complete Picture

After resolving all the immediate issues, we took a step back to understand what had happened.

We had restored a Postgres backup with a two-year-old schema to a new server running the latest application code. The backup itself was recent, taken October 29, 2025, but it carried the database structure of our old version 1.0 install. The code was the newest release, version 1.12.3.

During those two years:

  • The application code evolved from version 1.0 to 1.12.3
  • Ninety-two database migrations were created
  • New features were added: organizations, enhanced security, improved workflows
  • Security fixes were implemented
  • Performance optimizations were applied

But our old server never got these updates. It kept running version 1.0 with the old database schema. For two whole years.

When we restored the old database to the new server, we created a mismatch:

  • New code expected new database columns (like Session.ipAddress)
  • Old database didn’t have these columns
  • New code expected new tables (like Organisation)
  • Old database didn’t have these tables
  • New code used new authentication methods
  • Old database triggered old code paths

Every problem we encountered came from this mismatch:

  1. Login issues → Missing Session.ipAddress column
  2. Missing documents → documentDeletedAt column improperly initialized
  3. Pending statuses → Missing or incorrect status tracking columns
  4. Email exposure → Old authentication flow triggered by old database
  5. Certificate problems → Old Dockerfile with syntax errors

We had to apply two years of migrations in one day. We had to fix two years of accumulated technical debt. We had to catch up to the present.

The Final Verification

Once everything was fixed, we verified the entire system. We checked every feature. We tested every function.

Login: Working. Users could sign in. Authentication was functional.

Documents: Visible. All documents appeared for their owners. Document lists were complete.

Email Exposure: Fixed. Emails no longer appeared in URLs. Security flaw was closed.

Status Tracking: Working. Documents showed correct states. Workflows operated smoothly.

Migrations: Complete. All 92 migrations were applied. Database schema matched version 1.12.3.

Features: Available. Organization management, enhanced security, improved workflows—all the features from the past two years were now available.

The system was operational. Everything was working. The migration was complete.

Lessons Learned

We learned several important lessons from this experience.

Lesson 1: Regular Updates Are Essential

The application was open source. Updates were available. But we never applied them. We thought “if it’s not broken, don’t fix it.” But that thinking led to two years of technical debt. When we finally needed to update, we had to apply two years of changes at once.

Regular updates—monthly or quarterly—would have prevented all these problems. Each migration would have been applied as it was released. Each feature would have been tested as it was added. Each security fix would have been deployed immediately.

Lesson 2: Database and Code Must Stay in Sync

The application code and database schema must always match. When code is updated, migrations must be applied. When migrations are applied, code must be updated. They can’t be out of sync.

In our case, the code had been updated (we downloaded version 1.12.3), but the database had not (we restored a two-year-old backup). This mismatch caused every problem we encountered.

Lesson 3: Automated Backups Are Not Enough

We had automated backups running every day. That was good. But backups alone aren’t enough. You also need to:

  • Keep the application code updated
  • Keep the database schema current
  • Test restorations regularly
  • Verify backups actually work

Backups ensure you don’t lose data. Updates ensure you can actually use that data with modern code.

Lesson 4: Migration Safety Features Exist for a Reason

Prisma blocked us from running new migrations when an old one failed. At first, this seemed like a problem. But it was actually protecting us. It prevented us from creating more problems by applying migrations in the wrong order.

We had to fix the failed migration first, then apply the others. This was the correct approach, even though it took longer.

Lesson 5: Manual Intervention Is Sometimes Necessary

Not everything can be automated. Sometimes you need to manually fix things. The failed migration couldn’t be automatically fixed. We had to manually resolve it, apply the safe parts, and mark it as complete.

This required understanding what the migration was trying to do and why it failed. Then we had to apply the safe parts ourselves while skipping the parts that caused the failure.

The Prevention Plan

To prevent this from happening again, we implemented a new process:

Monthly:

  • Review pending migrations
  • Test migrations in staging environment
  • Apply migrations to production
  • Verify all features still work

Quarterly:

  • Review application updates
  • Test major version upgrades in staging
  • Apply upgrades to production
  • Verify complete system functionality

Ongoing:

  • Monitor migration status
  • Track database schema changes
  • Verify feature availability
  • Test backup restoration process
  • Document all changes

Automation:

  • Automated migration checks
  • Automated schema validation
  • Automated feature testing
  • Automated backup verification

We would never let two years pass without updates again. We would never let the database and code get out of sync again. We would never let technical debt accumulate like this again.

Epilogue: The Victory

After a long day of work, everything was finally complete.

What We Accomplished:

  • Resolved 1 failed migration that blocked everything
  • Applied 92 pending migrations spanning two years
  • Added 47 missing database columns
  • Created 12 new database tables
  • Created 23 new database indexes
  • Established 8 foreign key relationships
  • Fixed login authentication issues
  • Restored document visibility
  • Fixed email exposure security flaw
  • Corrected status tracking problems
  • Fixed certificate generation in Dockerfile

The Result:

  • System fully operational
  • All features available from version 1.12.3
  • Users can log in and access their documents
  • Security issues resolved
  • All migrations applied and verified

From a frozen system running two-year-old code to a fully up-to-date platform running version 1.12.3 with all features enabled.

The journey was complete. The system was modern. The technical debt was eliminated.

End of Entry


“Regular updates aren’t optional—they’re essential. A system that stays frozen in time is a system that will eventually break.”


Ajay Kumar Yegireddi is a DevSecOps Engineer and System Administrator with a passion for sharing real-world DevSecOps projects and tasks. His site, Mr. Cloud Book, provides hands-on tutorials and practical insights to help others master DevSecOps tools and workflows. The content is designed to bridge the gap between development, security, and operations, making complex concepts easy to understand for both beginners and professionals.
