A practical guide for ISV engineering leaders navigating legacy systems, compounding technical debt, and the AI tools that can help — without the hype and without a big-bang rewrite.

There is a number that most engineering leaders know but rarely say out loud: somewhere between 60 and 80 percent of the IT budget goes to keeping existing systems alive. Not to improving them or adding features, just to making sure they do not fail. That is the consistent finding across McKinsey's and Stripe's research, and it describes the reality of most enterprise engineering organizations today.

The system in question might be the core billing engine written in 2008. It might be the patient management module that has been patched so many times that no one is sure exactly how it works anymore. It might be the scheduling service that two senior engineers understand and whose knowledge walks out the door every time one of them leaves. Whatever the specific system, the cost is the same: resources that should be building the next version of your product are being consumed by the previous one.

The traditional answer is to rewrite from scratch, and it is usually the wrong one. Big-bang rewrites fail at a remarkable rate, cost more than budgeted, and take longer than planned. The organization ends up maintaining two systems during the transition, with no guarantee that the new one will work as well as the old.

A different path has become available in the last two years: AI-accelerated refactoring. Not replacing your codebase but progressively improving it with AI tools handling the most labor-intensive parts of the process. This post is a practical guide to what that looks like, what the research says, and how to structure a refactoring program that modernizes your product without breaking it.

The Cost of Standing Still

The cost of maintaining legacy code is not simply a line item on the IT budget. It compounds.

Stripe's Developer Coefficient research found that developers spend an average of 13.5 hours per week dealing with technical debt, roughly a third of their working week. That is not time spent building new features or improving performance. It is time spent debugging workarounds, reverse-engineering undocumented logic, and testing changes in tightly coupled systems where a single edit can ripple unpredictably. For a ten-person engineering team, that adds up to more than three full-time engineers' worth of capacity absorbed by maintenance rather than development.

13.5 hrs
Stripe, The Developer Coefficient
Hours per week the average developer spends on technical debt, roughly one-third of the working week. For a 10-person team, that is the equivalent of 3+ engineers doing nothing but keeping old code alive.

Estimates from AlixPartners and CISQ project that by 2025 approximately 40 percent of IT budgets will go to maintaining technical debt, and that fixing each line of legacy code costs an average of $3.60. For a one-million-line codebase, that is $3.6 million in remediation cost, growing at roughly 20 percent per year if unaddressed.

McKinsey research on technical debt found that CIOs report 10 to 20 percent of their new product budget is diverted to resolving legacy issues. More importantly, McKinsey found that actively addressing this reverses the drain: companies that manage technical debt systematically free up engineers to spend significantly more time on work that actually moves the product forward.

Some companies find that actively managing their tech debt frees up engineers to spend up to 50 percent more of their time on work that supports business goals.

— McKinsey CIO survey on technical debt

What AI-Accelerated Refactoring Actually Means

The phrase has become loose enough to mean almost anything, so it is worth being precise. AI-accelerated refactoring is the use of generative AI tools to assist engineers in restructuring existing code: improving its internal structure without changing its external behavior, at a pace that manual effort alone could not achieve.

The productivity gains are real and well-evidenced. The tasks where AI measurably accelerates refactoring are specific and significant:

  • Understanding large, undocumented codebases: AI tools can index entire repositories, map dependencies, and answer system-level questions like 'which services consume this field after this event?', work that would take a senior engineer days to do manually.
  • Code documentation: McKinsey's controlled study found that documenting code functionality can be completed in half the time with AI assistance. For legacy codebases where documentation is absent or outdated, this alone removes one of the biggest barriers to safe refactoring.
  • Identifying refactoring candidates: AI tools surface code smells, duplicated logic, overly complex functions, and outdated dependencies across a codebase systematically, rather than relying on individual developers to discover them through normal code review.
  • Initial refactoring drafts: McKinsey's research found that optimizing existing code can be completed in roughly two-thirds the time with AI assistance, a meaningful compression on tasks that traditionally absorb disproportionate engineering effort.
  • Test generation: AI tools generate unit tests for legacy functions that currently have no coverage, directly de-risking the refactoring work that follows.

~⅔
McKinsey, 'Unleashing Developer Productivity with Generative AI', 2023
The time required for code refactoring tasks with AI assistance versus without. Documentation tasks are cut to half the time. These are the findings of McKinsey's controlled study of 40+ developers.

The right way to think about AI in this context is as a force multiplier on structured engineering effort. It does not make decisions; it executes them faster. AI excels at the high-volume, repetitive work: scanning, surfacing, drafting, and generating. The judgment calls (which modules to prioritize, whether a refactored version actually preserves business logic, when the system is ready for the next step) remain firmly with the engineers who understand the system.

That division of labor is not a limitation. It is what makes the combination so effective: AI handles the work that exhausts experienced engineers without challenging them; engineers make the decisions that require genuine expertise. The result is a refactoring program that gets done, rather than one that keeps getting pushed to the next quarter.

The Pattern That Works: Progressive Refactoring

The organizations making the most progress with legacy modernization are not those that launched a sweeping AI transformation initiative. They are the ones that adopted a disciplined, incremental approach, progressively modernizing the codebase while keeping the live system operational throughout.

The architectural pattern at the heart of this is the Strangler Fig. Named after the tree that gradually envelops and replaces its host, the Strangler Fig pattern routes new functionality through new, clean modules while the legacy system continues to handle existing behavior. Over time, as functionality is migrated, the legacy system shrinks and is eventually retired. Nothing breaks, because nothing is replaced all at once.
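To make the routing concrete, here is a minimal Python sketch. The route names and service URLs are hypothetical, and a production version would usually live in an API gateway or reverse proxy; the point is that migration becomes a configuration change, not a cutover.

    # Minimal Strangler Fig routing sketch (hypothetical routes and URLs).
    # Requests for migrated routes go to the new service; everything else
    # falls through to the legacy monolith.

    MIGRATED_ROUTES = {
        "/api/scheduling",   # extracted in an earlier phase
        "/api/billing",      # newly extracted, still under validation
    }

    LEGACY_BASE = "http://legacy-monolith.internal"
    MODERN_BASE = "http://new-services.internal"

    def route(path: str) -> str:
        """Return the base URL that should handle this request path."""
        for prefix in MIGRATED_ROUTES:
            if path.startswith(prefix):
                return MODERN_BASE
        return LEGACY_BASE

    assert route("/api/billing/invoice/42") == MODERN_BASE
    assert route("/api/reports/monthly") == LEGACY_BASE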

Combined with AI tooling, this pattern becomes significantly more executable. The phases below show exactly how the AI and human roles divide across the program, and why that division is what makes the whole thing work:

Phase 1: Understand Before You Touch

Before any refactoring begins, AI tools build a complete map of the existing codebase. A large language model with an adequate context window can analyze thousands of lines of code, surface dependency graphs, identify business logic embedded in functions that were never documented, and flag the highest-risk components: those with the most downstream dependencies, the highest change frequency, and the worst test coverage.

The output of this phase is a prioritized risk map: these modules are safe to touch first; these require extra care; these should be left alone until everything else is stable. This is the phase where AI delivers its clearest value: understanding a system at scale, in hours rather than weeks.
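As an illustration of what a risk map can look like once the analysis is distilled, here is a small Python sketch. The module names, weights, and scoring formula are assumptions for demonstration, not a standard; the inputs are exactly the three signals described above.

    from dataclasses import dataclass

    @dataclass
    class Module:
        name: str
        changes_last_year: int   # change frequency, e.g. from version control history
        dependents: int          # downstream modules that depend on this one
        coverage: float          # fraction of lines covered by tests, 0..1

    def risk_score(m: Module) -> float:
        # Illustrative weighting: frequently changed, heavily depended-on,
        # poorly tested modules float to the top of the risk map.
        return m.changes_last_year * 0.4 + m.dependents * 0.4 + (1 - m.coverage) * 20

    modules = [
        Module("billing", changes_last_year=48, dependents=12, coverage=0.15),
        Module("scheduling", changes_last_year=31, dependents=7, coverage=0.05),
        Module("reporting", changes_last_year=6, dependents=2, coverage=0.60),
    ]

    for m in sorted(modules, key=risk_score, reverse=True):
        print(f"{m.name:12} risk={risk_score(m):6.1f}")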

Phase 2: Enforce Boundaries Before Extracting Services

Before extracting any module into a separate service, enforce strict internal boundaries within the existing codebase. This means eliminating direct cross-module database calls, replacing shared global state with well-defined interfaces, and ensuring every cross-domain communication path goes through an explicit contract rather than an implicit dependency.

AI tools identify violations of these boundaries systematically: functions that directly access tables they should not own, services calling each other's internal logic, modules sharing state through global variables. Surfacing and fixing these violations at scale is exactly the kind of high-volume, repetitive work where AI assistance compresses timelines most reliably.
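For readers who want to see what an explicit contract looks like in code, here is a minimal Python sketch with hypothetical module and method names: instead of the scheduling module reading the billing tables directly, it depends on a small interface that the billing module owns and implements.

    from abc import ABC, abstractmethod

    # Before: scheduling queried billing's tables directly, e.g.
    #   db.execute("SELECT status FROM billing_invoices WHERE patient_id = ?")
    # After: scheduling depends only on a contract that billing owns.

    class BillingStatusPort(ABC):
        """The only way other modules may ask billing anything."""
        @abstractmethod
        def has_outstanding_invoice(self, patient_id: str) -> bool: ...

    class BillingModule(BillingStatusPort):
        def has_outstanding_invoice(self, patient_id: str) -> bool:
            return self._query_outstanding(patient_id)

        def _query_outstanding(self, patient_id: str) -> bool:
            ...  # real database access lives here, inside the owning module

    def can_schedule(patient_id: str, billing: BillingStatusPort) -> bool:
        # Scheduling logic sees only the interface, never billing internals.
        return not billing.has_outstanding_invoice(patient_id)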

Phase 3: Refactor High-Value Modules with AI Assistance

With boundaries enforced and a risk map in hand, begin refactoring the highest-value modules: those under the most load, with the most active development, or with the worst current maintainability scores. AI tools generate initial refactored versions; engineers review, correct, and validate against the test suite before merging.

One thing to watch: GitClear's analysis of 211 million lines of code from 2020 to 2024 found that as AI-assisted development grew, the proportion of changes that actually improved existing code, rather than simply adding to it, fell significantly. Teams defaulted to addition over improvement.

The Hidden Risk: AI Makes It Easier to Add Than to Improve

AI tools are naturally optimized for generation. When given no explicit constraints, they will produce new code faster than they will restructure existing code, because generation requires less context than safe restructuring. The result is a codebase that grows without getting cleaner. Left unmanaged, this accelerates the accumulation of the very debt you set out to reduce.

This is precisely what a structured refactoring program counteracts. By reserving dedicated sprint capacity for improvement and framing AI tasks explicitly around restructuring, teams can ensure that velocity translates into quality gains. Nalashaa's methodology builds this discipline into the program from the outset, so the drift toward addition over improvement never takes hold.

Phase 4: AI-Generated Test Coverage for Safe Extraction

Before extracting any module into an independent service, AI tools generate a test suite covering the module's current behavior to capture exactly what the existing code does. This ensures any refactoring that inadvertently changes behavior is caught immediately.
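These are characterization tests: the expected values are whatever the code returns today, recorded before any restructuring begins. A minimal sketch with pytest, using a hypothetical legacy function, might look like this:

    import pytest

    from scheduling.legacy import compute_slot_fee  # hypothetical legacy function

    # Expected values below are captured current behavior, not a spec.
    # If a refactor changes any of these outputs, the suite fails and the
    # change goes back to engineering review before it can merge.

    @pytest.mark.parametrize("duration_min, is_weekend, expected", [
        (30, False, 25.00),   # common path
        (30, True, 37.50),    # weekend surcharge branch
        (0, False, 0.00),     # zero-duration edge case from a cold branch
    ])
    def test_slot_fee_behavior_is_preserved(duration_min, is_weekend, expected):
        assert compute_slot_fee(duration_min, is_weekend) == pytest.approx(expected)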

Test generation is one of the clearest, most consistent AI productivity wins in this entire process. But it is worth making this concrete, because the abstract description understates the practical impact.

What This Looks Like in Practice

A healthcare platform was preparing to extract its appointment scheduling service from a monolith that had accumulated twelve years of patches. The module had no meaningful test coverage, a common condition in legacy systems where engineers were always too busy fixing things to write tests for them.

Using AI-generated test suites, the team produced coverage for the module's core behavior in a single sprint: covering over 90 functions, including edge cases that surfaced only when the AI analyzed conditional branches that had not been touched in years. When the refactoring drafts from Phase 3 were validated against this suite, three silent behavior changes were caught and corrected before any code reached staging.

In a related engagement, a fintech ISV used AI-assisted refactoring to eliminate N+1 query patterns across 40 API endpoints in a single sprint, a class of performance issue that had been on the backlog for two years because manual identification and remediation was too slow to fit into normal sprint cycles. With AI surfacing candidates and generating initial fixes, the work that had been perpetually deprioritized was completed in four days.
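For readers unfamiliar with the pattern, the shape of an N+1 fix is simple even though finding every instance is tedious. A schematic Python sketch, with a hypothetical DB-API-style connection, shows one query per row becoming a single batched query:

    # Before: one query per invoice, so N invoices cost N+1 round trips.
    def customer_names_before(db, invoices):
        names = []
        for inv in invoices:
            row = db.execute(
                "SELECT name FROM customers WHERE id = ?", (inv.customer_id,)
            ).fetchone()
            names.append(row[0])
        return names

    # After: one batched query, then an in-memory join.
    def customer_names_after(db, invoices):
        ids = tuple({inv.customer_id for inv in invoices})
        placeholders = ",".join("?" * len(ids))
        rows = db.execute(
            f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids
        ).fetchall()
        name_by_id = {cust_id: name for cust_id, name in rows}
        return [name_by_id[inv.customer_id] for inv in invoices]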

Phase 5: Extract, Validate, Retire

With full test coverage, bounded interfaces, and a refactored module, extraction into an independent service becomes straightforward. The Strangler Fig routing directs traffic to the new service. Validation runs in parallel against the legacy system for a defined period. When confidence is sufficient, the legacy path is retired.

What this looks like in practice matters for an engineering team deciding whether to commit to the approach. Consider a billing service extraction: once the refactored billing module is deployed, both the legacy billing path and the new service receive every incoming transaction simultaneously for a defined window, typically two weeks. An automated comparison layer records the outputs of both paths for every request and flags any discrepancy. The new service does not process payments in production during this window; it runs in shadow mode, its outputs compared but not acted upon.

The comparison layer monitors for three categories of divergence: numeric differences in calculated amounts, differences in state transitions (for example, an invoice marked as paid in one path but pending in the other), and latency outliers that suggest the new service is handling edge cases less efficiently. Discrepancies above a defined threshold trigger an automatic alert and are routed to engineering review before the validation window closes.

After two weeks with zero unresolved discrepancies and latency within acceptable bounds, the Strangler Fig routing is updated to send all traffic to the new service. The legacy billing path is deprecated: removed from the active call graph but kept in place, scheduled for final deletion in the following sprint cycle. This staged retirement is what makes the pattern safe: there is always a known-good fallback until the team deliberately removes it.
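A minimal sketch of that comparison layer, with hypothetical service objects and an illustrative latency threshold, looks like this. Only the legacy result is acted on; the shadow result is compared and logged.

    import logging
    import time

    log = logging.getLogger("shadow-validation")
    LATENCY_RATIO_THRESHOLD = 2.0  # illustrative; tune per service

    def handle_billing_request(request, legacy_service, new_service):
        t0 = time.monotonic()
        legacy_result = legacy_service.process(request)   # acted upon
        legacy_ms = (time.monotonic() - t0) * 1000

        t1 = time.monotonic()
        shadow_result = new_service.process(request)      # recorded only
        shadow_ms = (time.monotonic() - t1) * 1000

        # Category 1: numeric divergence in calculated amounts.
        if shadow_result.amount != legacy_result.amount:
            log.warning("amount mismatch: legacy=%s shadow=%s",
                        legacy_result.amount, shadow_result.amount)
        # Category 2: divergent state transitions (e.g. paid vs. pending).
        if shadow_result.invoice_state != legacy_result.invoice_state:
            log.warning("state mismatch: legacy=%s shadow=%s",
                        legacy_result.invoice_state, shadow_result.invoice_state)
        # Category 3: latency outliers suggesting mishandled edge cases.
        if shadow_ms > legacy_ms * LATENCY_RATIO_THRESHOLD:
            log.warning("latency outlier: legacy=%.0fms shadow=%.0fms",
                        legacy_ms, shadow_ms)

        return legacy_result  # the legacy path stays authoritative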

Using AI Refactoring Correctly

The teams that get the most from AI-assisted refactoring share one consistent practice: they treat AI output as a high-quality first draft, not a finished commit. That framing is not a constraint — it is what unlocks the productivity gains.

AI tools excel at execution within clear boundaries. They surface candidates, generate drafts, create tests, and document logic faster than any human engineer could. Where they need support is with ambiguous scope, undocumented business intent, and decisions where 'correct' depends on context that is not in the code. Pair those strengths with an engineer who supplies that context, and the combination is genuinely powerful.

Getting the Maximum Value From AI Refactoring

Structure AI refactoring as a two-role workflow: the AI generates, a senior engineer validates. Neither role can substitute well for the other.

The AI cannot reliably judge whether a refactored function preserves subtle business logic accumulated over years of production use. The senior engineer cannot read 40,000 lines in an afternoon to find and prioritize refactoring candidates. With each doing what it does best, the combination unlocks delivery velocity that neither achieves alone.

Practical approach: define bounded, specific refactoring tasks rather than broad goals. 'Eliminate the N+1 query pattern in this function and improve readability' produces high-quality AI output. 'Improve the billing module' does not. The narrower the scope, the more reliably AI accelerates execution.

Require test coverage before and after every AI-generated change. This is not overhead; it is what makes the pace sustainable.
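One way to make that rule mechanical, assuming tests run under coverage.py and its JSON report (coverage json writes a coverage.json file with a totals.percent_covered field), is a small gate script that fails the build whenever coverage drops after an AI-generated change:

    import json
    import sys

    def percent_covered(report_path: str) -> float:
        with open(report_path) as f:
            return json.load(f)["totals"]["percent_covered"]

    # Reports saved from two coverage runs: before and after the change.
    before = percent_covered("coverage-before.json")
    after = percent_covered("coverage-after.json")

    if after < before:
        print(f"FAIL: coverage dropped from {before:.1f}% to {after:.1f}%")
        sys.exit(1)
    print(f"OK: coverage {before:.1f}% -> {after:.1f}%")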

There is also a quality trajectory to manage over time. Teams that use AI tools without a deliberate improvement discipline can find themselves accumulating code faster than they are improving it. The structured program described in this post, with explicit sprint capacity for refactoring and a phase-by-phase framework, is what keeps the quality trajectory moving in the right direction.

A Practical Checklist Before You Start

If you are scoping an AI-accelerated refactoring program, these five questions determine whether it will succeed:

1. Observability first
Can you measure current system behavior? Without a baseline (performance metrics, error rates, functional tests), you cannot validate that refactoring is safe. Set up monitoring before any AI tool touches production code.
2. Scope clarity
Have you defined a clear boundary for the first module? AI tools work best with specific, bounded scope. 'Eliminate the N+1 query pattern in this function' produces useful output. 'Improve the billing module' does not.
3. Domain knowledge owner
Who understands the business logic in the system? AI can read the code but cannot infer the intent behind a ten-year-old conditional added for a specific client requirement. That knowledge must be in a human who reviews AI output.
4. Review capacity
Do you have senior engineering capacity to review AI output? The productivity gains assume a skilled developer is validating the results. If review is the bottleneck, more AI generation speed without more review capacity creates a backlog, not an acceleration.
5. Tool selection
Have you evaluated AI tools for your specific stack? Tool performance varies significantly by language. For older Java, COBOL, or VB.NET codebases, verify context window handling and codebase indexing capability before committing.

What to Watch Out For: The Real Risks of AI-Assisted Refactoring

AI tooling accelerates the execution of a refactoring program. It does not replace the judgment required to run one safely. Engineering teams that have worked through this process consistently identify the same set of issues worth being clear-eyed about before starting.

Hallucinated logic preservation. AI tools can produce refactored code that looks correct and passes surface-level review, but silently changes behavior in edge cases, particularly around conditional branches that activate only under specific data conditions. This is not a reason to avoid AI assistance; it is a reason to treat test coverage as non-negotiable before any refactored code merges.

Context window limitations on large modules. For very large legacy modules, functions spanning thousands of lines, or files with extensive inline dependencies, AI tools may not have full context when generating refactoring drafts. Outputs produced without complete context are more likely to miss cross-function dependencies. Chunking large modules into bounded segments before submitting them to AI tools significantly reduces this risk.
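For Python codebases, that chunking can be as simple as splitting at top-level definition boundaries so the tool always sees complete units. A sketch using the standard library's ast module:

    import ast

    def chunk_by_top_level_defs(source: str) -> list[str]:
        """Split a Python module at top-level function/class boundaries.

        Each chunk is a complete definition, so an AI tool sees whole
        units instead of arbitrary line windows that cut logic in half.
        """
        tree = ast.parse(source)
        lines = source.splitlines()
        chunks = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # end_lineno is populated by ast.parse on Python 3.8+.
                chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
        return chunks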

Drift toward superficial changes. AI tools, when given a refactoring task without tight constraints, tend toward low-risk changes (renaming variables, reformatting conditionals, splitting long functions) rather than structural improvements that require a deeper understanding of the system. These changes improve readability but do not reduce coupling, eliminate duplication, or improve testability at the architectural level. Engineers reviewing AI output should evaluate it against those structural goals, not just code quality metrics.
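The distinction is easy to see in miniature. Both functions below are 'refactored' versions of the same hypothetical logic, but only the second changes the structure:

    # Superficial: better names and formatting, but the calculation is
    # still welded to a concrete database client and untestable without it.
    def monthly_total(db_client, customer_id):
        rows = db_client.query("SELECT amount FROM invoices WHERE cust = ?", customer_id)
        return sum(row.amount for row in rows)

    # Structural: the data access sits behind a seam, so coupling is
    # explicit and the calculation is testable in isolation.
    def monthly_total_structured(fetch_amounts, customer_id):
        """fetch_amounts: callable returning invoice amounts for a customer."""
        return sum(fetch_amounts(customer_id))

    # In tests: monthly_total_structured(lambda _: [10.0, 20.0], "c42") == 30.0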

Compounding debt if the review is skipped. The productivity gains from AI-assisted refactoring depend on a skilled engineer validating the output. If review becomes a bottleneck and teams start merging AI-generated changes without rigorous validation because the generation speed creates pressure to keep pace, quality can degrade faster than it improves. More AI generation without proportional review capacity creates debt, not acceleration.

Over-reliance on AI for business logic decisions. AI tools can document what code does. They cannot reliably determine whether what it does is correct. In systems with years of accumulated business logic, the distinction matters enormously. Every refactoring decision that touches core business rules should have a human sign-off, regardless of how confident the AI output appears.

Where Nalashaa Fits

The teams that get the most out of AI-accelerated refactoring typically combine their own domain knowledge with an external engineering partner who brings the structured methodology and tooling experience. Internal teams know the system; external partners have seen this problem across enough codebases to know what goes wrong and when.

Nalashaa's product engineering solutions work with organizations at exactly this intersection. We run structured architecture assessments that map your current codebase against the risk and priority framework described in this post, identifying which modules are safe to touch, which need careful preparation, and which are best left alone until surrounding systems are stable. The refactoring program that follows is phased, measurable, and designed to deliver improved delivery speed within the first quarter.

Carrying a codebase that's slowing your team down? Nalashaa's product engineering team runs a structured 5-day Architecture Assessment that maps your technical debt, identifies AI-refactoring candidates, and delivers a phased modernization roadmap your team can execute.

The Bottom Line

The AI tools available today are genuinely capable of compressing the most labor-intensive parts of legacy modernization (codebase analysis, documentation, initial refactoring drafts, and test generation) in ways that were simply not possible two years ago. The McKinsey evidence is clear: refactoring tasks are completed in roughly two-thirds the time, and documentation in half. That is not an incremental improvement; it is a meaningful change to what a team of a given size can actually deliver.

What makes this work is pairing those tools with a disciplined program: clear scope boundaries, human review at every decision point, structured sprint capacity reserved for improvement. AI accelerates the execution; the program provides the direction.

Done that way, the refactoring program that has been sitting in the backlog for two years, deprioritized by every sprint because there is always something more urgent, becomes deliverable. Not in one heroic rewrite. In measured, validated, compounding increments that free up your engineering team to build what comes next.

Frequently Asked Questions

What is AI-accelerated refactoring?

AI-accelerated refactoring is the use of generative AI tools to assist engineers in restructuring existing code, improving its internal design without changing its external behavior, at a pace that manual effort alone could not achieve. AI tools handle the high-volume, repetitive parts: codebase analysis, documentation, candidate identification, initial refactoring drafts, and test generation. Human engineers provide the judgment, validation, and domain knowledge that AI tools lack.

Does AI actually make refactoring faster?

Yes, for the task types it is designed to help with. For the most common, well-scoped refactoring work (restructuring functions, improving readability, eliminating duplication, and generating tests), AI delivers consistent, measurable time savings.

What is the Strangler Fig pattern and why does it matter?

The Strangler Fig pattern is a migration strategy where new functionality is routed through new, modernized modules while the legacy system continues to handle existing behavior. Over time, as functionality is migrated, the legacy system is gradually retired without any single high-risk cutover. It is the recommended approach for large legacy systems where a full rewrite would be too risky or disruptive. Combined with AI-assisted refactoring for the individual module improvements, it provides a structured path from legacy to modern that maintains system stability throughout.

How should we prioritize which legacy modules to refactor first?

Prioritize based on four factors: business impact (modules that directly affect delivery velocity or customer-facing performance), change frequency (modules modified most often carry the highest cost-per-change in a poor-quality state), dependency complexity (modules with many downstream dependencies need more preparation), and test coverage (modules with existing coverage can be refactored with lower risk). Start with high-impact, frequently changed, lower-complexity modules that have some existing test coverage; these deliver the best risk-adjusted return earliest.