HeroDevs Blog | 68% of Codebases Contain License Conflicts and AI-Generated Code Is Making It Worse

A number that should be on your legal team's radar

The Black Duck 2026 Open Source Security and Risk Analysis (OSSRA) report documents license conflicts in 68% of audited codebases, up from 56% the prior year. That 12-point jump is the largest single-year increase in the 11-edition history of the OSSRA dataset.

The OSSRA report does not present this as a novel finding. License conflicts in open source have been a documented risk since organizations began incorporating open source at scale. What makes the 2026 data notable is the rate of increase and the identified driver: AI coding assistants are contributing to codebase-level license conflicts in ways that existing IP review processes are not catching.

Only 54% of organizations in the OSSRA survey report evaluating AI-generated code for IP risk before it enters their codebase. The remaining 46% do not, and that gap is where a significant portion of the new license conflicts is accumulating.

How AI-generated code creates license conflicts

The mechanism is structural, not intentional. AI coding assistants — GitHub Copilot, Cursor, Amazon Q Developer, and similar tools — are trained on large corpora of public code repositories. That training data includes code licensed under GPL, AGPL, LGPL, EUPL, and other copyleft licenses that impose conditions on how derivative works may be distributed.

When a developer prompts an AI assistant for a data structure implementation, a utility function, or an algorithm, the model generates code based on patterns in its training data. That generated code may incorporate material from copyleft-licensed sources in the training data, whether as logic patterns, structural patterns, or in some cases near-verbatim segments. The model does not surface the provenance: the license terms of the training sources do not travel with the generated output. The developer sees a useful code completion and accepts it.

This is what the OSSRA report calls "license laundering" — not deliberate, but systematic. Code with restrictive license provenance enters the codebase through AI generation without any of the review checkpoints that would normally catch it. It does not appear in a dependency manifest. It does not trigger standard SCA tooling. It accumulates silently until an audit reaches the code level.

What copyleft conflicts actually mean for a product

License conflicts are not all equivalent. The severity depends on the licenses involved, the nature of the incorporation, and the distribution model of the product. But the worst-case scenarios for the most common copyleft conflicts are significant enough to warrant serious attention.

GPL (GNU General Public License) requires that any work that incorporates GPL-licensed code and is distributed to others must also be distributed under GPL terms — including making the source code available. For a commercial product that incorporates GPL code, this can mean an obligation to open-source the entire product.

AGPL (GNU Affero General Public License) extends this requirement to network use. A SaaS product that incorporates AGPL-licensed code may be required to provide source code to users who access the software over a network, which for most web and API products means anyone who uses it. AGPL compliance in a SaaS context is frequently incompatible with commercial software models.

LGPL (GNU Lesser General Public License) is less restrictive: it generally allows incorporation in proprietary software provided the LGPL component can be replaced. But LGPL conflicts can still create legal complexity in products where the boundary between the LGPL component and the proprietary code is unclear.
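The way these three license families interact with a product's distribution model can be sketched as a small lookup. This is a deliberately simplified illustration of the descriptions above, not legal guidance: the `Distribution` categories, the `TRIGGERS` table, and the function names are assumptions made for this example, and real obligations depend on how the code is incorporated.

```python
from enum import Enum
from typing import Optional


class Distribution(Enum):
    """Hypothetical distribution models, for illustration only."""
    INTERNAL_ONLY = "internal"    # never leaves the organization
    SHIPPED = "shipped"           # binaries or source delivered to third parties
    NETWORK_SERVICE = "network"   # users only interact over a network (SaaS)


# Simplified assumption: which distribution models put each license family
# on the review list. LGPL is listed under SHIPPED because the boundary
# between the component and proprietary code still needs checking.
TRIGGERS = {
    "GPL": {Distribution.SHIPPED},
    "AGPL": {Distribution.SHIPPED, Distribution.NETWORK_SERVICE},
    "LGPL": {Distribution.SHIPPED},
}


def license_family(spdx_id: str) -> Optional[str]:
    """Map an SPDX identifier such as 'AGPL-3.0-only' to a copyleft family."""
    upper = spdx_id.upper()
    # Check AGPL and LGPL before GPL so the prefix match is unambiguous.
    for family in ("AGPL", "LGPL", "GPL"):
        if upper.startswith(family):
            return family
    return None


def needs_review(spdx_id: str, distribution: Distribution) -> bool:
    """True when the license family is flagged for this distribution model."""
    family = license_family(spdx_id)
    return family is not None and distribution in TRIGGERS[family]


if __name__ == "__main__":
    print(needs_review("GPL-3.0-only", Distribution.NETWORK_SERVICE))
    print(needs_review("AGPL-3.0-only", Distribution.NETWORK_SERVICE))
```

Under these assumptions, the same SaaS product is flagged for an AGPL component but not for a GPL one, which mirrors the network-use distinction described above.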

For organizations in M&A situations, the consequences are direct. An acquirer conducting software due diligence that discovers GPL or AGPL conflicts in a target product has found a material issue. Depending on the distribution model and the nature of the incorporation, the conflict may create obligations that affect the product's commercial viability or require significant rework before the acquisition can proceed.

The standard SCA tooling gap

Standard Software Composition Analysis tools operate on package manifests: the requirements.txt, package.json, pom.xml, and similar files that declare the open source packages a project explicitly depends on. These tools are effective at identifying license conflicts in declared dependencies.

They are not effective at identifying license conflicts in AI-generated code, copy-pasted code snippets, vendored code incorporated directly rather than through a package manager, or any other code not declared in a manifest.

The 2026 OSSRA report makes this gap concrete: 17% of open source components enter codebases outside of standard package managers — through copy-pasted snippets, direct vendor inclusions, and AI generation — making them invisible to manifest-based scanning tools. That 17% is where the undeclared license risk accumulates, and it is growing as AI coding tool adoption scales.

This is not a new problem — the SCA tooling gap around non-declared code has been documented for years. What is new is the scale at which AI-generated code is filling that gap with potentially problematic material. Every developer using an AI coding assistant is producing code that is invisible to their organization's standard license review process.
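To make the difference between manifest-based and code-level review concrete, here is a minimal sketch of a source-tree scan that looks for license markers inside files rather than in a manifest. The function names, the phrase list, and the file suffixes are assumptions chosen for illustration. Note the limitation: a scan like this only catches vendored or copied code that kept its license header. AI-generated code that reproduces copyleft material without a header would pass through, which is why snippet-level analysis needs more than string matching.

```python
import re
from pathlib import Path
from typing import Dict, Set

# SPDX short-form tags, e.g. "SPDX-License-Identifier: GPL-2.0-only".
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")

# Phrases that commonly appear in copyleft license headers (illustrative list).
COPYLEFT_PHRASES = (
    "GNU General Public License",
    "GNU Affero General Public License",
    "GNU Lesser General Public License",
)


def find_license_markers(text: str) -> Set[str]:
    """Return SPDX identifiers and copyleft phrases found in a file's text."""
    markers = set(SPDX_RE.findall(text))
    for phrase in COPYLEFT_PHRASES:
        if phrase in text:
            markers.add(phrase)
    return markers


def scan_tree(root: str, suffixes=(".py", ".js", ".c", ".java")) -> Dict[str, Set[str]]:
    """Walk a source tree and report files that carry license markers."""
    findings: Dict[str, Set[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            markers = find_license_markers(path.read_text(errors="ignore"))
            if markers:
                findings[str(path)] = markers
    return findings


if __name__ == "__main__":
    for filename, markers in scan_tree(".").items():
        print(filename, sorted(markers))
```

Any file reported here that does not correspond to a declared dependency is a candidate for the undeclared material a manifest scan would not see.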

What M&A teams need to think about differently in 2026

Software M&A due diligence has historically relied heavily on SBOM analysis and SCA tool output to assess license risk. The 2026 data suggests that reliance needs to be revisited.

An SBOM-based assessment will identify declared dependencies with copyleft licenses. It will not identify the GPL pattern embedded in an AI-generated data structure, the AGPL utility function copied from a Stack Overflow answer, or the LGPL component vendored directly into the codebase three years ago without being added to the manifest.

Acquirers conducting software due diligence in 2026 should build in code-level review for high-value targets, particularly in areas of the codebase where AI-generated code is likely to be present. The indicators are not difficult to identify: recent changes to the codebase, high commit velocity, and developer tooling that includes AI coding assistants.

They should also request AI code governance documentation from targets. Does the organization have a policy requiring AI-generated code to go through license review before being committed? If not, the SBOM-based assessment is incomplete by design.

What organizations should do about this now

The policy change that has the largest impact is also the simplest: require AI-generated code to pass through a license review checkpoint before it is committed to the codebase. This does not mean prohibiting AI coding tools — it means adding one review step that catches the license provenance issue before it accumulates into a codebase-level problem.
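One possible shape for such a checkpoint is a commit-message gate. The sketch below assumes a team convention in which every commit declares whether AI assistance was used and, if so, who performed the license review. The trailer names `AI-Assisted` and `License-Reviewed-by` are hypothetical, invented for this example, and are not a standard or a HeroDevs or OSSRA recommendation.

```python
import re
from typing import Dict, List

# Matches "Key: value" lines such as "AI-Assisted: yes".
TRAILER_RE = re.compile(r"^([A-Za-z-]+):\s*(.+)$", re.MULTILINE)


def parse_trailers(message: str) -> Dict[str, str]:
    """Read Key: value trailers from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    return {key.lower(): value.strip()
            for key, value in TRAILER_RE.findall(last_paragraph)}


def checkpoint_errors(message: str) -> List[str]:
    """Return policy violations; an empty list means the commit may proceed."""
    trailers = parse_trailers(message)
    errors = []
    if "ai-assisted" not in trailers:
        errors.append("missing AI-Assisted trailer (yes/no)")
    elif (trailers["ai-assisted"].lower() == "yes"
          and "license-reviewed-by" not in trailers):
        errors.append("AI-assisted commit needs a License-Reviewed-by trailer")
    return errors


if __name__ == "__main__":
    example = "Add LRU cache\n\nAI-Assisted: yes\n"
    for problem in checkpoint_errors(example):
        print("rejected:", problem)
```

A check like this could be called from a Git `commit-msg` hook, which receives the path to the commit message file as its argument. It records that a review happened; it does not perform the review, so it complements rather than replaces a code-level audit.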

For organizations that have been using AI coding tools without this checkpoint, a code-level audit of AI-intensive areas of the codebase is warranted — particularly if the codebase has regulatory, M&A, or government contract exposure. The OSSRA finding that 68% of codebases have conflicts suggests the baseline assumption should be that conflicts exist, not that the absence of a previous finding means the codebase is clean.

HeroDevs EOLDS identifies EOL dependencies across the full dependency graph, including components outside the declared manifest. For organizations building out their compliance posture or conducting M&A due diligence, EOLDS provides the starting point for the visibility layer that standard SCA tooling does not provide.

FAQ

What percentage of codebases contain open source license conflicts according to OSSRA 2026?

The Black Duck 2026 OSSRA report found license conflicts in 68% of audited codebases, up from 56% the prior year. This 12-point jump is the largest single-year increase in the 11-edition history of the dataset.

How does AI-generated code create open source license conflicts?

AI coding assistants generate code based on training data that includes open source repositories with restrictive licenses like GPL and AGPL. The generated code may incorporate patterns or logic from copyleft-licensed sources without surfacing that provenance to the developer. The code enters the codebase without license review, accumulating conflicts that standard SCA tools cannot detect because they scan package manifests, not source code patterns. The OSSRA report calls this "license laundering."

What is the difference between GPL, AGPL, and LGPL in terms of compliance risk?

GPL requires that any distributed work incorporating GPL-licensed code also be distributed under GPL terms, potentially requiring commercial products to be open-sourced. AGPL extends this to network use: SaaS products incorporating AGPL code may need to provide source to users accessing the software over a network. LGPL is less restrictive but creates compliance complexity when the boundary between the LGPL component and proprietary code is unclear.

Why can't standard SCA tools detect AI-generated code license conflicts?

Standard SCA tools scan package manifests to identify declared dependencies and their licenses. AI-generated code, copy-pasted code, and vendored code that is not declared in a manifest are invisible to manifest-based tooling. According to OSSRA 2026, 17% of open source components enter codebases outside standard package managers, through copy-pasted snippets, direct vendor inclusions, and AI generation, which makes them invisible to traditional scanning tools.

What should M&A due diligence include that it has not historically included?

M&A due diligence for software-intensive targets should now include code-level review of AI-intensive areas of the codebase, a review of the target's AI code governance policies, and an assumption that the SBOM-based assessment is incomplete if the target uses AI coding assistants without a license review checkpoint.

What is the practical fix for organizations that have been using AI coding tools without license review?

The most impactful change is implementing a license review checkpoint for AI-generated code before it is committed to the codebase. For organizations already using AI tools without this checkpoint, a code-level audit of high-risk areas is warranted. HeroDevs EOLDS provides dependency visibility across the full graph, including components outside the declared manifest, as a starting point for organizations assessing their exposure.
