PatentWorld
Chapter 05

The Language of Innovation

Semantic analysis of 8.45 million patent abstracts

The 8.45 million patent abstracts in this corpus provide a rich basis for examining the thematic structure of US patenting activity. This analysis applies topic modeling to every utility patent abstract filed with the USPTO since 1976 in order to uncover the latent themes of US patenting. Raw text is first converted into numerical features using Term Frequency–Inverse Document Frequency (TF-IDF), a numerical statistic that reflects how important a word is to a document relative to a larger collection, with common words down-weighted. Non-negative Matrix Factorization (NMF), a matrix decomposition method used for topic modeling that produces interpretable, additive topic representations, is then applied to discover 25 latent topics. The results indicate which themes are rising, which are declining, and how they correspond to the formal technology classification system.
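
The two-step pipeline described above (TF-IDF weighting followed by NMF) can be sketched on a toy corpus. This is a minimal, standard-library illustration and not the chapter's actual implementation: the four abstracts are invented, the whitespace tokenizer and the smoothed IDF formula are assumptions, and the multiplicative-update solver is one common way to fit NMF rather than the solver necessarily used here. At the scale of millions of abstracts, a sparse-matrix library would be the practical choice.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight of term j in doc i."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    # Assumed smoothed IDF: terms found in many documents receive a lower weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x k) and H (k x terms)
    using multiplicative updates for the squared (Frobenius) error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Squared reconstruction error between V and the product W x H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


# Hypothetical toy abstracts: two computing-like and two chemistry-like.
abstracts = [
    "processor memory cache instruction",
    "memory cache processor bus",
    "polymer resin catalyst compound",
    "catalyst polymer compound solvent",
]
vocab, v = tfidf(abstracts)
w, h = nmf(v, k=2)

# Dominant topic per abstract, and the three highest-weighted terms per topic.
dominant = [max(range(2), key=lambda t: row[t]) for row in w]
top_terms = [[term for _, term in sorted(zip(row, vocab), reverse=True)[:3]] for row in h]
```

Each row of `w` gives one abstract's non-negative loading on every topic, and each row of `h` gives one topic's weight on every term. Because both factors are non-negative, a document is reconstructed by adding topics together, which is what makes the topics readable as themes.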

The stacked area chart below illustrates how the share of each topic has evolved over time. Computing, semiconductor, and communications topics have expanded substantially, while traditional mechanical and chemical engineering topics have experienced a decline in relative share, though not necessarily in absolute volume.
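
The distinction between relative share and absolute volume can be made concrete with a small sketch. It assumes each patent has already been assigned a single dominant topic; the function name, topic labels, and counts are hypothetical and chosen only to illustrate the arithmetic behind a stacked percentage chart.

```python
from collections import Counter, defaultdict


def topic_shares_by_year(records):
    """records: iterable of (year, dominant_topic) pairs.
    Returns {year: {topic: share}}, where shares sum to 1 within each year."""
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    shares = {}
    for year, topic_counts in sorted(counts.items()):
        total = sum(topic_counts.values())
        shares[year] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Hypothetical counts: mechanical patents rise from 2 to 3 in absolute terms,
# yet their share falls because the yearly total grows faster.
records = (
    [(1976, "mechanical")] * 2 + [(1976, "computing")] * 1 + [(1976, "chemistry")] * 1
    + [(2020, "mechanical")] * 3 + [(2020, "computing")] * 6 + [(2020, "chemistry")] * 1
)
shares = topic_shares_by_year(records)
```

In this invented example the mechanical topic drops from half of all patents to under a third even though more mechanical patents are counted in the later year, which is the pattern the text describes for traditional engineering topics.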

Figure 1

Computing and Semiconductor Topics Grew From 12% to 33% of All Patents Since 1976

Share of patents belonging to each of 25 NMF-derived topics by year, revealing the shift toward computing and digital technology themes.

Share of patents belonging to each of 25 NMF-derived topics, 1976–2025, sorted by total patent count. The most prominent trend is the expansion of computing, semiconductor, and communications topics, which grew from 12% to 33% of the total.
The language of innovation has shifted decisively toward computing and digital technology over 50 years. Topics related to software, semiconductors, and wireless communications now dominate patent abstracts.

Topic Map

To visualize the full semantic landscape of patents, a stratified sample of 5,000 patents is projected from high-dimensional TF-IDF space into two dimensions using UMAP. Each point represents a patent, colored by its dominant topic. Clusters indicate families of related inventions, and overlapping regions suggest technology convergence. Note that the axes of a UMAP projection are unitless — only the relative distances and clustering patterns are interpretable, not the absolute positions.
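
The stratified sampling step can be sketched as follows, assuming an equal draw of 200 patents from each of the 25 topics (which yields the 5,000 points). The function name, the patent identifiers, and the corpus size per topic are hypothetical. The UMAP projection itself is not shown; it would be applied to the TF-IDF vectors of the sampled patents using a dedicated library.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(patent_topics, per_topic, seed=0):
    """patent_topics: dict mapping patent id -> dominant topic.
    Draws up to per_topic ids, without replacement, from every topic."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_topic = defaultdict(list)
    for patent_id, topic in patent_topics.items():
        by_topic[topic].append(patent_id)
    sample = []
    for topic in sorted(by_topic):
        ids = by_topic[topic]
        sample.extend(rng.sample(ids, min(per_topic, len(ids))))
    return sample


# Hypothetical corpus: 25 topics with 1,000 patents each.
patent_topics = {f"P{t:02d}-{i:04d}": t for t in range(25) for i in range(1000)}
sample = stratified_sample(patent_topics, per_topic=200)
per_topic_counts = Counter(patent_topics[pid] for pid in sample)
```

Sampling the same number of patents per topic keeps small topics visible in the map. The trade-off is that cluster sizes in the projection no longer reflect how common each topic is in the full corpus.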

Figure 2

UMAP Projection of 5,000 Patents Reveals 25 Distinct Technology Clusters With Meaningful Spatial Relationships

UMAP projection of 5,000 patent abstracts from TF-IDF space into 2D, colored by dominant topic, revealing semantic clustering and overlap.

5,000 patents projected into 2D via UMAP on TF-IDF vectors (200 per topic, stratified). Each point represents one patent, colored by dominant topic. UMAP axes are unitless projections — only the relative distances between points are meaningful, not the absolute positions. Source: PatentsView / USPTO.
The UMAP projection reveals clear topic clusters with meaningful spatial relationships: computing and electronics topics cluster together, while chemistry and biotechnology form a distinct neighborhood. Patents bridging clusters often represent the most novel cross-domain inventions.

Topics Across Technology Sections

The relationship between discovered topics and the formal CPC classification system warrants examination. The chart below cross-tabulates the eight most prevalent topics against CPC sections. Some topics align closely with a single CPC section (for example, chemistry-related topics correspond to Section C), while others, particularly computing, span multiple sections.
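
A cross-tabulation of this kind can be sketched as the percentage of each CPC section's patents that fall into each topic. The section letters, topic labels, counts, and the 10% threshold below are all hypothetical, and the helper names are illustrative rather than taken from the analysis.

```python
from collections import Counter, defaultdict


def topic_share_within_section(records):
    """records: iterable of (cpc_section, topic) pairs.
    Returns {section: {topic: percent of that section's patents}}."""
    counts = defaultdict(Counter)
    for section, topic in records:
        counts[section][topic] += 1
    table = {}
    for section, topic_counts in sorted(counts.items()):
        total = sum(topic_counts.values())
        table[section] = {t: 100.0 * n / total for t, n in topic_counts.items()}
    return table


def sections_with_topic(table, topic, min_percent):
    """Sections in which the topic accounts for at least min_percent of patents."""
    return [s for s, row in table.items() if row.get(topic, 0.0) >= min_percent]


# Hypothetical counts for three sections and two topics.
records = (
    [("A", "chemistry")] * 7 + [("A", "computing")] * 3
    + [("C", "chemistry")] * 8 + [("C", "computing")] * 2
    + [("G", "chemistry")] * 1 + [("G", "computing")] * 9
)
table = topic_share_within_section(records)
computing_sections = sections_with_topic(table, "computing", min_percent=10.0)
chemistry_sections = sections_with_topic(table, "chemistry", min_percent=50.0)
```

A topic that clears the threshold in many sections is spread across the classification system, while a topic that clears it in only one or two sections is closely aligned with those sections.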

Figure 3

Computing-Related Topics Appear Across All 8 CPC Sections, Confirming Their General-Purpose Nature

Share of patents in each CPC section belonging to the top 8 NMF topics, cross-tabulating text-derived themes with formal classification.

Share (%) of patents in each CPC section belonging to each of the top 8 topics, ordered A through H. The most notable pattern is that computing and data-processing topics appear across nearly all sections, suggesting that digital technology has become a general-purpose innovation platform.
Topics related to computing and data processing appear across nearly all CPC sections, consistent with the characterization of digital technology as a general-purpose innovation platform that pervades virtually every industry.

Novelty

How much contemporary patents differ from those of earlier decades is an important empirical question. This analysis measures novelty as the Shannon entropy of each patent's topic distribution: patents that draw approximately equally from many topics (high entropy) are more thematically diverse, and may be considered more novel, than patents concentrated in a single topic (low entropy).
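
For a patent whose normalized topic weights are p_1, ..., p_25, the Shannon entropy is the sum of p_i * log(1 / p_i) over all topics with non-zero weight. The sketch below computes that quantity and a yearly median. The logarithm base is not stated in the text, so the natural logarithm is an assumption here (with 25 topics the maximum would then be ln 25, roughly 3.22); the function names and the sample weights are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def shannon_entropy(weights):
    """Entropy, in nats, of topic weights after normalizing them to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    # Topics with zero weight contribute nothing, so they are skipped.
    return sum(p * math.log(1.0 / p) for p in probs)


def median_entropy_by_year(records):
    """records: iterable of (year, topic_weights). Returns {year: median entropy}."""
    by_year = defaultdict(list)
    for year, weights in records:
        by_year[year].append(shannon_entropy(weights))
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical topic distributions over 25 topics.
focused = [1.0] + [0.0] * 24        # all weight on one topic: minimum entropy
two_topic = [0.5, 0.5] + [0.0] * 23  # evenly split across two topics
uniform = [1.0] * 25                 # equal weight on every topic: maximum entropy

records = [
    (1990, focused), (1990, two_topic), (1990, focused),
    (2020, uniform), (2020, two_topic), (2020, uniform),
]
yearly = median_entropy_by_year(records)
```

A patent concentrated in one topic scores zero, an even split across two topics scores ln 2, and an even split across all 25 topics scores ln 25. The yearly median summarizes where a typical patent sits between those extremes.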

Figure 4

Patent Novelty Rose 6.4% (Median Entropy 1.97 to 2.10), With an Upward Trend Despite a 2004–2014 Dip

Median and average Shannon entropy of patent topic distributions by year, measuring thematic diversity as a proxy for novelty.

Median and average Shannon entropy of patent topic distributions by year; higher entropy indicates more thematically diverse patents. The upward trend since the late 1980s suggests that modern inventions increasingly combine ideas from multiple technology domains, though a dip between 2004 and 2014 preceded acceleration in the late 2010s.
Patent novelty has trended upward since the late 1980s, suggesting that modern inventions increasingly combine ideas from multiple technology domains, although the trend was interrupted by a notable dip between 2004 and 2014. The increase accelerated in the late 2010s, coinciding with the rise of AI and other general-purpose technologies.

Having uncovered the latent thematic structure of patent language, the analysis turns next to the legal and policy framework governing the patent system. The topics and trends identified in this chapter provide essential context for understanding how legislative and judicial decisions have shaped the direction and character of US patenting activity.

Data coverage: January 1976 through September 2025. All 2025 figures reflect partial-year data.