Quadric rides the shift from cloud AI to on-device inference — and it's paying off
The artificial intelligence industry is experiencing a fundamental realignment. After years of heavy bets on centralized cloud infrastructure and massive data centers, companies and governments are increasingly recognizing the limitations of that approach. High latency, bandwidth constraints, privacy concerns, and mounting infrastructure costs are pushing the industry toward a critical inflection point: moving AI inference off the cloud and onto devices themselves.

Quadric, a San Francisco-based chip-IP startup founded by veterans of the early bitcoin mining firm 21E6, is positioned squarely at the center of this shift. The company does not manufacture chips itself. Instead, it licenses programmable AI processor intellectual property, essentially blueprints that customers embed into their own silicon, along with a complete software stack and toolchain to run AI models locally on devices ranging from automotive systems to laptops to industrial equipment.

The market's response has been striking. Quadric posted between $15 million and $20 million in licensing revenue in 2025, more than tripling from approximately $4 million in 2024. The company is now targeting up to $35 million in revenue this year, with CEO Veerbhan Kheterpal attributing the expansion to a "sharp business inflection" over the past eighteen months as enterprises increasingly seek to run AI locally rather than depend on cloud services.

The timing matters. The proliferation of transformer-based models in 2023 fundamentally changed the economics of AI deployment. Unlike earlier vision-focused models, which worked reasonably well in cloud environments, today's large language models and advanced vision systems create new imperatives: latency-sensitive applications, privacy-conscious organizations, and countries seeking to build sovereign AI capabilities independent of U.S.-based infrastructure.

Quadric's core innovation addresses a persistent challenge in semiconductor design: the speed gap between hardware and software. AI models evolve rapidly, with architectures shifting from month to month, while chip design cycles stretch across years. This mismatch creates a fundamental problem for traditional chip vendors. Qualcomm, for instance, typically integrates AI technology inside its own processors, which can lock customers into proprietary silicon. Other IP suppliers, such as Synopsys and Cadence, sell neural processing engine blocks that many customers find difficult to program and update.

Quadric's answer centers on programmability. The company's Chimera general-purpose NPU architecture provides what it describes as "fully programmable" on-device AI processing. Rather than relying on fixed-function hardware that becomes obsolete when model architectures shift, customers can update their AI capabilities through software changes alone. New models can be supported without redesigning hardware, a critical advantage in an industry where model architectures change faster than silicon can be manufactured.

The technical specifications underscore the company's performance ambitions. The Chimera processor delivers up to 840 TOPS (tera-operations per second) in automotive-grade, safety-enhanced versions, optimized for running both vision and language model inference locally.
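For a sense of scale, a back-of-envelope calculation suggests why a part in this class, paired with aggressive quantization, can plausibly host models of the 30-billion-parameter size Quadric cites below. This is illustrative arithmetic only: the 840 TOPS figure comes from the published spec, while the bytes-per-parameter counts and the roughly-two-operations-per-parameter-per-token rule are generic industry approximations, not Quadric benchmark data.

```python
# Back-of-envelope sizing for on-device LLM inference.
# Illustrative arithmetic; not Quadric benchmark data.

PARAMS = 30e9      # 30B-parameter model, the upper end Quadric cites
NPU_TOPS = 840e12  # peak ops/second of the top Chimera configuration

# Weight-storage footprint at common precisions (bytes per parameter).
for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{label:5s} weights: {gb:6.1f} GB")

# Decoder inference needs roughly 2 ops per parameter per generated token.
ops_per_token = 2 * PARAMS
ceiling = NPU_TOPS / ops_per_token
print(f"Compute-bound ceiling: ~{ceiling:,.0f} tokens/s")
# Prints roughly: FP16 60 GB, INT8 30 GB, INT4 15 GB; ceiling ~14,000 tokens/s.
# Real systems land far below that ceiling because every generated token must
# also stream the full weight set from memory, so bandwidth, not raw TOPS,
# usually governs delivered speed.
```

The gap between that theoretical ceiling and delivered tokens per second is why the edge-AI field leans so heavily on quantization and performance per watt rather than headline TOPS figures.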
The capability is not merely theoretical. Quadric demonstrated edge LLM functionality at the 2025 Embedded Vision Summit, showcasing real-time inference with minimal power consumption, a crucial metric in markets where performance per watt often determines adoption.

Perhaps more telling is what customers can achieve in practice. According to Quadric's technical specifications, chip designers can move from initial engagement to production-ready, LLM-capable silicon in under six months. The company's toolchain was built from the ground up for this purpose rather than retrofitted onto existing infrastructure, allowing customers to deploy computer vision and on-device LLM applications, including models of up to 30 billion parameters, with what the company describes as industry-leading efficiency.

The customer roster shows traction across multiple sectors. Kyocera has adopted the Chimera NPU for next-generation office automation. Denso, the major Japanese auto supplier that builds chips for Toyota vehicles, is among Quadric's automotive customers. Tier IV licensed Quadric's IP for its autonomous systems and has selected Quadric's SDK for AI-processing evaluation and optimization supporting Autoware deployment in next-generation autonomous vehicles. These are not merely licensing agreements but active design wins with tier-one suppliers.

The broader market context explains the urgency behind the shift. The World Economic Forum has highlighted the transition, noting how AI inference is moving closer to users and away from purely centralized architectures. EY reported in November that sovereign AI strategies have gained traction as policymakers and industry groups push for domestic AI capabilities spanning compute, models, and data infrastructure rather than relying entirely on foreign systems.

This geopolitical dimension opens new markets for Quadric. The company is exploring customers in India and Malaysia, with Moglix CEO Rahul Garg serving as a strategic investor helping shape its India-focused sovereign AI approach. Kheterpal explicitly acknowledged that the push toward on-device inference is driven partly by the rising costs of centralized infrastructure and the difficulty many countries face in building hyperscale data centers. Distributed AI setups, in which inference runs on laptops or small on-premise servers inside offices rather than routing every query to a cloud service, represent a more realistic path for many organizations (a pattern illustrated in the short code sketch below).

The market opportunity aligns with these trends. According to industry reports, the on-device AI market is expected to surpass $50 billion by 2030, driven by LLMs in edge servers and AIoT devices. Quadric's focus on these high-growth verticals positions it to capture significant share if its technology delivers on its promises.

The company's capital efficiency also warrants attention. Its recent $30 million Series C round, while significant, remains a fraction of what chip-manufacturing competitors spend on research and development. By licensing IP rather than manufacturing chips, Quadric avoids the capital intensity of foundry partnerships, allowing faster scaling and better margins. The company now employs nearly 70 people worldwide, including about 40 in the United States and around 10 in India.

Yet Quadric remains relatively early in its buildout. The company has signed a handful of customers so far, and much of its longer-term upside depends on converting today's licensing deals into high-volume shipments and recurring royalties.
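To ground the distributed-inference pattern described above, here is a minimal sketch of a fully local LLM query using the open-source llama-cpp-python library. It illustrates the general on-device approach, not Quadric's own SDK or hardware; the model path is a placeholder for any locally stored quantized model.

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
# Illustrates the general on-device pattern; this is NOT Quadric's toolchain.
from llama_cpp import Llama

# Load a locally stored, quantized model file (placeholder path).
llm = Llama(model_path="./models/llm-7b-q4.gguf", n_ctx=2048)

# The query is answered entirely on the machine itself; no data leaves the device.
result = llm(
    "Q: Why run AI inference on-device instead of in the cloud? A:",
    max_tokens=96,
    stop=["Q:"],
)
print(result["choices"][0]["text"].strip())
```

Swap the laptop for a small on-premise server and the same pattern scales to office-wide deployments, which is the kind of distributed setup Kheterpal describes.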
The first products based on Quadric's technology are expected to ship this year, beginning with laptops. Success will ultimately be measured not by licensed IP but by actual silicon reaching the market and proving the efficiency and performance claims in production environments.

The broader implication is clear: the era of cloud-centric AI infrastructure is giving way to a more distributed architecture. For a company like Quadric, positioned at this inflection point with the right technical approach and market timing, the opportunity is substantial. The question now is execution: whether Quadric can sustain its momentum and translate design wins into the volume shipments that define success in semiconductor markets.