Juniper Networks and Recogni Announce Venture Funding and Collaboration

Accelerating the world’s AI ambitions

Llama 3.1-405B: 320,000 tokens/sec · 2,000+ users · 1 rack

Our Mission

We are building the most efficient multimodal GenAI inference system to make GenAI economical and accurate, so that the world can use GenAI profitably and with confidence.

Future-proof GenAI Inference System

Designed to accelerate the largest generative AI models of tomorrow at lowest cost and highest speed.

Latest technology

3nm

TSMC Technology Node

Our silicon is designed and manufactured in the latest available node, guaranteeing the best possible energy efficiency and cost.


LARGEST MODELS

TP > 100

Tensor Parallelism

Parallelizing AI models across chips makes them run faster and supports larger models. Scaling to more than 100 chips over one high-bandwidth network unlocks ultra-low latency, very high per-user throughput, and extremely long contexts.
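The idea behind tensor parallelism can be sketched in a few lines. This is an illustrative NumPy toy, not Recogni's implementation: a layer's weight matrix is split column-wise across chips, each chip computes its slice of the output in parallel, and the partial outputs are concatenated.

```python
import numpy as np

def tensor_parallel_matmul(x, W, num_chips):
    """Split W column-wise across `num_chips`; each shard is computed
    on its own chip in real hardware, here sequentially for illustration."""
    shards = np.array_split(W, num_chips, axis=1)
    partials = [x @ shard for shard in shards]
    return np.concatenate(partials, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512))
W = rng.standard_normal((512, 2048))

out = tensor_parallel_matmul(x, W, num_chips=128)
assert np.allclose(out, x @ W)  # sharded result matches the single-chip matmul
```

Because each chip holds only 1/N of the weights and does 1/N of the work, both per-chip memory footprint and per-layer latency shrink as the degree of parallelism grows, which is why a high-bandwidth interconnect across 100+ chips matters.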


HIGHEST SPEED

HBM3e

Highest bandwidth memory

Generating outputs with autoregressive models such as LLMs is usually bound by memory bandwidth, not compute. Implementing the latest memory technology, with large capacity and the highest bandwidth, yields the highest output speeds when running GenAI models.
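The bandwidth bound can be seen with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not Recogni's measured figures: during autoregressive decoding, each new token must stream the model's weights from memory, so peak single-user decode speed is roughly memory bandwidth divided by bytes per token.

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, bandwidth_tb_s):
    """Upper bound on per-user decode rate for a memory-bandwidth-bound model:
    every generated token reads all weights once from memory."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Example: a 405B-parameter model in 2-byte weights on 8 TB/s of aggregate HBM.
rate = decode_tokens_per_sec(405, 2, 8)
print(f"{rate:.1f} tokens/sec per user (upper bound)")  # prints 9.9
```

This is why both higher memory bandwidth (HBM3e) and smaller weight encodings directly translate into higher output speed.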

Design and Development approach

Hardware & software
imagined together

Our performance is only possible through best practices and principles of Hardware-Software co-design.

Hardware Design

Early emulation of every aspect of our silicon design allows us to deeply and continuously optimize every trade-off: generic-to-specialized compute ratios, compute-to-memory bandwidths, and chip-to-chip communication for bottleneck-free data movement and extremely high degrees of tensor parallelism.

Software Design

We stay closely connected with our customers and partners to understand their exact needs in an inference system. From CapEx and OpEx to compute density, scalability, power efficiency, and AI graph customizability, we ensure our innovations are driven by real-world requirements. We believe we've created the most customer-aligned design in the industry.

Innovation depth

We optimize along the entire stack to achieve better results.

User-aligned products

We work closely with the market and derive all our learning from it.

Time to market

HW/SW Co-design drastically lowers time to market.

Low cost & high quality

Built for (not just adapted to) GenAI is better in every way.

Core Technology

The world’s first logarithmic math number system, built to solve the biggest challenges in GenAI: Trust, Cost, and UX.

Explore Recogni Pareto

TRUST

99.9%

Highest Accuracy

Models deployed for production inference need to produce the same quality of results as the trained baseline model, as the tolerance for compromised quality is usually low. Accuracy greater than 99.9% is maintained after quantization to our logarithmic math number system.
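To make the idea of logarithmic quantization concrete, here is a deliberately crude toy, not Recogni's Pareto format: store each weight as a sign plus a rounded base-2 exponent, which lets hardware replace multiplies with adds, and measure how much error the rounding introduces.

```python
import numpy as np

def log2_quantize(w):
    """Round each weight's magnitude to the nearest power of two,
    keeping the sign -- the simplest possible logarithmic encoding."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1e-30)))
    return sign * np.exp2(exp)

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)
wq = log2_quantize(w)

# Relative L2 error of this naive 1-exponent-per-weight scheme.
rel_err = np.linalg.norm(w - wq) / np.linalg.norm(w)
print(f"relative error: {rel_err:.3f}")
```

A real production number system uses far more exponent resolution than one bit of fraction, which is how per-model accuracy above 99.9% becomes achievable; the toy only shows the mechanism.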

COST

4x

Less power than standard math

Running a model in Recogni's Pareto FP16 math consumes as much power on a system level as running that same model in FP4 on other systems. In other words: We are delivering uncompromised AI model quality at a cost that makes applications economically feasible.


UX

< 10 min

Llama 405b compilation time

Compiling a model from PyTorch to executable files ready for deployment should not make any developer wait. We have built our Compiler from scratch, guaranteeing very low compilation times even for very large modern models.

Pareto SDK
is now available.

Developer? Join the Waitlist.
The Bigger Vision

We exist to accelerate
the world’s AI ambitions.

More About Recogni
GET STARTED

Let us show you how Recogni can accelerate your GenAI ambitions.

Get a demo of next-gen AI Inference

Schedule a demo call with our
Co-founder and VP of AI, Gilles Backhus

Book A Call

FAQs

What are the performance metrics?

We will soon share more precise figures on our systems' performance. For now, we can confidently state that it is on track to beat any other inference solution by the time it launches.

Will you offer inference as a service?

We are currently not categorically excluding any business model or way of deployment. Inference-as-a-Service offerings are gaining a lot of traction at the moment, and as such are one of the paths we are exploring; Recogni's technologies, and the datacenter products built on them, are certainly well suited to it.

When can I test the system?

We will be releasing more precise timelines around our product launch and beta phases in the coming quarters. Stay tuned! Get in touch with us if you want to make sure to be one of the first.
