Accelerating the world’s AI ambitions
Our Mission
Future-proof GenAI Inference System
Designed to accelerate the largest generative AI models of tomorrow at the lowest cost and the highest speed.
Latest technology
3nm
TSMC Technology Node
Our silicon is designed and manufactured in the latest available process node, ensuring the best possible energy efficiency and cost.
LARGEST MODELS
TP > 100
Tensor Parallelism
Parallelizing AI models across chips makes them run faster and allows for bigger models. Scaling to more than 100 chips over one high-bandwidth network unlocks ultra-low latencies, very high per-user throughput, and extremely long contexts.
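To make the mechanism concrete, here is a minimal NumPy sketch of tensor parallelism: one layer's weight matrix is split column-wise across simulated chips, each chip computes its own shard, and the results are gathered. The chip count and matrix sizes are illustrative assumptions, not Recogni specifics.

```python
import numpy as np

# Minimal simulation of tensor parallelism: one linear layer's weight
# matrix is split column-wise across N "chips"; each chip computes its
# shard independently, and the results are concatenated (an all-gather
# over the network on real hardware). Sizes are illustrative only.
N_CHIPS = 4
d_in, d_out = 512, 1024

rng = np.random.default_rng(0)
x = rng.standard_normal((1, d_in))         # one token's activations
W = rng.standard_normal((d_in, d_out))     # full weight matrix

# Shard the weights: each chip holds d_out / N_CHIPS output columns.
shards = np.split(W, N_CHIPS, axis=1)

# Each chip multiplies the same activations by its own weight shard.
partial_outputs = [x @ w_shard for w_shard in shards]

# The high-bandwidth network gathers the partial results.
y_parallel = np.concatenate(partial_outputs, axis=1)

# Same result as the unsharded computation, but each chip did only
# 1/N_CHIPS of the work and holds only 1/N_CHIPS of the weights.
assert np.allclose(y_parallel, x @ W)
```

Because each chip stores and moves only a fraction of the weights, adding chips cuts per-token latency and frees memory for longer contexts, at the price of the gather step the high-bandwidth network must absorb.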
HIGHEST SPEED
HBM3e
Highest bandwidth memory
Generating outputs with autoregressive models such as LLMs is usually bound by memory bandwidth, not compute. Implementing the latest memory technology, with large capacity and the highest bandwidth, results in the highest output speeds when running GenAI models.
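A back-of-the-envelope calculation shows why bandwidth sets the ceiling: each generated token must stream the model's weights from memory at least once. The model size and bandwidth figures below are illustrative assumptions, not Recogni system specifications.

```python
# Why autoregressive decoding is memory-bandwidth bound: every token
# requires reading all model weights from memory at least once, so
# bandwidth, not FLOPs, bounds tokens per second. Numbers below are
# illustrative assumptions, not Recogni system specifications.
params = 70e9            # hypothetical 70B-parameter model
bytes_per_param = 2      # FP16/BF16 weights
hbm_bandwidth = 5e12     # hypothetical aggregate HBM3e bandwidth, B/s

weight_bytes = params * bytes_per_param
max_tokens_per_s = hbm_bandwidth / weight_bytes

print(f"Upper bound: {max_tokens_per_s:.0f} tokens/s per user")
# Doubling bandwidth (or halving bytes per weight) doubles this bound,
# which is why memory technology directly sets output speed.
```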
Design and Development approach
Hardware & software
imagined together
Our performance is only possible through best practices and principles of Hardware-Software co-design.
Hardware Design
Early emulation of every aspect of our silicon design allows us to deeply and continuously optimize every trade-off: generic-to-specialized compute ratios, compute-to-memory bandwidths, and chip-to-chip communication for bottleneck-free data movement and extremely high degrees of tensor parallelism.
Software Design
We stay closely connected with our customers and partners to understand their exact needs in an inference system. From CapEx and OpEx to compute density, scalability, power efficiency, and AI graph customizability, we ensure our innovations are driven by real-world requirements. We believe we've created the most customer-aligned design in the industry.
Innovation depth
We optimize along the entire stack to achieve better results.
User-aligned products
We work closely with the market and derive all our learnings from it.
Time to market
HW/SW Co-design drastically lowers time to market.
Low cost & high quality
A system built for (not just adapted to) GenAI is better in every way.
Core Technology
The world’s first logarithmic math number system to solve the biggest challenges in GenAI: Trust, Cost, UX.
Explore Recogni Pareto
TRUST
99.9%
Highest Accuracy
Models deployed for production inference need to produce the same quality of results as the baseline model after training, as the tolerance for compromised quality is usually low. Greater than 99.9% accuracy is maintained after quantization to our logarithmic math number system.
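Recogni has not published the details of its Pareto format, so the sketch below only illustrates the general principle of a logarithmic number system: store each weight as a sign plus a rounded base-2 logarithm of its magnitude, then check how closely the quantized weights track the originals. The bit width and the cosine-similarity proxy are assumptions for illustration.

```python
import numpy as np

# Generic logarithmic-number-system (LNS) quantization sketch. Recogni's
# Pareto format is proprietary; this shows only the general principle:
# store sign + a rounded base-2 logarithm of each weight's magnitude,
# so multiplications become additions in the log domain.
FRAC_BITS = 4            # fractional log bits -- illustrative choice
SCALE = 2 ** FRAC_BITS

def lns_quantize(w, eps=1e-12):
    sign = np.sign(w)
    log_mag = np.log2(np.abs(w) + eps)
    return sign, np.round(log_mag * SCALE) / SCALE  # quantized log2

def lns_dequantize(sign, qlog):
    return sign * 2.0 ** qlog

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000).astype(np.float32)

w_hat = lns_dequantize(*lns_quantize(w))

# Cosine similarity between original and quantized weights, a crude
# proxy for how well model quality survives quantization.
cos = (w @ w_hat) / (np.linalg.norm(w) * np.linalg.norm(w_hat))
print(f"cosine similarity: {cos:.6f}")
```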
COST
4x
Less power than standard math
Running a model in Recogni's Pareto FP16 math consumes as much power on a system level as running that same model in FP4 on other systems. In other words: We are delivering uncompromised AI model quality at a cost that makes applications economically feasible.
UX
< 10 min
Llama 405b compilation time
Compiling a model from PyTorch to executable files ready for deployment should not make any developer wait. We have built our Compiler from scratch, guaranteeing very low compilation times even for very large modern models.
Pareto SDK
is now available.
Developer? Join the Waitlist.
We exist to accelerate
the world’s AI ambitions.
More About Recogni
Let us show you how Recogni can accelerate your GenAI ambitions.
Get a demo of next-gen AI Inference
Schedule a demo call with our
Co-founder and VP of AI, Gilles Backhus
FAQs
We will soon be talking more precisely about our systems' performance. For now, we can confidently state that it is on a path to beat any other inference solution by the time it launches.
We are currently not categorically excluding any business model or mode of deployment. IaaS offerings are gaining a lot of traction at the moment, and are therefore one of the paths we are exploring. Recogni's technologies, and the datacenter products built on them, are certainly well suited to this.
We will be releasing more precise timelines around our product launch and beta phases in the coming quarters. Stay tuned! Get in touch with us if you want to be among the first.