PetaFLOPS Inference Era: 1 PFLOPS Attention, and Preliminary End-to-End Results
Achieve 1 PFLOPS Attention on a Single H100 SXM
Feb 7, 2024

8bit HippoAttention: Up to 3X Faster Compared to FlashAttentionV2
HippoML's Out-of-The-Box 8bit Inference
Jan 17, 2024

Unified DataCenter & Local Foundation Model Serving: Beyond the Docker Way
Is a generalized Docker solution good for Foundation Model serving?
Jan 8, 2024

Super AI Creativity App Run with Local GPU on Mac/Windows/Linux [Early Access]
Last year highlighted the limitations and challenges of centralized AI services, including the spike in demand and turmoil in AI company…
Jan 2, 2024

Up to 80X Speedup in Multi-Head Attention on Apple Silicon
What is Multi-Head Attention
Dec 12, 2023

Large Language Model Inference: from Datacenter to Edge
Just 6 months ago, we wouldn't have even thought to make such a disclosure. With the rapidly growing awareness of ChatGPT, we thought we'd…
May 30, 2023