Small-Model Inference Needs Infrastructure Beyond Model Servers
Filip Makraduli of Superlinked argues that the hard part of small-model inference is no longer serving a single model, but operating many embedding models, rerankers, extractors, and multimodal models efficiently in production. In his account, conventional one-model-per-container deployments waste GPU capacity and leave teams to rebuild routing, autoscaling, monitoring, hot-swapping, and eviction themselves. Superlinked’s SIE is presented as an open-source attempt to provide that missing infrastructure layer for AI search and document-processing workloads.
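To make the "hot-swapping and eviction" part of that argument concrete, here is a minimal sketch of one common approach: an LRU cache that keeps only a few models resident at once, loading others on demand and evicting the least recently used. This is purely illustrative; the class and the `loader` callback are hypothetical and are not SIE's API.

```python
from collections import OrderedDict

class ModelCache:
    """Illustrative LRU cache for co-hosting many small models on shared hardware.

    Hypothetical sketch, not Superlinked's implementation: `loader` is any
    callable that maps a model name to a loaded model object.
    """

    def __init__(self, loader, capacity=2):
        self.loader = loader            # callable: model name -> loaded model
        self.capacity = capacity        # max models resident at once
        self._resident = OrderedDict()  # name -> model, ordered by recency
        self.evictions = []             # names evicted to make room

    def get(self, name):
        if name in self._resident:
            # Cache hit: mark as most recently used and return it.
            self._resident.move_to_end(name)
            return self._resident[name]
        if len(self._resident) >= self.capacity:
            # Cache full: evict the least recently used model.
            evicted, _ = self._resident.popitem(last=False)
            self.evictions.append(evicted)
        # "Hot-load" the requested model on demand.
        model = self.loader(name)
        self._resident[name] = model
        return model

# Example: with capacity 2, requesting a third model evicts the coldest one.
cache = ModelCache(loader=lambda n: f"<{n} weights>", capacity=2)
cache.get("embedder")
cache.get("reranker")
cache.get("embedder")   # touch embedder so reranker becomes LRU
cache.get("extractor")  # evicts "reranker"
```

A real serving layer would add the routing, autoscaling, and monitoring the article lists on top of a policy like this; the sketch only shows why eviction logic exists at all once many models share one accelerator.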
AI Engineer·May 7, 2026·9 min read