Hugging Face

Hugging Face is a platform where the AI community collaborates on models, datasets, research papers, and applications, with an emphasis on open science and open source.

Hermes Uses a Minimal Agent Loop to Preserve State Across Channels

Alejandro AO’s walkthrough of Hermes presents the agent as a deliberately small always-on system rather than a complex orchestration stack. He argues that Hermes’ usefulness comes from a simple loop that builds context from Markdown files, message history, tools, skills and memory, then preserves state through compression, SQLite transcripts, optional external memory providers, gateway integrations and scheduled cron jobs. The architecture’s central concern is continuity: keeping enough context across channels and time for the agent to behave like a persistent assistant.

Alejandro AOJun 17, 202611 min read

MiniCPM-V 2.6 Runs at 18 Tokens per Second on iPhone

OpenBMB used its Build Small hackathon session to argue that small models are valuable when they can be deployed where applications and data already live: on phones, laptops, mobile apps and edge devices. Its main example was MiniCPM-V 2.6, a vision-language model shown running on an iPhone 15 Pro at 18 tokens per second with llama.cpp and 4-bit quantization. The broader claim was that compact, open models paired with existing runtimes can expand access, reduce cloud dependence, and improve privacy and latency for local AI use cases.

Jun 10, 20266 min read

Hackathon Caps Models at 32B Parameters to Reward Tinkerable AI Apps

Build Small is a Hugging Face and Gradio hackathon organized around a hard constraint: every model used must be under 32 billion parameters. Yuvraj Sharma framed the rule as a way to move AI building away from dependence on giant hosted models and back toward systems that participants can inspect, fine-tune, run locally, and ship as working Gradio Spaces. Sponsor presentations from Black Forest Labs, OpenBMB, OpenAI, NVIDIA, Modal, JetBrains, and Cohere largely reinforced that premise, offering small models, credits, tools, and prize categories meant to turn the constraint into runnable projects rather than demos in name only.

Shashank Verma · Vaibhav Srivastav · Stephen Batifol · Julian Mack · Yuvraj Sharma · Felicia Chang · Nikita Pavlichenko · Hannah Blair · Zhong ZhangJun 5, 202620 min read

LeLab Brings No-Code Training to the LeRobot Robotics Pipeline

Hugging Face presents LeLab as a graphical interface for its LeRobot library that moves much of the robot-learning workflow out of the command line after installation. The source argues that users can configure and calibrate robot arms, add cameras, collect and clean demonstration datasets, train policies locally or on Hugging Face Jobs, and test checkpoints on the robot through one GUI. It also makes clear that LeLab reduces operational friction rather than removing the hard parts of robot learning: the user still has to assemble hardware, teleoperate consistently, record good demonstrations, and evaluate behavior on the physical robot.

Nikodem BartnikJun 3, 20266 min read

FineWeb Shows LLM Dataset Quality Depends on Measured Web Filtering

Alejandro Ao’s overview of Hugging Face’s FineWeb argues that building a competitive LLM pretraining dataset from Common Crawl is a measurement-driven engineering process, not a matter of collecting more web text. He presents FineWeb as an open recipe in which Hugging Face chose raw HTML extraction over Common Crawl’s text extracts, found that global deduplication removed valuable data, and selected filters by training and evaluating small models. The same logic underpins FineWeb-Edu, where Llama-3-70B labels were distilled into a smaller classifier to filter the corpus for educational value at scale.

Alejandro AOJun 2, 202611 min read

Transformers.js Turns Local AI Models Into JavaScript Pipelines

Nico Martin presents Transformers.js as the JavaScript application layer around local AI models, not the engine that performs the model math. In his explanation, ONNX defines the model graph and weights, ONNX Runtime executes the computation, and Transformers.js handles the surrounding work: loading assets, converting inputs to tensors, selecting devices and precision, and decoding outputs. Martin argues that this task-based abstraction is why one `pipeline()` API can support very different workloads, from text generation to depth estimation, while hiding much of the model-specific wiring from developers.

Nico MartinMay 27, 20267 min read

Pre-Training Scale Is Losing Ground to Adaptive AI Systems

Sara Hooker, co-founder of Adaption Labs, argues in a Hugging Face ML Club India talk that AI progress is moving away from ever-larger pre-training runs as the default path and toward systems that adapt more efficiently after deployment. She says compute still matters, but the higher-return questions now concern data curation, post-training, test-time compute, interfaces, routing, and how cheaply models can learn from new information. Her case is that monolithic, one-size-fits-all models push the cost of adaptation onto users and concentrate participation among labs with the largest compute clusters.

Sayak Paul · Aritra Gosthipaty · Sara HookerMay 21, 202620 min read