Why We Built Our AI Agentic Framework in Rust From the Ground Up

Zectonal
12 min read · Aug 27, 2024


Rust enabled us to build a fast, secure, extensible, and auditable autonomous AI Agent Function Tool Calling Framework, all contained within a single executable binary file (our “genie-in-a-binary”), that allows us to maintain a self-contained “Agent Provenance.”

The “Rustacean” Agent Framework

Zectonal is a software company that characterizes and monitors multimodal data sets for defects, errors, poisoned data, and other anomalous conditions that deviate from defined baselines. Our goal is to detect and prevent bad data from polluting data lakes and causing faulty analysis when using business intelligence and AI decision making tools.

We leverage a growing number of specialized AI agents to call specific diagnostic algorithms, or function tools, that can characterize large data sets in order to find and diagnose quality defects or malicious content.

We developed our own Rust AI agentic framework to facilitate efficient and auditable agent communications over the OpenAI, Anthropic, and Ollama APIs. The framework is extensible, and our roadmap includes running best-of-breed agents in any combination of remote online services and on-premise deployments in order to maximize capabilities, deliver a more secure AI experience, and minimize costs.

Our Rust framework allows us to experiment with autonomous agents that can determine when to create and spawn new agents on demand, based on immediate needs and without human interaction. This includes software function tools that create new software function tools and assign them to AI-created agents, our form of meta AI programming.

Our Initial GPT-3 Chat

Many months ago, our initial use case was a simple “Chat” interface that allowed users to ask questions such as “what is a parquet file and why are they in my data lake?” This Chat capability provided nominal value, but then, as now, we were betting on an AI future of yet-to-emerge capabilities that would enhance our product’s core functionality.

Needle-in-a-Haystack Algorithms As the Basis For Function Tools

Over the past 24 months, we leveraged Rust to develop a large number of efficient query and file diagnostic capabilities for a variety of file codecs and data stores.

We call this set of capabilities Zectonal Deep Data Inspection. Deep Data Inspection is to multimodal data what Deep Packet Inspection (DPI) is to network data.

We deliberately chose an embedded datastore that was extremely fast in order to take advantage of the inherent speed of compiled Rust code. These queries supported our proprietary set of algorithms, which we refer to as the “needle in the haystack” algorithms: they were designed to find anomalous characteristics inside multimodal files in a large data store, data pipeline, or data lake. As an example, think of detecting malformed text inside a single cell of a spreadsheet with a million rows and a million columns at sub-second speed. Then extend that to analyzing similar files flowing through a data pipeline every few seconds on a 24/7 basis.
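
To give a flavor of what such a check looks like, here is a deliberately naive sketch, not our actual algorithm, assuming simple comma-delimited text rather than a real codec parser:

// Illustrative sketch only: a naive cell-level scan over delimited text.
// A production parser would handle quoting, encodings, and columnar codecs.
fn scan_for_malformed_cells(data: &str) -> Vec<(usize, usize, String)> {
    let mut findings = Vec::new();
    for (row, line) in data.lines().enumerate() {
        for (col, cell) in line.split(',').enumerate() {
            // Baseline: cells must be printable and under 256 bytes.
            let malformed = cell.len() > 256
                || cell.chars().any(|c| c.is_control() && c != '\t');
            if malformed {
                findings.push((row, col, cell.to_string()));
            }
        }
    }
    findings
}

fn main() {
    let sample = "id,name,score\n1,alice,42\n2,bo\u{0007}b,17";
    for (row, col, cell) in scan_for_malformed_cells(sample) {
        println!("anomalous cell at row {row}, column {col}: {cell:?}");
    }
}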

Our initial product included an extensive set of coded rules. While Rust was well suited to rules-based analysis due to its inherent speed and zero-cost abstractions, scalability remained an issue as the number of data sources and file codecs we supported grew in volume and frequency. When we first started hearing about LLM function tool calling from the UC Berkeley AI Research Lab (BAIR), we immediately saw the value of having LLMs decide which function tools to call instead of relying on our pre-defined rules-based approach. While function tool calling is still in its infancy, it has given us a unique way to scale the internal software decision making that leverages our Rust-based algorithms.
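
For context, a function tool is advertised to the model as little more than a name, a description, and a JSON Schema for its arguments. Below is a minimal sketch of how one of our diagnostics might be described, with a hypothetical tool name and a simplified envelope; the exact field layout varies slightly between providers.

// Sketch of a tool definition in the general shape expected by
// tool-calling APIs. Field names vary by provider.
// Requires: serde_json = "1"
use serde_json::json;

fn main() {
    let tool = json!({
        "type": "function",
        "function": {
            "name": "detect_malformed_cells", // hypothetical tool name
            "description": "Scan a tabular file for cells that deviate from the baseline",
            "parameters": {
                "type": "object",
                "properties": {
                    "path":      { "type": "string", "description": "File to inspect" },
                    "max_bytes": { "type": "integer", "description": "Cell size threshold" }
                },
                "required": ["path"]
            }
        }
    });
    println!("{}", serde_json::to_string_pretty(&tool).unwrap());
}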

We get all of this LLM function tool calling capability inside a single compiled Rust binary that can be deployed almost anywhere, while still getting all of the speed and security benefits afforded by Rust.

We sometimes refer to this Rust LLM function calling capability as our “genie-in-a-binary.”

We check the Berkeley Function Calling Leaderboard daily, which gives us an appreciation of how fast this function-calling arms race within the broader AI arms race is progressing.

Editorial note: when describing LLM function tool calling, it is often misunderstood that the LLM does not actually call the function tool; it responds to the request with the function that should be executed locally by the software application making the request. Essentially, this offloads decision making to agents, but execution responsibility remains with the software application, in our case our Zectonal application. To further confuse things, what only a few months ago was commonly referred to as “function calling” or “function tool calling” is now just “tool” calling in most API documentation. We use these terms interchangeably and will eventually get on the vocabulary bandwagon.
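
Here is a stripped-down sketch of that division of labor, with the provider round trip stubbed out and the tool name invented for illustration:

// Minimal sketch of the tool-calling loop: the model only *chooses* a tool,
// while the application executes it locally. Names are illustrative.

struct ToolCall {
    name: String,
    arguments: String, // JSON arguments as returned by the model
}

// Stand-in for a real API round trip to OpenAI/Anthropic/Ollama.
fn ask_model_which_tool(_task: &str) -> Option<ToolCall> {
    Some(ToolCall {
        name: "detect_malformed_cells".to_string(),
        arguments: r#"{"path":"/data/incoming/orders.csv"}"#.to_string(),
    })
}

fn main() {
    let task = "Why did last night's pipeline run flag this file?";
    match ask_model_which_tool(task) {
        Some(call) => {
            // Execution stays on our side of the boundary.
            println!("model selected tool `{}` with args {}", call.name, call.arguments);
            match call.name.as_str() {
                "detect_malformed_cells" => { /* run the local Rust diagnostic */ }
                other => eprintln!("model requested unknown tool `{other}`"),
            }
        }
        None => println!("model answered directly, no tool needed"),
    }
}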

Python For Data Science But Not for AI

Zectonal has been a vocal and visible advocate of Rust since our inception. This is due in part to our prior experiences using Python TensorFlow for machine learning inference on smaller form factor hardware devices. Maybe we were too early or asking too much, but Python just could not keep up. We had a similar experience with Python’s Global Interpreter Lock (GIL) a few years prior when capturing network packet data, where we had to re-write significant amounts of multi-threaded Python code. Using Rust, we have become accustomed to everything being blazingly fast, to fearless concurrency, and to the elimination of a whole class of runtime bugs and vulnerabilities through its type safety features.

As we started to explore AI agents and function tool calling, we looked at frameworks like LangChain and LlamaIndex, which were developed in Python, the language of data science. Experimenting with these early Python-based tools, we found them overly complex and subject to the same unreliability as the underlying models. For example, when enabling an LLM to call function tools with a Python-based framework, we found it very difficult to detect when the LLM called the wrong tool. While not the framework’s fault, we felt it was still its implied responsibility to alert the user that something was off. It still required a human to detect and diagnose, by intuition, whether the wrong function tool was called.

When human intuition is the arbiter of whether the LLM called the right tool, it is easy to recognize that this is not going to be a scalable solution for us.

We were already in our own Rust AI world when Elon famously stated Rust was the language of AI. Welcome aboard.

Recognizing the Initial Enterprise AI Experience

Over the past year especially, we observed that enterprises were starting to feel the adverse impact of hallucinations, primarily through their Chat and generative text experiences. The AI marketing hype engine had run its course, so we recognized that while AI was going to be a powerful technology enabler for some time to come, we needed to give our users additional visibility into how our AI operated at scale.

Auditable AI Agent Communications

Because we had such a large existing corpus of fast-query “needle in a haystack” diagnostic tools, once our framework was in place we were able to quickly scale to dozens of specialized agents, each with its own unique set of function tools. Almost immediately, for our own sanity and pocketbook, we recognized that we needed visibility into how our agents communicated and under what conditions agents and tools were called.

One of the first observations when starting to use AI agents was how “chatty” they are when communicating amongst themselves.

Tracking Agent Provenance

We call this ability to track agent communications from initial tasking to completion, and everything in between, Agent Provenance, named after the concept of Data Provenance.
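
Conceptually, each hop in a task produces a record along these lines; the sketch below uses hypothetical, simplified field names rather than our actual schema.

// Conceptual sketch of a single Agent Provenance record; field names are
// hypothetical and simplified.
use std::time::SystemTime;

#[derive(Debug)]
struct ProvenanceRecord {
    task_id: String,              // the originating task or question
    parent_agent: Option<String>, // which agent delegated this step, if any
    agent: String,                // agent that acted
    tool_called: Option<String>,  // function tool selected by the model, if any
    prompt_tokens: u32,
    completion_tokens: u32,
    started_at: SystemTime,
    finished_at: SystemTime,
}

fn main() {
    let record = ProvenanceRecord {
        task_id: "task-042".into(),
        parent_agent: None,
        agent: "file-diagnostics-agent".into(),
        tool_called: Some("detect_malformed_cells".into()),
        prompt_tokens: 412,
        completion_tokens: 87,
        started_at: SystemTime::now(),
        finished_at: SystemTime::now(),
    };
    println!("{record:#?}");
}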

Although we could not always predict the behavior of our agents, viewing the Agent Provenance provided the visibility to understand what motivated agent behavior, which allowed us to develop better agent prompts and better function tool descriptions, and to fine-tune models more efficiently where needed.

Taken together, these optimizations helped us help the LLM choose the right tools more of the time.

In our latest release, we now provide real-time feedback in our UI so users can see which agents and function tools are called and track their progress in responding to a task or question.

Optimizing a General Purpose Utility Function for our AI Agents

A utility function is a key concept used to describe how an artificial intelligence system makes decisions and prioritizes actions. It represents the AI’s goals, values, and preferences, essentially defining what the AI considers good or desirable outcomes. As we were instrumenting our Agent Provenance, it dawned on us that this could be a good way to start implementing our own utility function for our agents. We started building more and more subjective and objective metrics, such as intent and confidence, into each interaction as part of the overall provenance. This enriched the Agent Provenance with the goal of rewarding agents for providing useful information and penalizing them otherwise. This is still an interesting area of on-going research for us, and it was made possible through our Rust-based framework.
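
As a sketch of the direction, not our production scoring, a per-interaction utility might combine those signals with invented weights like so:

// Illustrative utility function over per-interaction metrics.
// The metrics and weights are hypothetical.

struct InteractionMetrics {
    intent_match: f64,   // 0.0..=1.0, did the chosen tool match the stated intent?
    confidence: f64,     // 0.0..=1.0, model-reported or estimated confidence
    tokens_used: u32,    // cost proxy
    task_completed: bool,
}

fn utility(m: &InteractionMetrics) -> f64 {
    let reward = 0.5 * m.intent_match + 0.3 * m.confidence
        + if m.task_completed { 0.2 } else { 0.0 };
    let cost_penalty = (m.tokens_used as f64 / 10_000.0).min(0.2);
    reward - cost_penalty
}

fn main() {
    let useful = InteractionMetrics { intent_match: 0.9, confidence: 0.8, tokens_used: 900, task_completed: true };
    let chatty = InteractionMetrics { intent_match: 0.3, confidence: 0.6, tokens_used: 4_500, task_completed: false };
    println!("useful interaction scores {:.2}", utility(&useful));
    println!("chatty interaction scores {:.2}", utility(&chatty));
}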

Extensibility

OpenAI and Anthropic Take An Early Lead

OpenAI was the first company to introduce dedicated APIs to facilitate agent-based communications, what they refer to as Assistants. They have now released two versions of their Assistants API and most recently started supporting structured outputs. Structured LLM tool output was a concept we incorporated into our Rust framework out of necessity before OpenAI released it. In 2024, GPT-4o was one of those incremental game changers for us, as was GPT-4o mini, especially for development cycles when we need to keep costs manageable.

With the release of Anthropic’s Claude 3.5 Sonnet in mid-June 2024, we saw first hand how quickly things can change. Claude now also supports function tool calling via its APIs and was a leader on the Berkeley Leaderboard for some time during 2024.

Both the OpenAI and Anthropic APIs are supported by our extensible Rust agentic framework.
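
At the framework level, that extensibility amounts to hiding each vendor behind a common interface so agents do not care who answers them. Here is a minimal sketch of the idea, with illustrative type and trait names rather than our actual code:

// Sketch of a provider abstraction so agents are indifferent to which
// backend (OpenAI, Anthropic, Ollama, ...) answers them. Names are illustrative.

struct ToolCallRequest { agent: String, prompt: String }
struct ToolCallDecision { tool: Option<String>, arguments: String }

trait LlmProvider {
    fn name(&self) -> &str;
    fn decide(&self, req: &ToolCallRequest) -> Result<ToolCallDecision, String>;
}

struct StubProvider;

impl LlmProvider for StubProvider {
    fn name(&self) -> &str { "stub" }
    fn decide(&self, req: &ToolCallRequest) -> Result<ToolCallDecision, String> {
        // A real implementation issues an HTTP request to the vendor's API here.
        Ok(ToolCallDecision { tool: None, arguments: format!("echo: {}", req.prompt) })
    }
}

fn main() {
    let providers: Vec<Box<dyn LlmProvider>> = vec![Box::new(StubProvider)];
    let req = ToolCallRequest { agent: "file-diagnostics-agent".into(), prompt: "inspect orders.csv".into() };
    for p in &providers {
        match p.decide(&req) {
            Ok(d) => println!("[{}] {} chose tool {:?} with args {}", req.agent, p.name(), d.tool, d.arguments),
            Err(e) => eprintln!("[{}] {} failed: {e}", req.agent, p.name()),
        }
    }
}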

Ollama — On Premise AI Function Calling Takes Shape

Our experiences with the open source Ollama application have been profound, and they deserve a series of blog posts in their own right. Suffice it to say, we are big fans of Ollama and the models that can run on it.

A fundamental objective of our software is to allow customers to gain data monitoring insights in any environment, but most importantly within on-premise environments. We are not a SaaS product, an early and deliberate design decision based on our past experiences selling into Fortune 50 companies and their liability concerns with allowing external managed service providers to host their data. It is mind-blowing to us that, after so many years, we are still seeing record-breaking data breaches in 2024 impacting enterprises who have decided to outsource their data hosting and management.

Ollama and its cousin llama.cpp are important building blocks for running a large number of diverse LLM models on-premise. HuggingFace, as a model and training data repository, is a great showcase for the amount of innovation going into the development and fine-tuning of LLMs that can run on these systems. Throughout 2024 we tracked the GitHub feature requests for Ollama to support function tool calling, so we were excited to see that capability finally released at the end of July 2024. We believe this will be a game changer both for Ollama and for running LLMs on-premise to support a wider array of use cases beyond the common Chat and Code Completion use cases.
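
As a sketch of what a tool-enabled request to a local Ollama instance looks like from the calling side, assuming Ollama 0.3 or later on its default port, a tool-capable model such as llama3.1, and the same hypothetical tool as above:

// Sketch: sending a tool-enabled chat request to a local Ollama instance.
// Requires: reqwest = { version = "0.12", features = ["blocking", "json"] }, serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "llama3.1",
        "stream": false,
        "messages": [
            { "role": "user", "content": "Is /data/incoming/orders.csv malformed?" }
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "detect_malformed_cells", // hypothetical tool
                "description": "Scan a tabular file for anomalous cells",
                "parameters": {
                    "type": "object",
                    "properties": { "path": { "type": "string" } },
                    "required": ["path"]
                }
            }
        }]
    });

    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:11434/api/chat")
        .json(&body)
        .send()?
        .json()?;

    // If the model chose a tool, the call shows up under message.tool_calls;
    // executing it is still our application's job.
    println!("{}", serde_json::to_string_pretty(&resp["message"])?);
    Ok(())
}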

What Does Our Future Hold for AI Agents

Hybrid Best-of-Breed Online/On-Premise Agents

Given the ease of extensibility with our Rust Agentic framework, a concept we are currently testing is leveraging best-of-breed agents across multiple vendors and models in order to maximize functionality, avoid lock-in, and minimize costs. This theme should sound familiar.

As ex-AWS employees, we saw firsthand with cloud computing that out-of-the-box thinking often allowed us to leverage the best the cloud could offer without giving up on open source and on-premise capabilities. This hybrid approach gave us the flexibility to avoid cloud lock-in as well as the ability to optimize for costs. We believe the same is true for agents.

In our case, we realized not all agents needed to be equal in terms of functionality, and we were highly motivated to reduce our metered billing costs by using less expensive agents whenever possible.

Using tokens as a metric for metered billing is abstract enough that most consumers cannot quantify it: they can’t fully budget for it, it is hard to predict, and the costs (or revenues, if you are an LLM provider) can escalate very quickly. One or two sentences’ worth of messages and prompts to an agent might be 30 tokens, a paragraph might be 100, but maybe not! Do system prompts for each agent get sent with every message, and do they count toward token totals, or are they cached? If cached, for how long? There was an anecdote on Reddit describing chatty agents unexpectedly resulting in a $5,000 bill. This reminds us a lot of the early cloud adoption years, when a new customer would get a huge cloud bill out of the blue because the compute never got turned off, because who ever turned off a server before the cloud existed?

Agent Cost Forecasting — No Science, All Art

LLM providers do not yet offer cost calculators for agent workloads. We have found several configuration options and fine-tuning techniques that mitigate token use (and thereby costs) when using agents and tool calling, and the LLM providers now offer sandboxes for testing individual agents. Overall, though, budgeting and forecasting costs when using agents is still a relatively opaque process.
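
In the meantime, we fall back on back-of-envelope math like the following, where the per-million-token prices and the traffic numbers are placeholders rather than any vendor’s actual rates:

// Back-of-envelope agent cost estimate. Prices and volumes are placeholders;
// substitute the current rates for whichever model and vendor you actually use.

fn estimate_cost_usd(
    prompt_tokens: u64,
    completion_tokens: u64,
    price_in_per_mtok: f64,  // dollars per 1M input tokens (placeholder)
    price_out_per_mtok: f64, // dollars per 1M output tokens (placeholder)
) -> f64 {
    (prompt_tokens as f64 / 1_000_000.0) * price_in_per_mtok
        + (completion_tokens as f64 / 1_000_000.0) * price_out_per_mtok
}

fn main() {
    // Hypothetical day: 50 tasks, each fanning out to 6 chatty agent hops,
    // roughly 1,200 prompt tokens (re-sent system prompts included) and
    // 300 completion tokens per hop.
    let hops: u64 = 50 * 6;
    let prompt = hops * 1_200;
    let completion = hops * 300;
    let daily = estimate_cost_usd(prompt, completion, 2.50, 10.00);
    println!("~{prompt} prompt + ~{completion} completion tokens ≈ ${daily:.2}/day");
}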

Our goal is to find an equilibrium where some smaller number of highly specialized and fine-tuned agents might exist within an LLM provider such as OpenAI or Anthropic, and others that do not require such expertise could exist on-premise via Ollama.

Similarly, we want to support customer use cases where all agents run on-premise with no external communication, even if that means sacrificing some short-term functionality by not using an LLM service provider.

Since all agents communicate through our Rust framework, it is transparent to the end user where agents run, although their actions remain visible in our audit framework and provenance.
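
Here is a sketch of how that routing might be declared, with hypothetical agent names and a deliberately simplified policy:

// Sketch of per-agent backend routing; names and policy are illustrative.

#[derive(Debug)]
enum Backend {
    OpenAi,
    Anthropic,
    OllamaOnPrem,
}

fn route(agent: &str, air_gapped: bool) -> Backend {
    if air_gapped {
        // All-on-premise deployments never leave the building.
        return Backend::OllamaOnPrem;
    }
    match agent {
        // Hypothetical: only the hardest diagnostic reasoning goes to a hosted model.
        "root-cause-analysis-agent" => Backend::Anthropic,
        "codec-triage-agent" => Backend::OpenAi,
        // Everything else defaults to the cheaper local backend.
        _ => Backend::OllamaOnPrem,
    }
}

fn main() {
    for agent in ["root-cause-analysis-agent", "codec-triage-agent", "schema-drift-agent"] {
        println!("{agent} -> {:?}", route(agent, false));
        println!("{agent} (air-gapped) -> {:?}", route(agent, true));
    }
}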

Meta Agents That Create More Agents — Tools That Create Tools

With each subsequent Zectonal release in 2024, the number of agents and tools has grown, yet we always ship with a fixed number of agents and tools.

Another on-going experiment is the concept of autonomous software agents creating and spawning other software agents. This is analogous to meta programming but for AI.

For example, assume we have a dozen pre-configured running agents, and through the process of determining a chain-of-thought execution plan they decide that a specialized capability is missing, such as analyzing a never-before-seen file codec. The agents might collectively decide to spawn a new software agent called the New File Codec Agent. LLMs are surprisingly good at creating system prompts, so this is not that far fetched. Further, let’s assume we already have another pre-configured agent called the Create More Tools Agent, running an LLM specializing in code development (such models are already abundantly available for any language of your choice), whose whole job is to create software tools. Based on the execution plan, it is tasked with creating a number of new tools and assigning them to the newly spawned New File Codec Agent. Now we have a brand new New File Codec Agent that can call a number of LLM-generated software tools based on the presence of a new file codec. Agents that create agents. Tools that create more tools.
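
A very rough sketch of those moving parts, where every name is hypothetical and the model calls are stubbed out with canned strings:

// Highly simplified sketch of agents spawning agents; all names are
// hypothetical and the LLM calls are stubbed.

#[derive(Debug)]
struct AgentSpec {
    name: String,
    system_prompt: String,
    tools: Vec<String>,
}

// Stand-in for an LLM that is good at writing system prompts.
fn draft_system_prompt(capability_gap: &str) -> String {
    format!("You are a specialist agent responsible for: {capability_gap}")
}

// Stand-in for a code-generation agent that emits new function tools.
fn create_tools_for(capability_gap: &str) -> Vec<String> {
    vec![format!("inspect_{}", capability_gap.replace(' ', "_"))]
}

fn spawn_agent(capability_gap: &str) -> AgentSpec {
    AgentSpec {
        name: "new-file-codec-agent".to_string(),
        system_prompt: draft_system_prompt(capability_gap),
        tools: create_tools_for(capability_gap),
    }
}

fn main() {
    // The running agents decide a capability is missing during planning...
    let gap = "never before seen file codec";
    // ...and a new agent plus its tools are created to fill it.
    let agent = spawn_agent(gap);
    println!("{agent:#?}");
}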

A big question is: where does it end?

Hopefully not with infinite paperclips.

There are governance issues for sure, and fundamental questions remain about what diminishing returns, if any, exist when adding new agents. At what point do too many agents produce less effective results? Financial budgeting becomes a real concern for enterprise customers if spawning more agents increases costs.

Security questions abound about vulnerabilities that could be accidentally or deliberately introduced in either the spawned agent instructions or tool creation.

We are excited for the future and what other interesting possibilities might exist using our Rust agentic framework.

Try Out Zectonal

We are always interested in having individuals who are interested in AI agents and function tool calling try out our software and provide feedback.

Our software is currently available for download for Mac ARM and x86. For our free command line monitoring tool, available to anyone, it’s as easy as:

brew tap zectonal/zectonal
brew update
brew install zc

For our Zectonal UI with our newly released AI Agentic framework, reach out to us at license@zectonal.com to obtain a trial demonstration license. We will have you up and running in minutes!
