Background

Notes excerpted from the Google Agents whitepaper.

Cognitive architectures

To behave as an agent, the model must not only have access to a set of external tools; it also needs the ability to plan and execute tasks in a self-directed fashion. In its most fundamental form, a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal.

  • In the scope of an agent, a model refers to the language model (LM) that will be utilized as the centralized decision maker for agent processes. The model used by an agent can be one or multiple LMs of any size (small or large) that are capable of following instruction-based reasoning and logic frameworks, like ReAct, Chain-of-Thought, or Tree-of-Thoughts.
  • Tools bridge the gap between the agent’s internal capabilities and the external world, unlocking a broader range of possibilities.
  • The orchestration layer describes a cyclical process that governs how the agent takes in information, performs some internal reasoning, and uses that reasoning to inform its next action or decision. The complexity of the orchestration layer can vary greatly depending on the agent and task it’s performing. It uses the rapidly evolving field of prompt engineering and associated frameworks to guide reasoning and planning, enabling the agent to interact more effectively with its environment and complete tasks.
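
The cycle described in the last bullet (take in information, reason, act, observe) can be sketched roughly as follows. This is only a minimal illustration, not the whitepaper's implementation: `toy_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stub standing in for a real LM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a language model: given the goal and the
# observations gathered so far, it returns (action, argument).
def toy_model(goal: str, observations: List[str]) -> Tuple[str, str]:
    if not observations:
        return ("lookup", goal)          # nothing known yet, so use a tool
    return ("finish", observations[-1])  # enough information, so answer

# Hypothetical tool registry: tool name -> callable.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda query: f"stub result for '{query}'",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = toy_model(goal, observations)  # reason and plan
        if action == "finish":
            return argument
        observations.append(TOOLS[action](argument))      # act, then observe
    return "stopped: step limit reached"

answer = run_agent("weather in Paris")
```

The step limit is a design choice of this sketch: it keeps the loop from running forever if the model never decides to finish.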

The quality of agent responses can be tied directly to the model's ability to reason about and act on these various tasks, including its ability to select the right tools, and to how well those tools have been defined.

Tools: keys to the outside world

Extensions

The easiest way to understand Extensions is to think of them as bridging the gap between an API and an agent in a standardized way, allowing agents to seamlessly execute APIs regardless of their underlying implementation.

An Extension bridges the gap between an agent and an API by:

  1. Teaching the agent how to use the API endpoint using examples.
  2. Teaching the agent what arguments or parameters are needed to successfully call the API endpoint.

Extensions can be crafted independently of the agent, but should be provided as part of the agent's configuration. The agent uses the model and examples at run time to decide which Extension, if any, would be suitable for solving the user's query. This highlights a key strength of Extensions: their built-in example types, which allow the agent to dynamically select the most appropriate Extension for the task.
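
To make the idea of example-driven selection concrete, here is a small sketch. It assumes a hypothetical `Extension` record that bundles example queries with a callable wrapping an API, and it uses plain word overlap as a stand-in for the model's run-time decision. None of these names come from the whitepaper or from any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Extension:
    name: str
    examples: List[str]         # example queries this Extension can handle
    call: Callable[[str], str]  # wraps the underlying API endpoint

def _overlap(a: str, b: str) -> int:
    # Number of distinct words the two strings share (case-insensitive).
    return len(set(a.lower().split()) & set(b.lower().split()))

def select_extension(query: str, extensions: List[Extension]) -> Optional[Extension]:
    best, best_score = None, 0
    for ext in extensions:
        score = max((_overlap(query, ex) for ex in ext.examples), default=0)
        if score > best_score:
            best, best_score = ext, score
    return best  # None when no example shares a word with the query

flights = Extension("flights", ["book a flight from Austin to Zurich"],
                    lambda q: "flights API called")
weather = Extension("weather", ["what is the weather in Zurich today"],
                    lambda q: "weather API called")

chosen = select_extension("what is the weather in Austin", [flights, weather])
```

In a real agent the model, not a word count, would weigh the examples, but the shape is the same: the examples travel with the Extension, and the choice is made per query.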

Functions

Functions differ from Extensions in a few ways, most notably:

  1. A model outputs a Function and its arguments, but doesn’t make a live API call.
  2. Functions are executed on the client-side, while Extensions are executed on the agent-side.

One key thing to remember about Functions is that they are meant to offer the developer much more control over not only the execution of API calls, but also the entire flow of data in the application as a whole.

Functions offer a straightforward framework that empowers application developers with fine-grained control over data flow and system execution, while effectively leveraging the agent/model for critical input generation.
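
The split of responsibilities can be sketched as below: the model only emits a function name and arguments, and the client decides whether and how to run it. The JSON shape, `display_cities`, and `execute_on_client` are assumptions made for illustration; real model APIs define their own output formats.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical model output: the model names a function and its arguments
# but makes no API call itself.
model_output = (
    '{"function": "display_cities", '
    '"args": {"cities": ["Aspen", "Vail"], "preferences": "skiing"}}'
)

def display_cities(cities: List[str], preferences: str) -> str:
    # Client-side code; a real application might call an external API here.
    return f"{preferences}: " + ", ".join(cities)

# The client keeps an allow-list of functions it is willing to execute.
CLIENT_FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "display_cities": display_cities,
}

def execute_on_client(raw: str) -> Any:
    call = json.loads(raw)
    name = call["function"]
    if name not in CLIENT_FUNCTIONS:
        raise ValueError(f"model requested unknown function: {name}")
    return CLIENT_FUNCTIONS[name](**call["args"])

result = execute_on_client(model_output)
```

Because execution happens in `execute_on_client`, the developer can validate arguments, add authentication, or defer the call entirely, which is the extra control the notes above refer to.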

Data stores

Data Stores allow developers to provide additional data in its original format to an agent, eliminating the need for time-consuming data transformations, model retraining, or fine-tuning. The Data Store converts the incoming document into a set of vector database embeddings that the agent can use to extract the information it needs to supplement its next action or response to the user.

One of the most prominent recent examples of Data Store usage with language models has been the implementation of applications based on Retrieval Augmented Generation (RAG).
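
A toy version of the retrieve-then-prompt flow behind RAG might look like this. It is a sketch under strong assumptions: `embed` is a bag-of-words counter rather than a learned embedding model, the "vector database" is an in-memory list, and the documents are made up.

```python
import math
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real Data Store would use a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Made-up documents, indexed once when the store is built.
documents = [
    "refunds are issued within 14 days of purchase",
    "shipping takes 3 to 5 business days",
]
index: List[Tuple[str, Counter]] = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> List[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # The retrieved text is placed in the prompt ahead of the user question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how long does shipping take")
```

The model itself is left out on purpose: the point is that the documents stay in their original form and only the prompt changes.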

Enhancing model performance with targeted learning

  • In-context learning: This method provides a generalized model with a prompt, tools, and few-shot examples at inference time which allows it to learn ‘on the fly’ how and when to use those tools for a specific task. The ReAct framework is an example of this approach in natural language.
  • Retrieval-based in-context learning: This technique dynamically populates the model prompt with the most relevant information, tools, and associated examples by retrieving them from external memory. Examples include the 'Example Store' in Vertex AI extensions and the RAG-based Data Store architecture mentioned previously.
  • Fine-tuning based learning: This method involves training a model using a larger dataset of specific examples prior to inference. This helps the model understand when and how to apply certain tools prior to receiving any user queries.
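
As a rough illustration of the second approach, the sketch below fills a few-shot prompt with the stored examples closest to the incoming query. `EXAMPLE_STORE`, the tool-call strings, and the word-overlap ranking are all hypothetical; they are not the Vertex AI Example Store API.

```python
from typing import Dict, List

# Hypothetical example store: past queries paired with the tool call
# that solved them.
EXAMPLE_STORE: List[Dict[str, str]] = [
    {"query": "weather in Oslo tomorrow",
     "tool_call": "get_weather(city='Oslo')"},
    {"query": "convert 10 USD to EUR",
     "tool_call": "convert_currency(amount=10, src='USD', dst='EUR')"},
    {"query": "weather in Lima today",
     "tool_call": "get_weather(city='Lima')"},
]

def shared_words(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

def few_shot_prompt(query: str, k: int = 2) -> str:
    # Keep the k stored examples that share the most words with the query.
    ranked = sorted(EXAMPLE_STORE,
                    key=lambda ex: shared_words(query, ex["query"]),
                    reverse=True)
    shots = [f"Q: {ex['query']}\nA: {ex['tool_call']}" for ex in ranked[:k]]
    return "\n\n".join(shots + [f"Q: {query}\nA:"])

prompt = few_shot_prompt("weather in Rome today")
```

Plain in-context learning would use a fixed set of shots instead of ranking them, and fine-tuning would move the same examples into training data before inference.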

Summary

  1. Agents extend the capabilities of language models by leveraging tools to access real-time information, suggest real-world actions, and plan and execute complex tasks autonomously.
  2. At the heart of an agent's operation is the orchestration layer, a cognitive architecture that structures reasoning, planning, and decision-making and guides the agent's actions.
  3. Tools, such as Extensions, Functions, and Data Stores, serve as the keys to the outside world for agents, allowing them to interact with external systems and access knowledge beyond their training data.

Applications

References

Julia Wiesinger, Patrick Marlow, and Vladimir Vuskovic. Agents. Google.

Attachments

Version history

2025-05-19: first draft.