EGG AI Introduction
Welcome to the AI introduction for the EGG project. This guide is designed for developers who are new to AI concepts and want to understand how AI technologies are implemented in our system.
Overview
This section provides a foundational understanding of key AI technologies that power modern applications, with a focus on how they relate to the EGG project.
What is AI?
AI, also known as Artificial Intelligence, is a technology with human-like problem-solving capabilities. AI in action appears to simulate human intelligence—it can recognize images, write poems, and make data-based predictions.^ai
What is deep learning in AI?
Deep learning is an artificial intelligence (AI) method that teaches computers to process data in a way inspired by the human brain. Deep learning models can recognize complex pictures, text, sounds, and other data patterns to produce accurate insights and predictions. You can use deep learning methods to automate tasks that typically require human intelligence, such as describing images or transcribing a sound file into text.^dl
Why is deep learning important?
Deep learning technology drives many artificial intelligence applications used in everyday products, such as the following:
- Chatbots and code generators
- Digital assistants
- Voice-activated television remotes
- Fraud detection
- Automatic facial recognition
It is also a critical component of technologies like self-driving cars, virtual reality, and more. Businesses use deep learning models to analyze data and make predictions in various applications.
What are Large Language Models?
Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer architecture consists of neural networks with an encoder and a decoder that have self-attention capabilities. The encoder and decoder work together to extract meaning from text sequences and understand the relationships between words and phrases.^llm
What is Agentic AI?
Agentic AI is an autonomous AI system that can act independently to achieve pre-determined goals. Traditional software follows pre-defined rules, and traditional artificial intelligence also requires prompting and step-by-step guidance. However, agentic AI is proactive and can perform complex tasks without constant human oversight. "Agentic" indicates agency — the ability of these systems to act independently, but in a goal-driven manner.^aai
Prompt Engineering
Prompt engineering is the practice of designing and optimizing inputs (prompts) that guide large language models (LLMs) toward the desired outputs. It is a relatively new discipline, and it matters more as LLMs are used in a wider variety of applications and research topics. Practicing it also builds a better understanding of the capabilities and limitations of LLMs.^pe
Zero-shot
Zero-shot prompting means that the prompt used to interact with the model won't contain examples or demonstrations. The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it.
Prompt:
Classify the text into neutral, negative or positive. Text: I think the vacation is okay. Sentiment:
Output:
Neutral
Few-shot
Few-shot prompting can be used as a technique to enable in-context learning where we provide demonstrations in the prompt to steer the model to better performance.
Prompt:
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output:
When we won the game, we all started to farduddle in celebration.
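The difference between zero-shot and few-shot prompting comes down to how the prompt string is assembled. The following is a minimal sketch, not part of the EGG codebase: the `Demonstration` type, the `buildFewShotPrompt` helper, and the two labeled demonstrations are hypothetical and chosen only to illustrate the idea with the sentiment task shown earlier.

```typescript
// A demonstration pairs an input with the output we want the model to imitate.
// (Hypothetical type, for illustration only.)
interface Demonstration {
  input: string;
  output: string;
}

// Build a prompt: the instruction, then each demonstration, then the new
// input with an open "Sentiment:" slot for the model to complete.
// With an empty demonstrations array the result is a zero-shot prompt.
function buildFewShotPrompt(
  instruction: string,
  demonstrations: Demonstration[],
  query: string
): string {
  const shots = demonstrations.map(
    (d) => `Text: ${d.input}\nSentiment: ${d.output}`
  );
  const parts = [instruction].concat(shots, [`Text: ${query}\nSentiment:`]);
  return parts.join("\n\n");
}

// Two made-up demonstrations steer the model toward the expected label format.
const fewShotPrompt = buildFewShotPrompt(
  "Classify the text into neutral, negative or positive.",
  [
    { input: "This is awesome!", output: "Positive" },
    { input: "This is bad!", output: "Negative" },
  ],
  "I think the vacation is okay."
);
```

Passing an empty array for `demonstrations` yields the zero-shot form, so the same helper can cover both techniques.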
Chain-of-Thought (CoT)
Introduced by Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning by having the model produce intermediate reasoning steps before its final answer. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.^cot
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Output:
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
For details, see the DeepSeek-R1 notes.
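When writing CoT demonstrations, it helps to check the arithmetic in each reasoning step, because a wrong demonstration teaches the model a wrong pattern. The sketch below recomputes the odd-number sums used in the example above; the helper name `oddSumIsEven` is hypothetical.

```typescript
// Recompute one reasoning step: pick the odd numbers, sum them,
// and report whether the sum is even.
function oddSumIsEven(
  numbers: number[]
): { odds: number[]; sum: number; isEven: boolean } {
  const odds = numbers.filter((n) => n % 2 !== 0);
  const sum = odds.reduce((acc, n) => acc + n, 0);
  return { odds, sum, isEven: sum % 2 === 0 };
}

// The final question from the prompt above.
const oddCheck = oddSumIsEven([15, 32, 5, 13, 82, 7, 1]);
// oddCheck.odds is [15, 5, 13, 7, 1], oddCheck.sum is 41, oddCheck.isEven is false
```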
Retrieval Augmented Generation (RAG)
General-purpose language models can be fine-tuned to achieve several common tasks such as sentiment analysis and named entity recognition. These tasks generally don't require additional background knowledge.
For more complex and knowledge-intensive tasks, it's possible to build a language model-based system that accesses external knowledge sources to complete tasks. This enables more factual consistency, improves reliability of the generated responses, and helps to mitigate the problem of "hallucination".
The following diagram shows the conceptual flow of using RAG with LLMs.^rag
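The retrieve-then-generate flow can also be sketched in code. This is a toy illustration under loud assumptions: the keyword-overlap scoring stands in for a real embedding or vector search, the three-document corpus is made up, and the names `RagDoc`, `retrieve`, and `buildRagPrompt` are hypothetical. The model call itself is omitted; the sketch stops at the augmented prompt.

```typescript
interface RagDoc {
  id: string;
  text: string;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Score = number of distinct query tokens that also occur in the document.
function overlapScore(query: string, doc: RagDoc): number {
  const docTokens = tokenize(doc.text);
  const seen: { [token: string]: boolean } = {};
  let score = 0;
  for (const token of tokenize(query)) {
    if (seen[token] !== true && docTokens.indexOf(token) !== -1) {
      score += 1;
    }
    seen[token] = true;
  }
  return score;
}

// Step 1: retrieve the k best-matching documents (dropping non-matches).
function retrieve(query: string, corpus: RagDoc[], k: number): RagDoc[] {
  return corpus
    .map((doc) => ({ doc, score: overlapScore(query, doc) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.doc);
}

// Step 2: place the retrieved passages in the prompt as grounding context.
function buildRagPrompt(query: string, passages: RagDoc[]): string {
  const context = passages.map((p) => `[${p.id}] ${p.text}`).join("\n");
  return (
    "Answer using only the context below.\n\n" +
    `Context:\n${context}\n\nQuestion: ${query}\nAnswer:`
  );
}

const corpus: RagDoc[] = [
  { id: "doc-1", text: "MCP uses JSON-RPC messages." },
  { id: "doc-2", text: "The weather tool returns the current temperature." },
  { id: "doc-3", text: "WebSocket enables bidirectional communication." },
];
const ragQuery = "What does the weather tool return?";
const retrieved = retrieve(ragQuery, corpus, 2);
const ragPrompt = buildRagPrompt(ragQuery, retrieved);
```

Only `doc-2` shares tokens with the query, so it is the single passage placed in the prompt. Grounding the answer in retrieved text is what helps reduce hallucination.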
ReAct
A ReAct agent is an AI agent that uses the “reasoning and acting” (ReAct) framework to combine chain of thought (CoT) reasoning with external tool use. The ReAct framework enhances the ability of a large language model (LLM) to handle complex tasks and decision-making in agentic workflows.^react
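The reason-then-act cycle can be sketched as a small loop. This is a minimal illustration, not a real agent: the "model" is a scripted function standing in for an LLM call, and all names (`ReActStep`, `runReAct`, the `add` tool) are hypothetical.

```typescript
// Each model turn is either a tool call or a final answer, with a thought.
type ReActStep =
  | { kind: "action"; thought: string; tool: string; input: string }
  | { kind: "final"; thought: string; answer: string };

type ReActModel = (transcript: string[]) => ReActStep;
type ReActTool = (input: string) => string;

// Alternate between model reasoning and tool execution until the model
// produces a final answer or the step budget runs out.
function runReAct(
  model: ReActModel,
  tools: { [name: string]: ReActTool },
  question: string,
  maxSteps: number
): { answer: string | null; transcript: string[] } {
  const transcript = [`Question: ${question}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(transcript);
    transcript.push(`Thought: ${step.thought}`);
    if (step.kind === "final") {
      transcript.push(`Answer: ${step.answer}`);
      return { answer: step.answer, transcript };
    }
    const known = Object.prototype.hasOwnProperty.call(tools, step.tool);
    const observation = known
      ? tools[step.tool](step.input)
      : `unknown tool "${step.tool}"`;
    transcript.push(`Action: ${step.tool}[${step.input}]`);
    // The observation is fed back so the next thought can build on it.
    transcript.push(`Observation: ${observation}`);
  }
  return { answer: null, transcript };
}

// Scripted stand-in for an LLM: call the tool once, then answer.
const scriptedModel: ReActModel = (transcript) => {
  const last = transcript[transcript.length - 1];
  if (last.indexOf("Observation: ") === 0) {
    return {
      kind: "final",
      thought: "I have the result.",
      answer: last.slice("Observation: ".length),
    };
  }
  return {
    kind: "action",
    thought: "I need to compute this.",
    tool: "add",
    input: "2,3",
  };
};

const reactTools: { [name: string]: ReActTool } = {
  add: (input) =>
    String(input.split(",").map(Number).reduce((a, b) => a + b, 0)),
};

const reactRun = runReAct(scriptedModel, reactTools, "What is 2 + 3?", 5);
// reactRun.answer is "5"
```

The `maxSteps` bound is a design choice worth keeping in any real loop, since a model that never emits a final answer would otherwise run indefinitely.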
MCP
While prompt engineering enables effective communication with LLMs, the Model Context Protocol (MCP) provides a standardized way for applications to provide context and tools to these models, enabling more sophisticated interactions.
MCP is an open standard. As AI systems have become more sophisticated, it has gained significant adoption across the industry, with major tech companies implementing it in their AI solutions.
Examples of MCP implementations include:
MCP is particularly valuable in scenarios where LLMs need to interact with external tools and systems, such as retrieving real-time data, accessing databases, or controlling user interface elements.
The protocol uses JSON-RPC 2.0 as its underlying message format. Messages are encoded as JSON and are typically carried over standard input/output or HTTP.
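To make the wire format concrete, here is a sketch of an MCP tool invocation as a JSON-RPC 2.0 request. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification; the tool name `get_weather`, its `city` argument, and the TypeScript type and helper names are assumptions made for illustration.

```typescript
// Shape of an MCP "tools/call" request in JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// Shape of the matching response: text content plus an optional error flag.
interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: { content: { type: "text"; text: string }[]; isError?: boolean };
}

function makeToolCallRequest(
  id: number,
  toolName: string,
  args: { [key: string]: unknown }
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// JSON-RPC pairs a response with its request through the shared id.
function isResponseFor(req: ToolCallRequest, res: ToolCallResponse): boolean {
  return req.id === res.id;
}

// "get_weather" and "city" are hypothetical names.
const weatherRequest = makeToolCallRequest(1, "get_weather", { city: "Berlin" });
const weatherWire = JSON.stringify(weatherRequest);
```

Because requests and responses are matched by `id`, several tool calls can be in flight on one connection at the same time.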
Backend MCP
The backend MCP implementation enables our system to provide tools and context to the LLM, allowing it to perform actions like fetching real-time data.
The following sequence diagram illustrates how the aggs server processes user requests through the MCP protocol:
sequenceDiagram
user->>sage: Send message
sage->>LLM: Complete
LLM->>sage: Call tool
sage->>mcps: Call tool (MCP)
mcps->>sage: Tool response
sage->>LLM: Complete
LLM->>sage: response
sage->>user: response
As an example, consider our weather tool implementation:
When implementing MCP tools, it's important to properly handle errors. For tool-specific errors, we should use mcp.NewToolResultErrorFromErr rather than returning a generic error. This allows the Agent to understand what went wrong and potentially recover from the error.
TODO: Implement a wrapper to simplify error handling in MCP tools.
In essence, implementing an MCP tool is similar to creating an RPC method that generates text based on the provided arguments.
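The error-handling idea above can be sketched independently of any particular SDK. In MCP, a tool-level failure is reported inside the result (text content with `isError` set to true) rather than as a protocol error, which is what lets the agent read the message and react. The TypeScript below is an illustrative sketch, not our backend code: `ToolOutcome`, `withToolErrors`, and the stubbed weather handler are hypothetical names.

```typescript
// Result shape: text content plus a flag marking tool-level failure.
interface ToolOutcome {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

type ToolArgs = { [key: string]: unknown };
type ToolHandler = (args: ToolArgs) => string;

// Wrap a handler so that thrown errors become error results the model can
// read, instead of escaping as generic failures.
function withToolErrors(handler: ToolHandler): (args: ToolArgs) => ToolOutcome {
  return (args) => {
    try {
      return { content: [{ type: "text", text: handler(args) }], isError: false };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { content: [{ type: "text", text: message }], isError: true };
    }
  };
}

// Hypothetical weather handler with stubbed output (no real API call).
const safeWeather = withToolErrors((args) => {
  const city = args["city"];
  if (typeof city !== "string" || city.length === 0) {
    throw new Error("missing required argument: city");
  }
  return `Weather for ${city}: (stubbed)`;
});

const okOutcome = safeWeather({ city: "Berlin" });
const errorOutcome = safeWeather({});
// errorOutcome.isError is true; its text is "missing required argument: city"
```

A wrapper of this kind keeps each handler focused on producing text, with error conversion handled in one place.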
Frontend MCP
While backend MCP tools are ideal for accessing external systems and APIs, frontend MCP tools are particularly useful for creating interactive user experiences where the LLM needs to control UI elements in real-time.
For UI actions, we use the MCP protocol to create a clean architectural separation.
All UI actions are registered with an MCP server that is deployed alongside our backend services.
The web page communicates with the UI MCP server using the WebSocket protocol, which enables real-time, bidirectional communication for implementing complex interactive logic.
The following sequence diagram shows the flow of interactions in our frontend MCP implementation:
sequenceDiagram
user->>sage: Send message
sage->>LLM: Complete
LLM->>sage: Call tool
sage->>web-mcps: Call tool (MCP)
web-mcps->>user: Call tool (WebSocket)
user->>web-mcps: Call result
web-mcps->>sage: Tool response
sage->>LLM: Complete
LLM->>sage: response
sage->>user: response
In this architecture, business logic is handled in the frontend components. Each tool must be registered on the server and handled in the frontend code.
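The register-then-handle pattern described above might look like the following on the client side. This is a sketch under assumptions: the message shape (`id`, `tool`, `args`), the reply shape, and the `highlight_row` tool are all hypothetical, and the WebSocket itself is left out so that the dispatch logic can be read and tested on its own.

```typescript
// Hypothetical message sent from the UI MCP server to the web page.
interface UiToolCall {
  id: number;
  tool: string;
  args: { [key: string]: unknown };
}

// Hypothetical reply sent back over the same socket.
interface UiToolReply {
  id: number;
  ok: boolean;
  result: string;
}

type UiHandler = (args: { [key: string]: unknown }) => string;

// Frontend registry: tool name -> handler containing the UI logic.
const uiHandlers: { [tool: string]: UiHandler } = {};

function registerUiTool(toolName: string, handler: UiHandler): void {
  uiHandlers[toolName] = handler;
}

// Decode one incoming message, run the matching handler, encode the reply.
function handleSocketMessage(raw: string): string {
  const call = JSON.parse(raw) as UiToolCall;
  let reply: UiToolReply;
  if (!Object.prototype.hasOwnProperty.call(uiHandlers, call.tool)) {
    reply = {
      id: call.id,
      ok: false,
      result: `no handler registered for "${call.tool}"`,
    };
  } else {
    reply = { id: call.id, ok: true, result: uiHandlers[call.tool](call.args) };
  }
  return JSON.stringify(reply);
}

// A made-up UI action; a real handler would update the page here.
registerUiTool("highlight_row", (args) => `highlighted row ${String(args["row"])}`);

const uiReply = handleSocketMessage(
  '{"id":7,"tool":"highlight_row","args":{"row":3}}'
);
// uiReply is '{"id":7,"ok":true,"result":"highlighted row 3"}'
```

In a browser, a function like this would typically be called from the socket's `onmessage` handler, with its return value passed to `send`. A tool registered on the server but missing in the frontend surfaces as an `ok: false` reply instead of a silent failure.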
Example implementation:
Server registration (TypeScript):
Client handling:
Example tool call format (JavaScript):
Postscript
Let's ship for the AI era.