Local LLM Deployment Tutorial: Building a RAG Application with Ollama and Langchain

Published on March 21, 2024 by alex

Ollama

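Ollama is a lightweight, flexible framework designed for deploying LLMs locally on personal computers. It simplifies the development, execution, and management of LLMs through an intuitive API, and it provides a collection of pre-configured models that are ready to use in a variety of applications. At the core of its design is bundling the model weights, configuration, and data into a single unified package, described by a Modelfile.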

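The framework offers a range of pre-quantized and optimized models, such as Llama 2, Mistral, and Gemma, ready for deployment. These models are designed to run on standard consumer hardware, including CPUs and GPUs, and they work on several operating systems, including macOS, Linux, and Windows. This approach spares users from having to perform complex model optimization themselves.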

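Because LLMs are large and usually need powerful GPUs to run, the models supported by Ollama use neural network quantization. This technique substantially reduces hardware requirements, so LLMs can run efficiently on ordinary computers without an internet connection. As a result, Ollama makes LLM technology more accessible, and both individuals and organizations can use these advanced models on consumer-grade hardware.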
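The savings are easy to estimate. The weights alone take roughly (number of parameters × bits per weight ÷ 8) bytes. The sketch below does that arithmetic for a 7-billion-parameter model and adds a minimal symmetric 8-bit quantizer in plain Python. It only illustrates the general idea. Ollama's models use their own lower-bit, block-wise quantization formats, which this article does not cover. At 4 bits per weight the estimate is about 3.5 GB, which is in the same range as the 3.8 GB llama2 download shown later.

```python
from __future__ import annotations


def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: each w is stored as an int q with w ≈ q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"7B parameters at {bits}-bit: {model_size_gb(7e9, bits):.1f} GB")
    weights = [0.12, -0.5, 0.33, 0.01, -0.27]
    q, scale = quantize_int8(weights)
    print("quantized:", q, "scale:", scale)
    print("restored: ", dequantize_int8(q, scale))
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`). This rounding error is the accuracy cost paid for the smaller memory footprint.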

RAG Application

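This RAG application uses a custom dataset that is scraped dynamically from a website. Users can interact with the website's data through an API (e.g., a REST API). For this demonstration I chose the Open5GS documentation website (Open5GS is a C-language implementation of the 5G Core). The Open5GS documentation is scraped, split, and stored as vector embeddings in a Chroma vector database. Users can then interact with the content of the Open5GS documentation through the API.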

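For the LLM part of this RAG application, I chose the Llama2 7B model running on Ollama. Llama2 on Ollama is Meta's Llama 2 LLM, quantized for good performance on consumer hardware such as CPUs. In this application, the Llama2 LLM running with Ollama answers users' questions based on the content of the Open5GS documentation. Langchain integrates the RAG application with the LLM.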

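The main functions of the RAG application are described below. The figure below shows the overall functional architecture that brings these components together.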

1. Scraping web data

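Langchain provides several types of document loaders that load data from different sources as documents. RecursiveUrlLoader is one of them: it loads data from web URLs as documents. In this step, Langchain's RecursiveUrlLoader scrapes the web data into documents. It recursively scrapes the given URL up to a given maximum depth and reads the content of the pages. This data is used to create the vector embeddings and to answer users' questions.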
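The idea behind depth-limited recursive scraping can be shown with a small standard-library sketch. This is not RecursiveUrlLoader's implementation. The `fetch` callable and the in-memory `FAKE_SITE` dictionary are hypothetical stand-ins that let the example run without network access. Link extraction is simplified to `href` attributes of `<a>` tags. The `prevent_outside` flag follows the idea of the loader parameter of the same name and keeps the crawl under the start URL. In this sketch, `max_depth` counts the start page as level 1. That is a choice made here and may differ from the loader's exact semantics.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str | None],
          max_depth: int = 2, prevent_outside: bool = True) -> dict[str, str]:
    """Breadth-first crawl returning {url: html} for every page reached within max_depth levels."""
    pages: dict[str, str] = {}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            if url in pages:
                continue
            html = fetch(url)
            if html is None:  # e.g. a failed or non-200 response
                continue
            pages[url] = html
            parser = LinkCollector()
            parser.feed(html)
            for href in parser.links:
                # resolve relative links and drop #fragments to avoid duplicate pages
                absolute, _fragment = urldefrag(urljoin(url, href))
                if prevent_outside and not absolute.startswith(start_url):
                    continue
                if absolute not in pages:
                    next_frontier.append(absolute)
        frontier = next_frontier
    return pages


# Hypothetical three-page site used instead of real HTTP requests.
FAKE_SITE = {
    "https://example.org/docs/": '<a href="guide/">Guide</a> <a href="https://other.org/">Out</a>',
    "https://example.org/docs/guide/": '<a href="../">Up</a> <a href="deep/#sec">Deep</a>',
    "https://example.org/docs/guide/deep/": "<p>leaf page</p>",
}

if __name__ == "__main__":
    print(sorted(crawl("https://example.org/docs/", FAKE_SITE.get, max_depth=2)))
```

In the real application, the `extractor` passed to RecursiveUrlLoader (`Soup(x, "html.parser").text`) then turns each fetched page into plain text.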

2. Splitting documents

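Long texts must be split into smaller chunks. The task looks simple but is fairly subtle, because semantically related pieces of text should stay together. Langchain's text splitters handle this well. A splitter first breaks the text into small, semantically meaningful units (often sentences). It then merges these units into larger chunks until they reach a target size, measured by a length function. When a chunk reaches that size, it is emitted as a separate piece, and the next chunk starts with some overlap with the previous one. Here I used RecursiveCharacterTextSplitter to split the scraped documents into manageable chunks.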
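Below is a simplified, standard-library sketch of recursive character splitting with overlap. It only approximates RecursiveCharacterTextSplitter. It tries the separators `"\n\n"`, `"\n"`, `" "` and finally single characters, which is the splitter's default order. It packs the resulting pieces into chunks of at most `chunk_size` characters. It also carries trailing pieces totalling up to `chunk_overlap` characters into the next chunk. The small `chunk_size` in the example keeps the output readable. The application itself uses `chunk_size=1000` and `chunk_overlap=200`.

```python
from __future__ import annotations

SEPARATORS = ["\n\n", "\n", " ", ""]


def _split_pieces(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Break text into pieces of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text]
    sep, rest = separators[0], separators[1:]
    if sep == "":  # last resort: hard cut by characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    parts = text.split(sep)
    # re-attach the separator so that "".join(pieces) restores the original text
    parts = [p + sep for p in parts[:-1]] + [parts[-1]]
    pieces: list[str] = []
    for part in parts:
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            pieces.extend(_split_pieces(part, chunk_size, rest))
    return pieces


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Merge pieces into chunks no longer than chunk_size, overlapping by up to chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for piece in _split_pieces(text, chunk_size, SEPARATORS):
        if current and length + len(piece) > chunk_size:
            chunks.append("".join(current).strip())
            # drop leading pieces until what remains fits the overlap budget and the new piece
            while current and (length > chunk_overlap or length + len(piece) > chunk_size):
                length -= len(current.pop(0))
        current.append(piece)
        length += len(piece)
    if current:
        chunks.append("".join(current).strip())
    return [c for c in chunks if c]


if __name__ == "__main__":
    doc = ("Open5GS is a C-language implementation of 5G Core and EPC.\n\n"
           "It implements the core network of NR/LTE.")
    for chunk in split_text(doc, chunk_size=40, chunk_overlap=10):
        print(repr(chunk))
```

A larger overlap keeps more context across chunk boundaries. The trade-off is more duplicated text in the index and in the prompt.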

3. Creating vector embeddings

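Once the data has been collected and split, the next step is to convert the split text into vector embeddings. Text embeddings are essential to LLM applications. Working with raw text alone is technically possible, but storing and retrieving it that way is very inefficient. Converting the text into vector form makes this efficient. Dedicated machine learning models exist specifically for creating embeddings from text. Here I used the open HuggingFace embedding model all-MiniLM-L6-v2 to generate the vector embeddings. The text is converted into high-dimensional numeric vectors that capture semantic and contextual nuances. Once embedded, the data can be grouped, sorted, searched, and so on. For example, we can compute the distance between two sentences to measure how closely they are related. Importantly, these operations go beyond traditional keyword-based database searches, because they capture how close sentences are in meaning.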
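The "distance between two sentences" is commonly measured with cosine similarity: the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python on small, made-up 3-dimensional vectors. Real all-MiniLM-L6-v2 embeddings have 384 dimensions and come from the model, not from hand-written numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction, 0.0 unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real sentence embeddings.
open5gs_core = [0.9, 0.1, 0.0]   # "Open5GS implements the 5G core network"
epc_question = [0.8, 0.3, 0.1]   # "What is the EPC in Open5GS?"
docker_intro = [0.0, 0.2, 0.95]  # "Docker runs applications in containers"

if __name__ == "__main__":
    print(f"{cosine_similarity(open5gs_core, epc_question):.3f}")  # related topics: high score
    print(f"{cosine_similarity(open5gs_core, docker_intro):.3f}")  # unrelated topics: low score
```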

4. Storing vector embeddings in Chroma

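The generated vector embeddings are stored in the Chroma vector database. Chroma (often called ChromaDB) is an open-source embedding database that makes it easy to build LLM applications by storing and retrieving embeddings and their metadata, together with documents and queries. Chroma handles these embeddings efficiently, which enables fast retrieval and comparison of text data. Traditional databases handle exact-match queries well but struggle with the nuances of human language. Vector databases have changed how semantic search is done: instead of matching exact words or phrases, vector databases (such as Postgres with pgvector) work with meaning. In this system, the database lets the chatbot match user queries to the most relevant parts of the scraped content and answer quickly and accurately.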
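To show what a vector store keeps and what persisting an index means, here is a hypothetical in-memory `TinyVectorStore`. It holds documents, their embeddings, and metadata, and it can save itself to a JSON file and load itself back. It is only an illustration. Chroma's storage format and API are different: the application uses `Chroma.from_documents(...)` plus `persist()` to build the index and `Chroma(persist_directory=...)` to reload it.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


class TinyVectorStore:
    """Hypothetical minimal store: parallel lists of texts, embeddings and metadata."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.embeddings: list[list[float]] = []
        self.metadatas: list[dict] = []

    def add(self, texts, embeddings, metadatas=None) -> None:
        metadatas = metadatas if metadatas is not None else [{} for _ in texts]
        if not (len(texts) == len(embeddings) == len(metadatas)):
            raise ValueError("texts, embeddings and metadatas must have the same length")
        self.texts.extend(texts)
        self.embeddings.extend([float(x) for x in e] for e in embeddings)
        self.metadatas.extend(metadatas)

    def persist(self, path) -> None:
        """Write the whole store to a JSON file."""
        data = {"texts": self.texts, "embeddings": self.embeddings, "metadatas": self.metadatas}
        Path(path).write_text(json.dumps(data), encoding="utf-8")

    @classmethod
    def load(cls, path) -> "TinyVectorStore":
        """Rebuild a store from a file written by persist()."""
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        store = cls()
        store.add(data["texts"], data["embeddings"], data["metadatas"])
        return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_path = os.path.join(tmp, "index.json")
        store = TinyVectorStore()
        store.add(["Open5GS is a C-language implementation of 5G Core"],
                  [[0.1, 0.7, 0.2]],
                  [{"source": "https://open5gs.org/open5gs/docs/"}])
        store.persist(index_path)
        print(TinyVectorStore.load(index_path).texts)
```

Keeping the metadata next to each vector (here a `source` URL) makes it possible to report where a retrieved chunk came from. The application asks for this with `return_source_documents=True`.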

5. User asks a question

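The system provides an API through which users submit questions. In this use case, users can ask any question about the content of the Open5GS documentation. The API is the main interface for interacting with the chatbot. Besides the question, it requires a user_id parameter that identifies each user session. The user_id is there for demonstration purposes. In a real deployment, it could be managed through an authorization header in the HTTP request (e.g., a JWT Bearer token). The API is simple to use: users submit a query and receive a response.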

6. Creating a vector embedding of the question

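When a user submits a question through the API, the system converts the question into a vector embedding. ConversationalRetrievalChain generates this embedding automatically. The embedding makes it possible to search the vector database semantically for documents related to the question.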

7. Semantic search in the vector database

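Once the question's embedding has been created, the system runs a semantic search over the vector database to find the content most relevant to the user's query. By comparing the question's embedding with the embeddings of the stored data, the system can pinpoint information that is similar or related to the query. Here I used ConversationalRetrievalChain, which performs this semantic search automatically for the input query. The results of the search then serve as the context for the LLM.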
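Conceptually, the retriever embeds the question with the same model used for the documents and returns the `k` closest chunks. The sketch below reproduces that flow with a deliberately crude, hypothetical `toy_embed` function in place of all-MiniLM-L6-v2, so it runs without downloading a model. `toy_embed` counts words from a tiny fixed vocabulary. Unlike a real embedding model, it cannot match paraphrases that share no words with the question. Because the vectors are normalized to unit length, their dot product equals their cosine similarity.

```python
from __future__ import annotations

import heapq
import math
import re

# Hypothetical toy vocabulary; a real model maps text to 384 learned dimensions.
VOCAB = ["open5gs", "5g", "core", "epc", "lte", "docker", "container", "install"]


def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in the text and scale the vector to unit length."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks with the highest cosine similarity to the question."""
    q = toy_embed(question)  # the question uses the same embedding function as the chunks
    scored = ((sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


chunks = [
    "Open5GS is a C-language implementation of 5G Core and EPC",
    "EPC is the core network of LTE",
    "Run the container with docker run",
]

if __name__ == "__main__":
    for score, chunk in top_k("what is open5gs", chunks):
        print(f"{score:.2f}  {chunk}")
```

A real vector database avoids scoring every chunk on each query by using an approximate nearest-neighbour index. The brute-force loop above is fine for a handful of chunks.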

8. Generating the prompt

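Next, ConversationalRetrievalChain builds a custom prompt from the user's question and the semantic search results (the context). A prompt is the set of instructions or inputs given to a language model to guide its response. It helps the model understand the context and produce relevant, coherent output, such as answering a question, completing a sentence, or carrying on a conversation.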
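The prompt that the chain sends to the LLM appears in the application log later in this article ("Use the following pieces of context to answer the question at the end. ..."). The sketch below builds a prompt of that shape with plain string formatting. The instruction text is copied from that log. The exact whitespace LangChain uses (blank lines between sections, separators between context chunks) is an assumption here.

```python
from __future__ import annotations

QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Join the retrieved chunks into one context block and fill in the template."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    # Only the template is parsed by format(); braces inside the context are inserted verbatim.
    return QA_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    print(build_prompt("what is open5gs",
                       ["Open5GS is a C-language implementation of 5G Core and EPC"]))
```

The instruction to say "I don't know" is meant to discourage the model from inventing answers that the retrieved context does not support.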

9. Sending the prompt to the LLM

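Once the prompt is generated, it is sent to the LLM (Llama2 7B in our case) through Langchain's Ollama integration (Langchain officially supports Ollama in langchain_community.llms). The LLM then answers the question based on the provided context. ConversationalRetrievalChain handles sending the query to the LLM. Behind the scenes, the Ollama wrapper submits the prompt to Ollama's REST API (http://localhost:11434 in this setup), not to the OpenAI API.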
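To see what such a call looks like without LangChain, the sketch below sends a prompt to the same `/api/generate` endpoint used in the curl examples later in this article. It sets `"stream": false`, so the whole answer arrives as one JSON object whose `response` field holds the text. The function names are hypothetical. The default base URL assumes Ollama is listening on `localhost:11434` as configured here. `generate` needs a running Ollama server; `build_generate_request` only builds the request, so it can be inspected offline.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama2",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request for the given model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama2", base_url: str = OLLAMA_URL) -> str:
    """Send the prompt to Ollama and return the generated text."""
    req = build_generate_request(prompt, model, base_url)
    # generous timeout: a 7B model on a CPU can take a while to answer
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With Ollama running and llama2 pulled, `generate("what is docker?")` should return the answer as a single string.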

10. The LLM generates the answer

Using the capabilities of Meta's Llama 2, the LLM processes the question against the provided context, generates a response, and sends it back.

11. Saving the query and response in the MongoDB chat history

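Langchain provides several components for managing conversation memory. In this chatbot, MongoDB manages the conversation memory. At this stage, both the user's question and the chatbot's response are recorded in MongoDB as part of the chat history. Every user's chat history is therefore persisted, and previous interactions can be retrieved. The data is stored per user session. As mentioned earlier, the API uses the user_id parameter to tell sessions apart. This history shapes future interactions: when the same user asks a follow-up question, the chat history is forwarded to the LLM together with the new semantic search results (context). This lets the chatbot keep track of context across the conversation and give more precise, relevant answers.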
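In the listing in the implementation section, saving history is still a TODO: chat() passes an empty chat_history. The sketch below shows one way the per-user bookkeeping could work. A plain in-memory dictionary stands in for MongoDB. Turns are stored per user_id, and the history is returned as a list of (question, answer) tuples, a format ConversationalRetrievalChain accepts for `chat_history`. `InMemoryChatHistory` and `max_turns` are hypothetical names. For persistent storage, LangChain's integrations include a `MongoDBChatMessageHistory` class, which is not shown here.

```python
from __future__ import annotations

from collections import defaultdict


class InMemoryChatHistory:
    """Hypothetical stand-in for a MongoDB-backed store: keeps (question, answer) turns per user."""

    def __init__(self, max_turns: int = 10) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns
        self._turns: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_turn(self, user_id: str, question: str, answer: str) -> None:
        self._turns[user_id].append((question, answer))

    def get(self, user_id: str) -> list[tuple[str, str]]:
        """Most recent turns for this user, oldest first; empty list for an unknown user."""
        return self._turns.get(user_id, [])[-self.max_turns:]


if __name__ == "__main__":
    history = InMemoryChatHistory(max_turns=2)
    history.add_turn("kakka", "what is open5gs", "Open5GS is a C-language implementation of 5G Core and EPC.")
    # Inside chat(), the stored turns would be passed to the chain, for example:
    # conversation({"question": question, "chat_history": history.get(user_id)})
    print(history.get("kakka"))
```

Capping the number of turns keeps the prompt from growing without bound. This matters because an oversized prompt can exceed the model's context window.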

12. Sending the response to the user

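Finally, the answer from the LLM is returned to the user through the HTTP API. By sending the same user_id in later requests, the user can keep asking new questions. The system looks up the user's chat history and includes it, together with the new semantic search results, in what it sends to the LLM. The result is a context-aware conversation that improves with each interaction.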

Implementation

The complete implementation of this chatbot is described below.

1. Configuration

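The config.py file defines the configuration used by the chatbot. All values are read from environment variables, following the principles of the twelve-factor app.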

import os
# whether to (re)build the vector index on startup (set INIT_INDEX=true)
INIT_INDEX = os.getenv('INIT_INDEX', 'false').lower() == 'true'
# directory where the chroma vector index is persisted
INDEX_PERSIST_DIRECTORY = os.getenv('INDEX_PERSIST_DIRECTORY', "./data/chromadb")
# target url to scrape
TARGET_URL = os.getenv('TARGET_URL', "https://open5gs.org/open5gs/docs/")
# http api port (environment variables are strings, so convert to int)
HTTP_PORT = int(os.getenv('HTTP_PORT', 7654))
# mongodb config: host, port, username, password
MONGO_HOST = os.getenv('MONGO_HOST', 'localhost')
MONGO_PORT = int(os.getenv('MONGO_PORT', 27017))
MONGO_USER = os.getenv('MONGO_USER', 'testuser')
MONGO_PASS = os.getenv('MONGO_PASS', 'testpass')

2. HTTP API

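The HTTP API is implemented in api.py. It exposes an HTTP POST endpoint, api/question, which accepts a JSON object containing the question and the user ID. The user_id is there for demonstration purposes. In a real application, it could be managed through an authorization header in the HTTP request (e.g., a JWT Bearer token). When a question arrives, the endpoint forwards it to the chat function in the ChatBot model.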

from flask import Flask
from flask import jsonify
from flask import request
from flask_cors import CORS
import logging
import sys
from model import init_index
from model import init_conversation
from model import chat
from config import *
app = Flask(__name__)
CORS(app)
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
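@app.route('/api/question', methods=['POST'])
def post_question():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    body = request.get_json(silent=True)
    if not isinstance(body, dict) or not body.get('question') or not body.get('user_id'):
        return jsonify({'error': 'json body with `question` and `user_id` is required'}), 400
    question = body['question']
    user_id = body['user_id']
    logging.info("post question `%s` for user `%s`", question, user_id)
    resp = chat(question, user_id)
    data = {'answer': resp}
    return jsonify(data), 200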
if __name__ == '__main__':
    init_index()
    init_conversation()
    app.run(host='0.0.0.0', port=HTTP_PORT, debug=True)

3. Model

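The model implementation follows. The init_index function scrapes data from the given web URL and creates the vector store. The INIT_INDEX environment variable decides whether the index is created. The init_conversation function initializes ConversationalRetrievalChain with Ollama's Llama2 LLM, which is reachable through Ollama's REST API at <host>:11434 (Ollama exposes a REST API for interacting with LLMs). In this listing, the chat function still passes an empty chat_history to the chain; saving the history is marked as a TODO.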

from langchain_community.llms import Ollama
from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceEmbeddings
from bs4 import BeautifulSoup as Soup
from langchain.utils.html import (PREFIXES_TO_IGNORE_REGEX,
                                  SUFFIXES_TO_IGNORE_REGEX)
from config import *
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# conversation chain, created by init_conversation()
conversation = None

def init_index():
    if not INIT_INDEX:
        logging.info("continue without initializing index")
        return
    # scrape data from web
    documents = RecursiveUrlLoader(
        TARGET_URL,
        max_depth=4,
        extractor=lambda x: Soup(x, "html.parser").text,
        prevent_outside=True,
        use_async=True,
        timeout=600,
        check_response_status=True,
        # drop trailing / to avoid duplicate pages.
        link_regex=(
            f"href=[\"']{PREFIXES_TO_IGNORE_REGEX}((?:{SUFFIXES_TO_IGNORE_REGEX}.)*?)"
            r"(?:[\#'\"]|\/[\#'\"])"
        ),
    ).load()
    logging.info("index creating with `%d` documents", len(documents))
    # split text
    # chunk_size and chunk_overlap affect the prompt size
    # exceeding the prompt size causes the error `prompt size exceeds the context window size and cannot be processed`
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    documents = text_splitter.split_documents(documents)
    # create embeddings with huggingface embedding model `all-MiniLM-L6-v2`
    # then persist the vector index on vector db
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectordb = Chroma.from_documents(
        documents=documents,
        embedding=embeddings,
        persist_directory=INDEX_PERSIST_DIRECTORY
    )
    vectordb.persist()

def init_conversation():
    global conversation
    # load index
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectordb = Chroma(persist_directory=INDEX_PERSIST_DIRECTORY,embedding_function=embeddings)
    # llama2 llm which runs with ollama
    # ollama exposes an api for the llm at `localhost:11434`
    llm = Ollama(
        model="llama2",
        base_url="http://localhost:11434",
        verbose=True,
    )
    # create conversation
    conversation = ConversationalRetrievalChain.from_llm(
        llm,
        retriever=vectordb.as_retriever(),
        return_source_documents=True,
        verbose=True,
    )

def chat(question, user_id):
    global conversation
    chat_history = []
    response = conversation({"question": question, "chat_history": chat_history})
    answer = response['answer']
    logging.info("got response from llm - %s", answer)
    # TODO save history
    return answer

Running the application

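The main steps for running the ChatBot application and interacting with it are listed below. Questions are submitted through the HTTP API, which returns the corresponding answers.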

1. Installing dependencies

This application uses a number of Python packages, which must be installed with the pip package manager before running it. The requirements.txt file lists all the required packages.

huggingface-hub
sentence-transformers
Flask==2.0.1
Werkzeug==2.2.2
flask-cors
langchain==0.0.352
chromadb==0.3.29
tiktoken
unstructured
unstructured[local-pdf]
unstructured[local-inference]

I used a Python virtual environment for these dependencies. The packages can be installed with the pip install -r requirements.txt command.

# create virtual environment in `ollama` source directory
❯❯ cd ollama 
❯❯ python -m venv .venv
# enable virtual environment
❯❯ source .venv/bin/activate
# install dependencies
❯❯ pip install -r requirements.txt

2. Running Ollama Llama2

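Ollama offers several deployment options. It can run as a standalone binary on macOS, Linux, or Windows, or inside a Docker container, so users can set up and interact with LLMs on their preferred platform. Ollama supports both command-line and REST API interaction, which makes it easy to integrate into different workflows and applications. Running the Llama2 model is a practical example of how Ollama hosts and manages LLMs. Below is how I deployed Ollama with Docker and ran the Llama2 model on it.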

# run ollama with docker
# use a directory called `data` in the current working directory as the docker volume;
# all ollama data (e.g. downloaded llm images) will be stored in that data directory
❯❯ docker run -d -v $(pwd)/data:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# connect to ollama container
❯❯ docker exec -it ollama bash
# run llama2 llm
# this will download the llm image and run it
# if llm image already exists it will start the llm image
root@150bc5106246:/# ollama run llama2
pulling manifest
pulling 8934d96d3f08... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   91 B
pulling 42ba7f8a01dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> 
# exit from llm console
>>> /bye
root@c96f4fc1be6f:/#
# list downloaded models
root@150bc5106246:/# ollama list
NAME          ID           SIZE   MODIFIED
llama2:latest 78e26419b446 3.8 GB 10 hours ago
# reconnect to the llm console
root@c96f4fc1be6f:/# ollama run llama2
>>>
# ask question via llm console
root@c96f4fc1be6f:/# ollama run llama2
>>> what is docker
Docker is an open-source platform that enables you to create, deploy, and run applications in containers. Containers are lightweight and portable, allowing you to move your application between different
environments without worrying about compatibility issues. Docker provides a consistent and reliable way to deploy applications, making it easier to manage and scale your infrastructure.


---


# ollama exposes a REST API (`api/generate`) for the llm on port `11434`
# we can ask questions via the REST API (e.g. using `curl`)
# ask a question and get the answer as a stream
# `"stream": true` streams the llm output (e.g. token by token)
❯❯ curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "what is docker?",
  "stream": true
}'
{"model":"llama2","created_at":"2024-03-17T10:41:53.358162047Z","response":"\n","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:53.494021698Z","response":"D","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:53.630381369Z","response":"ocker","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:53.766590368Z","response":" is","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:53.902649027Z","response":" an","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.039338585Z","response":" open","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.175494123Z","response":"-","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.311130558Z","response":"source","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.447809241Z","response":" platform","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.585971524Z","response":" that","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.723769251Z","response":" enables","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.862244297Z","response":" you","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:54.999796889Z","response":" to","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:55.136406278Z","response":" create","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:55.273430683Z","response":",","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:55.411326998Z","response":" deploy","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:55.54792922Z","response":",","done":false}
{"model":"llama2","created_at":"2024-03-17T10:41:55.68550623Z","response":" and","done":false}
# ask a question and get the answer without streaming
# this waits until the full response is available from the llm
# (this example uses the `phi` model, which must be pulled first, e.g. `ollama pull phi`)
❯❯ curl http://localhost:11434/api/generate -d '{
  "model": "phi",
  "prompt": "Why is docker?",
  "stream": false
}'
{"model":"phi","created_at":"2024-03-16T23:42:34.140800795Z","response":" Docker is a containerization platform that allows you to package your application code, dependencies, and runtime environment into a single executable container. This makes it easy to run your applications on any machine with Docker installed, as long as the container has the necessary dependencies and configuration settings. Containers are also more isolated from each other than traditional installations of applications, which can help improve performance and security. Additionally, Docker provides tools for automating common tasks such as building and deploying containers, making it a popular choice among developers and IT professionals.\n","done":true,"context":[11964,25,317,8537,1022,257,11040,2836,290,281,11666,4430,8796,13,383,8796,3607,7613,7429,284,262,2836,338,2683,13,198,12982,25,4162,318,36253,30,198,48902,25,25716,318,257,9290,1634,3859,326,3578,345,284,5301,534,3586,2438,11,20086,11,290,19124,2858,656,257,2060,28883,9290,13,770,1838,340,2562,284,1057,534,5479,319,597,4572,351,25716,6589,11,355,890,355,262,9290,468,262,3306,20086,290,8398,6460,13,2345,50221,389,635,517,11557,422,1123,584,621,4569,26162,286,5479,11,543,460,1037,2987,2854,290,2324,13,12032,11,25716,3769,4899,329,3557,803,2219,8861,884,355,2615,290,29682,16472,11,1642,340,257,2968,3572,1871,6505,290,7283,11153,13,198],"total_duration":6343449574,"load_duration":21148773,"prompt_eval_duration":65335000,"eval_count":107,"eval_duration":6256397000}%
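With `"stream": true`, the response is a sequence of JSON objects, one per line. Each object carries a fragment of the answer in `response` and a `done` flag. The sketch below shows how a Python client could reassemble the full answer from such a stream, using the first few lines of the streaming output shown above as sample input. The function name `join_stream` is hypothetical.

```python
import json


def join_stream(lines):
    """Concatenate the `response` fragments of an Ollama NDJSON stream, stopping at done=true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank keep-alive lines
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


sample = [
    '{"model":"llama2","created_at":"2024-03-17T10:41:53.358162047Z","response":"\\n","done":false}',
    '{"model":"llama2","created_at":"2024-03-17T10:41:53.494021698Z","response":"D","done":false}',
    '{"model":"llama2","created_at":"2024-03-17T10:41:53.630381369Z","response":"ocker","done":false}',
    '{"model":"llama2","created_at":"2024-03-17T10:41:53.766590368Z","response":" is","done":false}',
]

if __name__ == "__main__":
    print(repr(join_stream(sample)))  # prints '\nDocker is'
```

`json.loads` also accepts bytes, so the lines of a live HTTP response (for example, iterating over the object returned by `urllib.request.urlopen`) can be passed to `join_stream` directly.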

3. Running the RAG application

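The RAG application is started through api.py, as shown below. Some configuration must be set through environment variables first. Once api.py is running, it serves the HTTP API so that users can post questions.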

# enable the virtual environment in the `ollama` source directory
❯❯ cd ollama
❯❯ source .venv/bin/activate
# set the env variable INIT_INDEX, which determines whether the index needs to be created
❯❯ export INIT_INDEX=true
# run the application
❯❯ python api.py
2024-03-16 20:54:05,715 - INFO - index creating with `18` documents
2024-03-16 20:54:06,682 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2024-03-16 20:54:09,036 - INFO - Use pytorch device_name: mps
2024-03-16 20:54:09,373 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2024-03-16 20:54:09,467 - INFO - loaded in 361 embeddings
2024-03-16 20:54:09,468 - INFO - loaded in 1 collections
2024-03-16 20:54:09,469 - INFO - collection with name langchain already exists, returning existing collection
2024-03-16 20:54:11,177 - INFO - Persisting DB to disk, putting it in the save folder: ./data/chromadb
2024-03-16 20:54:11,197 - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2024-03-16 20:54:12,418 - INFO - Use pytorch device_name: mps
2024-03-16 20:54:12,476 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2024-03-16 20:54:12,490 - INFO - loaded in 722 embeddings
2024-03-16 20:54:12,490 - INFO - loaded in 1 collections
2024-03-16 20:54:12,491 - INFO - collection with name langchain already exists, returning existing collection
 * Serving Flask app 'api' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
2024-03-16 20:54:12,496 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:7654
 * Running on http://192.168.0.110:7654

4. Posting questions

With the RAG application running, I can submit questions about the Open5GS documentation through the HTTP API.

# post question
❯❯ curl -i -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
  "question": "what is open5gs",
  "user_id": "kakka"
}
'

# ConversationalRetrievalChain generates the following prompt from the question and the semantic search results, and sends it to the llm
> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Open5GS     Sukchan Lee  acetcom@gmail.com        GitHub  open5gs      Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)
Open5GS     Sukchan Lee  acetcom@gmail.com        GitHub  open5gs      Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)
Question: what is open5gs
Helpful Answer:
> Finished chain.
> Finished chain.
2024-03-16 20:55:05,843 - INFO - got response from llm - Based on the provided context, Open5GS appears to be an open-source implementation of the 5G Core and EPC networks in C language. It is being developed by Sukchan Lee and can be found on GitHub.

# response
{
  "answer": "Based on the provided context, Open5GS appears to be an open-source implementation of the 5G Core and EPC networks in C language. It is being developed by Sukchan Lee and can be found on GitHub."
}


---


# post next question
❯❯ curl -i -XPOST "http://localhost:7654/api/question" \
--header "Content-Type: application/json" \
--data '
{
  "question": "what is EPC",
  "user_id": "kakka"
}
'

# ConversationalRetrievalChain generates the following prompt from the question and the semantic search results, and sends it to the llm
> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Open5GS     Sukchan Lee  acetcom@gmail.com        GitHub  open5gs      Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)
Open5GS     Sukchan Lee  acetcom@gmail.com        GitHub  open5gs      Open5GS is a C-language implementation of 5G Core and EPC, i.e. the core network of NR/LTE network (Release-17)
Question: what is EPC
Helpful Answer:
> Finished chain.
2024-03-16 20:56:24,053 - INFO - got response from llm - EPC stands for "Evolved Packet Core." It is a component of the 5G network that provides the core networking functions for the NR/LTE network.

# response
{
  "answer": "EPC stands for \"Evolved Packet Core.\" It is a component of the 5G network that provides the core networking functions for the NR/LTE network."
}

Source: https://medium.com/rahasak/build-rag-application-using-a-llm-running-on-local-computer-with-ollama-and-langchain-e6513853fda0
