An Innovative Visual AI Assistant: VoxelGPT Brings You Powerful Computer Vision Capabilities

Published by Alex on June 26, 2023

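Want to surface interesting insights from your image and video datasets without writing code? With VoxelGPT, now you can!

VoxelGPT combines the power of large language models (LLMs) with FiftyOne's flexible computer vision query language, making it easier than ever to semantically slice computer vision datasets and build better machine learning models using natural language.

This gives you unprecedented control over your data through natural language. You can filter image and video datasets, discover new insights, and become a better computer vision engineer, all without writing a single line of code!

Yes, you read that right: VoxelGPT translates your English queries into Python code that filters your dataset! Better still, VoxelGPT is fully open source and free to use.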

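What is VoxelGPT

VoxelGPT is an LLM application built with FiftyOne, GPT-3.5, and LangChain. It provides a chat-like interface that translates natural language queries into FiftyOne Python syntax and constructs the corresponding dataset views.

This matters because the FiftyOne query language, for all its flexibility, typically has a steep learning curve. With VoxelGPT, you can put the full FiftyOne feature set to work right away and semantically slice your data without any prior knowledge of the query language.

Beyond that, VoxelGPT can also answer questions about FiftyOne usage specifically and about machine learning more broadly.

You can use VoxelGPT through a simple Python API, or install it as a FiftyOne plugin and use it natively in the FiftyOne App!

VoxelGPT will not do your computer vision work for you, but it can make you significantly faster and more efficient. It is a pair programmer, a translator, and an educational tool rolled into one.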

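VoxelGPT capabilities

VoxelGPT can handle any of the following types of queries:

1. Dataset queries

2. FiftyOne documentation queries

3. General computer vision queries

When you ask VoxelGPT a question, it interprets your intent and determines which type of query you are asking. If VoxelGPT is unsure, it asks you to clarify.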
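The routing step described above can be pictured with a toy sketch. VoxelGPT itself relies on an LLM to work out intent; the keyword lists, the `classify_query` name, and the tie-breaking rule below are all hypothetical and only illustrate the idea of picking one of the three query types or asking for clarification.

```python
# Toy intent router: a keyword-based stand-in for the LLM-based
# classification described above. All names and keyword lists here are
# hypothetical and are not part of VoxelGPT.

KEYWORDS = {
    "dataset": ("show", "display", "samples", "images", "retrieve"),
    "docs": ("fiftyone", "docs", "stage", "export", "load"),
    "general": ("difference", "precision", "recall", "detect", "reduce"),
}


def classify_query(query: str) -> str:
    """Return 'dataset', 'docs', 'general', or 'clarify' when unsure."""
    words = query.lower().replace("?", " ").split()
    scores = {
        kind: sum(word in keywords for word in words)
        for kind, keywords in KEYWORDS.items()
    }
    best = max(scores.values())
    winners = [kind for kind, score in scores.items() if score == best]
    # No signal at all, or a tie between types: ask the user to clarify.
    if best == 0 or len(winners) > 1:
        return "clarify"
    return winners[0]
```

A real router would need far more than keyword counts, but the shape is the same: score each query type, and fall back to a clarification request when no type clearly wins.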

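Dataset queries

How does it work? VoxelGPT parses your query, translates it into Python code in the FiftyOne query language, and displays the resulting view. It knows how to handle datasets whose samples are images or videos and, based on the set of examples it was developed with, fully supports one- or two-stage queries.

While parsing your query, VoxelGPT does the following:

1. Identifies field and class names: VoxelGPT selects the appropriate field and class names based on the natural language query and information about the specific dataset. It uses named entity recognition to identify field and class names, and for fields with fewer than 1,000 classes it uses semantic matching to identify class names.

2. Infers relevant computations: VoxelGPT determines whether the natural language query relates to a Brain run or an Evaluation run and, if so, automatically selects the relevant run.

3. Prints helpful messages: If VoxelGPT determines that a computation needs to be run on the dataset, or that the query does not contain all the information needed to convert it into ViewStages, it tells you so in a message.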
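As a rough illustration of step 1, the sketch below maps a word from a query to the closest class name in a field. It uses fuzzy string matching from Python's standard `difflib` module as a simple stand-in for the semantic matching mentioned above, which it does not attempt to reproduce. The function name, the cutoff value, and the class list are assumptions made for illustration.

```python
import difflib


def match_class_name(term, class_names, cutoff=0.6):
    """Return the class name closest to `term`, or None if nothing is close.

    Fuzzy string matching is used here as a stand-in for semantic matching:
    it catches near-spellings such as "airplanes", but not synonyms.
    """
    # Compare case-insensitively, but return the original spelling.
    lowered = {name.lower(): name for name in class_names}
    matches = difflib.get_close_matches(
        term.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Hypothetical class list for a "predictions" field.
classes = ["airplane", "dog", "person", "traffic light"]
```

With this list, `match_class_name("airplanes", classes)` resolves to the `airplane` class, while a term with no close spelling returns `None`, which is the point at which a tool could ask the user to rephrase.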

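Here are some examples of dataset queries you can ask VoxelGPT:

1. Retrieve 10 random samples

2. Show me the most unique images with a false positive prediction

3. Only show me images with at least two people detected with high confidence

4. Show me the 25 images most similar to the first image with a dog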

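FiftyOne documentation queries

VoxelGPT is not only a pair programming tool, it is also an educational tool. The model has access to the entire FiftyOne documentation, including tutorials, user guides, and the API reference, and uses that information to answer your questions.

Here are some examples of documentation queries you can ask VoxelGPT:

1. How do I load a dataset from the FiftyOne Zoo?

2. Docs: what does the match() stage do?

3. Can I export my dataset in COCO format?

By switching freely between dataset queries and documentation queries, you can use VoxelGPT to better understand how the FiftyOne query language works.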

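General computer vision queries

VoxelGPT can also answer general questions about computer vision, machine learning, and data science. It can help you understand basic concepts and work through data quality issues.

Here are some examples of computer vision queries you can ask VoxelGPT:

1. What is the difference between precision and recall?

2. How do I detect faces in an image?

3. What are some ways to reduce redundancy in a dataset?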
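To make the first question concrete, here is a small worked example of the standard definitions: precision is TP / (TP + FP) and recall is TP / (TP + FN). The counts are made up, and returning 0.0 when a denominator is zero is just one common convention.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw prediction counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against division by zero when there are no positive
    # predictions or no positive ground-truth instances.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Example: 8 correct detections, 2 spurious ones, 8 missed objects.
precision, recall = precision_recall(8, 2, 8)
```

Here precision is 8 / 10 and recall is 8 / 16, so this hypothetical model is right most of the time when it fires but misses half of the objects.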

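What VoxelGPT cannot do

VoxelGPT is powerful, but we deliberately limited its scope in order to provide a focused user experience and to leave room for high-value upgrades in the future.

VoxelGPT currently cannot:

1. Hold general conversations: VoxelGPT is not a general-purpose chatbot. If your query is deemed out of scope, VoxelGPT will ask you for a new query.

2. Run computations: Some computations, such as generating a vector similarity index, can be time-consuming and costly. We have not (yet) given VoxelGPT the ability to run these computations on your behalf, but it will recognize when they may need to be run and tell you.

3. Make permanent changes: Likewise, VoxelGPT currently cannot permanently alter your data. For example, it cannot delete samples from the underlying dataset, clone the dataset, or change the location of any media files.

4. Delegate tasks: For now, VoxelGPT has no HuggingGPT-style dispatch capability. It cannot delegate computer vision tasks to other machine learning models.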

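Generality and feedback

The current implementation of VoxelGPT is based on a limited set of examples, so it may not generalize to all data. The more specific your query, the better the results. If you have a more complex task, try breaking it into several natural language queries and combining VoxelGPT's results.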
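One way to picture this advice is to feed sub-queries in turn, passing the result of one step as the input of the next. The sketch below is deliberately library-free. `run_steps` and its `ask` parameter are hypothetical: `ask` is assumed to take a query string and a collection and to return a narrowed collection. Whether a given VoxelGPT entry point accepts a previous result in this way is an assumption to verify before relying on it.

```python
def run_steps(ask, collection, queries):
    """Apply natural-language queries one after another.

    `ask` is any callable `(query, collection) -> collection`. Each step
    narrows the result of the previous one. If a step returns None (for
    example, because the query was answered with text rather than a
    view), the previous collection is kept.
    """
    current = collection
    for query in queries:
        result = ask(query, current)
        if result is not None:
            current = result
    return current


# Stand-in for a real assistant, so the sketch runs without any services:
# it understands exactly two toy queries over a list of label strings.
def fake_ask(query, labels):
    if query == "only dogs":
        return [label for label in labels if label == "dog"]
    if query == "first 2":
        return labels[:2]
    return None


labels = ["dog", "cat", "dog", "dog", "person"]
result = run_steps(fake_ask, labels, ["only dogs", "what is recall?", "first 2"])
```

Keeping each query small and specific also makes it easier to see which step produced an unexpected result.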

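Getting started

Online demo

If you want to see how VoxelGPT turns natural language into computer vision insights, you can try it natively in the FiftyOne App on a few example datasets.

Installing VoxelGPT locally

You can also install VoxelGPT locally by following the instructions below, which lets you interact with it programmatically in Python and work with your own datasets.

Installing FiftyOne

If you haven't already, install FiftyOne: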

pip install fiftyone

Providing an OpenAI API key

Next, provide an OpenAI API key:

export OPENAI_API_KEY=XXXXXXXX
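If you launch things from a Python script, it can help to fail early when the variable is missing rather than at the first query. A minimal sketch using only the standard library; the `require_openai_key` helper is hypothetical and is not part of VoxelGPT or FiftyOne.

```python
import os


def require_openai_key(environ=None):
    """Return the OpenAI API key, or raise if it is missing or blank."""
    # Accept an explicit mapping so the check is easy to test;
    # fall back to the real process environment otherwise.
    environ = os.environ if environ is None else environ
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting VoxelGPT."
        )
    return key
```

Calling `require_openai_key()` at the top of a script turns a confusing downstream failure into an immediate, readable error.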

If you don't have an OpenAI key, you will need to create one.

In terms of OpenAI API calls, a single query typically costs only $0.01.

App-only use

If you only want to use VoxelGPT in the FiftyOne App, you can install it as a plugin by running the following commands:

fiftyone plugins download https://github.com/voxel51/voxelgpt
fiftyone plugins requirements @voxel51/voxelgpt --install

Python use

Alternatively, if you want to interact with VoxelGPT programmatically in Python, or you want to contribute to the project, you can clone the voxelgpt repository:

git clone https://github.com/voxel51/voxelgpt
cd voxelgpt

and install the requirements:

pip install -r requirements.txt

To make the plugin available in the FiftyOne App, you can symlink it into your FiftyOne plugins directory:

ln -s "$(pwd)" "$(fiftyone config plugins_dir)/voxelgpt"
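Where `ln -s` is unavailable, or where this step is scripted, the same link can be created from Python. A minimal sketch, assuming you already know the plugins directory (for example, from the output of `fiftyone config plugins_dir` used in the command above). The `link_plugin` helper is hypothetical, and on Windows creating symlinks may require extra privileges.

```python
import os
from pathlib import Path


def link_plugin(repo_dir, plugins_dir, name="voxelgpt"):
    """Symlink `repo_dir` into `plugins_dir` under `name` and return the link."""
    repo = Path(repo_dir).resolve()
    plugins = Path(plugins_dir)
    plugins.mkdir(parents=True, exist_ok=True)
    link = plugins / name
    if link.is_symlink() or link.exists():
        # Leave an existing entry alone rather than overwrite it.
        return link
    os.symlink(repo, link, target_is_directory=True)
    return link
```

The helper leaves an existing entry untouched, so running it twice is harmless.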

Using VoxelGPT in the App

Once VoxelGPT is installed, you can use it natively in the FiftyOne App with any dataset:

import fiftyone as fo
import fiftyone.zoo as foz

# Load the quickstart dataset.
# -- If you want to load another dataset from the Zoo,
# -- ask VoxelGPT: "what datasets are in the FiftyOne Dataset Zoo?"

dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)

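In the FiftyOne App, simply:

1. Click the OpenAI icon above the grid

2. Press the + icon next to the Samples tab and select VoxelGPT

Pro tip: use the now keyword to carry your previous prompts over as context for your next query!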

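Using VoxelGPT in Python

If you chose the Python/developer installation, you can also interact with VoxelGPT programmatically from Python.

Interactive mode

You can use VoxelGPT in interactive mode via ask_voxelgpt_interactive(). In interactive mode, you converse with VoxelGPT through input() prompts.

To get started, just provide a dataset and an App session: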

import fiftyone as fo
import fiftyone.zoo as foz

from voxelgpt import ask_voxelgpt_interactive

dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)

ask_voxelgpt_interactive(dataset, session=session)

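As usual, you can ask VoxelGPT any combination of dataset, documentation, and general computer vision queries. For example, a conversation might look like this: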

You:

what is the difference between precision and recall?

VoxelGPT:

Precision and recall are two important metrics used to evaluate the

performance of a machine learning model in classification tasks. Precision

measures the proportion of true positive predictions among all positive

predictions made by the model. Recall, on the other hand, measures the

proportion of true positive predictions among all actual positive instances

in the dataset.

In other words, precision is the ability of the model to correctly identify

positive instances, while recall is the ability of the model to identify

all positive instances. A high precision means that the model is making

very few false positive predictions, while a high recall means that the

model is able to identify most of the positive instances in the dataset.

It is important to balance precision and recall when evaluating a model's

performance. A model with high precision but low recall may be missing

important positive instances, while a model with high recall but low

precision may be making too many false positive predictions.

In the context of the FiftyOne library, precision and recall can be

computed using the `fiftyone.utils.metrics` module, which provides

functions for computing various classification metrics. For example, the

`fiftyone.utils.metrics.binary_classification` function can be used to

compute precision and recall for binary classification tasks.

You:

does FiftyOne integrate with Label Studio?

VoxelGPT:

Yes, FiftyOne integrates with Label Studio, which is an open-source data

labeling tool with a user-friendly UI. The integration allows you to easily

upload your data directly from FiftyOne to Label Studio for labeling. You

can get started with Label Studio through a simple pip install to get a

local server up and running. FiftyOne provides simple setup instructions

that you can use to specify the necessary account credentials and server

endpoint to use. FiftyOne also provides an API to create projects, upload

data, define label schemas, and download annotations using Label Studio,

all programmatically in Python.

You:

show me predicted airplanes

VoxelGPT:

Identified potential algorithms: evaluation

No evaluation runs found.

Identified potential fields: predictions

Identified potential label classes: {'predictions': ['airplane']}

Identified potential view stages: ['match', 'filter_labels', 'match_labels', 'exclude_labels', 'filter_field']

Okay, I'm going to load dataset.filter_labels("predictions",F("label") == "airplane")

You:

now only show me the first 10 samples

VoxelGPT:

Identified potential algorithms: evaluation

No evaluation runs found.

Identified potential fields: predictions

Identified potential label classes: {'predictions': ['airplane']}

Identified potential view stages: ['match', 'limit', 'limit_labels', 'skip', 'sort_by']

Okay, I'm going to load dataset.match(F("predictions.detections.label").contains("airplane")).limit(10)

You: exit

In interactive mode, VoxelGPT automatically loads any views it creates in the App, and you can access them through your session object:

print(session.view.count("predictions.detections"))

Single-query mode

If you just want to run a single query, you can use ask_voxelgpt():

from voxelgpt import ask_voxelgpt

ask_voxelgpt("Does FiftyOne integrate with CVAT?")



Yes, FiftyOne integrates with CVAT, which is an open-source image and video

annotation tool. You can upload your data directly from FiftyOne to CVAT to

add or edit labels. FiftyOne provides simple setup instructions that you can

use to specify the necessary account credentials and server endpoint to use.

CVAT provides three levels of abstraction for annotation workflows: projects,

tasks, and jobs.

If you pass a dataset along with your query and VoxelGPT interprets your prompt as a request to load a view, it will return the view to you:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
view = ask_voxelgpt("show me 10 random samples", dataset)

From there, you can interact with the view or refine it further, just as you would with any other DatasetView in FiftyOne:

num_objects = view.count_values("ground_truth.detections.label")

Conclusion

VoxelGPT brings the power of large language models to your computer vision datasets, and it is fully open source.

Source: https://medium.com/voxel51/voxelgpt-your-ai-assistant-for-computer-vision-57920eb5c982
