Data Science: Automated Prompt Optimization and Testing with DSPy

Published May 8, 2024 by alex

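This article shows how this kind of "prompt programming" is done, and explains in more depth what happens behind the scenes during optimization.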


Basic concepts of DSPy: signatures and modules

Signatures and modules are the building blocks of prompt programming in DSPy. Let's dive in and see what they are about!


Signature: specification of input/output

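A signature is the most fundamental building block in DSPy's prompt programming: a declarative specification of the input/output behavior of a DSPy module. Signatures let you tell the LM what it needs to do, rather than specifying how we should ask the LM to do it.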


Say we want to obtain the sentiment of a sentence. Traditionally we might write a prompt like this:


Given a sentence {the_sentence_itself}, deduce its sentiment.


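In DSPy, we can achieve the same thing by defining a signature as below. In its most basic form, a signature is simply a string that separates inputs and outputs with ->.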


# Assumes an LM has already been configured, e.g. dspy.settings.configure(lm=lm)
import dspy

# Define signature
signature = 'sentence -> sentiment'
classify = dspy.Predict(signature)
# Run
sentence = "it's a charming and often affecting journey."
classify(sentence=sentence).sentiment


--- Output ---
"I'm sorry, but I am unable to determine the sentiment of the sentence without additional context or information. If you provide me with more details or specific criteria for determining sentiment, I would be happy to assist you further."


This prediction is not a good one, but for instructional purposes let's inspect the prompt that was issued.


# This is how we inspect the last prompt issued to the LM
lm.inspect_history(n=1)


--- Output ---
Given the fields `sentence`, produce the fields `sentiment`.
---
Follow the following format.
Sentence: ${sentence}
Sentiment: ${sentiment}
---
Sentence: it's a charming and often affecting journey.
Sentiment: I'm sorry, but I am unable to determine the sentiment of the sentence without additional context or information. If you provide me with more details or specific criteria for determining sentiment, I would be happy to assist you further.


We can see that the prompt above is assembled from the sentence -> sentiment signature. But how did DSPy come up with the "Given the fields…" text in the prompt?


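Inspecting the dspy.Predict() class, we can see that when we pass it our signature, the signature is parsed into the signature attribute of the class and subsequently assembled into a prompt. The instructions value is a default hardcoded in the DSPy library.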


# Check the variables of the `classify` object,
# which was created by passing the signature to `dspy.Predict()` class
vars(classify)


--- Output ---
{
 'signature': StringSignature(sentence -> sentiment
     instructions='Given the fields `sentence`, produce the fields `sentiment`.'
     sentence = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Sentence:', 'desc': '${sentence}'})
     sentiment = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Sentiment:', 'desc': '${sentiment}'})
 ),
 'some_other_attributes': 'xxx'}
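To make the parsing step more concrete, here is a minimal sketch in plain Python of how a string signature could be split into fields and assembled into the prompt shown earlier. It is an illustration only and not DSPy's actual implementation; the helper names field_names, parse_signature, prefix and build_prompt are hypothetical.

```python
def field_names(part):
    """Turn 'a, b' into ['a', 'b']."""
    return [name.strip() for name in part.split(",") if name.strip()]


def parse_signature(signature):
    """Split 'a, b -> c' into (input names, output names)."""
    inputs, outputs = signature.split("->")
    return field_names(inputs), field_names(outputs)


def prefix(name):
    """Field name to prompt prefix, e.g. 'sentence' -> 'Sentence:'."""
    return name.replace("_", " ").capitalize() + ":"


def build_prompt(signature, **values):
    """Assemble default instructions, the format section, and the inputs."""
    inputs, outputs = parse_signature(signature)
    quoted_in = ", ".join(f"`{n}`" for n in inputs)
    quoted_out = ", ".join(f"`{n}`" for n in outputs)
    lines = [
        f"Given the fields {quoted_in}, produce the fields {quoted_out}.",
        "---",
        "Follow the following format.",
    ]
    # Every field gets a prefix and a ${name} placeholder.
    lines += [f"{prefix(n)} ${{{n}}}" for n in inputs + outputs]
    lines.append("---")
    lines += [f"{prefix(n)} {values[n]}" for n in inputs]
    # The first output prefix is left open for the LM to complete.
    lines.append(prefix(outputs[0]))
    return "\n".join(lines)


prompt = build_prompt(
    "sentence -> sentiment",
    sentence="it's a charming and often affecting journey.",
)
print(prompt)
```

The point of the sketch is that the short string carries enough information (field names and their roles) to generate the whole prompt skeleton, which is why the signature alone is sufficient input for dspy.Predict.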


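What if we want to give the LLM a more detailed description of our objective, beyond the basic sentence -> sentiment signature? To do so, we need to provide a more verbose signature in the form of a class-based DSPy signature.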


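Notice that we give no explicit instruction on how the LLM should obtain the sentiment. We only describe the task at hand and the expected output.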


# Define signature in Class-based form
class Emotion(dspy.Signature):
    # Describe the task
    """Classify emotions in a sentence."""
    
    sentence = dspy.InputField()
    # Adding description to the output field
    sentiment = dspy.OutputField(desc="Possible choices: sadness, joy, love, anger, fear, surprise.")
classify_class_based = dspy.Predict(Emotion)
# Issue prediction
classify_class_based(sentence=sentence).sentiment


--- Output ---
Sentence: It's a charming and often affecting journey.
Sentiment: joy


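It now outputs a much better prediction! Again, we see that the descriptions we wrote when defining the class-based DSPy signature are assembled into a prompt.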


Classify emotions in a sentence.
---
Follow the following format.
Sentence: ${sentence}
Sentiment: Possible choices: sadness, joy, love, anger, fear, surprise.
---
Sentence: it's a charming and often affecting journey.
Sentiment: Sentence: It's a charming and often affecting journey.
Sentiment: joy


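This might do for simple tasks, but advanced applications may require sophisticated prompting techniques such as Chain of Thought or ReAct. In DSPy, these techniques are implemented as modules.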


Modules: abstracting prompting techniques

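We may be used to applying "prompting techniques" by hardcoding phrases like "let's think step by step" into our prompt. In DSPy, these prompting techniques are abstracted as modules. Below is an example of passing our class-based signature to the dspy.ChainOfThought module.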


# Apply the class-based signature to Chain of Thought
classify_cot = dspy.ChainOfThought(Emotion)
# Run
classify_cot(sentence=sentence).sentiment
# Inspect prompt
lm.inspect_history(n=1)


--- Output ---
Classify emotions in a sentence.
---
Follow the following format.
Sentence: ${sentence}
Reasoning: Let's think step by step in order to ${produce the sentiment}. We ...
Sentiment: Possible choices: sadness, joy, love, anger, fear, surprise.
---
Sentence: it's a charming and often affecting journey.
Reasoning: Let's think step by step in order to Sentence: It's a charming and often affecting journey.
Reasoning: Let's think step by step in order to determine the sentiment. The use of the words "charming" and "affecting" suggests positive emotions associated with enjoyment and emotional impact. We can infer that the overall tone is positive and heartwarming, evoking feelings of joy and possibly love.
Sentiment: Joy, love
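Conceptually, the module leaves our signature untouched and adds a reasoning field ahead of the output, which is where the extra "Reasoning:" line in the prompt comes from. A rough plain-Python sketch of that idea follows; the function add_reasoning_field and the (prefix, description) field representation are hypothetical and are not DSPy internals.

```python
def add_reasoning_field(fields):
    """Return a new field list with a reasoning step before the last output.

    `fields` is a list of (prefix, description) pairs, inputs first, with the
    final output field in the last position. The input list is not modified.
    """
    *head, (out_prefix, out_desc) = fields
    out_name = out_prefix.rstrip(":").lower()
    reasoning = (
        "Reasoning:",
        f"Let's think step by step in order to ${{produce the {out_name}}}. We ...",
    )
    return head + [reasoning, (out_prefix, out_desc)]


emotion_fields = [
    ("Sentence:", "${sentence}"),
    ("Sentiment:", "Possible choices: sadness, joy, love, anger, fear, surprise."),
]

for field_prefix, description in add_reasoning_field(emotion_fields):
    print(field_prefix, description)
```

Because the technique is applied to the field list rather than written into the task description, the same signature can be reused with a different module without rewriting any prompt text.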


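According to DSPy's documentation, as of the time of writing DSPy provides the following prompting techniques in the form of modules. Note that the dspy.Predict we used in the initial example is also a module, one that represents no prompting technique!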


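  1. dspy.Predict: Basic predictor. Does not modify the signature. Handles the key forms of learning (i.e., storing the instructions and demonstrations and updating the LM).
  2. dspy.ChainOfThought: Teaches the LM to think step by step before committing to the signature's response.
  3. dspy.ProgramOfThought: Teaches the LM to output code, whose execution results will dictate the response.
  4. dspy.ReAct: An agent that can use tools to implement the given signature.
  5. dspy.MultiChainComparison: Can compare multiple outputs from ChainOfThought to produce a final prediction.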


There are also some function-style modules:


  6. dspy.majority: Can do basic voting to return the most popular response from a set of predictions.
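To illustrate what "basic voting" means here, the following is a small sketch in plain Python. It is not the code behind dspy.majority; the function name majority_vote and the normalization rule (trim and lowercase) are assumptions made for the example.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer after light normalization.

    Ties are resolved in favor of the answer that appears first.
    """
    normalized = [a.strip().lower() for a in answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    # Return the original spelling of the first answer matching the winner.
    return answers[normalized.index(winner)]


samples = ["Joy", "joy ", "love", "Joy", "surprise"]
print(majority_vote(samples))
```

Voting of this kind is typically paired with sampling several completions for the same input, so that an occasional outlier answer is outvoted.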


Chaining the modules

What about RAG, on the other hand? We can chain modules together to deal with bigger problems!


First we define a retriever. In our example we use a ColBERT retriever that fetches information from Wikipedia Abstracts 2017.


# Configure retriever
rm = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(rm = rm)


Then we define the RAG class, which inherits from dspy.Module. It needs two methods:


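  • The __init__ method simply declares the sub-modules it needs: dspy.Retrieve and dspy.ChainOfThought. The latter is defined to implement our context, question -> answer signature.
  • The forward method describes the control flow of answering the question using the modules we have.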


# Define a class-based signature
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
# Chain different modules together to retrieve information from Wikipedia Abstracts 2017, then pass it as context for Chain of Thought to generate an answer
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer


Then we use the class to perform RAG.


# Initialize our RAG class
rag = RAG()
# Define a question and pass it into the RAG class
my_question = "When was the first FIFA World Cup held?"
rag(question=my_question).answer


--- Output ---
'1930'


Inspecting the prompt, we see that the 3 passages retrieved from Wikipedia Abstracts 2017 are inserted as context for the Chain of Thought generation.


Answer questions with short factoid answers.
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
Context:
[1] «History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules Rimet decided to stage an international football tournament. The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited by the organization. Since then, the World Cup has experienced successive expansions and format remodeling to its current 32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around the world.»
[2] «1950 FIFA World Cup | The 1950 FIFA World Cup, held in Brazil from 24 June to 16 July 1950, was the fourth FIFA World Cup. It was the first World Cup since 1938, the planned 1942 and 1946 competitions having been cancelled owing to World War II. It was won by Uruguay, who had won the inaugural competition in 1930, clinching the cup by beating the hosts Brazil 2–1 in the deciding match of the four-team final group (this was the only tournament not decided by a one-match final). It was also the first tournament where the trophy was referred to as the Jules Rimet Cup, to mark the 25th anniversary of Jules Rimet's presidency of FIFA.»
[3] «1970 FIFA World Cup | The 1970 FIFA World Cup was the ninth FIFA World Cup, the quadrennial international football championship for men's national teams. Held from 31 May to 21 June in Mexico, it was the first World Cup tournament staged in North America, and the first held outside Europe and South America. Teams representing 75 nations from all six populated continents entered the competition, and its qualification rounds began in May 1968. Fourteen teams qualified from this process to join host nation Mexico and defending champions England in the sixteen-team final tournament. El Salvador, Israel, and Morocco made their first appearances at the final stage, and Peru their first since 1930.»
Question: When was the first FIFA World Cup held?
Reasoning: Let's think step by step in order to Answer: 1930
Answer: 1930
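The retrieve-then-generate control flow in forward does not depend on DSPy itself. As a toy illustration, the sketch below uses a word-overlap retriever over a three-passage corpus and a stub in place of the LM call. Everything in it (ToyRAG, retrieve, generate_answer, and the shortened passages) is hypothetical and only mirrors the shape of the RAG class above.

```python
import re

# Tiny stand-in corpus, loosely based on the passages shown above.
PASSAGES = [
    "History of the FIFA World Cup | The FIFA World Cup was first held in 1930.",
    "1950 FIFA World Cup | The 1950 FIFA World Cup was held in Brazil.",
    "Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi.",
]


def tokenize(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, k=2):
    """Rank passages by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(PASSAGES, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def generate_answer(context, question):
    """Stand-in for the LM call: build the prompt that would be sent."""
    numbered = "\n".join(f"[{i}] «{p}»" for i, p in enumerate(context, start=1))
    return f"Context:\n{numbered}\nQuestion: {question}\nAnswer:"


class ToyRAG:
    def __init__(self, num_passages=2):
        self.num_passages = num_passages

    def forward(self, question):
        # Step 1: retrieve passages; step 2: pass them on as context.
        context = retrieve(question, k=self.num_passages)
        return generate_answer(context=context, question=question)


print(ToyRAG().forward("When was the first FIFA World Cup held?"))
```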


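The examples above may not seem like much. In its most basic use, DSPy seems to do nothing that could not be done with an f-string, but it actually brings a paradigm shift to prompt writing, because it brings modularity to prompt composition!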


The power of DSPy is not limited to modularity: it can also optimize our prompt based on training samples and test it systematically.


Optimizer: train our prompt as with machine learning

We will try to optimize the prompt of our RAG application with DSPy.


Taking Chain of Thought as an example, beyond adding the "let's think step by step" phrase, we can boost its performance with a few tweaks:


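  1. Adding suitable examples (a.k.a. few-shot learning).
  2. Furthermore, we can bootstrap demonstrations of reasoning to teach the LM to apply proper reasoning to the task at hand.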


Doing this manually is very time-consuming and does not generalize to different problems, but with DSPy it can be done automatically. Let's dive in!


Preparation

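1: Loading test data: As in machine learning, to train our prompt we need to prepare a training dataset and a test dataset. The first time, this cell takes around 20 minutes to run.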


from dspy.datasets.hotpotqa import HotPotQA
# For demonstration purpose we will use a small subset of the HotPotQA dataset, 20 for training and testing each
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=20, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
testset = [x.with_inputs('question') for x in dataset.dev]
len(trainset), len(testset)


Inspecting our dataset, we see that it is basically a set of question-and-answer pairs.


Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})


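2: Setting up Phoenix for observability: To make the optimization process easier to understand, we launch Phoenix to observe our DSPy application. It is a great tool for LLM observability in general!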


Prompt optimization

Then we are ready to see what this optimization is about! To "train" our prompt, we need three things:


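  1. A training set. We will use the 20 question-and-answer examples from our training set.
  2. A metric for validation. Here we use the built-in dspy.evaluate.answer_exact_match, which checks whether the predicted answer exactly matches the correct answer (questionable, but sufficient for a demonstration). For real-life applications you can define your own evaluation criteria.
  3. A specific optimizer (formerly called a teleprompter). The DSPy library includes a number of optimization strategies, which are listed in the DSPy documentation. For our example we use BootstrapFewShot. Instead of describing it at length here, we will demonstrate it with code subsequently.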
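As a hedged illustration of defining your own criterion: DSPy metrics are ordinary Python callables that receive an example, a prediction, and an optional trace. The sketch below, with the hypothetical names normalize and lenient_match, accepts a prediction when the normalized gold answer appears in it as whole words. SimpleNamespace objects stand in for DSPy's example and prediction objects so that the block runs without the library.

```python
import re
import string
from types import SimpleNamespace


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def lenient_match(example, pred, trace=None):
    """True if the normalized gold answer occurs as whole words in the prediction."""
    gold = normalize(example.answer)
    guess = normalize(pred.answer)
    # Padding with spaces prevents "7402" from matching inside "17402".
    return bool(gold) and f" {gold} " in f" {guess} "


gold = SimpleNamespace(answer="7,402")
verbose = SimpleNamespace(answer="Approximately 7,402 inhabitants.")
print(lenient_match(gold, verbose))
```

A looser metric like this rewards verbose but correct answers that exact match would reject; whether that is desirable depends on the application.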


Now we train our prompt.


from dspy.teleprompt import BootstrapFewShot
# Simple optimizer example. I am explicitly stating the default values for max_bootstrapped_demos and max_labeled_demos for demonstration purposes
optimizer = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match, max_bootstrapped_demos=4, max_labeled_demos=16)
# Compile!
compiled_rag = optimizer.compile(RAG(), trainset=trainset)


--- Successful execution should show this output ---
Bootstrapped 4 full traces after n examples in round 0


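Before using compiled_rag to answer a question, let's see what went on behind the scenes during the training process (a.k.a. compilation). We launch the Phoenix console by visiting http://localhost:6006/ in a browser.


[Figure 2]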


In my run, 14 calls were made using the RAG class. In each call, we post a question to the LM to obtain a prediction.


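Referring to the result summary table in my notebook, 4 correct answers were obtained from these 14 samples, which reached our max_bootstrapped_demos parameter and stopped the calls.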


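But what prompts did DSPy issue to obtain the bootstrapped demos? Here is the prompt for question #14. We can see that when DSPy tries to generate one bootstrapped demo, it randomly adds samples from our training set for few-shot learning.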


Answer questions with short factoid answers.
---
{Pairs of question-and-answer as samples}
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
Context:
[1] «Eric Davis (baseball) | Eric Keith Davis (born May 29, 1962) is a former center fielder for several Major League Baseball teams. Davis was 21 years old when he broke into the big leagues on May 19, 1984 with the Cincinnati Reds, the team for which he is most remembered. Blessed with a rare combination of excellent foot speed and bat speed, Davis became the first major league player to hit at least 30 home runs and steal at least 50 bases in the same season in 1987.»
[2] «Willie Davis (baseball) | William Henry Davis, Jr. (April 15, 1940 – March 9, 2010) was a center fielder in Major League Baseball who played most of his career for the Los Angeles Dodgers. At the end of his career he ranked seventh in major league history in putouts (5449) and total chances (5719) in the outfield, and third in games in center field (2237). He was ninth in National League history in total outfield games (2274), and won Gold Glove Awards from 1971 to 1973. He had 13 seasons of 20 or more stolen bases, led the NL in triples twice, and retired with the fourth most triples (138) by any major leaguer since 1945. He holds Los Angeles club records (1958–present) for career hits (2091), runs (1004), triples (110), at bats (7495), total bases (3094) and extra base hits (585). His 31-game hitting streak in 1969 remains the longest by a Dodger. At one point during the streak, when the team was playing at home, the big message board at Dodger Stadium quoted a message from a telegram sent to Davis and the team from Zack Wheat, the team's former record holder, at his home in Missouri.»
[3] «1992 Los Angeles Dodgers season | The 1992 Los Angeles Dodgers season was a poor one for the team as it finished last in the Western Division of the National League with a record of 63 wins and 99 losses. Despite boasting what was nicknamed the "Outfield of Dreams", being manned by Eric Davis, Brett Butler, and Darryl Strawberry, injuries to key players and slumps from others contributed to the franchise's worst season since moving to Los Angeles. Additionally, the Dodgers cancelled four home games during the season due to the L.A. Riots. Despite the poor finish, the Dodgers had some hope for the future as first baseman Eric Karros won the National League Rookie of the Year Award, the first of five consecutive Dodger players to do so. The 1992 season also saw the Dodgers drop television station KTTV Ch.11 as their chief broadcaster of Dodger baseball, ending a 34 year-35 consecutive season association with that station. Additionally, it was the first time the Dodgers lost 90 games in a season since 1944.»
Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Reasoning: Let's think step by step in order to Answer: "Outfield of Dreams"
Answer: "Outfield of Dreams"


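Time to put compiled_rag to the test! Here we raise a question that was answered wrongly in our summary table, and see whether we get the right answer this time.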


compiled_rag(question="Which of these publications was most recently published, Who Put the Bomp or Self?")


--- Output ---
Prediction(
    rationale='Answer: Self',
    answer='Self'
)


We now get the right answer!


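Let's inspect the issued prompt again. Notice how the compiled prompt differs from the ones used during bootstrapping. Besides the few-shot examples, bootstrapped Context-Question-Reasoning-Answer demonstrations from correct predictions are added to the prompt, improving the LM's capability.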


Answer questions with short factoid answers.
---
{Pairs of question-and-answer as samples}
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
{4 sets of Context-Question-Reasoning-Answer demonstrations}
---
Context:
[1] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, "Who Put the Bomp". Later, the name was shortened to "Bomp!"»
[2] «Bompiani | Bompiani is an Italian publishing house based in Milan, Italy. It was founded in 1929 by Valentino Bompiani.»
[3] «What Color is Your Parachute? | What Color is Your Parachute? by Richard Nelson Bolles is a book for job-seekers that has been in print since 1970 and has been revised every year since 1975, sometimes substantially. Bolles initially self-published the book (December 1, 1970), but it has been commercially published since November 1972 by Ten Speed Press in Berkeley, California. As of September 28, 2010, the book is available in 22 languages, it is used in 26 countries around the world, and over ten million copies have been sold worldwide. It is one of the most highly regarded career advice books in print. In the latest edition of the book, the author writes about how to adapt one's job search to the Web 2.0 age.»
Question: Which of these publications was most recently published, Who Put the Bomp or Self?
Reasoning: Let's think step by step in order to Answer: Self
Answer: Self


So, the following is basically what BootstrapFewShot did behind the scenes during compilation:


[Figure 3]
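The bootstrapping loop described in this section can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the BootstrapFewShot implementation: the names bootstrap_demos, toy_program and exact_match are hypothetical, the "program" is a stub instead of an LM pipeline, and the random few-shot samples that DSPy adds to each bootstrapping prompt are left out.

```python
def bootstrap_demos(program, trainset, metric, max_bootstrapped_demos=4):
    """Run `program` over the training set and keep traces that pass `metric`.

    Stops as soon as enough demonstrations have been collected.
    """
    demos, calls = [], 0
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        calls += 1
        prediction = program(example["question"])
        if metric(example, prediction):
            # Keep the full trace (question plus generated fields) as a demo.
            demos.append({"question": example["question"], **prediction})
    return demos, calls


def toy_program(question):
    """Stub program: answers correctly only for even-numbered questions."""
    n = int(question.split()[-1])
    answer = str(n * 2) if n % 2 == 0 else "unknown"
    return {"reasoning": f"double {n}", "answer": answer}


def exact_match(example, prediction):
    return example["answer"] == prediction["answer"]


trainset = [{"question": f"double {n}", "answer": str(n * 2)} for n in range(1, 21)]
demos, calls = bootstrap_demos(toy_program, trainset, exact_match)
print(f"Bootstrapped {len(demos)} demos after {calls} calls")
```

The early stop is the same behavior observed in the run above, where the calls ended once the number of correct answers reached max_bootstrapped_demos.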


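The example above still falls short of what we typically do with machine learning: even though bootstrapping may be useful, we have not yet shown that it improves the quality of the responses.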


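Ideally, as in traditional machine learning, we should define a couple of candidate models, see how they perform against the test set, and select the one achieving the highest performance score. This is what we will do next!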


The formal example: prompt comparison with an LLM


The aim of this example

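We will evaluate which is the "best prompt" (expressed as a combination of module and optimizer) for performing RAG against the HotpotQA dataset (distributed under a CC BY-SA 4.0 license), given the LM we use (GPT 3.5 Turbo).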


The modules under evaluation are:

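  • Vanilla: single-hop RAG that answers a question based on the retrieved context, without key phrases such as "let's think step by step"
  • COT: single-hop RAG with Chain of Thought
  • ReAct: single-hop RAG with ReAct prompting
  • BasicMultiHop: 2-hop RAG with Chain of Thought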


The candidate optimizers are:

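  • None: no additional instructions apart from the signature
  • Labeled few-shot: simply constructs few-shot examples from the provided labeled Q/A pairs
  • Bootstrap few-shot: as we demonstrated, self-generates complete demonstrations for every stage of our module. It simply uses the generated demonstrations (if they pass the metric) without any further optimization. For Vanilla it is equivalent to Labeled few-shot.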


As for the evaluation metric, we again use exact match (dspy.evaluate.metrics.answer_exact_match) as the criterion against the test set.


Comparison

Let's begin! First, we define our modules.


# Vanilla
class Vanilla(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.Predict("context, question -> answer")
    
    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer
    
vanilla = Vanilla()
# COT
class COT(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer
    
cot = COT()
# ReAct
react = dspy.ReAct("question-> answer", tools=[dspy.Retrieve(k=3)], max_iters=5)
# BasicMultiHop
class BasicMultiHop(dspy.Module):
    def __init__(self, passages_per_hop=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_query = dspy.ChainOfThought("context, question-> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question-> answer")
    def forward(self, question):
        context = []
        for hop in range(2):
            query = self.generate_query(context=context, question=question).search_query
            context += self.retrieve(query).passages
        return self.generate_answer(context=context, question=question)
    
multihop = BasicMultiHop(passages_per_hop=3)


Then we define the permutations for our candidate models.


from dspy.teleprompt import LabeledFewShot, BootstrapFewShot
metric = dspy.evaluate.metrics.answer_exact_match
modules = {
    'vanilla': vanilla,
    'cot': cot,
    'react': react,
    'multihop': multihop,
}
optimizers = {
    'none': None,
    'labeled_few_shot': LabeledFewShot(),
    'bootstrap_few_shot': BootstrapFewShot(metric=metric, max_errors=20),
}


Now we are ready to begin the evaluation. It takes around 20 minutes to complete.


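# Note: ModelSelection is not part of the DSPy library. It is a helper class
# assumed to be defined separately (e.g. in the accompanying notebook).
# Compile the models
ms = ModelSelection(modules=modules, optimizers=optimizers, metric=metric, trainset=trainset)
# Evaluate them
ms.evaluate(testset=testset)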
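The ModelSelection helper itself is not shown in this article. As a purely hypothetical sketch of what such a grid evaluation might do, the following plain-Python version compiles every module/optimizer pair and scores it as the percentage of test examples that pass the metric. The modules and the optimizer are stubs (simple functions), and the function name evaluate_grid is an assumption.

```python
from itertools import product


def evaluate_grid(modules, optimizers, metric, trainset, testset):
    """Score every module/optimizer pair as the percentage of test examples passed."""
    scores = {}
    for (m_name, module), (o_name, optimize) in product(modules.items(), optimizers.items()):
        # "Compiling" here just means letting the optimizer wrap the module.
        program = optimize(module, trainset) if optimize else module
        correct = sum(metric(ex, program(ex["question"])) for ex in testset)
        scores[(m_name, o_name)] = 100.0 * correct / len(testset)
    return scores


def exact_match(example, prediction):
    return example["answer"] == prediction


def memorize(module, trainset):
    """Toy optimizer: answer from memorized training pairs, else fall back."""
    memory = {ex["question"]: ex["answer"] for ex in trainset}
    return lambda question: memory.get(question, module(question))


# Toy data; the train/test overlap only keeps the example tiny.
trainset = [{"question": "a", "answer": "A"}, {"question": "x", "answer": "42"}]
testset = trainset + [{"question": "b", "answer": "B"}]

modules = {"upper": str.upper, "echo": lambda question: question}
optimizers = {"none": None, "memorize": memorize}

scores = evaluate_grid(modules, optimizers, exact_match, trainset, testset)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Keeping the grid as a dictionary keyed by (module, optimizer) makes it straightforward to display as a matrix and to pick the best combination afterwards.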


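Here are the evaluation results. We can see that the COT module with the BootstrapFewShot optimizer performs best. The scores represent the percentage of correct answers (judged by exact match) on the test set.


[Figure 4]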


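But before we conclude the exercise, it is worth examining the results more deeply: Multihop with BootstrapFewShot, which should be equipped with more relevant context than COT with BootstrapFewShot, performs worse. That is strange!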


Debugging and fine-tuning our prompt

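Now head to the Phoenix console to see what is going on. We pick a random question, William Hughes Miller was born in a city with how many inhabitants ?, and inspect how COT, ReAct and BasicMultiHop with the BootstrapFewShot optimizer came up with their answers. You can type the following into the search bar to filter: """William Hughes Miller was born in a city with how many inhabitants ?""" in input.value


[Figure 5]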


These are the answers provided by the 3 models during my run:


  • Multihop with BootstrapFewShot: The answer will vary based on the specific city of William Hughes Miller’s birthplace.
  • ReAct with BootstrapFewShot: Kosciusko, Mississippi
  • COT with BootstrapFewShot: The city of Kosciusko, Mississippi, has a population of approximately 7,402 inhabitants.


The correct answer is 7,402 as of the 2010 census. Both ReAct with BootstrapFewShot and COT with BootstrapFewShot provided relevant answers, but Multihop with BootstrapFewShot simply failed to provide one.


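Checking the execution trace in Phoenix for Multihop with BootstrapFewShot, it looks like the LM fails to understand what is expected of the search_query specified in the signature.


[Figure 6]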


So we revise the signature, and re-run the evaluation with the code below.


# Define a class-based signature
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
class FollowupQuery(dspy.Signature):
    """Generate a query which is conducive to answering the question"""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    search_query = dspy.OutputField(desc="Judge if the context is adequate to answer the question, if not adequate or if it is blank, generate a search query that would help you answer the question.")


# Revise the modules with the class-based signatures. You can find the relevant code in my notebook
# To keep the article concise I am not pasting it here.
# Then run the below command to re-compile and evaluate
ms_revised = ModelSelection(modules=modules_revised, optimizers=optimizers, metric=metric, trainset=trainset)
ms_revised.evaluate(testset=testset)
ms_revised.evaluation_matrix


[Figure 7]


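We now see that the scores improved across all models, and Multihop with LabeledFewShot and Multihop with no examples now perform best! This indicates that although DSPy tries to optimize the prompt, there is still some prompt engineering involved in articulating your objective in the signature.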


The best model now produces an exact match for our question!


# The correct answer is 7,402
question = """`William Hughes Miller was born in a city with how many inhabitants ?"""
ms_revised.question_for_model('multihop','labeled_few_shot',question)


--- Output ---
Prediction(
    rationale='Answer: 7,402',
    answer='7,402'
)


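Since the best prompt is Multihop with LabeledFewShot, the prompt does not contain bootstrapped Context-Question-Reasoning-Answer demonstrations. So bootstrapping does not necessarily lead to better performance, and we need to prove scientifically which prompt is the best.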


Answer questions with short factoid answers.
---
{Pairs of question-and-answer as samples}
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
Context:
[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»
[2] «William Herbert Miller, Jr. | William Hubert Miller, Jr. (September 1932 – November 4, 1988), of New York City, was an aerophilatelist who published philatelic literature on the subject.»
[3] «William Green Miller | William Green Miller (born August 15, 1931 in New York City, New York), served as the United States Ambassador to Ukraine under Bill Clinton, from 1993 to 1998.»
[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»
[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»
[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»
Question: `William Hughes Miller was born in a city with how many inhabitants ?
Reasoning: Let's think step by step in order to Answer: 7,402
Answer: 7,402


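This does not mean that Multihop with BootstrapFewShot performs worse in general, however. It is only that, for our task, if we use GPT 3.5 Turbo both to bootstrap demonstrations (which may be of questionable quality) and to output predictions, then we are better off without bootstrapping, keeping only the few-shot examples.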


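This leads to the question: is it possible to use a more powerful LM, say GPT 4 Turbo (a.k.a. the teacher), to generate demonstrations, while keeping a cheaper model such as GPT 3.5 Turbo (a.k.a. the student) for prediction?


"Teacher" to power up bootstrapping capability

The answer is yes. As the following cell demonstrates, we will use GPT 4 Turbo as the teacher.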


# Define the GPT-4 Turbo model
gpt4_turbo = dspy.Databricks(api_key=OPENROUTER_API_KEY,
  api_base="https://openrouter.ai/api/v1",
  model="openai/gpt-4-turbo")
# Define new Optimizer which uses GPT-4 Turbo as a teacher
optimizers_gpt4_teacher = {
    'bootstrap_few_shot': BootstrapFewShot(metric=metric, max_errors=20, teacher_settings=dict(lm=gpt4_turbo)),
}
# Compile the models and evaluate them as before
ms_gpt4_teacher = ModelSelection(modules=modules_revised, optimizers=optimizers_gpt4_teacher, metric=metric, trainset=trainset)
ms_gpt4_teacher.evaluate(testset=testset)
ms_gpt4_teacher.evaluation_matrix
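The teacher/student split can be pictured with a small plain-Python sketch: a stronger "teacher" function produces the demonstrations, and a cheaper "student" function answers new questions with those demonstrations prepended to its prompt. The names bootstrap_with_teacher and make_student, as well as the stub teacher and student, are hypothetical and do not reflect how DSPy wires this up internally.

```python
def bootstrap_with_teacher(teacher, trainset, metric, max_demos=2):
    """Collect (question, answer) demos from the teacher that pass the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        answer = teacher(example["question"])
        if metric(example["answer"], answer):
            demos.append((example["question"], answer))
    return demos


def make_student(student, demos):
    """Return a predictor that prepends the teacher's demos to each prompt."""
    def predict(question):
        shots = "\n".join(f"Question: {q}\nAnswer: {a}" for q, a in demos)
        return student(f"{shots}\nQuestion: {question}\nAnswer:")
    return predict


# Stub teacher: knows two of the three training answers.
KNOWN = {"q1": "a1", "q3": "a3"}
teacher = lambda question: KNOWN.get(question, "unsure")
# Stub student: simply returns the prompt it receives, for inspection.
student = lambda prompt: prompt

trainset = [{"question": f"q{i}", "answer": f"a{i}"} for i in (1, 2, 3)]
demos = bootstrap_with_teacher(teacher, trainset, lambda gold, pred: gold == pred)
print(make_student(student, demos)("q4"))
```

The cost argument follows from the structure: the teacher is called only while collecting demonstrations, whereas the student is called for every prediction afterwards.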


[Figure 8]


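Using GPT-4 Turbo as the teacher does not significantly boost our models' performance, however. Still, it is worthwhile to see its effect on our prompt. Below is the prompt generated using just GPT 3.5.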


Answer questions with short factoid answers.
---
{Pairs of question-and-answer as samples}
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
Context:
[1] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie "Stealth Hunters". Kita's first recurring television role was in Fox's "Masked Rider", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, "Waydown" in 1995. In 1996, Kita also appeared in the film "Barb Wire" (1996) and guest starred on "The Wayans Bros.". She also guest starred in "Miriam Teitelbaum: Homicide" with "Saturday Night Live" alumni Nora Dunn, "Wall To Wall Records" with Jordan Bridges, "Even Stevens", "Felicity" with Keri Russell, "V.I.P." with Pamela Anderson, "Girlfriends", "The Sweet Spot" with Bill Murray, and "Movies at Our House". She also had recurring roles on the FX spoof, "Son of the Beach" from 2001 to 2002, ABC-Family's "Dance Fever" and Oxygen Network's "Running with Scissors". Kita also appeared in the films "Little Heroes" (2002) and "Rennie's Landing" (2001).»
[2] «Jilly Kitzinger | Jilly Kitzinger is a fictional character in the science fiction series "Torchwood", portrayed by American actress Lauren Ambrose. The character was promoted as one of five new main characters to join "Torchwood" in its fourth series, "" (2011), as part of a new co-production between "Torchwood"' s British network, BBC One, and its American financiers on US premium television network Starz. Ambrose appears in seven of the ten episodes, and is credited as a "special guest star" throughout. Whilst reaction to the serial was mixed, Ambrose' portrayal was often singled out by critics for particular praise and in 2012 she received a Saturn Award nomination for Best Supporting Actress on Television.»
[3] «Candace Brown | Candace June Brown (born June 15, 1980) is an American actress and comedian best known for her work on shows such as "Grey's Anatomy", "Desperate Housewives", "Head Case", The "Wizards Of Waverly Place". In 2011, she joined the guest cast for "Torchwood"' s fourth series' "", airing on BBC One in the United Kingdom and premium television network Starz.»
[4] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie "Stealth Hunters". Kita's first recurring television role was in Fox's "Masked Rider", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, "Waydown" in 1995. In 1996, Kita also appeared in the film "Barb Wire" (1996) and guest starred on "The Wayans Bros.". She also guest starred in "Miriam Teitelbaum: Homicide" with "Saturday Night Live" alumni Nora Dunn, "Wall To Wall Records" with Jordan Bridges, "Even Stevens", "Felicity" with Keri Russell, "V.I.P." with Pamela Anderson, "Girlfriends", "The Sweet Spot" with Bill Murray, and "Movies at Our House". She also had recurring roles on the FX spoof, "Son of the Beach" from 2001 to 2002, ABC-Family's "Dance Fever" and Oxygen Network's "Running with Scissors". Kita also appeared in the films "Little Heroes" (2002) and "Rennie's Landing" (2001).»
[5] «Kiti Manver | María Isabel Ana Mantecón Vernalte (born 11 May 1953) better known as Kiti Mánver is a Spanish actress. She has appeared in more than 100 films and television shows since 1970. She starred in the 1973 film "Habla, mudita", which was entered into the 23rd Berlin International Film Festival.»
[6] «Amy Steel | Amy Steel (born Alice Amy Steel; May 3, 1960) is an American film and television actress. She is best known for her roles as Ginny Field in "Friday the 13th Part 2" (1981) and Kit Graham in "April Fool's Day" (1986). She has starred in films such as "Exposed" (1983), "Walk Like a Man" (1987), "What Ever Happened to Baby Jane? " (1991), and "Tales of Poe" (2014). Steel has had numerous guest appearances on several television series, such as "Family Ties" (1983), "The A-Team" (1983), "Quantum Leap" (1990), and "China Beach" (1991), as well as a starring role in "The Powers of Matthew Star" (1982–83).»
Question: which American actor was Candace Kita guest starred with
Reasoning: Let's think step by step in order to Answer: Bill Murray
Answer: Bill Murray
---
Context:
[1] «Monthly Magazine | The Monthly Magazine (1796–1843) of London began publication in February 1796. Richard Phillips was the publisher and a contributor on political issues. The editor for the first ten years was the literary jack-of-all-trades, Dr John Aikin. Other contributors included William Blake, Samuel Taylor Coleridge, George Dyer, Henry Neele and Charles Lamb. The magazine also published the earliest fiction of Charles Dickens, the first of what would become "Sketches by Boz".»
[2] «Bodega Magazine | Bodega Magazine is an online literary magazine that releases new issues on the first Monday of every month, featuring stories, poems, essays and interviews from a mix of emerging and established writers. It was founded in early spring of 2012 by creative writing MFA graduates from New York University who had previously worked together on the "Washington Square Review", and continues to be based out of Manhattan and Brooklyn. The inaugural issue was published on September 4, 2012.»
[3] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, "Who Put the Bomp". Later, the name was shortened to "Bomp!"»
[4] «The Most (album) | The Most is the third album released by straight edge hardcore punk band Down to Nothing. It was released on July 17, 2007.»
[5] «The Most Incredible Thing | “The Most Incredible Thing" (Danish: "Det Utroligste" ) is a literary fairy tale by Danish poet and author Hans Christian Andersen (1805–1875). The story is about a contest to find the most incredible thing and the wondrous consequences when the winner is chosen. The tale was first published in an English translation by Horace Scudder, an American correspondent of Andersen's, in the United States in September 1870 before being published in the original Danish in Denmark in October 1870. "The Most Incredible Thing" was the first of Andersen's tales to be published in Denmark during World War II. Andersen considered the tale one of his best.»
[6] «Augusta Triumphans | Augusta Triumphans: or, the Way to Make London the Most Flourishing City in the Universe by Daniel Defoe was first published on 16 March 1728. The fictitious speaker of this pamphlet, Andrew Moreton, is a man in his sixties who offers suggestions for the improvement of London. In particular, he fosters the establishment of a university, an academy of music, a hospital for foundlings and licensed institutions for the treatment of mental diseases. Moreover, he encourages the introduction of measures to prevent moral corruption and street robbery.»
Question: Which of these publications was most recently published, Who Put the Bomp or Self?
Reasoning: Let's think step by step in order to Answer: Self
Answer: Self
---
Context:
[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»
[2] «What the Victorians Did for Us | What the Victorians Did for Us is a 2001 BBC documentary series that examines the impact of the Victorian era on modern society. It concentrates primarily on the scientific and social advances of the era, which bore the Industrial Revolution and set the standards for polite society today.»
[3] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. When he awakes he finds that he can no longer distinguish the dream from reality.»
[4] «Jeremy Paxman | Jeremy Dickson Paxman (born 11 May 1950) is an English broadcaster, journalist, and author. He is the question master of "University Challenge", having succeeded Bamber Gascoigne when the programme was revived in 1994.»
[5] «Jeremy I | Jeremy I was king of the Miskito nation, who came to power following the death of his father, Oldman, in 1686 or 1687. according to an English visitor, W. M., in 1699, he was about 60 years old at that time, making his birth year about 1639.»
[6] «Jeremy Cheeseman | Jeremy Cheeseman (born June 6, 1990 in Manorville, New York) is a former American professional soccer player. Playing two seasons for the Dayton Dutch Lions in the USL Professional Division before retiring due to injury»
Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Reasoning: Let's think step by step in order to Answer: 1950
Answer: 1950
---
Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's "The Flintstones" (for Marvel Comics and Harvey Comics), "Captain Carrot and His Amazing Zoo Crew" (for DC Comics), and "Simpsons Comics" (for Bongo Comics). He was also the first artist for Archie Comics' "Sonic the Hedgehog" comic book series.»
[3] «Scott Shaw | Scott Shaw (born September 23, 1958) is an American actor, author, film director, film producer, journalist, martial artist, musician, photographer, and professor.»
[4] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's "The Flintstones" (for Marvel Comics and Harvey Comics), "Captain Carrot and His Amazing Zoo Crew" (for DC Comics), and "Simpsons Comics" (for Bongo Comics). He was also the first artist for Archie Comics' "Sonic the Hedgehog" comic book series.»
[5] «Scott Shaw | Scott Shaw (born September 23, 1958) is an American actor, author, film director, film producer, journalist, martial artist, musician, photographer, and professor.»
[6] «Arnold Shaw (author) | Arnold Shaw (1909–1989) was a songwriter and music business executive, primarily in the field of music publishing, who is best known for his comprehensive series of books on 20th century American popular music.»
Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?
Reasoning: Let's think step by step in order to Answer: Tae Kwon Do Times
Answer: Tae Kwon Do Times
---
Context:
[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»
[2] «William Herbert Miller, Jr. | William Hubert Miller, Jr. (September 1932 – November 4, 1988), of New York City, was an aerophilatelist who published philatelic literature on the subject.»
[3] «William Rickarby Miller | William Rickarby Miller (May 20, 1818 in Staindrop – July 1893 in New York City) was an American painter, of the Hudson River School.»
[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»
[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»
[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»
Question: `William Hughes Miller was born in a city with how many inhabitants ?
Reasoning: Let's think step by step in order to Answer: 7,402
Answer: 7,402


This is the prompt generated with GPT-4 Turbo as the teacher. Note how much better articulated the "Reasoning" is here!


Answer questions with short factoid answers.
---
{Pairs of question-and-answer as samples}
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
---
Context:
[1] «Monthly Magazine | The Monthly Magazine (1796–1843) of London began publication in February 1796. Richard Phillips was the publisher and a contributor on political issues. The editor for the first ten years was the literary jack-of-all-trades, Dr John Aikin. Other contributors included William Blake, Samuel Taylor Coleridge, George Dyer, Henry Neele and Charles Lamb. The magazine also published the earliest fiction of Charles Dickens, the first of what would become "Sketches by Boz".»
[2] «Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and published by Greg Shaw from 1970 to 1979. Its name came from the hit 1961 doo-wop song by Barry Mann, "Who Put the Bomp". Later, the name was shortened to "Bomp!"»
[3] «Desktop Publishing Magazine | Desktop Publishing magazine (ISSN 0884-0873) was founded, edited, and published by Tony Bove and Cheryl Rhodes of TUG/User Publications, Inc., of Redwood City, CA. ) . Its first issue appeared in October, 1985, and was created and produced on a personal computer with desktop publishing software (PageMaker on a Macintosh), preparing output on a prototype PostScript-driven typesetting machine from Mergenthaler Linotype Company. Erik Sandberg-Diment, a columnist at "The New York Times", tried to buy the venture outright when he saw an early edition.»
[4] «Self (magazine) | Self is an American magazine for women that specializes in health, wellness, beauty, and style. Part of Condé Nast, Self had a circulation of 1,515,880 and a total audience of 5,282,000 readers, according to its corporate media kit n 2013. The editor-in-chief is Carolyn Kylstra. "Self" is based in the Condé Nast U.S. headquarters at 1 World Trade Center in New York, NY. In February 2017 the magazine became an online publication.»
[5] «Self-Publishing Review | Self-Publishing Review (or "SPR") is an online book review magazine for indie authors founded in 2008 by American author Henry Baum.»
[6] «Self-publishing | Self-publishing is the publication of any book, album or other media by its author without the involvement of an established publisher. A self-published physical book is said to have been privately printed. The author is in control of the entire process including, for a book, the design of the cover and interior, formats, price, distribution, marketing, and public relations. The authors can do it all themselves or may outsource some or all of the work to companies which offer these services.»
Question: Which of these publications was most recently published, Who Put the Bomp or Self?
Reasoning: Let's think step by step in order to determine which publication was most recently published. According to the context, "Who Put the Bomp" was published from 1970 to 1979. On the other hand, "Self" magazine became an online publication in February 2017 after being a print publication. Therefore, "Self" was most recently published.
Answer: Self
---
Context:
[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»
[2] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. When he awakes he finds that he can no longer distinguish the dream from reality.»
[3] «Victorian (comics) | The Victorian is a 25-issue comic book series published by Penny-Farthing Press and starting in 1999. The brainchild of creator Trainor Houghton, the series included a number of notable script writers and illustrators, including Len Wein, Glen Orbik and Howard Chaykin.»
[4] «Jeremy Paxman | Jeremy Dickson Paxman (born 11 May 1950) is an English broadcaster, journalist, and author. He is the question master of "University Challenge", having succeeded Bamber Gascoigne when the programme was revived in 1994.»
[5] «Jeremy I | Jeremy I was king of the Miskito nation, who came to power following the death of his father, Oldman, in 1686 or 1687. according to an English visitor, W. M., in 1699, he was about 60 years old at that time, making his birth year about 1639.»
[6] «Jeremy Cheeseman | Jeremy Cheeseman (born June 6, 1990 in Manorville, New York) is a former American professional soccer player. Playing two seasons for the Dayton Dutch Lions in the USL Professional Division before retiring due to injury»
Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Reasoning: Let's think step by step in order to determine the birth year of the author who wrote "The Victorians - Their Story In Pictures." According to context [4], Jeremy Paxman, an English broadcaster and journalist, wrote and presented this documentary series. His birth year is provided in the same context.
Answer: 1950
---
Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»
[3] «Scott Shaw (artist) | Scott Shaw (often spelled Scott Shaw!) is a United States cartoonist and animator, and historian of comics. Among Scott's comic-book work is Hanna-Barbera's "The Flintstones" (for Marvel Comics and Harvey Comics), "Captain Carrot and His Amazing Zoo Crew" (for DC Comics), and "Simpsons Comics" (for Bongo Comics). He was also the first artist for Archie Comics' "Sonic the Hedgehog" comic book series.»
[4] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[5] «Scott Savitt | Scott Savitt is a former foreign correspondent for The Los Angeles Times and United Press International in Beijing. His articles have been published in The Los Angeles Times, Washington Post http://www.washingtonpost.com/wp-dyn/content/article/2008/04/18/AR2008041802635.html, Wall Street Journal, New York Times, and many other publications.»
[6] «Scott Poulson-Bryant | Scott Poulson-Bryant is an award-winning American journalist and author. One of the co-founding editors of Vibe magazine in 1992 (and the editor who gave the magazine its name), Poulson-Bryant's journalism, profiles, reviews, and essays have appeared in such publications as the "New York Times", "the Village Voice", "Rolling Stone", "Spin", "Essence", "Ebony", and "The Source". He is the author of "HUNG: A Meditation on the Measure of Black Men in America" (published by Doubleday Books in 2006) and a novel called "The VIPs".»
Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?
Reasoning: Let's think step by step in order to determine which magazine published articles by Scott Shaw. According to the context provided, Scott Shaw has contributed to "Tae Kwon Do Times," which is mentioned in both [1] and [4]. There is no mention of Scott Shaw contributing to "Southwest Art."
Answer: Tae Kwon Do Times
---
Context:
[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»
[2] «1968 FA Charity Shield | The 1968 FA Charity Shield was a football match played on 3 August 1968 between Football League champions Manchester City and FA Cup winners West Bromwich Albion. It was the 46th Charity Shield match and was played at City's home ground, Maine Road. Manchester City won 6–1.»
[3] «1973 FA Charity Shield | The 1973 FA Charity Shield was contested between Burnley and Manchester City in a fixture that took place at Maine Road.»
[4] «List of Aston Villa F.C. seasons | This is a list of seasons played by Aston Villa Football Club in English and European football, from 1879 (the year of the club's first FA Cup entry) to the most recent completed season. Aston Villa football club was founded in March, 1874, by members of the Villa Cross Wesleyan Chapel in Aston. Throughout the 1870s Aston Villa played a small amount of games. At least one game, against Aston Brook St Mary's was played with one half under Rugby rules and the other under football rules. In the 1880s the game became more formalised and in 1888, William McGregor formed the Football League with 11 other clubs.»
[5] «List of Aston Villa F.C. records and statistics | Aston Villa Football Club are an English professional association football club based in Aston, Birmingham, who currently play in the EFL Championship. The club was founded in 1874 and have played at their current home ground, Villa Park, since 1897. Aston Villa were founding members of the Football League in 1888 and the Premier League in 1992. They are one of the oldest and most successful football clubs in England, having won the First Division Championship seven times and the FA Cup seven times. In 1982 the club became one of only five English clubs to win the European Cup.»
[6] «Aston Villa F.C. | Aston Villa Football Club ( ; nicknamed Villa, The Villa, The Villans and The Lions) is a professional football club in Aston, Birmingham, that plays in the Championship, the second level of English football. Founded in 1874, they have played at their current home ground, Villa Park, since 1897. Aston Villa were one of the founder members of the Football League in 1888 and of the Premier League in 1992.»
Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield
Reasoning: Let's think step by step in order to determine the founding year of the club that played against Manchester City in the 1972 FA Charity Shield. According to context [1], the match was contested between Manchester City and Aston Villa. To find the founding year of Aston Villa, we refer to context [4], which states that Aston Villa Football Club was founded in March, 1874.
Answer: 1874
---
Context:
[1] «William Hughes Miller | William Hughes Miller (born March 16, 1941, Kosciusko, Mississippi) is a professor at the University of California, Berkeley and a leading researcher in the field of theoretical chemistry.»
[2] «William Read Miller | William Read Miller (November 23, 1823November 29, 1887) was the 12th Governor of the State of Arkansas. Born in Batesville, Arkansas; Miller was Arkansas's first native born Governor. Serving two terms in the turbulent period after Reconstruction, Miller's four-year administration marked the beginnings of New Departure Democrats in Arkansas. Running on a platform of economic growth via reconciliation between whites and freedmen, Miller often was opposed by members of his own party during the infancy of the Lost Cause ideology. His plans to pay back a large state debt including the Holford Bonds, valued at $14 million ($ million today), were often interrupted by racial violence, and his support for public schools and universities was often combated by those in his own party.»
[3] «William "Willie" Armstrong | William Armstrong was born c1804 in Painter Heugh (or Hugh), (which was an old lane dating from medieval Newcastle, a lane joining lower part of Dean Street to the higher part of Pilgrim Street), the name possibly derived from the fact that ships tied up here in the tidal parts of the Lort Burn (now filled).»
[4] «Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, United States. The population was 7,402 at the 2010 census. It is the county seat of Attala County.»
[5] «Attala County, Mississippi | Attala County is a county located in the U.S. state of Mississippi. As of the 2010 census, the population was 19,564. Its county seat is Kosciusko. Attala County is named for Atala, a fictional Native American heroine from an early-19th-century novel of the same name by François-René de Chateaubriand.»
[6] «Kosciusko Island | Kosciusko Island is an island in the Alexander Archipelago of southeastern Alaska, United States. It lies near the northwest corner of Prince of Wales Island, just across the El Capitan Passage from the larger island. The island is near Mount Francis, Holbrook Mountain, and Tokeen Peak. Kosciusko Island has a land area of 171.585 sq mi (444.403 km²), making it the 38th largest island in the United States. It had a population of 52 persons as of the 2000 census, mostly in Edna Bay, its largest community.»
Question: `William Hughes Miller was born in a city with how many inhabitants ?
Reasoning: Let's think step by step in order to Answer: 7,402
Answer: 7,402
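
The two prompts above share one layout and differ mainly in the "Reasoning" text of the demonstrations. The sketch below rebuilds that layout in plain Python to make the difference concrete. It is a simplified illustration, not DSPy's actual template code: `Demo`, `render_context`, `render_demo`, and `build_prompt` are hypothetical names, the passages are abbreviated from the output above, and the block of sample question-and-answer pairs is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One demonstration in the prompt (hypothetical container, not a DSPy class)."""
    context: List[str]
    question: str
    reasoning: str
    answer: str


INSTRUCTIONS = "Answer questions with short factoid answers."
FORMAT_SPEC = "\n".join([
    "Follow the following format.",
    "Context: may contain relevant facts",
    "Question: ${question}",
    "Reasoning: Let's think step by step in order to ${produce the answer}. We ...",
    "Answer: often between 1 and 5 words",
])
REASONING_PREFIX = "Let's think step by step in order to"


def render_context(passages: List[str]) -> str:
    # Number the retrieved passages the way they appear in the prompts above.
    return "\n".join(f"[{i}] «{p}»" for i, p in enumerate(passages, start=1))


def render_demo(demo: Demo) -> str:
    return "\n".join([
        "Context:",
        render_context(demo.context),
        f"Question: {demo.question}",
        f"Reasoning: {REASONING_PREFIX} {demo.reasoning}",
        f"Answer: {demo.answer}",
    ])


def build_prompt(demos: List[Demo], context: List[str], question: str) -> str:
    # The final block is left open at "Reasoning:" for the LM to complete.
    query = "\n".join([
        "Context:",
        render_context(context),
        f"Question: {question}",
        f"Reasoning: {REASONING_PREFIX}",
    ])
    blocks = [INSTRUCTIONS, FORMAT_SPEC, *[render_demo(d) for d in demos], query]
    return "\n---\n".join(blocks)


# Passages abbreviated from the prompts shown above.
passages = [
    "Who Put the Bomp | Who Put The Bomp was a rock music fanzine edited and "
    "published by Greg Shaw from 1970 to 1979.",
    "Self (magazine) | In February 2017 the magazine became an online publication.",
]
question = "Which of these publications was most recently published, Who Put the Bomp or Self?"

# Demo as it appears in the first prompt: the reasoning collapses to the answer.
student_demo = Demo(passages, question, "Answer: Self", "Self")

# Demo as it appears in the GPT-4 Turbo teacher prompt: the reasoning is spelled out.
teacher_demo = Demo(
    passages,
    question,
    'determine which publication was most recently published. According to the '
    'context, "Who Put the Bomp" was published from 1970 to 1979. On the other '
    'hand, "Self" magazine became an online publication in February 2017 after '
    'being a print publication. Therefore, "Self" was most recently published.',
    "Self",
)

new_context = [
    "Kosciusko, Mississippi | Kosciusko is a city in Attala County, Mississippi, "
    "United States. The population was 7,402 at the 2010 census.",
]
new_question = "William Hughes Miller was born in a city with how many inhabitants ?"

student_prompt = build_prompt([student_demo], new_context, new_question)
teacher_prompt = build_prompt([teacher_demo], new_context, new_question)
print(teacher_prompt)
```

Printing `student_prompt` next to `teacher_prompt` shows the same skeleton with a much thinner reasoning trace in the demonstration, which is the contrast the two dumps above illustrate.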


Conclusion

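Today we often rely on manual prompt engineering, abstracted at best into f-strings. And when comparing LMs, we tend to ask underspecified questions such as "how do different LMs compare on a certain problem" (to borrow the phrasing of the Stanford NLP paper).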


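But as the examples above show, DSPy's modular, composable programs and optimizers now let us answer "how do they compare on a certain problem when module X is compiled with optimizer Y". That is a well-defined question with a reproducible run, and it reduces the role of artful prompt construction in AI.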
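
To make that question concrete, here is a minimal sketch of such a comparison grid in plain Python. Everything in it is a stand-in chosen for illustration: `compare`, `evaluate`, `exact_match`, the toy "program", and the two toy "optimizers" are hypothetical and do not call DSPy. The two questions and gold answers are taken from the prompts shown earlier.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]                      # (question, gold answer)
Program = Callable[[str], str]                 # maps a question to an answer
Optimizer = Callable[[Program, List[Example]], Program]


def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()


def evaluate(program: Program, devset: List[Example]) -> float:
    # Fraction of dev questions answered exactly right.
    hits = sum(exact_match(program(q), a) for q, a in devset)
    return hits / len(devset)


def compare(
    programs: Dict[str, Program],
    optimizers: Dict[str, Optimizer],
    trainset: List[Example],
    devset: List[Example],
) -> Dict[Tuple[str, str], float]:
    # One score per (module X, optimizer Y) pair.
    results = {}
    for p_name, program in programs.items():
        for o_name, optimize in optimizers.items():
            compiled = optimize(program, trainset)
            results[(p_name, o_name)] = evaluate(compiled, devset)
    return results


# Toy stand-ins (assumptions): a program that never knows the answer, an
# optimizer that changes nothing, and one that memorizes the training set.
def baseline(question: str) -> str:
    return "unknown"


def no_op(program: Program, trainset: List[Example]) -> Program:
    return program


def memorize_trainset(program: Program, trainset: List[Example]) -> Program:
    lookup = dict(trainset)
    return lambda q: lookup.get(q, program(q))


devset = [
    ("Which of these publications was most recently published, Who Put the Bomp or Self?", "Self"),
    ("The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?", "1950"),
]
trainset = devset[:1]

results = compare(
    {"baseline": baseline},
    {"no_op": no_op, "memorize": memorize_trainset},
    trainset,
    devset,
)
for (p_name, o_name), score in results.items():
    print(f"{p_name} + {o_name}: {score:.2f}")
```

In a real setup the programs would be DSPy modules, the optimizers would be DSPy optimizers, and the dev set would be held out from training; the point of the sketch is only that each cell of the grid is a well-defined run that can be repeated.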


Source: https://medium.com/towards-data-science/prompt-like-a-data-scientist-auto-prompt-optimization-and-testing-with-dspy-ff699f030cb7