HMAW: Hierarchical Multi-Agent Workflow for Prompt Optimization

Australian National University, Cisco
ICLR 2025 Workshop on Reasoning and Planning for LLMs

Examples comparing the generalization ability of existing methods and the proposed one. (a) CoT uses a handcrafted prompt, which might not be suitable for all tasks. (b) APE fine-tunes the prompt on a specific dataset, and its generalization to other scenarios is questionable. (c) ExpertPrompting includes few-shot examples in the system prompt to help an LLM convert the user query into a format more suitable for the LLM, but these examples might not cover all scenarios. (d) Our method adopts a hierarchical design for reformatting the user query. Free from pre-defined few-shot examples, the interaction within the LLM hierarchy allows for more generalizable yet more adaptive tuning of the prompt.

Abstract

Large language models (LLMs) have shown great progress in responding to user questions, enabling a multitude of diverse applications. Yet, the quality of LLM outputs heavily depends on the prompt design, where a good prompt might enable the LLM to answer a very challenging question correctly. Therefore, recent works have developed many strategies for improving the prompt, including both manual crafting and in-domain optimization. However, their efficacy in unrestricted scenarios remains questionable, as the former depends on human design for specific questions and the latter usually generalizes poorly to unseen scenarios. To address these problems, we give LLMs the freedom to design the best prompts on their own. Specifically, we employ a hierarchy of LLMs: they first construct a prompt with precise instructions and accurate wording in a hierarchical manner, and then use this prompt to generate the final answer to the user query. We term this pipeline Hierarchical Multi-Agent Workflow, or HMAW. In contrast with prior works, HMAW imposes no human restriction, requires no training, and is completely task-agnostic while remaining capable of adjusting to the nuances of the underlying task. Through both quantitative and qualitative experiments across multiple benchmarks, we verify that despite its simplicity, the proposed approach can create detailed and suitable prompts, further boosting the performance of current LLMs.

Method Overview


We propose modeling prompt optimization as a zero-shot task within a multi-agent workflow. The initial query, qi, is first fed into the first layer of our framework (the CEO layer). Before being processed by the CEO LLM agent, qi is transformed into an LLM prompt pic by the prompter fc, which concatenates it with the context Cc of the CEO layer. The output of the first layer, qic, serves as the query from the CEO layer to the Manager layer.

Similarly, the Manager layer and the Worker layer each include their own prompters, fm and fw, respectively. In addition to each layer's own context, the initial query qi is also concatenated to enhance stability. The input to the Worker LLM is our optimized prompt pi*, which directly triggers the LLM agent to generate the final response to the original query qi.
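The three-layer flow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `llm` stands in for any chat-completion call, and the role contexts inside `make_prompter` are hypothetical placeholders for the actual layer contexts Cc, Cm, and Cw.

```python
def make_prompter(role_context):
    """Build a prompter f that wraps a layer's context around an incoming query."""
    def prompter(query, initial_query):
        # Each layer's prompt concatenates the layer context, the original
        # user query (included at every layer for stability), and the
        # instruction passed down from the layer above.
        return (f"{role_context}\n"
                f"Original user query: {initial_query}\n"
                f"Instruction from the layer above: {query}")
    return prompter

def hmaw(llm, initial_query):
    """Run the CEO -> Manager -> Worker hierarchy and return the final answer."""
    # Role contexts below are illustrative stand-ins for Cc, Cm, Cw.
    f_c = make_prompter("You are a CEO. Give high-level guidance for answering the query.")
    f_m = make_prompter("You are a Manager. Refine the guidance into concrete instructions.")
    f_w = make_prompter("You are a Worker. Follow the instructions to answer the query.")

    q_c = llm(f_c(initial_query, initial_query))  # CEO layer output (qic)
    q_m = llm(f_m(q_c, initial_query))            # Manager layer output
    p_star = f_w(q_m, initial_query)              # optimized prompt pi*
    return llm(p_star)                            # Worker generates the final response
```

Note that only the final `llm` call answers the user; the earlier calls exist solely to construct the optimized prompt pi*, which is why the method needs no training and no hand-written few-shot examples.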

Results


An example of prompt optimization using HMAW on the Education dataset.


A case study of HMAW on the CodeNet Dataset.


A case study of HMAW on the GSM8K dataset. Colored text indicates content coherence.

BibTeX

@misc{liu2024hierarchical,
        title={Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization}, 
        author={Yuchi Liu and Jaskirat Singh and Gaowen Liu and Ali Payani and Liang Zheng},
        year={2024},
        eprint={2405.20252},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
  }