Wise AI support for government decision-making
It looks like you're writing a treaty. Can I help you with that?
Michaelah Gertz-Billingsley and I wrote this quickly as a submission to AI Impacts’ essay competition on automating wisdom and philosophy. Future posts here will be more conversational!
Governments have long been early adopters, or even creators, of sense-making tools — from the census to the computer to the internet. Today, they use complex statistical or mathematical models to make decisions about commensurately complex issues: healthcare, finance, infrastructure, military procurement. Present-day AI systems are growing increasingly helpful for processing information and writing code, and their creators aspire to build powerful assistants with general intelligence.
Atypically, governments are behind the curve on adopting advanced AI, in part due to concerns about its reliability. Suppose that developers gradually solve the issues that make AIs poor advisors — issues like confabulation, systemic bias, misrepresentation of their own thought processes, and limited general reasoning skills. What would make smart AI systems a good or poor fit as government advisors?

Much like smart humans, smart AIs might do a great job at answering the wrong questions. They might focus on legible or short-term benefits over more important factors that are harder to analyze, or take their users’ implicit assumptions for granted.
We recommend the development of wise decision support systems that incorporate the following features:
Helping people figure out their principles and make decisions based on them.
Helping people extend, explore, and structure their options under consideration (not just helping them choose from within the limited option space they initially consider).
Identifying and questioning implicit assumptions; exploring different frames in which to consider the problem.
Far-sightedness — incorporating longer-term and higher-order consequences of a decision (e.g. effects on relationships and long-term collaboration; precedent-setting; long-term budgeting and risk).
Aggregating views and preferences across disparate groups and stakeholders.
How can we ensure government access to wise AI decision support?
1. By default, the private interests leading AI today may focus on smart AI decision support over wisdom, as it is easier to train and test tools for short-term decision-making.
Many of their private customers may (rightly or wrongly) see short-term benefits as sufficient. This might stem from the incentive to seek short-term profits at the expense of the long term. But companies also often face specific decisions that are not that subtle and don’t particularly need wisdom — e.g. “how much of X widget should we make this year”. Unlike governments, they may be able to function entirely by making such decisions well. By contrast, we would argue that functional governments inherently need to consider some subtle, complex, long-term issues.
Governments could encourage the development of wise AI, much as they have done for technologies like vaccines and autonomous vehicles.
Organizations like DARPA could fund prizes for more testable aspects of wisdom.
New mechanisms such as advance market commitments could provide market incentives for forward-looking product development.
2. Key decision makers may not know to ask for features that will lead to the wisest decisions. Government institutions have typically been slow to adopt AI, and impactful policymakers will rarely be AI experts. Policies are often made on the basis of empirical evidence and expert views, which may not favor wisdom-serving systems if they are a new technology with relatively illegible benefits.
We propose that government-serving think tanks and consultancies start developing AI for wisdom-serving decision processes now. Such efforts might start small with today’s limited models, but could scale up in automation and sophistication as AI decision support capabilities improve. A mature field of wise, AI-powered decision support would let governments benefit from these tools without needing to judge them directly. Instead, they could rely on the judgment and track record of organizations they already have relationships with. Multiple organizations building these tools could compete and learn from each other, ensuring that wise decision support does not lag too far behind its merely smart alternatives.
Adopting AI for wise decision support: a playbook
Start by using AI tools to augment traditional decision-making and analysis processes. This allows the organization to a) make informed decisions from the start, b) not rely on AI before it’s ready to automate everything, c) generate training data to improve its AIs, and d) get experience with AI’s strengths and weaknesses.
Scale up the use of AI over time — relying on it for more tasks as it becomes more capable.
Clarify the value-add provided by wise systems over standard AI products. Identify and publicize incidents where AI systems went awry in ways that “wisdom-serving” design principles could have prevented.
Concrete example: using near-future LLMs to support a Delphi process
The Delphi method involves the following steps: session runners want to understand a topic better — often a forecast of the future or a recommendation for policymakers. They come up with a set of quantitative questions and ask a panel of experts for responses. The experts share their qualitative views as well; the facilitators guide discussion or transmit information, and the participants revise their estimates. (The process can be repeated if disagreements remain, or if the organizers want further clarity on key questions.)
In our (limited) experience, there are many frictions in this process, particularly in generating clear language and context for the questions, understanding where experts’ disagreements lie, and making efficient use of a group of experts’ time while keeping all of them updated on what the others believe. Experts often have valuable qualitative points to share that reframe the analysis, but which are more difficult to identify and convey than quantitative beliefs.
Roughly current-level AIs could already help with this process at many points:
Helping organizers come up with great questions & phrase them well
Expressing organizers’ views. Organizers could provide an LLM with access to a long document describing their background thinking in detail. Participants could query it and get on the same page without slowing down the process. This could be faster and better than providing the same background reading document for everyone, since participants will have different questions or concerns about the framing of the exercise. (Discussing such framing concerns often takes up a large portion of the allotted time.)
Transcribing the discussion and summarizing it, while identifying key points of disagreement.
Helping organizers rewrite questions on the fly in response to input.
Allowing organizers to ask qualitative questions, since LLMs can auto-summarize responses and aggregate them — e.g. “17 out of 24 respondents mentioned that they thought this level of AI capability was infeasible, so they found it difficult to concretely imagine what its impacts would be.”
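As a rough illustration of that last point, here is a minimal sketch (in Python) of what the qualitative-aggregation step might look like. The `call_llm` helper, the `ExpertResponse` structure, and the prompt wording are all hypothetical placeholders rather than a description of any existing tool; organizers would substitute whatever model API and survey infrastructure they actually use.

```python
# Minimal sketch of the qualitative-aggregation step. call_llm, ExpertResponse,
# and the prompt wording are hypothetical placeholders, not an existing tool.

from dataclasses import dataclass


@dataclass
class ExpertResponse:
    expert_id: str
    answer: str  # free-text response to a qualitative question


def call_llm(prompt: str) -> str:
    """Placeholder: swap in a call to whichever chat-model API is available."""
    raise NotImplementedError("wire this up to your model provider")


def summarize_responses(question: str, responses: list[ExpertResponse]) -> str:
    """Ask the model to cluster recurring points and report rough counts,
    e.g. '17 out of 24 respondents mentioned ...'."""
    numbered = "\n".join(f"{i + 1}. {r.answer}" for i, r in enumerate(responses))
    prompt = (
        "You are assisting a Delphi exercise.\n"
        f"Question posed to the panel: {question}\n\n"
        f"The {len(responses)} anonymised responses:\n{numbered}\n\n"
        "Group the responses into recurring themes, report how many respondents "
        "raised each theme, and flag any clear points of disagreement."
    )
    return call_llm(prompt)
```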
How can wise decision-support processes scale up over time to take advantage of accumulated data & increasing AI capabilities?
Formal AI training
What kinds of data are available?
For many forms of wisdom, user feedback can provide useful information about the model’s quality (and small amounts of fine-tuning data). For example: users may recognize better option sets or useful decision-boundary diagrams when they see them; they may understand and appreciate advice processes that put them in touch with their values, etc.
This may break down for far-sightedness — people may not truly appreciate far-sighted decision support until long after the decision is made.
OTOH: maybe people would notice — “yes this serves my long-term interests, good job identifying relevant factors ABC”
The amount of data generated is relatively sparse, so conventional fine-tuning likely wouldn’t work.
But: AIs are already okay at multi-shot learning (learning from information provided to them in a prompt).
So sufficiently powerful AIs might be able to learn useful patterns from summaries or transcripts of earlier decision-support processes, if humans flagged what helped them reach a better decision along various dimensions (far-sightedness, etc.).
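To illustrate, the sketch below assembles such a prompt from session summaries that humans have flagged as helpful along a given dimension. Every structure and field name here is a hypothetical placeholder; the point is only that sparse, human-labelled examples can be reused in-context rather than via fine-tuning.

```python
# Sketch of assembling an in-context ("multi-shot") prompt from earlier,
# human-flagged decision-support sessions. All structures and field names
# here are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class PastSession:
    summary: str          # short summary or transcript excerpt of the session
    helpful_because: str  # human-flagged reason it led to a better decision
    dimension: str        # e.g. "far-sightedness", "option generation"


def build_prompt(sessions: list[PastSession], dimension: str, new_question: str) -> str:
    """Prepend flagged examples for one dimension as in-context examples
    before the new decision-support request."""
    examples = [s for s in sessions if s.dimension == dimension]
    example_text = "\n\n".join(
        f"Earlier session:\n{s.summary}\nWhat helped: {s.helpful_because}"
        for s in examples
    )
    return (
        f"Below are past decision-support sessions flagged as strong on {dimension}.\n\n"
        f"{example_text}\n\n"
        "Drawing on these examples, advise on the following decision, paying "
        f"particular attention to {dimension}:\n{new_question}"
    )
```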
The evolving role of AI
Current AIs would typically take on an ancillary role. In the AI-assisted Delphi process we sketched out above, most of the “wisdom” is coming from human participants and the structure of the process. However, this dynamic could change over time as AI capabilities increase and deployers accumulate data and experience with how best to use them.
In a Delphi process, more capable AI could be actively involved in every step:
Human organizers could spend more time in consultation with AIs about how to write and frame the questions, including along axes like the long-term implications of particular decisions. More capable AIs could be instructed to look for unquestioned assumptions or blindspots in human organizers’ thinking. They could even forecast which questions would most likely lead to disagreements between participants, which would be most productive, which would unearth key values questions, and so on. As a result, the process could end up revolving around questions that the organizers initially hadn’t considered at all.
Likewise, participants themselves could be in close feedback loops with their own AI advisors, who could help identify their blindspots, and proactively take actions such as reaching out to other participants’ AIs to seek clarity on a potential misunderstanding. As a result, the process may more closely resemble a series of facilitated dialogues between participants than an inefficient large-group conversation.
Over time, different processes could be developed around AI capabilities. For example, the surveys themselves could be written and explained by AIs fine-tuned on relevant documents. Most of the expert dialogue could take place between AI systems instructed to represent a particular viewpoint, which only occasionally check in with human counterparts for clarification. The output could even be a queryable “living document” — a synthesized AI advisor system designed to reflect lessons from the conversation and from documents and schools of thought recommended by the participants.
Conclusions
In principle, capable AI advisor systems could massively improve society. They could compound the earlier digital revolutions’ effects of making information available and useful. But while the computer and the internet undoubtedly enriched society, they have had their downsides. Applications like social media can harm us in subtle long-term ways by offering what we seem to want in the short term, while ubiquitous internet use makes us vulnerable to cyberattacks and privacy breaches.
Similarly, the high modernist ambitions of the 1800s and 1900s were founded on the hope that top-down analysis and optimization could straightforwardly improve society. Such approaches served well for building bridges and dams, bringing millions out of poverty. But we remember them, too, for their failures when applied to systems that were unexpectedly complex, subtle, or ethically fraught.
Building beneficial AI advisor systems requires taking a lesson from history: knowledge and power must be directed wisely, with thought to long-term consequences and an appreciation for unknown unknowns. Much as previous industrial revolutions did, smart AI could present a new opportunity to reshape the world for both better and worse. We should treat that opportunity with the weight it deserves.
This post, like the rest of Expected Surprise, does not reflect the institutional views of my employer or other affiliated organizations.