
AI at NETC: Ensembling

A guide for Faculty and Students about using AI.

Universal Self-Consistency

Universal Self-Consistency (USC) is a prompting technique used to refine and improve the accuracy of answers generated by a Large Language Model (LLM). It compiles multiple responses the model has generated for the same question and then prompts the model to choose the best answer from among them.

USC builds on the concept of self-consistency, which uses multiple reasoning paths to find the most common response as a way to improve prediction confidence. Unlike standard self-consistency, which requires exact answers (like numbers) to tally votes, USC extends this approach to free-form responses by having the LLM select the most internally consistent answer from multiple generated outputs.

To use Universal Self-Consistency, follow these steps:

  1. Generate Multiple Responses with Chain-of-Thought (CoT): Begin by prompting the LLM several times with the same question, using CoT prompting so each response shows its reasoning. Record each response.
  2. Select Consistent Answer: Compile all responses into a new prompt, asking the LLM to select the most consistent answer, as in the sketch below.
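
A minimal Python sketch of these two steps, assuming a hypothetical generate function that sends a prompt to an LLM and returns its text response (the function name, prompt wording, and sample count are illustrative, not part of any specific API):

```python
from typing import Callable

def universal_self_consistency(
    question: str,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt in, text out
    n_samples: int = 5,
) -> str:
    """Sample several free-form answers, then ask the model to pick the
    most internally consistent one."""
    # Step 1: generate multiple chain-of-thought responses to the same
    # question. (Assumes `generate` samples with nonzero temperature,
    # so the responses differ from one another.)
    cot_prompt = f"{question}\nLet's think step by step."
    responses = [generate(cot_prompt) for _ in range(n_samples)]

    # Step 2: compile all responses into a new prompt and ask the model
    # to select the most consistent answer.
    compiled = "\n\n".join(
        f"Response {i + 1}:\n{r}" for i, r in enumerate(responses)
    )
    selection_prompt = (
        f"I have generated the following responses to the question:\n"
        f"{question}\n\n{compiled}\n\n"
        "Evaluate these responses and select the most consistent answer."
    )
    return generate(selection_prompt)
```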

Mixture of Reasoning Experts (MoRE)

A technique designed to improve the generalization of Large Language Models (LLMs) across different question types in question answering (QA). While LLMs have shown impressive performance, they often struggle with questions that require different reasoning skills, such as factual, multi-hop, mathematical, or commonsense reasoning. MoRE addresses this challenge by using specialized language models, each trained for a specific reasoning type.

MoRE also introduces a novel approach to selective QA, where the system decides to abstain from answering when confidence in its prediction is low. This helps the system answer accurately when possible and avoid giving incorrect answers.

To use MoRE, follow this process:

  1. Input Question: You ask a question.
  2. Expert Predictions: Each specialized model generates an answer based on its reasoning expertise.
  3. Answer Selection: The answer selector chooses the most reliable answer, based on agreement among experts and prediction confidence.
  4. Selective Answering: If no reliable answer is found, the system abstains (a simplified sketch follows this list).
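
A simplified Python sketch of this pipeline, assuming each expert is a function that returns an (answer, confidence) pair. MoRE's actual answer selector is a trained model, so the agreement-weighted scoring below is only an illustrative stand-in:

```python
from typing import Callable, Optional

# Hypothetical expert type: a function mapping a question to an
# (answer, confidence) pair.
Expert = Callable[[str], tuple[str, float]]

def more_answer(
    question: str,
    experts: dict[str, Expert],  # e.g. {"factual": ..., "math": ..., "commonsense": ...}
    threshold: float = 0.5,
) -> Optional[str]:
    """Route a question through specialized reasoning experts; return the
    best-scoring answer, or None to abstain when confidence is too low."""
    # Step 2: each expert produces a candidate answer with a confidence score.
    predictions = [expert(question) for expert in experts.values()]

    # Step 3: score each candidate by combining its confidence with the
    # fraction of experts that agree on it (an illustrative heuristic
    # standing in for MoRE's trained answer selector).
    scores: dict[str, float] = {}
    for answer, confidence in predictions:
        agreement = sum(1 for a, _ in predictions if a == answer) / len(predictions)
        scores[answer] = max(scores.get(answer, 0.0), confidence * agreement)

    best_answer, best_score = max(scores.items(), key=lambda kv: kv[1])

    # Step 4: abstain when no candidate clears the confidence threshold.
    return best_answer if best_score >= threshold else None
```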


Max Mutual Information (MMI) Method

MMI is a method for choosing the optimal prompt template for your task: it scores each candidate template by the mutual information between the prompt and the model's output, then selects the template from your list that maximizes that score.

Mutual information (MI) is a concept from information theory that quantifies how much information two variables share. In this case, it measures how much a given prompt reveals about the model's output. The intuition is that a prompt with high MI is more likely to produce accurate responses, even if we don’t know the "right" answer ahead of time.
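
Concretely, the mutual information between inputs X and model outputs Y can be written as I(X; Y) = H(Y) − H(Y | X), where H(Y) is the entropy of the model's outputs pooled across all sample inputs and H(Y | X) is the average entropy of the output for any single input. A template scores high when the model's answers vary across different inputs (high H(Y)) but are confident for each individual input (low H(Y | X)).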

To use MMI, follow these steps:

  1. Generate Templates: Generate a set of prompt templates for your task. This can be done manually or by generating them with an LLM.
  2. Run Sample Inputs: Run a few sample inputs through the model with each template to verify that the prompts generate reasonable outputs.
  3. Calculate Mutual Information Scores: Plug each template, along with its sample inputs and the model's outputs, into the mutual information calculation to get a score for each template.
  4. Choose the Template With the Highest Score: Use whichever template received the highest mutual information score as your prompt (see the sketch below).
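
A minimal Python sketch of steps 3 and 4, assuming a hypothetical output_probs function that returns the model's probability distribution over a fixed set of answer options for a given prompt; the helper names and the "{}" placeholder convention in the templates are assumptions for illustration:

```python
import math
from typing import Callable

def mutual_information_score(
    template: str,                # contains a "{}" slot for the input
    sample_inputs: list[str],
    output_probs: Callable[[str], list[float]],  # hypothetical: prompt -> distribution over answer options
) -> float:
    """Estimate I(X; Y) = H(Y) - H(Y|X) for one template: the entropy of
    the averaged output distribution minus the average per-input entropy."""
    def entropy(dist: list[float]) -> float:
        return -sum(p * math.log(p) for p in dist if p > 0)

    # Collect the model's output distribution for each filled-in template.
    distributions = [output_probs(template.format(x)) for x in sample_inputs]

    # H(Y): entropy of the marginal (averaged) output distribution.
    marginal = [
        sum(dist[i] for dist in distributions) / len(distributions)
        for i in range(len(distributions[0]))
    ]
    # H(Y|X): average entropy of each individual output distribution.
    conditional = sum(entropy(dist) for dist in distributions) / len(distributions)

    return entropy(marginal) - conditional

# Step 4: pick the template with the highest score.
# best_template = max(templates, key=lambda t: mutual_information_score(t, inputs, output_probs))
```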