Glossary | pkmn.ai

This glossary contains specific definitions for terms that often come up in discussion of competitive Pokémon artificial intelligence. This list is not comprehensive — terms which are less frequently used / more basic / generally well-defined elsewhere are not included. If a term is missing from this page it may be covered by one of the following glossaries:

Pokémon

Artificial Intelligence

abstraction

A method for solving incomplete information games that are too large to be solved in their original form. The original game is abstracted to create a smaller game that’s strategically similar, such that the new abstract game can be solved and the solution can be mapped back to the original to serve as an approximation to the true solution to the original game. In Pokémon, techniques such as bucketing can be used to reduce the state space or the original game may be mapped to a variant which is more tractable.

action

A feasible operation on the state — i.e., a move a player can make at a stage in the game. move is ambiguous in Pokémon, as Pokémon have moves and thus some of the game theoretical moves of a player in a Pokémon battle include the action of literally using one of their Pokémon’s moves. Pokémon Showdown and pkmn instead use the terms choice and choices instead of action and actions, and PokeSim refers to this as a decision - all of these should be interchangeable, though using move to refer to an action / choice / decision should be avoided. Both player’s choices considered together are called the joint action, and applying the joint action to the battle state causes a transition.

activation function

A mathematical function applied to the output of a neuron within a neural network, introducing the non-linearity crucial for enabling the network to learn complex patterns. Various activation functions are commonly used:

sigmoid, also know as the logistic function, an introductory activation unit commonly used in logistic regression that maps values between 0 and 1 though suffers from vanishing gradient issues.
$tanh$ maps values between -1 and 1 and is similar to the sigmoid but with better gradient behavior.
ReLU, or Rectified Linear Units, outputs 0 for negative inputs and the input itself for positive values are popular for resistances of the vanishing gradient problem and fast computation. Many variants exist, such as the Leaky ReLU that avoids the “dying ReLU” problem by having a small non-zero slope for negative inputs
Softmax maps its output to values between 0 and 1 but in such a way that the total sum is 1, making it a probability distribution

agent

An intelligent agent (IA) is an entity that autonomously acts upon an environment to achieve its goals. In computer chess, the individual agents are usually referred to as engines, but in competitive Pokémon artificial intelligence circles agent or bot is preferred as engine is used to refer to the code responsible for the simulation of core Pokémon battle mechanics.

alpha-beta

Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that can’t possibly influence the final decision. Alpha-beta has been shown to also work in simultaneous move games in (Bošanský 2013) and (Saffidine 2021), where it typically gets abbreviated as SMAB. Failure types are an important concept in alpha-beta search.

AlphaGo

An influential artificial intelligence agent that achieved superhuman Go performance using a combination of Monte Carlo tree search and neural networks (Silver 2016). Along with AlphaZero (Silver 2017) and MuZero (Silver 2018), the AlphaGo architecture serves as the modern foundation for many perfect information game-playing agents (e.g., Lc0).

anytime

An algorithm that can return a valid solution to a problem even if it’s interrupted before it ends.

aspiration windows

A technique for reducing the search space in alpha-beta search by using a window around a guess of the expected value as the alpha-beta bounds. The guess is usually derived from the last iteration in iterative deepening and gradual widening of the window may be used during re-searches in the event of failure.

backward induction

Backward induction is a decision-making process that involves starting at the end of the game and working backward to determine the best course of action. It works by identifying the optimal choice at the final decision point (assuming rational actors), and then using that knowledge to determine the optimal choice at the second-to-last decision point, and so on until the beginning is reached.

bandit

The multi-armed bandit problem is a classic reinforcement learning problem that codifies the exploration–exploitation tradeoff dilemma. In the problem, a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice’s properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. In terms of competitive Pokémon AIs, bandit problems arise when determining which parts of the search tree to explore, and bandit usually refers to the particular type of algorithm being used to solve the bandit problem (e.g., UCB or Exp3).

Bayes' rule

The mathematical rule that describes how to update a belief, given some evidence. Fundamental to Bayesian inference used by Pokémon AI in combination with usage statistics to reason about what state they may be in.

Bayeselo

Bayesian Elo Rating, a commonly used tool for estimating the Elo rating based on matches between computerized agents.

belief

A player’s probabilistic assessment of the current state of the game, particularly when they have imperfect information, often represented as probability distributions over possible hidden states. Players update their beliefs based on available observations and the actions of others, and these beliefs crucially influence their decision-making process, as they attempt to choose actions that maximize their expected payoff given their understanding of the game’s true state.

bigram

A sequence of two adjacent symbols in particular order. The concept can be generalized to an arbitrary number of symbols (n-gram), though sometimes the term bigram is (incorrectly) used for the general case as well. While bigrams can crop up in many different places within competitive Pokémon artificial intelligence, when referenced in an unqualified manner they usually refer to joint probabilities processed from usage statistics such as $P(Move \mid Species \land Move)$ . These can also sometimes be called correlations.

bimatrix game

A simultaneous game for two players in which each player has a finite set of actions. Such a game can be fully described by two matrices, each describing the payoffs of one of the players. In an adversarial game, the two matrices will sum to a constant matrix.

binding

Either refers to a language binding, the application programming interface that provides glue code specifically made to allow a programming language to use a foreign library or operating system service, or a specific class of moves in Pokémon known as binding moves which prevent their target from switching and causes damage for multiple turns (also known as wrapping or partial-trapping moves).

BoN

Abbreviation for Best-of- $N$ , where $N$ refers to the number of games used to decide the winner of a set of battles between two players (usually 1 or 3, e.g., BO3).

bot

Slang term for agent derived from the word robot. Alternatively, the term may be used to characterize the parts of the client code responsible for connecting and communicating with the server and distinguishing it from the core search / knowledge components of the agent.

branching factor

The number of children at each node in a game tree; the outdegree.

bucketing

Also known as data binning, a generalization of rounding data where the original data that falls within a small interval (known as a bin or bucket) gets replaced by a value representative of that interval. This is a form of abstraction, the canonical examples of bucketing in competitive Pokémon AI research are reducing the range of possible HP values or shrinking the number of damage rolls.

cartridge

The game of Pokémon as it exists on the original Nintendo console hardware (due to the earliest generations of Pokémon being played on Game Boy ROM cartridges).

CFR

Counterfactual Regret Minimization is an iterative algorithm designed to find approximate Nash equilibria in extensive-form games with imperfect information. It works by calculating the counterfactual regret at each decision point, which measures how much a player would regret not having taken a different action, given the information available at that point. CFR then updates its strategy over multiple iterations, minimizing these regrets over time. This process leads to strategies that converge towards a Nash equilibrium, where no player can benefit by unilaterally changing their strategy. Variants include:

CFR+ which aims to speed up convergence by preventing negative regrets (Brown 2017)
MCCFR which samples a subset of actions or trajectories through the game tree instead of updating all actions (Lanctot 2009)

chance

Elements within a game that are governed by random events outside the direct control of the players. The Chance player is a conceptual player used to model these elements of uncertainty within the game and allows for the analysis of games where the outcomes depend not only on the choices of the players but also on elements of randomness. The pkmn engine exposes APIs for inspecting or influencing chance when built with certain compile-time flags (-Dchance and -Dcalc, respectively).

client

The code that communicates with the server to enable a player to play the game. In standard Pokémon formats, the client only has a partial representation of one perspective of the actual battle state stored on the server. Clients connect to the server and exchange public information with the server via a protocol.

conmeta

A constructed metagame (technically format) where the artificially created rules have been arbitrarily tailored to simplify things for artificial intelligence agents (e.g., the Shanai Cup).

consolidation

Action consolidation is a technique for combining the results of actions with identical outcomes. Consolidation is similar to transpositions in that they both involve symmetry, with the difference being that consolidation allows for skipping similar work by leveraging past results whereas transposition tables allow for skipping work that has already been done. Damage coalition is a common form of consolidation in competitive Pokémon AI.

contempt

The contempt factor reflects the estimated superiority / inferiority of the program over its opponent. Programs incorporating contempt may choose to follow different strategies against opponents of varying perceived skill level.

CSP

Constraint satisfaction problems are mathematical problems where the goal is to find a set of values for variables that satisfy a specific set of constraints or restrictions. CSP solvers are algorithms or techniques designed to find solutions to these problems, often using methods like search, inference, and specialized heuristics to explore the space of possible variable assignments and identify the combinations that fulfill all the constraints. OR-Tools is a best-in-class open-source CSP solver.

cutoff

The point at which search determines a particular branch of the game tree cannot possibly influence the final outcome and thus doesn’t need to be explored further. In negamax implementations of alpha-beta search, all cutoffs are beta-cutoffs — fail-high situations where the score backed up by the search is greater or equal to the upper bound β. Move ordering helps to arrive at cutoffs as early as possible, minimizing the time wasted searching unimportant nodes.

damage calculator

A program that determines the amount of HP a Pokémon will lose as a result of an opponent’s damaging attack. The damage calculator or calc usually takes the form of a stripped-down engine that focuses entirely on the damage formula.

damage coalition

A common form of action consolidation where damage rolls which produce the same results get combined. Damage rolls often result in the same outcome due to rounding or if the damage gets capped due to causing a target to faint or breaking a Substitute.

depth

The depth of a node is the number of edges from the node to the tree’s root node. A root node will have a depth of 0.

determinization

A simplification technique where either a stochastic game is converted into a deterministic one or where various perfect information states get sampled from the set of possible imperfect information states so that methods for solving perfect information games can be applied to games with imperfect information. The latter definition is more common, but unfortunately this technique has several weaknesses: the computational budget must be shared between the sampled perfect information games and various aspects of the search are often duplicated; strategy fusion, which occurs due to playing different strategies in different worlds when it has to find a unique strategy for all the worlds; and non-locality, which arises is from choosing locally optimal moves that are globally inferior.

dimensionality

In the context of data analysis and machine learning, dimensionality refers to the number of features (variables or attributes) that describe a data point whereas the cardinality refers to the number of values each feature can take on. High dimensionality and/or cardinality can lead to problems such as combinatorial explosion and the curse of dimensionality which renders certain approaches intractable, necessitating techniques such as dimensionality reduction, a form of simplification.

drafting

A component of certain Pokémon variants where players take turns choosing which Pokémon species will form their pool of available choices during a later team-building component.

driver

Code which operates / controls the engine, perhaps to provide additional features or to allow for integration within a broader application.

EBC

Endless Battle Clause, a rule that exists in Smogon University’s formats to guarantee that all battles end in 1000 turns or fewer, while also seeking to curtail certain specific degenerate strategies.

Extreme equilibrium, a Nash equilibrium where both players’ strategies can’t be described as convex combinations (weighted averages where the weights are non-negative and add up to 1) of other mixed strategies that form equilibria.

EEE

Enumeration of extreme equilibria, a linear programming algorithm for determining all of the possible Nash equilibrium solutions in a bimatrix game (Avis 2010).

EFG

An extensive-form game is a specification of a game allowing for the explicit representation of a number of key aspects, like the sequencing of players’ possible moves, their choices at every decision point, the (possibly imperfect) information each player has about the other player’s moves when they make a decision, and their payoffs for all possible game outcomes. Extensive-form games also allow for the representation of incomplete information in the form of chance events modeled as moves by nature. Alternatives to the extensive-form game representation include the normal-form representation that simply boils the game down to a payoff matrix, or the FOSG representation.

eligibility trace

An eligibility trace is a short-term memory mechanism in reinforcement learning that temporarily marks recently visited states or state-action pairs. When a reward is received, the eligibility trace determines which of these past experiences are eligible for learning updates, with recently visited states or actions having a stronger influence. This helps solve the problem of delayed rewards by bridging the gap between actions and their subsequent outcomes, leading to faster and more efficient learning.

Elo

The Elo rating system, a method for calculating the relative skill levels of players in zero-sum games. Notably, not “ELO”, as the Elo system is named after its creator, Arpad Elo.

embedding

A relatively low-dimensional space into which high-dimensional vectors can be translated, ideally capturing some of the semantics of the input by placing semantically similar inputs close together in the embedding space. Usually learned and then used as input to a neural network, examples of embeddings in Pokémon include poke2vec and pokemb.

emulator

Code that’s designed to mimic the outwardly observable behavior of both the hardware and software features of a real device, to allow for playing a Pokémon cartridge on arbitrary platforms.

endgame

Also known as the lategame, the final stage of the game, after the midgame. In competitive Pokémon, this stage of the game is far more tractable to solve given enough information has usually been revealed such that the game effectively turns into one of perfect information, and certain common scenarios may have been exhaustively solved and encoded in a precomputed endgame tablebase (EGTB).

engine

The core code responsible for implementing Pokémon battle mechanics. While multiple engines exist, without any other context the term engine may also refer specifically to the pkmn engine.

EPOké

A pkmn client library that parses protocol and builds up a representation of the perceived battle state (information set) using inference.

Exp3

Exponential-weight algorithm for Exploration and Exploitation, a bandit algorithm commonly used in tree bandit search. Exp3 provides weaker theoretical guarantees with respect to convergence compared to UCB though it is effective in practice and is well-suited to non-stationary environments where the best choice might change over time.

expectiminimax

An extension of the minimax algorithm to games of perfect but incomplete information in which the outcome depends on a combination of the player’s skill and chance elements.

exploitability

A measure of the advantage an opponent can gain by deviating from their strategy to specifically counter the player’s. Exploitability is important for assessing the quality of strategies — strategies that are highly exploitable allow an opponent to significantly improve their outcome by exploiting the player’s choices. ε-exploitability is a relaxed measure of exploitability that considers the best outcome an opponent can get while being at most ε away from their own Nash equilibrium payoff. ε-exploitability is much less computationally expensive to determine and can be more robust and realistic than traditional exploitability measures.

failure type

Alpha-beta search often categorizes search failures into two groups:

fail-low is when a node in the search tree results in a score lower than the current best option (α in alpha-beta), meaning a minimizing player has a way of lowering the score even further and thus the rest of the node’s branches can be ignored.
fail-high is when a node in the search tree results in a score higher than the current best option (β in alpha-beta), meaning a maximizing player has a way of improving the score even further and thus the rest of the node’s branches can be skipped.

However, if aspiration windows are being used, these terms when applied to the root position imply a need to re-search with a wider window in order to get a true score instead of an upper or lower bound respectively.

Failure terminology is also used in alpha-beta to describe the search approach:

fail-hard alpha-beta is a more traditional implementation of the algorithm where the return value of a search is strictly constrained within the α and β boundaries.
fail-soft alpha-beta allows the search to potentially return a value slightly outside of the alpha-beta window, potentially providing more information about the true value of the position (useful in combination with a transposition table) and generally the preferred method for modern search implementations).

format

The set of rules and regulations that define the game being played. In Pokémon, the format determines the generation of play as well as which configurations and combinations of Pokémon are allowed. Format is often used interchangeably with metagame, though the latter actually refers to what’s commonly seen in a format at a particular point in time (formats dictate what’s allowed and metagames describe what popular strategies arise under those constraints).

FOSG

A factored-observation stochastic game is a specification of a game that builds on the concept of a partially observable stochastic game (POSG) from multi-agent reinforcement learning by dividing each agent’s observations based on whether they represent public or private information (Kovařík 2022). Whereas an information set representation is concerned with grouping indistinguishable game histories regardless of whether the hidden information is directly related to the environment or another player’s knowledge, FOSGs explicitly factor observations into public and private components allowing for analysis specifically focused on how shared knowledge and private information shape an agent’s perception of the game state.

An overloaded abbreviation that may refer to one of three things depending on context:

Fictitious play, a learning process where each player assumes their opponents are playing stationary (unchanging) strategies. During each round, a player calculates the best response to the historically observed actions of their opponents, believing those actions represent fixed strategies. This iterative process can lead players towards converging on a Nash equilibrium in certain types of games (like zero-sum games), but it can fall short in settings where opponents actively adapt their strategies in response to the fictitious player’s actions.
Futility pruning, an optimization technique used in game tree search algorithms like minimax to improve efficiency. The basic idea is to stop evaluating a particular branch of the game tree if it becomes clear that the position is so poor for the current player that a better move is highly likely to exist elsewhere in the search space, even if it hasn’t been explored yet. This is determined by comparing the position’s current evaluation to a static threshold value or a dynamically calculated margin.
the Pokémon move Focus Punch

game

Any set of circumstances that has a result dependent on the actions of two or more decision-makers (players). In the context of competitive Pokémon battling, the game or battle can either be viewed as the combination of the team-building and piloting components (and sometimes even featuring an initial drafting component) or simply the piloting aspect. In most cases, without additional qualifiers the game can usually be understood to refer exclusively to the piloting component.

game tree

A graph representing all possible game states within a game. The complete game tree for a game is the game tree starting at the initial state and containing all possible moves from each state and is equivalent to the tree obtained from the extensive-form game representation). Given that it’s usually intractable to enumerate the entire game tree in Pokémon, AI agents usually search over a partial game tree which contains a subset of the nodes from the complete game tree.

generation

A grouping of Pokemon game releases that separate them based on the Pokémon and mechanics they include. Generations are typically referred to by acronyms based on their release titles:

Generation I: RBY, RB, RBG
Generation II: GSC, GS
Generation III: ADV, RSE, RS, RSEFRLG
Generation IV: DPP, DPPt, DP, HGSS, DPPtHGSS
Generation V: BW, BW2, B2W2
Generation VI: XY, ORAS
Generation VII: SM, USUM
Generation VIII: SS, SwSh
Generation IX: SV

genetic algorithm

Genetic algorithms (GA) are a method for solving both constrained and unconstrained optimization problems that are based on natural selection. Genetic algorithms are typically useful when the objective function is discontinuous, non-differentiable, stochastic, or highly nonlinear, and are often used to determine the values of hyperparameters.

Game Freak, primary developer and co-owner of Pokémon, responsible for the cartridge implementation.

GGP

General Game Playing is a field within artificial intelligence that focuses on creating systems capable of learning and playing a wide variety of games without being explicitly programmed for each specific one. A GGP system is provided with the rules of a game in a formal language (like the Game Description Language) and must learn to play effectively through self-play, analysis, and strategy adaptation. The goal of GGP is to develop intelligent agents that can reason about new game situations, demonstrating a level of flexibility that surpasses AI programs specialized for single games like chess or Go.

gimmick

Sometimes used to describe an unorthodox and usually suboptimal Pokémon set, more commonly used to refer to generational gimmicks which are novel mechanics which have been introduced to certain Pokémon generations which alter which actions are possible. Generational gimmicks include Mega Evolution, Z-Moves, Dynamax, and Terastallization.

Glicko

The Glicko rating system, a method used by Pokémon Showdown to assess a player’s strength. Used primarily for usage stats.

GXE

Glicko X-Act Estimate or GLIXARE, an estimate used by Pokémon Showdown of a player’s win chance against an average player.

HCE

Hand-crafted evaluation — a evaluation function written manually (as opposed to learned), often leveraging heuristics or other rule-based approaches.

heuristic

An approach for arriving at a solution by trial and error or by rules that are only loosely defined via methods which aren’t optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation.

history heuristic

A dynamic move ordering technique that works by keeping track of how often a particular move has caused cutoffs in previous searches, irrespective of the state in which the move had been made. Moves that have historically caused a high number of cutoffs are prioritized early in the search, under the assumption that they are more likely to lead to good positions and cause cutoffs again. To apply to competitive Pokémon AIs, logical or contextual inputs are usually required.

horizon effect

A problem that arises in depth-limited search algorithms where positions can be inaccurately assessed in circumstances where important consequences occur just beyond the search horizon (e.g., when facing an opponent’s move that can cause serious damage and is ultimately unavoidable but which can be temporarily avoided by the use of delaying tactics). Strategies for combating this include quiescence search or increasing the search depth for certain moves (extensions).

hyperparameter

A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are derived via training.

inference

The process of drawing conclusions from evidence and reasoning. Inferences go beyond the explicitly stated information, using prior knowledge, experience, and logic to reach a likely or plausible conclusion. Deduction is a form of inference where the conclusions are logically guaranteed given a set of premises.

information

All the knowledge and observations available to players at any given point in a game (e.g, the rules, the current state, and the past actions).

Hidden information refers to any aspect of the game state that’s not directly observable by all players. Games where some information is hidden from at least one player are considered to have imperfect information, in contrast to perfect information games.

A complete information game is one where all players know the rules, the payoffs of each possible outcome and the strategies available to everyone whereas in an incomplete information game at least one player lacks knowledge about some aspect of the game rules, payoffs, or another player’s strategy space. Complete information deals with game structure and payoffs whereas perfect information deals with game state — it is possible that a game’s information may be perfect but incomplete or complete and imperfect.

Public information is knowledge shared by all players whereas private information is knowledge held by only a single player. Private information is necessarily hidden but not all hidden information is private as it may refer to information known by a subset of more than one player or be completely unknown to all participants.

information set

An information set (IS) is the set of all possible actions in the game for a given player, built on their observations and a set for a particular player that, given what that player has observed, shows the decision vertices available to the player which are indistinguishable to them at the current point in the game. All nodes within an information set represent situations where the player has observed the same sequence of actions, despite potentially different hidden moves being made by their opponents.

input log

A log that tracks every input that has been made throughout a battle, where an input encodes the action taken by a player. However, there are several potential ways this input can be encoded:

raw input simply encodes the action type and slot (and optionally the target) (e.g., switch 4 or move 1 2 mega). This form is ultimately required by an engine to unambiguously implement the battle mechanics of Pokémon.
logical input is an encoding where move or species names are used in lieu of slots, (e.g., switch zapdos or move thunderbolt). This encoding is more intuitive and often more useful to work with when utilizing machine learning techniques, though isn’t accepted by all engines and can be ambiguous (e.g., in formats without the Species Clause).
contextual input is a class of inputs that add additional context to a logical input about the scenario the input was made in to provide necessary information for learning algorithms (e.g., Zapdos Thunderbolt vs. Starmie or Zapdos -> Starmie vs. Rhydon).

Applying anchored subsets of the input log to the initial battle state can be used to recompute past battle states, and competitive Pokémon developers may choose to improve memory overhead by storing the game tree in terms of an initial state and inputs as opposed to storing copies of battle state at each node.

interior equilibria

A mixed Nash equilibrium where all players play each of their available strategies with a positive probability — i.e., no player puts all their weight on any single strategy. Interior equilibria are often more stable than non-interior equilibria because they’re less likely to be disrupted by small changes in the game’s payoffs. This is because if one player deviates from an interior equilibrium by playing a different strategy with a small probability, the other player won’t have a strong incentive to change their strategy in response.

iterative deepening

A technique where a depth-limited search algorithm is performed multiple times with the depth increased with each iteration. While originally used as a form of timer management to allow these searches to work in an online context, iterative deepening can actually be faster in alpha-beta than simply searching a given depth immediately due to information obtained from previous searches having beneficial effects methods such as the history heuristic can have on move ordering.

iterative distillation

A knowledge distillation technique where a smaller student model is trained on the outputs of a larger, more complex teacher model, and this process is repeated multiple times. In each iteration, a new student is trained and distilled from the previous (now more capable) student. The outputs of this new student may be further improved through an optional amplification step aimed at boosting the capabilities of the student model beyond simply mirroring the previous student’s outputs before serving as the training target for the next iteration. This chain of distillation and amplification leads to successively smaller and more accurate student models, transferring knowledge without directly using the original training data.

joint action

The combination of actions chosen by all players in a simultaneous move game. An individual player’s action is not sufficient to determine an outcome in a simultaneous move game — all players’ actions are required and each possible combination of actions may result in different payoffs for each player.

killer move

A move ordering heuristic that optimizes search by prioritizing evaluating highly advantageous moves that dramatically limit an opposing player’s response options (e.g., a move that causes an opposing Pokémon to faint, possibly winning the battle).

knowledge

Knowledge is the possession of information. Knowledge can refer to an a priori understanding of a game’s mechanics and rules; can be encoded as heuristics or within an evaluation function, or can be learned procedurally.

Lc0

Leela Chess Zero, an open-source computer chess agent based on the design behind AlphaGo. Regularly competes with Stockfish for the title of strongest computer chess agent.

lead

The Pokémon on a player’s team that’s sent out first at the beginning of the battle (sometimes predetermined during team-building though otherwise determined in Team Preview).

learning

Learning refers to machine learning (ML), where statistical algorithms are leveraged to generalize and perform tasks without explicit instructions. The three main categories of machine learning are:

supervised learning, where both example inputs and their desired outputs are provided
unsupervised learning, only the inputs are provided, and
reinforcement learning, where an agent aims to perform a certain goal and is given feedback along the way that it attempts to maximize

Libratus

The first computer agent to achieve superhuman performance in no-limit Texas hold ’em poker, using a new variant of counterfactual regret minimization dubbed CFR+ (Brown 2017). Libratus was later followed by Modicum (Brown 2018), Pluribus (Brown 2019), and ReBeL (Brown 2020) — state-of-the-art agents which are similarly foundational as AlphaGo but for imperfect information games.

linear programming

Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. Typically used in competitive Pokémon AI research to solve for the Nash equilibrium.

LLM

Large language models are machine learning models built with transformers that are trained on massive amounts of data and can be used to recognize and generate text.

LMR

Late move reductions are an optimization for search that involves reducing the search depth of moves that have been ordered closer to the end of possible moves for a given state (as opposed to pruning such moves). Search on these late moves can be re-run with the full depth if they end up proving promising.

look-ahead

The depth that a search algorithm searches to.

LOS

The likelihood of superiority denotes how likely it would be for two players of the same strength to reach a certain result — in other fields called a p-value, a measure of the statistical significance of a departure from the null hypothesis.

lrsnash

A linear programming algorithm for determining all of the possible Nash equilibrium solutions in a bimatrix game published in (Avis 2010) and available in the lrslib.

match-up

Match-up (or MU) describes the advantage (or disadvantage) of a given team or Pokémon against another team or Pokémon. In the context of teams or the battle as a whole, match-up determines which team is favored to win after team-building, and often the advantage may be strongly in favor of one side.

matrix node

In simultaneous move games, a matrix node represents a specific combination of actions chosen by all players. Each node in the game’s matrix representation corresponds to an outcome of the game.

MaxDamagePlayer

A simple competitive Pokémon playing agent that chooses to use whichever move action is expected to do the most damage the next turn to the opponent, commonly seen in the literature alongside the $RandomPlayer$ when benchmarking / testing agents due to its simplicity of implementation. Certain variants of this exist depending on how much of the damage formula is implemented, with the simplest versions of $MaxDamagePlayer$ perhaps only considering each move’s effective base power after STAB and type effectiveness. A fully featured $MaxDamagePlayer$ can achieve a much higher level of play than a $RandomPlayer$ , though is extremely exploitable (this can be somewhat mitigated by the addition of some randomness or certain heuristics).

MCTS

Monte Carlo tree search is a heuristic search algorithm that combines tree search with the Monte Carlo method, a broad class of algorithms that rely on repeated random sampling to obtain results. MCTS is particularly useful for games with large decision spaces and imperfect information. The algorithm works by iteratively building a search tree where each node represents a possible game state, and edges represent actions. MCTS selectively expands the tree, guided by simulations (random playouts) that estimate the value of taking different actions from each state. This combination of strategic exploration and value estimation allows MCTS to find strong moves even in complex games like Go, where it’s impossible to evaluate every possible move.

MDP

A Markov decision process is a stochastic decision-making process that uses a mathematical framework to model the decision-making of a dynamic system in scenarios where the results are either random or controlled by a decision-maker, which makes sequential decisions over time. In competitive Pokémon AI research, MDP may also refer to the $MaxDamagePlayer$ in certain contexts.

metagame

Also known as the meta, describes the most popular strategies and approaches to playing a game. In competitive Pokémon, usage stats provide information to determine which Pokémon define a particular format’s metagame.

metaheuristic

A higher-level, problem-independent procedure or framework that outlines strategies to guide the design of heuristic optimization algorithms. Examples include genetic algorithms, simulated annealing, and Tabu search.

midgame

Term to describe everything that happens between the early game opening and the endgame.

minimax

Also known as maximin, describes the concept of minimizing the possible loss for a worst- case (maximum loss) scenario. Usually refers to the minimax search algorithm, a recursive algorithm for choosing the next move in an n-player game such that each player makes the move that maximizes the minimum value of the position resulting from the opponent’s possible moves. Furthermore, due to competitive Pokémon being a stochastic game, minimax is often actually referring to the expectiminimax algorithm. Negamax is a particularly common way of implementing minimax — instead of using two separate functions for the minimizing and maximizing player respectively it passes on a negated score such that $max(a, b) = -min(-a, -b)$ .

model

A simplified mathematical representation of a system or process based on observed data patterns that can be used to make predictions or decisions on new data. Model-based approaches to machine learning learn a representation of how the environment works and how actions lead to state changes whereas model-free approaches learn purely through trial and error and focus directly on learning the policy.

move

See action.

move ordering

The process of determining the sequence in which potential moves should be considered during a game tree search. Good move ordering is crucial for the efficiency of search algorithms like minimax or alpha-beta pruning, as exploring the most promising moves first can lead to more frequent cutoffs, allowing the algorithm to prune irrelevant sections of the search tree and reach deeper evaluations more quickly. Move ordering techniques often involve a mix of heuristics, static move evaluations, and domain-specific knowledge.

MVP

The minimal viable product, a version with the bare-minimum number of features to be usable / demonstrate enough of a solution that feedback becomes relevant.

The Nash equilibrium, a stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged. A mixed strategy Nash equilibrium or mixed Nash equilibrium (MNE) is a Nash equilibrium tgat requires a mixed strategy (a probability distribution over pure strategies), and every n-player game has at least one mixed Nash equilibrium. In competitive Pokémon, Nash equilibrium is used interchangeably with mixed Nash equilibrium.

NMH

The null move heuristic (alternatively null move pruning) is a pruning technique in alpha-beta search where the agent performs a reduced-depth search on a null move (the equivalent of passing in Pokémon in circumstances that would not normally allow for a pass). Based on the observation that in most cases there is almost always a better alternative to doing nothing, if the score after the null move is still higher than or equal to β (the minimum score the opponent can force), it indicates the position is already strong for the current player and a cutoff would likely occur even with a real move. This allows the algorithm to identify cutoffs with less computational effort.

A neural network, is a computer system modeled on the human brain and nervous system, comprised of multiple layers of nodes, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.

NNUE

An efficiently updatable neural network, stylized ƎUИИ, an architecture intended to replace the evaluation function of alpha-beta searches running on a CPU via SIMD instructions. Originally developed for Computer Shogi, NNUE has been used successfully in chess engines such as Stockfish to achieve high ratings.

node type

Search tree nodes in alpha-beta search are often classified into one of three types. These types are then stored in a transition table which indicates whether the score (an approximation of the nodes value) for a node is exact, lower or upper bounded.

PV-nodes (or Type 1 nodes) have a score that lies between the α (lower) and β (upper) bounds. These nodes all have moves searched and the value returned is exact, which propagates up to the root along with a principal variation.
Cut-nodes (also know as Type 2 or fail-high nodes) are nodes in which a beta-cutoff was performed. A minimum of one move of a cut-node needs to be searched and the score returned is a lower bound on the exact score of the node.
All-nodes (also known as Type 3 or fail-low nodes) are nodes in which no move’s score exceeded α. Every move of this type of node is searched and the score returned is an upper bound on the exact score.

null window

An alpha-beta search window defined with a very narrow range ( $\alpha = \beta - 1$ ) to test whether a move will cause a cutoff. This test is based on the idea that if a move doesn’t cause a cutoff within this narrow window, it’s highly unlikely to cause a cutoff with a wider window and therefore the rest of the search can be skipped. Null window searches are the foundation of techniques like Principal Variation Search.

observation

Information that is produced when the state transitions. In Pokémon, this may include a result that can be used to determine payoff and possible legal actions, logs which detail various events that may have occurred, an account of the chance actions, or a multitude of other possibilities depending on the engine.

offline

An offline algorithm receives all input data upfront and is required to output a solution that addresses the entire problem. If something can be done offline that usually means that it can be precomputed ahead of time, as opposed to in the middle of a game (contrast to online).

online

An online algorithm is one that can serially process its input piece-by-piece, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start (contrast to offline).

opening

The beginning of the game (sometimes referred to as the early game). In Pokémon, this consists of the Team Preview in generations where it exists as well as the first few turns of the initial lead match-up. Followed by the midgame. Many agents leverage an opening book which is a precomputed policy for handling play at this stage of the game.

opponent modeling

The process of building a representation of an opponent’s strategies, preferences, and potential actions in a competitive environment. This model can be built through observation of the opponent’s behavior, historical data, and by inferring their goals. An opponent model allows a player to proactively adjust strategies based on their understanding of the opponent’s tendencies and weaknesses and is an important factor in predicting an opponent’s actions.

oracle

A hypothetical black box or subroutine that can instantly solve a specific problem or answer a question, often with perfect accuracy. These are used in theoretical analysis to understand the limits of what can be computed and to compare the complexity of different algorithms. The term is also used less theoretically in machine-learning to refer to the source of ground truth labels or correct answers for supervised learning.

The double-oracle method refers to a particular algorithm for solving simultaneous move alpha-beta search (Bošanský 2013).

OTL

One-turn look-ahead. The $MaxDamagePlayer$ is the conventional name for an agent that leverages only a single turn of look-ahead. $N$ TL can be used to generalize this shorthand for describing the depth an agent searches to (e.g., 3TL for three turns worth of look-ahead).

pathology

Game tree pathology is a phenomenon where searching a game tree deeper results in worse decisions.

payoff

The reward a player receives from arriving at a particular outcome. In Pokémon, only the terminal states give rewards, where the reward is determined based on the result of the battle (winning, losing or drawing — though pedantically, Pokémon glitches mean that one also needs to support the notion of a battle ending in error which can usually be mapped to a draw) and the utility / reward function (which usually maps W/L/T to 1/-1/0 making Pokémon a zero-sum game or 1/0/0.5 making Pokémon a constant-sum game, the latter being more common).

PBS

A public belief state is a probability distribution over all possible hidden states of the game, based on the information that’s shared by all players. PBSs allow players to reason about what others might know and make decisions based on the likelihood of different scenarios, even when they don’t have complete information about the game’s true state.

perspective

Perspective or viewpoint determines which information in Pokémon is available. Perspective can refer to a specific player or side, or can be one of two special perspectives — the omniscient perspective which is aware of all information from both sides and the spectator perspective which is aware of all public information but none of the private information of either player.

piloting

The component of competitive Pokémon that involves playing out a battle online after each players’ teams have already been chosen.

Pinyon

A C++ library that builds on top of lrslib and provides a collection of search and planning utilities for simultaneous move stochastic games. Pinyon includes a state-of-the-art version of simultaneous move alpha-beta search using the double-oracle method (Bošanský 2013) modified with support for stochastic games as well as a generalized implementation of tree-bandit search with Exp3.

pkmn

A common abbreviation of Pokémon dating back to the earliest releases where the stylized ^P_K^M_N was included in the cartridge character set. However, this abbreviation has since been adopted as the name for the pkmn collection of projects (notably the pkmn engine, libpkmn) and is often what’s being referenced (and is always what’s being referred to when prefixed with an @, as @pkmn is the name of the organization).

player

A strategic decision-maker within the context of the game. In Pokémon, these are usually referred to as Player 1 and Player 2 (alternatively, P1 / P2 or Player A / Player B) and are technically not completely symmetrical because certain game mechanics depend on the notion of the host player (conventionally P1). In more general terms, either P1 or P2 may be considered the player vs. the enemy / foe depending on perspective — the P1 vs. P2 distinction is absolute whereas the less specific player vs. foe terminology is relative. When considering Pokémon as a bimatrix game the row player and column player are usually also used to refer to relative perspectives.

playout

The act of playing out the rest of the game from a particular node in the game tree. The term playout or rollout is sometimes confusingly used as a shorthand to describe a single iteration of a tree bandit search algorithm over the tree (i.e., the entire process of selection / expansion / simulation / backpropagation) as opposed to describing just the simulation phase. Furthermore, playouts may sometimes not actually involve playing out to a terminal node if an evaluation function is used instead. Playouts may be classified as light (faster, shallower, relying on simpler evaluation functions and focusing on exploration) or heavy (slower, deeper, more accurate, and featuring stronger evaluation)

PoG

Player of Games, the original name of a general game-playing agent that utilized improved counterfactual regret minimization to achieve superhuman performance in a variety of games (Schmid 2021).

policy

A state to action mapping — for each action possible from a given state, a policy returns the probability of taking the action. Ultimately, the objective of a competitive Pokémon AI is to determine the optimal policy for any given state. Reinforcement learning aims to accomplish this by learning a policy network.

POMDP

Partially observable Markov decision process, a generalization of a Markov decision process which models an agent decision process in which it’s assumed that the system dynamics are determined by an MDP, but the agent can’t directly observe the underlying state.

pondering

The act of thinking during an opponent’s turn. Given that Pokémon is a simultaneous move game, pondering occurs after a player has submitted their action (or been asked to wait) while they’re waiting for their next opportunity to input an action.

PPO

Proximal Policy Optimization is a model-free reinforcement learning policy gradient algorithm offering a solid combination of stability, efficiency, and performance that aims to increase the probability of actions leading to higher rewards by directly optimizing the policy itself.

prediction

An estimate of something that will happen in the future. In competitive Pokémon, a prediction usually refers to a player’s expectation of their opponent’s next action, though players also make predictions about any and all unknown information.

ProbCut

A selective search enhancement for algorithms like alpha-beta search that’s based on the idea that the result of a shallow search can be used to predict the results of deeper searches. ProbCut uses linear regression to model the relationship between shallow and deeper search values, allowing it to identify branches of the search tree that are unlikely to affect the final outcome and therefore can be pruned, improving search efficiency.

protocol

The format and rules governing how clients communicate with the server. Different simulators / engines have their own protocols and there is no single standard, though the vast majority of competitive Pokémon agents adhere to the Pokémon Showdown protocol.

pruning

A name for every heuristic that removes completely certain branches of the search tree, assuming they have no bearing on the search result. Backward pruning where the decision on what to prune happens after finding a refutation while searching (e.g. Alpha-beta) is sound and will never affect the search result — forward pruning which preemptively eliminates branches before searching comes with the risk of potentially overlooking something. Compare to reduction.

Pokémon Showdown, the most popular and influential Pokemon battle simulator.

The principal variation is a sequence of moves that programs consider best and therefore expect to be played. Principal Variation Search (PVS) is an enhancement to alpha-beta search that focuses computational effort by assuming the principal variation found in a previous, shallower search will likely remain the best at the current depth; therefore, nodes within the principal variation are searched deeply, while other nodes are searched with a narrow fail-soft window to quickly determine if they can potentially outperform the current best move.

Q-learning

A model-free reinforcement learning algorithm to learn the value of an action in a particular state that can handle problems with stochastic transitions and rewards without requiring adaptations. Q-learning was used by DeepMind in their Deep Q-Network (DQN) algorithm along with a technique called experience replay to solve a wide range of Atari games (Mnih 2015).

quiescence search

A technique used to improve the accuracy of position evaluations at leaf nodes (positions where no further moves are being generated). It specifically focuses on volatile positions where significant value changes may occur in the immediate next moves (eg. due to knocking out an opponent’s Pokémon). The idea is to extend the search depth selectively only in these volatile positions until a relatively stable queiscent position is reached, preventing premature cutoffs due to the horizon effect and leading to a more reliable evaluation.

RandomPlayer

Also known as RP, a trivial competitive Pokémon playing agent that leverages some version of a uniform policy, commonly seen in the literature alongside the $MaxDamagePlayer$ when benchmarking/ testing agents due to its simplicity of implementation. However, despite this seemingly simple definition, multiple subtly different versions of the $RandomPlayer$ exist (e.g., many implementations decide to choose only move actions unless forced to consider switch actions, and may have similar heuristics for decided how to treat generational gimmicks). While all $RandomPlayer$ implementations are forced to choose move or switch actions when they’re the only option available, nuances between implementations exist in the common scenario where either type of action is possible.

Formally, the notation $RandomPlayer(N\%)$ should be used to refer to an agent which $N\%$ of the time uniformly chooses to make a switch action when available, otherwise choosing uniformly between possible move actions. $RandomPlayer(\widetilde{N\%})$ instead always considers available move actions but only considers switch actions $N\%$ of the time unless forced. This distinction of choosing uniformly between a minimal subset of the action space or considering potentially multiple subsets of the space can be extended to gimmicks such as mega, zmove, dynamax or terastallize, though in those cases a more explicit notation is preferred (e.g., $RandomPlayer(switch=\widetilde{20\%}, zmove=90\%)$ would consider switch 20% of the time and use zmove when available 90% of the time).

reduction

A class of search heuristics that decrease the depth to which a certain branch of the tree is searched. Compare to pruning.

regret

The value of the difference between a possible (usually optimal) action and the action that has actually been chosen. The key concept underpinning counterfactual regret minimization.

relaxation

An approximation of a difficult problem by a nearby problem that’s easier to solve. A solution to the relaxed problem provides information about the original problem.

reverse damage calculator

A tool that computes information about a Pokémon’s set (ability, item, spread ranges) based on the amount of damage its moves inflict on its opponents. The inverse of a damage calculator.

reward shaping

A technique used in reinforcement learning where small intermediate rewards are given to the algorithm to help reshape sparse reward functions to denser ones to help the algorithm converge more quickly.

Reinforcement learning is a form of machine learning where an agent learns to take actions to maximize a cumulative reward. Q-learning, PPO, and Actor-Critic are three common reinforcement learning algorithms.

RNG

A random number generator, though almost always in the context of competitive Pokémon this refers to a pseudo-random number generator (PRNG). The term RNG can also simply refer to random events in general. RNGs usually produce a stream of random numbers that depend on the initial starting state of the generator, known as its seed.

roll

A colloquial term describing a chance event — the result from invoking the random number generator. The roll concept is generalized from that of a dice roll. In Pokémon, a damage roll is one of many possible damage values that result from the step in the damage formula that applies a random factor to the base computed damage of most attacking moves. See also damage coalition.

rollout

See playout. Rollout is also the name of a Pokémon move.

rule

One of a set of explicit or understood regulations or principles governing conduct within a particular activity or sphere. The rules of competitive Pokémon are dictated by the format — there are a wide variety of possible classes of custom rules which may apply. The term can also refer to a rule from a rule-based agent.

rule-based

A system that applies human-made rules to store, sort and manipulate data as opposed to rules discovered via machine learning. These systems typically take the forum of if-then statements and involve hand-crafted evaluation functions.

SAC

Soft Actor-Critic is a method of augmenting the traditional actor-critic reinforcement learning algorithm to favor policies with higher entropy, improving exploration early on. The Actor-Critic algorithm makes use of two different components (usually two separate neural networks) — an Actor that learns the policy and a Critic that learns how good actions are in a given state.

The process of exploring the possible decision paths and outcomes within a game to identify strong strategies or solutions. Search algorithms systematically evaluate different actions and their potential consequences, often building a search tree where nodes represent game states and branches represent possible moves. The goal of search is to find optimal or near-optimal strategies, considering factors like payoffs, uncertainty, and the actions of other players.

selectivity

A search algorithm’s ability to discriminate between promising and unpromising moves or areas of the search space (i.e., those likely to become part of the principal variation). A highly selective algorithm is better at focusing its computational effort on the most relevant parts of the problem, quickly identifying strong candidate solutions, and effectively reducing or pruning irrelevant branches.

self-play

A technique where a game-playing agent plays against itself repeatedly in order to generate data and improve the agent’s strategies without the need for a human opponent. During self-play, the agent learns from both its successes and failures, gradually refining its decision-making abilities within the game environment.

sequentialization

Sometimes called serialization, sequentialization refers to the process of transforming a simultaneous move game into a sequential game in order to simplify analysis or to make the game more tractable for finding solutions. Sequentialization involves introducing an artificial order of moves, potentially with information revealed to some players based on the actions of those who moved earlier, even if the original game didn’t have this sequential structure.

Generally, pessimistic sequentialization is performed such that the player’s moves are known to their opponent, giving the opponent more information and allowing them to choose responses that are most detrimental to the player (compared to optimistic sequentialization where the reverse is true). A pessimistic ordering generally leads to more conservative play focused on minimizing losses whereas optimistic sequentializations are riskier but could lead to greater payoffs.

server

Where the game engine is being run and where the actual, complete battle state is stored. The server is responsible for keeping relevant information hidden from the client.

Shannon Type

A classification of strategies for situations with too many states to explore in the time available that was coined by Claude Shannon. In this classification, a Type A Strategy considers all possible actions to a certain depth of the tree and then uses a heuristic evaluation function to estimate the utility of states at that depth (exploring a wide but shallow portion of the tree). In contrast, a Type B Strategy ignores moves that look bad and follows promising lines “as far as possible” (exploring a deep but narrow portion of the tree).

sim

A simulator is something that mimics the behavior of something by modeling the underlying state of the target, though it doesn’t necessarily attempt to do so through emulation of an external device. In the context of Pokémon, simulator more often describes a particular platform that provides a venue for online Pokémon battling, and usually involves a game server being built around a Pokémon engine. Pokémon Showdown is the most comprehensive and most popular Pokémon simulator.

SIMD

Short for Single Instruction on Multiple Data, SIMD characterizes a type of CPU instruction that allows for computing operations in parallel on a vector of numbers.

simplification

See relaxation.

slot

Refers to the 1-indexed position of either a Pokémon in a player’s party (known as a team slot or party slot) or a move in a Pokémon’s moveset (move slot). Required when specifying a player’s action.

A simultaneous move game (also simply simultaneous game or static game) is a game where each player chooses their action without knowledge of the actions chosen by other players. Contrasts an alternating move game (AM) (also known as a sequential game) where play alternates between players and each player is aware of their opponent’s prior action.

SM is also a commonly used abbreviation for Pokémon Sun & Pokémon Moon, the seventh Pokémon generation.

Smogon

Smogon University, home of competitive Pokémon battling and curator of rules and restrictions for the most popular competitive formats outside of those supported by Nintendo.

sparse

Used to describe something with low informational entropy) — i.e, something which has comparatively low information per element. A matrix that primarily consists of zeros or a game tree where most of the branches can be pruned is considered to sparse.

spread

A spread or EV spread refers to the distribution of components that make up a Pokémon’s stats — primarily EVs (effort values), but also IVs (individual values) and nature. Depending on the context, spread can also refer to Pokémon moves that target more than one Pokémon in certain formats.

STAB

An abbreviation for same-type attack bonus, a game mechanic where a damage boost gets applied to attacking moves used by a Pokémon of the same type.

state

The information required to determine which actions are available to each player. In Pokémon, this is the battle, and the term battle is used as opposed to state as state is a loaded term in programming (programming, in general, is arguably all about the management of state, so state needs to be qualified as battle state or game state to be unambiguous) despite it being a useful term in papers dealing purely with game theory.

stochastic

Used to describe a process characterized by randomness. Probabilistic goes further and denotes something where the outcome can be described by a probability distribution. Stochastic and probabilistic processes are non-deterministic, though non-deterministic describes a system or process where the outcome can’t be predicted with certainty from its initial state (and which may have multiple possible outcomes for a single input). Given the subtle distinction between these three terms they’re often used interchangeably.

strategy

Used interchangeably with policy, it refers generically to a plan to achieve an overall goal.

subgame

A notion used in the solution concept of subgame perfect Nash equilibrium, a refinement of the Nash equilibrium that eliminates non-credible threats. Subgames are a subset of a game that meets a specific set of criteria and when taken in isolation constitutes a game in its own right.

symmetry

Similarity or exact correspondence between different things. In the context of games, symmetry can result certain states occurring more than once in the game tree, meaning techniques such as transposition tables and action consolidation can be leveraged to speed up search.

TBS

Tree-bandit search, the name for the family of search algorithms which are roughly based around Monte Carlo tree search but which make use of various bandit algorithms to greatly improve performance and convergence.

TD-learning

Temporal difference learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment and perform updates based on current estimates, adjusting predictions to match later, more accurate, predictions about the future before the final outcome is known.

Team Preview

A pre-battle phase that was introduced in Generation V in which players get to see each of the Pokémon on their opponent’s team and choose which Pokémon get brought to battle and in which order.

team-building

The component of competitive Pokémon that involves constructing a team offline before a battle, usually with the aim of being able to beat the entire metagame.

Thompson sampling

A bandit algorithm which chooses the action that maximizes the expected reward with respect to a randomly drawn belief.

Torch

The de facto standard open-source machine learning and scientific computing framework, most often used in Python via PyTorch.

TPCi

The Pokémon Company International, co-owners of the Pokémon franchise along with the publisher and trademark-holder, Nintendo, and the developers, Game Freak.

transformer

A neural network that learns context and thus meaning by tracking relationships in sequential data.

transition

The change that a state undergoes due to an action. In the pkmn engine, this is implemented by a battle’s update function. In game theory, turns typically count transitions (though not in Pokémon), and a transition may produce a payoff or some sort of observation.

transitions

The entire set of possible transitions from any given state. While every Pokémon game engine must allow for performing a single transition, performing all transitions at once and returning the set of states is a less common feature, and almost always when transitions or transitions is used in conversation as a noun it’s referring to this functionality.

transposition

A sequence of moves that results in a position that may also be reached by another, more common sequence of moves. Transpositions are a type of symmetry and are often handled via transposition table. True transpositions are uncommon in Pokémon unless some level of abstraction has been applied.

A transposition table is a cache of previously seen positions, and associated evaluations, in a game tree generated by a computer game-playing program. If a position recurs via a different sequence of moves, the value of the position is retrieved from the table, avoiding re-searching the game tree below that position.

tuning

The process of maximizing a machine learning model’s performance without overfitting or creating too high of a variance, usually via hyperparameters. Tuning is also commonly performed on evaluation functions (whether learned or hand-crafted) — genetic algorithms, mathematical optimization, or other automated approaches such as CLOP or Texel’s Tuning Method are often considered.

turn

Game theory uses the term ply to refer to one turn taken by one player in a sequential game which doesn’t apply to Pokémon given that Pokémon is a simultaneous game. A turn in Pokémon is a well-defined concept as certain game mechanics depend on its definition (e.g., residual damage happens at the end of a turn in Pokémon in later generations). Instead, it’s more useful to reason in terms of decision points that occur between updates to the battle state (which may happen multiple times in a single turn, e.g., due to a Pokémon using Baton Pass or fainting) — places where the game requires input from players to proceed. In Pokémon Showdown and the pkmn engine, a decision point always requires a joint action, introducing the concept of passing / waiting to account for situations where on a console one of the players is forced to do nothing while input is collected from their opponent.

UCB

Upper Confidence Bound, a bandit algorithm commonly used in tree bandit search.

UCT

Upper Confidence bounds applied to Trees, the contemporary standard implementation of Monte Carlo tree search which uses the UCB algorithm to select promising child nodes to expand. See also tree bandit search.

uniform policy

A policy which assigns equal (uniform) probability to each of the actions possible — this sort of policy is usually only optimal in the absence of any other information (e.g., at the beginning of a search).

unmake

A function provided by an engine which allows for an action’s effect on the state to be reversed, undoing the update. Not all engines support this feature and not all actions made in competitive Pokémon can be trivially reversed. Unmake functionality and the faster, incremental, in-place updates it enables can greatly accelerate search performance compared to the alternative of being required to copy or move states in memory in order to preserve their original value.

usage stats

Statistics processed from a corpus of battle logs that detail the usage of various Pokémon and their attributes for a given format. Smogon University produces public statistics calculated from all rated battles on Pokémon Showdown each month which they use to influence the tiering decisions for the formats they maintain, though alternative usage statistics can also be produced for specific use cases.

value

The expected payoff from a given state, produced by an evaluation function. Evaluation functions are usually difficult to produce accurately in Pokémon and are often approximated with reinforcement learning via a value network that assigns a value to each state of the game by calculating an expected cumulative score. Evaluation functions can be classified as either light (simple and fast) or heavy (complex and detailed), with the former being more commonly used in addition to search whereas the latter is intended to be used on its own.

vaporware

Software that has been advertised but isn’t yet available to use / buy, either because it’s only a concept or because it’s still being written or designed. See also pkmn.

variant

A variation from the standard rules of competitive Pokémon which usually simplifies / relaxes certain properties / rules of the game to make it more tractable for AI, with the expectation that solutions to a variant are usually useful for generalizing to the original ruleset. In computer poker, Kuhn poker is the most common simplification studied, whereas in Pokémon there are numerous potential options of variants. Note that while the various Pokémon formats can be considered variants, the term variant is reserved to describe simplifications to the game that go beyond the rule changes typical between formats.

wincon

A win condition is a Pokémon or scenario that a player needs in order to ensure that they win the game.

WLT

WLT or W/L/T refers to win, loss, and tie (draw) record of a particular player or agent.

zero-sum

A zero-sum game is a mathematical representation in game theory of a situation that involves two sides, where the result is an advantage for one side and an equivalent loss for the other, such that the net improvement in benefit of the game is zero. Zero-sum games are a specific example of a constant-sum game, a concept when the sum of the players’ gains and losses is a fixed constant $C$ . Constant-sum games can be trivially transformed into a zero-sum game by subtracting $C$ from every payoff.