diff --git a/.gitignore b/.gitignore index ebc21507c..9acfeeb47 100644 --- a/.gitignore +++ b/.gitignore @@ -20,6 +20,7 @@ _build build *.egg-info *.pyc +*.so config.guess config.sub @@ -37,3 +38,5 @@ gambit .python-version dist .venv +*.dmg +Gambit.app/* \ No newline at end of file diff --git a/.readthedocs.yml b/.readthedocs.yml index 1dd08d262..af921b438 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -11,6 +11,7 @@ build: python: "3.13" apt_packages: - libgmp-dev + - pandoc python: install: diff --git a/doc/conf.py b/doc/conf.py index 57de0b53a..21b5325ba 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -26,6 +26,7 @@ "IPython.sphinxext.ipython_console_highlighting", "IPython.sphinxext.ipython_directive", "sphinx_design", + "nbsphinx", ] # IPython directive configuration diff --git a/doc/developer.contributing.rst b/doc/developer.contributing.rst index daf27fcac..65857f180 100644 --- a/doc/developer.contributing.rst +++ b/doc/developer.contributing.rst @@ -22,6 +22,8 @@ When reporting a bug, please be sure to include the following: sample game file or files if appropriate; it is often helpful to simplify the game if possible. +.. _contributing-code: + Contributing code ---------------- @@ -57,31 +59,31 @@ The project is hosted on GitHub, and contributions can be made via pull requests Editing this documentation -------------------------- -1. If you haven't already, clone the Gambit repository from GitHub: :: +1. `Install Pandoc `_ for your OS + +2. If you haven't already, clone the Gambit repository from GitHub: :: git clone https://github.com/gambitproject/gambit.git cd gambit -2. Either install the docs requirements into your existing PyGambit development environment, or create a new virtual environment and install both the requirements and PyGambit there. For example, you can use `venv` to create a new environment: :: +3. 
Either install the docs requirements into your existing PyGambit development environment, or create a new virtual environment and install both the requirements and PyGambit there. For example, you can use `venv` to create a new environment: :: python -m venv docenv source docenv/bin/activate -3. Install the requirements and make the docs: :: +4. Install the requirements and make the docs: :: pip install . cd doc pip install -r requirements.txt make html # or make livehtml for live server with auto-rebuild -4. Open ``doc/_build/html/index.html`` in your browser to view the documentation. +5. Open ``doc/_build/html/index.html`` in your browser to view the documentation. -5. Make any changes you want to the `.rst` files in the ``doc`` directory and rebuld the documentation to check your changes. +6. Make any changes you want to the `.rst` files in the ``doc`` directory and rebuild the documentation to check your changes. -6. Follow the usual GitHub workflow to commit your changes and push them to the repository. +7. Follow the usual GitHub workflow (see :ref:`contributing-code` above) to commit your changes and push them to the repository. -7. Core developers will review your changes and merge to the master branch, which automatically deploys the documentation via the ReadTheDocs service. +8. Core developers will review your changes and merge to the master branch, which automatically deploys the documentation via the ReadTheDocs service. -.. TODO: Add instructions for the GitHub workflow during contributor docs refactoring. - See https://github.com/gambitproject/gambit/issues/541 diff --git a/doc/gui.rst b/doc/gui.rst index 361681c40..9e83e59ab 100644 --- a/doc/gui.rst +++ b/doc/gui.rst @@ -25,6 +25,68 @@ To build larger games or to explore parameter spaces of a game systematically, it is recommended to use :ref:`the Python package `. 
+Installation +------------ + +To install the Gambit GUI, visit the `Gambit releases page on GitHub `_ and download the appropriate installer or package for your operating system. +Each release includes pre-built binaries for Windows, macOS, and Linux distributions, accessible under the "Assets" section of each release. + +.. dropdown:: Manual macOS Build Instructions + :class-container: sd-border-0 + + To build and install the Gambit GUI from source on macOS, follow these steps: + + 1. **Install build dependencies:** + + .. code-block:: bash + + brew install automake autoconf libtool + + .. note:: + If you encounter interpreter errors with autom4te, you may need to ensure + your Perl installation is correct or reinstall the autotools: + + .. code-block:: bash + + brew reinstall automake autoconf libtool + + 2. **Download and build wxWidgets:** + + .. code-block:: bash + + curl -L -O https://github.com/wxWidgets/wxWidgets/releases/download/v3.2.8/wxWidgets-3.2.8.tar.bz2 + tar xjf wxWidgets-3.2.8.tar.bz2 + cd wxWidgets-3.2.8 + mkdir build-release + cd build-release + ../configure --disable-shared --disable-sys-libs + make -j4 + sudo make install + + 3. **Build and install Gambit:** + + Navigate back to the Gambit source directory and run: + + .. code-block:: bash + + aclocal + automake --add-missing + autoconf + ./configure + make + sudo make install + + 4. **Create macOS application bundle:** + + To create a distributable DMG file: + + .. code-block:: bash + + make osx-dmg + + 5. **Install the application:** + + After creating the DMG file, open it and drag the Gambit application to your Applications folder. .. toctree:: :maxdepth: 2 diff --git a/doc/index.rst b/doc/index.rst index 2fcb284f3..410c72cbe 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -7,13 +7,13 @@ construction and analysis of finite extensive and strategic games. .. grid:: - .. grid-item-card:: Python user guide + .. 
grid-item-card:: Python tutorials and user guide :columns: 6 An introduction to using the ``pygambit`` package in Python. - .. button-ref:: pygambit-user + .. button-ref:: pygambit :ref-type: ref :click-parent: :color: secondary diff --git a/doc/pygambit.external_programs.rst b/doc/pygambit.external_programs.rst new file mode 100644 index 000000000..8877ad241 --- /dev/null +++ b/doc/pygambit.external_programs.rst @@ -0,0 +1,31 @@ +Using external programs to compute Nash equilibria +================================================== + +Because the problem of finding Nash equilibria can be expressed in various +mathematical formulations (see [McKMcL96]_), it is helpful to make use +of other software packages designed specifically for solving those problems. + +There are currently two integrations offered for using external programs to solve +for equilibria: + +- :py:func:`.enummixed_solve` supports enumeration of equilibria in + two-player games via `lrslib`. [#lrslib]_ +- :py:func:`.enumpoly_solve` supports computation of totally-mixed equilibria + on supports in strategic games via `PHCpack`. [#phcpack]_ + +For both calls, using the external program requires passing the path to the +executable (via the `lrsnash_path` and `phcpack_path` arguments, respectively). + +The user must download and compile or install these programs on their own; these are +not packaged with Gambit. The solver calls do take care of producing the required +input files, and reading the output to convert into Gambit objects for further +processing. + + +.. [#lrslib] http://cgm.cs.mcgill.ca/~avis/C/lrs.html + +.. [#phcpack] https://homepages.math.uic.edu/~jan/PHCpack/phcpack.html + +.. [McKMcL96] McKelvey, Richard D. and McLennan, Andrew M. (1996) Computation of equilibria + in finite games. In Handbook of Computational Economics, Volume 1, + pages 87-142. 
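The new `doc/pygambit.external_programs.rst` page says both solver calls need the path to an externally installed executable. A minimal sketch of that call pattern for `enummixed_solve` follows. It assumes the lrslib binary is named `lrsnash` and is either on `PATH` or in a directory you pass in. The helper names `find_executable` and `enummixed_with_lrsnash` are hypothetical and are not part of PyGambit; only `lrsnash_path` is taken from the page itself.

```python
import shutil
from pathlib import Path


def find_executable(name, extra_dirs=()):
    """Return the path to an external solver binary, or None if it is not found."""
    # First consult PATH, as a shell would.
    found = shutil.which(name)
    if found is not None:
        return Path(found)
    # Then try any user-supplied directories (for example a local build tree).
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def enummixed_with_lrsnash(game, executable="lrsnash", extra_dirs=()):
    """Call enummixed_solve through lrslib, failing early if the binary is missing."""
    # Imported lazily so find_executable can be used without pygambit installed.
    import pygambit as gbt

    path = find_executable(executable, extra_dirs)
    if path is None:
        raise FileNotFoundError(f"{executable!r} not found; install lrslib separately")
    # lrsnash_path is the argument named on the documentation page above.
    return gbt.nash.enummixed_solve(game, lrsnash_path=path)
```

Checking for the binary before calling the solver gives a clear error when lrslib has not been installed, which matters because the page notes these programs are not packaged with Gambit. The same pattern would apply to `enumpoly_solve` with `phcpack_path`.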
diff --git a/doc/pygambit.rst b/doc/pygambit.rst index 556a80afb..39cc4f4c8 100644 --- a/doc/pygambit.rst +++ b/doc/pygambit.rst @@ -4,18 +4,70 @@ PyGambit ======== -Gambit provides a Python package, ``pygambit``, which is available on `PyPI -`_. +The Gambit Python package, ``pygambit``, is available on `PyPI `_ and can be installed with pip:: -Installation ------------- + pip install pygambit -To install the package, use the following command:: - pip install pygambit +For newcomers to Gambit, we recommend reading through the PyGambit tutorials, which demonstrate the API's key capabilities for analyzing and solving games. +These tutorials can be run interactively as Jupyter notebooks; see :ref:`local_tutorials`. +All of the tutorials assume a basic knowledge of programming in Python. + +Tutorials **1-3** assume no prior knowledge of Game Theory or the PyGambit API and provide detailed explanations of the concepts and code. + +.. toctree:: + :maxdepth: 2 + + tutorials/01_quickstart + tutorials/02_extensive_form + tutorials/03_poker + +Tutorials **4-5** assume some familiarity with the PyGambit API and Game Theory terminology and concepts, including: + +- Nash equilibria +- Pure and mixed strategies +- Simplex representations of available strategies +- Logit quantal response equilibrium (LQRE) correspondence + +.. toctree:: + :maxdepth: 2 + + tutorials/04_starting_points + tutorials/05_quantal_response + tutorials/06_gambit_with_openspiel + +You may also wish to read: + +.. toctree:: + :maxdepth: 2 + + tutorials/running_locally + pygambit.external_programs + +Algorithms for computing Nash equilibria +---------------------------------------- + +Interfaces to algorithms for computing Nash equilibria are provided in :py:mod:`pygambit.nash`. +The table below summarizes the available PyGambit functions and the corresponding Gambit CLI commands. 
+ +========================================== ======================================== +CLI command PyGambit function +========================================== ======================================== +:ref:`gambit-enumpure ` :py:func:`pygambit.nash.enumpure_solve` +:ref:`gambit-enummixed ` :py:func:`pygambit.nash.enummixed_solve` +:ref:`gambit-lp ` :py:func:`pygambit.nash.lp_solve` +:ref:`gambit-lcp ` :py:func:`pygambit.nash.lcp_solve` +:ref:`gambit-liap ` :py:func:`pygambit.nash.liap_solve` +:ref:`gambit-logit ` :py:func:`pygambit.nash.logit_solve` +:ref:`gambit-simpdiv ` :py:func:`pygambit.nash.simpdiv_solve` +:ref:`gambit-ipa ` :py:func:`pygambit.nash.ipa_solve` +:ref:`gambit-gnm ` :py:func:`pygambit.nash.gnm_solve` +========================================== ======================================== + +API documentation +---------------- .. toctree:: :maxdepth: 2 - pygambit.user - pygambit.api + pygambit.api \ No newline at end of file diff --git a/doc/pygambit.user.rst b/doc/pygambit.user.rst deleted file mode 100644 index 293ff3565..000000000 --- a/doc/pygambit.user.rst +++ /dev/null @@ -1,872 +0,0 @@ -.. _pygambit-user: - -User guide ----------- - -Example: One-shot trust game with binary actions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -[Kre90]_ introduced a game commonly referred to as the **trust game**. -We will build a one-shot version of this game using ``pygambit``'s game transformation -operations. - -There are two players, a **Buyer** and a **Seller**. -The Buyer moves first and has two actions, **Trust** or **Not trust**. -If the Buyer chooses **Not trust**, then the game ends, and both players -receive payoffs of 0. -If the Buyer chooses **Trust**, then the Seller has a choice with two actions, -**Honor** or **Abuse**. -If the Seller chooses **Honor**, both players receive payoffs of 1; -if the Seller chooses **Abuse**, the Buyer receives a payoff of -1 and the Seller -receives a payoff of 2. 
- -We create a game with an extensive representation using :py:meth:`.Game.new_tree`: - -.. ipython:: python - - import pygambit as gbt - g = gbt.Game.new_tree(players=["Buyer", "Seller"], - title="One-shot trust game, after Kreps (1990)") - - -The tree of the game contains just a root node, with no children: - -.. ipython:: python - - g.root - g.root.children - - -To extend a game from an existing terminal node, use :py:meth:`.Game.append_move`: - -.. ipython:: python - - g.append_move(g.root, "Buyer", ["Trust", "Not trust"]) - g.root.children - -We can then also add the Seller's move in the situation after the Buyer chooses Trust: - -.. ipython:: python - - g.append_move(g.root.children[0], "Seller", ["Honor", "Abuse"]) - -Now that we have the moves of the game defined, we add payoffs. Payoffs are associated with -an :py:class:`.Outcome`; each :py:class:`Outcome` has a vector of payoffs, one for each player, -and optionally an identifying text label. First we add the outcome associated with the -Seller proving themselves trustworthy: - -.. ipython:: python - - g.set_outcome(g.root.children[0].children[0], g.add_outcome([1, 1], label="Trustworthy")) - -Next, the outcome associated with the scenario where the Buyer trusts but the Seller does -not return the trust: - -.. ipython:: python - - g.set_outcome(g.root.children[0].children[1], g.add_outcome([-1, 2], label="Untrustworthy")) - -And, finally the outcome associated with the Buyer opting out of the interaction: - -.. ipython:: python - - g.set_outcome(g.root.children[1], g.add_outcome([0, 0], label="Opt-out")) - -Nodes without an outcome attached are assumed to have payoffs of zero for all players. -Therefore, adding the outcome to this latter terminal node is not strictly necessary in Gambit, -but it is useful to be explicit for readability. - -.. [Kre90] Kreps, D. (1990) "Corporate Culture and Economic Theory." - In J. Alt and K. 
Shepsle, eds., *Perspectives on Positive Political Economy*, - Cambridge University Press. - - -.. _pygambit.user.poker: - -Example: A one-card poker game with private information -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To illustrate games in extensive form, [Mye91]_ presents a one-card poker game. -A version of this game also appears in [RUW08]_, as a classroom game under the -name "stripped-down poker". This is perhaps the simplest interesting game -with imperfect information. - -In our version of the game, there are two players, **Alice** and **Bob**. -There is a deck of cards, with equal numbers of **King** and **Queen** cards. -The game begins with each player putting $1 in the pot. -One card is dealt at random to Alice; Alice observes her card but Bob does not. -After Alice observes her card, she can choose either to **Raise** or to **Fold**. -If she chooses to Fold, Bob wins the pot and the game ends. -If she chooses to Raise, she adds another $1 to the pot. -Bob then chooses either to **Meet** or **Pass**. If he chooses to Pass, -Alice wins the pot and the game ends. -If he chooses to Meet, he adds another $1 to the pot. -There is then a showdown, in which Alice reveals her card. If she has a King, -then she wins the pot; if she has a Queen, then Bob wins the pot. 
- -We can build this game using the following script:: - - g = gbt.Game.new_tree(players=["Alice", "Bob"], - title="One card poker game, after Myerson (1991)") - g.append_move(g.root, g.players.chance, ["King", "Queen"]) - for node in g.root.children: - g.append_move(node, "Alice", ["Raise", "Fold"]) - g.append_move(g.root.children[0].children[0], "Bob", ["Meet", "Pass"]) - g.append_infoset(g.root.children[1].children[0], - g.root.children[0].children[0].infoset) - alice_winsbig = g.add_outcome([2, -2], label="Alice wins big") - alice_wins = g.add_outcome([1, -1], label="Alice wins") - bob_winsbig = g.add_outcome([-2, 2], label="Bob wins big") - bob_wins = g.add_outcome([-1, 1], label="Bob wins") - g.set_outcome(g.root.children[0].children[0].children[0], alice_winsbig) - g.set_outcome(g.root.children[0].children[0].children[1], alice_wins) - g.set_outcome(g.root.children[0].children[1], bob_wins) - g.set_outcome(g.root.children[1].children[0].children[0], bob_winsbig) - g.set_outcome(g.root.children[1].children[0].children[1], alice_wins) - g.set_outcome(g.root.children[1].children[1], bob_wins) - -All extensive games have a chance (or nature) player, accessible as -``.Game.players.chance``. Moves belonging to the chance player can be added in the same -way as to personal players. At any new move created for the chance player, the action -probabilities default to uniform randomization over the actions at the move. - -In this game, information structure is important. Alice knows her card, so the two nodes -at which she has the move are part of different information sets. The loop:: - - for node in g.root.children: - g.append_move(node, "Alice", ["Raise", "Fold"]) - -causes each of the newly-appended moves to be in new information sets. In contrast, Bob -does not know Alice's card, and therefore cannot distinguish between the two nodes at which -he has the decision. 
This is implemented in the following lines:: - - g.append_move(g.root.children[0].children[0], "Bob", ["Meet", "Pass"]) - g.append_infoset(g.root.children[1].children[0], - g.root.children[0].children[0].infoset) - -The call :py:meth:`.Game.append_infoset` adds a move at a terminal node as part of -an existing information set (represented in ``pygambit`` as an :py:class:`.Infoset`). - - -.. [Mye91] Myerson, Roger B. (1991) *Game Theory: Analysis of Conflict*. - Cambridge: Harvard University Press. - -.. [RUW08] Reiley, David H., Michael B. Urbancic and Mark Walker. (2008) - "Stripped-down poker: A classroom game with signaling and bluffing." - *The Journal of Economic Education* 39(4): 323-341. - - - -Building a strategic game -~~~~~~~~~~~~~~~~~~~~~~~~~ - -Games in strategic form, also referred to as normal form, are represented solely -by a collection of payoff tables, one per player. The most direct way to create -a strategic game is via :py:meth:`.Game.from_arrays`. This function takes one -n-dimensional array per player, where n is the number of players in the game. -The arrays can be any object that can be indexed like an n-times-nested Python list; -so, for example, `numpy` arrays can be used directly. - -For example, to create a standard prisoner's dilemma game in which the cooperative -payoff is 8, the betrayal payoff is 10, the sucker payoff is 2, and the noncooperative -payoff is 5: - -.. ipython:: python - - import numpy as np - m = np.array([[8, 2], [10, 5]]) - g = gbt.Game.from_arrays(m, np.transpose(m)) - g - -The arrays passed to :py:meth:`.Game.from_arrays` are all indexed in the same sense, that is, -the top level index is the choice of the first player, the second level index of the second player, -and so on. Therefore, to create a two-player symmetric game, as in this example, the payoff matrix -for the second player is transposed before passing to :py:meth:`.Game.from_arrays`. 
- -There is a reverse function :py:meth:`.Game.to_arrays` that produces -the players' payoff tables given a strategic game. The output is the list of ``numpy`` arrays, -where the number of produced arrays is equal to the number of players. - -.. ipython:: python - - m, m_transposed = g.to_arrays() - m - -The optional parameter `dtype`` controls the data type of the payoffs in the generated arrays. - -.. ipython:: python - - m, m_transposed = g.to_arrays(dtype=float) - m - -The function supports any type which can convert from Python's `fractions.Fraction` type. -For example, to convert the payoffs to their string representations via `str`: - -.. ipython:: python - - m, m_transposed = g.to_arrays(dtype=str) - m - -.. _pygambit.user.numbers: - -Representation of numerical data of a game -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Payoffs to players and probabilities of actions at chance information sets are specified -as numbers. Gambit represents the numerical values in a game in exact precision, -using either decimal or rational representations. - -To illustrate, we consider a trivial game which just has one move for the chance player: - -.. ipython:: python - - import pygambit as gbt - g = gbt.Game.new_tree() - g.append_move(g.root, g.players.chance, ["a", "b", "c"]) - [act.prob for act in g.root.infoset.actions] - -The default when creating a new move for chance is that all actions are chosen with -equal probability. These probabilities are represented as rational numbers, -using ``pygambit``'s :py:class:`.Rational` class, which is derived from Python's -`fractions.Fraction`. Numerical data can be set as rational numbers: - -.. ipython:: python - - g.set_chance_probs(g.root.infoset, - [gbt.Rational(1, 4), gbt.Rational(1, 2), gbt.Rational(1, 4)]) - [act.prob for act in g.root.infoset.actions] - -They can also be explicitly specified as decimal numbers: - -.. 
ipython:: python - - g.set_chance_probs(g.root.infoset, - [gbt.Decimal(".25"), gbt.Decimal(".50"), gbt.Decimal(".25")]) - [act.prob for act in g.root.infoset.actions] - -Although the two representations above are mathematically equivalent, ``pygambit`` -remembers the format in which the values were specified. - -Expressing rational or decimal numbers as above is verbose and tedious. -``pygambit`` offers a more concise way to express numerical data in games: -when setting numerical game data, ``pygambit`` will attempt to convert text strings to -their rational or decimal representation. The above can therefore be written -more compactly using string representations: - -.. ipython:: python - - g.set_chance_probs(g.root.infoset, ["1/4", "1/2", "1/4"]) - [act.prob for act in g.root.infoset.actions] - - g.set_chance_probs(g.root.infoset, [".25", ".50", ".25"]) - [act.prob for act in g.root.infoset.actions] - -As a further convenience, ``pygambit`` will accept Python ``int`` and ``float`` values. -``int`` values are always interpreted as :py:class:`.Rational` values. -``pygambit`` attempts to render `float` values in an appropriate :py:class:`.Decimal` -equivalent. In the majority of cases, this creates no problems. -For example, - -.. ipython:: python - - g.set_chance_probs(g.root.infoset, [.25, .50, .25]) - [act.prob for act in g.root.infoset.actions] - -However, rounding can cause difficulties when attempting to use `float` values to -represent values which do not have an exact decimal representation - -.. ipython:: python - :okexcept: - - g.set_chance_probs(g.root.infoset, [1/3, 1/3, 1/3]) - -This behavior can be slightly surprising, especially in light of the fact that -in Python, - -.. ipython:: python - - 1/3 + 1/3 + 1/3 - -In checking whether these probabilities sum to one, ``pygambit`` first converts each -of the probabilitiesto a :py:class:`.Decimal` representation, via the following method - -.. 
ipython:: python - - gbt.Decimal(str(1/3)) - -and the sum-to-one check then fails because - -.. ipython:: python - - gbt.Decimal(str(1/3)) + gbt.Decimal(str(1/3)) + gbt.Decimal(str(1/3)) - -Setting payoffs for players also follows the same rules. Representing probabilities -and payoffs exactly is essential, because ``pygambit`` offers (in particular for two-player -games) the possibility of computation of equilibria exactly, because the Nash equilibria -of any two-player game with rational payoffs and chance probabilities can be expressed exactly -in terms of rational numbers. - -It is therefore advisable always to specify the numerical data of games either in terms -of :py:class:`.Decimal` or :py:class:`.Rational` values, or their string equivalents. -It is safe to use `int` values, but `float` values should be used with some care to ensure -the values are recorded as intended. - - -Reading a game from a file -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Games stored in existing Gambit savefiles can be loaded using :meth:`.read_efg` or :meth:`.read_nfg`: - -.. ipython:: python - :suppress: - - cd ../contrib/games - - -.. ipython:: python - - g = gbt.read_nfg("e02.nfg") - g - -.. ipython:: python - :suppress: - - cd ../../doc - - -Lifetime of a game object and its elements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A game is only deallocated when all variables referring to the game either directly -or indirectly have gone out of scope. Indirect references to games include objects such -as :py:class:`~pygambit.gambit.MixedStrategyProfile` or :py:class:`~pygambit.gambit.MixedBehaviorProfile`, -or variables referring to individual elements of a game. - -So for example, the following sequence of operations is valid: - -.. ipython:: python - :suppress: - - cd ../contrib/games - - -.. ipython:: python - - g = gbt.read_efg("e02.efg") - p = g.players[0] - print(p) - g = gbt.read_efg("poker.efg") - print(p) - print(g) - -.. 
ipython:: python - :suppress: - - cd ../../doc - -The variable `p` refers to a player in the game read from ``e02.efg``. -So, when ``poker.efg`` is read and assigned to the variable `g`, the game from -``e02.efg`` is still referred to indirectly via `p`. The game object from the -first game can therefore still be obtained from the object referring to the -player: - -.. ipython:: python - - print(p.game) - - -Computing Nash equilibria -~~~~~~~~~~~~~~~~~~~~~~~~~ - -Interfaces to algorithms for computing Nash equilibria are provided in :py:mod:`pygambit.nash`. - -========================================== ======================================== -Method Python function -========================================== ======================================== -:ref:`gambit-enumpure ` :py:func:`pygambit.nash.enumpure_solve` -:ref:`gambit-enummixed ` :py:func:`pygambit.nash.enummixed_solve` -:ref:`gambit-lp ` :py:func:`pygambit.nash.lp_solve` -:ref:`gambit-lcp ` :py:func:`pygambit.nash.lcp_solve` -:ref:`gambit-liap ` :py:func:`pygambit.nash.liap_solve` -:ref:`gambit-logit ` :py:func:`pygambit.nash.logit_solve` -:ref:`gambit-simpdiv ` :py:func:`pygambit.nash.simpdiv_solve` -:ref:`gambit-ipa ` :py:func:`pygambit.nash.ipa_solve` -:ref:`gambit-gnm ` :py:func:`pygambit.nash.gnm_solve` -========================================== ======================================== - -We take as an example the :ref:`one-card poker game `. This is a two-player, -constant sum game, and so all of the equilibrium-finding methods can be applied to it. - -For two-player games, :py:func:`.lcp_solve` can compute Nash equilibria directly using -the extensive representation. Assuming that ``g`` refers to the game - -.. ipython:: python - :suppress: - - g = gbt.read_efg("poker.efg") - -.. ipython:: python - - result = gbt.nash.lcp_solve(g) - result - len(result.equilibria) - -The result of the calculation is returned as a :py:class:`.NashComputationResult` object. 
-The set of equilibria found is reported in :py:attr:`.NashComputationResult.equilibria`; -in this case, this is a list of mixed behavior profiles. -A mixed behavior profile specifies, for each information set, the probability distribution over -actions at that information set. -Indexing a :py:class:`.MixedBehaviorProfile` by a player gives a :py:class:`.MixedBehavior`, -which specifies probability distributions at each of the player's information sets: - -.. ipython:: python - - eqm = result.equilibria[0] - eqm["Alice"] - -In this case, at Alice's first information set, the one at which she has the King, she always raises. -At her second information set, where she has the Queen, she sometimes bluffs, raising with -probability one-third. -The probability distribution at an information set is represented by a :py:class:`.MixedAction`. -:py:meth:`.MixedBehavior.mixed_actions` iterates over these for the player: - -.. ipython:: python - - for infoset, mixed_action in eqm["Alice"].mixed_actions(): - print(infoset) - print(mixed_action) - -So we could extract Alice's probabilities of raising at her respective information sets -like this: - -.. ipython:: python - - {infoset: mixed_action["Raise"] for infoset, mixed_action in eqm["Alice"].mixed_actions()} - -In larger games, labels may not always be the most convenient way to refer to specific -actions. We can also index profiles directly with :py:class:`.Action` objects. -So an alternative way to extract the probabilities of playing "Raise" would be by -iterating Alice's list of actions: - -.. ipython:: python - - {action.infoset: eqm[action] for action in g.players["Alice"].actions if action.label == "Raise"} - - -Looking at Bob's strategy, - -.. ipython:: python - - eqm["Bob"] - -Bob meets Alice's raise two-thirds of the time. The label "Raise" is used in more than one -information set for Alice, so in the above we had to specify information sets when indexing. 
-When there is no ambiguity, we can specify action labels directly. So for example, because -Bob has only one action named "Meet" in the game, we can extract the probability that Bob plays -"Meet" by: - -.. ipython:: python - - eqm["Bob"]["Meet"] - -Moreover, this is the only action with that label in the game, so we can index the -profile directly using the action label without any ambiguity: - -.. ipython:: python - - eqm["Meet"] - -Because this is an equilibrium, the fact that Bob randomizes at his information set must mean he -is indifferent between the two actions at his information set. :py:meth:`.MixedBehaviorProfile.action_value` -returns the expected payoff of taking an action, conditional on reaching that action's information set: - -.. ipython:: python - - {action: eqm.action_value(action) for action in g.players["Bob"].infosets[0].actions} - -Bob's indifference between his actions arises because of his beliefs given Alice's strategy. -:py:meth:`.MixedBehaviorProfile.belief` returns the probability of reaching a node, conditional on -its information set being reached: - -.. ipython:: python - - {node: eqm.belief(node) for node in g.players["Bob"].infosets[0].members} - -Bob believes that, conditional on Alice raising, there's a 75% chance that she has the king; -therefore, the expected payoff to meeting is in fact -1 as computed. -:py:meth:`.MixedBehaviorProfile.infoset_prob` returns the probability that an information set is -reached: - -.. ipython:: python - - eqm.infoset_prob(g.players["Bob"].infosets[0]) - -The corresponding probability that a node is reached in the play of the game is given -by :py:meth:`.MixedBehaviorProfile.realiz_prob`, and the expected payoff to a player -conditional on reaching a node is given by :py:meth:`.MixedBehaviorProfile.node_value`. - -.. 
ipython:: python - - {node: eqm.node_value("Bob", node) for node in g.players["Bob"].infosets[0].members} - -The overall expected payoff to a player given the behavior profile is returned by -:py:meth:`.MixedBehaviorProfile.payoff`: - -.. ipython:: python - - eqm.payoff("Alice") - eqm.payoff("Bob") - -The equilibrium computed expresses probabilities in rational numbers. Because -the numerical data of games in Gambit :ref:`are represented exactly `, -methods which are specialized to two-player games, :py:func:`.lp_solve`, :py:func:`.lcp_solve`, -and :py:func:`.enummixed_solve`, can report exact probabilities for equilibrium strategy -profiles. This is enabled by default for these methods. - -When a game has an extensive representation, equilibrium finding methods default to computing -on that representation. It is also possible to compute using the strategic representation. -``pygambit`` transparently computes the reduced strategic form representation of an extensive game - -.. ipython:: python - - [s.label for s in g.players["Alice"].strategies] - -In the strategic form of this game, Alice has four strategies. The generated strategy labels -list the action numbers taken at each information set. We can therefore apply a method which -operates on a strategic game to any game with an extensive representation - -.. ipython:: python - - result = gbt.nash.gnm_solve(g) - result - -:py:func:`.gnm_solve` can be applied to any game with any number of players, and uses a path-following -process in floating-point arithmetic, so it returns profiles with probabilities expressed as -floating-point numbers. This method operates on the strategic representation of the game, so -the returned results are of type :py:class:`~pygambit.gambit.MixedStrategyProfile`, and -specify, for each player, a probability distribution over that player's strategies. -Indexing a :py:class:`.MixedStrategyProfile` by a player gives the probability distribution -over that player's strategies only. - -.. 
ipython:: python - - eqm = result.equilibria[0] - eqm["Alice"] - eqm["Bob"] - -The expected payoff to a strategy is provided by :py:meth:`.MixedStrategyProfile.strategy_value`: - -.. ipython:: python - - {strategy: eqm.strategy_value(strategy) for strategy in g.players["Alice"].strategies} - {strategy: eqm.strategy_value(strategy) for strategy in g.players["Bob"].strategies} - -The overall expected payoff to a player is returned by :py:meth:`.MixedStrategyProfile.payoff`: - -.. ipython:: python - - eqm.payoff("Alice") - eqm.payoff("Bob") - -When a game has an extensive representation, we can convert freely between -:py:class:`~pygambit.gambit.MixedStrategyProfile` and the corresponding -:py:class:`~pygambit.gambit.MixedBehaviorProfile` representation of the same strategies -using :py:meth:`.MixedStrategyProfile.as_behavior` and :py:meth:`.MixedBehaviorProfile.as_strategy`. - -.. ipython:: python - - eqm.as_behavior() - eqm.as_behavior().as_strategy() - - -.. _pygambit-nash-maxregret: - -Acceptance criteria for Nash equilibria -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Some methods for computing Nash equilibria operate using floating-point arithmetic and/or -generate candidate equilibrium profiles using methods which involve some form of successive -approximations. The outputs of these methods therefore are in general -:math:`\varepsilon`-equilibria, for some positive :math:`\varepsilon`. - -To provide a uniform interface across methods, where relevant Gambit provides a parameter -`maxregret`, which specifies the acceptance criterion for labeling the output of the -algorithm as an equilibrium. -This parameter is interpreted *proportionally* to the range of payoffs in the game. -Any profile returned as an equilibrium is guaranteed to be an -:math:`\varepsilon`-equilibrium, for :math:`\varepsilon` no more than `maxregret` -times the difference of the game's maximum and minimum payoffs. 
- -As an example, consider solving the standard one-card poker game using -:py:func:`.logit_solve`. The range of the payoffs in this game is 4 (from +2 to -2). - -.. ipython:: python - - g = gbt.read_efg("poker.efg") - g.max_payoff, g.min_payoff - -:py:func:`.logit_solve` is a globally-convergent method, in that it computes a -sequence of profiles which is guaranteed to have a subsequence that converges to a -Nash equilibrium. The default value of `maxregret` for this method is set at -:math:`10^{-8}`: - -.. ipython:: python - - result = gbt.nash.logit_solve(g, maxregret=1e-8) - result.equilibria - result.equilibria[0].max_regret() - -The value of :py:meth:`.MixedBehaviorProfile.max_regret` of the computed profile exceeds -:math:`10^{-8}` measured in payoffs of the game. However, when considered relative -to the scale of the game's payoffs, we see it is less than :math:`10^{-8}` of -the payoff range, as requested: - -.. ipython:: python - - result.equilibria[0].max_regret() / (g.max_payoff - g.min_payoff) - - -In general, for globally-convergent methods especially, there is a tradeoff between -precision and running time. Some methods may be slow to converge on some games, and -it may be useful instead to get a more coarse approximation to an equilibrium. -We could instead ask only for an :math:`\varepsilon`-equilibrium with a -(scaled) :math:`\varepsilon` of no more than :math:`10^{-4}`: - -.. ipython:: python - - result = gbt.nash.logit_solve(g, maxregret=1e-4) - result.equilibria[0] - result.equilibria[0].max_regret() - result.equilibria[0].max_regret() / (g.max_payoff - g.min_payoff) - -The convention of expressing `maxregret` scaled by the game's payoffs standardises the -behavior of methods across games. For example, consider solving the poker game instead -using :py:meth:`.liap_solve`. - -.. 
ipython:: python - - result = gbt.nash.liap_solve(g.mixed_behavior_profile(), maxregret=1.0e-4) - result.equilibria[0] - result.equilibria[0].max_regret() - result.equilibria[0].max_regret() / (g.max_payoff - g.min_payoff) - -If, instead, we double all payoffs, the output of the method is unchanged. - -.. ipython:: python - - for outcome in g.outcomes: - outcome["Alice"] = outcome["Alice"] * 2 - outcome["Bob"] = outcome["Bob"] * 2 - - result = gbt.nash.liap_solve(g.mixed_behavior_profile(), maxregret=1.0e-4) - result.equilibria[0] - result.equilibria[0].max_regret() - result.equilibria[0].max_regret() / (g.max_payoff - g.min_payoff) - - -Generating starting points for algorithms -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Some methods for computation of Nash equilibria take as an initial condition a -:py:class:`.MixedStrategyProfile` or :py:class:`MixedBehaviorProfile` which is used -as a starting point. The equilibria found will depend on which starting point is -selected. To facilitate generating starting points, :py:class:`.Game` provides -methods :py:meth:`.Game.random_strategy_profile` and :py:meth:`.Game.random_behavior_profile`, -to generate profiles which are drawn from the uniform distribution on the product -of simplices. - -As an example, we consider a three-player game from McKelvey and McLennan (1997), -in which each player has two strategies. This game has nine equilibria in total, and -in particular has two totally mixed Nash equilibria, which is the maximum possible number -of regular totally mixed equilbria in games of this size. - -We first consider finding Nash equilibria in this game using :py:func:`.liap_solve`. -If we run this method starting from the centroid (uniform randomization across all -strategies for each player), :py:func:`.liap_solve` finds one of the totally-mixed equilibria. - -.. 
ipython:: python - - g = gbt.read_nfg("2x2x2.nfg") - gbt.nash.liap_solve(g.mixed_strategy_profile()) - -Which equilibrium is found depends on the starting point. With a different starting point, -we can find, for example, one of the pure-strategy equilibria. - -.. ipython:: python - - gbt.nash.liap_solve(g.mixed_strategy_profile([[.9, .1], [.9, .1], [.9, .1]])) - -To search for more equilibria, we can instead generate strategy profiles at random. - -.. ipython:: python - - gbt.nash.liap_solve(g.random_strategy_profile()) - -Note that methods which take starting points do record the starting points used in the -result object returned. However, the random profiles which are generated will differ -in different runs of a program. To support making the generation of random strategy -profiles reproducible, and for finer-grained control of the generation of these profiles -if desired, :py:meth:`.Game.random_strategy_profile` and :py:meth:`.Game.random_behavior_profile` -optionally take a :py:class:`numpy.random.Generator` object, which is used as the source -of randomness for creating the profile. - -.. ipython:: python - - import numpy as np - gen = np.random.default_rng(seed=1234567890) - p1 = g.random_strategy_profile(gen=gen) - p1 - gen = np.random.default_rng(seed=1234567890) - p2 = g.random_strategy_profile(gen=gen) - p2 - p1 == p2 - -When creating profiles in which probabilities are represented as floating-point numbers, -:py:meth:`.Game.random_strategy_profile` and :py:meth:`.Game.random_behavior_profile` -internally use the Dirichlet distribution for each simplex to generate correctly uniform -sampling over probabilities. However, in some applications generation of random profiles -with probabilities as rational numbers is desired. For example, :py:func:`.simpdiv_solve` -takes such a starting point, because it operates by successively refining a triangulation -over the space of mixed strategy profiles. 
-:py:meth:`.Game.random_strategy_profile` and :py:meth:`.Game.random_behavior_profile` -both take an optional parameter `denom` which, if specified, generates a profile in which -probabilities are generated uniformly from the grid in each simplex in which all probabilities -have denominator `denom`. - -.. ipython:: python - - gen = np.random.default_rng(seed=1234567890) - g.random_strategy_profile(denom=10, gen=gen) - g.random_strategy_profile(denom=10, gen=gen) - -These can then be used in conjunction with :py:func:`.simpdiv_solve` to search for equilibria -from different starting points. - -.. ipython:: python - - gbt.nash.simpdiv_solve(g.random_strategy_profile(denom=10, gen=gen)) - gbt.nash.simpdiv_solve(g.random_strategy_profile(denom=10, gen=gen)) - gbt.nash.simpdiv_solve(g.random_strategy_profile(denom=10, gen=gen)) - - -Quantal response equilibrium -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Gambit implements the idea of [McKPal95]_ and [McKPal98]_ to compute Nash equilibria -via path-following a branch of the logit quantal response equilibrium (LQRE) correspondence -using the function :py:func:`.logit_solve`. As an example, we will consider an -asymmetric matching pennies game from [Och95]_ as analyzed in [McKPal95]_. - -.. ipython:: python - - g = gbt.Game.from_arrays( - [[1.1141, 0], [0, 0.2785]], - [[0, 1.1141], [1.1141, 0]], - title="Ochs (1995) asymmetric matching pennies as transformed in McKelvey-Palfrey (1995)" - ) - gbt.nash.logit_solve(g) - - -:py:func:`.logit_solve` returns only the limiting (approximate) Nash equilibrium found. -Profiles along the QRE correspondence are frequently of interest in their own right. -Gambit offers several functions for more detailed examination of branches of the -QRE correspondence. - -The function :py:func:`.logit_solve_branch` uses the same procedure as :py:func:`.logit_solve`, -but returns a list of LQRE profiles computed along the branch instead of just the limiting -approximate Nash equilibrium. - -.. 
ipython:: python - - qres = gbt.qre.logit_solve_branch(g) - len(qres) - qres[0] - qres[5] - -:py:func:`.logit_solve_branch` uses an adaptive step size heuristic to find points on -the branch. The parameters `first_step` and `max_accel` are used to adjust the initial -step size and the maximum rate at which the step size changes adaptively. The step size -used is computed as the distance traveled along the path, and, importantly, not the -distance as measured by changes in the precision parameter lambda. As a result the -lambda values for which profiles are computed cannot be controlled in advance. -In some situations, the LQRE profiles at specified values of lambda are of interest. -For this, Gambit provides :py:func:`.logit_solve_lambda`. This function provides -accurate values of strategy profiles at one or more specified values of lambda. - -.. ipython:: python - - qres = gbt.qre.logit_solve_lambda(g, lam=[1, 2, 3]) - qres[0] - qres[1] - qres[2] - - -LQRE are frequently taken to data by using maximum likelihood estimation to find the -LQRE profile that best fits an observed profile of play. This is provided by -the function :py:func:`.logit_estimate`. We replicate the analysis of a block -of the data from [Och95]_ for which [McKPal95]_ estimated an LQRE. - -.. ipython:: python - - data = g.mixed_strategy_profile([[128*0.527, 128*(1-0.527)], [128*0.366, 128*(1-0.366)]]) - fit = gbt.qre.logit_estimate(data) - -The returned :py:class:`.LogitQREMixedStrategyFitResult` object contains the results of the -estimation. -The results replicate those reported in [McKPal95]_, including the estimated value of lambda, -the QRE profile probabilities, and the log-likelihood. -Because `data` contains the empirical counts of play, and not just frequencies, the resulting -log-likelihood is correct for use in likelihoood-ratio tests. [#f1]_ - -.. 
ipython:: python - - print(fit.lam) - print(fit.profile) - print(fit.log_like) - -All of the functions above also support working with the agent LQRE of [McKPal98]_. -Agent QRE are computed as the default behavior whenever the game has a extensive (tree) -representation. For :py:func:`.logit_solve`, :py:func:`.logit_solve_branch`, and -:py:func:`.logit_solve_lambda`, this can be overriden by passing `use_strategic=True`; -this will compute LQRE using the reduced strategy set of the game instead. -Likewise, :py:func:`.logit_estimate` will perform estimation using agent LQRE if the -data passed are a :py:class:`.MixedBehaviorProfile`, and will return a -:py:class:`.LogitQREMixedBehaviorFitResult` object. - -.. rubric:: Footnotes - -.. [#f1] The log-likelihoods quoted in [McKPal95]_ are exactly a factor of 10 larger than - those obtained by replicating the calculation. - - -Using external programs to compute Nash equilbria -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Because the problem of finding Nash equilibria can be expressed in various -mathematical formulations (see [McKMcL96]_), it is helpful to make use -of other software packages designed specifically for solving those problems. - -There are currently two integrations offered for using external programs to solve -for equilibria: - -- :py:func:`.enummixed_solve` supports enumeration of equilibria in - two-player games via `lrslib`. [#lrslib]_ -- :py:func:`.enumpoly_solve` supports computation of totally-mixed equilibria - on supports in strategic games via `PHCpack`. [#phcpack]_ - -For both calls, using the external program requires passing the path to the -executable (via the `lrsnash_path` and `phcpack_path` arguments, respectively). - -The user must download and compile or install these programs on their own; these are -not packaged with Gambit. The solver calls do take care of producing the required -input files, and reading the output to convert into Gambit objects for further -processing. 
- - -.. [#lrslib] http://cgm.cs.mcgill.ca/~avis/C/lrs.html - -.. [#phcpack] https://homepages.math.uic.edu/~jan/PHCpack/phcpack.html - -.. [McKMcL96] McKelvey, Richard D. and McLennan, Andrew M. (1996) Computation of equilibria - in finite games. In Handbook of Computational Economics, Volume 1, - pages 87-142. diff --git a/doc/requirements.txt b/doc/requirements.txt index 5909d77ee..61d16db80 100644 --- a/doc/requirements.txt +++ b/doc/requirements.txt @@ -4,7 +4,9 @@ scipy==1.16.1 pydata-sphinx-theme==0.16.1 sphinx_design==0.6.1 sphinx-autobuild==2024.10.3 +nbsphinx==0.9.7 ipython==9.4.0 matplotlib==3.10.5 pickleshare==0.7.5 -jupyter==1.1.1 \ No newline at end of file +jupyter==1.1.1 +open_spiel==1.6.1 \ No newline at end of file diff --git a/doc/tutorials/01_quickstart.ipynb b/doc/tutorials/01_quickstart.ipynb new file mode 100644 index 000000000..3aa751cc8 --- /dev/null +++ b/doc/tutorials/01_quickstart.ipynb @@ -0,0 +1,487 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "88c376d0", + "metadata": {}, + "source": [ + "# 1) Getting started with Gambit\n", + "\n", + "In this tutorial, we'll demo the basic features of the Gambit library for game theory, using the `PyGambit` Python package.\n", + "\n", + "This includes creating a `Game` object and using it to set up a strategic (normal) form game, the Prisoner's Dilemma, one of the most famous games in game theory.\n", + "\n", + "We'll then use Gambit's built-in functions to analyze the game and find its Nash equilibria.\n", + "\n", + "**The Prisoner's Dilemma**\n", + "\n", + "The Prisoner's Dilemma is a classic example in game theory that illustrates why two rational individuals who cannot communicate might not cooperate, even if it appears that it is in their best interest to do so. 
After being caught by the police for committing a crime, the two prisoners are separately offered a deal:\n", + "\n", + "- If both stay silent (cooperate), they get light sentences.\n", + "- If one defects (betrays the other) while the other stays silent, the defector goes free and the silent one gets a heavy sentence.\n", + "- If both defect, they both get moderate sentences." + ] + }, + { + "cell_type": "markdown", + "id": "b563d13d", + "metadata": {}, + "source": [ + "## Creating a strategic form game\n", + "\n", + "Let's start by importing PyGambit and creating a game object.\n", + "Since Prisoner's Dilemma is a strategic form game, it can be created in a tabular fashion with `Game.new_table`.\n", + "\n", + "To do this, we need to know the number of players, which in Prisoner's Dilemma is 2, and the number of strategies for each player, which in both cases is 2 (Cooperate and Defect).\n", + "We'll define a list as long as the number of players, specifying the number of strategies for each player to pass into the `Game.new_table` function." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2060c1ed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.Game" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pygambit as gbt\n", + "\n", + "n_strategies = [2, 2]\n", + "g = gbt.Game.new_table(n_strategies, title=\"Prisoner's Dilemma\")\n", + "type(g)" + ] + }, + { + "cell_type": "markdown", + "id": "903376dc", + "metadata": {}, + "source": [ + "Now let's name the players and each of their possible strategies, in both cases \"Cooperate\" and \"Defect\".\n", + "\n", + "Note: it's not necessary to specify labels for players and strategies when defining a game; however, doing so makes the game easier to understand and work with."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9d8203e8", + "metadata": {}, + "outputs": [], + "source": [ + "g.players[0].label = \"Tom\"\n", + "g.players[0].strategies[0].label = \"Cooperate\"\n", + "g.players[0].strategies[1].label = \"Defect\"\n", + "\n", + "g.players[1].label = \"Jerry\"\n", + "g.players[1].strategies[0].label = \"Cooperate\"\n", + "g.players[1].strategies[1].label = \"Defect\"" + ] + }, + { + "cell_type": "markdown", + "id": "60bfe828", + "metadata": {}, + "source": [ + "Now let's assign payoffs for each of the game's possible outcomes, based on the standard payoffs for the Prisoner's Dilemma, written as `(Tom, Jerry)`:\n", + "- Both players cooperate and receive the lightest sentence: `(-1, -1)`\n", + "- Tom cooperates, but Jerry defects (betrays Tom): `(-3, 0)`\n", + "- Tom defects, Jerry cooperates: `(0, -3)`\n", + "- Both defect: `(-2, -2)`" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "61030607", + "metadata": {}, + "outputs": [], + "source": [ + "# Both cooperate\n", + "g[\"Cooperate\", \"Cooperate\"][\"Tom\"] = -1\n", + "g[\"Cooperate\", \"Cooperate\"][\"Jerry\"] = -1\n", + "\n", + "# Tom cooperates, Jerry defects\n", + "g[\"Cooperate\", \"Defect\"][\"Tom\"] = -3\n", + "g[\"Cooperate\", \"Defect\"][\"Jerry\"] = 0\n", + "\n", + "# Tom defects, Jerry cooperates\n", + "g[\"Defect\", \"Cooperate\"][\"Tom\"] = 0\n", + "g[\"Defect\", \"Cooperate\"][\"Jerry\"] = -3\n", + "\n", + "# Both defect\n", + "g[\"Defect\", \"Defect\"][\"Tom\"] = -2\n", + "g[\"Defect\", \"Defect\"][\"Jerry\"] = -2" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "caecc334", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "

Prisoner's Dilemma

\n", + "
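<table><tr><td></td><td>Cooperate</td><td>Defect</td></tr><tr><td>Cooperate</td><td>-1,-1</td><td>-3,0</td></tr><tr><td>Defect</td><td>0,-3</td><td>-2,-2</td></tr></table>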
\n" + ], + "text/plain": [ + "Game(title='Prisoner's Dilemma')" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# View the payout matrix\n", + "g" + ] + }, + { + "cell_type": "markdown", + "id": "659fc2c5", + "metadata": {}, + "source": [ + "The payout matrix structure shows what in Game Theory is described as the \"strategic form\" (also \"normal form\") representation of a game.\n", + "\n", + "The matrix presents the players' strategies and their expected payoff following their played strategies.\n", + "\n", + "The strategic form assumes players choose their strategies simultaneously, and the outcome depends on the combination." + ] + }, + { + "cell_type": "markdown", + "id": "5e9fe410", + "metadata": {}, + "source": [ + "## Creating games from arrays\n", + "\n", + "The most direct way to create a strategic form game is via `Game.from_arrays()`.\n", + "\n", + "This function takes one n-dimensional array per player, where n is the number of players in the game.\n", + "\n", + "The arrays can be any object that can be indexed like an n-times-nested Python list; so, for example, numpy arrays can be used directly.\n", + "\n", + "To create a two-player symmetric game, we can simply transpose the payoff matrix for the second player before passing to `Game.from_arrays()`." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "843ba7f3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "

Another Prisoner's Dilemma

\n", + "
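<table><tr><td></td><td>1</td><td>2</td></tr><tr><td>1</td><td>-1,-1</td><td>-3,0</td></tr><tr><td>2</td><td>0,-3</td><td>-2,-2</td></tr></table>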
\n" + ], + "text/plain": [ + "Game(title='Another Prisoner's Dilemma')" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "player1_payoffs = np.array([[-1, -3], [0, -2]])\n", + "player2_payoffs = np.transpose(player1_payoffs)\n", + "\n", + "g1 = gbt.Game.from_arrays(\n", + " player1_payoffs,\n", + " player2_payoffs,\n", + " title=\"Another Prisoner's Dilemma\"\n", + ")\n", + "\n", + "g1" + ] + }, + { + "cell_type": "markdown", + "id": "696d83cb", + "metadata": {}, + "source": [ + "You can retrieve the players’ payoff tables from a game object using the `Game.to_arrays()` method, which produces a list of numpy arrays representing the payoffs for each player.\n", + "\n", + "The optional parameter `dtype` controls the data type of the payoffs in the generated arrays." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5ee752c4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-1\n", + "\n" + ] + } + ], + "source": [ + "tom_payoffs, jerry_payoffs = g.to_arrays(\n", + " # dtype=float\n", + ")\n", + "print(tom_payoffs[0][0])\n", + "print(type(tom_payoffs[0][0]))" + ] + }, + { + "cell_type": "markdown", + "id": "f2e6645e", + "metadata": {}, + "source": [ + "Computing the Nash equilibria in one line of code\n", + "-----------------------------\n", + "\n", + "We can use Gambit to compute the Nash equilibria for our Prisoner's Dilemma game in a single line of code; a Nash equilibrium is a profile of strategies in which no player can improve their payoff by unilaterally changing their own strategy.\n", + "\n", + "For a two-player normal form game, let's use `enumpure_solve` to search for pure-strategy Nash equilibria.\n", + "The returned object will be a `NashComputationResult`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a81c06c7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.nash.NashComputationResult" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result = gbt.nash.enumpure_solve(g)\n", + "type(result)" + ] + }, + { + "cell_type": "markdown", + "id": "7d8076f8", + "metadata": {}, + "source": [ + "Let's inspect our result further to see how many equilibria were found." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "bd395180", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(result.equilibria)" + ] + }, + { + "cell_type": "markdown", + "id": "5fb009be", + "metadata": {}, + "source": [ + "For a given equilibrium, we can then look at the \"mixed strategy profile\", which maps each strategy in a game to the corresponding probability with which that strategy is played."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "76570ebc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[0,1\\right],\\left[0,1\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1)]]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "msp = result.equilibria[0]\n", + "msp" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "6e8cfcde", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.MixedStrategyProfileRational" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(msp)" + ] + }, + { + "cell_type": "markdown", + "id": "f937e1ab", + "metadata": {}, + "source": [ + "The mixed strategy profile can show us the expected payoffs for each player when playing the strategies as specified by an equilibrium.\n", + "\n", + "The profile `[[0,1],[0,1]]` indicates that both players' strategy is to play \"Cooperate\" with probability 0 and \"Defect\" with probability 1:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "980bf6b1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tom plays the equilibrium strategy:\n", + "Probability of cooperating: 0\n", + "Probability of defecting: 1\n", + "Payoff: -2\n", + "\n", + "Jerry plays the equilibrium strategy:\n", + "Probability of cooperating: 0\n", + "Probability of defecting: 1\n", + "Payoff: -2\n", + "\n" + ] + } + ], + "source": [ + "for player in g.players:\n", + " print(f\"{player.label} plays the equilibrium strategy:\")\n", + " print(f\"Probability of cooperating: {msp[player.label]['Cooperate']}\")\n", + " print(f\"Probability of defecting: {msp[player.label]['Defect']}\")\n", + " print(f\"Payoff: {msp.payoff(player.label)}\")\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "24f36b0d", + "metadata": {}, + "source": [ + "The equilibrium shows that both players are playing their dominant strategy, which is to defect. This is because defecting is the best response to the other player's strategy, regardless of what that strategy is.\n", + "\n", + "Saving and reading strategic form games to and from file\n", + "--------------------\n", + "\n", + "You can use Gambit to save games to, and read from files.\n", + "The specific format depends on whether the game is normal or extensive form.\n", + "\n", + "Here we'll save the Prisoner's Dilemma (normal form) to the `.nfg` format." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "f58eaa77", + "metadata": {}, + "outputs": [], + "source": [ + "g.to_nfg(\"games/prisoners_dilemma.nfg\")" + ] + }, + { + "cell_type": "markdown", + "id": "e373be1e", + "metadata": {}, + "source": [ + "You can easily restore the game object from file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4119a2ac", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.Game" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "restored_game = gbt.read_nfg(\"games/prisoners_dilemma.nfg\")\n", + "type(restored_game)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gambitvenv313", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/02_extensive_form.ipynb b/doc/tutorials/02_extensive_form.ipynb new file mode 100644 index 000000000..60279286d --- /dev/null +++ b/doc/tutorials/02_extensive_form.ipynb @@ -0,0 +1,336 @@ +{ + 
"cells": [ + { + "cell_type": "markdown", + "id": "96019084", + "metadata": {}, + "source": [ + "# 2) Extensive form games\n", + "\n", + "In the first tutorial, we used Gambit to set up the Prisoner's Dilemma, an example of a normal (strategic) form game.\n", + "\n", + "Gambit can also be used to set up extensive form games; the game is represented as a tree, where each node represents a decision point for a player, and the branches represent the possible actions they can take.\n", + "\n", + "**Example: One-shot trust game with binary actions**\n", + "\n", + "[Kreps (1990)](#references) introduced a game commonly referred to as the **trust game**.\n", + "We will build a one-shot version of this game using Gambit's game transformation operations.\n", + "\n", + "The game can be defined as follows:\n", + "- There are two players, a **Buyer** and a **Seller**.\n", + "- The Buyer moves first and has two actions, **Trust** or **Not trust**.\n", + "- If the Buyer chooses **Not trust**, then the game ends, and both players receive payoffs of `0`.\n", + "- If the Buyer chooses **Trust**, then the Seller has a choice with two actions, **Honor** or **Abuse**.\n", + "- If the Seller chooses **Honor**, both players receive payoffs of `1`;\n", + "- If the Seller chooses **Abuse**, the Buyer receives a payoff of `-1` and the Seller receives a payoff of `2`.\n", + "\n", + "We create a game with an extensive representation using `Game.new_tree`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5946289b", + "metadata": {}, + "outputs": [], + "source": [ + "import pygambit as gbt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91ed4dfb", + "metadata": {}, + "outputs": [], + "source": [ + "g = gbt.Game.new_tree(\n", + " players=[\"Buyer\", \"Seller\"],\n", + " title=\"One-shot trust game, after Kreps (1990)\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e1903069", + "metadata": {}, + "source": [ + "The tree of the game contains 
just a root node, with no children:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3cd94917", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(g.root.children)" + ] + }, + { + "cell_type": "markdown", + "id": "962b4e52", + "metadata": {}, + "source": [ + "To extend a game from an existing terminal node, use `Game.append_move`. To begin with, the sole root node is the terminal node.\n", + "\n", + "Here we extend the game from the root node by adding the first move for the \"Buyer\" player, creating two child nodes (one for each possible action)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5d27a07a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g.append_move(\n", + " g.root, # This is the node to append the move to\n", + " player=\"Buyer\",\n", + " actions=[\"Trust\", \"Not trust\"]\n", + ")\n", + "len(g.root.children)" + ] + }, + { + "cell_type": "markdown", + "id": "43e28b1e", + "metadata": {}, + "source": [ + "We can also optionally specify labels for nodes when defining a game.\n", + "This isn't strictly necessary, but doing so makes the game easier to understand and work with than referring to nodes by their indices.\n", + "\n", + "Here we'll label the nodes according to the actions that precede them in the game tree." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "65b21e37", + "metadata": {}, + "outputs": [], + "source": [ + "for node in g.root.children:\n", + " node.label = node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "bba61594", + "metadata": {}, + "source": [ + "We can then also add the Seller's move in the situation after the Buyer chooses Trust:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "47c4a31b", + "metadata": {}, + "outputs": [], + "source": [ + "g.append_move(\n", + " g.root.children[\"Trust\"],\n", + " player=\"Seller\",\n", + " actions=[\"Honor\", \"Abuse\"]\n", + ")\n", + "for node in g.root.children[\"Trust\"].children:\n", + " node.label = node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "382ba37d", + "metadata": {}, + "source": [ + "Now that we have the moves of the game defined, we add payoffs.\n", + "\n", + "Payoffs are associated with an `Outcome`; each `Outcome` has a vector of payoffs, one for each player, and optionally an identifying text label.\n", + "\n", + "First we add the outcome associated with the Seller proving themselves trustworthy:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "716e9b9a", + "metadata": {}, + "outputs": [], + "source": [ + "g.set_outcome(\n", + " g.root.children[\"Trust\"].children[\"Honor\"],\n", + " outcome=g.add_outcome(\n", + " payoffs=[1, 1],\n", + " label=\"Trustworthy\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "df082b10", + "metadata": {}, + "source": [ + "Next, the outcome associated with the scenario where the Buyer trusts but the Seller does not return the trust:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "695b1aad", + "metadata": {}, + "outputs": [], + "source": [ + "g.set_outcome(\n", + " g.root.children[\"Trust\"].children[\"Abuse\"],\n", + " outcome=g.add_outcome(\n", + " payoffs=[-1, 2],\n", + " label=\"Untrustworthy\"\n", + " )\n", + ")" + ] 
+ }, + { + "cell_type": "markdown", + "id": "48335eb8", + "metadata": {}, + "source": [ + "And, finally the outcome associated with the Buyer opting out of the interaction:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "0704ef86", + "metadata": {}, + "outputs": [], + "source": [ + "g.set_outcome(\n", + " g.root.children[\"Not trust\"],\n", + " g.add_outcome(\n", + " payoffs=[0, 0],\n", + " label=\"Opt-out\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "09ef5e2e", + "metadata": {}, + "source": [ + "Nodes without an outcome attached are assumed to have payoffs of zero for all players.\n", + "\n", + "Therefore, adding the outcome to this latter terminal node is not strictly necessary in Gambit, but it is useful to be explicit for readability." + ] + }, + { + "cell_type": "markdown", + "id": "cfc52edc", + "metadata": {}, + "source": [ + "Saving and reading extensive form games to and from file\n", + "--------------------\n", + "\n", + "You can use Gambit to save games to, and read from files.\n", + "The specific format depends on whether the game is normal or extensive form.\n", + "\n", + "Here we'll save the Trust game (extensive form) to the `.efg` format." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "37c51152", + "metadata": {}, + "outputs": [], + "source": [ + "g.to_efg(\"games/trust_game.efg\")" + ] + }, + { + "cell_type": "markdown", + "id": "0eb31525", + "metadata": {}, + "source": [ + "You can easily restore the game object from file like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "0d86a750", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.Game" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "restored_game = gbt.read_efg(\"games/trust_game.efg\")\n", + "type(restored_game)" + ] + }, + { + "cell_type": "markdown", + "id": "be034836", + "metadata": {}, + "source": [ + "#### References\n", + "\n", + "Kreps, D. (1990) \"Corporate Culture and Economic Theory.\" In J. Alt and K. Shepsle, eds., *Perspectives on Positive Political Economy*, Cambridge University Press." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gambitvenv313", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/03_poker.ipynb b/doc/tutorials/03_poker.ipynb new file mode 100644 index 000000000..5bc30f462 --- /dev/null +++ b/doc/tutorials/03_poker.ipynb @@ -0,0 +1,1635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "98eb65d8", + "metadata": {}, + "source": [ + "# 3) A one-card poker game with private information\n", + "\n", + "In this tutorial, we'll create an extensive form representation of a one-card poker game [[Mye91](#references)] and use it to demonstrate and explain the following with Gambit:\n", + "\n", + "1. 
Setting up an extensive form game with imperfect information using [information sets](#information-sets)\n", + "2. [Computing and interpreting Nash equilibria](#computing-and-interpreting-nash-equilibria) and understanding mixed behaviour and mixed strategy profiles\n", + "3. [Acceptance criteria for Nash equilibria](#acceptance-criteria-for-nash-equilibria)\n", + "\n", + "A version of this game also appears in [[RUW08](#references)], as a classroom game under the name \"stripped-down poker\".\n", + "This is perhaps the simplest interesting game with imperfect information.\n", + "\n", + "In our version of the game, there are two players, **Alice** and **Bob**, and a deck of cards, with equal numbers of **King** and **Queen** cards.\n", + "\n", + "- The game begins with each player putting $1 in the pot.\n", + "- A card is dealt at random to Alice\n", + " - Alice observes her card\n", + " - Bob does not observe the card\n", + "- Alice then chooses either to **Raise** or to **Fold**.\n", + " - If she chooses to Fold, Bob wins the pot and the game ends.\n", + " - If she chooses to Raise, she adds another $1 to the pot.\n", + "- Bob then chooses either to **Meet** or **Pass**.\n", + " - If he chooses to Pass, Alice wins the pot and the game ends.\n", + " - If he chooses to Meet, he adds another $1 to the pot.\n", + "- There is then a showdown, in which Alice reveals her card.\n", + " - If she has a King, then she wins the pot;\n", + " - If she has a Queen, then Bob wins the pot." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "69cbfe81", + "metadata": {}, + "outputs": [], + "source": [ + "import pygambit as gbt" + ] + }, + { + "cell_type": "markdown", + "id": "70819881", + "metadata": {}, + "source": [ + "Create the game with two players." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad6a1119", + "metadata": {}, + "outputs": [], + "source": [ + "g = gbt.Game.new_tree(\n", + " players=[\"Alice\", \"Bob\"], \n", + " title=\"One card poker\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "d9796238", + "metadata": {}, + "source": [ + "In addition to the two named players, Gambit also instantiates a chance player." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "841f9f74", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Player(game=Game(title='One card poker'), label='Alice')\n", + "Player(game=Game(title='One card poker'), label='Bob')\n", + "ChancePlayer(game=Game(title='One card poker'))\n" + ] + } + ], + "source": [ + "print(g.players[\"Alice\"])\n", + "print(g.players[\"Bob\"])\n", + "print(g.players.chance)" + ] + }, + { + "cell_type": "markdown", + "id": "0d4c7f5b", + "metadata": {}, + "source": [ + "Moves belonging to the chance player can be added in the same way as to other players.\n", + "\n", + "At any new move created for the chance player, the action probabilities default to uniform randomization over the actions at the move.\n", + "\n", + "The first step in this game is that Alice is dealt a card which could be a King or Queen, each with probability 1/2.\n", + "\n", + "To simulate this in Gambit, we create a chance player move at the root node of the game.\n", + "\n", + "Note: throughout this tutorial, we'll also label nodes according to the actions that precede them in the game tree to improve code readability." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "fe80c64c", + "metadata": {}, + "outputs": [], + "source": [ + "g.append_move(\n", + " g.root,\n", + " player=g.players.chance,\n", + " actions=[\"King\", \"Queen\"] # By default, chance actions have equal probabilities\n", + ")\n", + "for node in g.root.children: # Add labels to the new child nodes to improve code readability\n", + " node.label = node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "5cf73f0a", + "metadata": {}, + "source": [ + "## Information sets\n", + "\n", + "In this game, information structure is important.\n", + "Alice knows her card, so the two nodes at which she has the move are part of different **information sets**.\n", + "\n", + "We'll therefore need to append Alice's move separately for each of the root node's children, i.e. the scenarios where she has a King or a Queen.\n", + "Let's now add both of these possible moves." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "0e3bb5ef", + "metadata": {}, + "outputs": [], + "source": [ + "for node in g.root.children:\n", + " g.append_move(\n", + " node,\n", + " player=\"Alice\",\n", + " actions=[\"Raise\", \"Fold\"]\n", + " )\n", + " for child_node in node.children:\n", + " child_node.label = child_node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "4c8d0343", + "metadata": {}, + "source": [ + "The loop above causes each of the newly-appended moves to be in new information sets, reflecting the fact that Alice's decision depends on the knowledge of which card she holds.\n", + "\n", + "In contrast, Bob does not know Alice’s card, and therefore cannot distinguish between the two nodes at which he has to make his decision:\n", + "\n", + " - Chance player chooses King, then Alice Raises: `g.root.children[\"King\"].children[\"Raise\"]`\n", + " - Chance player chooses Queen, then Alice Raises: `g.root.children[\"Queen\"].children[\"Raise\"]`\n", + "\n", + "In other words, 
Bob's decision when Alice raises with a Queen should be part of the same information set as Bob's decision when Alice raises with a King.\n", + "\n", + "To set this scenario up in Gambit, we'll need to use `Game.append_infoset` to add a move as part of an existing information set (represented in Gambit as an `Infoset`).\n", + "\n", + "First, let's add Bob's move to the node where Alice has raised with a King." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "dbfa7035", + "metadata": {}, + "outputs": [], + "source": [ + "g.append_move(\n", + " g.root.children[\"King\"].children[\"Raise\"],\n", + " player=\"Bob\",\n", + " actions=[\"Meet\", \"Pass\"]\n", + ")\n", + "for node in g.root.children[\"King\"].children[\"Raise\"].children:\n", + " node.label = node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "689ce12c", + "metadata": {}, + "source": [ + "Now let's add the information set we created at the node where Alice raised with a King, to the node where Alice raised with a Queen." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "655cdae3", + "metadata": {}, + "outputs": [], + "source": [ + "g.append_infoset(\n", + " g.root.children[\"Queen\"].children[\"Raise\"],\n", + " infoset=g.root.children[\"King\"].children[\"Raise\"].infoset\n", + ")\n", + "for node in g.root.children[\"Queen\"].children[\"Raise\"].children:\n", + " node.label = node.prior_action.label" + ] + }, + { + "cell_type": "markdown", + "id": "c4eeb65f", + "metadata": {}, + "source": [ + "In game theory terms, this creates \"imperfect information\".\n", + "Bob cannot distinguish between these two nodes in the game tree, so he must use the same strategy (same probabilities for Meet vs. Pass) in both situations.\n", + "\n", + "This is crucial in games where players must make decisions without complete knowledge of their opponents' private information.\n", + "\n", + "Let's now set up the four possible payoff outcomes for the game." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "87c988be", + "metadata": {}, + "outputs": [], + "source": [ + "alice_winsbig = g.add_outcome([2, -2], label=\"Alice wins big\")\n", + "alice_wins = g.add_outcome([1, -1], label=\"Alice wins\")\n", + "bob_winsbig = g.add_outcome([-2, 2], label=\"Bob wins big\")\n", + "bob_wins = g.add_outcome([-1, 1], label=\"Bob wins\")" + ] + }, + { + "cell_type": "markdown", + "id": "467a2c39", + "metadata": {}, + "source": [ + "Finally, we should assign an outcome to each of the terminal nodes in the game tree." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "29aa60a0", + "metadata": {}, + "outputs": [], + "source": [ + "# Alice folds, Bob wins small\n", + "g.set_outcome(g.root.children[\"King\"].children[\"Fold\"], bob_wins)\n", + "g.set_outcome(g.root.children[\"Queen\"].children[\"Fold\"], bob_wins)\n", + "\n", + "# Bob sees Alice raise and calls, correctly believing she is bluffing, Bob wins big\n", + "g.set_outcome(g.root.children[\"Queen\"].children[\"Raise\"].children[\"Meet\"], bob_winsbig)\n", + "\n", + "# Bob sees Alice raise and calls, incorrectly believing she is bluffing, Alice wins big\n", + "g.set_outcome(g.root.children[\"King\"].children[\"Raise\"].children[\"Meet\"], alice_winsbig)\n", + "\n", + "# Bob does not call Alice's raise, Alice wins small\n", + "g.set_outcome(g.root.children[\"King\"].children[\"Raise\"].children[\"Pass\"], alice_wins)\n", + "g.set_outcome(g.root.children[\"Queen\"].children[\"Raise\"].children[\"Pass\"], alice_wins)" + ] + }, + { + "cell_type": "markdown", + "id": "17eb6af5", + "metadata": {}, + "source": [ + "## Computing and interpreting Nash equilibria\n", + "\n", + "\n", + "Since our one-card poker game is extensive form and has two players, we can use the `lcp_solve` algorithm in Gambit to compute the Nash equilibria." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "4d92c8d9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "NashComputationResult(method='lcp', rational=True, use_strategic=False, equilibria=[[[[Rational(1, 1), Rational(0, 1)], [Rational(1, 3), Rational(2, 3)]], [[Rational(2, 3), Rational(1, 3)]]]], parameters={'stop_after': 0, 'max_depth': 0})" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result = gbt.nash.lcp_solve(g)\n", + "result" + ] + }, + { + "cell_type": "markdown", + "id": "e5946077", + "metadata": {}, + "source": [ + "The result of the calculation is returned as a `NashComputationResult` object.\n", + "\n", + "The set of equilibria found is reported in `NashComputationResult.equilibria`; in this case, this is a list of `MixedBehaviorProfile` objects.\n", + "\n", + "For one-card poker, we expect to find a single equilibrium (one `MixedBehaviorProfile`):" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9967d6f7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of equilibria found: 1\n" + ] + } + ], + "source": [ + "print(\"Number of equilibria found:\", len(result.equilibria))\n", + "eqm = result.equilibria[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3293e818", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.MixedBehaviorProfileRational" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Note: MixedBehaviorProfileRational is a subclass of MixedBehaviorProfile that uses rational numbers for probabilities.\n", + "type(eqm)" + ] + }, + { + "cell_type": "markdown", + "id": "69f67b5b", + "metadata": {}, + "source": [ + "A mixed behavior profile specifies, for each information set, the probability distribution over actions at that information set.\n", + 
"\n", + "Indexing a mixed behaviour profile by a player gives a `MixedBehavior`, which specifies probability distributions at each of the player's information sets:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4cf38264", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.MixedBehavior" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(eqm[\"Alice\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "85e7fdda", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[1,0\\right],\\left[\\frac{1}{3},\\frac{2}{3}\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 1), Rational(0, 1)], [Rational(1, 3), Rational(2, 3)]]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm[\"Alice\"]" + ] + }, + { + "cell_type": "markdown", + "id": "6615115d", + "metadata": {}, + "source": [ + "In this case, at Alice's first information set, the one at which she has the King, she always raises.\n", + "\n", + "At her second information set, where she has the Queen, she sometimes bluffs, raising with probability one-third.\n", + "\n", + "The probability distribution at an information set is represented by a `MixedAction`.\n", + "\n", + "`MixedBehavior.mixed_actions` iterates over these for the player:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f45a82b6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "At information set 0, Alice plays Raise with probability: 1 and Fold with probability: 0\n", + "At information set 1, Alice plays Raise with probability: 1/3 and Fold with probability: 2/3\n" + ] + } + ], + "source": [ + "for infoset, mixed_action in eqm[\"Alice\"].mixed_actions():\n", + " print(\n", + " f\"At information set {infoset.number}, \"\n", + " f\"Alice plays 
Raise with probability: {mixed_action['Raise']}\"\n", + " f\" and Fold with probability: {mixed_action['Fold']}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "9eeae046", + "metadata": {}, + "source": [ + "We can alternatively iterate through each of a player's actions like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "83bbd3e5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "At information set 0, Alice plays Raise with probability: 1\n", + "At information set 0, Alice plays Fold with probability: 0\n", + "At information set 1, Alice plays Raise with probability: 1/3\n", + "At information set 1, Alice plays Fold with probability: 2/3\n" + ] + } + ], + "source": [ + "for action in g.players[\"Alice\"].actions:\n", + " print(\n", + " f\"At information set {action.infoset.number}, \"\n", + " f\"Alice plays {action.label} with probability: {eqm[action]}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "1f121d48", + "metadata": {}, + "source": [ + "Now let's look at Bob’s strategy:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "6bf51b38", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[\\frac{2}{3},\\frac{1}{3}\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(2, 3), Rational(1, 3)]]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm[\"Bob\"]" + ] + }, + { + "cell_type": "markdown", + "id": "e906c4c4", + "metadata": {}, + "source": [ + "Bob meets Alice’s raise two-thirds of the time.\n", + "The label “Raise” is used in more than one information set for Alice, so in the above we had to specify information sets when indexing.\n", + "\n", + "When there is no ambiguity, we can specify action labels directly.\n", + "So for example, because Bob has only one action named “Meet” in the game, we can extract the probability that Bob plays “Meet” 
by:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "2966e700", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\frac{2}{3}$" + ], + "text/plain": [ + "Rational(2, 3)" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm[\"Bob\"][\"Meet\"]" + ] + }, + { + "cell_type": "markdown", + "id": "2ec69f8c", + "metadata": {}, + "source": [ + "Moreover, this is the only action with that label in the game, so we can index the profile directly using the action label without any ambiguity:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "f5a7f110", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\frac{2}{3}$" + ], + "text/plain": [ + "Rational(2, 3)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm[\"Meet\"]" + ] + }, + { + "cell_type": "markdown", + "id": "db19411b", + "metadata": {}, + "source": [ + "Because this is an equilibrium, Bob is indifferent between the two actions at his information set, meaning he has no reason to prefer one action over the other, given Alice's expected strategy.\n", + "\n", + "`MixedBehaviorProfile.action_value` returns the expected payoff of taking an action, conditional on reaching that action's information set:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "a7d3816d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "When Bob plays Meet he can expect the payoff: -1\n", + "When Bob plays Pass he can expect the payoff: -1\n" + ] + } + ], + "source": [ + "# Remember that Bob has a single information set\n", + "for action in g.players[\"Bob\"].infosets[0].actions:\n", + " print(\n", + " f\"When Bob plays {action.label} he can expect the payoff: {eqm.action_value(action)}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "6491fdda", + "metadata": 
{}, + "source": [ + "Bob's indifference between his actions arises because of his beliefs given Alice's strategy.\n", + "\n", + "`MixedBehaviorProfile.belief` returns the probability of reaching a node, conditional on its information set being reached.\n", + "\n", + "Recall that the two nodes in Bob's only information set are `g.root.children[\"King\"].children[\"Raise\"]` and `g.root.children[\"Queen\"].children[\"Raise\"]`:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "4a54b20c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Bob's belief in reaching the King -> Raise node is: 3/4\n", + "Bob's belief in reaching the Queen -> Raise node is: 1/4\n" + ] + } + ], + "source": [ + "for node in g.players[\"Bob\"].infosets[0].members:\n", + " print(\n", + " f\"Bob's belief in reaching the {node.parent.label} -> {node.label} node is: {eqm.belief(node)}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "351bb3ce", + "metadata": {}, + "source": [ + "Bob believes that, conditional on Alice raising, there's a 3/4 chance that she has the King; therefore, the expected payoff to meeting is in fact -1 as computed.\n", + "\n", + "`MixedBehaviorProfile.infoset_prob` returns the probability that an information set is reached:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "b250c1cd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\frac{2}{3}$" + ], + "text/plain": [ + "Rational(2, 3)" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm.infoset_prob(g.players[\"Bob\"].infosets[0])" + ] + }, + { + "cell_type": "markdown", + "id": "9216ea34", + "metadata": {}, + "source": [ + "The corresponding probability that a node is reached in the play of the game is given by `MixedBehaviorProfile.realiz_prob`, and the expected payoff to a player conditional on reaching a node is given by 
`MixedBehaviorProfile.node_value`." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "6f01846b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The probability that the node King -> Raise is reached is: 1/2. Bob's expected payoff conditional on reaching this node is: -5/3\n", + "The probability that the node Queen -> Raise is reached is: 1/6. Bob's expected payoff conditional on reaching this node is: 1\n" + ] + } + ], + "source": [ + "for node in g.players[\"Bob\"].infosets[0].members:\n", + " print(\n", + " f\"The probability that the node {node.parent.label} -> {node.label} is reached is: {eqm.realiz_prob(node)}. \",\n", + " f\"Bob's expected payoff conditional on reaching this node is: {eqm.node_value(\"Bob\", node)}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "5ba0c241", + "metadata": {}, + "source": [ + "The overall expected payoff to a player given the behavior profile is returned by `MixedBehaviorProfile.payoff`:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "5079d231", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\frac{1}{3}$" + ], + "text/plain": [ + "Rational(1, 3)" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm.payoff(\"Alice\")" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "c55f2c7a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\frac{-1}{3}$" + ], + "text/plain": [ + "Rational(-1, 3)" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm.payoff(\"Bob\")" + ] + }, + { + "cell_type": "markdown", + "id": "26d5e8ff", + "metadata": {}, + "source": [ + "The equilibrium computed expresses probabilities in rational numbers.\n", + "\n", + "Because the numerical data of games in Gambit [are represented 
exactly](#representation-of-numerical-data-of-a-game), methods which are specialized to two-player games, `lp_solve`, `lcp_solve`, and `enummixed_solve`, can report exact probabilities for equilibrium strategy profiles.\n", + "\n", + "This is enabled by default for these methods.\n", + "\n", + "When a game has an extensive representation, equilibrium finding methods default to computing on that representation.\n", + "It is also possible to compute using the strategic representation.\n", + "`pygambit` transparently computes the reduced strategic form representation of an extensive game." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "d4ecff88", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['11', '12', '21', '22']" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[s.label for s in g.players[\"Alice\"].strategies]" + ] + }, + { + "cell_type": "markdown", + "id": "a9bf9b73", + "metadata": {}, + "source": [ + "In the strategic form of this game, Alice has four strategies.\n", + "\n", + "The generated strategy labels list the action numbers taken at each information set.\n", + "For example, label '11' refers to the strategy in which Alice takes her first action (Raise) at both of her information sets, that is, she raises whether she holds the King or the Queen.\n", + "\n", + "We can therefore apply a method which operates on a strategic game to any game with an extensive representation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "24e4b6e8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "NashComputationResult(method='gnm', rational=False, use_strategic=True, equilibria=[[[0.33333333333866677, 0.6666666666613335, 0.0, 0.0], [0.6666666666559997, 0.3333333333440004]]], parameters={'perturbation': [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0]], 'end_lambda': -10.0, 'steps': 100, 'local_newton_interval': 3, 'local_newton_maxits': 10})" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gnm_result = gbt.nash.gnm_solve(g)\n", + "gnm_result" + ] + }, + { + "cell_type": "markdown", + "id": "d88b736b", + "metadata": {}, + "source": [ + "`gnm_solve` can be applied to any game with any number of players, and uses a path-following process in floating-point arithmetic, so it returns profiles with probabilities expressed as floating-point numbers.\n", + "\n", + "This method operates on the strategic representation of the game, so the returned results are of type `MixedStrategyProfile` (specifically `MixedStrategyProfileDouble`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "d9ffb4b8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.gambit.MixedStrategyProfileDouble" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gnm_eqm = gnm_result.equilibria[0]\n", + "type(gnm_eqm)" + ] + }, + { + "cell_type": "markdown", + "id": "102d22c2", + "metadata": {}, + "source": [ + "Indexing a `MixedStrategyProfile` by a player gives the probability distribution over that player's strategies only.\n", + "\n", + "The expected payoff to a strategy is provided by `MixedStrategyProfile.strategy_value` and the overall expected payoff to a player is returned by `MixedStrategyProfile.payoff`:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "56e2f847", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Alice's expected payoffs playing:\n", + "Strategy 11: 0.3333\n", + "Strategy 12: 0.3333\n", + "Strategy 21: -1.0000\n", + "Strategy 22: -1.0000\n", + "Alice's overall expected payoff: 0.3333\n", + "\n", + "Bob's expected payoffs playing:\n", + "Strategy 1: -0.3333\n", + "Strategy 2: -0.3333\n", + "Bob's overall expected payoff: -0.3333\n", + "\n" + ] + } + ], + "source": [ + "for player in g.players:\n", + " print(\n", + " f\"{player.label}'s expected payoffs playing:\"\n", + " )\n", + " for strategy in player.strategies:\n", + " print(\n", + " f\"Strategy {strategy.label}: {gnm_eqm.strategy_value(strategy):.4f}\"\n", + " )\n", + " print(\n", + " f\"{player.label}'s overall expected payoff: {gnm_eqm.payoff(player):.4f}\"\n", + " )\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "874be231", + "metadata": {}, + "source": [ + "When a game has an extensive representation, we can convert freely between a mixed strategy profile and the corresponding mixed behaviour profile representation of the same strategies using 
`MixedStrategyProfile.as_behavior` and `MixedBehaviorProfile.as_strategy`.\n", + "\n", + "- A mixed **strategy** profile maps each strategy in a game to the corresponding probability with which that strategy is played.\n", + "- A mixed **behaviour** profile maps each action at each information set in a game to the corresponding probability with which the action is played, conditional on that information set being reached.\n", + "\n", + "Let's convert the equilibrium we found using `gnm_solve` to a mixed behaviour profile and iterate through the players' actions to show their expected payoffs, comparing as we go with the payoffs found by `lcp_solve`:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "d18a91f0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Alice's expected payoffs:\n", + "At information set 0, when playing Raise - gnm: 1.6667, lcp: 1.6667\n", + "At information set 0, when playing Fold - gnm: -1.0000, lcp: -1.0000\n", + "At information set 1, when playing Raise - gnm: -1.0000, lcp: -1.0000\n", + "At information set 1, when playing Fold - gnm: -1.0000, lcp: -1.0000\n", + "\n", + "Bob's expected payoffs:\n", + "At information set 0, when playing Meet - gnm: -1.0000, lcp: -1.0000\n", + "At information set 0, when playing Pass - gnm: -1.0000, lcp: -1.0000\n", + "\n" + ] + } + ], + "source": [ + "for player in g.players:\n", + " print(\n", + " f\"{player.label}'s expected payoffs:\"\n", + " )\n", + " for action in player.actions:\n", + " print(\n", + " f\"At information set {action.infoset.number}, \"\n", + " f\"when playing {action.label} - \"\n", + " f\"gnm: {gnm_eqm.as_behavior().action_value(action):.4f}\"\n", + " f\", lcp: {eqm.action_value(action):.4f}\"\n", + " )\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "b2867dca", + "metadata": {}, + "source": [ + "Acceptance criteria for Nash equilibria\n", + "---------------------------------------\n", + "\n", + "Some 
methods for computing Nash equilibria operate using floating-point arithmetic and/or generate candidate equilibrium profiles using methods which involve some form of successive approximations.\n", + "The outputs of these methods therefore are in general $\\varepsilon$-equilibria, for some positive $\\varepsilon$.\n", + "\n", + "$\\varepsilon$-equilibria (from [Wikipedia](https://en.wikipedia.org/wiki/Epsilon-equilibrium)):\n", + "\n", + "> In game theory, an epsilon-equilibrium, or near-Nash equilibrium, is a strategy profile that approximately satisfies the condition of Nash equilibrium. In a Nash equilibrium, no player has an incentive to change his behavior. In an approximate Nash equilibrium, this requirement is weakened to allow the possibility that a player may have a small incentive to do something different.\n", + "\n", + "> Given a game and a real non-negative parameter $\\varepsilon$, a strategy profile is said to be an $\\varepsilon$-equilibrium if it is not possible for any player to gain more than $\\varepsilon$ in expected payoff by unilaterally deviating from his strategy. Every Nash Equilibrium is equivalent to an $\\varepsilon$-equilibrium where $\\varepsilon = 0$.\n", + "\n", + "\n", + "To provide a uniform interface across methods, where relevant Gambit provides a parameter\n", + "`maxregret`, which specifies the acceptance criterion for labeling the output of the\n", + "algorithm as an equilibrium.\n", + "This parameter is interpreted *proportionally* to the range of payoffs in the game.\n", + "Any profile returned as an equilibrium is guaranteed to be an $\\varepsilon$-equilibrium, for $\\varepsilon$ no more than `maxregret`\n", + "times the difference of the game's maximum and minimum payoffs.\n", + "\n", + "As an example, consider solving our one-card poker game using `logit_solve`. 
The range of the payoffs in this game is 4 (from +2 to -2).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "0c55f745", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(Rational(2, 1), Rational(-2, 1))" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g.max_payoff, g.min_payoff" + ] + }, + { + "cell_type": "markdown", + "id": "6263ad6e", + "metadata": {}, + "source": [ + "`logit_solve` is a globally-convergent method, in that it computes a sequence of profiles which is guaranteed to have a subsequence that converges to a\n", + "Nash equilibrium.\n", + "\n", + "The default value of `maxregret` for this method is set at $10^{-8}$:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "101598c6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "logit_solve_result = gbt.nash.logit_solve(g, maxregret=1e-8)\n", + "len(logit_solve_result.equilibria)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "9b142728", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.987411578698641e-08" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ls_eqm = logit_solve_result.equilibria[0]\n", + "ls_eqm.max_regret()" + ] + }, + { + "cell_type": "markdown", + "id": "a2ba06c4", + "metadata": {}, + "source": [ + "The value of `MixedBehaviorProfile.max_regret` of the computed profile exceeds $10^{-8}$ measured in payoffs of the game.\n", + "However, when considered relative to the scale of the game's payoffs, we see it is less than $10^{-8}$ of the payoff range, as requested:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "ff405409", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + 
"9.968528946746602e-09" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ls_eqm.max_regret() / (g.max_payoff - g.min_payoff)" + ] + }, + { + "cell_type": "markdown", + "id": "54635455", + "metadata": {}, + "source": [ + "In general, for globally-convergent methods especially, there is a tradeoff between precision and running time.\n", + "\n", + "We could instead ask only for an $\\varepsilon$-equilibrium with a (scaled) $\\varepsilon$ of no more than $10^{-4}$:" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "31b0143c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9.395259956013202e-05" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.logit_solve(g, maxregret=1e-4).equilibria[0].max_regret() / (g.max_payoff - g.min_payoff)" + ] + }, + { + "cell_type": "markdown", + "id": "dc8c8509", + "metadata": {}, + "source": [ + "The tradeoff arises because some methods converge slowly on some games, so it can be useful to accept a coarser approximation to an equilibrium (a higher `maxregret` value) that is faster to calculate.
" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "7cfba34a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 10.5 ms, sys: 269 μs, total: 10.7 ms\n", + "Wall time: 10.7 ms\n" + ] + }, + { + "data": { + "text/plain": [ + "NashComputationResult(method='logit', rational=False, use_strategic=False, equilibria=[[[[1.0, 0.0], [0.3338351656285655, 0.666164834417892]], [[0.6670407651644307, 0.3329592348608147]]]], parameters={'first_step': 0.03, 'max_accel': 1.1})" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%time\n", + "gbt.nash.logit_solve(g, maxregret=1e-4)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "6f1809a7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 20.1 ms, sys: 610 μs, total: 20.7 ms\n", + "Wall time: 21.5 ms\n" + ] + }, + { + "data": { + "text/plain": [ + "NashComputationResult(method='logit', rational=False, use_strategic=False, equilibria=[[[[1.0, 0.0], [0.33333338649882943, 0.6666666135011706]], [[0.6666667065407631, 0.3333332934592369]]]], parameters={'first_step': 0.03, 'max_accel': 1.1})" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%time\n", + "gbt.nash.logit_solve(g, maxregret=1e-8)" + ] + }, + { + "cell_type": "markdown", + "id": "76461069", + "metadata": {}, + "source": [ + "The convention of expressing `maxregret` scaled by the game's payoffs standardises the behavior of methods across games.\n", + "\n", + "For example, consider solving the poker game instead using `liap_solve()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "414b6f65", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5.509533871672634e-05" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.liap_solve(g.mixed_behavior_profile(), maxregret=1.0e-4).equilibria[0].max_regret() / (g.max_payoff - g.min_payoff)" + ] + }, + { + "cell_type": "markdown", + "id": "c6853432", + "metadata": {}, + "source": [ + "If, instead, we double all payoffs, the output of the method is unchanged." + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "a892dc2b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5.509533871672634e-05" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "for outcome in g.outcomes:\n", + " outcome[\"Alice\"] = outcome[\"Alice\"] * 2\n", + " outcome[\"Bob\"] = outcome[\"Bob\"] * 2\n", + "\n", + "gbt.nash.liap_solve(g.mixed_behavior_profile(), maxregret=1.0e-4).equilibria[0].max_regret() / (g.max_payoff - g.min_payoff)" + ] + }, + { + "cell_type": "markdown", + "id": "5f1f66e0", + "metadata": {}, + "source": [ + "## Representation of numerical data of a game\n", + "\n", + "Payoffs to players and probabilities of actions at chance information sets are specified as numbers.\n", + "Gambit represents the numerical values in a game in exact precision, using either decimal or rational representations.\n", + "\n", + "To illustrate, consider a trivial game which just has one move for the chance player:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f79695a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Rational(1, 3), Rational(1, 3), Rational(1, 3)]" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game = gbt.Game.new_tree()\n", + 
"small_game.append_move(small_game.root, small_game.players.chance, [\"a\", \"b\", \"c\"])\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "markdown", + "id": "dc4522b5", + "metadata": {}, + "source": [ + "The default when creating a new move for chance is that all actions are chosen with equal probability.\n", + "These probabilities are represented as rational numbers, using `pygambit`'s `Rational` class, which is derived from Python's `fractions.Fraction`.\n", + "\n", + "Numerical data can be set as rational numbers. Here we update the chance action probabilities with `Rational` numbers:" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "5de6acb2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Rational(1, 4), Rational(1, 2), Rational(1, 4)]" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game.set_chance_probs(\n", + " small_game.root.infoset,\n", + " [gbt.Rational(1, 4), gbt.Rational(1, 2), gbt.Rational(1, 4)]\n", + ")\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "markdown", + "id": "23263b21", + "metadata": {}, + "source": [ + "Numerical data can also be explicitly specified as decimal numbers:" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "c47d2ab6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Decimal('0.25'), Decimal('0.50'), Decimal('0.25')]" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game.set_chance_probs(\n", + " small_game.root.infoset,\n", + " [gbt.Decimal(\".25\"), gbt.Decimal(\".50\"), gbt.Decimal(\".25\")]\n", + ")\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "markdown", + "id": "bffda303", + "metadata": {}, + "source": [ + "Although the two representations above are mathematically 
equivalent, `pygambit` remembers the format in which the values were specified.\n", + "\n", + "Expressing rational or decimal numbers as above is verbose and tedious.\n", + "`pygambit` offers a more concise way to express numerical data in games: when setting numerical game data, `pygambit` will attempt to convert text strings to their rational or decimal representation.\n", + "The above can therefore be written more compactly using string representations:" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "04329084", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Rational(1, 4), Rational(1, 2), Rational(1, 4)]" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game.set_chance_probs(small_game.root.infoset, [\"1/4\", \"1/2\", \"1/4\"])\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "9015e129", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Decimal('0.25'), Decimal('0.50'), Decimal('0.25')]" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game.set_chance_probs(small_game.root.infoset, [\".25\", \".50\", \".25\"])\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "markdown", + "id": "9f22d40d", + "metadata": {}, + "source": [ + "As a further convenience, `pygambit` will accept Python `int` and `float` values.\n", + "`int` values are always interpreted as `Rational` values.\n", + "\n", + "`pygambit` attempts to render `float` values in an appropriate `Decimal` equivalent.\n", + "In the majority of cases, this creates no problems.\n", + "For example," + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "0a019aa5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Decimal('0.25'), Decimal('0.5'), 
Decimal('0.25')]" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "small_game.set_chance_probs(small_game.root.infoset, [.25, .50, .25])\n", + "[act.prob for act in small_game.root.infoset.actions]" + ] + }, + { + "cell_type": "markdown", + "id": "d53adcd4", + "metadata": {}, + "source": [ + "However, rounding can cause difficulties when attempting to use `float` values to represent values which do not have an exact decimal representation:" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "1991d288", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ValueError: set_chance_probs(): must specify non-negative probabilities that sum to one\n" + ] + } + ], + "source": [ + "try:\n", + " small_game.set_chance_probs(small_game.root.infoset, [1/3, 1/3, 1/3])\n", + "except ValueError as e:\n", + " print(\"ValueError:\", e)\n" + ] + }, + { + "cell_type": "markdown", + "id": "89fefd34", + "metadata": {}, + "source": [ + "This behavior can be slightly surprising, especially in light of the fact that\n", + "in Python," + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "b1dc37fd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.0" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1/3 + 1/3 + 1/3" + ] + }, + { + "cell_type": "markdown", + "id": "a06699af", + "metadata": {}, + "source": [ + "In checking whether these probabilities sum to one, `pygambit` first converts each of the probabilities to a `Decimal` representation, via the following conversion:" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "dc1edea2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Decimal('0.3333333333333333')" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"gbt.Decimal(str(1/3))" + ] + }, + { + "cell_type": "markdown", + "id": "4bfff415", + "metadata": {}, + "source": [ + "and the sum-to-one check then fails because:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "1edd90d6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Decimal('0.9999999999999999')" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.Decimal(str(1/3)) + gbt.Decimal(str(1/3)) + gbt.Decimal(str(1/3))" + ] + }, + { + "cell_type": "markdown", + "id": "5208b7a4", + "metadata": {}, + "source": [ + "Setting payoffs for players follows the same rules.\n", + "Representing probabilities and payoffs exactly is essential because `pygambit` can (in particular for two-player games) compute equilibria exactly: the Nash equilibria of any two-player game with rational payoffs and chance probabilities can be expressed exactly in terms of rational numbers.\n", + "\n", + "It is therefore advisable always to specify the numerical data of games in terms of either `Decimal` or `Rational` values, or their string equivalents.\n", + "It is safe to use `int` values, but `float` values should be used with some care to ensure the values are recorded as intended." + ] + }, + { + "cell_type": "markdown", + "id": "65def67e", + "metadata": {}, + "source": [ + "#### References\n", + "\n", + "Myerson, Roger B. (1991) *Game Theory: Analysis of Conflict*. Cambridge: Harvard University Press.\n", + "\n", + "Reiley, David H., Michael B. Urbancic and Mark Walker. (2008) \"Stripped-down poker: A classroom game with signaling and bluffing.\" *The Journal of Economic Education* 39(4): 323-341."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gambitvenv313", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/04_starting_points.ipynb b/doc/tutorials/04_starting_points.ipynb new file mode 100644 index 000000000..c3555943c --- /dev/null +++ b/doc/tutorials/04_starting_points.ipynb @@ -0,0 +1,449 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6818538c", + "metadata": {}, + "source": [ + "# 4) Generating starting points for algorithms\n", + "\n", + "In the previous tutorial, we demonstrated how to calculate the Nash equilibria of a game set up using Gambit and interpret the `MixedStrategyProfile` or `MixedBehaviorProfile` objects returned by the solver.\n", + "In this tutorial, we will demonstrate how to use a `MixedStrategyProfile` or `MixedBehaviorProfile` as an initial condition, a starting point, for some methods of computing Nash equilibria.\n", + "The equilibria found will depend on which starting point is selected.\n", + "\n", + "To facilitate generating starting points, Gambit's `Game` class provides the methods `random_strategy_profile` and `random_behavior_profile`, to generate profiles which are drawn from the uniform distribution on the product of simplices. 
In other words, the profiles are sampled from a uniform distribution so that each possible mixed strategy profile (or mixed behavior profile) is equally likely to be selected.\n", + "\n", + "As an example, we consider a three-player game from McKelvey and McLennan (1997), in which each player has two strategies.\n", + "This game has nine equilibria in total, and in particular has two totally mixed Nash equilibria, which is the maximum possible number of regular totally mixed equilibria in games of this size.\n", + "\n", + "Pure and mixed strategies:\n", + "\n", + "- **Pure strategy**: A player chooses a single action with probability 1 (always picks the same move)\n", + "- **Mixed strategy**: A player assigns probabilities to their available actions (some actions may have probability 0)\n", + "- **Totally mixed strategy**: A mixed strategy in which every available action is chosen with strictly positive probability (no action has probability 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "493cafb8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "

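<center><h1>2x2x2 Example from McKelvey-McLennan, with 9 Nash equilibria, 2 totally mixed</h1></center>\n", + "
<table><caption>Subtable with strategies: Player 3 Strategy 1</caption><tr><th></th><th>1</th><th>2</th></tr><tr><th>1</th><td>9,8,12</td><td>0,0,0</td></tr><tr><th>2</th><td>0,0,0</td><td>9,8,2</td></tr></table>
<table><caption>Subtable with strategies: Player 3 Strategy 2</caption><tr><th></th><th>1</th><th>2</th></tr><tr><th>1</th><td>0,0,0</td><td>3,4,6</td></tr><tr><th>2</th><td>3,4,6</td><td>0,0,0</td></tr></table>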
\n" + ], + "text/plain": [ + "Game(title='2x2x2 Example from McKelvey-McLennan, with 9 Nash equilibria, 2 totally mixed')" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pygambit as gbt\n", + "g = gbt.read_nfg(\"../2x2x2.nfg\")\n", + "g" + ] + }, + { + "cell_type": "markdown", + "id": "1e68a5bd", + "metadata": {}, + "source": [ + "We first consider finding Nash equilibria in this game using `liap_solve`.\n", + "If we run this method starting from the centroid (uniform randomization across all strategies for each player), `liap_solve` finds one of the totally mixed equilibria. When called without a list of probabilities, `Game.mixed_strategy_profile` returns the centroid mixed strategy profile." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b32adf22", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5, 0.5],[0.5, 0.5],[0.5, 0.5]\\right]$" + ], + "text/plain": [ + "[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "centroid_start = g.mixed_strategy_profile()\n", + "centroid_start" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "c0b62502", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.3999999026224355, 0.6000000973775644],[0.49999981670851457, 0.5000001832914854],[0.3333329684317666, 0.6666670315682334]\\right]$" + ], + "text/plain": [ + "[[0.3999999026224355, 0.6000000973775644], [0.49999981670851457, 0.5000001832914854], [0.3333329684317666, 0.6666670315682334]]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.liap_solve(centroid_start).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "df507eda", + "metadata": {}, + "source": [ + "As you can see, in this totally mixed strategy equilibrium, no action has 
probability 0.\n", + "\n", + "Which equilibrium is found depends on the starting point.\n", + "With a different starting point, we can find, for example, one of the pure-strategy equilibria." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "cf22064e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.9, 0.1],[0.9, 0.1],[0.9, 0.1]\\right]$" + ], + "text/plain": [ + "[[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_start = g.mixed_strategy_profile([[.9, .1], [.9, .1], [.9, .1]])\n", + "new_start" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "08a22505", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[1.0, 0.0],[0.9999999944750116, 5.524988446860122e-09],[0.9999999991845827, 8.154173380971617e-10]\\right]$" + ], + "text/plain": [ + "[[1.0, 0.0], [0.9999999944750116, 5.524988446860122e-09], [0.9999999991845827, 8.154173380971617e-10]]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.liap_solve(new_start).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "3977088f", + "metadata": {}, + "source": [ + "To search for more equilibria, we can instead generate strategy profiles at random." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "cfbc2714", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.7187961367413075, 0.2812038632586925],[0.1291105793795489, 0.8708894206204512],[0.12367227612277114, 0.876327723877229]\\right]$" + ], + "text/plain": [ + "[[0.7187961367413075, 0.2812038632586925], [0.1291105793795489, 0.8708894206204512], [0.12367227612277114, 0.876327723877229]]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "random_start = g.random_strategy_profile()\n", + "random_start" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "eb53062a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5000003932357804, 0.4999996067642197],[0.3999998501612186, 0.6000001498387814],[0.2500001518113522, 0.7499998481886477]\\right]$" + ], + "text/plain": [ + "[[0.5000003932357804, 0.4999996067642197], [0.3999998501612186, 0.6000001498387814], [0.2500001518113522, 0.7499998481886477]]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.liap_solve(random_start).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "185c6abb", + "metadata": {}, + "source": [ + "Note that methods which take starting points do record the starting points used in the result object returned.\n", + "However, the random profiles which are generated will differ in different runs of a program.\n", + "\n", + "To support making the generation of random strategy profiles reproducible, and for finer-grained control of the generation of these profiles if desired, `Game.random_strategy_profile` and `Game.random_behavior_profile` optionally take a `numpy.random.Generator` object, which is used as the source of randomness for creating the profile." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "4293343a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "gen = np.random.default_rng(seed=1234567890)\n", + "p1 = g.random_strategy_profile(gen=gen)\n", + "gen = np.random.default_rng(seed=1234567890)\n", + "p2 = g.random_strategy_profile(gen=gen)\n", + "p1 == p2" + ] + }, + { + "cell_type": "markdown", + "id": "a98e0b66", + "metadata": {}, + "source": [ + "When creating profiles in which probabilities are represented as floating-point numbers, `Game.random_strategy_profile` and `Game.random_behavior_profile` internally use the Dirichlet distribution for each simplex to generate correctly uniform sampling over probabilities.\n", + "However, in some applications generation of random profiles with probabilities as rational numbers is desired.\n", + "\n", + "For example, `simpdiv_solve` takes such a starting point, because it operates by successively refining a triangulation over the space of mixed strategy profiles.\n", + "`Game.random_strategy_profile` and `Game.random_behavior_profile` both take an optional parameter `denom` which, if specified, generates a profile in which probabilities are generated uniformly from the grid in each simplex in which all probabilities have denominator `denom`.\n", + "\n", + "These can then be used in conjunction with `simpdiv_solve` to search for equilibria from different starting points." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "e9716ae0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[\\frac{1}{2},\\frac{1}{2}\\right],\\left[\\frac{7}{10},\\frac{3}{10}\\right],\\left[0,1\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 2), Rational(1, 2)], [Rational(7, 10), Rational(3, 10)], [Rational(0, 1), Rational(1, 1)]]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gen = np.random.default_rng(seed=1234567890)\n", + "rsp = g.random_strategy_profile(denom=10, gen=gen)\n", + "rsp" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "c153918a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[1,0\\right],\\left[1,0\\right],\\left[1,0\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1)]]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.simpdiv_solve(rsp).equilibria[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "70a57b26", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[\\frac{1}{10},\\frac{9}{10}\\right],\\left[\\frac{3}{5},\\frac{2}{5}\\right],\\left[\\frac{3}{5},\\frac{2}{5}\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 10), Rational(9, 10)], [Rational(3, 5), Rational(2, 5)], [Rational(3, 5), Rational(2, 5)]]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rsp1 = g.random_strategy_profile(denom=10, gen=gen)\n", + "rsp1" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "11995836", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[0,1\\right],\\left[0,1\\right],\\left[1,0\\right]\\right]$" + ], + "text/plain": [ + 
"[[Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1)], [Rational(1, 1), Rational(0, 1)]]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.simpdiv_solve(rsp1).equilibria[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "2791ffe2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[\\frac{7}{10},\\frac{3}{10}\\right],\\left[\\frac{4}{5},\\frac{1}{5}\\right],\\left[0,1\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(7, 10), Rational(3, 10)], [Rational(4, 5), Rational(1, 5)], [Rational(0, 1), Rational(1, 1)]]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rsp2 = g.random_strategy_profile(denom=10, gen=gen)\n", + "rsp2" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2ab2caa4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[1,0\\right],\\left[1,0\\right],\\left[1,0\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1)]]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.simpdiv_solve(rsp2).equilibria[0]" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gambitvenv313", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/05_quantal_response.ipynb b/doc/tutorials/05_quantal_response.ipynb new file mode 100644 index 000000000..d04b94ae7 --- /dev/null +++ b/doc/tutorials/05_quantal_response.ipynb @@ -0,0 
+1,336 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ef7d397e", + "metadata": {}, + "source": [ + "# 5) Quantal response equilibria\n", + "\n", + "Gambit implements the idea of [McKPal95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) and [McKPal98](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) to compute Nash equilibria via path-following a branch of the logit quantal response equilibrium (LQRE) correspondence using the function `logit_solve`.\n", + "As an example, we will consider an asymmetric matching pennies game from [Och95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) as analyzed in [McKPal95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ebc4c60e", + "metadata": {}, + "outputs": [], + "source": [ + "import pygambit as gbt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "202786ef", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5000000234106035, 0.49999997658939654],[0.19998563837426647, 0.8000143616257336]\\right]$" + ], + "text/plain": [ + "[[0.5000000234106035, 0.49999997658939654], [0.19998563837426647, 0.8000143616257336]]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g = gbt.Game.from_arrays(\n", + " [[1.1141, 0], [0, 0.2785]],\n", + " [[0, 1.1141], [1.1141, 0]],\n", + " title=\"Ochs (1995) asymmetric matching pennies as transformed in McKelvey-Palfrey (1995)\"\n", + ")\n", + "gbt.nash.logit_solve(g).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "1ce76964", + "metadata": {}, + "source": [ + "`logit_solve` returns only the limiting (approximate) Nash equilibrium found.\n", + "Profiles along the QRE correspondence are frequently 
of interest in their own right.\n", + "Gambit offers several functions for more detailed examination of branches of the QRE correspondence.\n", + "\n", + "The function `logit_solve_branch` uses the same procedure as `logit_solve`, but returns a list of LQRE profiles computed along the branch instead of just the limiting approximate Nash equilibrium." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "840d9203", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "193" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres = gbt.qre.logit_solve_branch(g)\n", + "len(qres)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "be419db2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5, 0.5],[0.5, 0.5]\\right]$" + ], + "text/plain": [ + "[[0.5, 0.5], [0.5, 0.5]]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres[0].profile" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "582838de", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5182276540742868, 0.4817723459257562],[0.49821668880066783, 0.5017833111993909]\\right]$" + ], + "text/plain": [ + "[[0.5182276540742868, 0.4817723459257562], [0.49821668880066783, 0.5017833111993909]]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres[5].profile" + ] + }, + { + "cell_type": "markdown", + "id": "61e86949", + "metadata": {}, + "source": [ + "`logit_solve_branch` uses an adaptive step size heuristic to find points on the branch.\n", + "The parameters `first_step` and `max_accel` are used to adjust the initial step size and the maximum rate at which the step size changes adaptively.\n", + "The step size used is computed as the distance traveled along the path, and, importantly, not the distance as 
measured by changes in the precision parameter lambda.\n", + "As a result, the lambda values for which profiles are computed cannot be controlled in advance.\n", + "\n", + "In some situations, the LQRE profiles at specified values of lambda are of interest.\n", + "For this, Gambit provides `logit_solve_lambda`.\n", + "This function provides accurate values of strategy profiles at one or more specified values of lambda." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "ce354b49", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.5867840364385154, 0.4132159635614846],[0.4518070316997103, 0.5481929683002897]\\right]$" + ], + "text/plain": [ + "[[0.5867840364385154, 0.4132159635614846], [0.4518070316997103, 0.5481929683002897]]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres = gbt.qre.logit_solve_lambda(g, lam=[1, 2, 3])\n", + "qres[0].profile" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "280fa428", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.6175219458400859, 0.3824780541599141],[0.3719816648492249, 0.6280183351507751]\\right]$" + ], + "text/plain": [ + "[[0.6175219458400859, 0.3824780541599141], [0.3719816648492249, 0.6280183351507751]]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres[1].profile" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "3dee57df", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[[0.6168968501329284, 0.3831031498670716],[0.31401636202001226, 0.6859836379799877]\\right]$" + ], + "text/plain": [ + "[[0.6168968501329284, 0.3831031498670716], [0.31401636202001226, 0.6859836379799877]]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "qres[2].profile" + ] + }, + { + "cell_type": "markdown", + 
"id": "5601be33", + "metadata": {}, + "source": [ + "LQRE are frequently taken to data by using maximum likelihood estimation to find the LQRE profile that best fits an observed profile of play.\n", + "This is provided by the function `logit_estimate`.\n", + "We replicate the analysis of a block of the data from [Och95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) for which [McKPal95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) estimated an LQRE." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b34a9278", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pygambit.qre.LogitQREMixedStrategyFitResult" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data = g.mixed_strategy_profile([[128*0.527, 128*(1-0.527)], [128*0.366, 128*(1-0.366)]])\n", + "fit = gbt.qre.logit_estimate(data)\n", + "type(fit)" + ] + }, + { + "cell_type": "markdown", + "id": "12534924", + "metadata": {}, + "source": [ + "The returned `LogitQREMixedStrategyFitResult` object contains the results of the estimation.\n", + "The results replicate those reported in [McKPal95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts), including the estimated value of lambda, the QRE profile probabilities, and the log-likelihood.\n", + "\n", + "Because `data` contains the empirical counts of play, and not just frequencies, the resulting log-likelihood is correct for use in likelihood-ratio tests.\n", + "[[1](#f1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "e10e9abd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.8456097536855862\n", + "[[0.615651314427859, 0.3843486855721409], [0.38329094004562914, 0.6167090599543709]]\n", + "-174.76453191087447\n" + ] + } + ], + "source": [ + 
"print(fit.lam)\n", + "print(fit.profile)\n", + "print(fit.log_like)" + ] + }, + { + "cell_type": "markdown", + "id": "0316795f", + "metadata": {}, + "source": [ + "All of the functions above also support working with the agent LQRE of [McKPal98](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts).\n", + "Agent QRE are computed as the default behavior whenever the game has a extensive (tree) representation.\n", + "\n", + "For `logit_solve`, `logit_solve_branch`, and `logit_solve_lambda`, this can be overriden by passing `use_strategic=True`;\n", + "this will compute LQRE using the reduced strategy set of the game instead.\n", + "\n", + "Likewise, `logit_estimate` will perform estimation using agent LQRE if the data passed are a `MixedBehaviorProfile`, and will return a `LogitQREMixedBehaviorFitResult` object." + ] + }, + { + "cell_type": "markdown", + "id": "486f68a7", + "metadata": {}, + "source": [ + "**Footnotes:**\n", + "\n", + " The log-likelihoods quoted in [McKPal95](https://gambitproject.readthedocs.io/en/latest/biblio.html#general-game-theory-articles-and-texts) are exactly a factor of 10 larger than those obtained by replicating the calculation." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gambitvenv313", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/06_gambit_with_openspiel.ipynb b/doc/tutorials/06_gambit_with_openspiel.ipynb new file mode 100644 index 000000000..20f6ff4ef --- /dev/null +++ b/doc/tutorials/06_gambit_with_openspiel.ipynb @@ -0,0 +1,1213 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 6) Using Gambit with OpenSpiel\n", + "\n", + "This tutorial demonstrates the interoperability of the Gambit and OpenSpiel Python packages for game-theoretic analysis.\n", + "\n", + "Where Gambit is used to compute exact equilibria for games, OpenSpiel provides a variety of iterative learning algorithms that can be used to approximate strategies. Another key distinction is that the PyGambit API gives the user a simple way to define custom games (see tutorials 1-3). This is also possible in OpenSpiel for normal form games, and you can load `.efg` files created from Gambit for extensive form games. However, some of the key functionality for iterative learning of strategies is only available for games from the built-in library (see the [OpenSpiel documentation](https://openspiel.readthedocs.io/en/latest/games.html)).\n", + "\n", + "This tutorial demonstrates:\n", + "\n", + "1. Transferring examples of normal (strategic) form and extensive form games between OpenSpiel and Gambit\n", + "2. Simulating evolutionary dynamics of populations of strategies in OpenSpiel for normal form games\n", + "3. Training agents using self-play of extensive form games in OpenSpiel to create strategies\n", + "4. 
Comparing the strategies from OpenSpiel against equilibrium strategies computed with Gambit\n", + "\n", + "Note:\n", + "- The version of OpenSpiel used in this tutorial is `1.6.1`. If you are running this tutorial locally, this will be the version installed via the included `requirements.txt` file.\n", + "- The OpenSpiel code was adapted from the introductory tutorial for the OpenSpiel API on Colab [here](https://colab.research.google.com/github/deepmind/open_spiel/blob/master/open_spiel/colabs/OpenSpielTutorial.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ebb78322", + "metadata": {}, + "outputs": [], + "source": [ + "from io import StringIO\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from open_spiel.python import rl_environment\n", + "from open_spiel.python.algorithms import tabular_qlearner\n", + "from open_spiel.python.algorithms.gambit import export_gambit\n", + "from open_spiel.python.egt import dynamics\n", + "from open_spiel.python.egt.utils import game_payoffs_array\n", + "\n", + "import pyspiel\n", + "\n", + "import pygambit as gbt" + ] + }, + { + "cell_type": "markdown", + "id": "fd324814", + "metadata": {}, + "source": [ + "## OpenSpiel game library\n", + "\n", + "The [library of games](https://openspiel.readthedocs.io/en/latest/games.html) included in OpenSpiel is extensive. Many of these games will not be amenable to equilibrium computation with Gambit, due to their size. For the purposes of this tutorial, we'll pick small games from the list below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b3eb3671", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['2048', 'add_noise', 'amazons', 'backgammon', 'bargaining', 'battleship', 'blackjack', 'blotto', 'breakthrough', 'bridge', 'bridge_uncontested_bidding', 'cached_tree', 'catch', 'checkers', 'chess', 'cliff_walking', 'clobber', 'coin_game', 'colored_trails', 'connect_four', 'coop_box_pushing', 'coop_to_1p', 'coordinated_mp', 'crazy_eights', 'cribbage', 'cursor_go', 'dark_chess', 'dark_hex', 'dark_hex_ir', 'deep_sea', 'dots_and_boxes', 'dou_dizhu', 'efg_game', 'einstein_wurfelt_nicht', 'euchre', 'first_sealed_auction', 'gin_rummy', 'go', 'goofspiel', 'hanabi', 'havannah', 'hearts', 'hex', 'hive', 'kriegspiel', 'kuhn_poker', 'laser_tag', 'leduc_poker', 'lewis_signaling', 'liars_dice', 'liars_dice_ir', 'lines_of_action', 'maedn', 'mancala', 'markov_soccer', 'matching_pennies_3p', 'matrix_bos', 'matrix_brps', 'matrix_cd', 'matrix_coordination', 'matrix_mp', 'matrix_pd', 'matrix_rps', 'matrix_rpsw', 'matrix_sh', 'matrix_shapleys_game', 'mfg_crowd_modelling', 'mfg_crowd_modelling_2d', 'mfg_dynamic_routing', 'mfg_garnet', 'misere', 'mnk', 'morpion_solitaire', 'negotiation', 'nfg_game', 'nim', 'nine_mens_morris', 'normal_form_extensive_game', 'oh_hell', 'oshi_zumo', 'othello', 'oware', 'pathfinding', 'pentago', 'phantom_go', 'phantom_ttt', 'phantom_ttt_ir', 'pig', 'quoridor', 'rbc', 'repeated_game', 'restricted_nash_response', 'sheriff', 'skat', 'solitaire', 'spades', 'start_at', 'stones_and_gems', 'tarok', 'tic_tac_toe', 'tiny_bridge_2p', 'tiny_bridge_4p', 'tiny_hanabi', 'trade_comm', 'turn_based_simultaneous_game', 'twixt', 'ultimate_tic_tac_toe', 'universal_poker', 'y', 'zerosum']\n" + ] + } + ], + "source": [ + "print(pyspiel.registered_names())" + ] + }, + { + "cell_type": "markdown", + "id": "e628a86d", + "metadata": {}, + "source": [ + "## Normal form games from the OpenSpiel library\n", + 
"\n", + "Let's start with a simple normal form game of rock-paper-scissors, in which the payoffs can be represented by a 3x3 matrix.\n", + "\n", + "Load matrix rock-paper-scissors from OpenSpiel:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "ops_matrix_rps_game = pyspiel.load_game(\"matrix_rps\")" + ] + }, + { + "cell_type": "markdown", + "id": "fda1204e", + "metadata": {}, + "source": [ + "In order to simulate a playthrough of the game, you can first initialise a game state:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "1bcdb97b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Terminal? false\n", + "Row actions: Rock Paper Scissors \n", + "Col actions: Rock Paper Scissors \n", + "Utility matrix:\n", + "0,0 -1,1 1,-1 \n", + "1,-1 0,0 -1,1 \n", + "-1,1 1,-1 0,0 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state = ops_matrix_rps_game.new_initial_state()\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "eeee015a", + "metadata": {}, + "source": [ + "The possible actions for both players (player 0 and player 1) are Rock, Paper and Scissors, but these are not labelled and must be accessed via integer indices:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "70575dc7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1, 2]\n", + "[0, 1, 2]\n" + ] + } + ], + "source": [ + "print(state.legal_actions(0)) # Player 0 (row) actions\n", + "print(state.legal_actions(1)) # Player 1 (column) actions" + ] + }, + { + "cell_type": "markdown", + "id": "fdea7e5b", + "metadata": {}, + "source": [ + "Since Rock-paper-scissors is a 1-step simultaneous-move normal form game, we'll apply a list of player actions in one step to reach the terminal state.\n", + "\n", + "Let's simulate player 0 playing Rock (0) and player 1 
playing Paper (1):" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "a532321e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Terminal? true\n", + "History: 0, 1\n", + "Returns: -1,1\n", + "Row actions: \n", + "Col actions: \n", + "Utility matrix:\n", + "0,0 -1,1 1,-1 \n", + "1,-1 0,0 -1,1 \n", + "-1,1 1,-1 0,0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state.apply_actions([0, 1])\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "045cf8dd", + "metadata": {}, + "source": [ + "OpenSpiel can generate an NFG representation of the game loadable in Gambit:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "f5fa4e42", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'NFG 1 R \"OpenSpiel export of matrix_rps()\"\\n{ \"Player 0\" \"Player 1\" } { 3 3 }\\n\\n0 0\\n1 -1\\n-1 1\\n-1 1\\n0 0\\n1 -1\\n1 -1\\n-1 1\\n0 0\\n'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nfg_matrix_rps_game = pyspiel.game_to_nfg_string(ops_matrix_rps_game)\n", + "nfg_matrix_rps_game" + ] + }, + { + "cell_type": "markdown", + "id": "70d1df64", + "metadata": {}, + "source": [ + "Now let's load the NFG in Gambit. Since Gambit's `read_nfg` function expects a file-like object, we'll wrap the string in `io.StringIO`.\n", + "We can also add labels for the actions to make the output more interpretable." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "b684325e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "

Rock-Paper-Scissors

\n", + "
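<table><tr><td></td><td>Rock</td><td>Paper</td><td>Scissors</td></tr><tr><td>Rock</td><td>0,0</td><td>-1,1</td><td>1,-1</td></tr><tr><td>Paper</td><td>1,-1</td><td>0,0</td><td>-1,1</td></tr><tr><td>Scissors</td><td>-1,1</td><td>1,-1</td><td>0,0</td></tr></table>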
\n" + ], + "text/plain": [ + "Game(title='Rock-Paper-Scissors')" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt_matrix_rps_game = gbt.read_nfg(StringIO(nfg_matrix_rps_game))\n", + "\n", + "gbt_matrix_rps_game.title = \"Rock-Paper-Scissors\"\n", + "\n", + "for player in gbt_matrix_rps_game.players:\n", + " player.strategies[0].label = \"Rock\"\n", + " player.strategies[1].label = \"Paper\"\n", + " player.strategies[2].label = \"Scissors\"\n", + "\n", + "gbt_matrix_rps_game" + ] + }, + { + "cell_type": "markdown", + "id": "6d7da6f3", + "metadata": {}, + "source": [ + "The equilibrium mixed strategy profile for both players is to choose rock, paper, and scissors with equal probability:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "707c6c30", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right],\\left[\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(1, 3), Rational(1, 3), Rational(1, 3)], [Rational(1, 3), Rational(1, 3), Rational(1, 3)]]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.lcp_solve(gbt_matrix_rps_game).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "966e7e3f", + "metadata": {}, + "source": [ + "We can use OpenSpiel's dynamics module to demonstrate evolutionary game theory dynamics, or \"replicator dynamics\", which models how a mixed strategy profile evolves over time based on how the strategies (e.g., choice of actions A, B, C with probabilities X, Y, Z) perform against one another.\n", + "\n", + "Let's start with an initial profile that is not at equilibrium, but weighted towards scissors with proportions: 30% Rock, 30% Paper, 40% Scissors:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cf1acdeb", + "metadata": {}, + "outputs": [ + 
{ + "data": { + "text/plain": [ + "array([ 0.03, -0.03, 0. ])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "matrix_rps_payoffs = game_payoffs_array(ops_matrix_rps_game)\n", + "dyn = dynamics.SinglePopulationDynamics(matrix_rps_payoffs, dynamics.replicator)\n", + "x = np.array([0.3, 0.3, 0.4])\n", + "dyn(x)" + ] + }, + { + "cell_type": "markdown", + "id": "fa382753", + "metadata": {}, + "source": [ + "`dyn(x)` calculates the rate of change (derivative) for each strategy in the current profile and returns how fast each strategy's frequency is changing.\n", + "\n", + "In replicator dynamics, a strategy that performs well against others will increase in frequency, while strategies performing worse will decrease.\n", + "In our rock-paper-scissors example, the performance of each strategy depends on the probabilities assigned to the strategies in the mixed strategy profile. At the start, because more players choose scissors, rock will perform well and increase in frequency (be more likely to get played in subsequent rounds), while paper will perform poorly and decrease in frequency. We can plot how the frequency of each strategy changes over time:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b9a352c5", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "def plot_rps_dynamics(proportions, steps=100, alpha=0.1, plot_average_strategy=False):\n", + " x = np.array(proportions)\n", + " rock_proportions = [x[0]]\n", + " paper_proportions = [x[1]]\n", + " scissors_proportions = [x[2]]\n", + " y = []\n", + " for _ in range(steps):\n", + " x += alpha * dyn(x)\n", + " rock_proportions.append(x[0])\n", + " paper_proportions.append(x[1])\n", + " scissors_proportions.append(x[2])\n", + " if plot_average_strategy:\n", + " y.append([np.mean(rock_proportions), np.mean(paper_proportions), np.mean(scissors_proportions)])\n", + " else:\n", + " y.append(x.copy())\n", + " y = np.array(y)\n", + "\n", + " plt.plot(y[:, 0], label=\"Rock\")\n", + " plt.plot(y[:, 1], label=\"Paper\")\n", + " plt.plot(y[:, 2], label=\"Scissors\")\n", + " plt.xlabel(\"Time step\")\n", + " if plot_average_strategy:\n", + " plt.ylabel(\"Strategy frequency average up to time step\")\n", + " else:\n", + " plt.ylabel(\"Strategy frequency\")\n", + " plt.legend()\n", + " plt.show()\n", + "\n", + "plot_rps_dynamics([0.3, 0.3, 0.4])" + ] + }, + { + "cell_type": "markdown", + "id": "8569aef4", + "metadata": {}, + "source": [ + "Through the dynamics, we can see that the population proportions oscillate around the equilibrium point (1/3, 1/3, 1/3) without converging to it, because the best strategy depends on the likelihood of the opponents' actions, as defined by the current action probabilities.\n", + "\n", + "However, if we start with the initial population already at the equilibrium mixed strategy profile computed by Gambit (each action is chosen exactly 1/3 of the time), the strategy frequencies will remain constant over time (at the equilibrium point):" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "86c6aa52", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_rps_dynamics([1/3, 1/3, 1/3])" + ] + }, + { + "cell_type": "markdown", + "id": "a1f6662e", + "metadata": {}, + "source": [ + "When starting from an unbalanced initial mixed strategy profile, the strategy frequencies will oscillate around the equilibrium point without converging to it. However, if we plot the average strategy frequencies over time, we can see that this begins to converge to the equilibrium point:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "189f898f", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot_rps_dynamics([0.3, 0.3, 0.4], plot_average_strategy=True)" + ] + }, + { + "cell_type": "markdown", + "id": "078a21e0", + "metadata": {}, + "source": [ + "## Normal form games created with Gambit\n", + "\n", + "You can also set up a normal form game in Gambit and export it to OpenSpiel. Here we demonstrate this with the Prisoner's Dilemma game from tutorial 1." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "cdd0bfe0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "

Prisoner's Dilemma

\n", + "
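<table><tr><td></td><td>Cooperate</td><td>Defect</td></tr><tr><td>Cooperate</td><td>-1,-1</td><td>-3,0</td></tr><tr><td>Defect</td><td>0,-3</td><td>-2,-2</td></tr></table>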
\n" + ], + "text/plain": [ + "Game(title='Prisoner's Dilemma')" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt_prisoners_dilemma_game = gbt.read_nfg(\"games/prisoners_dilemma.nfg\")\n", + "gbt_prisoners_dilemma_game" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "d42e6545", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[0,1\\right],\\left[0,1\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1)]]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gbt.nash.lcp_solve(gbt_prisoners_dilemma_game).equilibria[0]" + ] + }, + { + "cell_type": "markdown", + "id": "15dd432d", + "metadata": {}, + "source": [ + "As expected, Gambit computes the equilibrium strategy for both players as choosing cooperate with probability 0 and defect with probability 1.\n", + "\n", + "To re-create the game in OpenSpiel we extract the player payoffs to NumPy arrays, which are then used to create a matrix game in OpenSpiel:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "fcd42af0", + "metadata": {}, + "outputs": [], + "source": [ + "p1_payoffs, p2_payoffs = gbt_prisoners_dilemma_game.to_arrays(dtype=float)\n", + "ops_prisoners_dilemma_game = pyspiel.create_matrix_game(\n", + " gbt_prisoners_dilemma_game.title,\n", + " \"Classic Prisoner's Dilemma\", # description\n", + " [strategy.label for strategy in gbt_prisoners_dilemma_game.players[0].strategies],\n", + " [strategy.label for strategy in gbt_prisoners_dilemma_game.players[1].strategies],\n", + " p1_payoffs,\n", + " p2_payoffs\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "625a35a4", + "metadata": {}, + "source": [ + "Like rock-paper-scissors, the Prisoner's Dilemma is a 1-step simultaneous-move normal form game; we'll apply a list of player actions in one step to 
reach the terminal state. Let's have both players choose to defect (1):" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "7ce6f2e2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Terminal? true\n", + "History: 1, 1\n", + "Returns: -2,-2\n", + "Row actions: \n", + "Col actions: \n", + "Utility matrix:\n", + "-1,-1 -3,0 \n", + "0,-3 -2,-2 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state = ops_prisoners_dilemma_game.new_initial_state()\n", + "state.apply_actions([1, 1])\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "1fea0224", + "metadata": {}, + "source": [ + "Unlike in rock-paper-scissors, the Prisoner's Dilemma has a dominant strategy equilibrium, in which both players defect.\n", + "Using evolutionary dynamics, we can see that a population starting with a mix of cooperators and defectors will evolve towards all defectors over time:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "d1495c7c", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "matrix_pd_payoffs = game_payoffs_array(ops_prisoners_dilemma_game)\n", + "pd_dyn = dynamics.SinglePopulationDynamics(matrix_pd_payoffs, dynamics.replicator)\n", + "\n", + "def plot_pd_dynamics(proportions, steps=100, alpha=0.1):\n", + " x = np.array(proportions)\n", + " y = []\n", + " for _ in range(steps):\n", + " x += alpha * pd_dyn(x)\n", + " y.append(x.copy())\n", + " y = np.array(y)\n", + " plt.plot(y[:, 0], label=\"Cooperate\")\n", + " plt.plot(y[:, 1], label=\"Defect\")\n", + " plt.xlabel(\"Time step\")\n", + " plt.ylabel(\"Frequency\")\n", + " plt.legend()\n", + " plt.show()\n", + "\n", + "plot_pd_dynamics([0.8, 0.2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b12f6330", + "metadata": {}, + "source": [ + "## Extensive form games from the OpenSpiel library\n", + "\n", + "For extensive form games, OpenSpiel can export to the EFG format used by Gambit. Here we demonstrate this with **Tiny Hanabi**, loaded from the OpenSpiel [game library](https://openspiel.readthedocs.io/en/latest/games.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "02a42600", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'EFG 2 R \"tiny_hanabi()\" { \"Pl0\" \"Pl1\" } \\nc \"\" 1 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n c \"p0:d0\" 2 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n p \"\" 1 1 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 1 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 1 \"\" { 10.0 10.0 }\\n t \"\" 2 \"\" { 0.0 0.0 }\\n t \"\" 3 \"\" { 0.0 0.0 }\\n p \"\" 2 2 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 4 \"\" { 4.0 4.0 }\\n t \"\" 5 \"\" { 8.0 8.0 }\\n t \"\" 6 \"\" { 4.0 4.0 }\\n p \"\" 2 3 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 7 \"\" { 10.0 10.0 }\\n t \"\" 8 \"\" { 0.0 0.0 }\\n t \"\" 9 \"\" { 0.0 0.0 }\\n p \"\" 1 1 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 4 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 10 \"\" { 0.0 0.0 }\\n t \"\" 11 \"\" { 0.0 0.0 }\\n t \"\" 12 \"\" { 10.0 10.0 }\\n p \"\" 2 5 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 13 \"\" { 4.0 4.0 }\\n t \"\" 14 \"\" { 8.0 8.0 }\\n t \"\" 15 \"\" { 4.0 4.0 }\\n p \"\" 2 6 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 16 \"\" { 0.0 0.0 }\\n t \"\" 17 \"\" { 0.0 0.0 }\\n t \"\" 18 \"\" { 10.0 10.0 }\\n c \"p0:d1\" 3 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n p \"\" 1 2 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 1 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 19 \"\" { 0.0 0.0 }\\n t \"\" 20 \"\" { 0.0 0.0 }\\n t \"\" 21 \"\" { 10.0 10.0 }\\n p \"\" 2 2 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 22 \"\" { 4.0 4.0 }\\n t \"\" 23 \"\" { 8.0 8.0 }\\n t \"\" 24 \"\" { 4.0 4.0 }\\n p \"\" 2 3 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 25 \"\" { 0.0 0.0 }\\n t \"\" 26 \"\" { 0.0 0.0 }\\n t \"\" 27 \"\" { 0.0 0.0 }\\n p \"\" 1 2 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 4 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 28 \"\" { 10.0 
10.0 }\\n t \"\" 29 \"\" { 0.0 0.0 }\\n t \"\" 30 \"\" { 0.0 0.0 }\\n p \"\" 2 5 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 31 \"\" { 4.0 4.0 }\\n t \"\" 32 \"\" { 8.0 8.0 }\\n t \"\" 33 \"\" { 4.0 4.0 }\\n p \"\" 2 6 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 34 \"\" { 10.0 10.0 }\\n t \"\" 35 \"\" { 0.0 0.0 }\\n t \"\" 36 \"\" { 0.0 0.0 }\\n'" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ops_hanabi_game = pyspiel.load_game(\"tiny_hanabi\")\n", + "efg_hanabi_game = export_gambit(ops_hanabi_game)\n", + "efg_hanabi_game" + ] + }, + { + "cell_type": "markdown", + "id": "fa354c9f", + "metadata": {}, + "source": [ + "Now let's load the EFG in Gambit.\n", + "We can then compute equilibria strategies for the players as usual." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "1a534e25", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pl0\n", + "Pl1\n" + ] + } + ], + "source": [ + "gbt_hanabi_game = gbt.read_efg(StringIO(efg_hanabi_game))\n", + "eqm = gbt.nash.lcp_solve(gbt_hanabi_game).equilibria[0]\n", + "for player in gbt_hanabi_game.players:\n", + " print(player.label)" + ] + }, + { + "cell_type": "markdown", + "id": "cdfe924e", + "metadata": {}, + "source": [ + "We can look at player 0's equilibrium strategy:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "1ec19b1c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[0,0,1\\right],\\left[0,1,0\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)]]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm['Pl0']" + ] + }, + { + "cell_type": "markdown", + "id": "b54411c0", + "metadata": {}, + "source": [ + "...and use Gambit to explore what those numbers actually mean for 
player 0:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "ae9fc7a7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "At information set 0, Player 0 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n", + "At information set 1, Player 0 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n" + ] + } + ], + "source": [ + "for infoset, mixed_action in eqm[\"Pl0\"].mixed_actions():\n", + " print(\n", + " f\"At information set {infoset.number}, \"\n", + " f\"Player 0 plays action 0 with probability: {mixed_action['p0a0']}\"\n", + " f\" and action 1 with probability: {mixed_action['p0a1']}\"\n", + " f\" and action 2 with probability: {mixed_action['p0a2']}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "eac73a24", + "metadata": {}, + "source": [ + "For player 1, we can do the same:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "8528e1bd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/latex": [ + "$\\left[\\left[0,0,1\\right],\\left[0,1,0\\right],\\left[1,0,0\\right],\\left[0,0,1\\right],\\left[0,1,0\\right],\\left[0,0,1\\right]\\right]$" + ], + "text/plain": [ + "[[Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1), Rational(0, 1)], [Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)], [Rational(0, 1), Rational(0, 1), Rational(1, 1)]]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "eqm['Pl1']" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "2965aed0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "At information set 0, Player 1 plays action 0 with probability: 0 and action 1 with probability: 
0 and action 2 with probability: 1\n", + "At information set 1, Player 1 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n", + "At information set 2, Player 1 plays action 0 with probability: 1 and action 1 with probability: 0 and action 2 with probability: 0\n", + "At information set 3, Player 1 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n", + "At information set 4, Player 1 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n", + "At information set 5, Player 1 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n" + ] + } + ], + "source": [ + "for infoset, mixed_action in eqm[\"Pl1\"].mixed_actions():\n", + " print(\n", + " f\"At information set {infoset.number}, \"\n", + " f\"Player 1 plays action 0 with probability: {mixed_action['p1a0']}\"\n", + " f\" and action 1 with probability: {mixed_action['p1a1']}\"\n", + " f\" and action 2 with probability: {mixed_action['p1a2']}\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "d628c0d5", + "metadata": {}, + "source": [ + "Let's now train 2 agents using independent Q-learning on Tiny Hanabi, and play them against each other.\n", + "\n", + "We can compare the learned strategies played to the equilibrium strategies computed by Gambit.\n", + "\n", + "First let's open the RL environment for Tiny Hanabi and create the agents, one for each player (2 players in this case):" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4e72c924", + "metadata": {}, + "outputs": [], + "source": [ + "# Create the environment\n", + "env = rl_environment.Environment(\"tiny_hanabi\")\n", + "num_players = env.num_players\n", + "num_actions = env.action_spec()[\"num_actions\"]\n", + "\n", + "# Create the agents\n", + "agents = [\n", + " tabular_qlearner.QLearner(player_id=idx, num_actions=num_actions)\n", + 
" for idx in range(num_players)\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "4bf9eea4", + "metadata": {}, + "source": [ + "Now we can train the Q-learning agents in self-play." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "53547263", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Episodes: 0\n", + "Episodes: 10000\n", + "Episodes: 20000\n", + "Episodes: 30000\n" + ] + } + ], + "source": [ + "for cur_episode in range(30000):\n", + " if cur_episode % 10000 == 0:\n", + " print(f\"Episodes: {cur_episode}\")\n", + "\n", + " time_step = env.reset()\n", + " while not time_step.last():\n", + " player_id = time_step.observations[\"current_player\"]\n", + " agent_output = agents[player_id].step(time_step)\n", + " time_step = env.step([agent_output.action])\n", + "\n", + " # Episode is over, step all agents with final info state.\n", + " for agent in agents:\n", + " agent.step(time_step)\n", + "\n", + "print(f\"Episodes: {cur_episode+1}\")" + ] + }, + { + "cell_type": "markdown", + "id": "75cddd36", + "metadata": {}, + "source": [ + "Let's check out the strategies our agents have learned by playing them against eachother again, this time in evaluation mode (setting `is_evaluation=True`):" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "d71bc733", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "p0:d0 p1:d1\n", + "Agent 0 chooses p0a2\n", + "\n", + "p0:d0 p1:d1 p0:a2\n", + "Agent 1 chooses p1a2\n", + "\n", + "p0:d0 p1:d1 p0:a2 p1:a2\n", + "Rewards: [10.0, 10.0]\n" + ] + } + ], + "source": [ + "time_step = env.reset()\n", + "\n", + "while not time_step.last():\n", + " print(\"\")\n", + " print(env.get_state)\n", + "\n", + " player_id = time_step.observations[\"current_player\"]\n", + " agent_output = agents[player_id].step(time_step, is_evaluation=True)\n", + " print(f\"Agent {player_id} chooses 
{env.get_state.action_to_string(agent_output.action)}\")\n", + " time_step = env.step([agent_output.action])\n", + "\n", + "print(\"\")\n", + "print(env.get_state)\n", + "print(f\"Rewards: {time_step.rewards}\")" + ] + }, + { + "cell_type": "markdown", + "id": "f1e9b174", + "metadata": {}, + "source": [ + "Are the learned strategies chosen by p0 and p1 consistent with an equilibrium computed by Gambit?\n", + "\n", + "When I ran the above I got the final game state `p0:d0 p1:d0 p0:a2 p1:a0` with payoffs `[10.0, 10.0]`. This is consistent with the equilibrium computed by Gambit:\n", + "- The node `p0:d0 p1:d0` is part of player 0's information set 0.\n", + "- p0 picks a2 which matches the first equilibrium strategy in `eqm['Pl0']` where action `p0a2` is played with probability 1.0.\n", + "- This puts player 1 in their information set 2, and player 1 picks action 0, which is consistent with `eqm['Pl1']` where action `p1a0` is played with probability 1.0." + ] + }, + { + "cell_type": "markdown", + "id": "6f356383", + "metadata": {}, + "source": [ + "## Extensive form games created with Gambit\n", + "\n", + "It's also possible to create an extensive form game in Gambit and export it to OpenSpiel. Here we demonstrate this with the one-card poker game introduced in tutorial 3." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "07340e32", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "efg_game()" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "with open(\"../poker.efg\", \"r\") as f:\n", + " poker_efg_string = f.read()\n", + " ops_one_card_poker = pyspiel.load_efg_game(poker_efg_string)\n", + "ops_one_card_poker" + ] + }, + { + "cell_type": "markdown", + "id": "ef6939f6", + "metadata": {}, + "source": [ + "Games loaded from EFG in OpenSpiel do not take advantage of the full functionality of the package. For example, it is not possible to carry out training with RL algorithms on these games, as we did above with Tiny Hanabi. The OpenSpiel documentation explains [how to submit new games to the library](https://openspiel.readthedocs.io/en/latest/developer_guide.html#adding-a-game) if you wish to add your own games.\n", + "\n", + "We can, however, use the state representation and play through the game step by step:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "c01c4d6f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ops_one_card_poker.num_distinct_actions()" + ] + }, + { + "cell_type": "markdown", + "id": "9986860c", + "metadata": {}, + "source": [ + "The one-card poker game has 4 distinct actions: 2 for the first player (Alice in the example game), \"Raise\" and \"Fold\", and 2 for the second player (Bob), \"Meet\" and \"Pass\".\n", + "\n", + "Initialising the game state, we can see that the current player at the start is the chance player, who deals the cards:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "3b9cc43b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0: Chance: 1 King 0.5 Queen 0.5" + ] + }, + "execution_count": 
30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state = ops_one_card_poker.new_initial_state()\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "7b0959f9", + "metadata": {}, + "source": [ + "Let's have the chance player deal a King (action 0):" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "4dd5d504", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1: Player: 1 1 Raise Fold" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state.apply_action(0)\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "b4291f07", + "metadata": {}, + "source": [ + "As expected, it's now the first player's (Alice's) turn.\n", + "Let's have Alice choose to \"Raise\" (action 0):" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "bd15369f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3: Player: 2 1 Meet Pass" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state.apply_action(0)\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "cd63f7d7", + "metadata": {}, + "source": [ + "As expected, the current player is now player 2 (Bob). Let's check the legal actions available to Bob:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "8d81ff6b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[2, 3]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state.legal_actions()" + ] + }, + { + "cell_type": "markdown", + "id": "fdb5194f", + "metadata": {}, + "source": [ + "Whereas player 1 (Alice) had the option to \"Raise\" (action 0) or \"Fold\" (action 1), player 2 (Bob) now has the option to \"Meet\" (action 2) or \"Pass\" (action 3).\n", + "Let's have Bob choose to \"Pass\":" + ] + }, + { + "cell_type": 
"code", + "execution_count": 34, + "id": "97913fe5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6: Terminal: Alice wins 1 -1" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "state.apply_action(3)\n", + "state" + ] + }, + { + "cell_type": "markdown", + "id": "1bf09576", + "metadata": {}, + "source": [ + "Since Bob passed, Alice takes the small win and we reach a terminal state." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "gbt_pygraphviz", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/tutorials/games/prisoners_dilemma.nfg b/doc/tutorials/games/prisoners_dilemma.nfg new file mode 100644 index 000000000..a551362f6 --- /dev/null +++ b/doc/tutorials/games/prisoners_dilemma.nfg @@ -0,0 +1,14 @@ +NFG 1 R "Prisoner's Dilemma" { "Tom" "Jerry" } + +{ { "Cooperate" "Defect" } +{ "Cooperate" "Defect" } +} +"" + +{ +{ "" -1, -1 } +{ "" 0, -3 } +{ "" -3, 0 } +{ "" -2, -2 } +} +1 2 3 4 diff --git a/doc/tutorials/games/trust_game.efg b/doc/tutorials/games/trust_game.efg new file mode 100644 index 000000000..5b85cac9d --- /dev/null +++ b/doc/tutorials/games/trust_game.efg @@ -0,0 +1,8 @@ +EFG 2 R "One-shot trust game, after Kreps (1990)" { "Buyer" "Seller" } +"" + +p "" 1 1 "" { "Trust" "Not trust" } 0 +p "Trust" 2 1 "" { "Honor" "Abuse" } 0 +t "Honor" 1 "Trustworthy" { 1, 1 } +t "Abuse" 2 "Untrustworthy" { -1, 2 } +t "Not trust" 3 "Opt-out" { 0, 0 } diff --git a/doc/tutorials/running_locally.rst b/doc/tutorials/running_locally.rst new file mode 100644 index 000000000..f17eae6e2 --- /dev/null +++ b/doc/tutorials/running_locally.rst @@ -0,0 +1,22 @@ +.. 
_local_tutorials: + +How to run PyGambit tutorials on your computer +============================================== + +The PyGambit tutorials are available as Jupyter notebooks and can be run interactively using any program that supports Jupyter notebooks, such as JupyterLab or VS Code. +You will need a working installation of Python 3 (tested with 3.9 and later) on your machine. + +1. To download the tutorials, open your OS's command prompt and clone the Gambit repository from GitHub, then navigate to the tutorials directory: :: + + git clone https://github.com/gambitproject/gambit.git + cd gambit/doc/tutorials + +2. Install `PyGambit` and `JupyterLab`. We recommend creating a new virtual environment and installing both packages there, for example: :: + + python -m venv pygambit-env + source pygambit-env/bin/activate + pip install pygambit jupyterlab + +3. Launch `JupyterLab` and open any of the tutorial notebooks (files ending in `.ipynb`): :: + + jupyter lab
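If a notebook fails to start, it can help to confirm that the active environment matches the requirements listed above. The following is a minimal sketch, not part of the Gambit tooling: the helper name ``check_environment`` is hypothetical, and it assumes only that the packages installed in step 2 are importable as ``pygambit`` and ``jupyterlab``. It uses only the Python standard library.

```python
import importlib.util
import sys

# Assumption: the import names match the pip package names used in step 2.
REQUIRED_PACKAGES = ("pygambit", "jupyterlab")
# The tutorials are stated to be tested with Python 3.9 and later.
MIN_PYTHON = (3, 9)


def check_environment(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or later is required")
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Package '{name}' is not installed in this environment")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script inside the activated virtual environment prints one line per missing requirement and nothing when no problem is detected.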