A UCI-based Chess Game Analyser

David J. Barnes and Julio Hernandez-Castro

Introduction

This page hosts the C++ and Java source code for a UCI-based analyser for chess games written in PGN notation. The programs were written as part of our research for the paper "On the limits of engine analysis for cheating detection in chess", Computers & Security 48:58-73, available at: http://dx.doi.org/10.1016/j.cose.2014.10.002

The source is made available under the terms of The GNU General Public License.

Abstract

The integrity of online games has important economic consequences for both the gaming industry and players of all levels, from professionals to amateurs. Where there is a high likelihood of cheating, there is a loss of trust and players will be reluctant to participate --- particularly if this is likely to cost them money.
Chess is one game that has been established online for around 25 years and is played over the Internet commercially. In that environment, where players are not physically present over the board (OTB), chess is one of the most easily exploitable games by those who wish to cheat, because of the widespread availability of very strong chess-playing programs. Allegations of cheating even in OTB games have increased significantly in recent years, and even led to recent changes in the laws of the game that potentially impinge upon players' privacy.
In this work, we examine some of the difficulties inherent in identifying the covert use of chess-playing programs purely from an analysis of the moves of a game. Our approach is to deeply examine a large collection of games where there is a very high degree of confidence that cheating has not taken place, and analyse those that could be easily misclassified.
We conclude that there is a serious risk of finding numerous false positives and that, in general, it is unsafe to use just the moves of a single game as prima facie evidence of cheating. We also demonstrate that it is impossible to compute definitive values of the figures currently employed to measure similarity to a chess-engine for a particular game, as values inevitably vary at different depths and, even under identical conditions, when multi-threading evaluation is used.

Purpose

The purpose of the analyser is to read a source file containing the moves of one or more chess games (properly encoded; see below) and pass them to a UCI-compatible chess engine for evaluation. The analyser receives back the engine's evaluations and writes them out in an XML format for processing by another program, such as the Java one provided here. The analyser's program arguments are used to configure the engine's search depth, number of candidate moves, etc. Note that the code does not include a UCI engine; it only provides an interface that allows games to be analysed via such an engine. A suitable free engine is Stockfish.

This page also hosts a separate Java XML processor to process the output from the analyser.

Installation

A Makefile is provided for installation of the analyser on Unix/Linux/cygwin environments that have a C++ compiler.

Usage

The analyser takes 0 or more command-line options (see below) and 0 or more game files. If no game files are provided then it reads games from standard input:

analyse [optional-command-line-options] game-file ...

The analyser requires as input games written with PGN headers. However, UCI engines expect the moves of a game to be formatted in long-algebraic notation, so games should be provided to the analyser in the form shown below under Input Format.

The easiest way to convert a game into this format is to use a tool such as pgn-extract. Its -Wuci flag will output the moves of a game in a format suitable for the analyser to pass on to a UCI engine. For instance:

pgn-extract -Wuci -oout.pgn games.pgn

outputs the PGN games from games.pgn to the file out.pgn, which would then be passed to the analyser.

Command-line Options

The UCI engine to use should be specified via --engine with either the full pathname of the engine program, or a name that will be found in your environment's executable search path. The default name is "stockfish".

--annotate
output the games with evaluation annotations
--annotatePGN
output the games in PGN format with evaluation annotations
--blackonly
only analyse black's moves
--bookdepth depth
depth in ply to skip at start of game
--searchdepth depth
search depth in ply
--engine program
program to use as the UCI engine
--help
show this usage message
--setoption optionName optionValue
set a UCI option
--variations vars
number of variations to analyse per move
--whiteonly
only analyse white's moves
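To illustrate how options such as --searchdepth, --variations and --setoption might translate into a UCI dialogue, here is a minimal Python sketch. It is not taken from the analyser's source: the function name `uci_commands` is hypothetical, and the use of the `MultiPV` option to request several variations is an assumption based on how engines such as Stockfish commonly expose that feature. The command strings themselves (`uci`, `setoption`, `isready`, `ucinewgame`, `position`, `go depth`) are standard UCI commands.

```python
def uci_commands(moves, search_depth, variations, options=None):
    """Build a plausible UCI command sequence for evaluating the position
    reached after `moves` (a list of long-algebraic moves such as "d2d4").

    Hypothetical helper: the real analyser may order or phrase things differently.
    """
    commands = ["uci"]
    # Extra engine options, as might be given via --setoption.
    for name, value in (options or {}).items():
        commands.append(f"setoption name {name} value {value}")
    # Assumption: the number of variations maps onto the MultiPV option.
    commands.append(f"setoption name MultiPV value {variations}")
    commands.append("isready")
    commands.append("ucinewgame")
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    commands.append(position)
    commands.append(f"go depth {search_depth}")
    return commands


if __name__ == "__main__":
    for command in uci_commands(["d2d4", "g8f6"], search_depth=12, variations=5):
        print(command)
```

Each command would be written to the engine's standard input, one per line, with the engine's `info` and `bestmove` replies read back from its standard output.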



Input Format
A game in the format expected by the analyser. Note that the moves have been
(re)written into long-algebraic form using pgn-extract.

[Event "World Championship 23th"]
[Site "Moscow"]
[Date "1960.03.17"]
[Round "2"]
[White "Botvinnik, Mikhail"]
[Black "Tal, Mihail"]
[Result "1/2-1/2"]
[BookDepth "17"]

d2d4 g8f6 c2c4 c7c5 d4d5 e7e6 b1c3 e6d5 c4d5 d7d6 g1f3 g7g6 c1g5 f8g7 f3d2 h7h6
g5h4 g6g5 h4g3 f6h5 d2c4 h5g3 h2g3 e8g8 e2e3 d8e7 f1e2 f8d8 e1g1 b8d7 a2a4 d7e5
c4e5 e7e5 a4a5 a8b8 a1a2 c8d7 c3b5 d7b5 e2b5 b7b6 a5a6 b8c8 d1d3 c8c7 b2b3 e5c3
d3c3 g7c3 a2c2 c3f6 g3g4 c7e7 c2c4 d8c8 g2g3 f6g7 f1d1 c8f8 d1d3 g8h7 g1g2 h7g6
d3d1 h6h5 g4h5 g6h5 g3g4 h5g6 c4c2 f8h8 b5d3 g6f6 g2g3 e7e8 d3b5 e8e4 c2c4 e4c4
b3c4 f6e7 b5a4 g7e5 g3f3 h8h4 d1g1 f7f5 1/2-1/2
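As a rough illustration of this input format, the following Python sketch splits a game into its PGN tags, its long-algebraic moves and its result. It is only a sketch under stated assumptions: `parse_game` is a hypothetical helper, not part of the analyser, and it assumes one tag per line and whitespace-separated moves with no comments or variations.

```python
import re

# One PGN tag pair per line, e.g. [BookDepth "17"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_game(text):
    """Return (tags, moves, result) for a single game in the input format."""
    tags, moves, result = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
            continue
        # Anything else is treated as move text.
        for token in line.split():
            if token in RESULTS:
                result = token
            else:
                moves.append(token)
    return tags, moves, result


if __name__ == "__main__":
    sample = '[White "Botvinnik, Mikhail"]\n[BookDepth "17"]\n\nd2d4 g8f6 c2c4 c7c5\nd4d5 e7e6 1/2-1/2\n'
    print(parse_game(sample))
```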


Output Format
The analysis of each game is output in XML format (except when
--annotatePGN is used). The output includes the game details:

<game>
<tags>
<tag name = "Event" value = "World Championship 23th" />
<tag name = "Site" value = "Moscow" />
<tag name = "Date" value = "1960.03.17" />
<tag name = "Round" value = "2" />
<tag name = "White" value = "Botvinnik, Mikhail" />
<tag name = "Black" value = "Tal, Mihail" />
<tag name = "Result" value = "1/2-1/2" />
<tag name = "BookDepth" value = "17" />
</tags>
<moves>
d2d4 g8f6 c2c4 c7c5 d4d5 e7e6 b1c3 e6d5 c4d5 d7d6 g1f3 g7g6 c1g5 f8g7 f3d2 h7h6
g5h4 g6g5 h4g3 f6h5 d2c4 h5g3 h2g3 e8g8 e2e3 d8e7 f1e2 f8d8 e1g1 b8d7 a2a4 d7e5
c4e5 e7e5 a4a5 a8b8 a1a2 c8d7 c3b5 d7b5 e2b5 b7b6 a5a6 b8c8 d1d3 c8c7 b2b3 e5c3
d3c3 g7c3 a2c2 c3f6 g3g4 c7e7 c2c4 d8c8 g2g3 f6g7 f1d1 c8f8 d1d3 g8h7 g1g2 h7g6
d3d1 h6h5 g4h5 g6h5 g3g4 h5g6 c4c2 f8h8 b5d3 g6f6 g2g3 e7e8 d3b5 e8e4 c2c4 e4c4
b3c4 f6e7 b5a4 g7e5 g3f3 h8h4 d1g1 f7f5 1/2-1/2
</moves>
...
</game>


This is followed by the analysis, which includes the settings of the engine:

<analysis engine = "..." bookDepth = "17" searchDepth = "10" variations = "5" > ... </analysis>


The evaluation of each played move and the required number of alternatives follow:

<move player = "black" >
<played>g6g5</played>
<evaluation move = "e8g8" value = "-88" />
<evaluation move = "d8e7" value = "-109" />
<evaluation move = "a7a6" value = "-121" />
<evaluation move = "b8d7" value = "-121" />
<evaluation move = "b8a6" value = "-133" />
<evaluation move = "g6g5" value = "-133" />
</move>


Evaluations are typically in centipawns.
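A `<move>` element like the one above can be read with any XML parser. The Python sketch below uses the standard library's `xml.etree.ElementTree` to find the engine's preferred move and the evaluation gap for the played move. The helper name `summarise_move` is hypothetical, and two assumptions are made: values are plain integers (mate scores are not handled), and a higher value is better for the player to move, which is consistent with the ordering in the example.

```python
import xml.etree.ElementTree as ET

MOVE_XML = """
<move player = "black" >
<played>g6g5</played>
<evaluation move = "e8g8" value = "-88" />
<evaluation move = "d8e7" value = "-109" />
<evaluation move = "a7a6" value = "-121" />
<evaluation move = "b8d7" value = "-121" />
<evaluation move = "b8a6" value = "-133" />
<evaluation move = "g6g5" value = "-133" />
</move>
"""


def summarise_move(xml_text):
    """Summarise one <move> element: played move, best move and their gap."""
    move = ET.fromstring(xml_text)
    played = move.findtext("played").strip()
    evaluations = [(e.get("move"), int(e.get("value")))
                   for e in move.findall("evaluation")]
    # Assumption: the highest value is the engine's preferred move.
    best_move, best_value = max(evaluations, key=lambda item: item[1])
    played_value = dict(evaluations).get(played)
    if played_value is None:
        raise ValueError(f"no evaluation found for played move {played}")
    return {
        "player": move.get("player"),
        "played": played,
        "played_value": played_value,
        "best_move": best_move,
        "best_value": best_value,
        # Sign convention (played minus best) is an assumption.
        "difference": played_value - best_value,
    }


if __name__ == "__main__":
    print(summarise_move(MOVE_XML))
```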

Annotated output in PGN format
The argument --annotatePGN allows the engine's analysis
to be output in PGN format rather than XML. Each move is followed by
its evaluation in a comment and, if the engine considers there is a better move,
the alternative in a variation with its evaluation.
A possible series of steps to analyse the games in games.pgn might be:
# Turn the original games into UCI-compatible format.
pgn-extract -Wuci --output games-uci.pgn games.pgn
# Analyse the games with stockfish to search depth 12 and
# output the analysis in PGN format.
analyse --engine stockfish --searchdepth 12 --annotatePGN games-uci.pgn > games-annotated.pgn 
# Format the output nicely.
pgn-extract --output annotated.pgn games-annotated.pgn

Since both pgn-extract and analyse can be used in
a pipeline, the intermediate files may be omitted:
pgn-extract -Wuci games.pgn | \
    analyse --engine stockfish --searchdepth 12 --annotatePGN | \
    pgn-extract --output annotated.pgn


Processing the XML Output
The XML processor is designed
to read the XML output of the analyser and provide summary statistical information from it.

The program is provided as a JAR file, dataextract.jar, which can be executed,
once downloaded, as follows:

java -jar dataextract.jar


given a suitable installation of the Java Runtime Environment (JRE).

Usage
The processor takes 0 or more command-line options (see below) and 1 or more
XML files.

java -jar dataextract.jar [optional-command-line-options] xml-file ...


In essence, the program is used to provide summary statistics on the degree to
which the moves played in a chess game match those selected by a UCI-compatible
chess engine. The two key statistics are CV and AE. These are defined as follows:


Coincidence Value (CV) is a figure between 0 and 1 representing
the proportion of non-book moves chosen by a player with the same evaluation as
the engine's preferred move.

Average error (AE) is the mean difference in evaluation between the best move
and the played move for non-book moves, expressed in centipawns.
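The two definitions can be sketched in a few lines of Python. This is an illustration rather than the processor's actual code: `coincidence_and_error` is a hypothetical name, the input is assumed to be a list of (best value, played value) pairs for the non-book moves of one player, and the difference is taken as played minus best, a sign convention that is only an assumption.

```python
def coincidence_and_error(move_values):
    """Compute (CV, AE) from (best_value, played_value) pairs in centipawns.

    CV: proportion of moves whose evaluation equals that of the best move.
    AE: mean of (played - best); the sign convention is an assumption.
    """
    if not move_values:
        raise ValueError("at least one evaluated move is required")
    differences = [played - best for best, played in move_values]
    cv = sum(1 for d in differences if d == 0) / len(differences)
    ae = sum(differences) / len(differences)
    return cv, ae


if __name__ == "__main__":
    # Four illustrative (made-up) moves: two match the engine's best evaluation.
    pairs = [(-88, -133), (20, 20), (35, 15), (10, 10)]
    cv, ae = coincidence_and_error(pairs)
    print(f"CV = {cv:.2f}, AE = {ae:.2f}")
```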


For each game analysed, two lines are output (one for each player) in the following
format:

Date:Player:W/B:BD:EM:Depth:AE:sd:CV:


Fields are separated by a colon character. Each field has a fixed width, shown in parentheses:

Date (4): The year of the game.
Player (30): The player's name.
W/B (1): Either W or B depending upon which player colour the data refers to.
BD (3): The book depth. The number of ply considered book and, therefore, not analysed.
EM (3): The number of moves evaluated for this player in this game.
Depth (2): The UCI engine's search depth.
AE (8): The AE value for this player's moves in this game.
sd (6): The standard deviation of the score differences for this player in this game.
CV (5): The CV value for this player's moves in this game.


For instance:
2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22:14:  -19.27:  32.9: 0.41:


This line indicates a game played in 2002 where Pyshkin played black. The first 8
ply were treated as book moves and 22 further moves by Pyshkin were analysed to depth
14. The average centipawn difference in evaluation between the moves played by Pyshkin
and the engine's assessment of the best move was -19.27. The standard deviation was
32.9. 41% of the moves played by Pyshkin were considered equal in evaluation to the
best move at that point by the engine.  
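For further processing, a statistics line can be split back into its fields. The Python sketch below is one possible approach and not part of the processor: `parse_stats_line` and the field names are hypothetical, and it assumes that player names never contain a colon.

```python
FIELDS = ("date", "player", "colour", "book_depth",
          "evaluated_moves", "depth", "ae", "sd", "cv")


def parse_stats_line(line):
    """Split one colon-separated statistics line into a dictionary."""
    parts = line.rstrip("\n").split(":")
    # The line ends with a colon, so any trailing empty element is ignored.
    values = [part.strip() for part in parts[:len(FIELDS)]]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for key in ("book_depth", "evaluated_moves", "depth"):
        record[key] = int(record[key])
    for key in ("ae", "sd", "cv"):
        record[key] = float(record[key])
    return record


if __name__ == "__main__":
    example = "2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22:14:  -19.27:  32.9: 0.41:"
    print(parse_stats_line(example))
```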

Annotating a Game
The processor can be used to convert the analyser's XML output into annotated versions
of games via the --annotate flag. The alternative would be to use the
--annotatePGN
option to the analyser itself, so this option is really only necessary to
post-process XML output to avoid re-running the original analysis.
In this case, the game scores are output with centipawn evaluations of the moves and alternatives
in comments.
For example, using the analyser's output in output.xml
to create annotated PGN games in annotated.pgn:
java -jar dataextract.jar --annotate annotated.pgn output.xml


Command-line Options
--AEthreshold D - set the lower AE threshold for outputting details of games to D.
--CVthreshold D - set the lower CV threshold for outputting details of games to D (0-1.0).
--annotate filename - output the games with evaluation annotations.
--fullstats - output the difference values for each move.
--help - show the usage information.
--id id-string - output only games with the given ID (see below for ID).
--idfile filename - output only games with the IDs listed in filename.
--matching - write the PGN of the games that are output to the file matching.pgn.
--minlength N - only output games with a minimum of N evaluated moves.
--player name - only output games played by the given player. NB <White>, <Black> and
<WhiteOrBlack> will match any player playing white, black, or either colour, respectively.
--random probability - randomly select games to be output with the given probability (0-1.0).
--stats - output stats on the game to standard output (default).


ID strings
The --id option allows selection of a game with specific details. The --idfile option allows
multiple ID strings to be stored in a file to simplify command-line usage.

An ID consists of the first 5 fields of the output format:

Date:Player:W/B:BD:EM


where Date is 4 characters, Player is 30 characters, W/B is 1 character, BD is 3 characters
and EM is 3 characters.
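Because the ID is sensitive to padding, it can be convenient to build it programmatically. The Python sketch below is a hypothetical helper (`make_id` is not part of the processor). It uses the widths from the output-format field list and the example line (4, 30, 1, 3 and 3 characters), and assumes that the player name is left-justified, that the numeric fields are right-justified, and that over-long names are truncated to 30 characters.

```python
def make_id(year, player, colour, book_depth, evaluated_moves):
    """Build an ID string of the form Date:Player:W/B:BD:EM.

    Padding and truncation rules are assumptions inferred from the
    example output line, not taken from the processor's source.
    """
    if colour not in ("W", "B"):
        raise ValueError("colour must be 'W' or 'B'")
    return (f"{year:>4}:{player[:30]:<30}:{colour}:"
            f"{book_depth:>3}:{evaluated_moves:>3}")


if __name__ == "__main__":
    print(make_id(2002, "Pyshkin, Aleksandr Sergeevic", "B", 8, 22))
```

The resulting string can then be passed, quoted, as the argument to --id or written to a file for use with --idfile.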

Examples:
java -jar dataextract.jar --player "<WhiteOrBlack>" file.xml


will output the stats for all players in the games found in file.xml

java -jar dataextract.jar --player "<White>" --matching file.xml


will output the stats for all players playing White and write a file, matching.pgn,
containing the games.

java -jar dataextract.jar --player "Morphy, Paul" --CVthreshold 0.9 file.xml


will output only those statistics for games in file.xml in which Paul Morphy's moves have
a CV value of at least 90%.

java -jar dataextract.jar --id "2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22" --matching file.xml


will output the game described above that was played in 2002 by Pyshkin.

Related Links
PGN-Spy by Michael Gleeson.
A tool to help detect cheating in chess, which uses the analyser for part of
its functionality.



This page
(https://www.cs.kent.ac.uk/chessplag/ or
https://www.cs.kent.ac.uk/uci-analyser/)
is maintained by:
David J. Barnes
to whom any questions, comments and corrections should be addressed.

© David J. Barnes and Julio Hernandez-Castro

Last Updated: 16th June 2017: corrected output of "mate" values with the --annotate
options of the analyser.


Created: 9th April 2014