A UCI-based Chess Game Analyser
David J. Barnes and Julio Hernandez-Castro


Introduction

This page hosts the C++ and Java source code for a UCI-based analyser for chess games written in PGN notation. The programs were written as part of our research for the paper "On the limits of engine analysis for cheating detection in chess", Computers & Security 48:58-73. Available at: http://dx.doi.org/10.1016/j.cose.2014.10.002

The source is made available under the terms of The GNU General Public License.

Abstract

The integrity of online games has important economic consequences for both the gaming industry and players of all levels, from professionals to amateurs. Where there is a high likelihood of cheating, there is a loss of trust and players will be reluctant to participate --- particularly if this is likely to cost them money.

Chess is one game that has been established online for around 25 years and is played over the Internet commercially. In that environment, where players are not physically present over the board (OTB), chess is one of the most easily exploitable games by those who wish to cheat, because of the widespread availability of very strong chess-playing programs. Allegations of cheating even in OTB games have increased significantly in recent years, and even led to recent changes in the laws of the game that potentially impinge upon players' privacy.

In this work, we examine some of the difficulties inherent in identifying the covert use of chess-playing programs purely from an analysis of the moves of a game. Our approach is to deeply examine a large collection of games where there is a very high degree of confidence that cheating has not taken place, and analyse those that could be easily misclassified.

We conclude that there is a serious risk of finding numerous false positives and that, in general, it is unsafe to use just the moves of a single game as prima facie evidence of cheating. We also demonstrate that it is impossible to compute definitive values of the figures currently employed to measure similarity to a chess-engine for a particular game, as values inevitably vary at different depths and, even under identical conditions, when multi-threading evaluation is used.

Purpose

The purpose of the analyser is to read a source file containing moves of one or more chess games (properly encoded - see below) and pass them to a UCI-compatible chess engine for evaluation. The analyser receives back the engine's evaluations and writes them out in an XML format for processing by another program, such as the Java one provided here. The analyser's program arguments are used to configure the engine for search depth, number of candidate moves, etc. Note that the code does not include a UCI engine but provides an interface that allows games to be analysed via such an engine. A suitable free engine would be stockfish.

This page also hosts a separate Java XML processor to process the output from the analyser.

Installation

A Makefile is provided for installation of the analyser on Unix/Linux/cygwin environments that have a C++ compiler.

Usage

The analyser takes 0 or more command-line options (see below) and 0 or more game files. If no game files are provided then it reads games from standard input:

analyse [optional-command-line-options] game-file ...

The analyser requires as input games written with PGN headers. However, UCI engines expect the moves of a game to be formatted in long-algebraic notation, so games should be provided to the analyser in the form shown below under Input Format.

The easiest way to convert a game into this format is to use a tool such as pgn-extract. Its -Wuci flag will output the moves of a game in a format suitable for the analyser to pass on to a UCI engine. For instance:

pgn-extract -Wuci -oout.pgn games.pgn

outputs the PGN games from games.pgn to the file out.pgn, which would then be passed to the analyser.

Command-line Options

The UCI engine to use should be specified via --engine with either the full pathname of the engine program, or a name that will be found in your environment's executable search path. The default name is "stockfish".

Input Format

The following is a game in the format expected by the analyser. Note that the moves have been (re)written into long-algebraic form using pgn-extract.

[Event "World Championship 23th"]
[Site "Moscow"]
[Date "1960.03.17"]
[Round "2"]
[White "Botvinnik, Mikhail"]
[Black "Tal, Mihail"]
[Result "1/2-1/2"]
[BookDepth "17"]

d2d4 g8f6 c2c4 c7c5 d4d5 e7e6 b1c3 e6d5 c4d5 d7d6 g1f3 g7g6 c1g5 f8g7 f3d2 h7h6
g5h4 g6g5 h4g3 f6h5 d2c4 h5g3 h2g3 e8g8 e2e3 d8e7 f1e2 f8d8 e1g1 b8d7 a2a4 d7e5
c4e5 e7e5 a4a5 a8b8 a1a2 c8d7 c3b5 d7b5 e2b5 b7b6 a5a6 b8c8 d1d3 c8c7 b2b3 e5c3
d3c3 g7c3 a2c2 c3f6 g3g4 c7e7 c2c4 d8c8 g2g3 f6g7 f1d1 c8f8 d1d3 g8h7 g1g2 h7g6
d3d1 h6h5 g4h5 g6h5 g3g4 h5g6 c4c2 f8h8 b5d3 g6f6 g2g3 e7e8 d3b5 e8e4 c2c4 e4c4
b3c4 f6e7 b5a4 g7e5 g3f3 h8h4 d1g1 f7f5 1/2-1/2
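As an illustration of how simple this input format is to read, here is a minimal Python sketch that splits one game into its tags, its long-algebraic moves and its result. It is not the analyser's own parser (which is written in C++); the names parse_game, TAG and SAMPLE are hypothetical, the sample is an abbreviated copy of the game above, and the sketch assumes a single game per string.

```python
import re

# A tag line looks like: [Name "value"]
TAG = re.compile(r'\[(\w+)\s+"(.*)"\]')
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def parse_game(text):
    """Split one game into (tags, moves, result). Assumes a single game."""
    tags, moves, result = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        match = TAG.fullmatch(line)
        if match:
            tags[match.group(1)] = match.group(2)
            continue
        for token in line.split():
            if token in RESULTS:
                result = token      # game terminator
            else:
                moves.append(token)  # long-algebraic move such as d2d4
    return tags, moves, result

# Abbreviated sample: only the first and last move lines of the game above.
SAMPLE = """[White "Botvinnik, Mikhail"]
[Black "Tal, Mihail"]
[Result "1/2-1/2"]
[BookDepth "17"]

d2d4 g8f6 c2c4 c7c5 d4d5 e7e6 b1c3 e6d5 c4d5 d7d6 g1f3 g7g6 c1g5 f8g7 f3d2 h7h6
b3c4 f6e7 b5a4 g7e5 g3f3 h8h4 d1g1 f7f5 1/2-1/2
"""

tags, moves, result = parse_game(SAMPLE)
```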

Output Format

The analysis of each game is output in XML format (except when --annotatePGN is used). The output includes the game details:

<game>
<tags>
<tag name = "Event" value = "World Championship 23th" />
<tag name = "Site" value = "Moscow" />
<tag name = "Date" value = "1960.03.17" />
<tag name = "Round" value = "2" />
<tag name = "White" value = "Botvinnik, Mikhail" />
<tag name = "Black" value = "Tal, Mihail" />
<tag name = "Result" value = "1/2-1/2" />
<tag name = "BookDepth" value = "17" />
</tags>
<moves>
d2d4 g8f6 c2c4 c7c5 d4d5 e7e6 b1c3 e6d5 c4d5 d7d6 g1f3 g7g6 c1g5 f8g7 f3d2 h7h6
g5h4 g6g5 h4g3 f6h5 d2c4 h5g3 h2g3 e8g8 e2e3 d8e7 f1e2 f8d8 e1g1 b8d7 a2a4 d7e5
c4e5 e7e5 a4a5 a8b8 a1a2 c8d7 c3b5 d7b5 e2b5 b7b6 a5a6 b8c8 d1d3 c8c7 b2b3 e5c3
d3c3 g7c3 a2c2 c3f6 g3g4 c7e7 c2c4 d8c8 g2g3 f6g7 f1d1 c8f8 d1d3 g8h7 g1g2 h7g6
d3d1 h6h5 g4h5 g6h5 g3g4 h5g6 c4c2 f8h8 b5d3 g6f6 g2g3 e7e8 d3b5 e8e4 c2c4 e4c4
b3c4 f6e7 b5a4 g7e5 g3f3 h8h4 d1g1 f7f5 1/2-1/2
</moves>
...
</game>

This is followed by the analysis, which includes the settings of the engine:

<analysis engine = "..." bookDepth = "17" searchDepth = "10" variations = "5" > ... </analysis>

The evaluation of each played move and the required number of alternatives follow:

<move player = "black" >
<played>g6g5</played>
<evaluation move = "e8g8" value = "-88" />
<evaluation move = "d8e7" value = "-109" />
<evaluation move = "a7a6" value = "-121" />
<evaluation move = "b8d7" value = "-121" />
<evaluation move = "b8a6" value = "-133" />
<evaluation move = "g6g5" value = "-133" />
</move>

Evaluations are typically in centipawns.
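For readers who want to process a move element themselves rather than use the Java processor, the following Python sketch reads the example above and compares the played move with the best-evaluated one. It rests on assumptions not stated on this page: that values are from the moving player's point of view (so higher is better), that the played move always appears among the evaluations, and that every value is an integer (a "mate" value would need separate handling). The name summarise_move is hypothetical.

```python
import xml.etree.ElementTree as ET

MOVE_XML = """<move player = "black" >
<played>g6g5</played>
<evaluation move = "e8g8" value = "-88" />
<evaluation move = "d8e7" value = "-109" />
<evaluation move = "a7a6" value = "-121" />
<evaluation move = "b8d7" value = "-121" />
<evaluation move = "b8a6" value = "-133" />
<evaluation move = "g6g5" value = "-133" />
</move>"""

def summarise_move(xml_text):
    """Compare the played move with the best-evaluated candidate."""
    move = ET.fromstring(xml_text)
    played = move.findtext("played").strip()
    # Assumption: all values are plain integers (centipawns).
    values = {e.get("move"): int(e.get("value"))
              for e in move.findall("evaluation")}
    best = max(values.values())  # assumption: higher is better for the mover
    return {
        "player": move.get("player"),
        "played": played,
        "best": best,
        "played_value": values[played],
        "difference": values[played] - best,  # 0 means the best move was played
    }

summary = summarise_move(MOVE_XML)
```

Under these assumptions the played move g6g5 is evaluated 45 centipawns below the engine's first choice, e8g8.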

Annotated output in PGN format

The argument --annotatePGN allows the engine's analysis to be output in PGN format rather than XML. Each move is followed by its evaluation in a comment and, if the engine considers there is a better move, the alternative in a variation with its evaluation.

A possible series of steps to analyse the games in games.pgn might be:

# Turn the original games into UCI-compatible format.
pgn-extract -Wuci --output games-uci.pgn games.pgn
# Analyse the games with stockfish to search depth 12 and
# output the analysis in PGN format.
analyse --engine stockfish --searchdepth 12 --annotatePGN games-uci.pgn > games-annotated.pgn 
# Format the output nicely.
pgn-extract --output annotated.pgn games-annotated.pgn

Since both pgn-extract and analyse can be used in a pipeline, the intermediate files may be omitted:

pgn-extract -Wuci games.pgn | \
    analyse --engine stockfish --searchdepth 12 --annotatePGN | \
    pgn-extract --output annotated.pgn
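If the pipeline needs to be driven from a script, it could be assembled along the following lines. This is only a sketch in Python: the function names build_pipeline, as_shell and run are hypothetical, only flags that appear on this page are used, and the final stage is assumed to read from standard input because no input file is given to it.

```python
import shlex
import subprocess

def build_pipeline(games, annotated, engine="stockfish", depth=12):
    """Return the three pipeline stages as argument lists."""
    return [
        ["pgn-extract", "-Wuci", games],
        ["analyse", "--engine", engine,
         "--searchdepth", str(depth), "--annotatePGN"],
        ["pgn-extract", "--output", annotated],  # reads standard input
    ]

def as_shell(stages):
    """Render the stages as a single shell pipeline string."""
    return " | ".join(shlex.join(stage) for stage in stages)

def run(stages):
    """Run the stages connected by pipes; return their exit codes."""
    procs, upstream = [], None
    for i, stage in enumerate(stages):
        last = i == len(stages) - 1
        proc = subprocess.Popen(stage, stdin=upstream,
                                stdout=None if last else subprocess.PIPE)
        if upstream is not None:
            upstream.close()  # the child now owns the read end of the pipe
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]

command = as_shell(build_pipeline("games.pgn", "annotated.pgn"))
```

Calling run(build_pipeline("games.pgn", "annotated.pgn")) would execute the pipeline, provided pgn-extract, analyse and the engine are all on the executable search path.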

Processing the XML Output

The XML processor is designed to read the XML output of the analyser and provide summary statistical information from it.

The program is provided as a JAR file, dataextract.jar, which can be executed, once downloaded, as follows:

java -jar dataextract.jar

given a suitable installation of the Java Runtime Environment (JRE).

Usage

The processor takes 0 or more command-line options (see below) and 1 or more XML files.

java -jar dataextract.jar [optional-command-line-options] xml-file ...

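In essence, the program provides summary statistics on the degree to which the moves played in a chess game match those selected by a UCI-compatible chess engine. The two key statistics are CV and AE. As the worked example below illustrates, AE is the average centipawn difference between the evaluation of the move played and the engine's evaluation of the best move, and CV is the proportion of the played moves that the engine evaluates as equal to its best move.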

For each game analysed, two lines are output (one for each player) in the following format:

Date:Player:W/B:BD:EM:Depth:AE:sd:CV:

Fields are separated by a colon character and each has a fixed width; the widths of the first five fields are listed under ID strings below.

For instance:

2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22:14:  -19.27:  32.9: 0.41:

This line indicates a game played in 2002 where Pyshkin played black. The first 8 ply were treated as book moves and 22 further moves by Pyshkin were analysed to depth 14. The average centipawn difference in evaluation between the moves played by Pyshkin and the engine's assessment of the best move was -19.27. The standard deviation was 32.9. 41% of the moves played by Pyshkin were considered equal in evaluation to the best move at that point by the engine.
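The three statistics described above can be sketched in Python as follows. This is an illustration of the description on this page, not the Java processor's code: the function name summarise is hypothetical, the differences are assumed to be "played minus best" in centipawns (so never positive), and the use of the population standard deviation is an assumption, since the page does not say which form the processor uses.

```python
from statistics import mean, pstdev

def summarise(differences):
    """Return (AE, sd, CV) for one player's analysed moves.

    differences: per-move centipawn difference between the move played
    and the engine's best move (0 when they are evaluated as equal).
    """
    ae = mean(differences)                       # average difference
    sd = pstdev(differences)                     # assumption: population sd
    cv = sum(1 for d in differences if d == 0) / len(differences)
    return ae, sd, cv

# Four hypothetical moves: two matched the engine's best evaluation.
ae, sd, cv = summarise([0, -45, 0, -15])
```

With these four made-up differences, AE is -15 and CV is 0.5, meaning half of the moves were evaluated as equal to the engine's best choice.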

Annotating a Game

The processor can be used to convert the analyser's XML output into annotated versions of games via the --annotate flag. The alternative would be to use the --annotatePGN option to the analyser itself, so this option is really only necessary to post-process XML output to avoid re-running the original analysis. In this case, the game scores are output with centipawn evaluations of the moves and alternatives in comments. For example, using the analyser's output in output.xml to create annotated PGN games in annotated.pgn:

java -jar dataextract.jar --annotate annotated.pgn output.xml

Command-line Options

  • --AEthreshold D - set the lower AE threshold for outputting details of games to D.
  • --CVthreshold D - set the lower CV threshold for outputting details of games to D (0-1.0).
  • --annotate filename - output the games with evaluation annotations.
  • --fullstats - output the difference values for each move.
  • --help - show the usage information.
  • --id id-string - output only games with the given ID (see below for ID).
  • --idfile filename - output only games with the IDs listed in filename.
  • --matching - output the PGN for games that are output, in the file matching.pgn.
  • --minlength N - only output games with a minimum of N evaluated moves.
  • --player name - only output games played by the given player. NB <White>, <Black> and <WhiteOrBlack> will match any player playing white, black, or either colour, respectively.
  • --random probability - randomly select games to be output with the given probability (0-1.0).
  • --stats - output stats on the game to standard output (default).

    ID strings

    The --id option allows selection of a game with specific details. The --idfile option allows multiple ID strings to be stored in a file to simplify command-line usage.

    An ID consists of the first 5 fields of the output format:

    Date:Player:W/B:BD:EM
    

    where Date is 4 characters, Player is 30 characters, W/B is 1 character, BD is two characters and EM is 3 characters.
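Because the padding inside an ID must match the output exactly, it can be easier to derive an ID from an existing line of the processor's output than to type it by hand. A minimal Python sketch, assuming that no field (in particular the player's name) contains a colon; the function name id_from_stats_line is hypothetical:

```python
def id_from_stats_line(line):
    """Return the ID: the first five colon-separated fields, padding intact."""
    fields = line.split(":")
    return ":".join(fields[:5])

STATS_LINE = "2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22:14:  -19.27:  32.9: 0.41:"
game_id = id_from_stats_line(STATS_LINE)
```

The resulting string can be passed to --id, or written one per line to a file for use with --idfile.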

    Examples:

    java -jar dataextract.jar --player "<WhiteOrBlack>" file.xml
    

    will output the stats for all players in the games found in file.xml.

    java -jar dataextract.jar --player "<White>" --matching file.xml
    

    will output the stats for all players playing White and write a file, matching.pgn, containing the games.

    java -jar dataextract.jar --player "Morphy, Paul" --CVthreshold 0.9 file.xml
    

    will output only those statistics for games in file.xml in which Paul Morphy's moves have a CV value of at least 90%.

    java -jar dataextract.jar --id "2002:Pyshkin, Aleksandr Sergeevic  :B:  8: 22" --matching file.xml
    

    will output the game described above that was played in 2002 by Pyshkin.

    Related Links

    PGN-Spy by Michael Gleeson: a tool to help detect cheating in chess, which uses the analyser for part of its functionality.


    This page (https://www.cs.kent.ac.uk/chessplag/ or https://www.cs.kent.ac.uk/uci-analyser/) is maintained by: David J. Barnes (the anti-spam email address will need editing by you) to whom any questions, comments and corrections should be addressed.

    © David J. Barnes and Julio Hernandez-Castro

    Last Updated: 16th June 2017: corrected output of "mate" values with the --annotate options of the analyser.
    Created: 9th April 2014