23 February 2013

Problem with the SP or with the Engines?

Continuing the Proof of Concept with HarryO: in Waving a Yellow Flag, I discovered that SP408 and its twin SP749 give lopsided results in White's favor in engine-vs-engine play. Is this because of a problem with the SP or a problem with the engines?

HarryO and I set out to play the SP in the comments to a post on his blog, Non-Random Chess 960 Trial Game 6: SP408. He chose to play 1.d4, the move played in 64 of the 112 CCRL games, where it had scored 75% for White. The resulting position is shown in the following diagram.


SP408 RBQNBNKR after 1.d4

Now it was my move. I noted in my first comment,

From CCRL I'm seeing 1.d4 with a WLD score of +44-12=8. The two most important variations are 1...c5 with +22-9=6 and 1...Ng6 with +20-2=0. Those are terrible stats for Black.
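The 75% figure for 1.d4 follows from the usual scoring convention, reading "+44-12=8" as 44 White wins, 12 White losses and 8 draws. A win counts 1 point, a draw 1/2 and a loss 0, so White scores 44 + 8/2 = 48 points out of 64 games. The short sketch below applies the same arithmetic to the two main replies. The helper name `score_pct` is hypothetical and is used only for illustration.

```python
def score_pct(wins: int, losses: int, draws: int) -> float:
    """Percentage score for White: a win counts 1, a draw 1/2, a loss 0."""
    games = wins + losses + draws
    return 100.0 * (wins + 0.5 * draws) / games

# CCRL figures quoted in the comment, read as (White wins, White losses, draws).
lines = {
    "1.d4 (all replies)": (44, 12, 8),
    "1.d4 c5": (22, 9, 6),
    "1.d4 Ng6": (20, 2, 0),
}

for name, (w, l, d) in lines.items():
    print(f"{name}: {w + l + d} games, {score_pct(w, l, d):.1f}% for White")
```

Run as written, this reports 75.0% over 64 games for 1.d4, 67.6% over 37 games for 1...c5, and 90.9% over 22 games for 1...Ng6.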

Later I added,

After 1.d4 c5 2.d5, the most popular move is 2...c4. It's not a bug. The engine is trying to prevent c2-c4, which creates a strong center for White. Note that White has also moved the same Pawn twice, but the second move is very strong because it limits the movement of the Black Knights. I don't think 1...c5 is playable.

On top of that, I don't like 1...Ng6. It commits the Knight to a less-than-optimal square and leaves White a free hand in the center. I appreciate that it prepares ...O-O and guards the weak e-Pawn, but neither of these objectives is a priority.

I finally decided to play 1...d5, a move which had not been tried in any of the 112 CCRL games. HarryO played the critical move 2.c4, against which I had prepared 2...Nde6. The move depends on the correctness of the tactical sequence 3.cxd5 Nxd4. We played through move 16, and although Black never achieved full equality, he was never in real danger of losing in the opening.

Rereading the comments we made as we played the moves, I am impressed by how much we discovered about chess960 in general. After the game, HarryO expanded the anchor post to highlight some of the unusual variations that might have been played.

Our next game, Non-Random Chess 960 Trial Game 7: SP749, started from the twin of SP408. It has the same sequence of pieces in reverse order, so only the castling considerations are different. The first two moves for each side mirrored the ideas discovered in SP408. White varied first, playing the equivalent of 3.e3. This put less pressure on Black, and when we finally abandoned the trial on move 15, Black had achieved equality and was perhaps even somewhat better.
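The SP numbers appear to follow Scharnagl's standard numbering of the 960 setups, in which SP518 is the traditional RNBQKBNR. Under that assumption, the sketch below computes the number of a back rank and of its mirror image. It checks that reversing RBQNBNKR gives RKNBNQBR, which is SP749. The helper names `sp_number` and `mirror` are hypothetical, and the code assumes its input is already a valid chess960 back rank.

```python
# Knight placements among the 5 squares left after bishops and queen (Scharnagl's table).
KNIGHT_TABLE = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
                (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

def sp_number(setup: str) -> int:
    """Scharnagl start-position number (0-959) of a back rank such as 'RBQNBNKR'."""
    setup = setup.upper()
    # Files b, d, f, h (indices 1, 3, 5, 7) are light squares on White's first rank.
    bishops = [i for i, p in enumerate(setup) if p == "B"]
    light = next(i for i in bishops if i % 2 == 1)
    dark = next(i for i in bishops if i % 2 == 0)
    n = light // 2 + 4 * (dark // 2)
    # Queen: its index among the six squares not taken by the bishops.
    rest = [i for i in range(8) if i not in (light, dark)]
    n += 16 * rest.index(setup.index("Q"))
    # Knights: their pattern among the five squares left after the queen.
    rest = [i for i in rest if setup[i] != "Q"]
    knights = tuple(j for j, i in enumerate(rest) if setup[i] == "N")
    n += 96 * KNIGHT_TABLE.index(knights)
    # The last three squares always hold R, K, R in that order.
    return n

def mirror(setup: str) -> str:
    """Reverse the piece order, reflecting the setup across the d/e file boundary."""
    return setup[::-1]

print(sp_number("RBQNBNKR"), mirror("RBQNBNKR"), sp_number(mirror("RBQNBNKR")))
```

This prints `408 RKNBNQBR 749`. Because only the order of the pieces is reversed, the mirror pair shares every piece relationship except the side on which each King and Rook castle.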

Getting back to the initial question -- Were the lopsided CCRL results 'because of a problem with the SP or a problem with the engines?' -- I'm convinced that it's a problem with the engines. A discussion of why they go wrong would be a good topic for a future post.

1 comment:

GeneM said...

I tend to agree with your conclusion, Mark, that the chess960 engines are making some dubious move choices in the opening of some start setups.

The CCRL data shows some large differences (in White's win rate, and in the draw rate) for mirror setups; for no justifiable reason.
Yes, the castling situation is slightly different between mirror setups, but not different enough to account for the differences in win and draw rates.

If we turn off the opening book database and make Fritz play the traditional setup on his own, we quickly see how far from optimal Fritz's opening play is without his crutch database.
Therefore we should assume that Fritz also plays the other 958 setups poorly by the standards modern grandmasters have set in the traditional setup. This poor play lets in a lot of variability and volatility.

If the chess world eventually selects a second setup to reuse often, the current Fritz (and Houdini, Rybka, etc.) will play a secondary role in developing the whole new opening theory, and grandmasters and the more numerous masters will be the primary developers.
All will use Fritz mostly for blunder checking.

GeneM