
Capturing Attention

This review has been printed in the May 2020 issue of Chess Life.  A penultimate (and unedited) version of the review is reproduced here. Minor differences exist between this and the printed version. My thanks to the good folks at Chess Life for allowing me to do so.


Correspondence Database 2020 – $199.95 new. Downloadable.
https://shop.chessbase.com/en/products/corr_2020

UltraCorr2020v2 – 55 euros. Downloadable.
http://www.chessmail.com/UC-2020/UC2020-intro.html

Fernschach 2020. 15 euros mailed outside of Germany; 13.50 euros inside Germany. http://www.fernschachbund.de/fernschach-cd/index.html

Correspondence chess is taking on a new relevance in modern chess, and not just because it can be played during a pandemic.

Because most organizations allow players to make use of all resources in move generation, including computers, games at the upper echelons of correspondence chess are fantastically rich, melding high-powered silicon with human guidance to astounding effect.[1]

Top-level players are taking notice. Attentive readers will note the increasing number of references to correspondence games in game annotations, and it’s clear that the elite players are mining correspondence games for opening ideas. GM Erwin l’Ami, second to GM Anish Giri, has a column in the New in Chess Yearbook where he shows new and interesting ideas from correspondence play, and recently he has taken up the mantle himself, playing games through the International Correspondence Chess Federation (ICCF).

l’Ami is not the only Grandmaster to try his hand at correspondence chess. GM Ulf Andersson had a brilliant, if brief, dalliance with the form, and now GM Krishnan Sasikiran is perhaps the most active correspondence player among over-the-board GMs. I had intended to show you a crushing win by Sasikiran (Jaulneau-Sasikiran, ICCF 2014) on the Black side of a King’s Indian here, but l’Ami has annotated an even more amazing Sasikiran game in the newest (#134) New in Chess Yearbook.

SCOTCH GAMBIT [C56]
Wieland Belka (2500)
Krishnan Sasikiran (2550)
corr ICCF, 31.03.2019

1. e4 e5 2. Nf3 Nc6 3. d4 exd4 4. Bc4 Nf6 5. e5 d5 6. Bb5 Ne4 7. Nxd4 Bd7 8. Bxc6 bxc6 9. 0–0 Bc5 10. f3 Ng5 11. Be3 0–0 12. f4 f6!?

12. … Ne4 is the main line here, while 12. … Ne6 is a playable alternative. Sasikiran’s move is not a novelty, as l’Ami seems to imply. Still, he has worked out a fascinating new concept in this very well-known opening.

13. exf6

13. fxg5? fxe5 14. Nf5 d4 gives Black back the material with interest, as l’Ami points out.

13… Nh3+!

The key point. 13. … Ne4 14. fxg7 Re8 was tried in Dostal-Efimov (ICCF 2015) but Sasikiran improves on this dramatically.

14. gxh3 Bxh3

l’Ami says that Black has full compensation here; for the details, check out his notes in Yearbook 134. The game concluded:

15. b4 Bb6 16. Rf3 Bg4 17. fxg7 Re8 18. Bf2 Qf6 19. c3 Qg6 20. Kh1 Bxd4 21. cxd4 Qh5 22. Nd2 Re4 23. Kg2 Rxf4 24. Bg3 Rxd4 25. Kg1 Bxf3 26. Qxf3 Qxf3 27. Nxf3 Rxb4 ½–½

It is not easy for the non-initiate to find correspondence games to study, although (as we will see) the data is out there if you know where to look. Much simpler is the use of a commercial correspondence database, and this month, we’ll take a serious look at two such products and mention a third in passing.

The ChessBase Correspondence Database 2020 (Corr 2020) contains 1,626,801 games, 5,674 of which are annotated. Author and correspondence Senior IM Tim Harding’s UltraCorr 2020 is a long running alternative to ChessBase’s database, with this year’s edition coming in at 2,150,356 games, 37,048 of which are annotated.

Two sets of numbers stand out here: the number of games / annotated games in each database, and their respective prices. Corr 2020 has fewer games for more money, while UltraCorr is less expensive but includes vastly more games. But numbers don’t always tell the full story, and for me, the differences between Corr 2020 and UltraCorr are rooted in the types of products each tries to be.

Corr 2020 is a typical ChessBase product. The data is very clean and standardized, and it benefits from its publisher’s relationship with top players. The unused Sasikiran game mentioned above – Jaulneau-Sasikiran – is annotated by the player in Corr 2020, and the database is accompanied by a set of fourteen videos from some of ChessBase’s stable of authors, all of which deal with games from correspondence play.

UltraCorr is much more of a one-man job. Harding has done excellent work in scouring the web for games, giving his database over 500,000 more games than are found in Corr 2020, but not all of the games are of good quality, and there is less standardization of names and events. Some over-the-board games have even crept in.

UltraCorr’s nearly sevenfold advantage in annotated games over Corr 2020 is also something of a mixed bag. Harding has obviously spent a lot of time collecting and inputting annotations, and UltraCorr contains nearly all of the notes that appeared in the nine-year run of Harding’s well-regarded Chess Mail magazine. But more than two thousand of the annotations in UltraCorr are “anno-Fritzed,” automatically annotated by ancient versions of Fritz and Junior, and still more are annotated in name only.

Both Corr 2020 and UltraCorr 2020 are designed to be comprehensive historical documents, covering the earliest correspondence games on record along with World Championships, Olympiads, etc. Both have a tremendous number of recent games as well. So how does one choose between them?

UltraCorr is certainly the best “bang for the buck,” with more games and (legitimate) annotations for less money. But there is something to be said for the well-curated data in Corr 2020 as well. When I built my own research database from earlier incarnations of these two sources some months ago, I used Corr 2018 as the basis for the new “Frankencorr,” and then “cannibalized” games from UltraCorr 2019 to build it out. My reason for doing so was to try to retain as much of Corr 2018’s clean data – names and tournaments – as possible.

Corr 2020 and UltraCorr 2020 are both fine products, each with their own selling points and drawbacks. Just to complicate things further, let me briefly mention a third option. Herbert Bellmann, a German Senior IM in correspondence, is the publisher of Fernschach 2020. It’s my sense that Bellmann is not trying to compete with either ChessBase or Harding with this product; rather, this is more or less Bellmann’s own private database for sale, something that can be seen in the data itself.

Fernschach 2020 contains 1,513,390 games, with annotations to 12,655 of them. That last number is, as with UltraCorr 2020, somewhat misleading. Many of the games listed as annotated lack notes, but more than 1500 are annotated by Bellmann himself. The database does not try to be historically comprehensive – the first game is from 1978! – and instead appears to focus intensely on German correspondence events. For that reason it may be a valuable resource for the chess researcher, particularly given its modest price.

Fernschach 2020 was published in October 2019, while Corr 2020 appeared in November 2019 and the final edition of UltraCorr 2020 was released in February 2020. This gives us an indication of how up to date each product is, although again, the dates alone can be misleading. Case in point: the Belka-Sasikiran game given above appears in both Fernschach and UltraCorr, but not Corr 2020, even though Corr 2020 appeared after Fernschach.

None of the three databases discussed in this month’s column include updates, which are especially important for those users looking for new opening ideas in correspondence play. This can be overcome with a bit of work on the part of end users.

The great majority of important correspondence games are played through the ICCF. Those with accounts at iccf.com (it’s free to sign up) can download new game files at the beginning of each month, and the frugal among us could even create a fairly substantial correspondence database just by downloading and collating all of the games at ICCF.

Other important correspondence sites and sources of games are:

It is sufficient to simply merge .pgn files from these sites into one’s database of choice. Perfectionists will want to process those files to remove diacritical marks and hyphens from names, as Bellmann, ChessBase, and Harding all do, in the interest of standardization. For this, I can recommend either the Text Mechanic website[2] or a combination of Notepad++ and “Python Script”[3] as effective tools. Trying to edit all the games manually in ChessBase is also possible, but it is difficult and time consuming.
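For readers going the scripting route, the name clean-up described above might look something like the following. This is only a minimal sketch in plain Python, not the method used by Bellmann, ChessBase, or Harding: the function names are my own, it touches only the White and Black tags, and replacing hyphens with spaces (rather than deleting them outright) is an assumption you may want to change.

```python
import re
import unicodedata

# PGN tags whose values are player names.
NAME_TAGS = ("White", "Black")

# Matches a PGN tag pair such as: [White "Müller-Schmidt, José"]
TAG_LINE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def strip_diacritics(text: str) -> str:
    """Drop combining accents, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(name: str) -> str:
    """Remove accents and turn hyphens into spaces (an assumed convention)."""
    return strip_diacritics(name).replace("-", " ")


def normalize_pgn(pgn_text: str) -> str:
    """Rewrite only the White/Black tag lines; moves and other tags are untouched."""
    out = []
    for line in pgn_text.splitlines():
        match = TAG_LINE.match(line)
        if match and match.group(1) in NAME_TAGS:
            line = f'[{match.group(1)} "{normalize_name(match.group(2))}"]'
        out.append(line)
    return "\n".join(out)
```

One caveat: letters that have no accent-plus-base-letter decomposition in Unicode (ø, ł, ß and the like) pass through unchanged, so a small hand-made replacement table would still be needed for those.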

….

A personal note before we go: this is my last review column for Chess Life, as I will be taking over editorial duties come June 1st. Thanks to Dan Lucas and Melinda Matthews for giving me page space each month to try and do real criticism, something that is a bit of a rarity in today’s publishing world. While I suspect that readers will not have agreed with all of my judgments, I trust that the seriousness and honesty I’ve tried to bring to the page came through.

I’m thrilled that my dear friend and mentor IM John Watson will be taking over review duties for Chess Life next month. For me, John is the best reviewer in the business, and I can’t wait to see what he has to say.


[1] Note that events run by US Chess do not permit computer use, so if you have a competitive itch you need to scratch and you don’t want Stockfish’s help, this might be worth investigating!

[2] http://textmechanic.com/text-tools/basic-text-tools/remove-letter-accents/

[3] https://superuser.com/questions/484141/replacing-all-special-accented-characters-with-equivalent-regular-characters-in

Diving into Databases

BigBase / MegaBase 2016

Correspondence Database 2015.

The Week in Chess (TWIC)

Paramount Chess Database.

———————————————–

When I was in high school and learning about the basics of computer science, I was taught an acronym to underscore the importance of having clean data to work with: GIGO, or ‘Garbage in, Garbage out.’ You can have all the fantastic algorithms and formulas you like, but if your data is in poor shape, you’ll never come close to the results you desire.

The same is true of chess data. You can buy the fanciest GUI (graphical user interface) the market has to offer, and you can collect all of the strongest engines around, but if you’re working with poor quality data, your research will suffer for it. Fortunately for us, there are a number of high quality databases out there, each fulfilling a specific set of needs for different types of users.

In this review I’ll look at four (or five, depending on how you look at things) of the most important databases out there, and as we will see, there is something useful for just about everyone. All of them are available in ChessBase’s native data format, and two (TWIC and Paramount) are also available in .pgn format, making them readable by those using GUIs other than ChessBase or Fritz.

Big / MegaBase 2016

There’s no way around it. You need a large reference database if you’re going to do any serious chess research or study. Online databases like chess-db.com, chessgames.com and ChessBase’s own online database are no substitute. They require internet connections and you can’t easily manipulate online data. The largest and most well-known of these reference databases are Big Database (BigBase) and Mega Database (MegaBase) 2016 from ChessBase.

BigBase and MegaBase each contain over 6.46 million games running from the earliest recorded games through October of this year. The database is searchable by player, tournament, and annotator (among other things), and you can access various indices or ‘keys’ for openings, endgames, strategic and tactical themes. Note the last three keys are not accessible in the default ChessBase 12/13 settings. You can access them by going to Options – Misc – Use ‘Theme Keys.’

Mega 2016 keys

You might suspect, given the name of the product, that each year brings a new version of the database to the market. And you would be correct to do so. The 2015 release of MegaBase contained 6,161,344 games, and the data wranglers at ChessBase have bumped that total to 6,466,288 in the 2016 edition. About half of the added games have appeared in issues of ChessBase Magazine and ChessBase Magazine Extra, but 166,692 of them are entirely new to the ecosystem.

Mega 2016 Sources

While the majority are from 2014 and 2015 events, there are some historical additions as well. Among them are 18 games played by Botvinnik, 14 by Alekhine, and 9 by Spassky.

There are a number of similarities between BigBase and MegaBase. The number of games in each product is identical, as are the indices and keys. So what distinguishes them? MegaBase comes with two additional features that BigBase lacks: the inclusion of annotated games and a year’s worth of weekly updates. [MegaBase also comes with an updated version of PlayerBase, which collects rating data and pictures for thousands of players, but since I don’t use the feature, I will refrain from commenting on it.]

The 2016 version of MegaBase includes over seventy-five thousand games with named annotators. This represents an increase of 3,425 annotated games over the 2015 edition. While regulars like Atalik, Ftacnik and Marin provide notes to Super-GM games, there are also analyzed games by lesser-known combatants. Hundreds of annotated games from John Donaldson and Elliot Winslow are new to this edition, all of which come from amateur contests at the Mechanics Institute in the past few years.

MegaBase also comes with an update service, where weekly downloads of 5,000 games are provided for a year. As a point of comparison, we are currently at update number 49 for MegaBase 2015, and 245,713 games have been added to the database with all updates included.

MegaBase Update Service

This means, by the way, that not every game submitted to ChessBase is included in these weekly updates. Apples-to-apples comparisons aren’t possible, but roughly sixty thousand games are in the 2016 database and not in the fully updated 2015 version.
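For the curious, the back-of-the-envelope arithmetic behind that estimate can be reproduced from the figures quoted earlier. The sketch below is a rough check only: it assumes that every game in the updated 2015 database also appears in the 2016 edition, which is exactly the kind of like-for-like comparison that cannot be verified.

```python
mega_2015 = 6_161_344      # games in MegaBase 2015 as shipped
updates_2015 = 245_713     # games added by updates 1 through 49
mega_2016 = 6_466_288      # games in MegaBase 2016

# Size of the 2015 database with all 49 updates applied.
fully_updated_2015 = mega_2015 + updates_2015

# Games in the 2016 edition beyond the fully updated 2015 one
# (assuming, hypothetically, that the older database is a strict subset).
only_in_2016 = mega_2016 - fully_updated_2015

# Average size of one weekly update so far.
per_update = updates_2015 / 49

print(fully_updated_2015, only_in_2016, round(per_update))
```

The difference comes to a little under sixty thousand games, and the average update is just over the advertised 5,000.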

BigBase and MegaBase are the preeminent reference databases available today. They are not perfect. Tim Harding has remarked on problems (some of which appear to have been fixed) with Blackburne’s games, for example, and John Watson never played in the 1966 British U14 Championship. Doubtless there remain plenty of tournaments, like the 1995 MCC/ACF Summer International (whose bulletin sits on my desk), just waiting to be entered into the computer. But no other database comes close to these two in terms of comprehensiveness and cleanliness of data. Anyone doing serious chess work, from openings to history to biography, needs one of these two products.

BigBase 2016 is available for download or post for €59.90 ($55.42 without VAT for those outside the EU). MegaBase 2016, which includes the annotated games, the weekly updates and the PlayerBase, costs €159.90 ($147.93 without VAT), and updates from previous versions of MegaBase cost €59.90 ($55.42 without VAT). The Update option comes with the annotated games, weekly updates, etc.

Correspondence Database 2015

Opening theorists are increasingly turning to correspondence games in their work. In his newly released Grandmaster Repertoire 20: The Semi-Slav, for instance, Lars Schandorff makes extensive use of games by the Russian Correspondence Grandmaster Efremov in working out the theory of the Botvinnik Variation. Such scrutiny is entirely logical if you think about it. The best correspondence players use all possible resources – books, computers, whatever! – over a period of months to choose their moves, making their games a veritable gold mine for opening ideas and novelties.

This is one area in which both the Big and Mega Databases are lacking, as they contain only over-the-board games. It is possible to cobble together a database of correspondence games by going to the websites of major correspondence organizations (ICCF, IECC, BdF, LSS) and collecting published games, but instead you might consider the Correspondence Database 2015 from ChessBase.

The Correspondence Database 2015 (CorrBase) contains 1,274,161 games played by post and e-mail from 1804 through January 2015. (The dates in this database seem to refer to the start date for the games.) 5649 of those games are annotated. The 2015 version of CorrBase also contains over 200,000 new games when compared with its 2013 incarnation, and it includes games from all of the leading correspondence groups.

So what will you find here? Let’s look at the games of ICCF-GM Aleksandr Gennadiev Efremov, the ‘hero’ of the early chapters of Schandorff’s new book. 577 of Efremov’s games appear in CorrBase 2015, including dozens of games (with both colors) in the Semi-Slav. The latest of these began sometime in 2013, and just about every one of Schandorff’s citations can be found in CorrBase.

CorrBase 2015 is an incredibly useful resource for the serious opening theorist or correspondence player. Because there is no update service (the TeleChess sections of CBM notwithstanding), discerning users will want to search out the latest games each month at organizational websites and add them to their databases. The effort is entirely worth it.

The Correspondence Database 2015 is available via download or post for €99.90 ($92.42 without VAT). An upgrade from earlier versions is available for €59.90 ($55.42 without VAT).

The Week in Chess

Not everyone can afford to buy MegaBase, and for those who do buy BigBase, there remains the problem of keeping the database up-to-date. For both of these problems there is Mark Crowther’s indispensable e-magazine The Week in Chess (TWIC).

The first issue of TWIC appeared in September of 1994. Each week since then, Crowther has produced a text report on the week’s chess news along with a database of new games in ChessBase and .pgn formats. Because both have always been available to download at no cost, TWIC has become a weekly must-see for players of all strengths. Indeed, we get a sense of just how central Crowther’s work has become with this tweet from Anish Giri:

Giri's tweet

We should cut Giri some slack. He was, after all, on his honeymoon!

Every issue of TWIC, from #1 (Sept 17, 1994) through the current day (#1094 at the time of writing), can be downloaded from The Week in Chess website. The databases from issue #920 (June 25, 2012) forward are also available. Combining those 175 files, a user could create a free database with 495,966 (482,290 after killing doubles) games to study. Among them we find 640 games played by Vachier-Lagrave (the most in the database), 516 by Nakamura, 507 by Svidler, and 7 miserable efforts by Hartmann.
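I combined the files and killed doubles inside ChessBase, but for those working with the .pgn versions, the idea can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than a replacement for ChessBase’s own doubles detection: the function names are hypothetical, every game is assumed to begin with an [Event ...] tag, and two games count as doubles only if the players, date and move text match exactly (ignoring case and spacing).

```python
import re

TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(pgn_text: str) -> list[str]:
    """Split PGN text into games, assuming each starts with an [Event ...] tag."""
    games, current = [], []

    def flush():
        game = "\n".join(current).strip()
        if game:
            games.append(game)
        current.clear()

    for line in pgn_text.splitlines():
        if line.startswith("[Event "):
            flush()
        current.append(line)
    flush()
    return games


def game_key(game: str) -> tuple:
    """Build an exact-match key from players, date, and whitespace-collapsed moves."""
    tags, moves = {}, []
    for line in game.splitlines():
        match = TAG.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            moves.append(line.strip())
    movetext = re.sub(r"\s+", " ", " ".join(moves))
    return (
        tags.get("White", "").lower(),
        tags.get("Black", "").lower(),
        tags.get("Date", ""),
        movetext,
    )


def merge_without_doubles(pgn_texts: list[str]) -> list[str]:
    """Concatenate several PGN files, keeping the first copy of each game."""
    seen, merged = set(), []
    for text in pgn_texts:
        for game in split_games(text):
            key = game_key(game)
            if key not in seen:
                seen.add(key)
                merged.append(game)
    return merged
```

An exact-match key like this will miss doubles where one copy has a misspelled name or a truncated game score, so expect it to remove fewer games than a dedicated tool would.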

This would be sufficient as a first step in chess research and database use, but Crowther also offers his readers the possibility of downloading a copy of his complete, private database for a donation of £30. The database contains every game ever published in TWIC, and as of the last version (#1-1093) it contained nearly 1.8 million games.

Crowther’s £30 offer is, in my opinion, very good value for the money. This is all the more true once you consider that you can keep it updated for free by downloading new issues of TWIC each week. I also suspect that you would boost your karmic standing by supporting Crowther’s tremendous efforts with a donation.

Owners of BigBase, who do not receive weekly updates as part of their purchase, can also use new issues of TWIC to update BigBase. Just keep in mind that the standardized names used by ChessBase and TWIC are different, so if you’re interested in studying (for instance) Kramnik’s games, you’ll have to look at ‘Kramnik,Vladimir’ (BigBase) and ‘Kramnik,V’ (TWIC) to find them all.
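If you are scripting your searches over .pgn data, one way to cope with the two naming styles is to compare names on surname plus first initial. The helper below is a hypothetical sketch of that heuristic, not something either ChessBase or TWIC provides, and it assumes names are written in ‘Surname,Given’ form.

```python
def name_key(name: str) -> tuple[str, str]:
    """Reduce 'Surname,Given' to (surname, first initial), lowercased."""
    surname, _, given = name.partition(",")
    return (surname.strip().lower(), given.strip()[:1].lower())


def same_player(a: str, b: str) -> bool:
    """Heuristic match: same surname, and same initial unless one side has none."""
    (surname_a, initial_a), (surname_b, initial_b) = name_key(a), name_key(b)
    if surname_a != surname_b:
        return False
    return initial_a == initial_b or not initial_a or not initial_b


print(same_player("Kramnik,Vladimir", "Kramnik,V"))
```

The obvious weakness is that two different players who share a surname and an initial will be lumped together, so the results still need a human eye.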

Paramount Chess Database

The Paramount Chess Database (Paramount) represents a complementary approach to chess research. Instead of the millions of games found in the databases discussed above, Paramount only contains 113,832 games with a roughly 70/40 split between complete games and fragments. What’s the value in that, you might ask? These are the collected games of issues 1-123 of the Chess Informant series of books, legendary among players since the first one was published in 1966. There are decades of history and knowledge collected in these games.

What has traditionally separated the Informant series from other chess publications was its annotators. It was a badge of honor to have your game selected for inclusion in the Informant, and just about every major player since the 60s has annotated for the series. All of those annotations are collected in the Paramount Database, and that’s what differentiates this product from those discussed above.

Here are some examples: there are 60 games annotated by Kasparov in MegaBase 2016, and 592 in Paramount. Anand annotated 506 games in Paramount and 267 in MegaBase. Older players like Larsen, Petrosian and Tal each have hundreds of annotated games in Paramount, while their notes in MegaBase can cumulatively be counted on two hands.

Why is this important? Others might provide competent notes, especially in the age of the computer, but games annotated by the combatants themselves have a special value. This is where the Paramount database shines, albeit with one caveat. You are more likely to find annotations by today’s Super GMs in MegaBase than in Paramount due to editorial shifts in Belgrade.

How might a player use the Paramount database? Two avenues come to mind. First, this database is very well suited to doing the kind of historical opening research championed by Kasparov in Garry Kasparov on Modern Chess: Revolution in the 70s. It’s hard to think of a better way to gain insight into, say, the Zaitsev Ruy than to actually study the games and notes that created modern theory, most of which appear in Paramount. The database can also be used to study the most important games of specific players, many of which are (as noted above) annotated by the players themselves.

One nice feature of the Paramount package is the way in which the data is presented after installation. You get a complete database of all the games, but dozens of smaller databases organized by opening, player and annotator are also included. This makes studying a specific player or important opening very easy. Each issue of the Informant appears in its own separate file, and the data is also provided in .pgn format.

Paramount databases

The Paramount Chess Database is available by download or post for $199 from the publisher, although you can find discounted deals at various chess retailers on the web.

Summary

There is no substitute for having a large research database such as MegaBase or BigBase at your disposal for pre-game preparation, opening research and general chess study. Because MegaBase comes with annotated games, weekly updates and the PlayerBase, it is the premier database product on the market today. Serious opening analysts and correspondence players should absolutely consider supplementing BigBase or MegaBase with CorrBase.

Not everyone can afford MegaBase. For those on a budget, BigBase is an adequate stand-in for MegaBase. For those less interested in historical games and more in recent examples, Mark Crowther’s complete The Week in Chess database is perhaps a more worthy and cost-effective replacement.

Downloading the free weekly updates of TWIC and maintaining a stand-alone TWIC database should be part of every ambitious player’s weekly schedule, even if you own MegaBase and use the update subscription service. Games appear at different times in the TWIC and MegaBase updates, so if you’re doing pre-game scouting on an opponent, you should have a look at both sources.

The Paramount Chess Database has a different role to play in your research portfolio. Paramount is a wonderful historical document, a font of opening ideas to be mined and a tremendous source of well-annotated games by the best players of the past half-century. It is a superb complement to your reference database of choice, but it does not replace the need for one.