Generative AI and combinatorial optimization, part 1: Why image generators can’t set up chess boards properly.
Language and chess are two of the best-known domains for combinatorial reasoning, and they share the fundamental property of all combinatorial domains: their discrete building blocks are qualitatively distinct, which means there is no in-between. A knight is not “almost” a bishop, and the letter d is not “almost” a b.
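The “no in-between” point can be made concrete with a minimal sketch (illustrative only, not from the article): continuous values interpolate cleanly, while discrete symbols do not.

```python
# Continuous: the midpoint of two grey values is another valid grey value.
light, dark = 0.8, 0.2
mid = (light + dark) / 2  # 0.5 -- a perfectly valid pixel intensity

# Discrete: there is no piece "halfway" between a knight and a bishop.
PIECES = {"K", "Q", "R", "B", "N", "P"}

def piece_midpoint(a: str, b: str):
    # Naive "interpolation" of symbols via their character codes.
    blend = chr((ord(a) + ord(b)) // 2)
    return blend if blend in PIECES else None

print(mid)                      # 0.5
print(piece_midpoint("B", "N"))  # None -- the blend is not a legal piece
```

Any blending scheme runs into the same wall: between two distinct discrete symbols there is simply nothing valid to land on.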
They also share the other fundamental property of combinatorics: arrangement matters! Switch two letters in a word, or two words in a sentence, and the meaning can change entirely. Switch two moves in a chess game, and a whole new game emerges.
That’s why text-to-image generators struggle with chessboards in a way they don’t struggle with foxes and badgers: they try to interpolate from the learned images of chess games. So they create impossible positions, with multiple kings or even “in-between” pieces.
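The kind of impossibility described here is easy to state as a hard rule, which is exactly what interpolation cannot respect. A minimal sketch (assumed helper names, not from the article) that flags positions with the wrong number of kings, using standard FEN notation:

```python
def count_piece(fen: str, symbol: str) -> int:
    # The first space-separated field of a FEN string encodes piece placement;
    # uppercase letters are White's pieces, lowercase are Black's.
    board = fen.split()[0]
    return board.count(symbol)

def is_plausible(fen: str) -> bool:
    # A legal chess position has exactly one king per side.
    return count_piece(fen, "K") == 1 and count_piece(fen, "k") == 1

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
two_white_kings = "rnbqkbnr/pppppppp/8/8/4K3/8/PPPPPPPP/RNBQKBNR w - - 0 1"

print(is_plausible(start))            # True
print(is_plausible(two_white_kings))  # False
```

A symbolic rule like this is binary: a position either satisfies it or it doesn’t, with no partial credit — which is why averaging over training images so easily lands outside the space of legal boards.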
But it’s noteworthy that they have no problem with the combination “a fox and a badger play chess”, even though there are likely no such pictures in the training data.