A project I have been meaning to do for ages is to extract the text of all the games I own in digital form, so that they become searchable. Then, of course, I can do lots of fun NLP things with them.
This is a terrible job because of the planet's love for that presentation format, the PDF. Some PDFs are scans, some are a hacky combination, and some are a fourth-generation format transfer. Lots of those will not extract very well, so I will have to do some sort of classification first.
For example, the 1st edition AD&D DMG extracted fine on the first pass but the Player's Handbook did not. So there is that sort of problem, then the OCR problem, and others.
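As a rough idea of what that classification could look like, here is a minimal sketch that assumes the text of each PDF page has already been pulled out by some extraction tool (not shown). It labels a page "empty" (probably a scan, so a candidate for OCR), "garbled" (a text layer exists but reads as junk), or "ok", and then rolls those labels up per document. All thresholds and function names are hypothetical and chosen for illustration, not tuned on real rulebooks.

```python
import re
from collections import Counter

# Hypothetical thresholds for illustration only; not tuned on real rulebooks.
MIN_CHARS = 50            # fewer characters than this suggests an image-only (scanned) page
MIN_WORDLIKE_RATIO = 0.6  # share of tokens that must look like ordinary words or numbers
DOC_THRESHOLD = 0.9       # share of pages needed to call a whole PDF "text" or "scan"

# A token "looks like a word" if it is alphabetic (optionally with an 's-style suffix) or a number.
WORDLIKE_RE = re.compile(r"^(?:[A-Za-z][a-z]*(?:'[a-z]+)?|\d+)$")


def classify_page(text: str) -> str:
    """Label one page of already-extracted text as 'empty', 'garbled' or 'ok'."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return "empty"  # probably a scan with no text layer: a candidate for OCR
    # Strip surrounding punctuation so "dragon," still counts as a word.
    tokens = [t.strip(".,;:!?()[]\"'") for t in stripped.split()]
    wordlike = sum(1 for t in tokens if WORDLIKE_RE.match(t))
    if wordlike / len(tokens) < MIN_WORDLIKE_RATIO:
        return "garbled"  # a text layer exists but reads as junk (bad encoding, poor OCR)
    return "ok"


def classify_document(pages: list[str]) -> str:
    """Roll page labels up into 'text', 'scan' or 'mixed' for the whole PDF."""
    if not pages:
        return "scan"  # nothing extractable at all
    counts = Counter(classify_page(p) for p in pages)
    if counts["ok"] / len(pages) >= DOC_THRESHOLD:
        return "text"
    if counts["empty"] / len(pages) >= DOC_THRESHOLD:
        return "scan"
    return "mixed"
```

Stat blocks and tables full of symbols may well get flagged as "garbled" by a heuristic this crude, so in practice the "mixed" pile would still need a human look.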
So an interesting place to start, coming at it from the other direction, will be games that have actual text versions: HTML (e.g. EPUB and websites), Mobipocket, or plain text files, the latter often because of their age, like FUDGE and others.
Some that spring to mind: Sine Nomine's Stars Without Number et al., Eclipse Phase, and Dungeon World.
There are also SRDs (System Reference Documents) for various games on the web, so those would be interesting too.
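Since EPUB files are zip archives of XHTML, and web SRDs are HTML pages, the Python standard library alone gets you surprisingly far. The sketch below strips tags with `html.parser` and reads the XHTML files out of an EPUB with `zipfile`. It is a simplification under stated assumptions: it reads chapters in file-name order rather than following the reading order declared in the EPUB's package (OPF) spine, and the tag lists are my own guesses at what matters. Mobipocket files would need converting first, for example with Calibre's `ebook-convert`.

```python
import zipfile
from html.parser import HTMLParser

# Tags that should start a new line, and tags whose contents we drop entirely (assumed lists).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"head", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text, inserting line breaks at block-level tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace within lines and drop blank lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def epub_to_text(path_or_file):
    """Concatenate the text of every (X)HTML file in an EPUB, in file-name order."""
    chapters = []
    with zipfile.ZipFile(path_or_file) as zf:
        names = sorted(
            n for n in zf.namelist() if n.lower().endswith((".xhtml", ".html", ".htm"))
        )
        for name in names:
            raw = zf.read(name).decode("utf-8", errors="replace")
            chapters.append(html_to_text(raw))
    return "\n\n".join(chapters)
```

The same `html_to_text` function works on a saved SRD web page, which keeps the web and EPUB sources going through one code path.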
On the NLP front, you could end up with a multi-game ability to answer questions like "what is the general advice for a GM doing X?"
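A very first step towards that could be plain keyword retrieval: chop each extracted book into passages tagged with the game they came from, then rank passages against a question. The sketch below is a small TF-IDF index with cosine scoring, written with only the standard library. It is a deliberately simple stand-in for real question answering, and the class name, the smoothed IDF formula, and the sample data are all assumptions for illustration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class AdviceIndex:
    """Tiny TF-IDF index over (game, passage) pairs (hypothetical helper)."""

    def __init__(self, passages):
        # passages: list of (game_name, passage_text) tuples
        self.passages = passages
        term_counts = [Counter(tokenize(text)) for _, text in passages]
        n = len(passages)
        df = Counter()
        for counts in term_counts:
            df.update(counts.keys())  # document frequency: one count per passage
        # Smoothed IDF so terms that appear in every passage still get a small positive weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._weigh(c) for c in term_counts]

    def _weigh(self, counts):
        # Unknown terms get weight 0; vectors are L2-normalised so dot product = cosine.
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query, k=3):
        """Return up to k (score, game, passage) tuples, best first."""
        q = self._weigh(Counter(tokenize(query)))
        scored = []
        for (game, text), vec in zip(self.passages, self.vectors):
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, game, text))
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]
```

Usage would look like `AdviceIndex(passages).search("what should the GM do when the players stall")`, which returns the best-matching passages across every game in the index. Keyword overlap will miss advice phrased differently, which is where the more interesting NLP would come in.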