RPG Word Embeddings – What is Most Similar? – 01

I made a gloVe embedding model based on my game book collection: 7000-odd books, of which 6000 or so made it through a first-pass PDF extraction pipeline.

This framework is quite good, and it parallelises, which is important for big books: https://github.com/NRCan/geoscience_language_models/tree/main/project_tools

The C version of gloVe is superior:


With some work you can get a Python version going, but I wouldn’t recommend it for large numbers of books.

e.g. https://pypi.org/project/glove-py

and associated hacks.

The Notebook associated with this is here: https://github.com/bluetyson/RPG-gloVe-Model

These days Microsoft probably won’t let you view something that big online, so I will make a series of posts with excerpts.
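For anyone who wants to poke at the "what is most similar" question without opening the notebook, here is a minimal sketch of the lookup in plain Python. It assumes the vectors are in the text format the C version writes out (one word per line followed by its numbers, separated by spaces). The function names and the toy vectors are invented for illustration and are not taken from the notebook.

```python
import math


def load_vectors(lines):
    """Parse GloVe-style text lines: 'word v1 v2 ... vn'."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(vectors, word, topn=5):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional vectors, made up purely for illustration.
toy_lines = [
    "sword 0.9 0.1 0.0",
    "axe 0.8 0.2 0.1",
    "spell 0.1 0.9 0.2",
    "wizard 0.0 0.8 0.3",
]
toy_vectors = load_vectors(toy_lines)
print(most_similar(toy_vectors, "sword", topn=2))
```

With a real model you would pass an open file handle on the vectors file instead of the toy list. A linear scan like this is fine for the odd query, but a vocabulary from thousands of books would probably want numpy or a proper nearest-neighbour index.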

RPGs with text versions – finding them.

A project I have been meaning to do for ages is to extract the text of all my digital games so they are searchable. I can then do lots of fun NLP things with them, of course.

This is a terrible job because of the planet’s love for that presentation format, the PDF. Some files are scans, some are a hacked-together combination, and some are a fourth-generation format transfer. Lots of those will not extract very well, so I will have to do some sort of classification.

For example, the 1st edition AD&D DMG extracted fine on the first pass but the Player’s Handbook did not. There is that sort of problem, then the OCR problem, and others.
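As a rough idea of what that classification could look like, here is a small heuristic sketch in Python. It is not what the pipeline does. It assumes you already have the extracted text as a list of strings, one per page, and it sorts a book into one of three buckets based on how much text came out and how much of it is letters. The labels and both thresholds are guesses for illustration and would need tuning against real books.

```python
def classify_extraction(pages, min_chars_per_page=200, min_alpha_ratio=0.6):
    """Label a book's extracted text as 'usable', 'likely_scan' or 'garbled'.

    pages: list of strings, one per page of extracted text.
    Thresholds are illustrative guesses, not tuned values.
    """
    total_chars = sum(len(page) for page in pages)
    # Almost no text per page usually means image-only pages (a scan).
    if not pages or total_chars / len(pages) < min_chars_per_page:
        return "likely_scan"

    non_space = [c for c in "".join(pages) if not c.isspace()]
    if not non_space:
        return "likely_scan"

    # Plenty of characters but few letters suggests a broken text layer.
    alpha_ratio = sum(c.isalpha() for c in non_space) / len(non_space)
    if alpha_ratio < min_alpha_ratio:
        return "garbled"
    return "usable"


# Made-up sample pages, for illustration only.
good_book = ["The fighter rolls to hit. " * 20]
scanned_book = ["", " "]
broken_book = ["#@!%^ 12345 ~~ " * 30]

for name, pages in [("good", good_book), ("scan", scanned_book), ("broken", broken_book)]:
    print(name, classify_extraction(pages))
```

Books landing in "likely_scan" would be the candidates for OCR, and "garbled" ones probably need a different extractor or a look by hand.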

So an interesting place to start, going back the other way, will be games that have actual text versions, whether HTML (e.g. EPUB and websites), Mobipocket, or plain text files because of their age, like FUDGE and others.

Some that spring to mind – Sine Nomine – Stars Without Number et al., Eclipse Phase, Dungeon World.

There are also SRDs of various games on the web, so those would be interesting too.

On the NLP front, you could end up with a multi-game capability for answering ‘what is the general advice for a GM doing X’.
