RPG Word Embeddings – What is Most Similar? – 01

I made a GloVe embedding model based on my game book collection: 7000-odd books, of which 6000 or so made it through a first-pass PDF extraction pipeline.

This framework is quite good and parallelises, which is important for big books:

https://github.com/NRCan/geoscience_language_models/tree/main/project_tools

The C version of GloVe is superior:

https://github.com/stanfordnlp/GloVe

With some work you can get a Python version going, but I wouldn’t recommend it for large numbers.

e.g. https://pypi.org/project/glove-py and associated hacks.

The Notebook associated with this is here: https://github.com/bluetyson/RPG-gloVe-Model

These days Microsoft probably won’t let you see something that big online, so I will make a series of posts with excerpts.
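For anyone who wants to try a "what is most similar?" query on their own vectors, here is a minimal sketch. It is not taken from the notebook. It assumes the plain-text format the Stanford C tools write, one word per line followed by its space-separated values, and ranks by cosine similarity. The function names and the toy vectors are made up for illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines: a word, then space-separated floats."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    vectors: Dict[str, List[float]], word: str, topn: int = 5
) -> List[Tuple[str, float]]:
    """Return the topn words closest to `word`, best match first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy, made-up 3-dimensional vectors standing in for a real vectors file.
toy_lines = [
    "sword 0.9 0.1 0.0",
    "axe 0.8 0.2 0.1",
    "wizard 0.1 0.9 0.3",
    "spell 0.0 0.8 0.4",
]
vectors = parse_glove_lines(toy_lines)
print(most_similar(vectors, "sword", topn=2))
```

To run it on real output, pass an open file handle to `parse_glove_lines` instead of the toy list. A pure Python scan like this will be slow on a big vocabulary, so a numpy matrix would be the likely next step.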
