Zamia Brain
The Zamia Brain project provides infrastructure for building natural language processing systems based on transformer networks (see https://arxiv.org/abs/1706.03762).
This project is still highly experimental; everything is subject to change without prior notice. The current approach is to generate training corpora both for pre-training and for (multi-)domain refinement. The goal is to train networks that are very robust in their natural language processing capabilities (pre-training), i.e. that avoid the brittleness of traditional rule-based systems, while still allowing a certain amount of control over their behavior (refinement).
For this, you will find these components:
- scripts to generate pre-training corpora, typically using web scraping techniques, as well as scripts that adapt scientific corpora for training: https://github.com/gooofy/zbrain
- scripts that generate corpora from patterns (“skills”) for refinement (see the sketch after this list): https://github.com/gooofy/zbrain
- a GPT-2 implementation along with tokenization, training and inference tools: https://github.com/gooofy/transformer-lm
- a Transformer-XL implementation along with tokenization, training and inference tools: https://github.com/gooofy/transformer-xl
- pre-trained models: https://goofy.zamia.org/zamia-speech/brain/
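To illustrate the pattern-based (“skill”) corpus generation for refinement, here is a minimal sketch of how slot-filled templates could be expanded into prompt/response training text. The skill format, slot names and the `expand_skill` helper are hypothetical and not taken from the zbrain code; they only show the general template-expansion idea.

```python
import itertools

# Hypothetical skill definition: prompt patterns with slots plus a response
# template. The real zbrain skill format may differ; this is only a sketch.
WEATHER_SKILL = {
    "patterns": ["how is the weather in {city}",
                 "what is the weather like in {city}"],
    "response": "the weather in {city} is {condition}",
    "slots": {
        "city":      ["berlin", "hamburg", "munich"],
        "condition": ["sunny", "rainy", "cloudy"],
    },
}

def expand_skill(skill):
    """Expand every pattern/slot combination into a (prompt, response) pair."""
    slot_names  = sorted(skill["slots"])
    slot_values = [skill["slots"][name] for name in slot_names]
    for pattern in skill["patterns"]:
        for combo in itertools.product(*slot_values):
            fill = dict(zip(slot_names, combo))
            # str.format ignores slots a given template does not use
            yield pattern.format(**fill), skill["response"].format(**fill)

if __name__ == "__main__":
    # write one prompt/response pair per block, ready for refinement training
    with open("weather_refinement.txt", "w", encoding="utf-8") as f:
        for prompt, response in expand_skill(WEATHER_SKILL):
            f.write(f"{prompt}\n{response}\n\n")
```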
Available Models
Downloads here:
https://goofy.zamia.org/zamia-speech/brain/
| Model | Size | Language | Training corpus | Vocabulary |
|---|---|---|---|---|
| gpt2-german-345M-r20191119 | 345M | German | 10 epochs on 27 GB twitter+wikipedia+heise+parole | 50k sentencepiece |
| transformerXL-german-163M-r20190928 | 163M | German | 1 epoch on 27 GB twitter+wikipedia+heise+parole | 50k sentencepiece |
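Both models use a 50k SentencePiece vocabulary. As a hedged illustration of how input text would be tokenized with the sentencepiece Python package (the `.model` filename below is an assumption; check the downloaded archive for the actual name):

```python
import sentencepiece as spm

# Load the 50k SentencePiece model shipped with the download
# (the path/filename here is an assumption, not taken from the release).
sp = spm.SentencePieceProcessor()
sp.load("gpt2-german-345M-r20191119/sp.model")

text = "Wie wird das Wetter morgen in Berlin?"

# Encode to subword pieces and to integer ids (the form the transformer consumes).
pieces = sp.encode_as_pieces(text)
ids    = sp.encode_as_ids(text)
print(pieces)
print(ids)

# Decoding the ids reproduces the original text.
print(sp.decode_ids(ids))
```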
Model Dataflow
Credits
Massive thanks to Konstantin Lopuhin (https://github.com/lopuhin) for great code and support!