Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

This code repository contains the code for the experiments seen in the paper Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks (2020).

Requirements

This repository contains mainly Python3 routines and dependencies listed in requirements.txt. To install the dependencies using pip/venv, run:

pip3 install -r requirements.txt

Setup

After installing the requirements, run setup.sh to configure the environment and download the pre-trained word embeddings.

sh setup.sh

This will create the folders to store the results, and will download pre-trained vectors. The size of the download is approximately 500MB.

Alternatively, the pre-trained embeddings can be downloaded here.

Results

British English 🇬🇧 vs. American English 🇺🇸

Results for the classification task on detecting semantic shift between British English and American English.

** Requires the pre-trained word embeddings from BNC and COCA **

To reproduce these results, run:

chmod +x ukus_experiment.sh
./ukus_experiment.sh

By default, results are saved to results/ukus/cls_results.txt.

Method	Alignment	Accuracy	Precision	Recall	F1
COS	global	0.35	0.71	0.19	0.3
S4-D	global	0.45 +- 0.02	0.45 +- 0.02	0.45 +- 0.02	0.45 +- 0.03
Noisy-Pairs	-	0.29	1.0	0.03	0.06

SemEval-2020 Task on Unsupervised Lexical Semantic Change Detection

Results for the binary classification task on semantic shift for multiple languages (SemEval2020 Task 1): English, German, Latin, and Swedish.

** Requires the pre-trained embeddings from SemEval **

To reproduce these results:

chmod +x semeval_experiment.sh
./semeval_experiment.sh

By default, results are saved to results/semeval/cls_results.txt.

Method	Language	Mean acc.	Max acc.
s4	english	0.62	0.7
noise-aware	english	0.61	0.65
top-10	english	0.59	0.68
bot-10	english	0.58	0.68
global	english	0.61	0.68
top-5	english	0.59	0.65
bot-5	english	0.57	0.68

ArXiv Semantic Shift Discovery

Word discovery experiment on the arXiv data set for subjects Artificial Intelligence (cs.AI) and Classical Physics (physics.class-ph). This table shows the list of top semantically shifted words uniquely discovered by Global, Noise-Aware and S4-A alignments, respectively. As well as the most shifted words commonly discovered by all three methods.

** Requires the pre-trained embeddings from arXiv **

To reproduce these results:

chmod +x arxiv_experiment.sh
./arxiv_experiment.sh

The table of results is saved in results/arxiv/table.txt, the ranking correlation plot is saved in results/arxiv/arxiv_ranking.pdf.

Global	S4-A	Common
agent	components	concepts	nodes
approximation	element	density	phys
boundary	mass	deterministic	polynomial
conceptual	order	die	probability
knowledge	solution	edge	respect
plane	space	equations	rev
reference	state	fields	rough
rules	term	internal	rule
system	time	light	tensor
systems	vector	los	variables

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data/ukus		data/ukus
results		results
LICENSE		LICENSE
README.md		README.md
WordVectors.py		WordVectors.py
alignment.py		alignment.py
arxiv.py		arxiv.py
arxiv_bigram.py		arxiv_bigram.py
arxiv_bigram_experiment.sh		arxiv_bigram_experiment.sh
arxiv_experiment.sh		arxiv_experiment.sh
arxiv_table.py		arxiv_table.py
cord19.py		cord19.py
cord_experiment.sh		cord_experiment.sh
noise_aware.py		noise_aware.py
requirements.txt		requirements.txt
results.txt		results.txt
s4.py		s4.py
semeval.py		semeval.py
semeval_experiment.sh		semeval_experiment.sh
setup.sh		setup.sh
ukus.py		ukus.py
ukus_experiment.sh		ukus_experiment.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Requirements

Setup

Results

British English 🇬🇧 vs. American English 🇺🇸

SemEval-2020 Task on Unsupervised Lexical Semantic Change Detection

ArXiv Semantic Shift Discovery

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

IBM/S4_semantic_shift

Folders and files

Latest commit

History

Repository files navigation

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Requirements

Setup

Results

British English 🇬🇧 vs. American English 🇺🇸

SemEval-2020 Task on Unsupervised Lexical Semantic Change Detection

ArXiv Semantic Shift Discovery

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages