Linguistics

Linguistics explanation, tools and algorithms.


General resources

Stanford links
Software resources - http://www-a2k.is.tokushima-u.ac.jp/member/kita/NLP/nlp_tools.html

Linguistics tools

Suggested and used tools for linguistic analysis. We need to obtain the sentence structure and the base form of every word. The process has four parts, each described below.

Tokenizer

A basic tool for string splitting. The tokenizer separates numbers from letters, splits the text into words, and handles punctuation.

Input	Natural language sentence; negation detection
Output	Array of tokens (eg. words, numbers)
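As a rough illustration of the input and output described above, here is a minimal regex-based sketch in Python. It is not taken from any of the tools listed below; the pattern and the function name `tokenize` are assumptions made for this example, and it only handles ASCII letters.

```python
import re

# One alternative per token class: a run of letters (optionally with one
# internal apostrophe, as in "isn't"), a run of digits, or any single
# character that is neither whitespace, an ASCII letter, nor a digit.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\sA-Za-z\d]")


def tokenize(sentence):
    """Split a sentence into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("Room42 isn't free, sorry."))
```

Numbers are split from adjacent letters ("Room42" becomes "Room" and "42"), and each punctuation mark becomes its own token.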

Tools

The EGYPT toolkit - Perl tokenizer

Part of a statistical machine translation toolkit. http://www.clsp.jhu.edu/ws99/projects/mt/

LingPipe - tokenizer (added in version 4.0.1)

Good-looking Java toolkit for processing text using computational linguistics.

Simple Java tokenizer for regular expressions

http://introcs.cs.princeton.edu/72regular/Tokenizer.java.html

Tagger

Usually a tool for part-of-speech (POS) tagging only. It identifies the basic linguistic category of each word.

Input	Array of tokens
Output	Array of POS tagged tokens (eg. adjectives, verbs)
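To make the input and output shape concrete, here is a minimal lookup-based sketch in Python. It is a simple baseline, not the maximum entropy or log-linear models used by the tools listed below. The tiny `LEXICON` and the fallback tag are hypothetical; the tag names follow the Penn Treebank tag set.

```python
# Hypothetical mini-lexicon: word -> Penn Treebank tag. A real tagger
# learns this (and context-dependent decisions) from an annotated corpus.
LEXICON = {
    "the": "DT", "a": "DT",
    "dog": "NN", "cat": "NN",
    "barks": "VBZ", "sees": "VBZ",
    "big": "JJ", "not": "RB",
}


def tag(tokens, default="NN"):
    """Return (token, tag) pairs.

    Digit-only tokens get CD (cardinal number); words missing from the
    lexicon get the `default` tag.
    """
    tagged = []
    for token in tokens:
        if token.isdigit():
            tagged.append((token, "CD"))
        else:
            tagged.append((token, LEXICON.get(token.lower(), default)))
    return tagged


print(tag(["The", "big", "dog", "barks"]))
```

A lookup like this ignores context, so it cannot distinguish, for example, a noun from a verb with the same spelling; the statistical taggers below exist to solve exactly that.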

Tools

A Maximum Entropy Model for Part-Of-Speech Tagging

Java implementation of this tagger
Another wrapper - http://godel.stanford.edu/public/doc-versions/util/doc/api/csli/util/nlp/postag/MXPOST.html - part of the basic tools (/util) at the Center for the Study of Language and Information at Stanford University
Penn Treebank Tags - explanation of all tags - http://bulba.sdsu.edu/jeanette/thesis/PennTags.html#RB

Stanford Log-linear Part-Of-Speech Tagger

Java implementation of the log-linear part-of-speech taggers

Parser

Mainly a statistical parser. This tool discovers the sentence structure, usually represented as a syntactic tree. It is part of syntactic analysis.

Input	Array of POS tagged tokens
Output	Parse trees of each sentence
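Parse trees are commonly written in Penn Treebank bracketed notation. The sketch below does not parse sentences; it only shows one possible way to load such a bracketed tree into a nested Python structure so later steps can walk it. The function names `read_tree` and `leaves` and the tuple layout are assumptions for this example, and malformed input is not handled gracefully.

```python
import re


def read_tree(text):
    """Read one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings; every other node is (label, [children]).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def read_node():
        nonlocal position
        if tokens[position] != "(":
            raise ValueError("expected '('")
        position += 1                      # consume "("
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())
            else:
                children.append(tokens[position])
                position += 1
        position += 1                      # consume ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


tree = read_tree("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
print(tree[0])       # label of the root node
print(leaves(tree))  # words in sentence order
```

The nested-tuple layout keeps the POS tags as the parents of the words, so the tagger output remains recoverable from the tree.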

Tools

M. Collins parsing model http://people.csail.mit.edu/mcollins/code.html
Multilingual Statistical Parsing Engine in Java - based on Michael Collins's parser
MaltParser - modern data-driven dependency parser in Java - http://maltparser.org/
RASP: Robust Accurate Statistical Parsing toolkit - bigger parser

Morphological analyser

Analysis and description of the structure of morphemes. For our purpose, obtaining lemmas (base forms) is sufficient, e.g. have for had. Lemmatizers find only lemmas; stemmers find stems (e.g. debug for debugging).

Input	Parse trees of each sentence
Output	Lemma for each word
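The difference between a lemmatizer and a stemmer can be sketched in a few lines of Python. This is a toy illustration, not the approach of any tool listed below: the `IRREGULAR` table and the suffix list are hypothetical and far too small for real use, and for simplicity the functions take plain words rather than parse trees.

```python
# Hypothetical exception table mapping irregular forms to their lemma.
IRREGULAR = {"had": "have", "has": "have", "went": "go", "mice": "mouse"}


def lemmatize(word):
    """Look up irregular forms; otherwise return the lowercased word."""
    lower = word.lower()
    return IRREGULAR.get(lower, lower)


def stem(word):
    """Strip one common suffix and undo a doubled final consonant."""
    lower = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            base = lower[: -len(suffix)]
            # "running" -> "runn" -> "run"; leave "ll" and "ss" alone.
            if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in "aeiouls":
                base = base[:-1]
            return base
    return lower


print(lemmatize("had"))   # dictionary lookup
print(stem("running"))    # suffix stripping
```

The lemmatizer needs linguistic knowledge (here a lookup table) and always returns a real word, while the stemmer applies blind string rules and leaves irregular forms such as "had" untouched.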

Tools

Morphological tool-set - developed by G. Minnen at the University of Sussex - http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/morph.html
mate-tools (anna) - tool set including a lemmatizer with models
Lttoolbox-java - part of the Apertium project (translator), Java implementation - http://wiki.apertium.org/wiki/Lttoolbox-java

General tools

Core NLP - Stanford suite of core NLP tools (almost all operations)
XTAG - tool for operations with Tree Adjoining Grammars - http://www.cis.upenn.edu/~xtag/

Sentence negation

Keywords:	negative-positive conversion

Identification of negation in a sentence
Dealing with negation

antonym links in a corpus
WordNet
Testing WordNet online - http://wordnetweb.princeton.edu/perl/webwn
MIT Java Wordnet Interface (JWI)

Tools & Algorithms

Sentiment Analysis with Python NLTK Text Classification - online tool - http://text-processing.com/demo/sentiment/
Negation Detection Processes - basic algorithm - http://blog.typeslashcode.com/voxpop/2010/02/negation-detection-processes/
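
For orientation, a very basic form of negation identification can be sketched as a cue-word search with a fixed scope window. This Python example is not the algorithm from any of the resources above; the cue list, the scope-ending tokens and the window size of three are all assumptions chosen for illustration.

```python
# Assumed cue words and scope terminators; real systems use larger lists.
NEGATION_CUES = {"not", "no", "never", "without", "cannot"}
SCOPE_END = {".", ",", ";", "!", "?", "but"}


def is_cue(token):
    """True for a listed cue word or a contraction ending in n't."""
    lower = token.lower()
    return lower in NEGATION_CUES or lower.endswith("n't")


def mark_negated(tokens, window=3):
    """Return (token, is_negated) pairs.

    A token counts as negated when it follows a cue within `window`
    tokens and no punctuation or "but" has closed the scope in between.
    """
    result = []
    remaining = 0  # number of upcoming tokens still inside the scope
    for token in tokens:
        if is_cue(token):
            result.append((token, False))
            remaining = window
        elif token.lower() in SCOPE_END:
            result.append((token, False))
            remaining = 0
        else:
            result.append((token, remaining > 0))
            remaining = max(0, remaining - 1)
    return result


sentence = ["I", "do", "not", "like", "rain", ",", "but", "I", "like", "snow"]
print([token for token, negated in mark_negated(sentence) if negated])
```

A fixed window is crude: it ignores sentence structure, so combining the cue search with the parse tree from the parser step would give a more reliable scope.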

Reference

Tools

Ying He, Mehmet Kayaalp: A Comparison of 13 Tokenizers on MEDLINE
Adwait Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging http://www.ldc.upenn.edu/acl/W/W96/W96-0213.pdf

Sentence negation

Sanda Harabagiu, Andrew Hickl and Finley Lacatusu: Negation, Contrast and Contradiction in Text Processing
Sergey Goryachev, Margarita Sordo, Qing T. Zeng, Long Ngo: Implementation and Evaluation of Four Different Methods of Negation Detection https://www.i2b2.org/software/projects/hitex/negation.pdf
Katsura, Y., Matsumoto, K., Ren, F.: Flexible English writing support based on negative-positive conversion method
Pradeep G. Mutalik, MD, Aniruddha Deshpande, MD, and Prakash M. Nadkarni, MD: Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents http://www.ncbi.nlm.nih.gov/pmc/articles/PMC130070/
