Linguistics
Table of contents: <wiki:toc max_depth="2" />
- Stanford links
- Software resources http://www-a2k.is.tokushima-u.ac.jp/member/kita/NLP/nlp_tools.html
Suggested and used tools for linguistic analysis. We need to obtain the sentence structure and the base form of every word. The process consists of four parts, each described below.
## Tokenizer
A basic tool for splitting strings. A tokenizer separates numbers from letters, splits text into words, and handles punctuation.
| Input | Natural language sentence; negation detection |
| Output | Array of tokens (e.g. words, numbers) |
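The input/output contract above can be sketched with a single regular expression. This is a minimal illustration only, not the behaviour of any tool listed on this page; the names `TOKEN_PATTERN` and `tokenize` are hypothetical, and the pattern assumes plain ASCII English text.

```python
import re

# Hypothetical pattern: a number (optionally with a decimal part),
# a word (optionally with an apostrophe suffix such as "n't"),
# or any single character that is neither whitespace, letter nor digit.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z\d]")


def tokenize(sentence):
    """Split a sentence into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("It costs 25USD, doesn't it?"))
# ['It', 'costs', '25', 'USD', ',', "doesn't", 'it', '?']
```

Note how `25USD` is split into a number and a word, and how punctuation becomes separate tokens. Real tokenizers handle many more cases (abbreviations, URLs, hyphenation, non-ASCII text).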
- The EGYPT toolkit - Perl tokenizer
Part of a statistical machine translation toolkit. http://www.clsp.jhu.edu/ws99/projects/mt/
- LingPipe - tokenizer (added in version 4.0.1)
Good-looking Java toolkit for processing text using computational linguistics.
- Simple Java tokenizer for regular expressions
http://introcs.cs.princeton.edu/72regular/Tokenizer.java.html
## Part-of-speech tagger
Usually a tool dedicated to part-of-speech (POS) tagging. It identifies the basic linguistic category of each word.
| Input | Array of tokens |
| Output | Array of POS tagged tokens (e.g. adjectives, verbs) |
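To make the input and output shapes concrete, here is a toy lookup tagger. It is only a sketch: the taggers listed on this page are statistical (maximum entropy, log-linear) and use context, which this example does not. The lexicon `TOY_LEXICON` and the function `tag` are hypothetical; the tag names follow the Penn Treebank tag set.

```python
# Hypothetical miniature lexicon mapping lowercase words to Penn Treebank tags.
TOY_LEXICON = {
    "the": "DT",
    "a": "DT",
    "is": "VBZ",
    "not": "RB",
    "good": "JJ",
    "parser": "NN",
}

# Penn Treebank tags sentence-final punctuation as "." and commas as ",".
PUNCTUATION_TAGS = {".": ".", "!": ".", "?": ".", ",": ","}


def tag(tokens):
    """Attach a POS tag to every token; unknown words default to NN."""
    tagged = []
    for token in tokens:
        if token in PUNCTUATION_TAGS:
            pos = PUNCTUATION_TAGS[token]
        elif token.replace(".", "", 1).isdigit():
            pos = "CD"  # cardinal number
        else:
            pos = TOY_LEXICON.get(token.lower(), "NN")
        tagged.append((token, pos))
    return tagged


print(tag(["The", "parser", "is", "not", "good", "."]))
# [('The', 'DT'), ('parser', 'NN'), ('is', 'VBZ'), ('not', 'RB'), ('good', 'JJ'), ('.', '.')]
```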
- A Maximum Entropy Model for Part-Of-Speech Tagging
Java implementation of this tagger
- Another wrapper - http://godel.stanford.edu/public/doc-versions/util/doc/api/csli/util/nlp/postag/MXPOST.html as a part of basic tools (/util) at the Center for the Study of Language and Information at Stanford University
- Penn Treebank Tags - explanation of all tags http://bulba.sdsu.edu/jeanette/thesis/PennTags.html#RB
- Stanford Log-linear Part-Of-Speech Tagger
Java implementation of the log-linear part-of-speech taggers
## Parser
Typically a statistical parser. This tool discovers the structure of a sentence, usually represented as a syntactic tree, as part of syntactic analysis.
| Input | Array of POS tagged tokens |
| Output | Parse trees of each sentence |
- M. Collins parsing model http://people.csail.mit.edu/mcollins/code.html
- Multilingual Statistical Parsing Engine in Java - based on Michael Collins's parser
- MaltParser modern data-driven dependency parser in Java http://maltparser.org/
- RASP: Robust Accurate Statistical Parsing toolkit - bigger parser
## Morphological analyser
Analysis and description of the structure of morphemes. For our purposes it is sufficient to obtain lemmas (base forms), e.g. have for had. Lemmatizers find only lemmas, whereas stemmers find stems (e.g. bug for debugging).
| Input | Parse trees of each sentence |
| Output | Lemma for each word |
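The difference between a lemmatizer and a stemmer can be illustrated with a toy example. This is a sketch under strong assumptions, not how the tools listed here work: `IRREGULAR_LEMMAS` is a hypothetical hand-written lookup table, and `stem` is a crude suffix stripper rather than an established stemming algorithm. For simplicity it takes single words instead of parse trees.

```python
# Hypothetical lookup table of irregular forms; real lemmatizers use
# full morphological lexicons or trained models.
IRREGULAR_LEMMAS = {"had": "have", "has": "have", "was": "be", "were": "be", "went": "go"}


def lemmatize(word):
    """Return the dictionary base form if known, else the lowercased word."""
    lower = word.lower()
    return IRREGULAR_LEMMAS.get(lower, lower)


def stem(word):
    """Strip one common suffix; the result need not be a real word."""
    lower = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


print(lemmatize("had"))   # have
print(stem("parsing"))    # pars
print(stem("tokens"))     # token
```

The lemmatizer returns a valid base form (have), while the stemmer may return a truncated string (pars) that is useful for matching but is not a word.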
- Morphological tool-set Developed by G. Minnen at the University of Sussex http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/morph.html
- mate-tools (anna) Tool set including a lemmatizer with models
- Lttoolbox-java Part of the Apertium project (translator) - Java implementation http://wiki.apertium.org/wiki/Lttoolbox-java
- Core NLP Stanford suite of Core NLP Tools (almost all operations)
- XTAG Tool for operations with Tree Adjoining Grammars http://www.cis.upenn.edu/~xtag/
| Keywords: | negative-positive conversion |
- Identification of a negation in a sentence
- Dealing with negation
- Antonym links in corpora
- WordNet
- Testing WordNet online - http://wordnetweb.princeton.edu/perl/webwn
- MIT Java Wordnet Interface (JWI)
## Tools & Algorithms
- Sentiment Analysis with Python NLTK Text Classification - online tool - http://text-processing.com/demo/sentiment/
- Negation Detection Processes - basic algorithm - http://blog.typeslashcode.com/voxpop/2010/02/negation-detection-processes/
- Ying He, Mehmet Kayaalp: A Comparison of 13 Tokenizers on MEDLINE
- Adwait Ratnaparkhi: A Maximum Entropy Model for Part-Of-Speech Tagging http://www.ldc.upenn.edu/acl/W/W96/W96-0213.pdf
- Sanda Harabagiu, Andrew Hickl and Finley Lacatusu: Negation, Contrast and Contradiction in Text Processing
- Sergey Goryachev, Margarita Sordo, Qing T. Zeng, Long Ngo: Implementation and Evaluation of Four Different Methods of Negation Detection https://www.i2b2.org/software/projects/hitex/negation.pdf
- Katsura, Y., Matsumoto, K., Ren, F.: Flexible English writing support based on negative-positive conversion method
- Pradeep G. Mutalik, MD, Aniruddha Deshpande, MD, and Prakash M. Nadkarni, MD: Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents http://www.ncbi.nlm.nih.gov/pmc/articles/PMC130070/
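Identifying a negation in a sentence, as listed above, can be approximated with a cue-word lookup and a fixed scope window. The sketch below is a deliberately simplified heuristic and does not reproduce any of the algorithms from the papers or tools listed on this page. The cue list `NEGATION_CUES`, the window size and both function names are assumptions made for illustration.

```python
# Hypothetical list of negation cue words.
NEGATION_CUES = {"not", "no", "never", "none", "nobody", "nothing", "neither", "nor", "without"}
SCOPE_BREAKERS = {".", ",", ";", "!", "?"}


def find_negations(tokens):
    """Return the indices of tokens that act as negation cues."""
    return [
        i
        for i, token in enumerate(tokens)
        if token.lower() in NEGATION_CUES or token.lower().endswith("n't")
    ]


def negated_scope(tokens, window=3):
    """Mark up to `window` tokens after each cue as negated, stopping at punctuation."""
    negated = [False] * len(tokens)
    for cue in find_negations(tokens):
        for j in range(cue + 1, min(cue + 1 + window, len(tokens))):
            if tokens[j] in SCOPE_BREAKERS:
                break
            negated[j] = True
    return negated


tokens = ["The", "parser", "does", "not", "handle", "numbers", ",", "sadly", "."]
print(find_negations(tokens))  # [3]
print(negated_scope(tokens))
# [False, False, False, False, True, True, False, False, False]
```

A fixed window is easy to implement but misses long-range scope and double negation; using the parse tree to determine scope would be more accurate.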