Wrapper for analyzing raw text and converting it to TEI XML

I set up a simple test page for a wrapper that converts raw text to TEI XML. This version uses only the Stanford CoreNLP tools to tokenize the input, detect sentence boundaries, annotate parts of speech, and lemmatize. Just paste a paragraph of text into it. The next version will add NLP tools for a few more languages, as well as further analysis components and tools for English.
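To give a rough idea of what such a wrapper produces, here is a minimal Python sketch of the wrapping step only. It does not call CoreNLP; it assumes the analysis is already available as (token, part of speech, lemma) tuples. The function name `sentences_to_tei` and the choice to put the annotations in `pos` and `lemma` attributes on `<w>` are illustrative assumptions, not necessarily what the test page emits.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def sentences_to_tei(sentences):
    """Wrap pre-analyzed sentences in a minimal TEI-style XML string.

    `sentences` is a list of sentences; each sentence is a list of
    (token, pos, lemma) tuples. The name and the input layout are
    hypothetical and chosen only for this illustration.
    """
    # Serialize the TEI namespace as the default namespace (no prefix).
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")
    paragraph = ET.SubElement(body, f"{{{TEI_NS}}}p")
    for number, sentence in enumerate(sentences, start=1):
        s = ET.SubElement(paragraph, f"{{{TEI_NS}}}s", {"n": str(number)})
        for token, pos, lemma in sentence:
            # Simplification: punctuation is wrapped in <w> like any token.
            w = ET.SubElement(s, f"{{{TEI_NS}}}w", {"pos": pos, "lemma": lemma})
            w.text = token
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    analyzed = [[("Dogs", "NNS", "dog"), ("bark", "VBP", "bark")]]
    print(sentences_to_tei(analyzed))
```

A complete TEI document would also need a `teiHeader`; the sketch leaves it out to keep the focus on the sentence and word markup.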

Read More...

Charty in JavaScript...

Ben Cool ported Charty (a CFG-based chart parser) to JavaScript for a class project and, in one version, added feature augmentation and unification. You can test it online. It runs without any server-side component in Safari on mobile devices such as the iPad or iPhone, and on Android in any browser with JavaScript support. See the documentation and test site here

Read More...

Intensive Python class for linguists (corpus linguistics, language data processing and manipulation, etc.)

I am offering an intensive class for the LING519 students, all the Linguist List people, and anyone else who might be interested, this Saturday, 19 November 2011, at 10 AM Eastern Time in Cooper, the LinguistList Suite. We plan to meet for four hours or more, depending on speed and interest. Let me know if you want to join us. I will already be sharing the screen and audio with Zadar, so we can include you if you cannot come in person. The topics covered might be:

Intro to Python 3
Using Komodo Edit 6.x
Processing corpora like the Brown corpus (raw text with slash-pos, or TEI XML), the Penn Treebank, the Croatian Language Corpus etc.
Generating statistical models and profiles: frequency profiles, N-gram models
Calculating significance, mutual information, relative entropy, …
Simple Finite State Machines
Simple Parsers
Generating outputs of analyses: CSV, HTML, XML, etc.
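As a taste of the corpus-processing and frequency-profile topics in the list above, here is a small Python 3 sketch. It assumes input lines in the slash-pos style mentioned for the Brown corpus (for example `The/at dog/nn`); the function names are made up for this illustration and are not taken from the class material.

```python
from collections import Counter


def parse_slash_pos(line):
    """Split 'The/at dog/nn' into [('The', 'at'), ('dog', 'nn')]."""
    pairs = []
    for item in line.split():
        # Split on the last slash so tokens containing '/' stay intact.
        token, separator, tag = item.rpartition("/")
        if not separator:
            # No slash at all: keep the item as an untagged token.
            token, tag = item, ""
        pairs.append((token, tag))
    return pairs


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(items):
    """Map each distinct item to its relative frequency in `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


if __name__ == "__main__":
    tagged = parse_slash_pos("The/at dog/nn saw/vbd the/at cat/nn")
    tokens = [token.lower() for token, _ in tagged]
    print(relative_frequencies(tokens))
    print(Counter(ngrams(tokens, 2)).most_common(3))
```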


DC
Read More...

Updated Python code and tools

The Charty parser code (an implementation of an Earley parser for context-free grammars) has been updated to Python 3.x. The update also includes a compact module, TextStat.py, with some useful functions for N-gram models, frequency profiles, vector space models, statistical analyses, and information-theoretic measures (entropy, mutual information, etc.). If you have comments or find a bug or error, let me know.
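For readers unfamiliar with the information-theoretic measures mentioned, the following is a minimal sketch of two of them: Shannon entropy and pointwise mutual information over bigrams (a per-pair relative of mutual information). The function names and signatures are assumptions made for illustration; they are not the actual TextStat.py interface.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pointwise_mutual_information(bigrams):
    """Return {(x, y): PMI in bits} for each distinct bigram.

    Marginal probabilities are estimated from the left and right
    positions of the bigram list itself.
    """
    bigrams = list(bigrams)
    total = len(bigrams)
    pair_counts = Counter(bigrams)
    left_counts = Counter(x for x, _ in bigrams)
    right_counts = Counter(y for _, y in bigrams)
    # PMI = log2( P(x, y) / (P(x) * P(y)) ), simplified to counts.
    return {
        (x, y): math.log2((count * total) / (left_counts[x] * right_counts[y]))
        for (x, y), count in pair_counts.items()
    }


if __name__ == "__main__":
    tokens = "the dog saw the cat and the dog ran".split()
    print(entropy(tokens))
    print(pointwise_mutual_information(zip(tokens, tokens[1:])))
```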
Read More...

Setting up Aquamacs for XLE and XFST

Here is a short introduction to my working environment for grammar and morphology development with Aquamacs, XLE, and XFST, plus some scripting with Python and Bash in the OS X Terminal.app...
Read More...