Workshop

(start screen and audio recording)

Intro (Daniel, 5-10 minutes):

  • describe the basic idea: a rule system in which the errors to detect are described in XML
  • flowchart of LT
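
To go with the flowchart, a toy Java sketch of the same flow may help. It splits the text into sentences, tokenizes each sentence, assigns POS tags (including SENT_START/SENT_END), and then runs one rule over the tagged tokens. Several parts are assumptions made for illustration, and none of this is LanguageTool's API or tagger:

  • the class and method names
  • the five-word lexicon that stands in for a real POS tagger
  • the choice to model SENT_START/SENT_END as extra tags on the first and last token

The example rule imitates the idea behind the "Uppercase sentence start" rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the processing flow; all names are hypothetical, not LanguageTool's API.
class ToyPipeline {

    /** A token plus the tags assigned to it by the toy tagger. */
    static final class Token {
        final String text;
        final List<String> tags = new ArrayList<>();
        Token(String text) { this.text = text; }
    }

    // Hypothetical mini-lexicon standing in for a real POS tagger dictionary.
    static final Map<String, String> LEXICON =
            Map.of("it", "PRP", "is", "VBZ", "a", "DT", "the", "DT", "cat", "NN");

    /** Naive sentence splitter: breaks at whitespace that follows '.', '!' or '?'. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    /** Tokenizes on whitespace/punctuation, looks up a POS tag, marks sentence start and end. */
    static List<Token> tag(String sentence) {
        List<Token> tokens = new ArrayList<>();
        for (String word : sentence.split("[\\s.,!?]+")) {
            if (word.isEmpty()) continue;
            Token t = new Token(word);
            t.tags.add(LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), "UNKNOWN"));
            tokens.add(t);
        }
        if (!tokens.isEmpty()) {
            tokens.get(0).tags.add("SENT_START");
            tokens.get(tokens.size() - 1).tags.add("SENT_END");
        }
        return tokens;
    }

    /** Example rule: report sentence-initial tokens that start with a lowercase letter. */
    static List<String> checkUppercaseStart(String text) {
        List<String> matches = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            for (Token t : tag(sentence)) {
                if (t.tags.contains("SENT_START") && Character.isLowerCase(t.text.charAt(0))) {
                    matches.add(t.text);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(checkUppercaseStart("It is a cat. the cat is here.")); // prints [the]
    }
}
```

The rule only looks at tags. It does not inspect positions directly, which is the same reason SENT_START can be used in any language's rules.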

Workshop:

  • 1. block (Daniel, 15 minutes)
    • download the current OXT
    • install it in OOo and try it on English text
    • try the GUI: java -jar LanguageToolGUI.jar
    • mention the command-line version and the HTTP server
  • 2. block (Marcin, 40 minutes)
    • show that the English rules are in rules/en/grammar.xml
    • explain that using an XML editor is a good idea
    • explain the syntax of a simple rule, e.g. one of the English redundant-phrase rules (simple but useful)
    • create a rule to match "foo bar"
    • restart LT and try it (GUI version, not OOo)
    • run a test using testrules.sh/testrules.bat
    • mention that it's important to use examples taken from real texts
    • explain a more complex rule
      • what is a POS tagger
        • mention SENT_START / SENT_END tags available to all languages
      • it_is: a couple of cases, one using POS tags and a couple of others (as a rulegroup)
      • "both … and" in English, as it requires skipping tokens
      • a rule for repeated phrases (can be French or Dutch)
    • if time permits:
      • explain complex suggestions and testing corrections
      • what is a synthesizer
  • 3. block (Marcin, 15 minutes)
    • show how we add a new language by adapting the Java code
      • cvs -d:pserver:anonymous@languagetool.cvs.sourceforge.net:/cvsroot/languagetool login [type return when asked for password]
      • cvs -z3 -d:pserver:anonymous@languagetool.cvs.sourceforge.net:/cvsroot/languagetool co -P JLanguageTool
      • cd JLanguageTool
      • ant
      • if everything works, a file called "dist/LanguageTool-1.0.0-dev.oxt" is generated; install it in OOo (or unzip it to use the command-line version of LanguageTool)
    • see http://www.languagetool.org/development/#newlanguage
    • explain that people can usually get started by writing the rules for their new language in the grammar.xml of an existing language
  • 4. block (Daniel, 10 minutes)
    • explain the community.languagetool.org website
      • how to use it to find false alarms in Wikipedia
      • how to find content for rules (Wikipedia typos page, community feedback…)
      • … TODO….
  • 5. block: if time permits and if attendees are interested in Java development (Daniel, 30 minutes):
    • how to set up LanguageTool in Eclipse
      • check out from CVS (see above)
      • File -> Import…
      • General, Existing Projects into Workspace
      • select the "JLanguageTool" directory just checked out from CVS
      • you can ignore compile errors in the "dev" package (or add the dependency, or delete the affected *.java files), but you need to fix all other compile errors
    • structure of a rule written in Java
    • write your own rule
    • explain that sentence segmentation / disambiguation is also rule-driven
      • and that it influences the default "Uppercase sentence start" rule (Marcin)
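
The "foo bar" rule and the "both … and" rule from the second block can be illustrated with a toy Java model of token-sequence matching. It shows what "skipping" means: each pattern element may allow up to N tokens between itself and the next element. Several parts are hypothetical and are not LanguageTool's XML syntax or Java API:

  • the class names
  • the maxSkip field
  • the passesExamples check, which only imitates the idea behind testrules.sh/testrules.bat (the incorrect example should trigger the rule, the correct one should not)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy token-pattern rule with optional skipping; hypothetical, not LanguageTool's API.
class ToyPatternRule {

    /** One pattern element: a lowercase literal token and how many tokens may be skipped after it. */
    static final class Element {
        final String token;
        final int maxSkip; // 0 means the next element must follow immediately
        Element(String token, int maxSkip) {
            this.token = token;
            this.maxSkip = maxSkip;
        }
    }

    private final List<Element> pattern;

    ToyPatternRule(Element... elements) {
        this.pattern = List.of(elements);
    }

    /** Lowercases and splits on whitespace/punctuation, dropping empty tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s.,!?]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Returns every token index at which the whole pattern matches. */
    List<Integer> match(String text) {
        List<String> tokens = tokenize(text);
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (matchesFrom(tokens, start, 0)) starts.add(start);
        }
        return starts;
    }

    /** Matches element el at pos, then tries each allowed skip before the next element. */
    private boolean matchesFrom(List<String> tokens, int pos, int el) {
        if (el == pattern.size()) return true;
        if (pos >= tokens.size() || !tokens.get(pos).equals(pattern.get(el).token)) return false;
        for (int skip = 0; skip <= pattern.get(el).maxSkip; skip++) {
            if (matchesFrom(tokens, pos + 1 + skip, el + 1)) return true;
        }
        return false;
    }

    /** Rule-test idea: the incorrect example must trigger the rule, the correct one must not. */
    boolean passesExamples(String incorrect, String correct) {
        return !match(incorrect).isEmpty() && match(correct).isEmpty();
    }

    public static void main(String[] args) {
        ToyPatternRule fooBar = new ToyPatternRule(new Element("foo", 0), new Element("bar", 0));
        ToyPatternRule bothAnd = new ToyPatternRule(new Element("both", 3), new Element("and", 0));
        System.out.println(fooBar.match("This is foo bar."));                          // [2]
        System.out.println(fooBar.passesExamples("This is foo bar.", "This is bar.")); // true
        System.out.println(bothAnd.match("Both the cat and the dog."));                // [0]
        System.out.println(bothAnd.match("Both of these cats sat and slept."));        // []
    }
}
```

The last call shows why the skip limit matters. "and" is four tokens after "both", one more than the three allowed, so nothing is reported.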
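
The repeated-phrase rule from the second block can be sketched the same way. This hypothetical toy is not how LanguageTool implements such a rule. It reports every phrase of up to maxLen tokens that is immediately followed by an identical copy, and it takes a French and a Dutch sentence as input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy detector for immediately repeated phrases; hypothetical, not LanguageTool's rule.
class ToyRepeatedPhrase {

    /** Lowercases and splits on whitespace/punctuation, dropping empty tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s.,!?]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Reports every phrase of 1..maxLen tokens that is immediately followed by the same tokens. */
    static List<String> find(String text, int maxLen) {
        List<String> tokens = tokenize(text);
        List<String> found = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int len = 1; len <= maxLen && i + 2 * len <= tokens.size(); len++) {
                List<String> first = tokens.subList(i, i + len);
                List<String> second = tokens.subList(i + len, i + 2 * len);
                if (first.equals(second)) found.add(String.join(" ", first));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("Il y a il y a un problème.", 3)); // [il y a]
        System.out.println(find("Dat is de de beste.", 3));        // [de]
    }
}
```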
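
A toy splitter can show how rule-driven sentence segmentation affects the "Uppercase sentence start" rule. It breaks after '.', '!' or '?' unless the token is listed as a no-break exception. The class name and the abbreviation list are assumptions; they are not LanguageTool's segmentation rules. Suppose "e.g." were not on the list. Then "the manual." would become a separate sentence starting with a lowercase letter, which is a false alarm for the uppercase rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy rule-driven sentence splitter; hypothetical, not LanguageTool's implementation.
class ToySegmenter {

    // Hypothetical no-break rules: a period after these tokens does not end a sentence.
    static final Set<String> NO_BREAK_AFTER = Set.of("e.g.", "i.e.", "Dr.", "etc.");

    /** Splits after tokens ending in '.', '!' or '?' unless the token is a listed exception. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (current.length() > 0) current.append(' ');
            current.append(token);
            char last = token.charAt(token.length() - 1);
            boolean endsSentence = (last == '.' || last == '!' || last == '?')
                    && !NO_BREAK_AFTER.contains(token);
            if (endsSentence) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) sentences.add(current.toString());
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("See e.g. the manual. It helps."));
        // [See e.g. the manual., It helps.]
    }
}
```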

Finally (Marcin, 5 minutes):

  • mention resources: homepage, Wiki, mailing list

(stop screen and audio recording)

page_revision: 11, last_edited: 2 Nov 2009, 00:56 UTC
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License