<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en" xml:lang="en">
<head>
<title>The Illinois SRL Manual</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content=""/>
<meta name="author" content="Vivek Srikumar"/>
<meta name="description" content=""/>
<meta name="keywords" content=""/>
<style type="text/css">
 <!--/*--><![CDATA[/*><!--*/
  html { font-family: Times, serif; font-size: 12pt; }
  .title  { text-align: center; }
  .todo   { color: red; }
  .done   { color: green; }
  .tag    { background-color: #add8e6; font-weight:normal }
  .target { }
  .timestamp { color: #bebebe; }
  .timestamp-kwd { color: #5f9ea0; }
  .right  {margin-left:auto; margin-right:0px;  text-align:right;}
  .left   {margin-left:0px;  margin-right:auto; text-align:left;}
  .center {margin-left:auto; margin-right:auto; text-align:center;}
  p.verse { margin-left: 3% }
  pre {
	border: 1pt solid #AEBDCC;
	background-color: #F3F5F7;
	padding: 5pt;
	font-family: courier, monospace;
        font-size: 90%;
        overflow:auto;
  }
  table { border-collapse: collapse; }
  td, th { vertical-align: top;  }
  th.right  { text-align:center;  }
  th.left   { text-align:center;   }
  th.center { text-align:center; }
  td.right  { text-align:right;  }
  td.left   { text-align:left;   }
  td.center { text-align:center; }
  dt { font-weight: bold; }
  div.figure { padding: 0.5em; }
  div.figure p { text-align: center; }
  textarea { overflow-x: auto; }
  .linenr { font-size:smaller }
  .code-highlighted {background-color:#ffff00;}
  .org-info-js_info-navigation { border-style:none; }
  #org-info-js_console-label { font-size:10px; font-weight:bold;
                               white-space:nowrap; }
  .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
                                 font-weight:bold; }
  /*]]>*/-->
</style>
<link rel="stylesheet" type="text/css" href="style.css" />
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
 function CodeHighlightOn(elem, id)
 {
   var target = document.getElementById(id);
   if(null != target) {
     elem.cacheClassElem = elem.className;
     elem.cacheClassTarget = target.className;
     target.className = "code-highlighted";
     elem.className   = "code-highlighted";
   }
 }
 function CodeHighlightOff(elem, id)
 {
   var target = document.getElementById(id);
   if(elem.cacheClassElem)
     elem.className = elem.cacheClassElem;
   if(elem.cacheClassTarget)
     target.className = elem.cacheClassTarget;
 }
/*]]>*///-->
</script>

</head>
<body>
<div id="content">

<h1 class="title">The Illinois SRL Manual</h1>


<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 Introduction </a></li>
<li><a href="#sec-2">2 Installation and usage </a>
<ul>
<li><a href="#sec-2_1">2.1 Getting started </a></li>
<li><a href="#sec-2_2">2.2 Configuration </a></li>
<li><a href="#sec-2_3">2.3 Modes of use </a>
<ul>
<li><a href="#sec-2_3_1">2.3.1 As a Curator plugin </a></li>
<li><a href="#sec-2_3_2">2.3.2 As a batch annotator </a></li>
<li><a href="#sec-2_3_3">2.3.3 Interactive mode </a></li>
</ul></li>
</ul>
</li>
<li><a href="#sec-3">3 Papers that used this software </a></li>
<li><a href="#sec-4">4 References </a></li>
</ul>
</div>
</div>

<div id="outline-container-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Introduction </h2>
<div class="outline-text-2" id="text-1">

<p>The Illinois SRL implements the single-parse Semantic Role Labeler
described in (Punyakanok et al. 2008). Using a similar approach, it
also implements a nominal SRL system for deverbal nouns in NomBank
(see (Meyers 2007) for a detailed description of this class).
</p>
<p>
This re-implementation is written entirely in Java. On the Penn
Treebank test set, it matches the performance reported in the
paper. Using parse trees from the Charniak parser, the original work
achieves an average F1 of 76.29%. In comparison, this
re-implementation achieves an F1 of 76.47% with beam search, which
is comparable to the performance obtained with ILP inference. The
nominal SRL achieves an F1 of 66.97% with beam search.
</p>
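<p>
The F1 figures above follow the standard definition: F1 is the
harmonic mean of precision (P) and recall (R), that is, F1 = 2PR / (P + R).
The following is a minimal sketch of that definition, computed from
argument counts. The class name <code>F1Score</code> is hypothetical. The
sketch assumes that an argument counts as correct when it matches a gold
argument, which is an assumption about the evaluation and is not stated in
this manual.
</p>

```java
// Hypothetical helper illustrating the standard F1 definition,
// F1 = 2PR / (P + R), computed from argument counts.
final class F1Score {
    private F1Score() {}

    // Precision: the fraction of predicted arguments that are correct.
    static double precision(int correct, int predicted) {
        return predicted == 0 ? 0.0 : (double) correct / predicted;
    }

    // Recall: the fraction of gold arguments that were predicted correctly.
    static double recall(int correct, int gold) {
        return gold == 0 ? 0.0 : (double) correct / gold;
    }

    // Harmonic mean of precision and recall; 0 when both are 0.
    static double f1(int correct, int predicted, int gold) {
        double p = precision(correct, predicted);
        double r = recall(correct, gold);
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

<p>
For example, 75 correct arguments out of 100 predicted and 90 gold
arguments give P = 0.75, R &#8776; 0.833 and F1 &#8776; 0.789.
</p>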


<p>
<b>Citing this work:</b> Coming soon.
</p>

</div>

</div>

<div id="outline-container-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Installation and usage </h2>
<div class="outline-text-2" id="text-2">


</div>

<div id="outline-container-2_1" class="outline-3">
<h3 id="sec-2_1"><span class="section-number-3">2.1</span> Getting started </h3>
<div class="outline-text-3" id="text-2_1">

<p>After downloading the archive containing the SRL system, unpack it
and run <code>srl.sh -v -i</code>. This starts the verb SRL system in
interactive mode. In this mode you can enter sentences on the command
line and get verb semantic role labels for them. For nominal semantic
role labeling, replace <code>-v</code> with <code>-n</code>. The first
sentence takes a long time because the model must be loaded into
memory. Subsequent sentences are faster. Note that the system
requires nearly 10 GB of RAM for verb SRL and about 5 GB for
nominal SRL.
</p>
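<p>
Because the models are large, it can help to confirm that the Java
virtual machine is allowed enough heap before the models are loaded. The
following is a minimal sketch and is not part of the SRL distribution. The
class <code>HeapCheck</code> is hypothetical, and its thresholds only
restate the approximate figures above. How <code>srl.sh</code> sets the
heap size (for example, through an <code>-Xmx</code> option) is an
assumption that should be checked against the script itself.
</p>

```java
// Hypothetical helper: reports whether a JVM's maximum heap meets the
// approximate memory figures quoted in this manual.
final class HeapCheck {
    private HeapCheck() {}

    // Approximate requirements from the manual, in gigabytes (2^30 bytes).
    static final long VERB_SRL_GB = 10;
    static final long NOMINAL_SRL_GB = 5;
    static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

    // Returns true if maxHeapBytes is at least requiredGb gigabytes.
    static boolean isSufficient(long maxHeapBytes, long requiredGb) {
        return maxHeapBytes >= requiredGb * BYTES_PER_GB;
    }

    // Chooses the requirement from the mode flag: -v (verb) or -n (nominal).
    static long requiredGbForMode(String modeFlag) {
        if ("-v".equals(modeFlag)) {
            return VERB_SRL_GB;
        }
        if ("-n".equals(modeFlag)) {
            return NOMINAL_SRL_GB;
        }
        throw new IllegalArgumentException("Expected -v or -n, got: " + modeFlag);
    }

    // Checks the JVM this code runs in. Runtime.maxMemory() returns
    // Long.MAX_VALUE when the JVM has no inherent heap limit.
    static boolean currentJvmIsSufficient(String modeFlag) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return isSufficient(maxHeap, requiredGbForMode(modeFlag));
    }
}
```

<p>
For example, in a JVM started with <code>-Xmx4g</code>,
<code>HeapCheck.currentJvmIsSufficient("-v")</code> returns
<code>false</code>.
</p>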


<p>
If this works, you are all set. You can now use the semantic role
labeler in one of three modes: as a Curator plugin, as a batch
annotator, or in interactive mode.
</p>
</div>

</div>

<div id="outline-container-2_2" class="outline-3">
<h3 id="sec-2_2"><span class="section-number-3">2.2</span> Configuration </h3>
<div class="outline-text-3" id="text-2_2">

<p>Most of the SRL system's configuration can be provided through a
config file. Specify this file with the command line option
<code>-c &lt;config-file&gt;</code>. If this option is not
specified, the system looks for the file <code>srl-config.properties</code> in
the same directory.
</p>
<p>
Here is a summary of the configuration options:
</p>
<ol>
<li>
<i>CuratorHost</i>: Specifies the host of the Curator instance that
provides the various inputs to the SRL system.
</li>
<li>
<i>CuratorPort</i>: Specifies the port on which the curator is
listening on <i>CuratorHost</i>.
</li>
<li>
<i>DefaultParser</i>: Either <code>Charniak</code> or
<code>Stanford</code>. This selects the constituent parser that provides
the features for the SRL system. The Curator is assumed to provide the
parser selected here. (Note: The SRL system has been trained using the
Charniak parser.)
</li>
<li>
<i>WordNetConfig</i>: Specifies the XML file that provides the
configuration for the Java WordNet Library (JWNL). An example
configuration file is provided as <code>jwnl_properties.xml</code>. The
path to the WordNet dictionary should be set in this file:


<pre class="src src-xml">&lt;<span style="color: #0000ff;">param</span> <span style="color: #a0522d;">name</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">dictionary_path</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">value</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">/path/to/wordnet/dict/here</span><span style="color: #8b2252;">"</span>/&gt;
</pre>


</li>
<li>
<i>LoadWordNetConfigFromClassPath</i>: Specifies whether the WordNet
config file specified in <i>WordNetConfig</i> should be loaded from
the classpath. This property can take either <code>true</code> or <code>false</code>
values. If <code>true</code>, the system will look for the WordNet
configuration file in the classpath. If <code>false</code> or if the
property is not present, it loads the file from the filesystem.
</li>
<li>
<i>Inference</i>: Either <code>BeamSearch</code> or <code>ILP</code>. This
selects the inference algorithm used to make the final prediction. If
the choice is <code>BeamSearch</code>, an in-built beam search engine
is used for inference. If the choice is <code>ILP</code>, the Gurobi
ILP solver is used. (Note: To use ILP inference, the Gurobi engine
needs to be configured.)
</li>
<li>
<i>BeamSize</i>: Specifies the beam size if beam search inference is
chosen. Otherwise, this option is ignored.
</li>
<li>
<i>TrimLeadingPrepositions</i>: Specifies whether the leading
prepositions of arguments should be trimmed. If this is set to
<code>true</code>, a sentence like "John bought a car from Mary on
Thursday for 2000 dollars." is analyzed as "bought(A0: John, A1: a
car, A2: Mary, A3: 2000 dollars, AM-TMP: Thursday)". If it is set to
<code>false</code> (or if the property is not present), the leading
prepositions are included. This gives "bought(A0: John, A1: a car,
A2: from Mary, A3: for 2000 dollars, AM-TMP: on Thursday)". This
option applies to both verbs and nouns.

</li>
</ol>
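<p>
The options above live in a Java properties file. The following is a
minimal sketch of how a program could read and validate the same
settings. It assumes that the property keys are exactly the names listed
above and that the file uses standard <code>java.util.Properties</code>
syntax. The class <code>SrlConfigSketch</code> is hypothetical and is not
the SRL system's own loader. It only mirrors the defaults described in
this section. It also resolves the default
<code>srl-config.properties</code> relative to the current working
directory, which is an assumption about what "the same directory" means.
</p>

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical reader mirroring the configuration options described above.
class SrlConfigSketch {
    // File used when no -c option is given (resolved against the working directory).
    static final String DEFAULT_CONFIG_FILE = "srl-config.properties";

    final String curatorHost;
    final int curatorPort;
    final String defaultParser;   // "Charniak" or "Stanford"
    final String wordNetConfig;   // JWNL XML file: path or classpath name
    final boolean loadWordNetConfigFromClassPath;
    final String inference;       // "BeamSearch" or "ILP"
    final int beamSize;           // -1 (a placeholder of this sketch) when ILP is used
    final boolean trimLeadingPrepositions;

    SrlConfigSketch(Properties p) {
        curatorHost = required(p, "CuratorHost");
        curatorPort = Integer.parseInt(required(p, "CuratorPort"));
        defaultParser = oneOf(required(p, "DefaultParser"), "Charniak", "Stanford");
        wordNetConfig = required(p, "WordNetConfig");
        // Boolean.parseBoolean(null) is false, so an absent property means false.
        loadWordNetConfigFromClassPath =
                Boolean.parseBoolean(trimmed(p, "LoadWordNetConfigFromClassPath"));
        inference = oneOf(required(p, "Inference"), "BeamSearch", "ILP");
        // BeamSize is only read for beam search; otherwise it is ignored.
        beamSize = inference.equals("BeamSearch")
                ? Integer.parseInt(required(p, "BeamSize"))
                : -1;
        trimLeadingPrepositions =
                Boolean.parseBoolean(trimmed(p, "TrimLeadingPrepositions"));
    }

    // Loads the file given with -c, or the default file when configFile is null.
    static SrlConfigSketch load(String configFile) throws IOException {
        String path = (configFile != null) ? configFile : DEFAULT_CONFIG_FILE;
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get(path))) {
            p.load(r);
        }
        return new SrlConfigSketch(p);
    }

    // Parses configuration text directly (convenient for tests).
    static SrlConfigSketch fromString(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SrlConfigSketch(p);
    }

    // Opens the JWNL configuration from the classpath if
    // LoadWordNetConfigFromClassPath is true, else from the filesystem.
    InputStream openWordNetConfig() throws IOException {
        if (loadWordNetConfigFromClassPath) {
            InputStream in = SrlConfigSketch.class.getClassLoader()
                    .getResourceAsStream(wordNetConfig);
            if (in == null) {
                throw new FileNotFoundException("Not on classpath: " + wordNetConfig);
            }
            return in;
        }
        return new FileInputStream(wordNetConfig);
    }

    private static String trimmed(Properties p, String key) {
        String v = p.getProperty(key);
        return v == null ? null : v.trim();
    }

    private static String required(Properties p, String key) {
        String v = trimmed(p, key);
        if (v == null || v.isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return v;
    }

    private static String oneOf(String value, String... allowed) {
        for (String a : allowed) {
            if (a.equals(value)) {
                return value;
            }
        }
        throw new IllegalArgumentException("Unexpected value: " + value);
    }
}
```

<p>
With a file containing placeholder values such as
<code>CuratorHost=localhost</code>, <code>Inference=BeamSearch</code> and
<code>BeamSize=50</code>, <code>SrlConfigSketch.load(null)</code> reads
<code>srl-config.properties</code>, just as the system does when
<code>-c</code> is omitted.
</p>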

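<p>
The effect of <i>TrimLeadingPrepositions</i> can be shown on a single
argument span. The sketch below is hypothetical and is not the SRL
system's implementation. It assumes that each argument comes with Penn
Treebank part-of-speech tags, and it treats a leading token tagged
<code>IN</code> or <code>TO</code> as a preposition to drop.
</p>

```java
import java.util.Arrays;

// Hypothetical illustration of trimming a leading preposition from an
// argument span, using Penn Treebank POS tags to spot prepositions.
final class PrepositionTrimmer {
    private PrepositionTrimmer() {}

    // True for the tags this sketch treats as prepositions: IN and TO.
    static boolean isPrepositionTag(String posTag) {
        return "IN".equals(posTag) || "TO".equals(posTag);
    }

    // Drops the first token when it is a preposition and at least one
    // token would remain; otherwise returns the span unchanged.
    static String[] trim(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        if (tokens.length > 1 && isPrepositionTag(posTags[0])) {
            return Arrays.copyOfRange(tokens, 1, tokens.length);
        }
        return tokens;
    }
}
```

<p>
Applied to the example sentence, this turns "from Mary", "on Thursday"
and "for 2000 dollars" into "Mary", "Thursday" and "2000 dollars", and it
leaves "a car" unchanged.
</p>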
</div>

</div>

<div id="outline-container-2_3" class="outline-3">
<h3 id="sec-2_3"><span class="section-number-3">2.3</span> Modes of use </h3>
<div class="outline-text-3" id="text-2_3">

<p>All three modes require either the <code>-v</code> or the <code>-n</code>
argument, which selects verb or nominal SRL, respectively.
</p>

</div>

<div id="outline-container-2_3_1" class="outline-4">
<h4 id="sec-2_3_1"><span class="section-number-4">2.3.1</span> As a Curator plugin </h4>
<div class="outline-text-4" id="text-2_3_1">

<p>To start the SRL system as a Curator plugin, run the following command:
</p>


<pre class="src src-sh">./srl.sh [-v | -n] -s &lt;port-number&gt; [-t &lt;number-of-threads&gt;]
</pre>


<p>
The <code>-t</code> option is optional. If it is omitted, one thread
is used.
</p>
<p>
After the server starts, the Curator instance can be configured to
use it to serve SRL outputs. Add the following XML snippet to the
Curator annotator descriptor file, with appropriate type, host and
port entries:
</p>


<pre class="src src-xml">&lt;<span style="color: #0000ff;">annotator</span>&gt;
  &lt;<span style="color: #0000ff;">type</span>&gt;parser&lt;/<span style="color: #0000ff;">type</span>&gt;
  &lt;<span style="color: #0000ff;">field</span>&gt;srl&lt;/<span style="color: #0000ff;">field</span>&gt;
  &lt;<span style="color: #0000ff;">host</span>&gt;srl-host:srlport&lt;/<span style="color: #0000ff;">host</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;sentences&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;tokens&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;pos&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;ner&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;chunk&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;charniak&lt;/<span style="color: #0000ff;">requirement</span>&gt;
&lt;/<span style="color: #0000ff;">annotator</span>&gt;
</pre>
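<p>
Before restarting the Curator with the new descriptor, it may be worth
checking that the SRL server accepts connections at the address given in
the <code>host</code> element. The sketch below is a hypothetical helper
and is not part of the SRL or Curator distributions. It splits a
<code>host:port</code> string such as <code>srl-host:srlport</code> and
attempts a plain TCP connection. This only shows that something is
listening on that port. It does not speak the Curator protocol.
</p>

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity check for the host:port value used in the
// Curator annotator descriptor.
final class SrlServerCheck {
    private SrlServerCheck() {}

    // Splits "host:port" at the last colon and validates the port number.
    static InetSocketAddress parseHostPort(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        // createUnresolved skips DNS lookup and rejects out-of-range ports.
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Returns true if a TCP connection succeeds within timeoutMillis.
    static boolean isListening(String hostPort, int timeoutMillis) {
        InetSocketAddress parsed = parseHostPort(hostPort);
        // This constructor resolves the host name; an unknown host makes
        // connect() throw, which is reported as "not listening".
        InetSocketAddress address =
                new InetSocketAddress(parsed.getHostString(), parsed.getPort());
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```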


</div>

</div>

<div id="outline-container-2_3_2" class="outline-4">
<h4 id="sec-2_3_2"><span class="section-number-4">2.3.2</span> As a batch annotator </h4>
<div class="outline-text-4" id="text-2_3_2">

<p>The SRL system can annotate several sentences as a batch when it is
run on an input file containing a set of sentences. In this mode, it
writes the SRL annotation in a CoNLL-style column format.
</p>
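<p>
This manual does not list the exact columns of the batch output. The
following is a minimal sketch that assumes the usual CoNLL conventions:
one token per line, whitespace-separated columns, and a blank line
between sentences. Under those assumptions, it reads the output file back
into rows of columns, grouped by sentence. The class
<code>ColumnReader</code> is hypothetical. The meaning of each column
should be checked against an actual output file.
</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for CoNLL-style column files: one token per line,
// whitespace-separated columns, blank lines between sentences.
final class ColumnReader {
    private ColumnReader() {}

    // Returns one entry per sentence. Each sentence is a list of token rows,
    // and each row holds that token's column values. The caller closes input.
    static List<List<String[]>> read(Reader input) throws IOException {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line ends the current sentence, if one is open.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(trimmed.split("\\s+"));
            }
        }
        // The last sentence may not be followed by a blank line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```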
<p>
The following command runs the SRL in batch mode:
</p>


<pre class="src src-sh">./srl.sh [-v | -n] -b &lt;input-file&gt; -o &lt;output-file&gt; [-w]
</pre>


<p>
Each line in the input file is treated as a separate sentence. The
<code>-w</code> option indicates that the sentences in the input file
are already whitespace tokenized. Without it, the Curator is asked to
provide the tokenization.
</p>
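<p>
Since each input line is one sentence, preparing an input file amounts
to writing one sentence per line. When the sentences are already
tokenized and <code>-w</code> will be passed, joining the tokens with
single spaces keeps that tokenization intact. The sketch below is
hypothetical, and its class and method names are illustrative. It
rejects tokens that are empty or contain whitespace, because such tokens
could not survive whitespace tokenization. Writing UTF-8 is an assumption,
since the manual does not state the expected encoding.
</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that writes pre-tokenized sentences in the batch
// input format: one sentence per line, tokens separated by single spaces.
final class BatchInputWriter {
    private BatchInputWriter() {}

    // Joins one sentence's tokens with single spaces. Rejects empty sentences
    // and tokens that are empty or contain whitespace.
    static String toLine(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("A sentence needs at least one token");
        }
        for (String token : tokens) {
            if (token.isEmpty() || token.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("Bad token: '" + token + "'");
            }
        }
        return String.join(" ", tokens);
    }

    // Writes every sentence as its own line (UTF-8 is an assumption).
    static void write(Path file, List<List<String>> sentences) throws IOException {
        List<String> lines = new ArrayList<>();
        for (List<String> sentence : sentences) {
            lines.add(toLine(sentence));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

<p>
The resulting file can then be passed to the batch command, for example
<code>./srl.sh -v -b input.txt -o output.txt -w</code>, where the file
names are placeholders.
</p>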
</div>

</div>

<div id="outline-container-2_3_3" class="outline-4">
<h4 id="sec-2_3_3"><span class="section-number-4">2.3.3</span> Interactive mode </h4>
<div class="outline-text-4" id="text-2_3_3">

<p>The SRL system can be used in an interactive mode by running it
with the <code>-i</code> option.
</p>
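<p>
The three modes differ only in the options passed to
<code>srl.sh</code>. As a summary, the following hypothetical Java helper
assembles the argument list for each mode, using only the options
documented in this section. It is not part of the distribution. It
assumes that the script is run from the unpacked directory as
<code>./srl.sh</code>.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical builder for srl.sh command lines, limited to the options
// described in this manual.
final class SrlCommands {
    private SrlCommands() {}

    // -v selects verb SRL and -n nominal SRL; one of them is always required.
    private static List<String> base(boolean verb) {
        return new ArrayList<>(Arrays.asList("./srl.sh", verb ? "-v" : "-n"));
    }

    // Curator plugin: ./srl.sh [-v | -n] -s port [-t threads]
    static List<String> server(boolean verb, int port, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be at least 1");
        }
        List<String> cmd = base(verb);
        cmd.addAll(Arrays.asList("-s", Integer.toString(port)));
        if (threads > 1) { // one thread is the default, so -t is omitted then
            cmd.addAll(Arrays.asList("-t", Integer.toString(threads)));
        }
        return cmd;
    }

    // Batch annotator: ./srl.sh [-v | -n] -b input -o output [-w]
    static List<String> batch(boolean verb, String input, String output,
                              boolean whitespaceTokenized) {
        List<String> cmd = base(verb);
        cmd.addAll(Arrays.asList("-b", input, "-o", output));
        if (whitespaceTokenized) {
            cmd.add("-w");
        }
        return cmd;
    }

    // Interactive mode: ./srl.sh [-v | -n] -i
    static List<String> interactive(boolean verb) {
        List<String> cmd = base(verb);
        cmd.add("-i");
        return cmd;
    }
}
```

<p>
For instance, <code>new ProcessBuilder(SrlCommands.interactive(true)).inheritIO().start()</code>
starts interactive verb SRL attached to the current terminal.
</p>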

</div>
</div>
</div>

</div>

<div id="outline-container-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Papers that used this software </h2>
<div class="outline-text-2" id="text-3">

<p>The following papers have used an earlier version of this software:
</p>
<ul>
<li>
G. Kundu and D. Roth, <i>Adapting Text Instead of the Model: An Open Domain Approach</i>. In Proc. of the Conference on Computational
Natural Language Learning (CoNLL), 2011.

</li>
<li>
V. Srikumar and D. Roth, <i>A Joint Model for Extended Semantic Role
Labeling</i>. In Proc. of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), 2011.
</li>
</ul>


<p>
If you use this package, please let me know and I will add the
reference to this list.
</p>

</div>

</div>

<div id="outline-container-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> References </h2>
<div class="outline-text-2" id="text-4">

<ol>
<li>
V. Punyakanok, D. Roth and W. Yih, <i>The Importance of Syntactic Parsing and Inference in Semantic Role Labeling</i>. Computational
Linguistics, 2008.
</li>
<li>
A. Meyers, <i>Those Other NomBank Dictionaries</i>. Technical report,
New York University, 2007.

</li>
</ol>


</div>
</div>
<div id="postamble">
<p class="author"> Author: Vivek Srikumar
</p>
<p class="date"> Date: </p>
<p class="creator">HTML generated by org-mode 7.4 in emacs 23</p>
</div>
</div>
</body>
</html>