<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en" xml:lang="en">
<head>
<title>The Illinois SRL Manual</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content=""/>
<meta name="author" content="Vivek Srikumar"/>
<meta name="description" content=""/>
<meta name="keywords" content=""/>
<style type="text/css">
 <!--/*--><![CDATA[/*><!--*/
  html { font-family: Times, serif; font-size: 12pt; }
  .title  { text-align: center; }
  .todo   { color: red; }
  .done   { color: green; }
  .tag    { background-color: #add8e6; font-weight:normal }
  .target { }
  .timestamp { color: #bebebe; }
  .timestamp-kwd { color: #5f9ea0; }
  .right  {margin-left:auto; margin-right:0px;  text-align:right;}
  .left   {margin-left:0px;  margin-right:auto; text-align:left;}
  .center {margin-left:auto; margin-right:auto; text-align:center;}
  p.verse { margin-left: 3% }
  pre {
	border: 1pt solid #AEBDCC;
	background-color: #F3F5F7;
	padding: 5pt;
	font-family: courier, monospace;
        font-size: 90%;
        overflow:auto;
  }
  table { border-collapse: collapse; }
  td, th { vertical-align: top;  }
  th.right  { text-align:center;  }
  th.left   { text-align:center;   }
  th.center { text-align:center; }
  td.right  { text-align:right;  }
  td.left   { text-align:left;   }
  td.center { text-align:center; }
  dt { font-weight: bold; }
  div.figure { padding: 0.5em; }
  div.figure p { text-align: center; }
  textarea { overflow-x: auto; }
  .linenr { font-size:smaller }
  .code-highlighted {background-color:#ffff00;}
  .org-info-js_info-navigation { border-style:none; }
  #org-info-js_console-label { font-size:10px; font-weight:bold;
                               white-space:nowrap; }
  .org-info-js_search-highlight {background-color:#ffff00; color:#000000;
                                 font-weight:bold; }
  /*]]>*/-->
</style>
<link rel="stylesheet" type="text/css" href="style.css" />
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
 function CodeHighlightOn(elem, id)
 {
   var target = document.getElementById(id);
   if(null != target) {
     elem.cacheClassElem = elem.className;
     elem.cacheClassTarget = target.className;
     target.className = "code-highlighted";
     elem.className   = "code-highlighted";
   }
 }
 function CodeHighlightOff(elem, id)
 {
   var target = document.getElementById(id);
   if(elem.cacheClassElem)
     elem.className = elem.cacheClassElem;
   if(elem.cacheClassTarget)
     target.className = elem.cacheClassTarget;
 }
/*]]>*///-->
</script>

</head>
<body>
<div id="content">

<h1 class="title">The Illinois SRL Manual</h1>


<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 Introduction </a></li>
<li><a href="#sec-2">2 Installation and usage </a>
<ul>
<li><a href="#sec-2_1">2.1 Getting started </a></li>
<li><a href="#sec-2_2">2.2 Configuration </a></li>
<li><a href="#sec-2_3">2.3 Modes of use </a>
<ul>
<li><a href="#sec-2_3_1">2.3.1 As a Curator plugin </a></li>
<li><a href="#sec-2_3_2">2.3.2 As a batch annotator </a></li>
<li><a href="#sec-2_3_3">2.3.3 Interactive mode </a></li>
</ul></li>
</ul>
</li>
<li><a href="#sec-3">3 Papers that used this software </a></li>
<li><a href="#sec-4">4 References </a></li>
</ul>
</div>
</div>

<div id="outline-container-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Introduction </h2>
<div class="outline-text-2" id="text-1">

<p>The Illinois SRL implements the single-parse Semantic Role Labeler
described in (Punyakanok et al. 2008). Using a similar approach, it
also implements a nominal SRL system for the deverbal nouns in
NomBank (see (Meyers 2007) for a detailed description of this
class).
</p>
<p>
This re-implementation is written entirely in Java. On the Penn
Treebank test set it matches the performance reported in the paper.
Using parse trees from the Charniak parser, the original work
achieves an average F1 of 76.29%. In comparison, this
re-implementation gets an F1 of 76.47% with beam search, which is
comparable to its performance when ILP inference is used. The
nominal SRL gets an F1 score of 66.97% with beam search.
</p>


<p>
<b>Citing this work:</b> Citation details will be added soon.
</p>

</div>

</div>

<div id="outline-container-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Installation and usage </h2>
<div class="outline-text-2" id="text-2">



</div>

<div id="outline-container-2_1" class="outline-3">
<h3 id="sec-2_1"><span class="section-number-3">2.1</span> Getting started </h3>
<div class="outline-text-3" id="text-2_1">

<p>After downloading the archive containing the SRL system, unpack it
and run <code>srl.sh -v -i</code>. This starts the verb SRL system in
interactive mode, where you can enter sentences on the command line
and get their verb semantic role labels. For nominal semantic role
labeling, replace <code>-v</code> with <code>-n</code>. The first
sentence takes a long time because the system must first load the
model into memory; subsequent sentences are faster. Note that the
system requires nearly 10 GB of RAM for verb SRL and about 5 GB for
nominals.
</p>
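<p>
The memory figures above concern the Java heap available to the SRL
process. The following hypothetical <code>HeapCheck</code> class is a
minimal sketch and is not part of the SRL distribution. It assumes
that <code>srl.sh</code> launches a JVM whose heap limit is set with
<code>-Xmx</code> (check the script for the actual value), and it reads
"10 GB" and "5 GB" as GiB. It compares the JVM's maximum heap against
those approximate requirements.
</p>

<pre class="src src-java">/** Hypothetical pre-flight check; not part of the Illinois SRL distribution. */
class HeapCheck {
    // Approximate requirements quoted in this manual, read (an assumption) as GiB of JVM heap.
    static final long GIB = 1024L * 1024L * 1024L;
    static final long VERB_SRL_BYTES = 10L * GIB;
    static final long NOMINAL_SRL_BYTES = 5L * GIB;

    /** Returns true if maxHeapBytes meets the approximate requirement for the chosen mode. */
    static boolean hasEnoughHeap(long maxHeapBytes, boolean verbMode) {
        long required = verbMode ? VERB_SRL_BYTES : NOMINAL_SRL_BYTES;
        return maxHeapBytes >= required;
    }

    public static void main(String[] args) {
        // No argument or "-v" selects verb SRL; any other argument is treated as "-n".
        boolean verbMode = args.length == 0 || args[0].equals("-v");
        long maxHeap = Runtime.getRuntime().maxMemory(); // reflects the -Xmx setting
        if (!hasEnoughHeap(maxHeap, verbMode)) {
            System.err.printf("Only %d MiB of heap available; %s SRL needs about %d GiB.%n",
                    maxHeap / (1024L * 1024L), verbMode ? "verb" : "nominal",
                    (verbMode ? VERB_SRL_BYTES : NOMINAL_SRL_BYTES) / GIB);
            System.exit(1);
        }
        System.out.println("Heap size looks sufficient.");
    }
}
</pre>

<p>
For example, <code>java -Xmx12g HeapCheck -v</code> should report a
sufficient heap. Running it with a much smaller <code>-Xmx</code> should
print a warning and exit with status 1.
</p>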


<p>
If this works, you are all set. You can now use the semantic role
labeler in one of three modes: as a Curator plugin, as a batch
annotator, or in interactive mode.
</p>
</div>

</div>

<div id="outline-container-2_2" class="outline-3">
<h3 id="sec-2_2"><span class="section-number-3">2.2</span> Configuration </h3>
<div class="outline-text-3" id="text-2_2">

<p>Most of the SRL system's configuration is provided through a
configuration file, which can be specified with the command line
option <code>-c &lt;config-file&gt;</code>. If this option is not
specified, the system looks for the file <code>srl-config.properties</code> in
the same directory.
</p>
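<p>
The lookup rule above can be sketched in Java as follows. This is a
hypothetical helper (<code>ConfigPath</code>), not code from the SRL
distribution. It assumes that "the same directory" means the current
working directory.
</p>

<pre class="src src-java">import java.io.File;

/** Hypothetical helper mirroring the lookup rule described above; not taken from the SRL source. */
class ConfigPath {
    static final String DEFAULT_CONFIG = "srl-config.properties";

    /** Returns the file named after "-c", or the default file name when "-c" is absent. */
    static File resolve(String[] args) {
        // Stop one element early so that args[i + 1] always exists.
        for (int i = 0; args.length > i + 1; i++) {
            if (args[i].equals("-c")) {
                return new File(args[i + 1]);
            }
        }
        // Assumption: "the same directory" is read as the current working directory.
        return new File(DEFAULT_CONFIG);
    }

    public static void main(String[] args) {
        System.out.println("Using configuration file: " + resolve(args).getAbsolutePath());
    }
}
</pre>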
<p>
Here is a summary of the configuration options:
</p>
<ol>
<li>
<i>CuratorHost</i>: Specifies the host of the Curator instance that
provides the various inputs to the SRL system.
</li>
<li>
<i>CuratorPort</i>: Specifies the port that the Curator listens on at
<i>CuratorHost</i>.
</li>
<li>
<i>DefaultParser</i>: This can be either <code>Charniak</code> or
<code>Stanford</code>, and selects the constituent parser that provides
the features for the SRL system. The parser chosen here is assumed
to be provided by the Curator. (Note: The SRL system has been
trained using the Charniak parser.)
</li>
<li>
<i>WordNetConfig</i>: Specifies the XML file that provides the
configuration for the Java WordNet Library (JWNL). An example
configuration file is provided as <code>jwnl_properties.xml</code>. The
path to the WordNet dictionary should be set in this file through the
<code>dictionary_path</code> parameter:



<pre class="src src-xml">&lt;<span style="color: #0000ff;">param</span> <span style="color: #a0522d;">name</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">dictionary_path</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">value</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">/path/to/wordnet/dict/here</span><span style="color: #8b2252;">"</span>/&gt;
</pre>


</li>
<li>
<i>LoadWordNetConfigFromClassPath</i>: Specifies whether the WordNet
config file specified in <i>WordNetConfig</i> should be loaded from
the classpath. This property takes the value <code>true</code> or
<code>false</code>. If <code>true</code>, the system looks for the WordNet
configuration file in the classpath. If <code>false</code>, or if the
property is not present, it loads the file from the filesystem.
</li>
<li>
<i>Inference</i>: This can be either <code>BeamSearch</code> or <code>ILP</code>, and selects
the inference algorithm used to make the final
prediction. If the choice is <code>BeamSearch</code>, an in-built beam
search engine is used for inference. If the choice is <code>ILP</code>,
the Gurobi ILP solver is used. (Note: To use ILP
inference, the Gurobi engine needs to be configured.)
</li>
<li>
<i>BeamSize</i>: Specifies the beam size if beam search inference is
chosen. Otherwise, this option is ignored.
</li>
<li>
<i>TrimLeadingPrepositions</i>: Specifies whether the leading
prepositions of arguments should be trimmed. If this is set to
<code>true</code>, a sentence like "John bought a car from Mary on
Thursday for 2000 dollars." is analyzed as "bought(A0: John, A1: a
car, A2: Mary, A3: 2000 dollars, AM-TMP: Thursday)". If this is set
to <code>false</code> (or if the property is not present), the leading
prepositions are included, giving "bought(A0: John, A1: a car, A2:
from Mary, A3: for 2000 dollars, AM-TMP: on Thursday)". This option
applies to both verbs and nouns.

</li>
</ol>
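<p>
The following is a minimal sketch of how a configuration file with
these keys could be read using <code>java.util.Properties</code>, to
make the option semantics concrete. The class <code>SrlConfig</code>
and its field names are hypothetical. This is not the SRL system's own
loader. The sketch applies the two documented defaults:
<i>LoadWordNetConfigFromClassPath</i> and
<i>TrimLeadingPrepositions</i> fall back to <code>false</code> when
absent. It reads <i>BeamSize</i> only when <i>Inference</i> is
<code>BeamSearch</code>. Because the manual gives no defaults for the
other options, the sketch assumes they are all required.
</p>

<pre class="src src-java">import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical sketch of reading the options listed above; not the SRL system's own loader. */
class SrlConfig {
    String curatorHost;
    int curatorPort;
    String defaultParser;             // "Charniak" or "Stanford"
    String wordNetConfig;             // location of the JWNL XML file
    boolean loadWordNetFromClassPath; // documented default: false
    String inference;                 // "BeamSearch" or "ILP"
    int beamSize;                     // read only when inference is "BeamSearch"
    boolean trimLeadingPrepositions;  // documented default: false

    static SrlConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        SrlConfig c = new SrlConfig();
        c.curatorHost = required(p, "CuratorHost");
        c.curatorPort = Integer.parseInt(required(p, "CuratorPort"));
        c.defaultParser = oneOf(p, "DefaultParser", "Charniak", "Stanford");
        c.wordNetConfig = required(p, "WordNetConfig");
        c.loadWordNetFromClassPath = flag(p, "LoadWordNetConfigFromClassPath");
        c.inference = oneOf(p, "Inference", "BeamSearch", "ILP");
        if (c.inference.equals("BeamSearch")) {
            // Assumption: no default beam size is documented, so it is required here.
            c.beamSize = Integer.parseInt(required(p, "BeamSize"));
        }
        c.trimLeadingPrepositions = flag(p, "TrimLeadingPrepositions");
        return c;
    }

    // Assumption: options without a documented default are treated as mandatory.
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    // Rejects anything other than the two values the manual lists for this key.
    private static String oneOf(Properties p, String key, String a, String b) {
        String value = required(p, key);
        if (!(value.equals(a) || value.equals(b))) {
            throw new IllegalArgumentException(key + " must be " + a + " or " + b + ", got " + value);
        }
        return value;
    }

    // An absent flag is false, matching the behaviour described for both boolean options.
    private static boolean flag(Properties p, String key) {
        return Boolean.parseBoolean(p.getProperty(key, "false").trim());
    }
}
</pre>

<p>
For example, <code>SrlConfig.load(new java.io.FileReader("srl-config.properties"))</code>
would read the default file. With <code>Inference = ILP</code>, the
sketch leaves <code>beamSize</code> at 0. This mirrors the statement
above that <i>BeamSize</i> is then ignored.
</p>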

</div>

</div>

<div id="outline-container-2_3" class="outline-3">
<h3 id="sec-2_3"><span class="section-number-3">2.3</span> Modes of use </h3>
<div class="outline-text-3" id="text-2_3">

<p>In all three modes, either the <code>-v</code> or the <code>-n</code> argument is required to
select verb or nominal SRL, respectively.
</p>

</div>

<div id="outline-container-2_3_1" class="outline-4">
<h4 id="sec-2_3_1"><span class="section-number-4">2.3.1</span> As a Curator plugin </h4>
<div class="outline-text-4" id="text-2_3_1">

<p>To start the SRL system as a Curator plugin, run the following command:
</p>


<pre class="src src-sh">./srl.sh [-v | -n] -s &lt;port-number&gt; [-t &lt;number-of-threads&gt;]
</pre>



<p>
The <code>-t</code> option is optional; by default, the server uses a
single thread.
</p>
<p>
Once the server is running, configure the Curator instance to use it
for SRL annotations by adding the following XML snippet to the
Curator annotator descriptor file (with the appropriate type, host,
and port entries):
</p>



<pre class="src src-xml">&lt;<span style="color: #0000ff;">annotator</span>&gt;
  &lt;<span style="color: #0000ff;">type</span>&gt;parser&lt;/<span style="color: #0000ff;">type</span>&gt;
  &lt;<span style="color: #0000ff;">field</span>&gt;srl&lt;/<span style="color: #0000ff;">field</span>&gt;
  &lt;<span style="color: #0000ff;">host</span>&gt;srl-host:srlport&lt;/<span style="color: #0000ff;">host</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;sentences&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;tokens&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;pos&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;ner&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;chunk&lt;/<span style="color: #0000ff;">requirement</span>&gt;
  &lt;<span style="color: #0000ff;">requirement</span>&gt;charniak&lt;/<span style="color: #0000ff;">requirement</span>&gt;
&lt;/<span style="color: #0000ff;">annotator</span>&gt;
</pre>
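<p>
Before editing the descriptor, it may help to confirm that the SRL
server is accepting connections on the port passed to <code>-s</code>.
The sketch below, a hypothetical <code>PortCheck</code> class, only
opens a plain TCP connection with <code>java.net.Socket</code>. It does
not speak the Curator protocol, so a successful connection shows only
that something is listening on that port.
</p>

<pre class="src src-java">import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical reachability check (plain TCP only, not the Curator protocol). */
class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortCheck srl-host 9090   (host and port are placeholders)
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(isListening(host, port, 2000)
                ? "Something is accepting connections on " + host + ":" + port
                : "Nothing is listening on " + host + ":" + port);
    }
}
</pre>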





</div>

</div>

<div id="outline-container-2_3_2" class="outline-4">
<h4 id="sec-2_3_2"><span class="section-number-4">2.3.2</span> As a batch annotator </h4>
<div class="outline-text-4" id="text-2_3_2">

<p>The SRL system can annotate many sentences as a batch when it is
run on an input file containing them. In this mode, it writes the
SRL annotation in a CoNLL-style column format.
</p>
<p>
The following command runs the SRL in batch mode:
</p>



<pre class="src src-sh">./srl.sh [-v | -n] -b &lt;input-file&gt; -o &lt;output-file&gt; [-w]
</pre>



<p>
Each line in the input file is treated as a separate sentence. The
option <code>-w</code> indicates that the sentences in the input file
are already whitespace-tokenized. Otherwise, the Curator is asked to
provide the tokenization.
</p>
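<p>
When the input is passed with <code>-w</code>, each line must already
contain one sentence with tokens separated by whitespace. The
following hypothetical <code>WhitespaceInput</code> class is a
deliberately naive sketch of preparing such a file. It splits off
common punctuation with a regular expression and writes one sentence
per line, assuming UTF-8 output. It is not the tokenizer used by the
Curator, and real text (abbreviations, numbers such as 2,000,
contractions) needs a proper tokenizer.
</p>

<pre class="src src-java">import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical, naive input preparation for batch mode with -w. */
class WhitespaceInput {
    /** Puts spaces around . , ; : ! ? ( ) and collapses runs of whitespace. */
    static String tokenize(String sentence) {
        String spaced = sentence.replaceAll("([.,;:!?()])", " $1 ");
        return spaced.trim().replaceAll("\\s+", " ");
    }

    /** Writes one whitespace-tokenized sentence per line (UTF-8 is an assumption). */
    static void writeBatchInput(String[] sentences, Path out) throws IOException {
        String[] lines = new String[sentences.length];
        for (int i = 0; sentences.length > i; i++) {
            lines[i] = tokenize(sentences[i]);
        }
        Files.write(out, Arrays.asList(lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String[] sentences = {
            "John bought a car from Mary on Thursday for 2000 dollars.",
            "The car, however, was red."
        };
        writeBatchInput(sentences, Paths.get("input.txt"));
        // Then, for example: ./srl.sh -v -b input.txt -o output.txt -w
    }
}
</pre>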
</div>

</div>

<div id="outline-container-2_3_3" class="outline-4">
<h4 id="sec-2_3_3"><span class="section-number-4">2.3.3</span> Interactive mode </h4>
<div class="outline-text-4" id="text-2_3_3">

<p>The SRL system can be used in an interactive mode by running it
with the <code>-i</code> option.
</p>

</div>
</div>
</div>

</div>

<div id="outline-container-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Papers that used this software </h2>
<div class="outline-text-2" id="text-3">

<p>The following papers have used an earlier version of this software:
</p>
<ul>
<li>
G. Kundu and D. Roth, <i>Adapting Text Instead of the Model: An Open
Domain Approach</i>. In Proc. of the Conference on Computational
Natural Language Learning (CoNLL), 2011.

</li>
<li>
V. Srikumar and D. Roth, <i>A Joint Model for Extended Semantic Role
Labeling</i>. In Proc. of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), 2011.
</li>
</ul>


<p>
If you use this package, please let me know and I will add the
reference to this list.
</p>

</div>

</div>

<div id="outline-container-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> References </h2>
<div class="outline-text-2" id="text-4">

<ol>
<li>
V. Punyakanok, D. Roth and W. Yih, <i>The Importance of Syntactic
Parsing and Inference in Semantic Role Labeling</i>. Computational
Linguistics, 2008.
</li>
<li>
A. Meyers, <i>Those Other NomBank Dictionaries</i>. Technical report,
New York University, 2007.

</li>
</ol>


</div>
</div>
<div id="postamble">
<p class="author"> Author: Vivek Srikumar
</p>
<p class="date"> Date: </p>
<p class="creator">HTML generated by org-mode 7.4 in emacs 23</p>
</div>
</div>
</body>
</html>