<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en" xml:lang="en">
<head>
<title>The Illinois SRL Manual</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content=""/>
<meta name="author" content="Vivek Srikumar"/>
<meta name="description" content=""/>
<meta name="keywords" content=""/>
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
html { font-family: Times, serif; font-size: 12pt; }
.title { text-align: center; }
.todo { color: red; }
.done { color: green; }
.tag { background-color: #add8e6; font-weight:normal }
.target { }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.right {margin-left:auto; margin-right:0px; text-align:right;}
.left {margin-left:0px; margin-right:auto; text-align:left;}
.center {margin-left:auto; margin-right:auto; text-align:center;}
p.verse { margin-left: 3% }
pre {
border: 1pt solid #AEBDCC;
background-color: #F3F5F7;
padding: 5pt;
font-family: courier, monospace;
font-size: 90%;
overflow:auto;
}
table { border-collapse: collapse; }
td, th { vertical-align: top; }
th.right { text-align:center; }
th.left { text-align:center; }
th.center { text-align:center; }
td.right { text-align:right; }
td.left { text-align:left; }
td.center { text-align:center; }
dt { font-weight: bold; }
div.figure { padding: 0.5em; }
div.figure p { text-align: center; }
textarea { overflow-x: auto; }
.linenr { font-size:smaller }
.code-highlighted {background-color:#ffff00;}
.org-info-js_info-navigation { border-style:none; }
#org-info-js_console-label { font-size:10px; font-weight:bold;
white-space:nowrap; }
.org-info-js_search-highlight {background-color:#ffff00; color:#000000;
font-weight:bold; }
/*]]>*/-->
</style>
<link rel="stylesheet" type="text/css" href="style.css" />
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
</head>
<body>
<div id="content">
<h1 class="title">The Illinois SRL Manual</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1 Introduction </a></li>
<li><a href="#sec-2">2 Installation and usage </a>
<ul>
<li><a href="#sec-2_1">2.1 Getting started </a></li>
<li><a href="#sec-2_2">2.2 Configuration </a></li>
<li><a href="#sec-2_3">2.3 Modes of use </a>
<ul>
<li><a href="#sec-2_3_1">2.3.1 As a Curator plugin </a></li>
<li><a href="#sec-2_3_2">2.3.2 As a batch annotator </a></li>
<li><a href="#sec-2_3_3">2.3.3 Interactive mode </a></li>
</ul></li>
</ul>
</li>
<li><a href="#sec-3">3 Papers that used this software </a></li>
<li><a href="#sec-4">4 References </a></li>
</ul>
</div>
</div>
<div id="outline-container-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Introduction </h2>
<div class="outline-text-2" id="text-1">
<p>The Illinois SRL implements the single-parse Semantic Role Labeler
described in (Punyakanok et al., 2008). Using a similar
approach, it also implements a nominal SRL system for deverbal nouns
in NomBank (see (Meyers 2007) for a detailed description of this
class).
</p>
<p>
This re-implementation is written entirely in Java and achieves
performance equivalent to that reported in the paper on the test set of
the Penn Treebank. Using parse trees from the Charniak parser,
the original work achieves an average F1 of 76.29%. In comparison,
this re-implementation gets an F1 of 76.47% with beam search (which
is comparable to the performance when ILP inference is used). The
nominal SRL gets an F1 score of 66.97% with beam search.
</p>
<p>
<b>Citing this work</b> To come soon.
</p>
</div>
</div>
<div id="outline-container-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Installation and usage </h2>
<div class="outline-text-2" id="text-2">
</div>
<div id="outline-container-2_1" class="outline-3">
<h3 id="sec-2_1"><span class="section-number-3">2.1</span> Getting started </h3>
<div class="outline-text-3" id="text-2_1">
<p>After downloading the archive containing the SRL system, unpack it
and run <code>srl.sh -v -i</code>. This will start the verb SRL system in
interactive mode, where you can enter sentences on the command line
and get their verb semantic role labels. For nominal semantic role
labeling, replace <code>-v</code> with <code>-n</code>. The first sentence will
take a long time because the system has to load the model into
memory; subsequent sentences will be faster. Note that this system
requires nearly 10 GB of RAM for verb SRL and about 5 GB for
nominals.
</p>
<p>
If this works, you are all set. You can now use the semantic role
labeler in one of three modes: as a Curator plugin, as a batch
annotator, or in interactive mode.
</p>
</div>
</div>
<div id="outline-container-2_2" class="outline-3">
<h3 id="sec-2_2"><span class="section-number-3">2.2</span> Configuration </h3>
<div class="outline-text-3" id="text-2_2">
<p>Most of the configuration for the SRL system can be provided via a
configuration file, which can be specified via the
command line option <code>-c <config-file></code>. If this option is not
specified, the system looks for the file <code>srl-config.properties</code> in
the same directory.
</p>
<p>
Here is a summary of the configuration options:
</p>
<ol>
<li>
<i>CuratorHost</i>: Specifies the host of the curator instance which
provides the various inputs to the SRL system.
</li>
<li>
<i>CuratorPort</i>: Specifies the port on which the curator is
listening on <i>CuratorHost</i>.
</li>
<li>
<i>DefaultParser</i>: This can either be <code>Charniak</code> or
<code>Stanford</code>. This selects the constituent parser that provides
the features for the SRL system. It is assumed that the parser
corresponding to the choice here is provided by the
Curator. (Note: The SRL system has been trained using the
Charniak parser.)
</li>
<li>
<i>WordNetConfig</i>: Specifies the XML file that provides the
configuration for the Java WordNet Library (JWNL). An example
configuration file is provided as <code>jwnl_properties.xml</code>. The
path to the WordNet dictionary should be set in this file.
<pre class="src src-xml"><<span style="color: #0000ff;">param</span> <span style="color: #a0522d;">name</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">dictionary_path</span><span style="color: #8b2252;">"</span> <span style="color: #a0522d;">value</span>=<span style="color: #8b2252;">"</span><span style="color: #8b2252;">/path/to/wordnet/dict/here</span><span style="color: #8b2252;">"</span>/>
</pre>
</li>
<li>
<i>LoadWordNetConfigFromClassPath</i>: Specifies whether the WordNet
config file specified in <i>WordNetConfig</i> should be loaded from
the classpath. This property can take either <code>true</code> or <code>false</code>
values. If <code>true</code>, the system will look for the WordNet
configuration file in the classpath. If <code>false</code> or if the
property is not present, it loads the file from the filesystem.
</li>
<li>
<i>Inference</i>: This can either be <code>BeamSearch</code> or <code>ILP</code> and decides
the inference algorithm that is used to make the final
prediction. If the choice is <code>BeamSearch</code>, a built-in beam
search engine is used for inference. If the choice is <code>ILP</code>,
then the Gurobi ILP solver will be used. (Note: To use ILP
inference, the Gurobi engine needs to be configured.)
</li>
<li>
<i>BeamSize</i>: Specifies the beam size if beam search inference is
chosen. Otherwise, this option is ignored.
</li>
<li>
<i>TrimLeadingPrepositions</i>: Specifies whether the leading
prepositions of arguments should be trimmed. If this is set to
<code>true</code>, then a sentence like "John bought a car from Mary on
Thursday for 2000 dollars." would be analyzed as "bought(A0: John,
A1: a car, A2: Mary, A3: 2000 dollars, AM-TMP: Thursday)". If this is
set to <code>false</code> (or if the option is not present), then the
leading prepositions are included. This gives "bought(A0: John, A1: a
car, A2: from Mary, A3: for 2000 dollars, AM-TMP: on Thursday)". This
option applies to both verbs and nouns.
</li>
</ol>
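<p>
As an illustration, the following shell snippet writes a sample
<code>srl-config.properties</code> that sets each of the options listed
above. It is only a sketch: it assumes that the file uses the standard
Java properties syntax (<code>key = value</code>), and the host, port and
beam size shown are placeholder values, not defaults documented in
this manual.
</p>
<pre class="src src-sh"># Write a sample configuration file. All values are illustrative assumptions.
printf '%s\n' \
  'CuratorHost = localhost' \
  'CuratorPort = 9010' \
  'DefaultParser = Charniak' \
  'WordNetConfig = jwnl_properties.xml' \
  'LoadWordNetConfigFromClassPath = false' \
  'Inference = BeamSearch' \
  'BeamSize = 10' \
  'TrimLeadingPrepositions = false' \
  > srl-config.properties
</pre>
<p>
A file written this way could then be passed to the system with the
<code>-c</code> option, for example <code>./srl.sh -v -i -c srl-config.properties</code>.
If <i>Inference</i> were set to <code>ILP</code> instead, the <i>BeamSize</i> entry
would be ignored.
</p>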
</div>
</div>
<div id="outline-container-2_3" class="outline-3">
<h3 id="sec-2_3"><span class="section-number-3">2.3</span> Modes of use </h3>
<div class="outline-text-3" id="text-2_3">
<p>For all three modes, either the <code>-v</code> or the <code>-n</code> argument is required to
indicate verb or nominal SRL, respectively.
</p>
</div>
<div id="outline-container-2_3_1" class="outline-4">
<h4 id="sec-2_3_1"><span class="section-number-4">2.3.1</span> As a Curator plugin </h4>
<div class="outline-text-4" id="text-2_3_1">
<p>To start the SRL system as a curator plugin, run the following command:
</p>
<pre class="src src-sh">./srl.sh [-v |-n ] -s <port-number> [-t <number-of-threads>]
</pre>
<p>
The number of threads is optional and defaults to one.
</p>
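<p>
For instance, a verb SRL server could be started as sketched below. The
port <code>15000</code> and the thread count <code>4</code> are arbitrary
placeholder values, and the guard simply skips the launch when
<code>srl.sh</code> is not present in the current directory.
</p>
<pre class="src src-sh">SRL_PORT=15000   # placeholder: any free port on the SRL host
SRL_THREADS=4    # placeholder: omit -t to use the default of one thread

# Start the verb SRL server only if the launcher script is available.
if [ -x ./srl.sh ]; then
  ./srl.sh -v -s "$SRL_PORT" -t "$SRL_THREADS"
fi
</pre>
<p>
The port chosen here is the one that the Curator annotator descriptor
would need to reference in its host entry.
</p>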
<p>
After the server starts, the Curator instance can be configured to
use it to serve SRL outputs. The following XML snippet should be
added to the Curator annotator descriptor file (with appropriate
type, host and port entries):
</p>
<pre class="src src-xml"><<span style="color: #0000ff;">annotator</span>>
<<span style="color: #0000ff;">type</span>>parser</<span style="color: #0000ff;">type</span>>
<<span style="color: #0000ff;">field</span>>srl</<span style="color: #0000ff;">field</span>>
<<span style="color: #0000ff;">host</span>>srl-host:srlport</<span style="color: #0000ff;">host</span>>
<<span style="color: #0000ff;">requirement</span>>sentences</<span style="color: #0000ff;">requirement</span>>
<<span style="color: #0000ff;">requirement</span>>tokens</<span style="color: #0000ff;">requirement</span>>
<<span style="color: #0000ff;">requirement</span>>pos</<span style="color: #0000ff;">requirement</span>>
<<span style="color: #0000ff;">requirement</span>>ner</<span style="color: #0000ff;">requirement</span>>
<<span style="color: #0000ff;">requirement</span>>chunk</<span style="color: #0000ff;">requirement</span>>
<<span style="color: #0000ff;">requirement</span>>charniak</<span style="color: #0000ff;">requirement</span>>
</<span style="color: #0000ff;">annotator</span>>
</pre>
</div>
</div>
<div id="outline-container-2_3_2" class="outline-4">
<h4 id="sec-2_3_2"><span class="section-number-4">2.3.2</span> As a batch annotator </h4>
<div class="outline-text-4" id="text-2_3_2">
<p>The SRL system can be used to annotate several sentences as a batch
by running it on an input file that contains a set of sentences. In
this mode, the system writes the SRL annotation in a CoNLL-style
column format.
</p>
<p>
The following command runs the SRL in batch mode:
</p>
<pre class="src src-sh">./srl.sh [-v | -n ] -b <input-file> -o <output-file> [-w]
</pre>
<p>
Each line in the input file is treated as a separate sentence. The
option <code>-w</code> indicates that the sentences in the input file are
whitespace-tokenized. Otherwise, the Curator is asked to provide
the tokenization.
</p>
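<p>
A batch run over pre-tokenized input might look like the sketch below.
The file names <code>sentences.txt</code> and <code>sentences.srl</code> and the
example sentences are made up for illustration; the guard skips the
annotation step when <code>srl.sh</code> is not present in the current
directory.
</p>
<pre class="src src-sh"># Create an input file with one whitespace-tokenized sentence per line.
printf '%s\n' \
  'John bought a car from Mary on Thursday for 2000 dollars .' \
  'The committee approved the proposal .' \
  > sentences.txt

# Run verb SRL in batch mode; -w marks the input as already tokenized.
if [ -x ./srl.sh ]; then
  ./srl.sh -v -b sentences.txt -o sentences.srl -w
fi
</pre>
<p>
Dropping <code>-w</code> would instead leave tokenization to the Curator, and
replacing <code>-v</code> with <code>-n</code> would produce nominal SRL annotations.
</p>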
</div>
</div>
<div id="outline-container-2_3_3" class="outline-4">
<h4 id="sec-2_3_3"><span class="section-number-4">2.3.3</span> Interactive mode </h4>
<div class="outline-text-4" id="text-2_3_3">
<p>The SRL system can be used in an interactive mode by running it
with the <code>-i</code> option.
</p>
</div>
</div>
</div>
</div>
<div id="outline-container-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Papers that used this software </h2>
<div class="outline-text-2" id="text-3">
<p>The following papers have used an earlier version of this software:
</p>
<ul>
<li>
G. Kundu and D. Roth, <i>Adapting Text Instead of the Model: An Open Domain Approach</i>. Proceedings of the Conference on Computational
Natural Language Learning (CoNLL), 2011.
</li>
<li>
V. Srikumar and D. Roth, <i>A Joint Model for Extended Semantic Role
Labeling</i>. Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), 2011.
</li>
</ul>
<p>
If you use this package, please let me know and I will add the
reference to this list.
</p>
</div>
</div>
<div id="outline-container-4" class="outline-2">
<h2 id="sec-4"><span class="section-number-2">4</span> References </h2>
<div class="outline-text-2" id="text-4">
<ol>
<li>
V. Punyakanok, D. Roth and W. Yih, <i>The Importance of Syntactic Parsing and Inference in Semantic Role Labeling</i>. Computational
Linguistics, 2008.
</li>
<li>
A. Meyers, <i>Those Other NomBank Dictionaries</i>. Technical report,
New York University, 2007.
</li>
</ol>
</div>
</div>
<div id="postamble">
<p class="author"> Author: Vivek Srikumar
</p>
<p class="date"> Date: </p>
<p class="creator">HTML generated by org-mode 7.4 in emacs 23</p>
</div>
</div>
</body>
</html>