Commit 9c03a575 authored by Christos Christodoulopoulos's avatar Christos Christodoulopoulos

Merge branch 'shyam' into 'master'

Shyam

Now uses illinois-sl's cleaner interface and the Stanford parser instead of Charniak. Moved to v5.1.4.

Also works without assertions enabled.

See merge request !1
parents 0bd55177 ae89da42
Tags v.5.0
Showing 896 additions and 391 deletions
@@ -13,3 +13,4 @@ local.properties
### JetBrains
*.iml
.idea/
/bin/
Version 5.1.4
Switched entirely to illinois-sl for structured prediction (removed JLIS traces)
Using the latest AnnotatorService from illinois-core-utilities for both Curator & pipeline annotation
Major code cleanup
Version 5.1
Added JUnit tests
Removed unnecessary dependencies
# illinois-srl: Semantic Role Labeler
### Running
You can use the **illinois-srl** system in either *interactive* or *annotator* mode.
#### Interactive mode
In *interactive mode* the user can input a single piece of text and get back feedback from either
the **Nom**inal or **Verb**al SRL system in plain text.
To run the system in *interactive mode* see the class `edu.illinois.cs.cogcomp.srl.SemanticRoleLabeler`
or simply run the script:
```
scripts/run-interactive.sh <config> <Verb|Nom>
```
#### As an `Annotator` component
**illinois-srl** can also be used programmatically through the
[Annotator interface](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/Annotator.html).
The main method is `getView(TextAnnotation)` inside `SemanticRoleLabeler`. This will add a new
[`PredicateArgumentView`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/core/datastructures/textannotation/PredicateArgumentView.html)
for either **Nom**inal or **Verb**al SRL.
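For example, a minimal usage sketch (hypothetical: the constructor arguments shown here are assumptions, not the repository's exact API) could look like:
```
// Hypothetical sketch -- assumes a constructor taking a config file and SRL type
SemanticRoleLabeler verbSRL = new SemanticRoleLabeler("config/srl-config.properties", "Verb");
verbSRL.getView(ta); // adds a PredicateArgumentView for Verb SRL to the TextAnnotation `ta`
PredicateArgumentView srlView = (PredicateArgumentView) ta.getView(ViewNames.SRL_VERB);
```
Here `ta` is assumed to be a `TextAnnotation` that has already been populated with the prerequisite views (POS, parse, etc.).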
### Training
To train the SRL system you need access to the [Propbank](https://verbs.colorado.edu/~mpalmer/projects/ace.html)
or [Nombank](http://nlp.cs.nyu.edu/meyers/NomBank.html) corpora. You need to set pointers to these in the
`config/srl-config.properties` file.
(To train the system on a corpus other than Propbank/Nombank, you need to extend
[`AbstractSRLAnnotationReader`](http://cogcomp.cs.illinois.edu/software/doc/illinois-core-utilities/apidocs/edu/illinois/cs/cogcomp/nlp/corpusreaders/AbstractSRLAnnotationReader.html).)
To perform the whole training/testing suite, run the `Main` class with parameters `<config-file> expt Verb|Nom true`.
This will:
1. Read and cache the datasets (train/test)
2. Annotate each `TextAnnotation` with the required views
(here you can set the `useCurator` flag to false to use CogComp's standalone NLP pipeline)
3. Pre-extract and cache the features for the classifiers
4. Train the classifiers
5. Evaluate on the (cached) test corpus
**IMPORTANT** After training, make sure you comment out the pre-trained SRL model dependencies inside
`pom.xml` (lines 27-38).
# Available learning models: {L2LossSSVM, StructuredPerceptron}
LEARNING_MODEL = L2LossSSVM
# Available solver types: {DCDSolver, ParallelDCDSolver, DEMIParallelDCDSolver}
L2_LOSS_SSVM_SOLVER_TYPE = ParallelDCDSolver
NUMBER_OF_THREADS = 8
# Regularization parameter
C_FOR_STRUCTURE = 1.0
# Mini-batch for 'warm' start
TRAINMINI = true
TRAINMINI_SIZE = 10000
# Suppress optimality check
CHECK_INFERENCE_OPT = false
# Number of training rounds
MAX_NUM_ITER = 100
# CoNLL config file
# Required fields
configFilename finalSystemBILOU
pathToModelFile data/Models/CoNLL
taggingEncodingScheme BILOU
tokenizationScheme DualTokenizationScheme
# Optional fields
beamSize 5
forceNewSentenceOnLineBreaks true
labelTypes PER ORG LOC MISC
logging false
# debuggingLogPath irrelevant
inferenceMethod GREEDY
normalizeTitleText false
pathToTokenNormalizationData brown-clusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt
predictionConfidenceThreshold -1
sortLexicallyFilesInFolders true
thresholdPrediction false
treatAllFilesInFolderAsOneBigDocument true
debug true
# Features
Forms 1
Capitalization 1
WordTypeInformation 1
Affixes 1
PreviousTag1 1
PreviousTag2 1
PreviousTagPatternLevel1 1
PreviousTagPatternLevel2 1
AggregateContext 0
AggregateGazetteerMatches 0
PrevTagsForContext 1
PredictionsLevel1 1
# Feature groups
BrownClusterPaths 1
isLowercaseBrownClusters false false false
pathsToBrownClusters brown-clusters/brown-english-wikitext.case-intact.txt-c1000-freq10-v3.txt brown-clusters/brownBllipClusters brown-clusters/brown-rcv1.clean.tokenized-CoNLL03.txt-c1000-freq1.txt
minWordAppThresholdsForBrownClusters 5 5 5
GazetteersFeatures 1
pathToGazetteersLists ner-ext/KnownLists
WordEmbeddings 0
# pathsToWordEmbeddings WordEmbedding/model-2280000000.LEARNING_RATE=1e-08.EMBEDDING_LEARNING_RATE=1e-07.EMBEDDING_SIZE=50.gz
# embeddingDimensionalities 50
# minWordAppThresholdsForEmbeddings 0
# normalizationConstantsForEmbeddings 1.0
# normalizationMethodsForEmbeddings OVERALL
# isLowercaseWordEmbeddings false
usePos true
useChunker true
useLemmatizer true
useNer true
useStanfordParse true
lemmaCacheFile data/lemmaCache.txt
updateLemmaCacheFile false
maxLemmaCacheEntries 10000
wordnetPath data/WordNet
nerConfigFile config/ner-conll-config.properties
@@ -6,21 +6,18 @@
# Whether to use the Illinois Curator to get the required annotations for training/testing
# If set to false, Illinois NLP pipeline will be used
UseCurator = true
# The host and port of Curator. If UseCurator is false, make sure you have the pipeline config file set
CuratorHost = trollope.cs.illinois.edu
CuratorPort = 9010
# The file containing the configuration for the Illinois NLP pipeline
PipelineConfigFile = config/pipeline-config.properties
UseCurator = false
# The parser used to extract constituents and syntactic features
# Options are: Charniak, Berkeley, Stanford
# NB: Only Stanford can be used in standalone mode.
DefaultParser = Charniak
DefaultParser = Stanford
WordNetConfig = jwnl_properties.xml
# The configuration for the Structured learner
LearnerConfig = config/learner.properties
### Training corpora directories ###
# This is the directory of the merged (mrg) WSJ files
PennTreebankHome = /shared/corpora/corporaWeb/treebanks/eng/pennTreebank/treebank-3/parsed/mrg/wsj/
@@ -29,13 +26,10 @@ NombankHome = /shared/corpora/corporaWeb/treebanks/eng/nombank/
# The directory of the sentence and pre-extracted features database (~5G of space required)
# Not used during test/working with pre-trained models
# TODO Change this when done
CacheDirectory = /scratch/illinoisSRL/cache
CacheDirectory = cache
ModelsDirectory = models
# Directory to output gold and predicted files for manual comparison
# Comment out for no output
OutputDirectory = srl-out
MaxInferenceRounds = 200
OutputDirectory = srl-out
@@ -69,11 +69,12 @@ public class IllinoisNomSRLHandler extends IllinoisAbstractHandler implements Pa
public Forest parseRecord(Record record) throws AnnotationFailedException,
TException {
try {
return srlSystem.getSRLForest(record);
// return srlSystem.getSRLForest(record);
} catch(Exception e) {
logger.error("Error annotating record", e);
throw new AnnotationFailedException(e.getMessage());
}
return null;
}
@@ -69,11 +69,12 @@ public class IllinoisVerbSRLHandler extends IllinoisAbstractHandler implements P
public Forest parseRecord(Record record) throws AnnotationFailedException,
TException {
try {
return srlSystem.getSRLForest(record);
// return srlSystem.getSRLForest(record);
} catch(Exception e) {
logger.error("Error annotating record", e);
throw new AnnotationFailedException(e.getMessage());
}
return null;
}
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-srl</artifactId>
<packaging>jar</packaging>
<version>5.1.2</version>
<url>http://cogcomp.cs.illinois.edu</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<repositories>
<repository>
<id>CogcompSoftware</id>
<name>CogcompSoftware</name>
<url>http://cogcomp.cs.illinois.edu/m2repo/</url>
</repository>
</repositories>
<dependencies>
<!--Include the pre-trained SRL models for running SemanticRoleLabeler-->
<!--<dependency>-->
<!--<groupId>edu.illinois.cs.cogcomp</groupId>-->
<!--<artifactId>illinois-srl</artifactId>-->
<!--<classifier>models-verb-stanford</classifier>-->
<!--<version>5.0</version>-->
<!--</dependency>-->
<!--<dependency>-->
<!--<groupId>edu.illinois.cs.cogcomp</groupId>-->
<!--<artifactId>illinois-srl</artifactId>-->
<!--<classifier>models-nom-stanford</classifier>-->
<!--<version>5.0</version>-->
<!--</dependency>-->
<!--The Illinois pipeline can be used instead -->
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-nlp-pipeline</artifactId>
<version>0.1.1</version>
</dependency>
<dependency>
<groupId>com.gurobi</groupId>
<artifactId>gurobi</artifactId>
<version>6.0</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-common-resources</artifactId>
<classifier>illinoisSRL</classifier>
<version>1.4</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>IllinoisSL-core</artifactId>
<version>0.1-withJLIS</version>
<!--TODO Remove this when IllinoisSL declares logback optional-->
<exclusions>
<exclusion>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
</exclusion>
<exclusion>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.2.140</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>edison</artifactId>
<version>0.7.8</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>inference</artifactId>
<version>0.4.1</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.6.1</version>
<optional>true</optional>
</dependency>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-srl</artifactId>
<packaging>jar</packaging>
<version>5.1.4</version>
<url>http://cogcomp.cs.illinois.edu</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<repositories>
<repository>
<id>CogcompSoftware</id>
<name>CogcompSoftware</name>
<url>http://cogcomp.cs.illinois.edu/m2repo/</url>
</repository>
</repositories>
<dependencies>
<!-- Include the pre-trained SRL models for running SemanticRoleLabeler -->
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-srl</artifactId>
<classifier>models-verb-stanford</classifier>
<version>5.1.4</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-srl</artifactId>
<classifier>models-nom-stanford</classifier>
<version>5.1.4</version>
</dependency>
<!--The Illinois pipeline can be used instead -->
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-nlp-pipeline</artifactId>
<version>0.1.9</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-core-utilities</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-curator</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>com.gurobi</groupId>
<artifactId>gurobi</artifactId>
<version>6.0</version>
<optional>true</optional>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-common-resources</artifactId>
<classifier>illinoisSRL</classifier>
<version>1.4</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>illinois-sl</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>1.2.140</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>edison</artifactId>
<version>1.7.9</version>
</dependency>
<dependency>
<groupId>edu.illinois.cs.cogcomp</groupId>
<artifactId>inference</artifactId>
<version>0.4.2</version>
</dependency>
<dependency>
<groupId>org.tartarus</groupId>
<artifactId>snowball</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.7</version>
<optional>true</optional>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.0.2</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.1.2</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
</resources>
<extensions>
<extension>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-ssh</artifactId>
<version>2.4</version>
</extension>
</extensions>
</build>
<distributionManagement>
<repository>
<id>CogcompSoftware</id>
<name>CogcompSoftware</name>
<url>scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo</url>
</repository>
</distributionManagement>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.0.2</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.1.2</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
</resources>
<extensions>
<extension>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-ssh</artifactId>
<version>2.4</version>
</extension>
</extensions>
</build>
<reporting>
<excludeDefaults>true</excludeDefaults>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.10.3</version>
</plugin>
</plugins>
</reporting>
<distributionManagement>
<repository>
<id>CogcompSoftware</id>
<name>CogcompSoftware</name>
<url>scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo</url>
</repository>
<site>
<id>CogcompSoftwareDoc</id>
<url>scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/software/doc/${project.artifactId}</url>
</site>
</distributionManagement>
</project>
#!/bin/bash -e
VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | grep -v 'INFO'`
tmpdir=tmp-Srl-verb-$RANDOM
rm -rdf ${tmpdir}
mkdir -p ${tmpdir}/models
for parser in STANFORD CHARNIAK; do
cp ./models/Verb*${parser}* ${tmpdir}/models
cd ${tmpdir}
rm -rdf ../target/illinois-srl-models-verb-${parser}-${VERSION}.jar
jar cf ../target/illinois-srl-models-verb-${parser}-${VERSION}.jar models
cd ..
rm ${tmpdir}/models/*
done
rm -rdf ${tmpdir}
tmpdir=tmp-Srl-nom-$RANDOM
rm -rdf ${tmpdir}
mkdir -p ${tmpdir}/models
for parser in STANFORD CHARNIAK; do
cp ./models/Nom*${parser}* ${tmpdir}/models
cd ${tmpdir}
rm -rdf ../target/illinois-srl-models-nom-${parser}-${VERSION}.jar
jar cf ../target/illinois-srl-models-nom-${parser}-${VERSION}.jar models
cd ..
rm ${tmpdir}/models/*
done
rm -rdf ${tmpdir}
@@ -2,6 +2,51 @@
VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | grep -v 'INFO'`
tmpdir=tmp-Srl-verb-$RANDOM
rm -rdf ${tmpdir}
mkdir -p ${tmpdir}/models
for parser in STANFORD CHARNIAK; do
if [ ! -e "./models/Verb.Classifier.PARSE_${parser}.lex" ]; then
echo "$parser Verb models not found"
continue
fi
cp ./models/Verb*${parser}* ${tmpdir}/models
cd ${tmpdir}
rm -rdf ../target/illinois-srl-models-verb-${parser}-${VERSION}.jar
jar cf ../target/illinois-srl-models-verb-${parser}-${VERSION}.jar models
cd ..
rm ${tmpdir}/models/*
done
rm -rdf ${tmpdir}
tmpdir=tmp-Srl-nom-$RANDOM
rm -rdf ${tmpdir}
mkdir -p ${tmpdir}/models
for parser in STANFORD CHARNIAK; do
if [ ! -e "./models/Nom.Classifier.PARSE_${parser}.lex" ]; then
echo "$parser Nom models not found"
continue
fi
cp ./models/Nom*${parser}* ${tmpdir}/models
cd ${tmpdir}
rm -rdf ../target/illinois-srl-models-nom-${parser}-${VERSION}.jar
jar cf ../target/illinois-srl-models-nom-${parser}-${VERSION}.jar models
cd ..
rm ${tmpdir}/models/*
done
rm -rdf ${tmpdir}
echo "Compiled models to jars"
if [ -e "target/illinois-srl-models-nom-CHARNIAK-${VERSION}.jar" ]; then
echo "Deploying illinois-srl-models-nom-CHARNIAK-${VERSION}.jar"
mvn deploy:deploy-file \
-Dfile=target/illinois-srl-models-nom-CHARNIAK-${VERSION}.jar \
@@ -12,7 +57,9 @@ mvn deploy:deploy-file \
-Dpackaging=jar \
-Durl=scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo \
-DrepositoryId=CogcompSoftware
fi
if [ -e "target/illinois-srl-models-nom-STANFORD-${VERSION}.jar" ]; then
echo "Deploying illinois-srl-models-nom-STANFORD-${VERSION}.jar"
mvn deploy:deploy-file \
-Dfile=target/illinois-srl-models-nom-STANFORD-${VERSION}.jar \
@@ -23,7 +70,9 @@ mvn deploy:deploy-file \
-Dpackaging=jar \
-Durl=scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo \
-DrepositoryId=CogcompSoftware
fi
if [ -e "target/illinois-srl-models-verb-CHARNIAK-${VERSION}.jar" ]; then
echo "Deploying illinois-srl-models-verb-CHARNIAK-${VERSION}.jar"
mvn deploy:deploy-file \
-Dfile=target/illinois-srl-models-verb-CHARNIAK-${VERSION}.jar \
@@ -34,7 +83,9 @@ mvn deploy:deploy-file \
-Dpackaging=jar \
-Durl=scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo \
-DrepositoryId=CogcompSoftware
fi
if [ -e "target/illinois-srl-models-verb-STANFORD-${VERSION}.jar" ]; then
echo "Deploying illinois-srl-models-verb-STANFORD-${VERSION}.jar"
mvn deploy:deploy-file \
-Dfile=target/illinois-srl-models-verb-STANFORD-${VERSION}.jar \
@@ -44,4 +95,5 @@ mvn deploy:deploy-file \
-Dclassifier=models-verb-stanford \
-Dpackaging=jar \
-Durl=scp://bilbo.cs.illinois.edu:/mounts/bilbo/disks/0/www/cogcomp/html/m2repo \
-DrepositoryId=CogcompSoftware
-DrepositoryId=CogcompSoftware
fi
#!/bin/bash
mvn compile
mvn clean compile
mvn -q dependency:copy-dependencies
CP=target/classes:config:target/dependency/*
#!/bin/bash
mvn compile
mvn clean compile
mvn -q dependency:copy-dependencies
CP=target/classes:config:target/dependency/*
MEMORY="-Xmx25g"
MEMORY="-Xmx100g"
OPTIONS="-ea $MEMORY -cp $CP "
#OPTIONS="-ea $MEMORY -cp $CP "
OPTIONS="$MEMORY -cp $CP "
MAINCLASS=edu.illinois.cs.cogcomp.srl.Main
#MAINCLASS=edu.illinois.cs.cogcomp.srl.SemanticRoleLabeler
time nice java $OPTIONS $MAINCLASS "$@"
package edu.illinois.cs.cogcomp.srl;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import edu.illinois.cs.cogcomp.core.datastructures.ViewNames;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.Constituent;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.PredicateArgumentView;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.Relation;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.Sentence;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.SpanLabelView;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotationUtilities;
import edu.illinois.cs.cogcomp.core.datastructures.trees.Tree;
import edu.illinois.cs.cogcomp.edison.features.helpers.ParseHelper;
import edu.illinois.cs.cogcomp.edison.features.helpers.WordHelpers;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.CoNLLColumnFormatReader;
import edu.illinois.cs.cogcomp.nlp.utilities.ParseUtils;
/**
* Prints text annotation formatted with one word per line as follows.
* <blockquote> <code>
* form POS full-parse chunk NE verb-sense verb-lemma [verb1-args
* [verb2-args ... ]]
* </code> </blockquote>
*
* @author Vivek Srikumar
*/
public class ColumnFormatWriter {
private final String predicateArgumentViewName;
private final String parseViewName;
public ColumnFormatWriter(String parseViewName,
String predicateArgumentViewName) {
this.parseViewName = parseViewName;
this.predicateArgumentViewName = predicateArgumentViewName;
}
public ColumnFormatWriter() {
this(ViewNames.PARSE_CHARNIAK, ViewNames.SRL_VERB);
}
public void transform(Iterable<TextAnnotation> reader, PrintWriter out)
throws Exception {
for (TextAnnotation ta : reader) {
transform(ta, out);
}
}
public void transform(TextAnnotation ta, PrintWriter out) throws Exception {
String[][] columns = transformToColumns(ta);
printFormatted(columns, out, ta);
}
public void printPredicateArgumentView(PredicateArgumentView pav,
PrintWriter out) {
// System.out.println("*" + pav + "*");
List<String[]> columns = new ArrayList<>();
convertPredicateArgView(pav.getTextAnnotation(), pav, columns, false);
String[][] tr = transpose(columns, pav.getTextAnnotation().size());
printFormatted(tr, out, pav.getTextAnnotation());
}
private void printFormatted(String[][] columns, PrintWriter out, TextAnnotation ta) {
// leftOfStar: length of everything before the asterisk.
// rightOfStar: length of asterisk and what comes after.
int[] leftOfStar = new int[columns[0].length];
int[] rightOfStart = new int[columns[0].length];
for (String[] rowData : columns) {
for (int col = 0; col < rowData.length; col++) {
String word = rowData[col];
int starPos = word.indexOf("*");
int lenLeft, lenRight;
if (starPos < 0) {
lenLeft = word.length();
lenRight = -1;
} else {
lenLeft = starPos + 1;
lenRight = word.length() - starPos + 1;
}
if (leftOfStar[col] < lenLeft)
leftOfStar[col] = lenLeft;
if (rightOfStart[col] < lenRight)
rightOfStart[col] = lenRight;
}
}
// System.out.println("here");
assert ta.size() == columns.length;
for (int sentenceId = 0; sentenceId < ta.getNumberOfSentences(); sentenceId++) {
int start = ta.getSentence(sentenceId).getStartSpan();
for (int row = start; row < ta.getSentence(sentenceId).getEndSpan(); row++) {
String[] rowData = columns[row];
out.print(rowData[0]);
// print the spaces
for (int spCount = rowData[0].length(); spCount < leftOfStar[0]; spCount++)
out.print(" ");
out.print(" " + rowData[1]);
// print the spaces
for (int spCount = rowData[1].length(); spCount < leftOfStar[1]; spCount++)
out.print(" ");
out.print(" ");
for (int colId = 2; colId < rowData.length; colId++) {
String word = rowData[colId];
int starPos = word.indexOf("*");
int leftSpaces, rightSpaces;
leftSpaces = leftOfStar[colId];
rightSpaces = rightOfStart[colId];
if (rightSpaces == 0)
leftSpaces = 0;
else
leftSpaces -= starPos;
if (rightSpaces == 0) {
rightSpaces = leftOfStar[colId] - word.length();
} else {
rightSpaces -= (word.length() - starPos);
}
for (int i = 0; i < leftSpaces - 1; i++)
out.print(" ");
out.print(word + " ");
for (int i = 0; i < rightSpaces; i++)
out.print(" ");
}
out.println();
}
out.println();
}
}
private String[][] transpose(List<String[]> columns, int size) {
String[][] output = new String[size][];
for (int i = 0; i < size; i++) {
output[i] = new String[columns.size()];
}
for (int row = 0; row < size; row++) {
for (int col = 0; col < columns.size(); col++) {
output[row][col] = columns.get(col)[row];
}
}
return output;
}
/**
* Return a table. Numrows = number of words. Num Cols depends on how many
* predicate arg relations we have
*/
private String[][] transformToColumns(TextAnnotation ta) {
List<String[]> columns = new ArrayList<>();
// first the words
String[] form = new String[ta.size()];
String[] pos = new String[ta.size()];
for (int i = 0; i < ta.size(); i++) {
form[i] = WordHelpers.getWord(ta, i);
pos[i] = WordHelpers.getPOS(ta, i);
}
columns.add(form);
columns.add(pos);
// now the parse
String[] parse = getParse(ta);
columns.add(parse);
// add the chunks
String[] chunk = getChunkData(ta);
columns.add(chunk);
// add the ner. For now, we don't have ner annotation
String[] ne = getNEData(ta);
columns.add(ne);
// if (ta.hasView(ViewNames.SRL_NOM))
// // add the predicate argument column information
// addPredicateArgs(columns, ta);
// add the predicate argument column information
addPredicateArgs(columns, ta);
return transpose(columns, ta.size());
}
private static String[] getNEData(TextAnnotation ta) {
if (!ta.hasView(ViewNames.NER_CONLL)) {
String[] chunk = new String[ta.size()];
for (int i = 0; i < ta.size(); i++) {
chunk[i] = "*";
}
return chunk;
}
SpanLabelView nerView = (SpanLabelView) ta.getView(ViewNames.NER_CONLL);
List<Constituent> nerConstituents = nerView.getConstituents();
Collections.sort(nerConstituents,
TextAnnotationUtilities.constituentStartComparator);
Map<Integer, String> cc = new HashMap<>();
for (Constituent c : nerConstituents) {
for (int i = c.getStartSpan(); i < c.getEndSpan(); i++) {
if (i == c.getStartSpan())
cc.put(i, "(" + c.getLabel());
else
cc.put(i, "");
cc.put(i, cc.get(i) + "*");
if (i == c.getEndSpan() - 1)
cc.put(i, cc.get(i) + ")");
}
}
String[] ner = new String[ta.size()];
for (int i = 0; i < ta.size(); i++) {
if (cc.containsKey(i)) {
ner[i] = cc.get(i);
} else
ner[i] = "*";
}
return ner;
}
private static String[] getChunkData(TextAnnotation ta) {
if (!ta.hasView(ViewNames.SHALLOW_PARSE)) {
String[] chunk = new String[ta.size()];
for (int i = 0; i < ta.size(); i++) {
chunk[i] = "*";
}
return chunk;
}
SpanLabelView chunkView = (SpanLabelView) ta.getView(ViewNames.SHALLOW_PARSE);
List<Constituent> chunkConstituents = chunkView.getConstituents();
Collections.sort(chunkConstituents, TextAnnotationUtilities.constituentStartComparator);
Map<Integer, String> cc = new HashMap<>();
for (Constituent c : chunkConstituents) {
for (int i = c.getStartSpan(); i < c.getEndSpan(); i++) {
if (i == c.getStartSpan())
cc.put(i, "(" + c.getLabel());
else
cc.put(i, "");
cc.put(i, cc.get(i) + "*");
if (i == c.getEndSpan() - 1)
cc.put(i, cc.get(i) + ")");
}
}
String[] chunk = new String[ta.size()];
for (int i = 0; i < ta.size(); i++) {
if (cc.containsKey(i)) {
chunk[i] = cc.get(i);
} else
chunk[i] = "*";
}
return chunk;
}
private void addPredicateArgs(List<String[]> columns, TextAnnotation ta) {
PredicateArgumentView predArgView = null;
if (ta.hasView(predicateArgumentViewName))
predArgView = (PredicateArgumentView) ta.getView(predicateArgumentViewName);
convertPredicateArgView(ta, predArgView, columns, true);
}
private void convertPredicateArgView(TextAnnotation ta,
PredicateArgumentView pav, List<String[]> columns, boolean addSense) {
List<Constituent> predicates = new ArrayList<>();
if (pav != null)
predicates = pav.getPredicates();
Collections.sort(predicates, TextAnnotationUtilities.constituentStartComparator);
int size = ta.size();
addPredicateInfo(columns, predicates, size, addSense);
for (Constituent predicate : predicates) {
assert pav != null;
List<Relation> args = pav.getArguments(predicate);
String[] paInfo = addPredicateArgInfo(predicate, args, size);
columns.add(paInfo);
}
}
private void addPredicateInfo(List<String[]> columns,
List<Constituent> predicates, int size, boolean addSense) {
Map<Integer, String> senseMap = new HashMap<>();
Map<Integer, String> lemmaMap = new HashMap<>();
for (Constituent c : predicates) {
senseMap.put(c.getStartSpan(), c.getAttribute(CoNLLColumnFormatReader.SenseIdentifer));
lemmaMap.put(c.getStartSpan(), c.getAttribute(CoNLLColumnFormatReader.LemmaIdentifier));
}
String[] sense = new String[size];
String[] lemma = new String[size];
for (int i = 0; i < size; i++) {
if (lemmaMap.containsKey(i)) {
sense[i] = senseMap.get(i);
lemma[i] = lemmaMap.get(i);
} else {
sense[i] = "-";
lemma[i] = "-";
}
}
if (addSense)
columns.add(sense);
columns.add(lemma);
}
private String[] addPredicateArgInfo(Constituent predicate,
List<Relation> args, int size) {
Map<Integer, String> paInfo = new HashMap<>();
paInfo.put(predicate.getStartSpan(), "(V*)");
for (Relation r : args) {
String argPredicate = r.getRelationName();
argPredicate = argPredicate.replaceAll("ARG", "A");
argPredicate = argPredicate.replaceAll("Support", "SUP");
for (int i = r.getTarget().getStartSpan(); i < r.getTarget().getEndSpan(); i++) {
paInfo.put(i, "*");
if (i == r.getTarget().getStartSpan())
paInfo.put(i, "(" + argPredicate + paInfo.get(i));
if (i == r.getTarget().getEndSpan() - 1)
paInfo.put(i, paInfo.get(i) + ")");
}
}
String[] paColumn = new String[size];
for (int i = 0; i < size; i++) {
if (paInfo.containsKey(i))
paColumn[i] = paInfo.get(i);
else
paColumn[i] = "*";
}
return paColumn;
}
private String[] getParse(TextAnnotation ta) {
String[] parse = new String[ta.size()];
for (int sentenceId = 0; sentenceId < ta.getNumberOfSentences(); sentenceId++) {
Tree<String> tree = ParseHelper.getParseTree(parseViewName, ta, sentenceId);
Sentence sentence = ta.getSentence(sentenceId);
tree = ParseUtils.snipNullNodes(tree);
tree = ParseUtils.stripFunctionTags(tree);
String[] treeLines = tree.toString().split("\n");
if (treeLines.length != sentence.size()) {
System.out.println(ta);
System.out.println(ta.getView(parseViewName));
System.out.println("Sentence: " + sentence);
throw new IllegalStateException("Expected " + sentence.size()
+ " tokens, but found " + treeLines.length
+ " tokens in the tree");
}
for (int i = 0; i < treeLines.length; i++) {
String t = treeLines[i].replaceAll(" ", "");
// get rid of the word
t = t.replaceAll("\\([^\\(\\)]*\\)", "*");
// get rid of the pos and replace with a "*"
parse[sentence.getStartSpan() + i] = t;
}
}
return parse;
}
}
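The star-bracket span encoding that `getChunkData`, `getNEData`, and `addPredicateArgInfo` all implement can be demonstrated in isolation. The following self-contained sketch (the class and method names are illustrative, not part of this repository) reproduces just the span-to-column-tag conversion:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone demo of the CoNLL-style star-bracket encoding: every token
// gets "*"; a labeled span opens with "(LABEL" on its first token and
// closes with ")" on its last (both on the same token for 1-token spans).
public class SpanEncodingDemo {

    // Encode labeled [start, end) token spans into one column tag per token.
    static String[] encode(int size, int[][] spans, String[] labels) {
        Map<Integer, String> tags = new HashMap<>();
        for (int s = 0; s < spans.length; s++) {
            int start = spans[s][0], end = spans[s][1];
            for (int i = start; i < end; i++) {
                String tag = (i == start ? "(" + labels[s] : "") + "*";
                if (i == end - 1)
                    tag += ")";
                tags.put(i, tag);
            }
        }
        String[] out = new String[size];
        for (int i = 0; i < size; i++)
            out[i] = tags.getOrDefault(i, "*");
        return out;
    }

    public static void main(String[] args) {
        // "John Smith visited Chicago ." with PER over tokens 0-1 and LOC over token 3
        String[] column = encode(5, new int[][] { { 0, 2 }, { 3, 4 } },
                new String[] { "PER", "LOC" });
        System.out.println(String.join(" ", column)); // (PER* *) * (LOC*) *
    }
}
```

A multi-token span is split across rows as `(PER*` … `*)`, while a single-token span collapses to `(LOC*)` — exactly the shape that `printFormatted` above pads and prints.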
@@ -5,7 +5,7 @@ package edu.illinois.cs.cogcomp.srl;
* TODO Change this before shipping
*/
public class Constants {
public final static String systemVersion = "5.1";
public final static String systemVersion = "5.1.4";
public final static String systemName = "illinoisSRL";
@@ -2,22 +2,24 @@ package edu.illinois.cs.cogcomp.srl;
import edu.illinois.cs.cogcomp.core.datastructures.Lexicon;
import edu.illinois.cs.cogcomp.core.datastructures.Pair;
import edu.illinois.cs.cogcomp.core.datastructures.ViewNames;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.IResetableIterator;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.PredicateArgumentView;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation;
import edu.illinois.cs.cogcomp.core.experiments.ClassificationTester;
import edu.illinois.cs.cogcomp.core.io.IOUtils;
import edu.illinois.cs.cogcomp.core.stats.Counter;
import edu.illinois.cs.cogcomp.core.utilities.commands.CommandDescription;
import edu.illinois.cs.cogcomp.core.utilities.commands.CommandIgnore;
import edu.illinois.cs.cogcomp.core.utilities.commands.InteractiveShell;
import edu.illinois.cs.cogcomp.edison.data.ColumnFormatWriter;
import edu.illinois.cs.cogcomp.edison.data.IResetableIterator;
import edu.illinois.cs.cogcomp.edison.data.srl.NombankReader;
import edu.illinois.cs.cogcomp.edison.data.srl.PropbankReader;
import edu.illinois.cs.cogcomp.edison.sentences.PredicateArgumentView;
import edu.illinois.cs.cogcomp.edison.sentences.TextAnnotation;
import edu.illinois.cs.cogcomp.edison.sentences.ViewNames;
import edu.illinois.cs.cogcomp.infer.ilp.ILPSolverFactory;
import edu.illinois.cs.cogcomp.sl.core.StructuredProblem;
import edu.illinois.cs.cogcomp.sl.inference.AbstractInferenceSolver;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.NombankReader;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.PropbankReader;
import edu.illinois.cs.cogcomp.sl.core.SLParameters;
import edu.illinois.cs.cogcomp.sl.core.SLProblem;
import edu.illinois.cs.cogcomp.sl.learner.Learner;
import edu.illinois.cs.cogcomp.sl.learner.LearnerFactory;
import edu.illinois.cs.cogcomp.sl.util.IFeatureVector;
import edu.illinois.cs.cogcomp.sl.util.WeightVector;
import edu.illinois.cs.cogcomp.srl.caches.FeatureVectorCacheFile;
import edu.illinois.cs.cogcomp.srl.caches.SentenceDBHandler;
......@@ -31,11 +33,13 @@ import edu.illinois.cs.cogcomp.srl.experiment.PruningPreExtractor;
import edu.illinois.cs.cogcomp.srl.experiment.TextPreProcessor;
import edu.illinois.cs.cogcomp.srl.inference.SRLILPInference;
import edu.illinois.cs.cogcomp.srl.inference.SRLMulticlassInference;
import edu.illinois.cs.cogcomp.srl.jlis.SRLFeatureExtractor;
import edu.illinois.cs.cogcomp.srl.jlis.SRLMulticlassInstance;
import edu.illinois.cs.cogcomp.srl.jlis.SRLMulticlassLabel;
import edu.illinois.cs.cogcomp.srl.learn.IdentifierThresholdTuner;
import edu.illinois.cs.cogcomp.srl.learn.JLISLearner;
import edu.illinois.cs.cogcomp.srl.learn.LearnerParameters;
import edu.illinois.cs.cogcomp.srl.nom.NomSRLManager;
import edu.illinois.cs.cogcomp.srl.utilities.PredicateArgumentEvaluator;
import edu.illinois.cs.cogcomp.srl.utilities.WeightVectorUtils;
import edu.illinois.cs.cogcomp.srl.verb.VerbSRLManager;
import org.apache.commons.configuration.ConfigurationException;
import org.slf4j.Logger;
......@@ -56,7 +60,7 @@ public class Main {
@CommandIgnore
public static void main(String[] arguments) throws ConfigurationException {
InteractiveShell<Main> shell = new InteractiveShell<Main>(Main.class);
InteractiveShell<Main> shell = new InteractiveShell<>(Main.class);
if (arguments.length == 0) {
System.err.println("Usage: <config-file> command");
......@@ -98,15 +102,22 @@ public class Main {
if (Boolean.parseBoolean(cacheDatasets)) cacheDatasets();
// Step 2: Iterate between pre-extracting all the features needed for training and training
// We don't need to train a predicate classifier for Verb
if (SRLType.valueOf(srlType) == SRLType.Nom) {
preExtract(srlType, "Predicate");
train(srlType, "Predicate");
}
preExtract(srlType, "Sense");
train(srlType, "Sense");
preExtract(srlType, "Identifier");
train(srlType, "Identifier");
tuneIdentifier(srlType);
preExtract(srlType, "Classifier");
train(srlType, "Classifier");
// Step 3: Evaluate
evaluate(srlType);
......@@ -134,7 +145,7 @@ public class Main {
String treebankHome = properties.getPennTreebankHome();
String[] allSectionsArray = properties.getAllSections();
List<String> trainSections = Arrays.asList(properties.getAllTrainSections());
List<String> testSections = Arrays.asList(properties.getTestSections());
List<String> testSections = Collections.singletonList(properties.getTestSections());
List<String> trainDevSections = Arrays.asList(properties.getTrainDevSections());
List<String> devSections = Arrays.asList(properties.getDevSections());
List<String> ptb0204Sections = Arrays.asList("02", "03", "04");
......@@ -177,15 +188,15 @@ public class Main {
}
private static void addRequiredViews(IResetableIterator<TextAnnotation> dataset) {
Counter<String> addedViews = new Counter<String>();
Counter<String> addedViews = new Counter<>();
log.info("Initializing pre-processor");
TextPreProcessor.initialize(configFile, true);
TextPreProcessor.initialize(configFile);
int count = 0;
while (dataset.hasNext()) {
TextAnnotation ta = dataset.next();
Set<String> views = new HashSet<String>(ta.getAvailableViews());
Set<String> views = new HashSet<>(ta.getAvailableViews());
try {
TextPreProcessor.getInstance().preProcessText(ta);
......@@ -206,7 +217,7 @@ public class Main {
continue;
}
Set<String> newViews = new HashSet<String>(ta.getAvailableViews());
Set<String> newViews = new HashSet<>(ta.getAvailableViews());
newViews.removeAll(views);
if (newViews.size() > 0) {
......@@ -224,10 +235,20 @@ public class Main {
public static SRLManager getManager(SRLType srlType, boolean trainingMode) throws Exception {
String viewName;
if (defaultParser == null) defaultParser = SRLProperties.getInstance().getDefaultParser();
if (defaultParser.equals("Charniak")) viewName = ViewNames.PARSE_CHARNIAK;
else if (defaultParser.equals("Berkeley")) viewName = ViewNames.PARSE_BERKELEY;
else if (defaultParser.equals("Stanford")) viewName = ViewNames.PARSE_STANFORD;
else viewName = defaultParser;
switch (defaultParser) {
case "Charniak":
viewName = ViewNames.PARSE_CHARNIAK;
break;
case "Berkeley":
viewName = ViewNames.PARSE_BERKELEY;
break;
case "Stanford":
viewName = ViewNames.PARSE_STANFORD;
break;
default:
viewName = defaultParser;
break;
}
if (srlType == SRLType.Verb)
return new VerbSRLManager(trainingMode, viewName);
......@@ -263,7 +284,8 @@ public class Main {
log.info("Pre-extracting {} features", modelToExtract);
assert manager != null;
ModelInfo modelInfo = manager.getModelInfo(modelToExtract);
String featureSet = "" + modelInfo.featureManifest.getIncludedFeatures().hashCode();
......@@ -298,9 +320,8 @@ public class Main {
Models modelToExtract, FeatureVectorCacheFile featureCache,
String cacheFile2) throws Exception {
if (IOUtils.exists(cacheFile2)) {
log.warn("Old pruned cache file found. Deleting...");
IOUtils.rm(cacheFile2);
log.info("Done");
log.warn("Old pruned cache file found. Not doing anything...");
return;
}
log.info("Pruning features. Saving pruned features to {}", cacheFile2);
......@@ -318,9 +339,20 @@ public class Main {
int numConsumers, SRLManager manager, Models modelToExtract, Dataset dataset,
String cacheFile, boolean lockLexicon) throws Exception {
if (IOUtils.exists(cacheFile)) {
log.warn("Old cache file found. Deleting...");
IOUtils.rm(cacheFile);
log.info("Done");
log.warn("Old cache file found. Returning it...");
FeatureVectorCacheFile vectorCacheFile = new FeatureVectorCacheFile(cacheFile, modelToExtract, manager);
vectorCacheFile.openReader();
while (vectorCacheFile.hasNext()) {
Pair<SRLMulticlassInstance, SRLMulticlassLabel> pair = vectorCacheFile.next();
IFeatureVector cachedFeatureVector = pair.getFirst().getCachedFeatureVector(modelToExtract);
int length = cachedFeatureVector.getNumActiveFeatures();
for (int i = 0; i < length; i++) {
manager.getModelInfo(modelToExtract).getLexicon().countFeature(cachedFeatureVector.getIdx(i));
}
}
vectorCacheFile.close();
vectorCacheFile.openReader();
return vectorCacheFile;
}
FeatureVectorCacheFile featureCache = new FeatureVectorCacheFile(cacheFile, modelToExtract, manager);
......@@ -348,41 +380,37 @@ public class Main {
"**************************************************\n" +
"** " + gapStr + "TRAINING " + model_.toUpperCase() + gapStr + " **\n" +
"**************************************************\n");
int numThreads = Runtime.getRuntime().availableProcessors();
Models model = Models.valueOf(model_);
assert manager != null;
ModelInfo modelInfo = manager.getModelInfo(model);
String featureSet = "" + modelInfo.featureManifest.getIncludedFeatures().hashCode();
String cacheFile = properties.getPrunedFeatureCacheFile(srlType, model, featureSet, defaultParser);
AbstractInferenceSolver[] inference = new AbstractInferenceSolver[numThreads];
System.out.println("In train, feature cache is " + cacheFile);
for (int i = 0; i < inference.length; i++)
inference[i] = new SRLMulticlassInference(manager, model);
double c;
// NB: Tuning code for the C value has been deleted
double c = 0.01;
FeatureVectorCacheFile cache;
if (model == Models.Classifier) {
c = 0.00390625;
log.info("Skipping cross-validation for Classifier. c = {}", c);
}
else {
cache = new FeatureVectorCacheFile(cacheFile, model, manager);
StructuredProblem cvProblem = cache.getStructuredProblem(20000);
cache.close();
LearnerParameters params = JLISLearner.cvStructSVMSRL(cvProblem, inference, 5);
c = params.getcStruct();
log.info("c = {} for {} after cv", c, srlType + " " + model);
}
if (model == Models.Classifier) c = 0.00390625;
cache = new FeatureVectorCacheFile(cacheFile, model, manager);
SLProblem problem;
problem = cache.getStructuredProblem();
StructuredProblem problem = cache.getStructuredProblem();
cache.close();
WeightVector w = JLISLearner.trainStructSVM(inference, problem, c);
JLISLearner.saveWeightVector(w, manager.getModelFileName(model));
log.info("Setting up solver; learning may take a while if the SLProblem contains many instances ...");
SLParameters params = new SLParameters();
params.loadConfigFile(properties.getLearnerConfig());
params.C_FOR_STRUCTURE = (float) c;
SRLMulticlassInference infSolver = new SRLMulticlassInference(manager, model);
Learner learner = LearnerFactory.getLearner(infSolver, new SRLFeatureExtractor(), params);
WeightVector w = learner.train(problem);
WeightVectorUtils.save(manager.getModelFileName(model), w);
}
private static void tuneIdentifier(String srlType_) throws Exception {
......@@ -393,7 +421,8 @@ public class Main {
if (srlType == SRLType.Nom)
nF = 3;
assert manager != null;
ModelInfo modelInfo = manager.getModelInfo(Models.Identifier);
modelInfo.loadWeightVector();
String featureSet = "" + modelInfo.featureManifest.getIncludedFeatures().hashCode();
......@@ -404,13 +433,13 @@ public class Main {
FeatureVectorCacheFile cache = new FeatureVectorCacheFile(cacheFile, Models.Identifier, manager);
StructuredProblem problem = cache.getStructuredProblem();
SLProblem problem = cache.getStructuredProblem();
cache.close();
IdentifierThresholdTuner tuner = new IdentifierThresholdTuner(manager, nF, problem);
List<Double> A = new ArrayList<Double>();
List<Double> B = new ArrayList<Double>();
List<Double> A = new ArrayList<>();
List<Double> B = new ArrayList<>();
for (double x = 0.01; x < 10; x += 0.01) {
A.add(x);
......@@ -433,12 +462,13 @@ public class Main {
String outDir = properties.getOutputDir();
PrintWriter goldWriter = null, predWriter = null;
ColumnFormatWriter writer = null;
assert manager != null;
String goldOutFile = null, predOutFile = null;
if (outDir != null) {
// If output directory doesn't exist, create it
if (!IOUtils.isDirectory(outDir)) IOUtils.mkdir(outDir);
String outputFilePrefix = outDir+"/" + srlType + "."
+ manager.defaultParser + "." + new Random().nextInt();
goldOutFile = outputFilePrefix + ".gold";
......@@ -457,19 +487,18 @@ public class Main {
long start = System.currentTimeMillis();
int count = 0;
manager.getModelInfo(Models.Identifier).loadWeightVector();
manager.getModelInfo(Models.Classifier).loadWeightVector();
manager.getModelInfo(Models.Sense).loadWeightVector();
IResetableIterator<TextAnnotation> dataset = SentenceDBHandler.instance.getDataset(testSet);
log.info("All models weights loaded now!");
while (dataset.hasNext()) {
TextAnnotation ta = dataset.next();
if (!ta.hasView(manager.getGoldViewName())) continue;
//ta.addView(new HeadFinderDependencyViewGenerator(manager.defaultParser));
PredicateArgumentView gold = (PredicateArgumentView) ta.getView(manager.getGoldViewName());
SRLILPInference inference = manager.getInference(solver, gold.getPredicates());
......@@ -482,18 +511,15 @@ public class Main {
if (outDir != null) {
writer.printPredicateArgumentView(gold, goldWriter);
writer.printPredicateArgumentView(prediction, predWriter);
}
count++;
if (count % 1000 == 0) {
long end = System.currentTimeMillis();
log.info(count + " sentences done. Took "
+ (end - start) + "ms, F1 so far = "
+ tester.getAverageF1());
log.info(count + " sentences done. Took " + (end - start) + "ms, " +
"F1 so far = " + tester.getAverageF1());
}
}
long end = System.currentTimeMillis();
System.out.println(count + " sentences done. Took " + (end - start) + "ms");
......
......@@ -5,6 +5,7 @@ import edu.illinois.cs.cogcomp.edison.utilities.WordNetManager;
import edu.illinois.cs.cogcomp.srl.core.Models;
import edu.illinois.cs.cogcomp.srl.core.SRLType;
import edu.illinois.cs.cogcomp.srl.data.Dataset;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.slf4j.Logger;
......@@ -18,25 +19,18 @@ public class SRLProperties {
private static final Logger log = LoggerFactory.getLogger(SRLProperties.class);
private static SRLProperties theInstance;
private PropertiesConfiguration config;
private final String curatorHost;
private final int curatorPort, maxInferenceRounds;
private final String wordNetFile;
private SRLProperties(URL url) throws ConfigurationException {
config = new PropertiesConfiguration(url);
curatorHost = config.getString("CuratorHost", "");
curatorPort = config.getInt("CuratorPort", -1);
this.wordNetFile = config.getString("WordNetConfig");
if (config.containsKey("LoadWordNetConfigFromClassPath")
&& config.getBoolean("LoadWordNetConfigFromClassPath")) {
WordNetManager.loadConfigAsClasspathResource(true);
}
maxInferenceRounds = config.getInt("MaxInferenceRounds");
}
}
public static void initialize(String configFile) throws Exception {
// first try to load the file from the file system
......@@ -155,19 +149,7 @@ public class SRLProperties {
return Constants.systemVersion;
}
public String getPipelineConfigFile() {
return config.getString("PipelineConfigFile");
}
public String getCuratorHost() {
return curatorHost;
}
public int getCuratorPort() {
return curatorPort;
}
public int getMaxInferenceRounds() {
return maxInferenceRounds;
public String getLearnerConfig() {
return this.config.getString("LearnerConfig");
}
}
package edu.illinois.cs.cogcomp.srl;
import edu.illinois.cs.cogcomp.edison.data.curator.CuratorDataStructureInterface;
import edu.illinois.cs.cogcomp.edison.sentences.Constituent;
import edu.illinois.cs.cogcomp.edison.sentences.PredicateArgumentView;
import edu.illinois.cs.cogcomp.edison.sentences.TextAnnotation;
import edu.illinois.cs.cogcomp.annotation.AnnotatorException;
import edu.illinois.cs.cogcomp.core.datastructures.ViewNames;
import edu.illinois.cs.cogcomp.core.datastructures.textannotation.*;
import edu.illinois.cs.cogcomp.edison.utilities.WordNetManager;
import edu.illinois.cs.cogcomp.infer.ilp.ILPSolverFactory;
import edu.illinois.cs.cogcomp.infer.ilp.ILPSolverFactory.SolverType;
import edu.illinois.cs.cogcomp.srl.core.Models;
import edu.illinois.cs.cogcomp.srl.core.SRLManager;
import edu.illinois.cs.cogcomp.srl.core.SRLType;
import edu.illinois.cs.cogcomp.srl.experiment.TextPreProcessor;
import edu.illinois.cs.cogcomp.srl.inference.ISRLInference;
import edu.illinois.cs.cogcomp.srl.inference.SRLLagrangeInference;
import edu.illinois.cs.cogcomp.thrift.base.Forest;
import edu.illinois.cs.cogcomp.thrift.curator.Record;
import edu.illinois.cs.cogcomp.srl.inference.SRLILPInference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class SemanticRoleLabeler {
public class SemanticRoleLabeler implements Annotator {
private final static Logger log = LoggerFactory.getLogger(SemanticRoleLabeler.class);
public final SRLManager manager;
private static SRLProperties properties;
......@@ -35,14 +34,15 @@ public class SemanticRoleLabeler {
srlType = arguments.length == 1 ? null : arguments[1];
String input;
List<SemanticRoleLabeler> srlLabelers = new ArrayList<SemanticRoleLabeler>();
List<SemanticRoleLabeler> srlLabelers = new ArrayList<>();
try {
if (srlType != null)
srlLabelers.add(new SemanticRoleLabeler(configFile, srlType));
else {
for (SRLType type : SRLType.values()) {
srlType = type.name();
srlLabelers.add(new SemanticRoleLabeler(configFile, srlType));
srlLabelers
.add(new SemanticRoleLabeler(configFile, srlType));
}
}
} catch (Exception e) {
......@@ -54,7 +54,8 @@ public class SemanticRoleLabeler {
do {
System.out.print("Enter text (underscore to quit): ");
input = System.console().readLine().trim();
if (input.equals("_")) return;
if (input.equals("_"))
return;
if (!input.isEmpty()) {
// XXX Assuming that all SRL types require the same views
......@@ -94,7 +95,7 @@ public class SemanticRoleLabeler {
properties = SRLProperties.getInstance();
log.info("Initializing pre-processor");
TextPreProcessor.initialize(configFile, false);
TextPreProcessor.initialize(configFile);
log.info("Creating {} manager", srlType);
manager = Main.getManager(SRLType.valueOf(srlType), false);
......@@ -111,12 +112,7 @@ public class SemanticRoleLabeler {
return properties.getSRLVersion();
}
public String getCuratorName() {
return "illinoisSRL";
}
private void loadModels() throws Exception {
for (Models m : Models.values()) {
if (manager.getSRLType() == SRLType.Verb && m == Models.Predicate)
continue;
......@@ -137,17 +133,36 @@ public class SemanticRoleLabeler {
else
predicates = manager.getLearnedPredicateDetector().getPredicates(ta);
if (predicates.isEmpty()) return null;
ISRLInference inference = new SRLLagrangeInference(manager, ta, predicates, true, properties.getMaxInferenceRounds());
if (predicates.isEmpty())
return null;
ILPSolverFactory s = new ILPSolverFactory(SolverType.Gurobi);
ISRLInference inference = new SRLILPInference(s, manager, predicates);
return inference.getOutputView();
}
public Forest getSRLForest(Record record) throws Exception {
TextAnnotation ta = CuratorDataStructureInterface.getTextAnnotationViewsFromRecord("", "", record);
PredicateArgumentView pav = getSRL(ta);
return CuratorDataStructureInterface.convertPredicateArgumentViewToForest(pav);
@Override
public String[] getRequiredViews() {
return TextPreProcessor.requiredViews;
}
@Override
public View getView(TextAnnotation ta) throws AnnotatorException {
try {
return getSRL(ta);
} catch (Exception e) {
e.printStackTrace();
throw new AnnotatorException(e.getMessage());
}
}
@Override
public String getViewName() {
if (manager.getSRLType() == SRLType.Verb) {
return ViewNames.SRL_VERB;
} else if (manager.getSRLType() == SRLType.Nom)
return ViewNames.SRL_NOM;
return null;
}
}
......@@ -22,7 +22,7 @@ class DBHelper {
}
}
private final static Map<String, Connection> connections = new HashMap<String, Connection>();
private final static Map<String, Connection> connections = new HashMap<>();
public static void createTable(String dbFile, String tableName,
String tableDefinition) throws SQLException {
......