This thesis studies weakly supervised learning for information extraction in two settings: (1) unimodal weakly supervised learning, where annotated texts are augmented with a large corpus of unlabeled texts, and (2) multimodal weakly supervised learning, where images or videos are augmented with texts that describe their content.
In the <b>unimodal</b> setting we find that traditional semi-supervised methods based on generative Bayesian models are not suitable for the textual domain, because the assumptions these models make are violated there. We develop an unsupervised model, the latent words language model (LWLM), that learns accurate word similarities from a large corpus of unlabeled texts. We show that this model is a good model of natural language, offering better predictive quality on unseen texts than previously proposed state-of-the-art language models. In addition, the learned word similarities can be used to automatically expand words in the annotated training set with synonyms, where the correct synonyms are chosen depending on the context. We show that this approach improves classifiers for word sense disambiguation and semantic role labeling.
<br>
The second part of this thesis discusses weakly supervised learning in a <b>multimodal</b> setting. We develop information extraction methods that extract information from texts describing an image or video, and use this information as a weak annotation of the image or video. A first model for the prediction of entities in an image uses two novel measures: the salience measure captures the importance of an entity, depending on the position of that entity in the discourse and in the sentence, while the visualness measure captures the probability that an entity can be perceived visually, extracted from the WordNet database. We show that combining these measures results in an accurate prediction of the entities present in the image. We then discuss how this model can be used to learn a mapping from names in the text to faces in the image, and to retrieve images of a certain entity.
We then turn to the automatic annotation of video. We develop a model that annotates a video with the visual verbs and their visual arguments, i.e., actions and arguments that can be observed in the video. The annotations of this system are successfully used to train a classifier that detects and classifies actions in the video. A second system annotates every scene in the video with the location of that scene. This system comprises a multimodal scene cut classifier that combines information from the text and the video, an IE algorithm that extracts possible locations from the text, and a novel way to propagate location labels from one scene to another, depending on the similarity of the scenes in the textual and visual domains.
11. Example: WSD
Soft rules:
  If “kicked”: ball = “round object”
  If “goal”: ball = “round object”
  ...
  If “dance”: ball = “formal dance”
  If “gown”: ball = “formal dance”
  ...
Machine learning methods can combine many complementary and/or contradicting rules
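As a minimal sketch of how a learner can weigh such soft, partly contradicting cues, here is a tiny Naive Bayes classifier over cue firings. The contexts, counts and sense labels are toy values invented for illustration; the slides do not prescribe this particular learner.

```python
import math
from collections import defaultdict

# soft cues for the two senses of "ball"
cues = ["kicked", "goal", "dance", "gown"]

# toy training contexts, invented for illustration
train = [("he kicked the ball toward the goal", "round object"),
         ("the goal came from a long ball", "round object"),
         ("she wore a gown to the ball", "formal dance"),
         ("the ball opened with a waltz dance", "formal dance")]

counts = defaultdict(lambda: defaultdict(int))  # counts[sense][cue]
totals = defaultdict(int)                       # training contexts per sense
for ctx, sense in train:
    totals[sense] += 1
    for c in cues:
        if c in ctx:
            counts[sense][c] += 1

def classify(context):
    best, best_lp = None, -math.inf
    for sense in totals:
        lp = math.log(totals[sense] / len(train))            # prior
        for c in cues:
            p = (counts[sense][c] + 1) / (totals[sense] + 2)  # Laplace smoothing
            lp += math.log(p if c in context else 1 - p)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

print(classify("he kicked it past the goal"))  # -> round object
```

Even though "kicked" also appears near "goal" in only half the round-object contexts, the classifier combines both weak cues into a confident decision.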
12. Supervised machine learning
Current state-of-the-art machine learning methods
Strengths:
  Successful for many tasks
  Flexible, fast development for new tasks
  Machine learning method often independent of task, language or domain
  Only some expert knowledge needed
Weaknesses:
  Manually annotated corpus needed for every new task
  Features need to be manually engineered
  High variation of language limits performance even with large training corpora
13. Solution: use unlabeled data
Unlabeled data: cheap, available for many domains and languages
Semi-supervised learning
  Optimize a single function that incorporates labeled and unlabeled data
  Violated assumptions cause results to deteriorate when more unlabeled data is added
Unsupervised learning
  First learn a model on unlabeled data, then use that model in a supervised machine learning method
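The unsupervised-then-supervised recipe can be sketched in two steps: learn word similarities from unlabeled text, then hand each word's nearest neighbor to a supervised learner as an extra feature. The snippet below uses simple context-count vectors with cosine similarity as a stand-in for the LWLM; all sentences are toy data invented for illustration.

```python
import math
from collections import Counter

# step 1: learn word similarities from unlabeled text
unlabeled = ["the cat sat on the mat", "the dog sat on the rug",
             "a cat slept on the mat", "a dog slept on the rug"]

vectors = {}  # word -> Counter of context words (window of 2 on each side)
for sent in unlabeled:
    words = sent.split()
    for i, w in enumerate(words):
        ctx = words[max(0, i - 2):i] + words[i + 1:i + 3]
        vectors.setdefault(w, Counter()).update(ctx)

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def nearest(word):
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

# step 2: a supervised feature function can now add nearest(word)
# as an extra feature for each word in the labeled training set
print(nearest("cat"))  # -> dog (same contexts in the toy corpus)
```

Because "cat" and "dog" occur in identical contexts in the toy corpus, the unlabeled data alone reveals their similarity, which a supervised learner can then exploit.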
18. Latent words language model
Observed:  We hope there is an increasing need for reform
Latent:    We hope there is an increasing need for reform
           I believe this was the enormous chance of restructuring
           They think that 's no important demand to change
           You feel it are some increased potential that peace
           ...
Automatically learned synonyms
19. Latent words language model
Time to compute all possible combinations of latent words:
  ~ very, very long...
Approximation: consider only the most likely combinations:
  ~ pretty fast
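The approximation can be sketched as a pruned Viterbi search: keep only the k most likely latent words per position instead of enumerating every combination. The candidate lists and the emission and bigram probabilities below are toy values, not the model's learned parameters.

```python
import math

# toy candidate latent words per observed word, with P(observed | latent)
candidates = {
    "need":   [("need", 0.6), ("demand", 0.3), ("potential", 0.1)],
    "reform": [("reform", 0.7), ("change", 0.2), ("restructuring", 0.1)],
}
# toy bigram probabilities P(latent_i | latent_{i-1}); unseen pairs get 1e-3
bigram = {("need", "reform"): 0.4, ("demand", "change"): 0.5}

def viterbi(observed, k=2):
    # prune: keep only the k most likely latent candidates per position
    pruned = [sorted(candidates[w], key=lambda x: -x[1])[:k] for w in observed]
    best = {w: math.log(p) for w, p in pruned[0]}
    for pos in pruned[1:]:
        new = {}
        for w, p in pos:
            new[w] = max(best[prev]
                         + math.log(bigram.get((prev, w), 1e-3))
                         + math.log(p)
                         for prev in best)
        best = new
    return max(best, key=best.get)

print(viterbi(["need", "reform"]))  # -> reform
```

With a vocabulary of size V and a sentence of length n there are V^n exact combinations; pruning to k candidates per position cuts this to k^n, which is what makes inference "pretty fast".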
21. LWLM for information extraction
Word sense disambiguation:
  standard 66.32%   + cluster features 66.97%   + hidden words 67.61%
Semantic role labeling:
  [Bar chart: scores of the standard, + clusters and + hidden words systems
  when trained on 5%, 20%, 50% and 100% of the training corpus]
Latent words: help with underspecification and ambiguity
24. Annotation of entities in images
Extract entities from descriptive news text that are present in the image.
Caption: "Former President Bill Clinton, left, looks on as an honor guard
folds the U.S. flag during a graveside service for Lloyd Bentsen
in Houston, May 30, 2006. Bentsen, a former senator and
former treasury secretary, died last week at the age of 85."
Extracted entities: Bill Clinton, Lloyd Bentsen, Houston, guard, flag, service, age, ...
25. Annotation of entities in images
Assumption:
Entity is present in image if important in
descriptive text and possible to perceive visually.
Salience:
Dependent on text
Combines analysis of discourse and syntax
Visualness:
Independent of text
Extracted from semantic database
27. Salience
Is the entity important in descriptive text?
Discourse model
Important entities are referred to by other entities
and terms.
Graph models entities, coreferents and other terms
Eigenvectors find most important entities
Syntactic model
Important entities appear high in parse tree
Important entities have many children in tree
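The discourse model can be sketched as eigenvector centrality on a small entity graph, computed by power iteration. The nodes and edge weights below are invented, loosely inspired by the Bentsen caption; they are not the thesis's actual graph construction.

```python
import math

nodes = ["Bill Clinton", "Lloyd Bentsen", "flag", "service"]
# symmetric edge weights: how strongly two mentions are linked in the text
# (coreference, shared terms, etc.); all values invented for illustration
A = [[0, 2, 1, 1],
     [2, 0, 1, 2],
     [1, 1, 0, 1],
     [1, 2, 1, 0]]

# power iteration converges to the dominant eigenvector of A
v = [1.0] * len(nodes)
for _ in range(50):
    v = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

salience = dict(zip(nodes, v))
print(max(salience, key=salience.get))  # -> Lloyd Bentsen
```

The entity with the most (and strongest) links to other mentions receives the largest eigenvector component, matching the intuition that important entities are referred to by many other entities and terms.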
28. Visualness
Can the entity be perceived visually?
Similarity measure on entities in WordNet:
  s(“car”, “truck”) = 0.88     s(“house”, “building”) = 0.91
  s(“horse”, “cow”) = 0.79     s(“car”, “horse”) = 0.38
  s(“car”, “house”) = 0.40     s(“thought”, “house”) = 0.23
Visual seeds: “person”, “vehicle”, “animal”, ...
Non-visual seeds: “thought”, “power”, “air”, ...
Visualness:
  combine similarity measure and seeds
  “entities close to visual seeds will be visual”
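The combination step can be sketched as follows: score an entity by its similarity to the closest visual seed versus the closest non-visual seed. The similarity table is a toy lookup (including an invented s("car", "thought") value) and the seeds are stand-ins chosen to match it; the thesis derives the actual similarities from WordNet.

```python
# toy similarity values, loosely matching the slide; the ("car", "thought")
# entry is invented so the non-visual side is not trivially zero
sim = {("car", "truck"): 0.88, ("car", "horse"): 0.38, ("horse", "cow"): 0.79,
       ("house", "building"): 0.91, ("car", "house"): 0.40,
       ("thought", "house"): 0.23, ("car", "thought"): 0.12}

def s(a, b):
    return sim.get((a, b), sim.get((b, a), 0.0))

visual_seeds = ["truck", "horse"]      # stand-ins for "person", "vehicle", ...
nonvisual_seeds = ["thought"]          # stand-in for "thought", "power", ...

def visualness(entity):
    vis = max(s(entity, seed) for seed in visual_seeds)
    nonvis = max(s(entity, seed) for seed in nonvisual_seeds)
    return vis / (vis + nonvis) if vis + nonvis else 0.5

print(round(visualness("car"), 2))  # -> 0.88
```

An entity much closer to the visual seeds than to the non-visual ones gets a visualness score near 1, implementing "entities close to visual seeds will be visual".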
32. Scene segmentation
Segment transcript and video into scenes
  Scene cut classifier in text
  Shot cut detector in video
Transcript (scene cuts fall between the blocks):
  Shot of Buffy opening the refrigerator and taking out a carton of milk.
  Buffy sniffs the milk and puts it on the counter. In the background we
  see Joyce drinking coffee and Dawn opening a cabinet to get out a box
  of cereal. ...
  Buffy & Riley move into the living room. They sit on the sofa.
  Buffy nods in resignation. Smooch. Riley gets up.
  Cut to a shot of a bright red convertible driving down the street. Giles
  is at the wheel, Buffy beside him and Dawn in the back. Classical
  music plays on the radio.
  ...
34. Scene segmentation
Segment transcript and video into scenes
  Scene cut classifier in text
  Shot cut detector in video
Segmented scenes:
  Shot of Buffy opening the refrigerator and taking out a carton of milk. ...
  Buffy & Riley move into the living room. They sit on the sofa. ...
  Cut to a shot of a bright red convertible driving down the street. ...
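A minimal sketch of the multimodal idea, assuming each candidate boundary already has a textual cue score and a visual shot-cut score: a tiny logistic regression, trained by gradient descent, combines the two modalities. All scores, labels and hyperparameters below are invented for illustration.

```python
import math

# each row: (text cue score, visual shot-cut score, 1 = true scene cut)
# invented training data: true cuts have strong cues in BOTH modalities
data = [(0.9, 0.8, 1), (0.7, 0.9, 1), (0.9, 0.9, 1), (0.8, 0.7, 1),
        (0.1, 0.9, 0), (0.8, 0.2, 0), (0.2, 0.1, 0), (0.3, 0.3, 0)]

w_text, w_vis, bias = 0.0, 0.0, 0.0
for _ in range(2000):  # plain stochastic gradient descent, learning rate 0.1
    for t, v, y in data:
        p = 1 / (1 + math.exp(-(w_text * t + w_vis * v + bias)))
        w_text += 0.1 * (y - p) * t
        w_vis += 0.1 * (y - p) * v
        bias += 0.1 * (y - p)

def is_scene_cut(text_score, vis_score):
    z = w_text * text_score + w_vis * vis_score + bias
    return 1 / (1 + math.exp(-z)) > 0.5

print(is_scene_cut(0.85, 0.9))  # strong cues in both modalities
print(is_scene_cut(0.2, 0.2))   # weak cues in both modalities
```

A shot cut with no textual support (or vice versa) falls below the decision threshold, which is the point of combining the two modalities rather than trusting either alone.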
36. Location annotation results
Scene cut classifier:  precision 91.71%   recall 97.48%   F1 85.16%
Location detector:     precision 68.75%   recall 75.54%   F1 71.98%
Location annotation:
  episode   only text   text + LDA   text + LDA + vision
  2         54.72%      58.89%       57.39%
  3         60.11%      65.87%       68.57%
37. Contributions 1/2
The latent words language model
  Best n-gram language model
  Unsupervised learning of word similarities
  Unsupervised disambiguation of words
Using the latent words for WSD
  Best WSD system
Using the latent words for SRL
  Improvement of state-of-the-art classifier
38. Contributions 2/2
Image annotation:
  First full analysis of entities in descriptive texts
  Visualness: capture knowledge from WordNet
  Salience: capture knowledge from syntactic properties
Location annotation:
  Automatic annotation of locations from transcripts
  Including new locations
  Including locations that are not explicitly mentioned