Abstract

Recovering a system’s underlying structure from its historical records (also called structure mining) is essential to making valid inferences about that system’s behavior. For example, making reliable predictions about system failures from maintenance work order data requires determining how the concepts described within the work orders are related. Obtaining such structural information is challenging: it requires system understanding, synthesis, and representation design, and the resulting structure is often too difficult or too time-consuming to produce. Consequently, a common approach to quickly eliciting tacit structural knowledge from experts is to gather uncontrolled keywords as record labels, i.e., “tags.” One can then map those tags to concepts within the structure and quantitatively infer relationships between them. Existing models of tag similarity tend to depend either on correlation strength (e.g., overall co-occurrence frequencies) or on conditional strength (e.g., tag sequence probabilities). A key difficulty in applying either model is understanding the conditions under which one outperforms the other for overall structure recovery. In this paper, we investigate the core assumptions and implications of these two classes of similarity measures for structure recovery tasks. Then, using lessons from this characterization, we borrow from recent psychology literature on semantic fluency tasks to construct a tag similarity measure that emulates how humans recall tags from memory. We show through empirical testing that this method combines strengths of both common modeling paradigms. We also demonstrate its potential as a preprocessor for structure mining tasks via a case study in semi-supervised learning on real excavator maintenance work orders.
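
As a minimal illustration of the two classes of tag similarity contrasted above, the sketch below computes a correlation-style measure (cosine similarity of co-occurrence counts, which ignores tag order) and a conditional-style measure (first-order transition probabilities between adjacent tags). It is not the measure developed in this paper; the example records, tag names, and the function names `cooccurrence_cosine` and `transition_probs` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical tagged work orders (assumed data, not from the paper).
# Tag order is taken to be the order in which the annotator wrote the tags.
records = [
    ["hydraulic", "leak", "hose"],
    ["hydraulic", "hose", "replace"],
    ["engine", "leak", "oil"],
    ["hydraulic", "leak", "replace"],
]


def cooccurrence_cosine(records):
    """Correlation-style similarity: cosine between binary tag-occurrence vectors."""
    occur = Counter()  # number of records containing each tag
    pair = Counter()   # number of records containing both tags of a pair
    for rec in records:
        tags = sorted(set(rec))  # order is deliberately discarded here
        occur.update(tags)
        pair.update(combinations(tags, 2))
    return {(a, b): n / sqrt(occur[a] * occur[b]) for (a, b), n in pair.items()}


def transition_probs(records):
    """Conditional-style similarity: P(next tag | current tag) from adjacent pairs."""
    counts = defaultdict(Counter)
    for rec in records:
        for cur, nxt in zip(rec, rec[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


cos = cooccurrence_cosine(records)
trans = transition_probs(records)

# The co-occurrence measure is symmetric; the transition measure is directed.
print(cos[("hydraulic", "leak")])
print(trans["hydraulic"]["leak"], trans["leak"].get("hydraulic", 0.0))
```

On this toy data the co-occurrence measure assigns a single symmetric score to the pair "hydraulic" and "leak", whereas the transition measure is directed: "leak" follows "hydraulic" in some records but never precedes it. That asymmetry is the kind of difference in assumptions that distinguishes the two classes of measures.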
