The desire to use ever-growing qualitative data sets of user-generated content in the engineering design process in a computationally effective manner makes it increasingly necessary to draw representative samples. This work investigated the ability of alternative sampling algorithms to draw samples that conform to the characteristics of the original data set. The sampling methods investigated were random sampling, interval sampling, fixed-increment (or systematic) sampling, and stratified sampling. Data collected through the Vehicle Owner’s Questionnaire, a survey administered by the U.S. National Highway Traffic Safety Administration, are used as a case study throughout this paper. The paper demonstrates that existing statistical methods may be used to evaluate goodness of fit for samples drawn from large bodies of qualitative data. Evaluating goodness of fit not only provides confidence that a sample is representative of the data set from which it is drawn, but also yields valuable real-time feedback during the sampling process. The investigation revealed two interesting and counterintuitive trends in sampling algorithm performance. The first is that larger sample sizes do not necessarily lead to improved goodness of fit. The second is that, depending on the details of implementation, data cleansing may degrade the performance of data sampling algorithms rather than improve it. This work illustrates the importance of aligning sampling procedures with data structures and of validating that samples conform to the characteristics of the larger data set, so as to avoid drawing erroneous conclusions from unexpectedly biased samples.
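As an illustration of the kind of check the abstract describes, the following minimal sketch draws samples with three of the named methods and scores each against the full data set. It is not the authors' implementation. The Pearson chi-square statistic is assumed here as one example of an "existing statistical method" (the abstract does not name the tests used), the records and their `component` category are synthetic stand-ins rather than Vehicle Owner's Questionnaire data, and all function names are hypothetical. Interval sampling is omitted because the abstract does not define how it differs from fixed-increment sampling.

```python
import random
from collections import Counter


def simple_random_sample(records, n, rng):
    """Draw n records uniformly at random without replacement."""
    return rng.sample(records, n)


def systematic_sample(records, n, rng):
    """Take every k-th record after a random start (fixed-increment sampling)."""
    k = len(records) // n        # sampling interval; assumes n <= len(records)
    start = rng.randrange(k)     # random offset within the first interval
    return [records[start + i * k] for i in range(n)]


def stratified_sample(records, n, key, rng):
    """Sample each category in proportion to its share of the data set."""
    strata = {}
    for record in records:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(records))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def chi_square_statistic(sample, population, key):
    """Pearson chi-square statistic of the sample's category counts against
    the counts expected from the population's category proportions."""
    pop_counts = Counter(key(r) for r in population)
    obs_counts = Counter(key(r) for r in sample)
    stat = 0.0
    for category, pop_count in pop_counts.items():
        expected = len(sample) * pop_count / len(population)
        stat += (obs_counts.get(category, 0) - expected) ** 2 / expected
    return stat


def component(record):
    """Category used for the comparison (hypothetical field)."""
    return record[1]


# Synthetic population: (record id, component category). Illustrative only.
rng = random.Random(0)
labels = ["brakes"] * 500 + ["engine"] * 300 + ["airbags"] * 200
population = list(enumerate(labels))
rng.shuffle(population)

srs = simple_random_sample(population, 100, rng)
sys_sample = systematic_sample(population, 100, rng)
strat = stratified_sample(population, 100, component, rng)

for name, sample in [("random", srs), ("systematic", sys_sample), ("stratified", strat)]:
    print(name, round(chi_square_statistic(sample, population, component), 3))
```

A smaller statistic indicates closer agreement between the sample's category mix and that of the full data set. Recomputing it as the sample grows is one way to obtain the running feedback the abstract mentions. In this sketch the stratified sample matches the population proportions by construction, so its statistic is zero for the category used to form the strata; a characteristic not used for stratification would give a more informative comparison.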
ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
August 6–9, 2017
Cleveland, Ohio, USA
Conference Sponsors:
- Design Engineering Division
- Computers and Information in Engineering Division
ISBN:
978-0-7918-5811-0
PROCEEDINGS PAPER
Evaluating Sampling Methods for Reusing Knowledge From Large and Ill-Structured Qualitative Data Sets
Jacob Nelson
James Madison University, Harrisonburg, VA
G. Austin Marrs
James Madison University, Harrisonburg, VA
Greg Schmidt
James Madison University, Harrisonburg, VA
Joseph A. Donndelinger
Baylor University, Waco, TX
Robert L. Nagel
James Madison University, Harrisonburg, VA
Paper No:
DETC2017-67964, V001T02A057; 10 pages
Published Online:
November 3, 2017
Citation
Nelson, J, Marrs, GA, Schmidt, G, Donndelinger, JA, & Nagel, RL. "Evaluating Sampling Methods for Reusing Knowledge From Large and Ill-Structured Qualitative Data Sets." Proceedings of the ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. Volume 1: 37th Computers and Information in Engineering Conference. Cleveland, Ohio, USA. August 6–9, 2017. V001T02A057. ASME. https://doi.org/10.1115/DETC2017-67964