University of Minnesota
University Relations

Minnesota Supercomputing Institute

Research Abstracts Online
January 2008 - March 2009

University of Minnesota Twin Cities
Office of the Vice President for Research
Minnesota Population Center

PI: Steven Ruggles
Co-PI: Ron Goeken

Population Database of the United States in 1880

These researchers are attempting to link records from their complete-count database of the 1880 census (approximately 50 million records) to one-percent samples from the 1850, 1860, 1870, 1900, 1910, and 1920 censuses. They use record linkage software developed at the Australian National University, FEBRL (Freely Extensible Biomedical Record Linkage), which they have modified for the project. Each pair of datasets to be linked is divided on the basis of birthplace, race, and gender; separate datasets are also constructed for married couples. The resulting "demographic perspectives" are then processed independently; for example, white males are processed together. Features that are relatively static or that change predictably are selected for comparison, including names (relatively static over the lifespan for certain subsets of the population) and ages (which change predictably over time). Distance functions measure the similarity between pairs of feature values, and the resulting vectors of similarity scores constitute the test data for a Support Vector Machine (SVM). The SVM is first trained on a set of data containing example links and non-links and is then used to classify the test data.
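The pipeline described above (blocking, feature comparison, SVM classification) can be illustrated with a minimal sketch. This is not FEBRL's API and not the project's code. The record field names (`birthplace`, `race`, `sex`, `first`, `last`, `age`), the 5-year age tolerance, the use of `difflib` for name similarity, and the small hand-written linear SVM trainer are all assumptions made for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(rec):
    # Assumed blocking key: records are compared only within the same
    # birthplace/race/sex group (field names are hypothetical).
    return (rec["birthplace"], rec["race"], rec["sex"])


def name_similarity(a, b):
    # Similarity in [0, 1]; a stand-in for whatever string distance is used.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def age_similarity(age_early, age_late, years_between, tolerance=5):
    # Age is "predictably dynamic": the expected later age is the earlier
    # age plus the years elapsed. The tolerance is an assumed parameter.
    error = abs((age_early + years_between) - age_late)
    return max(0.0, 1.0 - error / tolerance)


def compare(rec_early, rec_late, years_between):
    # Feature vector of similarity scores for one candidate pair.
    return [
        name_similarity(rec_early["first"], rec_late["first"]),
        name_similarity(rec_early["last"], rec_late["last"]),
        age_similarity(rec_early["age"], rec_late["age"], years_between),
    ]


def candidate_pairs(early, late, years_between):
    # Yield (early record, late record, feature vector) within each block.
    blocks = defaultdict(list)
    for rec in late:
        blocks[block_key(rec)].append(rec)
    for a in early:
        for b in blocks.get(block_key(a), []):
            yield a, b, compare(a, b, years_between)


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    # Linear SVM fitted by subgradient descent on the regularized hinge
    # loss. Labels are +1 (link) and -1 (non-link).
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
            if margin < 1:  # point is inside the margin or misclassified
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def classify(w, b, x):
    # +1 means "link", -1 means "non-link".
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Made-up records: an 1870 sample record and two 1880 records.
    r1870 = [{"birthplace": "MN", "race": "white", "sex": "M",
              "first": "John", "last": "Olson", "age": 20}]
    r1880 = [{"birthplace": "MN", "race": "white", "sex": "M",
              "first": "Jon", "last": "Olsen", "age": 30},
             {"birthplace": "MN", "race": "white", "sex": "F",
              "first": "Joan", "last": "Olson", "age": 30}]
    # Made-up training vectors standing in for example links and non-links.
    w, b = train_linear_svm([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], [1, -1])
    for a, c, features in candidate_pairs(r1870, r1880, years_between=10):
        print(a["first"], c["first"], features, classify(w, b, features))
```

In this sketch, blocking keeps the number of comparisons manageable by never pairing records from different demographic groups, which mirrors the independent processing of each "demographic perspective" described above. A production system would likely rely on an established SVM library rather than the toy trainer shown here.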

Group Members

Lap Huynh, Staff
Tom Lenius, Staff
Rebecca Vick, Staff