University of Minnesota

Minnesota Supercomputing Institute

Research Abstracts Online
January 2010 - March 2011

University of Minnesota Twin Cities
Office of the Vice President for Research
Minnesota Population Center

PI: Steven Ruggles

Population Database of the United States in 1880

This project links records from a complete-count database of the 1880 census (approximately 50 million records) to one-percent samples from the 1850, 1860, 1870, 1900, 1910, 1920, and 1930 censuses. The researchers use record linkage software developed at the Australian National University (FEBRL, Freely Extensible Biomedical Record Linkage) to generate distance functions for selected features. Each pair of datasets to be linked is divided on the basis of birthplace, race, and gender; additional datasets are constructed for married couples. The resulting "demographic perspectives" are then processed independently; for example, white males are processed together. Relatively static or predictably dynamic features are selected for comparison, including names (relatively static over the lifespan for certain subsets of the population) and ages (which change predictably). Distance functions indicate the similarity between pairs of feature values, and vectors of these similarity values constitute test data for a Support Vector Machine (SVM). The group uses MSI resources to generate distance functions or to assign previously generated distance functions to pairs of records (potential links).
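The partitioning and comparison steps described above can be illustrated with a minimal sketch. It is not the project's code and does not use the FEBRL API: the record fields, the function names, the five-year age tolerance, and the use of Python's standard `difflib.SequenceMatcher` as a stand-in string comparator are all assumptions made for illustration. The sketch groups records by birthplace, race, and gender, then produces one similarity vector per candidate pair, which is the kind of input an SVM classifier would receive.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(rec):
    # Hypothetical field names; records are only compared within the
    # same birthplace/race/gender group.
    return (rec["birthplace"], rec["race"], rec["sex"])


def name_similarity(a, b):
    # Stand-in comparator (assumption): 1.0 means identical strings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def age_similarity(age_a, year_a, age_b, year_b, tolerance=5):
    # Age is "predictably dynamic": it should advance by the number of
    # years between the two censuses. The tolerance is an assumed value.
    expected_b = age_a + (year_b - year_a)
    gap = abs(age_b - expected_b)
    return max(0.0, 1.0 - gap / tolerance)


def candidate_vectors(records_a, year_a, records_b, year_b):
    # Index the second dataset by block so each record in the first
    # dataset is compared only against its own group.
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[block_key(rec)].append(rec)

    pairs = []
    for a in records_a:
        for b in blocks.get(block_key(a), []):
            vector = (
                name_similarity(a["first"], b["first"]),
                name_similarity(a["last"], b["last"]),
                age_similarity(a["age"], year_a, b["age"], year_b),
            )
            pairs.append((a["id"], b["id"], vector))
    return pairs
```

Each returned tuple holds the two record identifiers and their similarity vector. Restricting comparisons to matching groups keeps the number of candidate pairs far below the full cross product, which matters when one side of the linkage holds roughly 50 million records.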

Group Members

Ron Goeken, Co-Principal Investigator
Lap Huynh, Staff
Tom Lynch, Staff
Rebecca Vick, Staff