January 2009 - March 2010

University of Minnesota Twin Cities
Office of the Vice President for Research
Minnesota Population Center

PI: Steven Ruggles
Co-PI: Ron Goeken

Population Database of the United States in 1880

This project links records from a complete-count database of the 1880 census (approximately 50 million records) to one-percent samples from the 1850, 1860, 1870, 1900, 1910, 1920, and 1930 censuses. The researchers use FEBRL (Freely Extensible Biomedical Record Linkage), record linkage software developed at the Australian National University, to generate distance functions for selected features. Each pair of datasets to be linked is divided on the basis of birthplace, race, and gender, and separate datasets are also constructed for married couples. The resulting "demographic perspectives" are processed independently; for example, records for white males are processed together. Features are selected for comparison because they are relatively static or change in a predictable way: names are relatively static over the lifespan for certain subsets of the population, while ages change predictably. Distance functions measure the similarity between pairs of feature values, and vectors of these distances constitute test data for a Support Vector Machine (SVM).
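The blocking and comparison steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's FEBRL configuration. The `Person` record layout, the helper names, the use of the standard library's `difflib.SequenceMatcher` as a stand-in for FEBRL's string comparators, and the three-year linear age tolerance are all hypothetical. The sketch blocks on birthplace, race, and sex, then builds one similarity score per feature (given name, surname, and age adjusted for the years between censuses). Scores run from 0 to 1, with 1 meaning identical, so each is one minus a distance. Vectors like these are the kind of input an SVM would then classify, a step not shown here.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; field names are illustrative assumptions.
    given: str
    surname: str
    age: int
    birthplace: str
    race: str
    sex: str


def block_key(p: Person) -> tuple[str, str, str]:
    # Records are only compared within the same birthplace/race/sex block.
    return (p.birthplace, p.race, p.sex)


def candidate_pairs(earlier: Iterable[Person],
                    later: Iterable[Person]) -> Iterator[tuple[Person, Person]]:
    # Index the later census by block key, then pair each earlier record
    # with every later record that shares its block.
    index: dict[tuple[str, str, str], list[Person]] = defaultdict(list)
    for rec in later:
        index[block_key(rec)].append(rec)
    for rec in earlier:
        for other in index.get(block_key(rec), []):
            yield rec, other


def name_similarity(a: str, b: str) -> float:
    # Case-insensitive similarity in [0, 1]; difflib is a stand-in (assumption)
    # for FEBRL's own string comparators, which are not reproduced here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def age_similarity(earlier_age: int, later_age: int,
                   years_apart: int, tolerance: int = 3) -> float:
    # Age is "predictably dynamic": the later age should equal the earlier age
    # plus the gap between censuses. Similarity falls linearly to 0 once the
    # error reaches `tolerance` years (a hypothetical choice).
    error = abs(later_age - (earlier_age + years_apart))
    return max(0.0, 1.0 - error / tolerance)


def comparison_vector(a: Person, b: Person,
                      years_apart: int) -> tuple[float, float, float]:
    # One similarity score per compared feature.
    return (
        name_similarity(a.given, b.given),
        name_similarity(a.surname, b.surname),
        age_similarity(a.age, b.age, years_apart),
    )


if __name__ == "__main__":
    sample_1870 = [
        Person("John", "Smith", 30, "Ohio", "White", "M"),
        Person("Mary", "Jones", 25, "New York", "White", "F"),
    ]
    census_1880 = [
        Person("Jon", "Smith", 40, "Ohio", "White", "M"),
        Person("John", "Smyth", 52, "Ohio", "White", "M"),
        Person("Mary", "Jones", 35, "Pennsylvania", "White", "F"),
    ]
    for a, b in candidate_pairs(sample_1870, census_1880):
        print(b.given, b.surname, [round(x, 2) for x in comparison_vector(a, b, 10)])
```

Running the script prints `Jon Smith [0.86, 1.0, 1.0]` and `John Smyth [1.0, 0.8, 0.0]`. Both 1880 records fall in the same Ohio/White/M block as John Smith, but only the first has an age consistent with a ten-year gap. The two Mary Jones records are never compared because their birthplaces differ. This shows both the benefit and the risk of blocking: it sharply reduces the number of comparisons, but a birthplace recorded differently in two censuses places a true match in a different block.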

Group Members

Lap Huynh, Staff
Tom Lenius, Staff
Rebecca Vick, Staff