Application of Informatics to Transcription of Ancient Papyri
While computers can do many things, there are still a few areas in which humans excel, such as the discriminatory power of the eye and the natural human ability to quickly classify objects. The ability to recognize visual patterns is at the core of the Zooniverse (https://www.zooniverse.org/) citizen science project that Professor Lucy Fortson (School of Physics and Astronomy, College of Science and Engineering) has been involved with. It started in 2007 with Galaxy Zoo, which simply asked the general public to help classify about a million scientific images of galaxies, and has since grown to over 25 projects enlisting the help of the public to identify whales, lions, and even planets outside our solar system.
As part of an interdisciplinary team, MSI staff have been working with Professor Fortson and her Humanities colleagues at the University of Minnesota and Oxford University (UK) to help transcribe a collection of ancient papyri. The papyri are part of the Oxyrhynchus collection maintained by Oxford University, which comprises over 500,000 fragments dating from 150 BCE to 650 CE that were excavated from the ancient trash heap of the Egyptian town of Oxyrhynchus. Members of the general public who contribute to the Ancient Lives (http://www.ancientlives.org/) citizen science project are asked to help transcribe the contents of these individual papyri. No ancient or foreign language skills are required, as the project relies solely on visual pattern recognition. Volunteers are simply asked to match characters on the papyrus to corresponding characters on an electronic virtual keyboard by first clicking the letter on the papyrus image and then clicking the corresponding Greek letter. Because every papyrus is transcribed by many different users, a consensus emerges from the many transcriptions. As of November 2014, nearly ten million marks have been made on over 150,000 fragments by about a million volunteers worldwide.
This wealth of clicks needs to be turned into a data product useful to the Humanities researchers through the development of a data processing pipeline. This is where Professor Fortson’s background in astrophysics and the MSI team come in. While there are many steps in the pipeline, one of the most critical is the consensus algorithm. Applying kernel density estimation (KDE) methods to the volunteers’ contributed transcription data, MSI staff developed a workflow that converts clicks into computationally deduced consensus sequences, or text strings, and thus quickly transforms physical documents into computationally searchable data.
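To make the consensus step concrete, here is a minimal sketch of how a kernel density estimate can turn many volunteers' clicks into one text string. It is an illustration only, not the MSI pipeline: it assumes the clicks for a single line of text have already been reduced to a horizontal position plus the chosen letter, and the function names, the Gaussian kernel, the bandwidth, and the majority vote are all hypothetical choices.

```python
import numpy as np
from collections import Counter

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)

def consensus_line(clicks, bandwidth=4.0, step=0.5):
    """Return a consensus string for one line of text.

    clicks: list of (x_position, letter) pairs pooled from many volunteers.
    The bandwidth and grid step (in pixels) are assumed values.
    """
    xs = np.array([x for x, _ in clicks], dtype=float)
    grid = np.arange(xs.min() - 3 * bandwidth, xs.max() + 3 * bandwidth, step)
    density = gaussian_kde_1d(xs, grid, bandwidth)

    # Local maxima of the density mark the likely character positions.
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner >= density[2:])
    peaks = grid[1:-1][is_peak]

    # Assign every click to its nearest peak, then take a majority vote
    # over the letters that volunteers chose at that position.
    nearest = np.abs(xs[:, None] - peaks[None, :]).argmin(axis=1)
    letters = []
    for i in range(len(peaks)):
        votes = Counter(letter for (_, letter), k in zip(clicks, nearest) if k == i)
        if votes:
            letters.append(votes.most_common(1)[0][0])
    return "".join(letters)

# Made-up clicks: three characters, each marked by several volunteers,
# with one dissenting letter at the first and third positions.
example_clicks = [
    (9.0, "α"), (10.5, "α"), (11.0, "α"), (10.0, "δ"),
    (29.0, "β"), (30.0, "β"), (31.5, "β"),
    (49.5, "γ"), (50.0, "γ"), (51.0, "ν"), (50.5, "γ"),
]
print(consensus_line(example_clicks))
```

Because the density peaks rather than individual clicks define the character positions, a single stray click or a minority letter choice does not change the resulting string.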
To help organize these data sets, MSI has also developed an editorial web tool (http://papyrus.msi.umn.edu/) that supports curation and metadata annotation by scholars of ancient texts. In a final step, MSI staff and collaborators at Middle Tennessee State University are applying bioinformatics tools to identify words or text strings and similarities between papyri (e.g., copies of known texts).
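The idea of borrowing bioinformatics tools for text can be sketched with a classic local sequence alignment. The example below uses the Smith-Waterman scoring scheme as one well-known representative; the source does not say which alignment algorithm the team uses, and the function name and the match, mismatch, and gap scores are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    A high score means the two strings share a closely matching stretch,
    even if one of them is only a fragment of a longer text.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical use: compare a transcribed fragment with a known text.
known_text = "μηνιναειδε"
fragment = "ιναειδ"
print(local_alignment_score(known_text, fragment))
```

Ranking known texts by this score for each transcribed fragment is one simple way to flag candidate copies for a scholar to review.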
With a 2013 award from the National Endowment for the Humanities, the team, now led by Drs. Philip Sellew and Nita Krevans, University of Minnesota professors in Classical and Near Eastern Studies, is applying a similar strategy to the transcription of ancient Coptic papyri.
A new initiative, Zooniverse@UMN, has recently been funded* to support University of Minnesota-affiliated projects. This effort is currently soliciting proposals for text-based projects that would benefit from hundreds of thousands of online volunteers transcribing or metadata tagging a digitally imaged collection. Researchers can download the Request for Proposals on the UM Zoomanities webpage. The proposal due date window is November 24 - December 15, 2014.
Publications by these researchers include:
Williams, A.C., Wallin, J.F., Yu, H., Carroll, H.D., Lamblin, A.-F., Fortson, L., Obbink, D., Lintott, C.J., & Brusuelas, J.H. (2014). A Computational Pipeline For Crowdsourced Transcriptions of Ancient Greek Papyrus Fragments. (To Appear In) Proceedings of the 2nd Workshop on Big Humanities Data.
Williams, A.C., Carroll, H.D., Wallin, J.F., Brusuelas, J., Fortson, L., Lamblin, A.-F., & Yu, H. (2014). Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms. (To Appear In) Proceedings of the 1st Workshop on Digital Humanities and e-Science.
*Funding for Zooniverse@UMN is provided by the Office of the Vice President for Research, the University Libraries, the Colleges of Biological Sciences, Liberal Arts, and Science and Engineering, and the University of Minnesota Informatics Institute.
Figure descriptions: Left: a fragment from the Oxyrhynchus papyri. Right: an example of a transcribed fragment plotted on the image of the original fragment. Yellow characters are the consensus characters for the volunteers who transcribed the fragment, while the red characters are the transcription of a Greek expert. The expert characters have been shifted down a bit to provide better readability. All users’ transcriptions for the fragment are also kept in a text file for Greek scholars to review.
posted on November 26, 2014
photo credit: Bethany A. Stahl
Animals that move from surface habitats into caves undergo evolutionary changes in their new environments. Perhaps the most dramatic is eye loss, but there are also other changes, such as in skin pigmentation and sleeping patterns. Researchers believe that studying how these changes occur and the genes involved could provide insights into some human conditions, such as degenerative eye diseases.
Assistant Professor Suzanne McGaugh, an MSI Principal Investigator from the Department of Ecology, Evolution, and Behavior (College of Biological Sciences) was the lead researcher on a recent paper that disclosed the first de novo genome assembly for the cavefish Astyanax mexicanus, the Mexican tetra fish. This discovery allows researchers to identify genes that may be involved with the evolution of traits specific to cave species. This will support further research into the mechanisms of evolutionary change and may help us to understand the underlying causes of various human diseases. The article was published online in Nature Communications on October 20, 2014. (McGaugh S.E. et al. The cavefish genome reveals candidate genes for eye loss. Nat. Commun. 5:5307 DOI: 10.1038/ncomms6307 (2014)).
Professor McGaugh uses MSI resources to perform large-scale genomic analyses, which require considerable computational power. Besides studies of cavefish, the McGaugh group is studying the transcriptomes of reptiles.
An article about this research also appeared on the University of Minnesota’s Discover blog.
Image description: a,b: surface fish; c,d: Pachón cavefish. Scale bar for a,c is 1 cm. Scale bar for b,d is 0.25 cm. (Image and description, McGaugh S.E. et al., The cavefish genome reveals candidate genes for eye loss. Nat. Commun. 5:5307 DOI: 10.1038/ncomms6307 (2014). ©Nature Publishing Group.)
posted on November 12, 2014
Oral cancer is a virulent form of the disease, with a nearly 50 percent mortality rate. Early diagnosis is one of the keys to successful treatment; patients whose cancer is found and treated early have a much better survival rate. Unfortunately, the current method of diagnosing the disease, which involves excision and biopsy of tissue, is both invasive and expensive. It is also prone to errors because of under-sampling. A better method for diagnosing this disease would be a great benefit.
Two MSI Principal Investigators, Associate Professor Frank Ondrey (Director of Research and Clinical Trials, Otolaryngology - Medical School) and Associate Professor Timothy Griffin (Biochemistry, Molecular Biology, and Biophysics - Medical School and College of Biological Sciences), are co-authors with several colleagues from China and the University of Minnesota on a recent paper in PLoS One that discusses a new, proteomics-based method of diagnosing oral cancer. Cells retrieved via a non-invasive oral brush biopsy were tested using mass spectrometry-based proteomics. The researchers found that the secretory leukocyte protease inhibitor (SLPI) was greatly reduced in samples from cancerous and pre-cancerous lesions, compared to normal tissue. This suggests that the reduction in SLPI could be a biomarker for oral cancer.
The paper can be read on the PLoS One website (Yang, Ya, Nelson L. Rhodus, Frank G. Ondrey, Beverly R. K. Wuertz, Xiaobing Chen, Yaqin Zhu, and Timothy J. Griffin. 2014. Quantitative proteomic analysis of oral brush biopsies identifies secretory leukocyte protease inhibitor as a promising, mechanism-based oral cancer biomarker. PLoS One 9 (4) (APR 18), 10.1371/journal.pone.0095389.). The authors used MSI software and hardware to perform data analysis.
Image description: A. Brush biopsy collection and sample preparation protocol. B. Experimental design for quantitative mass spectrometry-based proteomics experiments. One experiment used matched tissue from oral premalignant lesion tissue, and the second used matched tissue from oral squamous cell carcinoma. Image and description, Y Yang et al., 2014, PLoS One, 10.1371/journal.pone.0095389.
posted on October 29, 2014
The physics of diffuse, ionized gases - plasmas - and gravity in deep space are of great interest to researchers investigating how the early universe developed. The use of high-performance computing allows computational astrophysicists to create simulations that extend over cosmic times and distances.
Professor Tom Jones, an MSI Fellow from the Minnesota Institute for Astrophysics in the School of Physics and Astronomy (College of Science and Engineering), has used MSI’s supercomputers to investigate plasmas and gravitational fields in astrophysical environments since the 1980s. The various pioneering projects covered in this research program have formed the basis of a large number of Ph.D. theses and have resulted in many publications. In current projects, the Jones group is using powerful magnetohydrodynamics (MHD) codes on the supercomputers to create simulations that allow them to study the physical processes in diffuse gases in galaxy clusters and to investigate the outflow from massive black holes. The group has developed a high-performance MHD code called “WOMBAT,” which they are continuing to expand in order to create even higher-resolution simulations and to include additional physical processes.
A 2014 paper in the Astrophysical Journal discussed a phenomenological model for thermal leakage injection in the diffusive shock acceleration (DSA) process. DSA is the acceleration that ions and electrons undergo when they are repeatedly reflected back and forth across a shock front, and it plays an important role in models of astrophysical phenomena such as solar flares and supernova shock waves. Recently this process has been invoked to explain giant radio structures in galaxy clusters spanning several million light years. Professor Jones and his colleagues Dr. Hyesung Kang (Pusan National University, Korea), Dr. Vahe Petrosian (Stanford University, California), and Dr. Dongsu Ryu (UNIST, Korea) developed a model for the injection of particles from kappa-like suprathermal populations into the DSA process. (Kang, Hyesung, Vahe Petrosian, Dongsu Ryu, and Thomas W. Jones. 2014. Injection of kappa-like suprathermal particles into diffusive shock acceleration. Astrophysical Journal 788 (2) (JUN 20): 142.) The kappa distribution represents a “thermal” population modified by resonant interactions between particles and plasma waves that are generated by the shock and propagate ahead of it. In related work, a 2013 Astrophysical Journal paper investigated nonlinear effects of wave-particle interactions on the DSA process in Type Ia-like supernova remnant blast waves. Supernova remnants are strong sources of nonthermal radiation. (Kang, Hyesung, Thomas W. Jones, and Paul P. Edmon. 2013. Nonthermal radiation from supernova remnants: Effects of magnetic field amplification and particle escape. Astrophysical Journal 777 (1) (NOV 1): 25.)
Other publications by Professor Jones during 2014 include:
• Brunetti, Gianfranco, and Thomas W. Jones. 2014. Cosmic rays in galaxy clusters and their nonthermal emission. International Journal of Modern Physics D 23 (4) (JAN 01).
• Wykes, Sarka, Huib T. Intema, Martin J. Hardcastle, Abraham Achterberg, Thomas W. Jones, Helmut Jerjen, Emanuela Orru, et al. 2014. Filaments in the southern giant lobe of Centaurus A: Constraints on nature and origin from modelling and GMRT observations. Monthly Notices of the Royal Astronomical Society 442 (4) (JAN 01): 2867-82.
Some of the largest simulations the Jones group has carried out at MSI follow the full dynamic formation of galaxy clusters as they fall out of the expansion of the universe because of gravity. One particular study in that effort is shown in the visualizations above. The images show a snapshot in time of one cluster of galaxies formed in a simulation. The focus of this study is the development of turbulent plasma flows on large scales during the collisions of clusters. A simple way to measure the strength of turbulence is to calculate the rate at which turbulent eddies spin, a measure called vorticity. In the visualization above, the left image shows the distribution of vorticity; the right image shows one of the physical processes that drives the generation of turbulence. From analyses like this it is possible to establish, for example, that shock waves are the principal sources of turbulence in the cluster.
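For readers curious how vorticity is obtained from simulation output, the following is a minimal sketch for a two-dimensional slice of a velocity field on a uniform grid. It is not taken from WOMBAT or the group's analysis tools; the function name, the array layout, and the finite-difference approach are assumptions made for illustration.

```python
import numpy as np

def vorticity_2d(u, v, dx, dy):
    """Vorticity of a 2D flow, dv/dx - du/dy, by finite differences.

    u, v: x- and y-components of velocity, as arrays indexed [y, x].
    dx, dy: uniform grid spacings along x and y.
    """
    dv_dx = np.gradient(v, dx, axis=1)
    du_dy = np.gradient(u, dy, axis=0)
    return dv_dx - du_dy

# Check on a flow with a known answer: a rigidly rotating eddy with
# angular velocity omega has vorticity 2 * omega everywhere.
omega = 0.5
x = np.linspace(-1.0, 1.0, 21)
y = np.linspace(-1.0, 1.0, 21)
X, Y = np.meshgrid(x, y)            # arrays indexed [y, x]
u, v = -omega * Y, omega * X
w = vorticity_2d(u, v, x[1] - x[0], y[1] - y[0])
print(np.allclose(w, 2.0 * omega))
```

In a full three-dimensional simulation the vorticity is a vector (the curl of the velocity field), and its magnitude is what a map like the one described above would typically show.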
posted on October 15, 2014
Epstein-Barr Virus (EBV) is a common virus in the human herpesvirus family. It is well known as the cause of infectious mononucleosis, but it is also associated with some forms of cancer and with autoimmune diseases such as systemic lupus erythematosus and multiple sclerosis. Much of the population is infected with EBV with no symptoms, but we don’t understand why some individuals become sick while others do not. There is no vaccine and no treatment for EBV infection.
MSI Principal Investigator Kristin Hogquist, a professor in the Department of Laboratory Medicine and Pathology and Center for Immunology (Medical School), and her University of Minnesota colleagues recently investigated immune responses to EBV. They used transcriptome analysis to study the immune responses of patients who acquired EBV infections naturally. They found an interesting gene expression profile during acute infection, but no lasting changes during latent infection. They also discovered similarities in responses to EBV and the Dengue fever virus. These results provide important new information relating to natural herpesvirus infections.
The results of this research were published in the journal PLoS One (Dunmire, Samantha K., Oludare A. Odumade, Jean L. Porter, Juan Reyes-Genere, David O. Schmeling, Hatice Bilgic, Danhua Fan, Emily C. Baechler, Henry H. Balfour Jr., and Kristin A. Hogquist. 2014. Primary EBV infection induces an expression profile distinct from other viruses but similar to hemophagocytic syndromes. PLoS One 9 (1) (JAN 17), 10.1371/journal.pone.0085422). Dr. Kevin Silverstein, of MSI’s RISS group, provided assistance to the authors with bioinformatics analysis and software. The paper can be read on the PLoS One website.
Image description: A distinct gene expression profile is apparent during acute EBV infection, but not latent infection. (A) Microarray analysis was performed on pre-infection, acute, and latent timepoints for 10 subjects with primary EBV infection. 464 genes were shown to be significantly changed during the primary response to EBV at a fold change of ≥ 2 and a p-value of ≤ 0.05. No genes were significantly changed during the latent phase of infection using the same criteria. (B) Ingenuity Pathway Analysis of the 464 acute genes revealed 14 pathways that were enriched amongst the genes that changed during primary EBV. These had a significant p-value (the negative log is shown) following evaluation with the Benjamini-Hochberg multiple testing correction. (C) A heatmap representation of the highest (≥ 3 fold) gene changes during the acute and latent stages of EBV infection. (Image and description, S.K. Dunmire, et al., PLoS One 9 (1) (JAN 17), 10.1371/journal.pone.0085422).
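The selection criteria and the multiple testing correction named in the caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the thresholds are simply the ones quoted in the caption, and the fold change is treated as an already computed magnitude for each gene.

```python
import numpy as np

def passes_thresholds(fold_changes, p_values, min_fold=2.0, alpha=0.05):
    """Boolean mask of genes with fold change >= min_fold and p <= alpha."""
    fc = np.asarray(fold_changes, dtype=float)
    p = np.asarray(p_values, dtype=float)
    return (fc >= min_fold) & (p <= alpha)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Made-up numbers, for illustration only.
print(passes_thresholds([2.5, 1.5, 3.0, 2.0], [0.01, 0.01, 0.2, 0.05]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The correction matters when many pathways or genes are tested at once: without it, a 0.05 cutoff applied to hundreds of tests would be expected to pass some of them by chance alone.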
posted on October 1, 2014