Annotating the biomedical literature for the human variome

Verspoor, K, Yepes, A, Cavedon, L, McIntosh, T, Herten-Crabb, A, Thomas, Z and Plazzer, J 2013, 'Annotating the biomedical literature for the human variome', Database, vol. 2013, bat019, pp. 1-13.

Document type: Journal Article
Collection: Journal Articles

Title Annotating the biomedical literature for the human variome
Author(s) Verspoor, K
Yepes, A
Cavedon, L
McIntosh, T
Herten-Crabb, A
Thomas, Z
Plazzer, J
Year 2013
Journal name Database
Volume number 2013
Article Number bat019
Start page 1
End page 13
Total pages 13
Publisher Oxford University Press
Abstract This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at
Subject Natural Language Processing
Bioinformatics Software
Keyword(s) Biomedical text mining
human mutations
DOI - identifier 10.1093/database/bat019
Copyright notice © The Author(s) 2013. Published by Oxford University Press.
ISSN 1758-0463
Citation counts: Thomson Reuters Web of Science: cited 21 times
Scopus: cited 17 times
Access Statistics: 168 Abstract Views
Created: Thu, 15 Jan 2015, 13:42:00 EST by Catalyst Administrator