Overview
Learn about the TIES system
TIES stands for Text Information Extraction System. It provides tools for de-identification and automated coding of free-text structured pathology reports. It also has a client that can be used to search these coded reports. The client also supports Tissue Banking and Honest Broker operations.
TIES focuses on two important challenges of bioinformatics:
- Information extraction (IE) from free text
- Access to tissue.
Regarding the first challenge, information from free-text pathology documents represents a vital and often underutilized source of data for cancer researchers. Typically, extracting useful data from these documents is a slow and laborious manual process requiring significant domain expertise. Application of automated methods for IE provides a method for radically increasing the speed and scope with which this data can be accessed.
Regarding the second challenge, there is a pressing need in the cancer research community to gain access to tissue that meets specific experimental criteria. Presently, there are vast quantities of frozen tissue and paraffin-embedded tissue throughout the country, but because of missing annotation or lack of access to annotation, these tissues are often unavailable to individual researchers.
TIES has three goals designed to solve these problems:
- Extract coded information from free-text Surgical Pathology Reports (SPRs), using controlled terminologies.
- Provide researchers with the ability to query, browse and request annotated tissue data and physical material across a network of federated sources.
- Pioneer research for distributed text information extraction.
TIES provides the functionality required for:
- organizations to de-identify and concept code a corpus of free-text SPRs and to create a data service which manages this information
- organizations to provide role-based access to TIES data
- authorized users to create queries of the TIES datastores and retrieve reports that fit these criteria
- authorized users to submit requests for tissue (orders) which will be filled by Honest Brokers.
An important aspect of the interface is the ability to manage queries and case sets. Users can vet query results and save them to case sets, which can be edited at a later time. Case sets can be submitted as tissue orders or used to derive data extracts. Queries can also be saved and modified at a later time.
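The case set workflow described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class `CaseSet`, its methods, and the report IDs are hypothetical and are not taken from the TIES code base.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: none of these class or method names come from TIES itself.
final class CaseSet {
    private final String name;
    // LinkedHashSet keeps insertion order and ignores duplicate report IDs.
    private final Set<String> reportIds = new LinkedHashSet<>();

    CaseSet(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Add a report the user has vetted; returns false if it was already present.
    boolean add(String reportId) {
        return reportIds.add(reportId);
    }

    // Remove a report when the case set is edited later.
    boolean remove(String reportId) {
        return reportIds.remove(reportId);
    }

    // Copy of the current contents, e.g. as input for a tissue order or data extract.
    List<String> snapshot() {
        return new ArrayList<>(reportIds);
    }
}

final class CaseSetDemo {
    public static void main(String[] args) {
        CaseSet set = new CaseSet("candidate-cases"); // made-up name
        set.add("R-001");
        set.add("R-002");
        set.add("R-001");    // duplicate, ignored
        set.remove("R-002"); // edited at a later time
        System.out.println(set.name() + " -> " + set.snapshot());
    }
}
```

Using a set rather than a list means that saving the same report from two different queries does not produce duplicate entries in an order.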
Some examples of potential TIES users are:
- Attending physicians
- Department of Pathology Residents
- Translational researchers
- Basic scientists
- Molecular biologists
- Tissue Bank Personnel
TIES is designed as a three-tier client-server architecture and is written primarily in Java. The user interface is implemented in Java Swing and deployed using Java Web Start. The middle tier consists of OGSA-DAI-based encrypted web services and data processing pipelines. The NLP pipeline is implemented in GATE (General Architecture for Text Engineering). Search is backed by an Apache Lucene index.
At the core of the TIES system are the identified (PHI data) and de-identified databases, and the corresponding web services that are used to access them. A typical TIES installation is spread over at least two server machines, one holding the identified data and one holding the de-identified data. The server holding the identified data and its web service sits behind the organization’s firewall and is not accessible from outside the organization. The server holding the de-identified data and its web services sits outside the firewall, which lets users access TIES from outside the organization and lets the organization’s TIES node share its data with other TIES nodes.
TIES uses MySQL as the DBMS. Because MySQL is network accessible, the databases can be stored on servers entirely separate from the web services for additional security. It is also possible to install TIES on just one machine.
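As a rough illustration of this layout, the sketch below builds MySQL JDBC URLs for an identified and a de-identified database on separate hosts. The host names, database names, and the `DbEndpoint` class are assumptions made for the example; only the general `jdbc:mysql://host:port/database` URL shape is standard, and actual TIES configuration may look quite different.

```java
// Hypothetical hosts and database names; only the jdbc:mysql:// URL shape is standard.
final class DbEndpoint {
    final String host;
    final int port;
    final String database;

    DbEndpoint(String host, int port, String database) {
        this.host = host;
        this.port = port;
        this.database = database;
    }

    // URL a web service could use to reach a database on another machine.
    String jdbcUrl() {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }
}

final class NodeLayoutDemo {
    public static void main(String[] args) {
        // Identified (PHI) data stays on a host behind the firewall.
        DbEndpoint identified =
                new DbEndpoint("phi-db.internal.example.org", 3306, "ties_identified");
        // De-identified data lives on a separate host.
        DbEndpoint deidentified =
                new DbEndpoint("deid-db.example.org", 3306, "ties_deidentified");

        System.out.println(identified.jdbcUrl());
        System.out.println(deidentified.jdbcUrl());

        // A single-machine install would simply point both endpoints at localhost.
        boolean separateHosts = !identified.host.equals(deidentified.host);
        System.out.println("separate hosts: " + separateHosts);
    }
}
```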
TIES nodes must be part of a TIES network. The TIES network is managed by the CTRM. In the simplest scenario, a standalone TIES node is part of a single-node CTRM. Nodes that want to share data with other TIES nodes must share a common CTRM with them.
Reports undergo several stages of processing in TIES in order to be ready for search and retrieval by the users. Each stage is handled by a separate data processing service. Each report has a specific status field in the database that indicates the stage of processing.
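One way to picture status-driven processing is the sketch below, in which each service handles only reports at its own stage and then advances the status field. The stage names (`RECEIVED`, `DEIDENTIFIED`, `CODED`, `INDEXED`) and all class names are placeholders invented for the example; they are not the actual TIES status values or services.

```java
// Hypothetical stage names; the actual TIES status values are not assumed here.
enum ReportStatus {
    RECEIVED, DEIDENTIFIED, CODED, INDEXED;

    // Next stage in the chain, or the same value once processing is complete.
    ReportStatus next() {
        ReportStatus[] all = values();
        return ordinal() + 1 < all.length ? all[ordinal() + 1] : this;
    }
}

// Minimal stand-in for a report row with its status field.
final class Report {
    final String id;
    ReportStatus status = ReportStatus.RECEIVED;

    Report(String id) {
        this.id = id;
    }
}

// One service per stage: it only touches reports whose status matches its input stage.
final class StageService {
    private final ReportStatus handles;

    StageService(ReportStatus handles) {
        this.handles = handles;
    }

    // Returns true if the report was at this service's stage and was advanced.
    boolean process(Report report) {
        if (report.status != handles) {
            return false; // not this service's turn
        }
        // The real work for the stage would happen here.
        report.status = report.status.next();
        return true;
    }
}

final class PipelineDemo {
    public static void main(String[] args) {
        Report report = new Report("R-001");
        StageService[] services = {
            new StageService(ReportStatus.RECEIVED),
            new StageService(ReportStatus.DEIDENTIFIED),
            new StageService(ReportStatus.CODED)
        };
        for (StageService service : services) {
            service.process(report);
            System.out.println(report.id + " is now " + report.status);
        }
    }
}
```

Because each service checks the status before acting, the services can run independently and in any order; a report is picked up only when the previous stage has finished with it.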
Data Processing Stages