The Text Information Extraction System (TIES) originally started through the Shared Pathology Informatics Network (SPIN) associated with the National Cancer Institute (NCI) and was the brainchild of Dr. Jules Berman, one of the early pathology informaticians. Dr. Berman’s vision for cancer research was to create a way to search vast de-identified data stores across multiple institutions. The approach would cut down the time spent searching and contacting tissue banks, so that multi-center studies could happen much faster. It would also make it possible to study rare disorders by aggregating cases across centers, since any single center might not have enough of them. Ultimately, four institutions were funded to be part of SPIN: Harvard, Regenstrief, UCLA, and the University of Pittsburgh.

What SPIN hadn’t anticipated was just how much time and effort would be needed to fully establish the project or how many barriers they would encounter. The early version of SPIN was based on technology that wasn’t yet mature.  To many, it seemed unlikely that institutions would even be willing to share this de-identified clinical data. And what about scientific turf?  Wouldn’t organizations and investigators be afraid of giving up their competitive scientific advantage?

One such skeptic was new faculty member Rebecca Jacobson, MD, MS, whose first thought about the project was, “That’s the craziest thing I’ve ever heard! How is that possible?” The University of Pittsburgh Principal Investigator, Dr. Michael Becich, explained why the SPIN project was so important, and a convinced Jacobson joined the team.

Fortunately, the four institutions came together and made important progress over the next five years of funding. The team at Pittsburgh developed its own natural language processing (NLP) pipeline, called TIES, based on the General Architecture for Text Engineering (GATE), a newly released open source framework for NLP. Eventually, all four groups demonstrated the first query across the entire SPIN network in 2006 at Natcher Auditorium (National Institutes of Health), determining how many patients with pituitary adenomas existed across all four institutions.

“At the time, this was unheard of and amazing to people,” Rebecca Jacobson states. In fact, it made people realize that sharing data could open up new possibilities for research. More importantly, people recognized that data needed to be shared and could be shared on a much larger scale than ever thought possible.

Unfortunately, an operational network was never fully realized through the SPIN research grant. However, it led Jacobson to the project and kept her going even after the funding period. She remembers thinking, “I was really moved by the dream that Jules Berman had. I thought it was a transformative vision, and I think what was so transformative was the idea that you could have privacy and control as well as openness of data at the same time.” So she continued the project.

And it paid off in a big way. TIES went on to receive support from the caBIG contract with the National Cancer Institute Center for Bioinformatics (NCIB) and later from the National Cancer Institute Informatics Technology for Cancer Research (NCIITCR) program. With that support, the team made vast improvements and collaborated with even more institutions.

What do you think about the origin of TIES? How would you make sharing between institutions better?