Loading Your Data

The entry point for your own data in the TIES system is the private database. Once the data is there, the TIES pipelines take over and transform it into its final searchable form.

Section Configuration

Irrespective of which importer you use, you need to create a SectionHeaderConfig.txt file and provide it to the relevant pipelines. See Section Configuration for more information.

Using the HL7ImportPipeController

If you can generate customized HL7 files from your Laboratory Information System, you can use the HL7ImportPipeController to import your data into TIES. This is the recommended way to import data into TIES. See HL7ImportPipeController for more information.

Using the DelimitedFileImporter

If you can generate delimited files from your Laboratory Information System, you can use the DelimitedFileImporter to import your data into TIES. This is the recommended way to import data into TIES. See DelimitedFileImporter for more information.

Loading database directly

Private (identified) data is stored in the “ties_private” schema on the private database. Seven tables must be populated so that TIES can read the data correctly. In all of these tables, “id” is the primary key and must be populated. If you have the document text available as separate sections, populate the IDENT_SECTION table with that text; if you only have the full report text, store it in the DOCUMENT_TEXT column of the IDENT_DOCUMENT table. In either case, you must still initialize the SECTION_TYPE and SOURCE_SECTION_HEADER tables so that TIES can identify the different sections.

  1. ORG – Only one organization (i.e. only one record) should be present in this table.
  2. IDENT_PATIENT – Contains the identified/private information for the patient.
    • ID
    • FIRST_NAME
    • LAST_NAME
    • SOCIAL_SECURITY_NUMBER and/or MEDICAL_RECORD_NUMBER
    • ORG_ID – This should be the same as ID in ORG table’s row.
  3. IDENT_DOCUMENT – Contains the identified/private information for the document.
    • APPLICATION_STATUS – Must be set to “DEIDENTIFYING” (without the quotes). This tells the DeidPipeController to synthesize and de-identify the report in that record.
    • COLLECTION_DATE_TIME – Must contain a valid collection date.
    • RECORD_ID
    • DOCUMENT_TYPE_ID – The ID that maps into the DOCUMENT_TYPE table
    • PATIENT_ID – This column is a foreign key that maps to the ID column of the IDENT_PATIENT table. It must be properly initialized.
    • ORG_ID – This column is a foreign key that maps to the ID column of the ORG table.
  4. IDENT_SECTION – This table contains the report text, or rather fragments of the report text organized as sections. Hence, for each record in the IDENT_DOCUMENT table, there will be one or more records in the IDENT_SECTION table. At a minimum, the following fields must be populated:
    • NAME – The name of the section in the SPR. For example, “Final Diagnosis” or “Gross Description”.
    • DOCUMENT_FRAGMENT – This contains the text for the section in the report.
    • DOCUMENT_ID – The ID that maps into the IDENT_DOCUMENT table.
    • SECTION_TYPE_ID – ID maps to SECTION_TYPE Table
  5. DOCUMENT_TYPE – This table defines the type of the document.
    • NAME – The name of the document type, e.g. Pathology or Radiology.
  6. SECTION_TYPE – Defines the types of document sections found in each document type.
    • IS_CODEABLE – Identify concepts in this section
    • IS_DEIDENTIFIABLE – De-identify this section
    • IS_INDEXABLE – Index this section
    • IS_HISTOGRAMMED – Not Used
    • IS_KEYWORD – Use Lucene KeywordAnalyzer to search this section.
    • IS_VISIBLE – Is included in the document text during synthesis.
    • IS_WHITESPACE – Use Lucene Whitespace Analyzer to search this section.
    • LIMS_CODE – Unused.
    • NAME – Name of the section
    • PRIORITY – Order in which this section should appear in the document during synthesis.
    • DOCUMENT_TYPE_ID – The ID that maps into the DOCUMENT_TYPE table
  7. SOURCE_SECTION_HEADER – Defines the possible headers under which a specific section might be found.
    • HEADER – Header as it appears in the text.
    • SECTION_TYPE_ID – ID maps to SECTION_TYPE Table
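
The population order implied by the table list can be sketched in Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the private database, the column types are guesses, only a subset of the columns named above is created, and every sample value (names, IDs, dates, text) is invented. The function name build_demo_db is hypothetical and not part of TIES. The point it shows is that parent rows (ORG, DOCUMENT_TYPE, SECTION_TYPE, IDENT_PATIENT) are inserted before the rows that reference them.

```python
import sqlite3

# Simplified stand-in for the ties_private schema. Column types are assumed
# and only columns named in the list above are included.
SCHEMA = """
CREATE TABLE ORG (ID INTEGER PRIMARY KEY);
CREATE TABLE DOCUMENT_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SECTION_TYPE (
    ID INTEGER PRIMARY KEY, NAME TEXT, PRIORITY INTEGER,
    IS_CODEABLE INTEGER, IS_DEIDENTIFIABLE INTEGER,
    IS_INDEXABLE INTEGER, IS_VISIBLE INTEGER,
    DOCUMENT_TYPE_ID INTEGER REFERENCES DOCUMENT_TYPE(ID));
CREATE TABLE SOURCE_SECTION_HEADER (
    ID INTEGER PRIMARY KEY, HEADER TEXT,
    SECTION_TYPE_ID INTEGER REFERENCES SECTION_TYPE(ID));
CREATE TABLE IDENT_PATIENT (
    ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT,
    MEDICAL_RECORD_NUMBER TEXT,
    ORG_ID INTEGER REFERENCES ORG(ID));
CREATE TABLE IDENT_DOCUMENT (
    ID INTEGER PRIMARY KEY, APPLICATION_STATUS TEXT,
    COLLECTION_DATE_TIME TEXT, RECORD_ID TEXT, DOCUMENT_TEXT TEXT,
    DOCUMENT_TYPE_ID INTEGER REFERENCES DOCUMENT_TYPE(ID),
    PATIENT_ID INTEGER REFERENCES IDENT_PATIENT(ID),
    ORG_ID INTEGER REFERENCES ORG(ID));
CREATE TABLE IDENT_SECTION (
    ID INTEGER PRIMARY KEY, NAME TEXT, DOCUMENT_FRAGMENT TEXT,
    DOCUMENT_ID INTEGER REFERENCES IDENT_DOCUMENT(ID),
    SECTION_TYPE_ID INTEGER REFERENCES SECTION_TYPE(ID));
"""


def build_demo_db():
    """Create the demo schema and load one report, parents before children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript(SCHEMA)

    # 1. Exactly one organization.
    conn.execute("INSERT INTO ORG (ID) VALUES (1)")

    # 2. Document type, section types and the headers that identify them.
    conn.execute("INSERT INTO DOCUMENT_TYPE (ID, NAME) VALUES (1, 'Pathology')")
    conn.executemany(
        "INSERT INTO SECTION_TYPE (ID, NAME, PRIORITY, IS_CODEABLE,"
        " IS_DEIDENTIFIABLE, IS_INDEXABLE, IS_VISIBLE, DOCUMENT_TYPE_ID)"
        " VALUES (?, ?, ?, 1, 1, 1, 1, 1)",
        [(1, "Final Diagnosis", 1), (2, "Gross Description", 2)],
    )
    conn.executemany(
        "INSERT INTO SOURCE_SECTION_HEADER (ID, HEADER, SECTION_TYPE_ID)"
        " VALUES (?, ?, ?)",
        [(1, "FINAL DIAGNOSIS:", 1), (2, "GROSS DESCRIPTION:", 2)],
    )

    # 3. Patient, then the document, then its sections.
    conn.execute(
        "INSERT INTO IDENT_PATIENT (ID, FIRST_NAME, LAST_NAME,"
        " MEDICAL_RECORD_NUMBER, ORG_ID) VALUES (1, 'Jane', 'Doe', 'MRN-0001', 1)"
    )
    conn.execute(
        "INSERT INTO IDENT_DOCUMENT (ID, APPLICATION_STATUS,"
        " COLLECTION_DATE_TIME, RECORD_ID, DOCUMENT_TYPE_ID, PATIENT_ID, ORG_ID)"
        " VALUES (1, 'DEIDENTIFYING', '2010-01-15 09:30:00', 'REC-0001', 1, 1, 1)"
    )
    conn.executemany(
        "INSERT INTO IDENT_SECTION (ID, NAME, DOCUMENT_FRAGMENT, DOCUMENT_ID,"
        " SECTION_TYPE_ID) VALUES (?, ?, ?, 1, ?)",
        [
            (1, "Final Diagnosis", "Sample diagnosis text.", 1),
            (2, "Gross Description", "Sample gross description text.", 2),
        ],
    )
    conn.commit()
    return conn
```

Against a real private database, the same ordering would apply, but the connection, SQL dialect, and full column list would come from your database and the actual ties_private schema rather than from this sketch.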

Foreign Keys

It is also assumed that, when the schema is populated, all foreign keys are valid, so that the ORG -> IDENT_PATIENT -> IDENT_DOCUMENT -> IDENT_SECTION hierarchy is preserved.
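
One way to check this assumption before running the pipelines is to count child rows whose foreign key has no matching parent. The following is a minimal sketch, not a TIES utility: the function name count_orphans and the HIERARCHY list are hypothetical, the table and column names are taken from the lists above, and conn.execute is the sqlite3 shortcut (with another database driver you would open a cursor first).

```python
import sqlite3

# (child table, foreign-key column, parent table) for the
# ORG -> IDENT_PATIENT -> IDENT_DOCUMENT -> IDENT_SECTION hierarchy.
HIERARCHY = [
    ("IDENT_PATIENT", "ORG_ID", "ORG"),
    ("IDENT_DOCUMENT", "PATIENT_ID", "IDENT_PATIENT"),
    ("IDENT_DOCUMENT", "ORG_ID", "ORG"),
    ("IDENT_SECTION", "DOCUMENT_ID", "IDENT_DOCUMENT"),
]


def count_orphans(conn):
    """Return {(child, fk_column): count} for child rows with no parent row.

    Rows whose foreign-key column is NULL are counted as orphans too.
    Relationships with no orphans are left out of the result.
    """
    orphans = {}
    for child, fk, parent in HIERARCHY:
        sql = (
            f"SELECT COUNT(*) FROM {child} c "
            f"LEFT JOIN {parent} p ON c.{fk} = p.ID "
            f"WHERE p.ID IS NULL"
        )
        n = conn.execute(sql).fetchone()[0]
        if n:
            orphans[(child, fk)] = n
    return orphans
```

An empty result means every row in the hierarchy points at an existing parent; any entry in the result names the table and column that need to be corrected.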

Running the pipelines

After all the data is loaded, run the DeidPipeController service to start populating the de-identified/public database. If you do not need de-identification (for example, when TIES is for internal use only by authorized users who are allowed to view PHI), you can use the DoNothingDeidentifier. See the DeidPipeController Configuration section for more information. You can start all the pipelines together. Start them in this order:

  1. DeidPipeController
  2. TIESPipeController
  3. IndexPipeController

DeidPipeController and IndexPipeController typically run much faster than the TIESPipeController, so the IndexPipeController will most likely run out of reports to index and then sleep for a predetermined time before checking for reports again. Pipe activity can be monitored by looking at the log files.