An MD&DI December 1997 Column
YEAR 2000 SOFTWARE
The clock is ticking toward the next millennium. Will your data survive, or is there a time bomb programmed into your management system?
The "year 2000 software crisis," as it has come to be known, poses a serious risk to the integrity of data collected and analyzed in clinical studies. The problem arises from the two-digit notation that many computers and software applications use to denote years in dates, such as 10/01/97. Software that relies on this notation may be unable to process dates correctly after December 31, 1999, and processes that use dates in key fields, calculations, comparisons, sorts, and projections may report inaccurate, missing, or incomplete information. Statistical conclusions drawn from such faulty data in clinical studies could have harmful effects on human lives.
This article outlines a structured approach to overcoming the year 2000 problem as it relates to the management of clinical trials. A planned strategy, properly implemented, is necessary to ensure that the operations and functions of existing computer systems will continue well into the next millennium.
DATES AS DATA
Dates play a fundamental role in clinical trials because they define the starting point, duration, and primary and secondary end points of a study. Key conclusions and statistical statements often depend on time-related events. A survival analysis, for example, estimates the time until an event occurs from date-dependent measurements.
Dates often serve as key fields when data files are merged, and they are also used to sort and search records when checking study compliance. Sorting routines that compare two-digit years will misplace "00" dates, because 00 is less than 99. As a result, a program might place the data collected at a patient visit in the year 2000 before the data from a visit in 1997. If this is not corrected, trends and analyses based on patient follow-up data can lead to inaccurate conclusions. Dates are also used to calculate age, follow-up time, and time to event, and to compare the most recent measurements with baseline values.
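The sorting and age problems just described can be reproduced in a few lines. The Python sketch below is illustrative only: the mm/dd/yy strings and the functions `year_key` and `age_two_digit` are hypothetical stand-ins for legacy logic, not code from any particular system.

```python
# Visit dates as a legacy system might store them: mm/dd/yy strings.
visits = ["10/01/97", "03/15/99", "01/10/00"]

def year_key(mmddyy: str) -> str:
    """Legacy-style yymmdd sort key built from a two-digit year."""
    mm, dd, yy = mmddyy.split("/")
    return yy + mm + dd

# Sorting on the two-digit key puts the 2000 visit first: "00" < "97" < "99".
legacy_order = sorted(visits, key=year_key)

def age_two_digit(birth_yy: int, visit_yy: int) -> int:
    """Naive age in years from two-digit years (month and day ignored)."""
    return visit_yy - birth_yy

# A patient born in 1945, seen in 1999 and again in 2000.
age_1999 = age_two_digit(45, 99)  # 54
age_2000 = age_two_digit(45, 0)   # -45 instead of 55

# With four-digit years in ccyy-mm-dd form, plain string order is chronological.
fixed_order = sorted(["1997-10-01", "1999-03-15", "2000-01-10"])

print(legacy_order)  # ['01/10/00', '10/01/97', '03/15/99']
print(age_1999, age_2000)
print(fixed_order)
```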
At investigational sites, patient enrollment dates are often used to project follow-up visits based on the protocol's compliance schedule. Patient calendars and monthly reminder lists are generated to help site personnel schedule patient examinations. Having a system to track patient visits is essential to maintaining high compliance.
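As a sketch of how enrollment dates drive visit projections, the Python fragment below computes target dates for a hypothetical protocol schedule (visits 30, 90, 180, and 365 days after enrollment, an assumption made only for illustration). Because it works with full `datetime.date` values, visits that fall in 2000 are correctly ordered after those in 1999.

```python
from datetime import date, timedelta

# Hypothetical protocol schedule: days after enrollment for each follow-up visit.
SCHEDULE_DAYS = {"1-month": 30, "3-month": 90, "6-month": 180, "12-month": 365}

def project_visits(enrolled: date) -> dict:
    """Return the target date of each follow-up visit for one patient."""
    return {name: enrolled + timedelta(days=offset)
            for name, offset in SCHEDULE_DAYS.items()}

calendar = project_visits(date(1999, 7, 15))
for name, due in sorted(calendar.items(), key=lambda item: item[1]):
    print(f"{name:>8}: {due.isoformat()}")  # ccyy-mm-dd, four-digit year
```

A monthly reminder list could then be produced by filtering these dates by month.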
Dates also play an important role in the resolution of data queries and anomalies. Often, a separate database will contain information on the types of queries sent to the sites and track their resolution using the date the query was sent, the patient visit date on the case report form, and the date the query was resolved. Given the integral role dates play in clinical trials management, the need to make computer systems year 2000 compliant should be of primary concern to device manufacturers.
To develop a strategy for becoming year 2000 compliant, manufacturers must have a thorough understanding of their current clinical data management systems and the methods and tools available to correct the problem. Only then can a structured approach be defined, progressing from inventory to planning, conversion, validation, and implementation.
Manufacturers can begin by performing a business-risk or impact assessment, starting with an inventory of computer software systems. Managers need to ask: How many applications, programs, languages, lines of code, and data fields are there? Does any documentation identify the source programs that generate the reports used in the clinical trial? Are any applications and programs outdated? Which applications and programs are most frequently used? What tools are available to help with the inventory? For each application and system, the objective is to identify all the areas that could possibly be affected by the century rollover. An inventory of all programs, files, databases, computer screens, and related systems should be compiled.
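Commercial inventory tools are the usual route, but a crude first pass can also be scripted. The Python sketch below is hypothetical: its regular expressions are rough heuristics (identifiers containing date, dt, yr, or year, and quoted "19" literals) chosen for illustration, and they will both miss some date fields and flag some non-date names.

```python
import re

# Hypothetical heuristics for a first-pass inventory of date-related code.
DATE_NAME = re.compile(r"\b\w*(?:date|dt|yr|year)\w*\b", re.IGNORECASE)
CENTURY_LITERAL = re.compile(r"[\"']19[\"']")  # a quoted, hard-coded "19"

def inventory(sources: dict) -> dict:
    """Map each source file name to its candidate date fields and '19' literals."""
    report = {}
    for name, text in sources.items():
        report[name] = {
            "date_fields": sorted(set(DATE_NAME.findall(text))),
            "century_literals": len(CENTURY_LITERAL.findall(text)),
        }
    return report

# Example input: file name -> program text (contents are made up).
sample = {"visits.sas": 'visdt = input(raw, mmddyy8.); yr = "19" || yy;'}
print(inventory(sample))
```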
The next step is to develop a plan to make the computer system year 2000 compliant. A project team with key people from all departments affected should be assembled, and funds allocated for purchasing software tools and consulting services. It will be essential for the project manager to recruit individuals with the right skills to minimize start-up time. The key skills include technical software expertise, biostatistical and clinical affairs knowledge, a regulatory background, and project management leadership.
Based on the initial inventory, the project team should develop a time line for adjusting systems and determine how the code and data files should be changed. For each application, a number of options exist; the project team can decide to leave it alone, redesign it by conversion, rehost it to a new platform, replace it with another application, or retire it from operation.
For software applications slated for redesign by conversion, the team needs to determine the most appropriate conversion method. Outdated applications can be replaced or archived, and unaffected ones can be left alone. Each converted application must be tested and validated to minimize the risk of introducing programming or data errors. Once the conversion is complete, the system needs to be implemented and accepted by its users.
It is important to realize that the year 2000 problem is not a technical problem but a business enterprise problem. The challenge is to make all clinical data management systems understand dates in the next century before it is too late. Software tools from third-party vendors are available to help monitor and automate the conversion process. Since barely two years remain to complete this project, all available options should be considered.
The process of converting systems and applications in each functional area cannot begin without an understanding of how date fields are stored and processed. For example, are dates entered and stored as numbers or characters? In what format? With two or four digits for the year? Are the dates transformed in any way before being saved to the data set? These questions need to be answered to determine what needs to be done and at what stage. The process presents a good opportunity to standardize and document the clinical data management system. To standardize dates, manufacturers can refer to ISO 8601, which specifies that numeric dates be represented as ccyy-mm-dd where cc = century, yy = year, mm = month, and dd = day. Examples of this notation are 1997-10-01 and 2001-01-01.
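To illustrate the ISO 8601 form, the Python sketch below converts mm/dd/ccyy strings to ccyy-mm-dd. The function name `to_iso8601` is hypothetical. Python's `%Y` directive in `strptime` expects a four-digit year, so a two-digit entry such as 10/01/97 is rejected rather than silently guessed; that is one reasonable design choice, not the only one.

```python
from datetime import datetime

def to_iso8601(mmddccyy: str) -> str:
    """Convert an mm/dd/ccyy date string to ISO 8601 ccyy-mm-dd."""
    return datetime.strptime(mmddccyy, "%m/%d/%Y").strftime("%Y-%m-%d")

print(to_iso8601("10/01/1997"))  # 1997-10-01
print(to_iso8601("01/01/2001"))  # 2001-01-01

try:
    to_iso8601("10/01/97")  # two-digit year: %Y requires four digits
except ValueError as err:
    print("rejected:", err)
```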
For clinical trials data management systems, the functional areas to investigate include data entry, statistical analysis, and report generation. The focus should be on date-sensitive applications and data files. The year 2000 will appear as mm/dd/00 on many computer screens and reports unless corrected.
Data Entry. In general, information is recorded on a case report form and transferred to the data management center by one of four methods: hard copy, fax, diskette or tape, or modem. Each method is susceptible to the year 2000 problem. A review of the data-entry system should examine the system application and files, external files, data-entry screens, and operational procedures.
The system application defines the date fields for entry into the data sets. For all new studies, the date fields need to be numeric and must allow four digits for the year. In addition, logical checks on the dates should be programmed wherever possible. Examples of these logical checks include confirming that the three-month patient visit date is after the one-month visit date and that the patient's birth date is before the enrollment date. All screens should display all four digits of the year to avoid confusing the data-entry operator. The conversion team must also consider the compatibility of any data merged into the clinical database from an external file and take steps to ensure that all dates in the database remain consistent. Operational procedures such as double-key entry and audit trails rely almost entirely on dates to determine when events occurred. Whether these functions run as batch or interactive processes, dates are recorded and used to minimize and identify data-entry errors and to document changes to the clinical database.
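The two logical checks named above are straightforward once dates carry four-digit years. This Python sketch is hypothetical; the function name and argument list are illustrative, and a real data-entry system would run such edit checks per form and log the failures.

```python
from datetime import date

def check_dates(birth: date, enrolled: date, visit_1m: date, visit_3m: date) -> list:
    """Return a list of logical-check failures for one patient's key dates."""
    errors = []
    if not birth < enrolled:
        errors.append("birth date must precede enrollment date")
    if not visit_1m < visit_3m:
        errors.append("3-month visit must follow 1-month visit")
    return errors

# A 3-month visit in January 2000 correctly follows a November 1999 visit.
ok = check_dates(date(1950, 5, 2), date(1999, 10, 1),
                 date(1999, 11, 1), date(2000, 1, 3))
# Birth and enrollment dates transposed: one failure is reported.
bad = check_dates(date(1999, 10, 1), date(1950, 5, 2),
                  date(1999, 11, 1), date(2000, 1, 3))
print(ok)   # []
print(bad)  # ['birth date must precede enrollment date']
```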
Data backup and recovery procedures must not be overlooked. Managers should make sure that a system knows which dates are the most current so that it does not overwrite a current backup with an older one.
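A minimal Python sketch of the backup point follows; the catalog labels are made up. Comparing yymmdd labels picks the December 1999 backup as the newest, while comparing full dates picks the January 2000 one. The `safe_to_overwrite` guard is a hypothetical helper, not part of any backup product.

```python
from datetime import date

# Hypothetical backup catalog keyed by label.
labels_yymmdd = {"BK991230": "991230", "BK000103": "000103"}
labels_dates = {"BK991230": date(1999, 12, 30), "BK000103": date(2000, 1, 3)}

# Two-digit comparison: "991230" > "000103", so the 1999 copy looks newest.
newest_legacy = max(labels_yymmdd, key=labels_yymmdd.get)
# Full-date comparison gives the correct answer.
newest = max(labels_dates, key=labels_dates.get)

def safe_to_overwrite(current: date, incoming: date) -> bool:
    """Allow a copy to replace the current backup only if it is not older."""
    return incoming >= current

print(newest_legacy, newest)  # BK991230 BK000103
print(safe_to_overwrite(date(2000, 1, 3), date(1999, 12, 30)))  # False
```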
Statistical Analysis and Report Generation. The review of the statistical analysis and report generation functional areas must examine the statistical data set, printed reports, and output flat files. Statistical data sets comprise all the key fields from all the data sets in a clinical study. Often, the primary and secondary end points are tabulated in the statistical data set, which is used as the input data set for most of the statistical analysis programs for the study. Dates used to define when events occurred play a critical role in the statistical analysis.
In clinical studies, follow-up information is often collected to monitor patient progress. Because data are collected over time, typical reports might include the most recent visit, the mean difference in efficacy between visits, and the improvement from baseline. These reports not only display dates in listings but also depend on accurate dates to determine the patient's most recent visit as well as the visit with the best response rate.
In many existing programs, the logic uses a two-digit year in comparisons and calculations. Because 00 is less than 99, this logic will fail once dates in 2000 appear. Many programs assume that no year is larger than 99, and some have "19" hard-coded into them on the assumption that all dates fall in the 20th century.
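The Python sketch below imitates the two legacy patterns just described: a hard-coded "19" century, and a "most recent visit" found by comparing two-digit years. The function names are hypothetical.

```python
def legacy_full_year(yy: str) -> int:
    """Legacy pattern: prepend a hard-coded '19' to every two-digit year."""
    return int("19" + yy)

def follow_up_years(baseline_yy: str, visit_yy: str) -> int:
    """Years of follow-up computed with the legacy century assumption."""
    return legacy_full_year(visit_yy) - legacy_full_year(baseline_yy)

print(follow_up_years("97", "99"))  # 2
print(follow_up_years("97", "00"))  # -97: "00" is read as 1900, not 2000

# "Most recent visit" chosen by the largest two-digit year skips the 2000 visit.
visit_years = ["97", "99", "00"]
print(max(visit_years))  # 99
```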
The data sets and programs used for statistical analysis deserve the highest priority in the conversion process. For new studies, the case report forms should be designed to facilitate the collection of accurate data. In general, four-digit years should be requested wherever dates are recorded.
All items in the inventory must be reviewed to determine which are affected by the century rollover and when the impact will occur. All applications can then be categorized by those that need attention and those that do not. Remember that the mission-critical systems deserve the highest priority.
The best conversion strategy for achieving year 2000 compliance is determined by time and money. The easier solutions may take less time to implement and work fine in the short term, but may require greater maintenance costs. The harder solutions may take longer to implement, but may last longer without additional maintenance.
There are three basic approaches to correcting the year 2000 date-field problem: Fix the data, fix the code, or do a combination of the two. Whatever the approach, three conversion strategies should be considered: four-digit date expansion, the fixed/sliding window technique, and encapsulation. Figure 1 provides a decision tree for determining the most appropriate approach.
Four-Digit Date Expansion. Four-digit date expansion entails changing the two-digit year notation to four digits. This modification usually applies to the data files. In addition, minor modifications may need to be made to the programs so that they read the dates in the new format. Four-digit date expansion may be the first option to consider because it makes the most sense and is the most permanent solution; however, depending on the extent of the problem, there may not be enough time to expand all applications to the four-digit notation. In addition, complex file and program relationships may prevent the use of the field-expansion strategy.
Fixed/Sliding Window. A good alternative to four-digit date expansion is the fixed and sliding window technique. The basic idea is that by setting a base year in a computer system, for example 1950, the system can define a 100-year window, in this case from 1950 to 2049, within which two-digit year representations are not ambiguous: 49 is 2049, 50 is 1950, and 51 is 1951. This works well for systems whose dates all fall within a single 100-year span. The technique is attractive because it should be applicable to more than half the applications and programs that need to be changed. Most important, it is straightforward and easy to apply. It requires no change to the data structures (in flat files and databases) and only top-level changes to the program, usually made just once. Major fourth-generation languages have a statement that sets the window at the top of the program, so the window dates need to be added only once. Sliding windows, in which the window moves each year to stay at the same position relative to the current year, are also easy to install.
For example, in SAS applications (SAS Institute; Cary, NC), time-span calculations and database storage of date values have been year 2000 compliant from their inception. In addition, the YEARCUTOFF= system option sets the first year of the 100-year window used to interpret two-digit years. For many companies, this method is appropriate for 70 to 99% of their SAS programs and systems. When several SAS programs access the same data file, YEARCUTOFF should be set to the same value in each program to preserve data integrity.
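A minimal Python sketch of both windows follows. The fixed window maps a two-digit year into the 100 years beginning at a chosen start year (1950 here, as in the text); the sliding window recomputes that start year from the current date. The 50-year offset and the function names are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

def expand_fixed(yy: int, start: int = 1950) -> int:
    """Fixed window: map two-digit year yy into the span start..start+99."""
    year = start - start % 100 + yy  # same century as the window start
    return year if year >= start else year + 100

def expand_sliding(yy: int, years_back: int = 50, today: Optional[date] = None) -> int:
    """Sliding window: the span starts `years_back` years before the current year."""
    today = today or date.today()
    return expand_fixed(yy, start=today.year - years_back)

print([expand_fixed(y) for y in (49, 50, 51)])      # [2049, 1950, 1951]
print(expand_sliding(46, today=date(1997, 12, 1)))  # 2046 (window 1947-2046)
print(expand_sliding(47, today=date(1997, 12, 1)))  # 1947
```

The fixed version plays the same role as a per-program window setting such as SAS's YEARCUTOFF; the sliding version trades a stable mapping for one that keeps pace with the calendar.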
Encapsulation. Basically, encapsulation is a means of preventing modification of either the program or the data file, depending on which is more important. In program encapsulation, the program remains unchanged while the data are modified. In data encapsulation, the reverse is true.
If program encapsulation is chosen, one option is to "time shift" the data by subtracting 28 years from each date before doing any calculations and then adding 28 years back to any resulting date. A constant of 28, or a multiple of 28, is used because every 28 years the days of the week fall on the same dates of the year (a cycle that holds for dates between 1901 and 2099). Because all the dates are shifted back by 28 years, dates from 1928 through 2027 map into 1900 through 1999, so date calculations avoid the century rollover problem and generate accurate results. In addition, the method requires no other logic change to the program.
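The Python sketch below illustrates the 28-year shift, assuming a hypothetical legacy routine that reads mm/dd/yy and always assumes the 1900s. The data are shifted back 28 years before the routine sees them, while the routine itself is left untouched. Because the shifted years must stay in the 1900s, the sketch holds only for original dates from 1928 through 2027.

```python
from datetime import date

SHIFT = 28  # years; the calendar repeats every 28 years between 1901 and 2099

def legacy_days_between(start: str, end: str) -> int:
    """Unmodified legacy routine: mm/dd/yy strings, century fixed at 19xx."""
    def parse(s: str) -> date:
        mm, dd, yy = (int(p) for p in s.split("/"))
        return date(1900 + yy, mm, dd)
    return (parse(end) - parse(start)).days

def shifted_mmddyy(d: date) -> str:
    """Encapsulation step: move the date back 28 years, then format as mm/dd/yy."""
    return d.replace(year=d.year - SHIFT).strftime("%m/%d/%y")

start, end = date(1999, 12, 15), date(2005, 1, 15)
true_days = (end - start).days
shifted_days = legacy_days_between(shifted_mmddyy(start), shifted_mmddyy(end))
print(true_days == shifted_days)  # True: same span, same number of leap days
print(legacy_days_between("12/15/99", "01/15/05") < 0)  # True: unshifted data fail
# The 28-year shift preserves the day of the week.
print(start.weekday() == start.replace(year=start.year - SHIFT).weekday())  # True
```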
If data encapsulation is desired, one option is to read and write the date fields in binary (often described as hex storage) so as to retain the old record layout. This involves changing the data-entry format to accept four-digit years and storing each four-digit year in the two bytes formerly occupied by the two-digit year. Binary storage preserves the original record length of the data file: two bytes, ranging from 0000 to FFFF in hexadecimal, can represent year values from 0 to 65,535, removing the 100-year limit. This method has the advantage of avoiding any redesign of old file structures, with the added benefit that other languages can also read the files.
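One way to realize the two-byte year field is as an unsigned 16-bit integer. The Python sketch below uses the standard `struct` module; big-endian byte order and the function names are assumptions, and every program that reads the file, in whatever language, must use the same layout.

```python
import struct

def pack_year(year: int) -> bytes:
    """Store a four-digit year in two bytes (unsigned 16-bit, big-endian)."""
    return struct.pack(">H", year)  # valid range 0..65535

def unpack_year(field: bytes) -> int:
    """Recover the year from its two-byte field."""
    return struct.unpack(">H", field)[0]

field = pack_year(2001)
print(len(field), field.hex())  # 2 07d1
print(unpack_year(field))       # 2001
```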
Each application must be looked at individually to determine the appropriate method for correcting the century rollover problem; different applications may each require a different fix. A schedule needs to be developed to define each application's start date based on the programming resources available and the number of programs affected.
PROGRAM FOR SELECTING A YEAR 2000 COMPLIANCE STRATEGY
(1) Do . . . Examine your system/program/data:
(2) If the representation of years is OK, and there is
(3) If (Priority > Difficulty) then expand all date fields;
(4) ELSE IF
For SAS programs it is:
OPTIONS YEARCUTOFF=1950; /* ccyy: first year of the 100-year window */
(5) ELSE IF you prefer a sliding window: use system calls based on the system current year:
For SAS programs it can be:
(6) With choices (4) and (5), double-check output reports and input/output screens to ensure they are to your liking (and your customers'), i.e., that they show four-digit years;
(7) Else if (other languages use files) or (years spanned are too diverse) then use an encapsulation method
(8) Last resort (sometimes best) is to reengineer: to place program features into other systems, possibly year 2000-compliant off-the-shelf or turnkey software.
TESTING AND VALIDATION
Even though the repair of a two-digit field is relatively simple, testing the result is a massive undertaking, needed to ensure that no new problems were introduced into the application and that no errors were introduced into the data files. Maintaining consistency in designing, executing, and documenting tests is vital. Programs need to work today and after the year 2000. The strategy is to develop a simple yet effective method to complete the testing objectives in a structured and controlled environment.
To ensure system compliance, manufacturers must perform unit testing, system and integration testing, and acceptance testing. In defining the acceptance level, testing managers must strike a balance between the risk of missing an error and the cost of testing more thoroughly. The highest level of compliance involves testing at four different points in time. The first is before the year 2000, for example, December 31, 1999. The second is in the year 2000, such as January 1, 2000. The third is after the year 2000, such as January 1, 2001. The fourth covers a transition into the year 2000 and back, such as December 31, 1999, to January 1, 2000, to December 31, 1999. This last test assesses the ability of the application logic to cross from 1999 into 2000 and from 2000 back to 1999. More extensive testing, if needed, can cover several business cycles that span the change of century. Also, a system must be implemented to continuously monitor compliance and the satisfaction of all members of the clinical study.
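The four test points can be written as automated checks. In the Python sketch below, `to_date` is a hypothetical function under test that applies a fixed 1950-2049 window to mm/dd/yy input. The fourth check exercises only date arithmetic forward across the rollover and back; it does not reset a system clock, which a full test of that kind would also do.

```python
from datetime import date

WINDOW_START = 1950  # hypothetical fixed window: 1950-2049

def to_date(mmddyy: str) -> date:
    """Function under test: convert mm/dd/yy using the fixed window."""
    mm, dd, yy = (int(p) for p in mmddyy.split("/"))
    year = 1900 + yy if 1900 + yy >= WINDOW_START else 2000 + yy
    return date(year, mm, dd)

def run_rollover_tests() -> None:
    # 1. Before the year 2000
    assert to_date("12/31/99") == date(1999, 12, 31)
    # 2. In the year 2000
    assert to_date("01/01/00") == date(2000, 1, 1)
    # 3. After the year 2000
    assert to_date("01/01/01") == date(2001, 1, 1)
    # 4. Across the rollover and back: 1999 -> 2000 -> 1999
    forward = (to_date("01/01/00") - to_date("12/31/99")).days
    back = (to_date("12/31/99") - to_date("01/01/00")).days
    assert forward == 1 and back == -1
    print("all four rollover checks passed")

run_rollover_tests()
```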
For health-care companies, clinical data are a valuable resource. If a system needs conversion to continue functioning into the next century, then the conversion must be accurate. Mistakes from processing dates can invalidate the conclusions of any clinical study.
FDA requires validation for any software or computer system that manages clinical data for clinical trials. FDA submissions such as annual reports, clinical updates and amendments, 510(k)s, and premarket approval applications all need to be error-free. Moreover, the submission should not have confusing information about dates and calculations. The Division of Small Manufacturers Assistance within FDA's CDRH has information about the device regulatory issues concerning the year 2000 computer problem. More details can be found on CDRH's home page at http://www.fda.gov/cdrh/yr2000.html
The year 2000 problem imposes a fixed deadline, and only limited resources exist to address the programming issues involved. Because of the urgency and significance of this problem, senior management must make a commitment to immediate action. There is no simple solution. Implementing a successful year 2000 conversion will require effective project management and considerable teamwork.
Sunil Kumar Gupta is senior consultant and founder of Gupta Programming (Simi Valley, CA) and cofounder of Millennium Technologies Institute, Inc. (Spring Valley, CA).
Illustration by Tim Teebken