Data migration

Data migration is a key element to consider for any company or organisation adopting a new system to replace an older one, whether the new system is purchased or newly developed. It would be easy to assume that any two systems, old and new, that maintain similar data must perform similar tasks, and that information from one should therefore map to the other with ease. In reality, this is rarely the case.

Although migrating data can be a fairly time-consuming process, the benefits can be worth the cost for those that "live and die" by trends in data. Migrating data to new systems means that older and more cumbersome applications need not be maintained. However, there are many more reasons why data migration is needed within a business. This section covers three topics: what data migration is, deciding whether to migrate legacy data, and how to migrate data.

Some key terms in understanding data migration are:

  • Legacy data is the recorded information that exists in your current storage system, and can include database records, spreadsheets, text files, scanned images and paper documents. All these data formats can be migrated to a new system.
  • Data migration is the process of importing legacy data, from one or more old systems, to a new system or systems. This could involve entering the data manually, moving disk files from one folder (or computer) to another, using database insert queries, developing custom software, or other processes. The specific method used for any particular system depends entirely on the systems involved and the nature and state of the data being migrated.
  • Data cleansing is the process of preparing legacy data for migration to a new system. Because the architecture and storage method of legacy data can be quite different from the new system, the legacy data often does not meet the criteria set by the new system, and must be modified prior to migration. For example, the legacy system may have allowed data to be entered in a way that is incompatible with the new system. Architecture differences, design flaws in the legacy system, or other factors can also render the data unfit for migration in its present state. The data cleansing process corrects and reshapes the legacy data so that it conforms to the new system's requirements.

Do we migrate legacy data?

"Do we migrate legacy data?" is a question that has been asked ever since the first companies put data in one repository and later decided to change to a new system. Here are some commonly asked questions:

  • Should we bring the data over to the new system?
  • If so, should we bring all or part of the data?
  • If just part, which parts? Based on creation date, on whether a case is open or closed, or on some combination of these?
  • If we choose to bring over data on or after a certain date, which date should it be: the last 3 months, the last 6 months, the last year? Or should it be all the data?
  • Should we filter specific data, in or out?
  • Are our desired criteria extractable from the existing database?
  • Are the desired fields of data importable into the new system?
  • Is data migration from our legacy system included in the purchase of the new system?
  • If not, do we have the expertise in-house to script an automated process to migrate the data?
  • If not, will we hire someone to do this?
  • Will we perform this task manually?

When deciding on data migration, examine all of these factors before assuming that either the whole dataset or none of it should be moved to the new system. The test is whether the data records will be used and acted upon once the new system and process are in place. Two key variables to weigh are data volume and data value.

Data volume is the easiest variable in the decision process. How many data records are we talking about: 1000, 10,000, 100,000, 250,000? How many are expected to come into the new system on a weekly/monthly basis to replenish this supply? Check to see if there are any technical barriers to bringing over a certain amount of data and also if large databases will affect performance of system functions like searching. If not, then 10 records or 100,000 records should not make any difference.

If volume is low, then it may be well worth carrying out a migration so there is some existing database for users and for trend analysis. If volume is high, then it may make sense to examine the age/value of the data and start filtering on certain criteria.
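
One way to put numbers on the volume question is to count legacy records by age before deciding how much to bring over. The following is a minimal sketch, assuming the legacy records have already been exported as dictionaries; the `created` field name and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def count_by_year(records):
    """Count records per creation year; `created` is an assumed field name."""
    return Counter(rec["created"].year for rec in records)


# Hypothetical sample standing in for a legacy export.
legacy_records = [
    {"id": 1, "created": date(2019, 3, 14)},
    {"id": 2, "created": date(2022, 7, 1)},
    {"id": 3, "created": date(2022, 11, 30)},
]

volume = count_by_year(legacy_records)
for year in sorted(volume):
    print(year, volume[year])
```

A breakdown like this shows at a glance whether most of the volume sits in recent years or in older data that may be a candidate for filtering.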

Data value is a much harder variable in the decision process. There are different perceptions concerning what value the existing data provides. If users are not working with older data in the current system, chances are they may not work with older data in the new system even with improved search functionality. If migrating, you may want to look at shorter-term date parameters - why bog down a system's performance with data that is never used?

Criteria, as discussed in the questions above, can be date parameters, but can also include other factors. Extracting the exact data based on some of these factors will depend on the abilities of your current system and database as well as the ability to write the detailed extraction script. Keeping it simple when possible is the best approach. However, there may be circumstances where filtering data may make sense.
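
As an illustration of combining criteria, the sketch below keeps records created on or after a cutoff date and, optionally, any case that is still open regardless of age. The field names (`created`, `status`), the status values, and the sample records are assumptions made for the example; a real extraction would depend on what the legacy database can expose.

```python
from datetime import date


def select_for_migration(records, cutoff, keep_open=True):
    """Keep records created on or after `cutoff`, plus open cases if requested."""
    selected = []
    for rec in records:
        recent = rec["created"] >= cutoff
        still_open = keep_open and rec["status"] == "open"
        if recent or still_open:
            selected.append(rec)
    return selected


# Hypothetical legacy records.
records = [
    {"id": 1, "created": date(2018, 5, 2), "status": "closed"},
    {"id": 2, "created": date(2018, 9, 9), "status": "open"},
    {"id": 3, "created": date(2023, 1, 15), "status": "closed"},
]

chosen = select_for_migration(records, cutoff=date(2022, 1, 1))
print([rec["id"] for rec in chosen])
```

Here the old closed case is left behind, while the old open case and the recent case are both selected.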

Once you have determined which records you want to migrate, you also need to decide which parts of each record are required.

This is by no means an exhaustive list of things to consider when embarking on a data migration. Halogence can offer you consultancy to help provide you with answers to these questions and formulate a tailored approach.

How do we migrate data?

Once the decision to migrate has been made, and before the migration can begin, the following analysis must be performed:

  • Analyse and define source structure (structure of data in the legacy system)
  • Analyse and define target structure (structure of data in the new system)
  • Perform field mapping (mapping between the source and target structure with data cleansing, if necessary)
  • Define the migration process (automated vs. manual)

To analyse and define the source and target structures, both the existing system and the new system must be examined to understand how each works, who uses it, and what they use it for. A good starting point for gathering this information is the existing documentation for each system. This documentation could take the form of the original specifications for the application, as well as the systems design and documentation produced once the application was completed. With legacy applications this information is often missing or incomplete, because considerable time may have passed since the application was first developed.

You may also find crucial information in other forms of documentation, including guides, manuals, tutorials, and training materials that end-users may have used. Most often this type of material will provide background information on the functionality exposed to end-users but may not provide details of how the underlying processes work.

For this part of the analysis, you may actually need to review and document the existing code. This may be difficult, depending on the legacy platform. For example, if data is being migrated from an AS/400 application written in RPG, assistance from an experienced RPG programmer will be required, and those skills may not be available in-house. This can be an expensive part of the analysis process, because an external resource may be needed to perform it. However, it is a vital part of the process, especially if the application has been running and maintained over a long period of time, because it may contain code or fixes that are critical to the application but are not documented anywhere else.

Another key area to examine is how the data in the system is stored (e.g., in flat files, other file types, or tables). What fields are included in those files or tables, and what indexes are in use? A detailed analysis must also be performed of any server processes that may be related to the data (e.g., a nightly process that runs across a file and updates it from another system). Bespoke pre-parsers may be needed here to sanitise the data into a readable format prior to migration. This is something to consider, and Halogence can help with any customised parser work.
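
Where the legacy data sits in a relational database, much of the field and index inventory can be read from the database catalogue. The sketch below assumes, purely for illustration, that the source is a SQLite database; the `cases` table and its index are hypothetical, and other platforms (an AS/400, for instance) would need their own catalogue queries.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [(name, type)], "indexes": [name]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA index_list rows: (seq, name, unique, origin, partial)
        indexes = [
            row[1] for row in conn.execute(f'PRAGMA index_list("{table}")')
        ]
        schema[table] = {"columns": columns, "indexes": indexes}
    return schema


# Hypothetical stand-in for a legacy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, opened TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_cases_status ON cases (status)")

schema = describe_schema(conn)
print(schema)
```

The resulting dictionary can be saved alongside the written analysis as a record of the source structure at the time of the review.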

Now that the source and target structures are defined, the mapping from the legacy to the target should fall into place fairly easily. Mapping should include documentation that specifically identifies which fields in the legacy system map to which fields in the target system, along with any necessary conversion or cleansing. This part of the project can be achieved with the HT fusion client. The user specifies the mapping, validation, and transform rules within a single application. In addition, the HT fusion client can export a report on the governance and mappings set within the client. This allows a data migration project manager to "lock down" the requirements with the end customer or user of the data.
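
To show what a field mapping with conversion rules amounts to, here is a minimal sketch in plain Python. It does not represent the HT fusion client or its rule format; the legacy field names (`CUSTNAME`, `OPENDT`, `STATUS`), the target field names, and the date format are all assumptions chosen for illustration.

```python
from datetime import datetime

# Each entry: target field -> (source field, transform). All names are illustrative.
FIELD_MAP = {
    "customer_name": ("CUSTNAME", lambda v: v.strip().title()),
    "opened_on": (
        "OPENDT",
        lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat(),
    ),
    "is_open": ("STATUS", lambda v: v.strip().upper() == "O"),
}


def map_record(legacy):
    """Build a target record by applying each mapping rule to a legacy row."""
    return {
        target: transform(legacy[source])
        for target, (source, transform) in FIELD_MAP.items()
    }


# Hypothetical legacy row with untidy values.
legacy_row = {"CUSTNAME": "  jane SMITH ", "OPENDT": "03/11/2021", "STATUS": "o"}
mapped = map_record(legacy_row)
print(mapped)
```

Keeping the mapping in one table-like structure means the same definition can drive both the conversion and the documentation of which legacy field feeds which target field.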

Once the analysis and mapping steps are completed, the process of importing the data into the new system must be defined. This process may be a combination of automation and manual processes, it may be completely automated, or it may be completely manual. For example, a process may:

  • Create data extractions from the legacy system
  • Cleanse the data extractions according to mapping guidelines
  • Import the data extractions into the new system using that system's import feature
  • Verify test samples of data within the new system against those within the old system
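
The final verification step above can be sketched as a field-by-field comparison of a random sample of records. This is one possible approach under stated assumptions: records from both systems are available as dictionaries keyed by a shared identifier, the legacy values have already been cleansed into the target format, and the function and field names are hypothetical.

```python
import random


def verify_sample(legacy_by_id, migrated_by_id, fields, sample_size, seed=0):
    """Compare a random sample of records field by field; return mismatches."""
    rng = random.Random(seed)  # fixed seed so a check can be repeated
    ids = sorted(legacy_by_id)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    mismatches = []
    for rec_id in sample:
        old = legacy_by_id[rec_id]
        new = migrated_by_id.get(rec_id)
        if new is None:
            mismatches.append((rec_id, "missing in new system"))
            continue
        for field in fields:
            if old.get(field) != new.get(field):
                mismatches.append((rec_id, field))
    return mismatches


# Hypothetical extracts from the old and new systems.
legacy = {
    1: {"name": "Ann", "status": "open"},
    2: {"name": "Bob", "status": "closed"},
    3: {"name": "Cy", "status": "open"},
}
migrated = {
    1: {"name": "Ann", "status": "open"},
    2: {"name": "Bob", "status": "open"},
}

mismatches = verify_sample(legacy, migrated, ["name", "status"], sample_size=3)
print(sorted(mismatches))
```

An empty result for a reasonably sized sample gives some confidence in the import; any mismatch points to a specific record and field to investigate.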

HT fusion and Halogence consultancy can help simplify data migration processes greatly, which ultimately reduces the time and cost of a data migration project.

Bottom Line. Data migration is a key element to consider when adopting any new or updated system that requires legacy data. Data migration is not a simple task. If it is not given due consideration early in the process of developing or purchasing a new system, it could become a very expensive one.