This phase involves analysing and profiling the potential source systems.
Data analysis (also called Data Discovery) involves:
- Analysing and documenting the data structures (databases, tables, columns, primary keys, foreign keys). Note that tools exist that can automate this activity.
- Identifying data volumes
- Identifying Data Custodians, Data Stewards and Data Owners
- Identifying data standards and data governance approach.
- Identifying the Infrastructure and technologies which the system uses
- Identifying the software used, such as COTS solutions, Data Management tools and custom software
- Analysing the data being passed to and from interfacing systems
- Identifying the Source of Truth for data elements
- Identifying peak usage times
- Identifying times when backups and interfaces are run (to identify a window to run regular data extracts and potential access issues during trial migrations and at go-live)
- Identifying any data retention requirements
- Analysing the archiving approach and the location of historical data
- Identifying Potential Data Migration Issues, Risks, Constraints and Dependencies.
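As an illustration of how documenting data structures and identifying data volumes can be automated, the following is a minimal sketch that assumes the source system is a SQLite database. The function name `describe_schema` and the sample `customer` and `orders` tables are hypothetical and exist only for this example; other database engines expose the same information through their own system catalogues.

```python
import sqlite3


def describe_schema(conn):
    """Return a catalogue of tables with columns, keys and row counts."""
    catalogue = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        catalogue[table] = {
            "columns": [(c[1], c[2]) for c in cols],
            # pk > 0 marks primary key columns, ordered by position in the key
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
            "row_count": row_count,
        }
    return catalogue


# Hypothetical sample source system, built in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 9.5);
""")

catalogue = describe_schema(conn)
for table, info in catalogue.items():
    print(table, info)
```

The resulting catalogue could be pasted into the Data Analysis Report as a starting point; it does not replace the manual activities in the list above, such as identifying owners, retention requirements or the Source of Truth.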
Data Profiling (also called Data Quality Analysis) involves analysing the quality of the data against the relevant Data Quality Dimensions:
- Correctness
- Validity
- Duplication
- Consistency
- Non-standard values
- Obsolete Data
- Timeliness
- Completeness
- Missing values
- Integrity
- Precision
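Some of these dimensions can be measured with simple checks. The following is a minimal sketch covering three of them (Missing values/Completeness, Duplication and Validity) over records held as Python dictionaries. The function name `profile`, the sample records and the validation rules are all assumptions made for illustration; real rules would come from the data standards identified during Data Analysis.

```python
import re
from collections import Counter


def profile(rows, key, validators):
    """Profile a list of dict records for missing, duplicate and invalid values."""
    total = len(rows)
    columns = sorted({col for row in rows for col in row})

    # Missing values: None or empty string counts as missing.
    missing = {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}
    # Completeness: share of records with a value present, per column.
    completeness = {c: (total - missing[c]) / total if total else 0.0 for c in columns}

    # Duplication: key values that occur more than once.
    key_counts = Counter(r.get(key) for r in rows)
    duplicates = [k for k, n in key_counts.items() if n > 1]

    # Validity: present values that fail the supplied rule for their column.
    invalid = {
        c: sum(1 for r in rows if r.get(c) not in (None, "") and not check(r[c]))
        for c, check in validators.items()
    }
    return {
        "missing": missing,
        "completeness": completeness,
        "duplicates": duplicates,
        "invalid": invalid,
    }


# Hypothetical sample records and rules.
rows = [
    {"id": 1, "email": "ada@example.com", "country": "NZ"},
    {"id": 2, "email": "", "country": "NZ"},
    {"id": 2, "email": "not-an-email", "country": "New Zealand"},
    {"id": 3, "email": "grace@example.com", "country": None},
]
validators = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "country": lambda v: v in {"NZ", "AU"},
}

report = profile(rows, key="id", validators=validators)
print(report)
```

Counts like these give the Data Profiling Report a measurable baseline. Dimensions such as Correctness and Timeliness usually need business input or comparison against a trusted source and are not covered by this sketch.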
Data Analysis and Data Profiling should be reported on for each source system. Generally a single combined report is produced for each source system. Alternatively, a separate Data Analysis Report and Data Profiling Report can be produced for each system. To keep the explanation simple, separate reports are assumed.
Links: