Data cleansing is the process of modifying data so that it is free of irrelevant and incorrect information, guaranteeing, with a certain level of reliability, the accuracy of a large volume of data (a database, data warehouse, dataset, etc.). The term was historically used for the filtering step that underlies data mining: cleansing precedes the actual extraction (mining) of potentially useful, previously unknown information from which knowledge is produced. Applying the cleansing process when data is acquired ensures a higher level of data quality. A data cleansing system must meet the following quality criteria:
The following activities are typical of the data cleansing process: