# Data Analysis and Data Processing

Creating value from data is a challenge. Tackling it requires a variety of skills, as shown in the following figure:

**Computer science**: We need this knowledge for efficient data processing.

**Artificial intelligence and machine learning**: These help us to model the data and learn from it in order to provide smart software solutions.

**Statistics and mathematics**: Advanced statistical techniques and higher mathematics are essential to extract useful information from raw data.

**Domain knowledge**: It is also important to have insight into the specific domain the data come from. Based on the expertise we possess or acquire, we explore and analyse raw data by applying the skills above.

**Data analysis**

**Data requirements gathering**: We must define what kind of data will be collected based on your specific requirements.

**Data collection**: Data may be collected from a variety of sources and obtained in different ways. The challenge is to find and gather the data needed to solve your problem.

**Data processing**: The gathered data must be organized and processed before analysis. The techniques used to create, insert, update, or query data must be carefully designed so that the task can actually be performed, because large amounts of data are not easy to handle.
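As a minimal sketch of the create, insert, update, and query operations mentioned above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration; a real project would design its storage around the actual data volume and access patterns.

```python
import sqlite3

# Hypothetical schema: a single "orders" table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Insert: parameterized statements keep the data separate from the SQL text.
rows = [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 12.5)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Update: correct one record in place.
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (40.0, 2))

# Query: total amount per customer.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()

print(totals)
```

Keeping the aggregation inside the database rather than in application code is one common design choice when the data set is too large to load into memory at once.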

**Data cleaning**: After being processed and organized, the data may still contain duplicates or syntactic or semantic errors. We need a cleaning step in order to increase the quality of the results. This step may require a lot of work.
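The kinds of problems described above can be illustrated with a small sketch. It assumes records arrive as dictionaries with hypothetical `name` and `age` fields; the function name `clean_records` and the plausibility range for ages are assumptions made for this example, not a general rule.

```python
def clean_records(records):
    """Normalize fields, drop rows with invalid ages, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize whitespace and capitalization so duplicates can be matched.
        name = rec["name"].strip().title()
        try:
            age = int(rec["age"])
        except (TypeError, ValueError):
            continue  # syntactic error: the age is not a number
        if not 0 <= age <= 120:
            continue  # semantic error: the age is implausible (assumed range)
        key = (name, age)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": "34"},
    {"name": "Alice", "age": "34"},   # duplicate after normalization
    {"name": "bob", "age": "n/a"},    # syntactic error
    {"name": "carol", "age": "250"},  # semantic error
    {"name": "dave", "age": "41"},
]
cleaned = clean_records(raw)
print(cleaned)
```

Here invalid rows are simply dropped; depending on the problem, it may be preferable to log them or try to repair them instead.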

**Exploratory data analysis**: This is when we start analysing your data. We may detect further problems that send us back to data cleaning, or discover that we need more data. Data visualization techniques, sometimes complex ones, may be used to examine the data, since visualization is often vital to understanding data sets.
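Before any plotting, a first look at the data often starts with a few summary statistics. The sketch below uses Python's built-in `statistics` module on a made-up list of values; the variable names and the numbers are purely illustrative.

```python
import statistics

# Hypothetical measurements; one value is far larger than the rest.
values = [12, 15, 14, 13, 95, 16]

summary = {
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
}
print(summary)

# A mean far above the median suggests a skewed distribution or an outlier,
# which may mean going back to the data cleaning step.
gap = summary["mean"] - summary["median"]
print("mean minus median:", gap)
```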

**Modelling and algorithms**: Mathematical formulas and algorithms may be applied to extract useful knowledge from your raw data or to make predictions. We might use linear regression models or classic time series models, or we may use similarity measures to cluster data in order to find, say, customers who exhibit similar behaviour.
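To make the idea of a linear regression model concrete, here is a minimal sketch of an ordinary least squares fit for a single input variable, written in plain Python. The function name `fit_line` and the sample points are assumptions for this example; in practice a numerical library would normally be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and of squared
    # deviations of x (denominator) give the least squares slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Use the fitted model to predict the value at a new input.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```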

**Data product**: Eventually we provide a software solution that takes input data and generates output data to solve your problem on an as-needed basis. This step needs computer science knowledge to implement the selected algorithms as well as to manage data storage and processing.