5G (acronym of 5th Generation) is the term used to describe the next generation of mobile networks beyond LTE mobile networks. 5G network infrastructure comprises macro and small-cell base stations with edge computing capabilities; in a 5G network, network functions that typically run on hardware become virtualized, working as software.
There are two infrastructure options: standalone (SA) and non-standalone (NSA). A non-standalone deployment is partly based on the existing 4G LTE infrastructure and introduces new technologies such as 5G New Radio (NR). According to Release 15 of the 3GPP standards body, the NSA architecture has the 5G RAN and the 5G NR interface working together with the existing LTE infrastructure and core network. Because the core remains LTE, only LTE services are supported, but the network gains the capabilities offered by 5G NR, such as lower latency.
The standalone infrastructure refers to a 5G network that does not rely on an LTE network and has its own cloud-native core network that connects to the NR. According to 3GPP Release 15, the standalone deployment option consists of user equipment, the RAN – which includes the NR – and the 5G core network. The 5G core network is based on a service-based architecture framework with virtualized network functions.
An analytical database is a specialized database management system optimized for business analytics applications and services. It has built-in features to store, manage, and analyze large volumes of data extremely quickly; it provides faster query response times and scales better than standard operational databases.
These features include:
columnar storage, which organizes data in columns to reduce the number of data points to be processed;
data warehouse applications, which bundle database tools in a single platform;
in-memory databases, which hold data in system memory to expedite processing;
massively parallel processing (MPP) databases, which use multiple server clusters operating simultaneously;
online analytical processing (OLAP) databases, which maintain data cubes that can be analyzed along multiple dimensions.
Backtesting is a term used in modeling that refers to testing a predictive model on historical data. It is a particular type of cross-validation applied to a previous timeframe. For example, for a business strategy, investment strategy, or (financial) risk model, backtesting seeks to estimate the strategy's or model's performance over a previous period. This requires simulating past conditions in sufficient detail, which is the first limitation of backtesting: it needs detailed, reliable historical data. Second, the backtested strategy must be assumed not to have influenced historical prices. Finally, backtesting, like other forms of modeling, is limited by possible overfitting. Despite these limitations, backtesting provides information unavailable when models and strategies are tested on synthetic data.
The first step in backtesting is to select threshold values within the period covered by the historical data. Then, for each threshold value, the historical data are truncated at the threshold. Next, the forecast model is trained on the truncated data and used to forecast the period beyond the threshold. The forecasts thus obtained are compared with the complete original data. Finally, an average forecast error is computed across all thresholds. This error can be read as an estimate of the error the model will make when forecasting future data. Choosing the most appropriate threshold values requires some domain knowledge. As a rule, increasing the number of threshold values improves resistance to overfitting. In stock optimization, where there are hundreds of SKUs to analyze, a few threshold values are enough to determine whether one forecast method is better than the others.
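The procedure above can be sketched as a rolling-origin backtest. This is a minimal illustration, assuming a naive moving-average forecaster and made-up sales figures; any real study would substitute its own model and data:

```python
# Rolling-origin backtest: truncate history at several thresholds,
# fit a forecaster on each truncated series, and average the error.
# The moving-average "model" and the sample data are illustrative.

def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` points."""
    return sum(history[-window:]) / window

def backtest(series, thresholds, window=3):
    """Return the mean absolute error over all threshold cut-off points."""
    errors = []
    for t in thresholds:
        train = series[:t]                         # truncate at the threshold
        actual = series[t]                         # value the model did not see
        forecast = moving_average_forecast(train, window)
        errors.append(abs(forecast - actual))
    return sum(errors) / len(errors)

sales = [100, 102, 101, 105, 107, 110, 108, 112, 115, 118]
mae = backtest(sales, thresholds=[5, 6, 7, 8, 9])
print(f"average forecast error: {mae:.2f}")  # prints roughly 4.27
```

More thresholds give a more robust error estimate, at the cost of re-fitting the model once per threshold.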
A Big Data architecture is the basis for Big Data analysis; it is designed to manage the ingestion, processing, and analysis of data that are too big or complex to be handled by traditional database systems. Typically, Big Data solutions encompass one or more of the following workload types: batch processing of Big Data sources at rest; real-time processing of Big Data in motion; interactive exploration of Big Data; and predictive analytics and machine learning.
Most architectures for Big Data include some or all of the following components: data sources, data storage, batch processing, real-time message ingestion, stream processing, an analytical data store, analysis and reporting tools, and orchestration.
Business analytics is a field that drives practical, data-driven change in a business by using processes and methodologies such as data mining, predictive analysis, and statistical analysis to transform raw data into useful insights, identify and anticipate trends and outcomes, and measure past performance to guide an organization’s business strategy. Business analytics is commonly broken down into descriptive, diagnostic, predictive, and prescriptive components.
Business forecasting is the method employed by companies to make predictions or projections about their future economic conditions, such as sales, potential revenues, and spending, by using analytics, data, insights, and experience. Business forecasting helps to automate and optimize business processes and develop better business strategies. Two approaches can be used to identify patterns and make accurate predictions that drive better decision-making: a qualitative approach and a quantitative approach.
Business Intelligence describes a set of processes, technologies, and practices for gathering, storing, analyzing, and interpreting raw data from internal and external sources to convert it into meaningful information that organizations can use to make more tactical and strategic decisions. Business intelligence tools are a suite of software and services used to access and analyze data sets and present the resulting analytical findings. These typically take the form of reports, comprehensive summaries, and visuals (dashboards, graphs, charts, and maps). The objective is to provide users with detailed, self-explanatory intelligence about the state of the business, like a cockpit in a plane.
Capacity Management is the process of monitoring, administering, and planning IT resources to ensure that IT capacity can handle the data processing requirements and continuously provide a consistent, acceptable service level at a known and controlled cost. The capacity management process covers the operating and development environments, including hardware, network equipment, peripherals, software, and human resources. Capacity management ensures that IT resources are planned and scheduled to provide a consistent service level appropriate to the company’s current and future needs.
The objectives of capacity management are to ensure that IT capacity meets current and future business needs cost-effectively, to balance supply against demand so that agreed service levels are met, and to avoid both over-provisioning and under-provisioning of resources.
A Columnar Database is a database management system that stores table data as sections of data columns, compared to most relational databases that store data in rows. This has advantages in data warehouses, customer relationship management (CRM) systems, bibliographic card catalogs, and other ad hoc systems, where aggregates are calculated on a large volume of similar data. A columnar database refers to both a column-oriented structure and a focus on optimization for column-oriented workloads.
This approach contrasts with row-oriented databases, which store all values of a record together. In a columnar database, all values of a column are stored together, so the system can efficiently return data for a limited number of columns and excels at read operations that touch only a few columns. Columnar database systems optimize the performance of analytical queries by drastically reducing the overall disk I/O requirements and the amount of data to be loaded from disk.
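The difference between the two layouts can be sketched in a few lines. This is an illustrative toy model, not a real storage engine; the table and field names are made up:

```python
# Row-oriented vs column-oriented storage of the same small table.
# A column store reads only the columns a query touches, which is why
# aggregates over one column scan far less data than a row scan.

rows = [  # row store: one complete record per entry
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 215.0},
]

columns = {  # column store: one contiguous array per attribute
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 340.0, 215.0],
}

# SUM(revenue): the row store must touch every field of every record;
# the column store scans a single array.
total_row_store = sum(r["revenue"] for r in rows)
total_col_store = sum(columns["revenue"])
assert total_row_store == total_col_store == 675.0
print(total_col_store)  # 675.0
```

The same idea scales up: real column stores add compression per column, which the uniform per-column data makes especially effective.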
CPU and GPU are processor units. A CPU (Central Processing Unit) is considered the brain of the computer; it consists of the ALU (Arithmetic Logic Unit), which performs calculations, registers, which store intermediate information, and the CU (Control Unit), which is in charge of instruction sequencing and branching. A GPU (Graphics Processing Unit) is typically used to enhance images and process video on computers thanks to its high performance on specific tasks; it mainly consists of ALUs. The main difference lies in the architecture and its purpose. A CPU is designed for a wide variety of workloads; it focuses a smaller number of cores on individual tasks and on completing them quickly. A GPU is the powerful sibling created for jobs that require high throughput: initially developed to accelerate specific 3D rendering tasks, it has evolved into a more general-purpose parallel processor handling a growing range of applications.
Data Anonymization refers to the process used to protect private or sensitive information by erasing or encrypting identifiers that connect individuals to stored data. Data anonymization aims to protect the confidential activities of an individual or company while maintaining the integrity of the data collected and shared. Data anonymization is carried out by most sectors that deal with sensitive information such as healthcare, finance, and digital media while promoting data sharing integrity. Data anonymization minimizes the risk of unintentional disclosure when data is shared between countries, industries, and even departments in the same company. It also reduces the chances of identity fraud.
DBMS (database management system) refers to a software application used to create, access, and manage databases. Organizations handle large amounts of data; a DBMS makes it possible to organize data in a database, store it, transform it into valuable information, and support strategic decision-making. The main functions provided by a DBMS include data definition, data storage and retrieval, query processing, concurrency control, security and access control, and backup and recovery.
Data Cleansing refers to the process of modifying data to ensure that it is free of irrelevant and incorrect information and to guarantee, with a certain level of reliability, the accuracy of a large volume of data (a database, data warehouse, dataset, etc.). The term has been used in the past to describe filtering performed before data mining: the cleansing process precedes the actual extraction (mining) of potentially useful and previously unknown information to produce knowledge. When acquiring data, applying a cleansing process guarantees a higher level of data quality. A data cleansing system must meet qualitative criteria such as completeness, accuracy, consistency, and uniformity.
Typical activities in the data cleansing process include parsing and standardizing formats, correcting or removing inaccurate records, removing duplicates, handling missing values, and validating the result against defined rules.
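As a minimal illustration of such activities, the sketch below standardizes, validates, and deduplicates a handful of made-up customer records; the field names and rules are assumptions, not a real cleansing toolkit:

```python
# A minimal data cleansing pass: standardize, validate, deduplicate.
# Field names and rules are illustrative assumptions.

raw = [
    {"name": "  Alice Smith ", "email": "ALICE@EXAMPLE.COM"},
    {"name": "Bob Jones",      "email": "bob@example.com"},
    {"name": "alice smith",    "email": "alice@example.com"},  # duplicate
    {"name": "Carol",          "email": "not-an-email"},       # invalid
]

def cleanse(records):
    seen, clean = set(), []
    for r in records:
        name = " ".join(r["name"].split()).title()   # standardize spacing/case
        email = r["email"].strip().lower()
        if "@" not in email:                         # drop invalid records
            continue
        if email in seen:                            # deduplicate on a key
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean

print(cleanse(raw))  # two records survive: Alice Smith and Bob Jones
```

Real cleansing pipelines apply the same pattern with far richer rules (address parsing, fuzzy matching, reference-data lookups), but the shape stays the same.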
Data Integration is a process that uses both technical and business procedures to combine data from different sources into a single, unified location, e.g., a data warehouse. Data integration includes some common elements, such as a network of data sources, a master server, and clients accessing data from the master server. This process is often a prerequisite for other processes such as analysis, reporting, and forecasting. Data integration allows data to be managed more efficiently and, by centralizing all data, provides easier access for those who need it. Automated updates enable reports to synchronize and run efficiently in real time whenever needed, reducing errors and rework. In an organized, centralized system, issues are identified automatically and improvements applied, increasing the quality of business data and providing more accurate data and analysis results.
Data Intelligence is the practice of employing artificial intelligence and machine learning tools to analyze and convert massive datasets into valuable insights that allow businesses to make better strategic decisions about future developments. Data intelligence techniques include data mining, machine learning, and natural language processing, and the field is commonly broken down into five main components: descriptive, diagnostic, predictive, prescriptive, and decisive data.
Organizations can leverage data intelligence to adapt more rapidly to industry trends. By monitoring the analytics that data intelligence provides, they gain insights about patterns, changes, and trends that allow them to develop ideas and directions based on that valuable information. Using big data and AI, data intelligence gives structure to the management and allocation of data. It also plays a leading role in data transformation, turning massive amounts of data into experience-based, constantly growing information.
Data Management describes collecting, keeping, and using data securely, efficiently, and cost-effectively. It covers developing, executing, and supervising the projects, policies, programs, and practices that control, protect, transport, and increase the value of data and information resources. Businesses must handle large amounts of data from heterogeneous databases and sources; data management provides access to this heterogeneous information from a central authority, supporting effective business strategies based on real insight. Data management work has a broad scope with several main activities, such as:
Creating, accessing, and updating data across different data tiers;
Storing data at several levels;
Providing high availability to the business;
Using data in a growing variety of applications, analyses, and algorithms;
Ensuring confidentiality and data security;
Storing and destroying data according to retention schedules and compliance requirements.
Data Validation ensures that data has been cleansed and its quality guaranteed. It employs routines, often called “validation rules,” “validation constraints,” or “control routines,” which check the accuracy, meaningfulness, and reliability of the data being entered into the system. The rules can be implemented through the automated facilities of a data dictionary or through explicit validation logic in the application program.
Data validation is recognized as an essential part of any data management operation. Data must be verified and validated before use to avoid inaccurate results. It is a vital part of the workflow because it enables the creation of optimal results.
Validating the data's accuracy, transparency, and detail is essential to minimize project defects. If data validation is not performed, decisions based on the data can be flawed and inaccurate, failing to represent the current situation. In addition to verifying data inputs and values, it is necessary to validate the data model itself: an unstructured or incorrectly constructed data model causes problems when various applications and software use the data files. Applying validation rules to clean data before use helps mitigate “garbage in, garbage out” scenarios, and ensuring data integrity guarantees the legitimacy of the conclusions drawn from it.
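Validation rules of this kind can be sketched as small predicate functions applied before data enters the system. The field names and thresholds below are illustrative assumptions:

```python
# Validation rules as small checks run before a record is accepted.
# Field names, value ranges, and the country list are made up.

def validate(record, rules):
    """Return the names of the rules that the record violates."""
    return [name for name, check in rules.items() if not check(record)]

rules = {
    "age_in_range":  lambda r: 0 <= r.get("age", -1) <= 120,
    "country_known": lambda r: r.get("country") in {"DE", "FR", "IT", "US"},
    "email_has_at":  lambda r: "@" in r.get("email", ""),
}

good = {"age": 34, "country": "DE", "email": "a@b.example"}
bad  = {"age": 180, "country": "XX", "email": "nope"}

print(validate(good, rules))  # []
print(validate(bad, rules))   # all three rule names are reported
```

Keeping each rule as an independent, named check makes violations reportable per record, which is exactly what a data dictionary or constraint system does at larger scale.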
Extrapolation is a methodology for making statistical forecasts by projecting historical trends into the future for a specified period of time. It is a type of estimation of the value of a variable based on its relationship with another variable. In that sense, it resembles interpolation, which produces estimates between known observations; extrapolation, however, is subject to greater uncertainty and a higher risk of producing meaningless results.
Extrapolation can also mean the extension of a method, assuming that similar processes apply. Applied to human experience, it projects, extends, or expands known experience into an area that is unknown or not previously experienced, so as to arrive at (usually conjectural) knowledge of the unknown. The extrapolation method can also be applied to the problem of interior reconstruction.
Geocoding refers to transforming geographical-administrative location data into geographical coordinates and points. Geocoding is based on recognizing an address in a specific database and allows an initial identification of the asset on the territory. Geocoding has two main components: the reference dataset (the underlying geographic database containing the geographic features a geocoder uses to generate its output) and the geocoding algorithm. The process generally begins with the input data stored in the database. Those data are then classified as relative or absolute input data; however, only absolute input data can be geocoded and transformed into a list of coordinates.
These coordinates are compelling business information and can be helpful in several fields. For example, this information allows businesses to recognize geographical patterns to develop targeted marketing strategies for specific customers using data management on their geographical location. It is also helpful in analyzing address data, monitoring the population growth in a particular area, and better planning events and future projects.
Geodata is location information stored in a format that can be used with a geographic information system (GIS). There are different geodata types:
vector data, consisting of vertices and paths (three basic types: points, lines, and polygons);
raster data, consisting of pixels or grid cells, commonly square and regularly spaced but sometimes rectangular;
geographic databases, whose purpose is to host vector and raster data;
web files (internet-based storage of and access to geodata);
multitemporal data, which links a temporal component to the information as well as a geographical feature.
Technologies that can be used to gather geographical data are Global Positioning System (GPS) data, telematics devices, geospatial satellite images, Internet of Things (IoT), and geotagging.
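The three basic vector types listed above can be represented minimally with plain coordinate pairs. This is an illustrative sketch, not any particular GIS format:

```python
# Minimal representations of the three basic vector types:
# a point is one coordinate pair, a line is an ordered list of points,
# and a polygon is a closed ring (first vertex == last vertex).
# Coordinates are illustrative (lon, lat).

point = (13.4050, 52.5200)                                   # Berlin
line = [(13.40, 52.52), (11.58, 48.14)]                      # Berlin -> Munich
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]   # closed ring

def is_closed(ring):
    """A polygon ring must end where it starts."""
    return ring[0] == ring[-1]

assert is_closed(polygon)
print(len(line), "vertices in the line; polygon closed:", is_closed(polygon))
```

Real vector formats (Shapefile, GeoJSON, geodatabase feature classes) add attributes and coordinate reference systems on top of exactly this geometric core.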
Geospatial Analysis is the process of using time and position information within traditional data analysis processes. It includes collecting, displaying, and manipulating Geographic Information System (GIS) data such as images, satellite photographs, historical information, etc.
Geospatial analytics uses geographical identifiers, i.e., longitude and latitude, postal codes, and street addresses, to create geographical models. These models include graphs, statistics, maps, charts, and data views that make complex relationships easier to understand. Geospatial analysis enables businesses to analyze large amounts of data and see what is happening at different places and times, supporting more effective decisions and more accurate results. Maps reveal patterns that are hard to spot in spreadsheets, such as contiguity, proximity, affiliation, and distance. Businesses can gather information from different locations in real time, using tools like the Internet of Things (IoT), mobile devices, social media, and position sensors. By including time and location in the analysis, trends can be understood in a geographical or temporal context, meaning that forecasts can be made for a given site at a given time in the future.
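A basic building block of such proximity analysis is the great-circle distance between two coordinate pairs, computed here with the haversine formula; the city coordinates are approximate and serve only as an example:

```python
# Haversine great-circle distance between two (lat, lon) points,
# a standard building block of proximity analysis.
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Berlin (52.52 N, 13.405 E) to Paris (48.8566 N, 2.3522 E)
print(f"{haversine_km(52.52, 13.405, 48.8566, 2.3522):.0f} km")  # roughly 880 km
```

Distances like this feed directly into the proximity and contiguity patterns mentioned above, e.g., "customers within 50 km of a store."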
Geospatial Intelligence has been defined in U.S. Code Title 10, §467: “The term geospatial intelligence means the exploitation and analysis of imagery and geospatial information to describe, assess, and visually depict physical features and geographically referenced activities on the earth. Therefore, geospatial intelligence consists of imagery, imagery intelligence, and geospatial information”.
In practice, geospatial intelligence refers to a discipline that includes extracting and analyzing geospatial images and information to describe, evaluate, and visually represent physical characteristics and geographically related activities on earth.
Geospatial Intelligence combines different fields, such as mapping, cartography, imagery analysis, and imagery intelligence. In addition to its use in a military context, many organizations in sectors such as telecommunication, smart cities, retail, municipalities, transportation, public health and safety and real estate use geospatial intelligence to improve or optimize everyday life quality.
The main principle of geospatial intelligence is to organize and combine all available data around its geographical location on earth and then leverage it to develop products that planners and decision-makers can use.
GIS stands for “Geographical Information System.” It refers to a software system that allows users to acquire, analyze, visualize, and share information derived from geographic data and to represent what occurs on the territory it describes. The technology behind GIS combines the features of a database – which supports searches, data storage, and charting – with a map that provides spatial data and geographical representations. Thus, GIS software can handle large amounts of geo-referenced information. These data can be expressed through maps or tables and can refer to portions of territory as extensive as needed. GIS differs from other IT systems in that it offers countless possibilities of use for all needs related to geographical components: from geo-locating objects to studying landscape evolution, GIS allows detailed and complex planning of the territory and of the actions to be performed on it.
GPGPU is an acronym for General-Purpose computing on Graphics Processing Units. In IT, it refers to the use of a graphics processing unit (GPU) for purposes beyond its traditional use in computer graphics. GPGPU is used for computations that are extremely demanding in terms of processing power and for which conventional CPU architectures cannot provide sufficient capacity. By nature, these workloads are highly parallel and benefit widely from the typical GPU architecture. In addition, this architecture has evolved to offer greater programmability, increased processing power, and versatility.
A GPU database is a relational or non-relational database that uses a GPU (graphics processing unit) to execute specific database operations. Because GPUs provide massive parallelism, GPU databases are usually fast and flexible in processing many different data types and massive amounts of data, applying the GPU’s processing power to analyze large volumes of information and return results quickly.
GPU Rendering allows using the graphics card for rendering instead of the CPU. This accelerates the rendering process as modern GPUs leverage higher processing power.
GPU and CPU process data similarly, but a GPU focuses on parallel processing. In contrast with CPU designs, GPUs are built to process instructions simultaneously on many cores. For example, GPU rendering takes a single set of instructions and runs them on numerous cores (from 32 to hundreds) over different data. Whereas a CPU can work on roughly 24 blocks of data simultaneously, a GPU can handle about 3,000 blocks.
GPU-accelerated Analytics involves a set of applications that exploit the massive parallelism of a graphics processing unit (GPU) to accelerate compute-intensive operations for data science, deep learning, machine learning, and other large-scale applications.
Information Visualization studies visual (interactive) representations of abstract data. Abstract data includes numerical and non-numerical data, such as text and geographic information. It is critical in scientific research, digital libraries, data mining, financial data analysis, market research, etc.
Information visualization assumes that visual representation and interaction techniques take advantage of the wide bandwidth of the pathway from the human eye to the mind, allowing users to see, explore, and understand large amounts of information simultaneously. Information visualization focuses on the study of approaches for communicating abstract information in intuitive ways.
Dashboards and scatter diagrams are common examples of information visualization. By representing an overview and the visualization of relevant connections, the visualization of information allows users to extract insights from abstract data efficiently and effectively.
Information visualization is essential in making data accessible and transforming raw data into usable information. It draws on the fields of human-machine interaction, visual design, computer science, and cognitive science. Examples include world map-style representations, line graphs, and 3D representations of virtual buildings or urban plans.
Interpolation is a statistical method used to estimate the values of an unknown function f(x) for specific points x in a range [a, b] when a number of observed values of f(x) are available within that range. It is a method of constructing new data points within the range of a discrete set of known data points, i.e., of determining unknown values between known ones, such as the value of a function at an intermediate value of its independent variable. It is mainly used to predict unknown values for geographically related data points, such as noise level, precipitation, altitude, etc.
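The simplest instance is linear interpolation between two known points; evaluating the same formula outside the known range performs the extrapolation described earlier, with correspondingly higher uncertainty. The sample altitude readings are made up:

```python
# Linear interpolation: the value at x of the straight line through
# two known data points (x0, y0) and (x1, y1). Inside [x0, x1] this
# interpolates; outside, it extrapolates.

def lerp(x, x0, y0, x1, y1):
    """Estimate f(x) on the line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Altitude readings at kilometre marks 2 and 4; estimate at km 3 and km 6.
print(lerp(3, 2, 120.0, 4, 160.0))  # 140.0 (interpolation)
print(lerp(6, 2, 120.0, 4, 160.0))  # 200.0 (extrapolation beyond km 4)
```

Geographic interpolation methods (inverse distance weighting, kriging) generalize this one-dimensional idea to surfaces over two spatial coordinates.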
LAN stands for Local Area Network. It refers to networks with limited spatial extent. LANs are usually deployed in private or business premises to configure home or office networks. A LAN supports communication between different devices and the exchange of data.
A LAN consists of at least two terminals but can also connect several thousand devices. It can connect computers, smartphones, printers, scanners, storage devices, servers, and other network devices to each other and to the Internet. However, if wider spatial distances must be covered, MAN and WAN networks are more suitable.
By now, most LANs are built with Ethernet cables. An Ethernet LAN can be divided into several virtual LANs (VLANs) or physical LANs. Switches and routers are used to structure Local Area Networks; acting as interfaces, this hardware controls the connections between individual network nodes and ensures that data packets reach their destination.
Location Intelligence allows the collection, analysis, and organizing of spatial data using various Geographical Information Systems (GIS) tools. This process transforms large amounts of data into color-coded visual representations that enable an easier understanding of trends and generate meaningful insights.
Location or spatial intelligence can also be defined as an extension of traditional Business Intelligence (BI). It refers to the process of deriving meaningful insights from geospatial data and to organizing and understanding the technologies, applications, and practices that relate GIS spatial data to the business data processed by BI applications. The insights acquired can be harnessed by organizations to better understand spatial patterns and consumers’ behaviors, interests, and preferences, and to make more effective decisions.
Real-time Analytics refers to the analysis of big data, including technologies and processes used to measure, manage and analyze data in real-time as soon as it enters the system, thus allowing organizations to visualize and understand the data immediately.
Real-time analytics in business contexts allows organizations to obtain insights immediately, operate on them, understand customer needs, and prevent potential issues. As a result, businesses can leverage the power of real-time analysis and big data to optimize internal operations, improve workflows, support sales, and apply more effective marketing strategies. These tools also provide real-time insight into customer behavior and market trends, enabling immediate response and staying ahead of the competition.
SQL, the acronym for Structured Query Language, is a language for managing databases based on the relational model, in which specific information is organized in tables. A SQL engine is a program that interprets SQL commands, accesses a relational database, and processes the data.
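As a small hands-on illustration, Python's built-in sqlite3 module embeds a relational engine, so a declarative SQL query can run without a separate server. The table and values below are made up:

```python
# A SQL query executed by an embedded relational engine (SQLite).
# The engine decides *how* to compute the result; the query only
# declares *what* is wanted. Table and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 340.0), (3, "EU", 215.0)],
)

rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 335.0), ('US', 340.0)]
conn.close()
```

The same SELECT would run unchanged against a client-server engine such as PostgreSQL; only the connection code differs.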
Spatiotemporal Data Analysis is an emerging research area resulting from the development and application of new computational techniques that allow the analysis of massive space-time databases. Data that include at least one spatial and one temporal property lead to spatiotemporal models. An event in a space-time dataset represents a spatial and temporal circumstance at a specific time (T) and a specific position (X).
Visual Analytics describes the science of analytical reasoning supported by interactive graphical interfaces. Compared with purely automated data analysis methods, visual analytics can respond more effectively in a context where data is produced at an increasing rate and the ability to collect and store it outpaces the ability to analyze it. Involving the human analyst early in the data analysis process is also essential for approaching the complex nature of many problems.
Visual analysis methods allow decision-makers to combine their human flexibility, creativity, and background knowledge with the massive storage and processing capabilities of today's computers to gain an overview of complex problems. Using advanced visual interfaces, users can interact directly with the data analysis capabilities and make well-informed decisions in complex situations. Visual analytics combines business intelligence and analysis tools in a single system that represents reality in a data-driven format: millions of data points are analyzed in a few seconds and displayed in a graphical interface, organized logically rather than in a pre-constituted format.