In the financial services industry, "Big Data" is one of the most frequently used buzzwords, alongside "Cloud", "IoT", "Open Banking", and "Machine Learning", but like the others, a precise definition is not easy to come by.
This is especially true because "Big Data" is often used interchangeably with customer analytics, real-time analytics, and predictive analytics.
The general consensus is that "Big Data" is the collective term for the contemporary methodologies and technologies used to collect, organize, process and analyse large, diverse (structured and unstructured) and complex sets of data, while "customer / real-time / predictive analytics" mainly refers to the specific types of analyses performed on these data sets to find patterns and create business value. However, since the ultimate business goal of Big Data is not the data itself but the business insights derived from it, the analytics part of the chain is the most visible and important for a business user, which explains why the terms are often interchanged.
According to a 2015 IBM study, it is estimated that we create 2.5 quintillion (2.5 × 10^18) bytes of data every day and that 90% of the data in the world today has been created in the last two years.
In recent years, data volumes have grown exponentially and will continue to grow in the coming years, especially due to the adoption of mobile technologies and IoT.
It is hard to overestimate the impact of Big Data on the financial services sector, which is probably the most data-intensive sector in the global economy.
Despite the immense amount of customer data banks hold (e.g. deposits and withdrawals at ATMs, purchases at points of sale, payments made online, customer profiles for KYC, etc.), they are not very good at utilizing these rich data sets.
Financial services firms have invested heavily for more than a decade in data collection and processing technologies (such as data warehouses and Business Intelligence) and have been among the first to adopt Big Data technologies.
Because of the changing expectations of customers and the increased competition from Fintech players, the financial services sector simply cannot ignore the opportunity to make use of those vast amounts of data. To gain a competitive advantage, banks and insurers should leverage existing (and new) data sets to better understand their customers.
Big Data techniques are already being used by several players in the market, but many organizations are still lagging behind.
The recent rise of Big Data has been driven by several factors, which work together to increase the amount of data and the need to manage it:
Digital interactions between banks or insurers and their customers reduce personal contact, but they also allow much more data to be collected about the customer (e.g. browsing history, geolocation data from the mobile phone, the exact timing of the interactions, etc.) than when the customer visits a branch. To compensate for the loss of personal interaction, this data should be used to boost customer engagement.
Customers are increasingly using social media: where the use of these media used to be limited to closed private circles of friends, customers now use them more and more in their everyday lives, including to interact with companies. Consequently, banks and insurers should use these channels more often to offer services to customers and to gather insights about them.
More and more, customers expect a seamless, low-friction, 24/7, customer-centric experience across multiple channels. Providing such a personalised service requires a comprehensive understanding of the customer, which can only be achieved by leveraging all the available customer data with Big Data techniques.
Increasing IoT (Internet of Things) usage will result in continuous streams of customer data, even if the customer does not interact with the bank or insurer.
The amount of data to be processed in near real-time will also increase considerably with the introduction of advanced authentication technologies, such as biometrics and continuous authentication (e.g., mouse movements and keyboard rhythms or accelerometer and gyro sensor readings on mobile phones).
With the advent of Open Architectures (Open APIs), banks and insurers can now collect data about their customers from their competitors.
Competition from Fintech players that already use Big Data techniques for new financial services. For example, Fintechs have already turned Big Data into compelling new customer services through robo-advisors, which offer automated digital investment advice based on customer profiles. Unless banks can quickly deliver similar services, they are likely to lose considerable business to these Fintech companies.
Banks have been forced to disclose a variety of data to regulators and central banks because of new regulations (Basel III, FRTB, MiFID II, AML/KYC, FATCA...). Furthermore, the fines for non-compliance with these regulations keep climbing. As a result, banks are obliged to collect more and more data in a controlled way, so that the necessary regulatory reporting can be generated automatically and regulatory inquiries can be answered on an ad-hoc basis.
With fraud and financial crimes increasing, banks need to protect their most valuable asset, namely the "trust" that customers place in their bank. This increases the pressure to further protect the interaction channels and the customer data through different security techniques. A promising approach is risk-based authentication, in which a fraud-detection engine calculates a risk profile for each channel request and determines the required level of security (authentication). Customer analytics are used in this fraud-detection engine to identify irregularities in user behaviour (a minimal sketch of this idea follows this list).
Due to the increased competition and low interest rates, profit margins in the financial services industry are plummeting. In order to reduce operational costs, banks and insurers must improve business efficiency. The insights gained from Big Data can be used to drive many of these efficiency gains.
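As an illustration of the risk-based authentication described above, the following sketch scores a channel request against a customer's usual behaviour and maps that score to a required authentication level. It is a minimal, hypothetical Python example: the features, thresholds and profile data are assumptions for illustration only, not an actual fraud-detection engine.

```python
# Minimal sketch of risk-based authentication (hypothetical names and thresholds):
# each channel request is scored against the customer's usual behaviour, and the
# score determines the required authentication level.

from dataclasses import dataclass

@dataclass
class ChannelRequest:
    customer_id: str
    country: str       # geolocation of the request
    device_id: str     # device fingerprint
    hour_of_day: int   # local time of the interaction
    amount: float      # transaction amount (0 for non-payment requests)

# Illustrative "usual behaviour" per customer; in practice this profile would be
# derived by customer analytics from historical interaction data.
USUAL_PROFILE = {
    "cust-001": {"countries": {"BE", "NL"}, "devices": {"dev-42"}, "active_hours": range(7, 23)},
}

def risk_score(req: ChannelRequest) -> float:
    """Return a risk score between 0 (normal) and 1 (highly irregular)."""
    profile = USUAL_PROFILE.get(req.customer_id, {})
    score = 0.0
    if req.country not in profile.get("countries", set()):
        score += 0.4   # unusual location
    if req.device_id not in profile.get("devices", set()):
        score += 0.3   # unknown device
    if req.hour_of_day not in profile.get("active_hours", range(24)):
        score += 0.1   # unusual time of day
    if req.amount > 10_000:
        score += 0.2   # unusually large amount
    return min(score, 1.0)

def required_authentication(score: float) -> str:
    """Map the risk score to the required authentication level."""
    if score < 0.3:
        return "none"           # frictionless access
    if score < 0.6:
        return "single-factor"  # e.g. PIN or password
    return "strong"             # e.g. multi-factor or biometric step-up

if __name__ == "__main__":
    req = ChannelRequest("cust-001", country="US", device_id="dev-99",
                         hour_of_day=3, amount=5000.0)
    score = risk_score(req)
    print(f"risk score {score:.2f} -> required authentication: {required_authentication(score)}")
```

In a production setting the hand-written rules above would typically be replaced by a model trained on historical fraud data, but the overall flow (score the request, then step up authentication accordingly) stays the same.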
With data sets growing so large and complex, traditional tools are no longer able to process this data at sufficiently low cost and in reasonable time. Thankfully, new technologies offer a solution to this issue, allowing for the rapid processing of these data sets at a lower cost, i.e.
Event streaming: Streaming of large volumes of events in real-time.
NoSQL databases: Databases which allow data to be stored and retrieved in a much more scalable and flexible way than traditional relational databases (RDBMS).
In-memory data stores: Data structures which reside entirely in RAM and are distributed across multiple servers.
Distributed processing (= distributed computing): Using a network of computers to split up a task into smaller tasks, which are executed in parallel on the different computers (often commodity hardware), after which the results are aggregated (see the sketch after this list).
Machine learning: Giving computers the ability to learn without being explicitly programmed.
Advanced data visualization: Tools that help users visualize large amounts of data and the insights derived from them in a user-friendly way (e.g. through bubble charts, word clouds, geospatial heat maps...).
Cloud solutions: Cheap and flexible (i.e. elastically scalable) infrastructure, as well as higher-level services, to support these Big Data technologies.
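To make the distributed-processing pattern above concrete, the sketch below splits a data set into chunks, processes them in parallel and aggregates the partial results. It is a simplified, single-machine simulation using Python's standard multiprocessing module; in a real Big Data setup the chunks would be spread over a cluster of (commodity) servers, for example by a map-reduce framework.

```python
# Sketch of the split / process-in-parallel / aggregate pattern, simulated
# locally with a process pool. The data set and field names are hypothetical.

from collections import Counter
from multiprocessing import Pool

def count_words(chunk: list[str]) -> Counter:
    """'Map' step: count word occurrences in one chunk of records."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts

def aggregate(partials: list[Counter]) -> Counter:
    """'Reduce' step: merge the partial results into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Hypothetical data set, split into chunks that would normally live on
    # different nodes of a cluster.
    records = ["payment accepted", "payment declined", "login ok", "payment accepted"] * 1000
    chunks = [records[i::4] for i in range(4)]

    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_words, chunks)  # process chunks in parallel

    print(aggregate(partial_counts).most_common(3))
```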
Big Data is characterised by the 3 V's, i.e.
Volume: A vast quantity of data (e.g. terabytes or petabytes) to be handled. The sheer amount of data makes it impossible to process with traditional data processing tools in a reasonable amount of time.
Velocity: Big Data technologies should be able to process both batch and real-time data. In the case of real-time data, quick analysis for generating (near) real-time insights is a necessity.
Variety: Multiple types of data should be supported, from highly structured data to unstructured information such as blogs, tweets, and Facebook status updates.