Big data analytics for 3PL
A big data solution for a third-party logistics company to streamline and enhance data collection, processing, and storage capabilities
Business context
Our client is a US-based third-party logistics (3PL) company with offices in several locations across the US. The company provides its clients with the following services:
Order fulfillment
Inventory management
Freight forwarding
Warehousing and distribution
Our client deals with large volumes of data daily, coming from multiple sources including customer logs, emails, reports from supply chain systems (TMSs, WMSs, ERPs), GPS signals, and IoT devices. Our task was to convert this raw data into information that can support decision-making. In doing so, we faced a number of challenges:
Inefficient data aggregation
There was no properly configured process for consistently aggregating data in real time from multiple sources. A lot of valuable data wasn’t gathered at all or wasn’t gathered fully — for example, some customer profiles didn’t have addresses.
Large percentage of unstructured data
The company collected and stored a lot of data that eventually couldn’t be used due to a lack of data capture, visualization, and analysis tools.
Lack of unified access to data
Data from different systems and company departments was stored in different databases. Logistics operators couldn’t easily find and access necessary reports.
Big data analytics as a solution
To streamline the data aggregation flow, ensure unified access to all gathered data, and properly organize data for further analysis, we implemented a big data analytics solution that consists of four crucial elements:
A data lake to capture data from any sources, store structured and unstructured data at any scale, and ensure sufficient security and privacy
Real-time data extraction for tracking deliveries as well as optimizing time for refueling and vehicle maintenance based on GPS signals and data from IoT devices
Data visualization and analysis for tracking performance, consolidating route planning insights, and keeping track of vital financial indicators
Predictive analytics for predicting seasonal client demand, storage space optimization with forecasts on stock counts, and anticipating possible risks and exceptions in the supply chain to take proactive measures
Software architecture and tech stack
To enable big data analytics, we built a layered, component-oriented architecture that ensures separation of concerns, decoupling of tasks, and flexibility. This architecture handles high-volume data processing for the logistics business by dividing key data handling responsibilities among dedicated layers. And thanks to its decoupled nature, we can quickly connect new data sources and support new supply chain analytics methods in the future.
We split our architecture into six logical layers:
Ingestion layer
The ingestion layer is responsible for extracting data from internal and external sources into our system. This layer consists of a number of AWS services. For instance, AWS Data Exchange helps us bring data received from third-party services (e.g. for checking transportation prices and truck rates) into our system. Amazon Kinesis Data Firehose handles loading data streams from IoT devices and MacroPoint directly into AWS products for processing.
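For illustration, here is a minimal sketch in Python (using boto3) of how a single GPS reading could be pushed into a Firehose delivery stream. The stream name, region, and record schema below are assumptions for the example, not the production configuration:

```python
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

def send_gps_reading(vehicle_id: str, lat: float, lon: float) -> None:
    """Push a single GPS reading into the ingestion stream."""
    record = {"vehicle_id": vehicle_id, "lat": lat, "lon": lon}
    firehose.put_record(
        DeliveryStreamName="iot-gps-ingest",  # hypothetical stream name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )

send_gps_reading("TRUCK-042", 41.8781, -87.6298)
```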
Storage layer
This layer stores structured and unstructured data, and all other layers can easily use data from it. The storage layer consists of three zones:
Raw zone with data just received from the ingestion layer
Cleaned zone with data after basic quality checks
Curated zone with data ready for use in the consumption layer
We built the storage layer on Amazon S3, which provides practically unlimited, low-cost scalability for our serverless data lake.
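As a sketch of how the three zones can be laid out, one common convention is to separate them by key prefixes within a single S3 bucket. The bucket name, prefixes, and helper function below are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "3pl-data-lake"  # placeholder bucket name

# Zone layout as key prefixes (one possible convention):
#   raw/      - data exactly as received from the ingestion layer
#   cleaned/  - data that passed basic quality checks
#   curated/  - data ready for the consumption layer

def promote_to_cleaned(key: str) -> None:
    """Copy an object from the raw zone to the cleaned zone after validation."""
    s3.copy_object(
        Bucket=BUCKET,
        CopySource={"Bucket": BUCKET, "Key": f"raw/{key}"},
        Key=f"cleaned/{key}",
    )
```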
Cataloging and search layer
The cataloging and search layer stores metadata about datasets located in the storage layer. This layer has a central data catalog for managing metadata for all datasets in the data lake. To enable this process, we use AWS Lake Formation. AWS Glue, Amazon EMR, and Amazon Athena natively integrate with Lake Formation and automate the discovery and registration of dataset metadata in the Lake Formation catalog.
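As one sketch of this automated discovery, a Glue crawler pointed at the cleaned zone can infer schemas and register table metadata in the central catalog. The crawler name, database name, IAM role ARN, and S3 path below are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Create a crawler that scans the cleaned zone and registers table
# metadata in the central data catalog (all names are placeholders)
glue.create_crawler(
    Name="cleaned-zone-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="logistics_lake",
    Targets={"S3Targets": [{"Path": "s3://3pl-data-lake/cleaned/"}]},
)
glue.start_crawler(Name="cleaned-zone-crawler")
```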
Processing layer
The processing layer makes data ready for consumption through validation, cleanup, and normalization. We used AWS Glue and AWS Step Functions for building the processing layer. In particular, AWS Glue helped us build and run ETL (extract, transform, load) jobs written in Python.
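A minimal skeleton of such a Glue ETL job in Python (PySpark) might look like the following. The database, table, field, and output path are assumed for illustration:

```python
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw shipment records from the catalog (names are assumptions)
raw = glue_context.create_dynamic_frame.from_catalog(
    database="logistics_lake", table_name="raw_shipments"
)

# Basic cleanup: drop records missing a destination address
cleaned = Filter.apply(frame=raw, f=lambda r: r["destination_address"] is not None)

# Write normalized output to the cleaned zone in Parquet
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://3pl-data-lake/cleaned/shipments/"},
    format="parquet",
)
job.commit()
```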
Consumption layer
This layer is responsible for visualizing data, providing analysis for business intelligence (BI) dashboards, and enabling machine learning. We use Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Amazon QuickSight serves as a scalable, serverless BI service for data visualization, and Amazon SageMaker enables machine learning, which in turn powers predictive analytics on large-scale data.
[Sample QuickSight dashboard: "What revenue growth can we expect by the end of 2021?" (in US dollars). The data shown is fictional.]
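To illustrate the query side of this layer, here is a minimal sketch of running an Athena query from Python with boto3. The database, table, column names, and result output location are assumptions:

```python
import boto3

athena = boto3.client("athena")

# Count orders per service type for 2021 (all names are placeholders)
response = athena.start_query_execution(
    QueryString="""
        SELECT service_type, COUNT(*) AS orders
        FROM shipments
        WHERE order_date >= DATE '2021-01-01'
        GROUP BY service_type
        ORDER BY orders DESC
    """,
    QueryExecutionContext={"Database": "logistics_lake"},
    ResultConfiguration={"OutputLocation": "s3://3pl-data-lake/athena-results/"},
)
print(response["QueryExecutionId"])
```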
Security and governance layer
This layer protects the data in all other layers. We ensure data security and governance with the help of AWS Identity and Access Management (IAM), AWS Key Management Service, Amazon Virtual Private Cloud, and other native AWS services.
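As one small example of such a control, default encryption can be enforced on the data lake bucket with a KMS key. This is only a sketch of a single governance measure, and the bucket name and key alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Enforce KMS-based default encryption on the data lake bucket
# (bucket name and key alias are placeholders)
s3.put_bucket_encryption(
    Bucket="3pl-data-lake",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/data-lake-key",
                }
            }
        ]
    },
)
```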
Results
Optimized route and load planning
Transportation analytics allows the company to map more efficient shipping routes by analyzing information such as weather forecasts, traffic conditions, order frequency, and locations with the most and fewest orders.
Improved decision-making
Our solution has simplified decision-making for company management. Managers can now make informed decisions about which new clients to target and which services to expand or improve.
Increased LTV
Customer managers now receive more valuable insights about their clients, such as preferred delivery time frames, the most frequently ordered services, and other preferences. This data allows managers to better address clients' issues and concerns and consequently increase customer lifetime value (LTV).
Predictive stock planning
Predictive analytics and visualization techniques help increase supply chain efficiency and productivity. Real-time updates on sales figures, stock counts, and order frequency enable logistics managers to accurately plan timely stock replenishment, ultimately helping the company decrease warehouse operating costs.