
Big data analytics for 3PL

A big data solution for a third-party logistics company to streamline and enhance data collection, processing, and storage capabilities

Business context

Our client is a US-based third-party logistics (3PL) company with offices across the US. The company provides its clients with the following services:

  • Order fulfillment

  • Inventory management

  • Freight forwarding

  • Warehousing and distribution

Our client deals with large volumes of data arriving daily from multiple sources, including customer logs, emails, reports from supply chain systems (TMSs, WMSs, ERPs), GPS signals, and IoT devices. Our task was to convert raw data into information that can support decision-making. In doing so, we faced a number of challenges:

Inefficient data aggregation

There was no properly configured process for consistently aggregating data in real time from multiple sources. A lot of valuable data wasn’t gathered at all or wasn’t gathered fully — for example, some customer profiles didn’t have addresses.

Large percentage of unstructured data

The company collected and stored a lot of data that eventually couldn’t be used due to a lack of data capture, visualization, and analysis tools.

Lack of unified access to data

Data from different systems and company departments was stored in different databases. Logistics operators couldn’t easily find and access necessary reports.

Big data analytics as a solution

To streamline the data aggregation flow, ensure unified access to all gathered data, and properly organize data for further analysis, we implemented a big data analytics solution that consists of four crucial elements:

A data lake to capture data from any sources, store structured and unstructured data at any scale, and ensure sufficient security and privacy

Real-time data extraction for tracking deliveries as well as optimizing time for refueling and vehicle maintenance based on GPS signals and data from IoT devices

Data visualization and analysis for tracking performance, consolidating route planning insights, and keeping track of vital financial indicators

Predictive analytics for predicting seasonal client demand, storage space optimization with forecasts on stock counts, and anticipating possible risks and exceptions in the supply chain to take proactive measures
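To make the demand-forecasting idea concrete, here is a deliberately simplified sketch of a seasonal forecast. The production solution relies on Amazon SageMaker models; the naive year-over-year approach, function name, and order counts below are all hypothetical illustrations.

```python
# Illustrative sketch only: forecast next season's monthly demand by
# repeating last season's pattern, scaled by year-over-year growth.
# (Production forecasting runs on Amazon SageMaker; this is not it.)

def forecast_demand(monthly_orders, season_length=12):
    """Seasonal naive forecast with a trend adjustment."""
    if len(monthly_orders) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last_season = monthly_orders[-season_length:]
    prev_season = monthly_orders[-2 * season_length:-season_length]
    trend = sum(last_season) / sum(prev_season)  # year-over-year growth
    return [round(v * trend) for v in last_season]

# Two years of fictional monthly order counts with a December peak
history = [100, 95, 110, 120, 130, 125, 140, 150, 145, 160, 180, 220,
           110, 105, 120, 132, 143, 138, 154, 165, 160, 176, 198, 242]
print(forecast_demand(history)[:3])
```

Even this toy model captures the core mechanic: seasonal shape plus trend, which is what lets managers anticipate, say, a December surge in orders.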

Software architecture and tech stack

To enable big data analytics, we built a layered, component-oriented architecture that ensures separation of concerns, decoupling of tasks, and flexibility. This architecture divides key data-handling responsibilities among distinct layers, which keeps the high-volume data flow manageable. And thanks to its decoupled nature, we can quickly connect new data sources and support new supply chain data analytics methods in the future.

We split our architecture into six logical layers:

Ingestion layer

The ingestion layer is responsible for extracting data from internal and external sources to our system. This layer consists of a number of AWS services. For instance, AWS Data Exchange helps us put data received from third-party services (e.g. for checking transportation prices and truck rates) into our system. Kinesis Data Firehose handles loading data streams from IoT devices and MacroPoint directly into AWS products for processing.
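As an illustration of the ingestion step, the sketch below shapes GPS/IoT readings into the newline-delimited JSON records that Kinesis Data Firehose accepts. In a real pipeline the batches would be sent with boto3's `firehose.put_record_batch(DeliveryStreamName=..., Records=batch)`; the record fields here are hypothetical.

```python
import json

# Sketch (not the production code): serialize device readings to
# NDJSON records and split them into Firehose-sized batches.
# PutRecordBatch accepts at most 500 records per call.

def to_firehose_records(readings, batch_size=500):
    records = [{"Data": (json.dumps(r) + "\n").encode("utf-8")}
               for r in readings]
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

readings = [{"truck_id": "T-042", "lat": 41.88, "lon": -87.63, "fuel_pct": 63}]
batches = to_firehose_records(readings)
print(len(batches), len(batches[0]))
```

The trailing newline on each record matters: it keeps the objects Firehose lands in S3 readable as one-JSON-document-per-line, which downstream tools like Athena expect.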

Storage layer

This layer stores structured and unstructured data, and all other layers can easily use data from it. The storage layer consists of three zones:

  • Raw zone with data just received from the ingestion layer

  • Cleaned zone with data after basic quality checks

  • Curated zone with data ready for use in the consumption layer

We built the storage layer on Amazon S3, which provides practically unlimited, low-cost scalability for our serverless data lake.
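To show how the zones coexist in one bucket, here is a minimal sketch of a date-partitioned S3 key layout. The zone, bucket, and dataset names are illustrative assumptions, not the client's actual naming scheme.

```python
from datetime import date

# Sketch of a partitioned object-key convention for the data lake
# zones; in production, objects are written by the ingestion and
# processing layers (Kinesis Data Firehose, AWS Glue).

ZONES = ("raw", "cleaned", "curated")

def s3_key(zone, dataset, day, filename):
    """Build a date-partitioned key, e.g.
    raw/shipments/year=2021/month=06/day=15/events.json"""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"{zone}/{dataset}/year={day.year}/"
            f"month={day.month:02d}/day={day.day:02d}/{filename}")

print(s3_key("raw", "shipments", date(2021, 6, 15), "events.json"))
```

Hive-style `year=/month=/day=` partitions let query engines such as Athena prune whole date ranges instead of scanning the entire dataset.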

Cataloging and search layer

The cataloging and search layer stores metadata about datasets located in the storage layer. This layer has a central data catalog for managing metadata for all datasets in the data lake. To enable this process, we use the Lake Formation tool. AWS Glue, Amazon EMR, and Amazon Athena can natively integrate with Lake Formation and automate the process of discovering and registering dataset metadata in the Lake Formation catalog.
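The toy model below illustrates what the central catalog holds: per-dataset metadata that other layers query to discover where data lives and how it is shaped. This is an assumption-laden stand-in for Lake Formation, not its API; names and locations are fictional.

```python
# Toy illustration (not Lake Formation itself) of a central metadata
# catalog: register dataset metadata, then search it by column.

catalog = {}

def register(name, zone, location, columns):
    catalog[name] = {"zone": zone, "location": location, "columns": columns}

def search(column):
    """Find every registered dataset that exposes a given column."""
    return sorted(n for n, m in catalog.items() if column in m["columns"])

register("shipments_raw", "raw", "s3://lake/raw/shipments/",
         ["order_id", "truck_id", "ts"])
register("shipments_clean", "cleaned", "s3://lake/cleaned/shipments/",
         ["order_id", "truck_id", "ts", "eta"])
print(search("eta"))
```

In the real architecture, AWS Glue crawlers do the `register` step automatically, which is what keeps the catalog in sync as new datasets land in S3.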

Processing layer

The processing layer makes data ready for consumption through validation, cleanup, and normalization. We used AWS Glue and AWS Step Functions to build the processing layer. In particular, AWS Glue helped us build and run ETL (extract, transform, load) jobs written in Python.
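The pure-Python sketch below shows the kind of validation, cleanup, and normalization logic such a job performs. The actual jobs are AWS Glue ETL scripts operating on distributed data; the field names and unit conversion here are hypothetical.

```python
# Simplified sketch of an ETL transform: validate required fields,
# clean up strings, and normalize units. (The production version is
# an AWS Glue job; field names here are hypothetical.)

REQUIRED = ("order_id", "customer", "weight_kg")

def transform(raw_rows):
    clean = []
    for row in raw_rows:
        if any(row.get(f) in (None, "") for f in REQUIRED):
            continue  # validation: skip incomplete records
        row = {k: v.strip() if isinstance(v, str) else v
               for k, v in row.items()}  # cleanup: trim strings
        if row.pop("weight_unit", "kg") == "lb":
            row["weight_kg"] = round(row["weight_kg"] * 0.453592, 2)
        clean.append(row)
    return clean

rows = [
    {"order_id": "A1", "customer": " Acme ", "weight_kg": 100, "weight_unit": "lb"},
    {"order_id": "", "customer": "Beta", "weight_kg": 5},
]
print(transform(rows))
```

The second record is dropped by validation (empty `order_id`), which mirrors the data-quality gate between the raw and cleaned zones of the storage layer.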

Consumption layer

This layer is responsible for visualizing data, powering business intelligence (BI) dashboards, and enabling machine learning. We use Amazon Athena, an interactive query service, to analyze data in Amazon S3 with standard SQL; Amazon QuickSight as a scalable, serverless BI service for data visualization; and Amazon SageMaker for machine learning, which in turn enables predictive analytics on large-scale data.
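As a concrete example of the kind of KPI the dashboards surface, the sketch below computes an on-time delivery rate per route. In production this aggregation would be an Athena SQL query over S3; the records and route IDs here are fictional.

```python
from collections import defaultdict

# Sketch of a dashboard KPI: share of on-time deliveries per route.
# (Fictional data; the real aggregation runs in Amazon Athena.)

def on_time_rate(deliveries):
    totals, on_time = defaultdict(int), defaultdict(int)
    for d in deliveries:
        totals[d["route"]] += 1
        if d["delivered_at"] <= d["promised_at"]:
            on_time[d["route"]] += 1
    return {r: round(on_time[r] / totals[r], 2) for r in totals}

deliveries = [
    {"route": "CHI-DAL", "promised_at": "2021-06-01", "delivered_at": "2021-06-01"},
    {"route": "CHI-DAL", "promised_at": "2021-06-02", "delivered_at": "2021-06-03"},
    {"route": "NYC-BOS", "promised_at": "2021-06-01", "delivered_at": "2021-06-01"},
]
print(on_time_rate(deliveries))
```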

[Dashboard example: forecast of revenue growth through the end of 2021, in US dollars. The displayed data is fictional.]

Security and governance layer

This layer protects the data in all other layers. We ensure data security and governance with AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), Amazon Virtual Private Cloud (VPC), and other native AWS services.
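For instance, an IAM policy can confine a consumer role to read-only access to a single data lake zone. The fragment below is an illustrative sketch; the bucket name is hypothetical and the real policies are more granular.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadCleanedZoneOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-3pl-data-lake/cleaned/*"
    }
  ]
}
```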


Business outcomes

Optimized route and load planning

Transportation analytics enables mapping more efficient shipping routes by analyzing information such as weather forecasts, traffic conditions, order frequency, and the locations with the most and fewest orders.

Improved decision-making

Our digital solution has simplified decision-making for company management. Managers can now make informed decisions about which new clients to target and which services to expand or improve.

Increased customer lifetime value (LTV)

Customer managers are now receiving more valuable insights about their clients, such as preferred delivery time frames, the most frequently ordered services, and other preferences. This data allows managers to better tackle clients’ issues and concerns and consequently increase LTV.

Predictive stock planning

Predictive analytics and visualization techniques help to increase supply chain efficiency and productivity. Real-time updates of sales figures, stock counts, and order frequency enable logistics managers to make accurate predictions on timely stock replenishment and eventually help the company decrease warehouse operating costs.
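One building block of such stock planning is a reorder-point calculation: restock when inventory falls to the expected lead-time demand plus a safety buffer. The sketch below is a standard textbook formula with hypothetical numbers, not the client's actual model.

```python
import math

# Illustrative reorder-point calculation: trigger replenishment when
# stock drops to expected lead-time demand plus safety stock.
# z = 1.65 corresponds to roughly a 95% service level; all demand
# figures below are hypothetical.

def reorder_point(daily_demand, lead_time_days, demand_std, z=1.65):
    safety_stock = z * demand_std * math.sqrt(lead_time_days)
    return math.ceil(daily_demand * lead_time_days + safety_stock)

print(reorder_point(daily_demand=40, lead_time_days=4, demand_std=6))
```

Feeding this formula with real-time sales and stock figures, rather than stale monthly averages, is what turns it from a spreadsheet exercise into the timely replenishment signal described above.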