Client Background
A renowned American company leads in technology and communication services, providing voice, data, and video solutions through its highly regarded networks and platforms. It meets customers' needs for dependable network connectivity, security, mobility, and control.
The organization's responsible business strategy aims to drive economic, social, and environmental progress. The company pioneered commercial 5G for mobility, fixed wireless, and mobile edge computing. Its operating model centers on two customer-oriented domains: Consumer and Business.
Project Outcome
The telecom giant saw a significant reduction in operating costs alongside a sharp increase in the number of files it could process per day. After implementing the custom IoT solution built on Flex83 Middleware, the company scaled from 5,000 files per day to more than 40,000 files per day. The project goals were achieved in one-third of the expected time.
Introduction
A Fortune 500 company with over 100,000 employees and several global establishments needed help scaling an application that processes files of geospatial network signal data, which it uses to improve its cellular networks. Its devices and network scanners collected numerous data packets (at millisecond granularity) from the signals at each location, producing files of 500 MB to 1 GB. Files of this size strained the application's processing capabilities, and the company could not fully utilize its hardware resources, which drove up overall operating costs.
To address these issues, the company partnered with IoT83 Ltd. to improve its file processing pipeline.
Technical Challenges
The company encountered challenges in efficiently scaling the processing of critical data files—such as DML_DLF, DLF, and SIG—collected from tower network scanners and other signal-capturing devices. Despite the large volume of data, including millions of signal readings per file, the existing infrastructure was underutilized, leading to inflated operational costs. To address this, the company aimed to process these high-density files using a Spark-based distributed computing system.
Steps Involved in the Workflow
Stage 1:
Device-generated files in formats such as DML_DLF, DLF, and SIG/Scanner were first transformed into a Spark-compatible format. Using Spark and custom parsing logic, each file, often containing millions of records, was converted into a sequence file. Because of the intensive processing involved, this stage could handle only 250–300 files per hour.
Stage 2:
The previously generated sequence files were further processed through a robust parsing layer that mapped encoded data fields to meaningful business-specific values. This transformation made the data suitable for advanced geospatial processing in the next phase. The processed data was then stored using Hive, enabling schema definitions and facilitating standardized querying through SQL.
Stage 3:
The analyzer stage read data from Hive tables and executed thousands of business logic operations on each individual record across all files. This stage performed complex aggregations and joins across multiple datasets, ultimately producing approximately 300,000 to 700,000 output records for every batch of 30–60 files. The results were indexed and stored in Elasticsearch for efficient querying and analysis.
Challenges in the Workflow
The client could process only 5,000–7,000 files per day. With an average file size of around 250–400 MB, the total amount of data handled every 24 hours was between 1.2 and 2.8 TB. The company wanted to scale this to 40,000 files per day, which required adequate infrastructure, code optimization, and Spark parameter and configuration tuning capable of processing 16 TB every 24 hours.
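The volumes quoted above follow from simple multiplication, and a short calculation makes the gap between the old and target throughput concrete. This is only a back-of-the-envelope sketch: the function names are illustrative, and decimal units (1 TB = 1,000,000 MB) are an assumption.

```python
# Back-of-the-envelope check of the daily data volumes quoted in this section.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def daily_volume_tb(files_per_day: int, avg_file_mb: float) -> float:
    """Total data handled in 24 hours, in terabytes."""
    return files_per_day * avg_file_mb / MB_PER_TB


def files_per_hour(files_per_day: int) -> float:
    """Sustained hourly rate needed to reach a daily file count."""
    return files_per_day / 24


current_low = daily_volume_tb(5_000, 250)    # lower bound of the old workload
current_high = daily_volume_tb(7_000, 400)   # upper bound of the old workload
target = daily_volume_tb(40_000, 400)        # target workload at the larger file size

print(f"current: {current_low:.2f} to {current_high:.2f} TB/day")
print(f"target:  {target:.1f} TB/day, about {files_per_hour(40_000):.0f} files/hour")
```

The target works out to roughly 1,667 files per hour, well above the 250–300 files per hour the original Stage 1 could convert.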
The Solution
The company partnered with IoT83 to scale and enhance its existing Spark-based file processing system. The IoT83 team used the Flex83 Middleware to accelerate development of the custom IoT solution, relying on pre-built microservices to streamline the process while the client retained full ownership of the application IP. The work was a comprehensive overhaul that included codebase refactoring, Spark configuration tuning, and infrastructure optimization. A key architectural change was consolidating Stage 2 and Stage 3 into a single, more efficient job on the Flex83 Middleware, along with replacing Hive-based intermediate storage with Parquet files to streamline later-stage analytics.
One of the major challenges was optimizing the previously complex Stage 3 job, which involved thousands of sequential and parallel operations, including heavy data joins. CPU resources were underutilized because the operations were I/O-bound, so linear scaling provided diminishing returns. To overcome this, IoT83 decomposed the new Stage 2 (the merged Stage 2 and Stage 3) into smaller, independent Spark applications.
Stage 1: File Conversion Using AWS Lambda
To initiate the pipeline, AWS Lambda functions were introduced to convert device-generated files (DML_DLF, DLF, and SIG/Scanner formats) into a Spark-readable format (sequence files). The serverless architecture allowed scalable, parallel processing of thousands of files per hour, depending on the configured concurrency limits. This greatly improved ingestion throughput.
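How throughput depends on the concurrency limit can be estimated with a simple rate calculation. The sketch below is illustrative only: the 120-second conversion time per file is a made-up figure, not a measurement from the project, and the function names are hypothetical.

```python
import math


def files_per_hour(concurrency: int, seconds_per_file: float) -> float:
    """Steady-state throughput when `concurrency` conversions run at once."""
    return concurrency * 3600 / seconds_per_file


def concurrency_needed(target_files_per_day: int, seconds_per_file: float) -> int:
    """Smallest concurrency limit that sustains the daily target."""
    files_per_second = target_files_per_day / 86_400
    return math.ceil(files_per_second * seconds_per_file)


# Hypothetical figure: assume one conversion takes about 120 seconds.
ASSUMED_SECONDS_PER_FILE = 120

limit = concurrency_needed(40_000, ASSUMED_SECONDS_PER_FILE)
print(limit, files_per_hour(limit, ASSUMED_SECONDS_PER_FILE))
```

Under that assumption, a concurrency limit of 56 sustains 1,680 files per hour, enough for 40,000 files per day. Doubling the limit doubles the rate, provided downstream storage keeps up.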
Stage 2: Parsing, Transformation & Geospatial Processing
Stage 2 was restructured into two distinct phases:
- Phase 1: Parsed the sequence files, applied business logic mappings to decode field values, and converted them into a format compatible with geospatial processing. The results were written directly to Parquet format for downstream analytics.
- Phase 2: Applied complex geospatial transformations and thousands of business rules directly to the transformed data already in memory, with no additional read cycle (the old Stage 3 had to reload the data from Hive).
This optimization eliminated unnecessary I/O overhead and significantly improved overall system performance.
Major improvements in this stage included:
- Upgraded geospatial libraries
- Refactored geospatial transformations with custom parallel processing logic
- Enhanced parallelism and Spark configuration tuning
- Optimized join strategies to reduce data shuffling between nodes over the network, thereby reducing processing time
- Dynamic tuning of Spark parallelism based on incoming data volume
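The last item, sizing parallelism from the incoming data volume, can be sketched as follows. This is not the project's actual tuning logic. The 128 MiB target partition size, the floor of two partitions per core, and the upper cap are all assumptions chosen for illustration. The two configuration keys are standard Spark settings.

```python
import math

# Assumptions for illustration only; the project's real values are not published.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024   # aim for ~128 MiB per partition
PARTITIONS_PER_CORE = 2                      # keep every core busy on small batches
MAX_PARTITIONS = 20_000                      # avoid excessive task-scheduling overhead


def choose_partitions(input_bytes: int, total_cores: int) -> int:
    """Pick a partition count proportional to batch size, within sane bounds."""
    by_size = math.ceil(input_bytes / TARGET_PARTITION_BYTES)
    floor = PARTITIONS_PER_CORE * total_cores
    return max(floor, min(by_size, MAX_PARTITIONS))


def spark_conf_for_batch(input_bytes: int, total_cores: int) -> dict:
    """Spark settings to pass as --conf values when submitting the batch."""
    n = str(choose_partitions(input_bytes, total_cores))
    return {
        "spark.sql.shuffle.partitions": n,
        "spark.default.parallelism": n,
    }


MIB = 1024 * 1024
# A batch of 60 files of 400 MiB each on a hypothetical 64-core cluster.
print(spark_conf_for_batch(60 * 400 * MIB, total_cores=64))
```

Because each batch runs as its own Spark application, the values can be computed just before submission, so a batch of small files does not pay for thousands of nearly empty tasks and a large batch does not overload a handful of partitions.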
Stage 3: Scalability & Reliability Enhancements
To further increase throughput and resilience:
- The restructured Stage 2 job now runs multiple instances in parallel, effectively eliminating CPU bottlenecks previously caused by I/O constraints. As a result, CPU utilization has significantly improved, reaching 80%–90%, compared to the earlier 40%–50% in the previous implementation.
- Auto-healing mechanisms were added to reprocess failed files.
- A priority queuing system was introduced to handle high-priority files in parallel with standard files.
With this architecture, the platform can handle not only 40,000 files per day but multiples of that volume, depending on the required scale.
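The combination of priority queuing and automatic reprocessing can be illustrated with a small in-memory queue. This is a single-process sketch under assumed names (`FileQueue`, `Job`, a retry limit of 3). The production system runs jobs in parallel, and its actual queueing technology is not described here.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List

HIGH, STANDARD = 0, 1  # lower number is served first


@dataclass(order=True)
class Job:
    priority: int
    seq: int                                   # tie-breaker: first in, first out
    path: str = field(compare=False)
    retries: int = field(default=0, compare=False)


class FileQueue:
    """Priority queue that re-enqueues failed files up to a retry limit."""

    def __init__(self, max_retries: int = 3) -> None:
        self._heap: List[Job] = []
        self._counter = itertools.count()
        self.max_retries = max_retries
        self.dead: List[str] = []              # files that exhausted their retries

    def submit(self, path: str, priority: int = STANDARD, retries: int = 0) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), path, retries))

    def run(self, process: Callable[[str], None]) -> List[str]:
        """Process every queued file; return paths in completion order."""
        done: List[str] = []
        while self._heap:
            job = heapq.heappop(self._heap)
            try:
                process(job.path)
                done.append(job.path)
            except Exception:
                if job.retries < self.max_retries:
                    # Auto-healing: put the failed file back with its priority.
                    self.submit(job.path, job.priority, job.retries + 1)
                else:
                    self.dead.append(job.path)
        return done
```

A high-priority file submitted after standard files is still picked up first, and a file that fails once is retried rather than dropped. Files that exhaust their retries land in `dead` for manual inspection.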
Monitoring & Observability
An analytics portal was also developed, providing comprehensive visibility into every aspect of file processing. It includes:
- Batch detail
- Processing time
- Retry count
- Record count
- Stage-wise progress and timestamps
This granular observability ensures robust monitoring and debugging capabilities.
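The fields listed above map naturally onto a per-file status record. The sketch below shows one possible shape for such a record. The class and field names are hypothetical and do not reflect the portal's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class FileStatus:
    """Hypothetical per-file record behind a processing dashboard."""
    batch_id: str
    file_name: str
    record_count: int = 0
    retry_count: int = 0
    started: Dict[str, datetime] = field(default_factory=dict)
    finished: Dict[str, datetime] = field(default_factory=dict)

    def start(self, stage: str, at: Optional[datetime] = None) -> None:
        """Record when a stage began (defaults to the current UTC time)."""
        self.started[stage] = at or datetime.now(timezone.utc)

    def finish(self, stage: str, at: Optional[datetime] = None) -> None:
        """Record when a stage ended; the stage must have been started."""
        if stage not in self.started:
            raise ValueError(f"stage {stage!r} was never started")
        self.finished[stage] = at or datetime.now(timezone.utc)

    def processing_seconds(self) -> float:
        """Total time spent in all completed stages."""
        return sum(
            (self.finished[s] - self.started[s]).total_seconds()
            for s in self.finished
        )
```

Keeping start and finish timestamps per stage lets the same record answer both "how long did this file take overall?" and "which stage is it stuck in?".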
Obstacles & the Overcoming
Results
After implementing the custom IoT solution built on Flex83 Middleware, the telecom giant saw a significant reduction in operating costs and scaled from 5,000 to more than 40,000 files processed per day.
IoT83 took the time to understand the foundations of the company's process and then developed and implemented a strategy to scale it substantially. The Flex83 Middleware also helped the team avoid writing complex code and provided a quick way to deploy applications on any cloud, making setup straightforward for any machine, geography, or use case.
Conclusion
Developing a custom IoT solution for a process with such a high volume of file processing and record mapping was made easier by Flex83, a versatile, programmable IoT middleware for building and owning custom IIoT solutions in just a few weeks. By partnering with IoT83 Ltd., the global telecom company achieved its business goals with 10x faster time to market (TTM), 6x cost savings, and complete risk mitigation.