EDI Import Processing – A Shipping Industry Case

      • Requirement

  1. EDI files contain a large amount of container data, which needs to be imported into the application databases.
  2. The bulk import should run as a background process so that it does not slow down the application.
  3. Importing EDI data into the application DB should be fast.
  4. The user should be notified after every batch is processed: how the import is progressing and how much of it is complete (records processed and records in error).

Concepts in asynchronous processing:

Concurrency vs Parallelism:

Concurrency means that two or more tasks are making progress even though they might not be executing simultaneously. This can, for example, be realized with time slicing, where parts of tasks are executed sequentially and interleaved with parts of other tasks.

Parallelism, on the other hand, means that execution is truly simultaneous.

Blocking vs Non-Blocking:

Blocking: an operation is blocking if a delay in one thread can indefinitely delay some of the other threads. A good example is a resource that can be used by only one thread at a time through mutual exclusion. If a thread holds on to the resource for a long time (for example, while waiting for data from an external system), the other threads waiting on the resource cannot progress. Conventional I/O operations are blocking.

In contrast, non-blocking means that no thread is able to indefinitely delay others.

Non-blocking operations are preferred to blocking ones, as the overall progress of the system is not trivially guaranteed when it contains blocking operations.
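
To make the difference concrete, here is a minimal sketch using only java.util.concurrent. The class name, the slowRead method and the "EDI payload" string are made up for illustration. The first call parks the caller in Future.get(); the second registers a callback on a CompletableFuture and returns straight away.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockingVsNonBlocking {

    // Hypothetical stand-in for a slow external call (for example, reading an EDI file).
    static String slowRead() {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "EDI payload";
    }

    // Blocking style: the calling thread waits inside get() until the result arrives.
    static String blockingCall(ExecutorService pool) {
        Callable<String> task = BlockingVsNonBlocking::slowRead;
        Future<String> future = pool.submit(task);
        try {
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Non-blocking style: attach a callback and return immediately;
    // the callback runs once the read has completed.
    static CompletableFuture<Integer> nonBlockingCall(ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(BlockingVsNonBlocking::slowRead, pool)
                .thenApply(String::length);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("blocking result: " + blockingCall(pool));
            CompletableFuture<Integer> length = nonBlockingCall(pool);
            System.out.println("caller continues while the read is in flight");
            // join() is used here only so the demo waits before shutting down.
            System.out.println("payload length: " + length.join());
        } finally {
            pool.shutdown();
        }
    }
}
```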

      • Solutions Analysis

        1. Java Concurrent Libraries

Since Java 5, the java.util.concurrent package (thread pools and Future, joined in Java 8 by the promise-style CompletableFuture) has provided a good handle on running tasks asynchronously, and a multi-core CPU can be used to run the tasks in parallel. The approach is a simple worker-thread model: create a pool of threads and have a master thread assign different containers to them for processing.

This model is currently used in the N4 EDI Upload process.
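
A minimal sketch of this worker-thread model, assuming a fixed pool and a hypothetical importContainer step (the class name, method names and container ids are illustrative and are not taken from N4):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkerPoolImport {

    // Hypothetical per-container import step; a real one would write to the database.
    static String importContainer(String containerId) {
        return containerId + ":imported";
    }

    // The calling ("master") thread hands each container to a fixed pool of workers
    // and then collects the results in submission order.
    static List<String> importAll(List<String> containerIds, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String id : containerIds) {
                pending.add(pool.submit(() -> importContainer(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that worker has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("import failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(importAll(Arrays.asList("C1", "C2", "C3"), 2));
    }
}
```

Even in this small form the master has to create and shut down the pool itself and decide what a failed task means, which is the manual bookkeeping listed under the cons.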

Pros:

  • Uses JVM Threads

Cons:

  • Thread management is difficult: thread creation, join and notify all have to be handled manually
  • The developer has to guard against thread misbehaviour such as deadlocks and synchronization errors
  • No process supervision

    2. Java 8 Parallel Streams

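Introduced in Java 8 as part of the Stream API, alongside lambda expressions. Parallel streams run on the common fork/join pool, whose default parallelism is the number of CPU cores minus one (the calling thread also takes part), and process the data in parallel. They can be applied to any Java Collection through the Stream API (for example, in place of a for loop).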
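
As a small illustration, the sketch below runs a hypothetical, side-effect-free validate step over a list of container ids with parallelStream(). The class name, method names and ids are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelStreamImport {

    // Hypothetical stateless step: normalise a container id.
    static String validate(String containerId) {
        return containerId.trim().toUpperCase();
    }

    // Each element may be handled on a different fork/join worker thread;
    // collect(toList()) still returns the results in the list's original order.
    static List<String> validateAll(List<String> containerIds) {
        return containerIds.parallelStream()
                .map(ParallelStreamImport::validate)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The common pool is shared by every parallel stream in the JVM.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println(validateAll(Arrays.asList(" abcu1 ", "xyzu2")));
    }
}
```

This works cleanly because validate touches no shared state; once a step has to update a shared table, the cons listed below start to apply.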

Pros:

  • Very simple implementation
  • No thread management by the developers

Cons:

  • Good only for immutable data with no shared state
  • Containers can be processed in parallel, but the update of the EDI process table becomes a synchronous operation: partial results cannot be passed on to the next stage (considering insert to unit use as the first stage and the update of the process status as the second stage)
  • No proper supervision strategy: if any one thread hangs, it holds up the entire process
  • Executors have to be chosen carefully: by default all parallel streams share one common pool, so a slow task in one stream can slow down the entire process

    3. Spring Batch

Part of the spring.io family, Spring Batch provides a lightweight batch framework and can be combined with a scheduler to run jobs on a schedule. It is ideal for managing a huge volume of jobs to be processed on different nodes: scheduling each job, tracking who executes it, and ensuring the job is executed by a node in the cluster.

Pros:

  • Good for batch-processing workflow designs
  • Each unit of work is defined as a Job, and Jobs can be executed in parallel

Cons:

  • For EDI processing, the work is just a list of jobs to be processed asynchronously
  • EDI processing doesn’t involve complex batch-processing workflows

    4. Akka

Uses the fork/join thread framework to process messages efficiently. Works on the basis of message queues (which take in the data for processing) and Actors (which perform the task with the help of workers). Akka has a well-defined thread scheduling algorithm, which helps to compute the data faster and achieve maximum concurrency.

Pros:

  • Part of the Play framework, so no extra libraries need to be added for EDI processing
  • Has a proven record: 10 million messages processed with non-blocking operations in 23 sec on a quad-core processor (http://www.akkaessentials.in/2012/03/processing-10-million-messages-with.html)
  • Works based on the Reactive Manifesto (http://www.reactivemanifesto.org)
  • No thread-level programming: all the programming involves messages and actors, which are the boilerplate code for building the system. Business logic will not contain any Akka implementation details.
  • Provides a self-healing process that reprocesses the data if the processing thread dies
  • Location Transparency
    • Everything in Akka is designed to work in a distributed environment: all interactions of actors use pure message passing and everything is asynchronous

Cons:

  • Learning curve for the developers
  • Different patterns and strategies are available; an evaluation is needed to pick the right one
  • Long term options for cluster deployment

     Proposed Solution – Overview

Let us consider a solution using the Akka framework for EDI import processing.

Akka actors provide:

  • Simple and high-level abstractions for concurrency and parallelism.
  • Asynchronous, non-blocking and high performance event-driven programming model.
  • Very lightweight event-driven processes (several million actors per GB of heap memory).

Fault Tolerance:

  • Supervisor hierarchies with “let-it-crash” semantics.
  • Supervisor hierarchies can span over multiple JVMs to provide truly fault-tolerant systems.
  • Excellent for writing highly fault-tolerant systems that self-heal and never stop.

Pattern:

Balancing workload:


  • The Akka actor system for EDI processing is created at application start-up
  • The EDI Poster gets the list of messages to be processed and works out the number of workers needed from the number of messages
  • The actor system creates the Master worker, Worker and Result listener actors for that EDI process
  • Master worker – Holds all the messages awaiting processing and maintains the list of registered workers and the message each one is currently processing. Hands out the work to the worker actors
  • Workers – Do the actual processing of a message and persist it to the database, then send the result to the Result listener actor for further processing. Once a piece of work is done, the worker asks the Master worker for more
  • Result listener – Receives the processed messages (successes, failures, warnings) from the workers and keeps track of how many messages are to be processed and how many have been processed. Updates the process and error tables on a regular basis. Once all the messages are processed, it is responsible for sending stop messages to the actors that are part of that EDI process
  • Fault tolerance (supervision) – If a worker actor dies because of an exception while processing, the actor system is notified; it tells the master actor to send the message to another actor for processing and removes the dead worker from further message flow

Reference: http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
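
The division of roles above can be sketched without Akka, using plain java.util.concurrent as a stand-in. This is not Akka code: a shared queue plays the Master worker, pool threads play the Workers that pull work when they are free, and a small counter object plays the Result listener. All class names, method names and the rule that messages containing "BAD" fail are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkPullingSketch {

    // Plays the Result listener: counts outcomes so progress can be reported.
    static final class ResultListener {
        final int total;
        final AtomicInteger processed = new AtomicInteger();
        final AtomicInteger errors = new AtomicInteger();

        ResultListener(int total) {
            this.total = total;
        }

        void onResult(boolean success) {
            if (!success) {
                errors.incrementAndGet();
            }
            processed.incrementAndGet();
        }

        String progress() {
            return processed.get() + "/" + total + " processed, " + errors.get() + " in error";
        }
    }

    // Hypothetical processing rule: messages containing "BAD" fail.
    static void process(String message) {
        if (message.contains("BAD")) {
            throw new IllegalArgumentException("invalid message: " + message);
        }
    }

    static ResultListener run(List<String> messages, int workerCount) {
        // Plays the Master worker: holds all messages that still await processing.
        Queue<String> pending = new ConcurrentLinkedQueue<>(messages);
        ResultListener listener = new ResultListener(messages.size());
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                String message;
                // A worker asks for the next message only when it is free.
                while ((message = pending.poll()) != null) {
                    try {
                        process(message);
                        listener.onResult(true);
                    } catch (RuntimeException e) {
                        // The failure is recorded and the worker carries on.
                        listener.onResult(false);
                    }
                }
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return listener;
    }

    public static void main(String[] args) {
        ResultListener result = run(Arrays.asList("M1", "BAD-2", "M3", "M4"), 2);
        System.out.println(result.progress());
    }
}
```

In the Akka version each role would be an actor and the hand-offs would be messages. The sketch also leaves out supervision: a failing message is simply counted rather than re-sent to another worker.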

Akka & Spring:

  • To let Spring create and wire up actors, the Spring application context has to be stored somewhere that is easily accessible from within the actor system. An Akka extension does exactly that.
  • The extension consists of two parts: the SpringExtension class, which defines the methods used by Akka to create the extension for an actor system, and the SpringExt class, which defines the methods and fields available on the extension.

Reference: http://www.typesafe.com/activator/template/akka-java-spring

 Implementation Details

 

[Figure: class diagram of the Akka EDI processing system]

The class diagram above shows the classes that make up the Akka system for processing EDI messages asynchronously.

5. Further Steps

  • Supervision strategy for timeouts
  • Splitting the stowplan validation, object creation and DB transaction into separate actors that execute in parallel
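
The idea of splitting the work into stages can be sketched in plain Java with CompletableFuture, as a stand-in for separate actors. The three stage methods and their return values are hypothetical placeholders. While one stow plan is in its persistence stage, another can already be in validation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StagedPipelineSketch {

    // Hypothetical stage 1: validate the stow plan.
    static String validate(String stowPlan) {
        return stowPlan.trim();
    }

    // Hypothetical stage 2: build the domain object.
    static String createObject(String validated) {
        return "Unit(" + validated + ")";
    }

    // Hypothetical stage 3: stand-in for the DB transaction.
    static String persist(String unit) {
        return unit + " saved";
    }

    // Each stow plan flows through the three stages independently of the others,
    // so stages for different plans can overlap on the pool's threads.
    static List<String> runAll(List<String> stowPlans, ExecutorService pool) {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String plan : stowPlans) {
            inFlight.add(CompletableFuture
                    .supplyAsync(() -> validate(plan), pool)
                    .thenApplyAsync(StagedPipelineSketch::createObject, pool)
                    .thenApplyAsync(StagedPipelineSketch::persist, pool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : inFlight) {
            results.add(future.join()); // wait for each pipeline to finish
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(runAll(Arrays.asList("P1", "P2"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```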
