Real-Time Stream Processing with Edge Computing Gateways

by Mark Hsia

An edge computing gateway plays several roles at the same time. In one direction, it receives incoming data from the cloud, processes it and disseminates it to the local nodes. In the other direction, it collects data from the local nodes, summarizes it and sends it to the cloud centers. The key components of an edge computing gateway are a Complex Event Processor (CEP), a Message Queue (MQ) system and a fast real-time database.

Streaming data

Streaming data is data generated continuously from multiple sources. Normally each record carries an arrival timestamp so that records can be processed in order. Because data keeps pouring into the system, we use a sliding window technique and discard expired data that falls outside the window. A typical example of streaming data is stock prices: quotes and prices keep arriving tick by tick, and each tick includes a price, a timestamp and other related information. Normally we keep tick data in two places: the CEP sliding window and a database. The CEP processes the data and then throws it away, whereas a database tries to keep it forever.
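
The sliding window described above can be sketched in a few lines of Python. This is only an illustration, not the API of any particular CEP engine: the `SlidingWindow` class, its `span` parameter and the sample ticks are all hypothetical, and the window is assumed to be time-based with ticks arriving in timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Time-based sliding window holding ticks newer than `span` seconds.

    Hypothetical helper for illustration; assumes ticks arrive in
    timestamp order.
    """

    def __init__(self, span):
        self.span = span
        self.ticks = deque()  # (timestamp, price) pairs, oldest on the left

    def add(self, timestamp, price):
        """Append a new tick, then evict everything that has expired."""
        self.ticks.append((timestamp, price))
        self._expire(timestamp)

    def _expire(self, now):
        # Pop from the left while the oldest tick is outside the window.
        while self.ticks and self.ticks[0][0] <= now - self.span:
            self.ticks.popleft()

    def prices(self):
        return [price for _, price in self.ticks]


# Made-up ticks: (arrival time in seconds, price).
window = SlidingWindow(span=60.0)
for ts, price in [(0.0, 100.0), (30.0, 101.0), (59.0, 102.0),
                  (61.0, 103.0), (95.0, 104.0)]:
    window.add(ts, price)

print(window.prices())  # [102.0, 103.0, 104.0]
```

A deque is used here because expired ticks always leave from the oldest end, so each eviction is a constant-time operation. A database writer, if present, would receive the same ticks separately and keep them after the window has dropped them.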

Tick by Tick data

Market data from the data feed is sent tick by tick, and each tick must be shown on every user's desktop immediately. High-frequency trading applications usually expect a very low-latency environment. Industrial-grade gateways can offer a service level of less than 10 µs latency, and top-notch FPGA-based products can reach about 3 µs.

Periodical Data: Minute Data

Periodical data such as minute, hourly or even daily data can be generated from the tick data in the sliding window. These types of data keep only the open, high, low and close values of each period. Converting tick data to periodical data therefore offers a very high compression ratio.
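
As a rough sketch of the tick-to-period conversion, the function below folds ticks into one open/high/low/close record per minute. The name `to_minute_bars`, the tuple layout and the sample prices are assumptions made for this example, and ticks are assumed to arrive sorted by timestamp.

```python
def to_minute_bars(ticks):
    """Aggregate (timestamp_seconds, price) ticks into per-minute OHLC bars.

    Hypothetical helper; assumes ticks are sorted by timestamp.
    Returns a dict mapping minute index -> [open, high, low, close].
    """
    bars = {}
    for ts, price in ticks:
        minute = int(ts // 60)
        bar = bars.get(minute)
        if bar is None:
            # The first tick of the minute sets all four values.
            bars[minute] = [price, price, price, price]
        else:
            bar[1] = max(bar[1], price)  # high
            bar[2] = min(bar[2], price)  # low
            bar[3] = price               # close is the latest tick seen
    return bars


# Made-up ticks spanning two minutes.
ticks = [(0.5, 100.0), (12.0, 101.5), (47.3, 99.8), (59.9, 100.2),
         (60.1, 100.4), (95.0, 100.9)]
bars = to_minute_bars(ticks)

print(bars[0])  # [100.0, 101.5, 99.8, 100.2]
print(bars[1])  # [100.4, 100.9, 100.4, 100.9]
```

Here six ticks collapse into two bars of four numbers each. With thousands of ticks per minute the bar size stays the same, which is where the high compression comes from.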

Tick by Tick or Micro-Batch Processing

Suppose a process can handle 1000 tasks per second. If we run tasks on a tick by tick basis, the maximum throughput is 1000 tasks/sec, and since each task handles one message, the throughput is 1000 messages/sec.

But if the process completes only 100 tasks per second and each task handles 20 messages, the overall throughput becomes 2000 messages/sec. One task handling one message is a tick by tick process; one task handling multiple messages is a micro-batch process. Each task has a fixed cost, paid once per task, and a variable cost, paid once per message. The ratio of variable cost to fixed cost decides which type of process performs better: the smaller the variable cost is relative to the fixed cost, the better the micro-batch process performs.
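
The trade-off can be made concrete with a simple linear cost model. The model is an assumption for illustration, not something measured: each task is taken to cost a fixed overhead plus a per-message cost, and the two hypothetical constants below are chosen only so that the model reproduces the 1000 and 2000 messages/sec figures above.

```python
def throughput(batch_size, fixed_ms, variable_ms):
    """Messages per second under an assumed linear cost model.

    One task costs fixed_ms + batch_size * variable_ms milliseconds.
    """
    task_ms = fixed_ms + batch_size * variable_ms
    return batch_size / task_ms * 1000.0


# Hypothetical costs (milliseconds), solved from the two data points:
#   fixed + 1 * variable  = 1 ms   (1000 tasks/sec, 1 message per task)
#   fixed + 20 * variable = 10 ms  (100 tasks/sec, 20 messages per task)
FIXED_MS = 10 / 19
VARIABLE_MS = 9 / 19

tick_by_tick = throughput(1, FIXED_MS, VARIABLE_MS)
micro_batch = throughput(20, FIXED_MS, VARIABLE_MS)

print(round(tick_by_tick), round(micro_batch))  # 1000 2000
```

Under this model, throughput approaches 1000 / VARIABLE_MS (about 2111 messages/sec with these numbers) as the batch grows, so batching pays off most when the fixed cost dominates. Larger batches also make each message wait longer before it is processed, which matters for latency-sensitive feeds.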
