Friday, 28 July 2017

TIBCO Puts the Focus on Its Apache Spark Accelerator


In big data management, most of the current hype is around Apache Spark. TIBCO Software Inc. has released its Accelerator for Apache Spark. As the name implies, its main purpose is to help users get productive with Spark more quickly.


StreamBase And TIBCO
Hayden Schultz of TIBCO explains how the accelerator works. He originally worked at a startup called StreamBase, which TIBCO acquired in 2013. Before joining TIBCO, he spent over 13 years working with financial markets across asset classes. At TIBCO he has worked in a variety of areas, and he now focuses on big data.

Focus Area of TIBCO
TIBCO has focused on providing customizable solutions for its users. The company has been building standard application frameworks, and the one in question here targets big data applications. It serves as a worked example of how the components fit together and how the combined system delivers a better solution.

Apache Spark Accelerator – Free license
The company does not intend to sell the accelerator. It is being released as open source under a free license, so that users can take it and modify it to suit their needs. The focus was on Hadoop, which Apache Spark can run on top of and use for storage.

Although the accelerator is a new product from TIBCO, it is not a new idea. Any knowledgeable individual could take the products and build the same thing on a cluster on top of Spark, without help from the company. However, TIBCO's Global Architect says the design was created for people who are less familiar with how these pieces operate, so that they too can use the Spark components efficiently.

Who needs the Apache Spark Accelerator?
A typical customer is someone who works with StreamBase and Spark and needs to feed the details of a currency trading system into StreamBase. Their requirement is back-testing, which means checking whether a new formula performs better than the previous one. They do not know what to expect, because the test involves a huge pool of data and they know little about the new currency data. So they save all of the financial data in the big data cluster, where every piece of data is stored, however small.

The new algorithm has to be trained before it can be compared with the old one. To train it, data from a specified six-month period is split into per-day groups. Each chunk in turn comprises several smaller chunks. The new algorithm is run from StreamBase, which itself runs inside the Spark cluster. This used to take several hours; it can now be done in less than an hour. Users can choose their own level of implementation. The system assumes that the user is working with TIBCO products.
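To make the back-testing idea concrete, here is a minimal sketch in plain Python. It is not TIBCO's implementation: the strategies, the function names (`group_by_day`, `backtest_day`, `compare`) and the synthetic price data are all assumptions made for illustration, and an ordinary loop stands in for the Spark cluster that would process the per-day chunks in parallel.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_day(ticks):
    """Split (timestamp, price) ticks into per-day chunks, ordered by time."""
    days = defaultdict(list)
    for ts, price in ticks:
        days[ts.date()].append((ts, price))
    return {day: sorted(chunk) for day, chunk in sorted(days.items())}


def moving_average_signal(window):
    """Hypothetical strategy: long (+1) if the latest price is above the
    mean of the last `window` prices, otherwise short (-1)."""
    def strategy(history):
        recent = history[-window:]
        return 1 if history[-1] > sum(recent) / len(recent) else -1
    return strategy


def backtest_day(chunk, strategy):
    """Replay one day's chunk and return the profit and loss in price units."""
    prices = [price for _, price in chunk]
    pnl = 0.0
    for i in range(1, len(prices)):
        position = strategy(prices[:i])  # decide using past prices only
        pnl += position * (prices[i] - prices[i - 1])
    return pnl


def compare(ticks, old_strategy, new_strategy):
    """Return {day: (old_pnl, new_pnl)} for every per-day chunk."""
    return {
        day: (backtest_day(chunk, old_strategy), backtest_day(chunk, new_strategy))
        for day, chunk in group_by_day(ticks).items()
    }


if __name__ == "__main__":
    # Synthetic random-walk prices: five days, one tick per minute for an hour.
    random.seed(0)
    start, price, ticks = datetime(2017, 1, 2, 9, 0), 1.10, []
    for day in range(5):
        for minute in range(60):
            price += random.gauss(0, 0.0005)
            ticks.append((start + timedelta(days=day, minutes=minute), price))

    results = compare(ticks, moving_average_signal(5), moving_average_signal(20))
    for day, (old_pnl, new_pnl) in results.items():
        print(day, f"old={old_pnl:+.5f}", f"new={new_pnl:+.5f}")
```

Because each day is evaluated independently, the per-day chunks are a natural unit of parallel work, which is the kind of job a cluster can spread across many machines.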

StreamBase is used to acquire the data. Adapters, such as StreamBase applications, connect to the initial data source, and the data is written to HDFS or Flume. In this way the data is ingested into the primary data cluster. When a large data set needs to be examined with data analytics, a TIBCO solution such as Spotfire can be employed.

SQL commands can be run from within Spotfire, and the Databricks Spark connector can also be used from there. TIBCO also provides a solution for analysing the data. In the Spark Accelerator, Spotfire retrieves and analyses the data and then uses Sparkling Water (from H2O.ai) on Spark. Sparkling Water trains the machine learning model, which is saved inside the data cluster.




