Issue 25

Data Integration with Talend Open Studio

Dănuţ Chindriş
Java Developer
@Elektrobit Automotive
OTHERS

As Jonathan Bowen notes in his book "Getting Started with Talend Open Studio for Data Integration", as soon as the second computer was manufactured, systems integration became an essential part of IT teams' work.

The complexity of today's systems, together with the fast pace at which businesses evolve, highlights the need for a set of tools that let us carry out integration tasks quickly. We also have to be able to react promptly to new business opportunities.

Experience has shown us that, most of the time, new clients ask us to integrate the product we offer into their ecosystem. An information system rarely works in isolation, in its own universe. On several occasions we noticed that the success of a project presented to a client depended on our ability to integrate the system with the products they were already using.

This process may mean synchronizing two databases once or on a recurring basis, consuming services (web services or other kinds), generating and transferring various types of files, and so on. We therefore deal with a variety of ways to accomplish these tasks, which adds to the complexity of the problem. Sometimes it is up to us to decide how the integration will be done, but most of the time the client has specific requirements in this respect as well.

In such a situation we can either build the interface between the systems manually, as a custom solution, or use a tool specialized in solving integration problems. One such tool is Talend Open Studio, which offers an interesting set of features to help us with our integration tasks.

An Overview of the Talend Open Studio Environment

Talend Open Studio for Data Integration is a graphical development environment that, as its name states, specializes in data integration between systems. At the core of this open source product sits the Eclipse platform. Besides building integration solutions, Talend Open Studio also includes the mechanisms needed to deliver them: jobs can be run both within the environment and as stand-alone scripts.

To model processes, the system uses connectors. The product's developers provide over 800 such connectors, which let us easily connect to databases, read information from various sources, and transfer files and perform operations on them. We can also connect specialized components to define complex integration processes.

A good part of the work we do in Talend Open Studio consists of graphically modelling the processes we want to define. Meanwhile, the platform does its builder work in the background, generating Java code: every component we use has an associated behavior, described by Java code.

Because it is a graphical tool, the product can be used both by programmers and by people without programming skills. However, to define certain complex behaviors we have to write Java code from time to time, so users who do not know how to program face certain limitations.
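To give an idea of what writing Java code "from time to time" usually involves, the sketch below shows a small helper of the kind Talend calls a user routine: a plain Java class with static methods that components can call from their expressions. The class name `ClientRoutines`, the method `normalizePhone` and its normalization rule are hypothetical, chosen only for illustration.

```java
// Hypothetical user routine: a plain Java class exposing static helper methods.
// In the Studio such classes are created under Code > Routines; any package
// declaration is omitted here so the sketch compiles on its own.
class ClientRoutines {

    // Keeps only the digits of a phone number; if the result starts with the
    // international "00" prefix, that prefix is replaced by "+".
    // Returns null for null input so missing values pass through unchanged.
    public static String normalizePhone(String raw) {
        if (raw == null) {
            return null;
        }
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.startsWith("00")) {
            return "+" + digits.substring(2);
        }
        return digits;
    }
}
```

A component expression (for instance, an output column in a tMap) could then call `ClientRoutines.normalizePhone(row1.phone)`, where `row1.phone` stands for an assumed input column.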

Talend Open Studio is relatively easy to use and offers a quick way to model integration scenarios, often reducing implementation time from weeks or months to days or even hours, depending on the complexity of the project. However, we must warn readers that, as in many other areas, if overzealousness or an unfit design leads us to over-engineer things, we risk ending up with a complex solution that is hard for other users to understand, or even inefficient. Here too, we need to follow best practices that ensure the quality of our solution.

Among the other advantages of Talend Open Studio, note that it is an open source product that users can extend as needed. Using it also boosts productivity, because developers can concentrate on defining the process rather than on its technical implementation. We have at our disposal a multitude of components, suited to more or less common situations, with which to define our processes. In addition, the Talend user community is active and ready to offer technical advice.

Use Cases

As mentioned in the previous section, the most common use cases for Talend Open Studio are the following:

Transfer between databases: When new systems are created or existing ones are upgraded, data needs to be migrated to a new database. The new database can have the same schema or a different one, and Talend Open Studio offers the connectors and actions needed for this process.

File transfer: Integration tasks may need to move large quantities of data, which is often done using files, for example in the classic CSV (comma-separated values) format. The system that receives the file may also need the data in a different format. The Studio handles this case as well, because it lets us define processes that transform the transferred data. Moreover, file management capabilities are available through operations such as FTP transfers or archiving.
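As a rough illustration of such a transformation, suppose (hypothetically) that the source system exports semicolon-separated lines with dates written as dd.MM.yyyy, while the receiving system expects commas and ISO-8601 dates. In the Studio this would be configured with components rather than coded by hand; the plain Java sketch below only shows the per-row logic involved.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch of a format change applied while transferring a file. Hypothetical formats:
// source lines look like "id;name;dd.MM.yyyy", target lines like "id,name,yyyy-MM-dd".
// Quoting and escaping are not handled: fields must not contain separators.
class CsvReformatter {

    private static final DateTimeFormatter SOURCE_DATE =
            DateTimeFormatter.ofPattern("dd.MM.yyyy");

    static List<String> reformat(List<String> sourceLines) {
        List<String> targetLines = new ArrayList<>();
        for (String line : sourceLines) {
            String[] fields = line.split(";", -1);   // -1 keeps trailing empty fields
            LocalDate date = LocalDate.parse(fields[2], SOURCE_DATE);
            // LocalDate.toString() produces the ISO-8601 form yyyy-MM-dd
            targetLines.add(String.join(",", fields[0], fields[1], date.toString()));
        }
        return targetLines;
    }
}
```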

Synchronization: Collaborating systems are not always connected to the same data repository, so certain information may be duplicated within an ecosystem. Consequently, we need to make sure this information is synchronized periodically. This is the case for data about a company's clients, which may be present, for instance, in the finance system, the distribution system and the CRM platform. Talend Open Studio can synchronize these systems through jobs that automate the process.
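The core of a one-way synchronization job is a comparison between the two copies of the data. The following sketch assumes the CRM is the master copy and reduces each client record to an id and an email address (both assumptions for illustration); it computes which records the target system must insert and which it must update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// One-way synchronization sketch: the source (e.g. the CRM) is the master copy,
// the target (e.g. the finance system) must be brought in line with it.
// Client records are reduced to id -> email for illustration.
class ClientSync {

    // Changes the target needs; list order follows the source map's iteration order.
    static final class Plan {
        final List<String> inserts = new ArrayList<>();  // ids missing from the target
        final List<String> updates = new ArrayList<>();  // ids present in both, with different data
    }

    static Plan plan(Map<String, String> source, Map<String, String> target) {
        Plan plan = new Plan();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String id = entry.getKey();
            if (!target.containsKey(id)) {
                plan.inserts.add(id);
            } else if (!Objects.equals(entry.getValue(), target.get(id))) {
                plan.updates.add(id);
            }
            // identical records need no action
        }
        return plan;
    }
}
```

Deletions, and conflicts where both systems changed the same client, are deliberately left out; a real synchronization job would need a policy for both.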

ETL: This is an acronym for Extract, Transform, Load, the terms that describe an essential process for data warehouse systems. Such a process extracts data from operational systems, transforms it by applying rules or functions, and then loads it into the data warehouse. Here again Talend Open Studio makes our lives easier, helping us substantially in implementing this type of process.
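The three ETL steps can be sketched with in-memory stand-ins: a list of order lines plays the operational system and a map of revenue per product plays the data warehouse table. The `OrderLine` type, the "skip cancelled lines" rule and the per-product aggregation are hypothetical examples of the rules or functions applied during the transform step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal ETL sketch with in-memory stand-ins (all names hypothetical): the
// "operational system" is a list of order lines, the "warehouse" a map of revenue per product.
class MiniEtl {

    // One order line as it exists in the operational system.
    static final class OrderLine {
        final String product;
        final int quantity;
        final long unitPriceCents;
        final boolean cancelled;

        OrderLine(String product, int quantity, long unitPriceCents, boolean cancelled) {
            this.product = product;
            this.quantity = quantity;
            this.unitPriceCents = unitPriceCents;
            this.cancelled = cancelled;
        }
    }

    // Extract: copy the rows out of the operational system (here, just a list).
    static List<OrderLine> extract(List<OrderLine> operationalSystem) {
        return new ArrayList<>(operationalSystem);
    }

    // Transform: apply a business rule (ignore cancelled lines) and aggregate revenue per product.
    static Map<String, Long> transform(List<OrderLine> lines) {
        Map<String, Long> revenueByProduct = new TreeMap<>();
        for (OrderLine line : lines) {
            if (line.cancelled) {
                continue;
            }
            revenueByProduct.merge(line.product, line.quantity * line.unitPriceCents, Long::sum);
        }
        return revenueByProduct;
    }

    // Load: add the aggregated rows to the warehouse table, summing with existing totals.
    static void load(Map<String, Long> rows, Map<String, Long> warehouse) {
        rows.forEach((product, revenue) -> warehouse.merge(product, revenue, Long::sum));
    }

    static void run(List<OrderLine> operationalSystem, Map<String, Long> warehouse) {
        load(transform(extract(operationalSystem)), warehouse);
    }
}
```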

Example

To illustrate how easy this platform is to use, we created a project containing one job that transforms an XML file into a CSV file. The graphical model of this job is illustrated in the figure below.

On the left we have a tFileInputXML component and on the right a tFileOutputDelimited component, connected through a Main connector. Before dragging the input component into the design area, we defined a metadata object and associated an XML file with it. The Studio automatically detected the schema of the document and let us select which nodes to transfer to the output. Through the Main connector, Talend transferred the exact structure we defined to the output file without us writing a single line of code. All we had to configure in the output component was the path and name of the CSV file.

Of course, we can extend this job by connecting further components, such as those that work with FTP connections, to transfer our file to the target system.
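For comparison, the sketch below hand-codes roughly what the XML-to-CSV job does, using only the JDK's DOM parser. The element names (`customer`, `id`, `name`, `city`) are hypothetical stand-ins for the nodes selected in the metadata object, and the sketch works on strings rather than files to stay self-contained. It is not the code Talend generates.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hand-written counterpart of the tFileInputXML -> Main -> tFileOutputDelimited job.
// Element names are hypothetical; in the Studio they come from the metadata object.
class XmlToCsv {

    // Parses the XML and emits one delimited line per <customer> element.
    static List<String> convert(String xml, String separator) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        NodeList customers = doc.getElementsByTagName("customer");
        for (int i = 0; i < customers.getLength(); i++) {
            Element customer = (Element) customers.item(i);
            lines.add(String.join(separator,
                    text(customer, "id"), text(customer, "name"), text(customer, "city")));
        }
        return lines;
    }

    // Text of the first descendant element with the given tag, or "" if it is missing.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Writing the resulting lines to disk would take one more call, such as `Files.write(Paths.get("customers.csv"), lines)`, with a file name chosen here only as an example.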

Conclusion

In this article we briefly described the context of integration processes, what Talend Open Studio is, and the benefits of using it. Through a small example, we also tried to illustrate how simple the Studio is to use and how quickly jobs can be implemented, and to give an idea of the platform's potential.
