Interclypse Blog


December 14, 2015

Messaging Frameworks: Apache NIFI and JMS

Date: 12/08/2015
Author: Joshua Little

This paper outlines the benefits of the Apache NIFI and Java Message Service (JMS) messaging frameworks. Our goal is for readers to be able to make a more informed decision on how either of these messaging frameworks can best be utilized in their enterprise.

Apache NIFI
Apache NIFI is dataflow software that can manipulate and transform files as they pass through a dataflow. NIFI provides a user interface that lets a user visually create and manage multiple dataflows in one view, and it ships with roughly 50 built-in processors to get, ingest, put, and manipulate files, with support already built in for protocols such as HTTP, JMS, HDFS, and FTP. When building your dataflow, you configure each processor with the specific properties you need, and you can set the logging level per processor. For the most part, you do not need to be a software engineer to create, manage, and change dataflows; you can download NIFI and start creating dataflows in minutes. NIFI also allows extensions, so new processors can be written if needed. Within a dataflow, you can group processors into a process group, thereby nesting dataflows within dataflows. Additionally, NIFI has built-in data provenance capabilities that provide detailed traceability of data.

NIFI scales through clustering: you create a NIFI Cluster Manager (NCM) and add nodes to the cluster, configured in a master/slave relationship. If the NCM is lost, the slave nodes continue processing; you just cannot add new nodes or change existing node configurations until the NCM is back online. NIFI runs in a JVM and has built-in diagnostics to view the status of the JVM and the load on the system. NIFI can be hardware intensive when managing many or large dataflows, depending on data throughput; however, it does not require many personnel resources to manage and troubleshoot, and one person can manage many dataflows. NIFI is fairly new to the open source world, but it was developed and matured prior to its open source debut. NIFI successfully completed its incubation period with Apache and is now a top-level project with a lot of community support and good online documentation.
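
The processor-and-connection model described above can be sketched conceptually. The following Python snippet is only an analogy of how flow files move through a chain of processors; the function names (get_file, transform, put_file) are invented for illustration and are not the actual NIFI API:

```python
# Conceptual analogy of a NIFI dataflow: a chain of "processors",
# each transforming a flow file and passing it downstream.
# Illustration only -- not the actual NIFI API.

def get_file(flowfile):
    # Stand-in for an ingest processor: mark the file as ingested.
    flowfile["status"] = "ingested"
    return flowfile

def transform(flowfile):
    # Stand-in for a transform processor: uppercase the content.
    flowfile["content"] = flowfile["content"].upper()
    return flowfile

def put_file(flowfile):
    # Stand-in for a delivery processor: mark the file as delivered.
    flowfile["status"] = "delivered"
    return flowfile

def run_dataflow(flowfile, processors):
    # Route the flow file through each processor in order, the way
    # NIFI routes files along the connections between processors.
    for processor in processors:
        flowfile = processor(flowfile)
    return flowfile

result = run_dataflow({"content": "hello", "status": "new"},
                      [get_file, transform, put_file])
print(result)  # {'content': 'HELLO', 'status': 'delivered'}
```

In NIFI itself you wire this chain up graphically and per-processor configuration replaces the hard-coded logic shown here.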

JMS
JMS has been around for a long time and is a very mature technology that lets you decouple system communications. It does require message-oriented middleware (MOM) to manage dataflow between systems. If you only want to worry about sending or receiving data, and not managing entire dataflows, then this is a good approach. JMS is a great way to asynchronously share data with consumers: a producer does not even need to know about the consumer; it just publishes messages that can be shared with anyone. Whereas with NIFI you need to know where you are sending your files, JMS is a great way to fire and forget.

The two major components of JMS are topics and queues. A topic shares a copy of a JMS message with every consumer subscribed to it; if there are no subscribers, the messages are lost. A consumer can make a durable subscription, which causes messages to be persisted until the consumer is once again able to consume them. A queue is a way to scale through load balancing when dataflow is high; a system generally configures multiple threads to pull messages off a queue as they become available. Queues are also persistent: if there are no consumers, the queue stores messages until something subscribes to pull them off. With JMS, you need a software engineer to build a system to send the data, a MOM to manage the dataflow, and a system to receive the data. Because you are using a distributed architecture, though, the hardware needed is less intensive, depending on the volume you require.
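The topic and queue semantics described above can be illustrated with a small sketch. This is plain Python using in-memory lists, not the real JMS API; the class and method names are stand-ins chosen to mirror the JMS concepts:

```python
# Semantic sketch of JMS topics vs. queues -- illustration only,
# not the javax.jms API.

class Topic:
    """Every current subscriber gets its own copy of each message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def publish(self, message):
        # With no subscribers, the message is simply lost.
        for inbox in self.subscribers:
            inbox.append(message)

class Queue:
    """Messages persist until some consumer takes each one."""
    def __init__(self):
        self.messages = []

    def send(self, message):
        self.messages.append(message)  # stored even with no consumers

    def receive(self):
        # Each message goes to exactly one consumer.
        return self.messages.pop(0) if self.messages else None

topic = Topic()
a, b = topic.subscribe(), topic.subscribe()
topic.publish("price update")   # both subscribers get a copy
print(a, b)                     # ['price update'] ['price update']

queue = Queue()
queue.send("order-1")
queue.send("order-2")
print(queue.receive())          # order-1
```

A durable subscription would behave like the queue here: the broker persists the messages while the subscriber is offline instead of dropping them.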

Both software solutions are effective ways to share data between systems. The fact that almost anyone can install NIFI and immediately start configuring dataflows is quite impressive. Most people think about messaging as having to be delivered to a specific person or system. If you consider instead publishing a message for anybody to consume that might be interested, then JMS is a very intriguing approach. You do not need to worry about who consumes it, just that it is made available. Since NIFI can actually support publishing to and consuming from JMS, it seems like a tool that should be considered as more than just a messaging tool; it is more of an entire dataflow solution. It is a way to manage and manipulate messages, while also being able to track data, aid in documenting dataflows, and quickly diagnose bottlenecks in dataflows. 


April 9, 2013

Mongo Attack!

Recently, Interclypse, Inc., attended the MongoDC 2013 conference - an annual gathering of the open-source NoSQL community surrounding 10Gen's product, MongoDB. The conference was hosted at the very cool Newseum in downtown Washington, DC. What can I say about Mongo... what can't I say about Mongo!? If you are in the data management arena and you haven't played with MongoDB or any other NoSQL solution yet, what are you waiting for?
The Conference
First, about the conference: I attended only the general event (just to set the context). Obviously, there were a range of presentations, but these were the ones that I found to stand out:
1. Max Schireson (the CEO of 10Gen) gave an excellent discussion on aggregation and indexing. First of all, the CEO gave it! Outstanding! The CEO is a technical guy with plenty of database experience - both structured and unstructured. Max demonstrated new features of the MongoDB product that really make it a disruptive force in the data management/analysis market. Chief among these is the ability to aggregate documents given different criteria - much like constructing a view or a join. In addition to producing new aggregate documents at query time, you can chain this creation to other operations, such as sums and sorts. For example, let's assume the following JSON documents in the same collection:
{ "_id" : "1",
  "org" : "JUG",
  … }

{ "_id" : "2",
  "org" : "MUG",
  … }

{ "_id" : "3",
  "org" : "JUG",
  … }

{ "_id" : "4",
  "org" : "JUG",
  … }
The following query would produce an aggregate from this collection:
db.collection.aggregate([ { $match : { org : "JUG" } }, { $group : { _id : "$org", totalMembership : { $sum : "$membership" } } } ])

This is equivalent to the SQL statement:
SELECT org, SUM(membership) AS totalMembership
FROM collection
WHERE org = "JUG"
GROUP BY org;

{ "_id" : "JUG",
  "totalMembership" : 2221 }

...where the results are grouped and summed. As you can see, the aggregate is a reduced set, despite one of the documents having a slightly different structure. The pipeline of stages acts like a "|" (pipe), similar to that found in UNIX shells, allowing operations to be chained.
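The $group/$sum/$match behavior can be emulated in a few lines of Python. Note that the membership values below are invented for illustration (the sample documents above elide them), so the total here is not the 2221 from the conference example:

```python
# Emulating MongoDB's $match + $group + $sum stages in plain Python.
# The membership numbers are made up for illustration.
from collections import defaultdict

docs = [
    {"_id": "1", "org": "JUG", "membership": 100},
    {"_id": "2", "org": "MUG", "membership": 40},
    {"_id": "3", "org": "JUG", "membership": 75},
    {"_id": "4", "org": "JUG", "membership": 210},
]

# $group : { _id : "$org", totalMembership : { $sum : "$membership" } }
totals = defaultdict(int)
for doc in docs:
    totals[doc["org"]] += doc.get("membership", 0)  # tolerate missing fields

# Keep only the "JUG" group, as the $match stage does.
result = {"_id": "JUG", "totalMembership": totals["JUG"]}
print(result)  # {'_id': 'JUG', 'totalMembership': 385}
```

Each stage consumes the output of the previous one, which is exactly the pipe-like chaining the aggregation framework provides.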

Finally, Max covered index construction and optimization, how not to index everything, and how to profile your queries through the "explain" plan to understand which indices, if any, are being used during your queries and when.
2. Rob Moore's presentation on tricks to teach MongoDB showcased the potential depth of the product. Imagine MongoDB acting as a pub-sub message broker! Building upon his asynchronous (and super-fast) Mongo client, he was able to demonstrate callback functionality within Mongo so that modification events to a document or collection were passed back to subscribed clients, much as a JMS broker would do. Although this is not part of the supported product, it did entice us to dream of certain possibilities.
3. Auto-sharding and matters of scale. The sharding capability of MongoDB makes it an attractive and viable alternative to some big data solutions! Now, with the MongoDB 2.4 release, the pain of shard management has gotten easier with automated shard extension support. So, as my Mongo cluster grows and I need to add more members to my sharded collections, I "simply" start up a new MongoDB server instance and add it as a member. The distribution of key space across the new member is handled incrementally during normal rebalancing. Of course, if you didn't know, MongoDB also has a mapReduce implementation using JavaScript. Before you bash it, consider the performance of mapReduce across a sharded collection filtered by time where all the working documents are in memory - yep, that's how MongoDB rolls. MapReduce, given your use case and tuning, can plow through documents faster than you would expect. (Sorry for the caveats, but that's life, right?)
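
The map/emit and reduce pattern that mapReduce applies can be sketched outside of Mongo as well. The Python below mirrors the JavaScript emit/reduce flow; the documents and function bodies are invented for illustration:

```python
# Sketch of the map/emit -> group -> reduce pattern that MongoDB's
# mapReduce runs (in JavaScript) across a collection.
from collections import defaultdict

docs = [
    {"org": "JUG", "members": 3},
    {"org": "MUG", "members": 1},
    {"org": "JUG", "members": 2},
]

def map_fn(doc, emit):
    # In Mongo JS this would be: function() { emit(this.org, this.members); }
    emit(doc["org"], doc["members"])

def reduce_fn(key, values):
    # In Mongo JS: function(key, values) { return Array.sum(values); }
    return sum(values)

grouped = defaultdict(list)
for doc in docs:
    map_fn(doc, lambda k, v: grouped[k].append(v))

results = {key: reduce_fn(key, values) for key, values in grouped.items()}
print(results)  # {'JUG': 5, 'MUG': 1}
```

On a sharded collection, each shard runs the map and a partial reduce locally, and the partial results are then reduced again - which is why the reduce function must be associative.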
There were other topics discussed, of course - spatial querying, security, a Red Hat OpenShift highlight, and others. One new thing I did that day was connect to the mongodb IRC channel on FreeNode through a Chrome add-in and help someone with their document "_id" (the shard key) strategy. Yeah, old-school collaboration!


March 8, 2013

Network Cabling Tells All!

Movement to QSFP and XFP shows density in the datacenter is on the rise! 10/100 Gbps links are making inroads, particularly within the rack and across the datacenter. Long-range 40Gbps is 4 trunked 10GE links, over either OM4 fiber or extended range out to 3000 meters - that's 1.86 miles. QSFP (Quad Small Form-factor Pluggable) allows 2x 10GE fiber connections (fiber count of 4) in a single plug switch-side with a split tail host-side. That is 1 cable for 2x 10GE drops from host to switch. That just cut a typical cabling load in the rack in half. Choose QSFP in copper for even more affordable cabling! Did someone say 10Gbps over copper? Boy, the photons must be getting scared. For another reference, Google QSFP.

Now to the storage scene... Twinax is yet another copper alternative to more expensive fiber. Typical uses for Twinax are FCoE (Fibre Channel over Ethernet), where the runs are generally short (not exceeding 300m) and are therefore used mostly in a ToR (Top of Rack) configuration. Twinax, however, may be proprietary. Twinax cables come in both active and passive modes, depending on your network and attached storage solution. Here's a brief overview of some of the nuances around Twinax:


February 13, 2013

Technology Working Session -  Monday evenings 8pm-midnight

Interclypse opens our office every Monday evening from 8pm till midnight for information technology working sessions. Anyone is welcome to participate in these sessions, as they are an informal gathering of individuals working on various technology projects, ranging from open source projects to training modules. If you have an interest in a specific technology, then feel free to contact our organizer, Brian Walsh, to see if anyone else is working on a related topic or if we have some recommended training material. Otherwise, feel free to show up, introduce yourself, and share the enjoyment of technology with us.


July 10, 2012

Perspective on Technology

Yesterday, I got a new laptop - an Asus ZenBook - ultra light and fast (with a keyboard! Novel idea - the iPad was not very convenient for lots of typing and business stuff). As cool as a mini-laptop with an SSD drive and a boot time of 2 seconds is, it is still just a tool. This reminds me of a graduating student's thesis talk I heard this past May on how technology "moves us sideways." Advancements in technology do do things - they enlighten our interpretation of a future, and they empower potential optimizations in the work we do today. Advancements may also let us spin in circles and lead us astray - these are the parodies of the day. Nevertheless, to continue to advance is to continue to learn. What moves us sideways is advancement without a purpose or meaning. Although not altogether bad, advancements are not necessarily altogether good, either. What's the point of this? Well, take my laptop. What am I going to do with it to make the investment worthwhile? Why did I want a smaller device? At Interclypse, we want to make sure the customer is buying with intent and that they know, to a certain degree, what they can get out of their investment - and, to that end, that they are buying what they need for their business - not just the latest-and-greatest, en-vogue tech. Of course, there is sometimes the need to stay relevant, or "with the age." Blackberry?


June 30, 2012

On IT Governance

It seems that often "what works best" for one solution turns into "this works" for all solutions and then somehow into a mandate to "only use this" for all solutions.
On my previous program, I was able to use technology at will for the purpose of meeting the customer's needs. I wanted to keep development velocity high, however, so I did limit the tech stack in production. What resulted was a series of AOAs (Analyses of Alternatives) for hard problems, mitigating each problem through a down-selection from various solutions to a short list of acceptable alternatives. We did this repeatedly. Starting in 2008, we wound our way through Hadoop, Spring Integration, ServiceMix, Camel, ActiveMQ, 0MQ, Tibco EMS, RabbitMQ, F5 BigIP, Voldemort, eXtremeDB, BerkeleyDB, MySQL, Terracotta, MongoDB, Azkaban, CouchDB, Cassandra, GemFire, Solr, Scala, etc. Phew! For example, in the arena of data analytics, we presented map-reduce techniques on MongoDB and Hadoop using JAQL. A customer database evaluator worked on the team at that time. Now we hear that Mongo and MySQL, among other things, are on the list... Maybe we influenced the evaluation?
Generally speaking, we should "use the right tool for the right job." At the same time, I constrained certain technology choices - the use of Java, the use of Spring, assertion of certain conventions, how software was constructed, tested, etc. You could say, I asserted a certain level of IT governance.
As I mentioned at the top, we performed different AOAs. Some of the products chosen were commercial ones. The final decisions to use or not use certain products definitely have to do with money, but cost is not the only factor. At the end of the day, product selection is a mix of community support, O&M costs in the out-years, quality of the product, importance of the product to the company portfolio, and, in the case of FOSS or GOTS, what the followership and popularity trends appear to be. ACCUMULO is an example of an agency taking ownership of code, as it is an implementation modeled on the design of Google's BigTable. In this case, O&M costs are an internal investment projection - nothing is free.
In a funny way, the current licensing pain exists because IT governance is somewhat working - Project X needs a DB, and the customer used to mandate a commercial RDBMS solution. Well, now we have a need for "IT efficiency." IT efficiency drives firstly at the consolidation of hardware. Both the data cloud and the utility cloud (VMware and the like) start to make sense on a consolidated hardware platform, but what about the data?
The shift to Mongo, MySQL, Hadoop, and ACCUMULO still does not deal with IT efficiencies around data schema consolidation. I wonder how many duplicated solutions will exist out there on our consolidated data cloud? Perhaps the next great IT Governance adventure?


January 9, 2011

Red Hat Magic Sysrq Key Sequence (Stop-A for Red Hat Linux)

When a system is inoperable or hung and needs to be rebooted, you can use the Magic SysRq key sequence from the system console. The only downside is that it must be activated before it can be used. To activate SysRq, run the following command, and add kernel.sysrq = 1 to /etc/sysctl.conf so the setting persists across reboots.

# sysctl -w kernel.sysrq=1

Now you can force a reboot using Sysrq:

  • NOTE: The SysRq key is the Print Screen (PrtScr) key
  • Hit Alt-SysRq-s. This attempts to sync your mounted file systems so that changes to files get flushed to disk. You may hear or see disk activity. If you are looking at the console, the system should print the devices that were synced.
  • A few seconds later, hit Alt-SysRq-u. This attempts to remount all your mounted file systems read-only. You may hear or see disk activity. If you are looking at the console, the system should print the devices that were remounted.
  • A few seconds later, hit Alt-SysRq-b to reboot the system.


October 22, 2010

Are You Really Following an Agile Methodology?

I recently came across a project that stated they were following an agile methodology, specifically Scrum; however, they had not delivered any usable software since the project's inception (over 18 months earlier). In addition, they required a detailed specification prior to the start of development. The specification took a separate team over 6 months to produce and required several review meetings before approval was given to begin development. Are they really following Scrum? ABSOLUTELY NOT!


What is Scrum?

Scrum is the most popular agile methodology in use today. It was first introduced by Dr. Jeff Sutherland and Ken Schwaber in 1995 at OOPSLA. According to the Agile Alliance, Scrum is an agile, lightweight process that can be used to manage and control software and product development using iterative, incremental practices. Scrum, when used properly, significantly increases productivity and reduces time to market while facilitating adaptive, empirical systems development.


Scrum adheres to the values as defined in the Agile Manifesto: “Through this work we have come to value:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan”

Twelve principles underlie the Agile Manifesto, including:

  • Customer satisfaction by rapid delivery of useful software
  • Working software is delivered frequently (weeks rather than months)
  • Working software is the principal measure of progress
  • Even late changes in requirements are welcome
  • Close, daily cooperation between businesspeople and developers
  • Face-to-face conversation is the best form of communication (co-location)
  • Projects are built around motivated individuals, who should be trusted
  • Continuous attention to technical excellence and good design
  • Simplicity
  • Self-organizing teams
  • Regular adaptation to changing circumstances

Scrum practices:

  • Three basic roles: Product Owner, ScrumMaster, and Project Team
  • Product Backlog
  • Sprint Backlog
  • Sprint Planning Meeting
  • Daily Scrum Meeting
  • Thirty-day iterations, delivering increments of potentially shippable functionality at the end of each iteration
  • Retrospectives

Here are some simple questions to determine whether you are following Scrum; if you answer no to any of the below, you are not adhering to Scrum.

  • Do you have fixed/time boxed iterative development; typically 2-4 week sprints?
  • Do you have working software at the end of each iteration or sprint?
  • Do you have testing as part of the iteration or sprint?
  • Do you have an integrated team that consists of the people formulating the product, the people building the product, the people testing the product, and the people that will use the product?
  • Do you start development without requiring a complete specification up front?
  • Do you have daily meetings?
  • Do you have product backlog?
  • Do you have a product owner?


October 11, 2010

How to Flash the BIOS using a USB Stick

Flashing a computer's BIOS is a task most have little experience doing. Although the steps outlined here are quite simple, one must still be cautious so that your computer is not left in an unusable state. This article focuses on the basic steps: why it is necessary to flash the BIOS, precautions to take, and how to recover in case of a bad flash.

What is the BIOS and why do you need to flash it?

BIOS stands for Basic Input/Output System. It is the most basic component of your computer and determines what your computer can do without accessing any other files or programs from the hard drive. The BIOS contains all the information needed for your computer to complete its Power-On Self-Test (POST), which is required for the computer to boot. The BIOS also knows how to control your keyboard, communicate with your CPU, send/receive video signals to/from your monitor, and recognize your components (hard drives, optical drives, USB devices, serial ports).

Why should we flash the BIOS?

Flashing your BIOS to the latest release is worthwhile because it enhances your system's capabilities, helps it detect newer devices and components (newer hard drives, newer CPUs, etc.), and improves stability through patches introduced by the manufacturer.


  • Make sure you have a USB stick. Since the original method to flash the BIOS used a 1.44MB floppy, USB stick size is not important - they are all larger than 1.44MB
  • Download the BIOS version appropriate for your motherboard.


  • Download HP USB Disk Storage Utility used to format and make your USB stick bootable
  • Download the DOS bootable files
  • Install the HP USB Disk Storage Utility
  • Unzip the DOS bootable files to your C: drive
  • Insert your USB stick in your computer
  • Run the HP USB Disk Storage Utility software. NOTE: With Microsoft Vista you need to start the software as an administrator by right clicking on the icon and selecting "Run as Administrator".
  • Select your USB stick as the “Device”, FAT as the “File system”, BOOTDISK as the “Volume label”, select “Quick Format”, select “Create a DOS startup disk”, and enter C:\win98boot (or the path where you unzipped the DOS bootable files). Then select the “Start” button. This process only takes a few seconds.

  • (Optional) Check the USB stick; there should be 3 files installed on it: command.com, io.sys, and msdos.sys. If you do not see them, you may have to unhide them by selecting Tools->Folder Options->View and unselecting ”Hide protected operating system files (Recommended)”.

  • Next, copy the exploded BIOS upgrade files you already downloaded to the USB stick
  • You are now ready to flash
  • Reboot your system and enter BIOS setup (usually by pressing the F2 key during initial boot)
  • Move to the Boot tab
  • Change your USB stick to be first in the boot order
  • Save and exit the BIOS setup
  • A reboot will occur, this time booting from your USB stick
  • Once booted, you will be left at a DOS prompt
  • Change directory to where the updated BIOS files were saved
  • Execute the program (typically you just type "flash" at the prompt) as indicated in the BIOS flash update documentation
  • Remove the USB stick
  • Reboot the system


  • Flashing the BIOS can be dangerous if the flashing process isn't finished successfully or if the newly flashed image doesn't match your system
  • Never flash if there is a possibility of losing power
  • Before proceeding to flash, don't forget to go into your BIOS and write down all of your settings. This is crucial because the "default" settings may not be the best option for your system, especially if you've tweaked the BIOS
  • Do NOT reboot and/or shut down your system while flashing

