Saturday, May 23, 2015

A Java Developer's Perspective on the Power and Danger of JavaScript's Object Prototype

In the Anti-Patterns section of the book Learning JavaScript Design Patterns, author Addy Osmani calls "Modifying the Object class prototype" a "particularly bad anti-pattern." One of the interesting (and scary) aspects of this is that a developer can change the behavior of all JavaScript objects with one definition. This is analogous to what would be possible in Java if a Java developer were allowed to change Java's Object class.

I mentioned this risky feature briefly in my post JavaScript Objects from a Java Developer Perspective. Imagine the havoc that could be wreaked if one could change, for example, the default implementation of Java's Object.equals(Object). In the blog post I just mentioned, I demonstrated overriding a particular JavaScript object's toString() implementation. I mentioned, but did not demonstrate, overriding toString() for all JavaScript objects via Object.prototype. In this post, I do demonstrate this. It is the equivalent of what a Java developer could do if allowed to change java.lang.Object's toString() implementation directly (Java developers can only extend Object and override its methods on a per-class basis).

It's all too easy to change the default behavior of all JavaScript objects. The next code listing shows how easy it is to change the default JavaScript toString() behavior from providing the string "[object Object]" to providing the string "I'm a JavaScript object!"

Overriding All JavaScript Objects' Default toString() Implementations
Object.prototype.toString = function objectToString()
{
   return "I'm a JavaScript object!";
}

The four simple lines in the above code listing (which could easily have been written on a single line) change the default behavior of toString() for all JavaScript objects. I can still override this default implementation of toString() on a per-constructor-function basis (there are no classes in JavaScript as of today). This was demonstrated in my previous post and is reproduced here for convenience:

Overriding toString() Implementation for Person Object Only
function Person(lastName, firstName)
{
   this.firstName = firstName;
   this.lastName = lastName;
}

Person.prototype.toString = function personToString()
{
   return this.firstName + ' ' + this.lastName;
}

The code listing above defines a JavaScript Person constructor function and overrides toString() on Person.prototype, so the override applies to every object created with new Person(...). The next code listing exercises both toString() implementations so that the overridden default implementation and the customized Person implementation are both rendered. The output of running this demonstration code is shown after the code.

Demonstrating Overridden Default toString() and Customized Person toString()
function demonstrateObjectPrototype()
{
   var indy = new Person('Jones', 'Henry');
   console.log("Indiana Jones's real name is " + indy);
   
   var solo = {};
   solo.lastName = 'Solo';
   solo.firstName = 'Han';
   console.log("Chewbacca's buddy is " + solo);
}

Calling demonstrateObjectPrototype() writes "Indiana Jones's real name is Henry Jones" and "Chewbacca's buddy is I'm a JavaScript object!" to the console. This output shows that we have changed the default toString() from "[object Object]" to "I'm a JavaScript object!" and that objects created by a constructor function can still override that default with their own customized behavior.

It is easy to see how this ability to manipulate the default behavior of all JavaScript objects can be both alluring and frightening. It wouldn't be a repeated "pattern" (even an anti-pattern) if it didn't have some appeal. Java's default Object.toString() implementation, which returns the class name followed by "@" and the object's hash code in hexadecimal, rarely seems helpful other than for differentiating an object from other objects of the same type. If one could easily change Java's Object.toString(), it might be tempting at first to change it to use reflection to iterate over all of a given object's data members. However, there would also be significant risks and questions:

  • How would one prevent the toString() that used reflection from showing fields' values that should not be shown for security or other reasons?
  • Would class-level (static) data members be shown in addition to instance-level members?
  • Should all objects pay the reflection performance cost, especially when these objects might include collections of other objects that might lead to reflection on deep collections?
  • What would the preferred output format be?
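To make those questions concrete, here is a minimal sketch of what a reflection-based toString() might look like. Because Java does not allow replacing java.lang.Object's implementation, it is written as a static helper that callers must invoke explicitly. ReflectiveToString and the Account sample class are hypothetical names, not part of any library. The sketch simply assumes answers to some of the questions above: static fields are skipped, transient fields are treated as "not to be shown," and nested objects are not walked recursively.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

/**
 * Hypothetical helper sketching a reflection-based "default" toString().
 * Java does not let this replace Object.toString(); callers must invoke it.
 */
final class ReflectiveToString
{
   private ReflectiveToString()
   {
   }

   static String reflectiveToString(final Object object)
   {
      final Field[] fields = object.getClass().getDeclaredFields();
      // getDeclaredFields() guarantees no particular order, so sort by name
      // to keep the output deterministic. Inherited fields are not included.
      Arrays.sort(fields, Comparator.comparing(Field::getName));
      final StringJoiner joiner =
         new StringJoiner(", ", object.getClass().getSimpleName() + "{", "}");
      for (final Field field : fields)
      {
         final int modifiers = field.getModifiers();
         // Assumed policy: skip class-level (static) fields and treat
         // transient fields as values that should not be shown.
         if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers))
         {
            continue;
         }
         field.setAccessible(true);
         try
         {
            // Each value is rendered with its own toString(); nested objects
            // and collections are not reflected upon recursively.
            joiner.add(field.getName() + "=" + field.get(object));
         }
         catch (final IllegalAccessException exception)
         {
            joiner.add(field.getName() + "=<inaccessible>");
         }
      }
      return joiner.toString();
   }
}

/** Hypothetical sample class used to exercise the helper. */
final class Account
{
   static int openedAccounts = 0;       // class-level: skipped
   private final String owner;
   private final long balanceInCents;
   private final transient String pin;  // "sensitive": skipped

   Account(final String owner, final long balanceInCents, final String pin)
   {
      this.owner = owner;
      this.balanceInCents = balanceInCents;
      this.pin = pin;
      openedAccounts++;
   }
}
```

With this sketch, ReflectiveToString.reflectiveToString(new Account("Henry Jones", 1200L, "1234")) returns "Account{balanceInCents=1200, owner=Henry Jones}". Even this small version has to pick an output format, decide what counts as sensitive, and pay a reflection cost on every call, which is exactly why baking such behavior into every object would be risky.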

These questions and concerns about overriding the default Object.toString() behavior in Java are only a subset of those one might have, and it could be argued that changing toString()'s default behavior is less risky than changing the default behavior of other Object methods such as equals(Object). One could still override a changed default Object implementation in Java, but it would need to be overridden in every extended class, either directly or through an ancestor class. Developers new to the code base might assume the JDK's default Object behaviors and get a nasty surprise when they find out that the code base has changed them.

In this post, I have demonstrated how easy it is to override JavaScript's default Object behaviors via Object.prototype and have also tried to show why this should rarely, if ever, be done. I have intentionally approached this from a Java developer's perspective in an effort to articulate more of the differences between the Java and JavaScript object models.

Saturday, May 9, 2015

JavaScript Objects from a Java Developer Perspective

One of the challenges for Java developers learning and applying JavaScript is the very different interpretations the two languages have of "objects." I find it much easier to switch context between Java and languages such as Groovy, Ruby, Python, C#, and C++ than between Java and JavaScript. Those other languages' class-based approaches to object orientation are similar enough that their differences are primarily syntactic, and syntax is relatively easy to learn and switch between. JavaScript's syntax is in many ways more like Java's than some of these languages' syntax, but its prototype-based approach to object orientation is very different. In this post, I look at JavaScript "objects" from a Java developer perspective and offer some tips and tactics Java developers can use to better bridge the concepts of object-oriented Java and object-oriented JavaScript.

Whereas Java and several other programming languages have a wide and rich range of datatypes and collection types, JavaScript has only a small number of primitive datatypes (Boolean, Number, String, Null, and Undefined), a single collection-like data structure (the array), and JavaScript objects. This makes these basic datatypes and the array easy to learn, but it means that additional work (or a framework) is required to implement specific functionality that other languages' types and collections might provide.

Constructor Function Approach for Instantiating JavaScript Objects

There are multiple approaches for instantiating JavaScript objects. As a Java developer, I prefer the "constructor function" approach. One of its most significant advantages is that the constructor function is named, so multiple pieces of code can use it to create their own objects. Beyond that, this approach appeals to me as a Java developer because it is the most like Java (and other class-based object-oriented languages).

The next two code listings contrast two common approaches for instantiating JavaScript objects (object initializer and constructor function).

JavaScript Object Instantiation via Constructor Function
// Person objects are instantiated with a 'constructor function' approach.
// With this approach, more than one instance of 'Person' can easily be
// instantiated as needed via the "new" keyword. It is convention to use
// uppercase for the first letter of the function to indicate that it's a
// 'constructor function'.
function Person(lastName, firstName)
{
   this.firstName = firstName;
   this.lastName = lastName;
}

var person = new Person('Clouseau', 'Jacques');
console.log('The person is ' + person);

Constructing a JavaScript object with a "constructor function" allows it to be referenced by name, allows the familiar "new" keyword to be used, and, when the function's name begins with a capital letter, looks like a convention that would fit in Java.

JavaScript Object Instantiation via Object Initializer
// The 'object initializer' approach is used here to define an "individual"
// object. This is a one-time approach because it's not named and is less
// like approaches in class-based object-oriented languages.
var individual = {};
individual.lastName = 'Panther';
individual.firstName = 'Pink';
console.log('The animated character is ' + individual);

The object initializer approach is single-use because there is no named function that can be referenced for a separate instantiation. Its syntax is also quite different from what we're used to in Java.

When the above code listings are run in Chrome's JavaScript Console, they produce the following output:

The person is [object Object]
The animated character is [object Object]

Adding toString() to JavaScript Objects

In the output above, the person and the animated character were both displayed as "[object Object]". Much as every Java class ultimately extends java.lang.Object, nearly all JavaScript objects inherit properties from Object.prototype. In this case, Object.prototype.toString() provides a default string representation for these JavaScript objects. As the output demonstrates, it's only minimally valuable (similar to how the default toString() implementation Java objects inherit from java.lang.Object is minimally valuable).

Just as one can override toString() in Java classes so that objects provide useful data about themselves, objects instantiated with constructor functions can override the toString() implementation they inherit from Object.prototype. The next code listing adapts the constructor function example above and adds code that overrides toString() (the Person.prototype.toString assignment).

Overriding JavaScript Object's toString() Implementation
// Person objects are instantiated with a 'constructor function' approach.
// With this approach, more than one instance of 'Person' can easily be
// instantiated as needed via the "new" keyword. It is convention to use
// uppercase for the first letter of the function to indicate that it's a
// 'constructor function'.
function Person(lastName, firstName)
{
   this.firstName = firstName;
   this.lastName = lastName;
}

Person.prototype.toString = function personToString()
{
   return this.firstName + ' ' + this.lastName;
}

var person = new Person('Clouseau', 'Jacques');
console.log('The person is ' + person);

With this overridden toString() in place, the Person instance now produces the output "The person is Jacques Clouseau".
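For comparison, here is a sketch of the Java class a Java developer would probably write to mirror the JavaScript Person constructor function and its toString() override. It illustrates the analogy only: in Java, the override is resolved through the class hierarchy, whereas the JavaScript version is found by walking the prototype chain.

```java
/** Java counterpart (for comparison only) of the JavaScript Person listing. */
final class Person
{
   private final String lastName;
   private final String firstName;

   // Plays the role of the JavaScript constructor function Person(lastName, firstName).
   Person(final String lastName, final String firstName)
   {
      this.lastName = lastName;
      this.firstName = firstName;
   }

   // Overrides the toString() inherited from java.lang.Object, much as the
   // JavaScript listing assigns Person.prototype.toString.
   @Override
   public String toString()
   {
      return firstName + ' ' + lastName;
   }

   public static void main(final String[] arguments)
   {
      final Person person = new Person("Clouseau", "Jacques");
      System.out.println("The person is " + person);
   }
}
```

As in the JavaScript version, string concatenation implicitly invokes the overridden toString(), so main prints "The person is Jacques Clouseau".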

In the example just covered, I overrode toString() specifically on Person's prototype. This is a nice localized use of the ability to modify an object's prototype. A broader (and potentially much more dangerous) capability is the ability to modify Object.prototype and thus affect the behavior of all JavaScript objects. This is analogous to the risks and rewards one would have in Java if one could override java.lang.Object's behaviors once for all Java objects. In other words, if you imagine java.lang.Object's behaviors being changeable at that level, that's what JavaScript's Object.prototype allows.

Avoiding JavaScript's Ubiquitous Global Scope

JavaScript makes it far too easy to give variables global scope. Effective use of constructs such as the var and this keywords can help. Variables declared with var inside a JavaScript function are limited to that function's scope. In contrast, assigning to a variable inside a function without declaring it with var creates (in non-strict mode) an implicit global variable, so code anywhere can change that "state." This is an idea that probably makes most C++ and Java developers cringe. (In strict mode, such an assignment throws a ReferenceError instead.) The this keyword is surprisingly difficult in JavaScript because its value varies greatly depending on how a function is used and called and on whether strict mode is in effect. In other words, JavaScript's this is far more difficult than Java's (which is simply a reference to the current instance). However, my usage of this above (in the constructor functions) is not surprising and is described in the Mozilla Developer Network JavaScript Reference: "When a function is used as a constructor (with the new keyword), its this is bound to the new object being constructed."

JavaScript Objects are Like Java Maps

Although most browsers now support a first-class Map and the forthcoming ECMAScript specification defines one, previous versions of standard JavaScript have relied on arrays for collection needs. However, most introductions to JavaScript objects describe them as collections of name/value pairs that are very similar to maps. Indeed, some of the references that introduce JavaScript to Java developers have used Java Maps with String keys and Object values to illustrate functionality provided by JavaScript objects.
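A small sketch of that analogy follows, with comments showing roughly equivalent JavaScript. ObjectAsMap is a hypothetical name, and LinkedHashMap is chosen only so that iteration order matches insertion order. The analogy covers name/value storage; it does not model prototypes or inherited behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: a String-keyed Map standing in for a JavaScript object. */
final class ObjectAsMap
{
   static Map<String, Object> createSolo()
   {
      // Roughly: var solo = {}; solo.lastName = 'Solo'; solo.firstName = 'Han';
      final Map<String, Object> solo = new LinkedHashMap<>();
      solo.put("lastName", "Solo");
      solo.put("firstName", "Han");
      return solo;
   }

   public static void main(final String[] arguments)
   {
      final Map<String, Object> solo = createSolo();
      // Roughly: solo.firstName + ' ' + solo.lastName
      System.out.println(solo.get("firstName") + " " + solo.get("lastName"));
      // Roughly: solo.ship = 'Millennium Falcon'; (properties can be added at any time)
      solo.put("ship", "Millennium Falcon");
      // Roughly: 'ship' in solo
      System.out.println(solo.containsKey("ship"));
      // Roughly: delete solo.ship;
      solo.remove("ship");
      System.out.println(solo);
   }
}
```

Running main prints "Han Solo", then "true", then "{lastName=Solo, firstName=Han}".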

Conclusion

Despite seemingly common syntax and even sharing four letters in their names, Java and JavaScript are very different in many ways. In particular, while both can be said to be object-oriented or object-based, their very different ways of implementing objects (class-based versus prototype-based) have significant impacts on the respective languages. Understanding some of the basic similarities and differences between these two languages' objects can make context switching between the two languages easier. It's also worth noting that ECMAScript 6 (Harmony) is planned to provide some semblance of support for semantics familiar to those who have used other object-oriented languages. For example, there is a proposal for classes in ECMAScript 6.


Thursday, April 30, 2015

Software Development Lessons Learned from Consumer Experience

Because we software developers are also inevitably consumers of others' software applications, we are undoubtedly influenced in the creation of our own software by the software we use. For example, our opinions of what makes an effective interface for users are influenced by our experiences "on the other side" using someone else's software interface. This is particularly true for web and mobile development, as web applications and mobile applications have become pervasive in our lives.

We are prone to adopt idioms and styles that appeal to us and shun idioms and styles that we don't like. The degree of this influence may vary widely based on the type of software we are developing versus the type of software we use as a consumer (the more alike they are, the stronger the influence). There are times when the influence may be more subconscious and other times when the influence of others' software may be obvious. In this post, I describe a recent experience with an online site that reminded me of some important software development practices to keep in mind when creating software (particularly web applications).

I was recently creating a photobook on a popular photography-related web site. I began by uploading numerous photographs for the book. The web application reported that all but one of the photographs uploaded successfully. It reported that it failed to upload one photograph and recommended that I verify my Internet connection. After verifying my Internet connection, I tried uploading the single photograph several more times without any success. I tried changing the name of the file and changing its file location, but still had no success. I was able to upload other photographs after those failures, but still could not upload that one particular photograph.

I decided to work on the photobook with the photographs that did upload and spent a couple of hours arranging the photographs exactly the way I wanted them. When I tried to save my photobook project, however, the application would not allow me to save because it said it could not save until it finished uploading the photograph that it kept failing to upload. I clicked on the link to save the project several times without success. I could not remove the reference to the offending photograph, and even the Save-As option did not allow me to save because the application thought the photograph was still uploading. Ultimately, I gave up and closed the browser, knowing that my two hours' worth of work was lost.

When I looked carefully at the characteristics of the problematic photograph, I noticed that it was exactly 1 MB (1024 KB) in size. I used some image manipulation software to make a minor change to it that affected its size (made it a bit larger) and it uploaded without incident. I had to start over, but at that point I was able to arrange (again) the photographs where I wanted them and able to save the project as desired.

As a consumer of this software, I learned a few lessons. One lesson is the need to explicitly save often when using that application, because it does not have an implicit save feature and because the act of saving a project seems to be the only way to find out that the software is in an inconsistent state in which no further work on the project can be saved. I also learned to avoid the rare case of attempting to upload an image that is exactly 1 MB to that application.

I was reminded of even more important lessons as a software developer. First, I was reminded of the importance of unit testing, especially boundary conditions. I speculate that the code used by this application looked something like this pseudo code:

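// ONE_MB = 1024 * 1024 = 1,048,576 bytes (1024 KB)
if (imageSize < ONE_MB)
{
   // upload as-is
}
else if (imageSize > ONE_MB)
{
   // compress and then upload
}
// BUG: an image of exactly ONE_MB bytes matches neither branch,
// so it is never uploaded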

In my speculative pseudo code shown above, an image that is exactly at the 1 MB threshold matches neither branch and so is never explicitly uploaded. It's an easy error to make, especially when in a hurry and when code review is skipped or rushed. Effective unit tests are perfect for driving out this type of bug, and unit tests of boundary conditions like this exact 1 MB threshold are easy to implement. Often, just writing the unit test for this boundary condition will cause the developer to realize the error of his or her ways.
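Here is a minimal sketch of a corrected check and the boundary tests that would have exposed the bug. Everything in it is hypothetical: the class and method names, the assumption that 1 MB means 1,048,576 bytes, and the assumption that an image of exactly 1 MB should be compressed. The tests are plain checks rather than a particular unit-testing framework so that the sketch stays self-contained.

```java
/** Hypothetical corrected version of the speculative upload check. */
final class UploadDecision
{
   /** Assumes 1 MB = 1024 KB = 1,048,576 bytes. */
   static final long ONE_MB = 1024L * 1024L;

   enum Action { UPLOAD_AS_IS, COMPRESS_THEN_UPLOAD }

   static Action decide(final long imageSize)
   {
      // A single two-way decision leaves no gap: every size, including
      // exactly ONE_MB, maps to exactly one action.
      return imageSize < ONE_MB ? Action.UPLOAD_AS_IS : Action.COMPRESS_THEN_UPLOAD;
   }

   public static void main(final String[] arguments)
   {
      // Boundary-condition checks: just below, exactly at, and just above.
      check(decide(ONE_MB - 1) == Action.UPLOAD_AS_IS, "one byte under 1 MB");
      check(decide(ONE_MB) == Action.COMPRESS_THEN_UPLOAD, "exactly 1 MB");
      check(decide(ONE_MB + 1) == Action.COMPRESS_THEN_UPLOAD, "one byte over 1 MB");
      System.out.println("All boundary checks passed");
   }

   private static void check(final boolean condition, final String description)
   {
      if (!condition)
      {
         throw new AssertionError("Boundary check failed: " + description);
      }
   }
}
```

Returning an enum from a single decision point, instead of an if/else-if chain, makes it structurally impossible for a size to fall through both branches.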

Being an irritated and frustrated consumer of this software also reminded me of the importance of planning for "unhappy paths" in the software I develop. Using this online photobook-creating software would have been less frustrating in this case if any one of several options had been implemented. Had the application supported an auto-save feature that reported when it couldn't save, I'd have known that what I was working on wasn't savable. Blogger, which I use for this blog, has such a feature, and when it reports to me that it cannot save, I know to stop adding new content to my post until it can save and I have a chance to copy and paste what I have typed into a file on my local hard drive.

Another option that could have saved me significant frustration would have been a Save-As feature that allowed me to save my project as a different project. I can understand the software being written to not allow me to save while it thinks it's in an inconsistent state (it thinks it's still uploading a photograph), but it should still be able to save the work as a different project. I have seen this allowed by several desktop applications that consider the currently loaded document corrupt or inconsistent but still let me save that document as a separate document (without overwriting the previous document).

My frustration with the online photobook creation software reminded me of the importance of keeping the users' experience in mind when writing software. We can write all of the clean, readable, and maintainable software we want, but if it provides a poor user experience, that effort is for naught. This experience also was a good reminder of the importance of thorough testing, especially of "unhappy paths" and boundary conditions.

Packt Publishing's Free Learning Offer

In February of this year, Packt Publishing offered "a free eBook every day" for 18 days. The offer is back again at https://www.packtpub.com/packt/offers/free-learning with the title Getting Started with C++ Audio Programming for Game Development being offered as I write this post (Kinect in Motion – Audio and Visual Tracking by Example was available free yesterday, the first day of this new offer).

The e-mail message I received from Packt about this new offer states, "It's back! And this time for good. Following on from the huge success of our Free Learning campaign, we've decided to open up our free library to you permanently, with better titles than ever." Unlike the offer in February that lasted for 18 days, it sounds like this one may be available indefinitely. The e-mail message further describes the offer, "Each eBook will only be free for 24 hours, so make sure you come back every day to grab your Free Learning fix!"

You need to create a Packt Publishing account if you don't already have one so that you can log in to claim each day's free book. I have had an account for some time, but the most challenging part for me in the previous campaign was remembering to go to the site each day to see which free book was being offered.

Saturday, April 18, 2015

The JDK 8 SummaryStatistics Classes

Three of the new classes introduced in JDK 8 are DoubleSummaryStatistics, IntSummaryStatistics, and LongSummaryStatistics of the java.util package. These classes make quick and easy work of calculating total number of elements, minimum value of elements, maximum value of elements, average value of elements, and the sum of elements in a collection of doubles, integers, or longs. Each class's class-level Javadoc documentation begins with the same single sentence that succinctly articulates this, describing each as "A state object for collecting statistics such as count, min, max, sum, and average."

The class-level Javadoc for each of these three classes also states, "This class is designed to work with (though does not require) streams." The most obvious reason for including these three SummaryStatistics classes is for them to be used with streams, which were also introduced in JDK 8.

Indeed, each of the three classes' class-level Javadoc comments also provides an example of using that class with a stream of the corresponding primitive type. These examples invoke the respective stream's three-argument collect method, a mutable reduction terminal operation (for DoubleStream, for example, the signature is collect(Supplier, ObjDoubleConsumer, BiConsumer)). They pass each SummaryStatistics class's constructor (new), accept, and combine methods as method references for that collect method's "supplier", "accumulator", and "combiner" arguments respectively.
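To make the three roles concrete, the following sketch (CollectRoles and its methods are hypothetical names) shows the Javadoc-style collect call next to a hand-written version that creates two partial results, fills each one with the accumulator, and merges them with the combiner. The hand-written version is only a conceptual illustration of what the roles mean, not how the JDK actually splits or schedules stream work.

```java
import java.util.DoubleSummaryStatistics;
import java.util.function.BiConsumer;
import java.util.function.ObjDoubleConsumer;
import java.util.function.Supplier;
import java.util.stream.DoubleStream;

final class CollectRoles
{
   /** Javadoc-style usage: DoubleStream.collect drives all three roles. */
   static DoubleSummaryStatistics viaCollect(final double[] values)
   {
      return DoubleStream.of(values).collect(
         DoubleSummaryStatistics::new,
         DoubleSummaryStatistics::accept,
         DoubleSummaryStatistics::combine);
   }

   /** Conceptual illustration of the same roles, written out by hand. */
   static DoubleSummaryStatistics byHand(final double[] values)
   {
      final Supplier<DoubleSummaryStatistics> supplier = DoubleSummaryStatistics::new;
      final ObjDoubleConsumer<DoubleSummaryStatistics> accumulator =
         DoubleSummaryStatistics::accept;
      final BiConsumer<DoubleSummaryStatistics, DoubleSummaryStatistics> combiner =
         DoubleSummaryStatistics::combine;

      // Two partial results, as a parallel stream might produce.
      final int middle = values.length / 2;
      final DoubleSummaryStatistics left = supplier.get();
      for (int i = 0; i < middle; i++)
      {
         accumulator.accept(left, values[i]);
      }
      final DoubleSummaryStatistics right = supplier.get();
      for (int i = middle; i < values.length; i++)
      {
         accumulator.accept(right, values[i]);
      }
      combiner.accept(left, right);  // folds right's count/sum/min/max into left
      return left;
   }
}
```

For the values 5.0, 10.0, 15.0, and 20.0, both methods produce a count of 4, a sum of 50.0, a minimum of 5.0, a maximum of 20.0, and an average of 12.5.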

The rest of this post demonstrates use of IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics. Several of these examples will reference a map of The X-Files television series's seasons to the Nielsen rating for that season's premiere. This is shown in the next code listing.

Declaring and Initializing xFilesSeasonPremierRatings
/**
 * Maps the number of each X-Files season to the Nielsen rating
 * (millions of viewers) for the premiere episode of that season.
 */
private final static Map<Integer, Double> xFilesSeasonPremierRatings;

static
{
   final Map<Integer, Double> temporary = new HashMap<>();
   temporary.put(1, 12.0);
   temporary.put(2, 16.1);
   temporary.put(3, 19.94);
   temporary.put(4, 21.11);
   temporary.put(5, 27.34);
   temporary.put(6, 20.24);
   temporary.put(7, 17.82);
   temporary.put(8, 15.87);
   temporary.put(9, 10.6);
   xFilesSeasonPremierRatings = Collections.unmodifiableMap(temporary);
}

The next code listing uses the map created in the previous code listing, demonstrates applying DoubleSummaryStatistics to a stream of the "values" portion of the map, and is very similar to the examples provided in the Javadoc for the three SummaryStatistics classes. The DoubleSummaryStatistics, IntSummaryStatistics, and LongSummaryStatistics classes have essentially the same methods and APIs (the main difference being the supported datatypes). Therefore, even though this and many other examples in this post specifically use DoubleSummaryStatistics (because the X-Files Nielsen ratings are doubles), the principles apply to the other two (integral) SummaryStatistics classes.

Using DoubleSummaryStatistics with a Collection-based Stream
/**
 * Demonstrate use of DoubleSummaryStatistics collected from a
 * Collection Stream via use of DoubleSummaryStatistics method
 * references "new", "accept", and "combine".
 */
private static void demonstrateDoubleSummaryStatisticsOnCollectionStream()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      xFilesSeasonPremierRatings.values().stream().collect(
         DoubleSummaryStatistics::new,
         DoubleSummaryStatistics::accept,
         DoubleSummaryStatistics::combine);
   out.println("X-Files Season Premieres: " + doubleSummaryStatistics);
}

The output from running the above demonstration is shown next:

X-Files Season Premieres: DoubleSummaryStatistics{count=9, sum=161.020000, min=10.600000, average=17.891111, max=27.340000}
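The three-argument collect call is not the only route to the same result. The following sketch (SummarizingAlternatives and its method names are hypothetical) shows two other standard JDK 8 options for a collection of Double values: the Collectors.summarizingDouble collector and converting to a DoubleStream with mapToDouble and then calling summaryStatistics().

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.DoubleSummaryStatistics;
import java.util.stream.Collectors;

final class SummarizingAlternatives
{
   /** Collector-based alternative to the three-argument collect. */
   static DoubleSummaryStatistics viaCollector(final Collection<Double> ratings)
   {
      return ratings.stream().collect(Collectors.summarizingDouble(Double::doubleValue));
   }

   /** Primitive-stream alternative: unbox once, then ask the DoubleStream. */
   static DoubleSummaryStatistics viaDoubleStream(final Collection<Double> ratings)
   {
      return ratings.stream().mapToDouble(Double::doubleValue).summaryStatistics();
   }

   public static void main(final String[] arguments)
   {
      // Same premiere ratings as xFilesSeasonPremierRatings.values().
      final Collection<Double> ratings =
         Arrays.asList(12.0, 16.1, 19.94, 21.11, 27.34, 20.24, 17.82, 15.87, 10.6);
      System.out.println(viaCollector(ratings));
      System.out.println(viaDoubleStream(ratings));
   }
}
```

Both approaches avoid spelling out the supplier, accumulator, and combiner. Collectors.summarizingDouble is convenient when the statistics are one part of a larger collector pipeline (for example, as a downstream collector of groupingBy).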

The previous example applied a SummaryStatistics class to a stream based directly on a collection (the "values" portion of a Map). The next code listing demonstrates a similar example, but it uses an IntSummaryStatistics and a stream's intermediate map operation to specify which Function to invoke on the collection's objects to populate the SummaryStatistics object. In this case, the collection being acted upon is a Set<Movie> as returned by the Java8StreamsMoviesDemo.getMoviesSample() method and spelled out in my blog post Stream-Powered Collections Functionality in JDK 8.

Using IntSummaryStatistics with Stream's map(Function)
/**
 * Demonstrate collecting IntSummaryStatistics via mapping of
 * certain method calls on objects within a collection and using
 * lambda expressions (method references in particular).
 */
private static void demonstrateIntSummaryStatisticsWithMethodReference()
{
   final Set<Movie> movies = Java8StreamsMoviesDemo.getMoviesSample();
   IntSummaryStatistics intSummaryStatistics =
      movies.stream().map(Movie::getImdbTopRating).collect(
         IntSummaryStatistics::new, IntSummaryStatistics::accept, IntSummaryStatistics::combine);
   out.println("IntSummaryStatistics on IMDB Top Rated Movies: " + intSummaryStatistics);
}

When the demonstration above is executed, its output looks like this:

IntSummaryStatistics on IMDB Top Rated Movies: IntSummaryStatistics{count=5, sum=106, min=1, average=21.200000, max=49}
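The map(Movie::getImdbTopRating) call above produces a Stream<Integer>, so every rating is boxed before IntSummaryStatistics::accept unboxes it again. A sketch using mapToInt avoids that round trip. Because the original Movie class lives in another post, the Movie class below is a hypothetical stand-in with only the accessor this example needs, and its sample titles and ratings are made up for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the Movie class from the earlier post. */
final class Movie
{
   private final String title;
   private final int imdbTopRating;

   Movie(final String title, final int imdbTopRating)
   {
      this.title = title;
      this.imdbTopRating = imdbTopRating;
   }

   String getTitle()
   {
      return title;
   }

   int getImdbTopRating()
   {
      return imdbTopRating;
   }
}

final class MovieRatingStatistics
{
   /** mapToInt yields an IntStream, so no Integer boxing is involved. */
   static IntSummaryStatistics viaMapToInt(final List<Movie> movies)
   {
      return movies.stream().mapToInt(Movie::getImdbTopRating).summaryStatistics();
   }

   /** Collector alternative producing the same statistics. */
   static IntSummaryStatistics viaCollector(final List<Movie> movies)
   {
      return movies.stream().collect(Collectors.summarizingInt(Movie::getImdbTopRating));
   }

   public static void main(final String[] arguments)
   {
      final List<Movie> movies = Arrays.asList(
         new Movie("Hypothetical Movie A", 1),
         new Movie("Hypothetical Movie B", 12),
         new Movie("Hypothetical Movie C", 49));
      System.out.println(viaMapToInt(movies));
      System.out.println(viaCollector(movies));
   }
}
```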

The examples so far have demonstrated using the SummaryStatistics classes in their most common use case (in conjunction with data from streams based on existing collections). The next example demonstrates how a DoubleStream can be instantiated from scratch via use of DoubleStream.Builder and then the DoubleStream's summaryStatistics() method can be called to get an instance of DoubleSummaryStatistics.

Obtaining Instance of DoubleSummaryStatistics from DoubleStream
/**
 * Uses DoubleStream.builder to build an arbitrary DoubleStream.
 *
 * @return DoubleStream constructed with hard-coded doubles using
 *    a DoubleStream.builder.
 */
private static DoubleStream createSampleOfArbitraryDoubles()
{
   return DoubleStream.builder().add(12.4).add(13.6).add(9.7).add(24.5).add(10.2).add(3.0).build();
}

/**
 * Demonstrate use of an instance of DoubleSummaryStatistics
 * provided by DoubleStream.summaryStatistics().
 */
private static void demonstrateDoubleSummaryStatisticsOnDoubleStream()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      createSampleOfArbitraryDoubles().summaryStatistics();
   out.println("'Arbitrary' Double Statistics: " + doubleSummaryStatistics);
}

The just-listed code produces this output:

'Arbitrary' Double Statistics: DoubleSummaryStatistics{count=6, sum=73.400000, min=3.000000, average=12.233333, max=24.500000}

Of course, similarly to the example just shown, IntStream and IntStream.Builder can provide an instance of IntSummaryStatistics and LongStream and LongStream.Builder can provide an instance of LongSummaryStatistics.
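As a brief sketch of that (PrimitiveStreamStatistics and the sample values are hypothetical), IntStream.builder() is used exactly like DoubleStream.builder() above, and LongStream.of(...) is a shorter alternative to a builder when the values are known up front.

```java
import java.util.IntSummaryStatistics;
import java.util.LongSummaryStatistics;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

final class PrimitiveStreamStatistics
{
   /** IntStream.Builder used just like DoubleStream.Builder. */
   static IntSummaryStatistics intStatisticsFromBuilder()
   {
      return IntStream.builder().add(3).add(1).add(4).add(1).add(5).build().summaryStatistics();
   }

   /** LongStream.of(...) as a shorter alternative for fixed values. */
   static LongSummaryStatistics longStatisticsFromValues()
   {
      return LongStream.of(5067054L, 7064544L, 9894450L).summaryStatistics();
   }

   public static void main(final String[] arguments)
   {
      System.out.println(intStatisticsFromBuilder());
      System.out.println(longStatisticsFromValues());
   }
}
```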

One doesn't need to have a collection stream or other instance of BaseStream to use the SummaryStatistics classes because they can be instantiated directly and used directly for the predefined numeric statistical operations. The next code listing demonstrates this by directly instantiating and then populating an instance of DoubleSummaryStatistics.

Directly Instantiating DoubleSummaryStatistics
/**
 * Demonstrate direct instantiation of and population of instance
 * of DoubleSummaryStatistics instance.
 */
private static void demonstrateDirectAccessToDoubleSummaryStatistics()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      new DoubleSummaryStatistics();
   doubleSummaryStatistics.accept(5.0);
   doubleSummaryStatistics.accept(10.0);
   doubleSummaryStatistics.accept(15.0);
   doubleSummaryStatistics.accept(20.0);
   out.println("Direct DoubleSummaryStatistics Usage: " + doubleSummaryStatistics);
}

The output from running the previous code listing is shown next:

Direct DoubleSummaryStatistics Usage: DoubleSummaryStatistics{count=4, sum=50.000000, min=5.000000, average=12.500000, max=20.000000}

As with the DoubleSummaryStatistics in the previous code listing, the next code listing instantiates a LongSummaryStatistics directly and populates it. This example also demonstrates that the SummaryStatistics classes provide individual methods for requesting individual statistics.

Directly Instantiating LongSummaryStatistics / Requesting Individual Statistics
/**
 * Demonstrate use of LongSummaryStatistics with this particular
 * example directly instantiating and populating an instance of
 * LongSummaryStatistics that represents hypothetical time
 * durations measured in milliseconds.
 */
private static void demonstrateLongSummaryStatistics()
{
   // This is a series of longs that might represent durations
   // of times such as might be calculated by subtracting the
   // value returned by System.currentTimeMillis() earlier in
   // code from the value returned by System.currentTimeMillis()
   // called later in the code.
   LongSummaryStatistics timeDurations = new LongSummaryStatistics();
   timeDurations.accept(5067054);
   timeDurations.accept(7064544);
   timeDurations.accept(5454544);
   timeDurations.accept(4455667);
   timeDurations.accept(9894450);
   timeDurations.accept(5555654);
   out.println("Test Results Analysis:");
   out.println("\tTotal Number of Tests: " + timeDurations.getCount());
   out.println("\tAverage Time Duration: " + timeDurations.getAverage());
   out.println("\tTotal Test Time: " + timeDurations.getSum());
   out.println("\tShortest Test Time: " + timeDurations.getMin());
   out.println("\tLongest Test Time: " + timeDurations.getMax());
}

The output from this example is shown next:

Test Results Analysis:
 Total Number of Tests: 6
 Average Time Duration: 6248652.166666667
 Total Test Time: 37491913
 Shortest Test Time: 4455667
 Longest Test Time: 9894450

In most examples in this post, I relied on the SummaryStatistics classes' readable toString() implementations to demonstrate the statistics available in each class. This last example, however, demonstrated that each individual type of statistic (number of values, maximum value, minimum value, sum of values, and average value) can be retrieved individually in numeric form.
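One caveat applies when retrieving individual statistics: if no values have been recorded, the getters still return numbers instead of failing. Per the Javadoc, getAverage() returns zero, IntSummaryStatistics.getMin() and getMax() return Integer.MAX_VALUE and Integer.MIN_VALUE, and DoubleSummaryStatistics uses positive and negative infinity instead. The sketch below (EmptyStatistics and minIfAny are hypothetical names) shows those sentinel values and one way to guard against them by checking getCount() first.

```java
import java.util.DoubleSummaryStatistics;
import java.util.IntSummaryStatistics;
import java.util.OptionalInt;

final class EmptyStatistics
{
   /** Hypothetical helper: report "no minimum" instead of the MAX_VALUE sentinel. */
   static OptionalInt minIfAny(final IntSummaryStatistics statistics)
   {
      return statistics.getCount() == 0
         ? OptionalInt.empty()
         : OptionalInt.of(statistics.getMin());
   }

   public static void main(final String[] arguments)
   {
      final IntSummaryStatistics noInts = new IntSummaryStatistics();
      System.out.println(noInts.getAverage());                   // 0.0
      System.out.println(noInts.getMin() == Integer.MAX_VALUE);  // true
      System.out.println(noInts.getMax() == Integer.MIN_VALUE);  // true
      System.out.println(minIfAny(noInts).isPresent());          // false

      final DoubleSummaryStatistics noDoubles = new DoubleSummaryStatistics();
      System.out.println(noDoubles.getMin());                    // Infinity
      System.out.println(noDoubles.getMax());                    // -Infinity
   }
}
```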

Conclusion

Whether the data being analyzed is directly provided as a numeric Stream, is provided indirectly via a collection's stream, or is manually placed in the appropriate SummaryStatistics class instance, the three SummaryStatistics classes can provide useful common statistical calculations on integers, longs, and doubles.
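To make the three routes in that summary concrete, the following sketch computes a LongSummaryStatistics from a numeric LongStream, from a List<Long> via Collectors.summarizingLong, and by calling accept(long) directly. The class and method names and the small input values are made up for illustration; all three routes should report the same count, sum, minimum, average, and maximum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ThreeRoutesSketch
{
   /** Route 1: statistics taken directly from a numeric stream. */
   public static LongSummaryStatistics fromNumericStream(final long[] values)
   {
      return LongStream.of(values).summaryStatistics();
   }

   /** Route 2: statistics collected from a collection's (boxed) stream. */
   public static LongSummaryStatistics fromCollection(final List<Long> values)
   {
      return values.stream().collect(Collectors.summarizingLong(Long::longValue));
   }

   /** Route 3: statistics populated manually via accept(long). */
   public static LongSummaryStatistics manually(final long[] values)
   {
      final LongSummaryStatistics stats = new LongSummaryStatistics();
      for (final long value : values)
      {
         stats.accept(value);
      }
      return stats;
   }

   public static void main(final String[] arguments)
   {
      final long[] values = {3, 1, 4, 1, 5};
      final List<Long> boxed = Arrays.asList(3L, 1L, 4L, 1L, 5L);
      // All three lines should print identical statistics.
      System.out.println(fromNumericStream(values));
      System.out.println(fromCollection(boxed));
      System.out.println(manually(values));
   }
}
```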

Friday, March 20, 2015

Displaying Paths in Ant

In the blog posts Java and Ant Properties Refresher and Ant <echoproperties /> Task, I wrote about how seeing properties as an Ant build sees them can help in understanding that build better. It is often just as valuable to see the various paths used in the build as the build sees them, especially when those paths are composed of other paths and of pieces from other build files. Fortunately, as described in the StackOverflow thread Ant: how to echo class path variable to a file, this is easily done with Ant's PathConvert task.

The following XML snippet is a very simple Ant build file that demonstrates use of <pathconvert> to display an Ant path's contents via the normal mechanisms used to display Ant properties.

build-show-paths.xml: Ant build.xml Using pathconvert
<project name="ShowPaths" default="showPaths" basedir=".">

   <path id="classpath">
      <pathelement path="C:\groovy-2.4.0\lib"/>
      <pathelement location="C:\lib\tika-1.7\tika-app-1.7.jar"/>
   </path>
   
   <target name="showPaths">
      <pathconvert property="classpath.path" refid="classpath" />
      <echo message="classpath = ${classpath.path}" />
   </target>

</project>

The simple Ant build file example shown above creates an Ant path named "classpath". It then uses the pathconvert task to create a new property ("classpath.path") that holds the value held in the "classpath" path. With this done, the property "classpath.path" can have its value displayed using Ant's echo task as demonstrated in "Java and Ant Properties Refresher."
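For readers more at home in Java than in Ant, the following is a rough analogue, not Ant's implementation, of the kind of string <pathconvert> produces by default: each path element resolved to an absolute path against a base directory, with the results joined by the platform's path separator (';' on Windows, ':' on Unix-like systems). The class and method names and the sample elements are hypothetical, and Ant's own handling (for example, its pathsep and targetos attributes) is richer than this sketch.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PathConvertSketch
{
   /** Resolves each element against baseDir and joins them with File.pathSeparator. */
   public static String convert(final Path baseDir, final List<String> elements)
   {
      return elements.stream()
                     // Path.resolve returns an absolute element unchanged and resolves
                     // a relative one against baseDir, similar to Ant's use of basedir.
                     .map(element -> baseDir.resolve(element).toAbsolutePath().normalize().toString())
                     .collect(Collectors.joining(File.pathSeparator));
   }

   public static void main(final String[] arguments)
   {
      // Hypothetical relative elements standing in for the build file's path entries.
      System.out.println("classpath = "
         + convert(Paths.get("."), Arrays.asList("lib", "tika-app-1.7.jar")));
   }
}
```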

When debugging issues with Ant builds, Ant's -verbose option is often handy. However, -verbose is sometimes a heavier solution than is actually required, and the simple ability to identify which properties and paths the Ant build "sees" is often enough to diagnose build issues.

Thursday, March 19, 2015

Validating XML Against XSD(s) in Java

There are numerous tools available for validating an XML document against an XSD. These include operating system scripts and tools such as xmllint, XML editors and IDEs, and even online validators. I have found it useful to have my own easy-to-use XML validation tool because of limitations of, or issues with, the previously mentioned approaches. Java makes it easy to write such a tool, and this post demonstrates how easy it is to develop a simple XML validation tool in Java.

The Java tool developed in this post requires JDK 8. However, the simple Java application can be modified fairly easily to work with JDK 7 or even with a version of Java as old as JDK 5. In most cases, I have tried to comment the code that requires JDK 7 or JDK 8 to identify these dependencies and provide alternative approaches in earlier versions of Java. I have done this so that the tool can be adapted to work even in environments with older versions of Java.

The complete code listing for the Java-based XML validation tool discussed in this post is included at the end of the post. The most significant lines of code from that application for validating XML against one or more XSDs are shown next.

Essence of Validating XML Against XSD with Java
final Schema schema = schemaFactory.newSchema(xsdSources);
final Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File(xmlFilePathAndName)));

The previous code listing shows the straightforward approach available in the standard JDK for validating XML against XSDs. An instance of javax.xml.validation.Schema is obtained with a call to javax.xml.validation.SchemaFactory.newSchema(Source[]) (where the array of javax.xml.transform.Source objects represents one or more XSDs). An instance of javax.xml.validation.Validator is obtained from the Schema instance via Schema's newValidator() method. The XML to be validated can then be passed to that Validator's validate(Source) method, which validates the XML against the XSD or XSDs originally provided to SchemaFactory.newSchema(Source[]).
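The same Schema, Validator, validate() sequence can be exercised without any files on disk. The sketch below is a minimal, self-contained illustration that assumes a made-up one-element schema (a greeting element of type xs:string) held in a String; it wraps both the XSD and the XML in StreamSource instances backed by StringReader and reports validity as a boolean instead of printing messages as the post's tool does. The class name and the schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class InMemoryValidationSketch
{
   /** Hypothetical one-element schema used only for this illustration. */
   public static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

   /** Returns true if the XML text is valid against the XSD text. */
   public static boolean isValid(final String xml, final String xsd)
   {
      try
      {
         final SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
         // Source[] -> Schema -> Validator -> validate(Source), as in the post.
         final Schema schema = factory.newSchema(
            new StreamSource[] {new StreamSource(new StringReader(xsd))});
         final Validator validator = schema.newValidator();
         validator.validate(new StreamSource(new StringReader(xml)));
         return true;
      }
      catch (SAXException exception)
      {
         // Invalid XML (or a malformed XSD) surfaces here.
         return false;
      }
      catch (IOException exception)
      {
         throw new IllegalStateException(exception);
      }
   }

   public static void main(final String[] arguments)
   {
      System.out.println(isValid("<greeting>Hello</greeting>", XSD));  // true
      System.out.println(isValid("<farewell>Bye</farewell>", XSD));    // false
   }
}
```

With no ErrorHandler set, a Validator throws on the first validation error, which is why an invalid document shows up as a SAXException (specifically a SAXParseException) that the sketch maps to false.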

The next code listing includes the code just highlighted but represents the entire method in which that code resides.

validateXmlAgainstXsds(String, String[])
/**
 * Validate provided XML against the provided XSD schema files.
 *
 * @param xmlFilePathAndName Path/name of XML file to be validated;
 *    should not be null or empty.
 * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
 *    should not be null or empty.
 */
public static void validateXmlAgainstXsds(
   final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
{
   if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
   {
      out.println("ERROR: Path/name of XML to be validated cannot be null or empty.");
      return;
   }
   if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
   {
      out.println("ERROR: At least one XSD must be provided to validate XML against.");
      return;
   }
   final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

   final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

   try
   {
      final Schema schema = schemaFactory.newSchema(xsdSources);
      final Validator validator = schema.newValidator();
      out.println(  "Validating " + xmlFilePathAndName + " against XSDs "
                  + Arrays.toString(xsdFilesPathsAndNames) + "...");
      validator.validate(new StreamSource(new File(xmlFilePathAndName)));
   }
   catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
   {
      out.println(
           "ERROR: Unable to validate " + xmlFilePathAndName
         + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
         + " - " + exception);
   }
   out.println("Validation process completed.");
}

The code listing for the validateXmlAgainstXsds(String, String[]) method shows how a SchemaFactory instance can be obtained with the specified type of schema (XMLConstants.W3C_XML_SCHEMA_NS_URI). This method also handles the various types of exceptions that might be thrown during the validation process. As the comment in the code states, the JDK 7 language change supporting catching of multiple exceptions in a single catch clause is used in this method but could be replaced with separate catch clauses or catching of a single more general exception for code bases earlier than JDK 7.
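For code bases earlier than JDK 7, the multi-catch can be replaced by separate catch clauses, as the paragraph above notes. The following sketch does that and also catches SAXParseException first: it is a subclass of SAXException (so it must precede that clause to compile) and it exposes getLineNumber() and getColumnNumber() for more helpful error reports. The class name is hypothetical, and in-memory XML and XSD strings are an assumption made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SeparateCatchSketch
{
   /** Validates XML text against XSD text using separate catch clauses. */
   public static String describeValidation(final String xml, final String xsd)
   {
      try
      {
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                      .newSchema(new StreamSource(new StringReader(xsd)))
                      .newValidator()
                      .validate(new StreamSource(new StringReader(xml)));
         return "VALID";
      }
      catch (SAXParseException parseException)  // subclass: must precede SAXException
      {
         return "INVALID at line " + parseException.getLineNumber()
              + ", column " + parseException.getColumnNumber()
              + ": " + parseException.getMessage();
      }
      catch (SAXException saxException)
      {
         return "ERROR: " + saxException.getMessage();
      }
      catch (IOException ioException)
      {
         return "I/O ERROR: " + ioException.getMessage();
      }
   }

   public static void main(final String[] arguments)
   {
      // Hypothetical single-element schema for illustration.
      final String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                       + "<xs:element name='display-name' type='xs:string'/>"
                       + "</xs:schema>";
      System.out.println(describeValidation("<display-name>ok</display-name>", xsd));  // VALID
      System.out.println(describeValidation("<title>A handy example</title>", xsd));
   }
}
```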

The method just shown calls generateStreamSourcesFromXsdPathsJdk8(String[]), which is shown in the next listing.

generateStreamSourcesFromXsdPathsJdk8(String[])
/**
 * Generates array of StreamSource instances representing XSDs
 * associated with the file paths/names provided and uses the JDK 8
 * Stream API.
 *
 * This method can be commented out if using a version of
 * Java prior to JDK 8.
 *
 * @param xsdFilesPaths String representations of paths/names
 *    of XSD files.
 * @return StreamSource instances representing XSDs.
 */
private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
   final String[] xsdFilesPaths)
{
   return Arrays.stream(xsdFilesPaths)
                .map(StreamSource::new)
                .collect(Collectors.toList())
                .toArray(new StreamSource[xsdFilesPaths.length]);
}

The method just shown uses JDK 8 stream support to convert the array of Strings representing paths/names of XSD files to instances of StreamSource based on the contents of the XSDs pointed to by the path/name Strings. In the class's complete code listing, there is also a deprecated method generateStreamSourcesFromXsdPathsJdk7(final String[]) that could be used instead of this method for code bases on a version of Java earlier than JDK 8.
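A possible variation, offered as a sketch under the assumption that the arguments are local file paths rather than URIs: StreamSource(String) treats its argument as a system identifier (which should conform to URI syntax), whereas StreamSource(File) converts the File to a file: URI itself. Mapping each path through File first makes that conversion explicit, and Stream.toArray with StreamSource[]::new avoids the intermediate List. The class and method names below are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import javax.xml.transform.stream.StreamSource;

public class FileStreamSourceSketch
{
   /** Wraps each path in a File so StreamSource(File) builds a file: URI system ID. */
   public static StreamSource[] streamSourcesFromFiles(final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(File::new)            // String -> File
                   .map(StreamSource::new)    // File -> StreamSource(File)
                   .toArray(StreamSource[]::new);
   }

   public static void main(final String[] arguments)
   {
      for (final StreamSource source : streamSourcesFromFiles(new String[] {"web-app_2_5.xsd"}))
      {
         // Prints an absolute file: URI ending in web-app_2_5.xsd
         System.out.println(source.getSystemId());
      }
   }
}
```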

This single-class Java application is most useful when it's executed from the command line. To enable this, a main function is defined as shown in the next code listing.

Executable main(String[]) Function
/**
 * Validates provided XML against provided XSD.
 *
 * @param arguments XML file to be validated (first argument) and
 *    XSD against which it should be validated (second and later
 *    arguments).
 */
public static void main(final String[] arguments)
{
   if (arguments.length < 2)
   {
      out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
      out.println("\tOrder of XSDs can be significant (place XSDs that are");
      out.println("\tdependent on other XSDs after those they depend on)");
      System.exit(-1);
   }
   // Arrays.copyOfRange requires JDK 6; see
   // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
   // for additional details for versions of Java prior to JDK 6.
   final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
   validateXmlAgainstXsds(arguments[0], schemas);
}

The executable main(String[]) function prints a usage statement if fewer than two command line arguments are passed to it because it expects at least the name/path of the XML file to be validated and the name/path of an XSD to validate the XML against.

The main function treats the first command line argument as the XML file's path/name and treats all remaining command line arguments as the paths/names of one or more XSDs.
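The comment in main(String[]) notes that Arrays.copyOfRange requires JDK 6. As a minimal sketch (hypothetical class and method names, made-up file names in the usage), the two helpers below extract the XSD arguments both ways, the second using System.arraycopy, which is also available in older versions of Java. Both assume at least one argument is present, which main(String[]) already guarantees by requiring two.

```java
import java.util.Arrays;

public class ArgumentSplitSketch
{
   /** Everything after the first argument, via Arrays.copyOfRange (JDK 6+). */
   public static String[] xsdArgumentsJdk6(final String[] arguments)
   {
      return Arrays.copyOfRange(arguments, 1, arguments.length);
   }

   /** Same result via System.arraycopy, usable before JDK 6. */
   public static String[] xsdArgumentsJdk5(final String[] arguments)
   {
      final String[] schemas = new String[arguments.length - 1];
      System.arraycopy(arguments, 1, schemas, 0, schemas.length);
      return schemas;
   }

   public static void main(final String[] arguments)
   {
      final String[] sample = {"web.xml", "javaee_5.xsd", "web-app_2_5.xsd"};
      System.out.println("XML: " + sample[0]);  // prints XML: web.xml
      System.out.println("XSDs: " + Arrays.toString(xsdArgumentsJdk5(sample)));
      // prints XSDs: [javaee_5.xsd, web-app_2_5.xsd]
   }
}
```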

The simple Java tool for validating XML against one or more XSDs has now been shown (complete code listing is at bottom of post). With it in place, we can run it against an example XML file and associated XSDs. For this demonstration, I'm using a very simple manifestation of a Servlet 2.5 web.xml deployment descriptor.

Sample Valid Servlet 2.5 web.xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5"> 

    <display-name>Sample Java Servlet 2.5 Web Application</display-name>
</web-app>

The simple web.xml file just shown is valid per the Servlet 2.5 XSDs, and the output of running this simple Java-based XSD validation tool proves that by not reporting any validation errors.

An XSD-valid XML file does not lead to very interesting results with this tool. The next code listing shows an intentionally invalid web.xml file that has a "title" element not specified in the associated Servlet 2.5 XSD. The output with the most significant portions of the error message highlighted is shown after the code listing.

Sample Invalid Servlet 2.5 web.xml (web-invalid.xml)
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5">

    <display-name>Java Servlet 2.5 Web Application</display-name>
    <title>A handy example</title>
</web-app>

As the last output shows, things are more interesting in terms of output when the provided XML is not XSD valid.

There is one important caveat I wish to emphasize here. The XSDs provided to this Java-based tool sometimes need to be specified in a particular order. In particular, XSDs with "include" dependencies on other XSDs should be listed on the command line AFTER the XSD they include. In other words, XSDs with no "include" dependencies will generally be provided on the command line before those XSDs that include them.

The next code listing is for the complete XmlValidator class.

XmlValidator.java (Complete Class Listing)
package dustin.examples.xmlvalidation;

import org.xml.sax.SAXException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import static java.lang.System.out;

/**
 * Validate provided XML against the provided XSDs.
 */
public class XmlValidator
{
   /**
    * Validate provided XML against the provided XSD schema files.
    *
    * @param xmlFilePathAndName Path/name of XML file to be validated;
    *    should not be null or empty.
    * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
    *    should not be null or empty.
    */
   public static void validateXmlAgainstXsds(
      final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
   {
      if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
      {
         out.println("ERROR: Path/name of XML to be validated cannot be null or empty.");
         return;
      }
      if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
      {
         out.println("ERROR: At least one XSD must be provided to validate XML against.");
         return;
      }
      final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

      final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

      try
      {
         final Schema schema = schemaFactory.newSchema(xsdSources);
         final Validator validator = schema.newValidator();
         out.println("Validating " + xmlFilePathAndName + " against XSDs "
            + Arrays.toString(xsdFilesPathsAndNames) + "...");
         validator.validate(new StreamSource(new File(xmlFilePathAndName)));
      }
      catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
      {
         out.println(
            "ERROR: Unable to validate " + xmlFilePathAndName
            + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
            + " - " + exception);
      }
      out.println("Validation process completed.");
   }

   /**
    * Generates array of StreamSource instances representing XSDs
    * associated with the file paths/names provided and uses the JDK 8
    * Stream API.
    *
    * This method can be commented out if using a version of
    * Java prior to JDK 8.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    */
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
      final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(StreamSource::new)
                   .collect(Collectors.toList())
                   .toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Generates array of StreamSource instances representing XSDs
    * associated with the file paths/names provided and uses
    * pre-JDK 8 Java APIs.
    *
    * This method can be commented out (or better yet, removed
    * altogether) if using JDK 8 or later.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    * @deprecated Use generateStreamSourcesFromXsdPathsJdk8 instead
    *    when JDK 8 or later is available.
    */
   @Deprecated
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk7(
      final String[] xsdFilesPaths)
   {
      // Diamond operator used here requires JDK 7; add type of
      // StreamSource to generic specification of ArrayList for
      // JDK 5 or JDK 6
      final List<StreamSource> streamSources = new ArrayList<>();
      for (final String xsdPath : xsdFilesPaths)
      {
         streamSources.add(new StreamSource(xsdPath));
      }
      return streamSources.toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Validates provided XML against provided XSD.
    *
    * @param arguments XML file to be validated (first argument) and
    *    XSD against which it should be validated (second and later
    *    arguments).
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length < 2)
      {
         out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
         out.println("\tOrder of XSDs can be significant (place XSDs that are");
         out.println("\tdependent on other XSDs after those they depend on)");
         System.exit(-1);
      }
      // Arrays.copyOfRange requires JDK 6; see
      // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
      // for additional details for versions of Java prior to JDK 6.
      final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
      validateXmlAgainstXsds(arguments[0], schemas);
   }
}

Despite what the length of this post might initially suggest, using Java to validate XML against an XSD is fairly straightforward. The sample application shown and explained here attempts to demonstrate that and is a useful tool for simple command line validation of XML documents against specified XSDs. One could easily port this to Groovy to be even more script-friendly. As mentioned earlier, this simple tool requires JDK 8 as currently written but could be easily adapted to work on JDK 5, JDK 6, or JDK 7.

UPDATE (20 March 2015): I have pushed the Java class shown in this post (XmlValidator.java) onto the GitHub repository dustinmarx/xmlutilities.