
Streams in Practice #

We now discuss a command-line application demonstrating how streams help solve problems in practice.

Our application will search through files in a given folder and print all lines containing a match for a given regular expression. We will only search in Java source files in the given folder. When searching for lines matching the regular expression /public static[^=]*\(/ in our own src folder, our application might produce output like this.

/home/me/java-fun-code/src/main/java/sebfisch/ExplainCommands.java
  public static void main(String[] args) {
/home/me/java-fun-code/src/main/java/sebfisch/SrcFileSearch.java
  public static void main(String[] args) {

File names are printed (using their absolute path) before all matching lines from the corresponding Java files.

To compute a stream of Java source file names in a given folder we define the following method.

static Stream<Path> walkJavaFiles(Path root) throws IOException {
  return Files.walk(root)
      .filter(Files::isReadable)
      .filter(path -> path.toString().endsWith(".java"));
}

The predefined method Files.walk expects a Path as argument and returns a stream of paths representing files or directories contained inside the given root path. It might throw an IOException which we pass on to the caller of walkJavaFiles.

We use filter twice to keep only those paths that represent readable files with a .java extension. The method reference Files::isReadable is, in this case, equivalent to the lambda expression path -> Files.isReadable(path).
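The same walk-and-filter pipeline can be exercised in a self-contained sketch. The following program is our own illustration, not part of the application; it sets up a throwaway directory (the class and file names are invented) and counts the Java files below it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class WalkDemo {
    // counts readable files with a .java extension below root,
    // using the same two filter steps as walkJavaFiles
    static long countJavaFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isReadable)
                .filter(path -> path.toString().endsWith(".java"))
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // set up a temporary directory with one Java and one text file
        Path root = Files.createTempDirectory("walk-demo");
        Files.createFile(root.resolve("Hello.java"));
        Files.createFile(root.resolve("notes.txt"));
        System.out.println(countJavaFiles(root)); // prints 1
    }
}
```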

To simplify our main method, we hard-code the directory and the regular expression to the values used above. Replacing those values with command-line arguments would be straightforward.

final Path srcPath = Path.of("src");
final String regExp = "public static[^=]*\\(";
final Predicate<String> containsMatch = Pattern.compile(regExp).asPredicate();
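To see what asPredicate gives us, here is a minimal sketch of our own (the sample lines are invented): the resulting predicate tests whether the pattern matches anywhere in the input, like Matcher.find, rather than requiring the whole line to match.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class MatchDemo {
    // the same predicate as in the application
    static final Predicate<String> CONTAINS_MATCH =
        Pattern.compile("public static[^=]*\\(").asPredicate();

    public static void main(String[] args) {
        // asPredicate tests for a match anywhere in the input
        System.out.println(
            CONTAINS_MATCH.test("  public static void main(String[] args) {")); // true
        System.out.println(
            CONTAINS_MATCH.test("  private int count;")); // false
    }
}
```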

Assuming javaFiles is a stream of paths returned by walkJavaFiles, we can use the predicate containsMatch defined here to print matching lines as follows.

javaFiles
    .flatMap(SrcFileSearch::readLines)
    .filter(containsMatch)
    .forEach(System.out::println);

Here, the method reference SrcFileSearch::readLines is equivalent to the lambda expression path -> SrcFileSearch.readLines(path) calling a static method that we will discuss below.

The forEach method expects a Consumer as argument, which is a function that does not return a result. It applies the given consumer to each element of the stream it is called on. Here, the stream contains lines of text, so the method reference System.out::println is equivalent to the following instance of an anonymous class.

new Consumer<String>() {
  public void accept(String line) {
    System.out.println(line);
  }
}

Extending the application #

We managed to write our application using a stream pipeline, but as it stands it prints only matching lines and no file names. We can extend the pipeline to print file names as follows.

javaFiles
    .peek(System.out::println) // ONLY THIS LINE IS NEW
    .flatMap(SrcFileSearch::readLines)
    .filter(containsMatch)
    .forEach(System.out::println);

The peek method expects a consumer as argument, applies it to every element, and returns a new stream containing the same elements as the stream it was called on. However, the evaluation order is different from what the previous sentence might suggest: peek does not first apply the consumer to every element of the stream and only then create a new stream as result.

Java streams are evaluated on demand. The consumer passed to peek (in our case the method reference System.out::println) is only called when an element is traversed in the resulting stream. Elements are traversed by the terminal operation of the pipeline (in our case forEach), and as a consequence the different invocations of System.out::println in peek and forEach are interleaved when executing the pipeline. As a result, the output of filenames appears directly before the output of matching lines from the corresponding file.
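The interleaving can be observed with a small sketch of our own (not part of the application) that records the order of calls instead of printing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class PeekOrderDemo {
    // records when peek and forEach see each element
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Stream.of("a", "b")
            .peek(s -> log.add("peek: " + s))
            .forEach(s -> log.add("forEach: " + s));
        return log;
    }

    public static void main(String[] args) {
        // peek and forEach alternate element by element:
        // peek: a, forEach: a, peek: b, forEach: b
        run().forEach(System.out::println);
    }
}
```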

As it stands, our application prints the name of every searched file. We can extend it as follows to print only files that include at least one matching line.

javaFiles
    .filter(file -> readLines(file).anyMatch(containsMatch)) // ADDED THIS LINE
    .peek(System.out::println)
    .flatMap(SrcFileSearch::readLines)
    .filter(containsMatch)
    .forEach(System.out::println);

The lambda expression passed to filter in the new line uses readLines to compute a stream of lines in the given file. The terminal operation anyMatch expects a predicate as argument and returns a boolean result that is true if and only if the stream contains at least one element satisfying the given predicate.
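As a minimal, self-contained illustration of anyMatch (with invented sample lines, and a plain contains check in place of the regular expression):

```java
import java.util.stream.Stream;

class AnyMatchDemo {
    static boolean containsPublicStatic(Stream<String> lines) {
        // anyMatch is a short-circuiting terminal operation: it stops
        // traversing the stream as soon as a matching element is found
        return lines.anyMatch(line -> line.contains("public static"));
    }

    public static void main(String[] args) {
        System.out.println(
            containsPublicStatic(Stream.of("int x;", "public static void f()"))); // true
        System.out.println(
            containsPublicStatic(Stream.of("int x;", "int y;"))); // false
    }
}
```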

As it stands, our application prints file names relative to the src folder used as search root. We can extend it as follows to print absolute file names instead.

javaFiles
    .map(Path::toAbsolutePath) // THE ONLY NEW LINE
    .filter(file -> readLines(file).anyMatch(containsMatch))
    .peek(System.out::println)
    .flatMap(SrcFileSearch::readLines)
    .filter(containsMatch)
    .forEach(System.out::println);

We use the method reference Path::toAbsolutePath as argument of the map combinator to convert every searched file name.

By using stream combinators we were able to extend a basic version of our program (that only printed matching lines) in a modular way. Each of the extensions

  • printing file names,
  • printing only relevant file names,
  • and printing absolute instead of relative paths

required one new line of code and no changes in existing code.

Handling exceptions #

Several of the underlying operations in our application, such as walking directories and opening and closing files, can throw exceptions. We will now discuss how the application handles them. The stream pipeline developed before does not need to be changed.

The pipeline starts with a stream of file names returned by the method walkJavaFiles defined above. That method can throw an IOException which we handle in our main method.

try (Stream<Path> javaFiles = walkJavaFiles(srcPath)) {
  javaFiles
      .map(Path::toAbsolutePath)
      .filter(file -> readLines(file).anyMatch(containsMatch))
      .peek(System.out::println)
      .flatMap(SrcFileSearch::readLines)
      .filter(containsMatch)
      .forEach(System.out::println);
} catch (IOException e) {
  System.err.println(e.getMessage());
}

We use a try-with-resources statement to make sure that the stream is closed eventually in addition to handling exceptions. We will discuss managing resources in the next section.

The method readLines is used twice in the pipeline, and so far we have not discussed its implementation. In fact, there is a method with almost the same signature in the predefined Files class.

public static Stream<String> lines(Path path) throws IOException;

This is almost the signature we need except for the possible IOException. We use readLines in arguments of the filter and flatMap combinators. The corresponding functional interfaces Predicate and Function define methods test and apply, and those methods do not declare any exceptions in their signature. As a consequence we cannot pass predicates or functions that may throw exceptions to the stream combinators.

The method readLines is defined as follows.1

private static Stream<String> readLines(Path file) {
  try {
    return Files.lines(file);
  } catch (IOException e) {
    throw new UncheckedIOException(e);
  }
}

This method wraps the Files.lines method and “handles” the IOException by re-throwing it wrapped in an UncheckedIOException. Unchecked exceptions allow us to hide underlying exceptions because they don’t need to be handled or declared in method signatures. As a consequence, we can use readLines in predicates and functions passed to stream combinators. To make sure we actually handle the wrapped exception, we need to add a catch clause for UncheckedIOException in our main method.

try (Stream<Path> javaFiles = walkJavaFiles(srcPath)) {
  javaFiles
      .map(Path::toAbsolutePath)
      .filter(file -> readLines(file).anyMatch(containsMatch))
      .peek(System.out::println)
      .flatMap(SrcFileSearch::readLines)
      .filter(containsMatch)
      .forEach(System.out::println);
} catch (IOException e) {
  System.err.println(e.getMessage());
} catch (UncheckedIOException e) {
  System.err.println(e.getMessage());
}
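The wrapping pattern used in readLines can be factored into a reusable helper. The following sketch is our own; the interface IOFunction and the method wrap are invented names, not part of the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Function;

class Unchecked {
    // hypothetical Function variant whose apply may throw an IOException
    @FunctionalInterface
    interface IOFunction<T, R> {
        R apply(T t) throws IOException;
    }

    // adapts an IOFunction to a plain Function by wrapping
    // any IOException in an UncheckedIOException
    static <T, R> Function<T, R> wrap(IOFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

With such a helper, the same adaptation works for any IO-throwing function, at the cost of a less descriptive name at the use site.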

The necessity to wrap exceptions in unchecked exception types indicates that the integration of functional programming patterns into conventional languages is not always seamless, because functional features are added after the fact to existing features like exception handling. Having both feature sets in mind from the start could help language designers find a more seamless integration.

Managing resources #

The implementation developed so far has a problem. The API documentation for Files.lines includes the following note.

This method must be used within a try-with-resources statement or similar control structure to ensure that the stream’s open file is closed promptly after the stream’s operations have completed.

In this case, using a try-with-resources statement in the definition of readLines would not have the desired effect because the stream is returned from the function and consumed outside of it. Instead we make sure that the stream returned by readLines is closed in a different way.

The first call to readLines in the argument to filter is problematic. In general, terminal operations do not close the streams they consume, and anyMatch is no exception. As a consequence, files are left open after checking if they contain a matching line.
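That anyMatch does not close the stream can be checked with a small sketch of our own, where an onClose handler stands in for closing an open file:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class AnyMatchCloseDemo {
    static boolean closedAfterAnyMatch() {
        AtomicBoolean closed = new AtomicBoolean(false);
        // the close handler stands in for closing an open file
        Stream<String> lines =
            Stream.of("a", "b").onClose(() -> closed.set(true));
        lines.anyMatch(s -> s.equals("a")); // terminal, but does not close
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterAnyMatch()); // prints false
    }
}
```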

Interestingly, the second use of readLines as argument of flatMap does not have this problem. The API documentation for flatMap contains the following remark.

Each mapped stream is closed after its contents have been placed into this stream.

As a consequence, files are closed correctly when opened for printing matching lines by the second use of readLines. We can change the definition of readLines as follows to make sure files are always closed after consuming the returned stream.

private static Stream<String> readLines(Path file) {
  try {
    return Stream.of(Files.lines(file)).flatMap(s -> s); // CHANGED
  } catch (IOException e) {
    throw new UncheckedIOException(e);
  }
}

We create a one-element stream containing the stream we want to return and call flatMap with the identity function to immediately flatten the created one-element stream. The mentioned property of flatMap ensures that the underlying file is closed when the returned stream is consumed completely.
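This closing behaviour can be verified with a self-contained sketch of our own, again using an onClose handler in place of an open file:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class FlatMapCloseDemo {
    static boolean closedAfterConsumption() {
        AtomicBoolean closed = new AtomicBoolean(false);
        // inner stream with a close handler, standing in for Files.lines
        Stream<String> inner =
            Stream.of("line 1", "line 2").onClose(() -> closed.set(true));
        // wrap and flatten: flatMap closes each mapped stream
        // after its contents have been placed into the outer stream
        Stream.of(inner).flatMap(s -> s).forEach(line -> {});
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterConsumption()); // prints true
    }
}
```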

In general, wrapping a stream inside a one-element stream and then flattening the result using flatMap with the identity function returns a new stream that contains the same elements as the original stream. Our discussion shows that, in Java, this wrapping-and-flattening operation is not insignificant with regard to resource management, which is another indication that the integration of functional programming patterns into conventional languages is not always seamless.

Task: Avoid reopening files #

The presented implementation opens some files twice. Modify the stream pipeline in such a way that no files are opened more than once and only matching lines are held in memory.


  1. We will define a different version of this method below. ↩︎