Skip to content

Latest commit

 

History

History
823 lines (586 loc) · 27 KB

README.adoc

File metadata and controls

823 lines (586 loc) · 27 KB

Classif

Build Status Code Coverage

A library to sift through java classes using a structural search.

Why would anyone want something like this? Because there are policies on how the code should be organized, what is safe to use and how, etc. Classif is the first stepping stone for tools that can check such policies. There are tools that can do similar things with the source code but Classif works on the bytecode level.

Classif is not usable on its own. It can sift through the Java code elements that are thrown at it but it cannot find them on its own. To use Classif, you need something that will scan the "classpath" (whatever that might mean in your case) and inform Classif about what you found. Classif can then run its queries on top of your findings. See Developer Guide below for more thorough explanation on how to do that.

Examples

Each of these is actually implemented as a test here.

  1. Match anything annotated by annotation Anno with attribute a equal to 'x':

    @a.Anno(a = 'x') ^;
  2. Match anything that is annotated by annotation Anno with attribute a not equal to ’x':

    @a.Anno(a != 'x') ^;
  3. Match anything that is not annotated by Anno:

  4. Match any type in the package pkg (as in Java, type declaration needs to contain the {} pair):

    type ^pkg.* {}
  5. Match any class (i.e. not an interface, enum or annotation type) that not in the package pkg (notice the use of negation and regex):

    class ^!/p[kK]g/.* {}
  6. Match any interface in package pkg or any subpackage of it:

    interface ^pkg.** {}
  7. Match any type whose name ends in Private and which is part of any package:

    type ^**./.*Private/ {}
  8. Match anything that (indirectly) uses a class annotated as @Unstable:

    ^ uses %c; @Unstable class %c=* {}
  9. Match any type that is annotated @Unstable and is used by something annotated @Stable:

    match %e; @Unstable type %e=* {} @Stable * uses %e;

    or

    @Unstable type ^ usedby %c {} @Stable %c=*;

    (the latter only works if supported by the underlying structure provider, while the former should always work)

  10. Match anything that uses some direct implementation of an interface Iface:

    ^ uses %impl; type %impl=* directly implements Iface;
  11. Match anything that uses some type that is either itself annotated @Beta or extends or implements a type annotated as @Beta:

    ^ uses %beta | %extendsBeta | %implementsBeta;
    @Example11.Beta type %beta=* {}
    @Example11.Beta type %betaForExtends=* {}
    @Example11.Beta type %betaForImplements=* {}
    type %extendsBeta=* extends %betaForExtends {}
    type %implementsBeta=* implements %betaForImplements {}
  12. Match a method called m with any number of parameters that is declared in a class that extends a class declared in the package x.y that extends java.io.InputStream. Additionally the class with the method m implements an interface that declares a method k with any number of parameters. The method m is annotated with an annotation that has an attribute called attr with default value of 3:

    class * extends %b implements %i {
      @%a
      ^m(**);
    }
    
    class %b=x.y.* extends java.io.InputStream;
    
    interface %i=* {
      k(**);
    }
    
    @interface %a=* {
        int attr() default 3;
    }

User Guide

Classif matches Java elements with certain structure. You define the needed structure using a "query" that strongly resembles the actual Java declarations.

Quite often you will want to express dependencies between different declarations and for that you can use variables within the query.

So without further ado, let dive in and learn to write the queries.

Basics

What to return

In a simple query, you specify what you want to be the result of the query by using the ^ in front of the name of the element. E.g. query

class ^java.lang.Object {}

will return the java.lang.Object class.

In case you want to match more things (like for example a class and some method), you might want to use a different way of declaring what to match by the query.

match %class | %method;

class %class=java.lang.Object {}

type * {
  %method=/_.*/(**);
}

Whoa, that’s a bit of a mouthful. But what can be seen there is that you can declare variables that match certain elements in the query (in the example above the %class and %method are variables "assigned" to the matching elements) and use them in the match statement (which needs to be the first statement in the query).

Btw. the above example will match the java.lang.Object class and any method with name beginning with and underscore that has any number of parameters and is present in any type (regardless of whether it is a class, interface, enum, etc).

What’s not specified is not matched

This is the general principle. You can match different elements by visibility, annotation presence, contents, usage, etc.

For example:

public class * {}

will match all public classes (but not interfaces, enums or annotation types), while

class * {}

will match the classes regardless of their visibility - all private, package private, protected and public classes will be matched.

Matching Types

Matching by modifiers

If you want to constrain the matched types by their visibility, you can do so by using one of the public, protected, private or Classif’s own packageprivate qualifiers. The packageprivate modifier is required to distinguish between "any visibility", which in Classif is expressed by the lack of any visibility modifier, and "package private visibility" which is what Java assumes when there is on other visibility modifier.

There are also other modifiers supported on the types: static, final, abstract and strictfp, each as understood by your favorite Java compiler..

Some examples:

public final type **./.*Impl/ {}

matches all final types in any package whose names end with "Impl".

packageprivate enum * {}

matches all package private enums.

Tip

As mentioned above, to return anything from a query, you need to use the ^ operator in front of the name of the returned element or the match statement mentioning the named elements. So to return all the package private enums, you’d write:

packageprivate enum ^* {}

In addition to specifying single modifiers, you can also "or" multiple together like so:

public|protected final static class * {}

will match all public or protected static final classes.

Finally, you can also negate the modifiers:

!public static !final class * {}

matches all classes that are static, not final and not public.

!public|static type * {}

matches all types that are either not public or static.

Matching by annotations

You can match not only by annotation presence on an element, but also by basic checks on the attribute values.

Specifying an annotation again resembles the declarations in the Java source code.

Note
Classif doesn’t assume anything about the annotation retention. If the calling code is able to supply also annotations with the source retention, they will be considered. If on the other hand the calling code obtains the declarations from the compiled bytecode, the annotation with the source retention wil not be available and therefore not considered.

Basic example:

@javax.persistence.Entity
public class * {}

will return all public classes annotated as JPA entities.

To find something that is not annotated by some annotation you can write:

[email protected]
@com.acme.MyAnno
type * {}

which will find all types that are not annotated by the javax.persistence.Entity annotation but are annotated by the com.acme.MyAnno annotation.

To match by annotation attributes, you can write something like this:

@javax.persistence.Entity(name != "")
class * {}

which will match all JPA entities with an explicitly assigned name (this stems from the fact that the name attribute of the Entity annotation happens to have an empty string as its default value).

Notice that you can use more than just assignment when matching the attribute values. The allowed operators are: =, !=, >, >=, < or <=. Obviously the less/greater operators only make sense on the numeric attribute values.

When it comes to specifying the value of an attribute to match, there is again a couple of options. You can either specify the value as in the source code, e.g.

@MyAnnotation(stringAttribute = "val", intAttribute > 3, typeAttribute != java.lang.Object.class, enumAttribute = MyEnum.VALUE, arrayAttribute = {1, 2}, annotationAttribute = @MyOtherAnnotation(attribute = 42)) class * {}

or you can try using regular expressions for matching strings:

@javax.persistence.Entity(name != /.*Private/) class * {}

which will match all JPA entity classes with an explicit name attribute which doesn’t end with "Private".

additionally, you can specify that you don’t actually care about the value using *.

Finally, you can check whether an annotation attribute has a value different from its default value like so:

@javax.persistence.Entity(name != default) class * {}

This will match all JPA entity classes with an explicit name. Note that this is essentially the same as our first example above with the only difference being that you don’t have to know the default value.

The annotation attributes also support globbing. I.e. you can put a wildcard in place of a single or many attributes.

@com.acme.Acme(*) type * {}

will match any type annotated with the @com.acme.Acme annotation with a single attribute specified.

@com.acme.Acme(**) type * {}

will match any annotated with the @com.acme.Acme annotation with zero or more attributes of any name with any value.

Values can be replaced by a *, too, meaning, somewhat obviously, "any value". If you happen to match an array value, like for example:

@java.lang.annotation.Target(value = {java.lang.annotation.ElementType.TYPE, **}) @interface * {}

you can use globbing of the values as well, as you probably have guessed from the provided example. The example will match any annotation type that is itself annotated with the @java.lang.annotation.Target annotation with the value attribute having the TYPE as the first element in its values, followed by zero or more other element types. Requiring the TYPE to be the first in the array is somewhat restrictive, so you could update it to read {, java.lang.annotation.ElementType.TYPE, } which would make the query match with TYPE on any position in the array.

Matching by names

In the above examples we were mostly using * in place of a type name. That is one of the special symbols supported by Classif. A single * stands for "any type in any package". Sometimes though we need to be more specific. That’s why Classif also supports full featured globbing of the fully qualified type names.

The single * is a special case put in place for convenience. In a normal case the fully qualified type names are globbed similarly to Ant path expressions.

  • * stands for single part of the hierarchical name

  • ** stands for any number of parts (0 to many) of the hierarchical name

  • a sequence of characters stands for the single part of the hierarchical name with the same name

  • the parts of the hierarchical name are separated by .

  • instead of a sequence of characters, one can also use a regex enclosed in a pair of /.

Ok, that’s a little bit dense so let’s explore it on a couple of examples. Let’s suppose we want to match the type with the fully qualified name com.acme.util.StringUtils. Here is a couple of ways how to do it using the above described globbing features:

com.acme.util.StringUtils

The simplest thing to do is to simply write down all parts of the hierarchical fully qualified name in full.

com.*.util.*

In here we’re matching the acme and StringUtils parts of the name using a *. This expression would of course match any other class in any package with 3 hierarchical name parts first of which would be com and the third one util with the middle being anything. This expression would not match com.util.Clazz though because it has too few parts of the name as well as it would not match com.acme.util.internal.Misc because that has too many parts of the fully qualified name.

**.StringUtils

This expression will match a class called StringUtils located in any package no matter how deep the hierarchy of the package name. Contrast that to *.StringUtils which would only match the class StringUtils in packages with a single part (i.e. the package name without any dots in them). Also notice that the ** doesn’t match just the packages. It merely matches parts of the hierarchical name, so if the class StringUtils was an inner class of another class (of any name, in this concrete example) it would be matched also.

com.**.StringUtils

Similar to the above example but showing that you can put the ** wildcard in any place of the hierarchical name.

**./.*Utils/

In this example we’re using a regular expression .*Utils to match the class name and the ** wildcard to match a package of any depth.

Note
The regular expression always matches only a single part of the hierarchical name.

The whole name can further be negated:

type !java.lang.** {}

would return all types that are not from the java.lang package or any of its sub-packages.

Matching by type parameters

Types can of course be parameterized. To match types by their type parameters, Classif has you covered, of course.

Note
Classif conflates wildcards and type variables into a single concept referred to simply as ?. This is because with Classif it is not possible to reference the type variables in the later declarations. This may change in the future.

Let’s go through some examples of why conflating the wildcards and type variables is kinda ok for a tool like Classif and also to explain how to use the matching by type parameters.

type * extends java.util.Collection<java.io.Serializable> {}

This simple example will match any type with any name that extends Collection<Serializable>. The type may or may not be parameterized itself (we don’t declare any requirements on the type variables so they’re not considered in the match).

The above declaration will match types like:

// imports ommitted
public interface SerializableCol extends Collection<Serializable> {
  //...
}

but will NOT match:

public interface StringCol extends Collection<String> {
  // ...
}

That’s because we’re matching for a concrete type parameter and we’re not checking any of its qualities. So for Classif, Collection<Serializable> is just different from Collection<String> (as is for Java compiler, too, actually).

Another example:

type *<? extends java.io.Serializable> {}

This will match any type with a single type parameter that extends Serializable (notice that we don’t specify any "name" of the type pamareter, we just use ?).

This will match types declared like this for example:

public class Serializator<T extends java.io.Serializable> {
  // ...
}

But it will NOT match a type declared like this (however such declaration is silly):

public class StringSerializator<T extends java.lang.String> extends Serializator<T> {
  // ...
}

because the type variable bounds do not match (even though String implements Serializable). To match both Serializator and StringSerializator (and any other type like them), you’d need to write:

type ^*<? extends %s|%es> {}

type %s=java.io.Serializable {}
type %es=* extends java.io.Serializable {}

Now if you for example wanted to match all types with a single type parameter no matter their type or anything, you can use Classif globbing:

type ^* extends java.util.Collection<*> {}

This will match any type that extends Collection and it doesn’t matter if the type parameter of the collection is a concrete type or a type variable.

You can also use globbing for saying "I don’t care about type parameters":

type ^*<**, ? extends *[], **> {}

This says, "find me all types that for some silly reasons require their type parameters to be arrays (of any type)". The ** before and after are saying that any other type parameters (if any at all) may precede or succeed the array-requiring type parameter.

Matching by contents

Often you will want to match a type by its "contents". I.e. what fields and methods are declared in it. Taking inspiration from the regular Java syntax, Classif can express such requirements like this:

public type **./.*Util/ {
  public <init>(**) {}
}

This would match any class ending in "Util" in any package that happens to have a public constructor with any number of arguments (0 or more). This is usually considered a code smell because utility classes usually only contain static helper methods and therefore should not usually be instantiated.

Note
As explained further down, Classif uses <init> to refer to a constructor so that it can match it without having to know the name of the enclosing class.

Matching by usage

If the type structure provider connected to Classif supports it (by default, Classif doesn’t provide any), the queries can also match types by their usage in other classes/methods/fields declarations (not in the code of the methods).

For example, to find all types used in declarations of another type, you could write:

type ^* directly usedby %field | %method {}

class MyClass {
  %field=*;
  %method=*(**);
}

If you haven’t specified the directly before usedby the search for usages would be recursive.

A more useful example of this could be:

match %type | %method | %field;

@Stable type %type=* uses %unstable {}

@Stable type * {
  public %method=*(**) uses %unstable;
}

@Stable type * {
  public %field=* uses %unstable;
}

@Unstable public|protected type * {}

If we had @Stable and @Unstable annotations that would mark types in the codebase as stable or unstable parts of the API, the above would match all the types, fields or methods from the stable API that use some unstable API. Note that because we haven’t declared directly uses but merely uses, the search is performed recursively.

Note
It is not prescribed what is exactly meant by "use". It is up to the type structure provider supplied to Classif to establish that.

Matching Hierarchy

You now may wonder what exactly is matched by type *;. Only top level types or also inner classes? The answer is that it depends :) By default this would match all types, top-level and inner. If on the other hand you want to really only match top-level types, you need to tell Classif about it like so:

#strictHierarchy;

type * {}

The #strictHierarchy is a "pragma" that tells Classif to consider the hierarchy precisely. type * {} will only ever match top-level types, because it itself is a top-level declaration.

This takes into account the structure of the types, of course. If you for example wanted to find all types that have some inner type, you could write:

type ^ {
  type * {}
}

Without the #strictHierarchy pragma, this would match all types that have an inner class, regardless of whether they themselves are top-level or inner. With the pragma, it would only return top-level types with an inner class.

Matching Methods

We’ve already seen some example of the fact that Classif can work with methods. In this chapter we will go into the details of what is possible and how.

In the normal Java code, methods are always enclosed in some class. This is of course reflected in Classif quries.

interface * {
  default ^*(**);
}

would return all default methods of any interface (public or private).

Let’s try to decipher that. First we declare what type should the method be in (it should be an interface type with any name and any visibility (because we don’t require any visibility)). Then we declare the method in that type - it should be a default method with any name (*) and any number of arguments (glob **). The ^ tells the query to return the method as the result of the query.

If our query looked like this:

interface ^* {
  default *(**);
}

we’d be looking for all interface types that have at least one default method.

Matching by annotations

Similarly to types, methods, too, can be queried by the declared annotations.

Note
Classif currently doesn’t support type-use annotations introduced in Java 8.

If you wanted to convert all method-based JPA definitions to field-based definitions, because your project policy dictates so, you could find all the violators like this:

type * {
  @javax.persistence.*
  ^*(**);
}

The above query could be rephrased as "In all types, look for any method with any number of parameters that is annotated by an annotation from the javax.persistence package.

As with types, Classif supports specifying the required annotation attributes.

Matching by names

This is very much the same as with types except the fact that method names are simple identifiers and don’t contain any hierarchy. It therefore doesn’t make sense to use the ** glob to match them.

The method names can either be spelled out in full, matched by the * glob or matched using a regular expression:

type * {
  namedMethod();

  *(int);

  /__.*/(**);
}

This would match any type that would have a method called namedMethod that would have no parameters, in addition the type would also have a method of any name with a single parameter of type int and finally the type would have a method with any number of parameters and a name starting with __.

Matching by return type

You may have noticed that in all the examples above, the methods we were looking for lacked any information about their return type. In accordance with the Classif principles, we didn’t care about the return type and therefore we didn’t declare any.

If you wanted to match methods based on the return type though, you could certainly do so.

type * {
   !void /set.*/(*);
}

This could be rephrased as "find all setters that don’t return void". We see a couple of things here. First we don’t specify the visibility, so it is not considered. Then we specify the return type (as we would in Java code) but we negate it. Then we use a regular expression to match the name of the method. Finally we require that the method has a single parameter of any type.

Matching by parameters

We’ve already seen examples of matching by method parameters in the sections above. That’s because Classif can only distinguish a field declaration from a method declaration by the presence of the parameters on the method (this is the only violation of the "what’s not specified is not considered" principle).

Therefore you always need to specify what kind of parameters a method should have. The parameters are matched using a glob, so you can either spell out their type names, use for matching any type or * to match any number of parameters.

type * {
  methodWithNoParameters();

  methodWithOneParameter(*);

  methodWithZeroOrMoreParameters(**);

  methodWithIntAndPossiblySomeOtherParameters(int, **);
}

Of course, you can also match methods based on the presence of annotations on the method parameters.

type * {
  *(@javax.enterprise.event.Observes *);
}

This will find all the CDI event handlers. In any type, look for methods with any name that have a single parameter of any type that is annotated by the @Observes annotation.

Matching by throws declarations

As you can declare what exceptions can be thrown from a java method, you can also match the methods using the thrown exceptions with Classif.

type * {
  ^*(**) throws %e;
}

class %e=* extends java.lang.RuntimeException;

This will return all the methods in all types that are declared to throw any exception that inherits from RuntimeException. Methods don’t need to declare that and so this may be flagged in some way.

Matching by usage

Similarly to types, methods can also be matched by what types they use. This can either be a direct usage (e.g. the return type of a method, one of its parameters, …​) or indirect usage as well. The semantics of what determines an indirect usage of a type is left to the structure provider that is supplied to Classif.

type * {
  public ^*(**) uses sun.misc.Unsafe;
}

This will return all the methods that somehow reference the sun.misc.Unsafe class. Again the semantics of what is a use of a type is left to the structure provider.

Matching by overriding

You can also match all methods that override some other method.

type * {
    ^*(**) overrides;
}

This will return all methods in all types that override some other method from any supertype.

type ^* {
  *(**) overrides from %t;
}

@com.acme.DoNotOverride
type %t=* {}

This will return all types that declare some methods that override methods declared in any type annotated by the @com.acme.DoNotOverride annotation. Yes, this is a contrived example.. :)

Matching by default value

If you want want to for example find all annotation types that declare some attributes without a default value, you could:

@interface ^* {
  *() !default;
}
Note
Because what is not specified is not considered, Classif somehow needs to express the lack of a default differently than Java, which merely omits the declaration of the default value.

It is also possible to match by certain default value:

@interface * {
  ^*() default = {*, *, **}
}

This will return all the annotation attributes of any annotation type that have a default value which is an array with 2 or more elements.

Matching by type parameters

This is very much the same as in the case of types.

Matching Fields

Matching fields is very similar to matching methods, only simpler, because fields don’t have that many moving parts.

type ^* {
  public !final *;
}

This will match any type that has a public mutable field of any name and any type.

Matching different types of elements in a single statement

It sometimes might be too verbose to wrap everything in a type * {…​}. Classif therefore supports "generic" matches that can match types, fields or methods using a single declaration. Because all these types don’t share many common characteristics, you can only match by visibility, the annotations present on them or on the fact whether they use some type. Notice that you cannot even match them by name, because that type names are hierarchical, whereas method and field names aren’t.

@com.acme.Stable public|protected ^* uses %u;

@com.acme.Unstable type %u=*;

This will return any public or protected type, method or field annotated as @Stable that uses, in any sense determined by the structure provider, a type that is annotated as @Unstable.

Developer Guide

TODO