OPAL enables you to transform Java bytecode into a 3-address code (TAC), which is sometimes also called "Quadruples Code". The standard representation provided by OPAL is always in SSA form and is created after performing a low-level, highly configurable data-flow analysis; depending on the configured low-level analysis, the amount of additional information about the program's values may differ greatly. In the simplest case, only basic type information is provided. In more advanced cases, constant computations are performed, constants are propagated, must-alias information is provided, dead paths due to programming errors or context-dependent impossible paths are automatically pruned, and so on. In all cases, the 3-address code (called TAC in the following) immediately provides complete def-use information, and the control-flow graph is also reified.
A very simple way to explore the TAC generated by OPAL is to use the TAC tool (org.opalj.tac.TAC). It is part of the Abstract Interpretation Framework. To use it, start the sbt console, change to the Abstract Interpretation Framework project, and specify the method for which you want to have the TAC.
current_folder$ sbt shell
> project AbstractInterpretationFramework
> run -source <source jar; e.g., ../bi/target/scala-2.13/resource_managed/test/ai.jar> -class <name of the class; e.g., ai.domain.DeadVariables> -method <name of the method; e.g., initialValueIsAlwaysDead>
If the method is found, you will get the control-flow graph and the TAC generated using a lightweight abstract interpretation/advanced data-flow analysis.
Alternatively, you can download a pre-packaged jar containing the TAC tool from our download site (OPALTACDisassembler.jar); you can run it using java -jar OPALTACDisassembler.jar.
A very straightforward way to get a method's TAC is to use the respective ProjectInformationKey, as shown next.
Step 0 - import the respective classes:
import org.opalj.ai._; import org.opalj.br._ ; import org.opalj.br.analyses._; import org.opalj.br.instructions._; import org.opalj.tac._
Step 1 - Load a project's class files - here we load a test project which is part of OPAL - and also load the public API of the JDK; the latter is primarily necessary to get the complete type hierarchy.
implicit val p = Project(new java.io.File("OPAL/bi/target/scala-2.13/resource_managed/test/ai.jar"),org.opalj.bytecode.RTJar)
Step 2 - Tell the project that it should make the 3-address code of the method available, when required:
val tac = p.get(ComputeTACAIKey)
Step 3 - To get the TAC of a specific method with a body just look it up:
val cf = p.classFile(ObjectType("ai/domain/IntegerValuesFrenzy")).get
val m = cf.findMethod("someSwitch").head
val taCode = tac(m)
In some cases it is desirable to exchange the underlying data-flow analysis that is used as a foundation for creating the TAC. This can easily be done by updating the project's AIDomainFactoryKey. The latter is used when computing the TAC and can be changed before the key (ComputeTACAIKey) is passed to the project (<Project>.get(ComputeTACAIKey)).
E.g., to set the domain to the simplest/most basic supported domain, you can use:
p.updateProjectInformationKeyInitializationData(AIDomainFactoryKey) {
    case None               => Set(classOf[domain.l0.BaseDomainWithDefUse[_]])
    case Some(requirements) => requirements + classOf[domain.l0.BaseDomainWithDefUse[_]]
}
For example, creating the TAC using the simplest domain – and a production build of OPAL – takes roughly one third of the time of using the default domain. However, the default domain provides much more information and is therefore the recommended base analysis. To be more precise, on a Core i7 (4 Cores - Sandy Bridge, 2011) generating the standard TAC for the rt.jar of the JDK 8u121 using the simplest possible domain takes 13 seconds and for the default domain 33 seconds.
To better understand OPAL's three-address code representation, we will discuss some examples.
You can generate the 3-address code on your own using the steps discussed above.
Loops and Exceptions
Consider the following very simple loop implemented in Java:
public static void endless() {
    while (true) {
        System.out.println(System.nanoTime());
    }
}
The three-address code will be:
static void endless(){
    // <start>, 3 ->
    0:/*pc=0:*/ lv0 = java.lang.System.out
    1:/*pc=3:*/ lv1 = java.lang.System.nanoTime()
    // ! - potentially uncaught exception/abnormal return

    // 1 ->
    2:/*pc=6:*/ {lv0}/*java.io.PrintStream*/.println({lv1})
    // ! - potentially uncaught exception/abnormal return

    // 2 ->
    3:/*pc=9:*/ goto 0
}
As stated above, the def-use chains and the control-flow graph are directly made available. For example, the assignment statement with index 0 initializes the local variable lv0 with the value of System.out. This local variable is then used by statement 2 as the receiver object of the println call, which is given the value of the local variable 1 (lv1).
Given that the underlying analysis by default only uses intra-procedural information, it cannot determine whether the called methods potentially throw any exceptions or not and therefore has to assume that they potentially do. In the above code, the basic block boundaries of the underlying code are determined by empty lines.
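The predecessor comments in the listing (e.g., // <start>, 3 ->) can be reproduced by inverting the successor edges. The following is a minimal sketch in plain Java, independent of OPAL; the class CfgSketch and its methods are hypothetical, the edges are copied by hand from the listing above, and the exceptional edges (marked with !) as well as the <start> node are deliberately left out.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model (not OPAL API): statement-level control flow of endless(). */
final class CfgSketch {

    /** Normal successor edges of the four statements, read off the listing above. */
    static Map<Integer, List<Integer>> endlessLoop() {
        return Map.of(
            0, List.of(1),
            1, List.of(2),
            2, List.of(3),
            3, List.of(0)); // "goto 0" closes the loop
    }

    /** Inverts the successor relation to obtain the predecessors of each statement. */
    static Map<Integer, SortedSet<Integer>> predecessors(Map<Integer, List<Integer>> successors) {
        Map<Integer, SortedSet<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> edge : successors.entrySet()) {
            for (int target : edge.getValue()) {
                result.computeIfAbsent(target, k -> new TreeSet<>()).add(edge.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(predecessors(endlessLoop()));
    }
}
```

Statement 0 ends up with predecessor 3, which matches the 3 in the first comment of the listing; in OPAL itself this information is read from the reified CFG rather than recomputed.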
On The Origin Of Values
Now, let's assume that the target stream is determined by a parameter:
public static void endless(boolean toErr) {
    PrintStream out = toErr ? System.err : System.out;
    while (true) {
        out.println(System.nanoTime());
    }
}
In this case the initial three-address code will be:
static void endless(boolean){
    /* PARAMETERS: param1: useSites={0} (origin=-2) */
    0:/*pc=1:*/ if({param1} == 0) goto 3

    // 0 ->
    1:/*pc=4:*/ lv1 = java.lang.System.err
    2:/*pc=7:*/ goto 4

    // 0 ->
    3:/*pc=10:*/ lv3 = java.lang.System.out

    // 2, 3 ->
    4:/*pc=13:*/ ; // <= NOP can be optimized away

    // 7, 4 ->
    5:/*pc=15:*/ lv5 = java.lang.System.nanoTime()
    // ! - potentially uncaught exception/abnormal return

    // 5 ->
    6:/*pc=18:*/ {lv1, lv3}/*java.io.PrintStream*/.println({lv5})
    // ! - potentially uncaught exception/abnormal return

    // 6 ->
    7:/*pc=21:*/ goto 5
}
In the above example, the static method endless defines a parameter which is immediately used by the first statement with index 0 (useSites); the parameter is not used any further, since useSites for param1 contains only one value. Def/use information is always directly available at a local-variable initialization (DVar) or usage site (UVar).
Parameters of methods always get origins in the range [-2-method.parametersCount..-2]. This way a trivial check (-512 < origin < 0) is sufficient to determine that a parameter is used. Furthermore, the origin -1 is reserved for this if the method is an instance method. For example, a method with the parameters (Object o, int i, double d, Float[] fs) will have the origins o -> -2, i -> -3, d -> -4 and fs -> -5, independent of the method being static or not. By mapping the explicitly declared parameters as described, an analysis can handle static and instance methods similarly.
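The numbering scheme can be written down as simple arithmetic. The following is a minimal sketch in plain Java; ParameterOrigins and its method names are hypothetical helpers for illustration and are not part of OPAL. It assumes, as in the example above, that the first explicitly declared parameter gets origin -2 and each further parameter the next lower value.

```java
/** Hypothetical helper (not OPAL API) that models the origin numbering of parameters. */
final class ParameterOrigins {

    /** Origin reserved for "this" in instance methods. */
    static final int THIS_ORIGIN = -1;

    /** Origin of the explicitly declared parameter with the given 0-based index. */
    static int originOfParameter(int index) {
        return -2 - index;
    }

    /** Inverse mapping: 0-based index of the declared parameter with the given origin. */
    static int parameterIndex(int origin) {
        return -2 - origin;
    }

    /** The range check quoted in the text: -512 < origin < 0. */
    static boolean isParameterOrigin(int origin) {
        return -512 < origin && origin < 0;
    }

    public static void main(String[] args) {
        String[] names = {"o", "i", "d", "fs"};
        for (int index = 0; index < names.length; index++) {
            System.out.println(names[index] + " -> " + originOfParameter(index));
        }
    }
}
```

Because the mapping never depends on whether the method is static, the same index arithmetic applies to both kinds of methods; only the origin -1 is specific to instance methods.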
Furthermore, given that the def-use information is explicitly reified, the information that the receiver object of the println call (statement 6) is either lv1 or lv3, i.e., System.err or System.out, is directly available.
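To illustrate what "directly available" means here, the following plain-Java sketch resolves the def-sites of the receiver to the expressions that define them. It is only a hand-written model of the listing above: DefUseSketch, DEFINITIONS and RECEIVER_DEF_SITES are hypothetical names, and in OPAL the def-sites would be read from the UVar of the call instead of being hard-coded.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical model (not OPAL API) of the def-use information of statement 6. */
final class DefUseSketch {

    /** Statement index -> defining expression, copied from the listing above. */
    static final Map<Integer, String> DEFINITIONS = Map.of(
        1, "java.lang.System.err",
        3, "java.lang.System.out",
        5, "java.lang.System.nanoTime()");

    /** Def-sites of the receiver {lv1, lv3} used by the println call in statement 6. */
    static final List<Integer> RECEIVER_DEF_SITES = List.of(1, 3);

    /** Looks up the defining expression for every def-site of the receiver. */
    static List<String> possibleReceivers() {
        return RECEIVER_DEF_SITES.stream()
            .map(DEFINITIONS::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        possibleReceivers().forEach(System.out::println);
    }
}
```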
Getting Type Information (Type Checks and Type Casts)
The following demonstrates how to get advanced type information about a specific value. Given the following code:
package ai;

import java.io.FileNotFoundException;

class MethodsWithTypeChecks {
    public static FileNotFoundException requiredCast(Cloneable o) {
        if (!(o instanceof FileNotFoundException)) return null;
        return (FileNotFoundException) o;
    }
}
We can get the 3-address code (using, e.g., the sbt console) as follows (the method is part of OPAL's test suite):
import org.opalj._
val p = br.analyses.Project(new java.io.File("OPAL/bi/target/scala-2.13/resource_managed/test/ai.jar"))
val tacAIKey = p.get(tac.ComputeTACAIKey)
val code = tacAIKey(p.classFile(br.ObjectType("ai/MethodsWithTypeChecks")).get.findMethod("requiredCast").head)
The 3-address code of the above method is shown next:
static java.io.FileNotFoundException requiredCast(java.lang.Cloneable){
    /* PARAMETERS: param1: useSites={0,4,5} (origin=-2) */
    0:/*pc=1:*/ lv0 = {param1} instanceof java.io.FileNotFoundException
    1:/*pc=4:*/ if({lv0} != 0) goto 4

    // 1 ->
    2:/*pc=7:*/ lv2 = null
    3:/*pc=8:*/ return {lv2}

    // 1 ->
    4:/*pc=10:*/ (java.io.FileNotFoundException) {param1}
    // ! - potentially uncaught exception/abnormal return

    // 4 ->
    5:/*pc=13:*/ return {param1}
}
Given the method's three-address code, we can now get the definition sites and the upper type bound of the method's return values as follows:
Get all return statements using the CFG (the implicit cast _.asBasicBlock is safe, because a CatchNode cannot be a predecessor of an ExitNode):
val returnStmts = code.cfg.normalReturnNode.predecessors.iterator.map(bb => code.stmts(bb.asBasicBlock.endPC))
Use simple pattern matching to get the definition sites and the value information:
for { tac.ReturnValue(pc,tac.UVar(v,defSites)) <- returnStmts} {
println(defSites.mkString(", ") + " => "+ v.asDomainReferenceValue)
}
In this case, the def-site is 2 for the first return statement (index: 3) and -2 for the last return statement; i.e., in the latter case the value of the first (formal/explicitly declared) parameter is returned, but now the type is FileNotFoundException with Cloneable; i.e., if the method returns successfully (statement 5), then the returned value inherits from java.io.FileNotFoundException and also implements the marker interface Cloneable. Note that the precise type information that is available is determined by the underlying data-flow analysis; however, type information at the described level is generally available.
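The refined type can also be observed at run time. The following plain-Java sketch, which does not use OPAL, repeats the logic of requiredCast and calls it with two arguments; TypeBoundDemo and CloneableFileNotFound are hypothetical names introduced only for this illustration. Only a value that is both a FileNotFoundException and a Cloneable can reach the second return statement.

```java
import java.io.FileNotFoundException;

/** Hypothetical demo class: same logic as requiredCast in the text, without OPAL. */
final class TypeBoundDemo {

    static FileNotFoundException requiredCast(Cloneable o) {
        if (!(o instanceof FileNotFoundException)) return null;
        return (FileNotFoundException) o;
    }

    /** Hypothetical type that satisfies both upper bounds at once. */
    static final class CloneableFileNotFound extends FileNotFoundException implements Cloneable {
        private static final long serialVersionUID = 1L;
    }

    public static void main(String[] args) {
        Cloneable array = new int[0]; // arrays are Cloneable, but not exceptions
        Cloneable both = new CloneableFileNotFound();
        System.out.println(requiredCast(array));        // prints "null"
        System.out.println(requiredCast(both) == both); // prints "true"
    }
}
```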
Parameter origins lie in the range [-2...-2-#Parameters].
[0..index of the last instruction]
ValueExpr
The pcs of those instructions are computed by negating the original pc and subtracting 1 (-pc-1).
A pc directly references the program counter/index into the instruction array of the original bytecode instruction. The pc can generally be used to look up line numbers and similar information in the original code attributes. (Recall that negative pcs have to be transformed first if necessary.)
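The -pc-1 transformation is its own inverse, so the same arithmetic recovers the original pc from a negative one. The following is a minimal sketch in plain Java; PcEncoding and its method names are hypothetical and not part of OPAL, and it assumes that every negative pc was produced by exactly this transformation.

```java
/** Hypothetical helper (not OPAL API) for the -pc-1 transformation described above. */
final class PcEncoding {

    /** Negates the original pc and subtracts 1; the result is always negative. */
    static int encode(int pc) {
        return -pc - 1;
    }

    /** Undoes encode; the formula is the same because the mapping is an involution. */
    static int decode(int encodedPc) {
        return -encodedPc - 1;
    }

    /** Returns a pc usable for lookups in the original code attributes. */
    static int originalPc(int pc) {
        return pc < 0 ? decode(pc) : pc;
    }

    public static void main(String[] args) {
        int encoded = encode(13);
        System.out.println(encoded + " -> " + originalPc(encoded));
    }
}
```

Since pc 0 is mapped to -1, no encoded value collides with a regular, non-negative pc.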