Read JavaScript Source Code Using an AST

Bart Ledoux

Let’s say you have a big JavaScript file left over from the old days. It’s 70K lines long and you desperately need to split it up using webpack or a similar bundler. To do that, you first need to know which functions and constants it exposes to the global scope.

Let a computer read through your code and extract what you want from it.

It’s a job for Abstract Syntax Trees (AST).

AST Hero

We’ll start with a small example. Our mission, should you choose to accept it, is to extract the names of all the functions exposed in the global scope.

// test the code
function decrementAndAdd(a, b) {
  function add(c, d) {
    return c + d;
  }
  a--;
  b = b - 1;
  return add(a, b);
}

// test the code
function incrementAndMultiply(a, b) {
  function multiply(c, d) {
    return c * d;
  }
  a++;
  b = b + 1;
  return multiply(a, b);
}

The result should be ["decrementAndAdd", "incrementAndMultiply"].


Parsing the Code

An AST is the result of parsing code. For JavaScript, an AST is a JavaScript object containing a tree representation of your source. Before we use it, we have to create it. Depending on the code we are parsing, we choose the appropriate parser.
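To make that concrete, here is a trimmed, hand-written version of the kind of ESTree object a parser such as acorn returns for a one-line function (real output also carries position fields such as start and end, which are left out here). The countNodes helper is a hypothetical illustration, not part of any parser API: it relies on the ESTree convention that every node is an object with a string type property.

```javascript
// Trimmed, hand-written ESTree AST for:
//   function add(c, d) { return c + d; }
// Real parser output also includes position info (start, end, loc).
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "add" },
      params: [
        { type: "Identifier", name: "c" },
        { type: "Identifier", name: "d" },
      ],
      body: {
        type: "BlockStatement",
        body: [
          {
            type: "ReturnStatement",
            argument: {
              type: "BinaryExpression",
              operator: "+",
              left: { type: "Identifier", name: "c" },
              right: { type: "Identifier", name: "d" },
            },
          },
        ],
      },
    },
  ],
};

// Hypothetical helper: counts every node in the tree.
// In ESTree, any object with a string `type` property is a node.
function countNodes(value) {
  if (Array.isArray(value)) {
    return value.reduce((sum, item) => sum + countNodes(item), 0);
  }
  if (value === null || typeof value !== "object") {
    return 0; // strings, numbers and booleans are properties, not nodes
  }
  let count = typeof value.type === "string" ? 1 : 0;
  for (const key of Object.keys(value)) {
    count += countNodes(value[key]);
  }
  return count;
}

console.log(countNodes(ast)); // prints 10
```

Even this one-liner yields ten nodes, which is why the rest of the article is about narrowing the search.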

Here, since the code is ES5-compatible, we can use the acorn parser.

Here are some of the best-known open-source ECMAScript parsers:

Parser | Supported languages | GitHub
acorn | esnext & JSX (using acorn-jsx) | https://github.com/acornjs/acorn
esprima | esnext & older | https://github.com/jquery/esprima
cherow | esnext & older | https://github.com/cherow/cherow
espree | esnext & older | https://github.com/eslint/espree
shift | esnext & older | https://github.com/shapesecurity/shift-parser-js
babel | esnext, JSX & TypeScript | https://github.com/babel/babel
TypeScript | esnext & TypeScript | https://github.com/Microsoft/TypeScript

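All these parsers work the same basic way: give them some code, get an AST back.

const { readFileSync } = require("fs");
const { Parser } = require("acorn");

// fileName: path of the legacy file to analyze
const ast = Parser.parse(readFileSync(fileName, "utf8"), { ecmaVersion: 5 });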

The TypeScript parser’s API is a little different, but it is well documented.

This is the tree obtained by parsing the example code with @babel/parser:

Figure: tree graph from @babel/parser

Traversing

To find what we want to extract, it’s usually better not to process the whole AST at once. The AST is a large object: even a short snippet produces dozens of nodes, and a file of real size produces many thousands. So, before we extract the information we need, we narrow our search.

The best way to do that is to keep only the node types we care about.

Once again, many tools are available for the traversal. For our example we are going to use recast. It’s fast and has the advantage of keeping a copy of your original code, so it can return the parts of your code you want with their original formatting. That formatting-preserving behavior applies when the code is parsed with recast.parse, which can be configured to use acorn as its underlying parser.

While traversing, we want to find all the function declaration nodes, which is why we use the visitFunctionDeclaration method.

If we wanted to look at variable assignments, we would use visitAssignmentExpression instead.

const { readFileSync } = require("fs");
const recast = require("recast");
const { Parser } = require("acorn");

const ast = Parser.parse(readFileSync(fileName, "utf8"), { ecmaVersion: 5 });

recast.visit(ast, {
  visitFunctionDeclaration(path) {
    // the navigation code here...

    // return false to stop at this depth
    return false;
  }
});
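Under the hood, a method like visitFunctionDeclaration or visitAssignmentExpression is picked by matching each node’s type. The sketch below imitates that dispatch without any dependencies; visit is a hypothetical helper, not recast’s implementation, and the AST is hand-written for two statements from our example, a--; and b = b - 1;. Note that a-- is an UpdateExpression, so visitAssignmentExpression only reports b: the exact node type matters.

```javascript
// Hand-written ESTree nodes for the statements `a--; b = b - 1;`
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "UpdateExpression",
        operator: "--",
        prefix: false,
        argument: { type: "Identifier", name: "a" },
      },
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "b" },
        right: {
          type: "BinaryExpression",
          operator: "-",
          left: { type: "Identifier", name: "b" },
          right: { type: "Literal", value: 1, raw: "1" },
        },
      },
    },
  ],
};

// Hypothetical type-based dispatch: for a node of type T, call
// visitors["visit" + T] if it exists. Returning false skips the node's
// children; any other return value lets the walk continue.
// (recast instead requires calling this.traverse(path) to continue.)
function visit(value, visitors) {
  if (Array.isArray(value)) {
    value.forEach((item) => visit(item, visitors));
    return;
  }
  if (value === null || typeof value !== "object") return;
  if (typeof value.type === "string") {
    const method = visitors["visit" + value.type];
    if (method && method(value) === false) return;
  }
  for (const key of Object.keys(value)) visit(value[key], visitors);
}

const assigned = [];
visit(ast, {
  visitAssignmentExpression(node) {
    assigned.push(node.left.name);
  },
});
console.log(assigned); // prints [ 'b' ]
```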

AST node types

The names of the node types are not always obvious. You can use AST Explorer (https://astexplorer.net) to look them up: paste your code in the left panel, select the parser you are using, and “voilà!”. Browse the parsed tree on the right to find the node type you’re looking for.
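AST Explorer displays the tree as an indented outline. For a quick look without leaving Node, a small hypothetical printer can produce a similar outline from any ESTree object. Running it on a hand-written AST for var total = count + 1; shows why the names are not obvious: the statement is a VariableDeclaration containing a VariableDeclarator, not a node named after "assignment".

```javascript
// Hand-written ESTree AST for: var total = count + 1;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "total" },
          init: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Identifier", name: "count" },
            right: { type: "Literal", value: 1, raw: "1" },
          },
        },
      ],
    },
  ],
};

// Hypothetical printer: one line per node, indented by depth, with
// identifier names in parentheses. Non-node properties are skipped.
function outline(value, depth = 0, lines = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => outline(item, depth, lines));
    return lines;
  }
  if (value === null || typeof value !== "object") return lines;
  let childDepth = depth;
  if (typeof value.type === "string") {
    const label =
      value.type === "Identifier" ? `${value.type} (${value.name})` : value.type;
    lines.push("  ".repeat(depth) + label);
    childDepth = depth + 1;
  }
  for (const key of Object.keys(value)) outline(value[key], childDepth, lines);
  return lines;
}

console.log(outline(ast).join("\n"));
```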

Shallow or deep

We don’t always want to look at every level of the tree. Sometimes we want to do a deep search while other times we just want to look at the top layer. Depending on the framework, the syntax differs. Fortunately, it’s usually well documented.

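With recast, to stop searching below the current node, return false from the visitor method when you are done with it; this is what we did above. To go deeper into the node’s children, call this.traverse(path) instead, as in the snippet below. A recast visitor method must do one or the other: if it neither returns false nor calls this.traverse(path), recast throws an error.

With @babel/traverse, there is no need to tell Babel where to continue: it visits child nodes by default. To skip the children of the current node, call path.skip(); to halt the whole traversal, call path.stop().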

recast.visit(ast, {
  visitFunctionDeclaration(path) {
    // deal with the path here...

    // continue the visit into every child node
    this.traverse(path);
  }
});
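To see the difference between stopping and going deep side by side, here is a dependency-free sketch. It does not use recast: walk is a hypothetical hand-rolled walker that mimics the "return false to stop at this depth" convention (but, unlike recast, continues by default), and the AST is a simplified, hand-built skeleton of the example file with parameters and statements left out.

```javascript
// Simplified ESTree skeleton of the example: each function keeps only its
// name and the nested function declaration in its body.
const ident = (name) => ({ type: "Identifier", name });
const fn = (name, body) => ({
  type: "FunctionDeclaration",
  id: ident(name),
  params: [],
  body: { type: "BlockStatement", body },
});

const ast = {
  type: "Program",
  body: [
    fn("decrementAndAdd", [fn("add", [])]),
    fn("incrementAndMultiply", [fn("multiply", [])]),
  ],
};

// Depth-first walk; if onNode returns false, the node's children are skipped.
function walk(value, onNode) {
  if (Array.isArray(value)) {
    value.forEach((item) => walk(item, onNode));
    return;
  }
  if (value === null || typeof value !== "object") return;
  if (typeof value.type === "string" && onNode(value) === false) return;
  for (const key of Object.keys(value)) walk(value[key], onNode);
}

// Collects function names; `deep` decides whether to look inside bodies.
function functionNames(root, deep) {
  const names = [];
  walk(root, (node) => {
    if (node.type !== "FunctionDeclaration") return true;
    names.push(node.id.name);
    return deep; // false = stop at this depth, true = keep going
  });
  return names;
}

const shallow = functionNames(ast, false); // ["decrementAndAdd", "incrementAndMultiply"]
const deep = functionNames(ast, true); // ["decrementAndAdd", "add", "incrementAndMultiply", "multiply"]
console.log(shallow, deep);
```

The shallow search gives exactly the result our mission asks for; the deep one also picks up the helper functions nested inside.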

We went from a very broad search to a smaller sample. We can now extract the data we need.

The path object passed to visitFunctionDeclaration is a NodePath. Rather than being the node itself, it wraps the node together with information about where that node sits in the tree, such as its parent (path.parent). On its own, the path is not the data we want.

Using AST Explorer, we can find which properties of the node hold the data we are looking for.

The usual move is to read path.node, which gives the AST node the path points to. Since we are visiting function declarations, path.node is a node of type FunctionDeclaration, and the function’s name is stored in path.node.id.name:

const functionNames = [];
recast.visit(ast, {
  visitFunctionDeclaration(path) {
    console.log(path.node.type); // prints "FunctionDeclaration"
    functionNames.push(path.node.id.name); // adds the function's name to the array

    // return false to avoid looking inside the function's body:
    // we stop our search at this level
    return false;
  }
});
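To see what separates a path from its node, here is a toy version of the idea; it is not recast’s NodePath class. The hypothetical walkPaths helper hands the callback a small { node, parent } object, so path.node is the function declaration itself and path.parent leads back up the tree. The AST is again a simplified, hand-built skeleton of the example.

```javascript
const ident = (name) => ({ type: "Identifier", name });
const fn = (name, body) => ({
  type: "FunctionDeclaration",
  id: ident(name),
  params: [],
  body: { type: "BlockStatement", body },
});
const ast = {
  type: "Program",
  body: [
    fn("decrementAndAdd", [fn("add", [])]),
    fn("incrementAndMultiply", [fn("multiply", [])]),
  ],
};

// Toy path objects: each wraps a node and points to the path of the
// nearest enclosing node (arrays such as `body` get no path of their own).
function walkPaths(value, parentPath, onPath) {
  if (Array.isArray(value)) {
    value.forEach((item) => walkPaths(item, parentPath, onPath));
    return;
  }
  if (value === null || typeof value !== "object") return;
  let path = parentPath;
  if (typeof value.type === "string") {
    path = { node: value, parent: parentPath };
    if (onPath(path) === false) return;
  }
  for (const key of Object.keys(value)) walkPaths(value[key], path, onPath);
}

// found ends up as:
// ["decrementAndAdd in Program", "add in BlockStatement",
//  "incrementAndMultiply in Program", "multiply in BlockStatement"]
const found = [];
walkPaths(ast, null, (path) => {
  if (path.node.type === "FunctionDeclaration") {
    // path.node is the data; path.parent tells us where it sits
    found.push(`${path.node.id.name} in ${path.parent.node.type}`);
  }
});
console.log(found);
```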

You can nest one traversal inside another to look at subtrees. The code below collects every function declared exactly one level inside a top-level function (add and multiply in our example). It would not pick up a function inside a function inside a function:

const functionNames = [];
recast.visit(ast, {
  visitFunctionDeclaration(path) {
    const bodyPath = path.get("body");

    // sub-traversal: look for functions inside this function's body
    recast.visit(bodyPath, {
      visitFunctionDeclaration(innerPath) {
        functionNames.push(innerPath.node.id.name);
        return false;
      }
    });

    // return false to not look at other functions contained in this function;
    // leave that role to the sub-traversal
    return false;
  }
});
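The nested visits above really ask one question: how many functions deep is this declaration? As a swapped-in alternative (not the recast approach above), the sketch below answers it in a single walk with a depth counter. The AST is a hand-built skeleton of the example plus a hypothetical third-level function, inner, placed inside add to show that it is skipped. Only FunctionDeclaration nodes count as a level here; function expressions and arrow functions would need extra cases.

```javascript
const ident = (name) => ({ type: "Identifier", name });
const fn = (name, body) => ({
  type: "FunctionDeclaration",
  id: ident(name),
  params: [],
  body: { type: "BlockStatement", body },
});

// Skeleton of the example, plus a hypothetical `inner` function nested in `add`.
const ast = {
  type: "Program",
  body: [
    fn("decrementAndAdd", [fn("add", [fn("inner", [])])]),
    fn("incrementAndMultiply", [fn("multiply", [])]),
  ],
};

// Returns names of function declarations nested exactly `level` functions deep
// (level 1 = top level, level 2 = a function inside a top-level function).
function functionsAtLevel(value, level, depth = 0, names = []) {
  if (Array.isArray(value)) {
    value.forEach((item) => functionsAtLevel(item, level, depth, names));
    return names;
  }
  if (value === null || typeof value !== "object") return names;
  let childDepth = depth;
  if (value.type === "FunctionDeclaration") {
    childDepth = depth + 1;
    if (childDepth === level) {
      names.push(value.id.name);
      return names; // no need to look deeper than the target level
    }
  }
  for (const key of Object.keys(value)) {
    functionsAtLevel(value[key], level, childDepth, names);
  }
  return names;
}

const secondLevel = functionsAtLevel(ast, 2); // ["add", "multiply"]
console.log(secondLevel);
```

Passing 1 instead of 2 gives back the top-level names, which is our original mission in one call.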

Mission Accomplished!! 🏅

We programmatically found all the function names. We could just as easily find the names of the arguments or the exposed variables.
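As a sketch of that, the snippet below reads parameter names and top-level var names straight off the Program node’s direct children. The AST is hand-written and simplified (function bodies are empty), and the var VERSION line is a hypothetical addition to the example. It relies on the fact that, in a classic non-module script, top-level var and function declarations become global bindings.

```javascript
const ident = (name) => ({ type: "Identifier", name });

// Hand-written, simplified ESTree AST for:
//   var VERSION = "1.0";                      (hypothetical extra line)
//   function decrementAndAdd(a, b) { ... }
//   function incrementAndMultiply(a, b) { ... }
// Function bodies are left empty to keep the example short.
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: ident("VERSION"),
          init: { type: "Literal", value: "1.0", raw: '"1.0"' },
        },
      ],
    },
    {
      type: "FunctionDeclaration",
      id: ident("decrementAndAdd"),
      params: [ident("a"), ident("b")],
      body: { type: "BlockStatement", body: [] },
    },
    {
      type: "FunctionDeclaration",
      id: ident("incrementAndMultiply"),
      params: [ident("a"), ident("b")],
      body: { type: "BlockStatement", body: [] },
    },
  ],
};

// Only direct children of Program are inspected: those are the
// declarations that end up in the global scope of an ES5 script.
const exposed = { functions: {}, variables: [] };
for (const statement of ast.body) {
  if (statement.type === "FunctionDeclaration") {
    // Assumes simple Identifier params; destructuring patterns need more work.
    exposed.functions[statement.id.name] = statement.params.map((p) => p.name);
  } else if (statement.type === "VariableDeclaration") {
    for (const declarator of statement.declarations) {
      exposed.variables.push(declarator.id.name);
    }
  }
}

// exposed is now:
// { functions: { decrementAndAdd: ["a", "b"], incrementAndMultiply: ["a", "b"] },
//   variables: ["VERSION"] }
console.log(exposed);
```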

Glossary

AST Node: one object in the tree. Examples: function declaration, variable assignment, object expression.

NodePath: the link between a parent Node and a child Node in the tree.

NodeProperty: the parts of a node’s definition. Depending on the node, it can have just a name or more information.
