Next post: Cellular Auto


Let's say I want to be able to parse an arbitrary expression like
f(f( a + f(a * b * f(c))))

where a, b,c are variables, and f is some function. I want to make a parse tree. Last night I thought of a very unorthodox way to do this.

I know about yacc/lex, pyparsing, and Python's ast. However, for what I have in mind this is all I need:
  • Python supports overloading operators.
  • Python has an "eval" ability to dynamically run code.
My idea is to create Python objects for f, a, b, c, and then *eval* the expression as if it were Python code(!)

The objects I create will have their operators overloaded so that a+b, for example, returns an expression like ['+', a, b]. And in the end, I'll get a parse tree just like I wanted. Here's an example that supports addition, multiplication, and functions taking any number of arguments. (Adding the rest of the operators is simple).
class FunctionSymbol:
  def __init__(self, name): = name
  def __call__(self, *args):
    return SubExp(self, list(args))
  def __str__(self):
    return 'FN_'

class VariableSymbol:
  def __init__(self, name): = name
  def __add__(self, other):
    return SubExp('OP_ADD', [self, other])
  def __mul__(self, other): 
    return SubExp('OP_MULT', [self, other])
  def __str__(self):
    return 'VAR_'

class SubExp(VariableSymbol):
  def __init__(self, op, args):
    self.op = op; self.args= args
  def __str__(self):
    return str(self.op) + '(' + ','.join(map(str, self.args)) + ')'

def strangeparser(s):
  #parse something by evaluating it as if it were Python code
  symbols = {}
  #create objects for the symbols in the string 
  snospace = s.replace(' ','').replace('\t','') + ' '
  import re
  for match in re.finditer('[a-zA-Z_]+', snospace):
    strfound =
    if strfound not in symbols:
      #assume that if the next character is "(", then it is a function
      if snospace[match.end()]=='(':
        symbols[strfound] = FunctionSymbol(strfound)
        symbols[strfound] = VariableSymbol(strfound)
  # evaluate it
    return eval( s , globals(), symbols )
  except Exception, e: 
    print 'Could not parse. %s' % str(e)
    return None
def main():
  tree = strangeparser('f(f( a+f(a * b * f(c))))')
  print tree
  # it works!

I could even use this to write a compiler -- I wrote another script to turn the tree into this,
 f(f( a+f(a * b * f(c))))

 return i7;

I'm currently working on a way to conserve the temporary variables used here, because there is probably a way to reuse them after they aren't needed. What I do is go down to the bottom of the tree, find something that can be evaluated, and replace that deepest node with a temporary variable. I then repeat that process until the whole tree has been "flattened".

The code has been added to GitHub here under the GPLv3.