Next post: Cellular Auto

Strangeparse

Let's say I want to be able to parse an arbitrary expression like
f(f( a + f(a * b * f(c))))

where a, b,c are variables, and f is some function. I want to make a parse tree. Last night I thought of a very unorthodox way to do this.

I know about yacc/lex, pyparsing, and Python's ast. However, for what I have in mind this is all I need:
  • Python supports overloading operators.
  • Python has an "eval" ability to dynamically run code.
My idea is to create Python objects for f, a, b, c, and then *eval* the expression as if it were Python code(!)

The objects I create will have their operators overloaded so that a+b, for example, returns an expression like ['+', a, b]. And in the end, I'll get a parse tree just like I wanted. Here's an example that supports addition, multiplication, and functions taking any number of arguments. (Adding the rest of the operators is simple).
class FunctionSymbol:
  def __init__(self, name):
    self.name = name
  def __call__(self, *args):
    return SubExp(self, list(args))
  def __str__(self):
    return 'FN_'+self.name

class VariableSymbol:
  def __init__(self, name):
    self.name = name
  def __add__(self, other):
    return SubExp('OP_ADD', [self, other])
  def __mul__(self, other): 
    return SubExp('OP_MULT', [self, other])
  def __str__(self):
    return 'VAR_'+self.name

class SubExp(VariableSymbol):
  def __init__(self, op, args):
    self.op = op; self.args= args
  def __str__(self):
    return str(self.op) + '(' + ','.join(map(str, self.args)) + ')'

def strangeparser(s):
  #parse something by evaluating it as if it were Python code
  symbols = {}
  #create objects for the symbols in the string 
  snospace = s.replace(' ','').replace('\t','') + ' '
  import re
  for match in re.finditer('[a-zA-Z_]+', snospace):
    strfound = match.group()
    if strfound not in symbols:
      #assume that if the next character is "(", then it is a function
      if snospace[match.end()]=='(':
        symbols[strfound] = FunctionSymbol(strfound)
      else:
        symbols[strfound] = VariableSymbol(strfound)
  # evaluate it
  try: 
    return eval( s , globals(), symbols )
  except Exception, e: 
    print 'Could not parse. %s' % str(e)
    return None
  
def main():
  tree = strangeparser('f(f( a+f(a * b * f(c))))')
  print tree
  # it works!

I could even use this to write a compiler -- I wrote another script to turn the tree into this,
input:
 f(f( a+f(a * b * f(c))))

output:
 i1=OP_MULT(a,b);
 i2=FN_f(c);
 i3=OP_MULT(i1,i2);
 i4=FN_f(i3);
 i5=OP_ADD(a,i4);
 i6=FN_f(i5);
 i7=FN_f(i6);
 return i7;

I'm currently working on a way to conserve the temporary variables used here, because there is probably a way to reuse them after they aren't needed. What I do is go down to the bottom of the tree, find something that can be evaluated, and replace that deepest node with a temporary variable. I then repeat that process until the whole tree has been "flattened".

The code has been added to GitHub here under the GPLv3.