Suppose I have an expression like this:
f(f( a + f(a * b * f(c))))
where a, b, c are variables, and f is some function. I want to build a parse tree for it. Last night I thought of a very unorthodox way to do this.
I know about yacc/lex, pyparsing, and Python's ast module. However, for what I have in mind, these two facts are all I need:
- Python supports overloading operators.
- Python has a built-in "eval" that runs code dynamically.
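A minimal sketch of how these two features combine, before the full example. The class name Sym and the namespace dict are made up for this illustration and are not part of the parser itself:

```python
class Sym:
    """Hypothetical minimal symbol: '+' builds a list instead of computing."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Operator overloading: a + b returns a small tree, not a number.
        return ['+', self, other]

    def __repr__(self):
        return self.name


# eval() resolves the names in the string through the dict passed to it.
namespace = {'a': Sym('a'), 'b': Sym('b')}
result = eval('a + b', {}, namespace)
print(result)  # prints ['+', a, b]
```

Python's own parser does the work of handling precedence and parentheses; the overloaded operator only records what was applied to what.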
The objects I create will have their operators overloaded so that a+b, for example, returns an expression like ['+', a, b]. In the end, I get a parse tree, just as I wanted. Here's an example that supports addition, multiplication, and functions taking any number of arguments. (Adding the rest of the operators is simple.)
    import re

    class FunctionSymbol:
        def __init__(self, name):
            self.name = name
        def __call__(self, *args):
            # calling f(x, y) builds a tree node instead of computing anything
            return SubExp(self, list(args))
        def __str__(self):
            return 'FN_' + self.name

    class VariableSymbol:
        def __init__(self, name):
            self.name = name
        def __add__(self, other):
            return SubExp('OP_ADD', [self, other])
        def __mul__(self, other):
            return SubExp('OP_MULT', [self, other])
        def __str__(self):
            return 'VAR_' + self.name

    class SubExp(VariableSymbol):
        # inherits the overloaded operators, so sub-expressions combine too
        def __init__(self, op, args):
            self.op = op
            self.args = args
        def __str__(self):
            return str(self.op) + '(' + ','.join(map(str, self.args)) + ')'

    def strangeparser(s):
        # parse something by evaluating it as if it were Python code
        symbols = {}
        # create objects for the symbols in the string
        # (the trailing space keeps snospace[match.end()] in range)
        snospace = s.replace(' ', '').replace('\t', '') + ' '
        for match in re.finditer('[a-zA-Z_]+', snospace):
            strfound = match.group()
            if strfound not in symbols:
                # assume that if the next character is "(", then it is a function
                if snospace[match.end()] == '(':
                    symbols[strfound] = FunctionSymbol(strfound)
                else:
                    symbols[strfound] = VariableSymbol(strfound)
        # evaluate it
        try:
            return eval(s, globals(), symbols)
        except Exception as e:
            print('Could not parse. %s' % e)
            return None

    def main():
        tree = strangeparser('f(f( a+f(a * b * f(c))))')
        print(tree)
        # it works!

    if __name__ == '__main__':
        main()
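As a rough sketch of what "adding the rest of the operators" could look like, the block below adds subtraction, negation, and the reflected methods (`__radd__`, `__rsub__`, `__rmul__`) so that a plain number on the left, as in `2 * a`, also builds a node. The names Node, Var, Exp, OP_SUB, and OP_NEG are assumptions made for this illustration; they stand in for the classes above rather than extending them.

```python
class Node:
    """Hypothetical base class standing in for VariableSymbol."""
    def __add__(self, other):
        return Exp('OP_ADD', [self, other])
    def __radd__(self, other):
        # Called for `other + self` when `other` does not know how to add a Node.
        return Exp('OP_ADD', [other, self])
    def __sub__(self, other):
        return Exp('OP_SUB', [self, other])
    def __rsub__(self, other):
        return Exp('OP_SUB', [other, self])
    def __mul__(self, other):
        return Exp('OP_MULT', [self, other])
    def __rmul__(self, other):
        return Exp('OP_MULT', [other, self])
    def __neg__(self):
        # Unary minus takes a single argument.
        return Exp('OP_NEG', [self])


class Var(Node):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return 'VAR_' + self.name


class Exp(Node):
    def __init__(self, op, args):
        self.op = op
        self.args = args
    def __str__(self):
        return self.op + '(' + ','.join(map(str, self.args)) + ')'


a, b = Var('a'), Var('b')
print(2 * a - b)  # prints OP_SUB(OP_MULT(2,VAR_a),VAR_b)
print(-a + 1)     # prints OP_ADD(OP_NEG(VAR_a),1)
```

Numeric literals are left as plain Python ints inside the tree here; wrapping them in their own node type would be a separate design choice.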
I could even use this to write a compiler: I wrote another script that turns the tree into a flat sequence of assignments, like this.
input:
f(f( a+f(a * b * f(c))))
output:
i1=OP_MULT(a,b);
i2=FN_f(c);
i3=OP_MULT(i1,i2);
i4=FN_f(i3);
i5=OP_ADD(a,i4);
i6=FN_f(i5);
i7=FN_f(i6);
return i7;
I'm currently working on a way to conserve the temporary variables used here, since a temporary can probably be reused once nothing else needs it. To flatten the tree, I go down to the bottom, find a node that can be evaluated directly, and replace that deepest node with a temporary variable. I then repeat that process until the whole tree has been "flattened".
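The flattening step can be sketched as a post-order walk of the tree. This is not the script mentioned above, only an illustration of the idea: it assumes a simplified tree where a leaf is a plain string and an inner node is an (op, args) tuple, and the names flatten and visit are made up for this example. It does not try to reuse temporaries.

```python
import itertools


def flatten(tree):
    """Return a list of statements for a nested (op, args) tree."""
    lines = []
    counter = itertools.count(1)

    def visit(node):
        # Leaves (plain variable names) need no temporary.
        if isinstance(node, str):
            return node
        op, args = node
        # Children are flattened first, left to right, so the deepest
        # nodes are emitted before the nodes that depend on them.
        names = [visit(arg) for arg in args]
        temp = 'i%d' % next(counter)
        lines.append('%s=%s(%s);' % (temp, op, ','.join(names)))
        return temp

    result = visit(tree)
    lines.append('return %s;' % result)
    return lines


# Hand-built tree for f(f( a+f(a * b * f(c))))
tree = ('FN_f', [('FN_f', [('OP_ADD', [
    'a',
    ('FN_f', [('OP_MULT', [('OP_MULT', ['a', 'b']), ('FN_f', ['c'])])]),
])])])

lines = flatten(tree)
print('\n'.join(lines))
```

For this tree the walk emits the same eight statements as the output listed above, from i1=OP_MULT(a,b); through return i7;.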
The code has been added to GitHub here under the GPLv3.