Sunday, March 9, 2008

python4ply

python4ply is a Python parser for the Python language. The grammar definition uses PLY, a parser system for Python modelled on yacc/lex. The parser rules use the "compiler" module from the standard library to build a Python AST and to generate byte code for .pyc files.

This is a place for you to leave comments about python4ply and its documentation.

5 comments:

Ralph said...

Very interesting, thanks.

Some typos: The new t_DEC_NUMBER() definition doesn't have the _-added regexp given previously but the old one. Also, is it wise to allow 42___ as a number?

t_BIN_NUMBER() allows 0b_ in the regexp.

Cheers, Ralph Corderoy.

Andrew Dalke said...

Thanks. I fixed the t_DEC_NUMBER definition and updated the online version of the tutorial.

I thought about the "_" problem and decided to leave it the way it is. The point was to show how to tweak the lexer, and I think I gave enough warnings that there are things like this to worry about.

I'm taking the idea from Perl. Trying it out now, I see that Perl allows things like:

$a = 1_2_3_4;
$b = 5__6;
$c = 0b__;
print "a=$a and b=$b and c=$c\n";

a=1234 and b=56 and c=0

which matches what I did, though I didn't plan it that way.
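For comparison, here is a minimal Python sketch of the same permissive behaviour. It is not the tutorial's actual lexer code: the `DEC_NUMBER` pattern and the `parse_decimal` helper are hypothetical names, chosen only to show a regexp that accepts repeated and trailing underscores and a conversion that strips them.

```python
import re

# Hypothetical pattern (an assumption, not the tutorial's regexp):
# one digit followed by any mix of digits and underscores.
DEC_NUMBER = re.compile(r"[0-9][0-9_]*")


def parse_decimal(text):
    """Return the int for a decimal literal that may contain underscores."""
    if DEC_NUMBER.fullmatch(text) is None:
        raise SyntaxError("invalid decimal literal: %r" % (text,))
    # Drop the underscores before handing the digits to int().
    return int(text.replace("_", ""))


print(parse_decimal("1_2_3_4"))  # 1234
print(parse_decimal("5__6"))     # 56
print(parse_decimal("42___"))    # 42
```

Because the pattern requires a leading digit, something like "_1" is still rejected, but "42___" goes through, which is the case Ralph asked about.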

Jay said...

Andrew,

Would it be possible to use this as a way to maintain backwards compatibility of 2.5 code with 3.0?

Seems like it wouldn't be too hard syntax-wise. I guess the big thing would be wrapping APIs that have changed. Would it be possible to have two implementations of dict, for instance? (Well, I guess there'd likely be one implementation, the 3.0 one, with a wrapper for the 2.5 interface.) Any thoughts? Thanks!

Ralph said...

Hi Andrew, I think you're right to leave the "_" problem as it is since, as you say, the code is pedagogical.

However, unlike Perl, which gives int(0) for 0b_, t_BIN_NUMBER() attempts int('', 2), which raises an exception, so the code doesn't match Perl (which some may say is a good thing).

Cheers, Ralph Corderoy.

Andrew Dalke said...

Okay, I fixed t_BIN_NUMBER so it raises a syntax error for cases like "0b_". Thanks again for pointing that out, Ralph.
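A minimal sketch of that kind of check, under stated assumptions: the `BIN_NUMBER` pattern and the `parse_binary` helper below are hypothetical and are not the actual python4ply code. The point is only that the underscores are stripped first, and an empty digit string is reported as a syntax error rather than being passed to int('', 2), which raises ValueError.

```python
import re

# Hypothetical pattern (an assumption): like the regexp Ralph describes,
# it matches "0b_" even though that literal contains no binary digits.
BIN_NUMBER = re.compile(r"0[bB][01_]*")


def parse_binary(text):
    """Return the int for a binary literal that may contain underscores."""
    if BIN_NUMBER.fullmatch(text) is None:
        raise SyntaxError("invalid binary literal: %r" % (text,))
    digits = text[2:].replace("_", "")
    if not digits:
        # int("", 2) would raise ValueError; report a syntax error instead.
        raise SyntaxError("binary literal has no digits: %r" % (text,))
    return int(digits, 2)


print(parse_binary("0b1_01"))  # 5
```

With this check, "0b_" and a bare "0b" both raise SyntaxError instead of leaking a ValueError from int().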

Jay? Python 3 will come with a 2to3 converter which does syntax-level transformation of Python 2 code to Python 3. I think that's the tool you're thinking of.

There are limitations in the conversion. One thought I've had is to modify the AST and add run-time checks, either to support Python 2 idioms under Python 3 (e.g., dict.keys() returns a set-like view in Python 3 rather than a list) or to generate warnings in Python 2 that code will break in Python 3.
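As a rough illustration of the AST idea, here is a sketch that runs under Python 3 and uses the standard-library `ast` module, which is an assumption on my part; python4ply itself builds its trees with the older "compiler" module. The `KeysToList` class and the `run_with_py2_keys` helper are hypothetical names. The transform wraps every zero-argument `.keys()` call in `list(...)`, so code written for the Python 2 behaviour still gets a list.

```python
import ast


class KeysToList(ast.NodeTransformer):
    """Rewrite every `<expr>.keys()` call into `list(<expr>.keys())`."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite any nested calls first
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "keys"
                and not node.args and not node.keywords):
            wrapped = ast.Call(
                func=ast.Name(id="list", ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(wrapped, node)
        return node


def run_with_py2_keys(source):
    """Compile and run `source` with .keys() calls wrapped in list()."""
    tree = KeysToList().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # give the new nodes line numbers
    namespace = {}
    exec(compile(tree, "<py2-style>", "exec"), namespace)
    return namespace


# Python 2 style code: sorts the key list in place.
source = "d = {'b': 2, 'a': 1}\nks = d.keys()\nks.sort()\n"
print(run_with_py2_keys(source)["ks"])  # ['a', 'b']
```

The rewrite is purely syntactic, so it also wraps `.keys()` calls on objects that are not dicts. A more careful version would insert a run-time type check or emit a warning instead of converting unconditionally.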