Sunday, March 9, 2008

python4ply

python4ply is a parser for the Python language, written in Python. The grammar definition uses PLY, a parsing system for Python modelled on lex/yacc. The parser rules use the "compiler" module from the standard library to build a Python AST and to generate byte code for .pyc files.

This is a place for you to leave comments about python4ply and its documentation.

10 comments:

Ralph Corderoy said...

Very interesting, thanks.

Some typos: the new t_DEC_NUMBER() definition uses the old regexp instead of the "_"-enabled one given earlier. Also, is it wise to allow 42___ as a number?

The regexp in t_BIN_NUMBER() also accepts 0b_.

Cheers, Ralph Corderoy.
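For comparison, here is a minimal sketch of stricter token patterns that would reject 42___ and 0b_. It uses Python's re module directly rather than PLY, and the pattern and function names are hypothetical, not the tutorial's actual t_DEC_NUMBER and t_BIN_NUMBER rules.

```python
import re

# Hypothetical stricter patterns (not python4ply's rules): an underscore may
# appear only between two digits, or between the 0b prefix and the first digit.
DEC_NUMBER = re.compile(r"[1-9](?:_?[0-9])*|0")
BIN_NUMBER = re.compile(r"0[bB](?:_?[01])+")

def is_dec(text):
    """True if the whole text is an acceptable decimal literal."""
    return DEC_NUMBER.fullmatch(text) is not None

def is_bin(text):
    """True if the whole text is an acceptable binary literal."""
    return BIN_NUMBER.fullmatch(text) is not None

for literal in ("1_234", "42___", "0b1_0", "0b_"):
    print(literal, is_dec(literal) or is_bin(literal))
```

A lexer rule built on these patterns would still strip the underscores before conversion, e.g. int("1_234".replace("_", "")).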

Andrew Dalke said...

Thanks. I fixed the t_DEC_NUMBER definition and updated the online version of the tutorial.

I thought about the "_" problem and decided to leave it the way it is. The point was to show how to tweak the lexer, and I think I gave enough warnings that there are things like this to worry about.

I took the idea from Perl. Trying it out now, I see that Perl allows things like:

$a = 1_2_3_4;
$b = 5__6;
$c = 0b__;
print "a=$a and b=$b and c=$c\n";

a=1234 and b=56 and c=0

which matches what I did, though I didn't plan it that way.
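A minimal sketch of the same lenient behaviour in Python, assuming the hypothetical patterns below (not the tutorial's code): any run of underscores after the first digit is accepted and simply deleted before conversion, and a binary literal with no digits is treated as 0, the way Perl treats 0b__.

```python
import re

# Hypothetical lenient patterns in the spirit of the Perl output above.
LENIENT_DEC = re.compile(r"[0-9][0-9_]*")
LENIENT_BIN = re.compile(r"0[bB][01_]*")

def perl_like_value(text):
    """Convert a lenient numeric literal, treating a digitless '0b' as 0."""
    if LENIENT_BIN.fullmatch(text):
        digits = text[2:].replace("_", "")
        return int(digits, 2) if digits else 0  # Perl gives 0 for 0b__
    if LENIENT_DEC.fullmatch(text):
        return int(text.replace("_", ""))
    raise ValueError("not a numeric literal: %r" % text)

for text in ("1_2_3_4", "5__6", "0b__"):
    print(text, "->", perl_like_value(text))  # 1234, 56, 0
```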

Jay Camp said...

Andrew,

Would it be possible to use this as a way to maintain backwards compatibility of 2.5 code with 3.0?

Seems like it wouldn't be too hard syntax-wise. I guess the big thing would be wrapping APIs that have changed. Would it be possible to have two implementations of dict, for instance? (Well, I guess there'd likely be one implementation, the 3.0 one, with a wrapper for the 2.5 interface.) Any thoughts? Thanks!
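One way to picture the wrapper idea is a dict subclass that restores some Python 2 methods on top of the Python 3 dict. This is a hypothetical sketch (the class name Py2Dict is made up) covering only a few methods.

```python
class Py2Dict(dict):
    """Hypothetical dict subclass exposing part of the Python 2 dict API."""

    def has_key(self, key):  # removed in Python 3
        return key in self

    def keys(self):  # Python 2 returned a list snapshot, not a view
        return list(super().keys())

    def items(self):
        return list(super().items())

    def iteritems(self):  # Python 2's lazy variant
        return iter(super().items())

d = Py2Dict(a=1, b=2)
print(d.has_key("a"), sorted(d.keys()))
```

The obvious limitation is that dict literals and most library functions still return plain dicts, so code would have to construct Py2Dict explicitly; rewriting calls at the AST level is one way around that.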

Ralph Corderoy said...

Hi Andrew, I think you're right to leave the "_" problem as it is since, as you say, the code is pedagogical.

However, unlike Perl, which gives int(0) for 0b_, t_BIN_NUMBER() attempts int('', 2), which raises an exception, so the code doesn't match Perl (which some may say is a good thing).

Cheers, Ralph Corderoy.

Andrew Dalke said...

Okay, I fixed t_BIN_NUMBER so it raises a syntax error for cases like "0b_". Thanks again for pointing that out, Ralph.
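The underlying problem is that int("", 2) raises ValueError once every "digit" turns out to be an underscore. A minimal sketch of the check, assuming the token text has already matched a binary pattern; bin_literal_value and its lineno parameter are hypothetical names, and the actual python4ply rule, which works through PLY's token objects, is not shown here.

```python
def bin_literal_value(text, lineno=1):
    """Convert '0b...' text, raising SyntaxError when no digits remain."""
    digits = text[2:].replace("_", "")
    if not digits:
        # Without this check int("", 2) would raise ValueError instead.
        raise SyntaxError("invalid binary literal %r on line %d" % (text, lineno))
    return int(digits, 2)

print(bin_literal_value("0b1_01"))  # 5
```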

Jay, Python 3 will come with a 2to3 converter that does syntax-level transformation of Python 2 code into Python 3. I think that's the tool you're thinking of.

There are limitations in the conversion. One thought I've had is to modify the AST and add run-time checks to either support Python 2 idioms under Python 3 (e.g., dict.keys() returns a set-like view rather than a list) or to generate warnings in Python 2 that code will break in Python 3.
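A minimal sketch of that idea, written against Python 3's ast module rather than the compiler module python4ply uses: every zero-argument obj.keys() call is rewritten into a call to a helper that decides at run time whether to return a list. The helper name py2_keys is hypothetical, and the rewrite is purely syntactic, so any method named keys is routed through the check.

```python
import ast

def py2_keys(obj):
    """Run-time check: give dicts the Python 2 list result, leave others alone."""
    if isinstance(obj, dict):
        return list(obj.keys())
    return obj.keys()

class KeysToRuntimeCheck(ast.NodeTransformer):
    """Rewrite each zero-argument `obj.keys()` into `py2_keys(obj)`."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "keys"
                and not node.args and not node.keywords):
            new = ast.Call(func=ast.Name(id="py2_keys", ctx=ast.Load()),
                           args=[func.value], keywords=[])
            return ast.copy_location(new, node)
        return node

def compile_py2_style(source, filename="<py2>"):
    tree = KeysToRuntimeCheck().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)  # give the new Name node line numbers
    return compile(tree, filename, "exec")

namespace = {"py2_keys": py2_keys}
exec(compile_py2_style("d = {'b': 2, 'a': 1}\nks = d.keys()\nks.sort()\n"), namespace)
print(namespace["ks"])  # ['a', 'b']
```

The same hook could instead emit a warning when running under Python 2, which covers the other half of the idea.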

Anonymous said...

Andrew,

I'm using python4ply and I'm very happy with it. Thank you for this great work.

Unfortunately, I just discovered that the new Python 2.6 deprecates the compiler module and that Python 3.0 will drop it.

Do you intend to modify python4ply so that it does not rely on compiler and compiler.ast anymore?

Cheers,
Franck

Andrew Dalke said...

Thanks for using it, and for your compliment.

I learned about the compiler module deprecation when I was developing python4ply. The compiler module is cumbersome to use, and I considered supporting the ast module (which was available even before 2.5 as the undocumented _ast module) but there were a few features missing.

The undocumented _ast module was read-only, so there was no way to generate byte code from it. The 2.6 ast module lets you build an AST and emit byte code, so it's a good replacement. But there's still an advantage to using the compiler module: it let me do my monkeypatch to allow expression assignment.
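To illustrate building an AST and emitting byte code, here is a sketch using the modern (Python 3.8+) ast module. The 2.6 module used different node classes (Num rather than Constant, for example), so this is not 2.6 code. compile() accepts the hand-built tree directly and returns a code object, the kind of object that gets marshalled into a .pyc file.

```python
import ast

# Build the tree for `answer = 6 * 7` by hand rather than by parsing text.
tree = ast.Module(
    body=[ast.Assign(
        targets=[ast.Name(id="answer", ctx=ast.Store())],
        value=ast.BinOp(left=ast.Constant(6), op=ast.Mult(),
                        right=ast.Constant(7)))],
    type_ignores=[])
ast.fix_missing_locations(tree)  # compile() requires line numbers on every node
code = compile(tree, "<generated>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["answer"])  # 42
```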

I've considered pushing for an unused op-code like that, designed for people developing PVM-based languages. But I'm not involved enough to do that and there's really very little justification for such a change.

This is a project I did in my free time, and it's a surprising amount of work, especially if you include the GardenSnake and LOLPython languages which were sort of practice versions. So no, I can't say I'm going to do anything with it.

Feel free to modify the code to support the AST module!

Unknown said...

python4ply parses "\n" as Module('\n', Stmt([])) whereas it should be Module(None, Stmt([Discard(Const('\\n'))]))

Unknown said...

Sorry, I was wrong. But this behavior is unintuitive.

Unknown said...

'\n' really should be '\\n'. Otherwise the generated text and the source text are not identical. The same goes for "\"" and similar escapes.
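The point about escapes can be checked with the standard library (Python 3 here). The parser stores only the decoded string value, so "\n" and "\x0a" produce identical trees, while the tokenize module keeps each literal exactly as written. A sketch with hypothetical helper names:

```python
import ast
import io
import tokenize

def docstring_value(source):
    """Return the decoded value the parser stores for a module's first string."""
    return ast.get_docstring(ast.parse(source), clean=False)

def string_token_texts(source):
    """Return string literals exactly as spelled in the source text."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]

for src in ['"\\n"\n', '"\\x0a"\n', '"\\""\n']:
    print(repr(docstring_value(src)), string_token_texts(src))
```

A tool that needs to regenerate the original source would therefore have to carry the token text alongside the AST, since repr() of the stored value does not always reproduce the source spelling.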