Friday, August 8, 2008

Python bug or not?

I posted an example of how reading 0 bytes from a file can cause Python to run out of memory. Is it a bug?


Doug Napoleone said...

It's a bug, plain and simple.

Andrew Dalke said...

I started to file a bug about it but couldn't figure out where the behavior was specified.

Ehh, I'll file a bug and see what happens.

Andrew Dalke said...

Posted as issue 3531.

Chuck Mason said...

Correct behavior.

In order to execute a read() you need memory to store that which you read.

Unless Python performs a file size check internally (which a LOT of people would argue against) there's no way for the system to fail before allocating the storage.

The other alternative is for Python to "guess" at a chunk size and, say, read 1024 bytes at a time until it gets to the requested size. I disagree with this method too, as different block sizes affect performance on different platforms. I would prefer to leave that up to the (ideally) intelligent programmer.

It's up to the programmer to be smart about block allocations.
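
For illustration, the "read a fixed-size chunk at a time" alternative described above could be sketched in user code roughly as follows. This is only a sketch: the name `read_up_to` and the 1024-byte default are hypothetical, and it says nothing about how Python's own read() is implemented.

```python
import io


def read_up_to(f, size, chunk_size=1024):
    """Read at most `size` bytes from binary file `f`, `chunk_size` bytes at a time.

    Hypothetical helper: memory use follows the data actually present,
    not the requested `size`.
    """
    parts = []
    remaining = size
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:  # end of file reached before `size` bytes
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# Asking for 10 MB from a 3-byte file never allocates a 10 MB buffer.
small = read_up_to(io.BytesIO(b"abc"), 10 * 1024 * 1024)
```

The trade-off is the one raised above: the chunk size is a guess, and the best value differs between platforms.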


Anonymous said...

Well, keeping the preallocated memory around doesn't seem right, but maybe there is a reason.

But having 10 MB blocks when the files are just a few KB doesn't seem very clever either. ;)

Andrew Dalke said...

Chuck: Python already does chunked reads when doing "read()". The implementation uses exponentially larger sizes to get amortized constant overhead.
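
A rough pure-Python sketch of that idea, assuming a binary file object. The name `read_all` and the starting size are made up for the example, and this is not the actual C code in the interpreter:

```python
import io


def read_all(f, initial=8192):
    """Read everything from binary file `f`, doubling the request size each round.

    Illustrative only: doubling keeps the number of read calls
    logarithmic in the size of the file.
    """
    buf = bytearray()
    chunk_size = initial
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buf += chunk
        chunk_size *= 2  # ask for twice as much next time
    return bytes(buf)


data = bytes(range(256)) * 1000
result = read_all(io.BytesIO(data), initial=16)
```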

Another implementation possibility is to resize the final string if the preallocated space is very much larger than the actual string. The extra overhead for this is very small.
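
In Python terms, the shrink-if-wasteful idea might look something like the sketch below. The name `read_sized` and the `slack` factor are hypothetical, and a `bytearray` stands in for the preallocated string buffer:

```python
import io


def read_sized(f, size, slack=2):
    """Read up to `size` bytes, preallocating `size` bytes up front.

    Hypothetical sketch: if far less than `size` was read, copy the data
    into a fresh exact-size buffer instead of shrinking the big one in place.
    """
    buf = bytearray(size)    # preallocate the full requested size
    n = f.readinto(buf) or 0  # bytes actually read
    if n * slack < size:
        # Mostly wasted space: copying n bytes is cheap, and the
        # oversized buffer can then be released as a whole.
        buf = buf[:n]
    else:
        del buf[n:]          # close to full: just truncate in place
    return buf


tiny = read_sized(io.BytesIO(b"hello"), 10 * 1024 * 1024)
```

The copy only happens when the data is small relative to the request, which is why the extra cost stays low.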

Andrew Dalke said...

Interesting. It seems that Darwin (and likely FreeBSD) don't implement realloc the way I thought they would. This is a known problem with Python on Macs. See:

* Bob Ippolito's blog post

* Bob's thread on this on python-dev

* Issue1092502 which has the same problem

Doug Napoleone said...

After looking everything over, I would say that there should still be a patch to Python to account for the arguably buggy behavior of realloc on target platforms. There are other places in the Python C code where we ifdef to get around bad or insecure platform behavior; this strikes me as being no different.

Doug Napoleone said...

NOTE: this is my personal opinion and is not shared by most of the core developers ;-)

Andrew Dalke said...

I read the thread between Tim Peters and Bob Ippolito on this. I don't think there will be a change to Python. The main question is, why should Python code have #ifdef or other fixes for this if there's no evidence that this is an actual problem in correctly working code?

I came up with some highly contrived denial of service attacks, but nothing that was a problem in real life.