PyIB Thoughts (7)

2 Name: Anonymous : 2010-03-17 17:31 ID:Heaven [Del]

All right, I'm up for a bit of a code review... let's see what a casual look through the source code reveals.

database.py
  • This comment worries me slightly:
    # Although SQLAlchemy is optional, it is highly recommended
    This suggests that all the SQL code has to be duplicated in order to handle both cases, and the Settings._.USING_SQLALCHEMY is a gigantic red-flag, because that suggests that any duplication probably isn't being done with a subclass, it's a bunch of hacked-together conditionals. But maybe this is a low level thing that's patched up in some other file. Moving on.
fcgi.py
  • Nothing interesting here, this is 3rd party code. Might be worthwhile to get the original version and diff the two, to check for any "curious" changes, but I'm not going to do that here.
formatting.py
  • For starters, the tripcode algorithm uses try: / except:, with no exception class, and a pass on the except clause. Stylistically this is horrible, not to mention, why would there be an error with string encoding? And why is it ok to just ignore it? A comment answering these questions clearly would be nice; avoiding such constructs would be better.
  • That aside, I see that the post_ variables in nameBlock() are taken as-is and dumped into html-formatted strings. Hopefully they are being escaped elsewhere when that function is called.
  • clickableURLs is a damn mess of a function.
  • matchCrossThreadRefLinks() has this gem:
    parentid = FetchOne("SELECT `parentid` FROM `posts` WHERE `id` = '%s' LIMIT 1" % postid)["parentid"]
    Jesus H. Christ! Is that raw SQL that doesn't escape its parameter? What the hell is this, PHPython? That function had better be receiving a properly sanitized SQL string, that's for fucking sure... ok, it looks like its parameter is coming from a regex match against [0-9]+, so it "should" be fine, but the question of what that function receives simply shouldn't ever need to be asked. It's playing with fire.
  • onlyAllowedHTML() is amazing. Never mind that it allows only nonsemantic <b>/<i><u>/<strike>, it's using regexes and not bothering to check for proper nesting.
  • markdown() is apparently not bothering to sanitize the html output. The markdown spec allows for raw html, so it's way too easy to write a <script> tag that does all sorts of bad things. Keep that in mind, though; who knows, maybe the function that calls that is doing the leg work.
framework.py
  • More raw SQL-strings. Some of them are at least run through _mysql.escape_string, but what if we're not using MySQL? (Whatever happened to SQLAlchemy, by the way? All of this looks to be hard-coded for MySQL. Frickin' ugh.)
  • I see usage of the pickle module .... stored inside the database! What the hell?! This might be cute but to me, it's just got Bad Idea written all over it. I never use pickle unless I'm making a one-off hack, it's insecure; and speaking from experience, it can cause subtle bugs should you ever decide to change the data structure. There's already a perfectly good SQL database at hand; use it!
  • IP addresses are stored as text? Not really a security issue, but it's kind of WTF-ish.
  • More WTF code: int(time.mktime(t.timetuple())) - hey, why not run it through ctime() and strftime(), too!
    There are several other bits like that, like the str(round(..., 2)) a few lines down from that.
  • The md5 module (line 144) is hella deprecated; that function needs a rewrite.
img.py
  • Would be nice to use os.path.join / posixpath.join as appropriate here. No problem as is though.
  • I see os.system() with interpolated parameters. This really really should be rewritten to use subprocess.call(); a space or something in the board's pathname will cause all hell to break loose here.
  • more raw SQL here... hardly worth mentioning anymore
manage.py
  • Passwords are md5-encoded in the database (good) with no salt (bad). So if someone manages to get a dump of the DB it would be fairly easy to crack them. (This is a minor issue; if someone gets the dump of your DB then you have bigger problems anyway)
  • What the hell? The manager password is stored plaintext in the login cookie?! Combine that with an XSS exploit that posts the cookie to some remote site, and you're suddenly broadcasting your full login details. Nice, considering that we might have one with markdown (formatting.py, above).
  • This is an unnecessarily huge function, that's almost entirely encapsulated in the else-clause of a giant if statement. Badly in need of refactoring.
  • Not going to delve into potential security issues after login is validated, if that stuff's fucked up, the board admin's the one who broke it anyway :)
markdown.py
  • More 3rd party code, although apparently this is not the markdown version I'm familiar with.
post.py
  • This is doing one hell of a lot of SQL queries for just rebuilding some HTML files. Chances are pretty good that the majority of the table will need to be read anyway, might as well pull it all and save a lot of round trips between the app and the DB.
  • This uses threads and job queues. What the fuck? Is this really necessary?
pyib.py
  • Question: Why do so many parts of this file call a function in another file, and raise an exception on failed condition? Why not just raise the exception in the other file?
  • Ok, so markdown is apparently a board related setting, and it always runs that nonsensical html-stripping code even when markdown is enabled. Fucking insane. Guess that clears up the issue of embedding script tags (but it's still badly coded, there's no getting around that fact).
  • Another question: how large is the message field in the database? The quote-links could potentially expand the post far beyond the 8000 characters that it's initially trimmed to (line 148); if there's a database-enforced limit there is definite potential for tags to be truncated -- which opens up some XSS possibilities.
  • lol psyco. I bet that saves a whole millisecond of processing time.
settings.py
  • Way to put some of your own settings in there for distribution, Trevor.
template.py
  • Nothing interesting here
tenjin.py
  • Another 3rd party module. Tenjin looks disgusting, btw. Figures a PHP programmer would pick such an abomination.
Name: Link:
Leave these fields empty (spam trap):
More options...
Verification: