Jonathan Hilgeman

Everything complex is made up of simpler things.

Using Sessions Securely

Jul-12-2009
php, programming

When using sessions, usually your biggest concern is cross-site scripting (or XSS for short). Without getting into too much depth, XSS is when one of your users can get a script of their own to run in other users’ browsers, most often to steal their cookies. The malicious user (call him Bob) writes a script that ends up displayed to other users. When another user views the page, the script reads that user’s cookie for your site from their browser, and then transmits the cookie back to Bob. At that point, Bob can take the cookie and pretend to be any of the users whose cookies he stole.

Just for explanation purposes, here’s an analogy. Let’s say you want to break into John’s house. If you had a copy of John’s key to his front door, it’d be easy, right? So all you need to do is find a way to pickpocket John and copy his key. All the door cares about is that the key fits the lock – it doesn’t care who uses it.

The door is the session authentication mechanism in PHP, and the key is your session ID. The session ID is stored inside a cookie, so there is nothing that prevents you or anyone else from just editing the cookie and changing the session ID to whatever you want. Now, if you change the session ID to something that doesn’t match up to a valid session on the server, then nothing will happen. BUT, if you change your session ID to something that -is- valid on the server, then you’ll automatically be logged into that session, no questions asked.

The security of sessions is all about the complexity of session IDs. It’d be one thing if the session ID was just a number between 1 and 100, but guessing a long, random combination of letters and numbers is impractical.
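To put rough numbers on that, here is a small sketch. It is only an illustration: PHP generates real session IDs for you, and the helper name make_example_id() and the 16-byte length are assumptions picked for the example, not a description of what PHP does internally.

```php
<?php
// Illustration only: PHP creates real session IDs itself in session_start().
// make_example_id() is a hypothetical helper showing what a long random ID looks like.
function make_example_id(int $bytes = 16): string
{
    // random_bytes() (PHP 7+) returns cryptographically secure random bytes;
    // bin2hex() turns each byte into two hexadecimal characters.
    return bin2hex(random_bytes($bytes));
}

$weakChoices   = 100;        // an ID that is just a number from 1 to 100
$strongChoices = 2 ** 128;   // 16 random bytes = 128 random bits (stored as a float)

echo "Example ID: ", make_example_id(), "\n";
echo "Possible weak IDs: ", $weakChoices, "\n";
printf("Possible strong IDs: about %.1e\n", $strongChoices);
```

An attacker can try all 100 weak IDs in seconds, while working through the second range by brute force is not realistic.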

That’s where XSS comes in – most XSS attacks are all about stealing valid session IDs so hackers don’t have to guess at which ones are valid. XSS itself is just a concept. In practice, it’s usually done with Javascript, because Javascript can read cookies (the main exception is a cookie set with the HttpOnly flag, which browsers keep hidden from scripts). It’s easy to write Javascript that will read your OWN cookies, because you can run the Javascript on your OWN computer. The trick is to get OTHER people to run your cookie-stealing Javascript on THEIR computers (especially without them knowing about it). So how do hackers do this?

Take a message board for example. I’m sure you’ve been on message boards where people have their own special “signatures” with images and favorite quotes and stuff. That’s all custom HTML / code that the users have provided after they’ve signed up. If the message board program doesn’t do any security checks on the signature, then someone could put their cookie-stealing Javascript code into their signature. Now, it’s just a waiting game. As soon as someone else “sees” that signature, they’re unknowingly running the cookie-stealing Javascript. The Javascript reads that user’s cookie (which has their session ID), and transmits it back to the hacker.

So, the ultimate point of all this is that you should ALWAYS ALWAYS ALWAYS sanitize any data before allowing it to be saved or used in any way. Generally speaking, you should never use $_GET or $_POST or $_REQUEST (or any other $_….) variables without first running them through a function that strips out characters that don’t belong. For example, if someone’s typing in their first name and sending it to your server, you should probably strip out any characters that don’t appear in first names (letters, numbers, spaces, single/double quote marks, commas, and periods are usually okay for names). Then, whenever you display the value in a page, run it through htmlspecialchars() so that anything left over is shown as plain text instead of being run as HTML or Javascript. (addslashes() won’t help here: it only puts backslashes in front of quotes, which does nothing to stop a <script> tag.)

As long as you’re properly sanitizing your data before using it, you should take care of 99% of all potential XSS attacks.
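Here is a minimal sketch of that two-step approach: filter on the way in, escape on the way out. The helper names clean_first_name() and html() are made up for this example, and the allowed-character list simply mirrors the first-name example above; adjust both to your own fields.

```php
<?php
// Hypothetical helper: keep only the characters listed as usual for first
// names (letters, numbers, spaces, quote marks, commas, periods).
function clean_first_name(string $raw): string
{
    // \p{L} = any letter, \p{N} = any digit; the /u flag treats input as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N} \'",.]/u', '', $raw);
    // preg_replace() returns null on invalid UTF-8; treat that as empty input.
    return trim($clean ?? '');
}

// Hypothetical helper: escape a value at the moment it is written into HTML.
function html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Simulated form input; in a real request this would come from $_POST.
$input = 'Bob<script>alert(document.cookie)</script>';

$name = clean_first_name($input);
echo '<p>Hello, ', html($name), '</p>', "\n";
```

The filter removes the angle brackets, so the script tag never survives, and html() covers whatever characters the filter still allows, such as quote marks.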

ParosProxy is a good open-source tool for scanning web applications and checking for security problems. There’s also a commercial tool along the same lines called Burp Suite Professional. It works in much the same way but has some better/easier reports, better recommendations, and scanning for more recent problems.

Improving Email Delivery

Mar-13-2008
email

Recently I was reading a forum post where someone was having a problem with their newsletters not being delivered to most of their recipients. I ended up writing a lengthy response with some of the different e-mail delivery tips and tricks I’ve come across over the years. Some of these are specific to PHP mailing applications. So if you want to get your e-mail into someone’s inbox, read through these items:

1. Limit the number of recipients to 1 per e-mail.

2. Use phpMailer (http://phpmailer.codeworxtech.com/), which is a free PHP application (it just recently was picked up by a company, but it’s still free from them). phpMailer gives you a lot more control over the different options when you’re creating mails, and gives better structure to the e-mail than the built-in PHP mail() function. For example, you can add “friendly” names like “Bob Johnson” to your To/From addresses, and later add HTML content or attachments if you want. You should also change the X-Mailer header within phpMailer to be something like boltMail (just a random name). This should help avoid any sort of spam filters that check for mails that originate from a web programming language like PHP or ASP.

3. Don’t use CC if you can help it. Preferably send to 1 recipient and if someone needs to be copied, use BCC.

4. Check your e-mail content to make sure you’re not using words or phrases that you see in other spam e-mails a lot. I’ve never used this service, but I’ve heard lots of good things about InboxInspector, which is supposed to check your e-mails to see how well they fare against different firewalls and spam filters, and how they’ll look in different clients. http://www.mailchimp.com/add-ons/inboxinspector/ (I don’t know if it costs anything)

5. DNS is becoming more and more important. Specifically, “reverse DNS” (rDNS, done with pointer / PTR records) is a type of record set up by your hosting company or by THEIR internet provider (whoever controls the block of IP addresses) that maps the IP address of your mail server back to a host name. That way, when some 3rd party “asks” to see what domain name an IP address belongs to, they’ll be told that it’s your domain name / mail server. Many receiving mail servers run this check and distrust mail from IP addresses that have no PTR record. Setting up rDNS isn’t something that is done overnight usually, so put in the request and explain to your hosting company what you’re trying to do. They should be able to help you, and if they can’t, then they’re probably not a very good hosting company, and you can find a hundred better ones. I personally use JaguarPC (jaguarpc.com) and have been with them for about 5 years now (but ask the ISP about rDNS before signing up).

6. Someone on the forum had mentioned DomainKeys and SPF (Sender Policy Framework). What are those? Here’s an explanation: You could write a snail mail letter, put someone else’s “From” address on the envelope and drop it in just about any mailbox in the U.S. and it would probably still reach the recipient. However, that recipient has no way of knowing whether it’s REALLY from you or someone else pretending to be you. If the postmark said it was sent from a post office in Arizona and the recipient knew that you ONLY used the post office next to your house in California, then he/she would know that it probably was not really from you. This same idea, but with e-mail, is where SPF and DomainKeys come in. SPF is basically a list, published in your DNS, of mail servers that YOU have said, “Okay, these mail servers can send mail using e-mail addresses with my domain name in them.” Then, on the recipient’s side, when the receiving server gets an e-mail, it looks at the domain of the sender address (strictly speaking, the envelope sender given during the SMTP conversation, which is usually but not always the same as the “From” address you see) and looks up that list of mail servers. Once it knows what mail servers are the ONLY ones approved to send mail from that domain, it then looks at what mail server DID send that message. If that mail server isn’t in the approved list, then the recipient can say, “This message is probably coming from a spammer who just faked their From address.” DomainKeys (and its successor, DKIM) gets to the same place a different way: your mail server adds a cryptographic signature to each outgoing message, and the recipient checks that signature against a public key that you publish in your DNS. Whoever controls your DNS (possibly your hosting provider) should have access to set up these tools. If you have a network administrator that does these things, then have him/her go to http://www.openspf.org/ for some help with SPF and http://domainkeys.sourceforge.net/ for help with DomainKeys.

7. Blacklists are quick ways to stop your e-mail from being delivered. Blacklists are simply services (usually free) that have lists of mail servers that send out spam or, for whatever reason, should not be trusted to send “good” mail. Try this tool to check your mail server’s IP address against all the major blacklists: http://www.mxtoolbox.com/blacklists.aspx Unfortunately, blacklists are tricky beasts that all work differently. Some have actually died: the lookup service is still running, but the list behind it is never updated. Normally, no self-respecting mail administrator would use one of these dead lists, but you should still make sure you’re not on any of the others. This tool should also give you links that take you to that particular blacklist’s web page so you can see details about if, (sometimes) how, (sometimes) why, and (sometimes) when you were blocked. Most blacklists will give you a way to de-list yourself, and it usually takes them at least 24 hours to process your request. Just make sure you’re not having to de-list yourself multiple times at the same blacklist – that would indicate something evil afoot.

8. The way to get yourself instantly blacklisted is to have an open relay. If you, or someone on your IP address (sometimes even close to your IP address) has one of these, then you need to figure out a way to shut it down. An open relay is simply a term that means a mail server that has no proper security set up, and basically allows ANYONE to send e-mail to anyone else for free. Open relays are used by spammers a LOT because they can send their junk mail through without any worries in the world and it doesn’t affect them or cost them a thing. Spammer heaven. If you’re unlucky enough to be in the same range of IP addresses as an open relay or someone else that has been marked as a spammer, then you need to contact your hosting company and have them deal with that other person – not much you can do.

9. If you have any mail forms on your site, then you may want to try adding captcha onto them. Mail forms, while not exactly open relays, can almost be used as ones if they let the visitor specify the “To” address anywhere where it could be changed (even in a hidden field). Sometimes all it takes is one spam message to somebody who reports it to the blacklists… Fortunately, the blacklists are there to help, not to harm, so most will give you second chances and help you with advice on how to prevent yourself from getting listed again. Usually they don’t mention “open” forms like this, because they’re not too common, so this is a “just in case you have one of these….” comment.

10. Be careful when sending HTML mail (if you do this in the future) – if you send badly-formed HTML, it can count against you in some spam filters. Also when the message contains a high image-to-text-content ratio, that’s also a red flag (remember all those spams with almost no content but one big image that had stock quotes in it?).

11. Don’t use a lot of all-caps words or lots of exclamation marks like GUARANTEED!!!!

12. Don’t use the phrase “You are receiving this because you signed up for…” etc…

13. ALWAYS include a link for unsubscribing and ONLY send to people who have requested the newsletter.

14. As a worst case scenario, there are whitelisting services. Someone mentioned to me that Hotmail has one, but that would probably be limited to Hotmail users (I might be wrong on that). There is a service called Habeas which is a popular whitelist and they have a lot of cool toys for tracking your e-mails. The downside is the high cost. For just about every project I’ve ever come across, they are prohibitively expensive. So you’ll have to weigh the potential ROI of your e-mails against their cost. If you’re not selling anything, or if you’re not a really big company, then chances are it’s not going to be worth it to spend $5k to $20k just to whitelist your newsletter. (The 5k to 20k is just a rounded range of quotes that I’ve received from them – you would need to check with them for more accurate pricing – just trying to say that they are not a one-time $100 – $1000 solution). But the option is there.

15. If you’re sending attachments, make sure you’re not sending .EXE files, password-protected ZIP/RAR/other-compressed files, .BAT, .COM, .PIF, .ELF, or any other sort of executable program file. If you do need to send a program attachment of some sort, put it in an unprotected ZIP file so it can be scanned by the recipient’s antivirus scanner. Even .DOC files are becoming hard to send without putting them into ZIP files. Generally, ZIP files and images are almost always safe to send. I’ve seen filters and scanners choke at least once on just about everything else, from MS Word docs to Powerpoints to proprietary formats to OpenOffice files. PDFs are iffy – it’s a popular format to send, but many filters and scanners have the ability to “understand” PDF files, and if the PDF files are written by a program that doesn’t do it right, then that could cause problems, too (so stick it in a ZIP file to help).

16. If you get blacklisted on Earthlink, don’t bother trying to contact their abuse department. Their abuse department is an unmanned voicemail machine. I was blacklisted once on there, and their automated service kept thinking that I was blacklisted. After several e-mails, voicemails, and finally a BBB complaint, I got nowhere.
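For tip 9 above, here is a rough sketch of one way to keep a mail form from being used as a relay: the visitor picks a recipient by a short key rather than sending an address, and any field that ends up in a mail header is rejected if it contains a line break. The addresses, keys, and function names are all invented for the example.

```php
<?php
// Hypothetical whitelist: the form submits a short key, never an address.
const RECIPIENTS = [
    'sales'   => 'sales@example.com',
    'support' => 'support@example.com',
];

// Returns the fixed address for a key, or null if the key is unknown.
function resolve_recipient(string $key): ?string
{
    return RECIPIENTS[$key] ?? null;
}

// Rejects values that could smuggle extra headers (such as Bcc:) into the mail.
function is_safe_header_value(string $value): bool
{
    return !preg_match('/[\r\n]/', $value);
}

// Simulated form input; in a real request this would come from $_POST.
$form = ['to' => 'support', 'subject' => "Hello\r\nBcc: victim@example.org"];

$to = resolve_recipient($form['to']);
if ($to === null || !is_safe_header_value($form['subject'])) {
    echo "Rejected\n";
} else {
    echo "Would send to $to\n";   // mail() or phpMailer would be called here
}
```

Because the “To” address never comes from the browser, changing a hidden field gets a spammer nowhere, and the line-break check stops extra recipients from being slipped in through the subject.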

I’m sure I’ve left a few things out that I’ll probably slap my head later and say, “D’oh – how could I not include that?” but this should give you a decent head start on getting your e-mails through to your recipients.

Speed Up Your Application!

So your PHP application is running slow… no… scratch that – slow still implies that it seems like your application is doing something after 45 seconds of loading. No, your application is a crippled duckling, dragging itself slowly towards the shoreline so it can end it all. What do you do??? Here are a few quick steps to help:

Add Log Points
Create a function that writes a message to a file. Then go through the code and add calls to this function at strategic points (e.g. after a particularly large query). In the message, dump the date and time, the __LINE__ constant (which holds the line number of the place in the file where it appears), and a brief description of what happened since the last message. If YOU can reproduce the speed problem, then it also helps to make the function only write to the log file when your IP is the one visiting the application, so your log file doesn’t fill up too quickly or with other data.

Once the log file has some data in it, you should be able to see the flow of the program and be able to determine chunks of code that are running slow. Continue to refine the locations of the function calls to drill down to the problem points.
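A minimal sketch of such a function might look like the following. The name log_point(), the log file location, and the extra “seconds since the last log point” figure are my own choices for the example rather than anything fixed.

```php
<?php
// Hypothetical helper implementing the "log point" idea described above.
// $onlyIp is optional: when set, nothing is written for other visitors.
function log_point(string $file, int $line, string $message, ?string $onlyIp = null): bool
{
    static $last = null;   // time of the previous log point in this request

    $visitor = $_SERVER['REMOTE_ADDR'] ?? null;
    if ($onlyIp !== null && $visitor !== $onlyIp) {
        return false;
    }

    $now     = microtime(true);
    $elapsed = $last === null ? 0.0 : $now - $last;
    $last    = $now;

    $entry = sprintf("%s line %d (+%.4fs) %s\n", date('Y-m-d H:i:s'), $line, $elapsed, $message);

    // FILE_APPEND adds to the file; LOCK_EX keeps concurrent requests from interleaving.
    return file_put_contents($file, $entry, FILE_APPEND | LOCK_EX) !== false;
}

$logFile = sys_get_temp_dir() . '/app-speed.log';   // assumed location

log_point($logFile, __LINE__, 'script started');
usleep(50000);                                      // stand-in for a large query
log_point($logFile, __LINE__, 'finished customer query');
```

The elapsed-time figure means the slow chunks stand out without having to subtract timestamps by hand.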

Improve Your SQL Queries
In many cases, a slow application is due to slow queries. Often, slow queries can be DRAMATICALLY improved with some very minor and safe tweaks to the database table indexes. I can’t begin to count the number of queries I’ve seen that tried to join two large tables using fields that were not indexed. There are several things to do to improve performance, but simply indexing those fields can often make a HUGE difference in query speed. Some databases allow for more specific indexing options that can make additional improvements, but nearly every database has basic indexing.

Speaking of joining tables, data types can also play a large part in performance. Joining tables on numeric field types like INT is usually much faster than joining on VARCHAR fields (although you should be VERY careful about a decision to change a VARCHAR to a numeric field). This is why it’s a good habit to add auto-incrementing, numeric ID fields to the tables you create. However, data types aren’t just important when joining. Minor improvements can be made by making sure that you’re using the right data types to store things. There’s no reason to use a BLOB or TEXT field to store a Unix timestamp, a first name, or a tiny on/off flag (would you use a crate to hold a tiny pebble?).

If you have a query with a WHERE clause that looks up more than one field, and is looking through a single, big table, then consider making a multi-field index that contains each of the fields used in the WHERE clause.

Some databases, like MySQL, have additional features that allow you to discover problematic queries. These features include things like automatically logging any queries that take longer than a certain number of seconds, or commands that will show details about the query that you’re running. For example, if you’re using MySQL, take a slow-running SELECT query and simply add the word EXPLAIN before the query. The result is a description of how MySQL runs the query, what indexes it uses (if any), and other useful information.
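To show the idea end to end, here is a rough sketch of checking a query plan before and after adding a multi-field index. Big assumption: it uses an in-memory SQLite database through PDO (so it needs the pdo_sqlite extension) purely so that it runs without a server, and SQLite spells the command EXPLAIN QUERY PLAN where MySQL uses plain EXPLAIN. The table, column, and index names are invented.

```php
<?php
// Assumption: an in-memory SQLite database stands in for MySQL so the sketch
// runs without a server. Table, column, and index names are made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    make TEXT,
    model_year INTEGER,
    price INTEGER
)');

// Returns the planner's description of how it would run a query.
// SQLite spells it EXPLAIN QUERY PLAN; in MySQL you would prefix EXPLAIN.
function explain(PDO $db, string $sql): string
{
    $rows = $db->query('EXPLAIN QUERY PLAN ' . $sql)->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$sql = "SELECT id, price FROM cars WHERE make = 'Honda' AND model_year = 2009";

echo "Before: ", explain($db, $sql), "\n";

// One multi-field index covering both columns used in the WHERE clause.
$db->exec('CREATE INDEX idx_cars_make_year ON cars (make, model_year)');

echo "After:  ", explain($db, $sql), "\n";
```

The exact wording of the plan differs between databases and versions, but the thing to look for is the same: a full table scan before, and the index name showing up afterwards.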

There are too many tricks to list here, but it’s not difficult to find out even more simple ways of optimizing your queries and your database performance. If the simplest approaches don’t fix the problem, then you may be facing a hardware issue or something more complex. Hiring a temp DBA may be a good idea here.

Use Datasets
In cases where you might be re-using a set of records from the database more than once, consider copying those records into a multi-dimensional array, using a primary key (or something else appropriate) as the index of that array. This essentially creates a “cached” version of that recordset that you can use throughout the rest of the script. When it comes time to loop through those records to generate a dropdown box or refer to a value, then you don’t need to go back to the database again. This can also help eliminate an additional JOIN from your queries if all the data you need is in that array. Datasets are most effective when they’re small so they don’t take up much memory and don’t take too much time to loop through.

An example of a good dataset would be a list of car manufacturers (not that many records, and possibly re-used multiple times throughout the rest of the page).

An example of a bad dataset would be an inventory of cars (probably too many records, and you probably wouldn’t re-use them on the same page).
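As a sketch of the manufacturer example, assuming the rows have already been fetched once (the sample data, the column names, and the function name manufacturer_options() are made up):

```php
<?php
// Stand-in for rows fetched once from the database (names are made up).
$rows = [
    ['id' => 3, 'name' => 'Ford'],
    ['id' => 7, 'name' => 'Honda'],
    ['id' => 9, 'name' => 'Toyota'],
];

// Build the dataset: primary key => row, so lookups need no loop and no query.
$manufacturers = array_column($rows, null, 'id');

// Use 1: build a dropdown box from the cached records.
function manufacturer_options(array $manufacturers, int $selectedId): string
{
    $html = '';
    foreach ($manufacturers as $id => $row) {
        $selected = $id === $selectedId ? ' selected' : '';
        $html .= sprintf(
            "<option value=\"%d\"%s>%s</option>\n",
            $id,
            $selected,
            htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

echo manufacturer_options($manufacturers, 7);

// Use 2: look up a name by ID instead of adding a JOIN to another query.
$car = ['model' => 'Civic', 'manufacturer_id' => 7];
echo $car['model'], ' is made by ', $manufacturers[$car['manufacturer_id']]['name'], "\n";
```

Keying the array by the primary key is what makes the second use cheap: the lookup is a direct array access rather than a search.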

Reduce Output
I’ve seen a lot of scheduled jobs / cron job scripts that print out a lot of output, and some of it includes calculations and additional processing simply for the purposes of outputting to the screen. But if the output isn’t being seen by anyone or processed by anything, then why send the output? Output is especially draining when it’s inside large loops, which brings us to that topic.

Take Back Control Over Loops
Lots of scripts have processes with loops that have tens of thousands, hundreds of thousands, even millions of iterations. This means that every improvement you make is multiplied by the number of times that loop runs. If you have some old debugging code that opens a log file, writes to it, and closes the file, then running that 100,000 times as fast as possible is going to be a real big system hit. Try as hard as possible to NOT run SELECT queries inside loops, because it often means loops within loops (multiplying the speed hit). Even simple things like a substr(), in_array(), or strpos() call can take a bit of processing time when you run them a million times. But if you’re performing the same function with the same variables over and over again, then consider storing the result in a variable and checking that variable instead:

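Before:
$MyText = "The Quick Brown Fox";
for ($i = 0; $i < 100000; $i++)
{
    // strpos() is re-run on every pass, with the same result every time
    if (strpos($MyText, "Quick") !== false)
    {
        // do something
    }
}

After:
$MyText = "The Quick Brown Fox";
// strpos() returns 0 for a match at the very start, so compare against false
$QuickIsInMyText = (strpos($MyText, "Quick") !== false);
for ($i = 0; $i < 100000; $i++)
{
    if ($QuickIsInMyText)
    {
        // do something
    }
}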

I try to get into the habit of creating boolean flags like $QuickIsInMyText. If you name the variables correctly ($IsAdmin, $HasEditingPrivileges), they make the code easy to read and eliminate possibilities of rewriting code over and over again.
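The same habit works for in_array() checks inside big loops. In this sketch (the data is invented), the list is turned into a lookup table once, before the loop, so each pass does a single key check instead of scanning the whole list again:

```php
<?php
// Made-up data: a list of blocked IDs and a large batch of order rows.
$blockedIds = [17, 42, 99];
$orders     = [];
for ($i = 1; $i <= 100000; $i++) {
    $orders[] = ['customer_id' => $i % 100];
}

// Done once, before the loop: turn the list into a lookup table keyed by ID.
$isBlocked = array_flip($blockedIds);   // [17 => 0, 42 => 1, 99 => 2]

$skipped = 0;
foreach ($orders as $order) {
    // isset() on a key is a single hash lookup; in_array() here would
    // re-scan $blockedIds on every one of the 100,000 iterations.
    if (isset($isBlocked[$order['customer_id']])) {
        $skipped++;
        continue;
    }
    // do something with the order
}

echo "Skipped $skipped orders\n";
```

With only three blocked IDs the difference is small, but it grows with the size of the list, since in_array() has to walk the list on every pass.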

Install and Use XDebug
XDebug (http://xdebug.org) is a free extension for PHP that can be a godsend. It’s usually easy to install without recompiling PHP, and offers a slew of features for finding performance issues with your application (although it is best used in a development environment, NOT in a production environment).

One of the most valuable features of it is its profiler, which will basically attach a little homing device to PHP so when PHP goes to execute your application, the homing device follows it all the way through and logs everything to a file. You end up with a file that shows you the details of every function call that was made in your application, and how long each one took to run. Sounds useful but complicated, right? Well, it is… if you were to look at the file manually.

The file that gets generated is called a cachegrind file, and it’s pretty big, and is not meant to be read as-is. Instead, there are free programs out there like KCacheGrind (for Linux) and WinCacheGrind (for Windows) which will read a cachegrind file, and display it in an easy-to-understand fashion. You can see a top-level view of the major points in your program, and drill down into the areas that are taking more processing power, down to the individual function call and the line it was made from. It’s pretty much like a super-charged version of the Log Points I mentioned earlier.
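For reference, turning the profiler on is a php.ini change. The setting names differ between major versions of the extension, and the output directory below is just an assumed path, so treat this as a starting point and check the documentation for the version you have installed:

```ini
; Xdebug 3 and later
xdebug.mode = profile
xdebug.output_dir = /tmp/xdebug

; Xdebug 2 used these names instead
; xdebug.profiler_enable = 1
; xdebug.profiler_output_dir = /tmp/xdebug
```

By default the generated files are named starting with cachegrind.out, and those are the files to open in KCacheGrind or WinCacheGrind.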

Hopefully these tips will help you get on your way to making your application run faster. Good luck!