Jonathan Hilgeman

Everything complex is made up of simpler things.

Archive for July, 2009

Version Control – SVN vs. VSS vs. Git

Jul-16-2009
Uncategorized

Someone asked a question about version control software on Experts Exchange recently, and my answer turned into potential blog material:

Version Control Concepts
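I’ve used Subversion (SVN), Visual SourceSafe (VSS), and Git. The core concepts are the same across all three. Version control basically means that all of your application files are “checked in” to a big container (called a repository) that sits in some central location (usually a server) that your developers can all access. (Git is a little different in that every developer’s copy is a complete repository of its own, but most teams still pick one shared repository to act as the central one.)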

To work on the application, you usually “check out” the project from the repository. This downloads the files from the repository onto your local computer so you can edit them. Usually, developers have their own testing environment on their local computer so they can quickly test their changes. After a developer has finished making changes, he or she “commits” the files back to the repository. This does not remove the files from the developer’s machine; it just copies the changes back to the server. The server receives the files, figures out what’s new and what’s changed, and then applies those changes to the repository.

The version control part of it comes into the picture here. The server keeps track of each change of the file (when it’s committed). So when you start with a file that just has “abc” in it and you edit the file, change it to “abcdef” and commit it back to the server, the server now knows that the LATEST version of the file has “abcdef” but if you ever need a previous version (often called a “revision”), then you can request it from the server.
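If it helps to see the idea in code, here is a toy sketch in Python. It is not how SVN, VSS, or Git actually store anything; the ToyRepository class, its method names, and the file name notes.txt are all made up for illustration, and it numbers revisions per file, which is a simplification of my own.

```python
class ToyRepository:
    """Keeps every committed version of each file, oldest first."""

    def __init__(self):
        # path -> list of file contents, one entry per commit
        self._history = {}

    def commit(self, path, content):
        """Store a new version and return its 1-based revision number."""
        versions = self._history.setdefault(path, [])
        versions.append(content)
        return len(versions)

    def checkout(self, path, revision=None):
        """Return the latest content, or the content at a given revision."""
        versions = self._history[path]
        if revision is None:
            return versions[-1]
        if not 1 <= revision <= len(versions):
            raise ValueError(f"{path} has no revision {revision}")
        return versions[revision - 1]


repo = ToyRepository()
repo.commit("notes.txt", "abc")      # revision 1
repo.commit("notes.txt", "abcdef")   # revision 2

latest = repo.checkout("notes.txt")             # newest version
first = repo.checkout("notes.txt", revision=1)  # an older revision on request
```

The point is just that committing never overwrites anything: the old “abc” version is still there and can be requested by its revision number.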

Using Version Control for Software Development Lifecycle
What I often do for smaller projects is have one repository. I then set up my production, testing, and development environments all as working copies (meaning that each is basically a “checked out” version of the application). The working copies don’t update themselves automatically, so each environment will stay at its particular revision until you tell it to update.

So if I start out at revision #1, all 3 of my environments are also at revision #1. I then make changes on my development machine and commit them (let’s say I do this a few times until I get to revision #4). Now my environments look like:

Production: Revision #1
Testing: Revision #1
Development: Revision #4
Central Server / Repository: Latest Revision is #4

When I’m ready to officially test, I go to the Testing server and simply use the appropriate tools to update its working copy to Revision #4. (Side note: you usually don’t have to remember revision numbers. You can often just tell the tool to grab the latest revision, whatever it is. But if you have a lot of developers, you may want to specify the revision in case other people are committing things that should not be in the official testing environment.) Now my environments look like:

Production: Revision #1
Testing: Revision #4
Development: Revision #4
Central Server / Repository: Latest Revision is #4

Let’s say revision #4 tested successfully, and is now considered ready for production. At this point, I go to the Production environment, and use the version control tools to update to Revision #4. (I always specify the revision for production releases, just to be extra certain I’m not pushing anything that should not be pushed).

Now all environments are at Revision #4, and the lifecycle is complete.

That’s basically how I use version control systems to control new version releases of my applications.
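The walkthrough above can be traced with a small Python sketch. This is only a model of the bookkeeping, not a real version control client: the Lifecycle class and its methods are hypothetical, and it assumes the copy that commits ends up at the revision it just created.

```python
class Lifecycle:
    """Tracks the repository's latest revision and where each environment sits."""

    def __init__(self, names):
        self.latest = 1
        # Every environment starts out as a working copy at revision 1.
        self.environments = {name: 1 for name in names}

    def commit_from(self, name):
        """A commit creates the next revision; only the committing copy moves."""
        self.latest += 1
        self.environments[name] = self.latest
        return self.latest

    def update(self, name, revision=None):
        """Move one environment to a revision (default: whatever is latest)."""
        target = self.latest if revision is None else revision
        if not 1 <= target <= self.latest:
            raise ValueError(f"revision {target} does not exist")
        self.environments[name] = target


flow = Lifecycle(["production", "testing", "development"])

# Three commits from the development machine take the repository to revision 4.
for _ in range(3):
    flow.commit_from("development")
after_commits = dict(flow.environments)

# Testing and production only move when explicitly told to, and to a named revision.
flow.update("testing", revision=4)
flow.update("production", revision=4)
```

Passing the revision explicitly in update() mirrors what I do for production releases: the environment lands on exactly the revision that was tested, even if someone has committed something newer in the meantime.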

Now that I’ve offered a basic explanation of the idea of version control, here are some specifics:

Visual SourceSafe – Not Good
I started using Visual SourceSafe (VSS). About 4 months into a non-.NET project, we ditched it because it was occasionally corrupting files. It seems okay for .NET projects, but I wouldn’t trust it on anything else.

Subversion – Better, Friendlier
I then used SVN, and it worked decently well for the most part. It has 3 downsides, in my opinion:

#1. After about a year, I noticed that there was a significant decrease in speed when checking for updated files or committing updated files.

#2. SVN creates a .svn folder in each folder of your application in order to track the files. Normally, this doesn’t bother me, but it did significantly increase the number of files in the system (the .svn folder contains a structure that has separate files that contain information about each file that is tracked).

#3. SVN is not very smart about merging branches of development. This may not be an issue for you if you only have a couple developers or if you don’t really want to use branches at all.

One upside is that there are a lot of nice tools you can use with SVN, like TortoiseSVN. Many of those tools make the process pretty simple (1 or 2 clicks).

Git – Best, Not as Friendly
Git is what I’m using now for most of my projects. I use it because it is much faster than SVN in many respects (especially when you have a lot of small files), it only needs one .git folder in the top folder of each application to track the entire app (instead of the many .svn folders), and it is much better at merging and branching (which is what I use for more complicated projects).

The only two downsides of Git that I’ve found are:
#1. It is not as immediately user-friendly as SVN. There IS a TortoiseGit program that is supposed to simulate the popular TortoiseSVN program, but I have not tried it yet. I use a tool that gives me a Linux-like shell on Windows and I use that. Once you’re used to it, it’s just as fast as the GUI tools (if not faster).

#2. There are some SVN commands that were nice that have not yet been introduced to Git (some analysis tools, mainly). This is not a big deal, though, because of Git’s other strengths that let me easily work around this.

Conclusion
Ultimately, I would go with SVN or Git. Maybe use SVN to start out and get used to the whole idea, and then upgrade to Git when you recognize the differences. They are both free, open-source products (with free, open-source client tools, too), so you’re not really going to spend any money on either one.

Using Sessions Securely

Jul-12-2009
php, programming

When using sessions, usually your biggest concern is cross-site scripting (or XSS for short). Without getting into too much depth, XSS is when one of your users manages to get his own script to run in other users’ browsers, and the most common use for that is stealing their cookies. The malicious user (call him Bob) writes a script that is displayed to other users. That script (when viewed by other users) reads the cookie from the viewing user’s browser and transmits it back to Bob. At that point, Bob can take the cookie and pretend to be any of the users whose cookies he stole.

Just for explanation purposes, here’s an analogy. Let’s say you want to break into John’s house. If you had a copy of John’s key to his front door, it’d be easy, right? So all you need to do is find a way to pickpocket John and copy his key. All the door cares about is that the key fits the lock; it doesn’t care who uses it.

The door is the session authentication mechanism in PHP, and the key is your session ID. The session ID is stored inside a cookie, so there is nothing that prevents you or anyone else from just editing the cookie and changing the session ID to whatever you want. Now, if you change the session ID to something that doesn’t match up to a valid session on the server, then nothing will happen. BUT, if you change your session ID to something that -is- valid on the server, then you’ll automatically be logged into that session, no questions asked.

The security of sessions is all about the complexity of session IDs. It’d be one thing if the session ID was just a number between 1 and 100, but a long random combination of letters and numbers is pretty hard to guess.
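Here is a rough Python sketch of the door-and-key idea and of why ID length matters. It is a stand-in, not PHP’s actual session code: the sessions dictionary, the function names, and the choice of a 32-character hex ID are my own assumptions for illustration.

```python
import secrets

# session ID -> username, standing in for the server's session storage
sessions = {}


def start_session(username):
    """Create a session and return its ID (32 hex characters, 128 random bits)."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = username
    return session_id


def who_is(session_id):
    """Return the user for this ID, or None. Nothing else about the caller is checked."""
    return sessions.get(session_id)


johns_id = start_session("john")

# Anyone who presents John's ID gets John's session, no questions asked.
intruder_sees = who_is(johns_id)

# An ID that does not match a session on the server gets nothing.
bad_guess = who_is("12345")

# Size of the space an attacker would have to guess through.
weak_id_space = 100          # "a number between 1 and 100"
strong_id_space = 16 ** 32   # 32 hex characters
```

With only 100 possible IDs, an attacker can simply try them all. With 16 ** 32 possibilities, guessing is hopeless, which is exactly why attackers go after stealing a valid ID instead.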

That’s where XSS comes in. Most XSS attacks are about stealing valid session IDs so hackers don’t have to guess at which ones are valid. Now, XSS is just a concept. In practice, it’s usually done with JavaScript, because JavaScript can read cookies (there are some exceptions, such as cookies flagged HttpOnly). Now, it’s easy to write JavaScript that will read your OWN cookies, because you can run the JavaScript on your OWN computer. The trick is to get OTHER people to run your cookie-stealing JavaScript on THEIR computers (especially without them knowing about it). So how do hackers do this?

Take a message board for example. I’m sure you’ve been on message boards where people have their own special “signatures” with images and favorite quotes and stuff. That’s all custom HTML / code that the users have provided after they’ve signed up. If the message board program doesn’t do any security checks on the signature, then a hacker could put cookie-stealing JavaScript code into his signature. Now, it’s just a waiting game. As soon as someone else “sees” that signature, they’re unknowingly running the cookie-stealing JavaScript. The JavaScript reads that user’s cookie (which has their session ID), and transmits it back to the hacker.
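To make the signature problem concrete, here is a small Python sketch of a board that prints signatures with and without a security check. The render_signature function and the evil.example address are invented for the example; in PHP the escaping step would typically be htmlspecialchars().

```python
import html


def render_signature(signature, escape=True):
    """Return the HTML fragment a board would place under each post."""
    if escape:
        # Turn <, >, &, and quotes into entities so the browser shows them as text.
        signature = html.escape(signature)
    return '<div class="signature">' + signature + "</div>"


# A hypothetical malicious signature that loads a cookie-stealing script.
malicious = '<script src="http://evil.example/steal.js"></script>'

unsafe_page = render_signature(malicious, escape=False)  # script tag survives intact
safe_page = render_signature(malicious)                  # script tag is shown as plain text
```

In the unsafe version, every visitor’s browser sees a real script tag and runs it. In the escaped version, visitors just see some odd-looking text in the signature.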

So, the ultimate point of all this is that you should ALWAYS ALWAYS ALWAYS sanitize any data before allowing it to be saved or used in any way. Generally speaking, you should never use $_GET or $_POST or $_REQUEST (or any other $_….) variables without first running them through a function that strips out characters that don’t belong. For example, if someone’s typing in their first name and sending it to your server, you should strip out any characters that don’t appear in first names (letters, numbers, spaces, single/double quote marks, commas, and periods are usually okay for names). Then escape the value for wherever it’s going: addslashes() only escapes quotes and backslashes, so it does nothing about HTML tags. Anything that gets printed back into a page should go through htmlspecialchars() so the browser displays it as text instead of running it.
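As a sketch of the first-name example, here is what a whitelist filter might look like. I’ve written it in Python to keep it short; in PHP the same idea would usually be a preg_replace() call. The clean_first_name function is hypothetical, and limiting letters to plain A-Z is a simplification that would wrongly strip accented names.

```python
import re

# Anything that is NOT a letter, digit, space, quote mark, comma, or period.
NOT_ALLOWED_IN_NAME = re.compile(r"[^A-Za-z0-9 '\",.]")


def clean_first_name(raw):
    """Drop every character outside the allowed set for first names."""
    return NOT_ALLOWED_IN_NAME.sub("", raw)


ordinary = clean_first_name("Mary-Ann O'Neil, Jr.")
attack = clean_first_name("Bob<script>alert(1)</script>")
```

The attack string comes out with its angle brackets, slash, and parentheses removed, so what is left is harmless text rather than a working script tag. Note that the hyphen in the ordinary name is stripped too, which shows why the allowed set needs some thought for real data.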

As long as you’re properly sanitizing your data before using it, you will block the large majority of potential XSS attacks.

ParosProxy is a good open-source tool for scanning web applications and checking for security problems. There’s also a commercial tool in the same vein called Burp Suite Professional. It’s a separate product rather than a ParosProxy spin-off, but it does the same kind of job and has some better/easier reports, better recommendations, and scanning for more recent problems.