Mike Knepper

Curl-y Q&A

May 8, 2014

I’ve begun working on my Java server–another classic 8th Light apprenticeship project. This adventure will likely spark several blog posts, beginning with this one: a quick primer on the bash command curl. This command often shows up in the context of downloading and installing software from the internet, and frequently looks like the most intimidating and scary method of doing so. It’s definitely a powerful tool, but can also be used very simply to provide basic, but critical, information. I’ve found it especially useful while building my server.

What is Curl?

man curl. Next question?

…OK fine, I’ll describe it in my own words, but seriously, I learned almost all of what I know so far about curl from reading its man page. For those who don’t know, every bash command has a “manual” that describes what the command is and what all its options do. Simply run man [command] to read the docs.

Curl is “a tool to transfer data from or to a server, using one of the supported protocols. The command is designed to work without user interaction.” Among the supported protocols is HTTP, the foundational protocol of the World Wide Web. So, if we focus on using curl exclusively in an HTTP context (which I have been so far), curl is a tool for transferring data to or from a server via HTTP requests.

Guess what? This is exactly what a web browser does! When you type a URL into the address bar in a web browser, the browser sends a request to a server for some resource. Usually this is an HTML page. The server locates the resource and sends it back to the browser, which renders that resource in the window.

The main difference between web browsers and curl is that the former understand HTML and CSS, so they present HTML files appropriately (i.e. structured into headers, blocks, etc. without displaying the actual HTML tags). Curl, however, simply prints the contents of the resource to the screen, much in the way the cat command does for a local file. (Not familiar with cat? man cat!) In other words, curl and web browsers do not differ in what they do (send a request and receive data in response), only in how they do it (command-line text vs. address bar, links, etc.). Go ahead, give it a try! Visit this page in your browser, and run curl http://justinjackson.ca/words.html in your terminal. The output itself is exactly the same, just presented differently. (I chose this example for its simplicity, as it is almost entirely HTML.)

Why use Curl?

You might be thinking my enthusiasm for curl is nothing more than a symptom of software developers’ obsession with the command line. So far, it may seem like curl is just a web browser that isn’t smart enough to display HTML properly. However, curl has several optional flags that provide valuable information normally hidden by the browser. (Technically this information can be accessed within the browser, but it involves a lot of time-consuming clicking and isn’t formatted quite as well.)

First, let’s look at the --include option (abbr. -i). This flag tells curl to include the HTTP header information in the output. HTTP headers contain metadata like the name of the server, the HTTP version and response, and the content type of the requested document. The server provides this information back to the client on every request, even in web browsers, but the average user does not really care about these details, so both web browsers and curl hide it. For someone like me building a custom server, however, this information is important–it’s helpful to see what data other servers are providing so I know what kind of functionality my server needs to implement.

The include flag is nice, but as a complete beginner to servers, sockets, HTTP requests and the like, I needed still more information. Before I worked on generating and sending a response with my server, I needed to know what an HTTP request looked like so I could understand what exactly my server was waiting for. Enter curl’s --verbose option (abbr. -v). Verbose doesn’t just include the header info returned by the server–it also includes the header data of the request that it (curl) sends to the server. This is just what I needed! Try out the verbose option with the same link above: curl -v http://justinjackson.ca/words.html. The lines beginning with “>” are the request sent by curl to the server, and the lines beginning with “<” are the data received back from the server, followed by the body of the resource. The first line of the request is critical: it states that we are making a GET request for the “/words.html” resource, using the HTTP/1.1 protocol.

Conclusion

By playing around with curl, I was able to better understand exactly what my server needed to listen for and provide back to clients. It turns out that both the request and the response are deceptively simple… But that’s a topic for another blog post!