# Titanic: Machine Learning from Disaster – insight to features

While going through some forums related to machine learning, I stumbled upon kaggle.com, which hosts a machine learning competition built on the Titanic dataset, called Titanic: Machine Learning from Disaster. Basically, you are given a list of Titanic passengers that states whether each passenger survived the tragedy, the class they travelled in, their age, gender, and several other related attributes. You are then given a second list of passengers with the same attributes but without the survival outcome, and your task is to predict which of those passengers survived. The moment I saw this competition I was hooked! It was quite interesting to play with the given dataset.

It is a very surreal feeling that you get, when you scroll through the list of names, to see that some of these people were lucky enough to survive and some were not so lucky. (And it is almost like being God, trying to predict the fate of the rest of the passengers). But looking closely at the data set, we can see that there are many factors which decided the fate of these passengers, apart from their luck.

## Passenger Features

The intention of this post is to try to look at the given training data set to identify its patterns. The following is the list of parameters available in the dataset:

```
survival        Survival
                (0 = No; 1 = Yes)
pclass          Passenger Class
                (1 = 1st; 2 = 2nd; 3 = 3rd)
name            Name
sex             Sex
age             Age
sibsp           Number of Siblings/Spouses Aboard
parch           Number of Parents/Children Aboard
ticket          Ticket Number
fare            Passenger Fare
cabin           Cabin
embarked        Port of Embarkation
                (C = Cherbourg; Q = Queenstown; S = Southampton)
```

After downloading the dataset, you can open it in Excel. Before starting to work with machine learning algorithms, you can play around with the data and identify a lot of useful features by creating a pivot table, or by using the “Format as Table” option and conditional formatting.

The given data set contains 891 records; of these, 342 passengers survived.

### Gender

When looking at the statistics we see that about 74% of the females survived, whereas only 19% of the males survived. This is a strong indication that if the passenger was a female, she had a better chance of surviving than a male. This fact is expected, because women and children were given priority when passengers were evacuated to the rescue boats.

### Passenger Class

Titanic passengers belonged to three classes: 1, 2 and 3. Looking at the data, we see that about 63% of the class 1 passengers survived, and only 25% of the class 3 passengers survived.
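The per-group percentages quoted for gender and passenger class can also be reproduced outside Excel. The following is only a minimal sketch: the column names follow the attribute list in this post (the header row of the actual CSV file may be spelled differently), and the sample rows are made up for illustration, not taken from the real data.

```python
from collections import defaultdict

def survival_rate_by(rows, key):
    """Return {group value: fraction survived} for the given column."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survived[row[key]] += int(row["survived"])
    return {group: survived[group] / totals[group] for group in totals}

# Tiny made-up sample, NOT the real competition data.
sample = [
    {"sex": "female", "pclass": "1", "survived": "1"},
    {"sex": "female", "pclass": "3", "survived": "1"},
    {"sex": "female", "pclass": "3", "survived": "0"},
    {"sex": "male", "pclass": "1", "survived": "1"},
    {"sex": "male", "pclass": "2", "survived": "0"},
    {"sex": "male", "pclass": "3", "survived": "0"},
]

rates_by_sex = survival_rate_by(sample, "sex")
rates_by_class = survival_rate_by(sample, "pclass")
```

With the real file, the rows could come from `csv.DictReader` instead of the hard-coded list, and the same function would then work for any categorical column.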

The following chart, taken from here gives a clear picture of how gender, age and passenger class determined the survival of passengers.

While the given passenger attributes can be directly used as features for machine learning algorithms, there are a couple of other features that we can compute from the given data.

### Travelling alone vs. with family

If we add the sibsp (siblings/spouses) and parch (parents/children) values, we can decide whether a particular individual travelled with family or alone. We can see that only 30% of the people who travelled alone survived.
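As a rough sketch of this derived feature, a passenger can be flagged as travelling alone when sibsp + parch is zero. The column names follow the attribute list above, and the rows below are invented purely to show the calculation.

```python
def is_alone(row):
    """Treat a passenger as travelling alone when sibsp + parch == 0."""
    return int(row["sibsp"]) + int(row["parch"]) == 0

# Made-up rows for illustration only.
sample = [
    {"sibsp": "0", "parch": "0", "survived": "0"},
    {"sibsp": "1", "parch": "0", "survived": "1"},
    {"sibsp": "0", "parch": "2", "survived": "1"},
    {"sibsp": "0", "parch": "0", "survived": "1"},
    {"sibsp": "0", "parch": "0", "survived": "0"},
]

alone = [row for row in sample if is_alone(row)]
alone_survival_rate = sum(int(row["survived"]) for row in alone) / len(alone)
```

The boolean returned by `is_alone` can be added to each record as an extra feature column before training a classifier.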

### Family survived vs. died

If you closely look at the data, you can see that most of the families tend to survive or perish together. We can assume that the people with the same last name are of the same family. We can identify 133 such families in the data set.

Out of these, 93 families died together and 29 families survived together. There were 11 families in which some members died and others survived. So 122 of the 133 families (91.7%) either died or survived together. When we consider a passenger in the training set who travelled with family, there is therefore a very good chance that he or she shared the fate of the other family members.
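The grouping described above could be sketched as follows. This is an illustration under assumptions, not the exact procedure used for the figures in this post: it assumes the name field has the form "Last, Title. First", it treats only surnames shared by two or more passengers as families, and the sample names are invented.

```python
from collections import defaultdict

def last_name(full_name):
    """Return the text before the first comma, e.g. 'Smith' from 'Smith, Mr. John'."""
    return full_name.split(",")[0].strip()

def family_outcomes(rows):
    """Count families (2+ passengers sharing a last name) by shared fate."""
    fates_by_family = defaultdict(list)
    for row in rows:
        fates_by_family[last_name(row["name"])].append(int(row["survived"]))

    counts = {"died together": 0, "survived together": 0, "mixed": 0}
    for fates in fates_by_family.values():
        if len(fates) < 2:
            continue  # assumption: a lone surname is not counted as a family
        if not any(fates):
            counts["died together"] += 1
        elif all(fates):
            counts["survived together"] += 1
        else:
            counts["mixed"] += 1
    return counts

# Invented passengers for illustration only.
sample = [
    {"name": "Smith, Mr. John", "survived": "0"},
    {"name": "Smith, Mrs. Mary", "survived": "0"},
    {"name": "Jones, Miss. Anna", "survived": "1"},
    {"name": "Jones, Master. Tom", "survived": "1"},
    {"name": "Brown, Mr. Adam", "survived": "0"},
    {"name": "Brown, Mrs. Beth", "survived": "1"},
    {"name": "Lone, Mr. Single", "survived": "1"},
]
counts = family_outcomes(sample)

# The post's own figures: 93 + 29 of the 133 families shared one fate.
same_fate_pct = round(100 * (93 + 29) / 133, 1)
```

Note that unrelated passengers can share a surname, so a last-name match is only an approximation of a real family.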

Many other interesting features can be computed with the existing data, which will increase the accuracy of your classifiers. Excel and Matlab can be used to quickly visualize the relationships among features. Put yourself in the shoes of a Titanic passenger, and try to think which factors would help you in surviving the tragedy and observe the existing data to test your hypothesis.

# Releasing Codeigniter-BoilerplateJS

I’ve been thinking about using CodeIgniter as the back end for a couple of single page applications where the front end would be implemented using the BoilerplateJS architecture. I wanted to create the basic structure for such a project and share it. Today I came across this question and thought I’d do this project and share it. Surprisingly, putting together BoilerplateJS and CodeIgniter was a breeze.

Most of the BoilerplateJS code resides in a public folder in the CodeIgniter root folder. The index page of BoilerplateJS lives in the views folder and is loaded by a controller when necessary. After some path modifications in the BoilerplateJS index file and the requirejs route configs, everything started to work perfectly. All the server calls that were simulated by JSON text files are now actually served from a CodeIgniter controller. I also used CodeIgniter Rest Server to implement the REST API.

The project is at: https://github.com/slayerjay/codeigniter-boilerplatejs

# Raspberry Pi and Arduino LCD

I had previously posted on how to communicate with an Arduino and display messages on an LCD using Python. I’ve also been working with a Raspberry Pi, setting it up as a torrent box.

So I thought of hooking these two up together so that the LCD can display the torrent status. I also added two buttons so that I can scroll through the list of torrents. The Arduino is connected to the RPi via USB, and the Arduino is powered by an external AC adapter. Initially I thought this would be unsafe, but apparently it is safe to do so (http://arduino.cc/forum/index.php/topic,22132.0.html, http://aeroquad.com/archive/index.php/t-1911.html?s=5273633e6fd3970524bf4473996b9f7d). The LCD also displays the current system temperature.

The source code for the Python program and the Arduino sketch can be accessed at: https://github.com/slayerjay/RaspberryPi_Arduino
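To give a feel for the Python side, here is a small sketch of how status text might be shaped for a 16x2 LCD before being sent over serial. It is not the code from the repository: the function names are hypothetical, the display width of 16 is an assumption, and reading the temperature from `/sys/class/thermal/thermal_zone0/temp` (millidegrees Celsius on Linux) is just one possible approach.

```python
LCD_WIDTH = 16  # assumed 16x2 display

def read_cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the SoC temperature; the kernel reports millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

def lcd_line(text, width=LCD_WIDTH):
    """Cut or pad text so it fills exactly one LCD row."""
    return text[:width].ljust(width)

def status_lines(torrent_name, percent_done, temp_c):
    """Build the two rows: torrent name on top, progress and temperature below."""
    line1 = lcd_line(torrent_name)
    line2 = lcd_line("{:3.0f}% {:4.1f}C".format(percent_done, temp_c))
    return line1, line2

# Hypothetical torrent name and values, for illustration only.
line1, line2 = status_lines("ubuntu-12.10-desktop-i386.iso", 42, 48.3)
```

Padding each row to the full width means old characters are overwritten even if the Arduino does not clear the display between updates.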

Other Resources:

http://www.hobbytronics.co.uk/raspberry-pi-serial-port

http://www.raspberrypi.org/phpBB3/viewtopic.php?f=32&t=6832

http://raspberrypi.stackexchange.com/questions/357/how-do-i-monitor-and-or-control-the-temperature-of-the-soc

# Accessing Raspberry Pi from anywhere: Dynamic DNS for RPi

You may have your RPi set up as a torrent box or a web server, or you may simply want to log in to it from outside your home network, but your ISP is most likely giving you a dynamic IP. This is where Dynamic DNS (DDNS) comes to the rescue. I used noip.com as my DDNS provider (which is free), but you can use any other similar service.

The Theory

Your home network’s public IP changes from time to time because it is assigned dynamically by your ISP. A DDNS service maps a domain name to your public IP and updates its record whenever that IP changes. You will have to download and run a small client program that reports your IP to the DDNS service when it changes. So whenever someone accesses your domain address, he/she will be pointed to your current IP.
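The idea behind such a client can be sketched in a few lines. This is only an illustration of the "report when changed" logic, not how the noip.com client is actually implemented: `get_public_ip` and `report_ip` are hypothetical callables that a real client would back with network requests, and the addresses come from a documentation-only range.

```python
def make_ddns_updater(get_public_ip, report_ip):
    """Return a function that reports the public IP only when it has changed."""
    last_reported = None

    def check_once():
        nonlocal last_reported
        current = get_public_ip()
        if current != last_reported:
            report_ip(current)        # tell the DDNS service about the new IP
            last_reported = current
            return True
        return False                  # nothing changed, nothing to report

    return check_once

# Simulated sequence of public IPs, for illustration only.
ips = iter(["203.0.113.5", "203.0.113.5", "203.0.113.9"])
reported = []
check = make_ddns_updater(lambda: next(ips), reported.append)
results = [check(), check(), check()]
```

A real client would call `check_once` periodically, for example every few minutes.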

Setting Up

The different services running on the RPi listen on different ports. By default, the web server listens on port 80 and the Transmission torrent service listens on port 9091. You need to tell your router that any incoming packets for a specific port (say port 80) should be forwarded to your RPi.

The exact way to set up port forwarding depends on your router, but it is pretty straightforward once you know the theory above. You can get some help from portforward.com.

Once port forwarding is set up, you can check it using a port scanner, and you will be able to access your service through your public IP.
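If you would rather check a forwarded port from a script than from a port scanner, a minimal sketch using only the standard library could look like this. The host and port values you pass in are your own; the 3 second timeout is an arbitrary choice.

```python
import socket

def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `is_port_open("your.public.ip", 9091)` run from a machine outside your home network would indicate whether the Transmission port is reachable. Many routers do not loop forwarded ports back from inside the LAN, so testing from the inside can give a false negative.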

I registered with noip.com and downloaded their Linux client software, and installed it (where you will have to enter your noip.com credentials)

• Set the client program to run at startup of RPi

The following post explains how to set the noip2 client to run at startup: http://www.stuffaboutcode.com/2012/06/raspberry-pi-run-program-at-start-up.html

If all works well, you will be able to access the services on your RPi from anywhere!

# Setting up a RaspberryPi Torrent Box with Transmission

Preparing and Mounting the External Storage

I used one of my pen drives that is formatted as an NTFS file system. Connect the external storage to the RPi. Open an SSH session and type:

`$ sudo fdisk -l`

This lists all the hard drives that are connected and you will be able to find your external storage.

Note the ‘Device Boot’ record (mine is ‘/dev/sda1’).

Now let’s mount the drive. All mounted drives are accessed through the /media/ folder.

```
$ cd /media/
```

If it says that ntfs-3g is an unknown type or gives a similar error message, install it by:

`$ sudo apt-get install ntfs-3g`

Now the device is mounted. However we want RPi to mount it automatically every time it boots up. For this you need to edit the ‘fstab’ file and enter the details of your device.

To edit the file use:

`$ sudo nano /etc/fstab`

It will bring up the file which contains a table as follows:

Enter the following record at the end of the table:

`/dev/sda1       /media/downloads        ntfs-3g defaults        0       0`

Save and exit.

This is an excellent reference on this matter.

Installing and configuring Transmission

`$ sudo apt-get install transmission-daemon`

We need to change some settings. For this we need to stop the daemon and edit the settings file.

Stop the daemon using:

`$ sudo service transmission-daemon stop`

Bring up the settings file by:

`$ sudo nano /etc/transmission-daemon/settings.json`

`"download-dir": "/media/downloads",`

You can enable or disable RPC Authentication. If you enabled it you can set the username and password here as well. (The plain text password entered will be changed to the hash value and stored when transmission starts up).

By default, transmission only allows a white listed set of IPs to access it. You can either enter your IPs to the whitelist or disable this.
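Since settings.json is plain JSON, the same changes could also be scripted instead of made by hand in nano. The following is a sketch only: the `update_settings` helper is hypothetical, and the starting values in the example are illustrative rather than copied from a real installation.

```python
import json

def update_settings(settings, download_dir, whitelist_enabled=True):
    """Return a copy of the settings with the download dir and whitelist flag set."""
    updated = dict(settings)
    updated["download-dir"] = download_dir
    updated["rpc-whitelist-enabled"] = whitelist_enabled
    return updated

# Illustrative fragment of a settings file, not a complete one.
example = json.loads("""
{
    "download-dir": "/some/default/path",
    "rpc-whitelist": "127.0.0.1",
    "rpc-whitelist-enabled": true
}
""")

new_settings = update_settings(example, "/media/downloads", whitelist_enabled=False)
text = json.dumps(new_settings, indent=4, sort_keys=True)
```

To apply this to the real file you would load `/etc/transmission-daemon/settings.json` with `json.load`, write `text` back (as root), and do so while the daemon is stopped, as described above.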

Save and exit the settings file and start the daemon:

`$ sudo service transmission-daemon start`

```
rpi_ip:9091/
Ex: 192.168.1.5:9091/
```

I have noticed that sometimes an error occurs: “Error: Input/output error”. To fix this, reboot the RPi and run ‘Verify local data’ on the torrent. This of course is not a permanent fix. I have tried the fixes here: http://stevenhickson.blogspot.com/2012/10/fixing-raspberry-pi-crashes.html and I’m still looking into this issue.

Update: I’ve applied the fixes from the above link and reduced the number of peers in Transmission. But apparently the main reason for the IO errors was my Transcend flash drive. I tried another (unbranded, cheap) flash drive, and things are now working like a charm.

Update 2: I am using a SATA hard disk to store the downloads.

You can set up a Samba server on the RPi to access your downloaded files from other machines. This article provides a comprehensive guide on how to set this up.

# RaspberryPi initial setups

So I got my RaspberryPi today, and would like to share my RaspberryPi setup plan. These are some steps that you can use to kick start your RaspberryPi journey!

First Boot

Raspberry Pi’s Quick start guide explains the steps that are necessary for the first boot.

Setting up a static IP

You will most probably need a static IP for your RaspberryPi so that you can access it over the network. To set up a static IP, open up the terminal and enter:

`$ sudo nano /etc/network/interfaces`

This will allow you to edit the file. Change

`iface eth0 inet dhcp`

to

`iface eth0 inet static`

Below it, enter the following lines

```
address YOUR_STATIC_IP
gateway 192.168.1.1
```
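Before rebooting, it can be worth checking that the address you picked is on the same subnet as the gateway, otherwise the RPi will not be reachable. Here is a small sketch using Python's `ipaddress` module. The netmask of 255.255.255.0 is an assumption (it is typical for home routers but is not stated above), and the example addresses are illustrative.

```python
import ipaddress

def same_subnet(address, gateway, netmask="255.255.255.0"):
    """Return True if the gateway lies in the network implied by address/netmask."""
    network = ipaddress.ip_network("{}/{}".format(address, netmask), strict=False)
    return ipaddress.ip_address(gateway) in network

ok = same_subnet("192.168.1.50", "192.168.1.1")    # same /24 network
bad = same_subnet("192.168.2.50", "192.168.1.1")   # different /24 network
```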

After entering these lines, save the file (Ctrl+O) and exit (Ctrl+X)

Then reboot by entering:

`$ sudo reboot`

`$ ifconfig eth0`

Enabling Remote Desktop

You will probably need to remotely log in to your machine. I followed this post and successfully configured remote desktop: http://www.raspberrypiblog.com/2012/10/how-to-setup-remote-desktop-from.html

It is always good to change the default password (‘raspberry’). To do this enter the following on the terminal:

`$ passwd`

# Arduino and Python Serial Connection with LCD

I’ve got an Arduino UNO board and a Hitachi HD44780 type LCD. I wanted to write a Python program to communicate with the Arduino board. The board can be connected to the computer via USB, and it appears as a COM port. Therefore we can easily communicate with the Arduino serial interface from Python.

My aim is to create a Python program that takes input from the keyboard and displays it on the LCD. The LCD is connected to the Arduino board as described in the Arduino example:

Circuit Diagram

The following is the code for the Arduino:

```
// include the library code:
#include <LiquidCrystal.h>

// initialize the library with the numbers of the interface pins
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

void setup() {
  Serial.begin(9600);  // must match the baud rate used on the Python side
  lcd.begin(16, 2);    // 16 columns, 2 rows
  lcd.print("start");
}

void loop() {
  if (Serial.available()) {
    delay(100);  // wait some time for the whole message to arrive
    lcd.clear();
    // read every buffered byte and write it to the LCD
    while (Serial.available() > 0) {
      lcd.write(Serial.read());
    }
  }
}
```

To access the serial ports you need to install the pySerial module for Python. The Python code:

```
# Python 3 with a recent pySerial (ports are given by name, e.g. 'COM12')
import serial
import time

s = serial.Serial('COM12', 9600)  # port is COM12, and baud rate is 9600
time.sleep(2)  # wait for the serial connection to initialize
while True:
    text = input('Enter text: ').strip()
    if text == 'exit':
        break
    s.write(text.encode('ascii', 'replace'))  # pySerial sends bytes, not str
s.close()
```

I’m on a Windows machine, and by default the Arduino is connected to COM12. Note that we need to wait for a little while after initializing the serial connection.

Arduino Board and the LCD Ready for input

Arduino Board and the LCD, Python message is displayed

Python Console

# Resources for JavaScript Applications

Thought I’d share some resources that I followed when learning about Single Page Applications.

Architecture and Design Resources:

Scalable JavaScript Application Architecture  – by Nicholas Zakas

This is a great inspiration for BoilerplateJS and an excellent presentation. A must watch if you are stepping in to Single Page App development.

Patterns For Large-Scale JavaScript Application Architecture by Addy Osmani Contains some good design patterns for JavaScript Applications

Tools and Libraries:

There are a lot of tools, frameworks and JS libraries out there. This is by no means a complete list, but these are some of the ones that I’m familiar with.

# Maintainability in Single Page Applications

As I mentioned in an earlier post, Single Paged Applications provide several advantages to the user. They provide a smooth and fast user experience.

Having worked on a large scale single paged application development project and co-authored BoilerplateJS, I’ve realized that Single Page Applications are easy to de-couple and modularize. For me, development wise, it is very easy to manage Single Paged Applications.

You may start your desktop application or web application with a clean code base and a nicely modularized architecture with a good separation of concerns. But I have seen many applications where the code starts to get messy as development progresses. Given that you have the right tools and follow good practices, a Single Paged Application is the best type of application for a clean and maintainable code base.

Unlike traditional web applications, the presentation logic and the main logic of the application are highly de-coupled.

On a Single Page Application, the server-side will be responsible for:

• Handling CRUD (Create, Read, Update and Delete) operations
• Executing different operations and workflows (these may include changing states of entities, updating database records)
• Authentication and Authorization (this should always be done on the server side to ensure that the requests are legitimate)
• Validation of web requests
• Providing an interface for the client application to perform operations (typically done via a REST API)

With the help of proper frameworks the above tasks can be handled easily on the server side. Entities that are related could be treated as modules. Aspect Oriented Programming techniques can be used for authorization.
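To make the server-side responsibilities more concrete, here is a deliberately tiny, framework-free sketch of a service that does CRUD, validation and authorization in one place. Everything in it is hypothetical (the `ItemService` class, the "name is required" rule, the user list); a real back end would expose these operations through a REST API and a proper framework.

```python
class ItemService:
    """In-memory stand-in for the server side of an SPA."""

    def __init__(self, allowed_users):
        self._allowed = set(allowed_users)
        self._items = {}
        self._next_id = 1

    def _check(self, user, data=None):
        # Authorization and validation always happen on the server.
        if user not in self._allowed:
            raise PermissionError("not authorized")
        if data is not None and not data.get("name"):
            raise ValueError("name is required")

    def create(self, user, data):
        self._check(user, data)
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = dict(data)
        return item_id

    def read(self, user, item_id):
        self._check(user)
        return dict(self._items[item_id])

    def update(self, user, item_id, data):
        self._check(user, data)
        if item_id not in self._items:
            raise KeyError(item_id)
        self._items[item_id] = dict(data)

    def delete(self, user, item_id):
        self._check(user)
        del self._items[item_id]

service = ItemService(allowed_users=["alice"])
new_id = service.create("alice", {"name": "first item"})
service.update("alice", new_id, {"name": "renamed item"})
current = service.read("alice", new_id)
```

The client never touches `_items` directly; it only sees the four operations, which is what keeps the two sides de-coupled.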

The client side will be responsible for:

• Populating and rendering the UI with proper data
• Accessing the server via AJAX
• Performing client side routing
• Performing client side validation

If a proper REST API is defined, back end developers and front end developers can work simultaneously on the project easily. And since the front end and the back end are de-coupled by the REST API it is easy to change these layers without significant modifications.  Scaling up the back end or providing a new interface can be done with minimal impact.

This kind of separation where the server is treated as a service provider and the presentation logic purely done on the client side with JavaScript, makes the application design very straightforward and clean.


# From Desktop Applications to Web Applications and Single Paged Applications (SPAs)

I am still an undergraduate, but I have been involved in developing several applications: as projects and course work at the university, and as products for several clients, both during my studies and during my internship. I have seen more and more enterprise applications moving to the web, and a trend towards enterprise applications being developed as Single Paged Applications.

As an undergraduate I am far from being an experienced software architect, but I have experience in developing the above types of applications, so here is my take on them.

Desktop applications may be connected through the internet or an intranet, and there may be server software or a database server in a central location. We have been used to these types of applications for a long time. They are reliable and very stable. But things have got complicated lately. There are a couple of popular operating systems, and most enterprises want their systems to be accessed by all such operating systems and devices as well. Developing separate native applications for different platforms is extremely costly and impractical. I believe that this is a major reason behind web applications being popular among enterprise application developers.

Web applications reside on a web server and are accessed by users via the internet or an intranet and viewed in a web browser. Web applications provide some significant advantages. As mentioned above, being platform independent is one of them. Unlike desktop applications, they do not need special roll out or deployment procedures. Therefore deploying updates and new versions is very easy. The load on client machines is pretty low as well.

However, there are several drawbacks to web applications. The user experience tends to be discontinuous because the page needs to be refreshed each time an operation occurs, and the response time is slow because the whole webpage needs to be sent from the server. An uninterrupted connection to the server (or the internet) is required as well, although this is mitigated by HTML5’s new local storage feature.

Single Paged Applications try to eliminate some of the drawbacks of traditional web applications.

A single-page application (SPA), also known as single-page interface (SPI), is a web application or web site that fits on a single web page with the goal of providing a more fluid user experience akin to a desktop application.

In an SPA, either all necessary code – HTML, JavaScript, and CSS – is retrieved with a single page load, or partial changes are performed loading new code on demand from the web server, usually driven by user actions. The page does not automatically reload during user interaction with the application, nor does control transfer to another page. Updates to the displayed page may or may not involve interaction with a server.

– Wikipedia article on Single Paged Applications

The following figure shows the architecture of a typical SPA:

Since most of the UI rendering and manipulation happens in the browser, the user gets an uninterrupted experience, and since the load on the server side is minimal, these applications tend to be very scalable. Single Paged Applications also tend to have much more modular and maintainable code, since the presentation layer is decoupled from the business logic and controllers. However, Single Paged Applications have a long way to go and the technology is still maturing. But my bet is that we will see more and more such applications in the future.