Removing unnecessary attributes
It seems like cabin won't be of much use. We could do some research about cabin naming conventions and try to extract some features from it, but we'll leave that for later.
For now, we'll remove PassengerID, Name, Ticket Number, and Cabin Number. Everything else is either a continuous variable or a categorical variable with 2 or 3 categories.
We'll also store Survived in a separate series as the target variable (the label the classifier will predict), and encode Sex, Pclass and Embarked as one-hot dummy variables.
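As a rough sketch of these steps in pandas: the snippet below assumes the column names used in the Kaggle Titanic CSV (`PassengerId`, `Ticket`, `Cabin`, and so on) and builds a tiny made-up frame in place of the real data, so the names `df`, `features` and `target` are illustrative only.

```python
import pandas as pd

# Tiny stand-in for the Titanic training data. The column names are an
# assumption (Kaggle CSV naming); the rows are made up for illustration.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Ticket": ["T1", "T2", "T3", "T4"],
    "Fare": [7.25, 71.28, 7.93, 8.05],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", "Q"],
})

# Drop the identifier-like columns we are not using for now.
features = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# Move the label out of the feature frame into its own Series.
target = features.pop("Survived")

# One-hot encode the categorical columns. Passing `columns=` explicitly
# makes pandas encode Pclass too, even though it is stored as integers.
features = pd.get_dummies(features, columns=["Sex", "Pclass", "Embarked"])

print(features.columns.tolist())
```

Keeping every dummy column is convenient for tree-based models; for linear models one might pass `drop_first=True` to `pd.get_dummies` to avoid perfectly collinear columns.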