
Saturday, February 13, 2016

AirBnB New User Bookings

The AirBnB New User Bookings competition was held on Kaggle from Nov-15 to Feb-16.

The objective was to predict in which country a new user on AirBnB would make their first booking.

There were 11 potential countries along with a 12th class - NDF (No Destination Found), indicating the user did not make any booking.

The data consisted of user characteristics like language, age, browser, date-of-account-creation, OS, etc. for the train and test users.

There was data on the actions taken by users on the website along with the details of the action and duration.

It was evident that the best way to quickly get a good score was to focus on classifying the NDF vs non-NDF users. So, I built a Logistic Regression on the one-hot encoded action features from the sessions data as a binary classifier for NDF vs non-NDF, only considering users present in the sessions data. This was the base classifier.

I then built a meta classifier using, well, everyone's favourite nowadays, XGBoost. It used the raw user features, along with the one-hot encoded features from sessions data, and finally, the LR predictions.
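A rough sketch of this two-stage setup, not the actual competition code. Everything here is an assumption made for illustration: the data is synthetic, scikit-learn's GradientBoostingClassifier stands in for XGBoost, and the LR predictions are generated out-of-fold (a common stacking practice, though the post does not say how they were produced).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_users, n_actions, n_user_feats, n_classes = 600, 20, 5, 12

# Synthetic stand-ins for the real data (hypothetical shapes and values).
session_onehot = rng.integers(0, 2, size=(n_users, n_actions))  # action flags
user_feats = rng.normal(size=(n_users, n_user_feats))           # raw user features
country = rng.integers(0, n_classes, size=n_users)              # class 0 plays NDF
is_ndf = (country == 0).astype(int)

# Base classifier: NDF vs non-NDF from the one-hot session features.
base = LogisticRegression(max_iter=1000)
# Out-of-fold probabilities, so the meta model never sees a prediction
# made by a model that was trained on that same row.
ndf_prob = cross_val_predict(
    base, session_onehot, is_ndf, cv=5, method="predict_proba"
)[:, 1]

# Meta classifier: user features + session features + LR prediction.
meta_X = np.hstack([user_feats, session_onehot, ndf_prob[:, None]])
meta = GradientBoostingClassifier(n_estimators=20, random_state=0)
meta.fit(meta_X, country)

# Rank the classes per user and keep the five most likely ones.
class_probs = meta.predict_proba(meta_X)
top5 = meta.classes_[np.argsort(-class_probs, axis=1)[:, :5]]
```

For test users, the base model would be refit on all training rows before predicting, and the same column order would be used to build the meta features.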

I did not complicate the model or ensemble too much, due to lack of time and because the CV and LB scores were not perfectly correlated. Hence, I chose fairly simple models with some feature engineering.

View GitHub Repository for the complete code, results and output.

This model scored 0.88081 on the public LB, which ranked 89th, and 0.88625 on the private LB, which ranked 23rd.
The metric used was NDCG.
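A minimal sketch of how NDCG can be computed for this kind of problem, assuming each user has exactly one correct class and submits a ranked list of up to five guesses. The cutoff k = 5 and the function names are assumptions for illustration, not taken from the competition code. With a single relevant label the ideal DCG is 1, so the score reduces to 1 / log2(position + 1) for the position at which the correct class appears.

```python
import math


def ndcg_at_k(predicted, truth, k=5):
    """NDCG@k for one user with a single relevant label."""
    for position, label in enumerate(predicted[:k], start=1):
        if label == truth:
            # Ideal DCG is 1 (correct label in first place), so DCG == NDCG.
            return 1.0 / math.log2(position + 1)
    return 0.0  # correct label not among the first k guesses


def mean_ndcg(predictions, truths, k=5):
    """Average NDCG@k over all users."""
    scores = [ndcg_at_k(p, t, k) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)
```

For example, ranking the correct class first gives 1.0, second gives about 0.63, and leaving it out of the list gives 0. This is why getting NDF vs non-NDF right in the first position matters so much for the overall score.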

View Public LB
View Final Results

It was a very interesting dataset, and good practice in building features from the sessions data, without which it wasn't possible to get a good score. It was disappointing that I had so many ideas that needed a lot more time to code and try out, and I wasn't able to get to them.

So, I think it was a simple, stable model with less overfitting compared to many other competitors who dropped on the private LB.
In the end, I'm happy with the result, and this improves my overall Kaggle rank to 96th. So, finally I get into the Top-100 and on the first page of the rankings :-)

Hoping to improve on this further this year, and hopefully get into the Top-50 or Top-25 some day.

Check out My Best Kaggle Performances

Thursday, February 11, 2016

Puzzle Grand Prix 2016

The WPF Puzzle Grand Prix 2016 is here! After successful editions in 2014 and 2015, the 2016 edition consists of eight rounds held across the first seven months.
The top-10 finalists will be invited for the GP playoffs during WPC 2016 (Slovakia).

The format has changed a bit this year, with the contest having two sections: a Competitive section, which is the main section on which the toppers will be decided, and a Casual section, comprising more 'culture-neutral', non-grid based puzzles, geared towards leisure solvers. It's very unlikely competitors will be able to participate in both sections, so I guess most players need to choose one.

I think I'm better at the 'Casual' sort of puzzles, and hence, will be competing only in that section.

Scoring System
There have been some discussions regarding the new scoring system for the GPs. Historically, normalization of scores for a championship consisting of various rounds has worked well, because rounds cannot be relied upon to be similar in terms of scores and difficulty.
The GPs did use normalization in the previous editions, and it was universally accepted.

I'm not convinced there was a need to go ahead without normalization this year, so it remains to be seen whether or not it will work. You can find some pros/cons of the new scoring system being discussed on the GP Forum.
I'm part of the organizing team of Sudoku Mahabharat / Puzzle Ramayan (whose structure is very similar to that of the GPs), and we had to change the scoring system during this year's rounds because of the inconsistencies that arose without normalization. So, I'm not particularly in favour of dealing with raw scores.
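To make the raw-vs-normalized difference concrete, here is a small sketch using my scores and the round totals from the first four rounds reported further down in this post. The scheme shown (scaling each round so that its total is worth 100) is only one hypothetical way to normalize, not necessarily what the GPs used in earlier editions.

```python
def normalize(score, round_max, scale=100.0):
    """Rescale a raw score so that every round is worth `scale` points."""
    return scale * score / round_max


# (round, my raw score, points the round was worth), as reported in this post.
rounds = [
    ("India", 262, 293),
    ("Slovakia", 283, 402),
    ("Germany", 420, 420),
    ("Hungary", 259, 259),
]

raw_total = sum(score for _, score, _ in rounds)
normalized = {name: normalize(score, worth) for name, score, worth in rounds}
normalized_total = sum(normalized.values())

for name, score, worth in rounds:
    print(f"{name}: raw {score}/{worth}, normalized {normalized[name]:.1f}/100")
```

With raw scores, a perfect Hungary round (259) counts for less than an imperfect India round (262). After scaling, the perfect round is worth 100 and the India round about 89.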

View Championship Page
View Current Rankings

Round 6: Serbia (10th - 13th Jun, 2016)
The competition is heating up, but I was fairly comfortable with the puzzles in this round. Enjoyed the set and managed to finish all but one puzzle (Weights).

Before this round, I was leading with ~25 points over Adam Bissett and ~50 points over Yuhei Kusui. Ironically, all three of us scored exactly the same number of points, 344, in this round, keeping the standings the same.

It now boils down to the last two rounds to decide the winner.

Round 5: USA (13th - 16th May, 2016)
I was worried, having barely managed to score 200 points in this huge 500+ point round. But it seems it was too hard and everyone struggled.

The concept of Escape The Grand Prix is really nice, but not suitable for a time-constrained online puzzle round. This is easily going to be the discarded round for most of the top players.

Kudos to Randy Rogers for scoring 254 points, and Sinchai Rungsangrattanakul for scoring 242, way above the rest of the lot, while I scored just 205, which is my poorest round so far.

Round 4: Hungary (15th - 18th Apr, 2016)
An easy set here. Managed to finish all and score full points, thus keeping my lead intact.
Nice puzzles too.

The four rounds so far have been worth 293, 402, 420 and 259 points. What in the world is the logic of not having normalization? That is the biggest failure of this year's GPs.

Round 3: Germany (18th - 21st Mar, 2016)
What an amazing set of puzzles. Just wow! This is the best set of puzzles I've solved in a long time... each and every one of them is a beauty.

The Instructionless Machine puzzles were exceptionally interesting and well-made. I think it was the sheer fun of the round that made me perform so well. I topped with 420 points, scoring over 100 points more than everyone else, except Jarett Prouse who scored 382.
Which means I'm back in the top-3.

Round 2: Slovakia (19th - 22nd Feb, 2016)
Bad round. Lost time on the scrabble puzzles and wasn't able to complete them at the end.

Scored 283 points, which is poor compared to the highest score of 402. Hopefully, this could be one of my discards.

With no normalization, we have two rounds, one worth 293 points and the other worth 402 points. I just don't get it.

Round 1: India (22nd - 25th Jan, 2016)
It was a spur-of-the-moment decision to participate in this round, since it was authored by Indians and I just assumed I couldn't compete. But Prasanna informed me that I was the test-solver only for the Competitive section and I could, in fact, participate in the Casual. And I did.

I finished 4th with 262 points. The highest score was 275 points by Adam Bissett of the UK. Not a bad start.

The puzzles were excellent. The Buttons and the Number Series really had me scratching my head for a long time, and I ultimately ended up missing out on two Number Series and making a minor error in Shape Count.

It's quite funny: considering normalization is not being used this year, you'd expect the rounds to at least be similar in terms of points. The Competitive section was worth 697 points while the Casual section was worth 293 points. So, I have no clue where this is going. Let's hope it's not too bad.