
Wednesday, November 12, 2014

Tradeshift Text Classification

The Tradeshift Text Classification competition was held on Kaggle from 2nd October, 2014 to 10th November, 2014.

Tradeshift has a dataset of thousands of documents in which groups of words are assigned labels such as Date, Address and Name. The challenge was to build an automated model that predicts which label(s) a given group of words belongs to.

The training data consisted of 1.7 million rows, each with its correct label(s). There were 145 variables describing each group of words and some of the surrounding words/text. There were 33 possible labels.
The test data consisted of about 0.5 million rows for which we had to predict the labels.

I started off with tinrtgu's absolutely beautiful and elegant online logistic regression code. The model was so good that most of the top competitors started with it and eventually used it as part of their best models. Here's the kicker: at the time it was shared, the code was powerful enough to reach 1st place!

The online model one-hot encodes all the variables, so I rounded all the numeric variables to one decimal place, since 'similar' values should ideally be treated as the same feature.
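A minimal sketch of that preprocessing, assuming each row arrives as a dict mapping a column name to its raw string value (the column names below are hypothetical). Numeric values are rounded to one decimal before being hashed into a fixed-size index space, so 0.41 and 0.43 end up as the same one-hot feature. The hash-space size `D` and the use of `zlib.crc32` (picked because Python's built-in string `hash` changes between runs) are my assumptions, not details of tinrtgu's code.

```python
import zlib

D = 2 ** 20  # assumed size of the hashed feature space


def prepare_value(value):
    """Round numeric values to one decimal place; leave non-numeric values untouched."""
    try:
        return f"{float(value):.1f}"
    except (TypeError, ValueError):
        return value  # e.g. hash variables or YES/NO flags stay as they are


def hashed_features(row, D=D):
    """Map a dict of column -> raw value to a list of hashed one-hot indices."""
    indices = [0]  # index 0 serves as a bias feature
    for column, value in row.items():
        token = f"{column}={prepare_value(value)}"
        indices.append(zlib.crc32(token.encode("utf-8")) % D)
    return indices


print(prepare_value("0.4321"), prepare_value("YES"))  # 0.4 YES
print(hashed_features({"x5": "0.41"}) == hashed_features({"x5": "0.43"}))  # True
```

Each index stands for a feature with value 1, so a linear model over these indices is effectively a one-hot model without ever building the full vocabulary.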

I built a Random Forest (RF) for y33, the most important label. I took some of the most important variables and tried some interactions with the hash variables in the online code.
Then I built an RF for each label individually. Too intensive! It took me over a day across two PCs!

I added the RF predictions to the online code and, bang, that was it! This submission got me into the top 10.
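The post doesn't say exactly how the RF predictions were fed in. One plausible way, shown here only as a sketch, is to bucket each RF probability to one decimal (like the numeric variables) and add it as one more hashed one-hot token. The learner is a minimal online logistic regression with a per-feature adaptive learning rate `alpha / (sqrt(n) + 1)`, in the spirit of tinrtgu's code but not a copy of it; `OnlineLR`, `features_with_rf`, the token name `rf_y33` and the parameter values are all hypothetical.

```python
import math
import zlib


def hash_tokens(tokens, D):
    """Hash string tokens into one-hot indices; index 0 is a bias feature."""
    return [0] + [zlib.crc32(t.encode("utf-8")) % D for t in tokens]


def features_with_rf(raw_tokens, rf_prob, D):
    """Hypothetical stacking step: bucket the RF probability and add it as one more token."""
    return hash_tokens(raw_tokens + [f"rf_y33={rf_prob:.1f}"], D)


class OnlineLR:
    """Online logistic regression over binary hashed features (one model per label)."""

    def __init__(self, D=2 ** 20, alpha=0.1):
        self.D, self.alpha = D, alpha
        self.w = [0.0] * D  # weights
        self.n = [0.0] * D  # how many times each feature has been updated

    def predict(self, x):
        wTx = sum(self.w[i] for i in x)
        wTx = max(min(wTx, 35.0), -35.0)  # keep exp() from overflowing
        return 1.0 / (1.0 + math.exp(-wTx))

    def update(self, x, p, y):
        for i in x:
            # (p - y) is the log-loss gradient for a feature whose value is 1
            self.w[i] -= (p - y) * self.alpha / (math.sqrt(self.n[i]) + 1.0)
            self.n[i] += 1.0


# Usage on a toy stream of (tokens, RF probability, true label) triples.
model = OnlineLR()
stream = [(["x1=a"], 0.9, 1), (["x1=a"], 0.1, 0)] * 500
for tokens, rf_prob, y in stream:
    x = features_with_rf(tokens, rf_prob, model.D)
    model.update(x, model.predict(x), y)

print(model.predict(features_with_rf(["x1=a"], 0.9, model.D)))
print(model.predict(features_with_rf(["x1=a"], 0.1, model.D)))
```

On this toy stream the first prediction should land above 0.5 and the second below it: the only thing separating the two rows is the RF token, so the model learns to lean on it, which is the whole point of stacking.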

I tried some other models without much success, but XGBoost gave promising results, so I added the XGBoost predictions to the online code as well.

Final Model
My final model fed M1-M2-M3-M4 into the online code. It scored 0.0049356 / 0.0050160 on the LB and would've been ranked in the twenties.

'Would've been'? That's right. Abhishek Thakur teamed up with me, and we tried some ideas and models together. I don't remember the last-ditch ideas Abhishek tried (I'll update soon), but the final ensemble, scoring 0.0048200 / 0.0048783, certainly helped us secure 13th place out of 375 teams.
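The post doesn't say how the final ensemble was combined. Purely as an illustration, assuming a simple weighted average of probabilities and assuming the leaderboard metric is log loss averaged over every (row, label) pair, blending and scoring could look like this; the helper names and the toy numbers are made up.

```python
import math


def mean_log_loss(y_true, y_pred, eps=1e-15):
    """Log loss averaged over every (row, label) pair; inputs are lists of equal-length rows."""
    total, count = 0.0, 0
    for row_true, row_pred in zip(y_true, y_pred):
        for y, p in zip(row_true, row_pred):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def blend(pred_a, pred_b, weight=0.5):
    """Weighted average of two models' probabilities, label by label."""
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(pred_a, pred_b)
    ]


# Toy data: 2 rows x 2 labels, made up purely for illustration.
y = [[1, 0], [0, 1]]
model_a = [[0.9, 0.2], [0.1, 0.6]]
model_b = [[0.7, 0.1], [0.3, 0.9]]
for name, pred in [("A", model_a), ("B", model_b), ("blend", blend(model_a, model_b))]:
    print(name, round(mean_log_loss(y, pred), 4))
```

Because log loss is convex in the predicted probability, an equal-weight blend can never score worse than the average of the two models' scores, which is one reason simple averages of decent models tend to help.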

We also tried other models such as GBM and SGD, along with other features, variables and parameter tuning, without much success.

This was the first competition where I came up with a very competitive result, and it's given me the confidence to come up with more in the future.

The CV results and LB scores were very close and consistent; it was an absolutely perfect dataset. The online code is what I'm taking away from this competition: it is now one of my favourite models :-) (Thank you, tinrtgu!)

Congrats to the Chinese team of rcarson and Xueer Chen, and also to the second-placed team of three French guys. Both teams did a fabulous job of entertaining us until the last day, when they had the exact same score on the Public LB! I mean, seriously, how close can you get? It's unfortunate this competition had only one prize, but hey, that's life.

Now my overall Kaggle rank is 206th! Yay! My career best. You can view my Kaggle profile.
I'm hoping to get into the top 100 early next year, and hopefully the top 50 (or top 25?) by the end of 2015.

So, lots more to come!

Check out My Best Kaggle Performances

Monday, October 13, 2014

Sudoku Mahabharat 2015

During my trip to London, UK for the World Sudoku Championship 2014, the Indian team had a very entertaining dinner just before the WSC finale and prize distribution. When you sit at a table with Deb, Jaipal, Jayant, Prasanna, Rishi and Swaroop, there's bound to be some fun. Kunal was lost somewhere and Sumit had left to meet some relatives in London. Well, they certainly missed a lot of fun.

During the humorous discussions, we touched upon the topic of creating more enthusiasm among sudoku solvers in India. It is certainly true that Prasanna, Rishi and I have been the top players in India, and people tend to lose motivation knowing it's hard to beat us and make the team. This was the sixth consecutive year I've been in the India A-Team, and though the top players have been performing extremely well at the World Championships, there was a need to do something more for players who have the potential to become India's top solvers some day. And thus, Sudoku Mahabharat was born.

Deb suddenly had the idea that we should conduct a national event in India, aimed mainly at encouraging new and potential players to participate, win and get a feel for performing at the top level. We discussed the idea and came up with this event.

Read more about the rules, eligibility, schedule and format of Sudoku Mahabharat 2014-2015

Six of the top players (Jaipal, Prasanna, Rishi, Sumit, Swaroop and I) won't be participating; in fact, we'll be organizing the event. This gives some of the upcoming solvers a chance to win the event and, hopefully, become part of the Indian team soon.

Episode 1: Standard Variants by Rishi Puri
Dates: 20th-22nd September, 2014
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 2: Irregular Variants by Rohan Rao
Dates: 18th-20th October, 2014
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 3: Odd-Even Variants by Deb Mohanty
Dates: 15th-17th November, 2014
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 4: Outside Variants by Rakesh Rai
Dates: 20th-22nd December, 2014
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 5: Math Variants by Prasanna Seshadri
Dates: 17th-19th January, 2015
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 6: Neighbours by Rajesh Kumar
Dates: 21st-23rd February, 2015
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Episode 7: Math Variants by Swaroop Guggilam
Dates: 21st-23rd March, 2015
Download Instruction Booklet
Download Puzzle Booklet
View Forum
View Results

Whether you're a beginner, regular or pro, whether you're 17, 31 or 67 years old, whether you're Indian or not, this is a chance for you to participate (even for fun!) and solve some interesting and different sudokus created by some of the best sudoku creators in India. There is something for everyone to enjoy! Who knows, maybe you'll be the one who wins Sudoku Mahabharat!

Sunday, August 24, 2014

World Sudoku Championship 2014

The 9th World Sudoku Championship was held in London, UK in August, 2014.

Official Website

The Indian Team for the WSC were the winners of the Times Sudoku Championship: Prasanna Seshadri, Rishi Puri, Sumit Bothra and me.

Prasanna, Rishi and I were in the team last year as well, and we finished 15th, 28th and 16th respectively, so we were hoping to do better this year, both individually and as a team (India were 7th last year). We had a B-Team too: Swaroop, Jaipal, Jayant and Kunal.

I had been on a long break from solving puzzles, but after the ISC I decided to prepare for the WSC. A pathetic TSC performance gave me a jolt, but I was glad I made my mistakes at the TSC rather than at the ISC and, hopefully, the WSC.

When the WSC instruction booklet (IB) was released, I was a little disappointed, and after the WPC IB came out, I spent all my time preparing for the WPC. So much so that I organized the WPC (World Practice Championship) on LMI.

So, what was wrong with WSC? Nothing wrong, it was just that the WSC looked a little bland. All the usual variants, some repetition among types, no hard Classics and nothing specifically to practice or prepare for. The rounds felt easy-ish and finish-able, and it did turn out that way.

Round 1 was good, I missed a couple of low-pointer Classics.
Round 2 was bad. I messed up the Surplus, missed the Killer (and later had a terrible error in Diagonal).
Round 3 was good, just missed the Max Triplet.
Round 4 was excellent, just missed the Inequality (and later had a single cell error in a Classic).
Round 5 was ok, got stuck at Killer Pro and lost some time, but managed a decent score.
Round 6 was a round I was dreading, and I didn't do well. I wasted time on the Toroidal, which went wrong towards the end. A few more seconds and I could've finished the Parquet.

Round 7 was a team round, which we finished with a bonus. But we made a single-digit error and lost half the puzzle points and all the bonus :-|
Round 8 was a team round that we royally screwed up. Well, we certainly didn't prepare enough as a team.

So, that was Day-1. I was 12th after the Day-1 results came out, which felt OK-ish. My playoff chances were bright, but I had to do well on the second day.
Prasanna had a terrible Round 5 and Rishi had an average day, pushing them outside the top-20.

I was all set for Day-2.

Round 9 was the 'big round': the longest round, with the most points. At last year's WSC, there was a similar 'big round' on the morning of the second day, which I totally messed up, dropping a few ranks. This year I didn't mess up, but I didn't do my best either. I broke the Clone and the Cylindrical, didn't have time for the Between, and later had a mistake in the Diagonal. Why can't I solve Diagonal Sudokus this year?
After this round I knew I didn't have any chance for playoffs. So, my goal was to at least improve on last year's rank (16th).

Round 10 was the last individual round, an overlapping sudoku (similar to the last round of WSC 2012), and I like these. I finished the WSC on a high by completing this round in 8 minutes, securing a 12-minute bonus. Only 3 players did better, with a 13-minute bonus.

Round 11 was a team round where we had to solve sudokus that were linked. We were confident of doing well in this round since we all had 'different' variants that we liked, and we did well.

So, that was my WSC 2014. I finished 14th: better than last year, but still a big gap to 10th. I'm happy I've been performing consistently over the years, with or without practice. My last 5 WSC ranks are 14, 16, 8, 12 and 15.

Prasanna never really recovered from that bad Round 5 and finished 21st. Rishi's performance was below par: 36th. Sumit made a couple of mistakes towards the end and dropped out of the top 50, finishing 55th.

Our team swapped between 7th and 8th over the rounds, and we thought we'd be 7th (again!). But special thanks to the French team for goofing up in the last team round. They lost nearly 2000 points, which pushed us to 6th, India's best team rank ever.

The playoff format was different from the last few years: even though 10 players made the playoffs, it was mainly a fight among the top 4. I found this format better than at most other WSCs, especially since the worst-case scenario for the preliminary toppers is 4th place.

Tiit and Kota went head-to-head in a very close contest, which Kota eventually won. (Had he lost, he would have been WSC runner-up for the 4th consecutive year!) On the other hand, Tiit has topped the preliminaries twice but still can't call himself WSC Champion. Well... that's life. Maybe next year.
(There was some problem with Bastien's and Jakub's papers during the playoffs, but I think the organizers found a way out by awarding them joint 3rd place.)

Congrats to Kota Morinishi for winning WSC 2014, Tiit Vunk for 2nd, and Bastien Vial-Jaime and Jakub Ondrousek for joint 3rd. They have all been performing consistently well and are deserving winners.

View the complete WSC Results

Overall, I think this was a much easier WSC than the last few years. I enjoyed the sudokus, the rounds and the experience. The team rounds were fun and easy. The playoffs were good and fair. So that brings to an end a good and successful, if slightly bland, WSC. Yes, the only small negative was too many standard variants and some unnecessary repetition. But I guess that's what made it feel easy and fun :-)

Thanks to the UKPA and the UK organizers and volunteers for conducting this wonderful WSC. Thanks to all the authors for the enjoyable sudokus; I hope to see more in the future.

This was my 6th WSC, and it is one of my two favourites.

Monday, July 21, 2014

Beginners Contest on LMI

I authored the July Beginners Contest on LMI. This was a short contest consisting of 4 Classic Sudokus and 4 Sudoku Variants. The difficulty was pitched so that newcomers would be able to solve them. The contest ran from 23rd to 28th July, 2014.

The Sudoku Variants were Arrow, Diagonal, Extra Region and Trio. You can view some examples here.

You can discuss the contest on the LMI Forum.

There were 295 participants in total from 38 countries! Congrats to sworls (USA), Mehmet Eren (Turkey) and Vidhya (India) for being the top-3 Beginners. And congrats to Kota Morinishi (Japan), Timothy Doyle (France) and Hideaki Jo (Japan) for being the top-3 Seasoners.

You can view the Complete Results.

Overall from the feedback I received, most participants enjoyed the 'easy' Classics. Among the variants, Trio is always easy, Diagonal was easy, Extra Region was medium and Arrow, well, Arrow was something, right? The Arrow Sudoku was certainly challenging, especially for beginners, but I don't think it was too tough :-)
Maybe you can decide for yourself after checking the solve below:

This is the puzzle:

The 'long' arrow has to be '6'. After that, I admit, the next step is not straightforward. Intuitively, I thought people would target the bottom-left, and there is an opening there. You should get the following pencilmarks using simple addition rules and constraints.

If you look at R8C4, it cannot be '2' or '3', because either would lead to the following contradictions in those two arrows. This is not very easy to spot, but filling those two arrows is quite tight.

Hence, R8C4 is a '1'. From here, the solve is smooth.

At this stage, R8C3 cannot be '3', so it is '2', which also lets you complete the other arrow. There are other ways too, but if you try filling up box 7 along with the arrows, there is just one possibility.

Using Classic rules, you can reach this stage, with those two possibilities for '5' in box-8.

The arrow in box-2 covers R1C5 and R2C6. Minimum in R1C5 is '3' and minimum in R2C6 is '6'. So, it has to be 3+6=9, which will also give you the 5-9 pair in box-8.
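This minimum-sum argument can also be checked by brute force. A small sketch, assuming a two-cell arrow whose cells share a box (so their digits must differ) and using only the stated minimums as candidate ranges; the helper name is made up for illustration:

```python
from itertools import product


def two_cell_arrow_options(circle_candidates, cell_a, cell_b, same_unit=True):
    """Return every (circle, a, b) with circle == a + b.

    same_unit=True means the two arrow cells share a row, column or box,
    so they cannot hold the same digit.
    """
    options = []
    for a, b in product(cell_a, cell_b):
        if same_unit and a == b:
            continue
        if a + b in circle_candidates:
            options.append((a + b, a, b))
    return options


r1c5 = range(3, 10)  # R1C5 is at least 3
r2c6 = range(6, 10)  # R2C6 is at least 6
print(two_cell_arrow_options(range(1, 10), r1c5, r2c6))  # [(9, 3, 6)]
```

Since a circle can hold at most 9, the smallest possible sum 3+6 is also the only one that fits.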

The centre box can then be completed, which will give you a few more digits using Classic rules.

The arrow with '7' can only be filled with 1 and 6, which gives you a few more digits.

You have just one arrow left to complete, and eliminating '4' from its circle is simple, since neither 1+3 nor 2+2 is possible. With '8', you should get 3+5, and the rest is solved using simple Classic rules.
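The short-arrow arithmetic in these last steps (7 as 1+6, then 4 or 8 in the final circle) comes down to listing how a total splits into two digits. A small sketch with a made-up helper name; it only enumerates the sums, and it is the grid (pencilmarks plus Classic rules) that rules the other options out:

```python
from itertools import combinations_with_replacement


def arrow_splits(total, cells=2, distinct=True):
    """Ways to write `total` as a sum of `cells` digits from 1-9, ignoring order.

    Use distinct=False when the arrow cells don't share a row, column or box,
    so a digit may repeat (e.g. 2+2).
    """
    return [
        combo
        for combo in combinations_with_replacement(range(1, 10), cells)
        if sum(combo) == total and (not distinct or len(set(combo)) == cells)
    ]


for total in (4, 7, 8):
    print(total, arrow_splits(total, distinct=False))
# 4 [(1, 3), (2, 2)]
# 7 [(1, 6), (2, 5), (3, 4)]
# 8 [(1, 7), (2, 6), (3, 5), (4, 4)]
```

Once both splits of 4 are blocked by the digits already placed, only 8 remains for the circle, and the same kind of check leaves 3+5.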

So, what do you think now? Tough? Maybe not that much!

This Arrow Sudoku was not trivial, and a lot of solvers struggled with it. But I hope you enjoyed solving it in the end.

And, I hope everyone enjoyed the sudokus in general, and thanks for participating in this contest! :-)

Monday, July 14, 2014

Happy Birthday Nanamma!

I created an Alphabet Sudoku for my grandmother's birthday, themed on her name. Her name is Shanta Murthy and she loves solving all kinds of puzzles. She solves most of the online puzzle contests at leisure in her spare time and usually completes all the puzzles, sometimes taking multiple weeks for the difficult ones! (I myself give up on really tough puzzles :-) )

She turned 82 on 14th July, 2014, but has the puzzling mind of a 28-year-old. She feels proud of all my puzzle-related achievements but is too shy to compete herself. I keep telling her she would easily win an over-75 category prize (if there were one!), and I still hope some day she does.

I lovingly call her 'Nanamma' (which means 'paternal grandmother' in our local language) and I dedicate this Alphabet Sudoku to her.

Nanamma solves more puzzles than me every week and it has helped keep her mind active and agile through these years. I hope she continues to enjoy solving puzzles and maybe some day she'll create one for me :-)

Thursday, June 19, 2014


I've moved on from my previous blog, The Logical World of Puzzles, and I'm starting here with something similar, yet new and exciting.

It's going to be a mix of my passions, interests and hobbies, along with my opinions, views, results and analyses of them. Let me not write a literal 'introduction' here; I'll let the blog speak for itself.