Since we had statistics for each individual player, we first tried adding each individual stat as an attribute. Intuitively, this means a classifier would be examining, say, one team's star player's batting average and trying to decide whether that team would win. From this intuition we figured it would be better to use the difference between corresponding players' statistics rather than each value individually. Another idea was to score the batting roster on having several high averages in a row, since consecutive hits score runs. We also thought that adding fielding statistics or other less-talked-about team stats might give accuracy a boost.
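The two roster-based ideas above can be sketched as simple feature constructors. This is a minimal illustration, not the project's actual code: the function names, the representation of a team as a list of per-player batting averages, and the 0.280 "high average" threshold are all assumptions made for the example.

```python
def player_diff_features(team_a, team_b):
    """Pairwise differences between corresponding players' averages
    (the 'difference in each player's statistics' idea)."""
    return [a - b for a, b in zip(team_a, team_b)]


def streak_score(roster, threshold=0.280):
    """Longest run of consecutive players batting above a threshold
    (the 'several high averages in a row' idea); threshold is illustrative."""
    best = run = 0
    for avg in roster:
        run = run + 1 if avg >= threshold else 0
        best = max(best, run)
    return best
```

For example, a lineup of averages `[0.310, 0.295, 0.240, 0.305]` has a streak score of 2, since only the first two hitters clear the threshold consecutively.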
However, these attempts at improving performance failed: averaging each team's statistics still gave the best cross-validation accuracy for each of the four methods we focused on. Because of this, we decided to thoroughly investigate which statistics were the best predictors, and whether an off-the-wall statistic like a team's total salary could beat a naive rule such as always picking the home team to win (correct ~53.5% of the time).
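The naive rule used as the bar to clear can be sketched as follows. This is an illustrative helper, not the project's code; it assumes game outcomes are recorded as booleans indicating whether the home team won, so the rule's accuracy is just the home-win rate (~53.5% on the data described in the text).

```python
def home_team_baseline_accuracy(home_won):
    """Accuracy of always predicting a home-team win, given a list of
    booleans where True means the home team won that game."""
    return sum(home_won) / len(home_won)
```

Any candidate statistic only earns its keep if its cross-validation accuracy exceeds this baseline.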