I will be attending ICML for the first time this year, and wanted to dig a bit deeper into the accepted papers as preparation. ICML is one of the major academic Machine Learning conferences, alongside NIPS, ICLR and others.
Last year, Tesla’s AI Director Andrej Karpathy published a similar post detailing 2017 accepted paper statistics. This was a source of inspiration for this work and can be used as a comparison point to this year’s statistics.
The first thing I did was to check which institutions were most mentioned in papers. I used Python and regex, and manually resolved name collisions (i.e. collapsing “Google Inc”, “Google Brain”, etc. into “Google”). Each institute’s name was only counted once per paper, so we are looking at the number of papers that mention each institute, as opposed to the number of authors from each institute.
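For readers who want to try something similar, here is a minimal sketch of that counting step. It is not my exact script: the alias patterns, the `canonicalize` and `count_mentions` names, and the input shape (one list of affiliation strings per paper) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical alias table: regex pattern -> canonical institute name.
# Order matters: the first matching pattern wins.
ALIASES = [
    (re.compile(r"\bdeepmind\b", re.IGNORECASE), "DeepMind"),
    (re.compile(r"\bgoogle\b", re.IGNORECASE), "Google"),
]


def canonicalize(affiliation):
    """Collapse known name variants (e.g. "Google Inc", "Google Brain") into one name."""
    for pattern, canonical in ALIASES:
        if pattern.search(affiliation):
            return canonical
    return affiliation.strip()


def count_mentions(papers):
    """Count, for each institute, the number of papers that mention it.

    `papers` is assumed to be an iterable of affiliation-string lists, one per paper.
    """
    counts = Counter()
    for affiliations in papers:
        # A set ensures each institute is counted at most once per paper.
        counts.update({canonicalize(a) for a in affiliations})
    return counts
```

Calling `count_mentions(papers).most_common(30)` would then give a top-30 list. In practice the alias table needs manual curation, since regex alone will not catch every naming variant.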
In total, we get ~480 unique institution mentions, up from last year’s 420. The total number of accepted papers was 621, indicating that ~141 papers had cross-institute collaboration.
The top 30 institutes are listed below with their counts.
In order to get a better idea of industry vs. academic representation at ICML, I looked at the percentage of industry vs. academic authors in the top 10 institutes.
Of the paper counts for the top 10 institutes, 45% of papers mentioned industry authors.
Looking specifically at Alphabet’s involvement in the paper counts for the top 10 institutes, I found that Google & DeepMind were mentioned in 28% of papers.
What’s more surprising is that Google & DeepMind were mentioned in nearly 13% of all papers accepted to ICML.
This is up from last year’s 6.3%: a clear sign that Alphabet is successfully continuing to scale its Machine Learning research efforts. This is especially notable given that, similar to last year, academia is mentioned in ~75% of papers. Authors from academia are therefore roughly maintaining their strong representation at this conference.
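A share like "mentioned in X% of all papers" can be computed roughly as in the sketch below. The function name `share_of_papers` and the input shape are assumptions for illustration, and the institute names are assumed to have already been normalized.

```python
def share_of_papers(papers, group):
    """Fraction of papers that list at least one institute from `group`.

    `papers` is assumed to be a list of collections of (already normalized)
    institute names, one per paper; `group` is a set of institute names.
    """
    if not papers:
        return 0.0
    # A paper counts once, no matter how many institutes from the group it lists.
    hits = sum(1 for affiliations in papers if group & set(affiliations))
    return hits / len(papers)
```

For example, `share_of_papers(papers, {"Google", "DeepMind"})` gives the Alphabet share of all accepted papers. A paper listing both Google and DeepMind is counted only once.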
It would be interesting to continue posting this analysis over the years, and watch trends shift. I would also like to run topic modelling on the abstracts of the accepted papers over time to uncover trends in the prevalence of different research topics.
If you find any errors as you read through this, please leave comments and I will resolve them shortly!