Sunday, April 17, 2016

Summary of the 2016 Big Scholar Workshop (WWW Companion)

Summary of the 2016 Big Scholar Workshop

This is a great event for researchers who work with scholarly data to share ideas and interact with each other. The talks at this workshop were very informative and were given by several scholars with strong academic reputations. The keynote speakers included Dr. C. Lee Giles (Pennsylvania State University) and Dr. Jie Tang (Tsinghua University). Dr. Giles shared his work on the CiteSeerX system and his future plan to open and share the dataset with other researchers. He also mentioned some interesting related work, e.g. estimating the number of scholarly documents on the web.

Dr. Tang shared his work on AMiner, another well-known scholarly portal and recommendation system. His talk mainly covered the details of the system implementation, e.g. how to collect papers and parse them into structured text, how to maintain user profiles for data accuracy, and how to identify domain experts based on the collected scholarly data. The AMiner system is now focusing on the expert recommendation task. I asked what percentage of users actually maintain or interact with the system. Dr. Tang replied that although anyone can edit any profile, few users actually use the system. To ensure data accuracy, the team will focus on manually maintaining the profiles of some of the listed domain experts.

Several other well-known scholars also joined the workshop, including Dr. Jevin West (University of Washington), Dr. Feng Xia (Dalian University of Technology), Dr. Huan Liu (Arizona State University), Dr. Kuansan Wang (Microsoft Research) and Dr. Philip S. Yu (University of Illinois at Chicago). It was nice to hear their presentations and to get some feedback from them on my work. Their feedback included: 1) Why not include venue information in the prediction model? 2) Can you predict the future productivity of junior scholars based on your model? 3) How do you sample the positive and negative examples for the model evaluation? This is a critical point for the performance of a classification problem. 4) How do you define the age of a junior scholar? All of this feedback is valuable for refining my future work. [1]

Many projects already work on scholarly data analysis, e.g. Google Scholar, Microsoft Academic Search, CiteSeerX, AMiner and others (such as Conference Navigator). The closing remarks therefore discussed a platform that would combine data from these different sources and build a community around it for scholarly data research. The research topics could be extended into data science, education, health and more. Dr. Giles and Dr. Wang are in fact initiating the next workshop or conference with a broader scope. Dr. Wang, as a representative from industry, agreed to provide some infrastructure support for scholars working on big scholarly data projects. Dr. Giles is also willing to open his dataset for further collaboration. I believe this is a promising research direction for future studies.


  1. Tsai, C. H. and Lin, Y.-R. (2016). Tracing and Predicting Collaboration for Junior Scholars. WWW 2016 Proceedings (workshop paper).