Friday, October 23, 2015

A talk summary of "Professional Research Opportunities for Ph.D. Students in MSR"

Summary 

This is a summary of the talk in [1]. The speaker mentioned:


  1. It is a new era for Microsoft Research, with a new CEO and board. Compared to the Bill Gates era, the research lab now more closely follows and supports the three pillars of the new company agenda. The three pillars of MSR (Microsoft Research) are: productivity, intelligent cloud, and personal computing, e.g., health care. The mission statement is, like academia's pursuit of knowledge, to look for new ideas and innovations that flow from MSR into the corporation.
  2. There are two constraints on researchers applying for funding: the budget from Congress is shrinking, and the current political situation in Congress means more young staffers who may lack a passion for science and focus more on popular issues, e.g., education bills and subsidies.
  3. The difference between the research lab and academia: at MSR, you do not need to worry about research funding for graduate students; you have colleagues and interns, you receive a 12-month salary, and the lab encourages fundamental research questions in a bottom-up style. In academia, you need to write more proposals, and it is sometimes hard to get money to support your research.
  4. MSR cares about researchers' professional career development. Researchers are welcome to collaborate with other teams, seek impact on the world, and get their knowledge and innovations into practical Microsoft products.
  5. In the big-data era, there is an advantage to working in an industrial research lab: access to massive internal data that cannot be shared outside the company. This benefits research in speech, translation, machine learning, and related subjects.
  6. Microsoft also releases work as open source. Unlike IBM, which places much more value on patents, MSR respects academic values. When you interview for a position, the cultural differences at the organization are a question worth asking about.
  7. MSR evaluates research by its impact. You are expected to be an expert in some field and also to bring value to the company. The peer-review rules were changed two years ago so that evaluation no longer only counts publications. There is no tenure track.
  8. MSR also encourages delivering research into startups, and welcomes such people to bring in different "genes."
  9. Whether you are in industry or academia, you need to learn to present your work with confidence. Who is the audience you focus on? E.g., how would you explain your work to a WSJ reporter? Storytelling is a skill you absolutely need to learn. Also, since getting funding is important in academia, it is good to know who controls the faucet (the money).


Thoughts

I feel the goal of Microsoft under the new board of directors is clear: productivity. All MS products and services will revolve around this core, and MSR is no exception. The vice president's speech clearly points out the three pillars of the mission statement. It is hard to avoid the board's demand to deliver research and innovation as value to the company. Viewed another way, this might be a chance to bring research into real-world products and services. However, will the company's mission crowd out support for fundamental research? This might be another question worth asking. Besides, leveraging the huge datasets from MS products and services would be a real benefit for researchers, e.g., the social network from Hotmail, the behavior patterns from Office 365, or the user-generated data from Windows. All of this data is valuable for researchers to conduct experiments, which might also be a disadvantage and a challenge for data science researchers in academia. Any thoughts?

Reference

  1. Professional Research Opportunities for Ph.D. Students, Jeannette M. Wing, http://halley.exp.sis.pitt.edu/comet/presentColloquium.do?col_id=8982


Technology policy research in Korea - mobile platforms and network neutrality

Summary

Technology policy from the government always plays a critical role in industry. In [1], the authors intended to measure the efficiency difference before and after the formulation and implementation of a platform standardization policy (WIPI). They adopted the idea of an "efficiency frontier" to measure differences in mobile companies' performance. According to their empirical findings, the government-led mobile platform standardization policy negatively affected the companies' efficiency, compared to companies independent of the mobile network operators. The authors suggested the government should play a supporting role in policy regulation.

In paper [2], the authors discussed the effects of network neutrality on new Internet application services. They proposed a simulation experiment to examine service diffusion under different network regulation settings. They found that "more latency sensitivity and broader bandwidth services have displayed a higher willingness to pay (WTP) for high-priority Internet services," which accords with the assertion that network providers have an incentive to charge additional fees for certain services. They further explored the diffusion effect under government regulation and found that discrimination by network providers might hurt the diffusion of newly arriving Internet services. They also suggested the government needs to take action to protect innovative new services in their early stages.
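To make the simulation idea concrete, here is a minimal sketch in the spirit of [2], not the authors' actual model: a classic Bass diffusion curve for a new Internet service, where a discriminatory access fee is represented as a drag on the adoption coefficients. All parameter values are illustrative assumptions.

```python
# Minimal Bass-diffusion sketch (illustrative, not the model from [2]).
# A discriminatory access fee is assumed to suppress adoption, which is
# modeled here simply as lower innovation (p) and imitation (q) coefficients.

def bass_diffusion(p, q, market_size, periods):
    """Return cumulative adopters per period under the Bass diffusion model."""
    adopters = 0.0
    history = []
    for _ in range(periods):
        frac = adopters / market_size
        new = (p + q * frac) * (market_size - adopters)
        adopters += new
        history.append(adopters)
    return history

# Neutral network: baseline coefficients (assumed values).
neutral = bass_diffusion(p=0.03, q=0.38, market_size=1_000_000, periods=20)

# Discriminatory network: the extra fee suppresses early adoption
# (also an assumption made purely for illustration).
discriminated = bass_diffusion(p=0.015, q=0.30, market_size=1_000_000, periods=20)

print(f"adopters after 20 periods, neutral:        {neutral[-1]:,.0f}")
print(f"adopters after 20 periods, discriminated:  {discriminated[-1]:,.0f}")
```

Under these assumed parameters the discriminated service diffuses visibly more slowly, which is the qualitative effect the paper's early-stage protection argument rests on.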

Thoughts

Research on technology policy is full of regional differences; even so, it is a good way to draw on the experience of other countries. For the first paper, the authors imply an interesting research question: what role should government play in the new mobile era? Technology changes rapidly, so it is quite hard for a government to propose a complete yet flexible policy for a new industry. The idea behind WIPI is clear: the Korean government intended to construct a universal mobile platform inside Korea. If the regulation had succeeded, Korea would have had a giant ecosystem to promote its mobile application industry, acting as the maker of the rules of the game. The idea is similar to Japan's mobile platforms, which adopted more localized specifications. However, even the larger Japanese market could not resist the platform competition from Apple's iOS and Google's Android. That is why a government-led standard is hard to defend against the two industry-led platforms. There are more economic issues behind the scenes; this could be another interesting research subject.

The dispute over network neutrality is another debate between content service providers and network operators. There is a famous case involving Netflix, a latency-sensitive, high-bandwidth online video streaming service. The story ended with Netflix paying an extra access fee to the network operator Comcast. This was a worthwhile decision for Netflix's revenue, but it might be a huge barrier for new Internet services, as paper [2] claims. The FCC is trying to forbid discriminatory charges by network operators, but this rule has also aroused great controversy over government control of the Internet. The multiple stakeholders in this game make the dispute a continuing one. The simulation approach of this paper would be one way for us to examine Internet control policy.

Reference:

  1. Hongbum Kim, Daeho Lee, Junseok Hwang. (2016). Measuring the Efficiency of Standardisation Policy Using Meta-Frontier Analysis: A Case of Mobile Platform Standardisation, International Journal of Mobile Communications.
  2. Lee, Daeho, and Hongbum Kim. "The effects of network neutrality on the diffusion of new Internet application services." Telematics and Informatics 31.3 (2014): 386-396.

Tuesday, October 20, 2015

A summary review of three recommendation systems for academia.


Summary

A recommendation system is a way to help users better retrieve or digest the ubiquitous, enormous flow of information. In [1], the authors tried to build a committee candidate recommendation system to help conference organizers. This is one sub-domain of expert-finding studies. They combined the social network of the program committee (PC), publication history, and topical expertise matching to generate a list of potential committee members. They found that all three prediction features are useful for providing good recommendation results.
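One simple way to picture combining those three evidence sources is a weighted ranking score. Everything below (the candidate names, feature scores, and weights) is invented for illustration; the actual model in [1] is more elaborate than a fixed linear combination.

```python
# Hypothetical sketch: rank committee candidates by a weighted sum of
# three normalized evidence scores (social network, publication history,
# topical match). All names, scores, and weights are invented.
candidates = {
    # name: (social_network, publication_history, topic_match), each in [0, 1]
    "Dr. Chen":  (0.9, 0.6, 0.7),
    "Dr. Patel": (0.2, 0.9, 0.8),
    "Dr. Kim":   (0.5, 0.4, 0.3),
}
weights = (0.4, 0.3, 0.3)  # assumed relative importance of each feature

def combined_score(features, weights):
    """Linear combination of feature scores."""
    return sum(f * w for f, w in zip(features, weights))

ranking = sorted(candidates,
                 key=lambda c: combined_score(candidates[c], weights),
                 reverse=True)
print(ranking)  # → ['Dr. Chen', 'Dr. Patel', 'Dr. Kim']
```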

In [2], the authors intended to help scholars choose a suitable journal for their work. They built a system that asks users to input their paper's title, abstract, and domain tags. Based on this input, they proposed an information retrieval model to generate a list of high-similarity journals. This is basically a content-based approach using the BM25 algorithm. The study focuses on Elsevier journals, which might favor papers in the relevant disciplines. Given enough sample papers per journal, the recommendation performance is reasonable.
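As a rough illustration of the content-based core, here is a toy BM25 ranker. The journal names, "profile" texts, and query below are invented; a real journal finder would index each journal's full set of published abstracts rather than a few words.

```python
# Toy BM25 scoring (the retrieval formula behind a journal finder like [2]).
# All journal profiles and the query are invented for illustration.
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.5, b=0.75):
    """Score one document against a query using the BM25 formula."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        numer = tf[term] * (k1 + 1)
        denom = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * numer / denom
    return score

# Hypothetical "journal profiles" built from abstract text.
journals = {
    "Journal A": "recommender systems collaborative filtering users".split(),
    "Journal B": "network policy regulation broadband operators".split(),
}
doc_freq = Counter()
for terms in journals.values():
    doc_freq.update(set(terms))
avg_len = sum(len(t) for t in journals.values()) / len(journals)

query = "recommender systems for users".split()
ranked = sorted(journals,
                key=lambda j: bm25_score(query, journals[j], doc_freq,
                                         len(journals), avg_len),
                reverse=True)
print(ranked)  # → ['Journal A', 'Journal B']
```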

In [3], the paper helps users search for relevant research publications based on a reference list. The idea is to enhance search beyond keyword-based queries, which could be treated as an extension of information retrieval research. The proposed system asks the user to input a list of reference papers, builds a citation graph, and performs graph-based recommendation on it. The idea behind this system shows the potential of secondary (or deeper) search in different applications.
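A minimal graph-based sketch in this spirit: given a seed list of references, recommend papers that are frequently co-cited with the seeds. The tiny citation graph below is invented for illustration; TheAdvisor itself uses more sophisticated random-walk style algorithms on the citation graph.

```python
# Co-citation recommendation sketch (illustrative; not TheAdvisor's algorithm).
# citing paper -> list of papers it cites (hypothetical IDs)
from collections import Counter

citations = {
    "p1": ["seed1", "seed2", "candidate_a"],
    "p2": ["seed1", "candidate_a", "candidate_b"],
    "p3": ["seed2", "candidate_b"],
    "p4": ["candidate_c"],
}

def cocitation_recommend(seeds, citations, top_k=2):
    """Rank non-seed papers by how often they are cited alongside a seed."""
    scores = Counter()
    for refs in citations.values():
        hits = sum(1 for r in refs if r in seeds)
        if hits == 0:
            continue  # this citing paper shares nothing with the seed list
        for r in refs:
            if r not in seeds:
                scores[r] += hits
    return [paper for paper, _ in scores.most_common(top_k)]

print(cocitation_recommend({"seed1", "seed2"}, citations))
# → ['candidate_a', 'candidate_b']
```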

Thoughts

The idea of recommendation systems can be applied to several research questions. However, it is quite difficult to evaluate the effectiveness of recommendation results. There are three main directions to address this issue: 1) ground truth [1][2]; 2) user studies [3]; 3) domain experts (e.g., knowledge ontologies, expert review).

  1. The ground truth approach is widely used in many data mining studies: compare the proposed model against previously collected real-world, user-generated data. This is one way to claim the effectiveness of your model or system. However, there are two issues here. First, for some research topics it is quite hard to get the ground truth (or the data may not be reproducible). Second, this encourages researchers to "fit" the model to the existing dataset; the model might not fit newly arriving data or features.
  2. The user study approach is another widely used approach across domains. For example, psychologists studying human behavior hire participants for experiments in a closed, controlled environment. Alternatively, a deployed recommendation system can record user feedback to evaluate or improve its effectiveness. However, the cost of a user study is high, whether hiring participants or building a system with a large number of users. Moreover, a closed, controlled experiment might lack ecological validity, so the conclusions might have limited utility in the real world.
  3. The domain expert approach is used more in the social sciences: invite domain experts to verify the results, relying on the experts' reliability to establish the effectiveness of the research findings. Alternatively, some research focuses on building ontologies of domain knowledge, which is a way to transfer domain knowledge into a recommendation model. However, the cost of inviting domain experts is high, and the experts' comments might be inconsistent or contradictory. The same issue exists in domain knowledge building: constructing a complete and logically rigorous ontology is still a challenging research problem.
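The ground-truth direction above usually reduces to offline ranking metrics: compare the recommendation list against held-out real-world choices. A minimal sketch with invented lists:

```python
# Offline ground-truth evaluation sketch: precision@k and recall@k.
# The recommended list and the held-out relevant set are invented examples.

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are actually relevant."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / len(relevant)

recommended = ["a", "b", "c", "d", "e"]   # system output, best first
relevant = {"b", "d", "f"}                # held-out ground truth

print(precision_at_k(recommended, relevant, 5))  # 2 hits in top 5 -> 0.4
print(recall_at_k(recommended, relevant, 5))     # 2 of 3 relevant found
```

The "fitting to the existing dataset" worry from point 1 shows up exactly here: a model tuned to maximize these offline numbers may not generalize to new users or items.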

Reference:

  1. Han, Shuguang, Jiepu Jiang, Zhen Yue, and Daqing He. “Recommending Program Committee Candidates for Academic Conferences.” In Proceedings of the 2013 Workshop on Computational Scientometrics: Theory & Applications, 1–6. CompSci ’13. New York, NY, USA: ACM, 2013. doi:10.1145/2508497.2508498.
  2. Kang, Ning, Marius A. Doornenbal, and Robert J.A. Schijvenaars. “Elsevier Journal Finder: Recommending Journals for Your Paper.” In Proceedings of the 9th ACM Conference on Recommender Systems, 261–64. RecSys ’15. New York, NY, USA: ACM, 2015. doi:10.1145/2792838.2799663.
  3. Küçüktunç, Onur, Erik Saule, Kamer Kaya, and Ümit V. Çatalyürek. “TheAdvisor: A Webservice for Academic Recommendation.” In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 433–34. JCDL ’13. New York, NY, USA: ACM, 2013. doi:10.1145/2467696.2467752.

Friday, October 16, 2015

Reading Summary: "Content Driven User Profiling for Comment-Worthy Recommendations of News and Blog Articles"

Summary

The main research goal of this paper [1] is to recommend a "comment-worthy" article list for users. The authors argued that previous research [2] adopted a content-based collaborative filtering approach that lacks consideration of the relation between article segments and comments. They provided an example of a cell-phone introduction news article: the comments below it may connect to different segments of the article, or even to unrelated reader interests. Hence, it is inappropriate to consider an article and its comments as a single topic. They proposed a Collaborative Correspondence Topic Model (CCTM) that captures the probability distributions between article-comment and article-user, followed by Monte Carlo simulation (Gibbs sampling, a stochastic MCEM algorithm) to estimate the latent offsets and generate the prediction list. According to the paper, the experiments indicated that performance was better than three baseline models in both cold-start and warm-start settings.

Thoughts

The idea of considering the relations between article-comment and article-user is interesting. The authors take user profiling (based on user comments) into account to build a personalized comment-worthy recommendation list. This paper provides a way to link user feedback with user interests, which could be a basis for user profiling (modeling). The idea could be applied to people recommendation as well. However, I have a couple of comments on this paper: 1) the empirical results show a clear dataset difference: model performance on the Daily Mail is much better than on the other two, so there may be some characteristic of that dataset that makes the prediction task easier; 2) commenting behavior is diverse, and many comments are actually junk posts, so an exploration of the comment distribution in the dataset would help in understanding commenting behavior; 3) the proposed model adopts content-based and topic-model approaches, yet we only see a slight improvement between the newly proposed model and CoTM (a pure content-based approach).

Some potential research topics:
  • Inline commenting behavior: what is the user replying to? For any real-world event, there are tons of reply posts from public readers or their social network friends under articles, news, and social network posts. It would be interesting to analyze reply behavior in different scenarios. For example, when news of a new Apple product announcement is published, what are the comments mainly about? Or, when an emotional (happy, sad, exciting, etc.) tweet shows up on Twitter, what are the responses from the followers?
  • Content-driven user modeling: we might take published text or other user-generated content to build a user model. This model could be applied to recommendation tasks, behavior comparison, performance analysis, etc. It would be interesting to leverage current information for other targets, e.g., to solve the cold-start problem or to learn from other rich data sources.
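A hedged sketch of the content-driven user modeling idea: build a bag-of-words TF-IDF profile from a user's past comments and score new articles by cosine similarity. All texts below are invented, and this simple stand-in is not the CCTM model of [1].

```python
# TF-IDF user profile sketch (illustrative; not the CCTM model from [1]).
import math
from collections import Counter

def tfidf_vector(text, doc_freq, n_docs):
    """Weight each term by term frequency times inverse document frequency."""
    tf = Counter(text.lower().split())
    return {t: c * math.log(n_docs / (1 + doc_freq[t])) for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "new phone camera battery review",       # the user's past comments
    "phone screen battery life",
    "election results policy debate",        # candidate article 1
    "smartphone battery test and camera",    # candidate article 2
]
doc_freq = Counter()
for doc in corpus:
    doc_freq.update(set(doc.lower().split()))

user_profile = tfidf_vector(corpus[0] + " " + corpus[1], doc_freq, len(corpus))
sims = {article: cosine(user_profile, tfidf_vector(article, doc_freq, len(corpus)))
        for article in corpus[2:]}
for article, sim in sims.items():
    print(f"{sim:.3f}  {article}")
```

With these made-up texts, the gadget-related article scores higher than the politics article, which is the basic personalization effect a comment-driven profile is meant to capture.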

Reference

  1. Bansal, T., Das, M., & Bhattacharyya, C. (2015, September). Content Driven User Profiling for Comment-Worthy Recommendations of News and Blog Articles. In Proceedings of the 9th ACM Conference on Recommender Systems (pp. 195-202). ACM.
  2. Shmueli, E., Kagian, A., Koren, Y., & Lempel, R. (2012, April). Care to comment?: Recommendations for commenting on news stories. In Proceedings of the 21st International Conference on World Wide Web (pp. 429-438). ACM.

Monday, October 5, 2015

LinkedIn "Add Connections" function legal dispute: what should we do?

I got this email from LinkedIn a few days ago. According to some bloggers and news reporting, it is genuine. The email I got looks like:

    NOTICE OF PENDING CLASS ACTION AND NOTICE OF PROPOSED SETTLEMENT
    PERKINS V. LINKEDIN CORP.
    You are receiving this e-mail because you may have used LinkedIn's Add Connections feature between September 17, 2011 and October 31, 2014.
    A federal court authorized this Notice. This is not a solicitation from a lawyer.
    Why did I get this notice? This Notice relates to a proposed settlement ("Settlement") of a class action lawsuit ("Action") against LinkedIn Corporation ("LinkedIn") based on LinkedIn's alleged improper use of a service called "Add Connections" to grow its member base.
    What is the Action about? The Action challenges LinkedIn's use of a service called Add Connections to grow its member base. Add Connections allows LinkedIn members to import contacts from their external email accounts and email connection invitations to one or more of those contacts inviting them to connect on LinkedIn. If a connection invitation is not accepted within a certain period of time, up to two "reminder emails" are sent reminding the recipient that the connection invitation is pending. The Court found that members consented to importing their contacts and sending the connection invitation, but did not find that members consented to LinkedIn sending the two reminder emails. The Plaintiffs contend that LinkedIn members did not consent to the use of their names and likenesses in those reminder emails. LinkedIn denies these allegations and any and all wrongdoing or liability. No court or other entity has made a judgment or other determination of any liability.
    What relief does the Settlement provide? LinkedIn has revised disclosures, clarifying that up to two reminders are sent for each connection invitation so members can make fully-informed decisions before sending a connection invitation. In addition, by the end of 2015, LinkedIn will implement new functionality allowing members to stop reminders from being sent by canceling the connection invitation. LinkedIn has also agreed to pay $13 million into a fund that can be used, in part, to make payments to members of the Settlement Class who file approved claims. Attorneys representing the Settlement Class will petition the Court for payment of the following from the fund: (1) reasonable attorneys' fees, expenses, and costs up to a maximum of $3,250,000, and (2) service awards for the Plaintiffs up to a maximum of $1,500 each. The payment amount for members of the Settlement Class who file approved claims will be calculated on a pro rata basis, which means that it will depend on the total number of approved claims. If the number of approved claims results in a payment amount of less than $10, LinkedIn will pay an additional amount up to $750,000 into the fund. If the pro rata amount is so small that it cannot be distributed in a way that is economically feasible, payments will be made, instead, to Cy Pres Recipients selected by the Parties and approved by the Court. No one knows in advance whether or in what amount payments will be made to claimants.
This is an interesting case about consumer privacy protection. We all know LinkedIn's add-connections function is annoying: it sneaks into your contacts through your email account or social media accounts. This has always been a gray area between end users and the companies using this data. I sent out a claim to see what the next move will be. I don't think we will get a big check from this case, but it will be interesting to follow the upcoming events. If you have also received this email, you can either file a claim or just ignore it. Filing a claim is pretty simple: you just click the link at the bottom of the email and enter your "Claim ID" and some personal information. You will then get a check or money transfer within the next couple of months.

For data science researchers, this might be a case we need to be aware of. These kinds of legal disputes will continue as the industry tries to collect and utilize more and more data from different sources: the Web, SNS, cellular networks, and even the sensors around us. I believe some people will not be happy to see their data leaked in this way. Preventing or mitigating this might be another research subject. We will see.