Originality in research
We never think entirely alone: we think in company, in a vast collaboration; we work with the workers of the past and of the present. [In] the whole intellectual world … each one finds in those about him [or her] the initiation, help, verification, information, encouragement, that he [or she] needs.
— A. D. Sertillanges [11, p.145]
In a post called It’s academic, I used the phrase “original contribution to scholarship” to describe one of many important outcomes of PhD research. The phrase is usually treated as synonymous with publications in the peer-reviewed professional literature. Yes, having our research published in a research journal is an important outcome. It is one of many tangible outcomes of our candidature, besides a hard-bound thesis thick enough to be used as a paperweight and so chock-full of jargon that it frightens away everyone except the brave souls of our professional society. But what about intangible outcomes? In this post, I start exploring some common intangible outcomes of a PhD education, using the phrase “original contribution to scholarship” as a springboard for discussion. Think of this post as the first installment in a series of SOC posts: scholarship, originality, contribution, though not necessarily discussed in that order.
First, I deal with the issue of originality in this installment. A piece of work is original if no one has conceived of it or done it before. With a vast body of research literature out there, how are we to know that what we’re doing is original research? How do we know we’re not duplicating someone else’s published work? Short answer: it’s almost impossible to know for certain. There’s an apocryphal anecdote that John von Neumann came up with solutions to two problems, only to learn on both occasions that someone had already solved them. Another example is Gauss’ work. Gauss didn’t publish much of his mathematics research, and in many cases later works were found to be rediscoveries of his unpublished results. Or consider the calculus war [1], a priority dispute over whether Leibniz or Newton was the original inventor of calculus. One of our best guards against duplicating someone else’s work is to be aware of the general research trends in our chosen area and to have a clear statement of our research questions. These two topics will be discussed in turn.
To know what’s happening in our area, we should start off by reading publications that we think are directly relevant to our project. A good starting point is expository papers, Wikipedia articles, or other publications providing high-level introductions to certain topics. The objective is to get a layperson’s overview of our research area before delving into research papers. Survey papers come next. They generally provide technical overviews of particular research areas, with detailed analyses and discussions of salient theories, trends, topics, and techniques. A survey paper’s purpose is to be a guide to the vast body of research publications on a particular topic, often citing hundreds of relevant publications spread over dozens of journals and books. More often than not, a survey paper should be read as if it were a practitioner’s guide to recent research. Don’t expect every concept or technique to be clearly explained, or explained at all. A good strategy for reading survey papers is to have ready access to relevant textbooks and reference materials. In fact, we should find out whether there are any textbooks or reference books relevant to our research topic and read them as well. In short, reading survey papers, expository publications, and textbooks should equip us with a bird’s-eye view of previous work in our area.
Once we have a general (or vague) idea of what has been published in our research area, we should start writing a literature review. This is similar to writing a survey of previously published work in our area, with detailed discussions of relevant theories, trends, topics, and techniques. It is by writing a literature review that we acquaint ourselves with the relevant technical details, and with the successes and limitations of published techniques. Think of the literature review as a survey paper based on a sample of the publications most relevant to our research questions. As we write our literature review, we should also maintain a list of research questions together with high-level strategies for investigating them. There’s never enough time to write a thorough and comprehensive literature review. The best we can do is to add to our literature review incrementally as we progress through our research project. When the time comes to write our research proposal, we will have our literature review and a list of research ideas from which to draw inspiration and material.
Originality in conduct
Thus far, I have discussed how to start doing original research by first being aware of prior research. I now turn to originality in terms of how we conduct or carry out our project. Cryer [2, pp.193–196] identified a number of broad ways in which a project can be seen to be original. The author also identified originality in terms of a project’s outcomes, but I won’t discuss that point here. A project can be original in terms of: (1) tools, techniques, procedures, and methods; (2) exploring an unknown; (3) exploring something unanticipated; or (4) using data in a novel way. These will be discussed in turn.
First, our project could be original in terms of developing new tools, techniques, procedures, or methods, or applying these to an existing problem or to a context where they have not been tried before. Whether such investigations succeed or fail, we will more often than not discover cases where a new approach yields new or insightful results, and situations where the approach sheds no light at all. Often, it is not enough to develop a computer program implementing a new algorithm or technique; we need to show how that program could be used to shed new light on a problem.
Second, we could venture into an unexplored area and in the process discover a new field of research. Examples of such investigations into the unknown include Hardin’s “tragedy of the commons” [9], Diffie and Hellman’s development of public-key cryptography [3], Haar’s introduction of wavelets [8], and Euler’s introduction of graph theory [5]. It’s rare to develop a new field from scratch, and even if we do so it might take years, decades, or even longer before anyone realizes the new field’s full potential. We should also be aware that scientific research is a competitive enterprise, with the attendant occasional priority disputes over who did what before whom.
Third, while investigating a problem we might serendipitously come across an unexpected result, phenomenon, or research direction. Examples of fortuitous discoveries include Louis Pasteur’s discovery of the chicken cholera vaccine, Oskar Minkowski’s discovery that diabetes results from a pancreas disorder, and Alexander Fleming’s discovery of the enzyme lysozyme. In some cases, such as those above, it is worth investigating the unexpected, but in other cases we should be cautious about pursuing something that might lead to a dead end. Exploring the unanticipated can also be seen in research that lends a new perspective to an existing field, such as applying the notion of the tragedy of the commons to information and communications technology [7]. Much fruitful research falling under the rubric of exploring the unanticipated also consists in applying existing tools, techniques, or methods to problems that have already enjoyed much attention. An example is the application of graph theory to analyzing social, technological, biological, and other types of networks [4].
Fourth, an existing dataset could be explored in a novel way. Among other things, this includes interpreting the dataset differently from how it has been interpreted in the existing research literature, or applying new techniques to process the dataset. The task of collating a new dataset is itself an original research project, often involving a team of researchers. At every step of the project, we need to adhere to established data collection protocols. Once the dataset is finally assembled, there are protocols specifying how to cleanse the dataset, how to code the data, and so on. Examples of original research that had the collation of a new dataset as a primary goal include the human genome project, the Facebook dataset project [10], the C. elegans nervous system project [12], and the modENCODE Project [6].
[1] J. Bardi. The Calculus War: Newton, Leibniz and the Greatest Mathematical Clash of All Time. High Stakes Publishing, 2007.
[2] P. Cryer. The Research Student’s Guide to Success. Open University Press, 3rd edition, 2006.
[3] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.
[4] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010.
[5] L. Euler. Solutio problematis ad geometriam situs pertinentis. Comment Acad. Sci. Imperialis Petropolit, 8:128–140, 1736.
[6] M. B. Gerstein, Z. J. Lu, E. L. V. Nostrand, C. Cheng, B. I. Arshinoff, T. Liu, K. Y. Yip, R. Robilotto, A. Rechtsteiner, K. Ikegami, P. Alves, A. Chateigner, M. Perry, M. Morris, R. K. Auerbach, X. Feng, J. Leng, A. Vielle, W. Niu, K. Rhrissorrakrai, A. Agarwal, R. P. Alexander, G. Barber, C. M. Brdlik, J. Brennan, J. J. Brouillet, A. Carr, M.-S. Cheung, H. Clawson, S. Contrino, L. O. Dannenberg, A. F. Dernburg, A. Desai, L. Dick, A. C. Dose, J. Du, T. Egelhofer, S. Ercan, G. Euskirchen, B. Ewing, E. A. Feingold, R. Gassmann, P. J. Good, P. Green, F. Gullier, M. Gutwein, M. S. Guyer, L. Habegger, T. Han, J. G. Henikoff, S. R. Henz, A. Hinrichs, H. Holster, T. Hyman, A. L. Iniguez, J. Janette, M. Jensen, M. Kato, W. J. Kent, E. Kephart, V. Khivansara, E. Khurana, J. K. Kim, P. Kolasinska-Zwierz, E. C. Lai, I. Latorre, A. Leahey, S. Lewis, P. Lloyd, L. Lochovsky, R. F. Lowdon, Y. Lubling, R. Lyne, M. MacCoss, S. D. Mackowiak, M. Mangone, S. McKay, D. Mecenas, G. Merrihew, D. M. Miller III, A. Muroyama, J. I. Murray, S.-L. Ooi, H. Pham, T. Phippen, E. A. Preston, N. Rajewsky, G. Ratsch, H. Rosenbaum, J. Rozowsky, K. Rutherford, P. Ruzanov, M. Sarov, R. Sasidharan, A. Sboner, P. Scheid, E. Segal, H. Shin, C. Shou, F. J. Slack, C. Slightam, R. Smith, W. C. Spencer, E. O. Stinson, S. Taing, T. Takasaki, D. Vafeados, K. Voronina, G. Wang, N. L. Washington, C. M. Whittle, B. Wu, K.-K. Yan, G. Zeller, Z. Zha, M. Zhong, X. Zhou, modENCODE Consortium, J. Ahringer, S. Strome, K. C. Gunsalus, G. Micklem, X. S. Liu, V. Reinke, S. K. Kim, L. W. Hillier, S. Henikoff, F. Piano, M. Snyder, L. Stein, J. D. Lieb, and R. H. Waterston. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE Project. Science, 330(6012):1775–1787, 2010.
[7] G. M. Greco and L. Floridi. The tragedy of the digital commons. Ethics and Information Technology, 6(2):73–81, 2004.
[8] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 71(1):38–53, 1911.
[9] G. Hardin. The tragedy of the commons. Science, 162(3859):1243–1248, 1968.
[10] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis. Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4):330–342, 2008.
[11] A. D. Sertillanges. The Intellectual Life: Its Spirits, Conditions and Methods. Mercier Press, 1978. Translated by Mary Ryan.
[12] J. G. White, E. Southgate, J. N. Thompson, and S. Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of The Royal Society B, 314(1165):1–340, 1986.