Conceptual distance matters when building on others' ideas in crowd-collaborative innovation platforms

In crowd-collaborative innovation platforms, other contributors' ideas can serve as sources of inspiration for creative ideas, but which patterns of interaction with others' ideas are most helpful? We investigate the popular but inconsistently supported hypothesis that building on inspiration sources that are conceptually far from one's target domain is most helpful. We predict the success rate of 2,344 ideas for 12 different design challenges on a collaborative Web-based innovation platform from their cited sources' conceptual distance from the target domain (measured using probabilistic topic modeling of the ideas). Surprisingly, we find that innovators who cite conceptually near sources of inspiration achieve a higher success rate than those who prefer far sources. We discuss implications for the research and development of crowd-collaborative innovation platforms.


Introduction
Finding and building on sources of inspiration is part of any creative process [1] and often an important contributor to creative breakthroughs [2]. In crowd-collaborative innovation platforms (e.g., Quirky.com, OpenIDEO.com), where many people contribute ideas and collaborate with each other to solve a variety of creative problems, there is an opportunity to shape the platform so that previously contributed ideas serve as sources of inspiration for further ideation. To better understand how innovators interact with others' ideas, we need more empirical data on which kinds of ideas inspire novel, high-quality concepts. One hypothesis from the creativity literature is that the candidate sources with the highest potential for inspiring creative breakthroughs are those that are conceptually far from one's working domain [3], that is, ideas that are structurally similar but differ in many surface (or object) features (e.g., the analogy between the atom and the solar system).
The empirical evidence for this hypothesis is mixed: some studies have shown an advantage of far over near sources for creative outcomes [4][5], others have found no difference between far and near sources [6], and some have even found an advantage of near over far sources [7]. These inconsistent results could stem from the tendency to observe only short time slices of the creative process (e.g., ~30-60 minutes), whereas more time and iteration (e.g., over days or weeks) may be needed to benefit from far sources, given the cognitive challenges of mapping them [4]. Statistical power has also been an issue: most studies have had an N of 12 or fewer per treatment cell, which is insufficient to reliably detect even medium- to large-sized effects.
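To make the power argument concrete, the sketch below uses the normal approximation to a two-sided, two-sample comparison of means (alpha = .05) to estimate power with 12 participants per cell for Cohen's conventional medium (d = 0.5) and large (d = 0.8) effects. The two-group design and the approximation are illustrative assumptions, not the exact designs of the cited studies.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test for a
    standardized mean difference d (Cohen's d) with equal group sizes."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z-statistic under the alternative
    # Probability that the statistic falls in either rejection region.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def approx_n_per_group(d: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest per-group n that reaches the target power (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.8):
        print(f"d = {d}: power with n = 12 per cell ~ {approx_power(d, 12):.2f}, "
              f"n needed for 80% power ~ {approx_n_per_group(d)}")
```

Under this approximation, 12 participants per cell give power of roughly 0.23 for a medium effect and roughly 0.50 for a large effect, well below the conventional 0.80 target, which would require about 63 and 25 participants per cell respectively. Exact t-test calculations give slightly lower power and slightly larger required sample sizes.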

Overview
We investigate this intriguing but unevenly supported hypothesis and address prior methodological limitations by studying a large number of ideas (on the order of thousands) over a realistic time scale (days/weeks) on OpenIDEO (www.openideo.com), a large-scale Web-based crowdsourced innovation platform that addresses social innovation problems (e.g., managing e-waste, increasing accessibility in elections). Over ~10 weeks, contributors first post inspirations (e.g., descriptions of solutions to analogous problems, case studies of stakeholders), which help define the problem space and identify promising solution approaches, and then post concepts, i.e., specific solutions to the problem; a subset of concepts is shortlisted for further development (see Fig. 1). Concepts are typically ~150 words long: more detailed than one or two words, sentences, or sketches, but less detailed than a full-fledged design report (see Fig. 4). Contributors are encouraged to build on others' ideas: when posting a concept, they are prompted to cite the inspirations that served as its sources, and these citations are stored and displayed as metadata for the concept (see Fig. 2).

Sample and Data Preparation
We wrote a simple web crawler to download concepts and inspirations, each of which exists as an individual webpage. We then wrote a simple HTML parser to extract the full-text description of each concept and inspiration (for measuring conceptual distance) and, for each concept, 1) which inspirations it cited as sources and 2) whether it was shortlisted for development. 707 concepts cited at least one inspiration as a source (median = 10 cited inspirations per concept). These 707 concepts, together with the 2,826 inspirations they cited, formed the final sample for analysis.
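A minimal sketch of the parsing and filtering steps, using Python's standard html.parser module. OpenIDEO's actual page markup is not described here, so the CSS class names ("description", "cited-inspiration", "shortlisted") and the Concept record are hypothetical placeholders; a working scraper would need to match the platform's real HTML.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from statistics import median


@dataclass
class Concept:
    text: str = ""
    cited: list = field(default_factory=list)  # hrefs of cited inspirations
    shortlisted: bool = False


class ConceptPageParser(HTMLParser):
    """Extracts one concept page. The CSS class names checked here
    ("description", "cited-inspiration", "shortlisted") are HYPOTHETICAL."""

    def __init__(self):
        super().__init__()
        self.concept = Concept()
        self._depth = 0  # > 0 while inside the description <div>
        self._parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and (self._depth or "description" in classes):
            self._depth += 1  # also tracks nested <div>s inside the description
        if tag == "a" and "cited-inspiration" in classes:
            self.concept.cited.append(attrs.get("href") or "")
        if "shortlisted" in classes:
            self.concept.shortlisted = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def close(self):
        super().close()
        # Collapse runs of whitespace into single spaces.
        self.concept.text = " ".join(" ".join(self._parts).split())


def parse_concept(html):
    parser = ConceptPageParser()
    parser.feed(html)
    parser.close()
    return parser.concept


def build_sample(concepts):
    """Keep concepts citing >= 1 inspiration; return (kept, cited hrefs, median citations)."""
    kept = [c for c in concepts if c.cited]
    cited = {href for c in kept for href in c.cited}
    med = median(len(c.cited) for c in kept) if kept else None
    return kept, cited, med
```

Running build_sample over every parsed concept is one way to carry out the filtering described above: it drops concepts with no cited inspirations and collects the set of inspirations the remaining concepts cite.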

Measuring Conceptual Distance
We used Latent Dirichlet Allocation (LDA) [8], a form of probabilistic topic modeling, to learn a high-dimensional topic space from the full-text descriptions of the challenge briefs, concepts, and inspirations. Although most concepts and inspirations included images or video, these media complemented rather than replaced the text descriptions. To reduce noise, stopwords (e.g., "the", "which") were removed from the text. We inferred 750 topics from the entire collection of 6,913 documents. We then computed the cosine similarity between each inspiration and its challenge brief in this topic space and subtracted it from 1, so that a higher score indicates greater conceptual distance. For a subset of the data (199 document pairs), this measure correlated well with the similarity ratings of 5 human judges (r = .485), matching the highest agreement among the judges themselves (intra-class correlation coefficient = .735). Each concept's distance score was the mean distance of its cited inspirations from the challenge brief.
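The distance computation can be sketched as follows, assuming each document has already been projected into the topic space as a vector of per-topic proportions (for example, the document-topic distribution from a fitted LDA model). The three-topic vectors in the example are toy values; the model described here used 750 topics. Because topic proportions are non-negative, the cosine similarity lies in [0, 1], and so does the distance.

```python
from math import sqrt
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def topic_distance(doc_topics, brief_topics):
    """1 - cosine similarity, so higher values mean conceptually farther."""
    return 1.0 - cosine_similarity(doc_topics, brief_topics)


def concept_distance(cited_topic_vectors, brief_topics):
    """A concept's score: mean distance of its cited inspirations from the brief."""
    return mean(topic_distance(t, brief_topics) for t in cited_topic_vectors)


if __name__ == "__main__":
    # Toy 3-topic proportions (illustrative values only).
    brief = [0.7, 0.2, 0.1]
    near = [0.6, 0.3, 0.1]
    far = [0.0, 0.1, 0.9]
    print(f"{concept_distance([near, far], brief):.2f}")  # prints 0.43
```

Here the near inspiration scores about 0.02 and the far one about 0.83, so a concept citing both receives their mean.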

Results
We conducted a logistic regression analysis with each concept's mean inspiration distance as the predictor and shortlist status as the binary outcome. The overall model was statistically significant, χ²(1) = 8.42, p < .01, and fit the data adequately, Hosmer-Lemeshow χ²(8) = 11.32, p = .18 (for this test, a non-significant result indicates that predicted and observed outcomes do not differ reliably, i.e., adequate fit).
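For readers who want to reproduce this kind of analysis, here is a minimal single-predictor logistic regression fitted by Newton-Raphson, plus a Hosmer-Lemeshow statistic computed over ten equal-sized groups of sorted predicted probabilities (giving 10 - 2 = 8 degrees of freedom, as in the test reported above). This is a from-scratch sketch, not the statistical package used in the study; the data are synthetic, and statistical packages may form the risk groups slightly differently (for example, when predicted probabilities are tied).

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ delta = [g0, g1].
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def hosmer_lemeshow(x, y, b0, b1, groups=10):
    """Hosmer-Lemeshow statistic; compare with chi-square on (groups - 2) df."""
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    order = sorted(range(len(p)), key=lambda i: p[i])  # ascending predicted risk
    n_total = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n_total // groups:(g + 1) * n_total // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data only: predictor in [0, 1], true log-odds slope of -1.5.
    x = [rng.random() for _ in range(2000)]
    y = [int(rng.random() < sigmoid(-0.5 - 1.5 * xi)) for xi in x]
    b0, b1 = fit_logistic(x, y)
    print(f"slope = {b1:.2f}, odds ratio = {math.exp(b1):.2f}")
    print(f"Hosmer-Lemeshow chi2(8) = {hosmer_lemeshow(x, y, b0, b1):.2f}")
```

fit_logistic returns the intercept and the log-odds slope; exponentiating the slope gives an odds ratio. For example, a log-odds slope of -0.31 corresponds to an odds ratio of exp(-0.31) ≈ 0.73, i.e., about 27% lower odds of the outcome per one-unit increase in the predictor.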
The model estimated that a 1-point increase in a concept's mean distance score predicted a decrease in its log-odds of being shortlisted, β = -.31, Wald(1) = 8.96, p < .01. Descriptive analysis of the data (see Fig. 5) suggested that this negative effect of distance was most pronounced in the change from low (mostly near sources) to mid-low (slightly more near than far sources) mean distance. Replications of the analyses with different topic model

Discussion
These surprising results suggest that, contrary to claims in the literature, preferring mostly far sources of inspiration is not the most helpful strategy. Rather, citing more near than far sources of inspiration appears to be most helpful.
Perhaps this relative mix frees up cognitive resources for developing creative concepts through deeper within-category exploration [9] and/or iteration and refinement [10]. Our results might also differ from prior work because the expert panel's evaluations emphasize both the quality and the novelty of ideas, not novelty per se.
If both novelty and quality are desired, it may be useful to design ways for contributors in crowd-collaborative innovation platforms to interact with prior ideas that are conceptually relatively close to the current problem. Machine learning methods, such as our conceptual distance measure, could provide useful foundations for such interfaces. Further research might explore whether these interfaces encourage deeper, more iterative solution exploration, as well as potential interactions between conceptual distance and tradeoffs between novelty and quality.
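As one illustration of such an interface building block, the sketch below ranks candidate prior ideas by their topic-space distance from the current challenge brief and surfaces the nearest ones first. The function name, the ID-to-vector mapping, and the default of five suggestions are hypothetical, and ranking purely by nearness is a simplifying assumption: the results above favor a mix weighted toward near sources rather than near sources exclusively.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def suggest_near_sources(brief_topics, candidates, k=5):
    """Return up to k candidate IDs, ordered from nearest to farthest from the brief.

    candidates: dict mapping a (hypothetical) idea ID to its topic vector.
    """
    return sorted(candidates, key=lambda cid: cosine_distance(candidates[cid], brief_topics))[:k]


if __name__ == "__main__":
    brief = [0.8, 0.1, 0.1]
    pool = {"a": [0.1, 0.1, 0.8], "b": [0.7, 0.2, 0.1], "c": [0.4, 0.4, 0.2]}
    print(suggest_near_sources(brief, pool, k=2))  # prints ['b', 'c']
```

A production interface would likely precompute topic vectors for all prior ideas and could interleave a few farther sources into the ranked list.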