Analytics from Development to Production

Since the emergence of big data analytics, some experts have argued for the need to keep data scientists separate from traditional business intelligence (BI) departments, on the grounds that traditional BI approaches and mindsets may compromise data scientists’ creativity and out-of-the-box thinking. I disagree. Rather like the farmer and the cowman (younger readers must check out the soundtrack of “Oklahoma!”), I believe that the BI and analytics teams can be friends. In fact, they must be. The BI farmers and the analytic cowmen can learn much from one another and, indeed, their collaboration is vital to the business in a range of areas, from data governance to production-level analytics.

I’ll focus on the latter topic in this post. The development-to-production conundrum only arose for BI with the emergence of operational BI. The term business intelligence was first applied by Howard Dresner in the early part of the 1990s to the process of decision-making support. Such decision making was typically tactical in nature: problem solving and reporting. The work was often exploratory and ad hoc at first and seldom demanded urgent answers (in less than a few days). In BI, problem solving and exploration can be seen as development; reporting as production. In most organizations, development and production in a particular area fell to the same person. Other than the obvious problems caused if that person fell under a tram, the development-to-production process was straightforward.

By the mid- to late ’90s, operational BI was changing this. Decision timeframes shrank to what was called near real-time (less than a few minutes) and the developers were finally separated from production. Frontline business people became the users of systems developed by the BI team, so that team had to figure out how to operationalize the process of getting from exploration and development to production. It took some time. From education to service level agreements (SLAs), and from help desks to maintenance schedules, the BI teams had to relearn and repurpose the dev-to-production process first devised by their colleagues in the operational systems teams. In most cases today, the process is well understood and well managed. BI teams have evolved from pre-agricultural gatherers to farmers in a modern industry.

Enter data scientists in the past ten years, who are retreading the path of development and deployment of analytical applications. Originally, many of these people came from research and exploratory backgrounds. Like their BI colleagues of the ’90s, they were used to developing and running their algorithms themselves. Often, they were given their own standalone environment—Hadoop—where they had free rein, rather like independent cowmen, ranging wide and free, herding data throughout the rolling business plains. And like their BI compatriots, they also came to face the challenge of moving to production, and perhaps even earlier in their evolution. In a data-driven world, most runtime analytics must operate in sub-second timeframes with large, high-speed data sources as part of ongoing operational processes like web commerce purchasing, supply chain optimization, and production monitoring.

Deployment of predictive models to multiple and varied production environments is a key component of many implementations. A recent article by Jonathan Morra of ZEFR (a video content management and solutions company) describes this important aspect, discussing “how to implement your ML algorithms in such a way that they can be tested, improved, and updated without causing problems downstream or requiring changes upstream in the data pipeline.” An open source project, Aloha, has been developed to support the creation of generic models, enabling differing semantic implementations in the production environment. In similar fashion, Statistica Enterprise offers deployment to PMML, Java, C#, SQL, and SAS, where data and analysis configurations (and others such as models, rules, etc.) are abstracted as separate objects which are version controlled, approved, audited, etc. through the platform. This approach enables validation for best practice applications in highly regulated industries such as pharma and medical device manufacturing and others where the ethical (provably non-discriminatory) use of data is demanded.
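
For readers who think in code, here is a minimal sketch of the general idea: models sit behind a stable scoring interface, every version is kept, and only an approved version can be promoted to production. It is an illustration under my own assumptions, not the API of Aloha, Statistica, or any other product; the names ModelRegistry, ModelVersion, register, approve, promote, and score are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping

# All names below are hypothetical and for illustration only.
Features = Mapping[str, float]
Scorer = Callable[[Features], float]


@dataclass
class ModelVersion:
    version: int
    scorer: Scorer
    approved: bool = False  # flipped by a reviewer before production use


@dataclass
class ModelRegistry:
    """Keeps every version of each named model; serves only the promoted one."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    _live: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, scorer: Scorer) -> int:
        # New versions are appended, never overwritten, so history is auditable.
        versions = self._versions.setdefault(name, [])
        versions.append(ModelVersion(version=len(versions) + 1, scorer=scorer))
        return len(versions)

    def approve(self, name: str, version: int) -> None:
        self._get(name, version).approved = True

    def promote(self, name: str, version: int) -> None:
        # Only an approved version may become the live one.
        if not self._get(name, version).approved:
            raise PermissionError(f"{name} v{version} has not been approved")
        self._live[name] = version

    def score(self, name: str, features: Features) -> float:
        # Callers pass a model name and features; they never see the version.
        return self._get(name, self._live[name]).scorer(features)

    def _get(self, name: str, version: int) -> ModelVersion:
        return self._versions[name][version - 1]
```

The design choice that matters is that the production caller only ever invokes score with a model name and a feature mapping. A data scientist can register and test a new version alongside the live one, and the switch happens in a single promote step once someone has approved it, with nothing upstream or downstream needing to change.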

Increasingly, the challenge in analytics lies less in model development than in deployment and maintenance in production. While analytics poses a broader set of production challenges than traditional, operational BI, cowmen data scientists can still benefit immensely from learning the ropes from their farmer friends in BI.