
Why Knowledge Bases Are The Next Big Thing


There is an intriguing fallacy (what I call the disconnected data fallacy) that seems pervasive in enterprise circles. It goes something like this: Most organizations are filled to bursting with databases, most of them dedicated to supporting one particular application or another. There’s also data in documents, spreadsheets, and other data stores, and it’s just a matter of laziness that most of this data is not available to other parts of the organization. Buy the right tool (or hire enough open source developers to build the right tool) and you achieve digital nirvana.

There are many problems with this belief.

Given these issues, you’d think that most companies would recognize that this approach doesn’t work, but despite that, billions (if not trillions) of dollars are spent every year in doing the same damn thing, over and over again.


So what’s the best solution? Just give up and assume that digital transformations are not possible? Not really. There are ways that you can transform an organization to work around a consolidated enterprise data model, but it requires recognition of several key caveats:

These principles differ fairly dramatically from the requirements that are typically placed upon dedicated databases. Put another way, enterprise data is not application data. It fulfills a different need, has a much greater requirement for metadata, and should be handled in a different manner.


WHAT IS A KNOWLEDGE BASE?

A knowledge base can be thought of as a data encyclopedia that’s specific to an organization, subject domain or location. For a retail outlet, a knowledge base might contain the catalog for that outlet, but it might also include the sales staff, known customers, store information, and even marketing campaigns. A sports franchise may have a knowledge base focused on players, teams, coaches, games, seasons and so forth. An art museum would have exhibits, locales, works of art, artists, collectors, etc.

In all of these cases there are underlying categories of things and relationships between these things. Each thing in the knowledge base has a globally unique identifier, an array of attributes and typically external relationships that point from one type of thing to the next. In the sports franchise example, such a relationship may be between a player and a team, though temporal knowledge bases will more likely indicate that there is a contract that binds a particular player to a position with a team for a certain period of time.

Indeed, one of the most powerful aspects of such knowledge bases is that both types of relationships may very well exist in the same database, where a rule can add a property that says a player is on a given team at a specific time if the contract for that player covers the time in question. It can similarly remove this relationship if this is not true. In a relational database, this can only be accomplished by setting up a specific table with a property that either indicates that player A is on team B or is null; in a semantic knowledge graph, the property is simply not present if the relationship is false.
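To make the contract rule concrete, here is a minimal sketch in plain Python. It is not how any particular triple store implements rules; the identifiers (`player:A`, `team:B`, `playsFor`) and the tuple layout are assumptions made up for illustration. The point is only that the derived relationship exists when a contract covers the date and is simply absent otherwise.

```python
from datetime import date

# Hypothetical contract facts: (player, team, start date, end date).
contracts = [
    ("player:A", "team:B", date(2019, 1, 1), date(2020, 12, 31)),
    ("player:A", "team:C", date(2021, 1, 1), date(2022, 12, 31)),
]

def derive_plays_for(contracts, on_date):
    """Return the (player, 'playsFor', team) triples that hold on on_date.

    A triple is produced only when a contract covers the date; when no
    contract applies, nothing is stored at all (no null placeholder).
    """
    return {
        (player, "playsFor", team)
        for player, team, start, end in contracts
        if start <= on_date <= end
    }

triples_2020 = derive_plays_for(contracts, date(2020, 6, 1))
triples_2023 = derive_plays_for(contracts, date(2023, 6, 1))
```

Here `triples_2020` holds a single statement linking the player to `team:B`, while `triples_2023` is an empty set because no contract covers that date.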

This seemingly simple characteristic dramatically changes how information can be stored, searched and transformed. A property can have more than one value without requiring the creation of an entirely new table. Properties can be annotated, both to provide more comprehensive definitions and in some cases to perform additional logic if the property contains certain values. Information can also be segregated into different collections (confusingly also called graphs), then merged or deleted once the data has been processed.
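A rough sketch of two of these ideas, multi-valued properties and separate collections of statements, can be written with nothing more than a set of four-part tuples. The graph names, the `nationality` property and the helper functions below are hypothetical, chosen only for illustration, and a real graph store would do far more.

```python
# Each statement is a quad: (graph, subject, predicate, object).
store = set()

def add(graph, subject, predicate, obj):
    """Record one statement inside a named collection (graph)."""
    store.add((graph, subject, predicate, obj))

def values(subject, predicate):
    """All values of a property on a subject, across every graph."""
    return sorted(o for _, s, p, o in store if s == subject and p == predicate)

def drop_graph(graph):
    """Delete every statement that belongs to one named graph."""
    for quad in [q for q in store if q[0] == graph]:
        store.discard(quad)

# The same property simply appears more than once; no extra table is needed.
add("graph:catalog", "artist:1", "nationality", "French")
add("graph:catalog", "artist:1", "nationality", "Spanish")
add("graph:staging", "artist:1", "nationality", "Unverified")

before = values("artist:1", "nationality")
drop_graph("graph:staging")  # discard the staging data once it is processed
after = values("artist:1", "nationality")
```

Before the staging graph is dropped, the property has three values; afterwards only the two catalog values remain.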

Finally, it is possible with specific graph query languages to move across relationships between objects without necessarily knowing what those relationships are, even to the extent of identifying connections between two objects across an indeterminate number of hops. This kind of analysis is very useful for discovering previously unknown connections, and is very difficult to do using relational databases.
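As an illustration of that kind of traversal, the sketch below runs a breadth-first search over a handful of made-up triples, following any relationship in either direction until it reaches the target. This is a stand-in written in plain Python, not a graph query language; dedicated languages offer the same idea natively (SPARQL 1.1 property paths, for instance). All identifiers in the sample data are hypothetical.

```python
from collections import deque

# Hypothetical facts as (subject, predicate, object) triples.
triples = [
    ("player:A", "signedWith", "team:B"),
    ("team:B", "coachedBy", "coach:C"),
    ("coach:C", "formerlyPlayedFor", "team:D"),
    ("player:E", "signedWith", "team:F"),
]

def find_path(triples, start, goal):
    """Breadth-first search that ignores what each relationship means.

    Returns the shortest chain of (node, predicate, next_node) hops from
    start to goal, or None when the two are not connected.
    """
    neighbors = {}
    for s, p, o in triples:
        neighbors.setdefault(s, []).append((p, o))
        neighbors.setdefault(o, []).append((p, s))  # allow walking backwards
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None

connection = find_path(triples, "player:A", "team:D")
no_connection = find_path(triples, "player:A", "player:E")
```

The caller never names a relationship: `connection` comes back as a three-hop chain through the team and the coach, and `no_connection` is `None` because the two players share no link in this data.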

Without digging too much further into the weeds, knowledge graphs in general are more flexible than relational databases, are able to store, manipulate and delete metadata about data far more efficiently, and are able to work with data both in tabular form and as rich “documents”.

A knowledge base is then built on top of knowledge graphs – you can think of it as an application to get at information without needing to know much if anything about the structure of that information.

DYNAMIC INTERFACES WITH KNOWLEDGE GRAPHS

In any organization, there are two competing demands on data systems that can loosely be described as centralization vs. distribution. Centralization involves keeping a tight rein on data structures that are used to describe organizational entities such as products, people, events, locations, organizations and so forth. There is also typically a wealth of information about a resource that doesn’t necessarily fit easily into the numeric view of data that tends to pervade organizations – descriptive content, provenance information, relationships with categories or other resources and similar types of content.

There are two key approaches that can be taken when designing relational data systems. The first approach is to do baseline data modeling, in effect hard coding the relationships that exist directly into the database. This describes the approach taken by about 90% of all developers. It tends to make for relatively fast data systems, but once designed, changing this model becomes much more complicated, especially when a large amount of data has been introduced into the system. This is roughly analogous to creating a hardware chip that encodes business logic. Change is expensive in such systems, and it becomes harder the longer that the data structures remain undocumented (something that occurs most of the time).

Nearly twenty years ago, Drupal came out, and with it a fairly radical idea was introduced. Every document within Drupal could be treated as a distinct node in a graph structure. You could decorate that node by giving it a type, assigning it properties, and creating presentation views, but the fact that the node existed and the node identifier was uniquely specifiable meant that a Drupal designer could turn nodes into anything. One problem with this, however, was that at the end of the day the database underneath it was still a relational database, and the overhead of building classes by indirection typically began eating up a significant percentage of the overall computing cycles.

Graph data stores are built precisely with this scenario in mind. The data model in this case is “soft” – it is effectively constructed when it is queried, and this information can then be sent to the client application to tell the client what to do with the information. What this implies is actually quite powerful. The overwhelming majority of all software applications require an army of programmers to build user interfaces, and should the data model change, this change also necessitates UI changes, significant ones.

In a knowledge base, however, the model describes the interface. When you change the model, the interfaces should automatically change as a consequence. As there may be potentially hundreds or even thousands of classes involved in an enterprise, this becomes a big factor, given that the typical “screen” for editing a class may cost anywhere from one to three thousand dollars to create or change. This is in fact one of the big reasons that software becomes obsolete: changing the model may be fairly trivial, but the knock-on effects of changing the viewers and editors of that model add up quickly and painfully.
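A toy version of “the model describes the interface” might look like the following. The model format, the property names and the widget names are all assumptions invented for this sketch; real systems would read this information from the graph itself. What matters is that the form is computed from the model, so adding a property changes the form with no UI code touched.

```python
# Hypothetical model: class name -> list of property descriptions.
model = {
    "Player": [
        {"name": "label", "datatype": "string"},
        {"name": "birthDate", "datatype": "date"},
        {"name": "team", "range": "Team"},  # points at another class
    ],
}

def build_form(model, class_name):
    """Derive (field name, widget) pairs for a class from the model alone."""
    fields = []
    for prop in model[class_name]:
        if "range" in prop:
            widget = "dropdown"  # values come from instances of another class
        elif prop["datatype"] == "date":
            widget = "datepicker"
        else:
            widget = "textbox"
        fields.append((prop["name"], widget))
    return fields

form_before = build_form(model, "Player")

# Changing the model is the only step; the form follows automatically.
model["Player"].append({"name": "position", "range": "Position"})
form_after = build_form(model, "Player")
```

After the model change, `form_after` contains a fourth field, a drop-down for `position`, even though `build_form` itself was never edited.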

This is similarly true of things like pick-lists, drop downs and multi-item selectors. The model (in a knowledge graph) indicates what class a given property is expecting to draw values from for a drop-down. It can also indicate constraints that make cascading possible – selecting the make, model and trim of a car, for instance, where the selection of one limits the values that appear in the next. This process, called faceting, is a pain to write using traditional UI generation tools, but trivial to build with knowledge bases.
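The cascading behavior can be sketched in a few lines once the data is available as records. The sample cars and the `facet_options` helper below are illustrative assumptions, not part of any particular product: each pick-list is just the distinct values that remain after applying the selections already made.

```python
# Illustrative sample records; in a knowledge base these would be resources.
cars = [
    {"make": "Honda", "model": "Civic", "trim": "LX"},
    {"make": "Honda", "model": "Civic", "trim": "EX"},
    {"make": "Honda", "model": "Accord", "trim": "Sport"},
    {"make": "Ford", "model": "Focus", "trim": "SE"},
]

def facet_options(records, facet, selected):
    """Distinct values of one facet among records matching earlier selections."""
    return sorted({
        record[facet]
        for record in records
        if all(record[key] == value for key, value in selected.items())
    })

makes = facet_options(cars, "make", {})
models = facet_options(cars, "model", {"make": "Honda"})
trims = facet_options(cars, "trim", {"make": "Honda", "model": "Civic"})
```

Choosing a make narrows the models on offer, and choosing a model narrows the trims, which is the cascade described above.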

Semantic systems can also weight text searches more accurately, for instance by identifying that a label, an abstract description and a full description can each be used for searches, but that you are more likely to get relevant results from labels than from the abstract or even the full content. Since property values can also be text-search indexed, this means that a semantic knowledge base will retrieve content that is more likely what is being searched for, and will do so in roughly the same amount of time that more traditional search engines would.
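One simple way to picture property-weighted search is shown below. The weights, the property names and the two sample resources are assumptions picked for illustration, and the scoring is a deliberately naive substring match rather than what a real text index does. A hit in a label counts for more than a hit in an abstract, which in turn counts for more than a hit in the full description.

```python
# Hypothetical weights: matches in a label matter most.
WEIGHTS = {"label": 3.0, "abstract": 2.0, "description": 1.0}

# Hypothetical resources with three text-bearing properties each.
resources = {
    "art:1": {
        "label": "Water Lilies",
        "abstract": "A series of paintings",
        "description": "Oil paintings of a garden pond.",
    },
    "art:2": {
        "label": "Garden Study",
        "abstract": "A sketch of a pond with water lilies",
        "description": "Pencil on paper.",
    },
}

def search(resources, term):
    """Rank resource ids by the summed weight of the properties that match."""
    term = term.lower()
    scores = {}
    for resource_id, props in resources.items():
        score = sum(
            weight
            for prop, weight in WEIGHTS.items()
            if term in props.get(prop, "").lower()
        )
        if score:
            scores[resource_id] = score
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))

ranked = search(resources, "water lilies")
```

Both resources mention the phrase, but the one that carries it in its label is ranked ahead of the one that only mentions it in its abstract.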

Finally, while there are some variations out there with regard to graphs, the graph database industry has mostly standardized on a very well defined stack of technologies. This means that knowledge bases can in general work with multiple triple stores and related stacks with minimal rewriting.

THE BUSINESS BENEFITS OF KNOWLEDGE BASES

Given all that, what benefits do knowledge bases have for businesses? There are several, as it turns out:

Knowledge bases are also ideal tools for integrating with data science pipelines. The flexibility of data production and the ability to map between ontologies (data languages) dynamically mean that many of the big headaches involved in data analytics can be handled automatically: de-duplication, cleansing, validation, dimensionality, ensuring consistent meaning in properties and resources, and so forth. This is the 90% of the work that most data scientists have to do just to get data into a form that’s useful for analysis.

MarketWatch has estimated that the semantic knowledge base industry will be worth $33 billion by 2023, with year over year growth of 10% through the rest of the decade.

USES FOR KNOWLEDGE BASES

The previous section focused on benefits, but some real world examples can really help to elucidate where such knowledge bases may prove useful. The following are examples from real world applications, though the names of companies and specific details have been obscured.

Despite all this, most companies are just at the very edge of what they could be doing with the technology. One thing that seems to be very compelling with this technology, however, is that even when projects didn’t necessarily meet broad lofty goals, they still provided demonstrable value, something that’s not necessarily been true in other data management sectors.

SUMMARY

Knowledge bases are not panaceas. They are generally good for providing a foundation for managing enterprise level data, because enterprise data has a higher expectation of curation and quality than most application-oriented data projects. They won’t replace other databases in your organization (okay, they might replace quite a few, but not all) but they should end up acting much like the tubas, cellos, tympani and bassoons of an orchestra – they set up the deep knowledge that companies need, then let their developers build off that deep knowledge to reduce the overall complexity of applications.

originally posted on forbes.com by Kurt Cagle
