Concerning the Design Catalog …

Efficiency in programming languages is achieved by avoiding bad practices like excessive looping, too many I/O or network operations, and needless data movement. Modern processors have raised the bar on what counts as unacceptable CPU-bound inefficiency, but of course you can still loop yourself into a hole. Please don’t take that as a challenge.

With languages that have the power to execute very extensive (and therefore expensive) operations to achieve their ends, the same bottlenecks exist. I/O (even with SSD) and network access can still dominate cost and time, and those two resources are related but not the same. Query and other high-level languages like Pig Latin, Hive, SQL, and the MongoDB® query language all work well by optimizing (minimizing and cutting the cost of) the underlying data and network access required to satisfy the requests they process.

Classic query optimization

Therefore it becomes very important to plan how to do the work. The first order of business is to make sure the request makes sense: that the participating objects exist and are configured to perform what they have been asked to do. To use an example most people are somewhat familiar with, consider the SQL statement:

SELECT order_number, order_origin FROM orders WHERE part_count > 250 AND back_order = 1

The orders table must be checked to see that it exists and that the columns order_number, order_origin, part_count and back_order all exist.

You may know that relational databases all have system catalogs. These are sets of tables that can be queried like any other. The actual design of those tables varies by database, but most have something like a TABLES table, where every table ever CREATEd has a row or rows. Now, it is cost-prohibitive for those engines to perform queries to compile queries, so they use a memory-resident, highly optimized copy of the TABLES table (and all other catalog tables) to do that work.
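To make the catalog lookup concrete, here is a minimal sketch using SQLite purely as a stand-in for any relational engine (Domino does not work this way; the table and column names come from the example query above). SQLite’s sqlite_master table is exactly such a system catalog:

```python
import sqlite3

# In-memory SQLite database as a stand-in for any relational engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_number INTEGER, order_origin TEXT, "
            "part_count INTEGER, back_order INTEGER)")

# sqlite_master is SQLite's system catalog: every CREATEd object has a row.
row = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()
assert row is not None, "orders table missing; the query cannot compile"

# PRAGMA table_info plays the role of a COLUMNS catalog table.
columns = {r[1] for r in con.execute("PRAGMA table_info(orders)")}
required = {"order_number", "order_origin", "part_count", "back_order"}
print("missing columns:", required - columns or "none")
```

Production engines keep this data memory-resident precisely so that every query does not pay for catalog queries like these.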

Concerning Domino®, the internal knowledge about design elements resides in intricate and related field values on design documents. Since Domino faces the same problem of runtime access just to validate queries, we have to create different, optimized instances of the design data. We call it the “Design Catalog”.

The second order of business in planning a query/request is to find any helpful optimizing strategies to solve the problem at hand. These are combinations of data structures like indexes and fast-path execution means like pre-seeded query terms or classic approaches like nested loop or sort-merge joins.

But something that seems simple yet is remarkably complex, namely how to order the work, is the first decision to be made. In general, equality terms are cheaper than range ones. Index-satisfiable terms are cheaper than those requiring direct data access. And for sharded and distributed databases, getting results for single terms on single nodes is the first order of work for map-reduce processing.

In relational engine system catalogs there is virtually always an INDEX table to be consulted for this part of the problem. And to finish the calculations needed for optimization, system catalogs contain COLUMN tables with gathered and sampled counts of distinct values (aka cardinality) and other statistical data.
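A toy cost model shows how those pieces fit together. All the weights and the 1/3 range-selectivity guess below are illustrative assumptions in the spirit of classic optimizers, not Domino’s (or anyone’s) actual numbers:

```python
# Toy term-ordering cost model: equality beats range, index-satisfiable
# beats direct data access, and catalog cardinality estimates selectivity.
def term_cost(term, cardinality):
    op_weight = 1 if term["op"] == "=" else 4      # equality is cheaper
    access_weight = 1 if term["indexed"] else 10   # index beats data scan
    # Fraction of rows expected to match: 1/distinct-values for equality,
    # the classic 1/3 guess for an open range.
    selectivity = 1 / cardinality[term["column"]] if term["op"] == "=" else 1 / 3
    return op_weight * access_weight * selectivity

cardinality = {"back_order": 2, "part_count": 5000}
terms = [
    {"column": "part_count", "op": ">", "indexed": True},
    {"column": "back_order", "op": "=", "indexed": True},
]
plan = sorted(terms, key=lambda t: term_cost(t, cardinality))
print([t["column"] for t in plan])   # cheapest first: ['back_order', 'part_count']
```

Real optimizers juggle far more (join order, clustering, I/O estimates), but the sort-by-estimated-cost skeleton is the same.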

What about Domino’s indexes and DQL optimization?

Across its history, Domino’s indexes have been foundational to its market value. The Notes Indexing Facility (NIF) is a many-splendored thing with its trees of trees and optimized ordinal retrieval capabilities (“get me document 129093 in order by a given key” requires index walking in most engines). Domino’s indexes also house persistently-indexed computed values. Though there may be other engines with something akin to this power, there are certainly none more robust. And available today.

So the Design Catalog needed to have quickly-available descriptions of available indexes in a database, meaning that design data needed to be extracted from its normal residence and itself indexed for quick lookup and use in optimization. However, this is complicated business.

For one thing, Domino’s industry-best security model allows for privileges to be applied to design elements. Not all views (or their indexes) are available to all users. For V10.0.0 of Domino we have had to punt on that, and remove all views or folders with Readers fields on their design documents from consideration in DQL.

Secondly, views have implicit document restrictions. Given the Pending view’s selection criteria:

select form = "order" & order_state = "pending"

any use of those indexes would apply those selection criteria on top of the criteria in the “free form” query term (vs specifying the view to be used like below). So

order_origin = ‘Los Angeles’

using the Pending view would actually mean the following three terms:

form = ‘order’ and order_state = ‘pending’ and order_origin = ‘Los Angeles’

and that is not what the user intended. So in that general case we must NOT use views with anything except “Select @All” selection criteria, and if application developers want to use the Pending view, we opened up the syntax

‘Pending’.order_origin = ‘Los Angeles’

which is far more efficient than the fully spelled-out three-term query, since the view’s index persists.
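The implicit-conjunction problem can be sketched in a few lines. This is a hypothetical illustration, not Domino internals: a view-scoped term opts into the view’s selection formula, while a free-form term must refuse a filtered view’s index, because borrowing it would silently add criteria the user never wrote.

```python
# Hypothetical sketch: composing the criteria a query would actually
# evaluate if it used a filtered view's index.
def effective_criteria(view_selection, query_term, view_scoped):
    if view_scoped:
        # 'Pending'.order_origin = ... : the user opted into the view's filter
        return f"({view_selection}) and ({query_term})"
    # Free-form term: using this view's index would change the result set.
    raise ValueError("view selection is not @All; index unusable for free-form terms")

pending = "form = 'order' and order_state = 'pending'"
term = "order_origin = 'Los Angeles'"

print(effective_criteria(pending, term, view_scoped=True))
# → (form = 'order' and order_state = 'pending') and (order_origin = 'Los Angeles')
```

The view-scoped syntax makes the extra criteria explicit and intentional; the free-form path simply declines the shortcut.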

Further considerations

Given the multiply-occurring value data model in Domino, we also restricted free-form query terms to using only indexes that explode those multiply-occurring values into individual index entries. And we had to restrict to non-categorized indexes as well.
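In miniature, “exploding” means one index entry per value, so an equality term can match any element of a multi-valued item. The structures below are purely illustrative, not NIF’s actual format:

```python
# Explode multiply-occurring item values into (value, doc_id) index entries.
def explode(doc_id, values):
    vals = values if isinstance(values, list) else [values]
    return [(v, doc_id) for v in vals]

docs = {
    1: ["Los Angeles", "Boston"],   # multi-valued item
    2: "Boston",                    # single value
}
index = sorted(e for doc_id, v in docs.items() for e in explode(doc_id, v))

# Lookup: which documents carry the value "Boston"?
hits = [doc for key, doc in index if key == "Boston"]
print(hits)   # [1, 2] — the multi-valued document matches too
```

A non-exploded index would carry the whole list as one key, and the equality lookup on a single value would miss document 1.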


So in comparison with the relational model above, what of the query-ability of the Design Catalog? Well, we have put the system catalog data into a non-replicated database. By doing so, we have removed the database context of the design elements, and that is a liability. So at this writing I cannot guarantee its forward existence; it is at this point a stopgap. That means any querying of its contents is very risky if attempted. No doubt people will do it anyway, and that’s fine.

For now, the Design Catalog gets the job done. Further instructions on its use will appear in its formal documentation.


DQL roots

A few years ago, three of my colleagues and I were drafted for a skunkworks effort, a throwaway project: prove the concept, save the relics, and go back to your regular job. We were interested in taking a quite functional REST API that was serviced by much more expensive technology and having it instead use native Domino services. We worked for a few months, over the Christmas holiday season, to show a cheaper way to give the API what it needed to function.

Part of that work involved data transformation. JSON is the format of all REST payloads, so it was something we needed to supply and consume. Fortunately, for the most part we had built-in libraries for that problem. Another part was query solving, and by pulling together Domino services to satisfy the different query terms, it worked! We delivered a demonstrable, cheap prototype that inspired later work.

I have a long history in query processing going back to my 1986 work on the mainframe database Model 204®, now owned by Rocket Software. Its language, unceremoniously called “User Language”, thrives by using two kinds of indexes and direct data access in a way that was a precursor, by some 40 years, of Lucene and of Hadoop’s sharding and map-reduce engines. Its Boolean processing is both stingy, avoiding I/O via partitioning, and optimal in actual low-level operations, using the machine-level instruction set to AND, OR and NOT bitmaps.
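Bitmap Boolean processing is simple to sketch. Each bit position stands for a record number, so AND/OR/NOT collapse into one machine operation per word. Python integers stand in for fixed-width bitmap segments here (the record numbers and bit patterns are made up for illustration):

```python
# Bitmap Boolean processing in miniature: one bit per record.
UNIVERSE = 0b11111111            # records 0..7 exist

back_ordered = 0b00101101        # records matching back_order = 1
big_orders   = 0b01101001        # records matching part_count > 250

both    = back_ordered & big_orders   # AND: records 0, 3, 5
either  = back_ordered | big_orders   # OR:  records 0, 2, 3, 5, 6
neither = UNIVERSE & ~either          # NOT, clipped to the universe: 1, 4, 7

def records(bitmap):
    """Expand a bitmap back into a list of record numbers."""
    return [i for i in range(8) if bitmap >> i & 1]

print(records(both), records(either), records(neither))
```

With 64-bit words, one AND instruction resolves 64 records at a time, which is where the speed comes from.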


Later, the same technology was ported to the C language and Unix/Windows, and I was part of an effort to support the full SQL 1992 standard. It ported well, and specialized in the same area: high-speed complex Boolean processing.

I also have a long history with SQL. I appreciate its strengths and its standardized publication of the very well defined relational algebra. But, working on Notes/Domino and diving deeply into the unique and valuable properties and capabilities of semi-structured and unstructured data, I have observed that the mapping to the SQL standard has always been a forced one, with mixed success at best. Enter NoSQL and its pundits. Indeed, enter the internet, where relational data plays a subsidiary role in the extensive unstructured data corpus.


Earlier this year (2018) we began working in earnest on providing NoSQL capabilities using Node.js to access Domino. We surveyed the landscape and found it populated by engines that had invested heavily in JSON as their native data format. Now, one of the most beautiful attributes of Domino has always been its malleability to support any number of front (and truth be told, back) ends. Node.js and JSON are no exception, though there is work to do. And they comprise what can only be described as a new standard.

The challenge for us in developing this new front end is to map, and make valuable, the data, processing and everything else possible in Domino in the new (well, new to Domino) format. Though I pledge to write a LOT about the work in a way that both seeks input and advertises the incumbent power of the underlying engine, one early deliverable was quickly identified: a query capability.

Domino has had the underlying structures to support a general query facility for a long time. It is NOT a relational engine, which is a very good thing for a NoSQL database. And its deep underpinnings in unstructured, relationally denormalized data are formative in this work.

Now, much of the Domino engine was built in support of the Notes® client and its browser-based descendants. That is not a liability; there is very rich and useful functionality at our collective disposal. But in Node.js and a query facility, the usage of the indexes and document data has a different footprint. For instance, a call to render 100 index entries at a time while scrolling an inbox or view is a small fraction of the work needed to find the results for 5000 entries across the same view. And we need to take care not to overwhelm one kind of processing with the other.

But using the indexes of the Notes Indexing Facility (NIF, the part of Domino that comprises views) was an obvious approach in the aforementioned skunkworks, and it has borne fruit in the current effort. Given the semantics of a database-level query and the Domino data model, certain restrictions in view and view column design have been needed to produce a working engine.

Set-based terms connected with Boolean primitives are the building blocks of any query engine. And in that skunkworks we also identified the Domino IDTable functions as the avenue of choice for Boolean processing. Their speed is tremendous. The one restriction they bring is that NoteIDs are not portable to other replicas, but that affected no early user story or requirement, and it is worth living with for the performance benefit. IDTables are the currency of the query engine and as such, all data manipulation will be done via efficient post-processing, at least for now.
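The shape of that processing can be sketched with Python sets standing in for Domino’s IDTables (the NoteID values below are made up for illustration). Each query term resolves to a table of NoteIDs, and the Boolean primitives combine them:

```python
# Illustrative sketch: sets stand in for IDTables; each term yields NoteIDs.
pending_ids = {0x2101, 0x2105, 0x2109, 0x210D}  # form='order' & state='pending'
la_ids      = {0x2105, 0x210D, 0x2111}          # order_origin = 'Los Angeles'
rush_ids    = {0x2121}                          # priority = 'rush'

# (pending AND Los Angeles) OR rush
result = (pending_ids & la_ids) | rush_ids
print(sorted(result))   # [8453, 8461, 8481]
```

The actual IDTable functions work on compressed ranges of NoteIDs rather than hash sets, which is part of why they are so fast, but the algebra is the same.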

We also needed to define the language. Early on we identified the existing engines in the document-based NoSQL world: MongoDB® and CouchDB®, both well established and adopted in the field. They each have JSON query interfaces that have users building Boolean trees. So that was the first interface we built, DQL 1.0 if you like. But when we looked at it, and read developer reviews of those interfaces, we concluded it was not the way to go. That decision forged DQL in its current, shipping form. We didn’t focus on the language so much as the engine. So we called it DGQF (Domino General Query Facility) because Domino is a collection of facilities working together. But the language acronym, DQL, won the day to the praises of many (if it’s OK, we’ll still call it DGQF internally).
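The contrast is easy to see with a tiny converter. The JSON shape below is MongoDB/CouchDB-like but illustrative only, and the converter is my sketch, not anything that ships; it just shows how the Boolean-tree form grows verbose next to the flat text it expresses:

```python
# Convert a Mongo-style JSON Boolean tree into flat query text.
def to_dql(node):
    if "$and" in node:
        return " and ".join(f"({to_dql(c)})" for c in node["$and"])
    if "$or" in node:
        return " or ".join(f"({to_dql(c)})" for c in node["$or"])
    (field, value), = node.items()   # leaf: single field/value pair
    return f"{field} = '{value}'"

tree = {"$and": [{"form": "order"},
                 {"$or": [{"order_state": "pending"},
                          {"order_state": "shipped"}]}]}
print(to_dql(tree))
# → (form = 'order') and ((order_state = 'pending') or (order_state = 'shipped'))
```

Developers read and write the one-line form far more easily than the nested tree, which is much of why the text language won.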

There isn’t space here to go into all the variants and power of the language. The formal documentation is undergoing its final revisions, and I will provide pointers once it’s available. The approach we took is sound and will put newfound power in the hands of application developers, even into a new generation. We did our best, and will continue to do so, to bring existing capabilities into innovative use and to expose components, such as the IDTables that exist in views and folders, in the syntax. We think it hangs together pretty well.

So .. enjoy. And here’s to Domino V11. You ain’t seen nothin’ yet!

The Iris bloodline

He came by helicopter. No one was sure where it landed, but they heard it fly in. He brought several of his direct reports and a company-wide meeting was held to announce something. IBM’s CEO then announced that he had just spent $3.5 billion to buy the 70 (!) people in the room. Well, not them, their company. Along with them came thousands of others working for the parent company Lotus, whose funds had been used to sustain the development of the product those 70 were so proud of.

Lou Gerstner told those present that he was amazed that such a small group of people could have built R3 of Lotus Notes and he pledged that he and all IBM management would stay far away from running their operation but that they would see a huge influx of capital to expand it. This meeting happened at 1 Technology Park Drive in Westford, Massachusetts, home of Iris Associates, a subsidiary of Lotus Development which was now a subsidiary of IBM.


Lou kept his pledge and yours truly was hired as part of that expansion, in 1998 to aid in the final phases of R5. After it shipped, and according to some unpublished agreement and schedule, Iris ceased to exist as a company in 2001 and became part of IBM. The investment in Notes/Domino continued for several years, as IBM made their money back several fold.

I don’t want to bite too deeply the hand that fed me, but “fed” is past tense so some of that will come out in this post. I want to make plain what happened, not to practice resentment or articulate any schadenfreude; there is none. But I need to be a bit of an historian so I can really celebrate all that’s happened. And celebrate is the operative word in this blog entry.

We learned why Lou Gerstner was so impressed at the accomplishment of such a small group. In the years that followed the initial purchase of Lotus/Iris, several of the projects we saw happening around us (and with us) were so large they could not succeed. Agile was adopted to cut the waste. But there’s nothing new here. Such seminal accounts as The Mythical Man-Month chronicle best what software project life can be like at Big Blue. The software that survives and even thrives is generally that which is needed to move iron and keep it operational and modern.

We saw the formation of product-line city states within brand-based “nations”, first jockeying for market- and mind-share, then for survival as cuts ensued. People were doing good work but it wasn’t seeing the light of day. And, sadly, cutting is arguably one of IBM’s greatest skills. Layoffs (sorry, “resource actions”) have project names and are carefully planned and executed.


I and my surviving colleagues are grateful for being employed these years – I mean that – for permanent employment is promised nobody. And to be completely fair, the way that Notes/Domino hit the market is an unusual phenomenon.  Engineer/market visionaries may boast of their acumen after such good fortune, but the confluence of so many factors involves a degree of luck and timing out of the control of the inventor.  Many are the start-ups with seemingly workable products that for one reason or another fell short of their sales targets.  Not so Notes/Domino.

I need to say strongly that IBM is a great place to work.  In many ways.  There are great people there that I love and with whom I have loved to work.

But of course every developer wants his/her efforts to meet market success. And the personal fulfillment that comes from that was extremely rare, in all honesty far too rare, for a number of reasons I will discuss some day under different cover. I do presently hold out hope for Watson and the current efforts in Blockchain; I know some very good people working on those technologies.

So in stark contrast to how things began – and I do not only speak for myself – the environment became a progressively depressing, downward spiral.  Yet, many of the original Iris people nonetheless stayed around, still working on the software they knew and loved. They shifted their work to the cloud offering, SmartCloud Notes®.  Others moved on to other positions in the company.

Suddenly, in September 2017, a pens-down work stoppage was declared. There was complete silence from management, and those affected counted the possible scenarios that could produce this first move of its kind. Most were very bad, but there was one good one: our business was being sold. And that one good scenario was the one that carried the day. HCL Technologies, an Indian high-tech services firm, was purchasing several under-valued products from IBM with hopes of shoring up the customer base and integrating them into its offerings. The Products and Platforms division (pnp for short; it’s in my e-mail address) was a reasonable rebirth of what had been IBM Software Group (SWG) so many years before under Lou Gerstner and Steve Mills.

The reaction of the engineers varied. Personally I was ecstatic, even giddy. As a friend and former Iris engineer (still at IBM) said “You guys just had your white horse come to your rescue”.

The group of people developing Notes/Domino at HCL consists of MANY of the original Iris engineers, some very talented newbies and a very motivated management team that has helped this whole venture work. And work it does and work it will.

And Lou Gerstner’s comment about the 70 people? They’re baaaack.

The software has grown greatly, which spreads thin the efforts of those remaining, but the same spirit that started it all is alive and well. This feels like a startup even though its initial aims are to stop customer (oops, sorry Ginni, “client”) erosion.

Watch this clip from Disney’s Hook to get full effect. It’s like that, complete with Rufio.

Testing .. just do it

True, though paraphrased discussion (circa 2002):

Me: We should have a full test harness and throw low level errors so we can be sure the code responds properly.

Boss’s boss: We don’t have time for that.

Me: Well, what do we have time for, fixing problems under duress?

Boss’s boss: (fumes and sputters and decides to lie) Yes, believe it or not, that is cheaper!

Yeah, I was being a little bit of a wise guy, but since both our work lives were being half-consumed by critical issues, it was germane and honest. Those issues needed multiple hops, either internally or at customer sites, just to gather data and then surmise solutions, sometimes trying them in production because we lacked the means of reproducing the data and memory states that were the root cause.

Causing defects to occur on your terms should be a no-brainer exercise for development organizations. But it’s not, no matter how much is written and preached about test-driven development in the Agile framework. My quip about organizations having the time to fix problems rather than prevent them is not a critical, wise-guy statement. It describes absolutely standard business tactics.
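Here is what “throwing low-level errors” looks like in miniature. All the names are hypothetical; the point is that the failure is injected, so the recovery path runs on our terms instead of at a customer site:

```python
# Sketch of fault injection: hand the code a low-level writer that fails
# on purpose, so the error-handling path is exercised deliberately.
def save_order(order, write_block):
    try:
        write_block(order)
        return "ok"
    except OSError:
        return "retry"   # the recovery path we want to prove works

def failing_write(_order):
    raise OSError("disk full")   # the injected low-level error

# No bad disk required: the defect occurs exactly when we ask it to.
assert save_order({"order_number": 42}, failing_write) == "retry"
assert save_order({"order_number": 42}, lambda o: None) == "ok"
print("error path exercised")
```

A real harness would inject faults at component boundaries (file I/O, network, cross-team calls), but the principle scales directly from this sketch.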

I conducted a meeting with engineers for a period of five years, applying a practice called Orthogonal Defect Classification (ODC) to the resolutions of past defects and customer problems. Most engineers found it interesting to go over the work they had done so we could gather data about how to improve. The process is one of many attempts at formal analysis of system defects. And as analysis it produced graphs but little change in the way things were done. The number one sticky fact I took away was that 41% of our bugs were failures to check the results passed between system components, often cross-team calls that can change at any time and without warning.

So then, part of the problem is that of metrics: gathering them, believing them and acting upon them. My ODC meetings were only one instance. I have a friend who led a project team to make a body of problematic legacy code more reliable. It was a funded effort with design and priorities. His team ended up closing (true story) thousands of defect reports and eliminated issues that had provably existed for 15 years. When they reported their progress, those numbers were celebrated, but phases 2 and 3 of the project were never approved. That’s because the cost in the field could never be measured. Since problems were eliminated, that meant counting something that no longer existed. And since the problems they quashed were a small subset of the whole body of issues encountered, it was an uphill battle to convince management of the value of their work.

I have no other word for another cultural problem – it’s filthy.  There exist caste systems in technical organizations (no, not all technical organizations) where those who test are considered truly less important than those who develop.  Traditionally they may have had lower salaries, had fewer and lower technical skills, etc., so the work they did was likewise considered ancillary to that done by developers.  This is perhaps the aspect of corporate life in technical organizations that is most violently at odds with test driven development and Agile process itself.  Testing is inferior work?  PLEASE!!  May that attitude die a dishonorable death.

Almost all the literature about TDD and Agile is tacitly aimed at new(ish) products and new(ish) teams. It’s rare to find someone who understands the problem legacy software faces in this approach. A refreshing exception is Michael Feathers’ Working Effectively with Legacy Code. He doesn’t work on the code I work on, but he gets it regarding the problems of old and new.


There are a number of quips used by opponents to resist applying new and rigorous testing to old code. I have found I needed counter-quips to combat the dismissive oversimplification of the business case. One quip I’ve heard too much is “Well, you can’t boil the ocean”, which implies that identical rigor must be applied to multiple millions of lines of code or else it’s not worth starting. My counter-quip is “Yeah, but a bay or inlet boils nicely, and you know the very bays of which I speak.”

It’s true.  There is no mystery in any system where the problems lie.  And there truly is no mystery about the value of increased testing rigor in those inlets and “bays”.  So all that keeps it from happening is corporate bad habits and errant calculations about business expenses.

And certainly anything I am doing personally will include testing commensurate with the complexity of the technology I’ve developed. That’s a promise.