“I can use DQL!”

So, DQL.  More about DQL.

I was writing some more automated tests yesterday, and part of the automation is to check for identical results against the baseline.  It’s not that the baseline is set in stone; sometimes defects are found that change it.  But .. it is the baseline.

Of course we test with multiple databases, and they each have a flat file full of queries to execute, which I do from multiple Java threads using the new DominoQuery class.  It works great, but the design of the results database where the baseline results reside was getting unwieldy.  In addition to multiple databases, I want to support versions, releases, etc.

That kind of complexity has usually caused views and folders to be generated to properly segregate, order and support the various segments of data as they are browsed, updated and reported on in the business flow.  But .. light dawned on marble head (a New England joke, where Marblehead is a coastline community of great beauty, particularly at dawn while one’s cranium is the “marble head” which before this revelation lacked some perception, to be kind) – I said to myself “I can use DQL!” with suitable triumph reminiscent of the days of olde.

So I did:

testno = ?testno and domino_version = ?domversion and replid = ?replid and testbed = ?testbed

(well, ok, to paraphrase a query) .. but it worked GREAT!  It took 45 milliseconds doing NSF scanning alone (1 NSF scan for all terms if you please), so it fit the automation time window perfectly.

Oh yeah, those ?values are substitution values that are set prior to running the query, so the query can remain intact and only the values are changed between executions.
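
For the curious, here is roughly what that looks like from the Java side of my harness. This is only a minimal sketch, not the real thing, and the method names (createDominoQuery, setNamedVariable, execute) are my assumption of the V10 Java API shape – check the formal documentation for the exact signatures.

import lotus.domino.Database;
import lotus.domino.DocumentCollection;
import lotus.domino.DominoQuery;
import lotus.domino.NotesException;

// Minimal sketch only.  The query text never changes; only the ?substitution
// values do.  Method names are assumed from the V10 Java API shape.
public class BaselineQuery {

    private static final String QUERY =
        "testno = ?testno and domino_version = ?domversion and " +
        "replid = ?replid and testbed = ?testbed";

    public static DocumentCollection findBaseline(Database resultsDb, String testNo,
            String domVersion, String replId, String testBed) throws NotesException {
        DominoQuery dq = resultsDb.createDominoQuery();

        // Set the substitution values prior to running the query.
        dq.setNamedVariable("testno", testNo);
        dq.setNamedVariable("domversion", domVersion);
        dq.setNamedVariable("replid", replId);
        dq.setNamedVariable("testbed", testBed);

        return dq.execute(QUERY);
    }
}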

I can’t calculate the savings in view creation, (re)building, etc. but the change in design thought itself is staggering.

Yeah, advertising and bragging a little bit but also just communicating our intent to make people more productive and happier.

At Iris there was a saying, “We eat our own dog food,” which means we use our own software before anyone else sees it.  “Dog food” offends management and marketing folks, of course, but notably I’m not in marketing.  And the developers I know are well aware of why it’s an accurate term to start with.

Just to let everyone know – because it matters – we still eat our own dog food.

So that by the time it goes commercial, it has been forged into a gourmet meal.  Or something like that.


10.0.1 is tasting really good.

You can use DQL!


Bowling together

We celebrated the shipment of Domino V10 at Brunswick Zone Lowell Lanes, about a 15-minute drive from our office in Chelmsford. Remembering the lavish R5 ship party, where everyone and his/her family was flown to the Bahamas for several days, I would personally put this one on par with or even above that. Not because I’m a good bowler, though I am not bad – okay, I won the first string with 3 consecutive strikes but ended up hurting my knees when I forgot all about the approach (it’s been a while). No, I liked it because we were together.

One of the subjects I read – until it gets too dry – is sociology. I almost minored in anthropology in college, and let’s just say I find people – humankind – fascinating. Some years ago I read Bowling Alone, published in 2000 and written by Robert Putnam, a statistics-based sociologist who lives only 35 miles away from me. In it, he proves beyond debate the loss in the US of something he calls “social capital,” and how it’s returning. That loss is devastating to both society and individuals, but its return is equally therapeutic. One fact I always remember vividly is that joining a group – I mean a bridge club or knitting group – statistically has the same positive effect on health as quitting smoking. The title comes from the fact that more people are bowling than ever, but bowling leagues have sharply declined.

Well, Thursday afternoon we were bowling together. And it felt good. As one spends more time with a group of people, one of course becomes acquainted with more and more of them. That is perhaps why I found the bowling excursion so enjoyable, for among the large group of R5 Iris folks at Paradise Island, I may have known 10. Oh yeah, my wife sprained her ankle badly too, which was a bummer.

Aside from the obvious, victorious, “in your face” achievement of shipping a new, full-point version of Domino in 11 months, which has never been done before (we’re getting better at this), I like to celebrate the “soft” human attributes that shined brightly during this time, the people things that no one takes note of. So here are some of those:

  1. There’s a unity of purpose and vision going on that’s infectious. Though it’s taken some time and it exists in different degrees, the larger team knows what it’s about and what it needs to do. I can’t go into detail about the gargantuan yeoman’s work it took to even get us operational outside IBM; suffice it to say that once people saw that, they realized how real and how alive both this group and this product are. And we have acted in kind.
  2. We are in closer contact with our “family” of business partners and champions than ever before. For, a family it is. The work of Richard Jefts, Barry Rosen and a host of others to engage the human beings that remain faithful to the fruit of our labors has borne fruit, not only in those relationships but in the focus of the inner team. It’s always been a difficult balancing act to synthesize and vet the conflicting demands of future direction, support issues and business opportunity.  Be assured that among the thousands of people who use our software, there are sometimes people insisting they need “obvious”, polar-opposite functionality and pathways forward.  Yet that way forward is clearer than ever.
  3. We’re getting better at this. I won’t play the age card, but people can do the math and guess that demographic, given the length of service of individuals working with this technology. But, testament to its (and their) malleability, the age of the dog has had the opposite effect on the tricks s/he can do. No, we’re not dogs and these aren’t tricks, but the overlooked quality of wisdom (vs. knowledge which at times has been demonstrably distracting) has us renovating and optimizing practices and systems.
  4. “The report of my death was an exaggeration,” said Mark Twain. Domino is not going away and neither is this team. Bowling together, the nay-sayers and negative Nancys literally did not exist. Destructive agendas, even self-destructive ones, yield to creativity and the grit of execution. Nothing dead about it.

I joked to Richard Jefts that we should build a bowling alley in the Chelmsford space for use when our “family” comes to visit. We laughed, but he said he might just take me up on it. Then you can all bowl together with us.

Building résumés

During a wave of fatigue in the end game of shipping a piece of software, a quality engineer (that is, one who writes automated tests) came to me complaining about the lack of movement on the defects she had found.  She had a point; they were weeks old and we were shipping soon.

She said, “I mean, what are we building here?”

I replied, without even thinking: “Résumés.”  She laughed one of those nervous laughs and went back to work.

For several reasons, including some out of our control, the result of that work came to nothing in the field, yet the engineers had new, ephemeral buzz words in their list of experience.  And they hadn’t even used the technology well; they displayed no particular talent at producing anything valuable with it.

It remains a tension in engineering work.  If someone doesn’t keep up with changing technology, they become typecast.  Or cast as a dinosaur.  Which is unfair because it’s inaccurate.  I don’t write this because I’m an older person – a mere 10 years in my field can produce the same effect if one doesn’t “keep up” with change.

But I wanted to temper and challenge the common wisdom, because I don’t believe it is wisdom but mostly a self-anointing priesthood with very little to show for itself.  So here are some observations about living on, promoting, and requiring others to live on the bleeding edge:

  1. Technology requires at least 5 years to prove it’s worth investing in. 80% goes away in that time, and only after 10 years is its value – its staying power in production – proven.  The examples are myriad.
  2. Yet the marketing stir is so rabid that “new” is a word inspiring great investment. Investment in nothing, because either the cool was based on buzz alone or the technology is pressed into service in ways for which it was never designed.
  3. Technical people with buzz take full advantage of non-technical people with money and power.
  4. Knowledge being power, those who claim mastery of new tech are quick to dismiss individuals or groups pressing for responsible oversight in key aspects that make solutions valuable in the field.
  5. And that – value – is too late or too infrequently even considered. Talk of the cost of implementation, ROI, and future-proofing threatens the résumé building, so those topics are often ignored until after launch, when the truth comes out and people move on to other projects.
  6. In the frenzy of knowledge- and experience-acquisition, accountability is fleeting if it exists at all. Organizations don’t seem to learn or care about the insidious pattern, so it goes on.

That list sounds negative, but such is the ugly picture I’ve seen.  And .. I am not saying that the use and integration of new technology inherently fails; of course it does not.  Nor am I saying that failed projects are worthless; there are too many nuggets among lessons learned to even think that.  And relevant, up-to-date experience matters, particularly if one goes job hunting.

But even then, I would sooner reward the ability to provide one’s customers with real business value over proficiency in a tool, language or part of the great “full stack” (which to me is a funny adjectival clause because there are some very important stacks “full stack developers” know nothing of).  When I recruit people I ask questions about design and applicability of this technology or that.

Because … I have found that acquired principles – that is, points of applied wisdom – are more important than acquired knowledge.  Because they port, scale and broadly apply.  They build real value, and résumés take care of themselves.

I don’t say any of this as an old person who’s slowing down, only as one who’s learned the folly of devotion to merely keeping up.

 

 

The case of the fictional monkey experiment

I had heard a story of an experiment performed on a community of monkeys.  Don’t worry, it’s not true.  Well, not true as an experiment – it never happened.  But I still love the story.  It’s a variant of this one.  Of course the whole thing comprises a moralizing sermon of sorts, maybe even a mediocre one in the scheme of things.  But when such describes life it’s useful for me.

In my telling, 20 or so monkeys were put into a large area with a greased pole at its center.   At the top of the greased pole was a cluster of delicious bananas.  But naturally as any simian in the group would attempt to scale the pole, s/he would fall and get nowhere.  The bananas might as well not have existed.

Then 2 things were changed – 5 new monkeys were brought in, replacing 5 who were removed, and the pole was de-greased.  So a monkey could legitimately climb the pole to get the bananas.

But it was soon found that only the new monkeys would even attempt such a thing.  And when they did, they were pulled down by a large number of other monkeys who knew that it was impossible.  So even then, the bananas went uneaten.  And even the new monkeys learned one could not ascend the pole.

It is of interest that it doesn’t matter if stories like this ever happened in a scientific experiment.  Because they happen in corporate life so universally.  And it’s tragic. Wasteful.

Innovation is stifled.  Businesses fail.  The new monkeys quit or assimilate into cynical dysfunction.

Let me not paint too grim a picture though.  Because thankfully and triumphantly poles do get de-greased and get climbed all the time as well.  Or some diligent monkey climbs them grease and all.

However, and probably the main point, the intentionality in climbing that pole is substantial; there are no old-timer monkeys to encourage or give hope.  In fact, they will call the bananas imaginary, mock you for trying, and discredit your achievement as rotten bananas even as you eat them and share them with your friends (well, you have certainly heard of sour grapes).

There is no particular effort or project to which I wish to apply this parable except to assign it to all of them.  I have been both an old and new monkey.  And if I were to say “Don’t let this environment happen to you!” it would be pointless, because it probably already has, somewhere, in some way.

But I will tell you every time – Climb the pole!  Those bananas are real and they taste wonderful.  And stop discouraging others; let them get greasy and learn the way they will.

Concerning the Design Catalog …

Efficiency in programming languages is achieved by avoiding bad practices like excessive looping, too many I/O or network operations, and needless data movement. Modern processors have raised the bar on what counts as unacceptably CPU-consuming inefficiency, but of course you can still loop yourself into a hole. Please don’t take that as a challenge.

With languages that have the power to execute very extensive (and therefore expensive) operations to achieve their ends, the same bottlenecks exist. I/O (even with SSD) and network access can still dominate cost and time, and those two resources are related but not the same. Query and other high-level languages like Pig Latin, Hive, SQL, and the MongoDB® query language all work well by optimizing (minimizing and cutting the cost of) the underlying data and network access required to satisfy the requests they process.

Classic query optimization

Therefore it becomes very important to plan how to do the work. The first order of business is to make sure the request makes sense, that the participating objects exist and are configured to perform what they have been asked to do. To use an example most people are somewhat familiar with, consider the SQL statement:

SELECT order_number, order_origin FROM orders WHERE part_count > 250 AND back_order = 1

Before that statement can run, the orders table must be checked to see that it exists and that the columns order_number, order_origin, part_count and back_order exist.

You may know that relational databases all have system catalogs. These are sets of tables that can be queried like any other. The actual design of those tables varies by database, but most have something like a TABLES table, where every table ever CREATEd has a row or rows. Now, it is cost-prohibitive for those engines to perform queries in order to compile queries, so they use a memory-resident, highly optimized copy of the TABLES table (and all other catalog tables) to do that work.
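
To make that concrete, here is a hedged JDBC sketch of the idea that a system catalog is just more tables. The information_schema names are the SQL-standard flavor (real catalog table names vary by engine) and the JDBC URL is a placeholder; the sketch looks up the columns of the orders table the same way any application query would.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: information_schema is the SQL-standard face of a system catalog;
// actual catalog table names vary by engine, and the JDBC URL is a placeholder.
public class CatalogLookup {
    public static void main(String[] args) throws SQLException {
        String jdbcUrl = "jdbc:somedb://localhost/ordersdb";  // placeholder

        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT column_name, data_type " +
                 "FROM information_schema.columns WHERE table_name = ?")) {
            ps.setString(1, "orders");
            try (ResultSet rs = ps.executeQuery()) {
                // This is exactly what a query compiler needs to know: do
                // order_number, order_origin, part_count and back_order exist,
                // and what are their types?
                while (rs.next()) {
                    System.out.println(rs.getString("column_name")
                        + " : " + rs.getString("data_type"));
                }
            }
        }
    }
}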

Concerning Domino®, the internal knowledge about design elements resides in intricate and related field values on design documents. Since the same problem of runtime access to even validate queries exists, we have to create different, optimized instances of the design data. We call it the “Design Catalog”.

The second order of business in planning a query/request is to find any helpful optimizing strategies to solve the problem at hand. These are combinations of data structures like indexes and fast-path execution means like pre-seeded query terms or classic approaches like nested loop or sort-merge joins.

But something that seems simple yet is remarkably complex, how to order the work, is the first decision to be made. In general, equality terms are cheaper than range ones. Index-satisfiable terms are cheaper than those requiring direct data access. And for sharded and distributed databases, getting results for single terms on single nodes is the first order of work for map-reduce processing.

In relational engine system catalogs there is virtually always an INDEX table to be consulted for this part of the problem. And to finish the calculations needed for optimization, system catalogs contain COLUMN tables with gathered and sampled numbers of distinct values (aka cardinality) and other statistical data.

What about Domino’s indexes and DQL optimization?

Across its history, Domino’s indexes have been foundational to its market value. The Notes Indexing Facility (NIF) is a many-splendored thing with its trees of trees and optimized ordinal retrieval capabilities (“get me document 129093 in order by a given key” requires index walking in most engines). Domino’s indexes also house persistently-indexed computed values. Though there may be other engines with something akin to this power there are certainly none more robust. And available today.

So the Design Catalog needed to have quickly-available descriptions of available indexes in a database, meaning that design data needed to be extracted from its normal residence and itself indexed for quick lookup and use in optimization. However, this is complicated business.

For one thing, Domino’s industry-best security model allows for privileges to be applied to design elements. Not all views (or their indexes) are available to all users. For V10.0.0 of Domino we have had to punt on that, and remove all views or folders with readers fields on their design documents from consideration in DQL.

Secondly, views have implicit document restrictions. So, given the Pending view’s selection criteria:

select form = "order" & order_state = "pending"

any use of those indexes would apply those selection criteria on top of the criteria in the “free form” query term (vs. specifying the view to be used, as below). So

order_origin = 'Los Angeles'

using the Pending view would actually mean the following three terms:

form = 'order' and order_state = 'pending' and order_origin = 'Los Angeles'

and that is not what the user intended. So we need to NOT use views with anything except “Select @all” selection criteria in that general case. And if application developers want to use the Pending view, we opened up the syntax

'Pending'.order_origin = 'Los Angeles'

which is much more efficient than the fully spelled-out three-term query, since the index persists.

Further considerations

Given the multiply-occurring value data model in Domino, we also restricted free form query terms to only use indexes that explode those multiply-occurring values into individual index entries. And we had to restrict ourselves to non-categorized indexes as well.

Query-ability

So in comparison with the relational model above, what of the query-ability of the Design Catalog? Well, we have put the system catalog data into a non-replicated database called GQFDsgn.cat. And by doing so, we have removed the database context of the design elements, and that is a liability. So at this writing I cannot guarantee the forward existence of GQFDsgn.cat; it is at this point a stopgap. That means any querying of its contents is very risky if attempted. No doubt people will do it anyway and that’s fine.

For now, the Design Catalog gets the job done.  Further instructions on its use will appear in its formal documentation.

DQL roots

A few years ago, 3 of my colleagues and I were drafted for a skunkworks effort, a throwaway project. Prove concept, save relics and go back to your regular job. We were interested in taking a quite functional REST API that was serviced by much more expensive technology and having it instead use native Domino services. We worked for a few months, over the Christmas holiday season, to show a cheaper way to give the API what it needed to function.

Part of that work involved data transformation. JSON is the format of all REST payloads so it was something we needed to supply and consume. Fortunately, for the most part we had some built-in libraries for that problem. But another part was query solving. And pulling together Domino services to satisfy the different query terms, it worked! We delivered a demonstrable, cheap prototype that inspired later work.

I have a long history in query processing, going back to my 1986 work on the mainframe database Model 204®, now owned by Rocket Software. Its language, unceremoniously called “User Language”, thrives by using 2 kinds of indexes and direct-data access in a way that made it a precursor, by some 40 years, of Lucene and the Hadoop sharding and map-reduce engines. Its Boolean processing is both stingy in avoiding I/O via partitioning and optimal in actual low-level operations, using the machine-level instruction set to AND, OR and NOT bitmaps.


Later, the same technology was ported to the C language and Unix/Windows and I was part of an effort to support the full SQL 1992 language. It ported well, and specialized in the same area – high speed complex Boolean processing.

I also have a long history with SQL. I appreciate its strengths and its standardized publication of the very well defined relational algebra. But, working on Notes/Domino and diving deeply into the unique and valuable properties and capabilities of semi-structured and unstructured data, I have observed that the mapping to the SQL standard has always been a forced one and the success of each attempt varied at best. Enter NoSQL and its pundits. Indeed, enter the internet, where relational data plays a subsidiary role in the extensive unstructured data corpus.


Earlier this year (2018) we began working in earnest on providing NoSQL capabilities using Node.js to access Domino. We surveyed the landscape and found it populated by engines that had invested heavily in JSON as their native data format. Now, one of the most beautiful attributes of Domino has always been its malleability to support any number of front (and truth be told, back) ends. Node.js and JSON are no exception, though there is work to do. And they comprise what can only be described as a new standard.

The challenge for us in developing this new front end is to map and make valuable the data, processing and everything else possible in Domino in the new (well, new to Domino) format. Though I pledge to write a LOT about that work, in a way that both seeks input and advertises the incumbent power of the underlying engine, one early deliverable was quickly identified: a query capability.

Domino has had the underlying structures to support a general query facility for a long time. It is NOT a relational engine, which is a very good thing for a NoSQL database. And its deep underpinnings in unstructured, relationally denormalized data are formative in this work.

Now, much of the Domino engine was built in support of the Notes® client and its browser-based counterparts. That is not a liability; there is very rich and useful functionality at our collective disposal. But in Node.js and a query facility, the usage of the indexes and document data has a different footprint. For instance, a call to render 100 index entries at a time while scrolling an inbox or view is a small increment of the work needed to find the results for 5000 entries across the same view. And we need to take care not to overwhelm one kind of processing with the other.

But using the indexes of the Notes Indexing Facility (NIF – the part of Domino that comprises views) was an obvious approach in the aforementioned skunkworks, and it has borne fruit in the current effort. Given the semantics of a database-level query, and the Domino data model, certain restrictions in view and view column design have been needed to have a working engine.

Set-based terms connected with Boolean primitives are the building blocks of any query engine. And in that skunkworks we also identified the Domino IDTable functions as the avenue of choice for Boolean processing. Their speed is tremendous. The one restriction they bring is that NoteIDs are not portable to other replicas, but that affected no early user story or requirement and is worth living with for the performance benefit. IDTables are the currency of the query engine and as such, all data manipulation will be done via efficient post processing, at least for now.
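
To illustrate why that choice is so fast – and this is only a toy analogy, not the Domino C API and not our code – each query term yields a set of NoteIDs, and the Boolean connectives collapse into machine-level bit operations. Below, java.util.BitSet stands in for an IDTable.

import java.util.BitSet;

// Toy illustration only: BitSet stands in for a Domino IDTable.  Each bit
// position represents a NoteID; each query term yields one set, and the
// Boolean connectives become AND / OR / ANDNOT over the bits.
public class IdSetBoolean {
    public static void main(String[] args) {
        BitSet pendingOrders = new BitSet();   // e.g. result of: order_state = 'pending'
        BitSet losAngeles    = new BitSet();   // e.g. result of: order_origin = 'Los Angeles'
        BitSet backOrdered   = new BitSet();   // e.g. result of: back_order = 1

        pendingOrders.set(101); pendingOrders.set(205); pendingOrders.set(307);
        losAngeles.set(205);    losAngeles.set(307);    losAngeles.set(412);
        backOrdered.set(307);

        BitSet and = (BitSet) pendingOrders.clone();
        and.and(losAngeles);                   // pending AND Los Angeles  -> {205, 307}

        BitSet or = (BitSet) and.clone();
        or.or(backOrdered);                    // ... OR back-ordered      -> {205, 307}

        BitSet not = (BitSet) and.clone();
        not.andNot(backOrdered);               // ... AND NOT back-ordered -> {205}

        System.out.println("AND:    " + and);
        System.out.println("OR:     " + or);
        System.out.println("ANDNOT: " + not);
    }
}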

We also needed to define the language. Early on we identified the existing engines in the document-based NoSQL world. They were MongoDB® and CouchDB®, both well established and adopted in the field. They each had JSON query interfaces that have users building Boolean trees. So that was the first interface we built, DQL 1.0 if you like. But when we looked at it, and read developer reviews of those interfaces, we concluded it was not the way to go. That decision forged DQL in its current, shipping form.  We didn’t focus on the language so much as the engine.  So we called it DGQF (Domino General Query Facility) because Domino is a collection of facilities working together.  But the language acronym, DQL, won the day to the praises of many (if it’s ok, we’ll still call it DGQF internally).

There isn’t space here to go into all the variants and power of the language. The formal documentation is undergoing its final edits and I will provide pointers once it’s available. The approach we took is sound and will yield newfound power in the hands of application developers, even into a new generation. We did our best – and will continue to do so – to bring existing capabilities into innovative use and to expose components, such as the IDTables that exist in views and folders, in the syntax. We think it hangs together pretty well.

So .. enjoy. And here’s to Domino V11. You ain’t seen nothin’ yet!

The Iris bloodline

He came by helicopter. No one was sure where it landed, but they heard it fly in. He brought several of his direct reports and a company-wide meeting was held to announce something. IBM’s CEO then announced that he had just spent $3.5 billion to buy the 70 (!) people in the room. Well, not them, their company. Along with them came thousands of others working for the parent company Lotus, whose funds had been used to sustain the development of the product those 70 were so proud of.

Lou Gerstner told those present that he was amazed that such a small group of people could have built R3 of Lotus Notes and he pledged that he and all IBM management would stay far away from running their operation but that they would see a huge influx of capital to expand it. This meeting happened at 1 Technology Park Drive in Westford, Massachusetts, home of Iris Associates, a subsidiary of Lotus Development which was now a subsidiary of IBM.


Lou kept his pledge and yours truly was hired as part of that expansion, in 1998 to aid in the final phases of R5. After it shipped, and according to some unpublished agreement and schedule, Iris ceased to exist as a company in 2001 and became part of IBM. The investment in Notes/Domino continued for several years, as IBM made their money back several fold.

I don’t want to bite too deeply the hand that fed me, but “fed” is past tense so some of that will come out in this post. I want to make plain what happened, not to practice resentment or articulate any schadenfreude; there is none. But I need to be a bit of an historian so I can really celebrate all that’s happened. And celebrate is the operative word in this blog entry.

We learned why Lou Gerstner was so impressed at the accomplishment of such a small group. In the years that followed the initial purchase of Lotus/Iris, several of the projects that we saw happening around us (and with us) were so large they could not succeed. Agile was adopted to cut the waste.  But there’s nothing new here. Such seminal accounts as The Mythical Man-Month best chronicle what software project life can be like at Big Blue.  The software that survives and even thrives is generally that which is needed to move iron and keep it operational and modern.

We saw the formation of product-line city states within brand-based “nations”, first jockeying for market- and mind-share, then for survival as cuts ensued. People were doing good work but it wasn’t seeing the light of day.  And, sadly, cutting is arguably one of IBM’s greatest skills.  Layoffs (sorry, “resource actions”) have project names and are carefully planned and executed.


My surviving colleagues and I are grateful for being employed these years – I mean that – for permanent employment is promised to nobody. And to be completely fair, the way that Notes/Domino hit the market is an unusual phenomenon.  Engineer/market visionaries may boast of their acumen after such good fortune, but the confluence of so many factors involves a degree of luck and timing out of the control of the inventor.  Many are the start-ups with seemingly workable products that for one reason or another fell short of their sales targets.  Not so Notes/Domino.

I need to say strongly that IBM is a great place to work.  In many ways.  There are great people there that I love and with whom I have loved to work.

But of course every developer wants his/her efforts to meet market success.  And the personal fulfillment that comes from that was extremely rare – in all honesty far too rare – for a number of reasons I will discuss some day under different cover.  I do presently hold out hope for Watson and the current efforts into Blockchain; I know some very good people working on those technologies.

So in stark contrast to how things began – and I do not only speak for myself – the environment became a progressively depressing, downward spiral.  Yet, many of the original Iris people nonetheless stayed around, still working on the software they knew and loved. They shifted their work to the cloud offering, SmartCloud Notes®.  Others moved on to other positions in the company.

Suddenly, in September 2017, a pens-down work stoppage was declared. There was complete silence from management, and those affected counted the possible scenarios that could produce this first move of its kind. Most were very bad, but there was one good one – our business was being sold. And that one good scenario was the one that carried the day. HCL Technologies, an Indian high-tech services firm, was purchasing several under-valued products from IBM with hopes to shore up the customer base and integrate them into their offerings. The Products and Platforms division (pnp for short – it’s in my e-mail address) was a reasonable rebirth of what had been IBM Software Group (SWG) so many years before under Lou Gerstner and Steve Mills.

The reaction of the engineers varied. Personally I was ecstatic, even giddy. As a friend and former Iris engineer (still at IBM) said “You guys just had your white horse come to your rescue”.

The group of people developing Notes/Domino at HCL consists of MANY of the original Iris engineers, some very talented newbies and a very motivated management team that has helped this whole venture work. And work it does and work it will.

And Lou Gerstner’s comment about the 70 people? They’re baaaack.

The software has grown greatly, which greatly spreads the efforts of those remaining, but the same spirit that started it all is alive and well. This feels like a startup even though its initial aims are to stop customer (oops, sorry Ginnie, “client”) erosion.

Watch this clip from Disney’s Hook to get the full effect. It’s like that, complete with Rufio.

Testing .. just do it

A true, though paraphrased, discussion (circa 2002):

Me: We should have a full test harness and throw low level errors so we can be sure the code responds properly.

Boss’s boss: We don’t have time for that.

Me: Well, what do we have time for, fixing problems under duress?

Boss’s boss: (fumes and sputters and decides to lie) Yes, believe it or not, that is cheaper!

Yeah, I was being a little bit of a wise guy, but since both our work lives were being half-consumed with critical issues, it was germane and honest.  Those issues needed multiple hops, either internally or at customer sites, to even gather data and then surmise solutions, sometimes trying them in production because we lacked the means of reproducing the data and memory states that were the root cause.

Causing defects to occur on your terms should be a no-brainer exercise for development organizations.  But it’s not.  No matter how much is written and preached about test driven development in the Agile framework.  My quip about organizations having the time to fix problems rather than prevent them is not a critical, wise-guy statement.  It describes absolute standard business tactics.
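
By “causing defects to occur on your terms” I mean nothing exotic: inject the low-level failure yourself and assert that the code above it responds properly. Here is a minimal, hypothetical sketch (JUnit 4 assumed; the ResultStore and BaselineChecker names are invented for illustration).

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.IOException;
import org.junit.Test;

// Hypothetical example: deliberately throw a low-level error and verify the
// caller degrades gracefully instead of crashing or hanging.
public class BaselineStoreTest {

    /** The low-level dependency our component calls. */
    interface ResultStore {
        String readBaseline(String testNo) throws IOException;
    }

    /** The component under test: it must survive a failing store. */
    static class BaselineChecker {
        private final ResultStore store;
        BaselineChecker(ResultStore store) { this.store = store; }

        boolean matchesBaseline(String testNo, String actual) {
            try {
                return actual.equals(store.readBaseline(testNo));
            } catch (IOException e) {
                return false;   // respond properly: report a mismatch, don't blow up
            }
        }
    }

    @Test
    public void ioFailureIsHandledNotFatal() {
        // Fault injection on our terms: this store always throws.
        ResultStore failingStore = testNo -> { throw new IOException("disk error"); };
        assertFalse(new BaselineChecker(failingStore).matchesBaseline("T-42", "result"));
    }

    @Test
    public void healthyStoreStillComparesCorrectly() {
        ResultStore goodStore = testNo -> "result";
        assertTrue(new BaselineChecker(goodStore).matchesBaseline("T-42", "result"));
    }
}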

I conducted a meeting with engineers over a period of 5 years.  It applied a practice called Orthogonal Defect Classification (ODC) to resolutions of past defects and customer problems.  Most engineers found it interesting to go over the work they had done so we could gather data about how to improve.  The process is one of many attempts at formal analysis of system defects.  And as analysis it produced graphs but little change in the way things were done.  The number one sticky fact I took away was that 41% of our bugs were a failure to check results between system components – often cross-team calls that can change at any time and without warning.

So then, part of the problem is that of metrics: gathering them, believing them and acting upon them.  My ODC meetings were only one instance.  I have a friend who led a project team to make a body of problematic legacy code more reliable.  It was a funded effort with design and priorities.  His team ended up closing (true story) thousands of defect reports and eliminated issues that had provably existed for 15 years.  When they reported their progress, those numbers were celebrated, but phases 2 and 3 of the project were never approved.  That’s because the cost in the field could never be measured.  Since problems were eliminated, that meant counting something that no longer existed.  And since the problems they quashed were a small subset of the whole body of issues encountered, it was an uphill battle to convince management of the value of their work.

I have no other word for another cultural problem – it’s filthy.  There exist caste systems in technical organizations (no, not all technical organizations) where those who test are considered truly less important than those who develop.  Traditionally they may have had lower salaries, had fewer and lower technical skills, etc., so the work they did was likewise considered ancillary to that done by developers.  This is perhaps the aspect of corporate life in technical organizations that is most violently at odds with test driven development and Agile process itself.  Testing is inferior work?  PLEASE!!  May that attitude die a dishonorable death.

Almost all the literature about TDD and Agile is tacitly aimed at new(ish) products and new(ish) teams.  It’s rare to find someone who understands the problem facing legacy software in this approach.  A refreshing exception is Michael Feathers’ Working Effectively with Legacy Code.  He doesn’t work on the code I work on but he gets it regarding the problems of old and new.


There are a number of quips used by opponents to resist applying new and rigorous testing to old code.  I have found I needed counter-quips to combat the dismissive oversimplification of the business case.  One quip I’ve heard too much is “Well, you can’t boil the ocean” – which implies that identical rigor needs to be applied to multiple millions of lines of code or else it’s not worth starting.  My counter-quip is “Yeah, but a bay or inlet boils nicely and you know the very bays of which I speak.”

It’s true.  There is no mystery in any system where the problems lie.  And there truly is no mystery about the value of increased testing rigor in those inlets and “bays”.  So all that keeps it from happening is corporate bad habits and errant calculations about business expenses.

And certainly anything I develop personally will include testing commensurate with the complexity of the technology.  That’s a promise.

Bridge to hybrid

Bridges. They’re how we travel over expanses of water or deep declines in terrain. Now most bridges support 2-way traffic. And the cost of travel is paid in tolls, and the funds collected generally go to the destination state or principality.

In IT there are data bridges. They are usually constructed by service providers on the destination side of the expanse and as such, are usually interested in things flowing one way, into the promised land from the old, unwanted repository and its archaic processing. So you can guess the relative support given to the old repositories and associated application solutions (it ranges from compromised to abysmal). This of course is the account given by those in sales and technical support. It is rarely the view of the organization that is trying out new technology. So that is the kind of bridge we’re talking about.

Since the advent of the Cloud when data and services have been increasingly offloaded (or is it uploaded) to service providers, a new term has appeared in market-speak: hybrid. It describes the state of an organization having data and services in both their on-premises infrastructure and in the cloud. I don’t have data to back it, but I would posit almost 100% of Cloud-using organizations are hybrid. And so will they remain. It is the savvy Cloud provider that settles for a piece of the processing pie and lets customers make their own cost-based decisions.

So with so many customers remaining in hybrid mode, the Cloud has introduced the need for a new class of data bridge, with different requirements than the vendor-supplied one-way transfer mechanisms of yore. I count this a good thing, because it allows those getting their feet wet to go at their own speed, and it surrenders to data processing realities – that the Cloud will never house all processing and data. Here are some goodies that could result from the new bridges:

  • An end to target-side bias. Competitive pressure will mount for suppliers to provide first class replication, back-up and application-level data movement services to support the ongoing requirements of consumers.
  • An attack on latency. This goes for both sides of the divide. Data quality degrades with age in most applications and bandwidth can be inadequate to improve the timing of the arrival of time-sensitive updates. Smarter/better compression and more innovative techniques are required.
  • Data model melding and transformation. Strong to weak typing, relational to NoSQL, structured to unstructured – there are mapping and transformation problems to be solved between what have been derogatorily called “legacy” stores and newer repositories, and back. This process has frequently been relegated to one-offs that solve the problem of particular disparate data sets. I would like usable (operative word alert) standards, or even de facto standards, to arise for the most common transformations.
  • New/better trigger and event capabilities. Communicating new data states and transitions and processing phenomena across the divide gives customers choosing hybrid solutions (which means all of them as I have said) a new and valuable advantage.

It might be accurate to classify full-fledged hybrid data bridges as middleware; I don’t really care how they are categorized. I do know that they would go far to address the ongoing needs of organizations without forcing vendor-driven data or processing movement. Let them decide for themselves!!

But I do think bridges like this would inspire some confidence that providers are listening.