A Programmer's Dream

3 C's of Software Requirements

I have a tendency to want everyone around me to be able to write effective requirements documentation. I don’t necessarily know why I’m like this, but I use those skills constantly in an effort to more or less train people in how they think, write and talk.

Learning Ember in VS Code -- Routing: Optional Dynamic Segments

Or more accurately, making me cry about routing. 

I was working on something that I often use as a test development project when trying new tools and design paradigms.  Basically, it's a story publishing system similar in nature to FictionPress.com or FanFiction.net.  I like it because it's a fairly complex scenario, but still within the realm of (what I see as) an understandable domain. 

I like this as a problem space because it has complex structures.  At its heart is a STORY, which has attributes (such as title, author, category) and 1 to N CHAPTERS.  Each chapter in turn has its own attributes (chapter title, content, etc.) and 0 to N COMMENTS. 

This problem domain gives me nested data elements three levels deep, multiple avenues for data filtering and congestion, as well as users at various levels who would impact all of the data elements. 

So, I started working on it in Ember, and was doing fairly well until I hit the point where I was building the route needed to view a specific story, a chapter at a time.

What I wanted to do here was to use dynamic segments to control which story and chapter were displayed to the end user. Specifically, the route would be "/r/:story_id/:chapter_id".  Now there are a few things to remember here.  

The first is that while story_id is the actual identifier for the story to be read, chapter_id is not the chapter identifier; rather, it is the ordinal position of that chapter within the overarching story.  So, a route of /r/123/3 would be the third chapter of story id 123. 

The second is that chapter_id needed to be optional. If I used the route of "/r/123" then I should have automatically been taken to the first chapter of story 123. Likewise any value outside of a number should also take you to the first chapter, while a numeric value larger than the highest ordinal chapter number should take you to the last chapter.   
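Those clamping rules can be sketched as a small helper. This is my own illustration of the behavior described above, not code from the actual application; the function name and shape are made up:

```javascript
// Hypothetical sketch of the back-end clamping rules: anything that isn't a
// positive number resolves to chapter 1, and anything past the end resolves
// to the last chapter.
function resolveChapterOrdinal(requested, chapterCount) {
  const n = parseInt(requested, 10);
  if (isNaN(n) || n < 1) {
    return 1;                          // non-numeric or too small -> first chapter
  }
  return Math.min(n, chapterCount);    // too large -> last chapter
}
```

So `/r/123/3` resolves to chapter 3, while `/r/123/abc` and `/r/123/9999` resolve to the first and last chapters respectively.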

But most of that is a back end issue. It's the optional dynamic segment of :chapter_id that we need to focus on. After all, optional dynamic segments are a concept that just doesn't exist in Ember. 

Worse, every example I found didn't allow for multiple levels of dynamic segments where the first is required and the second is optional. 

So, I kicked at it, and created routes and configured them in a dozen different ways until I figured out how I could accomplish this. 

What I did was build two routes.   

The first I named "story" and the second I named "story-chapter."   Then, instead of nesting the second inside the first (which is how most of the examples performed something similar to what I was attempting), I entered two lines into the Router.map function found in router.js. 
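The two lines looked something like this (a sketch of the declarations described above; in a real app they live inside app/router.js, so the minimal Router stub here only exists to make the snippet standalone):

```javascript
// Stub standing in for Ember's Router so the two route declarations can be
// shown on their own; only the two this.route(...) lines are the point.
const routes = [];
const Router = {
  map(fn) {
    fn.call({ route: (name, opts) => routes.push({ name, ...opts }) });
  },
};

Router.map(function () {
  this.route('story', { path: '/r/:story_id' });
  this.route('story-chapter', { path: '/r/:story_id/:chapter_id' });
});
```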

This configuration means that I have two routes for the "/r" root route, one which accepts the story_id dynamic segment, and a second which accepts both the story_id and chapter_id dynamic segments. 

Then in the story.js file, I used this to navigate from the story route to the story-chapter route. 
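A sketch of that redirect follows. The real file would extend Ember's Route (import Route from '@ember/routing/route'); the stub base class below only exists so the logic can run standalone, and the hook names match Ember's route lifecycle:

```javascript
// Stub standing in for Ember's Route base class (real code imports it from
// '@ember/routing/route'); paramsFor and transitionTo are faked so the
// redirect logic below is runnable on its own.
class Route {
  constructor(params) { this._params = params; }
  paramsFor(routeName) { return this._params; }
  transitionTo(routeName, ...models) { return { routeName, models }; }
}

class StoryRoute extends Route {
  // No :chapter_id segment exists on this route, so always redirect to
  // chapter 1 of the requested story.
  beforeModel() {
    const { story_id } = this.paramsFor('story');
    return this.transitionTo('story-chapter', story_id, 1);
  }
}
```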

Basically, this is the beforeModel function, which runs before any data access on the request.  Since we know that a request without a chapter_id dynamic segment is supposed to go to the first chapter, this hook forces you to that place. 

Of course, in story-chapter.js, we can access the various dynamic segments in this fashion: 
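Here is a sketch of that access (the data loading itself is omitted; this just shows that each dynamic segment arrives as a named property on the model hook's params argument):

```javascript
// Sketch of the model hook in story-chapter.js: the :story_id and
// :chapter_id dynamic segments come in on the params object by name.
class StoryChapterRoute {
  model(params) {
    return {
      storyId: params.story_id,
      chapterOrdinal: parseInt(params.chapter_id, 10),
    };
  }
}
```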

Now, I just need to get the comments aspect of things working the way I expect. 

Lambda Expressions in VB.Net, an Overview

I have to say, lambda expressions are awesome.  

To be explicit, a lambda expression is an anonymous method that is used to create delegates or expression tree types. They can be passed as arguments or returned as the result of a function.  Ideally, they should be short, highly explicit actions.  The convenience of lambdas over traditional delegates is that they can be used without giving the method an explicit name, with the code immediately inline where it is to be used.

I first stumbled across lambda expressions in relation to LINQ-to-SQL.  I wanted to perform some logic on my returned dataset, and all the documentation indicated that lambda expressions were the only way to accomplish what I was after.  

The bad part is that at the time, while VB.Net supported lambda expressions, there was very little documentation about them. To the point that I just failed to grok them and ultimately used a user function in SQL Server to do the work I needed, and went on. What was probably worse is that Microsoft didn't really show examples of VB.Net lambda expressions for the longest time.  Sure, there was the explicit page for VB.Net lambda expressions, but all the sample code, and the details on how to handle the more arcane (and complicated) use cases, were still in C#.  Luckily, that's starting to change, but I still felt the need to build a more basic primer on them and their use in VB.Net.

First is the base declaration syntax.  As with standard methods, lambda expressions which return a result use the FUNCTION declaration, while ones which do not use the SUB declaration.

Now, in C# you would get a lambda expression built like this:
n=> n.IsReleased

In that, n is the input parameter, and the function returns the IsReleased property of the input. The same logic in VB.Net would be written as:
Function(movie) movie.IsReleased

As is typically the case, the VB.Net code is more verbose, but that verbosity also provides a level of clarity which the C# expression lacks. Despite the difference in verbosity, the syntax is strikingly similar.  In the C# code, you start with the input parameter, which goes into the logic statement.  In the VB.Net code, you start with the Function keyword, and its parameter goes into the logic statement.  The difference is in the declaration of the actual method: in C# you use the lambda operator (=>), whereas in VB.Net you use the FUNCTION keyword. 

Another thing to consider is that lambda expressions are just shorthand for anonymous methods.  Anonymous methods are methods without a name; instead, they're assigned to a variable of a delegate type.  In VB.Net an anonymous method would be written like:
Dim IsMovieReleased = Function(m As Movie) As Boolean
                          Return m.IsReleased
                      End Function

Now, when the logic extends to more than one line, we need to use the Function/End Function syntax. For example: 
Dim isMovieReleased = Function(movie As Movie) As Boolean
                          Console.WriteLine(String.Format("Checked Released State for {0}", movie.Title))
                          Return movie.IsReleased
                      End Function

And since the lambda expression is an anonymous function, it can be assigned to a delegate as well.
Dim isMovReleased As Func(Of Movie, Boolean) = Function(m) m.IsReleased 
Dim movieObject As New Movie With {.IsReleased = True, .Title="Star Trek"}
Dim isReleased As Boolean = isMovReleased(movieObject)

Microsoft has produced a number of predefined generic delegates for our use. The FUNC delegate is for methods which return a value, while the ACTION delegate is for methods which don't. Each of them has a number of overloads with different numbers of parameters for a wide range of use cases.

Now, the ACTION delegate is for assigning anonymous methods which do not return a value. 
Dim logMovie As Action(Of Movie) = Sub(m) Console.WriteLine(m.Title) 
Dim movieObject As New Movie With {.IsReleased = True, .Title = "Star Trek"}
logMovie(movieObject)

In VB.Net the primary difference in implementation for these two options would be the fact that one uses Sub/End Sub while the other uses Function/End Function. 

Now, as I stated earlier, I first stumbled across lambda expressions in relation to LINQ-to-SQL queries.  What happens is that the WHERE extension method for IEnumerable(Of T) accepts the Func(Of T, Boolean) delegate.  This means that a method can be applied to the LINQ queries run against the data source.  Using the movie concept, and pulling the released movies, you would generate a LINQ-to-SQL statement like this:
Dim movieList As List(Of Movie) = dc.Movies.Where(Function(mv) mv.IsReleased).ToList()

The final use of lambda expressions, and one that I don't typically use, is expression trees.  Expression trees are representations of code in a tree-like data structure where each node of the tree is a single atomic expression. Ideally an expression tree is immutable once defined; modifying any specific node means the creation of a new expression tree to which the modifications are applied, while leaving the existing tree(s) available to the system.

Being able to compile and run the code represented by an expression tree means that the system can enable dynamic modification of executable code, LINQ executions against a database, as well as create dynamic queries.  

Now, imagine this lambda expression:
Dim deleg As Func(Of Integer, Boolean)
deleg = Function(f) f > 0 AndAlso f <= 15

Its purpose is to determine if the passed-in parameter is between 1 and 15.  But it can be built as an expression tree in this manner:
Dim expr As System.Linq.Expressions.Expression(Of Func(Of Integer, Boolean))
expr = Function(f) f > 0 AndAlso f <= 15
Dim deleg As Func(Of Integer, Boolean)
deleg = expr.Compile()

Lambda expressions are one of the better additions to VB.Net over the years, and they can make code significantly clearer and tighter.  That makes the code easier to maintain as well as more readable, at least once the initial hurdle of understanding the declaration and usage syntax is grokked.  And frankly, anything that brings clarity, conciseness, maintainability and readability to code is a good thing in my book. 

SQL Server Index Management

SQL Server is a complex thing, and when Always On Availability Groups are involved, that complexity increases (IMO) exponentially.   Of particular note, indexes become harder to work with. Well, not harder, but there's a greater overhead for their creation, and subsequently their rebuilding.  So you should try to reorganize them as often as feasible. 

Now, I have a client, and this client has a large database; I believe it was sitting at just shy of 500GB the other day. That in and of itself is not an issue. The issue is that this database churns.  This client has a couple of hundred branches, each of which has an instance of this database, which is synchronized back up to the home office. Now, I neither designed nor built this application, but I am helping to monitor the health of the SQL Server. And that churn, that constant movement of data up to the main office from the 100 or so branches that had actually been added to the system so far, did horrible, horrible things to the indexes.  

A reorganization that knocks an index down to 21% fragmentation can be back up to 97% in just a few days. 

What was worse was that a weekly index reorg command was consistently failing.  Likewise, moving it to a nightly event still left it failing. Or sometimes it wouldn't fail; it would just run all week until the weekly server reboot force-killed it.   Which is a terrible thing to happen to an index operation. 

So, I went and found Ola Hallengren's scripts (https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html) and placed them on the server.  And the behavior still occurred.  I cranked it up so that nothing would be done unless the fragmentation was 95% or greater, and it was still failing. 

And as this was going on, the system was becoming more and more unusable.  Queries were taking minutes, not seconds. 

Now, there were very few things I could impact in this situation. I could not change the database structure. I could not change the replication/synchronization methods. I could not change how the applications accessed data.  My hands were well and truly tied.  The only thing I could impact is the index maintenance task itself.  My solution was to add intelligence into the index maintenance.  

First, I built a query which, every couple of hours, went out and gathered the physical statistics of the indexes and stored them in a table. Next, I made a stored procedure which would add some intelligence into how it determined which indexes needed to be reorganized. The logic followed these paths: 

  1. Time Limit for Runs. If the procedure has been running for more than 5 hours, end with success.  This was done because we decided that fixing five indexes tonight and a different five the next night is better than never completing. 
  2. Ranking of Indexes.  A big part of the logic is building the list of indexes, and then ranking them.  The system considers all of the indexes, ranked on the following factors:
    1. Avg. Fragmentation.  The value had to be greater than 0, and each index has an expected "start point" for consideration as an index to be reorged.  The first requirement is to be higher than that start point.  Then a higher rank is given to those with a higher difference between the start point and the average fragmentation.   For example, if Index 1 has a start point of 40 and an average fragmentation of 60, and Index 2 has a start point of 80 and an average fragmentation of 85, then Index 1 gets the greater rank, since its difference is 20 versus Index 2's difference of 5. 
    2. Priority.  Some indexes are more important than others.  Basically, for a table that is used often in joins and look-ups, we just care more about its indexes than those of a history table that is only queried every second day. 
    3. Pages.  Indexes with more than 1,000 pages are given a higher rank, as we're less concerned with fragmentation when an index is spread over a small number of pages. 
    4. Age.  If an index hasn't been reorganized in 30 days, then we increase its rank exponentially for every day past 30.
  3. Running.  The system takes the ranked list, and walks it, issuing a reorg command for an explicit single index.
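The ranking above can be sketched like this. The real implementation is a T-SQL stored procedure; the field names, weights, and thresholds here are illustrative only, not the actual values used:

```javascript
// Illustrative sketch of the index-ranking logic (the real version is a
// T-SQL stored procedure; names and weights here are made up).
function rankIndexes(indexStats, today) {
  return indexStats
    // Only consider indexes fragmented past their individual start point
    .filter((ix) => ix.avgFragmentation > ix.startPoint)
    .map((ix) => {
      let rank = ix.avgFragmentation - ix.startPoint; // distance past the start point
      rank += ix.priority;                            // important tables rank higher
      if (ix.pages > 1000) rank += 10;                // small indexes matter less
      const age = today - ix.lastReorgDay;            // days since last reorg
      if (age > 30) rank += Math.pow(2, age - 30);    // age ramps up exponentially
      return { name: ix.name, rank };
    })
    .sort((a, b) => b.rank - a.rank);                 // highest rank reorged first
}
```

Running the worked example from the list above (start point 40 / fragmentation 60 versus start point 80 / fragmentation 85, all else equal) ranks the first index higher.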

And in general, this worked beautifully.  Over the course of two months, I was able to slowly walk the start point down from 80% to 30% for most indexes, while a few priority indexes have it set at 15%. 

Then a few days ago, I got a message from the client asking about a specific index that had apparently been escaping the notice of the maintenance tasks.  So I started looking, and the first thing I discovered was that the historical logs recorded 0% as the average fragmentation. The database was clearly showing 97.7% fragmentation, so I knew that there had to be something wrong with my script. 

So, I went and got the Object_ID for the table and the Index_ID for the index, and ran the physical statistics function (sys.dm_db_index_physical_stats) for the index in question. 

Which returned me 3 rows. 

I can admit I was not expecting that.  I had not really looked at the documentation for the function, and every index I had checked manually while building the query which performs the checks had only returned one row.   Cognitive bias strikes again: I had seen a behavior in my result sets, and had assumed that it was the only possible result set. 

So, off to Microsoft Docs I went (https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-physical-stats-transact-sql?view=sql-server-2017).

Anyways, with the three distinct rows, the relevant differential was the alloc_unit_type_desc column.   According to MS Docs, that particular column can have 3 possible values: IN_ROW_DATA, LOB_DATA, and ROW_OVERFLOW_DATA.

The IN_ROW_DATA is the one that all the tables have. And it's the 'important' one for what I was doing.  The other two dealt with large objects, and data from columns that have been pushed off-row (https://blogs.msdn.microsoft.com/sqlserverstorageengine/2006/12/13/more-undocumented-fun-dbcc-ind-dbcc-page-and-off-row-columns/). 

Now, how does this tie into the 0% average fragmentation that I was logging? Well, it's simple. I was executing the command to get the statistics, and logging those values atomically and explicitly.  In other words, I logged each index in turn. 

Now, when looking at the query results from the physical stats function, I was storing the relevant values into single variables prior to inserting them into my log.  Which means that I was keeping only the last row returned.  Which was the LOB_DATA row, and thus had a 0% average fragmentation, while the first row (IN_ROW_DATA) had the 97.7%.

A quick WHERE clause solved the issue, and my index maintenance once again started humming along nicely.
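The shape of the corrected statistics query was roughly this. This is a sketch rather than the actual script (the variable names are mine, and the real script's surrounding logic differs); the point is the filter on alloc_unit_type_desc:

```sql
-- Sketch: pull fragmentation for one index, keeping only the row that matters.
-- @ObjectId and @IndexId are placeholders for the values looked up earlier.
SELECT index_id, alloc_unit_type_desc, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), @ObjectId, @IndexId, NULL, 'DETAILED')
WHERE alloc_unit_type_desc = 'IN_ROW_DATA';  -- drop the LOB_DATA / ROW_OVERFLOW_DATA rows
```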