Splitting Date Ranges
The power and beauty of F# pattern matching are often underestimated. To shine more light on them I decided to share code fragments which exercise pattern matching to deal with date ranges. This will be a simple divide and conquer implementation for a workaround over poorly designed API.
Background
Lately, I've faced following issue at my work:
There was a third-party API over a documents database. This API exposed a standard endpoint to search for documents which met specified criteria. The problem was that at the database level, there was a global max limit of documents that could be returned, and the API didn't offer any way of paging results.
So whenever you'd look for documents, and there were more documents meeting the search criteria than the global limit, you were basically screwed.
To demonstrate the situation, let's consider following F# code:
Note: code presented in this post has no dependencies, so you should be able copy & run in FSI (F# Interactive) 4.1 no problem.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: |
|
This is a fake in-memory model of the documents database. Each document has its id, type, content, and date of creation. In reality, the documents have a lot of other properties. We populate the array with 100K sample documents, each of which is created within an hour timespan.
We can now simulate the behavior of the mentioned search API with following:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: |
|
The relevant part of the snippet is the globalMaxLimit
value (1000), and search
result.
Note that no matter what filters you apply to the search, results get trimmed whenever they exceed the limit.
Let's see what happens if we try to find all DocTypeA
documents:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: |
|
We hit the max limit, and again: there's no way to page the results.
Leaving the API design without comments, let's see how we could workaround such limitation in F#.
Solution
As you may have noticed, each document has its own created
timestamp.
Search filters allow to specify a date range when a document was created.
This means that we could take the original search definition filters and split a date range into two sub-ranges whenever max limit is exceeded. Results for those two sub-ranges could be merged afterwards.
This idea is just another application of a known Divide and conquer algorithm.
Let's start with a type definition for a date range:
1: 2: 3: 4: 5: |
|
The type is pretty self-explanatory. What's also worth mentioning is that for the ranges we'll assume that:
- date after is always exclusive (after but not equal),
- date before is always inclusive (before or equal) - hence the
OrEqual
suffix in case name.
Note: One may wonder why DateTimeOffset is used instead of DateTime. MSDN suggests that in most scenarios you should default to the first one, and that's what I usually tend to do as this structure is more explicit.
Using DateRange
we can declare split
function:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: |
|
The split
function takes a minimum date as first argument.
This helps restricting the date range, and is usually an easy to determine value.
NoLimitSearchError
is a type that declares possible failures:
MinDateAfterNow
- happens whenever the minimum date is in the futureMinTimeSpanExceeded
- happens if by any chance there are more documents than global max limit created within minimum time span (one second in our case). The minium time span says that we cannot split any further a date range.
midBetween
function computes a date between two other dates.
Inside split
function, dateA
and dateB
stand for dates for which we'll compute a date in between.
Nested dates
function takes a mid date and splits given range into two sub-ranges.
Note: We're using Result type for denoting that function may return error. If you want to learn more about the approach, have a look into ROP introduction. I added the definition of this type, so that tooltip engine can infer types correctly (Result type was introduced in F# 4.1)
I'd like to pay your attention for a second to the beauty of pattern matching in above snippet. This was de-facto the reason why I decided to share this blog post in first place.
Note, how thanks to the DateRange
DU we're easily able to specify relevant split
logic.
For me personally, this is a big F# win.
Wrapping it up
Let's see how we can use the split function to define a workaround for the inital issue:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: |
|
The fromFilters
and apply
functions serve as simple helpers to go from ThirdPartyApi.Filters
to DateRange
type and back. Once again, pattern matching proves helpful in both cases.
Most interesting part is the searchNoLimit
function:
- nested
search
function takes aDateRange
and invokes the original api, - if results are within the global limit we're good,
-
otherwise split the
DateRange
and:- invoke
search
recursively in parallel for both sub-ranges,firstRange
andsecondRange
, - append results of sub-ranges into single array.
- invoke
Let's see if we can find now all DocTypeA
documents:
1: 2: 3: 4: 5: 6: 7: 8: 9: |
|
As expected, we get back now 1/3 of all documents from the database.
Be aware
The searchNoLimit
function will invoke recursively until it's able to find all relevant documents.
This means that it might not be always appropriate to invoke - e.g. when your search filters are too liberal, you might attempt to download all documents from database.
Because of that, please be aware that this workaround might not work for all cases. Next step could be to take the function and extend it to actually page the results.
Recap
In this entry we saw how we can workaround a third-part API limitation by writing a divide and conquer algorithm in F#. We also saw how pattern matching help us declare relevant logic in a nice and concise way.
Till next time, Merry Christmas!
| DocTypeA
| DocTypeB
| DocTypeC
Full name: splittingdateranges.ThirdPartyApi.DocType
{DocId: int;
Type: DocType;
Created: DateTimeOffset;
Content: string;}
Full name: splittingdateranges.ThirdPartyApi.Document
val int : value:'T -> int (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.int
--------------------
type int = int32
Full name: Microsoft.FSharp.Core.int
--------------------
type int<'Measure> = int
Full name: Microsoft.FSharp.Core.int<_>
Document.Type: DocType
--------------------
type Type =
inherit MemberInfo
member Assembly : Assembly
member AssemblyQualifiedName : string
member Attributes : TypeAttributes
member BaseType : Type
member ContainsGenericParameters : bool
member DeclaringMethod : MethodBase
member DeclaringType : Type
member Equals : o:obj -> bool + 1 overload
member FindInterfaces : filter:TypeFilter * filterCriteria:obj -> Type[]
member FindMembers : memberType:MemberTypes * bindingAttr:BindingFlags * filter:MemberFilter * filterCriteria:obj -> MemberInfo[]
...
Full name: System.Type
type DateTimeOffset =
struct
new : dateTime:DateTime -> DateTimeOffset + 5 overloads
member Add : timeSpan:TimeSpan -> DateTimeOffset
member AddDays : days:float -> DateTimeOffset
member AddHours : hours:float -> DateTimeOffset
member AddMilliseconds : milliseconds:float -> DateTimeOffset
member AddMinutes : minutes:float -> DateTimeOffset
member AddMonths : months:int -> DateTimeOffset
member AddSeconds : seconds:float -> DateTimeOffset
member AddTicks : ticks:int64 -> DateTimeOffset
member AddYears : years:int -> DateTimeOffset
...
end
Full name: System.DateTimeOffset
--------------------
DateTimeOffset()
DateTimeOffset(dateTime: DateTime) : unit
DateTimeOffset(ticks: int64, offset: TimeSpan) : unit
DateTimeOffset(dateTime: DateTime, offset: TimeSpan) : unit
DateTimeOffset(year: int, month: int, day: int, hour: int, minute: int, second: int, offset: TimeSpan) : unit
DateTimeOffset(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int, offset: TimeSpan) : unit
DateTimeOffset(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int, calendar: Globalization.Calendar, offset: TimeSpan) : unit
val string : value:'T -> string
Full name: Microsoft.FSharp.Core.Operators.string
--------------------
type string = String
Full name: Microsoft.FSharp.Core.string
Full name: splittingdateranges.ThirdPartyApi.baseDate
type TimeSpan =
struct
new : ticks:int64 -> TimeSpan + 3 overloads
member Add : ts:TimeSpan -> TimeSpan
member CompareTo : value:obj -> int + 1 overload
member Days : int
member Divide : divisor:float -> TimeSpan + 1 overload
member Duration : unit -> TimeSpan
member Equals : value:obj -> bool + 1 overload
member GetHashCode : unit -> int
member Hours : int
member Milliseconds : int
...
end
Full name: System.TimeSpan
--------------------
TimeSpan()
TimeSpan(ticks: int64) : unit
TimeSpan(hours: int, minutes: int, seconds: int) : unit
TimeSpan(days: int, hours: int, minutes: int, seconds: int) : unit
TimeSpan(days: int, hours: int, minutes: int, seconds: int, milliseconds: int) : unit
Full name: splittingdateranges.ThirdPartyApi.documents
member Clone : unit -> obj
member CopyTo : array:Array * index:int -> unit + 1 overload
member GetEnumerator : unit -> IEnumerator
member GetLength : dimension:int -> int
member GetLongLength : dimension:int -> int64
member GetLowerBound : dimension:int -> int
member GetUpperBound : dimension:int -> int
member GetValue : index:int64 -> obj + 7 overloads
member Initialize : unit -> unit
member IsFixedSize : bool
...
Full name: System.Array
Full name: Microsoft.FSharp.Collections.Array.init
Full name: Microsoft.FSharp.Core.Operators.failwith
inherit MemberInfo
member Assembly : Assembly
member AssemblyQualifiedName : string
member Attributes : TypeAttributes
member BaseType : Type
member ContainsGenericParameters : bool
member DeclaringMethod : MethodBase
member DeclaringType : Type
member Equals : o:obj -> bool + 1 overload
member FindInterfaces : filter:TypeFilter * filterCriteria:obj -> Type[]
member FindMembers : memberType:MemberTypes * bindingAttr:BindingFlags * filter:MemberFilter * filterCriteria:obj -> MemberInfo[]
...
Full name: System.Type
val float : value:'T -> float (requires member op_Explicit)
Full name: Microsoft.FSharp.Core.Operators.float
--------------------
type float = Double
Full name: Microsoft.FSharp.Core.float
--------------------
type float<'Measure> = float
Full name: Microsoft.FSharp.Core.float<_>
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf
{Type: DocType option;
CreatedBeforeOrEqual: DateTimeOffset option;
CreatedAfter: DateTimeOffset option;}
Full name: splittingdateranges.ThirdPartyApi.Filters
Filters.Type: DocType option
--------------------
type Type =
inherit MemberInfo
member Assembly : Assembly
member AssemblyQualifiedName : string
member Attributes : TypeAttributes
member BaseType : Type
member ContainsGenericParameters : bool
member DeclaringMethod : MethodBase
member DeclaringType : Type
member Equals : o:obj -> bool + 1 overload
member FindInterfaces : filter:TypeFilter * filterCriteria:obj -> Type[]
member FindMembers : memberType:MemberTypes * bindingAttr:BindingFlags * filter:MemberFilter * filterCriteria:obj -> MemberInfo[]
...
Full name: System.Type
Full name: Microsoft.FSharp.Core.option<_>
Full name: splittingdateranges.ThirdPartyApi.globalMaxLimit
Full name: splittingdateranges.ThirdPartyApi.search
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.async
Full name: Microsoft.FSharp.Collections.Array.filter
Full name: Microsoft.FSharp.Collections.Array.take
Full name: splittingdateranges.onlyDocA
from splittingdateranges
Full name: splittingdateranges.ThirdPartyApi.search
type Async
static member AsBeginEnd : computation:('Arg -> Async<'T>) -> ('Arg * AsyncCallback * obj -> IAsyncResult) * (IAsyncResult -> 'T) * (IAsyncResult -> unit)
static member AwaitEvent : event:IEvent<'Del,'T> * ?cancelAction:(unit -> unit) -> Async<'T> (requires delegate and 'Del :> Delegate)
static member AwaitIAsyncResult : iar:IAsyncResult * ?millisecondsTimeout:int -> Async<bool>
static member AwaitTask : task:Task -> Async<unit>
static member AwaitTask : task:Task<'T> -> Async<'T>
static member AwaitWaitHandle : waitHandle:WaitHandle * ?millisecondsTimeout:int -> Async<bool>
static member CancelDefaultToken : unit -> unit
static member Catch : computation:Async<'T> -> Async<Choice<'T,exn>>
static member Choice : computations:seq<Async<'T option>> -> Async<'T option>
static member FromBeginEnd : beginAction:(AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg:'Arg1 * beginAction:('Arg1 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg1:'Arg1 * arg2:'Arg2 * beginAction:('Arg1 * 'Arg2 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg1:'Arg1 * arg2:'Arg2 * arg3:'Arg3 * beginAction:('Arg1 * 'Arg2 * 'Arg3 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromContinuations : callback:(('T -> unit) * (exn -> unit) * (OperationCanceledException -> unit) -> unit) -> Async<'T>
static member Ignore : computation:Async<'T> -> Async<unit>
static member OnCancel : interruption:(unit -> unit) -> Async<IDisposable>
static member Parallel : computations:seq<Async<'T>> -> Async<'T []>
static member RunSynchronously : computation:Async<'T> * ?timeout:int * ?cancellationToken:CancellationToken -> 'T
static member Sleep : millisecondsDueTime:int -> Async<unit>
static member Start : computation:Async<unit> * ?cancellationToken:CancellationToken -> unit
static member StartAsTask : computation:Async<'T> * ?taskCreationOptions:TaskCreationOptions * ?cancellationToken:CancellationToken -> Task<'T>
static member StartChild : computation:Async<'T> * ?millisecondsTimeout:int -> Async<Async<'T>>
static member StartChildAsTask : computation:Async<'T> * ?taskCreationOptions:TaskCreationOptions -> Async<Task<'T>>
static member StartImmediate : computation:Async<unit> * ?cancellationToken:CancellationToken -> unit
static member StartWithContinuations : computation:Async<'T> * continuation:('T -> unit) * exceptionContinuation:(exn -> unit) * cancellationContinuation:(OperationCanceledException -> unit) * ?cancellationToken:CancellationToken -> unit
static member SwitchToContext : syncContext:SynchronizationContext -> Async<unit>
static member SwitchToNewThread : unit -> Async<unit>
static member SwitchToThreadPool : unit -> Async<unit>
static member TryCancelled : computation:Async<'T> * compensation:(OperationCanceledException -> unit) -> Async<'T>
static member CancellationToken : Async<CancellationToken>
static member DefaultCancellationToken : CancellationToken
Full name: Microsoft.FSharp.Control.Async
--------------------
type Async<'T>
Full name: Microsoft.FSharp.Control.Async<_>
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
| Unbounded
| BeforeOrEqual of DateTimeOffset
| After of DateTimeOffset
| Between of DateTimeOffset * DateTimeOffset
Full name: splittingdateranges.DateRange
module Result
from Microsoft.FSharp.Core
--------------------
type Result<'a,'b> =
| Ok of 'a
| Error of 'b
Full name: splittingdateranges.Result<_,_>
| MinDateAfterNow of minDate: DateTimeOffset * now: DateTimeOffset
| MinTimeSpanExceeded of TimeSpan
Full name: splittingdateranges.NoLimitSearchError
Full name: splittingdateranges.minTimeSpan
Full name: splittingdateranges.midBetween
Full name: splittingdateranges.getNowDate
Full name: splittingdateranges.split
Full name: Microsoft.FSharp.Core.Result.map
Full name: splittingdateranges.fromFilters
Full name: splittingdateranges.apply
Full name: splittingdateranges.searchNoLimit
Full name: Microsoft.FSharp.Collections.Array.append
Full name: splittingdateranges.minDate