Find Joined Line-Segments

The following problem appeared recently on comp.lang.functional. It is clearly related to the "cosmic ray trace" problem, discussed here, but a single, comprehensive solution to both problems has thus far eluded me.

In what follows, I describe two K solutions in the order in which I obtained them.

The first solution closely tracks the geometry of the problem, assembling the result iteratively (although without the use of explicit loops or recursion). Several simplifications are described, terminating in the final 9 line solution here.

The second solution is completely non-iterative: points are generated, then assembled in a single pass using pure array operations. That solution is found here. It is likely that further reflection will yield other solutions, even simpler and more direct.

The Problem

Given a 2D array of characters: (stored as a list of list of characters)

sadlkjsadsfljkasf
asrdlfhsad---hhfa
assjh---dash|fsas
saddl|kjsafj|hash
ask---dhhfsk|fhsf
sdafhshfsdkjfhshf

find any joined ASCII line segments: (for the above case)

.................
..........---....
.....---....|....
.....|......|....
...---......|....
.................

and return a list of coordinates making up the joined segments: (assume
unjoined segments are always far enough apart to prevent ambiguity)

==> (((12 . 4) (12 . 1) (10 . 1)) ((3 . 4) (5 . 4) (5 . 2) (7 . 2)))

An Iterative K Solution

The character matrix:

	x:("sadlkjsadsfljkasf"
	   "asrdlfhsad---hhfa"
	   "assjh---dash|fsas"
	   "saddl|kjsafj|hash"
	   "ask---dhhfsk|fhsf"
	   "sdafhshfsdkjfhshf")

A function to find line-segments:

	segments:{:[#b:&(~=)':x=" ",y;z{+(x;y)}'@[-1 2#b;_n;0 -1+];()]}

employing the venerable "not equal prior" idiom. Note the use of amend (@) to decrement the terminal index. The inline lambda {+(x;y)}' prepends the row- or column-index to the result.

To find horizontal segments:

	  :h:,/segment["-"]'[x;!#x]
	((1 10
	  1 12)
	 (2 5
	  2 7)
	 (4 3
	  4 5))

and vertical segments:

	  :v:|:'',/segment["|"]'[+x;!#*x]
	((3 5
	  3 5)
	 (2 12
	  4 12))

We represent a segment as a pair (from-rc;to-rc), where from-rc is the (row;col) where the segment begins, and to-rc is the (row;col) where the segment ends. So, for example, we find a horizontal segment beginning at row 1, column 10, and ending at row 1, column 12.

(h;v), the pair consisting of horizontal line-segments h and vertical line-segments v, is our initial state.

Note that vertical segments are computed by first flipping the input matrix x, and then reversing each segment, giving a uniform representation to points in both horizontal and vertical segments; namely, (row-index;column-index). We will henceforth distinguish the two cases by position within the larger structure: if x is a line-segment, then, x is a horizontal segment if it occurs in h (in (h;v)), and x is a vertical segment if it occurs in v (in (h;v)).

We seek the result ,/(h*;v*), where h* is a list of the maximal length lines terminating at the beginning or ending point of a segment in h, and v* is a list of the maximal length lines terminating athe beginning or ending point of a segment in v. A line is a list of connected points.

Specifically, in the example above, ,/(h*;v*) is:

Our method will be to iterate over the state until it converges, then raze the result:

	,/connect/(h;v)

That is, connect is a function of one argument; in this case, the pair (h;v). It is applied by / repeatedly. Following an application, the result is checked against the argument. If they match, the result is returned. Otherwise, connect is applied to the result.

,/ (raze) takes a list of lists and smashes them together into a list. Since connect returns a pair of lists, this is precisely what we want.

So we need to define connect:

	connect:{at/[x;,/,/:\:/(!#:)'x]_dv\:()}

The expression

	,/,/:\:/(!#:)'x

takes a list of lists x and returns the Cartesian product of indices into x:

	  ,/,/:\:/(!#:)'(h;v)
	(0 0
	 0 1
	 1 0
	 1 1
	 2 0
	 2 1)

connect applies the function at over the initial state x = (h;v) and the set of indices as above. From each part of the result (h*;v*) we delete (_dv) all occurrences of (), the empty list.

The function at updates each part of the state x at indices y 0 and y 1 with the result of the attempt to merge the segments at those locations:

	at:{./[x;k;:;merge . x ./:k:0 1,'y]}

Intuitively, if H is the line (a;..;b) and V is the line (c;..;d), then there are six cases to consider. We can depict the four cases where a merge is possible:

	a--
	  |
	  |
	  d

	--b
	|
	|
	d

	c
	|
	|
	--b

	  c
	  |
	  |
	a--

Consider just the first case. H begins at a and V ends at d. The column index of the last point in H = the column index of the first point in V, and the row index of (every point in) H = -1 + the row index of the first point in V. In this case, we can merge H and V, producing a new vertical line (since the result of the merge terminates in a vertical line segment.) We merge H and V by appending d to H, remembering to remove V from the state.

The function merge must handle two further cases: that where H and V do not meet at all, and that where one (or both) of H and V has already been merged, and is therefore ():

	merge:{
	 :[()_in(x;y)           ;(x;y)
	   (*x)~(*y)-1 0        ;(();(|x),,*|y)
	   (*|x)~(*y)-1 0       ;(();x,,*|y)
	   (*x)~(*|y)+1 0       ;(.[y;(-1+#y;0);1+],,*|x;())
	   (*|x)~(*|y)+1 0      ;(.[y;(-1+#y;0);1+],,*x;())
	                         (x;y)]}

The full solution:

	lines:{
	 h:,/segments["-"]'[x;!#x]
	 v:|:'',/segments["|"]'[+x;!#*x]
	 adjust[x]',/connect/(h;v)}

	adjust:{:[(x .*|y)_in"-|";y;@[y;-1+#y;-1 0+]]}

	segments:{:[#b:&(~=)':x=" ",y;z{+(x;y)}'@[-1 2#b;_n;0 -1+];()]}

	connect:{at/[x;,/,/:\:/(!#:)'x]_dv\:()}

	at:{./[x;k;:;merge . x ./:k:0 1,'y]}

	merge:{
	 :[()_in(x;y)           ;(x;y)
	   (*x)~(*y)-1 0        ;(();(|x),,*|y)
	   (*|x)~(*y)-1 0       ;(();x,,*|y)
	   (*x)~(*|y)+1 0       ;(.[y;(-1+#y;0);1+],,*|x;())
	   (*|x)~(*|y)+1 0      ;(.[y;(-1+#y;0);1+],,*x;())
 	                         (x;y)]}

and the K script.

Simplifications

The ASCII representations of horizontal and vertical line-segments differ: where a horizontal segment meets a vertical segment, the common cell is occupied by a point belonging to the horizontal-segment. For example, in

	---
	  |
	  |
	  ---

the top of the vertical segment is located one row below the right of the top horizontal segment, and the bottom of the vertical segment is located one row above the left of the bottom horizontal segment.

Our first solution faithfully represents this state of affairs, since the segment function produces the actual indices of the first and last occurrences of "-" and "|".

By forcing vertical segments to begin one row above the first occurrence of "|", and end one row below the last occurrence of "|":

	--+
	  |
	  |
          +--

it is possible to simplify the merge function:

	lines:{
	 h:,/segments[("-";0 -1)]'[x;!#x]
	 v:|:'',/segments[("|";-1 0)]'[+x;!#*x]
	 adjust[x]',/connect/(h;v)}

	segments:{:[#b:&(~=)':x[0]=" ",y;z{+(x;y)}'@[-1 2#b;_n;x[1]+];()]}

	merge:{
	 :[()_in(x;y)     ;(x;y)
	   (*x)~*y        ;(();(|x),,*|y)
	   (*|x)~*y       ;(();x,,*|y)
	   (*x)~*|y       ;(y,,*|x;())
	   (*|x)~*|y      ;(y,,*x;())
	                   (x;y)]}

The first revised solution can be found here.

Finally, the case-structure of merge can be eliminated by first performing all matches against begin- and end-points of both lines, and then using that result as an index into the list of the five possible transformations:

	merge:{attach[x;y;match[x;y]]}

	match:{5&(,/~\:/:/{(*x;*|x)}'(x;y))?1}

	attach:{((();(|x),,*|y);(();x,,*|y);(y,,*|x;());(y,,*x;());(x;y))z}

The second revised solution can be found here.

A Non-iterative K Solution

We modify segments, horizontal, and vertical to yield linearized segments.

A segment is a pair ((r;c);(r';c')), where r and c are the row- and column-indices of the starting point of the segment, and r' and c' are the row- and column-indices of the ending point of the segment. The linearization of a coordinate (r;c) in a matrix m consists of the reduction of (r;c) to a single index into the raze of the matrix:

	  m[r;c] ~ (,/m) i
	1

where i is computed:

	i:(^m) _sv (r;c)

Note that (r;c) can be retrieved from i by means of the computation:

	  (r;c) ~ (^m) _vs i
	1

Thus:

	vertical:{points[x]'|:'',/segments[("|";-1 0)]'[+x;!#*x]}
	horizontal:{points[x]',/segments[("-";0 -1)]'[x;!#x]}
	segments:{:[#b:&(~=)':x[0]=" ",y;z{+(x;y)}'@[-1 2#b;_n;x[1]+];()]}
	points:{(^x)_sv+y}

In our example,

	x:("----kjsadsfljkasf"
	   "|srdlfhsad---hhfa"
	   "|ssjh---dash|fsas"
	   "saddl|kjsafj|hash"
	   "ask---dhhfsk|fhsf"
	   "sdafhshfsdkjfhshf")

	  :s:horizontal[x],vertical x
	(0 3
	 27 29
	 39 41
	 71 73
	 0 51
	 39 73
	 29 97)

We will now step through our second solution.

First, flatten the segments into a list of points:

	  :p:,/s
	0 3 27 29 39 41 71 73 0 51 39 73 29 97

Next, generate a vector which groups the points into pairs:

	  :j:&(_.5*#p)#2
	0 0 1 1 2 2 3 3 4 4 5 5 6 6

Next, find pairs of matching points in different segments:

	  :i@:&2=#:'i:=(?p)?/:p
	(0 8
	 3 12
	 4 10
	 7 11)

Next, find the segments in which those points occur, and link the pairs.

	  :k:j i@=h(|/_lin)\:/:h:j i
	(,0 4
	 ,1 6
	 (2 5
	  3 5))

Finally, retrieve the segments, convert the points back into (r;c) pairs, and shorten all terminal vertical segments:

	  :adjust[x]'(+(^x)_vs?,//)'s k
	((0 0
	  0 3
	  2 0)
	 (1 10
	  1 12
	  4 12)
	 (2 5
	  2 7
	  4 5
	  4 3))

The resulting function is:

	lines:{
	 p:,/s:horizontal[x],vertical x         / (s)egments, (p)oints
	 j:&(_.5*#p)#2                          / group by pairs
	 i@:&2=#:'i:=(?p)?/:p                   / where x contacts y
	 k:j i@=h(|/_lin)\:/:h:j i              / where x links to y
	 adjust[x]'(+(^x)_vs?,//)'s k}          / unpack to lines, adjust verticals

Find the script here.