NAME
ftimes-proximo - Locate a group of dig hits within a specified byte range
SYNOPSIS
ftimes-proximo [-l limit] [-r range] {-G group=tag,tag[,tag[,...]][:range]|-g groups-file} -f {file|-}
DESCRIPTION
This utility locates a group of dig hits within a specified byte range. To work properly, the input must be sorted by 'hostname' (when present), 'name', and 'offset' in ascending order. Note that this utility does not sort the input -- that step can be done with ftimes-sortini(1). The input format can vary so long as it contains at least the 'name', 'tag', 'offset', and 'string' fields. The two most common formats are:
name|type|tag|offset|string
and
hostname|name|type|tag|offset|string|joiner
The first is produced by ftimes(1) and hipdig(1), and the second is produced by ftimes-dig2dbi(1). Each input record must contain a non-null tag value -- those that don't will be ignored. Generally, each tag should correspond to a unique dig string. However, tag overloading is allowed.
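The input rules above (header-driven fields, ignored null tags, ascending hostname/name/offset order) can be illustrated with a short Python sketch. It is not how ftimes-proximo reads its input. It assumes the first line is a pipe-delimited header naming the fields and that plain string comparison approximates the sort order produced by ftimes-sortini(1). The helper names read_dig_records, sort_key, and is_sorted are hypothetical.

```python
import csv
import io

def read_dig_records(text):
    """Parse pipe-delimited dig records whose first line is a header."""
    # QUOTE_NONE keeps fields such as the quoted, URL-encoded name verbatim.
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row.get("tag"):
            continue  # records without a tag value are ignored
        row["offset"] = int(row["offset"])
        records.append(row)
    return records

def sort_key(record):
    # hostname is optional (absent in the 5-field format); treat it as empty.
    return (record.get("hostname") or "", record["name"], record["offset"])

def is_sorted(records):
    """True if records are in ascending hostname, name, offset order."""
    keys = [sort_key(r) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

A check like is_sorted only verifies the precondition; the actual sorting is left to ftimes-sortini(1), as the description says.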
This utility can also take its own output as input, thus providing a way to analyze groups of groups. In that case, the input must contain at least the 'name', 'group', 'footprint', and 'offset' fields.
Output is written to stdout in one of the following formats:
name|group|ordered|proximity|gap|limit|range|window|footprint|offset|offsets|tags
or
hostname|name|group|ordered|proximity|gap|limit|range|window|footprint|offset|offsets|tags
The breakdown of the output format is as follows:
- hostname
-
Hostname of the subject system. This value is transferred directly from the input stream, but only if that field is present.
- name
-
URL-encoded filename. This value is transferred directly from the input stream.
- group
-
Name of the group (as defined on the command line or in a groups file) that was matched.
- ordered
-
Boolean value (y/n) indicating whether the actual tag order matches the order specified in the group definition. If order is important, be sure to specify group definitions using the desired order.
- proximity
-
A value from 0.00 to 1.00 indicating the relative proximity of the dig hits for a given group. This value is computed as follows:
( <limit> - <gap> ) / <limit>
where <gap> is the smaller of the specified limit (-l option) and the actual average gap.
- gap
-
The average gap, in bytes, between adjacent dig hits.
- limit
-
The largest average gap between dig hits for them to be considered close. As the actual gap approaches this number, proximity goes to zero.
- range
-
The number of bytes between the lowest and highest dig offsets for a given match.
- window
-
The number of bytes used to determine whether a given match is in range or not.
- footprint
-
The number of bytes between the beginning of the first and end of the last dig hits (inclusive).
- offset
-
The offset of the group hit. This corresponds to the lowest offset within a group for a given match (hit).
- offsets
-
Comma-delimited list of dig offsets in the order they were found.
- tags
-
Comma-delimited list of dig tags in the order they were found.
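The arithmetic behind the ordered, proximity, gap, range, offset, offsets, and tags fields can be sketched in Python for a single match. This is an illustration under stated assumptions, not the utility's code. In particular, it assumes that the average gap is the mean distance between adjacent sorted offsets, which reduces to range / (hits - 1). The footprint field is omitted because the exact treatment of string lengths is not specified here. The function name group_metrics is hypothetical.

```python
def group_metrics(hits, group_tags, limit=100):
    """Summarize one group match.

    hits: list of (offset, tag) pairs, one per group member.
    group_tags: member tags in the order given in the group definition.
    limit: the -l value (largest average gap considered close).
    """
    found = sorted(hits)                      # ascending offset order
    offsets = [offset for offset, _ in found]
    tags = [tag for _, tag in found]
    span = offsets[-1] - offsets[0]           # the 'range' field
    # Assumption: 'gap' is the mean distance between adjacent offsets,
    # which reduces to span / (number of hits - 1).
    gap = span / (len(offsets) - 1)
    proximity = (limit - min(limit, gap)) / limit
    return {
        "ordered": "y" if tags == list(group_tags) else "n",
        "proximity": f"{proximity:.2f}",
        "gap": gap,
        "limit": limit,
        "range": span,
        "offset": offsets[0],
        "offsets": ",".join(str(o) for o in offsets),
        "tags": ",".join(tags),
    }
```

For hits at offsets 20, 30, 40, and 50 with the default limit of 100, the sketch yields a range of 30, a gap of 10, and a proximity of (100 - 10) / 100 = 0.90. Because the gap is capped at the limit, any gap of 100 bytes or more gives a proximity of 0.00.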
The trigger event for generating an output record is a group match. Each time a member offset changes for a given group, the entire group is evaluated to see if the resulting set of offsets falls within the specified range. If that condition is met, an output record is generated.
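The evaluation rule above can be modeled in a few lines of Python. This is a sketch under assumptions, not the utility's implementation. It assumes the records are already parsed and sorted, that groups are given as a hypothetical {group: (member_tags, window)} mapping, and that a set of offsets is "in range" when the highest offset minus the lowest is at most the window. The function name find_group_matches is hypothetical.

```python
def find_group_matches(records, groups):
    """Report group matches using last-offset-per-member bookkeeping.

    records: dicts with 'name', 'tag', 'offset' (and optionally 'hostname'),
             already sorted by hostname, name, and offset.
    groups:  {group_name: (member_tags, window)}; window may be float("inf").
    """
    last = {}      # (hostname, name, group) -> {tag: most recent offset}
    matches = []
    for rec in records:
        for group, (tags, window) in groups.items():
            if rec["tag"] not in tags:
                continue  # this record's tag is not a member of this group
            key = (rec.get("hostname", ""), rec["name"], group)
            seen = last.setdefault(key, {})
            if seen.get(rec["tag"]) == rec["offset"]:
                continue  # member offset did not change: nothing to evaluate
            seen[rec["tag"]] = rec["offset"]  # overwrite any earlier offset
            if len(seen) < len(tags):
                continue  # some member has not been seen in this file yet
            offsets = sorted(seen.values())
            # Assumption: a set is "in range" when max - min <= window.
            if offsets[-1] - offsets[0] <= window:
                matches.append((rec["name"], group, offsets))
    return matches
```

Keying the bookkeeping by hostname and name keeps offsets from one file from leaking into the next. Looping over every group for each record also lets one tag participate in several groups.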
OPTIONS
- -f {file|-}
-
Specifies the name of the input file. A value of '-' will cause the program to read from stdin.
- -G group=tag,tag[,tag[,...]][:range]
-
Specifies a group definition where
- group
-
The name of the group.
- tag,tag[,tag[,...]]
-
A comma-delimited list of two or more unique dig tags.
- range
-
A decimal number or the word 'infinity'. The range is optional in a group definition.
- -g groups-file
-
Specifies the name of a file containing one or more group definitions. The format is the same as that used for the -G option.
- -l limit
-
Specifies the largest average gap between dig hits for them to be considered close. As the gap approaches this number, proximity goes to zero. The default limit is 100 bytes.
- -r range
-
Specifies the range, in bytes, as a decimal number or the word 'infinity'. If 'infinity' is specified, the range window spans all bytes in a given file. This is useful if you simply want to determine whether or not all tags occur in a given file. The default range is 100 bytes.
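The group definition syntax accepted by -G and -g can be parsed with a short Python sketch. It follows the grammar given above: a name, '=', two or more unique comma-delimited tags, and an optional ':range' that is a decimal number or 'infinity'. Two details are assumptions. An omitted range is assumed to fall back to the -r value. A groups file is assumed to hold one definition per line, with blank lines skipped. The names parse_group and parse_groups_file are hypothetical.

```python
import math

def parse_group(spec, default_range=100):
    """Parse 'group=tag,tag[,tag[,...]][:range]' into (name, tags, window)."""
    name, sep, rest = spec.strip().partition("=")
    if not sep or not name:
        raise ValueError(f"missing group name: {spec!r}")
    tag_part, colon, range_part = rest.partition(":")
    tags = tuple(tag_part.split(","))
    if len(tags) < 2 or "" in tags or len(set(tags)) != len(tags):
        raise ValueError(f"expected two or more unique tags: {spec!r}")
    if not colon:
        window = default_range  # assumption: fall back to the -r value
    elif range_part == "infinity":
        window = math.inf       # the whole file is one window
    elif range_part.isdigit():
        window = int(range_part)
    else:
        raise ValueError(f"range must be a decimal number or 'infinity': {spec!r}")
    return name, tags, window

def parse_groups_file(text, default_range=100):
    """Parse one group definition per line, skipping blank lines (assumed format)."""
    return [parse_group(line, default_range) for line in text.splitlines() if line.strip()]
```

With this sketch, parse_group("g_test=a1,b2,c3,d4:100") returns ("g_test", ("a1", "b2", "c3", "d4"), 100). A definition such as "g=x,x" is rejected because its tags are not unique.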
CAVEATS
Group matching only maintains (i.e., remembers) the last offset of each group member. This means that there are cases where a single group could have multiple matches in a specified range, but only one is reported. For example, suppose you have the following group definition:
g_test=a1,b2,c3,d4:100
Now suppose that you have the following dig records:
name|type|tag|offset|string
"file"|normal|a1|10|a1
"file"|normal|b2|20|b2
"file"|normal|c3|30|c3
"file"|normal|a1|40|a1
"file"|normal|d4|50|d4
In this case, one could say that the group matches twice within the specified range of 100 bytes: once for offsets 10, 20, 30, and 50, and once for offsets 20, 30, 40, and 50. Since this utility maintains only the last offset of each group member, only the second set of offsets is considered a match. This happens because the 'a1' offset is reset from 10 to 40 when the fourth record (not counting the header) is processed. Effectively, this means that given two potential matches within a specified range, the match whose offsets are closest together always wins.
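The bookkeeping in this example can be replayed with a tiny Python sketch. It reproduces only the "last offset per member" state described above, not the utility's code. The function name trace_last_offsets is hypothetical.

```python
def trace_last_offsets(records, members):
    """Replay (tag, offset) records, remembering only the last offset per member."""
    last = {}
    history = []
    for tag, offset in records:
        if tag in members:
            last[tag] = offset          # a repeated tag overwrites its earlier offset
            history.append(dict(last))  # snapshot after each record
    return last, history

records = [("a1", 10), ("b2", 20), ("c3", 30), ("a1", 40), ("d4", 50)]
last, history = trace_last_offsets(records, ("a1", "b2", "c3", "d4"))
print(history[3])  # {'a1': 40, 'b2': 20, 'c3': 30}
print(sorted(last.values()), max(last.values()) - min(last.values()))  # [20, 30, 40, 50] 30
```

By the time 'd4' arrives, offset 10 is no longer remembered. The only complete set available is therefore 20, 30, 40, and 50, which spans 30 bytes, while the discarded set 10, 20, 30, and 50 would have spanned 40.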
AUTHOR
Klayton Monroe
SEE ALSO
ftimes(1), ftimes-dig2ctx(1), ftimes-dig2dbi(1), ftimes-sortini(1), hipdig(1)
HISTORY
This utility was initially written to perform proximity analysis in a case where we needed to identify last names in close proximity to their respective Social Security Numbers (SSN).
This utility first appeared in FTimes 3.9.0.
LICENSE
All documentation and code are distributed under the same terms and conditions as FTimes.