You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Tgrep is a commandline utility meant to search log files per Reddit's specifications [0].
4
+
`tgrep` is a commandline utility meant to search 100 GB log files.
5
5
6
-
Setup
7
-
=====
6
+
### Prerequisites ###
8
7
9
-
Tgrep is written in Clojure and, as such, requires a JVM and the Clojure language [1]to be installed on your computer. What's more, it manages its project dependencies using Leiningen [2], so lein is required to run the utility, as well. All other libraries are included with the executable.
8
+
`tgrep` is written in Clojure and, as such, requires a JVM to be installed on your computer. Leiningen is required to install Clojure and library dependencies.
10
9
11
-
Ruby is used instead of Bash to actually parse arguments in the executable and then run the Clojure code as a script. Any version of Ruby after 1.8.6 should do.
10
+
### Running ###
12
11
13
-
To run the executable, first give it permission to execute via chmod +x bin/tgrep.
12
+
To run:
14
13
15
-
Then run the program (bin/tgrep) according to the Reddit specifications. /log/haproxy.log is the default file, if none other is specified.
14
+
lein search 00:05
15
+
lein search -f logfile 00:05
16
+
lein search -f logfile 00:05-00:10
17
+
lein search -f logfile 00:05:01-00:10
18
+
lein search -f logfile 00:05:01-00:10:01
16
19
17
-
Edge cases
18
-
==========
20
+
### Edge cases ###
19
21
20
22
There were several different potential edge cases and bugs that could've cropped up with this utility. These are mainly the product of:
21
23
@@ -29,18 +31,19 @@ There were several different potential edge cases and bugs that could've cropped
29
31
a 24 hour timepoint. This means that a specified date interval range
30
32
might appear twice in a log file.
31
33
32
-
I've tried to remedy edge cases up front by normalizing all incoming dates
33
-
and immediately turning them into intervals. Any precise values simply turn into one boundary for the time interval we're looking at, and if we're only looking for one timestamp, I simply make an interval out of two identical date values.
34
-
35
-
To combat (3), I simply decided that the first range within the valid bounds of the log file would be the target range.
36
-
37
-
Performance
38
-
===========
39
-
40
-
One note about performance. This code runs as fast as vanilla Java, and it will self-optimize the more you run it. One drawback about using the JVM is that the VM must load before the script can execute. I've talked with jedberg about this problem, and he said that the JVM load time would not be an issue. To give you an indication of the actual run time of my script, I've included an "Elapsed time" tag after the script runs. To mediate the JVM load time, a nailgun Java server (which gives you a persistent JVM) may also be used.
34
+
I've tried to remedy edge cases up front by normalizing all incoming
35
+
dates and immediately turning them into intervals. Any precise values
36
+
simply turn into one boundary for the time interval we're looking at,
37
+
and if we're only looking for one timestamp, I simply make an interval
0 commit comments