released 1.5.4 - speed improvements, fixed slow -z and reading from pipes
Robert-van-Engelen committed Nov 1, 2019
1 parent fb1ef98 commit f4eb5c0
Showing 7 changed files with 33 additions and 35 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -88,7 +88,7 @@ Introduction: why use ugrep?
by filename extension and file signature "magic bytes" or shebangs. For
example, to list all shell scripts in or below the working directory:

- ugrep -rltShell ''
+ ugrep -rl -tShell ''

where `-r` is recursive search, `-l` lists matching files, `-tShell` selects
shell files by file extensions and shebangs, and the empty pattern `''`
@@ -116,7 +116,7 @@ Introduction: why use ugrep?
to recursively search Python files in the working directory for import
statements:

- ugrep -RtPython -f python/imports
+ ugrep -R -tPython -f python/imports

where `-R` is recursive search while following symlinks, `-tPython` selects
Python files only (i.e. by file name extension `.py` and by Python shebangs),
@@ -129,7 +129,7 @@ Introduction: why use ugrep?
example to find exact matches of `main` in C/C++ source code while skipping
strings and comments that may have a match with `main` in them:

- ugrep -Rotc++ -nw 'main' -f c/zap_strings -f c/zap_comments
+ ugrep -Ro -tc++ -nw 'main' -f c/zap_strings -f c/zap_comments

where `-R` is recursive search while following symlinks, `-o` for multi-line
matches (since strings and comments may span multiple lines), `-tc++`
@@ -295,8 +295,8 @@ e.g. `ugrep -on -U 'serialize_\w+Type'` is fast but slower without `-U`.
### Future improvements and TODO

  - The line-by-line matching used by options `-z`, `--no-mmap`, and when reading
- standard input can be slow, while `-o` (and `-c`, `-l`, `-q`, `-N`) is always
- fast. Line-by-line reading should be replaced by block reading.
+ standard input (e.g. a pipe to ugrep) can be slower, while `-o` (and `-c`,
+ `-l`, `-q`, `-N`) is always fast. Replace line-by-line with block reading.
  - Further optimize searching one word or a few words.
  - Improve the speed of matching multiple words, which is currently faster than
  GNU grep (ugrep uses Bitap and hashing), but Hyperscan is faster using
@@ -1988,7 +1988,7 @@ SEE ALSO



- ugrep 1.5.3 October 31, 2019 UGREP(1)
+ ugrep 1.5.4 November 01, 2019 UGREP(1)

<a name="patterns"/>

Binary file modified bin/linux/ugrep
Binary file not shown.
Binary file modified bin/macosx/ugrep
Binary file not shown.
Binary file modified bin/windows/ugrep.exe
Binary file not shown.
6 changes: 3 additions & 3 deletions include/reflex/input.h
@@ -816,7 +816,7 @@ class Input::dos_streambuf : public std::streambuf {
if (n <= 0 || ch1_ == EOF)
return 0;
std::streamsize k = n;
- char c;
+ int c;
while (k > 0 && (c = get()) != EOF)
{
*s++ = c;
@@ -863,7 +863,7 @@ class Input::dos_streambuf : public std::streambuf {
class BufferedInput : public Input {
public:
/// Buffer size.
- static const size_t SIZE = 8192;
+ static const size_t SIZE = 16384;
/// Buffered stream buffer for reflex::Input, derived from std::streambuf.
class streambuf;
/// Buffered stream buffer for reflex::Input to read DOS files, replaces CRLF by LF, derived from std::streambuf.
@@ -1066,7 +1066,7 @@ class BufferedInput::dos_streambuf : public std::streambuf {
if (n <= 0 || ch1_ == EOF)
return 0;
std::streamsize k = n;
- char c;
+ int c;
while (k > 0 && (c = get()) != EOF)
{
*s++ = c;
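
The `char c` to `int c` change in both `dos_streambuf` hunks matters because `get()` returns an `int`, so that `EOF` stays distinct from every valid byte value. The following is a minimal sketch of that point, not reflex code: `ByteSource` and `read_all` are hypothetical stand-ins for the stream buffer and its read loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <string>

// Hypothetical byte source (not part of reflex): get() returns a byte
// value in 0..255, or EOF when the data is exhausted.
struct ByteSource {
  std::string data;
  std::size_t pos;

  explicit ByteSource(const std::string& d) : data(d), pos(0) { }

  int get()
  {
    return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : EOF;
  }
};

// Copy up to n bytes into s. The result of get() is kept in an int, so the
// byte 0xFF (255) is never confused with EOF (a negative value).
std::size_t read_all(ByteSource& in, char *s, std::size_t n)
{
  std::size_t k = 0;
  int c;
  while (k < n && (c = in.get()) != EOF)
    s[k++] = static_cast<char>(c);
  return k;
}
```

With a `char` variable instead, the comparison depends on the platform: where `char` is signed, the byte 0xFF typically converts to -1 and compares equal to `EOF`, ending the loop early; where `char` is unsigned, no stored value ever equals `EOF`, so the loop condition cannot detect the end of input.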
2 changes: 1 addition & 1 deletion man/ugrep.1
@@ -1,4 +1,4 @@
- .TH UGREP "1" "October 31, 2019" "ugrep 1.5.3" "User Commands"
+ .TH UGREP "1" "November 01, 2019" "ugrep 1.5.4" "User Commands"
.SH NAME
\fBugrep\fR -- universal file pattern searcher
.SH SYNOPSIS
48 changes: 23 additions & 25 deletions src/ugrep.cpp
@@ -142,7 +142,7 @@ void sigpipe_handle(int) { }
#endif

// ugrep version info
- #define UGREP_VERSION "1.5.3"
+ #define UGREP_VERSION "1.5.4"

// ugrep platform -- see configure.ac
#if !defined(PLATFORM)
@@ -316,7 +316,6 @@ std::vector<std::string> flag_exclude_override_dir;
void set_color(const char *grep_colors, const char *parameter, char color[COLORLEN]);
void trim(std::string& line);
bool is_output(ino_t inode);
- bool is_file(const reflex::Input& input);
size_t strtopos(const char *s, const char *msg);

void format(const char *format, size_t matches);
@@ -635,7 +634,7 @@ bool MMap::file(reflex::Input& input, const char*& base, size_t& size)
// output buffering and synchronization
struct Output {

- static const size_t SIZE = 65536; // size of each buffer in the buffers container
+ static const size_t SIZE = 16384; // size of each buffer in the buffers container

struct Buffer { char data[SIZE]; }; // a buffer in the buffers container

@@ -769,9 +768,11 @@ struct Output {
// flush the buffers
void flush()
{
+ // if multi-threaded and lock is not owned already, then lock on master's mutex
if (lock != NULL && !lock->owns_lock())
lock->lock();

+ // flush the buffers container to the designated output file, pipe, or stream
for (Buffers::iterator i = buffers.begin(); i != buf; ++i)
fwrite(i->data, 1, SIZE, file);
fwrite(buf->data, 1, cur - buf->data, file);
@@ -781,15 +782,23 @@ struct Output {
cur = buf->data;
}

- // next buffer, allocate one if needed
+ // next buffer, allocate one if needed (when multi-threaded and lock is owned by another thread)
  void next()
  {
- if (++buf == buffers.end())
- grow();
- cur = buf->data;
+ if (lock == NULL || lock->owns_lock() || lock->try_lock())
+ {
+ flush();
+ }
+ else
+ {
+ // allocate a new buffer if no next buffer was allocated before
+ if (++buf == buffers.end())
+ grow();
+ cur = buf->data;
+ }
}

- // allocate a buffer to grow the buffers container
+ // allocate a new buffer to grow the buffers container
void grow()
{
buf = buffers.emplace(buffers.end());
@@ -802,11 +811,12 @@ struct Output {
lock = new std::unique_lock<std::mutex>(mutex, std::defer_lock);
}

- // flush and release synchronization on the mutex, if one was given with sync()
+ // flush and release synchronization on the master's mutex, if one was assigned before with sync()
void release()
{
flush();

+ // if multi-threaded and lock is owned, then release it
if (lock != NULL && lock->owns_lock())
lock->unlock();
}
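
The `Output` hunks above change what happens when a worker's buffer fills up: `next()` now flushes right away if the output lock is free or already held, and only chains another buffer when a different thread holds it. A simplified sketch of that strategy follows. `SketchOutput` is a hypothetical name, the sink is a `std::string` instead of a `FILE*`, `SIZE` is shrunk to 4, and resetting to the first buffer after a flush is an assumption, since those lines are not shown in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>

// Sketch only: mirrors the names in the diff but is not ugrep's Output struct.
struct SketchOutput {
  static const std::size_t SIZE = 4;    // tiny buffers to make the behavior visible

  struct Buffer { char data[SIZE]; };
  typedef std::list<Buffer> Buffers;

  std::string& sink;                    // stands in for the output file or pipe
  std::unique_lock<std::mutex> lock;    // deferred lock on the shared mutex, if any
  bool threaded;                        // false when no mutex was given
  Buffers buffers;
  Buffers::iterator buf;                // buffer currently being filled
  char *cur;                            // next free position in *buf

  SketchOutput(std::string& sink, std::mutex *mutex)
    : sink(sink), threaded(mutex != nullptr)
  {
    if (threaded)
      lock = std::unique_lock<std::mutex>(*mutex, std::defer_lock);
    buf = buffers.emplace(buffers.end());
    cur = buf->data;
  }

  void put(char c)
  {
    if (cur == buf->data + SIZE)
      next();
    *cur++ = c;
  }

  // write all filled buffers to the sink, waiting for the lock if needed
  void flush()
  {
    if (threaded && !lock.owns_lock())
      lock.lock();
    for (Buffers::iterator i = buffers.begin(); i != buf; ++i)
      sink.append(i->data, SIZE);
    sink.append(buf->data, cur - buf->data);
    buf = buffers.begin();              // assumption: reuse buffers from the start
    cur = buf->data;
  }

  // buffer is full: flush if the lock is free or ours, otherwise chain a buffer
  void next()
  {
    if (!threaded || lock.owns_lock() || lock.try_lock())
    {
      flush();
    }
    else
    {
      if (++buf == buffers.end())
        buf = buffers.emplace(buffers.end());
      cur = buf->data;
    }
  }

  // flush the remainder and give up the lock so other workers can write
  void release()
  {
    flush();
    if (threaded && lock.owns_lock())
      lock.unlock();
  }
};
```

In this sketch a worker that wins `try_lock()` keeps the lock until `release()`, so its output stays contiguous, while a worker that loses keeps buffering in memory instead of blocking on the mutex.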
@@ -1799,7 +1809,7 @@ struct GrepWorker : public Grep {
master(master),
todo()
{
- // all workers synchronize their output with a mutex lock
+ // all workers synchronize their output on the master's mutex lock
out.sync(master->mutex);

// run worker thread executing jobs assigned in its queue
@@ -4314,9 +4324,11 @@ void Grep::search(const char *pathname)
{
// read input line-by-line and display lines that match the pattern with context lines

+ // TODO: replace line-by-line reading with block reading to improve speed
+
reflex::BufferedInput buffered_input;

- if (!is_mmap && is_file(input))
+ if (!is_mmap)
buffered_input = input;

const char *here = base;
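
The TODO in this hunk refers to replacing line-by-line reading with block reading. As a rough illustration of the idea only, the sketch below reads fixed-size blocks and carries an incomplete trailing line over to the next block. It is not ugrep's implementation: `count_matching_lines` is a hypothetical helper, it searches for a literal, non-empty string that contains no newline rather than a regex, and the block size is a free parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Count the lines of `in` that contain `needle`, reading block_size bytes at a time.
// Assumes needle is non-empty and contains no newline.
std::size_t count_matching_lines(std::istream& in, const std::string& needle, std::size_t block_size)
{
  std::vector<char> block(block_size);
  std::string pending;    // bytes read so far that do not yet form a complete line
  std::size_t count = 0;

  while (in)
  {
    in.read(block.data(), static_cast<std::streamsize>(block.size()));
    std::streamsize got = in.gcount();
    if (got <= 0)
      break;
    pending.append(block.data(), static_cast<std::size_t>(got));

    // scan every complete line currently held in pending
    std::size_t start = 0;
    std::size_t nl;
    while ((nl = pending.find('\n', start)) != std::string::npos)
    {
      std::string::const_iterator first = pending.begin() + start;
      std::string::const_iterator last = pending.begin() + nl;
      if (std::search(first, last, needle.begin(), needle.end()) != last)
        ++count;
      start = nl + 1;
    }

    // keep only the incomplete tail for the next block
    pending.erase(0, start);
  }

  // a final line without a trailing newline still counts
  if (!pending.empty() &&
      std::search(pending.begin(), pending.end(), needle.begin(), needle.end()) != pending.end())
    ++count;

  return count;
}
```

The result does not depend on the block size, which only trades memory for fewer read calls; a size such as 16384 would match the new `BufferedInput::SIZE` in this commit, though that pairing is an arbitrary choice here.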
@@ -4941,20 +4953,6 @@ void set_color(const char *grep_colors, const char *parameter, char color[COLORL
}
}

- // return true if input is a regular file that can be mmap'ed
- bool is_file(const reflex::Input& input)
- {
- #ifdef OS_WIN
- return false;
- #else
- if (input.file() == NULL)
- return false;
- int fd = fileno(input.file());
- struct stat buf;
- return fstat(fd, &buf) == 0 && S_ISREG(buf.st_mode);
- #endif
- }
-
// convert unsigned decimal to positive size_t, produce error when conversion fails or when the value is zero
size_t strtopos(const char *str, const char *msg)
{

1 comment on commit f4eb5c0

@rbnor (Contributor) commented on f4eb5c0 Nov 1, 2019

Awesome, well done, looking forward to using this more