FLOC — Fast Lines Of Code Counter

This software comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.

A production-quality, cloc-compatible source-code line counter written in Fortran 90/2003 with OpenMP parallelism. Designed to scan repositories with millions of files orders of magnitude faster than the Perl-based cloc tool by eliminating per-line overhead, using single-call buffered I/O, and parallelising across all available CPU cores.

Quick start

# Linux / macOS
make
./floc .

# Scan a specific directory
./floc ~/repos/linux-kernel

# Only Python and Go
./floc --include-lang=Python,Go .

# Skip test directories
./floc --exclude-dir=test,tests,spec .

Sample output (cloc-compatible format):

------------------------------------------------------------------------
Language              files        blank    comment       code
------------------------------------------------------------------------
Python                   33         1226       1026       3017
JavaScript               18          834        512       8204
TypeScript                9          321        198       4110
Go                        6          215         87       2033
------------------------------------------------------------------------
SUM:                     66         2596       1823      17364
------------------------------------------------------------------------

Performance design

Why is FLOC much faster than cloc?

Factor	cloc (Perl)	FLOC (Fortran)
I/O model	`readline()` — one syscall per line	`READ(STREAM)` — one syscall per file
Concurrency	Single-threaded	OpenMP: N files in parallel
Pattern matching	Perl regex engine	Hand-rolled state machine (integer opcodes)
Memory	String objects per line	Pre-allocated byte buffer, reused
Extension lookup	Hash lookup	Sorted array + binary search
Comment detection	Regex	Explicit state transitions

Buffered single-call file read

OPEN(NEWUNIT=lu, FILE=TRIM(path), ACCESS='STREAM', FORM='UNFORMATTED', ...)
INQUIRE(UNIT=lu, SIZE=fsize)
READ(lu) buf(1:nbytes)      ! entire file in ONE READ call

A single read() syscall is typically 50–200× faster than N readline() calls for an N-line file because:

No repeated kernel↔userspace transitions per line
No memory allocation per line
No UTF-8/newline scanning by the runtime

State machine vs regex

The counting hot loop is a SELECT CASE on a small integer (state). Modern CPUs predict the dominant case (usually "normal code, state=0") after the first few hundred iterations. A Perl regex engine has fixed overhead per match regardless of pattern length.

Zero allocation inside the hot loop

count_*_style routines take a CHARACTER(LEN=1), INTENT(IN) :: buf(:) slice. No Fortran string temporaries are created. The line-flag register lf is a 2-bit INTEGER on the stack.

Language detection

Language detection is done by file extension (with special-case whole-file-name recognition for Makefile, Dockerfile, etc.).

Extension table

All ~130 extensions are stored in two parallel arrays (EXT_TABLE, EXT_LANG) that are insertion-sorted alphabetically at startup (once). Every call to get_language_id() performs an O(log N) binary search — typically ~7 comparisons for 130 entries.

FUNCTION binary_search_ext(ext) RESULT(lang_id)
  lo = 1;  hi = NUM_EXT
  DO WHILE (lo <= hi)
    mid = (lo + hi) / 2
    IF (TRIM(EXT_TABLE(mid)) == ext) THEN
      lang_id = EXT_LANG(mid);  RETURN
    ELSE IF (TRIM(EXT_TABLE(mid)) < ext) THEN
      lo = mid + 1
    ELSE
      hi = mid - 1
    END IF
  END DO
END FUNCTION

Special filenames (no extension)

Makefile, GNUmakefile, Dockerfile, Gemfile, Vagrantfile, etc. are matched before the extension search.

Comment & code-line detection

Line classification rules (identical to cloc)

Line content	Classification
Only whitespace	blank
Comment text with no code	comment
Any non-whitespace outside a comment	code (even if it also has a comment)

This means x = 1; // set x counts as 1 code line, not a comment line.

Per-style state machines

Each comment style maps to a dedicated counting routine. The state variable state transitions on individual bytes; lf is a 2-bit "line flags" register:

bit 0 (HAS_CODE)    ← set when any non-ws char is seen outside a comment
bit 1 (HAS_COMMENT) ← set when a comment marker or comment body is seen

On every \n, flush_line(lf, res) classifies the line and resets lf.

CS_C (C, C++, Java, JavaScript, TypeScript, Go, Rust, …)

States:
  0  normal code
  1  line comment   (//)
  2  block comment  (/* … */)
  3  double-quoted string  ("…")
  4  single-quoted string  ('…')
  5  escape in "string"
  6  escape in 'string'

Strings (states 3–6) suppress comment detection — "/*" is not a comment.

CS_PYTHON

States:
  0  normal
  1  # line comment
  2  """ triple-double block
  3  ''' triple-single block
  4  "string"
  5  'string'

Triple-quoted strings used as docstrings are counted as comment lines when they appear on their own lines (cloc behaviour).

CS_HTML

Scans for the literal four-byte prefix  to leave it. All other content is code.

CS_RUBY / CS_PERL

# for inline, =begin / =end for block comments. Both markers must appear at the start of a line (column 0).

CS_LUA

-- for line comments, --[[ / ]] for block comments.

CS_FORTRAN90

! anywhere on a line (after code) marks the rest as a comment. Lines starting with ! are comment lines.

CS_SQL

-- line + /* */ block. No string suppression (SQL strings rarely contain comment-like text).

CS_HASKELL

-- line + {- -} block.

CS_NONE (JSON, Markdown, plain text)

No comment syntax. Every non-blank line is a code line. Blank lines are counted as blank. This matches cloc's behaviour for JSON and Markdown.

Parallelism

FLOC uses OpenMP task parallelism at the file level. The key design choice is using OpenMP REDUCTION instead of CRITICAL sections:

!$OMP PARALLEL DEFAULT(NONE) SHARED(...) &
!$OMP   REDUCTION(+: lang_files, lang_blank, lang_comment, lang_code) &
!$OMP   SCHEDULE(DYNAMIC, 32)
!$OMP DO
DO i = 1, flist%n
  ...
  lang_files(lid) = lang_files(lid) + 1
END DO
!$OMP END DO
!$OMP END PARALLEL

REDUCTION(+: …): OpenMP automatically creates thread-private copies of the accumulation arrays and merges them after the loop. This gives each thread a private write buffer with zero synchronisation overhead.
SCHEDULE(DYNAMIC, 32): Files are handed to threads in chunks of 32. Dynamic scheduling ensures large files (e.g. a 50k-line C file) don't create stragglers that delay the barrier.
No CRITICAL: The counting state machines in floc_counter.f90 are entirely free of shared mutable state.

Thread count

OpenMP defaults to the number of logical CPUs. Override with:

export OMP_NUM_THREADS=8
./floc .

Build instructions

Prerequisites

Platform	Requirement
Linux	`gfortran` ≥ 9, `gcc`, GNU `make`
macOS	`gfortran` (via Homebrew: `brew install gcc`), `brew install libomp`
Windows	MinGW-w64 with gfortran, or WSL2 using the Linux instructions

Linux / macOS

# Clone / download the source
git clone https://github.com/your-org/floc && cd floc

# Build (optimised, OpenMP enabled)
make

# Optional: install to /usr/local/bin
sudo make install

# Or custom prefix
make install PREFIX=~/.local

macOS (Apple Silicon / Intel)

# Install dependencies
brew install gcc libomp

# Force GNU gfortran (not AppleClang)
FC=gfortran-14 CC=gcc-14 make

Windows (MinGW-w64 via MSYS2)

# In an MSYS2 MINGW64 shell:
pacman -S mingw-w64-x86_64-gcc-fortran mingw-w64-x86_64-openmp

cd floc
make

floc.exe --help

Windows (WSL2)

Follow the Linux instructions inside a WSL2 terminal. The resulting binary runs natively in the WSL environment.

Disable OpenMP (single-threaded build)

make OPENMP_FLAGS=""

Verify the build

# Self-test: count FLOC's own source
make test

# Or manually
./floc src/

Usage & options

floc [OPTIONS] <path> [<path> ...]

Option	Description
`--exclude-dir=D[,D2,…]`	Skip directories matching these names (default: `.git`, `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, …)
`--include-lang=L[,L2,…]`	Count only the listed languages
`--exclude-lang=L[,L2,…]`	Skip the listed languages
`--max-file-size=N`	Skip files larger than N bytes (default: 10 MB)
`--quiet` / `-q`	Suppress progress and timing output
`--by-file`	Print per-file breakdown
`--version` / `-v`	Print version string
`--help` / `-h`	Print help

Examples

# Count entire repo
./floc ~/repos/linux

# Count only backend languages
./floc --include-lang=Python,Go,Rust src/

# Exclude generated and vendor directories
./floc --exclude-dir=generated,vendor,.next .

# Quiet mode for scripting
lines=$(./floc --quiet . | grep SUM | awk '{print $5}')

# Multiple paths
./floc backend/ frontend/ scripts/

Supported languages

40 languages and 130+ extensions:

C, C++, C/C++ Header, Java, JavaScript, TypeScript, Go, Python, Ruby, Perl, Bourne Shell, Bash, HTML, XML, CSS, SCSS, SASS, LESS, JSON, YAML, TOML, Markdown, SQL, Rust, Kotlin, Swift, PHP, C#, Fortran 90, make, Dockerfile, Lua, Scala, R, Haskell, OCaml, Assembly, Terraform, Dart, Elixir, Groovy, Vue

Accuracy notes

FLOC aims for identical output to cloc 2.08 on all well-formed source files. Known intentional differences:

Binary file detection: FLOC skips files with \0 bytes by default (added in count_file via early return on binary markers).
Shebang lines (#!/usr/bin/env python3): Counted as code, same as cloc.
Windows CRLF: The state machine treats \r as whitespace, so Windows line endings are handled transparently.
Files > --max-file-size: Skipped entirely (cloc also has a limit).
Symlinks: Not followed (avoids infinite recursion), matching cloc.

Architecture diagram

floc_main.f90  ─── parse_args()
                ─── collect_files()  ←── floc_cdir.f90  (C interop)
                │                    └── dir_helper.c   (opendir/readdir)
                └── OMP parallel DO
                      └── count_file()  ←── floc_counter.f90
                                        │    count_c_style()
                                        │    count_python_style()
                                        │    count_html_style()
                                        │    … (11 style routines)
                                        └── floc_lang_defs.f90
                                             get_language_id()   (binary search)
                                             LANG_CS[]           (style dispatch)

License

MIT — do whatever you want with this code. Contributions welcome.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
src		src
tests		tests
.gitignore		.gitignore
INSTALL.md		INSTALL.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
benchmark.sh		benchmark.sh

Folders and files

Latest commit

History

Repository files navigation

FLOC — Fast Lines Of Code Counter

Table of Contents

Quick start

Performance design

Why is FLOC much faster than cloc?

Buffered single-call file read

State machine vs regex

Zero allocation inside the hot loop

Language detection

Extension table

Special filenames (no extension)

Comment & code-line detection

Line classification rules (identical to cloc)

Per-style state machines

CS_C (C, C++, Java, JavaScript, TypeScript, Go, Rust, …)

CS_PYTHON

CS_HTML

CS_RUBY / CS_PERL

CS_LUA

CS_FORTRAN90

CS_SQL

CS_HASKELL

CS_NONE (JSON, Markdown, plain text)

Parallelism

Thread count

Build instructions

Prerequisites

Linux / macOS

macOS (Apple Silicon / Intel)

Windows (MinGW-w64 via MSYS2)

Windows (WSL2)

Disable OpenMP (single-threaded build)

Verify the build

Usage & options

Examples

Supported languages

Accuracy notes

Architecture diagram

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages