12 changes: 6 additions & 6 deletions system/doc/efficiency_guide/benchmarking.md
@@ -43,22 +43,22 @@ crypto:strong_rand_bytes(2). 1 2286 Ki 437 ns 29%
```

From the **Time** column we can read out that on average a call to
-[`rand:bytes(2)`](`rand:bytes/1`) executes in 128 nano seconds, while
+[`rand:bytes(2)`](`rand:bytes/1`) executes in 128 nanoseconds, while
a call to
[`crypto:strong_rand_bytes(2)`](`crypto:strong_rand_bytes/1`) executes
-in 437 nano seconds.
+in 437 nanoseconds.

From the **QPS** column we can read out how many calls that can be
made in a second. For `rand:bytes(2)`, it is 7,784,000 calls per second.

The **Rel** column shows the relative differences, with `100%` indicating
the fastest code.

-When generating two random bytes at the time, `rand:bytes/1` is more
+When generating two random bytes at a time, `rand:bytes/1` is more
than three times faster than `crypto:strong_rand_bytes/1`. Assuming
that we really need strong random numbers and we need to get them as
fast as possible, what can we do? One way could be to generate more
-than two bytes at the time.
+than two bytes at a time.

```text
% erlperf 'rand:bytes(100).' 'crypto:strong_rand_bytes(100).'
@@ -67,7 +67,7 @@ rand:bytes(100). 1 2124 Ki 470 ns 100%
crypto:strong_rand_bytes(100). 1 1915 Ki 522 ns 90%
```

-`rand:bytes/1` is still faster when we generate 100 bytes at the time,
+`rand:bytes/1` is still faster when we generate 100 bytes at a time,
but the relative difference is smaller.

```
@@ -77,7 +77,7 @@ crypto:strong_rand_bytes(1000). 1 1518 Ki 658 ns 100%
rand:bytes(1000). 1 284 Ki 3521 ns 19%
```

-When we generate 1000 bytes at the time, `crypto:strong_rand_bytes/1` is
+When we generate 1000 bytes at a time, `crypto:strong_rand_bytes/1` is
now the fastest.

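If `erlperf` is not installed, a rough version of the same comparison can be made with `timer:tc/1` alone. A minimal sketch (module name, iteration count, and output format are our own; absolute numbers will differ from `erlperf`'s):

```erlang
-module(rand_bench).
-export([run/1]).

run(Size) ->
    N = 100000,
    time_avg("rand:bytes/1", N, fun() -> rand:bytes(Size) end),
    time_avg("crypto:strong_rand_bytes/1", N,
             fun() -> crypto:strong_rand_bytes(Size) end).

%% Time N calls to Fun and print the average cost per call in nanoseconds.
time_avg(Label, N, Fun) ->
    {Micros, ok} = timer:tc(fun() -> loop(N, Fun) end),
    io:format("~-32s ~10.1f ns/call~n", [Label, Micros * 1000 / N]).

loop(0, _Fun) -> ok;
loop(N, Fun) -> Fun(), loop(N - 1, Fun).
```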
## Benchmarking using Erlang/OTP functionality
10 changes: 5 additions & 5 deletions system/doc/efficiency_guide/binaryhandling.md
@@ -23,7 +23,7 @@ limitations under the License.

This section gives a few examples on how to handle binaries in an efficient way.
The sections that follow take an in-depth look at how binaries are implemented
-and how to best take advantages of the optimizations done by the compiler and
+and how to best take advantage of the optimizations done by the compiler and
runtime system.

Binaries can be efficiently _built_ in the following way:
@@ -118,12 +118,12 @@ Four types of binary objects are available internally:

> #### Change {: .info }
>
-> In Erlang/OTP 27, the handling of binaries and bitstrings were
+> In Erlang/OTP 27, the handling of binaries and bitstrings was
> rewritten. To fully leverage those changes in the run-time system,
> the compiler needs to be updated, which is planned for a future
> release.
>
-> Since, practically speaking, not much have changed from an efficiency
+> Since, practically speaking, not much has changed from an efficiency
> and optimization perspective, the following description has not yet
> been updated to describe the implementation in Erlang/OTP 27.

Expand Down Expand Up @@ -196,7 +196,7 @@ This optimization is applied by the runtime system in a way that makes it
effective in most circumstances (for exceptions, see
[Circumstances That Force Copying](binaryhandling.md#forced_copying)). The
optimization in its basic form does not need any help from the compiler.
-However, the compiler add hints to the runtime system when it is safe to apply
+However, the compiler adds hints to the runtime system when it is safe to apply
the optimization in a more efficient way.

> #### Change {: .info }
Expand Down Expand Up @@ -427,7 +427,7 @@ all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->

The compiler removes building of sub binaries in the second and third clauses,
and it adds an instruction to the first clause that converts `Buffer` from a
-match context to a sub binary (or do nothing if `Buffer` is a binary already).
+match context to a sub binary (or does nothing if `Buffer` is already a binary).

But in more complicated code, how can one know whether the optimization is
applied or not?
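The guide's answer, in a part of the file not touched by this diff, is the `bin_opt_info` compiler option, which makes the compiler report what it did with each binary match. A sketch of its use (the `after_zero/1` example appears earlier in the same file; the exact wording of the diagnostics varies between releases):

```erlang
%% Compile with: erlc +bin_opt_info after_zero.erl
%% (or add -compile(bin_opt_info). to the module)
-module(after_zero).
-export([after_zero/1]).

%% Returns all bytes of the binary after the first zero byte.
after_zero(<<0, T/binary>>) ->
    T;
after_zero(<<_, T/binary>>) ->
    after_zero(T);      %% expect something like "OPTIMIZED: match context reused"
after_zero(<<>>) ->
    <<>>.
```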
8 changes: 4 additions & 4 deletions system/doc/efficiency_guide/commoncaveats.md
@@ -49,7 +49,7 @@ naive_reverse([]) ->
As the `++` operator copies its left-hand side operand, the growing
result is copied repeatedly, leading to quadratic complexity.

-On the other hand, using `++` in loop like this is perfectly fine:
+On the other hand, using `++` in a loop like this is perfectly fine:

**OK**

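The body of the **OK** example is collapsed in this diff view; only its final clause is visible in the hunk header below. For readability, a sketch of the function consistent with that clause:

```erlang
naive_but_ok_reverse([H|T], Acc) ->
    naive_but_ok_reverse(T, [H] ++ Acc);
naive_but_ok_reverse([], Acc) ->
    Acc.
```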
@@ -64,7 +64,7 @@ naive_but_ok_reverse([], Acc) ->
```

Each list element is copied only once. The growing result `Acc` is the right-hand
-side operand, which it is _not_ copied.
+side operand, which is _not_ copied.

Experienced Erlang programmers would probably write as follows:

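That version is also collapsed in the diff; it boils down to delegating to `lists:reverse/1`, which is heavily optimized. A sketch:

```erlang
vanilla_reverse(List) ->
    lists:reverse(List).
```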
@@ -167,14 +167,14 @@ the copied term can be many times larger than the original term. For example:
```erlang
init2() ->
SharedSubTerms = lists:foldl(fun(_, A) -> [A|A] end, [0], lists:seq(1, 15)),
-#state{data=Shared}.
+#state{data=SharedSubTerms}.
```

In the process that calls `init2/0`, the size of the `data` field in the `state`
record will be 32 heap words. When the record is copied to the newly created
process, sharing will be lost and the size of the copied `data` field will be
131070 heap words. More details about
-[loss off sharing](eff_guide_processes.md#loss-of-sharing) are found in a later
+[loss of sharing](eff_guide_processes.md#loss-of-sharing) are found in a later
section.

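The blow-up is easy to observe in the shell with `erts_debug:size/1` (heap words with sharing preserved) and `erts_debug:flat_size/1` (heap words once sharing is lost, as in a copy to another process). A sketch; `erts_debug` is an internal debugging module, not a stable API:

```erlang
1> T = lists:foldl(fun(_, A) -> [A|A] end, [0], lists:seq(1, 15)).
2> erts_debug:size(T).       %% with sharing: 16 cons cells * 2 words
32
3> erts_debug:flat_size(T).  %% as copied: 2^17 - 2 words
131070
```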
To avoid the problem, outside of the fun extract only the fields of the record
4 changes: 2 additions & 2 deletions system/doc/efficiency_guide/drivers.md
@@ -30,8 +30,8 @@ It is assumed that you have a good understanding of drivers.
The runtime system always takes a lock before running any code in a driver.

By default, that lock is at the driver level, that is, if several ports have
-been opened to the same driver, only code for one port at the same time can be
-running.
+been opened to the same driver, only code for one port can be running
+at the same time.

A driver can be configured to have one lock for each port instead.

4 changes: 2 additions & 2 deletions system/doc/efficiency_guide/eff_guide_functions.md
@@ -23,7 +23,7 @@ limitations under the License.

## Pattern Matching

-Pattern matching in function head as well as in `case` and `receive` clauses are
+Pattern matching in function head as well as in `case` and `receive` clauses is
optimized by the compiler. With a few exceptions, there is nothing to gain by
rearranging clauses.

@@ -55,7 +55,7 @@ follows:
single instruction that does a binary search; thus, quite efficient even if
there are many values) to select which one of the first three clauses to
execute (if any).
-- If none of the first three clauses match, the fourth clause match as a
+- If none of the first three clauses match, the fourth clause matches as a
variable always matches.
- If the guard test [`is_integer(Int)`](`is_integer/1`) succeeds, the fourth
clause is executed.
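The function being described is collapsed in this diff view. A sketch with the shape the list items above describe, that is, three clauses matching distinct integer literals followed by a guarded variable clause (the names are ours):

```erlang
classify(1) -> one;
classify(2) -> two;
classify(3) -> three;
classify(Int) when is_integer(Int) -> many.
```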
6 changes: 3 additions & 3 deletions system/doc/efficiency_guide/eff_guide_processes.md
@@ -88,7 +88,7 @@ The default initial heap size of 233 words is quite conservative to support
Erlang systems with hundreds of thousands or even millions of processes. The
garbage collector grows and shrinks the heap as needed.

-In a system that use comparatively few processes, performance _might_ be
+In a system that uses comparatively few processes, performance _might_ be
improved by increasing the minimum heap size using either the `+h` option for
[erl](`e:erts:erl_cmd.md`) or on a process-per-process basis using the
`min_heap_size` option for [spawn_opt/4](`erlang:spawn_opt/4`).
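For example, a process that is known to build a large result can be given a bigger initial heap (a sketch; 10000 words is an arbitrary illustrative value):

```erlang
Pid = spawn_opt(fun() -> lists:sum(lists:seq(1, 1000000)) end,
                [{min_heap_size, 10000}]).
```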
@@ -291,7 +291,7 @@ BEAM code and persistent terms). The amount of virtual address space reserved
for literals can be changed by using the
[`+MIscs option`](`e:erts:erts_alloc.md#MIscs`) when starting the emulator.

-Here is an example how the reserved virtual address space for literals can be
+Here is an example of how the reserved virtual address space for literals can be
raised to 2 GB (2048 MB):

```text
erl +MIscs 2048
```
@@ -381,7 +381,7 @@ multi-CPU computer by running several Erlang scheduler threads

To gain performance from a multi-core computer, your application _must have more
than one runnable Erlang process_ most of the time. Otherwise, the Erlang
-emulator can still only run one Erlang process at the time.
+emulator can still only run one Erlang process at a time.

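For instance, splitting a job into one process per chunk of input keeps several schedulers busy at once. A minimal sketch (the function name is ours and there is no error handling):

```erlang
%% Map F over every chunk in its own process; collect results in order.
parallel_map(F, Chunks) ->
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), lists:map(F, C)} end)
            || C <- Chunks],
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```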
Benchmarks that appear to be concurrent are often sequential. For
example, the [EStone
2 changes: 1 addition & 1 deletion system/doc/efficiency_guide/listhandling.md
@@ -221,7 +221,7 @@ add_42_tail([], Acc) ->
lists:reverse(Acc).
```

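For comparison with `add_42_tail/2` above, a body-recursive version could look like this (a sketch; the name is ours):

```erlang
add_42_body([H|T]) ->
    [H + 42 | add_42_body(T)];
add_42_body([]) ->
    [].
```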
-In early version of Erlang the tail-recursive function would typically
+In early versions of Erlang the tail-recursive function would typically
be more efficient. In modern versions of Erlang, there is usually not
much difference in performance between a body-recursive list function and
tail-recursive function that reverses the list at the end. Therefore,
12 changes: 6 additions & 6 deletions system/doc/efficiency_guide/maps.md
@@ -45,7 +45,7 @@ The advantages of records compared to maps are:

- If the name of a record field is misspelled, there will be a compilation
error. If a map key is misspelled, the compiler will give no warning and
-program will fail in some way when it is run.
+the program will fail in some way when it is run.
- Records will use slightly less memory than maps, and performance is expected
to be _slightly_ better than maps in most circumstances.

@@ -67,7 +67,7 @@ module.
it.
- Always update the map using the `:=` operator (that is, requiring that an
element with that key already exists). The `:=` operator is slightly more
-efficient, and it helps catching mispellings of keys.
+efficient, and it helps catch misspellings of keys.
- Whenever possible, match multiple map elements at once.
- Whenever possible, update multiple map elements at once.
- Avoid default values and the `maps:get/3` function. If there are default
Expand Down Expand Up @@ -297,12 +297,12 @@ efficient than using the `=>` operator for a small map.

Here follows some notes about most of the functions in the `maps` module. For
each function, the implementation language (C or Erlang) is stated. The reason
-we mention the language is that it gives an hint about how efficient the
+we mention the language is that it gives a hint about how efficient the
function is:

- If a function is implemented in C, it is pretty much impossible to implement
the same functionality more efficiently in Erlang.
-- However, it might be possible to beat the `maps` modules functions implemented
+- However, it might be possible to beat the `maps` module's functions implemented
in Erlang, because they are generally implemented in a way that attempts to
make the performance reasonable for all possible inputs.

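Pulling together the earlier recommendations (update with `:=`, match several keys in one pattern), a sketch with module and key names of our own:

```erlang
-module(map_style).
-export([new/0, bump/1, describe/1]).

new() ->
    #{count => 0, label => "example"}.

%% ':=' requires the key to already exist, so a misspelled key fails fast.
bump(#{count := N} = State) ->
    State#{count := N + 1}.

%% Match multiple keys in one pattern instead of doing separate lookups.
describe(#{count := N, label := Label}) ->
    io_lib:format("~s: ~p", [Label, N]).
```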
@@ -433,12 +433,12 @@ that will call `maps:update/3` to update only the values that have changed.

`maps:merge/2` is implemented in C. For [small maps](maps.md#terminology), the
key tuple may be shared with any of the argument maps if that argument map
-contains all the keys. Literal key tuples are prefered if possible.
+contains all the keys. Literal key tuples are preferred if possible.

> #### Change {: .info }
>
> The sharing of key tuples by `maps:merge/2` was introduced in OTP 26.0. Older
-> versions always contructed a new key tuple on the callers heap.
+> versions always constructed a new key tuple on the caller's heap.

### maps:merge_with/3

18 changes: 9 additions & 9 deletions system/doc/efficiency_guide/profiling.md
@@ -27,18 +27,18 @@ Even experienced software developers often guess wrong about where the
performance bottlenecks are in their programs. Therefore, profile your program
to see where the performance bottlenecks are and concentrate on optimizing them.

-Erlang/OTP contains several tools to help finding bottlenecks:
+Erlang/OTP contains several tools to help find bottlenecks:

- `m:tprof` is a tracing profiler that can measure call count, call time, or
heap allocations per function call.
- `m:fprof` provides the most detailed information about where the program time
is spent, but it significantly slows down the program it profiles.
-- `m:dbg` is the generic erlang tracing frontend. By using the `timestamp` or
+- `m:dbg` is the generic Erlang tracing frontend. By using the `timestamp` or
`cpu_timestamp` options it can be used to time how long function calls in a
live system take.
- `m:lcnt` is used to find contention points in the Erlang Run-Time System's
internal locking mechanisms. It is useful when looking for bottlenecks in
-interaction between process, port, ETS tables, and other entities that can be
+interaction between processes, ports, ETS tables, and other entities that can be
run in parallel.

The tools are further described in [Tools](profiling.md#profiling_tools).
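As a taste of the first tool in the list, `tprof` (available from Erlang/OTP 27) can profile an ad-hoc fun. A sketch, assuming the `tprof:profile/2` ad-hoc entry point with a `type` option; see the `m:tprof` documentation for the exact API:

```erlang
%% Measure accumulated time per called function while the fun runs.
tprof:profile(fun() ->
                  lists:sort([rand:uniform(10000) || _ <- lists:seq(1, 10000)])
              end,
              #{type => call_time}).   %% other types: call_count, call_memory
```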
@@ -85,7 +85,7 @@ detailed breakdown of where memory is used.
Processes, ports, and ETS tables can then be inspected using their respective
information functions, that is,
[`process_info/2`](`m:erlang#process_info_memory`),
-[`erlang:port_info/2 `](`m:erlang#port_info_memory`), and `ets:info/1`.
+[`erlang:port_info/2`](`m:erlang#port_info_memory`), and `ets:info/1`.

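For example, in the shell (a sketch; the table name is ours):

```erlang
1> erlang:memory(total).                    %% bytes allocated by the whole node
2> erlang:process_info(self(), memory).     %% bytes used by one process
3> T = ets:new(example_tab, []).
4> ets:info(T, memory).                     %% table size, in words
```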
Sometimes the system can enter a state where the reported memory from
`erlang:memory(total)` is very different from the memory reported by
Expand Down Expand Up @@ -117,7 +117,7 @@ with more or less overhead.
variety of information about the running system.
- `m:etop` is a command line tool that can connect to remote nodes and display
information similar to what the UNIX tool top shows.
-- `m:msacc` allows the user to get a view of what the Erlang Run-Time system is
+- `m:msacc` allows the user to get a view of what the Erlang Run-Time System is
spending its time doing. Has a very low overhead, which makes it useful to run
in heavily loaded systems to get some idea of where to start doing more
granular profiling.
Expand Down Expand Up @@ -191,19 +191,19 @@ _Table: Tool Summary_

`dbg` is a generic Erlang trace tool. By using the `timestamp` or
`cpu_timestamp` options it can be used as a precision instrument to profile how
-long time a function call takes for a specific process. This can be very useful
+long a function call takes for a specific process. This can be very useful
when trying to understand where time is spent in a heavily loaded system as it
is possible to limit the scope of what is profiled to be very small. For more
information, see the `m:dbg` manual page in Runtime Tools.

### lcnt

-`lcnt` is used to profile interactions in between entities that run in parallel.
-For example if you have a process that all other processes in the system needs
+`lcnt` is used to profile interactions between entities that run in parallel.
+For example if you have a process that all other processes in the system need
to interact with (maybe it has some global configuration), then `lcnt` can be
used to figure out if the interaction with that process is a problem.

-In the Erlang Run-time System entities are only run in parallel when there are
+In the Erlang Run-Time System entities are only run in parallel when there are
multiple schedulers. Therefore `lcnt` will show more contention points (and thus
be more useful) on systems using many schedulers on many cores.

12 changes: 6 additions & 6 deletions system/doc/efficiency_guide/system_limits.md
@@ -32,7 +32,7 @@ see the [`+P`](`e:erts:erl_cmd.md#max_processes`) command-line flag
in the [`erl(1)`](`e:erts:erl_cmd.md`) manual page in ERTS.

- [](){: #unique_pids } **Unique Local Process Identifiers on a
-Runtime System Instance ** - On a 64 bit system at most `2⁶⁰ - 1`
+Runtime System Instance** - On a 64 bit system at most `2⁶⁰ - 1`
unique process identifiers can be created, and on a 32 bit system at most `2²⁸ - 1`.

- **Known nodes** - A remote node Y must be known to node X if there exists
@@ -61,7 +61,7 @@ In the 64-bit run-time system, the maximum size is 2,305,843,009,213,693,951 byt
If the limit is exceeded, bit syntax construction fails with a `system_limit`
exception, while any attempt to match a binary that is too large
fails. From Erlang/OTP 27, all other operations that create binaries (such as
-[`list_to_binary/1`](`list_to_binary/1`)) also enforces the same limit.
+[`list_to_binary/1`](`list_to_binary/1`)) also enforce the same limit.

- **Total amount of data allocated by an Erlang node** - The Erlang runtime system
can use the complete 32-bit (or 64-bit) address space, but the operating system
@@ -91,10 +91,10 @@ variable.

- [](){: #unique_references } **Unique References on a Runtime System Instance** -
Each scheduler thread has its own set of references, and all other threads have
-a shared set of references. Each set of references consist of `2⁶⁴ - 1`unique
+a shared set of references. Each set of references consists of `2⁶⁴ - 1` unique
references. That is, the total amount of unique references that can be produced
on a runtime system instance is `(NumSchedulers + 1) × (2⁶⁴ - 1)`. If a scheduler
-thread create a new reference each nano second, references will at earliest be
+thread creates a new reference each nanosecond, references will at earliest be
reused after more than 584 years. That is, for the foreseeable future they are
sufficiently unique.

@@ -109,11 +109,11 @@ the total amount of unique integers without the `monotonic`
the total amount of unique integers without the `monotonic`
modifier is `(NumSchedulers + 1) × (2⁶⁴ - 1)`.

-If a unique integer is created each nano second, unique integers will be
+If a unique integer is created each nanosecond, unique integers will be
reused at earliest after more than 584 years. That is, for the foreseeable future
they are sufficiently unique.

-- ** Timer resolution ** - On most systems, millisecond resolution. For more
+- **Timer resolution** - On most systems, millisecond resolution. For more
information, see the [*Timers*](`e:erts:time_correction.md#timers`) section of
the [*Time and Time Correction in Erlang*](`e:erts:time_correction.md`) ERTS
User's guide.
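Both kinds of unique integers described above are obtained through `erlang:unique_integer/0,1` (a shell sketch):

```erlang
1> erlang:unique_integer().                       %% per-scheduler set
2> erlang:unique_integer([monotonic]).            %% single global set, increasing
3> erlang:unique_integer([positive, monotonic]).
```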