diff --git a/system/doc/efficiency_guide/benchmarking.md b/system/doc/efficiency_guide/benchmarking.md index 38d214dafc06..bafe7bdf226b 100644 --- a/system/doc/efficiency_guide/benchmarking.md +++ b/system/doc/efficiency_guide/benchmarking.md @@ -43,10 +43,10 @@ crypto:strong_rand_bytes(2). 1 2286 Ki 437 ns 29% ``` From the **Time** column we can read out that on average a call to -[`rand:bytes(2)`](`rand:bytes/1`) executes in 128 nano seconds, while +[`rand:bytes(2)`](`rand:bytes/1`) executes in 128 nanoseconds, while a call to [`crypto:strong_rand_bytes(2)`](`crypto:strong_rand_bytes/1`) executes -in 437 nano seconds. +in 437 nanoseconds. From the **QPS** column we can read out how many calls that can be made in a second. For `rand:bytes(2)`, it is 7,784,000 calls per second. @@ -54,11 +54,11 @@ made in a second. For `rand:bytes(2)`, it is 7,784,000 calls per second. The **Rel** column shows the relative differences, with `100%` indicating the fastest code. -When generating two random bytes at the time, `rand:bytes/1` is more +When generating two random bytes at a time, `rand:bytes/1` is more than three times faster than `crypto:strong_rand_bytes/1`. Assuming that we really need strong random numbers and we need to get them as fast as possible, what can we do? One way could be to generate more -than two bytes at the time. +than two bytes at a time. ```text % erlperf 'rand:bytes(100).' 'crypto:strong_rand_bytes(100).' @@ -67,7 +67,7 @@ rand:bytes(100). 1 2124 Ki 470 ns 100% crypto:strong_rand_bytes(100). 1 1915 Ki 522 ns 90% ``` -`rand:bytes/1` is still faster when we generate 100 bytes at the time, +`rand:bytes/1` is still faster when we generate 100 bytes at a time, but the relative difference is smaller. ``` @@ -77,7 +77,7 @@ crypto:strong_rand_bytes(1000). 1 1518 Ki 658 ns 100% rand:bytes(1000). 
1 284 Ki 3521 ns 19% ``` -When we generate 1000 bytes at the time, `crypto:strong_rand_bytes/1` is +When we generate 1000 bytes at a time, `crypto:strong_rand_bytes/1` is now the fastest. ## Benchmarking using Erlang/OTP functionality diff --git a/system/doc/efficiency_guide/binaryhandling.md b/system/doc/efficiency_guide/binaryhandling.md index 4ff7afb3c278..8626252717f0 100644 --- a/system/doc/efficiency_guide/binaryhandling.md +++ b/system/doc/efficiency_guide/binaryhandling.md @@ -23,7 +23,7 @@ limitations under the License. This section gives a few examples on how to handle binaries in an efficient way. The sections that follow take an in-depth look at how binaries are implemented -and how to best take advantages of the optimizations done by the compiler and +and how to best take advantage of the optimizations done by the compiler and runtime system. Binaries can be efficiently _built_ in the following way: @@ -118,12 +118,12 @@ Four types of binary objects are available internally: > #### Change {: .info } > -> In Erlang/OTP 27, the handling of binaries and bitstrings were +> In Erlang/OTP 27, the handling of binaries and bitstrings was > rewritten. To fully leverage those changes in the run-time system, > the compiler needs to be updated, which is planned for a future > release. > -> Since, practically speaking, not much have changed from an efficiency +> Since, practically speaking, not much has changed from an efficiency > and optimization perspective, the following description has not yet > been updated to describe the implementation in Erlang/OTP 27. @@ -196,7 +196,7 @@ This optimization is applied by the runtime system in a way that makes it effective in most circumstances (for exceptions, see [Circumstances That Force Copying](binaryhandling.md#forced_copying)). The optimization in its basic form does not need any help from the compiler. 
-However, the compiler add hints to the runtime system when it is safe to apply +However, the compiler adds hints to the runtime system when it is safe to apply the optimization in a more efficient way. > #### Change {: .info } @@ -427,7 +427,7 @@ all_but_zeroes_to_list(<>, Acc, Remaining) -> The compiler removes building of sub binaries in the second and third clauses, and it adds an instruction to the first clause that converts `Buffer` from a -match context to a sub binary (or do nothing if `Buffer` is a binary already). +match context to a sub binary (or does nothing if `Buffer` is already a binary). But in more complicated code, how can one know whether the optimization is applied or not? diff --git a/system/doc/efficiency_guide/commoncaveats.md b/system/doc/efficiency_guide/commoncaveats.md index 2aa4c3ec79f5..84ed913b7f42 100644 --- a/system/doc/efficiency_guide/commoncaveats.md +++ b/system/doc/efficiency_guide/commoncaveats.md @@ -49,7 +49,7 @@ naive_reverse([]) -> As the `++` operator copies its left-hand side operand, the growing result is copied repeatedly, leading to quadratic complexity. -On the other hand, using `++` in loop like this is perfectly fine: +On the other hand, using `++` in a loop like this is perfectly fine: **OK** @@ -64,7 +64,7 @@ naive_but_ok_reverse([], Acc) -> ``` Each list element is copied only once. The growing result `Acc` is the right-hand -side operand, which it is _not_ copied. +side operand, which is _not_ copied. Experienced Erlang programmers would probably write as follows: @@ -167,14 +167,14 @@ the copied term can be many times larger than the original term. For example: ```erlang init2() -> SharedSubTerms = lists:foldl(fun(_, A) -> [A|A] end, [0], lists:seq(1, 15)), - #state{data=Shared}. + #state{data=SharedSubTerms}. ``` In the process that calls `init2/0`, the size of the `data` field in the `state` record will be 32 heap words. 
When the record is copied to the newly created process, sharing will be lost and the size of the copied `data` field will be 131070 heap words. More details about -[loss off sharing](eff_guide_processes.md#loss-of-sharing) are found in a later +[loss of sharing](eff_guide_processes.md#loss-of-sharing) are found in a later section. To avoid the problem, outside of the fun extract only the fields of the record diff --git a/system/doc/efficiency_guide/drivers.md b/system/doc/efficiency_guide/drivers.md index d5bbaad1424e..fe647bd3c1eb 100644 --- a/system/doc/efficiency_guide/drivers.md +++ b/system/doc/efficiency_guide/drivers.md @@ -30,8 +30,8 @@ It is assumed that you have a good understanding of drivers. The runtime system always takes a lock before running any code in a driver. By default, that lock is at the driver level, that is, if several ports have -been opened to the same driver, only code for one port at the same time can be -running. +been opened to the same driver, only code for one port can be running +at the same time. A driver can be configured to have one lock for each port instead. diff --git a/system/doc/efficiency_guide/eff_guide_functions.md b/system/doc/efficiency_guide/eff_guide_functions.md index a90b27b0e432..dc6aa857bdf9 100644 --- a/system/doc/efficiency_guide/eff_guide_functions.md +++ b/system/doc/efficiency_guide/eff_guide_functions.md @@ -23,7 +23,7 @@ limitations under the License. ## Pattern Matching -Pattern matching in function head as well as in `case` and `receive` clauses are +Pattern matching in function heads as well as in `case` and `receive` clauses is optimized by the compiler. With a few exceptions, there is nothing to gain by rearranging clauses. @@ -55,7 +55,7 @@ follows: single instruction that does a binary search; thus, quite efficient even if there are many values) to select which one of the first three clauses to execute (if any). 
-- If none of the first three clauses match, the fourth clause match as a +- If none of the first three clauses match, the fourth clause matches, as a variable always matches. - If the guard test [`is_integer(Int)`](`is_integer/1`) succeeds, the fourth clause is executed. diff --git a/system/doc/efficiency_guide/eff_guide_processes.md b/system/doc/efficiency_guide/eff_guide_processes.md index 050785a3a76f..cd59cb103c71 100644 --- a/system/doc/efficiency_guide/eff_guide_processes.md +++ b/system/doc/efficiency_guide/eff_guide_processes.md @@ -88,7 +88,7 @@ The default initial heap size of 233 words is quite conservative to support Erlang systems with hundreds of thousands or even millions of processes. The garbage collector grows and shrinks the heap as needed. -In a system that use comparatively few processes, performance _might_ be +In a system that uses comparatively few processes, performance _might_ be improved by increasing the minimum heap size using either the `+h` option for [erl](`e:erts:erl_cmd.md`) or on a process-per-process basis using the `min_heap_size` option for [spawn_opt/4](`erlang:spawn_opt/4`). @@ -291,7 +291,7 @@ BEAM code and persistent terms). The amount of virtual address space reserved for literals can be changed by using the [`+MIscs option`](`e:erts:erts_alloc.md#MIscs`) when starting the emulator. -Here is an example how the reserved virtual address space for literals can be +Here is an example of how the reserved virtual address space for literals can be raised to 2 GB (2048 MB): ```text @@ -381,7 +381,7 @@ multi-CPU computer by running several Erlang scheduler threads To gain performance from a multi-core computer, your application _must have more than one runnable Erlang process_ most of the time. Otherwise, the Erlang -emulator can still only run one Erlang process at the time. +emulator can still only run one Erlang process at a time. Benchmarks that appear to be concurrent are often sequential. 
For example, the [EStone diff --git a/system/doc/efficiency_guide/listhandling.md b/system/doc/efficiency_guide/listhandling.md index a66022b36107..d61edb732492 100644 --- a/system/doc/efficiency_guide/listhandling.md +++ b/system/doc/efficiency_guide/listhandling.md @@ -221,7 +221,7 @@ add_42_tail([], Acc) -> lists:reverse(Acc). ``` -In early version of Erlang the tail-recursive function would typically +In early versions of Erlang the tail-recursive function would typically be more efficient. In modern versions of Erlang, there is usually not much difference in performance between a body-recursive list function and tail-recursive function that reverses the list at the end. Therefore, diff --git a/system/doc/efficiency_guide/maps.md b/system/doc/efficiency_guide/maps.md index 333c78fc6ab6..f02fbdf5f7e2 100644 --- a/system/doc/efficiency_guide/maps.md +++ b/system/doc/efficiency_guide/maps.md @@ -45,7 +45,7 @@ The advantages of records compared to maps are: - If the name of a record field is misspelled, there will be a compilation error. If a map key is misspelled, the compiler will give no warning and - program will fail in some way when it is run. + the program will fail in some way when it is run. - Records will use slightly less memory than maps, and performance is expected to be _slightly_ better than maps in most circumstances. @@ -67,7 +67,7 @@ module. it. - Always update the map using the `:=` operator (that is, requiring that an element with that key already exists). The `:=` operator is slightly more - efficient, and it helps catching mispellings of keys. + efficient, and it helps catch misspellings of keys. - Whenever possible, match multiple map elements at once. - Whenever possible, update multiple map elements at once. - Avoid default values and the `maps:get/3` function. If there are default @@ -297,12 +297,12 @@ efficient than using the `=>` operator for a small map. Here follows some notes about most of the functions in the `maps` module. 
For each function, the implementation language (C or Erlang) is stated. The reason -we mention the language is that it gives an hint about how efficient the +we mention the language is that it gives a hint about how efficient the function is: - If a function is implemented in C, it is pretty much impossible to implement the same functionality more efficiently in Erlang. -- However, it might be possible to beat the `maps` modules functions implemented +- However, it might be possible to beat the `maps` module's functions implemented in Erlang, because they are generally implemented in a way that attempts to make the performance reasonable for all possible inputs. @@ -433,12 +433,12 @@ that will call `maps:update/3` to update only the values that have changed. `maps:merge/2` is implemented in C. For [small maps](maps.md#terminology), the key tuple may be shared with any of the argument maps if that argument map -contains all the keys. Literal key tuples are prefered if possible. +contains all the keys. Literal key tuples are preferred if possible. > #### Change {: .info } > > The sharing of key tuples by `maps:merge/2` was introduced in OTP 26.0. Older -> versions always contructed a new key tuple on the callers heap. +> versions always constructed a new key tuple on the caller's heap. ### maps:merge_with/3 diff --git a/system/doc/efficiency_guide/profiling.md b/system/doc/efficiency_guide/profiling.md index ccc8479699cd..e86e6a63bb69 100644 --- a/system/doc/efficiency_guide/profiling.md +++ b/system/doc/efficiency_guide/profiling.md @@ -27,18 +27,18 @@ Even experienced software developers often guess wrong about where the performance bottlenecks are in their programs. Therefore, profile your program to see where the performance bottlenecks are and concentrate on optimizing them. 
-Erlang/OTP contains several tools to help finding bottlenecks: +Erlang/OTP contains several tools to help find bottlenecks: - `m:tprof` is a tracing profiler that can measure call count, call time, or heap allocations per function call. - `m:fprof` provides the most detailed information about where the program time is spent, but it significantly slows down the program it profiles. -- `m:dbg` is the generic erlang tracing frontend. By using the `timestamp` or +- `m:dbg` is the generic Erlang tracing frontend. By using the `timestamp` or `cpu_timestamp` options it can be used to time how long function calls in a live system take. - `m:lcnt` is used to find contention points in the Erlang Run-Time System's internal locking mechanisms. It is useful when looking for bottlenecks in - interaction between process, port, ETS tables, and other entities that can be + interaction between processes, ports, ETS tables, and other entities that can be run in parallel. The tools are further described in [Tools](profiling.md#profiling_tools). @@ -85,7 +85,7 @@ detailed breakdown of where memory is used. Processes, ports, and ETS tables can then be inspected using their respective information functions, that is, [`process_info/2`](`m:erlang#process_info_memory`), -[`erlang:port_info/2 `](`m:erlang#port_info_memory`), and `ets:info/1`. +[`erlang:port_info/2`](`m:erlang#port_info_memory`), and `ets:info/1`. Sometimes the system can enter a state where the reported memory from `erlang:memory(total)` is very different from the memory reported by @@ -117,7 +117,7 @@ with more or less overhead. variety of information about the running system. - `m:etop` is a command line tool that can connect to remote nodes and display information similar to what the UNIX tool top shows. -- `m:msacc` allows the user to get a view of what the Erlang Run-Time system is +- `m:msacc` allows the user to get a view of what the Erlang Run-Time System is spending its time doing. 
Has a very low overhead, which makes it useful to run in heavily loaded systems to get some idea of where to start doing more granular profiling. @@ -191,19 +191,19 @@ _Table: Tool Summary_ `dbg` is a generic Erlang trace tool. By using the `timestamp` or `cpu_timestamp` options it can be used as a precision instrument to profile how -long time a function call takes for a specific process. This can be very useful +long a function call takes for a specific process. This can be very useful when trying to understand where time is spent in a heavily loaded system as it is possible to limit the scope of what is profiled to be very small. For more information, see the `m:dbg` manual page in Runtime Tools. ### lcnt -`lcnt` is used to profile interactions in between entities that run in parallel. -For example if you have a process that all other processes in the system needs +`lcnt` is used to profile interactions between entities that run in parallel. +For example, if you have a process that all other processes in the system need to interact with (maybe it has some global configuration), then `lcnt` can be used to figure out if the interaction with that process is a problem. -In the Erlang Run-time System entities are only run in parallel when there are +In the Erlang Run-Time System entities are only run in parallel when there are multiple schedulers. Therefore `lcnt` will show more contention points (and thus be more useful) on systems using many schedulers on many cores. diff --git a/system/doc/efficiency_guide/system_limits.md b/system/doc/efficiency_guide/system_limits.md index 99d1e7dce74d..04301728cbfb 100644 --- a/system/doc/efficiency_guide/system_limits.md +++ b/system/doc/efficiency_guide/system_limits.md @@ -32,7 +32,7 @@ see the [`+P`](`e:erts:erl_cmd.md#max_processes`) command-line flag in the [`erl(1)`](`e:erts:erl_cmd.md`) manual page in ERTS. 
- [](){: #unique_pids } **Unique Local Process Identifiers on a -Runtime System Instance ** - On a 64 bit system at most `2⁶⁰ - 1` +Runtime System Instance** - On a 64 bit system at most `2⁶⁰ - 1` unique process identifiers can be created, and on a 32 bit system at most `2²⁸ - 1`. - **Known nodes** - A remote node Y must be known to node X if there exists @@ -61,7 +61,7 @@ In the 64-bit run-time system, the maximum size is 2,305,843,009,213,693,951 byt If the limit is exceeded, bit syntax construction fails with a `system_limit` exception, while any attempt to match a binary that is too large fails. From Erlang/OTP 27, all other operations that create binaries (such as -[`list_to_binary/1`](`list_to_binary/1`)) also enforces the same limit. +[`list_to_binary/1`](`list_to_binary/1`)) also enforce the same limit. - **Total amount of data allocated by an Erlang node** - The Erlang runtime system can use the complete 32-bit (or 64-bit) address space, but the operating system @@ -91,10 +91,10 @@ variable. - [](){: #unique_references } **Unique References on a Runtime System Instance** - Each scheduler thread has its own set of references, and all other threads have -a shared set of references. Each set of references consist of `2⁶⁴ - 1`unique +a shared set of references. Each set of references consists of `2⁶⁴ - 1` unique references. That is, the total amount of unique references that can be produced on a runtime system instance is `(NumSchedulers + 1) × (2⁶⁴ - 1)`. If a scheduler -thread create a new reference each nano second, references will at earliest be +thread creates a new reference each nanosecond, references will at earliest be reused after more than 584 years. That is, for the foreseeable future they are sufficiently unique. @@ -109,11 +109,11 @@ sufficiently unique. the total amount of unique integers without the `monotonic` modifier is `(NumSchedulers + 1) × (2⁶⁴ - 1)`. 
- If a unique integer is created each nano second, unique integers will be + If a unique integer is created each nanosecond, unique integers will be reused at earliest after more than 584 years. That is, for the foreseeable future they are sufficiently unique. -- ** Timer resolution ** - On most systems, millisecond resolution. For more +- **Timer resolution** - On most systems, millisecond resolution. For more information, see the [*Timers*](`e:erts:time_correction.md#timers`) section of the [*Time and Time Correction in Erlang*](`e:erts:time_correction.md`) ERTS User's guide. diff --git a/system/doc/efficiency_guide/tablesdatabases.md b/system/doc/efficiency_guide/tablesdatabases.md index cc8a509e7ae4..fd78037dd8ab 100644 --- a/system/doc/efficiency_guide/tablesdatabases.md +++ b/system/doc/efficiency_guide/tablesdatabases.md @@ -101,7 +101,7 @@ print_person(PersonId) -> print_age(Person), print_occupation(Person); [] -> - io:format("No person with ID = ~p~n", [PersonID]) + io:format("No person with ID = ~p~n", [PersonId]) end. %%% Internal functions @@ -123,23 +123,23 @@ print_person(PersonId) -> %% Look up the person in the named table person, case ets:lookup(person, PersonId) of [Person] -> - print_name(PersonID), - print_age(PersonID), - print_occupation(PersonID); + print_name(PersonId), + print_age(PersonId), + print_occupation(PersonId); [] -> - io:format("No person with ID = ~p~n", [PersonID]) + io:format("No person with ID = ~p~n", [PersonId]) end. %%% Internal functions -print_name(PersonID) -> +print_name(PersonId) -> [Person] = ets:lookup(person, PersonId), io:format("No person ~p~n", [Person#person.name]). -print_age(PersonID) -> +print_age(PersonId) -> [Person] = ets:lookup(person, PersonId), io:format("No person ~p~n", [Person#person.age]). -print_occupation(PersonID) -> +print_occupation(PersonId) -> [Person] = ets:lookup(person, PersonId), io:format("No person ~p~n", [Person#person.occupation]). 
``` @@ -150,7 +150,7 @@ For non-persistent database storage, prefer Ets tables over Mnesia `local_content` tables. Even the Mnesia `dirty_write` operations carry a fixed overhead compared to Ets writes. Mnesia must check if the table is replicated or has indices, this involves at least one Ets lookup for each `dirty_write`. Thus, -Ets writes is always faster than Mnesia writes. +Ets writes are always faster than Mnesia writes. ### tab2list @@ -306,7 +306,7 @@ that the gain is significant when the key can be used to lookup elements. If you frequently do lookups on a field that is not the key of the table, you lose performance using [mnesia:select()](`mnesia:select/3`) or -[`mnesia:match_object()`](`mnesia:match_object/1`) as these function traverse +[`mnesia:match_object()`](`mnesia:match_object/1`) as these functions traverse the whole table. Instead, you can create a secondary index and use `mnesia:index_read/3` to get faster access at the expense of using more memory. @@ -331,7 +331,7 @@ PersonsAge42 = Using transactions is a way to guarantee that the distributed Mnesia database remains consistent, even when many different processes update it in parallel. -However, if you have real-time requirements it is recommended to use dirtry +However, if you have real-time requirements it is recommended to use dirty operations instead of transactions. When using dirty operations, you lose the consistency guarantee; this is usually solved by only letting one process update the table. Other processes must send update requests to that process.