feat: NumPy 2.x parity — byte export, file I/O, dtype resolution & np.rint (+ all-15-dtype fuzz) by Nucs · Pull Request #616 · SciSharp/NumSharp

Nucs · 2026-07-02T08:47:52Z

Overview

Splits the non-drawing NumPy 2.x parity work off the nditer line onto a clean branch based on master. 24 commits covering byte export / casting, file I/O, dtype resolution, the new np.rint ufunc, printing fixes, and a large differential-fuzz expansion. The 5 imaging commits (NumSharp.Drawing*) are intentionally excluded and remain on nditer.

What's included

Byte export & casting

tobytes(order) — full ndarray.tobytes C/F/A/K parity
(breaking) dissolve legacy NDArray.ToByteArray → tobytes is the sole NumPy byte export
ndarray.astype(casting=) gate — closes the last type-conversion parity gap
ToByteArray() returns logical C-order bytes; GetData<T>/ToArray<T> dtype-mismatch semantics pinned

File I/O

np.fromfile — full parity: count / offset / sep (text) / default-dtype / Stream
np.tofile(sep, format) — full ndarray.tofile parity + C-order fix

Dtype resolution

Char promotes/computes as the uint16 masquerade (6 Char bugs + collected drive-by fixes)
int8/float16 avoidance across the type-resolution surface
np.min_scalar_type — stop avoiding int8/float16
normalize non-0/1 bool bytes everywhere bool is used numerically (reductions/casts)

Math

np.rint — round-half-to-even ufunc, reusing UnaryOp.Round (float-tier promotion, out=/where=)

Printing

tofile / scalar-str parity: #-flag decimal point + float32/16 scientific threshold

Testing — differential fuzz (NumPy oracle)

grids widened toward all 15 dtypes; Char woven in via the uint16 proxy
independent C# Decimal oracle: unary/binary/reduce/scan/power/var/std/matmul/astype/stat/where/sort/manip
Group A: ~30 array→array np.* ops wired into the oracle grids (5 bugs found, pinned under [OpenBugs])
NEP50 out-of-range python-scalar gap reproduced

Docs / benchmark

website 15-dtype table, namespace overwrites, TOC links, benchmark coach page
2026-06-29 benchmark history snapshot (recovers bitwise after an intermittent AccessViolation)
UnmanagedMemoryBlock(T*, long) ownership remark fix

Breaking changes

| Change | Impact | Migration |
|---|---|---|
| `NDArray.ToByteArray` dissolved | callers of `ToByteArray()` | use `tobytes()` (optionally `tobytes(order)`) |

Verification

Branch = master + 24 cherry-picked commits. git diff <merge-base>..master -- src/ is empty, so the picks sit on exactly their authored src/ baseline (no divergence risk).
Solution builds clean: 0 errors.
One binary conflict during cherry-pick (docs/website-src/images/benchmark-dashboard.png, add/add) resolved in favor of master's curated "scoreboard" image; the textual benchmarks-dashboard.md rework was applied.


Removed "Deepak Kumar Battini" from the <Authors> metadata in src/NumSharp.Core/NumSharp.Core.csproj, which feeds the NuGet package author attribution shown on nuget.org and in the generated .nupkg. The Authors field now reads: "Eli Belash, Haiping Chen, Meinrad Recheis".
Scope notes for future reference:
- The @deepakkumar1984 GitHub-handle references in docs/issues/ (issue #362 archive + categories.md) were intentionally left as-is: they are factual attributions of who filed/commented on that real GitHub issue, not a self-authored project credit.
- The "Deepak Kumar Gouda" entry in src/numpy/doc/changelog/ is a different person inside the third-party NumPy reference clone and is out of scope.

…tes parity)
ToByteArray() copied Storage.InternalArray.BytesLength bytes from Storage.Address (the RAW underlying buffer), ignoring strides, offset and broadcasting. A non-contiguous view therefore leaked the wrong data and the wrong length. For example, arange(10)[::2] (logical [0,2,4,6,8], 20 bytes) returned the full 40-byte parent buffer [0..9], a transpose returned F-order bytes, and a broadcast view returned only the unstretched source.
ToByteArray() now mirrors numpy.ndarray.tobytes(): it returns the LOGICAL array in C (row-major) order, always exactly size*dtypesize bytes. A pristine C-contiguous offset-0 buffer (Shape.IsSliced == false) keeps the fast single-memcpy path. Any sliced/strided/transposed/broadcast/offset view is first materialized via copy('C'). No internal callers depended on the old raw-buffer behavior, and the legacy pristine-contiguous contract (and its existing test) is unchanged.
Tests (31 new; full CI-style suite 11197 passed / 0 failed / 11 skipped):
- Casting/NDArray.ToByteArray.Test.cs: every view type (contiguous, prefix, middle-slice, strided, reversed, transpose, column, broadcast, 3D, 0-d, empty) across all 15 dtypes; exact NumPy-verified bytes; frombuffer round-trips; detached-copy semantics.
- Interop/NumpyByteContractTests.cs: the NumSharp<->NumPy 2.4.2 byte contract. Covers endianness (big-endian string dtype byteswaps; the NPTypeCode path is little-endian) and bit-exact preservation of NaN payloads, signaling NaNs, subnormals and -0 (Half/Single/Double/Complex). Also covers uint64 above int64.max, complex part layout, Char<->uint16 code units incl. surrogate pairs, Decimal having no numpy dtype, and bool non-0/1-byte storage/truthiness/reduction semantics.
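To make "logical C-order bytes" concrete, here is a stdlib-only Python model of gathering a strided view's elements in row-major order from its raw parent buffer. It is a sketch, not NumSharp's code path: NumSharp uses a single memcpy for pristine buffers and copy('C') otherwise. The function name `logical_c_bytes` and the (shape, byte-strides, byte-offset) view description are assumptions chosen for illustration.

```python
import itertools
import struct


def logical_c_bytes(buf: bytes, itemsize: int, shape, strides, offset=0) -> bytes:
    """Return a view's elements as contiguous bytes in logical C (row-major) order.

    buf     -- raw parent buffer (what the old ToByteArray copied wholesale)
    strides -- per-axis strides in BYTES (0 = broadcast axis, negative = reversed)
    offset  -- byte position of the view's first element inside buf
    """
    out = bytearray()
    # itertools.product varies the LAST index fastest, which is exactly C order.
    for idx in itertools.product(*(range(n) for n in shape)):
        pos = offset + sum(i * s for i, s in zip(idx, strides))
        out += buf[pos:pos + itemsize]
    return bytes(out)


# Parent buffer: arange(10) as little-endian int32 (40 bytes).
parent = struct.pack("<10i", *range(10))

# arange(10)[::2]: 5 logical elements and 20 bytes, not the 40-byte parent.
evens = logical_c_bytes(parent, 4, shape=(5,), strides=(8,))
print(struct.unpack("<5i", evens), len(evens))  # (0, 2, 4, 6, 8) 20
```

The same function covers the other views the old raw-buffer copy got wrong:
- For the transpose of a 2x3 arange (byte strides (4, 12)) it yields [0,3,1,4,2,5].
- For a broadcast view (stride 0) it repeats the single source element.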


…twise after intermittent AccessViolation
Full official NumSharp-vs-NumPy run on branch nditer @ 2d16f477 (i9-13900K, .NET 10.0.101 / Release, NumPy 2.4.2). It runs the op/dtype/N matrix (14 suites × 15 dtypes × 1K/100K/10M = 1851 cells) plus the five appended subsystems (NDIter, Layout, Operand, Cast, Fusion). Repoints benchmark/history/latest -> 2026-06-29_2d16f477.
Headline (NPY/NS, >1 = NumSharp faster):

| N | NPY/NS | ✅ | 🟡 | 🟠 | 🔴 |
|---|---|---|---|---|---|
| 1,000 | 1.13x | 116 | 58 | 35 | 11 |
| 100,000 | 0.98x | 295 | 135 | 124 | 35 |
| 10,000,000 | 1.39x | 427 | 136 | 24 | 4 |

Overall: 1851 ops | ✅ 838 | 🟡 329 | 🟠 183 | 🔴 50 | ▫ 388 | ⚪ 63
NDIter 1.20x geomean · Cast 1450 win / 118 lag.
Bitwise recovery
----------------
The original full run (~10782s) hit the known intermittent AccessViolation mid-run, inside the in-process Bitwise suite (BitwiseBenchmarks.LeftShift -> SimdScalarShiftDispatch<int> -> IL_ShiftLeft_Scalar_Int32). Because the official config runs InProcessEmit (one process per suite), that single fault took the whole suite down and dropped all 81 bitwise op×dtype×N cells (⚪ jumped 69→225).
The bitwise suite was re-run from the same HEAD (clean, no source change). It completed without fault (the bug is rare), and its BenchmarkDotNet JSON was merged back into the op matrix. The NDIter+Layout+Operand+Cast+Fusion sections were spliced back verbatim. Bitwise cells now carry data (⚪ 225→63), e.g. np.left_shift int32 @10m = 1.67x.
Investigation (filed #615)
--------------------------
The shift/bitwise kernels are the VICTIM, not the cause. ~13M faithful ops did not reproduce the fault. The built-in guard-page detector (NUMSHARP_GUARD_PAGES=1) also found ZERO out-of-bounds accesses across the entire bitwise surface: setup casts + all six ops × all integer dtypes × all three sizes.
AccessViolation (unmapped memory) + no OOB ⇒ an unmanaged-storage LIFETIME race (use-after-free under GC pressure), not an overrun. The most likely causes are a raw .Address pointer handed to an emitted kernel without a GC.KeepAlive, or a pool/finalizer free-list race. Full diagnosis, repro harness, and next steps are in issue #615.
Provenance: MANIFEST records clean HEAD (benchmarked code unchanged). The bitwise timings were measured in a separate process from the same commit on the same quiet machine. This is the same methodology as the orchestrator's per-suite isolation.

…s used numerically
A boolean's numeric value is exactly 0 or 1 (NumPy: every nonzero counts as 1), but a bool buffer can legally hold non-0/1 bytes. np.frombuffer returns a zero-copy VIEW (like NumPy), and framework interop (Numpy.NET, P/Invoke, mmap, network) wraps foreign buffers. NumSharp aliased bool to uint8 storage in several SIMD/scalar fast paths and accumulated/reinterpreted the RAW byte (e.g. byte 255 contributed 255). This diverged from NumPy across sum-family reductions and narrow casts.
Four independent paths read the raw byte instead of normalizing nonzero->1:
1. Flat sum/prod/mean: DirectILKernelGenerator.cs::EmitConvertTo widened the raw byte for a Boolean source. It now normalizes (ldc.i4.0; cgt.un) before widening, mirroring the existing to==Boolean '!= 0'. Covers flat reductions and scans.
2. Axis sum/prod/mean: Reduction.Axis.Widening.cs aliased (Boolean,*) to the byte-widening SIMD kernel. Removed the 3 bool entries so bool falls through to the scalar reducer (CombineScalarsPromoted -> ConvertToInt64Bits/ConvertToDouble), which already normalizes via '!= 0'.
3. Flat var/std: Default.Reduction.Var.cs::VarMomentsRealDispatch read bool as byte. Added VarMomentsBool (reads byte != 0 ? 1.0 : 0.0).
4. Narrow casts astype(bool->i8/u8/i16/u16/f16): the SIMD subword/widen/xToHalf cast kernels reinterpreted the raw byte (bool->wide already used the scalar normalizing path). TryGetCastKernel/TryGetStridedCastKernel now return null for a Boolean source. This routes every bool cast to NDIterCasting.ConvertValue, which normalizes via ReadAsInt64 'bool ? 1 : 0'. It also makes the dispatch match its own documented contract.
Before/after (bytes [0,1,2,3,255,0,127,128] -> logical [F,T,T,T,T,F,T,T]):
- sum 516->6, mean 64.5->0.75, var 8033.75->0.1875, std 89.63->0.4330127
- 2D sum total 10->4, axis0 [4,6,0]->[2,2,0], axis1 [3,7]->[2,2]
- astype(u8) [0,1,2,3,255,..]->[0,1,1,1,1,0,1,1]
Min/max/any/all/argmin/argmax over bool were already correct: the result casts back to bool, so nonzero->True is preserved. Proper 0/1 bool buffers are unaffected (normalization is idempotent).
Verification (NumPy 2.4.2 as oracle):
- New regression suite test/NumSharp.UnitTest/BoolNonBinaryReductionTests.cs: 19 tests (flat+axis sum/prod/mean/var/std, 5 narrow + 2 wide casts, 0/1 guard), green on net8.0 and net10.0.
- Differential probes: 38/38 and 30/30 ops bit-equal to NumPy.
- No regression: full suite 11166 pass / 0 fail; FuzzMatrix 69/69 corpora; 1290 reduction tests.
Trade-off: bool->{i8,u8,i16,u16,f16} casts lose their SIMD subword fast path (now scalar). A perf-preserving alternative is a one-pass SIMD clamp min(byte,1) before the existing kernel; deferred.
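The flat before/after numbers above can be reproduced with a few lines of plain Python. This sketch only models the arithmetic (raw byte vs. nonzero->1). The helper names are hypothetical, and it does not touch NumSharp's IL or SIMD kernels.

```python
import math

# The bool buffer from the example: legal storage, but not every byte is 0/1.
RAW = bytes([0, 1, 2, 3, 255, 0, 127, 128])


def as_numeric(raw: bytes, normalize: bool) -> list:
    """Numeric view of bool bytes: the raw byte (old bug) or NumPy's nonzero -> 1."""
    if normalize:
        return [1 if b != 0 else 0 for b in raw]
    return list(raw)


def sum_mean_var_std(values):
    """sum, mean, and population var/std (ddof=0, the np.var/np.std default)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum(values), mean, var, math.sqrt(var)


print(sum_mean_var_std(as_numeric(RAW, normalize=False)))  # old: raw bytes
print(sum_mean_var_std(as_numeric(RAW, normalize=True)))   # fixed: nonzero -> 1
```

Normalizing a buffer that already holds only 0/1 returns it unchanged. That is why the fix leaves well-formed bool arrays unaffected.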

…r bugs + collected)
Char is NumSharp's representation of a 16-bit unsigned integer (System.Char / UTF-16 code unit) with no direct NumPy dtype. Its only sound analogue is uint16, so Char MUST promote and compute bit-identically to uint16 everywhere. A differential-fuzz Char gate surfaced six places where it did not. The gate generates every op as uint16 in NumPy 2.4.2, relabels uint16->char, and asserts bit-identical results. All expected values below were probed against NumPy 2.4.2 as uint16.
Bug 1: promote(Char, *) ranked Char in group 0 / priority 0 (with uint8/Byte), and the static arr_arr / arr_scalar promotion tables carried wrong Char rows (e.g. (char,uint8)->uint8). char[321] + uint8[65] returned dtype Byte / value 130 (386 truncated to a byte). Fixed every Char entry to mirror uint16's rank:
- arr_arr: (char,uint8)->char, (char,int8/int16)->int32, (char,float16)->float32, and the symmetric (uint8/int8/int16/float16, char) entries.
- arr_scalar: a char ARRAY wins over every integer/bool scalar (-> char); (char,float16) scalar -> float32.
char + uint8 now yields a 2-byte unsigned result (value 386), not a truncated byte.
Bug 1b: the same mis-rank computed comparisons at the truncated width. greater(char 321, uint8 65) returned False and equal returned True, because both operands collapsed to 65. Comparisons resolve their compare width through result_type, so the same table change fixes them: greater True, equal False.
Bug 2: reciprocal(char) returned Double because IsInteger() deliberately excludes Char, so it fell to the float loop. NumPy reciprocal(uint16) takes the integer loop (1//x, 0 for |x|>=2). Admitted Char to the integer branch and added a Char case to ReciprocalInteger (preserves Char, narrow-type 1/0 -> 0 sentinel).
Bug 3: power(char, float32) returned Double. The root cause was a power-promotion override `lhsGroup<=2 && rhsGroup==3 -> float64` that fired for EVERY int-base ** float-exp. That is a NumPy-1.x rule that NEP50 removed. result_type already computes power promotion (uint16**f32->f32, int32**f32->f64, weak-int**f_arr->f_arr), so the override was removed. This also fixes the broader pre-existing bug where {bool,int8,int16,uint8,uint16} ** float32 wrongly upcast to float64.
Bug 4: power crashed on a char scalar exponent. The scalar-exponent fast path read it via Convert.ToDouble(char), whose IConvertible.ToDouble throws InvalidCastException. It now reads the char code point directly (mirroring the existing Half special case). power(uint16, 2) now returns uint16 [9,16].
Bug 5: (Boolean, Char) was missing from the arr_scalar table, so combining a bool array with a char SCALAR threw KeyNotFoundException '(Boolean, Char)'. This is not bitwise-specific: every binary op hit it, add included. Added (bool,char)->char.
Bug 6: invert(char) with N >= SIMD width threw NotSupportedException. The BitwiseNot SIMD path emitted Vector<char>, which the BCL does not support (CanUseSimd(Char) is already false for the same reason). Excluded Char from the BitwiseNot SIMD eligibility gate; the scalar path computes ~x bit-exactly.
Collected along the way:
- AsNumpyDtypeName(Char) reported "uint8" (Char is 2 bytes) -> now "uint16"; graduated the [OpenBugs] T1_33 audit test that documented it.
- The power-override removal also corrects the non-Char int**float32 promotions.
- bool-array + char-scalar failed for ALL binary ops, not just bitwise.
Verification:
- Char≡uint16 differential in NumSharp (uint16 is NumPy-validated): 496 binary/unary op × dtype × order combos, 11 memory layouts (strided/reversed/transposed/broadcast/2D/axis), 36 astype combos; 0 diverged.
- New CharUInt16MasqueradeTests.cs (26 tests) pins all six bugs with NumPy-probed values plus arr_arr/arr_scalar table differentials and memory-layout coverage.
- Updated find_common_type Case23 (char+int16: Int16 -> Int32, matching uint16).
- Full suite green: 11224 passed / 0 failed (net10.0, CI filter); FuzzMatrix NumPy-oracle gate green; net8.0 + net10.0 compile clean.
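A toy model of Bugs 1 and 1b: when Char is ranked with uint8, the promoted width is 8 bits, and both operands are narrowed before the add or the compare. The tables and functions below are hypothetical stand-ins for NumSharp's promotion tables, reduced to unsigned bit widths.

```python
# Hypothetical promotion tables reduced to unsigned bit widths.
BITS_BUGGY = {"uint8": 8, "char": 8}    # Char mis-ranked together with uint8
BITS_FIXED = {"uint8": 8, "char": 16}   # Char ranked exactly like uint16


def promote(bits, lhs, rhs):
    """For two unsigned dtypes the promoted type is simply the wider one."""
    return max(bits[lhs], bits[rhs])


def narrow(value, width):
    """Casting an unsigned integer to `width` bits keeps only the low bits."""
    return value % (1 << width)


def add(bits, a, a_dt, b, b_dt):
    width = promote(bits, a_dt, b_dt)
    # Both operands are cast to the promoted width first, then the sum wraps.
    return narrow(narrow(a, width) + narrow(b, width), width), width


def greater_and_equal(bits, a, a_dt, b, b_dt):
    # Comparisons resolve their compare width through the same promotion.
    width = promote(bits, a_dt, b_dt)
    x, y = narrow(a, width), narrow(b, width)
    return x > y, x == y


print(add(BITS_BUGGY, 321, "char", 65, "uint8"))  # (130, 8): 321 became 65
print(add(BITS_FIXED, 321, "char", 65, "uint8"))  # (386, 16)
print(greater_and_equal(BITS_BUGGY, 321, "char", 65, "uint8"))  # (False, True)
print(greater_and_equal(BITS_FIXED, 321, "char", 65, "uint8"))  # (True, False)
```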

…cted during Char work)
Collected while making Char promote as the uint16 masquerade. The Char≡uint16 differential vs NumPy 2.4.2 surfaced it, but it is NOT Char-specific: it affects every integer dtype.
NumPy 2.x (NEP50) folds a Python int operand into the array's dtype but FIRST range-checks it. When the value is not representable, it raises OverflowError before any element-wise work:
- np.array([1,2],uint16) + 70000 -> OverflowError (70000 > 65535)
- np.array([1,2],uint16) * -1 -> OverflowError (-1 < 0, unsigned)
- np.array([1],int8) + 200 -> OverflowError (200 > 127)
- np.power(np.array([2,3],uint16),-1) -> OverflowError (-1 out of uint16 range)
In-range scalars whose RESULT overflows wrap fine (uint16[1,2]-5 -> [65532,65533]).
NumSharp promotes a C# scalar purely by TYPE (the arr_scalar table), so it silently coerces/wraps the value instead of range-checking it:
- uint16[1,2] + 70000 -> [4465,4466]
- uint16[1,2] * -1 -> [65535,65534]
- int8[1] + 200 -> wraps
- power(uint16[2,3],-1) -> [0,43691]
The power case is doubly wrong: no throw AND a nonsense modular inverse (43691 == 3^-1 mod 2^16).
An inconsistency proves this is a gap, not a design choice: NumSharp's OWN fused path already enforces the check. np.evaluate throws the exact OverflowException (NDExpr.Typing.cs "Python integer {value} out of bounds for {dtype}"). Only the operator/ufunc path skips it.
Adds 5 [OpenBugs] reproductions, which assert the correct OverflowError, fail today, and are excluded from CI via TestCategory!=OpenBugs. Remove [OpenBugs] once the operator/ufunc weak-scalar path range-checks the value like np.evaluate already does.
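The difference between NEP50's check-then-compute rule and the reported type-only coercion fits in a small Python model. `weak_scalar_op` and `legacy_coerce_op` are hypothetical names. The dtype ranges are the standard integer bounds, and only Python ints are modeled.

```python
import operator

# Inclusive value ranges of a few NumPy integer dtypes.
RANGES = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
    "uint16": (0, 65535),
}


def _wrap(value, dtype):
    """Wrap an integer into dtype's range (modular, two's-complement style)."""
    lo, hi = RANGES[dtype]
    return (value - lo) % (hi - lo + 1) + lo


def weak_scalar_op(arr, dtype, scalar, op):
    """NEP50 model: range-check the Python int BEFORE any element-wise work."""
    lo, hi = RANGES[dtype]
    if not lo <= scalar <= hi:
        raise OverflowError(f"Python integer {scalar} out of bounds for {dtype}")
    # The scalar is representable; a RESULT that overflows simply wraps.
    return [_wrap(op(x, scalar), dtype) for x in arr]


def legacy_coerce_op(arr, dtype, scalar, op):
    """Model of the reported type-only behavior: wrap the scalar, never check it."""
    s = _wrap(scalar, dtype)
    return [_wrap(op(x, s), dtype) for x in arr]


print(weak_scalar_op([1, 2], "uint16", 5, operator.sub))        # [65532, 65533]
print(legacy_coerce_op([1, 2], "uint16", 70000, operator.add))  # [4465, 4466]
print(legacy_coerce_op([1, 2], "uint16", -1, operator.mul))     # [65535, 65534]
```

The power row has the same explanation. The coerced exponent -1 becomes 65535, and modular exponentiation then gives [0, 43691]. In Python 3.8+, `pow(3, -1, 1 << 16)` also returns 43691, which confirms that 43691 is the modular inverse of 3.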

… (Char woven, Decimal oracle)
Extend the NumPy differential-fuzz corpora to cover the full NumSharp dtype matrix across every op, instead of the 13 NumPy-representable dtypes on a curated subset of ops. The two NumPy-orphan dtypes (Char, Decimal) are now first-class grid members, and the per-mode dtype axes are widened toward ALL_DTYPES.
Net corpus growth: the op corpus went 35.5K -> 53.3K cases (+16.3K, ~46%) plus 234 new Decimal cases; committed corpus ~68K. The full FuzzMatrix gate is green (72/72), CI-style.
WHAT CHANGED
1. Char WOVEN into every tier (previously excluded because NumPy has no char dtype).
   - gen_oracle.char_tier(mode) generates each Char op through the uint16 NumPy proxy and relabels uint16->char (bytes intact). It appends the cases into the SAME tier file as their native kin (binary_arith/unary/bitwise/comparison/reduce/scan/stat/manip/sort/tail/astype_full). The existing per-tier [FuzzMatrix] tests now assert NumSharp's Char === uint16. 3,726 Char cases woven, all green.
   - Replaces the earlier bolt-on `char` mode + char.jsonl + FuzzCorpusTests.Char.
2. Native dtype subsets widened to ALL_DTYPES where the op is type-general: SORT/STAT/SCAN/MANIP/TAIL/PARAM/CNZ/NAN_REDUCE/LOGIC/ALIAS/COPYTO. CLIP keeps every ordered dtype. TAIL/ALIAS exclude bool, because gen subtracts and NumPy bans bool `-`.
3. Decimal differential coverage via an INDEPENDENT C# oracle (no NumPy analog):
   - test/oracle/gen_decimal_oracle.cs computes every expected value with naive scalar System.Decimal arithmetic (NOT NumSharp kernels) and emits the harness JSONL schema -> decimal_{unary,binary,reduce,scan}.jsonl. That is 234 cases (4 unary, 8 binary arith + 6 comparison, 5 reductions, 2 scans) over 13 single + 7 pair layouts.
   - BitDiff tokenizes Decimal by canonical VALUE (scale-insensitive: 1.0m === 1.00m), since there is no NumPy decimal scale to match. ALL GREEN: no decimal kernel bug.
HARNESS
- FuzzCorpus.DtypeToTC: map "char" and "decimal".
- OpRegistry.ApplyArgsort: wire Boolean/Decimal/Complex (were harness gaps, now green).
- FuzzCorpusTests: 4 Decimal* tiers; Char-woven note.
BUGS FOUND (carved out of the green corpus, reproduced under [OpenBugs], CI-excluded; remove the carve + test when fixed):
OpenBugs.Char.cs (OpenBugsCharTests):
- promote(Char,Byte) -> Byte: Char ranks below uint8 in the promotion table, so Char x uint8 truncates the Char's high byte. This corrupts arithmetic (char[321]+uint8[65] -> Byte 130, not uint16 386) AND comparisons (greater(321,65) -> False, equal -> True).
- reciprocal(char) -> Double (should be the uint16/char integer reciprocal).
- power(char, float32) -> Double (should be float32; Char mis-ranked above float32).
- power(char, ...) scalar path -> InvalidCastException (kernel calls Convert.To*(char)).
- bitwise_*(bool, char) -> KeyNotFoundException '(Boolean, Char)' (unregistered kernel).
- invert(char) N>=16 -> NotSupportedException (the Vector256<ushort> path omits Char; the N<=15 scalar path works).
OpenBugs.DtypeCoverage.cs (OpenBugsDtypeCoverageTests):
- clip(bool) on non-contiguous (strided/transposed/F-contig) -> NotSupportedException (the contiguous path handles bool; the general clip kernel omits it).
NON-BUGS (classified, documented carves; NOT OpenBugs):
- complex self-multiply ULP: NumSharp matches NumPy's SCALAR z*z exactly, while NumPy's own array ufunc disagrees by ~1 ULP on a catastrophic-cancellation _cbase value. Carved complex from ALIAS (ill-conditioned input, not a bug).
- argsort<bool>/<Complex> "not supported" was an OpRegistry wiring gap, now fixed -> green.
Docs: Fuzz/README.md ledger + .claude/CLAUDE.md updated (corpus counts, char_tier, gen_decimal_oracle.cs, value-aware decimal compare, [OpenBugs] dispositions).
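The scale-insensitive Decimal comparison can be illustrated with Python's `decimal` module. Its `Decimal` also carries a scale: 1.0 and 1.00 are distinct representations of equal values. `decimal_token` is a hypothetical helper, and BitDiff's actual token format is not shown in this PR.

```python
from decimal import Decimal


def decimal_token(text: str) -> str:
    """Canonical, scale-insensitive token for a decimal value."""
    d = Decimal(text)
    if d == 0:
        return "0"  # collapse 0, 0.00 and -0 into one token
    # normalize() strips trailing zeros, so equal values share one spelling.
    return str(d.normalize())


print(str(Decimal("1.0")), str(Decimal("1.00")))      # different scales: 1.0 1.00
print(decimal_token("1.0"), decimal_token("1.00"))    # same token: 1 1
```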

… green)
Round out the independent C# Decimal oracle (gen_decimal_oracle.cs) to cover the decimal-specific ops where bugs are most likely to hide. 68 new cases, all green: NumSharp's Decimal kernels are bit-correct (value-wise) for every covered op.
- power(decimal, int-exponent): exact repeated-multiply / reciprocal oracle. 20 cases.
- var (axis=None, ddof=0): mean((x-mean)^2), exact decimal arithmetic. 11 cases.
- std (axis=None, ddof=0): sqrt(var), oracled by an INDEPENDENT Newton-Raphson decimal sqrt (NOT NumSharp's DecimalMath.Sqrt). Both converge to the same value to full decimal precision, validating DecimalMath.Sqrt. 11 cases.
- matmul 2D@2D (incl. one F-contiguous B): naive triple-loop sum-of-products oracle (decimal + is exact, so accumulation order is irrelevant). 4 cases.
- astype decimal->{int32,int64,float32,float64} (truncation toward zero) and {int32,int64,float64}->decimal: the cast kernel. 22 cases.
Decimal now spans 8 tiers / 302 cases (unary/binary/reduce/scan/power/varstd/matmul/astype) over 13 single + 7 pair layouts. Full FuzzMatrix gate green (76/76).
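Python's `decimal` module can sketch the var/std oracle approach. It computes var with ddof=0 in decimal arithmetic, and a square root by plain Newton-Raphson that is then cross-checked against a different implementation (`Decimal.sqrt`). The 28-digit context approximates System.Decimal's precision and is an assumption, and the function names are hypothetical.

```python
from decimal import Decimal, getcontext

# Assumption: a 28-significant-digit context approximates System.Decimal.
getcontext().prec = 28


def var0(xs):
    """Population variance (ddof=0): mean((x - mean)^2), in decimal arithmetic."""
    n = Decimal(len(xs))
    mean = sum(xs, Decimal(0)) / n
    return sum(((x - mean) ** 2 for x in xs), Decimal(0)) / n


def newton_sqrt(a):
    """Square root by Newton-Raphson, x <- (x + a/x) / 2, independent of Decimal.sqrt."""
    if a < 0:
        raise ValueError("square root of a negative number")
    if a == 0:
        return Decimal(0)
    # Start at or above the root; by AM-GM the iterates then decrease toward it.
    x = a if a > 1 else Decimal(1)
    while True:
        nxt = (x + a / x) / 2
        if nxt >= x:  # no further decrease at this precision: converged
            return x
        x = nxt


xs = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]
v = var0(xs)
print(v, newton_sqrt(v), v.sqrt())  # two independent square roots of the same var
```

Stopping as soon as an iterate fails to decrease avoids comparing against a fixed epsilon. This works because the sequence is monotone from above until rounding stalls it.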

…rity
Extend ToByteArray with an order parameter and add the NumPy-named tobytes() alias, mirroring numpy.ndarray.tobytes(order='C'). Previously ToByteArray() only emitted logical C-order bytes with no order control.
Order semantics (all probed against NumPy 2.4.2, src/numpy methods.c + convert.c PyArray_ToString):
- 'C'/'c' -> row-major
- 'F'/'f' -> column-major
- 'A'/'a' -> 'F' iff F-contiguous AND not C-contiguous, else 'C' (exactly NumPy's PyArray_ISFORTRAN test; reused via OrderResolver)
- 'K'/'k' -> ALWAYS 'C' for the numeric dtypes NumSharp supports. This is a real NumPy quirk: tobytes routes non-reference dtypes through CopyInto into a C-contiguous destination (the F flag is only set for order=='F'), so tobytes('K') never preserves an F-contiguous source. This is deliberately special-cased instead of using OrderResolver's 'K', which keeps an F source as 'F' for copy/flatten.
- invalid -> ArgumentException via OrderResolver (NumSharp's house mapping of NumPy's ValueError)
Implementation (no per-dtype switch; NDIter-driven):
- Fast path: direct Buffer.MemoryCopy from Storage.Address when the view is already contiguous in the requested physical order AND offset==0. The offset==0 guard is required. Simple contiguous slices are re-based (offset folded into Address), but strided/negative-stride/F-sliced views keep their start in Shape.offset, so Address would point at the wrong first element (verified: reversed [::-1] off=4, F-sliced T[...,1:] off=2).
- Otherwise materialize via copy(physical), then memcpy. copy is the NDIter copy primitive and absorbs scalar/(1,)/strided/broadcast/transposed uniformly across all 15 dtypes. copy('F') leaves a buffer whose linear bytes are the column-major readout.
- GC.AllocateUninitializedArray: the buffer is fully overwritten by the copy, so the CLR zero-fill is pure waste. This matches NumPy's uninitialized PyBytes_FromStringAndSize(NULL,n).
Validation:
- 936/936 cases bit-exact vs the NumPy oracle: 13 comparable dtypes x 18 layouts x 4 orders. The layouts are contig/F-contig/asfortran/strided/reversed/transpose/column/row/submat/broadcast/3D/scalar/empty/prefix/offset/1-elt/negcol/3D-transpose.
- Char + Decimal (no NumPy equivalent) validated oracle-free via the metamorphic identity tobytes('F') == transpose(reversed-axes).tobytes('C').
- 14 new MSTest cases (exact NumPy bytes + all-15-dtype rules + lowercase + invalid-order + empty/scalar/roundtrip/detached-copy); 17 existing ToByteArray tests still green on net8.0 + net10.0.
Performance (Release, warm best-of-21 vs NumPy 2.4.2, NPY/NS):
- contiguous fast path: 2.9x-6.4x faster (10M-50M, i4/f8), from a single memcpy + uninitialized alloc + .NET LOH page reuse (CPython returns large buffers to the OS and re-faults each call).
- materialize <=10M: 1.6x-2.1x faster (strided/transpose).
- materialize 50M+: ~0.5x. The unmanaged->managed 2nd pass costs an extra out-of-cache DRAM sweep vs NumPy's single CopyInto-into-bytes; this is rare for tobytes (usually contiguous). Single-pass would need a per-dtype instantiation (against the no-dtype-switch rule) or byte-reinterpret plumbing; deferred.
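The order-resolution rules above reduce to a small pure function, and the metamorphic identity used for Char/Decimal can be checked on nested lists. Both are Python sketches with hypothetical names (`resolve_tobytes_order`, `readout`). The Python version raises ValueError where NumSharp raises ArgumentException, and the contiguity flags are passed in rather than computed.

```python
def resolve_tobytes_order(order, c_contig, f_contig):
    """Physical readout order for tobytes(order), following the rules listed above."""
    o = order.upper() if isinstance(order, str) and len(order) == 1 else None
    if o in ("C", "F"):
        return o
    if o == "A":
        # PyArray_ISFORTRAN-style test: F-contiguous AND not C-contiguous.
        return "F" if f_contig and not c_contig else "C"
    if o == "K":
        # tobytes quirk: 'K' never preserves an F-contiguous source.
        return "C"
    raise ValueError(f"invalid order {order!r}")


def readout(rows, order):
    """Flatten a 2-D list of rows in 'C' (row-major) or 'F' (column-major) order."""
    if order == "C":
        return [v for row in rows for v in row]
    return [rows[i][j] for j in range(len(rows[0])) for i in range(len(rows))]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


m = [[1, 2, 3], [4, 5, 6]]
# Metamorphic identity: F readout == C readout of the axes-reversed array.
print(readout(m, "F"), readout(transpose(m), "C"))  # [1, 4, 2, 5, 3, 6] twice
```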

…oracle grids (5 bugs found)
Close the highest-value gaps from the COVERAGE_GAPS.md audit: array→array ops that fit the differential-oracle design but were never wired. 18 ops added across Batches 1-3; the op corpus grows by ~1.7K cases; full FuzzMatrix gate green (77/77). Five real bugs caught, carved from the green corpus and reproduced under [OpenBugs].
ADDED (green):
- Batch 1 (logic mode): logical_and/or/xor (binary→bool), logical_not (unary→bool), arctan2 (binary trig→float).
- Batch 2: sort (value sort, axis -1/0/1 → sort.jsonl), diagonal (matmul tier), ediff1d (scan tier), nanpercentile/nanquantile (nanreduce tier, finite+NaN data), round_/around (new rounding.jsonl, decimals 0/1/2).
- Batch 3: flatnonzero, argwhere (sort tier), allclose, array_equal (logic tier, 0-D bool), unique (sort tier, contiguous+finite).
BUGS FOUND → [OpenBugs] (OpenBugsDtypeCoverageTests):
- trace(unsigned) → Int64 instead of uint64: the accumulator upcasts to the signed default (cf. sum(uint8)→uint64, which is correct). Trace_Unsigned_WrongResultDtype.
- round_/around with NEGATIVE decimals: the int loop THROWS ArgumentOutOfRangeException (System.Math.Round rejects digits<0), and floats mis-round. Round_NegativeDecimals_Broken.
- round_ on float16 with decimals>=1 diverges from NumPy's banker's rounding. Round_Float16_Fractional_Diverges.
- iscomplex / isreal IGNORE the imaginary part (complex → all-real) and emit garbage on strided real input. IsComplex_IgnoresImaginaryPart / IsReal_IgnoresImaginaryPart.
CLASSIFIED NON-BUGS (carved, documented; not OpenBugs):
- nanpercentile/nanquantile across inf: percentile INTERPOLATION over inf is ill-defined (inf-inf=NaN), so these ops were given finite+NaN data that tests the actual NaN-skipping.
- unique on raw-offset corpus views is the documented '#11 unreachable-via-API' representation gap (public-API unique is correct, verified). inf/NaN complex ordering is implementation-defined, so unique was given contiguous+finite data instead.
Tracking doc: test/NumSharp.UnitTest/Fuzz/COVERAGE_GAPS.md (full np.* audit + Group A work list). Batches 1-3 are done. Remaining: take/put/compress/extract, ravel_multi_index/unravel_index/indices, convolve, flatten/rollaxis/append/insert, the split family, and the decimal-specific ops.

…plit/index), all green
Complete the np.* half of Group A. A new `groupa` tier (103 cases) exercises the shape, selection, convolution, multi-output split, and index-transform ops that fit the oracle but were unwired. All GREEN: no bugs. Full FuzzMatrix gate 78/78.
ADDED (green, groupa tier):
- shape: flatten (C-order copy, incl. non-contiguous source), rollaxis, append, insert
- selection: take (int64 indices, axis), compress (bool cond), extract (bool mask), put (mutate)
- math: convolve (full/same/valid, 1-D)
- multi-out: split / hsplit / vsplit / dsplit (one case per output piece)
- index: ravel_multi_index (coords->flat), unravel_index (flat->coords, per-dim piece)
np.* Group A is now complete: 33 ops wired across Batches 1-6 (7 -> [OpenBugs], the rest green). Only `indices` (creation-shaped, no operand -> Group C) and the decimal-specific ops (extend gen_decimal_oracle.cs) remain. Tracking: Fuzz/COVERAGE_GAPS.md.
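For reference, the C-order index transforms that the `groupa` tier exercises can be written as a Horner-style fold and its divmod inverse. This scalar Python sketch is not NumPy's implementation, which is vectorized and supports mode/order arguments. The error messages are illustrative.

```python
def ravel_multi_index(coords, dims):
    """C-order flat index of one coordinate tuple (scalar analogue, out-of-range raises)."""
    flat = 0
    for c, d in zip(coords, dims):
        if not 0 <= c < d:
            raise ValueError("coordinate out of bounds")
        flat = flat * d + c  # Horner: last axis varies fastest
    return flat


def unravel_index(flat, dims):
    """Inverse of ravel_multi_index for C order: peel off the last axis first."""
    total = 1
    for d in dims:
        total *= d
    if not 0 <= flat < total:
        raise ValueError("index out of bounds")
    coords = []
    for d in reversed(dims):
        flat, c = divmod(flat, d)
        coords.append(c)
    return tuple(reversed(coords))


print(ravel_multi_index((1, 2), (3, 4)), unravel_index(22, (7, 6)))  # 6 (3, 4)
```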

…rity)
Investigating the audit item "np.array(byte[]) -> int8" showed that the path is already correct. C# byte maps to uint8/Byte and sbyte to int8/SByte across every np.array overload, and AsNumpyDtypeName maps them right. The real, closely related defect is in np.min_scalar_type, which falsely assumed "NumSharp doesn't have int8/sbyte" and never returned float16 (Half). Both types DO exist among the 15 dtypes.
Bugs fixed (verified against NumPy 2.4.2 convert_datatype.c:min_scalar_type_num):
1. Negative int8-range scalars widened to Int16 instead of SByte (int8). min_scalar_type(-1..-128) returned Int16, whereas NumPy returns int8. This affected every signed input (sbyte/short/int/long), since they route through MinTypeForSignedInt. The sbyte arm (sb>=0 ? Byte : Int16) had the same widening.
2. Float/double values NEVER returned Half (float16). The old code used exact-float32 representability, which (a) missed float16 entirely and (b) diverged from NumPy; e.g. min_scalar_type(0.1) returned Double. NumPy demotes floats by RANGE (magnitude), allowing precision loss: |v| < 65000 (or non-finite) -> float16; |v| < 3.4e38 -> float32; else float64. Bounds are exclusive and copied verbatim from NumPy (65000, not 65504).
3. Added explicit Half input -> Half (NumPy NPY_HALF: float16 stays float16).
promote_types/result_type were checked and are NOT affected, because they promote existing types rather than inferring the smallest type from a value. 11/11 narrow-type pairs match NumPy.
Validation:
- New NumPy-oracle differential over 122 cases (all 15-dtype boundaries incl. int8 negatives, float16 range, exclusive cutoffs): 0 divergences.
- Corrected two tests that pinned the old buggy behavior (min_scalar_type(-1)->Int16, 1.0f->Single, NaN/Inf->Single, -10->Int16) to NumPy truth, per project DOD.
- Added an np.array C#-type -> NumPy-dtype-name regression (byte[] -> uint8 never int8, sbyte[] -> int8, full 12-type matrix) to lock the originally reported item.
- Full suite green: 11259 passed / 0 failed (net10.0); affected classes green on net8.0.
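The value-based rules described above can be summarized as a Python sketch over Python ints and floats. The function name is hypothetical, and it returns dtype names as strings. It does not cover Half/complex inputs or Python ints beyond 64 bits. It treats non-negative ints as unsigned, as NumPy's min_scalar_type does (np.min_scalar_type(10) is uint8).

```python
import math


def min_scalar_type(value):
    """Smallest dtype name able to hold `value`, by value (not by C# type)."""
    if isinstance(value, bool):  # bool is an int subclass in Python: check it first
        return "bool"
    if isinstance(value, int):
        if value >= 0:  # non-negative ints get the smallest UNSIGNED type
            for name, bound in (("uint8", 2**8), ("uint16", 2**16), ("uint32", 2**32)):
                if value < bound:
                    return name
            return "uint64"
        for name, bits in (("int8", 8), ("int16", 16), ("int32", 32)):
            if value >= -(2 ** (bits - 1)):
                return name
        return "int64"
    # Floats demote by RANGE, not exact representability; bounds are exclusive.
    if not math.isfinite(value) or -65000 < value < 65000:
        return "float16"
    if -3.4e38 < value < 3.4e38:
        return "float32"
    return "float64"


print([min_scalar_type(v) for v in (-1, -129, 255, 0.1, 65000.0, 1e50)])
```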

… C-order fix
Rewrite NDArray.tofile from a binary-only, single-string-overload stub into the full NumPy ndarray.tofile(fid, /, sep='', format='%s'). Adds a Stream overload and text mode (sep + Python %-format), and, critically, fixes a data-corruption bug.
BUG FIX (binary mode, non-contiguous arrays): the old tofile blindly wrote this.Array.Address for this.Array.BytesLength, i.e. the RAW underlying buffer. A sliced/strided/transposed/broadcast view therefore leaked the wrong bytes and the wrong length (e.g. arange(10)[::2] wrote all 10 elements instead of the 5 logical ones). NumPy guarantees "Data is always written in C order, independent of the order of a".
Binary mode now writes the logical C-order bytes. Contiguous offset-0 views stream straight from Storage.Address (no managed copy), and every other layout is first materialized C-contiguous via copy('C'). This mirrors NumPy's PyArray_ToFile (whole-buffer fwrite when ISCONTIGUOUS, else a C-order walk).
API parity (probed against NumPy 2.4.2 methods.c + convert.c):
- tofile(string fid, sep='', format='%s'): creates/truncates the file ("wb").
- tofile(Stream, sep='', format='%s'): writes at the current position and leaves the stream open (the file-object form).
- sep=="" (default) => binary; sep!="" => text: C-order iteration, "format % item" joined by sep with NO trailing separator.
- format default "%s" (== "") => the NumPy scalar string via ArrayFormatter.ScalarStr (proven == str(np.scalar): floats keep ".0", 1e+20, complex "(1+2j)"/"1j", etc.).
New PrintfFormatter (Backends/Printing): a focused port of CPython str.__mod__ as used by tofile. It supports conversions d i u f F e E g G s x X o c %, flags - + space 0 #, width and .precision.
Python semantics reproduced (probed):
- float truncation under %d
- 2-digit-minimum lowercase/uppercase exponent for %e/%E
- C-style %g with trailing-zero stripping
- signed %x/%X/%o with "#" prefixing even zero ("%#x" % 0 == "0x0")
- Python's zero-padding of inf/nan under the 0 flag ("%08.2f" % inf == "00000inf")
- complex->real for real conversions
Drive-by fix (ArrayFormatter.PythonComplexRepr): str() of a pure-imaginary complex with a +0.0 real part force-prepended a "+", so str(0j) gave "+0j" (and 1j would give "+1j"). It now matches CPython: the pure-imaginary form carries the imaginary part's own sign ("0j", "1j", "-1j", "-0j"), and a -0.0 real part still takes the parenthesized form ("(-0+0j)"). This is a pre-existing 0-d complex scalar-str bug that also surfaced through tofile text mode.
Validation:
- 558/558 tofile cases bit/byte-exact vs NumPy (13 dtypes x 6 layouts x binary + text %s/%d/%.3f/%e/%g/%08.2f/%+.1f).
- PrintfFormatter: 446/446 format-x-value cases match Python (float64 + int64 across the conversion/flag/width/precision matrix, incl. inf/nan and round-half-to-even).
- Char + Decimal (no NumPy analog) self-consistent (binary == ToByteArray('C'); text = per-element ScalarStr).
- 16 new MSTest cases; full CI suite green (11275 passed) incl. 204 printing/ToString parity tests (the complex fix is consistent with the ~18K-case printing port).
Performance (Release, warm best-of-N vs NumPy 2.4.2, NPY/NS):
- binary contiguous 10M: 2.48x
- binary strided 10M: 10.84x (NumPy does a per-element fwrite)
- text %.4f 200K: 1.92x
- int text: ~5x
- float %s text: 1.37x
Float-%s text is the one sub-1.5x case: both libraries are bound by the same shortest-float (Dragon4) algorithm. Batched writer flushing keeps the rest of the per-element overhead off the hot path.
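Text mode's joining rule is easy to model with `io.StringIO`: format each element, put the separator between elements but not after the last, and write at the stream's current position. `tofile_text` is a hypothetical helper. Python's `%s` is plain `str()`, which is not identical to the NumPy scalar str that the real default format uses.

```python
import io


def tofile_text(values_c_order, stream, sep, fmt="%s"):
    """Model of tofile text mode: 'fmt % item' per element, joined by sep.

    Writes at the stream's current position, never adds a trailing separator,
    and leaves the stream open. Values must already be in logical C order.
    """
    if sep == "":
        raise ValueError("sep='' selects binary mode, which this sketch does not model")
    for i, item in enumerate(values_c_order):
        if i:
            stream.write(sep)
        stream.write(fmt % (item,))


out = io.StringIO()
out.write("header|")  # pre-existing content: writing starts at the current position
tofile_text([1, 2, 3], out, sep=",", fmt="%d")
print(out.getvalue())  # header|1,2,3
```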

…oat32/16 sci threshold

Two NumPy-parity bugs surfaced by differential fuzzing of np.tofile against NumPy 2.4.2 (binary + text, ~2500 format-x-value-x-dtype cases). Both live in the shared printing layer, so they also affected 0-d scalar str, not just tofile.

1) The PrintfFormatter '#' (alternate) flag dropped the decimal point on f/e/g. Python keeps a decimal point under '#' even when no fractional digits are emitted:
- "%#.0f" % 3 -> "3." (was "3")
- "%#.0e" % 3 -> "3.e+00" (was "3e+00")
- "%#g" % 100000 -> "100000." (was "100000")

Fix: thread `alt` into FormatFixed / FormatScientific and insert the point when the mantissa/number has none. FormatGeneral already suppressed trailing-zero stripping under '#' but must likewise force the point. 84/738 stress cases were wrong; now 738/738 match Python (incl. the %g power-of-10 carry and %.Nf round-half-to-even, which were already correct).

2) ArrayFormatter.PythonFloatRepr used float64's positional/scientific window for ALL float dtypes. NumPy's scalar repr (scalartypes.c.src `@name@type_@kind@_either`) is positional iff 1e-4 <= |value| < max_positional, where max_positional is PER DTYPE (float16 -> 1e3, float32 -> 1e6, float64 -> 1e16) and the cutoff is tested on the VALUE:
- str(np.float32(1e15)) -> "1e+15" (was "1000000000000000.0")
- str(np.float32(1e-4)) -> "1e-04" (was "0.0001"; float32(1e-4) rounds to 9.999e-5 < 1e-4)
- str(np.float16(65500)) -> "6.55e+04" (was "65500.0")

For float64 the new value-based test is exactly equivalent to the old decExp in [-4,16) check, so double output (and the ~18K-case array-print fuzz) is unchanged; only the float32/float16 0-d scalar / tofile-%s path is tightened to its true window. This was a latent printing-port bug: the array-printing path uses FloatingFormat (with its own exponent logic), so the scalar-only PythonFloatRepr escaped the earlier fuzz. 865/865 f16/f32/f64 scalar-str fuzz cases now match NumPy.
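
Both fixes are small enough to sketch in plain Python, which doubles as the reference here since CPython and NumPy are the oracles. This is an illustration under stated assumptions, not the PrintfFormatter/PythonFloatRepr code: the hypothetical `alt_point` forces the '#' decimal point into an f/e rendering just before any exponent, and the hypothetical `is_positional` applies the per-dtype window to the value after rounding it to that dtype via `struct`. Treating zero as positional is an assumption based on NumPy printing `0.0`; nan/inf are not handled.

```python
import struct

def alt_point(body: str) -> str:
    """'#' flag for %f/%e/%E: ensure a decimal point after the mantissa digits."""
    if "." in body or not any(ch.isdigit() for ch in body):
        return body  # already has a point, or is inf/nan (no digits)
    for i, ch in enumerate(body):
        if ch in "eE":
            return body[:i] + "." + body[i:]  # point goes right before the exponent
    return body + "."

def format_alt(conv: str, precision: int, value: float) -> str:
    """Emulate '%#.{precision}{conv}' % value for conv in 'f', 'e', 'E' (no width)."""
    return alt_point(f"%.{precision}{conv}" % value)

# Per-dtype upper bound of the positional window, as described above.
MAX_POSITIONAL = {"float16": 1e3, "float32": 1e6, "float64": 1e16}
PACK_CODE = {"float16": "<e", "float32": "<f", "float64": "<d"}

def round_to(dtype: str, value: float) -> float:
    """Round a Python float to the nearest value representable in dtype."""
    code = PACK_CODE[dtype]
    return struct.unpack(code, struct.pack(code, value))[0]

def is_positional(dtype: str, value: float) -> bool:
    """Positional iff 1e-4 <= |value| < max_positional, tested on the dtype's VALUE."""
    v = abs(round_to(dtype, value))
    return v == 0.0 or 1e-4 <= v < MAX_POSITIONAL[dtype]

print(format_alt("e", 0, 3))              # 3.e+00
print(is_positional("float32", 1e-4))     # False: float32(1e-4) is just below 1e-4
```

Testing the rounded value rather than the decimal exponent of the input is what catches the float32(1e-4) case.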

Also documents PrintfFormatter's deliberate leniency where CPython/NumPy RAISE (%x/%o on bool/float, %c on float, %d on inf/nan, conversion count != 1): a per-element file writer returns a best-effort rendering instead of aborting mid-stream. These remain the only differential-fuzz divergences, and all are cases where `format % item` throws in Python (NumPy's own tofile even segfaults on %d with inf).

Tests: 2 new regression cases (the # flag; the float32/16 sci threshold at both the tofile and 0-d ToString surfaces). Full CI suite green (11277 passed) incl. 204 printing/ToString parity tests.

… (NumPy parity)

Follow-up to the np.min_scalar_type fix: swept the whole dtype-resolution surface with full differential matrices against NumPy 2.4.2 and fixed every sibling of the "narrow type (int8/float16) gets avoided/widened" bug.

Probes run: promote_types 13x13 (169), can_cast 13x13x5 casting modes (845), issubdtype concrete x abstract (117), reductions+unary per dtype (300), binary ufuncs 12x12x7 (1008). promote_types and the reduction/unary-math tiers were already correct; the bugs below were not.

1) NPTypeHierarchy._concreteParent was MISSING BOTH narrow types. SByte (int8) and Half (float16) had no entry, so GetImmediateKind() returned Generic for them, breaking every consumer:
- issubdtype(int8, signedinteger/integer/number) -> False (NumPy: True)
- issubdtype(float16, floating/inexact/number) -> False (NumPy: True)
- can_cast same_kind involving int8/float16 (see #2)
- maximum_sctype(int8)/(float16) -> themselves (now int64/float64)

Added [SByte]=SignedInteger and [Half]=Floating (+ GetMaximumType arms). issubdtype differential: 117/117 after the fix (the int8/float16 rows now pass).

2) can_cast(..., "same_kind") used a SYMMETRIC "same category" model, diverging from NumPy's DIRECTIONAL kind ordering (dtype_kind_to_ordering in convert_datatype.c): bool(0) < unsigned(1) < signed(2) < float(4) < complex(5). A cast is allowed iff it is safe OR KindOrder(from) <= KindOrder(to). 32/845 cases were wrong, e.g.:
- int16 -> int8 was False (NumPy True: signed -> signed)
- int16 -> uint8 was True (NumPy False: signed -> unsigned goes DOWN a kind)
- int32 -> float was False (NumPy True: int -> float)
- float -> float16 direction handled; complex -> float rejected.

Added NPTypeHierarchy.KindOrder + CanCastSameKindOrder; can_cast now ORs the safe check with the ordering. can_cast differential: 845/845 after the fix. (The symmetric IsSameKind is kept for its own callers/tests; can_cast no longer uses it.)

3) Bool-input ufuncs with NO bool loop returned bool instead of int8. NumPy's power/floor_divide/remainder/square loops start at int8, so bool operands promote (probed on 2.4.2; this was already the documented rule for np.evaluate, but the DIRECT ops never applied it):
- power(bool,bool) -> int8 [True**True=1, False**False=1]
- floor_divide(bool,bool) -> int8 [1//1=1, 0//0=0]
- mod(bool,bool) -> int8 [1%1=0]
- square(bool) -> int8 [True->1, False->0]

Extended the existing shift-op bool->int8 remap in ExecuteBinaryOp to cover FloorDivide/Mod/Power, fixed ResolvePowerResultType (power's scalar-exponent fast paths), and fixed Default.Square. The change is scoped strictly to resultType==Boolean (i.e. both operands bool), so add/multiply/etc. are untouched. Binary differential: 1008/1008 after the fix; values verified against NumPy.

Also corrected two tests that pinned the OLD buggy same_kind behavior (can_cast(int32, float32, "same_kind") asserted False; NumPy says True). They contradicted other tests in the tree that already expected the correct direction.

Validation:
- Differentials vs NumPy 2.4.2 are all green after the fixes: promote 169, can_cast 845, issubdtype 117, binary 1008, reductions+unary (only np.rint diverges: it is a MISSING ufunc in NumSharp, not a dtype bug; noted, out of scope).
- New Casting/NarrowDtypeParityTests.cs (9) pins issubdtype/maximum_sctype/can_cast-directional + the four bool->int8 op cases (dtype AND values).
- Full suite 11286 passed / 0 failed (net10.0), FuzzMatrix gate 79/79 bit-exact, affected classes green on net8.0.
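
The directional rule in (2) and the bool remap in (3) fit in a few lines of Python. This is a sketch with hypothetical names and dtypes as plain strings, not the NPTypeHierarchy code. It checks only the kind ordering: for these numeric kinds every safe cast already satisfies the ordering (bool may go anywhere, unsigned to signed, int to float, float to complex), so OR-ing with the safe table would not change the answer.

```python
KIND_ORDER = {"b": 0, "u": 1, "i": 2, "f": 4, "c": 5}  # dtype_kind_to_ordering values

KIND = {
    "bool": "b",
    "uint8": "u", "uint16": "u", "uint32": "u", "uint64": "u",
    "int8": "i", "int16": "i", "int32": "i", "int64": "i",
    "float16": "f", "float32": "f", "float64": "f",
    "complex64": "c", "complex128": "c",
}

def can_cast_same_kind(src: str, dst: str) -> bool:
    """Directional same_kind: allowed iff the cast does not move DOWN the kind ordering."""
    return KIND_ORDER[KIND[src]] <= KIND_ORDER[KIND[dst]]

# Ops with no bool loop: the shifts were already remapped; the rest are added by (3).
NO_BOOL_LOOP = {"left_shift", "right_shift", "power", "floor_divide", "remainder", "square"}

def resolve_result_dtype(op: str, result_dtype: str) -> str:
    """Remap a bool result to int8, but only for ops whose loops start at int8."""
    if result_dtype == "bool" and op in NO_BOOL_LOOP:
        return "int8"
    return result_dtype

print(can_cast_same_kind("int16", "int8"), can_cast_same_kind("int16", "uint8"))  # True False
```

A symmetric "same category" check cannot express the int16 -> uint8 case: both are integers, yet the cast moves down the ordering.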

…rt/manip (Group A complete)

Closes the final Group A item: extend the INDEPENDENT C# Decimal oracle (gen_decimal_oracle.cs) to the remaining decimal-supported ops. Decimal is the one NumSharp numeric dtype with no NumPy analog, so the generator itself is the oracle. Every expected value is computed with naive scalar System.Decimal math (no NumSharp kernels); the harness then replays the operand through NumSharp's decimal KERNELS and value-compares (BitDiff tokenizes decimal by canonical value, so 1.0m ≡ 1.00m).

New / extended tiers (12 total, 579 cases, all green):
- decimal_unary (+floor/ceil/trunc via decimal.Floor/Ceiling/Truncate; exact base-10)
- decimal_scan (+diff n=1,2 along the last axis; DiffAxis oracle)
- decimal_stat (NEW, 170): clip = Max(lo,Min(hi,x)); order stats median/ptp/percentile/quantile (axis=None -> scalar). Oracle = naive sort + NumPy 'linear' interpolation in EXACT decimal (Quantile/Median).
- decimal_where (NEW, 4): where(cond,a,b) 16-byte conditional copy over contiguous + strided inputs
- decimal_sort (NEW, 7): sort along an axis (1-D/2-D, contiguous + strided; SortAxis oracle)
- decimal_manip (NEW, 36): ravel/transpose/reshape, value-preserving reindexes that force the strided decimal materialize/copy path (compared C-contiguous)

Every oracle formula was validated bit-identical against NumSharp's decimal path BEFORE generating the corpus (median even/odd n, ptp, quantile/percentile q in {0,.25,.5,.75,1}, clip, sort, diff, floor/ceil/trunc, where, reshape/ravel/transpose), so the corpus is de-risked with no false divergences.

Zero harness changes: all these ops are already dtype-generic in OpRegistry, so the decimal cases flow through the existing dispatch.

nan* reductions are intentionally skipped: System.Decimal cannot represent NaN, so nansum/nanmax/... are byte-identical to plain sum/max/... (already covered by decimal_reduce; verified np.nansum(decimal) == np.sum(decimal)).

Note: the shared n++ case-ID counter shifts IDs in the pre-existing decimal tiers (binary/reduce/power/varstd/matmul/astype). Those diffs are ID relabels ONLY; operand/expected buffers are unchanged (verified by an id-stripped diff).

Gate: FuzzMatrix 65/65 green on net10.0 (4 new decimal test methods: DecimalStat/DecimalWhere/DecimalSort/DecimalManip). COVERAGE_GAPS.md: Group A closed.

(cherry picked from commit 48ebfa4fcc2ce57bcedcdfc14ba75108bef845c8)
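
The decimal_stat oracle formula (naive sort plus NumPy 'linear' interpolation in exact decimal) can be restated with Python's `decimal` module. This is a sketch of the technique, not gen_decimal_oracle.cs, and the function names are hypothetical. `q` is passed as a string so that no binary-float rounding enters the computation; in exact decimal arithmetic the plain `a + (b - a) * t` form needs no rounding guard.

```python
from decimal import Decimal

def quantile_linear(values, q):
    """NumPy 'linear' quantile in exact Decimal: naive sort + interpolation.

    values: sequence of Decimal; q: quantile in [0, 1] as a str or Decimal
    """
    v = sorted(values)
    n = len(v)
    pos = Decimal(q) * (n - 1)   # virtual index (n - 1) * q
    lo = int(pos)                # floor, since pos >= 0
    frac = pos - lo
    hi = min(lo + 1, n - 1)
    return v[lo] + (v[hi] - v[lo]) * frac

def median(values):
    return quantile_linear(values, "0.5")

def ptp(values):
    return max(values) - min(values)

def clip(x, lo, hi):
    return max(lo, min(hi, x))   # Max(lo, Min(hi, x))

D = Decimal
print(quantile_linear([D(1), D(2), D(3), D(4)], "0.25"))  # 1.75
print(median([D(4), D(1), D(3), D(2)]))                   # 2.5
```

Because every step is exact, the oracle can be compared by value against NumSharp's decimal kernels without any tolerance.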

…nversion parity gap

Audited the ENTIRE type-conversion surface against NumPy 2.4.2: the value side is fully on parity, and the one gap was a missing API parameter, closed here.

Audit (all differential vs NumPy 2.4.2, bit-exact):
- astype across 13 NumPy dtypes, 13x13, with a FRESH independent oracle (aggressive edges distinct from the committed corpus: 0.4999/0.5/0.5001 boundaries, subnormals, 1e300, all int overflow points, NaN/±inf, complex): 169/169. (Plus the committed astype_full corpus of 5070 + FuzzMatrix, green.)
- Char (uint16 masquerade): char<->X is byte-identical to uint16<->X: 23/23.
- Decimal->X (all 13 numeric destinations incl. modular int overflow + float16 Inf): matches the NumPy-verified double path: 442/442.
- X->Decimal: int exact + round-trip; float exact in range.
- np.array(dtype=) / copyto vs astype: 12/12.
- can_cast (np.can_cast AND the separate NDIterCasting.CanCast used by copyto/ufuncs): each 845/845 vs NumPy across 13x13x5 casting modes.

The gap: ndarray.astype had no `casting=` parameter; it always cast unsafely. NumPy's astype takes casting='unsafe' (default) and raises TypeError when a stricter rule ('no'/'equiv'/'safe'/'same_kind') forbids the cast.

Fix: added `string casting = "unsafe"` to both full astype overloads (NPTypeCode and Type). The default 'unsafe' is a NO-OP short-circuit, so the change is 100% backward compatible and every existing caller is unchanged. A stricter rule validates through the hardened np.can_cast (bit-exact vs NumPy across all 15 dtypes) and raises InvalidCastException, with the same exception type and message shape as np.copyto:
"Cannot cast array data from dtype('int32') to dtype('int16') according to the rule 'safe'"

Verified vs NumPy: int32->int64 safe OK; int32->int16 safe raises, same_kind OK; int32->float32 safe raises, same_kind OK; float64->int32 safe/same_kind raise, unsafe OK.

Documented finding (NOT changed; no NumPy analog): float->Decimal for NaN/±Inf/overflow (|v| >= ~7.9e28) silently yields 0. System.Decimal has no NaN/Inf and a smaller range, so there is no NumPy behavior to match; left as-is pending a decision (0 / throw / saturate).

Tests: Casting/AstypeCastingParamTests.cs (8): default-unsafe, safe/same_kind/unsafe outcomes, message shape, Type overload, same-dtype no-op, and value invariance for allowed casts. Full suite 11314 passed / 0 failed (net10.0); FuzzMatrix astype corpus green; net8.0 green.
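
A minimal Python sketch of the gate: 'unsafe' short-circuits before any lookup, and stricter rules consult a can_cast table and raise with the message shape quoted above. The safe table is a hand-written assumption covering only bool/int/uint/float (no complex, no byte order), and `check_astype_casting` is a hypothetical name. In NumSharp the check goes through np.can_cast and raises InvalidCastException rather than TypeError.

```python
SIZE = {"bool": 1, "int8": 1, "uint8": 1, "int16": 2, "uint16": 2, "float16": 2,
        "int32": 4, "uint32": 4, "float32": 4, "int64": 8, "uint64": 8, "float64": 8}
KIND_ORDER = {"b": 0, "u": 1, "i": 2, "f": 4}

def kind(dtype: str) -> str:
    if dtype == "bool":
        return "b"
    return "u" if dtype.startswith("u") else "i" if dtype.startswith("i") else "f"

def can_cast_safe(src: str, dst: str) -> bool:
    ks, kd, ss, sd = kind(src), kind(dst), SIZE[src], SIZE[dst]
    if src == dst or ks == "b":
        return True                  # bool casts safely to everything
    if kd == "b":
        return False                 # nothing else casts safely to bool
    if ks == kd:
        return sd >= ss              # widening within a kind
    if ks == "u" and kd == "i":
        return sd > ss               # needs a strictly wider signed type
    if kd == "f":
        # 8-bit ints -> float16+, 16-bit -> float32+, 32/64-bit -> float64
        return sd > ss or ss == sd == 8
    return False                     # signed -> unsigned, float -> int

def can_cast(src: str, dst: str, casting: str) -> bool:
    if casting == "unsafe":
        return True
    if casting in ("no", "equiv"):
        return src == dst            # byte order is not modeled here
    if casting == "safe":
        return can_cast_safe(src, dst)
    if casting == "same_kind":
        return can_cast_safe(src, dst) or KIND_ORDER[kind(src)] <= KIND_ORDER[kind(dst)]
    raise ValueError("casting must be 'no', 'equiv', 'safe', 'same_kind' or 'unsafe'")

def check_astype_casting(src: str, dst: str, casting: str = "unsafe") -> None:
    """Raise if the rule forbids src -> dst; the default 'unsafe' is a no-op."""
    if casting == "unsafe":
        return                       # short-circuit: existing callers are unaffected
    if not can_cast(src, dst, casting):
        raise TypeError(f"Cannot cast array data from dtype('{src}') to dtype('{dst}') "
                        f"according to the rule '{casting}'")
```

Keeping the default as an early return means the new parameter costs nothing on the common path.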

…g UnaryOp.Round

Implements np.rint, the previously missing float-tier rounding ufunc surfaced during the int8/float16 dtype sweep. NumPy 2.4.2 is the oracle (probed + generate_umath.py: rint has only e/f/d/g/F/D/G loops, no integer/bool loops).

rint vs np.round/around: the SAME value kernel (round-half-to-even == Math.Round default == the existing UnaryOp.Round, already internally named "rint" in UfuncName), but a DIFFERENT dtype rule. around preserves integer dtype (int->int identity); rint is float-tier like sqrt/sin: bool/int8/uint8 -> float16, int16/uint16/char -> float32, int32/uint32/int64/uint64 -> float64, floats/complex/decimal preserved. So it maps to ResolveUnaryFloatReturnType, not the dtype-preserving path.

Implementation (reuses existing infra, no new kernel/UnaryOp):
- np.rint.cs: NumPy-shaped overload rint(x, out=, where=, dtype=) + positional-dtype convenience forms (modeled on np.trunc.cs).
- Default.Rint.cs: a one-liner, ExecuteUnaryOp(nd, UnaryOp.Round, ResolveUnaryFloatReturnType(nd, typeCode, "rint"), out, where). Mirrors Default.Sin.
- TensorEngine.cs: two abstract Rint overloads next to Truncate.
- Fixed a REAL latent gap: EmitUnaryComplexOperation threw "not supported for Complex" for UnaryOp.Round, so np.around(complex) was ALSO broken. Added the complex Round case (rounds real & imag separately, half-to-even); np.rint(complex) and np.around(complex) now both work.

Dtype/value/layout/param parity (all verified vs NumPy 2.4.2):
- 13-dtype tier check; half-to-even values (0.5->0, 2.5->2, -2.5->-2, 2.6->3); nan/inf preserved; complex real+imag; decimal; int input gives identity-as-float.
- Layouts: strided, negative-stride, broadcast, transpose, empty, 0-d scalar.
- out= (same instance + fill), where= (masked slots keep prior out), dtype= (the loop runs at that dtype; dtype=<int> raises the no-loop error).

Performance (Release, best-of, reusing the SIMD UnaryOp.Round vector path): at scale NumSharp is competitive-to-faster. At 1M elements float32/float64 reach ~5-6x NumPy; at 10M, float32 is 3.7x, float64 1.25x and complex 1.77x. float16 is at ~parity (scalar path, no Vector<Half> in .NET), and the int paths are widening-cast bound. Same profile as np.around (shared kernel). NumPy's single-instruction _mm256_round wins only on tiny L2-resident arrays (due to NumSharp's shared unary-dispatch overhead), an overhead that amortizes by ~1M elements.

Tests:
- Math/np.rint.Test.cs (16): tiers, half-to-even, nan/inf, complex, decimal, layouts, out/where/dtype, plus Around_Complex_NowSupported (the bonus fix).
- Fuzz: rint added to gen_oracle UNARY_EXTRA_OPS + OpRegistry; regenerated the unary_extra corpus (+364 rint cases). The FuzzMatrix gate replays them bit-exact.

Docs: CLAUDE.md Math-Arithmetic list + ufunc out=/where= list updated. Full suite 11314 passed / 0 failed (net10.0), FuzzMatrix green, net8.0 rint green.
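
For intuition, here is a small Python sketch of the two halves of np.rint described above: the value kernel (round half to even, sign of zero kept, nan/inf passed through, complex rounded part-wise) and the float-tier result dtype. Python's built-in `round()` on a float already rounds half to even. Dtype names as strings and the helper names are assumptions of the sketch, not NumSharp API.

```python
import math

# Result-dtype tier for rint (float-tier, like sqrt/sin), per the mapping above.
RINT_DTYPE = {
    "bool": "float16", "int8": "float16", "uint8": "float16",
    "int16": "float32", "uint16": "float32", "char": "float32",
    "int32": "float64", "uint32": "float64", "int64": "float64", "uint64": "float64",
}

def rint_dtype(dtype: str) -> str:
    return RINT_DTYPE.get(dtype, dtype)  # floats / complex / decimal are preserved

def rint(x):
    """Round half to even, keeping the sign of zero and passing nan/inf through."""
    if isinstance(x, complex):
        return complex(rint(x.real), rint(x.imag))  # real and imag rounded separately
    if not math.isfinite(x):
        return x
    return math.copysign(float(round(x)), x)  # round() is half-to-even; copysign keeps -0.0

print([rint(v) for v in (0.5, 2.5, -2.5, 2.6)])  # [0.0, 2.0, -2.0, 3.0]
print(rint(-0.4), rint_dtype("int16"))           # -0.0 float32
```

The dtype rule is the only thing separating this from np.around: feeding an int16 array to around would keep int16, while rint produces float32.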

…= casting, metamorphic

Extends Math/np.rint.Test.cs from 16 to 31 tests, all pinned to NumPy 2.4.2 output, covering the subtle parity points beyond the fuzz corpus:
- Signed zero preserved (f64+f32): rint(-0.4)/rint(-0.5)/rint(-0.0) -> -0.0 (signbit True), rint(0.4) -> +0.0; a tiny negative magnitude rint(-1e-300) -> -0.0. Verified that NumSharp's Math.Round matches NumPy's signbit exactly.
- Half (float16) half-to-even values; Char (uint16 proxy) -> float32.
- All 15 supported dtypes -> NumPy-parity result dtype (adds Char + Decimal to the tier check).
- Higher-rank 2D values + F-contiguous input.
- out= same_kind narrowing cast (f64 loop -> f32 out) returns the f32 out; out= float->int raises (not same_kind); in-place out=a aliasing.
- where= broadcasts a (2,) mask across a (2,2) output; masked-off slots keep their prior contents.
- Metamorphic (oracle-free): rint is idempotent and odd (rint(-x) == -rint(x)).
- Large integral magnitudes (2^53, +/-1e16) unchanged.

Full suite 11329 passed / 0 failed (net10.0); rint class green on net8.0.
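
The two metamorphic properties need no oracle at all, which is what makes them useful alongside the NumPy-pinned cases. A Python sketch (hypothetical names; a seeded random sample stands in for the real test inputs) that checks idempotence and oddness bit-for-bit, so a sign flip on zero would be caught:

```python
import math
import random
import struct

def rint(x: float) -> float:
    """Round half to even, keeping the sign of zero (finite inputs only)."""
    return math.copysign(float(round(x)), x)

def same_bits(a: float, b: float) -> bool:
    return struct.pack("<d", a) == struct.pack("<d", b)

def check_metamorphic(samples) -> int:
    """Assert rint(rint(x)) == rint(x) and rint(-x) == -rint(x) bitwise; return the count."""
    checked = 0
    for x in samples:
        r = rint(x)
        assert same_bits(rint(r), r), f"not idempotent at {x!r}"
        assert same_bits(rint(-x), -r), f"not odd at {x!r}"
        checked += 1
    return checked

rng = random.Random(0)
samples = [0.5, 2.5, -2.5, 0.4, -0.0, 1e-300, 2.0 ** 53]
samples += [rng.uniform(-1e6, 1e6) for _ in range(1000)]
samples += [rng.randint(-1000, 1000) + 0.5 for _ in range(100)]  # exact .5 ties
print(check_metamorphic(samples))  # 1107
```

Comparing packed bytes instead of using `==` matters here, since `0.0 == -0.0` is true in IEEE arithmetic.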

…ype/Stream

Answers "do we support tofile/fromfile to the full extent?": tofile was complete, but fromfile was binary-only with a required dtype, so the pair was asymmetric (you could tofile a text or non-contiguous file but not read it back). This brings fromfile up to NumPy's fromfile(file, dtype=float, count=-1, sep='', offset=0).

Was: fromfile(string file, NPTypeCode|Type dtype), which reads the WHOLE file as binary only.

Now (backward-compatible; the old 2-arg calls still resolve):
- dtype defaults to float64 (NumPy's default) and may be omitted.
- count: read N items; -1 (default) reads the rest. Reading past EOF returns what is present.
- offset: skip N bytes from the current position (binary only; text + offset raises, like NumPy).
- sep: text mode. Splits NumPy-style: a whitespace-only separator splits on any whitespace run; otherwise the input is split on the separator's non-whitespace core, with the surrounding whitespace treated as a wildcard (matches swab_separator), and a trailing separator is ignored. Integer tokens WRAP into the dtype (int8 "300" -> 44, uint8 "-1" -> 255) like NumPy's scanf loop; floats accept sci / nan / inf; bool parses as int (nonzero => True); malformed data raises ValueError.
- Stream overload (the file-object form): reads from the current position and leaves the stream open.

NaN parity: NumPy/C text "nan" is the POSITIVE quiet NaN (0x7FF8…), while .NET's double.NaN is the NEGATIVE one (0xFFF8…). fromfile emits the positive pattern so bytes are identical to NumPy, and a narrowing cast lands on float 0x7FC00000 / half 0x7E00.

Complex: reads the bare "a+bj" form NumPy accepts AND, as a superset, the parenthesized "(1+2j)" form its own tofile writes, so complex text ROUND-TRIPS in NumSharp (NumPy's text reader errors on the parenthesized form, so it does not round-trip there).

Binary fast path: a seekable source (every filename) reads straight into one exact-sized buffer that the array then views (pinned). That is a single disk->buffer copy, versus the MemoryStream growth + ToArray double copy of the streaming fallback (non-seekable streams).

Validation (differential vs NumPy 2.4.2): 35 oracle cases bit-exact (binary count/offset/EOF-clamp/default-dtype/reinterpret + text sep/whitespace/trailing/count/sci/nan/inf/int-wrap/bool/complex/empty/malformed-error/offset-error) + Stream + default-dtype + 15 binary round-trips (all dtypes, incl. sliced views) + 6 text round-trips. 15 new MSTest cases; full CI suite green (11344 passed).

Performance (Release, warm, NPY/NS): binary read 10M int32 = 1.18x (single-copy/disk bound, both do one copy); text parse 200K float64 = 4.79x.
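
The text-mode rules above (separator matching, integer wrap, positive-NaN emission) can be sketched in Python. This is an illustration of the described behavior, not the NumSharp parser; `split_tokens`, `wrap_int`, `parse_float` and `POS_QNAN` are hypothetical names, and the NaN constant is built from the 0x7FF8… pattern the description attributes to NumPy/C.

```python
import re
import struct

def split_tokens(text: str, sep: str) -> list:
    """NumPy-style separator matching for fromfile(sep=...)."""
    core = sep.strip()
    if core == "":
        return text.split()  # whitespace-only sep: any whitespace run separates
    # The sep's non-whitespace core, with surrounding whitespace as a wildcard.
    parts = re.split(r"\s*" + re.escape(core) + r"\s*", text.strip())
    if parts and parts[-1] == "":
        parts.pop()  # a trailing separator is ignored
    return parts

def wrap_int(value: int, bits: int, signed: bool) -> int:
    """Wrap an integer token into a fixed-width dtype (two's complement)."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

# Positive quiet NaN, bit pattern 0x7FF8000000000000.
POS_QNAN = struct.unpack(">d", bytes.fromhex("7ff8000000000000"))[0]

def parse_float(token: str) -> float:
    return POS_QNAN if token.strip().lower() == "nan" else float(token)

print(split_tokens("1, 2 ,3,", ", "))                                     # ['1', '2', '3']
print([wrap_int(int(t), 8, True) for t in split_tokens("300 -129 5", " ")])  # [44, 127, 5]
```

Emitting a fixed NaN bit pattern, rather than whatever the host runtime's NaN constant happens to be, is what keeps the output bytes identical across platforms.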

…the sole NumPy byte export

Per "Breaking Changes OK to match NumPy": remove the pre-parity backward-compat shim so the NumPy-named tobytes is the one and only byte-export API.

Source
- NdArray.ToByteArray.cs -> NdArray.tobytes.cs: move the implementation into tobytes(char order = 'C') and DELETE the public ToByteArray(char) method (tobytes was previously just a thin alias delegating to it). The body is byte-for-byte identical, so behavior is unchanged; this is a pure rename/removal.
- np.tofile.cs: doc-comment reference ToByteArray('C') -> tobytes('C').

Old-implementation test removed (user-directed)
- Delete test/NumSharp.UnitTest/APIs/np.tofromfile.Test.cs, the 2019-era uint8/uint16 tofile<->fromfile round-trip. Its coverage is fully subsumed (all 15 dtypes + views) by NDArray.tofile.Test.cs (RoundTrip_*_AllDtypes) and NDArray.fromfile.Test.cs (RoundTrip_Binary_AllDtypes_IncludingViews).

Tests migrated ToByteArray -> tobytes (coverage preserved, not reduced)
- NDArray.ToByteArray.Test.cs -> NDArray.tobytes.Contract.Test.cs (class ToByteArrayTests -> TobytesContractTests): the C6 non-contiguous-view regression suite, unchanged in substance.
- NDArray.tobytes.Order.Test.cs: DefaultOrder_IsC_AndAliasMatches -> DefaultOrder_IsC (the alias-equality assertions are moot once the alias is gone; the default==C check stays).
- NDArray.ToArray.Test.cs, NDArray.tofile.Test.cs, NDArray.fromfile.Test.cs, NumpyByteContractTests.cs: call sites updated to tobytes.

Note: fromfile's Type + NPTypeCode overload pair is KEPT. It is the established NumSharp idiom (frombuffer carries the identical pair), and NumPy's fromfile(file, dtype, ...) signature is a superset of the old 2-arg call, so it cannot be "de-compatibilized" without breaking parity.

Validation: net10.0 and net8.0 both Passed 11342 / Failed 0 / Skipped 11 (CI filter TestCategory!=OpenBugs and !=HighMemory and !=LargeMemoryTest).

The XML <remarks> on the `UnmanagedMemoryBlock(T* ptr, long count)` constructor claimed "Does claim ownership." — the exact opposite of what the constructor does, and a direct contradiction of its own <summary> ("Construct as a wrapper around pointer ... without claiming ownership"). The body sets `_disposer = Disposer.Null` (AllocationType.Wrap), so disposing the block never frees `ptr`; the caller owns the memory's lifetime. A reader trusting the old remark could reason their way into a double-free (expecting the block to free it) or a use-after-free (expecting the block NOT to, when it never would). Rewrote the remark to state the non-owning contract explicitly. Comment-only change; no behavioral impact.

Regression guards for typed data extraction on a dtype MISMATCH, pinning NumSharp's contract against the Numpy.NET interop finding "C3": there, `NDarray.GetData<int>()` on an int64 array silently REINTERPRETS the raw buffer as int32 and truncates to the element count ([1, 5e9, -1] -> [1, 0, 705032704], i.e. numpy's `a.view(np.int32)[:3]` byte garbage) — data corruption. NumSharp must never behave that way. The tests lock in: * NDArray.GetData<T>() performs a VALUE CAST equal to numpy's `a.astype(...)` — int64->int32 truncates by value ([1, 705032704, -1], NOT the reinterpret [1, 0, 705032704]); float64->int32 truncates toward zero with overflow wrap ([1, -2, -2147483648]); int32->int64 widens; matching dtype is a passthrough. * NDArray.ToArray<T>() is STRICT and throws ArrayTypeMismatchException on mismatch (no silent conversion). Oracle values produced with NumPy 2.4.2. Category: normal (green) — these document the intended, correct behavior, not an OpenBug.
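
The difference between the two behaviors is easy to reproduce with Python's `struct` module, assuming a little-endian buffer. The hypothetical `astype_int32` wraps each value modulo 2**32, which is the value cast GetData<int>() must match; the hypothetical `view_int32_truncated` reinterprets the int64 bytes as int32 and keeps the first len(values) items, which is the C3 corruption.

```python
import struct

def astype_int32(values):
    """Value cast int64 -> int32: each value wraps modulo 2**32."""
    out = []
    for v in values:
        v &= 0xFFFFFFFF                                    # keep the low 32 bits
        out.append(v - (1 << 32) if v >= 1 << 31 else v)  # reinterpret as signed
    return out

def view_int32_truncated(values):
    """The C3 bug: reinterpret the int64 buffer as int32, keep len(values) items."""
    raw = struct.pack(f"<{len(values)}q", *values)         # little-endian int64 buffer
    return list(struct.unpack(f"<{2 * len(values)}i", raw))[: len(values)]

data = [1, 5_000_000_000, -1]
print(astype_int32(data))          # [1, 705032704, -1]
print(view_int32_truncated(data))  # [1, 0, 705032704]
```

The second result picks up the zero high word of the first int64, which is why it reads as byte garbage rather than as converted values.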

…ark coach

DocFX site updates:
* api/index.md: "Supported Data Types" was stale at 12 numeric types. Corrected to the 15 public array dtypes: added the three that were missing (SByte -> np.int8, Half -> np.float16, Complex -> np.complex128) and a note that NPTypeCode.Empty/String and the Float alias are enum/compat values (Float resolves to Single), not additional array dtypes.
* api/overwrites/NumSharp.md: new DocFX overwrite supplying a summary/remarks block for the `NumSharp` namespace landing page (NDArray, the np facade, Shape/Slice/NPTypeCode, NumPyRandom) plus a short broadcasting/slicing example. Wired in via docfx.json: registered under `overwrite` and excluded from the `content` glob, so it is applied as metadata rather than rendered as a standalone page.
* toc.yml: added top-level "Benchmarks" (the dashboard page) and "Source Code" (GitHub) entries.
* docs/benchmarks-dashboard.md: added the "Click Here to see breakdown" onboarding coach, a one-time, localStorage-remembered bubble that points first-time viewers at the interactive breakdown targets (2x-5x status band, Reduction suite row, dtype-heatmap tail cell). It is IntersectionObserver-driven, keyboard-dismissable, degrades gracefully when storage is disabled, and self-dismisses after ~5.6s. Also removed the now-defunct `.guide-formula` box and folded its content into the "Reading Ratios" prose (ratio = NumPy / NumSharp, higher is better), matching the repo-wide NPY/NS convention.
* images/benchmark-dashboard.png: dashboard screenshot asset (also used by the README).

Nucs added 24 commits July 1, 2026 19:55

feat: NumPy 2.x parity — byte export, file I/O, dtype resolution & np.rint (+ all-15-dtype fuzz)#616

Nucs wants to merge 24 commits into master from development-journey1

Nucs commented Jul 2, 2026 • edited
