01:21 | fredy330 | left the channel | |
01:55 | fredy330 | joined the channel | |
01:55 | danieel | vup: the vectorized code is better than a hand crafted sse/avx inline?
| |
01:57 | vup | danieel: well I did not try hand crafted sse / avx, but if you look at the generated code: https://rust.godbolt.org/z/aYKnv8Y5e
| |
01:57 | vup | it's actually super clean
| |
01:57 | vup | and probably very similar to what I would come up with
| |
01:59 | danieel | that's nice... can we check whether it can do NEON too? (like on arm?)
| |
02:05 | vup | danieel: yeah it looks like the neon code is super clean as well: https://rust.godbolt.org/z/64q96aenP
| |
02:06 | vup | (even nicer than the avx2 code, but that is expected, as sse/avx does not have a lot of instructions for 8 bit integers; for example shifts are missing, so you first need to convert the 8 bit integers to 16 bit integers)
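To illustrate the point about 8 bit shifts, here is a minimal sketch of the kind of scalar loop a compiler can auto-vectorize. The function name `shift_right_u8` and its parameters are hypothetical and not taken from the linked godbolt examples. On x86 the vectorizer typically has to widen to 16 bit lanes (or shift wider lanes and mask), since SSE/AVX2 offer no per-byte shift, while NEON can shift 8 bit lanes directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the linked code): shift every 8 bit sample
 * right by a fixed amount. The loop body is plain scalar C; whether and how
 * it gets vectorized depends on the compiler, flags and target. */
void shift_right_u8(uint8_t *restrict dst, const uint8_t *restrict src,
                    size_t n, unsigned shift)
{
    /* Shifting an 8 bit value by 8 or more would always yield 0, so treat
     * that as a caller error in this sketch. */
    assert(shift < 8);

    for (size_t i = 0; i < n; i++) {
        /* src[i] is promoted to int, shifted, then truncated back to 8 bits. */
        dst[i] = (uint8_t)(src[i] >> shift);
    }
}
```

Compiling this with optimizations for an x86 and an ARM target and comparing the generated assembly should show the difference in how the per-byte shift is handled.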
| |
02:07 | danieel | is that just what rust gives? or llvm/gcc can do as well?
| |
02:07 | danieel | if this can do both.. then no need to use inline asm :)
| |
02:09 | vup | I mean that is what rust gives, but as it uses llvm as backend I would expect similar results for c
| |
02:09 | vup | and indeed: https://godbolt.org/z/z8zGsndz3
| |
02:10 | vup | the naive c translation produces pretty much the exact same code
| |
02:10 | vup | (works for sse/avx as well)
| |
02:11 | vup | gcc produces a more complicated, but still vectorized and, at least for sse/avx, similarly performing version
| |
02:11 | vup | (did not benchmark the neon version myself yet)
| |
02:12 | danieel | well, i am amazed
| |
02:12 | vup | In general, I would expect the vectorization for c/c++ to be a lot better even than the one for rust, because rust for example has bounds checking for array accesses, which it first has to figure out how to optimize away
| |
02:14 | vup | (otoh rust disallows aliasing of mutable references, so that's one point where it's simpler for vectorization, but in c you can of course always use `restrict` to help the compiler)
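As a sketch of the `restrict` remark, the following hypothetical function (`average_u8` is an illustrative name, not code from the discussion) promises the compiler that the output buffer does not overlap either input. Without that promise a compiler generally has to emit a runtime overlap check or be more conservative about vectorizing the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: rounded average of two 8 bit buffers.
 * `restrict` tells the compiler that out, a and b never alias, which is
 * the same guarantee Rust gets from its rules on mutable references.
 * Calling this with overlapping buffers is undefined behavior. */
void average_u8(uint8_t *restrict out, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Operands are promoted to int, so the sum cannot overflow. */
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }
}
```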
| |
03:03 | Bertl_oO | would be nice to see if 'readable' C code can outperform the neon remapper ;)
| |
06:14 | Bertl_oO | off to bed now ... have a good one everyone!
| |
06:14 | Bertl_oO | changed nick to: Bertl_zZ
| |
07:51 | se6astian | good day
| |
07:52 | se6astian | vup: bandwidth.bmp uploaded to https://cloud.apertus.org/index.php/s/NsbF5kD4Yt4SpZg
| |
08:39 | se6astian | have you seen https://libre-soc.org ?
| |
11:02 | se6astian | https://wiki.apertus.org/index.php/Raw12_viewer documentation created and https://wiki.apertus.org/index.php/RAW12 updated
| |
12:51 | Bertl_zZ | changed nick to: Bertl
| |
12:51 | Bertl | morning folks!
| |
15:10 | Bertl | off for now ... bbl
| |
15:11 | Bertl | changed nick to: Bertl_oO
| |
15:50 | se6astian | vup/anuejn/Bertl_oO: Do you know why the black columns feature seems to be commented out in the current snap code: https://github.com/apertus-open-source-cinema/axiom-firmware/blob/main/software/sensor_tools/snap/snap.c#L620 ?
| |
15:59 | se6astian | seems like it was that way from the beginning? a mistake we never noticed?
| |
16:09 | se6astian | pushed a fix
| |
16:09 | se6astian | https://github.com/apertus-open-source-cinema/axiom-firmware/commit/9eed0c6943af4b007a207f6aefc94e35af57eef1
| |
16:09 | se6astian | tested on beta
| |
16:09 | se6astian | seems to be working fine
| |
17:06 | vup | yeah libre-soc has been around for some time
| |
17:08 | vup | not sure about the black column feature
| |
17:25 | vup | Bertl_oO: so this seems to get auto-vectorized: https://paste.niemo.de/raw/ixomuboteq
| |
17:25 | vup | but one would have to benchmark it against your handrolled version
| |
17:34 | vup | also did you ever try just mmapping the file instead of converting line by line and then copying that to the file?
| |
18:02 | Bertl_oO | the main idea for the conversion is to use it as a 'filter' in a pipeline, so mmapping is not suitable for this purpose
| |
18:03 | Bertl_oO | regarding benchmarking: yes, that would be nice to see, se6astian ^^
| |
18:09 | vup | Bertl_oO: hmm what more filters do you envision in the pipeline for snap?
| |
18:28 | illwieckz | left the channel | |
19:09 | Bertl_oO | not necessarily for snap, the idea here was to use memtool to read/write data from/to memory and to convert it on the fly
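The 'filter in a pipeline' idea can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual snap or memtool code: `filter_stream` and `convert_chunk` are hypothetical names, and the byte inversion merely stands in for the real conversion. The data is processed chunk by chunk as it streams through, which is why mmapping an output file does not fit this use case.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder conversion (assumption): invert every byte in place.
 * The real remapping step would go here. */
static void convert_chunk(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)~buf[i];
}

/* Read from `in` in fixed-size chunks, convert each chunk and write it to
 * `out`. Works on pipes because it never seeks or needs the total size.
 * Returns 0 on success, -1 on a read or write error. */
int filter_stream(FILE *in, FILE *out)
{
    uint8_t buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        convert_chunk(buf, n);
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

A `main` that simply returns the result of `filter_stream(stdin, stdout)` would let such a tool sit between a producer and a consumer in a shell pipeline.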
| |
19:21 | vup | right
| |
20:35 | balrog | left the channel | |
20:39 | balrog | joined the channel | |
21:12 | illwieckz | joined the channel | |
23:34 | fredy330 | left the channel |