01:09 | anuejn | good question
| |
01:27 | Dest123 | left the channel | |
02:19 | tpw_rules | some of the training logic runs in the csr_domain because there's no reason for it not to
| |
02:20 | tpw_rules | oh i was thinking about BitwiseAccessibleInteger, is there really a need for that or bitarray?
| |
02:20 | tpw_rules | i just treat them as regular integers
| |
02:22 | tpw_rules | vup: also i added the assert that the bus word width is exactly 32 because if it's greater then the split_stages calculation has to change, right?
| |
02:22 | tpw_rules | and the write port granularity and stufff
| |
02:27 | tpw_rules | also not sure i 100% agree with wholesale marking the csr_domain as false path. maybe only for StatusSignals
| |
02:42 | vup | tpw_rules: BitwiseAccessibleInteger is also used elsewhere in the codebase, so its just reused here
| |
02:42 | tpw_rules | i mean BitwiseAccessibleInteger as what the control and status signals are mapped to in pydrive. i think it would be fine if it were a regular integer
| |
02:42 | tpw_rules | pydriver*
| |
02:43 | vup | tpw_rules: I think the gateware is still correct if the bus word width is bigger than 32bits if you use the __{get,set}item__ helpers, as those mask the values accordingly
| |
02:43 | vup | tpw_rules: they are just regular integers
| |
02:43 | vup | BitwiseAccessibleInteger is just use for some inbetween conversions
| |
02:44 | tpw_rules | oh, then what is the change anuejn is proposing?
| |
02:44 | vup | we were thinking about using bitarray, but we now disregarded this idea
| |
02:45 | vup | now the PR is a bugfix for the problem that causes the buffer[n]_base values to not be read correctly
| |
02:45 | vup | and also some performance optimizations
| |
02:45 | tpw_rules | btw, i don't think the buffer[n]_level values were being read correctly either, does this fix them too?
| |
02:45 | vup | a simple read for a toplevel variable (design.something) is now down to 33us from 410us for me
| |
02:45 | tpw_rules | ah that's great
| |
02:45 | vup | tpw_rules: yes it fixes the level values aswell
| |
02:47 | tpw_rules | i still don't believe you on the bus_word_width... because bus_word_width is the width of the data attribute in handle_read and handle_write, correct? so the 32 * i in there would have to change
| |
02:47 | vup | ohh yeah
| |
02:48 | vup | that is true
| |
02:48 | vup | sorry somehow I missed that
| |
02:55 | tpw_rules | i wonder if it would be faster to change DevMemAccessor to use memory mapping and ctypes
| |
02:55 | vup | it uses memory mapping
| |
02:56 | tpw_rules | i mean instead of seek and read
| |
02:56 | vup | yeah
| |
02:56 | vup | thats already upcoming
| |
02:56 | tpw_rules | you should be able to cast it to an array of 32 bit integers iirc
| |
02:59 | tpw_rules | but ok, i will see what it looks like when you guys are done
| |
02:59 | tpw_rules | if i can figure out the git, i will rebase my PR on the main branch as it stands and delete the DevMemAccessor hack commit
| |
02:59 | tpw_rules | (tomorrow)
| |
03:00 | vup | ð
| |
03:49 | vup | tpw_rules: so with this it meets timing for me https://paste.niemo.de/togujapafa.patch
| |
03:50 | vup | (this is with vivado 2019.2, I would hope a newer version does even better...)
| |
03:51 | vup | also this https://github.com/apertus-open-source-cinema/naps/pull/23 contains the fix for the pydriver and additional performance optimizations
| |
04:02 | fredy | joined the channel | |
05:34 | tpw_rules | vup: did you have to add the FIFO on line 28 before the gearbox?
| |
05:34 | tpw_rules | i did not
| |
05:34 | tpw_rules | well, if i wanted it to pass at 125MHz
| |
05:34 | tpw_rules | at 250 it does not pass, but i'm pretty sure in your version the gearbox runs at 125MHz
| |
05:35 | tpw_rules | but i will take that diff into consideration, thank you for the research
| |
05:39 | tpw_rules | i might replace the gearbox anyway, i think it is inefficent. vivado mentions a very wide shift register, i think much wider than necessary.
| |
05:40 | tpw_rules | but whatever i will research more tomorrow
| |
06:09 | tpw_rules | also re PR #23, don't merge it yet. i think it can be made even faster. i will fiddle tomorrow too
| |
06:09 | tpw_rules | how exactly did you test the 33us figure?
| |
07:10 | fredy | left the channel | |
07:31 | fredy | joined the channel | |
08:26 | se6astian | good day
| |
08:26 | se6astian | wiki spam continues with accounts already created before
| |
08:27 | se6astian | not that many though, cleaned 2 pages
| |
11:59 | Bertl_oO | morning folks!
| |
12:27 | anuejn | tpw_rules: your problem should be fixed now: https://github.com/apertus-open-source-cinema/naps/pull/23
| |
12:28 | anuejn | also you should get much better performance from pydriver
| |
14:10 | vup | tpw_rules: I tested in with the camera.py design on the micro, just using
| |
14:10 | vup | timeit.timeit(lamba: design.sensor_reset_n, number=...)
| |
14:11 | vup | Would be curious to see if you can get it significantly faster
| |
14:12 | vup | To me anything faster than ~15us seems very hard to achieve
| |
14:19 | vup | (For the general purpose case)
| |
14:20 | vup | SocMemory could probably be sped up a lot more
| |
17:13 | tpw_rules | vup: https://paste.niemo.de/osucewulej.lua this is what i meant with ctypes. this drops from 33us to 24us
| |
17:13 | tpw_rules | at the cost of read and write being unable to accept aligned offsets
| |
17:15 | tpw_rules | non-aligned offsets*
| |
17:29 | tpw_rules | it also makes maybe a 15% improvement to download time for the ILA in the pattern test
| |
17:33 | vup | tpw_rules: interesting, would not have thought that this helps that much
| |
17:33 | vup | want to open a PR with it?
| |
17:35 | tpw_rules | yeah, are you okay with the compromise about non-aligned offsets? i think i can also add some improvements to SocMemory and the ILA to the PR
| |
17:36 | vup | what I am currently thinking is having a slow and a fast version of the DevMemAccessor and insert the slow version if the memory map contains any unaligend addresses and the fast version if it does not
| |
17:39 | tpw_rules | my question is why would it have unaligned addresses
| |
17:40 | vup | well I think you currently can just reques a unaligned address, when creating a {Status,Control}Signal and it will get passed through that way
| |
17:44 | tpw_rules | i guess i still see the whole pydriver thing as mostly a debugging aid. so it makes sense to me to make illegal special cases like unaligned addresses and multiple fields per address (i.e. Value.bit_start != 0)
| |
17:47 | fredy | left the channel | |
17:47 | anuejn | you too feel fancy having a jitsi session?
| |
17:48 | anuejn | s/too/two
| |
17:48 | vup | sounds good to me
| |
17:50 | tpw_rules | sure
| |
17:55 | anuejn | oh sorry bad missplaning from my side I dont really have time :(
| |
17:55 | vup | lol
| |
17:55 | anuejn | maybe later this eveneng but I cant say currently
| |
17:55 | anuejn | planing is not my strength
| |
17:56 | tpw_rules | later would be better for me too
| |
17:56 | vup | sure, should work for me aswell
| |
18:22 | Spirit532 | left the channel | |
18:22 | Spirit532 | joined the channel | |
19:29 | tpw_rules | hm, trying to write to a StatusSignal with pydriver causes a bus error
| |
19:30 | tpw_rules | which isn't entirely unexpected, but it's a bit of a sharp edge
| |
19:34 | vup | do you think the axi infra should silently ignore writes to StatusSignals, or just pydriver should catch them?
| |
19:57 | tpw_rules | i think pydriver should probably catch them and throw an AttributeError. does it actually know? i don't think it's super worth changing if it involves changing more than a couple lines
| |
19:58 | tpw_rules | silently ignoring would be a terrible idea
| |
20:03 | vup | pydriver doesn't know right now, but it would not be too hard to teach it
| |
20:08 | tpw_rules | vup: what exactly did you mean by "The SocMemory changes should also be reflected in the test_smoke test of soc_memory_test.py."? those tests pass
| |
20:08 | tpw_rules | i'm not really sure what additional test to add, the smoke test proves that reading and writing works
| |
20:10 | vup | test_smoke passes, but it should probably read from aligned addresses over the whole address space, right now it reads from (with the change) unsensical addresses and only tests the first fourth of the address space
| |
20:11 | tpw_rules | ohh yes, so just multiply by 4
| |
20:11 | vup | yeah
| |
20:12 | tpw_rules | i was also thinking of separating it into a write pass and a read pass because that would have caught the unaligned bug i fixed
| |
20:15 | vup | sure makes sense
| |
20:15 | tpw_rules | alright, i've done everything i want to for https://github.com/apertus-open-source-cinema/naps/pull/21 . it's ready to merge once you review it
| |
20:15 | tpw_rules | (and i address your comments)
| |
21:43 | anuejn | okay i am ready and chilling in the jitsi
| |
21:52 | tpw_rules | anuejn: do you see me?
| |
21:52 | anuejn | nope
| |
21:52 | tpw_rules | blah. it keeps alternating between saying connection good and connection lost
| |
21:52 | anuejn | can you hear me?
| |
21:52 | tpw_rules | no
| |
21:52 | tpw_rules | but i see that you rae there
| |
21:52 | anuejn | bad luck
| |
21:53 | tpw_rules | you are*
| |
21:53 | anuejn | yeah me too
| |
22:43 | vup | Bertl_oO: se6astian: is the current beta gateware doing 60fps at the full sensor resolution?
| |
23:29 | Bertl_oO | bope
| |
23:29 | Bertl_oO | *nope
| |
23:29 | Bertl_oO | hdmi does 60FPS, sensor max 40FPS
| |
23:30 | vup | I see
| |
00:18 | tpw_rules | left the channel | |
00:19 | tpw_rules | joined the channel |