apertus° IRC logs

2018/05/07

Timezone: UTC

01:00

rton

left the channel

02:39

sups

joined the channel

02:46

sups

left the channel

07:49

se6astian|away

changed nick to: se6astian

08:01

rton

joined the channel

08:37

sebix

joined the channel

08:37

sebix

left the channel

08:37

sebix

joined the channel

09:07

se6astian

https://www.apertus.org/axiom-beta-roadmap updated

09:18

Bertl_zZ

changed nick to: Bertl

09:18

Bertl

morning folks!

10:15

seaman

joined the channel

12:01

TofuLynx

left the channel

12:55

Bertl

off for now ... bbl

12:55

Bertl

changed nick to: Bertl_oO

13:06

Kjetil_

joined the channel

13:06

alexML_

joined the channel

13:11

anuejn2

left the channel

13:11

Kjetil

left the channel

13:11

alexML

left the channel

13:11

anuejn2

joined the channel

13:45

RexOrCine|away

changed nick to: RexOrCine

14:00

comradekingu

left the channel

15:26

se6astian

changed nick to: se6astian|away

15:28

TofuLynx

joined the channel

15:30

TofuLynx

left the channel

15:43

nmdis1999

joined the channel

16:09

sebix

left the channel

16:48

nmdis1999

Good evening everyone! :D

16:48

BAndiT1983

hi nmdis1999

16:48

Bertl_oO

changed nick to: Bertl

16:48

Bertl

evening nmdis1999!

16:48

nmdis1999

Hello BAndiT1983!

16:49

nmdis1999

How was the fare Bertl?

16:49

Bertl

great! thanks for asking!

16:50

nmdis1999

BTW I did some research on how to benefit from cache utilization as you asked :)

16:50

se6astian|away

changed nick to: se6astian

16:50

nmdis1999

Here are the points: 1. Using small data types

16:51

nmdis1999

2. Organizing data to avoid alignment holes 3. Problems caused by the standard dynamic memory allocator (which we already discussed, I believe)
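
Point 2 can be illustrated with a minimal sketch. The record types and field names below are hypothetical and only serve to show how member order changes the padding; the exact sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical records; the field names are made up for illustration.

// Members in a careless order: the compiler typically inserts padding after
// each 1-byte field so that the 4-byte member stays aligned.
struct WithHoles {
    std::uint8_t  flags;
    std::uint32_t value;
    std::uint8_t  channel;
};

// Same members, largest first: the two 1-byte fields sit next to each other
// and share a single padded tail.
struct Reordered {
    std::uint32_t value;
    std::uint8_t  flags;
    std::uint8_t  channel;
};

// Returns how many bytes per record the reordering saves on this platform.
inline std::size_t bytesSaved() {
    assert(sizeof(WithHoles) >= sizeof(Reordered));
    return sizeof(WithHoles) - sizeof(Reordered);
}
```

On common ABIs this is 12 bytes versus 8 bytes per record, so more records fit into each cache line.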

16:52

se6astian

hi nmdis1999

16:52

alexML_

hello

16:52

nmdis1999

Also, I found a great article on how to optimize data cache access by using transformation

16:52

nmdis1999

Hi alexM_ and sebastian :D

16:57

Bertl

excellent

16:58

nmdis1999

One more point, which I am not sure about though : instead of reading one item per cache line in the inner loop (in case of matrix for example) we can use all of the items

16:58

nmdis1999

for example : itemsPerCacheLine = CacheLineSize/sizeof(elementType)

16:58

nmdis1999

is it a good idea?

16:59

Bertl

there is no problem with cache access, as long as it doesn't trigger writeback or unwanted fetches

16:59

BAndiT1983

what is it about currently?

17:00

Bertl

but this is something you have to test on the actual hardware (of course it helps to be aware of this)

17:01

nmdis1999

Yeah, I had a slight guess about it.

17:01

Kjetil_

I guess this is supposed to run on a PC. So if you want high speed gain from cache optimizations multiple cores should, if possible, work on the same L3/L2 cache lines

17:02

nmdis1999

Kjetil_ : but isn't optimization for cache kind of different for L2/L3 cache?

17:03

nmdis1999

Bertl : you asked me to find out which data is used for which visualization, isn't the data the same (pixel data)? I have some doubts about it

17:04

Bertl

the source for all will always be the sensel data from the sensor

17:05

Bertl

but the access pattern will be different I guess

17:05

Kjetil_

nmdis1999: hm?

17:05

nmdis1999

Okay :)

17:06

Kjetil_

changed nick to: Kjetil

17:06

Bertl

Kjetil: it is supposed to run on the Axiom Beta

17:06

nmdis1999

I was wondering if the optimization for cache works differently for L2/L3

17:06

Kjetil

Bertl: ah ok

17:06

alexML_

regarding cache, here's a quick exercise anyone can try

17:07

alexML_

remember the qualification task, with lodepng?

17:07

nmdis1999

Yup

17:07

alexML_

take a 4K image (4096x3072) and do some processing on it using two for loops

17:08

Bertl

note that the Cortex A9 in the Zynq doesn't have L2/L3 cache it has an I-cache and a D-cache

17:09

alexML_

let's say for (int y = 0; y < 3072; y++) { for (int x = 0; x < 4096; x++) { do something with im[x + y*pitch] } }

17:09

nmdis1999

Bertl : I didn't know that , thanks!

17:09

Bertl

https://developer.arm.com/products/processors/cortex-a/cortex-a9

17:09

alexML_

write down the execution time, then reverse the order of the two for's

17:09

alexML_

first do the "for x" in the outer loop, then the "for y" in the inner loop

17:09

Kjetil

nmdis1999: I guess it is less relevant as this should run on the Cortex. But, L2/L3 is usually shared between multiple cores. So if multiple cores require the same piece of data, the first core to request it will suffer a cache miss penalty. But if the other cores need the same data shortly after, they will then hit in the cache.

17:10

nmdis1999

The execution time will be larger when we'll reverse the loops ?

17:11

Bertl

https://mgc-images.imgix.net/esl/Zynq7000_credited-9D62B659.png?q=80&w=1600&fit=max

17:11

Kjetil

(One example is multithreaded matrix multiplication where all cores multiply separate rows with the same column)

17:13

alexML_

yeah; didn't try this on ARM, but on a low-end x86-64 you can easily get a 20-30x slowdown after swapping the loops
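
The exercise described above could be sketched as follows. This is a minimal sketch with hypothetical function names, assuming a row-major 16-bit image; both functions compute the same sum and differ only in loop order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums a row-major image with x in the inner loop, so memory is read
// sequentially and every fetched cache line is fully used.
inline std::uint64_t sumRowMajor(const std::vector<std::uint16_t>& im,
                                 std::size_t width, std::size_t height,
                                 std::size_t pitch) {
    assert(pitch >= width && im.size() >= pitch * height);
    std::uint64_t sum = 0;
    for (std::size_t y = 0; y < height; y++)
        for (std::size_t x = 0; x < width; x++)
            sum += im[x + y * pitch];
    return sum;
}

// Same result with the loops swapped: consecutive reads are `pitch`
// elements apart, so each access tends to touch a different cache line.
inline std::uint64_t sumColumnMajor(const std::vector<std::uint16_t>& im,
                                    std::size_t width, std::size_t height,
                                    std::size_t pitch) {
    assert(pitch >= width && im.size() >= pitch * height);
    std::uint64_t sum = 0;
    for (std::size_t x = 0; x < width; x++)
        for (std::size_t y = 0; y < height; y++)
            sum += im[x + y * pitch];
    return sum;
}
```

Calling both on a 4096x3072 buffer and timing each call, for example with std::chrono::steady_clock, should show the difference; the actual ratio depends on the hardware and compiler settings.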

17:13

nmdis1999

6.2.1 : https://lwn.net/Articles/255364/

17:14

nmdis1999

It explained how and why the execution time will be greater, and also how we can use itemsPerCacheLine = CacheLineSize/sizeof(elementType) to our benefit :)

17:17

Kjetil

Remember that you should also consider alignment to cache lines when using itemsPerCacheLine trickery
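
A minimal sketch of the itemsPerCacheLine idea together with the alignment check. The 64-byte line size and all names are assumptions for illustration; the real line size has to be looked up for the target CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes (an assumption, not a measured value).
constexpr std::size_t kCacheLineSize = 64;

// Number of elements of type T that fit into one cache line.
template <typename T>
constexpr std::size_t itemsPerCacheLine() {
    static_assert(kCacheLineSize % sizeof(T) == 0,
                  "element size must divide the cache line size");
    return kCacheLineSize / sizeof(T);
}

// True if the address starts exactly on a cache line boundary.
inline bool isCacheLineAligned(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLineSize == 0;
}

// A block of sensel values that occupies exactly one aligned cache line.
struct alignas(kCacheLineSize) AlignedRow {
    std::uint16_t sensels[itemsPerCacheLine<std::uint16_t>()];
};
```

Without the alignment, a block of itemsPerCacheLine elements can straddle two lines, which defeats the purpose of processing one line at a time.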

17:19

alexML_

I'd say we shouldn't worry too much about cache at first; these optimizations are best done after getting a working proof of concept, IMO

17:21

Kjetil

If you make sure that the processing parts can also be used on x86, you can use tools like cachegrind to view cache performance. (The caches won't be the same: different sizes, different types, and the hw prefetchers will be different.) But it might give you a clue where to start optimising

17:34

nmdis1999

Thanks, Kjetil that helps :D

17:44

nmdis1999

off for now, good night :D

17:45

nmdis1999

left the channel

18:50

BAndiT1983

changed nick to: BAndiT1983|away

19:09

XDjackieXD

left the channel

19:12

XDjackieXD

joined the channel

19:58

BAndiT1983|away

changed nick to: BAndiT1983

20:29

TofuLynx

joined the channel

20:29

TofuLynx

Good Evening!

20:30

TofuLynx

Hello BAndiT1983!

20:30

TofuLynx

Are you on?

20:41

RexOrCine

changed nick to: RexOrCine|away

20:53

parasew[m]

left the channel

20:55

parasew[m]

joined the channel

21:00

Bertl

off for now ... bbl

21:00

Bertl

changed nick to: Bertl_oO

21:10

BAndiT1983

hey TofuLynx, just for a bit

21:10

BAndiT1983

what's the latest state?

21:21

TofuLynx

Hello!

21:21

TofuLynx

I fixed that accident with the class name, I renamed it back to BilinearDebayer.

21:22

TofuLynx

Also, I'm implementing a pattern system, however, I have some questions regarding this

21:23

TofuLynx

From what I saw, it seems that the bilinear debayer functions also depend on the image pattern, so it isn't enough to have pattern offsets

21:23

TofuLynx

is this true?

21:24

BAndiT1983

offsets are always equal, so it should be enough in fact

21:24

TofuLynx

but, for example , imagine this situation:

21:25

se6astian

changed nick to: se6astian|away

21:25

BAndiT1983

for example RGGB, you will have R at offset 0, G1 at offset 1, G2 at offset width + 1 and B at width + 2

21:25

TofuLynx

you need to generate a red pixel value for a green0 pixel offset in a RGGB pattern

21:25

TofuLynx

G2 at offset width + 0 and B at width + 1 :)
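
With that correction, the offsets of the first 2x2 cell could be tabulated as below. This is a minimal sketch, assuming a row-major buffer with `width` sensels per row; the enum, struct, and function names are hypothetical and not taken from the project code.

```cpp
#include <cassert>
#include <cstddef>

// The four Bayer layouts, named by reading the 2x2 cell row by row.
enum class BayerPattern { RGGB, BGGR, GRBG, GBRG };

// Linear offsets of each colour inside the first 2x2 cell.
struct PatternOffsets {
    std::size_t r;   // red
    std::size_t g1;  // green in the first row
    std::size_t g2;  // green in the second row
    std::size_t b;   // blue
};

// Returns the offsets for a pattern; `width` is the row length in sensels.
inline PatternOffsets makePatternOffsets(BayerPattern pattern, std::size_t width) {
    assert(width >= 2);
    switch (pattern) {
        case BayerPattern::RGGB: return {0, 1, width, width + 1};
        case BayerPattern::BGGR: return {width + 1, 1, width, 0};
        case BayerPattern::GRBG: return {1, 0, width + 1, width};
        case BayerPattern::GBRG: return {width, 0, width + 1, 1};
    }
    return {0, 1, width, width + 1};  // not reached for valid enum values
}
```

For RGGB and a width of 4096 this gives R at 0, G1 at 1, G2 at 4096 and B at 4097.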

21:25

BAndiT1983

of course, but you know that your start point is at index 1, between 2 reds, for example

21:26

TofuLynx

can you explain it better?

21:26

TofuLynx

wait

21:26

BAndiT1983

just like in the linear interpolation, you would start at index 1 of the data, then you will get red at 0 and at 2

21:26

TofuLynx

I will paste it on lab chat

21:26

BAndiT1983

this is also what my implementation does

21:27

TofuLynx

pastebin*

21:27

BAndiT1983

what about trello, supragy is using it, also nmdis1999

21:27

TofuLynx

https://pastebin.com/Nfvtp8FG

21:27

TofuLynx

check it

21:27

TofuLynx

well I suggested you and g3gg0 to use trello, but I think g3gg0 couldn't use it or something like that

21:28

BAndiT1983

why do you do the shift as index?

21:28

BAndiT1983

g3gg0 is commenting sometimes on supragya's board, so no problem there

21:28

TofuLynx

shift as index?

21:28

BAndiT1983

by the way, he is absent today, but you were also off for 2 days, hope he will be online tomorrow

21:29

TofuLynx

okk!

21:29

BAndiT1983

width << 1, this looks a bit awkward

21:29

TofuLynx

well, how do you suggest to change to the next row?

21:30

BAndiT1983

borders should be done in linear way, as you just have 2 values there

21:30

BAndiT1983

i would use index = 1 as starting point for red, or whatever color is first in the sensor

21:31

TofuLynx

yep, I am not doing bilinear on borders, except on the bottom border, for now. I have to avoid it on all four borders

21:31

BAndiT1983

maybe 2 steps, first most data, starting at x 1 and y 1, so you can get it diagonally

21:31

BAndiT1983

afterwards the borders
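
The two steps could be laid out roughly like this. It is only a sketch of the traversal order with a hypothetical function name; the callbacks stand in for the bilinear (interior) and linear (border) processing.

```cpp
#include <cassert>
#include <cstddef>

// Visits every pixel exactly once: first the interior starting at (1, 1),
// where all eight neighbours exist, then the one-pixel border.
template <typename InteriorFn, typename BorderFn>
void forEachPixelTwoPass(std::size_t width, std::size_t height,
                         InteriorFn interior, BorderFn border) {
    assert(width >= 2 && height >= 2);

    // Step 1: interior pixels.
    for (std::size_t y = 1; y + 1 < height; y++)
        for (std::size_t x = 1; x + 1 < width; x++)
            interior(x, y);

    // Step 2a: top and bottom rows, including the corners.
    for (std::size_t x = 0; x < width; x++) {
        border(x, 0);
        border(x, height - 1);
    }

    // Step 2b: left and right columns, without the corners.
    for (std::size_t y = 1; y + 1 < height; y++) {
        border(0, y);
        border(width - 1, y);
    }
}
```

Keeping the border handling out of the interior loop means the hot loop needs no per-pixel bounds checks.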

21:31

TofuLynx

yeah

21:31

TofuLynx

why do you suggest index = 1 as starting point?

21:32

BAndiT1983

it was for general linear interpolation, so you are between 2 reds

21:32

TofuLynx

hmm

21:32

BAndiT1983

but for bilinear you should start at 1,1 or so, so you are in the middle of diagonal cross of pixels, can't remember the patterns curretnly

21:32

BAndiT1983

*currently

21:33

TofuLynx

ah yeah it does start at 1,1 :)

21:33

BAndiT1983

nice examples there -> https://www.semanticscholar.org/paper/Low-cost-Bayer-to-RGB-bilinear-interpolation-with-Pérez-Espeso/9baca257d1f737fc50ba250e0c7fdcf3d7e81f2c

21:33

TofuLynx

ok, andrej, I have a query

21:33

TofuLynx

basically

21:33

TofuLynx

imagine a RGGB pattern

21:33

BAndiT1983

give the loop constant values, without shifting and such, also no calculations, otherwise the loop will execute it at every iteration

21:34

TofuLynx

and you need the red value for the Green0 offset

21:34

TofuLynx

you basically grab the red value at the left and the red value at the right, correct?

21:34

BAndiT1983

yep

21:34

TofuLynx

21:34

TofuLynx

now imagine a GBGR pattern

21:35

TofuLynx

we need a red value for the first green offset

21:35

TofuLynx

you basically grab the red value at down and red value at up, correct?

21:35

BAndiT1983

there is no gbgr, it would be gbrg

21:35

TofuLynx

oops, my bad

21:35

BAndiT1983

otherwise you would have columns of green

21:36

BAndiT1983

https://github.com/codeplaysoftware/visioncpp/wiki/Example:-Bayer-Filter-Demosaic

21:36

TofuLynx

imagine

21:36

TofuLynx

GRBG

21:36

TofuLynx

we need the red value at the left and the red value at the right, for the first green offset, correct?
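
The point of these examples is that the direction of the red neighbours of a green sensel depends on the pattern. A minimal sketch for interior pixels, with a hypothetical function name and a flag supplied by the caller: a green sensel in a red row (first green of RGGB or GRBG) has red to the left and right, while a green sensel in a blue row (first green of GBRG) has red above and below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interpolates the red value at a green sensel from its two red neighbours.
// `redInSameRow` selects left/right neighbours (true) or up/down (false).
// Only valid for interior pixels; borders need separate handling.
inline std::uint16_t redAtGreen(const std::vector<std::uint16_t>& data,
                                std::size_t width, std::size_t x, std::size_t y,
                                bool redInSameRow) {
    assert(width > 0);
    const std::size_t height = data.size() / width;
    assert(x >= 1 && x + 1 < width && y >= 1 && y + 1 < height);

    const std::size_t index = x + y * width;
    const std::size_t step = redInSameRow ? 1 : width;  // horizontal or vertical
    return static_cast<std::uint16_t>((data[index - step] + data[index + step]) / 2);
}
```

The flag could be derived once per pattern and row parity, so the offsets alone plus this one bit are enough to pick the right neighbours.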

21:37

BAndiT1983

right one will get interpolated value

21:37

BAndiT1983

but left one will get half of red on index 1

21:37

TofuLynx

hmm?

21:37

TofuLynx

can you explain it?

21:38

BAndiT1983

my suggestion is to have some sort of balancing of values for now, not an expert, to be honest

21:38

BAndiT1983

you have a value for red on index 1 and required red values for index 0 and 2

21:38

TofuLynx

correct

21:38

BAndiT1983

index 2 is simple, just interpolate between reds on index 1 and 3

21:39

TofuLynx

wait

21:39

BAndiT1983

but as we have a border case here, index 0 would get half of red on index 1
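
A sketch of this row-wise linear interpolation for a GRBG-style first row, where red sits at the odd indices. The function name and the `halveAtBorder` switch are hypothetical: `true` follows the halving suggested here, `false` copies the single neighbour instead, which is a common alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills the even indices of `row` with red values, assuming the odd indices
// already hold measured red. Interior positions get the mean of both
// neighbours; border positions have only one neighbour.
inline void fillMissingRed(std::vector<std::uint16_t>& row, bool halveAtBorder) {
    assert(row.size() >= 2);
    const std::size_t n = row.size();
    for (std::size_t i = 0; i < n; i += 2) {
        if (i == 0) {
            // Left border: only the red at index 1 is known.
            row[0] = static_cast<std::uint16_t>(halveAtBorder ? row[1] / 2 : row[1]);
        } else if (i + 1 < n) {
            // Interior: average of the reds on both sides.
            row[i] = static_cast<std::uint16_t>((row[i - 1] + row[i + 1]) / 2);
        } else {
            // Right border: only the red at index i - 1 is known.
            row[i] = static_cast<std::uint16_t>(halveAtBorder ? row[i - 1] / 2 : row[i - 1]);
        }
    }
}
```

Only even indices are written and only odd indices are read, so the update can safely be done in place.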

21:39

TofuLynx

where is the red value at index 3?

21:39

BAndiT1983

ah you have red in the next line, then width + index ;)

21:39

TofuLynx

hmm can you repeat it? xD

21:40

BAndiT1983

it's image d in my link

21:40

BAndiT1983

there are also the cases and the corresponding interpolation patterns

21:41

TofuLynx

Hmm I see

21:41

TofuLynx

bookmarked!

21:43

TofuLynx

also

21:43

TofuLynx

how do I replace the (_width << 1) ?

21:45

BAndiT1983

with red offset + 1

21:45

BAndiT1983

also width << 1 shouldn't be used in the loop, but calculated beforehand, it accelerates the processing
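
A minimal sketch of computing the value beforehand, using a hypothetical function that visits every second row. Optimising compilers often hoist such loop-invariant expressions on their own, so the gain should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums rows 0, 2, 4, ... of a row-major image. The two-row stride is
// computed once before the loop instead of in the loop header or body.
inline std::uint64_t sumEveryOtherRow(const std::vector<std::uint16_t>& data,
                                      std::size_t width) {
    assert(width > 0 && data.size() % width == 0);
    const std::size_t twoRows = width << 1;  // loop-invariant, calculated beforehand

    std::uint64_t sum = 0;
    for (std::size_t rowStart = 0; rowStart < data.size(); rowStart += twoRows)
        for (std::size_t x = 0; x < width; x++)
            sum += data[rowStart + x];
    return sum;
}
```

Naming the constant also makes the intent (skip to the row after next) easier to read than a bare shift inside the loop.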

21:46

BAndiT1983

https://link.springer.com/content/pdf/10.1186%2Fs13640-017-0196-z.pdf

21:48

TofuLynx

Ok! :)

21:49

BAndiT1983

interpolation in outer areas, like in bggr pattern, requires some sort of weighting, to get the border right

21:49

BAndiT1983

don't know if it's visible if you would replicate the red value to the missing pixels on the left and top

21:49

danieel

joined the channel

21:51

TofuLynx

Is the change from bilinear to linear noticeable in this case?

21:52

BAndiT1983

don't think so, but you have to process top and bottom horizontally, left and right vertically

21:52

TofuLynx

yes?

21:53

BAndiT1983

do you want to know it for border areas or for main image?

21:54

TofuLynx

I am trying to understand if some additional processing is necessary for the linear interpolation at the borders

21:54

TofuLynx

also, what do you think about the SetPatternOffsets function? >> https://pastebin.com/v57QrEE6

21:54

BAndiT1983

no, just known pixels / 2 between them

21:55

BAndiT1983

this method should be ok, at first glance

21:55

TofuLynx

ah ok, but you are talking about weighting, what do you mean?

21:55

BAndiT1983

i've had a task for you these days

21:56

BAndiT1983

bggr, if we are looking at the first red pixel, which has 3 unknown pixels on top, top-left and left

21:56

BAndiT1983

then the red pixel would have a weight of 1, but the others would get something like 0.5 or 0.75 of the value

21:57

BAndiT1983

my task would be, as an exercise for you, to implement a unit test for the downscaler, an example for bayerpreprocessor is already in the repo and is also executing successfully now

21:57

TofuLynx

It's a weighted variation of bilinear interpolation?

21:58

TofuLynx

Hmmm, and do you think we need to test the downscaler for my project?

21:59

BAndiT1983

yes, on one side as an exercise, on the other to ensure it's still working if we would do adjustments