Jamboree algorithm
Posted: Fri Aug 28, 2015 6:04 pm
Hi,
I am currently rewriting my engine to use bitboards and to attempt to use parallel search. As mentioned in a previous post, I was trying to improve the speeds of my move generator a while ago. I made some changes and I'm happy with my Perft speeds now. I also decided to switch to magic bitboards.
I have never written multi-threaded code before (I'm using C++), so it's a bit of a steep learning curve. So my question may sound a little confused, or perhaps not even make sense. I am presently trying to teach my self about thread pooling and stuff, but in the meantime I can't quite get my head around how the algorithm works.
My understanding is that it is better to only create as many threads as there are cores, as creating threads creates a lot of additional overhead, so one should implement a thread pool/scheduler type arrangement to create x number of worker threads, and stack jobs onto a queue for the worker threads to get when they become idle. Is that correct?
Then, for arguments sake say you have two worker threads, and at a certain ply you put all the siblings of the first move onto the queue, and the worker threads set going on the first two siblings. Then at ply+1, the first moves of those two nodes are searched; say they don't produce a cut-off, so the siblings are all put onto the queue. I can't see how they can complete, because the worker threads are waiting for the results of ply+1, but there are no other worker threads free to calculate the siblings nodes at ply+1 that it is waiting for. Hope that made sense.
Also, in anticipation that I wouldn't be able to pass my game-state class around by reference anymore since different threads will be changing it simultaneously, I have made sure that I can copy it. However, that also seems like an awful lot of overhead -- is simply copying the game-state for each new parallel search the correct approach?
I'd like to just get something that works first before really trying to optimize it and such.
Thanks in advance for any advice.
I am currently rewriting my engine to use bitboards and to attempt to use parallel search. As mentioned in a previous post, I was trying to improve the speeds of my move generator a while ago. I made some changes and I'm happy with my Perft speeds now. I also decided to switch to magic bitboards.
I have never written multi-threaded code before (I'm using C++), so it's a bit of a steep learning curve. So my question may sound a little confused, or perhaps not even make sense. I am presently trying to teach my self about thread pooling and stuff, but in the meantime I can't quite get my head around how the algorithm works.
My understanding is that it is better to only create as many threads as there are cores, as creating threads creates a lot of additional overhead, so one should implement a thread pool/scheduler type arrangement to create x number of worker threads, and stack jobs onto a queue for the worker threads to get when they become idle. Is that correct?
Then, for arguments sake say you have two worker threads, and at a certain ply you put all the siblings of the first move onto the queue, and the worker threads set going on the first two siblings. Then at ply+1, the first moves of those two nodes are searched; say they don't produce a cut-off, so the siblings are all put onto the queue. I can't see how they can complete, because the worker threads are waiting for the results of ply+1, but there are no other worker threads free to calculate the siblings nodes at ply+1 that it is waiting for. Hope that made sense.
Also, in anticipation that I wouldn't be able to pass my game-state class around by reference anymore since different threads will be changing it simultaneously, I have made sure that I can copy it. However, that also seems like an awful lot of overhead -- is simply copying the game-state for each new parallel search the correct approach?
I'd like to just get something that works first before really trying to optimize it and such.
Thanks in advance for any advice.