# Topic: Unique Shuffle, Chaos-n-Chains (Page 1 of 1)

warjournal

From:
Insane since: Aug 2000

posted 12-10-2018 16:02

When I started looking into latin squares and sudoku, I came across an interesting
problem. How do you shuffle a sequence so that each one is unique by position?

code:
```0 1 2 3
1 2 3 0```

If you were to use a regular random/shuffle function, the problem manifests
like so:

code:
```0 1 2 3
0 2 3 1```

That's not good. Not good at all. The first position is the same and is completely
invalid for my purposes. The whole point of shuffling is get get a certain kind
of unique output.

Let's say I have 8 digits and I want to shuffle the first 5. If one of those 5
comes back the same, then I only shuffled 4. If two come back the same, then I only
shuffled 3. When I want something shuffled, I want it shuffled, damn it. For my
purposes, I need that unique by position output. And I really need to avoid the
possibility of everything coming back exactly the same (very bad).

My first solution was to run a permutation, keep the valid ones in a list, and then
pick one at random. It works, but seriously bogs down when trying to shuffle a
sequence that is 16 long. That's a base perm of 16! and I don't want to do that.
I don't even want to do 10!. And that's just once - not including the
fact that I have to do it more than a few times. Imagine trying to do 10!*5.
No.

There has got to be a better way.

After thinking about it for awhile, I decided on a daisy chain.

code:
```0 1 2 3
1 2 3 0

01 12 23 30

len(chain)=[4]```

Using a daisy chain as such, I could pass a bunch of indices, shuffle them,
and then put them back together (re-index):

code:
```a b c d e f g h
0 1   3
3 0   1
d a c b e f g h```

And I was happy with that for a few years. But I was doing some distribution tests a
few weeks ago and some anomolies popped up. Nothing horrible, but enough to make me
want to fix it. (I was aware of this years ago, but let it go at the time.)

The problem is that there are far more valid solutions that should be included.
Check this out:

code:
```0 1 2 3
1 0 3 2

len(chain)=[2,2]```

With a full chain, a lot of solutions are excluded - and I mean a lot.

Okay, time to start thinking about the possibility of smaller chains within the
big chain. Mini-chains or sub-chains.

My first stop was thinking about the basic possibilities.
Something like this:

code:
```len(master_chain)=6
[2,2,2]
[2,4]
[6]```

I noticed, and it is easy to surmise, that the length of the sub-chains add up to
the length of the master chain. I was sure that I wasn't the first one to think
of this kind of problem. So I hit Google. Turns out that this problem is called
coin change.

Damn near all hits were recursive and greedy. They did me no good. And then I found
coinchanging.py by Kris Wright.

code:
```n=6
coins=[2,3,4,6]

[2, 2, 2]
[2, 4]
[6]```

Simple, and bloody perfect.

Notice that I did not include 1 or 5 in coins[]. This is because you can't have a chain
with length 1. It's just not a chain with length=1.

So, which one to pick? I mean, the probability of each chain can't be the same, right?
Or can it? Time for some brute force to get some more data.

I wrote it up to perm, take the valid solutions, and count the chains.
I got this:

code:
```n=6

([2, 2, 2], 15)
([2, 4], 30)
([3, 3], 40)
([4, 2], 60)
([6], 120)

g00dcount=265
720```

Okay... wtf? I know that those numbers are correct because I did a proper brute force. However,
notice that my brute force shows [2,4] and [4,2] as seperate solutions. Even though both of those
chains are identical, they are different. I'll see about explaining this later.

What do those numbers mean? It means that a there is a 15:265 chance that the chain [2,2,2]
will be picked. It means that there is a 40:265 chance that [3,3] will be picked.

It means that there is a 30:265 chance that [2,4] will be picked. Or 60:265 that [4,2] will be
picked. But, doing it coinchanger.py way since both chains are identical, 90:265 that [2,4] will
be picked.

Note: Going with full daisy chain (Phil) will yield only 120:265=45%. Like
I mentioned earlier, a lot of valid solutions are being left out.

Note: "g00dcount" was legit typo that I haven't bothered fixing.

Note: (Phil, 120) is actually (n-1)!.

So now I have to figure out the probabilities that my brute force spits out. More staring at
numbers, playing with formulas, and hardcore logical thinking. I live for this shit.

Two sleepless nights later, I came up with this little ditty:

code:
```#!/usr/bin/env python
from math import factorial as fact

def nchoser_min1(n,r):
# permutation without repetition
# with a reduction in possibilities
n=n-1
r=r-1
return fact(n)/fact(n-r)```

Using that, I can now calculate the probability of each chain. Let's take it for a spin with
some hard coding.

Note: I know that I'm not actually using formal N Chose R, AKA Combinations without
Repetition. I am actually using Permutations without Repetition.

code:
```#!/usr/bin/env python
from math import factorial as fact

def nchoser_min1(n,r):
# permutation without repetition
# with reduction in possibilities
n=n-1
r=r-1
return fact(n)/fact(n-r)

print('n=6')
print('')

#[2, 2, 2] 15
a=nchoser_min1(6,2)
b=nchoser_min1(4,2)
c=nchoser_min1(2,2)
print('[2, 2, 2] {}*{}*{}={}'.format(a,b,c,a*b*c))
gc0=a*b*c

#[2, 4], 30
a=nchoser_min1(6,2)
b=nchoser_min1(4,4)
print('[2,4] {}*{}={}'.format(a,b,a*b))
gc1=a*b

#[3, 3] 40
a=nchoser_min1(6,3)
b=nchoser_min1(3,3)
print('[3,3] {}*{}={}'.format(a,b,a*b))
gc2=a*b

#[4, 2], 60
a=nchoser_min1(6,4)
b=nchoser_min1(2,2)
print('[4,2] {}*{}={}'.format(a,b,a*b))
gc3=a*b

#[6], 120
a=nchoser_min1(6,6)
print('[6] {}'.format(a))
gc4=a

print('')
print('g00dcount={}'.format(gc0+gc1+gc2+gc3+gc4))```

And now side-by-side:

code:
```brute force           n chose r
n=6                   n=6

([2, 2, 2], 15)       [2, 2, 2] 5*3*1=15
([2, 4], 30)          [2,4] 5*6=30
([3, 3], 40)          [3,3] 20*2=40
([4, 2], 60)          [4,2] 60*1=60
([6], 120)            [6] 120

g00dcount=265         g00dcount=265```

So, yeah. The numbers add up every single time. I even played with some other
hard code examples and those worked as well.

Now I have the last piece and a plan-of-attack that can get me what I want.

- Start with a chain length and pass it on to coinchanging.py.
- Take each solution, perm when needed, and calculate the probability.
- Pick a chain based on those probabilities.
- Expand the chosen chain into groups.
- Shuffle the whole thing all will-nilly.
- Take each group in turn and daisy chain
- Put it all back together.

code:
```n=6
0 1 2 3 4 5

based on probability,
the chain picked is: [2,4]

expand into mini-chains:
0 0 1 1 1 1

shuffle all willy-nilly:
1 0 1 1 1 0

take each mini-chain in turn
and daisy chain shuffle
put it all back together

. 0 . . . 0   1 . 1 1 1 .   1 0 1 1 1 0
. 1 . . . 5   0 . 2 3 4 .   0 1 2 3 4 5
. 5 . . . 1   2 . 4 0 3 .   2 5 4 0 3 1

[2]=15 51
[4]=02 24 43 30```

Ugh. I hope that makes sense.

I have all of the pieces and I can start putting it all together.

A little bit of a side-note. My use of the word "chain" is extremely lazy.
Once I suss a few things out, I'll wax my nomenclature a bit better.

In my next post, I'm going to clarify a few things.
Stay tuned.

edit: lol linkwords, not sure i can fix...
edit: too funny, linkwords turning ( 'bracket' 6 'bracket' ) into (Phil)
Medic!
(Edited by warjournal on 12-10-2018 16:09)

(Edited by warjournal on 12-10-2018 16:12)

(Edited by warjournal on 12-10-2018 16:24)

warjournal

From:
Insane since: Aug 2000

posted 12-10-2018 22:21

Let's take a closer look at this and why I am using -1 for both variables:

code:
```def nchoser_min1(n,r):
n=n-1
r=r-1
return fact(n)/fact(n-r)```

When doing the factorial thing with pools of possibilities, you have
something like this:

code:
```n=6

0=[0, 1, 2, 3, 4, 5]
1=[0, 1, 2, 3, 4, 5]
2=[0, 1, 2, 3, 4, 5]
3=[0, 1, 2, 3, 4, 5]
4=[0, 1, 2, 3, 4, 5]
5=[0, 1, 2, 3, 4, 5]```

Then, when you pick one, you prune, and move on to the next one.
- the first has 6 options
-- pick and prune
- the second has 5 options because of pruning
-- pick and prune

And so on down to the last pool, which will have only one option.

Factorial.

So why am I using -1? Because I have the added rule that a position
can't contain itself in it's pool.

code:
```n=6

0=[1, 2, 3, 4, 5]
1=[0, 2, 3, 4, 5]
2=[0, 1, 3, 4, 5]
3=[0, 1, 2, 4, 5]
4=[0, 1, 2, 3, 5]
5=[0, 1, 2, 3, 4]```

And then the same kind of thing, but remember that it can't be itself
(so to speak).
- the first pick has only 5 options because it can't be itself
-- pick and prune
- the next can't be iself or previous, so only 4 options
-- pick and prune
- the next can't be itself or previous 2 picks, so only 3 options
-- pick and prune

That's why I said that n=6 is actually (6-1)!=120 for a full chain.

Let's look at one with sub-chains.

code:
```n=6
c=[3,3]

0=[1, 2, 3, 4, 5]
1=[0, 2, 3, 4, 5]
2=[0, 1, 3, 4, 5]
3=[0, 1, 2, 4, 5]
4=[0, 1, 2, 3, 5]
5=[0, 1, 2, 3, 4]

pick 02
prune

0=2
1=[3, 4, 5]
2=[1, 3, 4, 5]
3=[1, 4, 5]
4=[1, 3, 5]
5=[1, 3, 4]

mini-chain=[02]```

What just happened? Why all the 0's taken out? Because it is the start
of the mini-chain and has to be set aside to link-back to when you get
to the end of the mini-chain. Until then, 0 can't be chosen.

code:
```pick 21
prune

0=2
1=[3, 4, 5]
2=1
3=[4, 5]
4=[3, 5]
5=[3, 4]

mini-chain=[02, 21]```

And now you do the link-back thing because you are at the end of the
mini-chain (size 3).

code:
```end of chain
has to be 10

0=2
1=0
2=1
3=[4, 5]
4=[3, 5]
5=[3, 4]

mini-chain=[02, 21, 10]```

Pick a starting point, set it aside, chain away, and then link-back
when you get to the end.

code:
```0 1 2 3 4 5
2 0 1 4 5 3```

that there are two mini-chains in there instead of a full-on daisy chain of
size 6.

The possibilities go something like this:

code:
```n=6
c=[3,3]

0: starting point, 5 possibilities, can't be self, prune and set aside
2: 3 possibilities, can't be self, previous, or starting point, pick-n-prune
1: only one possibility, link-back to start```

And that is why using -1 for n and r is good for this. That is why it works
to get the same numbers as my brute force.

Instead of (6 chose 3), actually doing (5 chose 2).
And so on.

But there is something in my brute force that really bothers me. These two
little lines:

code:
```([2, 4], 30)
([4, 2], 60)```

They are different. Since they produce different probabilities depending on order
they are obviously different.

Not to me. To me, [2,4] and [4,2] are the same.
Remember when I said that I was going to expand the chains into groups and shuffle?
That is why I see them as the same. Check it out:

code:
```[2, 4] = 0 0 1 1 1 1
[4, 2] = 0 0 0 0 1 1

:shuffle:

[2, 4] = 1 1 0 0 1 1
[4, 2] = 0 0 1 1 0 0```

They have over-lap and can produce the exact same groupings. And yet one has a higher
probability of happening over the other. Can this be refined even more to take that
into account without actually... doing it?

Should I keep what I already have working? Or should I go down yet another rabbit
hole?

But...
Yesterday I realized that I had already solved this problem over two years ago.
I just didn't realize it right away because it was for a different thing that I
was playing with. I already ran a few small samples and the numbers are close enough.
I'm going to open the flood gates of possibilities even wider.

Chaos is a strange mistress that I adore to no end, and imma make 'er mah bitch.

No, I'm not going to abandon this. There is power in manipulating chains. I've
already done it several times to great effect. Just not quite in this way. This
particular approach will go on the backburner for a bit.

Yes, I will show ya'll how I solved this problem over two years ago. One of the
goofiest little baubles that I have ever coded. And the basic idea behind it
also has a lot of power.

Stay tuned.

(Edited by warjournal on 12-10-2018 22:58)

warjournal

From:
Insane since: Aug 2000

posted 12-11-2018 01:50

This all started because I wanted to do a unique shuffle by
position. It can't be that difficult, right?

My first solution was doing a perm, keeping the ones that I
like, and picking one at random. Fine for small numbers, but
I can get crazy sometimes. This route quickly became far to
unweildly and infeasible.

My next solution was to do a daisy chain. This little ditty:

code:
```0 1 2 3 4
1 2 3 4 0

01 12 23 34 40```

And I was happy for quite some time. Even though I was aware
of a problem, I was happy.

Now I want to fix that problem. That problem being this little
guy:

code:
```0 1 2 3 4
1 0 3 4 2```

I did some brute force to get some data to work from. I took
that data, thought it up, and nailed it. I even learned about the
coin change problem along the way. A bit long-winded, but I
was about to be happy again. Fast enough with larger sequences
and easy enough to code. Happy day!

As I was going about my crazy macak routine, I noticed something
in my notes. It tickled my brain. "Hey, that looks familiar.
I've seen that before. OMG I bet it would work!" And then my
crazy got even macakier.

Rather than dig it out of my archives, I re-wrote it in under
an hour. When you see how simple the code is, you'll see why
it didn't take long at all to code.

Well before I was daisy chaining, I was playing a game of
"Can Be, Can't Be, Has to Be". That right there was the start
of my sudoku journey. I wasn't thinking in terms of unique shuffle
by position; rather, I was taking more of a quantum approach.

The problem looks like this:

code:
```0 1 2 3 4 5
1 2 3 4 0 .```

What do you put in that last position? You can't put anything in
there. It Can't Be any number.

But if you take one step back and re-arrange it like so:

code:
```0 1
1 2
2 3
3 4
4 [5, 0]
5 [0]```

Now, 4 Can Be 5 or 0.
However, 5 Has to Be 0.

I've seen this problem more than a few times with people trying
to make sudoku grids or latin squares from scratch.

Me? I solved it with one simple rule that I call "Minimum in Pool".
Since 5 has the least amount of possibilities, pick from it's pool.
And 4 will fall into place.

[code]

code:
```0 1        0 1     0 1
1 2        1 2     1 2
2 3        2 3     2 3
3 4        3 4     3 4
4 [5, 0]   4 [5]   4 5
5 [0]      5 0     5 0

0 1 2 3 4 5
1 2 3 4 5 0```

Fire up your pools, remove self from pools, find all pools with minimum
number of possibilities, pick one of those pools, pick from that pool,
prune, and go again,

No big deal.

What got me thinking that this will work is the fact that you are picking
from pools at random. There is no need to pick a particular chain sequence
because they will happen incidently. The chains are a product of
probability - not the other way around.

Big rush to get the code running just so I could see the numbers. I was getting
really sloppy towards the end.

Here is a sample data set from my brute force:

code:
```n=7

[2, 2, 3], 48  3%
[2, 3, 2], 72  4%
[3, 2, 2], 90  5%
[2, 5], 144    8%
[5, 2], 360   16%
[3, ```