Closed Thread Icon

Topic awaiting preservation: Unique Shuffle, Chaos-n-Chains Pages that link to <a href="https://ozoneasylum.com/backlink?for=33322" title="Pages that link to Topic awaiting preservation: Unique Shuffle, Chaos-n-Chains" rel="nofollow" >Topic awaiting preservation: Unique Shuffle, Chaos-n-Chains\

 
Author Thread
warjournal
Maniac (V) Mad Scientist

From:
Insane since: Aug 2000

posted posted 12-10-2018 16:02

When I started looking into latin squares and sudoku, I came across an interesting
problem. How do you shuffle a sequence so that each one is unique by position?

code:
0 1 2 3
1 2 3 0



If you were to use a regular random/shuffle function, the problem manifests
like so:

code:
0 1 2 3
0 2 3 1



That's not good. Not good at all. The first position is the same and is completely
invalid for my purposes. The whole point of shuffling is get get a certain kind
of unique output.

Let's say I have 8 digits and I want to shuffle the first 5. If one of those 5
comes back the same, then I only shuffled 4. If two come back the same, then I only
shuffled 3. When I want something shuffled, I want it shuffled, damn it. For my
purposes, I need that unique by position output. And I really need to avoid the
possibility of everything coming back exactly the same (very bad).

My first solution was to run a permutation, keep the valid ones in a list, and then
pick one at random. It works, but seriously bogs down when trying to shuffle a
sequence that is 16 long. That's a base perm of 16! and I don't want to do that.
I don't even want to do 10!. And that's just once - not including the
fact that I have to do it more than a few times. Imagine trying to do 10!*5.
No.

There has got to be a better way.

After thinking about it for awhile, I decided on a daisy chain.

code:
0 1 2 3
1 2 3 0

01 12 23 30

len(chain)=[4]



Using a daisy chain as such, I could pass a bunch of indices, shuffle them,
and then put them back together (re-index):

code:
a b c d e f g h
0 1   3
3 0   1
d a c b e f g h



And I was happy with that for a few years. But I was doing some distribution tests a
few weeks ago and some anomolies popped up. Nothing horrible, but enough to make me
want to fix it. (I was aware of this years ago, but let it go at the time.)

The problem is that there are far more valid solutions that should be included.
Check this out:

code:
0 1 2 3
1 0 3 2

len(chain)=[2,2]



With a full chain, a lot of solutions are excluded - and I mean a lot.

Okay, time to start thinking about the possibility of smaller chains within the
big chain. Mini-chains or sub-chains.

My first stop was thinking about the basic possibilities.
Something like this:

code:
len(master_chain)=6
[2,2,2]
[2,4]
[6]



I noticed, and it is easy to surmise, that the length of the sub-chains add up to
the length of the master chain. I was sure that I wasn't the first one to think
of this kind of problem. So I hit Google. Turns out that this problem is called
coin change.

Damn near all hits were recursive and greedy. They did me no good. And then I found
coinchanging.py by Kris Wright.

code:
n=6
coins=[2,3,4,6]

[2, 2, 2]
[2, 4]
[6]



Simple, and bloody perfect.

Notice that I did not include 1 or 5 in coins[]. This is because you can't have a chain
with length 1. It's just not a chain with length=1.

So, which one to pick? I mean, the probability of each chain can't be the same, right?
Or can it? Time for some brute force to get some more data.

I wrote it up to perm, take the valid solutions, and count the chains.
I got this:

code:
n=6

([2, 2, 2], 15)
([2, 4], 30)
([3, 3], 40)
([4, 2], 60)
([6], 120)

g00dcount=265
badcount=455
720



Okay... wtf? I know that those numbers are correct because I did a proper brute force. However,
notice that my brute force shows [2,4] and [4,2] as seperate solutions. Even though both of those
chains are identical, they are different. I'll see about explaining this later.

What do those numbers mean? It means that a there is a 15:265 chance that the chain [2,2,2]
will be picked. It means that there is a 40:265 chance that [3,3] will be picked.

It means that there is a 30:265 chance that [2,4] will be picked. Or 60:265 that [4,2] will be
picked. But, doing it coinchanger.py way since both chains are identical, 90:265 that [2,4] will
be picked.

Note: Going with full daisy chain (Phil) will yield only 120:265=45%. Like
I mentioned earlier, a lot of valid solutions are being left out.

Note: "g00dcount" was legit typo that I haven't bothered fixing.

Note: (Phil, 120) is actually (n-1)!.

So now I have to figure out the probabilities that my brute force spits out. More staring at
numbers, playing with formulas, and hardcore logical thinking. I live for this shit.

Two sleepless nights later, I came up with this little ditty:

code:
#!/usr/bin/env python
from math import factorial as fact

def nchoser_min1(n,r):
   # permutation without repetition
   # with a reduction in possibilities
    n=n-1
    r=r-1
    return fact(n)/fact(n-r)



Using that, I can now calculate the probability of each chain. Let's take it for a spin with
some hard coding.

Note: I know that I'm not actually using formal N Chose R, AKA Combinations without
Repetition. I am actually using Permutations without Repetition.

code:
#!/usr/bin/env python
from math import factorial as fact

def nchoser_min1(n,r):
   # permutation without repetition
   # with reduction in possibilities
    n=n-1
    r=r-1
    return fact(n)/fact(n-r)

print('n=6')
print('')

#[2, 2, 2] 15
a=nchoser_min1(6,2)
b=nchoser_min1(4,2)
c=nchoser_min1(2,2)
print('[2, 2, 2] {}*{}*{}={}'.format(a,b,c,a*b*c))
gc0=a*b*c

#[2, 4], 30
a=nchoser_min1(6,2)
b=nchoser_min1(4,4)
print('[2,4] {}*{}={}'.format(a,b,a*b))
gc1=a*b

#[3, 3] 40
a=nchoser_min1(6,3)
b=nchoser_min1(3,3)
print('[3,3] {}*{}={}'.format(a,b,a*b))
gc2=a*b

#[4, 2], 60
a=nchoser_min1(6,4)
b=nchoser_min1(2,2)
print('[4,2] {}*{}={}'.format(a,b,a*b))
gc3=a*b

#[6], 120
a=nchoser_min1(6,6)
print('[6] {}'.format(a))
gc4=a

print('')
print('g00dcount={}'.format(gc0+gc1+gc2+gc3+gc4))



And now side-by-side:

code:
brute force           n chose r
n=6                   n=6

([2, 2, 2], 15)       [2, 2, 2] 5*3*1=15
([2, 4], 30)          [2,4] 5*6=30
([3, 3], 40)          [3,3] 20*2=40
([4, 2], 60)          [4,2] 60*1=60
([6], 120)            [6] 120

g00dcount=265         g00dcount=265



So, yeah. The numbers add up every single time. I even played with some other
hard code examples and those worked as well.


Now I have the last piece and a plan-of-attack that can get me what I want.

- Start with a chain length and pass it on to coinchanging.py.
- Take each solution, perm when needed, and calculate the probability.
- Pick a chain based on those probabilities.
- Expand the chosen chain into groups.
- Shuffle the whole thing all will-nilly.
- Take each group in turn and daisy chain
- Put it all back together.

code:
n=6
0 1 2 3 4 5

based on probability,
the chain picked is: [2,4]

expand into mini-chains:
0 0 1 1 1 1

shuffle all willy-nilly:
1 0 1 1 1 0

take each mini-chain in turn
and daisy chain shuffle
put it all back together

. 0 . . . 0   1 . 1 1 1 .   1 0 1 1 1 0
. 1 . . . 5   0 . 2 3 4 .   0 1 2 3 4 5
. 5 . . . 1   2 . 4 0 3 .   2 5 4 0 3 1

[2]=15 51
[4]=02 24 43 30



Ugh. I hope that makes sense.

I have all of the pieces and I can start putting it all together.

A little bit of a side-note. My use of the word "chain" is extremely lazy.
Once I suss a few things out, I'll wax my nomenclature a bit better.

In my next post, I'm going to clarify a few things.
Stay tuned.

edit: lol linkwords, not sure i can fix...
edit: too funny, linkwords turning ( 'bracket' 6 'bracket' ) into (Phil)
Medic!
(Edited by warjournal on 12-10-2018 16:09)

(Edited by warjournal on 12-10-2018 16:12)

(Edited by warjournal on 12-10-2018 16:24)

warjournal
Maniac (V) Mad Scientist

From:
Insane since: Aug 2000

posted posted 12-10-2018 22:21

Let's take a closer look at this and why I am using -1 for both variables:

code:
def nchoser_min1(n,r):
    n=n-1
    r=r-1
    return fact(n)/fact(n-r)



When doing the factorial thing with pools of possibilities, you have
something like this:

code:
n=6

0=[0, 1, 2, 3, 4, 5]
1=[0, 1, 2, 3, 4, 5]
2=[0, 1, 2, 3, 4, 5]
3=[0, 1, 2, 3, 4, 5]
4=[0, 1, 2, 3, 4, 5]
5=[0, 1, 2, 3, 4, 5]



Then, when you pick one, you prune, and move on to the next one.
- the first has 6 options
-- pick and prune
- the second has 5 options because of pruning
-- pick and prune

And so on down to the last pool, which will have only one option.

Factorial.

So why am I using -1? Because I have the added rule that a position
can't contain itself in it's pool.

code:
n=6

0=[1, 2, 3, 4, 5]
1=[0, 2, 3, 4, 5]
2=[0, 1, 3, 4, 5]
3=[0, 1, 2, 4, 5]
4=[0, 1, 2, 3, 5]
5=[0, 1, 2, 3, 4]



And then the same kind of thing, but remember that it can't be itself
(so to speak).
- the first pick has only 5 options because it can't be itself
-- pick and prune
- the next can't be iself or previous, so only 4 options
-- pick and prune
- the next can't be itself or previous 2 picks, so only 3 options
-- pick and prune

That's why I said that n=6 is actually (6-1)!=120 for a full chain.

Let's look at one with sub-chains.

code:
n=6
c=[3,3]

0=[1, 2, 3, 4, 5]
1=[0, 2, 3, 4, 5]
2=[0, 1, 3, 4, 5]
3=[0, 1, 2, 4, 5]
4=[0, 1, 2, 3, 5]
5=[0, 1, 2, 3, 4]

pick 02
prune

0=2
1=[3, 4, 5]
2=[1, 3, 4, 5]
3=[1, 4, 5]
4=[1, 3, 5]
5=[1, 3, 4]

mini-chain=[02]



What just happened? Why all the 0's taken out? Because it is the start
of the mini-chain and has to be set aside to link-back to when you get
to the end of the mini-chain. Until then, 0 can't be chosen.

code:
pick 21
prune

0=2
1=[3, 4, 5]
2=1
3=[4, 5]
4=[3, 5]
5=[3, 4]

mini-chain=[02, 21]



And now you do the link-back thing because you are at the end of the
mini-chain (size 3).

code:
end of chain
has to be 10

0=2
1=0
2=1
3=[4, 5]
4=[3, 5]
5=[3, 4]

mini-chain=[02, 21, 10]



Pick a starting point, set it aside, chain away, and then link-back
when you get to the end.

code:
0 1 2 3 4 5
2 0 1 4 5 3



Tada. Perfectly valid solution with all position being different. Added bonus
that there are two mini-chains in there instead of a full-on daisy chain of
size 6.

The possibilities go something like this:

code:
n=6
c=[3,3]

0: starting point, 5 possibilities, can't be self, prune and set aside
2: 3 possibilities, can't be self, previous, or starting point, pick-n-prune
1: only one possibility, link-back to start



And that is why using -1 for n and r is good for this. That is why it works
to get the same numbers as my brute force.

Instead of (6 chose 3), actually doing (5 chose 2).
And so on.

But there is something in my brute force that really bothers me. These two
little lines:

code:
([2, 4], 30)
([4, 2], 60)



They are different. Since they produce different probabilities depending on order
they are obviously different.

Not to me. To me, [2,4] and [4,2] are the same.
Remember when I said that I was going to expand the chains into groups and shuffle?
That is why I see them as the same. Check it out:

code:
[2, 4] = 0 0 1 1 1 1
[4, 2] = 0 0 0 0 1 1

:shuffle:

[2, 4] = 1 1 0 0 1 1
[4, 2] = 0 0 1 1 0 0



They have over-lap and can produce the exact same groupings. And yet one has a higher
probability of happening over the other. Can this be refined even more to take that
into account without actually... doing it?

Should I keep what I already have working? Or should I go down yet another rabbit
hole?

But...
Yesterday I realized that I had already solved this problem over two years ago.
I just didn't realize it right away because it was for a different thing that I
was playing with. I already ran a few small samples and the numbers are close enough.
I'm going to open the flood gates of possibilities even wider.

Chaos is a strange mistress that I adore to no end, and imma make 'er mah bitch.

No, I'm not going to abandon this. There is power in manipulating chains. I've
already done it several times to great effect. Just not quite in this way. This
particular approach will go on the backburner for a bit.

Yes, I will show ya'll how I solved this problem over two years ago. One of the
goofiest little baubles that I have ever coded. And the basic idea behind it
also has a lot of power.

Stay tuned.

(Edited by warjournal on 12-10-2018 22:58)

warjournal
Maniac (V) Mad Scientist

From:
Insane since: Aug 2000

posted posted 12-11-2018 01:50

This all started because I wanted to do a unique shuffle by
position. It can't be that difficult, right?

My first solution was doing a perm, keeping the ones that I
like, and picking one at random. Fine for small numbers, but
I can get crazy sometimes. This route quickly became far to
unweildly and infeasible.

My next solution was to do a daisy chain. This little ditty:

code:
0 1 2 3 4
1 2 3 4 0

01 12 23 34 40



And I was happy for quite some time. Even though I was aware
of a problem, I was happy.

Now I want to fix that problem. That problem being this little
guy:

code:
0 1 2 3 4
1 0 3 4 2



I did some brute force to get some data to work from. I took
that data, thought it up, and nailed it. I even learned about the
coin change problem along the way. A bit long-winded, but I
was about to be happy again. Fast enough with larger sequences
and easy enough to code. Happy day!

As I was going about my crazy macak routine, I noticed something
in my notes. It tickled my brain. "Hey, that looks familiar.
I've seen that before. OMG I bet it would work!" And then my
crazy got even macakier.

Rather than dig it out of my archives, I re-wrote it in under
an hour. When you see how simple the code is, you'll see why
it didn't take long at all to code.

Well before I was daisy chaining, I was playing a game of
"Can Be, Can't Be, Has to Be". That right there was the start
of my sudoku journey. I wasn't thinking in terms of unique shuffle
by position; rather, I was taking more of a quantum approach.

The problem looks like this:

code:
0 1 2 3 4 5
1 2 3 4 0 .



What do you put in that last position? You can't put anything in
there. It Can't Be any number.

But if you take one step back and re-arrange it like so:

code:
0 1
1 2
2 3
3 4
4 [5, 0]
5 [0]



Now, 4 Can Be 5 or 0.
However, 5 Has to Be 0.

I've seen this problem more than a few times with people trying
to make sudoku grids or latin squares from scratch.

Me? I solved it with one simple rule that I call "Minimum in Pool".
Since 5 has the least amount of possibilities, pick from it's pool.
And 4 will fall into place.

[code]

code:
0 1        0 1     0 1
1 2        1 2     1 2
2 3        2 3     2 3
3 4        3 4     3 4
4 [5, 0]   4 [5]   4 5
5 [0]      5 0     5 0

0 1 2 3 4 5
1 2 3 4 5 0



Fire up your pools, remove self from pools, find all pools with minimum
number of possibilities, pick one of those pools, pick from that pool,
prune, and go again,

No big deal.

What got me thinking that this will work is the fact that you are picking
from pools at random. There is no need to pick a particular chain sequence
because they will happen incidently. The chains are a product of
probability - not the other way around.

Big rush to get the code running just so I could see the numbers. I was getting
really sloppy towards the end.

Here is a sample data set from my brute force:

code:
n=7

[2, 2, 3], 48  3%
[2, 3, 2], 72  4%
[3, 2, 2], 90  5%
[2, 5], 144    8%
[5, 2], 360   16%
[3, 4], 180   10%
[4, 3], 240   13%
[7], 720      38%

g00dcount=1854



My newly re-coded Min Pool shuffler currently has very poor formatting,
so I am providing a sample with manual formatting:

code:
n=7
i=1000

[2, 2, 3], 29, 3%
[2, 3, 2], 31, 3%
[3, 2, 2], 43, 4%
[2, 5], 65,    7%
[5, 2], 191,  19%
[3, 4], 97,   10%
[4, 3], 151,  15%
[7], 393,     39%

len=8



I rounded the percentages (by hand), but damn close enough for me.

Even with a mostly random algorithm, I am getting the kind of
variation that I want. Yes! Happy dance for real this time!

The chains are incidental. They just happen, and they happen
close enough to the truth of brute force.

Get it?

Here is the code that does that, but the output is extremely
sloppy:

code:
#!/usr/bin/env python
import random
import copy

def find_chains(aseq,bseq):

    keepme=0
    findme=[]
    rlist=[]

    while aseq:

        if findme==[]:
            pcount=1
            keepme=aseq.pop(0)
            findme=bseq.pop(0)
        else:
            t=aseq.index(findme)
            aseq.pop(t)
            findme=bseq.pop(t)
            pcount+=1

        if keepme==findme:
            findme=[]
            rlist.append(pcount)

    return rlist


def unique_shuffle(mpool):
    fpool=[]

    # build the pools
    for i in range(len(mpool)):
        t=copy.deepcopy(mpool)
        t.remove(mpool[i])
        fpool.append(t)

    for i in range(len(mpool)):

        # find minimum length
        minl=len(mpool)+1
        for p in fpool:
            if isinstance(p,list):
                t=len(p)
                if t<minl: minl=t

        # build list of minimums
        ml=[]
        for p in range(len(fpool)):
            if isinstance(fpool[p],list):
                if len(fpool[p])==minl:
                    ml.append(p)

        # pick a pool
        a=random.choice(ml)
        # pick random from pool
        b=random.choice(fpool[a])
        fpool[a]=b

        # prune
        for i in range(len(fpool)):
            if isinstance(fpool[i],list):
                if b in fpool[i]:
                    fpool[i].remove(b)

    return fpool

tl=list(range(7))
mlist=[]
olist=[]
for i in range(1000):
    x=unique_shuffle(tl)
    y=find_chains(copy.deepcopy(tl),copy.deepcopy(x))
    if y not in mlist:
        mlist.append(y)
        olist.append(1)
    else: # it is already in mlist[]
        j=mlist.index(y)
        k=olist[j]
        olist[j]=k+1

ts=sum(olist)
for i in range(len(mlist)):
    print(mlist[i],olist[i],float(olist[i])/float(ts) )

print('\nlen={}'.format(len(olist)))



Look at how ridiculously simple that whole thing is. No crazy
equations or convoluted acrobatics. Simple, yet powerful. And
it is fast enough to where I can toss some monster sequences
into it without grinding my teeth.

There are a few things that I want to note about the above
implementation, but for another day.

Mistress Chaos.
Manipulating patterns.
Too much fun.

Any questions?
Pending any, I'm off to play.
Woo!

warjournal
Maniac (V) Mad Scientist

From:
Insane since: Aug 2000

posted posted 12-26-2018 23:00

Been meaning to get back to this, but I keep getting side-tracked. Been hanging with
a guitar player and he's got me playing ukulele again. And we both discovered Mississippi
John Hurt. Big side-track. He's been playing guitar for 40 years and he took a big
step when he put down the pick. Lately he's been manipulating those strings like I
manipulate data. Good stuff.

Anyways...

The first thing about the above code is changing the name of the def. Naming defs in
vague kind of way bites my booty on occassion.

The other thing about the above code is that it is still on the brutish side of things.
That is, you don't have to worry about min in pool until you get towards the end. I'll
be tossing a lot of large sequences into it and it is wasting a great deal of processing
power. Old school thinking, but still applicable on the scale that I will be using it
for.

Back when I originally wrote it, I did extend it into 2d for latin square generation.
And it worked like a charm. Everytime. I did eventually extend it into more dimensions,
but things started getting far more complicated. I had to start adding more rules.
This is, in part, how I managed to take a known orthogonal and x2 it in 2 dims.

Now we go back a little bit.

My first method of creating sudoku grids involved orthogonal latin squares. It worked,
but ortho pattern all over the place. Not good.

Then I came up with a mostly different way of creating sudoku grids. The starting pattern
is way different, but some of the ideas are the same. This method involves using two
very particular chains. And it works... for the most part. For small sudoku, it is
amazing. But larger ones, it starts to fall apart. For example, sudoku 256x256 is absolutely
atrocious.

That is why I started playing with chains some more. While the above code does improve
sudoku 256x256, the improvement is negligable.

I went back, used orthogonals, and one particular chain. While digging through my
archives, I found some code that I had set-up several years ago for this. It's almost
like I knew that I would eventually be able to create orthogonals big enough to do
what I wanted. Copy, Paste, Save, and go. Done deal. And I'm hapy. (I did have to play
with Gimp a little bit just to make sure, though.)

Orthogonals and chains? The orthogonals that I used are XOR. Why? Because XOR has
the kind of chains that I want all over the place. Absolutely replete. I like to
think of XOR as a master pattern to rule them all. It really is amazing on several
levels.

It took 3 years going down several rabbit holes. And a few detours here-n-there. But
I can now create sudoku 256x256 that make me happy.

But, the question that was asked, what to do with it?
I honestly don't know. I have my uses, but for the public at large? I don't know.
But I do know that I would love to walk into the comp-sci department at a local uni
and see what they think. It's funny because I quit uni a little early because they
went far too slow for me. I work so much better on my own without Pascal being
shoved down my throat. Fuck that. I've got work to do.

I love you guys.
Hands down love you guys.
Thanks for listening.

Cheers.

Tao
Maniac (V) Inmate

From: The Pool Of Life
Insane since: Nov 2003

posted posted 01-04-2019 03:04

I'm listening... Don't always understand... but that's the way of things eh?

« BackwardsOnwards »

Show Forum Drop Down Menu