Discussion:
Portable Addr for arm64
Aram Hăvărneanu
2015-02-24 21:08:23 UTC
I'm wondering what's the best way to extend the new assembler and
the portable Prog/Addr for arm64.

Arm64 has many more addressing modes:

shifted register                                     Rn<<s
extending register                                   Rm.SXTB
extending register                                   Rm.UXTB<<s
indirect (immediate 0)                               (Rn)
immediate offset                                     o(Rn)
pre-indexed                                          o(Rn)!
post-indexed                                         (Rn)o!
register offset (unscaled)                           (Rn)(Rm)
register offset (scaled)                             (Rn)[Rm]
extended register offset (unscaled, 64-bit index)    (Rn)(Rm)
extended register offset (unscaled converted index)  (Rn)(Rm.UXTB)
extended register offset (scaled 64-bit index)       (Rn)[Rm]
extended register offset (scaled converted index)    (Rn)[Rm.SXTW]
special-purpose register                             SPR(n)
vector register                                      Vn
vector register                                      V(n)
vector register set                                  {Vn, ... Vm, Va-Vb}
vector lane                                          Vn[m]
vector lane (of set)                                 {Va-Vb}[m]

We currently make use of pre/post-indexed modes with writeback.
Some of the modes already fit in the new scheme, for example the
scaled variants. Others, like extending register, can be made to fit
by using Addr->name. Some don't fit, however. The most problematic
are the pre/post-indexed modes, vector register set, and vector lane
of set.

There are a few options that I can think of:

1) We can't use Addr->name to indicate pre/post-indexed modes because
we (also) want to set name to NAME_AUTO, for example, while using
a pre/post-indexed mode. However we can encode this new information
in the higher bits, but that seems ugly.

2) We can add more fields to Addr (I think one is enough), but it
seems wasteful to add more fields for such a specific purpose.

3) We can use arch-specific types, like D_XPRE, D_XPOST or D_EXTREG,
perhaps renamed to use TYPE prefixes, to match the new scheme, but
that seems contrary to the point of portable Addr in the first
place.
--
Aram Hăvărneanu
--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Aram Hăvărneanu
2015-02-24 22:02:57 UTC
Of course, since we have TYPE_SHIFT, which is specific to arm,
precedent exists to add TYPE_XPRE and TYPE_XPOST which are specific
to arm64. The vector lane stuff can wait since we don't use it.

Also note that (Rn)[Rm] and (Rn)(Rm*8) are actually different: the
latter is context-free, but the former is context-sensitive, since
the scale is implied by the instruction variant. But perhaps we can
just use NAME_SCALED with TYPE_REG or something.
--
Aram Hăvărneanu
Rob Pike
2015-02-24 22:11:23 UTC
My plan for the next few days is to clean up and regularize the remaining
oddball encodings for Addrs and to unify their printing. This will clean up
a lot of TODOs in the new assembler.

Once that's done, we will have a regular base on which to build the arm64
support.

Here is your list and my interpretation and/or questions:

shifted register Rn<<s
    already handled
extending register Rm.SXTB
    easy to add. need a representation (i.e. Addr.Field) to store SXTB
extending register Rm.UXTB<<s
    should fall out without more work if the above two are done
indirect (immediate 0) (Rn)
    already handled
immediate offset o(Rn)
    already handled
pre-indexed o(Rn)!
    can't we stick with the .P notation arm already uses?
post-indexed (Rn)o!
    ditto
register offset (unscaled) (Rn)(Rm)
    should fall out without effort
register offset (scaled) (Rn)[Rm]
    can we use the * notation like on 386? generally i'm trying to get the
    operand to encode directly into an Addr without context
extended register offset (unscaled, 64-bit index) (Rn)(Rm)
    looks the same as "register offset (unscaled)". what is this?
extended register offset (unscaled converted index) (Rn)(Rm.UXTB)
    should fall out if above is done (whatever it is)
extended register offset (scaled 64-bit index) (Rn)[Rm]
    again, prefer to use an explicit scale
extended register offset (scaled converted index) (Rn)[Rm.SXTW]
    ditto
special-purpose register SPR(n)
    already handled
vector register Vn
    will work if the registers are defined in obj/arm64
vector register V(n)
    ditto
vector register set {Vn, ... Vm, Va-Vb}
    does it need to be {} rather than []? easy regardless but it needs a
    field in Addr
vector lane Vn[m]
    what is this?
vector lane (of set) {Va-Vb}[m]
    what is this?

-rob
Aram Hăvărneanu
2015-02-24 23:06:51 UTC
Thanks for the quick response.
Post by Rob Pike
can't we stick with the .P notation arm already uses?
Then we would be different from both Plan 9 and the arm64 manual (and
binutils), which use the ! variant. Perhaps that doesn't matter.
Liblink would need significant changes though. As a side note, I think
the ! notation is really nice.
Post by Rob Pike
can we use the * notation like on 386?
Yes, but then we have to decide what the following means:

MOVH $42, (R1)(R2*4)

Should it work? If it does work by synthesising two instructions, then

MOVH $42, (R1)(R2*2)

will not be able to use the more efficient 1-instruction encoding
because aclass can't return a different value as it doesn't have
access to the instruction. So the choice is between more expressive
syntax, implemented with primitive instructions or more restrictive
syntax (only certain scales will work) but with more efficient
encoding.

Note that 32-bit arm has (R1)[R2] too, although I don't know if it's
currently implemented in our tools. What if we just say that on arm
and arm64, scale=1 always means (R1)[R2]; we print it with square
brackets, and when we parse it we always expect square brackets. In
other words, what if (R1)(R2*2) doesn't exist on arm and arm64 at all?

In that case parsing and printing is context-free. The meaning of an
Addr is not context-free, of course, but only liblink cares about the
meaning, and in liblink the Addr will always be inside a Prog where it
matters (in asmout mainly).

In any case (Rn)[Rm] is not very important as we don't use it at all.
Post by Rob Pike
extended register offset (unscaled, 64-bit index) (Rn)(Rm)
looks the same as "register offset (unscaled)". what is this?
Rm is a 32-bit register, but I think it's missing the conversion
operator? Have to ask Charles since he wrote asm.ms. Of course asm.ms
is wrong since it just looks like the other one.
Post by Rob Pike
vector register set {Vn, ... Vm, Va-Vb}
does it need to be {} rather than []? easy regardless but it needs a field in Addr
It's {} just for consistency with the manual and binutils.
Post by Rob Pike
vector lane Vn[m]
what is this?
vector lane (of set) {Va-Vb}[m]
what is this?
You can divide the 128-bit SIMD register into lanes, which describe
the shape of the vector you operate on. For example you can have 32bit
x 2 lanes or 8bit x 8 lanes or any other intermediate scenario.

The second form is the same thing except the instruction operates on
more SIMD registers (this is usually the case).
--
Aram Hăvărneanu
Russ Cox
2015-02-25 15:19:40 UTC
Consistency with the manuals is explicitly not a goal and never has been,
and at this point I am not terribly concerned with consistency with Plan 9
either. Go has already diverged some, and more is planned. The goal is to
make similar concepts look similar across systems, so that if you see the
same concept on a different system you will recognize it as the same
concept.

Given that, I think using .W and .P suffixes for the increments (as on arm)
makes sense.
I also think that (Rn)(Rm) should be (Rn)(Rm*1) (as on x86). If 1 is the
only possible multiplier, so be it. The x86 rejects AX*3 too.

In what forms can the .SXTB and .UXTB appear? If they can appear in any
form then maybe they should be separate Reg values. If only in a few forms,
then maybe a separate bit makes sense. It's not clear.

What is the difference between (Rn)(Rm) and (Rn)[Rm]?

What are the possible lane specs for [m]?

Thanks.
Russ
Aram Hăvărneanu
2015-02-25 16:57:18 UTC
Thanks for your input.
Post by Russ Cox
Given that, I think using .W and .P suffixes for the increments (as on arm)
makes sense.
I strongly agree that consistency is paramount, so we will use .W
and .P (I actually have to read up on this one; I only know of .IA
and .IB from Plan 9. .W and .P look very similar to those, but I have
to learn the exact subtleties).

I don't like it at all, I think it's a property of the address, not
of the prog, but since arm paved the way here I will just change
arm64 to be the same.
Post by Russ Cox
I also think that (Rn)(Rm) should be (Rn)(Rm*1) (as on x86). If 1 is the
only possible multiplier, so be it. The x86 rejects AX*3 too.
[snip]
What is the difference between (Rn)(Rm) and (Rn)[Rm]?
MOV (Rn)(Rm), Rt

Moves dword from Rn+Rm to Rt.

MOVH (Rn)(Rm), Rt

Moves half-word from Rn+Rm to Rt.

MOV (Rn)[Rm], Rt

Moves dword from Rn+Rm*8 to Rt.

MOVH (Rn)[Rm], Rt

Moves half-word from Rn+Rm*2 to Rt.

That being said, considering your note about x86 rejecting AX*3, I
believe we can retire (Rn)[Rm] and always write explicit scale.
Liblink can reject the invalid ones.
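
To make the "reject the invalid ones" idea concrete, here is a minimal sketch of such a check. It assumes, going only by the four examples above, that an index is either unscaled (*1) or scaled by the access size of the instruction. The names accessSize, validScale and effectiveAddr are hypothetical and are not part of liblink.

```go
package main

import "fmt"

// accessSize gives the width in bytes of the memory access for the two
// mnemonics used in the examples above. Illustrative, not liblink's table.
var accessSize = map[string]int64{
	"MOV":  8, // dword
	"MOVH": 2, // half-word
}

// validScale reports whether (Rn)(Rm*scale) is acceptable for op, assuming
// the index may be either unscaled or scaled by the access size.
func validScale(op string, scale int64) bool {
	size, ok := accessSize[op]
	return ok && (scale == 1 || scale == size)
}

// effectiveAddr returns the address Rn + Rm*scale.
func effectiveAddr(rn, rm, scale int64) int64 {
	return rn + rm*scale
}

func main() {
	for _, scale := range []int64{1, 2, 4, 8} {
		fmt.Printf("MOVH (R1)(R2*%d): valid=%v\n", scale, validScale("MOVH", scale))
	}
	fmt.Printf("MOV (R1)(R2*8) with R1=0x1000, R2=3: %#x\n", effectiveAddr(0x1000, 3, 8))
}
```

With an explicit scale the Addr can be parsed and printed without looking at the instruction; only this validity check needs the opcode.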
Post by Russ Cox
In what forms can the .SXTB and .UXTB appear? If they can appear in any form
then maybe they should be separate Reg values. If only in a few forms, then
maybe a separate bit makes sense. It's not clear.
They appear in most forms. Not quite all, but most have an "extending
register" variant. There are 8 possible values: UXTB, UXTH, LSL|UXTW,
UXTX, SXTB, SXTH, SXTW, SXTX. They usually appear with a shift, so
like R2.UXTB<<2.
Post by Russ Cox
What are the possible lane specs for [m]?
1, 2, 4, 8, 16

A 128-bit SIMD register can be treated as either a 128-bit or a
64-bit box, and this box can be subdivided into the appropriate number
of 8, 16, 32, or 64-bit cells. [m] just selects the lane though; to
specify the shape of the register you have to write Vn.mS, where m is
the number of lanes (2⁰-2⁴) and S is the lane size as a standard
letter (B, H, S, D), so like V1.2D, V3.16B, etc.
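
As a rough sketch of the Vn.mS shape rule just described, the following checks that a lane count and size letter together fill either a 64-bit or a 128-bit box. parseArrangement and laneBits are hypothetical names made up for illustration; nothing like this exists in the current assembler.

```go
package main

import (
	"fmt"
	"strconv"
)

// laneBits maps the standard size letters to the lane width in bits.
var laneBits = map[byte]int{'B': 8, 'H': 16, 'S': 32, 'D': 64}

// parseArrangement splits a shape such as "2D" or "16B" into a lane count
// and a lane width in bits, and checks that the lanes fill exactly a
// 64-bit or a 128-bit register.
func parseArrangement(s string) (int, int, error) {
	if len(s) < 2 {
		return 0, 0, fmt.Errorf("arrangement %q is too short", s)
	}
	width, ok := laneBits[s[len(s)-1]]
	if !ok {
		return 0, 0, fmt.Errorf("unknown lane size in %q", s)
	}
	lanes, err := strconv.Atoi(s[:len(s)-1])
	if err != nil {
		return 0, 0, fmt.Errorf("bad lane count in %q: %v", s, err)
	}
	if total := lanes * width; total != 64 && total != 128 {
		return 0, 0, fmt.Errorf("%q covers %d bits, want 64 or 128", s, total)
	}
	return lanes, width, nil
}

func main() {
	for _, s := range []string{"2D", "16B", "4H", "3S"} {
		lanes, width, err := parseArrangement(s)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s: %d lanes of %d bits\n", s, lanes, width)
	}
}
```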

But the current assembler does not implement lanes; they are only
mentioned in asm.ms optimistically. In fact, the only two vector
instructions supported by liblink are AES and SHA1. Perhaps we don't
need to worry about this at all at the moment.

In reality the only thing that's not in the current cmd/asm and
that we actually use is pre/post-increment modes. (Liblink makes
heavy use of the extended register stuff, but we don't use it in
assembly written by humans and it's not explicitly generated by
the compilers either. Rather liblink will generate load/stores using
extended register form for generic moves if it thinks it's a good
idea).

OK, so pre/post-increment is settled, scaled-index is settled, all
that remains to be decided is the Rn.UXTB stuff.
--
Aram Hăvărneanu
Russ Cox
2015-02-25 17:45:07 UTC
Post by Aram Hăvărneanu
Thanks for your input.
Post by Russ Cox
Given that, I think using .W and .P suffixes for the increments (as on arm)
makes sense.
I strongly agree that consistency is paramount, so we will use .W
and .P (I actually have to read on this one, I only know of .IA and
.IB from Plan 9, .W and .P look very similar to that but I have to
learn the exact subtleties).
I think they are aliases for each other. We should probably support both
like on arm.
Post by Aram Hăvărneanu
I don't like it at all, I think it's a property of the address, not
of the prog, but since arm paved the way here I will just change
arm64 to be the same.
Post by Russ Cox
I also think that (Rn)(Rm) should be (Rn)(Rm*1) (as on x86). If 1 is the
only possible multiplier, so be it. The x86 rejects AX*3 too.
[snip]
What is the difference between (Rn)(Rm) and (Rn)[Rm]?
MOV (Rn)(Rm), Rt
Moves dword from Rn+Rm to Rt.
MOVH (Rn)(Rm), Rt
Moves half-word from Rn+Rm to Rt.
MOV (Rn)[Rm], Rt
Moves dword from Rn+Rm*8 to Rt.
MOVH (Rn)[Rm], Rt
Moves half-word from Rn+Rm*2 to Rt.
Okay, then the scale factor should *definitely* be part of the syntax.

Also, please change MOV to MOVD or MOVQ (like on ppc64 or x86). I'd really
rather not have any unsuffixed MOV instruction.
(I also intend to add MOV1, MOV2, MOV4, MOV8, MOVP (pointer), and MOVR
(register) at some point, on all systems.)
Post by Aram Hăvărneanu
That being said, considering your note about x86 rejecting AX*3, I
believe we can retire (Rn)[Rm] and always write explicit scale.
Liblink can reject the invalid ones.
Post by Russ Cox
In what forms can the .SXTB and .UXTB appear? If they can appear in any form
then maybe they should be separate Reg values. If only in a few forms, then
maybe a separate bit makes sense. It's not clear.
They appear in most forms. Not quite all, but most have an "extending
register" variant. There are 8 possible values: UXTB, UXTH, LSL|UXTW,
UXTX, SXTB, SXTH, SXTW, SXTX. They usually appear with a shift, so
like R2.UXTB<<2.
Post by Russ Cox
What are the possible lane specs for [m]?
1, 2, 4, 8, 16
A 128-bit SIMD register can be treated as either a 128-bit or a
64-bit box, and this box can be subdivided into the appropriate number
of 8, 16, 32, or 64-bit cells. [m] just selects the lane though; to
specify the shape of the register you have to write Vn.mS, where m is
the number of lanes (2⁰-2⁴) and S is the lane size as a standard
letter (B, H, S, D), so like V1.2D, V3.16B, etc.
But the current assembler does not implement lanes; they are only
mentioned in asm.ms optimistically. In fact, the only two vector
instructions supported by liblink are AES and SHA1. Perhaps we don't
need to worry about this at all at the moment.
Okay, great, let's drop lanes, and let's use [ ] for register sets, like on
arm (not { }). It might be possible to make the lane an instruction suffix,
if it comes to that. (That's kind of like what amd64 did.)


Post by Aram Hăvărneanu
In reality the only thing that's not in the current cmd/asm and
that we actually use is pre/post-increment modes. (Liblink makes
heavy use of the extended register stuff, but we don't use it in
assembly written by humans and it's not explicitly generated by
the compilers either. Rather liblink will generate load/stores using
extended register form for generic moves if it thinks it's a good
idea).
OK, so pre/post-increment is settled, scaled-index is settled, all
that remains to be decided is the Rn.UXTB stuff.
The choice is between different reg values and an extra bit. I don't see a
perfect solution here. Both are bad in different ways and good in different
ways. Let's start by tentatively saying that they'll be different Reg
values, probably laid out something like

const (
	REG_R0 = 32 + iota
	REG_R1
	...
	REG_R31
)
const (
	REG_UXTB = iota*(REG_R31+1)
	REG_UXTH
	REG_LSLUXTW
	REG_UXTX
	REG_SXTB
	REG_SXTH
	REG_SXTW
	REG_SXTX
	REG_SUFFIX = 15*(REG_R31+1)
)

If that turns out to be awful we can reconsider.

Thanks.
Russ
Aram Hăvărneanu
2015-02-25 18:11:34 UTC
Post by Russ Cox
I think they are aliases for each other. We should probably support
both like on arm.
Code looks like they should be aliases, but something about them
is set differently:

{".W", LS, arm.C_WBIT},
{".P", LS, arm.C_PBIT},

{".IB", LS, arm.C_PBIT | arm.C_UBIT},
{".IA", LS, arm.C_UBIT},
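
A small sketch of why these four entries cannot be aliases of each other. It assumes only that C_WBIT, C_PBIT and C_UBIT are three distinct bits; the values below are placeholders, not the ones in the arm package, and the helper aliases is hypothetical.

```go
package main

import "fmt"

// Placeholder bit values for illustration; the real constants are
// arm.C_WBIT, arm.C_PBIT and arm.C_UBIT and may have different values.
const (
	wBit = 1 << iota
	pBit
	uBit
)

// suffixBits mirrors the four table entries quoted above.
var suffixBits = map[string]int{
	".W":  wBit,
	".P":  pBit,
	".IB": pBit | uBit,
	".IA": uBit,
}

// aliases reports whether two suffixes set exactly the same bits.
func aliases(a, b string) bool {
	x, okA := suffixBits[a]
	y, okB := suffixBits[b]
	return okA && okB && x == y
}

func main() {
	pairs := [][2]string{{".W", ".IA"}, {".W", ".IB"}, {".P", ".IA"}, {".P", ".IB"}}
	for _, p := range pairs {
		fmt.Printf("%s vs %s: aliases=%v\n", p[0], p[1], aliases(p[0], p[1]))
	}
}
```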
Post by Russ Cox
please change MOV to MOVD or MOVQ
Ok. Q means 128-bit register on arm64, so I think D is better.
Post by Russ Cox
let's use [ ] for register sets, like on arm (not { })
Ok.
Post by Russ Cox
const (
	REG_R0 = 32 + iota
	REG_R1
	...
	REG_R31
)
const (
	REG_UXTB = iota*(REG_R31+1)
	REG_UXTH
	REG_LSLUXTW
	REG_UXTX
	REG_SXTB
	REG_SXTH
	REG_SXTW
	REG_SXTX
	REG_SUFFIX = 15*(REG_R31+1)
)
What is the point of REG_SUFFIX?

And what about the shift? (32-bit arm also has the shift problem).
--
Aram Hăvărneanu
Russ Cox
2015-02-27 02:52:24 UTC
Post by Aram Hăvărneanu
Post by Russ Cox
I think they are aliases for each other. We should probably support
both like on arm.
Code looks like they should be aliases, but something about them
is set differently:
{".W", LS, arm.C_WBIT},
{".P", LS, arm.C_PBIT},
{".IB", LS, arm.C_PBIT | arm.C_UBIT},
{".IA", LS, arm.C_UBIT},
Post by Russ Cox
please change MOV to MOVD or MOVQ
Ok. Q means 128-bit register on arm64, so I think D is better.
Okay but please use something else for 128-bit. Don't overload Q. Maybe it
should just be MOV16.
Post by Aram Hăvărneanu
Post by Russ Cox
let's use [ ] for register sets, like on arm (not { })
Ok.
Post by Russ Cox
const (
	REG_R0 = 32 + iota
	REG_R1
	...
	REG_R31
)
const (
	REG_UXTB = iota*(REG_R31+1)
	REG_UXTH
	REG_LSLUXTW
	REG_UXTX
	REG_SXTB
	REG_SXTH
	REG_SXTW
	REG_SXTX
	REG_SUFFIX = 15*(REG_R31+1)
)
Post by Aram Hăvărneanu
What is the point of REG_SUFFIX?
So that you can extract the suffix info with r&REG_SUFFIX.
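
A runnable sketch of that extraction, using the tentative layout quoted above. The elided register constants are replaced by arithmetic on REG_R0, and regSpan, withSuffix and split are hypothetical helpers added for illustration. The mask works because REG_R31+1 is 64, a power of two, so the suffix occupies the bits directly above the register number.

```go
package main

import "fmt"

const (
	REG_R0  = 32
	REG_R31 = REG_R0 + 31
	regSpan = REG_R31 + 1 // 64: one past the last plain register
)

const (
	REG_UXTB = iota * regSpan
	REG_UXTH
	REG_LSLUXTW
	REG_UXTX
	REG_SXTB
	REG_SXTH
	REG_SXTW
	REG_SXTX
	REG_SUFFIX = 15 * regSpan // mask covering the suffix bits
)

// withSuffix combines a plain register value with an extension suffix.
func withSuffix(reg, suffix int) int {
	return reg | suffix
}

// split separates a combined value into the plain register and the suffix.
func split(r int) (base, suffix int) {
	return r &^ REG_SUFFIX, r & REG_SUFFIX
}

func main() {
	r := withSuffix(REG_R0+5, REG_SXTW) // stands for R5.SXTW
	base, suffix := split(r)
	fmt.Printf("register R%d, suffix is SXTW: %v\n", base-REG_R0, suffix == REG_SXTW)
}
```

One thing that falls out of the layout as written: REG_UXTB is 0 because iota starts at 0, so a plain register and its .UXTB form would share a value. The thread does not say whether that is intended.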

Post by Aram Hăvărneanu
And what about the shift? (32-bit arm also has the shift problem).
What shift?

Russ
Aram Hăvărneanu
2015-02-27 09:40:13 UTC
Post by Russ Cox
What shift?
MOVD R0->3, R1
MOVD R0<<2, R1

I was concerned we can't encode it in TYPE_SHIFT, because of 32 vs. 16
registers, but looking more closely at the encoding, it looks like we
have 5 bits for the register in offset, so it would fit arm64 too.
--
Aram Hăvărneanu