Closed Bug 837957 Opened 11 years ago Closed 11 years ago

Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: mozillabugs, Assigned: mozillabugs)

References

(Blocks 1 open bug)

Details

Attachments

(12 files, 18 obsolete files)

9.92 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
7.43 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
9.14 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
13.94 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
7.64 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
16.91 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
11.60 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
9.13 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
12.43 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
9.69 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
28.58 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
16.73 KB, patch
mozillabugs
: review+
Details | Diff | Splinter Review
Implement based on ICU the parts of the ECMAScript Internationalization API that depend directly on an underlying internationalization library, in particular:
- determination of available locales and features,
- Intl.Collator.prototype.compare,
- Intl.NumberFormat.prototype.format,
- Intl.DateTimeFormat.prototype.format.
Blocks: 837961
Blocks: es-intl
Depends on: 811911
Depends on: 724533
Whiteboard: [leave open]
Comment on attachment 718718 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (ICU stubs)

Review of attachment 718718 [details] [diff] [review]:
-----------------------------------------------------------------

Bunches of style nits, but otherwise fine enough.

::: js/src/builtin/Intl.cpp
@@ +20,5 @@
>  #include "vm/Stack.h"
>  
>  #include "jsobjinlines.h"
>  
> +#include "unicode/utypes.h"

Assuming this is a reference to intl/icu/source/common/unicode/utypes.h, is this patch predicated on starting to use ICU?  Right now it looks like this patch wouldn't build.

@@ +22,5 @@
>  #include "jsobjinlines.h"
>  
> +#include "unicode/utypes.h"
> +
> +using namespace icu;

Hmm.  We *are* the implementers of namespace js, so it's okay to open that.  Here...this looks dodgy.  I think probably we want to have the icu:: prefix in all the type names for this.  Either that, or we should have |using icu::Foo;| for the particular symbols we want.  We've run into issues opening up whole non-SpiderMonkey namespaces in the past, so once bitten, twice shy.

@@ +30,5 @@
> +/******************** ICU stubs ********************/
> +
> +#if !ENABLE_INTL_API
> +
> +/* When the Internationalization API isn't enabled, we also shouldn't link

/*
 * When the...

is SpiderMonkey comment style for multiline comments.

@@ +36,5 @@
> + * bit rot. The following stub implementations for ICU functions make this
> + * possible. The functions using them should never be called, so they assert
> + * and return error codes. Signatures adapted from ICU header files locid.h,
> + * numsys.h, ucal.h, ucol.h, udat.h, udatpg.h, uenum.h, unum.h; see the ICU
> + * directory for license.

Hmm.  Historically we haven't done it this way, and predictably bitrot has ensued.  In the long run I expect we won't want to allow disabling ICU, and this will die.  In the short run, eh.  We can do it this way if it (probably) helps you out.  :-)

@@ +46,5 @@
> +    JS_ASSERT(false);
> +    return 0;
> +}
> +
> +typedef struct UEnumeration UEnumeration;

struct UEnumeration;

@@ +49,5 @@
> +
> +typedef struct UEnumeration UEnumeration;
> +
> +static int32_t
> +uenum_count(UEnumeration* en, UErrorCode* status)

SpiderMonkey style has * next to the variable name, sadly.  (Applies everywhere else here, too.)

@@ +51,5 @@
> +
> +static int32_t
> +uenum_count(UEnumeration* en, UErrorCode* status)
> +{
> +    JS_ASSERT(false);

Let's make these all MOZ_NOT_REACHED("<function name>: Intl API disabled");

@@ +59,5 @@
> +
> +static const char*
> +uenum_next(UEnumeration* en,
> +           int32_t* resultLength,
> +           UErrorCode* status)

This should all be on one line if it fits in 99ch.  Same throughout.

@@ +93,5 @@
> +  UCOL_ON = 17,
> +  UCOL_SHIFTED = 20,
> +  UCOL_LOWER_FIRST = 24,
> +  UCOL_UPPER_FIRST = 25,
> +} UColAttributeValue;

enum UColAttributeValue {
  ...
};

and the same for all the enums here.

@@ +135,5 @@
> +ucol_strcoll(    const    UCollator    *coll,
> +        const    UChar        *source,
> +        int32_t            sourceLength,
> +        const    UChar        *target,
> +        int32_t            targetLength)

Fix the bizarre whitespacing, please.  There are a few others, too, below.

@@ +159,5 @@
> +    return NULL;
> +}
> +
> +typedef struct UParseError UParseError;
> +typedef struct UFieldPosition UFieldPosition;

Just forward-declare these structs too.

@@ +251,5 @@
> +}
> +
> +class Locale {
> +public:
> +    Locale( const   char * language,

public: should be indented two spaces.  All the parameters should be |const char *foo| or |const char *foo = 0| as well.  Looks like a search for "* " in these changes would point out a bunch of places.

@@ +267,5 @@
> +}
> +
> +class NumberingSystem {
> +public:
> +    static NumberingSystem* createInstance(const Locale & inLocale, UErrorCode& status);

& by parameter name.
Attachment #718718 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 718720 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 1)

Review of attachment 718720 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +15,5 @@
>  #include "jsatom.h"
>  #include "jscntxt.h"
>  #include "jsinterp.h"
>  #include "jsobj.h"
> +#include "jstypes.h"

What types are you accessing through this #include?  I'm surprised you wouldn't already have them here.

@@ +29,5 @@
> +#include "unicode/numsys.h"
> +#include "unicode/ucal.h"
> +#include "unicode/ucol.h"
> +#include "unicode/udat.h"
> +#include "unicode/udatpg.h"

I assume all these horrible header names date back to 8.3 days?  :-)

@@ +480,5 @@
> +(* GetAvailable)(int32_t localeIndex);
> +
> +static bool
> +intl_availableLocales(JSContext *cx, CountAvailable countAvailable,
> +    GetAvailable getAvailable, MutableHandleValue result)

Needs whitespace formatting fixing.

I find myself wondering if this wouldn't be much better as a HashSet<const char *>, or something.  I guess let's run with this for now, and we can figure out something better later if it comes to it.

...or is it this way because it feeds directly into self-hosted code?  Then I guess my assumption that [[availableLocales]] wasn't actually a thing in implementations was wrong!  I'd gotten the impression it was sort of a latent property of the system that arbitrates internationalization queries.

@@ +483,5 @@
> +intl_availableLocales(JSContext *cx, CountAvailable countAvailable,
> +    GetAvailable getAvailable, MutableHandleValue result)
> +{
> +#if ENABLE_INTL_API
> +    int32_t count = countAvailable();

This always returns a value >= 0, right?  Please use uint32_t, then, and for the other available-locales-indexes.  If ICU mandates int32_t (...not int, or some other integer type, seeing as <stdint.h> didn't exist until 1999, and wasn't in MSVC til 2010?), use uint32_t for value types when you can, and leave int32_t to signatures that have to match up with ICU typedefs or method signatures or whatever.

@@ +484,5 @@
> +    GetAvailable getAvailable, MutableHandleValue result)
> +{
> +#if ENABLE_INTL_API
> +    int32_t count = countAvailable();
> +#endif

Not that it really matters, but why not put this below and have one #if ENABLE_INTL_API for the entire function?

@@ +491,5 @@
> +        return false;
> +
> +#if ENABLE_INTL_API
> +    RootedValue t(cx, BooleanValue(true));
> +    for (int32_t i = 0; i < count; i++) {

uint32_t as well.

@@ +492,5 @@
> +
> +#if ENABLE_INTL_API
> +    RootedValue t(cx, BooleanValue(true));
> +    for (int32_t i = 0; i < count; i++) {
> +        const char* locale = getAvailable(i);

const char *locale

@@ +493,5 @@
> +#if ENABLE_INTL_API
> +    RootedValue t(cx, BooleanValue(true));
> +    for (int32_t i = 0; i < count; i++) {
> +        const char* locale = getAvailable(i);
> +        char *lang = JS_strdup(cx, locale);

Use ScopedJSFreePtr<char> here -- lets you remove/not worry about the js_free(lang).

@@ +499,5 @@
> +            return false;
> +        char *p;
> +        while ((p = strchr(lang, '_')))
> +            *p = '-';
> +        RootedAtom a(cx, Atomize(cx, lang, strlen(lang), InternAtom));

Interning's not such a great concept, the way it's currently formulated (it's not amenable to compacting GC).  Could you skip interning, and if we find perf bottlenecks from this later we can add it then?

@@ +504,5 @@
> +        js_free(lang);
> +        if (!a)
> +            return false;
> +        if (!JSObject::defineProperty(cx, locales, a->asPropertyName(), t,
> +                                      JS_PropertyStub, JS_StrictPropertyStub, JSPROP_ENUMERATE)) {

{ aligns with } if the condition of an if spans multiple lines.

@@ +517,5 @@
> +/**
> + * Returns the object holding the internal properties for obj.
> + */
> +static bool
> +GetInternals(JSContext *cx, HandleObject obj, MutableHandleObject internals)

Hoo boy, this is some fun times.  We really need to clean up the self-hosting/native code boundary and transitions and how you do them at some point.
Attachment #718720 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 718721 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 2)

Review of attachment 718721 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +562,5 @@
>  
>  
>  /******************** Collator ********************/
>  
> +static void collator_finalize(FreeOp* fop, RawObject obj);

FreeOp *fop

I see a few other cases of * misaligned, probably worth going over "* " hits in this patch.

@@ +568,3 @@
>  static Class CollatorClass = {
>      js_Object_str,
> +    JSCLASS_HAS_PRIVATE,

Privates are a moderately deprecated JSAPI feature.  Use JSCLASS_HAS_RESERVED_SLOTS(1) instead, and use getReservedSlot/setReservedSlot with PrivateValue(void*) and toPrivate() to access it.

@@ +659,5 @@
> +collator_finalize(FreeOp* fop, RawObject obj)
> +{
> +    UCollator *coll = (UCollator *) obj->getPrivate();
> +    if (coll)
> +        ucol_close(coll);

...presumably ucol_close also frees UCollator?

@@ +748,5 @@
> +    JSAutoByteString locale(cx, args[0].toString());
> +    if (!locale)
> +        return false;
> +    UErrorCode status = U_ZERO_ERROR;
> +    UEnumeration *values = ucol_getKeywordValuesForLocale("co", locale.ptr(), false, &status);

I could be reading headers wrong, but shouldn't that first argument be "collation"?

@@ +750,5 @@
> +        return false;
> +    UErrorCode status = U_ZERO_ERROR;
> +    UEnumeration *values = ucol_getKeywordValuesForLocale("co", locale.ptr(), false, &status);
> +    if (U_FAILURE(status))
> +        return false;

Returning false in the JSAPI is equivalent to throwing an uncatchable exception and halting JS execution in its tracks.  If you're calling a method in SpiderMonkey, usually it'll throw an exception for you if it fails.  If you're calling something not in SpiderMonkey, like here, no exception's thrown.  If you don't throw, the browser continues on, but whatever script was going just...no longer runs.  So we need to be throwing a JS exception of some sort here.  Something like

MSG_DEF(JSMSG_INTERNAL_INTL_ERROR,         ###, 0, JSEXN_ERR, "internal error while computing Intl data")

and the appropriate JS_ReportErrorNumber call should probably do the trick for throwing in most cases.

@@ +755,5 @@
> +
> +    int32_t count = uenum_count(values, &status);
> +    if (U_FAILURE(status)) {
> +        uenum_close(values);
> +        return false;

We have mozilla::Scoped for use in handling these sorts of RAII situations.  Add #include "mozilla/Scoped.h" at the start of the #includes (in its own #include section), add this somewhere appropriate at top level:

MOZ_TYPE_SPECIFIC_SCOPED_POINTER_TEMPLATE(ScopedCloseValues, UEnumeration*, uenum_close)

and use ScopedCloseValues values = ucol_getKeyword... above.  And then get rid of all the uenum_closes everywhere.

@@ +767,5 @@
> +
> +    int32_t index = 0;
> +    RootedId indexId(cx);
> +    for (int32_t i = 0; i < count; i++) {
> +        const char* collation = uenum_next(values, NULL, &status);

Don't pass NULL here.  Use an |int32_t length| local (hmm, I guess ICU must assume these types somehow -- hope there's a way to tell it to use mozilla/StandardInteger.h for them!), and pass along that length to equality methods, please, which should use strncmp.

@@ +770,5 @@
> +    for (int32_t i = 0; i < count; i++) {
> +        const char* collation = uenum_next(values, NULL, &status);
> +        if (U_FAILURE(status)) {
> +            uenum_close(values);
> +            return false;

Needs to throw an error.

@@ +773,5 @@
> +            uenum_close(values);
> +            return false;
> +        }
> +
> +        // per ECMA-402, 10.2.3, we don't include standard and search

Please quote the language used in the spec, and capitalize/add a period as appropriate for a complete sentence.  I take it that's this verbiage?

"The values "standard" and "search" must not be used as elements in any [[sortLocaleData]][locale].co
and [[searchLocaleData]][locale].co array."

@@ +778,5 @@
> +        if (equal(collation, "standard") || equal(collation, "search"))
> +            continue;
> +
> +        // ICU returns old-style keyword values; map them to BCP 47 equivalents
> +        // (see http://bugs.icu-project.org/trac/ticket/9620)

Needs a period at the end.

@@ +797,5 @@
> +        if (!IndexToId(cx, index++, &indexId))
> +            return false;
> +        RootedValue element(cx, StringValue(jscollation));
> +        if (!DefineNativeProperty(cx, collations, indexId, element, JS_PropertyStub,
> +                                  JS_StrictPropertyStub, JSPROP_ENUMERATE, 0, 0)) {

Use JSObject::defineElement here have |i| be uint32_t.  That also saves an IndexToId call we want to avoid; we're slowly working at removing jsid entirely as a concept (replacing it with a property key concept, from ES6).
Attachment #718721 - Flags: review?(jwalden+bmo) → review-
Updated per comment 6. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #6)

Boy, I didn't expect that even stubs that should disappear in a few weeks have to be reformatted for SpiderMonkey... I agree though that some of the ICU formatting is a bit weird.

> > +#include "unicode/utypes.h"
> 
> Assuming this is a reference to intl/icu/source/common/unicode/utypes.h, is
> this patch predicated on starting to use ICU?  Right now it looks like this
> patch wouldn't build.

Correct - we need a review of
https://bug724533.bugzilla.mozilla.org/attachment.cgi?id=722065

> > +uenum_count(UEnumeration* en, UErrorCode* status)
> 
> SpiderMonkey style has * next to the variable name, sadly.  (Applies
> everywhere else here, too.)

Even sadder that this correctly reflects how the language interprets the *.
Attachment #718718 - Attachment is obsolete: true
Attachment #722071 - Flags: review+
Update per comment 7. Also includes a new class ScopedICUObject in partial response to comment 8; therefore not carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #7)

> @@ +29,5 @@
> > +#include "unicode/numsys.h"
> > +#include "unicode/ucal.h"
> > +#include "unicode/ucol.h"
> > +#include "unicode/udat.h"
> > +#include "unicode/udatpg.h"
> 
> I assume all these horrible header names date back to 8.3 days?  :-)

Or possibly OS/400.
http://site.icu-project.org/download/50#TOC-ICU4C-Supported-Platforms

> > +intl_availableLocales(JSContext *cx, CountAvailable countAvailable,
> > +    GetAvailable getAvailable, MutableHandleValue result)
> 
> Needs whitespace formatting fixing.
> 
> I find myself wondering if this wouldn't be much better as a HashSet<const
> char *>, or something.  I guess let's run with this for now, and we can
> figure out something better later if it comes to it.
> 
> ...or is it this way because it feeds directly into self-hosted code?  Then
> I guess my assumption that [[availableLocales]] wasn't actually a thing in
> implementations was wrong!  I'd gotten the impression it was sort of a
> latent property of the system that arbitrates internationalization queries.

Both are possible. The Chrome implementation lets ICU handle as much as possible of the locale negotiation, and has some conformance bugs because of that. I handle most of it in JavaScript (with actual [[availableLocales]] and other internal properties), with better conformance, but possibly some duplication of effort.

> > +/**
> > + * Returns the object holding the internal properties for obj.
> > + */
> > +static bool
> > +GetInternals(JSContext *cx, HandleObject obj, MutableHandleObject internals)
> 
> Hoo boy, this is some fun times.  We really need to clean up the
> self-hosting/native code boundary and transitions and how you do them at
> some point.

This one can disappear when SpiderMonkey gets private names.
Attachment #718720 - Attachment is obsolete: true
Attachment #722073 - Flags: review?(jwalden+bmo)
Updated per comment 8.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #8)

> > +    JSCLASS_HAS_PRIVATE,
> 
> Privates are a moderately deprecated JSAPI feature.

Is there any documentation on what's deprecated and what's in fashion? It seems I had to discover a number of things over the last six months where current usage in SpiderMonkey is different from what the documentation would indicate - Value vs jsval, Rooted* vs reusing args, reserved slots vs private....

And what stopped me originally from using reserved slots was "reserved slots are visible to the garbage collector" with no mention of non-GC things:
https://developer.mozilla.org/en-US/docs/SpiderMonkey/JSAPI_Reference/JS_GetReservedSlot

> > +    if (coll)
> > +        ucol_close(coll);
> 
> ...presumably ucol_close also frees UCollator?

Eventually. The spec is fuzzy enough that the implementation might also keep the UCollator around and return it if you call ucol_open soon afterwards with matching parameters. But the spec says that not calling ucol_close will result in memory leaks, and there is no separate ucol_free or such.

> > +    UEnumeration *values = ucol_getKeywordValuesForLocale("co", locale.ptr(), false, &status);
> 
> I could be reading headers wrong, but shouldn't that first argument be
> "collation"?

You read correctly, but the documentation is incomplete: The BCP 47 equivalent for "collation", "co", is also supported. I filed
http://bugs.icu-project.org/trac/ticket/10021

> > +    int32_t count = uenum_count(values, &status);
> > +    if (U_FAILURE(status)) {
> > +        uenum_close(values);
> > +        return false;
> 
> We have mozilla::Scoped for use in handling these sorts of RAII situations. 
> Add #include "mozilla/Scoped.h" at the start of the #includes (in its own
> #include section), add this somewhere appropriate at top level:
> 
> MOZ_TYPE_SPECIFIC_SCOPED_POINTER_TEMPLATE(ScopedCloseValues, UEnumeration*,
> uenum_close)
> 
> and use ScopedCloseValues values = ucol_getKeyword... above.  And then get
> rid of all the uenum_closes everywhere.

This looks cool, but unfortunately I couldn't get it to work. Issues:

- The template requires TypeSpecificDelete to be in the same namespace as the Deleter functions (icu in my case), but then release() requires it to be in its namespace (mozilla).

- ICU declares several of its types as void* rather than using a struct, so the templates for UNumberFormat and UDateTimePatternGenerator collide with each other.

- There are cases where an object should be deleted on abnormal exits, but returned to the caller if everything goes well.

I implemented a simple ScopedICUObject class that handles this. A minor detail is that ICU doesn't document function return values if the status indicates an error, so I'm assigning the value to a Scoped thingy only after checking the status.

> > +        const char* collation = uenum_next(values, NULL, &status);
> 
> Don't pass NULL here.  Use an |int32_t length| local (hmm, I guess ICU must
> assume these types somehow -- hope there's a way to tell it to use
> mozilla/StandardInteger.h for them!), and pass along that length to equality
> methods, please, which should use strncmp.

I don't think strncmp is any better than strcmp for null-terminated strings.
Attachment #718721 - Attachment is obsolete: true
Attachment #722074 - Flags: review?(jwalden+bmo)
Updated per comment 8.
Attachment #718723 - Attachment is obsolete: true
Attachment #718723 - Flags: review?(jwalden+bmo)
Attachment #722075 - Flags: review?(jwalden+bmo)
Attachment #718724 - Attachment is obsolete: true
Attachment #722077 - Flags: review?(jwalden+bmo)
Attachment #722071 - Flags: checkin?(jwalden+bmo)
Comment on attachment 722071 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (ICU stubs)

https://hg.mozilla.org/integration/mozilla-inbound/rev/14f64332f3ea
Attachment #722071 - Flags: checkin?(jwalden+bmo)
Comment on attachment 722073 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 1)

Review of attachment 722073 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +37,1 @@
>  #include "unicode/utypes.h"

Hmm.  I think we might want to move these above the jsobjinlines.h #include, so that we have all *.h headers, then all {*-inl,*inlines}.h headers.  Sorry for all this code motion.  :-(

@@ +38,5 @@
>  
> +#if ENABLE_INTL_API
> +using icu::Locale;
> +using icu::NumberingSystem;
> +#endif

I think we've usually put using-declarations beneath using-namespaces.

@@ +524,5 @@
> +}
> +
> +// Simple RAII for ICU objects. MOZ_TYPE_SPECIFIC_SCOPED_POINTER_TEMPLATE
> +// unfortunately doesn't work because of namespace incompatibilities
> +// (TypeSpecificDelete cannot be in icu and mozilla at the same time)

Hmm?  I'm not sure I quite see how both-in-icu and mozilla is happening here.  But the void* issue is obvious enough regardless.  (If only C/C++ had generative typedefs...)

@@ +532,5 @@
> +class ScopedICUObject
> +{
> +    typedef T* type;
> +    type ptr;
> +    void (* deleter)(type);

|type| being purely for convenience in the scoped stuff, just use T* directly everywhere in here, and get rid of type.

@@ +537,5 @@
> +
> +  public:
> +    ScopedICUObject(type ptr, void (* deleter)(type)) :
> +        ptr(ptr),
> +        deleter(deleter)

SpiderMonkey style is trending toward trailing _ on field names.  Also some whitespace alignment nits:

ScopedICUObject(T *ptr, void (*deleter)(T*))
  : ptr_(ptr),
    deleter_(deleter)
{}

@@ +550,5 @@
> +    // but returned to the caller if everything goes well, call keep()
> +    // just before returning.
> +    void keep() {
> +        ptr = NULL;
> +    }

Canonically this would be:

T * forget() {
    T *tmp = ptr;
    ptr = NULL;
    return tmp;
}

then making the return-the-value idiom read |return scope.forget();| or so.

@@ +553,5 @@
> +        ptr = NULL;
> +    }
> +};
> +
> +static const int STACK_STRING_SIZE = 50;

Are subsequent uses of this going to require int?  size_t would be better.

@@ +555,5 @@
> +};
> +
> +static const int STACK_STRING_SIZE = 50;
> +
> +static const int ICU_OBJECT_SLOT = 0;

JS slot numbers are uint32_t.
Attachment #722073 - Flags: review?(jwalden+bmo) → review+
Updated per comment 17. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)

> > +// (TypeSpecificDelete cannot be in icu and mozilla at the same time)
> 
> Hmm?  I'm not sure I quite see how both-in-icu and mozilla is happening
> here.

See comment 11.
Attachment #722073 - Attachment is obsolete: true
Attachment #724287 - Flags: review+
Attachment #724287 - Flags: checkin?(jwalden+bmo)
https://hg.mozilla.org/integration/mozilla-inbound/rev/acad61792b53 to kill off all the unused-function warnings introduced by that single push.  :-(  Should have noticed those when reviewing, and/or I should have test-built it before pushing.
Comment on attachment 722074 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 2)

Review of attachment 722074 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +560,5 @@
>  
>  
>  /******************** Collator ********************/
>  
> +static void collator_finalize(FreeOp *fop, RawObject obj);

Fun fun fun, this should be JSObject *obj now.  Apparently the Raw typedefs are going away and shouldn't be used.  This particular case, the change is happening because use of the raw pointers is in flux, with ongoing GC work.  You're close enough to bleeding edge with that, that there's no real way you could have learned that, other than from a review, I think.

In general, however, we don't have good documentation about what's new and well-supported and what's deprecated.  :-(  I don't have any good answers here, except to say that APIs in js/public are more likely to be clean and non-deprecated than APIs in js/src.  But that space is still being built out now, so it's pretty rare that the thing you want will be in there yet, unfortunately.

I updated https://developer.mozilla.org/en-US/docs/SpiderMonkey/JSAPI_Reference/JS_GetReservedSlot to hopefully somewhat clarify the slots and private values situation, at least as far as the slots half goes.  I'll try to update the privates situation in a moment.  (Private values should really be documented on the JS::Value or jsval page, but I tried to get them to move jsval to JS::Value and apparently hit an MDN bug.  That page won't be updated to mention private value details until one page, and its history, have been restored.)

@@ +566,3 @@
>  static Class CollatorClass = {
>      js_Object_str,
> +    JSCLASS_HAS_RESERVED_SLOTS(1),

static const uint32_t COLLATOR_SLOTS_COUNT = 1;

Put that just above this, and move the ICU_OBJECT_SLOT const just above it, so all the slots, and the total count, are in one place.  Also, please rename ICU_OBJECT_SLOT to UCOLLATOR_SLOT.  I assume from the name that you're using ICU_OBJECT_SLOT for all Intl objects.  But really slot numbers are intrinisically tied to single Classes, so there should be one const per class, not all using the same const.  A single const just makes for more trouble when any of the individual classes using it decides it wants more reserved slots.

@@ +657,5 @@
>  
> +static void
> +collator_finalize(FreeOp *fop, RawObject obj)
> +{
> +    UCollator *coll = (UCollator *) obj->getReservedSlot(ICU_OBJECT_SLOT).toPrivate();

Use a C++ static_cast<> here.

@@ +797,5 @@
> +
> +        RootedString jscollation(cx, JS_NewStringCopyZ(cx, collation));
> +        if (!jscollation) {
> +            return false;
> +        }

No {} here.
Attachment #722074 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 722075 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 3)

Review of attachment 722075 [details] [diff] [review]:
-----------------------------------------------------------------

General note, but could you add SUPPRESS_UNUSED_WARNING to every function that's not yet used, please?  It'll save you the hassle of dealing with my manual additions of that, in your own patch queue.

::: js/src/builtin/Intl.cpp
@@ +818,5 @@
> +    RootedValue value(cx);
> +
> +    RootedObject internals(cx);
> +    if (!GetInternals(cx, collator, &internals))
> +        return NULL;

This is the object that you originally had __locale and all those internal properties underscore-prefixed, until we realized it was never exposed and so not necessary, right?  Just want to be sure I understand exactly why the properties/value of this object are constrained, since we're blithely treating those values as having specific types.

@@ +822,5 @@
> +        return NULL;
> +
> +    if (!JS_GetProperty(cx, internals, "locale", value.address())) {
> +        return NULL;
> +    }

No {}.  This also should use JSObject::getProperty, and you should add a CommonPropertyName for locale if there isn't one, and use JSObject::getProperty with cx->names().locale.  Same for all the other JS_GetProperty calls in this patch, and any others, that compare to constant strings.

@@ +827,5 @@
> +    JSAutoByteString locale(cx, value.toString());
> +    if (!locale)
> +        return NULL;
> +
> +    // UCollator options with default values.

I find myself wondering if these would be better placed next to the code that modifies them.  It would be clearer what value's used at each point, then.  (I'd also not be scrolling back and forth between default and modification, when reviewing this.  :-) )  Would make knowing you'd set all values more difficult, tho.  Dunno.  Anyway, not asking for a change, just raising the point in case you have particular feelings here.

@@ +834,5 @@
> +    UColAttributeValue uAlternate = UCOL_DEFAULT;
> +    UColAttributeValue uNumeric = UCOL_OFF;
> +    // Normalization is always on to meet the canonical equivalence requirement.
> +    UColAttributeValue uNormalization = UCOL_ON;
> +    UColAttributeValue uCaseFirst = UCOL_DEFAULT;

The header I'm reading says acceptable values for UCOL_CASE_FIRST are upper-first, lower-first, or off.  Should this be UCOL_OFF?

@@ +848,5 @@
> +        const char *oldLocale = locale.ptr();
> +        const char *p;
> +        const char *insert;
> +        uint32_t localeLen, insertLen, index;
> +        localeLen = strlen(oldLocale);

This is guaranteed to be a structurally-valid locale tag, right?  Looking at spec algorithms I think it is, just would like a logical double-check (assertion would be ideal, but I guess we self-hosted all the validity code :-\ ) before we start doing strlen and implicitly assuming no null bytes.

Declare/define localeLen all in one step, please.

@@ +850,5 @@
> +        const char *insert;
> +        uint32_t localeLen, insertLen, index;
> +        localeLen = strlen(oldLocale);
> +        if ((p = strstr(oldLocale, "-x-")))
> +            index = p - oldLocale;

Make index and localeLen size_t, please, so all operations have well-defined unsigned-value semantics.

@@ +853,5 @@
> +        if ((p = strstr(oldLocale, "-x-")))
> +            index = p - oldLocale;
> +        else
> +            index = localeLen;
> +        if ((p = strstr(oldLocale, "-u-")) && (p - oldLocale < index)) {

Move the const char *insert down to here, so it's clearer it's not used above.  And add a blank line before this, too.  And don't parenthesize the second half of this, please.

@@ +859,5 @@
> +            insert = "-co-search";
> +        } else {
> +            insert = "-u-co-search";
> +        }
> +        insertLen = strlen(insert);

insertLen should be declared here.

@@ +860,5 @@
> +        } else {
> +            insert = "-u-co-search";
> +        }
> +        insertLen = strlen(insert);
> +        char *newLocale = (char*) JS_malloc(cx, localeLen + insertLen + 1);

cx->pod_malloc<char>(localeLen + insertLen + 1)

@@ +864,5 @@
> +        char *newLocale = (char*) JS_malloc(cx, localeLen + insertLen + 1);
> +        if (!newLocale)
> +            return NULL;
> +        uint32_t i;
> +        for (i = localeLen; i >= index; i--)

It'd be nice to see this go from start of string to end, without so much manual looping.  How about this?

memcpy(newLocale, oldLocale, index);
memcpy(newLocale + index, insert, insertLen);
memcpy(newLocale + index + insertLen, oldLocale + index, localeLen - index + 1); // '\0'

@@ +891,5 @@
> +    } else if (equal(sensitivity, "case")) {
> +        uStrength = UCOL_PRIMARY;
> +        uCaseLevel = UCOL_ON;
> +    } else {
> +        uStrength = UCOL_TERTIARY;

You can assert the string is "variant" here, right?

@@ +897,5 @@
> +
> +    if (!JS_GetProperty(cx, internals, "ignorePunctuation", value.address()))
> +        return NULL;
> +    if (value.toBoolean())
> +        uAlternate = UCOL_SHIFTED;

I will candidly admit to not understanding at all what's going on here.  Could you explain how this is supposed to effect [[ignorePunctuation]]'s behavior?

@@ +913,5 @@
> +            return NULL;
> +        if (equal(caseFirst, "upper"))
> +            uCaseFirst = UCOL_UPPER_FIRST;
> +        else if (equal(caseFirst, "lower"))
> +            uCaseFirst = UCOL_LOWER_FIRST;

Else assert equal(caseFirst, "false"), please.

@@ +928,5 @@
> +    ucol_setAttribute(coll, UCOL_CASE_LEVEL, uCaseLevel, &status);
> +    ucol_setAttribute(coll, UCOL_ALTERNATE_HANDLING, uAlternate, &status);
> +    ucol_setAttribute(coll, UCOL_NUMERIC_COLLATION, uNumeric, &status);
> +    ucol_setAttribute(coll, UCOL_NORMALIZATION_MODE, uNormalization, &status);
> +    ucol_setAttribute(coll, UCOL_CASE_FIRST, uCaseFirst, &status);

I don't see in docs where it's valid to keep setting attributes after a failure.  If I'm not mistaken about this, could you introduce a Scoped object here to handle closing/reporting, and then have |if (U_FAILURE(status)) / return NULL;| after each of these?  (And then return coll.forget() at the end.)

@@ +950,5 @@
> +        return true;
> +    }
> +
> +    size_t length1 = str1->length();
> +    jschar const *chars1 = str1->getChars(cx);

We'd have the const qualifier first, so const jschar *.

@@ +958,5 @@
> +    jschar const *chars2 = str2->getChars(cx);
> +    if (!chars2)
> +        return false;
> +
> +    UCollationResult uresult = ucol_strcoll(coll, chars1, length1, chars2, length2);

Hmm.  I guess we're assuming jschar == UChar, above and beyond the two types being compatible.  Hopefully that won't break anywhere, but I have my doubts.  :-\  Do you have try server access to test this, or should I do so before pushing?

@@ +982,5 @@
> +
> +    RootedObject collator(cx, &args[0].toObject());
> +
> +    // Obtain a UCollator object, cached if possible.
> +    bool isCollatorInstance = collator->getClass() == &CollatorClass;

Hmm.  I find myself wondering how this works with Collator instances from other globals.  Given we're using per-global weakmaps of internal properties, right now I think it probably doesn't matter.

Could you add a comment like "// XXX Does this handle Collator instances from other globals correctly?" just so the potential issue is noted?
Attachment #722075 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 722077 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 4)

Review of attachment 722077 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +1019,5 @@
>  
>  
>  /******************** NumberFormat ********************/
>  
> +static void numberFormat_finalize(FreeOp *fop, RawObject obj);

More JSObject * as before.

@@ +1116,5 @@
>  
> +static void
> +numberFormat_finalize(FreeOp *fop, RawObject obj)
> +{
> +    UNumberFormat *nf = (UNumberFormat *) obj->getReservedSlot(ICU_OBJECT_SLOT).toPrivate();

static_cast<> as before

@@ +1193,5 @@
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    JS_ASSERT(args.length() == 0);
> +
> +    RootedValue result(cx);
> +    if (!intl_availableLocales(cx, unum_countAvailable, unum_getAvailable, &result))

I think this lets you take a SUPPRESS_BLAH off that method, probably.  I'll try to notice these as they happen, but no guarantees.  Too bad there's no attribute to say never-used, rather than can-be-never-used.

@@ +1221,5 @@
> +        return false;
> +    }
> +    const char *name = numbers->getName();
> +    JSString *jsname = JS_NewStringCopyZ(cx, name);
> +    delete numbers;

Hoo boy, this is some implementation dumpster-diving.

::: js/src/builtin/Intl.h
@@ +87,5 @@
> + * Returns the numbering system type identifier per Unicode
> + * Technical Standard 35, Unicode Locale Data Markup Language, for the
> + * default numbering system for the given locale.
> + *
> + * Usage: defaultNumberingSystem = intl_numberingSystem(locale)

The docs I saw suggested |locale| could be something like "en_US", for the NumberingSystem stuff you're passing along.  Are language tags also accepted?
Attachment #722077 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 722078 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 5)

Review of attachment 722078 [details] [diff] [review]:
-----------------------------------------------------------------

Given the fragility of memory management, especially at API mismatch points like this, I think I want to see a new version for the StringBuffer use in it.

::: js/src/builtin/Intl.cpp
@@ +1241,5 @@
> +    RootedObject internals(cx);
> +    if (!GetInternals(cx, numberFormat, &internals))
> +       return NULL;
> +
> +    if (!JS_GetProperty(cx, internals, "locale", value.address())) {

Same cx->names().locale mumble mumble again.

@@ +1270,5 @@
> +    if (equal(style, "currency")) {
> +        if (!JS_GetProperty(cx, internals, "currency", value.address()))
> +            return NULL;
> +        // uCurrency remains owned by value.toString()
> +        uCurrency = JS_GetStringCharsZ(cx, value.toString());

This comment is sort of right.  The chars are owned by the string, but the string's not rooted locally after the next getProperty.  It's rooted through the rooting of the internals object, which would root the string, but this rooting chain gets a bit strained.  Could you save the string into a root that lives for the whole method, instead?

Also I think you need a SkipRoot skip(cx, &uCurrency); so that the in-progress rooting analysis builds don't get confused and try to poison this pointer under you.  This would need to stay valid until uCurrency is used, so probably it goes next to the declaration of that.  No, we don't particularly have documentation of this.  :-\  I only knew just enough to know to ask someone if that was necessary, and really no more than that -- we have just about nothing for this now, except some tinderbox builds that will probably turn orange without it.

Later on you implicitly assume this string's length is 3.  That follows from IsWellFormedCurrencyCode, right?  Add a MOZ_ASSERT(value.toString()->length() == 3, "IsWellFormedCurrencyCode permits only length-3 strings"); so the reason for this is clear.

@@ +1281,5 @@
> +            uStyle = UNUM_CURRENCY_ISO;
> +        else if (equal(currencyDisplay, "symbol"))
> +            uStyle = UNUM_CURRENCY;
> +        else
> +            uStyle = UNUM_CURRENCY_PLURAL;

Assert this is "name"?

@@ +1285,5 @@
> +            uStyle = UNUM_CURRENCY_PLURAL;
> +    } else if (equal(style, "percent")) {
> +        uStyle = UNUM_PERCENT;
> +    } else {
> +        uStyle = UNUM_DECIMAL;

Assert this is "decimal"?

@@ +1294,5 @@
> +        return NULL;
> +    if (hasP) {
> +        if (!JS_GetProperty(cx, internals, "minimumSignificantDigits", value.address()))
> +            return NULL;
> +        uMinimumSignificantDigits = value.toInt32();

Hmm.  If this is actually always guaranteed to be an int32_t Value -- that is, isInt32(), versus isDouble() -- it's pretty obscure.  (I suspect this value can flow from script in various ways, which can produce isDouble() versions of the integer values.  An attacker would have to know that certain methods produce isDouble() values, and combine them just right to get an integer-valued double, but it *is* a hazard.)  Could you use int32_t(value.toNumber()) to be safe?  Same for all these minimum/maximum values.

@@ +1315,5 @@
> +        return NULL;
> +    uUseGrouping = value.toBoolean();
> +
> +    UErrorCode status = U_ZERO_ERROR;
> +    UNumberFormat *nf = unum_open(uStyle, NULL, 0, icuLocale(locale.ptr()), NULL, &status);

It's again permissible to pass in language tags here, and not _-separated stuff?

The UParseError* here is only for parse errors in the pattern we're not providing, correct?

@@ +1345,5 @@
> +        return NULL;
> +    }
> +
> +    toClose.keep();
> +    return nf;

return toClose.forget();

@@ +1353,5 @@
> +intl_FormatNumber(JSContext *cx, UNumberFormat *nf, double x, MutableHandleValue result)
> +{
> +    if (x == 0.0)
> +        // could be -0.0, which we don't want to see in output
> +        x = 0.0;

Style nit -- don't want a multiline unbraced if -- and a substantive bit about using something clearer about what's being done:

// FormatNumber doesn't consider -0.0 to be negative.
if (MOZ_DOUBLE_IS_NEGATIVE_ZERO(x))
    x = 0.0;

@@ +1356,5 @@
> +        // could be -0.0, which we don't want to see in output
> +        x = 0.0;
> +
> +    jschar stackChars[STACK_STRING_SIZE + 1];
> +    jschar *chars = stackChars;

Rather than manual memory management of maybe-stacky memory, you should use StringBuffer from vm/StringBuffer.h.  It's a little awkward because you'll have to resize(32) (that's the current internal buffer size) and then use begin() to access the raw internal pointer.  (You'll have to change STACK_STRING_SIZE to 32 as well.)  But it seems better reusing that allocation logic than open-coding it.

Note that because you don't actually care about null-termination, there's no reason not to use the full 32 for the input length here.  unum_formatDouble's documentation is actually surprisingly vague about whether the output is null-terminated.  If I read the tea leaves of the implementation just right (it flows into UnicodeString::extract(UChar* dest, int32_t destCapacity, UErrorCode&) which is clear about this), I *think* if the output is length-32 without null termination, and the input buffer is length 32, status will be U_STRING_NOT_TERMINATED_WARNING and not a U_FAILURE.  (size is the length of the formatted string not including '\0'.)  (This actually seems like a documentation bug to me, that unum_formatDouble doesn't clearly say what null-termination behavior is.)  We don't care about null-termination, because we're using a CopyN function below currently.  So we shouldn't add in this +1 everywhere.

@@ +1372,5 @@
> +            js_free(chars);
> +        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +

Apropos of previous patches, I just stumbled across

 * ICU functions that take a reference (C++) or a pointer (C) to a UErrorCode
 * first test if(U_FAILURE(errorCode)) { return immediately; }
 * so that in a chain of such functions the first one that sets an error code
 * causes the following ones to not perform any operations.

So ignore my comment about scoped and checking for failure every time in one of the previous patches.  \o/

@@ +1373,5 @@
> +        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +
> +    RootedString str(cx, JS_NewUCStringCopyN(cx, chars, size));

Once StringBuffer is used, this can be bs.finishString() returned into a JSString *.  You'll have to null-check the result, as string creation can fail if the stackful buffer was used, and allocating for the chars or whatever failed.

@@ +1397,5 @@
> +    // Obtain a UNumberFormat object, cached if possible.
> +    bool isNumberFormatInstance = numberFormat->getClass() == &NumberFormatClass;
> +    UNumberFormat *nf;
> +    if (isNumberFormatInstance) {
> +        nf = (UNumberFormat *) numberFormat->getReservedSlot(ICU_OBJECT_SLOT).toPrivate();

I think I forgot to mention this reviewing the last patch or two, but add new constants for the slot and slot count for these classes.

::: js/src/builtin/Intl.h
@@ +93,5 @@
>  extern JSBool
>  intl_numberingSystem(JSContext *cx, unsigned argc, Value *vp);
>  
> +/**
> + * Returns a String value representing x (which must be a Number value)

"must be a number" is slightly clearer, and eliminates |new Number(5)|, if the reader started to wonder.
Attachment #722078 - Flags: review?(jwalden+bmo) → review-
Updated per comment 22. Carrying r+jwalden.
Attachment #722074 - Attachment is obsolete: true
Attachment #725280 - Flags: review+
Attachment #725280 - Flags: checkin?(jwalden+bmo)
Updated per comment 23. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #23)

> General note, but could you add SUPPRESS_UNUSED_WARNING to every function
> that's not yet used, please?  It'll save you the hassle of dealing with my
> manual additions of that, in your own patch queue.

So we don't have to add and remove these macros like crazy, can you just check in parts 1-3 together? At that point, there shouldn't be any unused non-stub functions left.

> > +    RootedObject internals(cx);
> > +    if (!GetInternals(cx, collator, &internals))
> > +        return NULL;
> 
> This is the object that you originally had __locale and all those internal
> properties underscore-prefixed, until we realized it was never exposed and
> so not necessary, right?

Correct. For Collator instances, this object has exactly the internal properties specified for these objects. For NumberFormat and DateTimeFormat, there are some variations.

> > +    // UCollator options with default values.
> 
> I find myself wondering if these would be better placed next to the code
> that modifies them.  It would be clearer what value's used at each point,
> then.  (I'd also not be scrolling back and forth between default and
> modification, when reviewing this.  :-) )  Would make knowing you'd set all
> values more difficult, tho.  Dunno.  Anyway, not asking for a change, just
> raising the point in case you have particular feelings here.

No particular feelings. At some point I had shortcuts for argument-less localeCompare and toLocaleString, and in that shortcut the default values were filled in without even creating a Collator, NumberFormat, or DateTimeFormat. Now that doesn't matter any more.

> > +    UColAttributeValue uCaseFirst = UCOL_DEFAULT;
> 
> The header I'm reading says acceptable values for UCOL_CASE_FIRST are
> upper-first, lower-first, or off.  Should this be UCOL_OFF?

A bit further up in ucol.h there's the hint "All the attributes can take UCOL_DEFAULT value", and since I want the locale default here, using UCOL_DEFAULT seems appropriate.

> > +        const char *oldLocale = locale.ptr();

> > +        localeLen = strlen(oldLocale);
> 
> This is guaranteed to be a structurally-valid locale tag, right?

Yes.

> > +    if (!JS_GetProperty(cx, internals, "ignorePunctuation", value.address()))
> > +        return NULL;
> > +    if (value.toBoolean())
> > +        uAlternate = UCOL_SHIFTED;
> 
> I will candidly admit to not understanding at all what's going on here. 
> Could you explain how this is supposed to effect [[ignorePunctuation]]'s
> behavior?

Added comment. The ICU documentation is no help here.

> > +    ucol_setAttribute(coll, UCOL_CASE_LEVEL, uCaseLevel, &status);
> > +    ucol_setAttribute(coll, UCOL_ALTERNATE_HANDLING, uAlternate, &status);
> > +    ucol_setAttribute(coll, UCOL_NUMERIC_COLLATION, uNumeric, &status);
> > +    ucol_setAttribute(coll, UCOL_NORMALIZATION_MODE, uNormalization, &status);
> > +    ucol_setAttribute(coll, UCOL_CASE_FIRST, uCaseFirst, &status);
> 
> I don't see in docs where it's valid to keep setting attributes after a
> failure.

Added comment.

> > +    jschar const *chars2 = str2->getChars(cx);
> > +    if (!chars2)
> > +        return false;
> > +
> > +    UCollationResult uresult = ucol_strcoll(coll, chars1, length1, chars2, length2);
> 
> Hmm.  I guess we're assuming jschar == UChar, above and beyond the two types
> being compatible.

Both are specified to be UTF-16 code units:
http://userguide.icu-project.org/unicode#TOC-Programming-using-UTFs
http://ecma-international.org/ecma-262/5.1/#sec-8.4

> Hopefully that won't break anywhere, but I have my
> doubts.  :-\  Do you have try server access to test this, or should I do so
> before pushing?

Most recent try server build with Intl enabled:
https://tbpl.mozilla.org/?tree=Try&rev=4b5645a415c7

Collation demo:
http://lindenbergsoftware.com/demo/Collation.html
v1-v4 swap in different source code variants, which you can also edit. Editable input is in the lower left text box. The Sort button runs the source and places the output into the lower right text box.

Conformance tests:
http://test262.ecmascript.org/testcases_intl402.html
Relevant here are the 10.3.2_CS_* tests.
Attachment #722075 - Attachment is obsolete: true
Attachment #725282 - Flags: review+
Attachment #725282 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #27)
> So we don't have to add and remove these macros like crazy, can you just
> check in parts 1-3 together? At that point, there shouldn't be any unused
> non-stub functions left.

Sounds good, doing that now.

> > > +    UCollationResult uresult = ucol_strcoll(coll, chars1, length1, chars2, length2);
> > 
> > Hmm.  I guess we're assuming jschar == UChar, above and beyond the two types
> > being compatible.
> 
> Both are specified to be UTF-16 code units:
> http://userguide.icu-project.org/unicode#TOC-Programming-using-UTFs
> http://ecma-international.org/ecma-262/5.1/#sec-8.4

The issue is that if T and U are different types -- even if they're compatible -- you can't pass T* where U* was expected:

  wchar_t* wp = 0;
  uint16_t* up = 0;
  void foo(wchar_t* p)
  {
    foo(up); // ERROR: if wchar_t != uint16_t, can't pass uint16_t* when wchar_t* expected
  }

Looking at headers, we have:

  #if defined(UCHAR_TYPE)
      typedef UCHAR_TYPE UChar;
  /* Not #elif U_HAVE_CHAR16_T -- because that is type-incompatible with pre-C++11 callers
      typedef char16_t UChar;  */
  #elif U_SIZEOF_WCHAR_T==2
      typedef wchar_t UChar;
  #elif defined(__CHAR16_TYPE__)
      typedef __CHAR16_TYPE__ UChar;
  #else
      typedef uint16_t UChar;
  #endif

and:

  #ifdef WIN32
  typedef wchar_t   jschar;
  #else
  typedef uint16_t  jschar;
  #endif

We probably don't want to define UCHAR_TYPE because it'll make using system ICU hard, and I don't want to make that impossible in case we ever want to use it ourselves (say on b2g).  Thus Windows is fine as wchar_t is two bytes there.  So it's a question of whether __CHAR16_TYPE__ is the same type as uint16_t or not.  http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00258.html suggests they may not be.  But if this worked on try, that's enough for now.  If anywhere else encounters bustage, we'll have to reinterpret_cast<> or |typedef UChar jschar;|.  I guess we cross that bridge when we get to it.
Comment on attachment 724287 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 1)

https://hg.mozilla.org/integration/mozilla-inbound/rev/e1cc50bfee41
Attachment #724287 - Flags: checkin?(jwalden+bmo)
Comment on attachment 725280 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 2)

https://hg.mozilla.org/integration/mozilla-inbound/rev/f723856dac07
Attachment #725280 - Flags: checkin?(jwalden+bmo)
Comment on attachment 725282 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 3)

https://hg.mozilla.org/integration/mozilla-inbound/rev/c3d72dcbbe94
Attachment #725282 - Flags: checkin?(jwalden+bmo)
...and https://hg.mozilla.org/integration/mozilla-inbound/rev/0ed3d38e0e4d to remove suppression macros from the !ENABLE_INTL_API functions where it can be removed now.
Updated per comment 24. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #24)

> > +    const char *name = numbers->getName();
> > +    JSString *jsname = JS_NewStringCopyZ(cx, name);
> > +    delete numbers;
> 
> Hoo boy, this is some implementation dumpster-diving.

Meaning...? Suggestions?

> > + * Usage: defaultNumberingSystem = intl_numberingSystem(locale)
> 
> The docs I saw suggested |locale| could be something like "en_US", for the
> NumberingSystem stuff you're passing along.  Are language tags also accepted?

Hmm. Yes in the sense that I know the ICU team has done a lot of work to enable BCP 47 language tags in their API, and that I do get properly localized behavior when I pass in language tags. No in the sense that there's no documentation I could use to prove to you that this is actually supposed to work. I filed yet another ICU ticket:
http://bugs.icu-project.org/trac/ticket/10040
Attachment #722077 - Attachment is obsolete: true
Attachment #725687 - Flags: review+
Attachment #725687 - Flags: checkin?(jwalden+bmo)
Updated per comment 25.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #25)

> > +    UNumberFormat *nf = unum_open(uStyle, NULL, 0, icuLocale(locale.ptr()), NULL, &status);
> 
> It's again permissible to pass in language tags here, and not _-separated
> stuff?

Same story as in comment 33.

> The UParseError* here is only for parse errors in the pattern we're not
> providing, correct?

Correct.

> > +    if (x == 0.0)
> > +        // could be -0.0, which we don't want to see in output
> > +        x = 0.0;
> 
> Style nit -- don't want a multiline unbraced if --

It seems you guys spend way too much brain power thinking about braces. In Java-land, there's a simple rule: use braces for any compound statement; opening brace at the end of the line, closing at the beginning. Most JavaScript style guides have adopted the same rule. My right pinky knows the rule; from 1997 until I started working on SpiderMonkey my brain never had to worry about it. But in SpiderMonkey there's a page of rules to consider, and undocumented add-ons to the rule still appear, and adding an assertion or comment in one line can trigger multiple brace changes across dozens of lines. That's a lot of effort just to save a bit of space here and there...

> (This actually seems like a documentation bug to me, that unum_formatDouble
> doesn't clearly say what null-termination behavior is.)

I filed
http://bugs.icu-project.org/trac/ticket/10042

> > + * Returns a String value representing x (which must be a Number value)
> 
> "must be a number" is slightly clearer, and eliminates |new Number(5)|, if
> the reader started to wonder.

Number objects are not Number values - see ES5 4.3.19 and 4.3.21.
Attachment #722078 - Attachment is obsolete: true
Attachment #725688 - Flags: review?(jwalden+bmo)
Attachment #722079 - Attachment is obsolete: true
Attachment #725706 - Flags: review?(jwalden+bmo)
Comment on attachment 725687 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 4)

https://hg.mozilla.org/integration/mozilla-inbound/rev/605348ff1ee6

(In reply to Norbert Lindenberg from comment #33)
> > > +    const char *name = numbers->getName();
> > > +    JSString *jsname = JS_NewStringCopyZ(cx, name);
> > > +    delete numbers;
> > 
> > Hoo boy, this is some implementation dumpster-diving.
> 
> Meaning...? Suggestions?

I was referring to the ugliness of knowing that it's proper to dispose of the NumberSystem object by deleting it.  I had to dig into the initial creation method quite a ways to see the |new| this pairs up with.  But hey, it works...at least up until we start building against system ICU, at which point I suspect this delete will be a mismatch with the corresponding new in ICU.  (Mozilla -- not SpiderMonkey -- defines its own malloc implementation.)  Bug 724531 comment 29 mentions this; hopefully whoever adds --with-system-icu can make things kosher here somehow.

I don't have any suggestions for anything better here.  If I'd had them I'd have said it more clearly.  :-)
Attachment #725687 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #38)
> Created attachment 725706 [details] [diff] [review]
> Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat,
> Intl.DateTimeFormat (part 9)

I should highlight that this change renders BasicFormatMatcher and BestFitFormatMatcher unused, and relies on ICU's DateTimePatternGenerator to construct the pattern for the provided options. As it turns out, DateTimePatternGenerator doesn't conform to the spec for BasicFormatMatcher, and I have an item on my to-do list to find a solution for this issue.
Depends on: 851951
(In reply to Norbert Lindenberg from comment #34)
> It seems you guys spend way too much brain power thinking about braces. In
> Java-land, there's a simple rule: use braces for any compound statement;
> opening brace at the end of the line, closing at the beginning. Most
> JavaScript style guides have adopted the same rule. My right pinky knows the
> rule; from 1997 until I started working on SpiderMonkey my brain never had
> to worry about it. But in SpiderMonkey there's a page of rules to consider,
> and undocumented add-ons to the rule still appear, and adding an assertion
> or comment in one line can trigger multiple brace changes across dozens of
> lines. That's a lot of effort just to save a bit of space here and there...

I think the problem you have is you aren't, and haven't, edited existing code much.  None of us around here spend much time thinking about any of this.  We picked it up in our first several patches, then it's second nature.

How about this: seeing as you're mostly not inserting small bits in existing code, I'll just fix up style issues post-review, pre-checkin.  You might get occasional conflicts as a result, but if you're adding large hunks at a time, there shouldn't be much trouble dealing with them, even purely with .rej files.  That seem reasonable to you?
Comment on attachment 725688 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 5)

Review of attachment 725688 [details] [diff] [review]:
-----------------------------------------------------------------

I'm assuming I'll touch up style nits on a ready-for-pushing version of this, now.

::: js/src/builtin/Intl.cpp
@@ +561,5 @@
>          return tmp;
>      }
>  };
>  
> +static const size_t INITIAL_STRING_BUFFER_SIZE = 32;

Probably worth explaining where this number came from, since it *did* come from somewhere:

// As a small optimization (not important for correctness), this is the inline capacity of a StringBuffer.

@@ +1372,5 @@
> +        unum_setTextAttribute(nf, UNUM_CURRENCY_CODE, uCurrency, 3, &status);
> +        if (U_FAILURE(status)) {
> +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_INTERNAL_INTL_ERROR);
> +            return NULL;
> +        }

Now that I know about the UErrorCode outparam bit being documented behavior, it seems to me we should use it whenever it makes sense, to avoid JS error reporting noise.  Given that, we should remove this test and let the final one in the method do the work.  (The one just after unum_open being exceptional because it has to happen before the scoped-object bit happens.)  We should add a comment at the top of the file mentioning that ICU methods have this behavior, maybe like so:

/*
 * Pervasive note: ICU functions taking a UErrorCode in/out param always test
 * that param before doing anything, and will return immediately in such cases
 * without doing anything.
 */

::: js/src/builtin/Intl.h
@@ +93,5 @@
>  extern JSBool
>  intl_numberingSystem(JSContext *cx, unsigned argc, Value *vp);
>  
> +/**
> + * Returns a String value representing x (which must be a Number value)

Sure, Number value is never a Number object.  The spec language surrounding types is awkward through its insistence on using capital-typed names for the types, when those names refer to the constructors in ordinary parlance.  So I usually try to avoid it when possible.  How about this -- shorter, simpler, unambiguous when you consider that String object and Number object would be spelled out if that were meant:

Returns a string representing the number x, according to the effective locale and the formatting options of the given NumberFormat.
Attachment #725688 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 725703 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 6)

Review of attachment 725703 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +1655,5 @@
> +
> +// ICU returns old-style keyword values; map them to BCP 47 equivalents
> +// (see http://bugs.icu-project.org/trac/ticket/9620).
> +const char *
> +bcp47CalendarName(const char *icuName)

This should be static.

@@ +1685,5 @@
> +    RootedId indexId(cx);
> +
> +    // We need the default calendar for the locale as the first result.
> +    UErrorCode status = U_ZERO_ERROR;
> +    UCalendar *cal = ucal_open(NULL, -1, locale.ptr(), UCAL_DEFAULT, &status);

From ICU docs:

 * @param zoneID The desired TimeZone ID.  If 0, use the default time zone.
 * @param len The length of zoneID, or -1 if null-terminated.

It reads a little bit weird to say -1 here, because zoneID *isn't* null-terminated.  0 would be a little less confusing.  There might be a docs nit hiding behind this, possibly.  Normally I expect APIs like this to take ptr/length and treat the length as the canonical unit of determination.  Having the pointer be primary when null, and secondary when not, is a little strange to me.

Stumbling through ICU headers a bit, I happened to discover that ICU has C++ smart pointer classes for dealing with things like disposing of uenums and the like.  For example:

U_NAMESPACE_BEGIN

/**
 * \class LocalUEnumerationPointer
 * "Smart pointer" class, closes a UEnumeration via uenum_close().
 * For most methods see the LocalPointerBase base class.
 *
 * @see LocalPointerBase
 * @see LocalPointer
 * @stable ICU 4.4
 */
U_DEFINE_LOCAL_OPEN_POINTER(LocalUEnumerationPointer, UEnumeration, uenum_close);

U_NAMESPACE_END

Any reason we're not using these rather than hand-rolling our own, probably-inferior (I'd expect their smart pointers wouldn't have the must-use-smartpointer-after-errorchecking-the-allocating-call thing we have) versions?

@@ +1699,5 @@
> +    if (!IndexToId(cx, index++, &indexId))
> +        return false;
> +    RootedValue element(cx, StringValue(jscalendar));
> +    if (!DefineNativeProperty(cx, calendars, indexId, element, JS_PropertyStub,
> +                              JS_StrictPropertyStub, JSPROP_ENUMERATE, 0, 0))

You should use JSObject::defineElement to avoid jsid creation/initialization here.  (We're working at making known-uint32_t index accesses use dedicated, faster APIs.  Using the API specific to that helps with the transition.)  This also makes your life easier, as you don't have to deal with jsid madness.

@@ +1732,5 @@
> +        if (!IndexToId(cx, index++, &indexId))
> +            return false;
> +        element = StringValue(jscalendar);
> +        if (!DefineNativeProperty(cx, calendars, indexId, element, JS_PropertyStub,
> +                                  JS_StrictPropertyStub, JSPROP_ENUMERATE, 0, 0))

More opportunity to use JSObject::defineElement.

@@ +1754,5 @@
> +    JSAutoByteString locale(cx, args[0].toString());
> +    if (!locale)
> +        return false;
> +    RootedString jsskeleton(cx, args[1].toString());
> +    const jschar *skeleton = JS_GetStringCharsZ(cx, jsskeleton);

This needs a null-check, and a SkipRoot skip(cx, &skeleton), I think.

@@ +1765,5 @@
> +        return false;
> +    }
> +    ScopedICUObject<UDateTimePatternGenerator> toClose(gen, udatpg_close);
> +
> +    int size = udatpg_getBestPattern(gen, skeleton, skeletonLen, NULL, 0, &status);

Looks like size should be int32_t.

@@ +1770,5 @@
> +    if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) {
> +        JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_INTERNAL_INTL_ERROR);
> +        return false;
> +    }
> +    ScopedJSFreePtr<UChar> pattern((UChar *) JS_malloc(cx, sizeof(UChar) * (size + 1)));

cx->pod_malloc<UChar>(size + 1)
Attachment #725703 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 725704 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 7)

Review of attachment 725704 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +1826,5 @@
> +    if (hasP) {
> +        if (!JSObject::getProperty(cx, internals, internals, cx->names().timeZone, &value))
> +            return NULL;
> +        if (!value.isUndefined()) {
> +            uTimeZone = JS_GetStringCharsZ(cx, value.toString());

Hmm.  It's slightly unfortunate our string APIs deal with size_t here.  We could use JS_GetStringCharsAndLength if it weren't for that.  (All JS strings are limited to uint32_t length in SpiderMonkey, actually slightly lower.)

@@ +1827,5 @@
> +        if (!JSObject::getProperty(cx, internals, internals, cx->names().timeZone, &value))
> +            return NULL;
> +        if (!value.isUndefined()) {
> +            uTimeZone = JS_GetStringCharsZ(cx, value.toString());
> +            uTimeZoneLength = u_strlen(uTimeZone);

We're going to need a SkipRoot skip(cx, &uTimeZone) above here as well, I think.

@@ +1833,5 @@
> +    }
> +    if (!JSObject::getProperty(cx, internals, internals, cx->names().pattern, &value))
> +        return NULL;
> +    uPattern = JS_GetStringCharsZ(cx, value.toString());
> +    uPatternLength = u_strlen(uPattern);

More SkipRoot needed.

@@ +1849,5 @@
> +    }
> +
> +    // ECMAScript requires the Gregorian calendar to be used from the beginning
> +    // of ECMAScript time.
> +    UCalendar *cal = (UCalendar *) udat_getCalendar(df);

I don't see a need for this cast, looking at the current headers.

@@ +1850,5 @@
> +
> +    // ECMAScript requires the Gregorian calendar to be used from the beginning
> +    // of ECMAScript time.
> +    UCalendar *cal = (UCalendar *) udat_getCalendar(df);
> +    ucal_setGregorianChange(cal, -8640000000000000, &status);

Unadorned numbers being ints in C++, this is either not going to work right (truncation of the constant) or trigger warnings with some compilers.  I'm not sure which.  Tack on a .0 or a d so neither case happens.

@@ +1853,5 @@
> +    UCalendar *cal = (UCalendar *) udat_getCalendar(df);
> +    ucal_setGregorianChange(cal, -8640000000000000, &status);
> +
> +    // An error here means the calendar is not Gregorian, so we don't care.
> +    status = U_ZERO_ERROR;

Comment's a good idea.  The assignment, I would remove -- it seems likely to trigger compiler warnings, as it can't affect subsequent execution.
Attachment #725704 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 725926 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (tests)

Review of attachment 725926 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine except for supportedLocalesOf.

::: js/src/tests/Intl/Collator/supportedLocalesOf.js
@@ +349,5 @@
> +
> +
> +var count = Intl.Collator.supportedLocalesOf(locales).length;
> +
> +reportCompare(locales.length, count, "Number of supported locales in Intl.Collator");

This is really gross. I'd prefer to assert this property for some reasonably stable subset so that the test doesn't constantly break and so that it will run on platforms where people have built their own firefox or ICU.

::: js/src/tests/Intl/DateTimeFormat/format.js
@@ +35,5 @@
> +// Locale ar-MA; long format, Islamic civilian calendar.
> +format = new Intl.DateTimeFormat("ar-ma-u-ca-islamicc",
> +                                 {timeZone: "UTC",
> +                                  year: "numeric", month: "long", day: "numeric",
> +                                  hour: "numeric", minute: "numeric", second: "numeric"});

Please factor out these long, identical formatting options.

::: js/src/tests/Intl/DateTimeFormat/supportedLocalesOf.js
@@ +365,5 @@
> +];
> +
> +var count = Intl.DateTimeFormat.supportedLocalesOf(locales).length;
> +
> +reportCompare(locales.length, count, "Number of supported locales in Intl.DateTimeFormat");

The same comment as on the Collator version applies here.

::: js/src/tests/Intl/NumberFormat/supportedLocalesOf.js
@@ +365,5 @@
> +];
> +
> +var count = Intl.NumberFormat.supportedLocalesOf(locales).length;
> +
> +reportCompare(locales.length, count, "Number of supported locales in Intl.NumberFormat");

Ditto.
Attachment #725926 - Flags: review?(terrence) → review+
Comment on attachment 725705 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 8)

Review of attachment 725705 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.js
@@ +407,5 @@
>   *
>   * Spec: ECMAScript Internationalization API Specification, 6.2.4.
>   */
>  function DefaultLocale() {
> +    var localeOfLastResort = "en-GB";

I assume this change is because of bug 851763?  Add a comment like // XXX TEMPORARY HACK FOR BUG 851763 if so.

@@ +1116,5 @@
>  
>  
>  function collatorSortLocaleData(locale) {
> +    var collations = intl_availableCollations(locale);
> +    callFunction(std_Array_unshift, collations, null);

For a start this is okay, I guess.

But unshift is inherently a linear-time operation, in the absence of a pile of hacks, and particular requirements that only *sometimes* make it fast.  (I am assuming intl_availableCollations(locale) is large; I don't remember for sure that it is.)  Can we follow up and make intl_availableCollations reserve space for this |null| if we're going to add it like this later?  Perhaps as an optional mode or something, as I assume most users don't need any extra space at the start.

@@ +1134,1 @@
>          sensitivity: "variant"

This is implementation-defined per "The default search sensitivity per locale (10.2.3)" as mentioned informatively in Appendix A, correct?

@@ +1416,5 @@
> +        "arab", "arabext", "bali", "beng", "deva",
> +        "fullwide", "gujr", "guru", "hanidec", "khmr",
> +        "knda", "laoo", "latn", "limb", "mlym",
> +        "mong", "mymr", "orya", "tamldec", "telu",
> +        "thai", "tibt"

Could you add a comment explaining how we derived this list?  I have no idea when I'd want to add or remove entries from this list in the future.

@@ +1418,5 @@
> +        "knda", "laoo", "latn", "limb", "mlym",
> +        "mong", "mymr", "orya", "tamldec", "telu",
> +        "thai", "tibt"
> +    ];
> +    callFunction(std_Array_unshift, numberingSystems, defaultNumberingSystem);

Hmm, another linear-time perf thing to fix at some point.

Well, actually in this case we can also look at it as adding all of numberingSystems to an array containing only defaultNumberingSystem.  And that, you can do this way instead to eliminate the issue:

// We support a subset of the decimal numbering systems supported in ICU.
var systems = [intl_numberingSystem(locale)]; // start with the locale's system
var otherSystems = [ ... ];
callFunction(std_Function_apply, std_Array_push, systems, otherSystems);
return systems;
Attachment #725705 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 725706 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 9)

Review of attachment 725706 [details] [diff] [review]:
-----------------------------------------------------------------

It is entirely possible I misunderstood a whole bunch of this patch.  In any case, it definitely needs enough changes for another review of it.

::: js/src/builtin/Intl.js
@@ +1598,5 @@
>          var value = GetOption(options, prop, "string", dateTimeComponentValues[prop], undefined);
>          opt[prop] = value;
>      }
>  
> +    // Steps 20-21 provided by ICU.

Hmm.  I guess all these step numbers became unsynced at some point between the i18n spec we were looking at when the patch adding this landed, and the 2013-02-28 draft with errata.  I guess once everything's stabilized I should go back and adjust numbering appropriately.

@@ +1625,5 @@
>      // Step 31.
>      internals.initializedDateTimeFormat = true;
>  }
>  
>  

Understanding nothing of patterns, skeletons, or date/time formatting using ICU before reading this patch, I hope it will come as no surprise that this was all very, very confusing.  :-)  A brief overview comment discussing why things are as they are here, how ICU works to implement this, and how SpiderMonkey uses ICU internally here would be incredibly helpful to future readers.

In pursuit of understanding everything, I wrote up such a comment.  Please critique it, tell me where I totally don't know what I'm talking about, and include an appropriately-fixed version in this patch -- here or thereabouts seems a reasonable place.

"""
Different locales have different optimal ways to display dates using the same basic components.  For example, en-US might use "Sept. 24, 2012" while fr-FR might use "24 Sept. 2012".  The intent of Intl.DateTimeFormat is to permit production of the "optimal" format for the locale, without the user having to pick it (or even know what it is).

ICU implements this behavior with a two-level pattern system.  The skeleton level consumes the options -- options.weekday, options.era, options.day, options.minute, options.hour, options.hour12, etc. (see Table 3 - Components of date and time formats) -- provided when an Intl.DateTimeFormat is initialized.  These inputs are used to form a skeleton.  A skeleton is a string like "yyyyMMDD", specifying the components of the date/time to be included in the ultimate formatted string.  A skeleton is consumed by a UDateTimePatternGenerator to produce a pattern.  The various skeleton patterns are defined in unicode/udat.h

The pattern level takes a pattern string like "yyyy.MM.dd" and produces an exact corresponding string like "2012.09.14".  This level does only exact pattern substitutions (and unescaping of any escaped portions of the pattern).  The overall format of the corresponding string is identical to that of the pattern string.  The components of the string are substituted with locale-sensitive values -- "December" for en-US or "Dezember" for de-DE, say -- but the overall format is determined entirely by the input pattern, locale-insensitively.

Inside this Intl.DateTimeFormat implementation, skeletons exist only temporarily within |toBestICUPattern|.  The created skeleton is immediately passed to js::intl_patternForSkeleton, a C++ method which gets the best pattern for the specified locale for those components.

Patterns initially exist as the return value of |toBestICUPattern|.  This value then propagates to the |internals.pattern| property for the specific DateTimeFormat-initialized object.  From here it's primarily consumed when producing a formatted string.  But there's one additional place that consumes the pattern: Intl.DateTimeFormat.prototype.resolvedOptions(), to expose the values of the Table 3 properties.  In the spec all the properties listed in Table 3 exist as separate internal properties on |internals|.  But in our implementation, these properties are latent in the value of |internals.pattern| -- they're not saved or tracked.  So to reconstitute them when computing resolved options, we have to parse |internals.pattern| to extract the components and set the appropriate properties on the returned object.  This task is performed by |resolveICUPattern|.
"""

@@ +1626,5 @@
>      internals.initializedDateTimeFormat = true;
>  }
>  
>  
> +function toBestICUPattern(locale, options) {

An overview comment for this method would have been helpful in helping me understand it.  Something like this, maybe:

"""
Computes an ICU pattern for the given locale, usable in producing formatted dates/times for the locale.  (Internally this computes an ICU skeleton containing the requested components in |options|, then it gets an ICU pattern corresponding to that skeleton in the given locale.)
"""

@@ +1627,5 @@
>  }
>  
>  
> +function toBestICUPattern(locale, options) {
> +    var skeleton = "";

Is there specific documentation for the symbols in a skeleton anywhere?  Somehow I found <http://userguide.icu-project.org/formatparse/datetime>, but that's not a complete guide to all the skeleton symbols used here.  udat.h seems to include other symbols, some of which might be applicable at the skeleton level -- UDAT_HOUR as "j" is obviously used below here, for one.  Having one document to point to would be ideal here, but if it has to be two, somehow, that's better than not having it spelled out at all.

@@ +1636,5 @@
> +    case "short":
> +        skeleton += "E";
> +        break;
> +    case "long":
> +        skeleton += "EEEE";

FormatDateTime 7.a.ix says the interpretations of narrow/short/long for all of these are implementation-defined, correct?  Not disagreeing with these choices here, or saying they're not sensible, just making sure I understand the extent they're required by the spec.

Given that for numeric fields, field width corresponds to output width, I'm a little sad that non-numeric fields don't at least roughly try to correspond to length ordering -- "E", "EEEE", "EEEEE" from shortest to longest, I mean.  :-(  This confused me for a bit, until I noticed the rough correspondence had been specified for *numeric* fields only.

Is it at all possible we might want "e" instead, here?  I have no idea whether Intl.DateTimeFormat output should use local day-of-week or not.  If we want "e" here, I assume we'd also want to use "e" in |icuPatternCharToComponent|, probably.

@@ +1638,5 @@
> +        break;
> +    case "long":
> +        skeleton += "EEEE";
> +    }
> +    switch (options.era) {

When I look at http://userguide.icu-project.org/formatparse/datetime I only see the single code "G" for era designator.  Surely I'm missing something here, right?  Because otherwise it looks like these would produce "ADADADADAD" for "narrow" and "ADADADAD" for "long", which is obviously not acceptable.

Hmm.  Further down that doc it says that "yyyyy.MMMM.dd GGG hh:mm aaa" would correspond to "01996.July.10 AD 12:08 PM".  Maybe there's some sort of API rule that unrecognized-length sequences correspond to one of the recognized-length versions?  (And we're just using the different lengths to hopefully trigger better-targeted era behavior should it ever be implemented?)  Let me know where this is listed/documented, please!

@@ +1654,5 @@
> +    case "2-digit":
> +        skeleton += "yy";
> +        break;
> +    case "numeric":
> +        skeleton += "y";

Nitpick, but "yyyy" more clearly to me suggests what this actually expands to.

...or, after more reading/grokking, I guess this is what UDAT_YEAR is.  Right?  So it should be one "y", because "yyyy" would constrain exactly to four-digit years, but we want whatever the locale-preferred format is.  Is that right?

@@ +1683,5 @@
> +        skeleton += "d";
> +        break;
> +    }
> +    var hourSkeletonChar = "j";
> +    if (options.hour12 !== undefined) {

Where's "j" documented?  It's not listed at the link mentioned previously, although both "h"/"H" are.  (Maybe you meant this to be "k"?  Or perhaps I'm totally wrong, and my head's just spinning right now from so much pattern-comparison.  :-) )

...oh, later after finding the bits in udat.h I see this is UDAT_HOUR, with "the locale's preferred hour format (12 or 24)".  That makes sense, then.

@@ +1691,5 @@
> +            hourSkeletonChar = "H";
> +    }
> +    switch (options.hour) {
> +    case "2-digit":
> +        skeleton += hourSkeletonChar + hourSkeletonChar;

It *is* documented somewhere (where exactly?) that "jj" produces a two-digit 24-hour number, right?  In combining docs from two different places, I don't see anything that tells me this work.  (I guess this is the hour-equivalent to the era/"G" issue mentioned before.)

@@ +1715,5 @@
> +        break;
> +    }
> +    switch (options.timeZoneName) {
> +    case "short":
> +        skeleton += "z";

Since this corresponds to PDT and such, I'd prefer "zzz" as well here.  Or does "zzz" not correspond to PDT?  http://userguide.icu-project.org/formatparse/datetime seems to say "z"/"zz"/"zzz" all correspond to the same thing, but I'm not entirely sure that's the case, or if only "z" does and the others correspond to slightly longer things.

...or no, reading further along, maybe this is so |resolveICUPattern| can reconstitute this value when required by resolvedOptions()?  (And maybe this was the "why" for "y" versus "yyyy", and "jj" versus "j".)  Given this re-parsing complexity, I seriously wonder whether we wouldn't be better just saving these options on the internals data, so we don't have to go to such work to recompute them later...

@@ +1918,5 @@
>  
>  function dateTimeFormatLocaleData(locale) {
> +    return {
> +        ca: intl_availableCalendars(locale),
> +        nu: getNumberingSystems(locale)

I'm confused.  Doesn't this value correspond to "The value of the [[localeData]] internal property" in 12.2.3?  And doesn't that have a number of requirements that aren't followed here, and that InitializeDateTimeFormat depends on?  And that were basically followed in the previous shim?  I'm very confused about this!

Or is this going to be filled in in subsequent patches?

...or no, after further review I see everything else here is latent in the pattern value computed by |js::intl_patternForSkeleton|.  Right?  There should be a comment here saying that the missing values are latent in the pattern computed by |toBestICUPattern|, or that this implementation doesn't need to track them, or something.

@@ +1979,5 @@
>          calendar: internals.calendar,
>          numberingSystem: internals.numberingSystem,
>          timeZone: internals.timeZone
>      };
> +    resolveICUPattern(internals.pattern, result);

This method implements DateTimeFormat.prototype.resolvedOptions(), defined in 12.3.3.  Unless I'm seriously confused,

@@ +1999,5 @@
> +    m: "minute",
> +    s: "second",
> +    z: "timeZoneName",
> +    v: "timeZoneName",
> +    V: "timeZoneName"

Is there specific documentation for the symbols that make up a pattern anywhere?  As mentioned before I found <http://userguide.icu-project.org/formatparse/datetime> and bits at the top of udat.h, but nothing comprehensive and specifically saying it was a reference to all pattern symbols.  Well, the link may be a claim to complete documentation, if I read it closely.  Is it?

Whatever the proper reference is, this needs a comment by it that points to it, either as a link or as a header path, or as something that'll let the reader see where all these characters come from.

@@ +2003,5 @@
> +    V: "timeZoneName"
> +};
> +
> +
> +function resolveICUPattern(pattern, result) {

"""
Recomputes the internal properties (12.4) of a DateTimeFormat-initialized object that are latently represented in the ICU pattern string.  Each property/value pair represented in the pattern ("hour12" is only present if "hour" is) is then defined upon the |result| object.
"""

You should also |assert(IsObject(result), "resolveICUPattern")|, come to think of it.

@@ +2008,5 @@
> +    var i = 0;
> +    while (i < pattern.length) {
> +        var c = pattern[i++];
> +        if (c === "'") {
> +            while (i < pattern.length && pattern[i] !== c)

Since we're skipping over a quoted (escaped) subcomponent til the closing single-quote, I'd rather see a comparison to "'" than to |c| again.

@@ +2022,5 @@
> +            switch (c) {
> +            // "text" cases
> +            case "G":
> +            case "E":
> +            case "a":

What's "a" doing here?  There's no "a" in |icuPatternCharToComponent|, so you're not going to do anything except uselessly set the value here, I think.

@@ +2037,5 @@
> +            // "number" cases
> +            case "y":
> +            case "d":
> +            case "h":
> +            case "H":

Does "j" need to be handled here?
Attachment #725706 - Flags: review?(jwalden+bmo) → review-
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #44)

> I think the problem you have is you aren't, and haven't, edited existing
> code much.

SpiderMonkey code, you mean.

> How about this: seeing as you're mostly not inserting small bits in existing
> code, I'll just fix up style issues post-review, pre-checkin.

That sounds good.
Updated per comment 45. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #45)

> > +        unum_setTextAttribute(nf, UNUM_CURRENCY_CODE, uCurrency, 3, &status);
> > +        if (U_FAILURE(status)) {
> > +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_INTERNAL_INTL_ERROR);
> > +            return NULL;
> > +        }
> 
> Now that I know about the UErrorCode outparam bit being documented behavior,
> it seems to me we should use it whenever it makes sense, to avoid JS error
> reporting noise.  Given that, we should remove this test and let the final
> one in the method do the work.

Actually, looking at this again I realized that the final one isn't necessary at all because the intervening function calls don't take the status parameter. I removed that one instead.
Attachment #725688 - Attachment is obsolete: true
Attachment #726518 - Flags: review+
Attachment #726518 - Flags: checkin?(jwalden+bmo)
Updated per comment 46. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #46)

> Stumbling through ICU headers a bit, I happened to discover that ICU has C++
> smart pointer classes for dealing with things like disposing of uenums and
> the like.  For example:
> 
> U_NAMESPACE_BEGIN
> 
> /**
>  * \class LocalUEnumerationPointer
>  * "Smart pointer" class, closes a UEnumeration via uenum_close().
>  * For most methods see the LocalPointerBase base class.
>  *
>  * @see LocalPointerBase
>  * @see LocalPointer
>  * @stable ICU 4.4
>  */
> U_DEFINE_LOCAL_OPEN_POINTER(LocalUEnumerationPointer, UEnumeration,
> uenum_close);
> 
> U_NAMESPACE_END
> 
> Any reason we're not using these rather than hand-rolling our own,
> probably-inferior (I'd expect their smart pointers wouldn't have the
> must-use-smartpointer-after-errorchecking-the-allocating-call thing we have)
> versions?

I avoid the ICU C++ API as much as possible because it can't be used when using a system ICU:
http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library
Attachment #725703 - Attachment is obsolete: true
Attachment #726519 - Flags: review+
Attachment #726519 - Flags: checkin?(jwalden+bmo)
Updated per comment 47. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #47)

> > +    UCalendar *cal = (UCalendar *) udat_getCalendar(df);
> 
> I don't see a need for this cast, looking at the current headers.

udat_getCalendar returns a const UCalendar *; ucal_setGregorianChange wants a UCalendar *. If I'm in trouble, so is Apple:
http://www.opensource.apple.com/source/CF/CF-744/CFDateFormatter.c
Attachment #725704 - Attachment is obsolete: true
Attachment #726520 - Flags: review+
Attachment #726520 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #51)
> > I think the problem you have is you aren't, and haven't, edited existing
> > code much.
> 
> SpiderMonkey code, you mean.

Yes, sorry for not being quite precise about that.  :-)
(In reply to Terrence Cole [:terrence] from comment #48)

> > +var count = Intl.Collator.supportedLocalesOf(locales).length;
> > +
> > +reportCompare(locales.length, count, "Number of supported locales in Intl.Collator");
> 
> This is really gross. I'd prefer to assert this property for some reasonably
> stable subset so that the test doesn't constantly break and so that it will
> run on platforms where people have built their own firefox or ICU.

My thinking was that Mozilla would want to notice if a new version of ICU drops support for a language that Mozilla cares about, so I listed the locales for those languages. I didn't list ICU languages that Mozilla currently doesn't localize into [1]. You're right that people who use a slimmed version of ICU would see the test fail and would have to change it, but I can't predict which locales they'd keep or drop. The lists of English/French/Spanish locales are probably longer than needed because I don't have information on which language/country combinations Mozilla cares about.

If you still think the list is too long, how would you construct a shorter list?

[1] http://www.mozilla.org/en-US/firefox/all/
(In reply to Norbert Lindenberg from comment #56)
> (In reply to Terrence Cole [:terrence] from comment #48)
> > 
> > This is really gross. I'd prefer to assert this property for some reasonably
> > stable subset so that the test doesn't constantly break and so that it will
> > run on platforms where people have built their own firefox or ICU.
> 
> My thinking was that Mozilla would want to notice if a new version of ICU
> drops support for a language that Mozilla cares about, so I listed the
> locales for those languages. I didn't list ICU languages that Mozilla
> currently doesn't localize into [1]. You're right that people who use a
> slimmed version of ICU would see the test fail and would have to change it,
> but I can't predict which locales they'd keep or drop. The lists of
> English/French/Spanish locales are probably longer than needed because I
> don't have information on which language/country combinations Mozilla cares
> about.

Good point! If the only goal of this test is to serve as such a canary, ideally we'd want this test to only run on TBPL and not locally. I think we can actually have a reasonable solution here: move these tests to the jsreftest harness (somewhere under js/src/tests/) and use the harness condition code in the header to ensure they only get run in the browser and not in the shell. A header like this should work:

// |reftest| skip-if(xulRuntime.shell) -- testing ICU integration with the browser

> If you still think the list is too long, how would you construct a shorter
> list?
> 
> [1] http://www.mozilla.org/en-US/firefox/all/

I was thinking of the 'C' locale since gettext (I believe) requires it to be present. I guess the same is not true of ICU. I guess it's a moot point though if we move these to reftests.
Comment on attachment 726518 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 5)

https://hg.mozilla.org/integration/mozilla-inbound/rev/8a1221cad0c0
Attachment #726518 - Flags: checkin?(jwalden+bmo)
Comment on attachment 726519 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 6)

https://hg.mozilla.org/integration/mozilla-inbound/rev/19d4df9405dd
Attachment #726519 - Flags: checkin?(jwalden+bmo)
Comment on attachment 726520 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 7)

https://hg.mozilla.org/integration/mozilla-inbound/rev/41789248e1e6
Attachment #726520 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #53)
> I avoid the ICU C++ API as much as possible because it can't be used when
> using a system ICU:
> http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-
> ICU-as-an-Operating-System-Level-Library

Hmm.  I can understand it as a general rule.  It's a bit unfortunate when it applies even to fully-header-defined C++ classes like this (inline methods, too, if there are any).  Oh well.

(In reply to Norbert Lindenberg from comment #54)
> > > +    UCalendar *cal = (UCalendar *) udat_getCalendar(df);
> > 
> > I don't see a need for this cast, looking at the current headers.
> 
> udat_getCalendar returns a const UCalendar *; ucal_setGregorianChange wants
> a UCalendar *.

Hmm, my eyes glossed over the const.  Sorry about that!

One minor substantive change I made to the patches, beyond style: I moved the second StringBuffer::resize() calls into the buffer-overflowed ifs.  It makes more sense that way, and on the off chance of a bogus error happening, it's probably not valid to treat the returned size as being well-defined, anyway.
Blocks: 852837
Updated per comment 49. Carrying r+jwalden.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #49)

> > +    var localeOfLastResort = "en-GB";
> 
> I assume this change is because of bug 851763?  Add a comment like // XXX
> TEMPORARY HACK FOR BUG 851763 if so.

No. I added a comment explaining en-GB. Previously, I used "und" because the implementation didn't have any real locale data.

> @@ +1116,5 @@
> >  
> >  
> >  function collatorSortLocaleData(locale) {
> > +    var collations = intl_availableCollations(locale);
> > +    callFunction(std_Array_unshift, collations, null);
> 
> For a start this is okay, I guess.
> 
> But unshift is inherently a linear-time operation, in the absence of a pile
> of hacks, and particular requirements that only *sometimes* make it fast. 
> (I am assuming intl_availableCollations(locale) is large; I don't remember
> for sure that it is.)  Can we follow up and make intl_availableCollations
> reserve space for this |null| if we're going to add it like this later? 
> Perhaps as an optional mode or something, as I assume most users don't need
> any extra space at the start.

intl_availableCollations(locale) is small; 1 element for most locales; 6 for Chinese. While rigging up the function to provide that number I found that it's called twice per Collator instance; localeData() is called in both ResolveLocale and InitializeCollator. I changed ResolveLocale to return the locale data to InitializeCollator, but it made no measurable difference, so I reverted the change.

> @@ +1134,1 @@
> >          sensitivity: "variant"
> 
> This is implementation-defined per "The default search sensitivity per
> locale (10.2.3)" as mentioned informatively in Appendix A, correct?

Correct.
Attachment #725705 - Attachment is obsolete: true
Attachment #727084 - Flags: review+
Attachment #727084 - Flags: checkin?(jwalden+bmo)
Update per comment 50.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #50)

I apologize for not providing adequate documentation with this patch. The interaction with ICU is impossible to understand without, and the related ICU documentation is incomplete - I filed a bug about that a long time ago. You guessed a lot of the background correctly, but were still missing critical bits.

I hope the comment after InitializeDateTimeFormat provides the necessary information.

> > +    // Steps 20-21 provided by ICU.
> 
> Hmm.  I guess all these step numbers became unsynced at some point between
> the i18n spec we were looking at when the patch adding this landed, and the
> 2013-02-28 draft with errata.  I guess once everything's stabilized I should
> go back and adjust numbering appropriately.

I can see why you'd want to use the February spec draft, since it addresses some spec problems that you found in the code reviews and adds a feature that we still want to add to this initial implementation (time zones). But there are still some open issues that I'll have to address in a later draft. Let's stick with the 1.0 spec as a reference for the time being.

> > +    case "short":
> > +        skeleton += "E";
> > +        break;
> > +    case "long":
> > +        skeleton += "EEEE";
> 
> FormatDateTime 7.a.ix says the interpretations of narrow/short/long for all
> of these are implementation-defined, correct?  Not disagreeing with these
> choices here, or saying they're not sensible, just making sure I understand
> the extent they're required by the spec.

Correct. Think of it as 9 knobs connected to the engine via rubber bands. In fact, the rubber bands for the era and timeZoneName knobs aren't required to exist at all...

> Given that for numeric fields, field width corresponds to output width, I'm
> a little sad that non-numeric fields don't at least roughly try to
> correspond to length ordering -- "E", "EEEE", "EEEEE" from shortest to
> longest, I mean.  :-(  This confused me for a bit, until I noticed the rough
> correspondence had been specified for *numeric* fields only.

The set of pattern symbols has grown organically since about 1996. If someone designed it from scratch today, it would surely look different.

> Is it at all possible we might want "e" instead, here?  I have no idea
> whether Intl.DateTimeFormat output should use local day-of-week or not.  If
> we want "e" here, I assume we'd also want to use "e" in
> |icuPatternCharToComponent|, probably.

I have no idea what "local day of week" really means, and this pattern character isn't used at all in CLDR in any of its hundreds of locales. "E" is used everywhere.

> > +    case "2-digit":
> > +        skeleton += "yy";
> > +        break;
> > +    case "numeric":
> > +        skeleton += "y";
> 
> Nitpick, but "yyyy" more clearly to me suggests what this actually expands
> to.
> 
> ...or, after more reading/grokking, I guess this is what UDAT_YEAR is. 
> Right?  So it should be one "y", because "yyyy" would constrain exactly to
> four-digit years, but we want whatever the locale-preferred format is.  Is
> that right?

"y" uses as many digits as necessary. In the Japanese imperial calendar, we're only in the year 25 of the current era, and that shouldn't turn into 0025.

> ...or no, reading further along, maybe this is so |resolveICUPattern| can
> reconstitute this value when required by resolvedOptions()?  (And maybe this
> was the "why" for "y" versus "yyyy", and "jj" versus "j".)  Given this
> re-parsing complexity, I seriously wonder whether we wouldn't be better just
> saving these options on the internals data, so we don't have to go to such
> work to recompute them later...

resolveICUPattern gets the pattern, not the skeleton, and the pattern may be somewhat different from what the skeleton requests (it all gets filtered through what UDateTimePatternGenerator can find for the locale).

> >  function dateTimeFormatLocaleData(locale) {
> > +    return {
> > +        ca: intl_availableCalendars(locale),
> > +        nu: getNumberingSystems(locale)
> 
> I'm confused.  Doesn't this value correspond to "The value of the
> [[localeData]] internal property" in 12.2.3?  And doesn't that have a number
> of requirements that aren't followed here, and that InitializeDateTimeFormat
> depends on?  And that were basically followed in the previous shim?  I'm
> very confused about this!
> 
> Or is this going to be filled in in subsequent patches?

Correct, there is an issue that UDateTimePatternGenerator with the underlying locale data is an adequate implementation of BestFitFormatMatcher, but doesn't conform to the BasicFormatMatcher spec. I've filed
https://bugzilla.mozilla.org/show_bug.cgi?id=852837

> @@ +2037,5 @@
> > +            // "number" cases
> > +            case "y":
> > +            case "d":
> > +            case "h":
> > +            case "H":
> 
> Does "j" need to be handled here?

No, "j" is only allowed in skeleton strings, not in pattern strings.
Attachment #725706 - Attachment is obsolete: true
Attachment #727085 - Flags: review?(jwalden+bmo)
Fixes a small bug and removes no-longer-needed warning suppression.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #61)

> One minor substantive change I made to the patches, beyond style: I moved
> the second StringBuffer::resize() calls into the buffer-overflowed ifs.  It
> makes more sense that way, and on the off chance of a bogus error happening,
> it's probably not valid to treat the returned size as being well-defined,
> anyway.

I had moved it above so that it would also trim the buffer to the actual size of the content. But you're right, errors need to be handled first.
Attachment #727088 - Flags: review?(jwalden+bmo)
Updated per comment 48 and comment 57. Carrying r+terrence.
Attachment #725926 - Attachment is obsolete: true
Attachment #727089 - Flags: review+
Attachment #727089 - Flags: checkin?(jwalden+bmo)
Depends on: 852912
Blocks: 853301
Attachment #727084 - Flags: checkin?(jwalden+bmo)
Attachment #727089 - Flags: checkin?(jwalden+bmo)
(In reply to Norbert Lindenberg from comment #62)
> No. I added a comment explaining en-GB. Previously, I used "und" because the
> implementation didn't have any real locale data.

Okay.  I'm a little surprised ICU wouldn't have effectively that sort of code to handle "und" internally, but oh well.

> intl_availableCollations(locale) is small; 1 element for most locales; 6 for
> Chinese. While rigging up the function to provide that number I found that
> it's called twice per Collator instance; localeData() is called in both
> ResolveLocale and InitializeCollator. I changed ResolveLocale to return the
> locale data to InitializeCollator, but it made no measurable difference, so
> I reverted the change.

Okay.  I was guessing it was a lot bigger; at these sizes it's O(1) for a large enough constant.  :-)

(In reply to Norbert Lindenberg from comment #64)
> I had moved it above so that it would also trim the buffer to the actual
> size of the content. But you're right, errors need to be handled first.

Urgh, I fail at memory.  I saw and understood the reason for the oddness on first readthrough, then forgot it when doing last once-overs before pushing.  I probably should have done a followup patch to fix the error-case.  Sorry for causing you extra hassle here.  :-(
Comment on attachment 727085 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 9)

Review of attachment 727085 [details] [diff] [review]:
-----------------------------------------------------------------

I like this.  I like this a lot.  :-D

(In reply to Norbert Lindenberg from comment #63)
> I apologize for not providing adequate documentation with this patch. The
> interaction with ICU is impossible to understand without, and the related
> ICU documentation is incomplete - I filed a bug about that a long time ago.
> You guessed a lot of the background correctly, but were still missing
> critical bits.

Eh, it happens.  As long as we reach a good endpoint, I'm not too worried how we get there.  :-)

> Let's stick with the 1.0 spec as a reference for the time being.

Yeah, certainly.

> The set of pattern symbols has grown organically since about 1996. If
> someone designed it from scratch today, it would surely look different.

Same story with every old project -- good to know the rough reason here.

> I have no idea what "local day of week" really means, and this pattern
> character isn't used at all in CLDR in any of its hundreds of locales. "E"
> is used everywhere.

Heh.  Works for me.

> "y" uses as many digits as necessary. In the Japanese imperial calendar,
> we're only in the year 25 of the current era, and that shouldn't turn into
> 0025.

Okay.

> resolveICUPattern gets the pattern, not the skeleton, and the pattern may be
> somewhat different from what the skeleton requests (it all gets filtered
> through what UDateTimePatternGenerator can find for the locale).

Ah, right!  I probably wrote that at an intermediate stream-of-consciousness state while reviewing this.  :-)

> Correct, there is an issue that UDateTimePatternGenerator with the
> underlying locale data is an adequate implementation of
> BestFitFormatMatcher, but doesn't conform to the BasicFormatMatcher spec.
> I've filed
> https://bugzilla.mozilla.org/show_bug.cgi?id=852837

Sounds good.

> No, "j" is only allowed in skeleton strings, not in pattern strings.

Also good.

::: js/src/builtin/Intl.js
@@ +1657,5 @@
> +// ICU supports specification of date and time formats in three ways:
> +//
> +// 1) A style is just one of the identifiers FULL, LONG, MEDIUM, or SHORT.
> +//    The date-time components included in each style and their representation
> +//    are defined by ICU using CLDR locale data.

Add a parenthetical expanding CLDR at least once here, please.

@@ +1665,5 @@
> +//    specifies a year with at least four digits, a full month name, and a
> +//    two-digit day. It does not specify in which order the components appear,
> +//    how they are separated, the localized strings for textual components
> +//    (such as weekday or month), whether the month is in embedded or
> +//    stand-alone form (which can differ in some Slavic languages), or the

Could you briefly sketch what embedded and standalone forms are?  The names have me thinking the former is "This patch was reviewed on March 21, 2013" while the latter might be "Date of review: March 21, 2013", but that's a bit of guesswork.  And that guess seems a little dubious, thinking of how spellings and such vary based on gender, tense, etc. in at least German and probably a bunch of other languages.  So at least a hint of what this means would be nice.
Attachment #727085 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 727088 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (cleanup)

Review of attachment 727088 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry again for mangling the resize cases here.  :-(
Attachment #727088 - Flags: review?(jwalden+bmo) → review+
No longer depends on: 852912
Updated per comment 69. Carrying r+jwalden.
Attachment #727085 - Attachment is obsolete: true
Attachment #728329 - Flags: review+
Attachment #728329 - Flags: checkin?(jwalden+bmo)
Attachment #727088 - Flags: checkin?(jwalden+bmo)
Comment on attachment 728329 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (part 9)

https://hg.mozilla.org/integration/mozilla-inbound/rev/100e46502574
Attachment #728329 - Flags: checkin?(jwalden+bmo)
Comment on attachment 727088 [details] [diff] [review]
Implement ICU dependent functions of Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat (cleanup)

https://hg.mozilla.org/integration/mozilla-inbound/rev/e513c5d3b416
Attachment #727088 - Flags: checkin?(jwalden+bmo)
No longer depends on: 811911
Whiteboard: [leave open]
Target Milestone: --- → mozilla22
Blocks: 864843
Blocks: 866301
Blocks: 866305
Mass-moving existing Intl-related bugs to the new Core :: JavaScript: Internationalization API component.

If you think this bug has been moved in error, feel free to move it back to Core :: JavaScript Engine.

[Mass change filter: core-js-intl-api-move]
Component: JavaScript Engine → JavaScript: Internationalization API
With "ac_add_options --with-system-icu" I get during libxul linking (gcc -flto build):

../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::ParseNumber(nsAString_internal&, ICUUtils::LanguageTagIterForContent&): error: undefined reference to 'unum_close'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::ParseNumber(nsAString_internal&, ICUUtils::LanguageTagIterForContent&): error: undefined reference to 'unum_open'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::ParseNumber(nsAString_internal&, ICUUtils::LanguageTagIterForContent&): error: undefined reference to 'unum_parseDouble'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::ParseNumber(nsAString_internal&, ICUUtils::LanguageTagIterForContent&): error: undefined reference to 'unum_close'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::LocalizeNumber(double, ICUUtils::LanguageTagIterForContent&, nsAString_internal&): error: undefined reference to 'unum_close'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::LocalizeNumber(double, ICUUtils::LanguageTagIterForContent&, nsAString_internal&): error: undefined reference to 'unum_open'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::LocalizeNumber(double, ICUUtils::LanguageTagIterForContent&, nsAString_internal&): error: undefined reference to 'unum_setAttribute'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::LocalizeNumber(double, ICUUtils::LanguageTagIterForContent&, nsAString_internal&): error: undefined reference to 'unum_formatDouble'
../../intl/unicharutil/util/internal/libintl_unicharutil_util_internal.a(Unified_cpp_intl_unicharutil_util_internal0.o):Unified_cpp_intl_unicharutil_util_internal0.cpp:functi
on ICUUtils::LocalizeNumber(double, ICUUtils::LanguageTagIterForContent&, nsAString_internal&): error: undefined reference to 'unum_close'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function dateTimeFormat_finalize(js::FreeOp*, JSObject*): error: undefined reference to 'udat_clo
se'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function collator_finalize(js::FreeOp*, JSObject*): error: undefined reference to 'ucol_close'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function js::intl_DateTimeFormat_availableLocales(JSContext*, unsigned int, JS::Value*): error: u
ndefined reference to 'udat_getAvailable'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function js::intl_DateTimeFormat_availableLocales(JSContext*, unsigned int, JS::Value*): error: u
ndefined reference to 'udat_countAvailable'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function js::intl_NumberFormat_availableLocales(JSContext*, unsigned int, JS::Value*): error: und
efined reference to 'unum_getAvailable'
../../dist/lib/libjs_static.a(Unified_cpp_js_src0.o):Unified_cpp_js_src0.cpp:function js::intl_NumberFormat_availableLocales(JSContext*, unsigned int, JS::Value*): error: und
efined reference to 'unum_countAvailable
... (many more)
Sorry for the noise. I've opened https://bugzilla.mozilla.org/show_bug.cgi?id=971669 for this issue.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: