A lot of C programmers prefer to keep structures within the C source file ("module"...

pascal_cuoq · on April 14, 2020

I am sorry I do not have an answer to your question. It's a very valid one and I would be interested in any pointer to an answer.

What I can say while we are on the subject, is that I have seen C code (most often C code that started its life in the 1990s, to be fair) that instead of showing an abstract struct in the public interface, showed a different struct definition.

Please don't do this. Yes, when compiling nowadays, eventually every compilation unit ends up as object files passed to a linker that doesn't know about types, but this is undefined behavior. It makes it difficult to find undefined behavior in the rest of the code because there is a big undefined behavior right in the middle of it.

beefhash · on April 14, 2020

Wait, doesn't this mean that the BSD sockets API is inherently dependent on UB, casting different socket types to each other and sometimes only using the first few members, or am I misunderstanding you?

pascal_cuoq · on April 14, 2020

Yes and no.

The thing I am describing is when you link a compilation unit using:

  struct internal_state { int dummy; } state;

with another compilation unit that defined the same state differently:

  struct internal_state {
     int actual_meaningful_member_1;
     unsigned long actual_meaningful_member_2; } state;

As far as I know, BSD sockets do not do this. Zlib was doing this (https://github.com/pascal-cuoq/zlib-fork/blob/a52f0241f72433... ), but I have had the privilege of discussing this with Mark Adler, and I think the no-longer-necessary hack was removed from Zlib.
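For contrast, the well-defined way to hide the layout is an incomplete ("abstract") struct type in the public interface, completed only in the implementation file. This is only a minimal sketch: both files are collapsed into one for brevity, the function names (`state_new`, `state_get`, `state_free`) are made up for illustration, and the member names are borrowed from the example above.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would contain (hypothetical API) --- */
struct internal_state;                 /* incomplete type: layout is hidden */
struct internal_state *state_new(unsigned long seed);
unsigned long state_get(const struct internal_state *s);
void state_free(struct internal_state *s);

/* --- what the implementation file would contain --- */
struct internal_state {
    int actual_meaningful_member_1;
    unsigned long actual_meaningful_member_2;
};

struct internal_state *state_new(unsigned long seed)
{
    struct internal_state *s = malloc(sizeof *s);
    if (s != NULL) {
        s->actual_meaningful_member_1 = 0;
        s->actual_meaningful_member_2 = seed;
    }
    return s;
}

unsigned long state_get(const struct internal_state *s)
{
    assert(s != NULL);
    return s->actual_meaningful_member_2;
}

void state_free(struct internal_state *s)
{
    free(s);
}
```

Every compilation unit then sees either no definition or the one real definition, so there is no mismatch for the linker to paper over.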

BSD sockets probably have a different kind of UB, related to so-called “strict aliasing” rules, unless they have been carefully audited and revised since the carefree times in which they were written. I am going to have to let you read this article for details (example st1, page 5): https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.p...

loeg · on April 14, 2020

BSD sockets are weird in that the first struct's (sockaddr) size wasn't big enough, so APIs all take a nominal pointer to sockaddr but may require larger storage (sockaddr_storage) depending on the actual address.

  /*
   * Structure used by kernel to store most
   * addresses.
   */
  struct sockaddr {
          unsigned char   sa_len;         /* total length */
          sa_family_t     sa_family;      /* address family */
          char            sa_data[14];    /* actually longer; address value */
  };


  /*
   * RFC 2553: protocol-independent placeholder for socket addresses
   */
  #define _SS_MAXSIZE     128U
  #define _SS_ALIGNSIZE   (sizeof(__int64_t))
  #define _SS_PAD1SIZE    (_SS_ALIGNSIZE - sizeof(unsigned char) - \
                              sizeof(sa_family_t))
  #define _SS_PAD2SIZE    (_SS_MAXSIZE - sizeof(unsigned char) - \
                              sizeof(sa_family_t) - _SS_PAD1SIZE - _SS_ALIGNSIZE)
  
  struct sockaddr_storage {
          unsigned char   ss_len;         /* address length */
          sa_family_t     ss_family;      /* address family */
          char            __ss_pad1[_SS_PAD1SIZE];
          __int64_t       __ss_align;     /* force desired struct alignment */
          char            __ss_pad2[_SS_PAD2SIZE];
  };

wahern · on April 14, 2020

struct sockaddr_storage is insufficient as well. A Unix domain socket path can be longer than `sizeof ((struct sockaddr_un){ 0}).sun_path`. That's a major reason why all the socket APIs take a separate socklen_t argument. Most people just assume that a domain socket path is limited to a relatively short string, but it's not (except possibly Minix, IIRC).

asveikau · on April 14, 2020

> A Unix domain socket path can be longer than `sizeof ((struct sockaddr_un){ 0}).sun_path`

Hm, I didn't realize this, or if I knew this I had forgotten. It makes sense because sun_path is usually pretty small, I believe 108 chars is the most common choice, and typically file paths are allowed to be much longer.

Do you have a citation for this behavior? I can't seem to find it, though I'm not looking very hard.

I guess you are right that any syscall taking a struct sockaddr * also has a length passed to it... Some systems have sa_len inside struct sockaddr to indicate length, but IIRC Linux does not. I've often thought that length parameter was sort of redundant, because (1) some platforms have sa_len, and (2) even without that, you should be able to derive length from family. But your Unix domain socket example breaks (2). Without being able to do that, I start to imagine that the kernel would need to probe for NUL chars terminating the C string anytime it inspects a struct sockaddr_un, rather than block-copying the expected size of the structure -- that would be needlessly complicated.

wahern · on April 14, 2020

So I just reran some tests on my existing VMs and it turns out I remembered wrong. Here's the actual breakdown:

* Solaris 11.4: .sun_path: 108; bind/connect path maximum: 1023. Length seems to be same as open. Interestingly, open path maximum seems to be 1023 (judged by trying ls -l /path/to/sock), although I always thought it was unbounded on Solaris.

* MacOS 10.14: .sun_path: 104, bind/connect path maximum: 253. Length can be bigger than .sun_path but less than open path limit.

* NetBSD 8.0: .sun_path: 104, bind/connect path maximum: 253. Same as MacOS.

* FreeBSD 12.0: .sun_path: 104, bind/connect path maximum: 104.

* OpenBSD 6.6: .sun_path: 104, bind/connect path maximum: 103 (104 - 1).

* Linux 5.4: .sun_path: 108, bind/connect path maximum: 108.

* AIX 7.1: .sun_path: 1023, bind/connect path maximum: 1023. Yes, .sun_path is statically sized to 1023! And like Solaris, open path maximum seems to be 1023 (as judged by trying ls -l /path/to/socket). Thanks to Polar Home, polarhome.com, for the free AIX shell account.

Note that all the above lengths are exclusive of NUL, and the passed socklen_t argument did not include a NUL terminator.

For posterity: on all these systems you can still create sockets with long paths, you just have to chdir or use bindat/connectat if available. My test code confirmed as much. And AFAICT getsockname/getpeername will only return the .sun_path path (if anything) used to bind or connect, but that's a more complex topic (see https://github.com/wahern/cqueues/blob/e3af1f63/PORTING.md#g...)
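To make the role of the socklen_t argument concrete, here is a rough sketch of building a sockaddr_un whose length excludes the NUL terminator, as in the measurements above. The helper name `unix_addr_init` is hypothetical, and whether a path of exactly `sizeof sun_path` bytes is accepted depends on the kernel, per the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: copy `path` into `addr` and return the length to pass
 * to bind()/connect(), or 0 if the path does not fit in sun_path.
 * The returned length does not count a NUL terminator.
 * Systems with a sun_len member may also want it set; omitted here. */
socklen_t unix_addr_init(struct sockaddr_un *addr, const char *path)
{
    size_t len;

    assert(addr != NULL && path != NULL);
    len = strlen(path);
    if (len > sizeof addr->sun_path)
        return 0;

    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len);
}
```

Passing a buffer larger than struct sockaddr_un, with a correspondingly larger length, is what the systems above that accept paths longer than .sun_path would need; that variant is not shown here.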

asveikau · on April 14, 2020

Linux also has the unusual extension of: if sun_path[0] is NUL, the path is not a filesystem path and the rest of the name buffer is an ID. I don't remember if that can have embedded NULs in that ID. I believe so.
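As far as I can tell from the Linux unix(7) man page, the name is whatever bytes of sun_path are covered by the passed address length, and NUL bytes in it have no special meaning. A sketch of building such an address (the helper name `abstract_addr_init` is made up, and nothing here calls bind(), so it only shows the layout):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: build a Linux abstract-namespace address.
 * sun_path[0] is NUL, and the name occupies the following name_len bytes.
 * Returns the address length to pass to bind()/connect(), or 0 if too long. */
socklen_t abstract_addr_init(struct sockaddr_un *addr,
                             const void *name, size_t name_len)
{
    assert(addr != NULL && name != NULL);
    if (name_len > sizeof addr->sun_path - 1)
        return 0;

    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    addr->sun_path[0] = '\0';                 /* marks the abstract namespace */
    memcpy(addr->sun_path + 1, name, name_len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + name_len);
}
```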

haberman · on April 14, 2020

I'm curious what exactly makes this undefined behavior.

And in particular, what about something like this?

    struct Foo {
    #ifdef __cplusplus
      int bar() const { return bar_; }
     private:
    #endif
      int bar_;
    };

Or, taking this a step further:

    #include <stdlib.h>

    struct _Foo;
    typedef struct _Foo Foo;

    // In C "struct _Foo" is never defined.
    int Foo_bar(const Foo* foo) { return *(const int*)foo; }
    void Foo_setbar(Foo* foo, int bar) { *(int*)foo = bar; }
    Foo* Foo_new(void) { return (Foo*)malloc(sizeof(int)); }

    #ifdef __cplusplus
    struct _Foo {
      void set_bar(int bar) { bar_ = bar; }
      int bar() const { return bar_; }
     private:
      int bar_;
    };
    #endif

The above isn't ideal but it does provide encapsulation in a way that doesn't seem to violate strict aliasing (the memory location is consistently read/written as "int").

pascal_cuoq · on April 14, 2020

I think this is plenty ok. For one thing, if a struct has a member of type T, it's ok to access it through a pointer to T (and also the address of the struct is guaranteed to be identical to the address of the first member). For another, you are using dynamically allocated memory, so the only thing that matters is the type of the pointer when the access is finally made. It doesn't matter that it was a Foo* before, if what you dereference is an int*.

This is different from pretending that the address of a struct s { int a; double b; } is the address of a struct t { int a; long long c; } and accessing it through a pointer to that. If you do that, C compilers will (given the opportunity) assume that the write-through-a-pointer-to-struct-t does not modify any object of type “struct s”. This is what the example st1 in the article illustrates.

The latter is what I suspect plenty of socket implementations still do (because there are several types of sockets, represented by different struct types with a common prefix). It is possible to revise them carefully so that they do not break the rules, but I doubt this work has been done.
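The struct s / struct t case can be sketched roughly as follows. This is written in the spirit of the st1 example, not copied from the article, and the function names are invented. The code as shown is well-defined because it only ever passes distinct objects; the comment describes what a compiler is entitled to assume.

```c
#include <assert.h>

struct s { int a; double b; };
struct t { int a; long long c; };

/* struct s and struct t are incompatible types, so the compiler may assume
 * that p and q never designate the same object. It is then free to return
 * the constant 1 without reloading p->a after the store through q. */
int store_both(struct s *p, struct t *q)
{
    p->a = 1;
    q->a = 2;
    return p->a;
}

/* Well-defined use: two distinct objects, each accessed with its own type. */
int demo_distinct(void)
{
    struct s x = { 0, 0.0 };
    struct t y = { 0, 0 };
    int r = store_both(&x, &y);
    assert(y.a == 2);
    return r;
}
```

Calling `store_both(&x, (struct t *)&x)` is the problematic pattern: the result could be 1 or 2 depending on optimization, which is exactly why it is hard to reason about.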

flatfinger · on April 17, 2020

The ability to use pointers to structures with a Common Initial Sequence goes back at least to 1974--before unions were invented. When C89 was written, it would have been plausible that an implementation could uphold the Common Initial Sequence guarantees for pointers without upholding them for unions, but rather less plausible that implementations could do the reverse. Thus, the Standard explicitly specified that the guarantee is usable for unions, but saw no need to redundantly specify that it also worked for pointers.

If compilers would recognize that an operation involving a pointer/lvalue that is freshly visibly based on another is an action that at least potentially involves the latter, that would be sufficient to make code that relies upon the CIS work. Unfortunately, some compilers are willfully blind to such things.
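For reference, the union form that the Standard does spell out looks roughly like this. The struct and function names are invented for illustration; the point is only that `kind` sits in the common initial sequence and the union declaration is visible where it is inspected.

```c
#include <assert.h>
#include <stddef.h>

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Both structs begin with an int: that is their common initial sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w, h; };

union shape {
    struct circle c;
    struct rect   r;
};

/* The completed union type is visible here, so reading the common initial
 * part through either member is permitted, whichever member was stored. */
int shape_kind(const union shape *sh)
{
    assert(sh != NULL);
    return sh->c.kind;
}
```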

loeg · on April 14, 2020

Yeah, the BSD socket API is kind of terrible like that. You could consider it an unspecified union type, or use memcpy() exclusively to access it safely.
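A sketch of the memcpy() approach, assuming an IPv4 address and a made-up helper name (`sockaddr_port`): the bytes are copied into properly typed local objects, so no lvalue of the wrong struct type is ever used to read them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: return the port (host byte order) of an AF_INET
 * address stored in `addr`, or -1 if it is too short or another family. */
int sockaddr_port(const void *addr, socklen_t len)
{
    sa_family_t family;
    struct sockaddr_in in4;
    const size_t fam_off = offsetof(struct sockaddr, sa_family);

    assert(addr != NULL);
    if ((size_t)len < fam_off + sizeof family)
        return -1;

    /* Copy the family field out instead of dereferencing a struct sockaddr *. */
    memcpy(&family, (const unsigned char *)addr + fam_off, sizeof family);
    if (family != AF_INET || (size_t)len < sizeof in4)
        return -1;

    /* Copy the whole address into an object of the real type. */
    memcpy(&in4, addr, sizeof in4);
    return (int)ntohs(in4.sin_port);
}
```

Compilers generally turn these small fixed-size memcpy() calls into plain loads, so the cost of doing it this way tends to be negligible.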

emilfihlman · on April 14, 2020

Yeah, it depends on a widely agreed convention, which is nonetheless UB according to the standard.

rmind · on April 14, 2020

I assume you mean something like this:

    struct obj_impl {
        // real members
        ...
    };

In public API header:

    struct obj {
        unsigned char _private[N]; // N is the size of obj_impl
    };

I have seen such code too. It is also potentially error-prone. Certainly not advocating for it.
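If someone is stuck with this pattern, two things reduce the risk: a compile-time check that the public size keeps up with the real struct, and memcpy() instead of pointer casts, which sidesteps both alignment and aliasing. A sketch under those assumptions (the members, `OBJ_SIZE`, and the accessor names are all hypothetical, and the check uses `<=` rather than requiring an exact size match):

```c
#include <assert.h>
#include <string.h>

/* --- public header (hypothetical): the size is maintained by hand --- */
#define OBJ_SIZE 32
struct obj { unsigned char _private[OBJ_SIZE]; };

/* --- implementation file --- */
struct obj_impl { int id; double weight; };   /* hypothetical real members */

/* Break the build if OBJ_SIZE falls behind the real struct (C11). */
_Static_assert(sizeof(struct obj_impl) <= OBJ_SIZE, "OBJ_SIZE is too small");

void obj_set(struct obj *o, int id, double weight)
{
    struct obj_impl impl;

    assert(o != NULL);
    impl.id = id;
    impl.weight = weight;
    memcpy(o->_private, &impl, sizeof impl);  /* no cast, no alignment issue */
}

int obj_id(const struct obj *o)
{
    struct obj_impl impl;

    assert(o != NULL);
    memcpy(&impl, o->_private, sizeof impl);
    return impl.id;
}
```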

msebor · on April 14, 2020

The ELF visibility attributes solve part of the problem at the binary level (by hiding private library APIs from the application). The rest should be doable by structuring the project sources and headers in a suitable way.
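A small sketch of what that looks like with the GCC/Clang `visibility` attribute. This is a compiler extension rather than standard C, the macro and function names are invented, and the hiding only has an effect when the code is built into a shared library.

```c
#include <assert.h>

/* Compiler extension, not ISO C: fall back to nothing elsewhere. */
#if defined(__GNUC__)
#  define LIB_INTERNAL __attribute__((visibility("hidden")))
#  define LIB_PUBLIC   __attribute__((visibility("default")))
#else
#  define LIB_INTERNAL
#  define LIB_PUBLIC
#endif

/* Usable from any source file of the library, but not exported from the
 * resulting shared object. */
LIB_INTERNAL int lib_clamp(int v, int lo, int hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Part of the exported interface. */
LIB_PUBLIC int lib_percent(int v)
{
    return lib_clamp(v, 0, 100);
}
```

Functions needed by only one source file can simply be `static`, which is standard C and needs no attribute at all.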

loeg · on April 14, 2020

ELF is very much not part of the C standard.

defectbydesign · on April 15, 2020

There are already "Name Spaces" in C and modules are actually object files or libraries.

You can spread components in as many object files or libraries as you wish.

IMHO it's not a C related problem but a code design one.

Write libraries (with headers) only if you need to share the code but if you're not sure about that just include it for your specific program.

There is no shame in including local files containing declarations and definitions.

I think it is a misconception among C programmers that headers should be written for purely local purposes.