
C question

Name: Anonymous 2017-04-14 17:09

is a byte with all zeros equivalent to the integer value (signed or unsigned) zero, in terms of portability?

Such that, is memset(&i, 0, sizeof(int)); equivalent to i = 0;?

Name: Anonymous 2017-04-14 19:19

Yes. memset itself is usually implemented as a loop over 32-bit or 64-bit integers, depending on the machine register size.

Name: Anonymous 2017-04-14 20:28

>>2
usually implemented

Name: Anonymous 2017-04-14 21:27

>>3
Did you have something to add or you just enjoy quoting random phrases?

Name: Anonymous 2017-04-14 21:50

"Usually" is not good enough to be considered portable.
Is int v[..]; memset(v, 0, sizeof(v)); a portable equivalent to int v[..]={0};?

Name: Anonymous 2017-04-14 21:57

>>5
Yes
void *p;
memset(p, 0, sizeof(void *));
is not however equivalent to p = 0;

Name: Anonymous 2017-04-14 22:05

>>6
No because that is dangerous to do.

What guarantees that all-bytes-zero equals the integer value zero, e.g. that 0000h == (int)0? How can you say that?

Name: Anonymous 2017-04-15 0:00

unsigned char uses ``a pure binary notation''[1], which is defined like you'd expect: every bit represents a unique power of two between 1 and 2^(CHAR_BIT-1), and the value is the sum of the powers of two whose corresponding bit is set to 1. From this it follows that an unsigned char 0 consists of CHAR_BIT zero bits. All objects other than bit-fields have an abstract object representation consisting of a contiguous sequence of one or more bytes[2]. C distinguishes between object representations and values; I don't want to dive into the irrelevant details of ORs too much, but the thing to keep in mind is that not every OR must correspond to a value and a single value may have multiple ORs.

memset(dest, val, len), by definition, sets each of the bytes in the pointed-to part of an object's OR to (unsigned char)val. Therefore, your question is equivalent to ``Is all-zeroes a valid OR for 0 for all integer types?''. And the answer is no. Integer types are allowed to have padding bits, which allow for so-called trap representations: representations that do not correspond to any value. You could use these to e.g. signal signed overflow or store parity bits.[3] Under a scheme where the value 0 requires certain checksum bits to be set, the memset trick doesn't work.

However, C implementations with padded integers are extremely rare. I know one architecture where padded integers make sense (a 48-bit Burroughs mainframe that used 40 bits for integer operations and only used the remaining 8 bits for floating point arithmetic), and I doubt it ever had a compliant C compiler in the first place. Most of the time, it's more effective to just use all bits for a larger integer range.

With that in mind, we could restrict ourselves to machines without padding bits and ask: Is all-zeroes a valid OR for 0 for all integer types if the integer types don't use padding bits? It turns out that the answer is yes in that case! Unsigned integer types may only have value and padding bits and their value bits must use a pure binary representation similar to the one unsigned char uses.[4] Therefore they may not have trap representations and 0 is represented as all-zeroes. The only difference between unsigned and signed types is that signed types have a bit which is repurposed as a sign bit. Signed types may have trap representations even in the absence of padding[5], but the value of a signed integer is defined as the value induced by the value bits as if you were dealing with an unsigned integer, fed into one of three operations[6] if the sign bit is one. Therefore, an all-zeroes signed integer without padding bits represents the value 0.

So if you restrict yourself to implementations without integer padding bits, int i; memset(&i, 0, sizeof(i)); is in fact equivalent to i = 0;.



[1] Footnote 40; all references are according to ISO/IEC 9899:1999. I don't know if the numbering changed later and I don't care either because C11 is badly supported crap.
[2] §3.6 defines a byte as ``addressable unit of data storage large enough to hold any member of the basic character set of the execution environment'', which need not correspond to a ``real byte'' in hardware: since CHAR_BIT must be at least 8, an implementation on a 6-bit machine could provide an abstract ``C byte'' (the char) that consists of two ``real bytes'' and set CHAR_BIT to 12. unsigned char would then work like an unsigned 12-bit number.
[3] Footnote 44.
[4] §6.2.6.2.1
[5] Negative zero in a sign-magnitude system is explicitly mentioned in §6.2.6.2.2 as an example of this.
[6] Corresponding to sign-magnitude, two's complement and one's complement, §6.2.6.2.2.

Name: Anonymous 2017-04-15 0:40

>>8
Perfect answer, thank you

Name: Anonymous 2017-04-15 9:14

>>5
"Usually" is not good enough to be considered portable.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time. --Linus Torvalds

Name: Anonymous 2017-04-16 15:21

>>1 why not look at the source code
https://github.com/lattera/glibc/blob/master/sysdeps/i386/memset.c#L30
memset (void *dstpp, int c, size_t len)
{
  int d0;
  unsigned long int dstp = (unsigned long int) dstpp;

  /* This explicit register allocation
     improves code very much indeed. */
  register op_t x asm ("ax");

  x = (unsigned char) c;

  /* Clear the direction flag, so filling will move forward. */
  asm volatile ("cld");

  /* This threshold value is optimal. */
  if (len >= 12)
    {
      /* Fill X with four copies of the char we want to fill with. */
      x |= (x << 8);
      x |= (x << 16);

      /* Adjust LEN for the bytes handled in the first loop. */
      len -= (-dstp) % OPSIZ;

      /* There are at least some bytes to set.
         No need to test for LEN == 0 in this alignment loop. */

      /* Fill bytes until DSTP is aligned on a longword boundary. */
      asm volatile ("rep\n"
                    "stosb" /* %0, %2, %3 */ :
                    "=D" (dstp), "=c" (d0) :
                    "0" (dstp), "1" ((-dstp) % OPSIZ), "a" (x) :
                    "memory");

      /* Fill longwords. */
      asm volatile ("rep\n"
                    "stosl" /* %0, %2, %3 */ :
                    "=D" (dstp), "=c" (d0) :
                    "0" (dstp), "1" (len / OPSIZ), "a" (x) :
                    "memory");
      len %= OPSIZ;
    }

  /* Write the last few bytes. */
  asm volatile ("rep\n"
                "stosb" /* %0, %2, %3 */ :
                "=D" (dstp), "=c" (d0) :
                "0" (dstp), "1" (len), "a" (x) :
                "memory");

  return dstpp;
}
libc_hidden_builtin_def (memset)

Name: Anonymous 2017-04-16 23:01

>>11
Because there are several implementations,
and then several libc.

But my question is answered anyway.

Name: Anonymous 2017-04-17 21:06

>>10
So... You're saying he should test it in every implementation to see if it is portable?

Name: Anonymous 2017-04-18 12:25

>>13
No, I'm saying that memset is much more complex than setting ints to 0, even though this reduced case happens to be the same. Using memset (with its different implementations) as some magical portability layer actually creates more differences than just using i = 0. There would be performance differences, rare bugs (you can't completely rely on C code larger than 10 lines) and corner cases where memset does something unexpected or does it much slower.

Name: Anonymous 2017-04-18 13:38

>>14
If it was the same thing, there wouldn't be a need for the function in the first place.

Name: Anonymous 2017-04-18 15:17

>>14
I think >>1-san didn't want to use memset to replace the cleaner and portable i = 0, but to quickly initialize structures and arrays.

Name: Anonymous 2017-04-18 15:23

>>16
For that {0} exists and is far safer and more portable.

Name: Anonymous 2017-04-18 15:45

>>17
That only works for auto and static variables.

Name: Anonymous 2017-04-18 17:32

>>13
Yes. Language standard and implementation are different beasts. Moreover, C's standard explicitly leaves many decisions to the implementation.

Name: >>8-aho 2017-04-23 21:07

>>8,9
For any integer type the object representation where all the bits are zero shall be a representation of the value zero in that type.
§6.2.6.2.5
Please excuse me, I'll go kill myself now.

Name: Anonymous 2017-04-23 23:02

If C is so poorly designed, why hasn't someone come up with a better replacement?

Name: Anonymous 2017-04-24 1:19

>>22
Ada

Name: Anonymous 2017-04-25 4:40

>>22
C is the worse replacement for other languages. Universities were ``forced to'' replace better languages with C in the late 80s and early 90s.

The C takeover came a lot more recently than you might think. They make you think everyone used C in the 70s and that's why we're stuck with it, but that's not true at all. C was a toy project in the 70s.

https://www.bell-labs.com/usr/dmr/www/primevalC.html

Does that look like C was the important and popular language you may have been taught it was?

Name: Anonymous 2017-04-25 7:30

>>24
C is the worse replacement for other languages. Universities were ``forced to'' replace better languages with C in the late 80s and early 90s.
C was replaced with the likes of Python and Java in universities to create hordes of code monkeys for companies.
It has NOTHING to do with the importance of the language.

Name: Anonymous 2017-04-25 18:35

>>25
You're acting like C was a huge part of the university's curriculum. Some universities taught C for less than five years before switching to Java. They switched from Pascal to C in the 90s. They taught C because it would produce code monkey Windows programmers.

Professors were forced to switch to C for the exact same reason they later had to teach Java or Python. Do you think these professors who were all about strong typing and software reliability wanted to switch from Ada or Pascal to C?

Name: Anonymous 2017-04-25 21:03

>>24
How is pascal a better alternative to C?

Name: Anonymous 2017-04-25 22:45

>>27
They were already teaching Pascal (or other languages) and moved to C in the late 80s and early 90s, so a better question is "How is C a better alternative to Pascal?"

If I was going to replace Pascal with something, it would be Ada, not C. I know why a lot of people use C and it is a good language for some embedded systems where safety isn't important, but it's not something I would expect universities to teach or promote as a general purpose language. In a class about kernels and operating systems, C is acceptable in my opinion (but not good), but not as the main or intro language.

That's what's so strange about this. They brought in include files, null-terminated strings, a bad preprocessor, arrays decaying to pointers, no bounds checking, switch with fallthrough, bad declaration syntax, in a new (for teaching at their university) language, and not even as a systems language, but the main general purpose language.

Name: Anonymous 2017-04-26 0:11

>>24
What were some of these ``better languages" that were replaced by C?

>>28
Pascal is a teaching language, that wasn't standardized or given a proper set of features (like arrays of arbitrary size) until it was too late. Ada is basically a forced meme by the DOD, and sees very little use for application development. Microsoft, Borland, GNU, Watcom, and Digital Mars all make C compilers, but only ONE of them makes an Ada compiler, and that was an afterthought if anything. And C was both a systems and applications language in the 70s or 80s, it was in fact TOO HIGH LEVEL to be used on 8-bit microcomputers. It only fell behind as an applications language once OOP came onto the scene.

Name: Anonymous 2017-04-26 1:42

>>29
And C was both a systems and applications language in the 70s or 80s
Yes, but not as early or as important as you think it is. It's not like Java, which came out in 1995 and was a huge thing for web applets in the 90s. C didn't become popular until the Sun workstation came out and became more popular when Microsoft picked C for Windows.
https://en.wikipedia.org/wiki/Sun-1

Where are all of these systems and applications written in ``C without structs'' or ``C without compound declarators''?

The earlier compiler does not know about structures at all: the string "struct" does not appear anywhere. The second tape has a compiler that does implement structures in a way that begins to approach their current meaning. Their declaration syntax seems to use () instead of {}, but . and -> for specifying members of a structure itself and members of a pointed-to structure are both there.

Neither compiler yet handled the general declaration syntax of today or even K&R I, with its compound declarators like the one in int **ipp; . The compilers have not yet evolved the notion of compounding of type constructors ("array of pointers to functions", for example). These would appear, though, by 5th or 6th edition Unix (say 1975), as described (in Postscript) in the C manual a couple of years after these versions.

C was an unfinished toy project in the 70s.
https://www.bell-labs.com/usr/dmr/www/chist.html

it was in fact TOO HIGH LEVEL to be used on 8-bit microcomputers.
It was too bloated, not too high level. BASIC is higher level, but it's smaller (and usually interpreted).

Name: Anonymous 2017-04-26 21:49

>>30
Where are all of these systems and applications written in ``C without structs'' or ``C without compound declarators''?
Basically the entirety of Research Unix, though it did have structs since 1973. You seem to be conflating ``Primeval C" with K&R C, even though they're two different dialects (and the former was only in use during the development of the first self-hosting C compiler).

BASIC is higher level
Not really, it doesn't support local variables, named functions, or structs. It's basically assembly language for a virtual machine. Calling it higher level than C is like saying JVM bytecode is higher level than C just because it incorporates the notion of "classes".

Name: Anonymous 2017-04-27 10:52

>>28
Pascal is verbose and doesn't even have pointers.

Name: Anonymous 2017-04-27 11:43

>>32
pointers are weird though, they're like variables that don't actually store anything

Name: Anonymous 2017-04-27 11:47

>>33
Pointers are unsigned integer variables which hold the address of something else.
int var = 1;
int *pointer = &var; /* contains the address of (&) var */

Name: Anonymous 2017-04-27 14:42

>>34
C allows pointers to both local variables and heap space to be declared the same, with no way for code to tell which is which. That brought us "near pointers" and "far pointers" that you just wouldn't need in a language like Pascal. Why? Because Pascal was really popular when the 8086 was being designed, so the hardware targeted the four independent memory spaces that Pascal has (code, stack, heap, globals/runtime). That was very painful for C, requiring bizarre extensions to the standard to handle. Even something like the 6502 compilers for C said basically "you can't write recursive functions, because all local variables are actually static."

The Burroughs B-series, for example, had typed memory. You could store a float in one location, an integer in another location, and issue an "add" instruction, and the CPU would upcast the int to a float and add them together. There was only one add instruction, not one add-floats, add-integers, etc. This is something that kept C from ever being ported to that architecture.

The VeriFone credit card terminal software can't support function pointers. There's just no instruction in the CPU to branch to a location that's not hard-coded into the instruction.

The Sigma-9 had two optional compute units: the scientific unit and the business unit. The scientific unit is what we'd call an FPU these days. The business unit did BCD math, converted decimal numbers to printable strings (the COBOL edit display instruction which was COBOL's version of printf formatting, for example) and block moves. C has no support for decimal math, so you would get much faster COBOL programs on that hardware than C programs.

The Dolphin had microcode specifically to support Smalltalk, which was an actual OO language, so again you couldn't put C on it because you can't have pointers into the middle of an object.

All these machines died, not because they were bad machines, but because they were really terrible at supporting C. They were too specialized. But that doesn't make C inherently a good system language. It's only a good system language today, because people wrote so much C code that hardware manufacturers didn't feel the need to keep supporting the other languages in hardware.

Nowadays, you have GPUs. Up until CUDA, you couldn't write C code for GPUs, until eventually hardware manufacturers evolved GPUs to the point where they could support something very much like C. It's not that C is good for GPU programming. It's that hardware manufacturers catered to the C community and eventually threw enough hardware into the mix to make C-on-GPUs a reasonable approach. But the same thing has been happening between languages and hardware since before C was invented. Look at the restrictions on stuff like Fortran IV array indexing, and realize that's because that was the form of the expression that the programmer could look at and understand exactly which addressing mode the indexing operation would use.

ADDITION: Indeed, there's one aspect in which C falls far short compared to other system programming languages like Ada and Forth, and that's in support of dynamically-generated code. The whole C compilation and memory model is a Harvard architecture, where the user has no direct control over the program counter's contents, can't generate or even load code on the fly, and in theory has code separate from data in every way. When this assumption is violated, you get one of the most common security flaws in modern systems.

https://www.reddit.com/r/programming/comments/rcnkk/what_does_it_take_for_a_language_to_be_faster/c44thes/
