Re: A note on strcpy
Posted: Sat Nov 30, 2013 6:33 pm
1. I have not suggested ANYONE use undefined behavior.
2. I have not defended my code as correct, or anything else, other than it worked for N years, and still does except for Apple Mavericks.
3. I have complained about Apple changing this for no good reason. If they wanted to take an action, why not just call memmove() rather than aborting? That FIXES the problem. Why just print "Abort" with no explanation. Leading to a lot of debugging by a lot of people (just do a google search).
4. If you had programmed for a long time, the reason for the suggested usage is well-documented... If you do strcpy(s+n, s) a right-to-left copy (the only reasonable way to copy a null-terminated string) can cause problems. The only requirements to produce a potential problem are that source and destination overlap, and the length of source + source pointer >= original destination pointer. A straightforward case. If you avoid that, strcpy works just fine with overlaps.
5. strcpy() exposes many potential pitfalls. Destination shorter than source. Source non-null-terminated. Source or destination null. In short, there are plenty of ways to misuse any programming construct to cause crashes, bad results, memory leaks, buffer overwrites, or any of a host of other known problems. If you look at the list of bugs attributed to strcpy() 95% have nothing to do with overlap whatsoever. Most are source/destination length mismatches. The rest are improperly formatted strings with no terminating null which causes a copy of an undetermined number of bytes. It will eventually find a zero somewhere to stop copying, but it will likely copy way too much.
6. Changing software to break it is unacceptable. As I mentioned, MOST programs that use overlap have programmers that know what they are doing. For me, this code dates back to the early Fortran days, where there were no strings. Yet I needed to handle some sort of string to be able to parse moves to create an opening book. I wrote my own "copy()" subroutine in Fortran to do exactly that. Copy left-to-right, but without any terminating null, I just told it how many bytes to copy. And it worked. When I started Crafty, I translated utility stuff from Fortran to C by hand, so that I could get past the trivia as quickly as possible and get Crafty up and running. I simply copied what I did, but changed to follow typical C usage and switched to normal strings. And strcpy() did exactly what my old copy did, so I used it.
As I said, I do not defend the practice as good. I generally avoid such myself. But this is old, and it worked, and the change Apple made was simply pointless. They could have fixed it by using memmove() and some would have applauded. Some would not because to use memmove() you have to locate the terminating 0 which means scan down the string byte by byte BEFORE starting the copy loop or calling memmove(), and that slows the code down significantly. But Apple didn't do that, they just located the 0, then used the source, destination and length to see if the operands overlap. If so they issue "Abort" with no other explanation and terminate execution.
They had three choices:
(a) leave it alone (the best choice)
(b) fix it switching to memmove() if it detects overlap. A good (but lower performance) choice.
(c) breaking it outright (a terrible choice).
If you don't agree, that's fine. But don't start with the "I find it pretty appalling" nonsense. One should never break code intentionally. There's no way to make an "idiot-proof" strcpy() in the first place, so make it as functional and as fast as possible (which is what has been done by EVERYONE except Apple Mavericks) and move on to something else.
I posed a similar scenario with an aircraft. Did you read that? A crab angle too large is dangerous and the behavior will be undefined because it is not advised. But if you have to land in a crosswind stronger than recommended, or else stick the plane into the ground, which would you choose. A good pilot can exceed max recommended crab angle with additional speed, and a quick foot on the rudder pedal to straighten the plane up just before the wheels touch the ground. Apple's approach would be "stick it in the ground, this is undefined, we don't care whether the pilot is good or not."
No software engineering book I can think of suggests such behavior. This reminds me of the 1970's where the old Xerox UTS operating system had one error message for the command shell: "eh?" Doesn't matter whether the file doesn't exist, you had a bad wildcard, malformed directory path, bad command option, whatever. eh? They improved. Apple regressed.
For Dann.
Here's the output for the intel compiler with my sample program. First, the EXACT code:
#include <stdio.h>
void main() {
int i;
i = func(0);
printf("%d\n", i);
}
int func(int init) {
int x;
if (init)
x=1;
return (x++);
}
And here's the compiler output with normal optimization enabled:
crafty% icc -O tst.c
tst.c(2): warning #1079: return type of function "main" must be "int"
void main() {
^
I do not know what, exactly, you are doing. Data flow does not work here. But try this change which might open your eyes a bit:
where you call the function with a zero or a one value, replace that with this:
double value;
value = drand48()
i = dodo((int) value);
Does it complain or not? Value will, in this case, always be zero.
replace with
i = dodo((int) (value + 0.5)
Does it complain or not? Value will be 0 50% of time, 1 50% of time.
Compilers can't determine that and give an accurate error. Compiler doesn't even have a clue what drand48() returns, other than a floating point number.
My version of msvc does NOT find this error either. We have the latest commercial compilers they release, with automatic updates. So no idea what you are seeing, but it does not match with what I see. I get the same (no message) output for gcc 4.7.3, clang, intel C compiler (more than one version) and MSVC (I can't give version until I get to the office on monday, my windows laptop stays there as all I use it for is grading assignments.
2. I have not defended my code as correct, or anything else, other than it worked for N years, and still does except for Apple Mavericks.
3. I have complained about Apple changing this for no good reason. If they wanted to take an action, why not just call memmove() rather than aborting? That FIXES the problem. Why just print "Abort" with no explanation. Leading to a lot of debugging by a lot of people (just do a google search).
4. If you had programmed for a long time, the reason for the suggested usage is well-documented... If you do strcpy(s+n, s) a right-to-left copy (the only reasonable way to copy a null-terminated string) can cause problems. The only requirements to produce a potential problem are that source and destination overlap, and the length of source + source pointer >= original destination pointer. A straightforward case. If you avoid that, strcpy works just fine with overlaps.
5. strcpy() exposes many potential pitfalls. Destination shorter than source. Source non-null-terminated. Source or destination null. In short, there are plenty of ways to misuse any programming construct to cause crashes, bad results, memory leaks, buffer overwrites, or any of a host of other known problems. If you look at the list of bugs attributed to strcpy() 95% have nothing to do with overlap whatsoever. Most are source/destination length mismatches. The rest are improperly formatted strings with no terminating null which causes a copy of an undetermined number of bytes. It will eventually find a zero somewhere to stop copying, but it will likely copy way too much.
6. Changing software to break it is unacceptable. As I mentioned, MOST programs that use overlap have programmers that know what they are doing. For me, this code dates back to the early Fortran days, where there were no strings. Yet I needed to handle some sort of string to be able to parse moves to create an opening book. I wrote my own "copy()" subroutine in Fortran to do exactly that. Copy left-to-right, but without any terminating null, I just told it how many bytes to copy. And it worked. When I started Crafty, I translated utility stuff from Fortran to C by hand, so that I could get past the trivia as quickly as possible and get Crafty up and running. I simply copied what I did, but changed to follow typical C usage and switched to normal strings. And strcpy() did exactly what my old copy did, so I used it.
As I said, I do not defend the practice as good. I generally avoid such myself. But this is old, and it worked, and the change Apple made was simply pointless. They could have fixed it by using memmove() and some would have applauded. Some would not because to use memmove() you have to locate the terminating 0 which means scan down the string byte by byte BEFORE starting the copy loop or calling memmove(), and that slows the code down significantly. But Apple didn't do that, they just located the 0, then used the source, destination and length to see if the operands overlap. If so they issue "Abort" with no other explanation and terminate execution.
They had three choices:
(a) leave it alone (the best choice)
(b) fix it switching to memmove() if it detects overlap. A good (but lower performance) choice.
(c) breaking it outright (a terrible choice).
If you don't agree, that's fine. But don't start with the "I find it pretty appalling" nonsense. One should never break code intentionally. There's no way to make an "idiot-proof" strcpy() in the first place, so make it as functional and as fast as possible (which is what has been done by EVERYONE except Apple Mavericks) and move on to something else.
I posed a similar scenario with an aircraft. Did you read that? A crab angle too large is dangerous and the behavior will be undefined because it is not advised. But if you have to land in a crosswind stronger than recommended, or else stick the plane into the ground, which would you choose. A good pilot can exceed max recommended crab angle with additional speed, and a quick foot on the rudder pedal to straighten the plane up just before the wheels touch the ground. Apple's approach would be "stick it in the ground, this is undefined, we don't care whether the pilot is good or not."
No software engineering book I can think of suggests such behavior. This reminds me of the 1970's where the old Xerox UTS operating system had one error message for the command shell: "eh?" Doesn't matter whether the file doesn't exist, you had a bad wildcard, malformed directory path, bad command option, whatever. eh? They improved. Apple regressed.
For Dann.
Here's the output for the intel compiler with my sample program. First, the EXACT code:
#include <stdio.h>
void main() {
int i;
i = func(0);
printf("%d\n", i);
}
int func(int init) {
int x;
if (init)
x=1;
return (x++);
}
And here's the compiler output with normal optimization enabled:
crafty% icc -O tst.c
tst.c(2): warning #1079: return type of function "main" must be "int"
void main() {
^
I do not know what, exactly, you are doing. Data flow does not work here. But try this change which might open your eyes a bit:
where you call the function with a zero or a one value, replace that with this:
double value;
value = drand48()
i = dodo((int) value);
Does it complain or not? Value will, in this case, always be zero.
replace with
i = dodo((int) (value + 0.5)
Does it complain or not? Value will be 0 50% of time, 1 50% of time.
Compilers can't determine that and give an accurate error. Compiler doesn't even have a clue what drand48() returns, other than a floating point number.
My version of msvc does NOT find this error either. We have the latest commercial compilers they release, with automatic updates. So no idea what you are seeing, but it does not match with what I see. I get the same (no message) output for gcc 4.7.3, clang, intel C compiler (more than one version) and MSVC (I can't give version until I get to the office on monday, my windows laptop stays there as all I use it for is grading assignments.