These are excerpts from the documentation which I deem important for all users to remember and take into consideration:
!!! ALERT - z/OS and EBCDIC annoyance factors !!!
There are a few issues that were overlooked or mentioned only in passing in the Documentation. These issues are basically z/OS annoyance factors that must not be ignored. Here is an enumeration of the main known issues.
1. The original EBCDIC did not have the Circumflex (or Caret) ('^') character. When IBM first implemented the C compiler, they used their annoying Logical Not ('¬') character instead. This holdover from a bygone era is here to stay. The problem is that it affects not only the compiler but also the patterns that are the input to the PCRE library under z/OS. EXPLICITLY: For any pattern that requires a circumflex under any other OS, you MUST use the Logical Not! Example: /^[^A]/, which should match any line that starts with any character but 'A', would be /¬[¬A]/ in the z/OS implementation.
2. However, the circumflex confusion does not end here. Officially, in IBM-1047, X'B0' is the logical not and X'5F' is the circumflex, but in reality, as presented on the terminal, X'5F' is the logical not. In IBM-037 the inverse mapping is in effect. I assume that the presentation is a holdover from IBM-037. Sorry about the immense confusion, but it is not my doing.
3. The Tab character (X'05') causes the C compiler to fail. I do not know which character to use instead, so I replaced all Tabs with Spaces in the source code. This should not affect patterns, as we rarely deal with the Tab character as is. I believe that it is recognized as white space, as it should be, but further testing is needed.
4. The character y umlaut (officially known in Unicode as 'Latin small letter y with diaeresis'; it looks like a y with two dots above it) is mapped to X'FF' in extended ASCII (ISO 8859-1). X'FF' in EBCDIC is the control character EO (Eight Ones), not a printable character. This can cause issues when uploading ASCII files that contain that character to the mainframe.
5. When entering a pattern via the PARM in the JCL, please remember that any backslash character must be doubled. This is a requirement of the C compiler as far as I can tell. Surprisingly, this annoyance factor is not unique to z/OS but may be seen in the Unix/Linux environment as well. Example: to run pcredemo to find all digits in the input string, one must code:
    //STEP1 EXEC PGM=PCREDEMO,PARM='-g \\d 123'
and not:
    //STEP1 EXEC PGM=PCREDEMO,PARM='-g \d 123'
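The issues listed above can be checked on a workstation before anything is uploaded. The following is only a minimal sketch in Python, not part of the port. All function names are hypothetical, and it assumes Python's built-in cp037 (IBM-037) codec as the EBCDIC reference, because the standard library does not ship an IBM-1047 codec. For IBM-1047 the X'5F' and X'B0' values are swapped, as described in item 2.

```python
from typing import List

NOT_SIGN = "\u00ac"   # logical not
CIRCUMFLEX = "^"


def to_zos_pattern(pattern: str) -> str:
    """Hypothetical helper: replace every circumflex with the logical not (item 1)."""
    return pattern.replace(CIRCUMFLEX, NOT_SIGN)


def double_backslashes(pattern: str) -> str:
    """Hypothetical helper: double each backslash for use in a JCL PARM (item 5)."""
    return pattern.replace("\\", "\\\\")


def find_ff_bytes(data: bytes) -> List[int]:
    """Return the offsets of every X'FF' byte (y with diaeresis in ISO 8859-1, item 4)."""
    return [offset for offset, byte in enumerate(data) if byte == 0xFF]


def tabs_to_spaces(text: str, width: int = 4) -> str:
    """Expand tabs to spaces; the width of 4 is an arbitrary assumption (item 3)."""
    return text.expandtabs(width)


if __name__ == "__main__":
    print(to_zos_pattern("^[^A]"))            # prints ¬[¬A]
    print(double_backslashes(r"-g \d 123"))   # prints -g \\d 123
    # IBM-037 code points of the troublesome characters (item 2 and item 3)
    for ch in (CIRCUMFLEX, NOT_SIGN, "\t"):
        print(repr(ch), "X'%s'" % ch.encode("cp037").hex().upper())
```

Running find_ff_bytes over a file read in binary mode gives the positions that deserve a look before the upload; what to substitute for those bytes is left to the user.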
A few more words about the build process of this port
During the last few months I communicated with a few people who had found original and interesting ways to build the package on their respective machines. Some had built it in the z/OS Unix environment, perhaps using a straight download from the PCRE SourceForge site, or some version of my port. Some used a version of gcc as the compiler. One did not even go through the standard compile process, but compiled the assembler output of gcc on z/VM in order to get rid of what he called the LE bloat. I cannot vouch for all those methods. I do not know Unix well enough. I certainly do not know gcc, let alone the mainframe port of that stuff. My understanding of 'make' and configuration files is, at best, superficial.
However, I have successfully ported the package to the classic native z/OS environment, which is so far removed from the original environment, with a decent build process that is predictable, repeatable and would be well understood by the intended audience. The reason for my success is NOT that I chose JCL and the IBM supplied C compiler over gcc, shell scripts and Unix. On the contrary, choosing those tools would have simplified the port and build process tremendously. The reason for my success is that I followed instructions and did not try to figure out any shortcuts.
I understand why z/VM needs a different build process. I understand why using gcc as an alternative compiler and Unix as an alternative environment would be beneficial, but I do recommend following my documented footsteps in all other aspects if you want it to work correctly in an EBCDIC based environment. Please refrain from any shortcuts. If, for example, the documentation asks you to create chartables.c by running dftables.c, and then compile it and use it in the build, then there is a good reason for that and it should be done.
When I did the port, I obviously needed to develop a classic native z/OS oriented build process. There was no way to avoid it. But I did follow the rest of the implementation instructions to the letter without any shortcuts. And lo and behold, it worked.
I expect the compile scripts to be different on z/VM, as z/VM uses CMS scripts rather than JCL. It is possible to use gcc under z/OS Unix and keep the process close to the original, but in all those cases it is still required to be mindful of the target EBCDIC environment. So please be mindful when using alternative methods. You may find that gcc is not necessarily fully compatible with the code page that you are using. You may find that other interesting and wonderful aspects of a Unix/ASCII to z/OS/EBCDIC port come to bite you in unexpected places and ways. But don't be discouraged, and above all, please let me know about your experience :)
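To illustrate why the character tables have to be generated for the target code page rather than copied from an ASCII build, here is a minimal sketch in Python. It is not what dftables does internally. The function names are hypothetical, and Python's cp037 codec is assumed as a stand-in for whichever EBCDIC code page is actually in use.

```python
import string
from typing import List


def letter_codes(codec: str) -> List[int]:
    """Byte values of the letters a-z under the given single-byte codec."""
    return [ch.encode(codec)[0] for ch in string.ascii_lowercase]


def is_contiguous(codes: List[int]) -> bool:
    """True when every code is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


def letter_table(codec: str) -> bytes:
    """256-entry table holding 1 where the byte decodes to A-Z or a-z.

    A deliberately simplified stand-in for a real character-class table.
    """
    letters = set(string.ascii_letters)
    return bytes(1 if bytes([b]).decode(codec) in letters else 0
                 for b in range(256))


if __name__ == "__main__":
    for codec in ("latin-1", "cp037"):
        codes = letter_codes(codec)
        print(codec, "a-z contiguous:", is_contiguous(codes))
    # The two tables flag different byte values as letters.
    print("tables identical:", letter_table("latin-1") == letter_table("cp037"))
```

Under the ASCII-compatible codec the letters form one contiguous run, while under cp037 they fall into three separate runs, so a table built on an ASCII machine flags the wrong byte values when used in an EBCDIC environment.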