pokutuna.com

pokutuna

Web Developer / Software Engineer
Hyogo, Japan

Contributions

  • langchain-ai/langchainjs

    Replacement Character(�) appears in multibyte text output from Google VertexAI

    Checked other resources: I added a very descriptive title to this issue. I searched the LangChain.js documentation with the integrated search. I used the GitHub search to find a similar question and didn't find it. I am sure that this is a bug in LangChain.js rather than my code. The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

    Example Code: make the model output long texts containing multibyte characters as a stream.

        import { VertexAI } from "@langchain/google-vertexai";

        // Set your project ID and pass the credentials according to the doc.
        // https://js.langchain.com/docs/integrations/llms/google_vertex_ai
        const project = "YOUR_PROJECT_ID";
        const langchainModel = new VertexAI({
          model: "gemini-1.5-pro-preview-0409",
          location: "us-central1",
          authOptions: { projectId: project },
        });

        // EN: List as many Japanese proverbs as possible.
        const prompt = "日本のことわざをできるだけたくさん挙げて";

        for await (const chunk of await langchainModel.stream(prompt)) {
          process.stdout.write(chunk);
        }

    Error Message and Stack Trace (if applicable): (No errors or stack traces occur)

    Output Example: Includes Replacement Characters (�)

        ## ������������:知恵の宝庫
        日本のことわざは、長い歴史の中で培われた知恵や教訓が詰まった、短い言葉の宝庫で������いくつかご紹介しますね。
        **人生・教訓**
        * **井の中の蛙大海を知らず** (I no naka no kawazu taikai wo shirazu): 狭い世界しか知らない者のたとえ。
        * **石の上にも三年** (Ishi no ue ni mo san nen): ������強く努力すれば成功する。
        * **案ずるより産むが易し** (Anzuru yori umu ga yasushi): 心配するよりも行動した方が良い。
        * **転�������������** (Korobanu saki no tsue): 前もって準備をすることの大切さ。
        * **失敗は成功のもと** (Shippai wa seikou no moto): 失敗から学ぶことで成功�������る。
        **人���関係**
        * **類は友を呼ぶ** (Rui wa tomo wo yobu): 似た者同士が仲良くなる。
        * **情けは人の為ならず** (Nasake wa hito no tame narazu): 人に親切にすることは巡り巡��て自分に良いことが返ってくる。
        * **人の振り見て我が振り直せ** (Hito no furi mite waga furi naose): 他人の行動を見て自分の行動を反省する。
        * **出る杭は打たれる** (Deru kui wa utareru): 他人より目���つ��叩かれる。
        * **三人寄れば文殊の知恵** (Sannin yoreba monju no chie): みんなで知恵を出し合えば良い考えが浮かぶ。
        ...

    Description: This issue occurs when requesting outputs from the model in languages that include multibyte characters, such as Japanese, Chinese, Russian, Greek, and various other languages, or in texts that include emojis 😎. It is caused by the handling of streams containing multibyte characters and the behavior of the buffer.toString() method in Node.

    langchainjs/libs/langchain-google-gauth/src/auth.ts, line 15 in a1ed4fe:

        data.on("data", (data) => this.appendBuffer(data.toString()));

    When receiving a stream containing multibyte characters, the point at which a chunk callback (readable.on('data', ...)) is executed may fall in the middle of a character's byte sequence. For instance, the emoji "👋" is represented in UTF-8 as 0xF0 0x9F 0x91 0x8B, and the callback might be executed after only 0xF0 0x9F has been received. buffer.toString() attempts to decode byte sequences assuming UTF-8 encoding; if the bytes are invalid, it does not throw an error but silently outputs a REPLACEMENT CHARACTER (�). https://nodejs.org/api/buffer.html#buffers-and-character-encodings To resolve this, use TextDecoder with the stream option. https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode

    Related Issues: The issue has been reported below, but it persists even in the latest version. #4113 The same issue occurred when using Google Cloud's client libraries instead of LangChain, but it has been fixed there: googleapis/nodejs-vertexai#78 googleapis/nodejs-vertexai#86

    I will send a Pull Request later, but I am not familiar with this codebase, and there are many google-related packages under libs/ which I have not grasped well enough. Any advice would be appreciated.

    System Info: macOS, node v20.12.2. langchain versions:

        $ npm list --depth=1 | grep langchain
        ├─┬ @langchain/community@0.0.54
        │ ├── @langchain/core@0.1.61
        │ ├── @langchain/openai@0.0.28
        ├─┬ @langchain/google-vertexai@0.0.12
        │ ├── @langchain/core@0.1.61 deduped
        │ └── @langchain/google-gauth@0.0.12
        ├─┬ langchain@0.1.36
        │ ├── @langchain/community@0.0.54 deduped
        │ ├── @langchain/core@0.1.61 deduped
        │ ├── @langchain/openai@0.0.28 deduped
        │ ├── @langchain/textsplitters@0.0.0
        │ ├── langchainhub@0.0.8
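
    For illustration, here is a minimal, self-contained sketch (not the library's actual code) of the failure mode and of the TextDecoder-based fix suggested above; it assumes a Node runtime where Buffer and TextDecoder are available as globals:

    ```typescript
    // "👋" is 0xF0 0x9F 0x91 0x8B in UTF-8; split it across two chunks the way
    // a network stream might.
    const bytes = Buffer.from("👋", "utf8");
    const first = bytes.subarray(0, 2);  // chunk boundary in the middle of the character
    const second = bytes.subarray(2);

    // Decoding each chunk independently: invalid sequences become U+FFFD (�).
    console.log(first.toString("utf8") + second.toString("utf8")); // prints replacement characters

    // TextDecoder with { stream: true } holds incomplete bytes until the next call.
    const decoder = new TextDecoder("utf-8");
    const text =
      decoder.decode(first, { stream: true }) + decoder.decode(second, { stream: true });
    console.log(text); // prints 👋
    ```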

    pokutuna opened on 2024-05-04
  • langchain-ai/langchainjs

    google[patch]: fix: handling multibyte characters in stream for google-vertexai-web

    Fixes #6501. I have fixed this issue similarly to #5286. The approach is the same, but we need to use components that work in the browser environment instead of Node. I previously fixed the same issue for @langchain/google-vertexai in #5285. Although I don't use @langchain/google-vertexai-web myself, I've also fixed this package as it was requested in the issue.

    pokutuna opened on 2024-08-12
  • langchain-ai/langchainjs

    Replacement Character(�) appears in multibyte text output from Google VertexAI Web

    Checked other resources I added a very descriptive title to this issue. I searched the LangChain.js documentation with the integrated search. I used the GitHub search to find a similar question and didn't find it. I am sure that this is a bug in LangChain.js rather than my code. The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package). Example Code Make the model output long texts containing multibyte characters as a stream. import { VertexAI } from "@langchain/google-vertexai-web"; const langchainModel = new VertexAI({ model: "gemini-1.5-pro-001", location: "us-central1", }); // EN: List as many Japanese proverbs as possible. const prompt = "日本のことわざをできるだけたくさん挙げて"; const stream = await langchainModel.stream(prompt); const reader = stream.getReader(); let buf = ""; while (true) { const { done, value } = await reader.read(); if (done) break; buf += value; } console.log(buf); This code can be executed by creating a service account key from the Google Cloud Console and running it with the following command: $ GOOGLE_WEB_CREDENTIALS=$(cat ./key.json) npx tsx sample.ts Error Message and Stack Trace (if applicable) (No errors or stack traces occur) Output Example: Includes Replacement Characters (�) ## ���本の諺 (ことわざ) - できるだけたくさん! **一般的な知������������** * 石の上にも三年 (いしのうえにもさんねん) - Perseverance will pay off. * 七転び八起き (ななころびやおき) - Fall seven times, stand up eight. * 継続は力なり (けいぞくはちからなり) - Persistence is power. * 急がば回れ (い��がばまわれ) - Haste makes waste. * 井の中の蛙大海を知らず (いのなかのかわずたいかいをしらず) - A frog in a well knows nothing of the great ocean. * 良���は���に苦し (りょうやくはくちにくい) - Good medicine tastes bitter. * 猿も木から落ちる (さるもきからおちる) - Even monkeys fall from trees. * 転石苔を生ぜず (てんせきこけをしょうぜず) - A rolling stone gathers no moss. * 覆水盆に返らず (ふくすいぼんにかえらず) - Spilled water will not return to the tray. * 後生の祭り (ごしょうの�����り) - Too late for regrets. * 習うより慣れろ (ならうよりなれろ) - Experience is the best teacher. * 鉄は熱いうちに打て (てつはあついうちにうて) - Strike while the iron is hot. ... Description This is the same issue as #5285. While #5285 is about @langchain/google-vertexai, this issue also occurs in @langchain/google-vertexai-web. The problem occurs when a stream chunk is cut in the middle of a multibyte character. For detailed reasons, please refer to #5285. I will submit a Pull Request with the fix shortly. System Info macOS node v20.12.2 langchain versions $ npm list --depth=1 | grep langchain ├─┬ @langchain/google-vertexai-web@0.0.25 │ ├── @langchain/core@0.2.23 │ └── @langchain/google-webauth@0.0.25 ├─┬ @langchain/google-vertexai@0.0.25 │ ├── @langchain/core@0.2.23 deduped │ └── @langchain/google-gauth@0.0.25 ├─┬ langchain@0.2.15 │ ├── UNMET OPTIONAL DEPENDENCY @langchain/anthropic@* │ ├── UNMET OPTIONAL DEPENDENCY @langchain/aws@* │ ├── UNMET OPTIONAL DEPENDENCY @langchain/cohere@* │ ├── UNMET OPTIONAL DEPENDENCY @langchain/community@* │ ├── @langchain/core@0.2.23 deduped │ ├── UNMET OPTIONAL DEPENDENCY @langchain/google-genai@* │ ├── @langchain/google-vertexai@0.0.25 deduped │ ├── UNMET OPTIONAL DEPENDENCY @langchain/groq@* │ ├── UNMET OPTIONAL DEPENDENCY @langchain/mistralai@* │ ├── UNMET OPTIONAL DEPENDENCY @langchain/ollama@* │ ├── @langchain/openai@0.2.6 │ ├── @langchain/textsplitters@0.0.3
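
    For the browser-oriented package, the same idea can be expressed entirely with web-standard APIs. The sketch below is illustrative only (the function name readAsText is made up and this is not the package's internals); it shows decoding a byte stream through TextDecoderStream so partial characters are carried across chunk boundaries:

    ```typescript
    // Decode a stream of UTF-8 bytes without producing replacement characters,
    // using only APIs available in browsers (and recent Node versions).
    async function readAsText(byteStream: ReadableStream<Uint8Array>): Promise<string> {
      // TextDecoderStream buffers incomplete multibyte sequences between chunks.
      const textStream = byteStream.pipeThrough(new TextDecoderStream("utf-8"));
      const reader = textStream.getReader();
      let out = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        out += value;
      }
      return out;
    }
    ```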

    pokutuna opened on 2024-08-12
  • kubeflow/pipelines

    [sdk] Bug when trying to iterate a list of dictionaries with ParallelFor

    Environment KFP SDK version: kfp==2.0.0b16 All dependencies version: kfp==2.0.0b16 kfp-pipeline-spec==0.2.2 kfp-server-api==2.0.0b1 Steps to reproduce When running the code snippet below the following error is raised: kfp.components.types.type_utils.InconsistentTypeException: Incompatible argument passed to the input 'val_a' of component 'add': Argument type 'STRING' is incompatible with the input type 'NUMBER_INTEGER' @dsl.component() def add(val_a: int, val_b: int) -> int: return val_a + val_b @dsl.pipeline() def model_training_pipeline() -> None: with dsl.ParallelFor( items=[{"a": 1, "b": 10}, {"a": 2, "b": 20}], parallelism=1 ) as item: task = add(val_a=item.a, val_b=item.b) compiler.Compiler().compile( pipeline_func=model_training_pipeline, package_path="/app/pipeline.yaml" ) Expected result According to the ParallelFor documentation, the code sample above should compile without errors. The add component should receive the values of the dictionaries as integer arguments. Materials and Reference The code snippet below is a modification of the code snippet above, changing the add component to accept string arguments. @dsl.component() def add(val_a: str, val_b: str) -> int: return int(val_a) + int(val_b) @dsl.pipeline() def model_training_pipeline() -> None: with dsl.ParallelFor( items=[{"a": 1, "b": 10}, {"a": 2, "b": 20}], parallelism=1 ) as item: task = add(val_a=item.a, val_b=item.b) compiler.Compiler().compile( pipeline_func=model_training_pipeline, package_path="/app/pipeline.yaml" ) The pipeline compiles without errors with this modification, however it fails to run in Google Vertex Pipelines. The add component doesn't even run and throws the following error in the UI: Failed to evaluate the expression with error: INVALID_ARGUMENT: Failed to parseJson from string.; Failed to evaluate the parameter_expression_selector. As the component's code is not even executed, it seems that the problem occurs when executing the DAG. Here is the content of the pipeline.yaml that was compiled. # PIPELINE DEFINITION # Name: model-training-pipeline components: comp-add: executorLabel: exec-add inputDefinitions: parameters: val_a: parameterType: STRING val_b: parameterType: STRING outputDefinitions: parameters: Output: parameterType: NUMBER_INTEGER comp-for-loop-2: dag: tasks: add: cachingOptions: enableCache: true componentRef: name: comp-add inputs: parameters: val_a: componentInputParameter: pipelinechannel--loop-item-param-1 parameterExpressionSelector: parseJson(string_value)["a"] val_b: componentInputParameter: pipelinechannel--loop-item-param-1 parameterExpressionSelector: parseJson(string_value)["b"] taskInfo: name: add inputDefinitions: parameters: pipelinechannel--loop-item-param-1: parameterType: STRUCT deploymentSpec: executors: exec-add: container: args: - --executor_input - '{{$}}' - --function_to_execute - add command: - sh - -c - "\nif ! 
[ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\ \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\ \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.0.0-beta.16'\ \ && \"$0\" \"$@\"\n" - sh - -ec - 'program_path=$(mktemp -d) printf "%s" "$0" > "$program_path/ephemeral_component.py" python3 -m kfp.components.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" ' - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\ \ *\n\ndef add(val_a: str, val_b: str) -> int:\n return int(val_a) +\ \ int(val_b)\n\n" image: python:3.7 pipelineInfo: name: model-training-pipeline root: dag: tasks: for-loop-2: componentRef: name: comp-for-loop-2 iteratorPolicy: parallelismLimit: 1 parameterIterator: itemInput: pipelinechannel--loop-item-param-1 items: raw: '[{"a": 1, "b": 10}, {"a": 2, "b": 20}]' taskInfo: name: for-loop-2 schemaVersion: 2.1.0 sdkVersion: kfp-2.0.0-beta.16 Impacted by this bug? Give it a 👍.

    lucasvbalves opened on 2023-05-09
  • remix-run/remix

    Piping `res.body` to TransformStream throws an error

    What version of Remix are you using? Latest. Are all your remix dependencies & dev-dependencies using the same version? Yes.

    Steps to Reproduce: fetch any URL, then pipe res.body into a TransformStream via pipeThrough.

    Expected Behavior: fetching and piping the stream through a TransformStream works without issues.

    Actual Behavior: you receive an error from the web streams polyfill used by Remix saying that the "first parameter" (in this case, the TransformStream) has a readable property that is not a ReadableStream:

        First parameter has member 'readable' that is not a ReadableStream.

    This is because TransformStream is not polyfilled along with ReadableStream/WritableStream; as a result, the native Node implementation is loaded. When both polyfilled and native streams are circulating, this can lead to issues, e.g.: MattiasBuelens/web-streams-polyfill#93 (comment)
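
    A minimal sketch of the reproduction described above (the URL and the loader are placeholders, not code from the report):

    ```typescript
    // In a standards-compliant runtime this is fine; under the mixed polyfill
    // setup described above it throws
    // "First parameter has member 'readable' that is not a ReadableStream."
    // because TransformStream is the native Node class while ReadableStream is polyfilled.
    export async function loader() {
      const response = await fetch("https://example.com");
      // Identity transform: pipe the fetched body through an empty TransformStream.
      const transformed = response.body!.pipeThrough(
        new TransformStream<Uint8Array, Uint8Array>()
      );
      return new Response(transformed);
    }
    ```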

    grabbou opened on 2023-06-18
  • GoogleCloudPlatform/bigquery-utils

    Create bqutil UDFs in all other non-US datasets

    I ran into some issues when trying to use CTEs in combination with bqutil. This executes as expected:

        SELECT `bqutil.fn.median`([1,1,1,2,3,4,5,100,1000]) as median

    However, after adding a CTE:

        WITH covid AS (
          SELECT date, daily_confirmed_cases
          FROM `bigquery-public-data.covid19_ecdc_eu.covid_19_geographic_distribution_worldwide`
        )
        SELECT `bqutil.fn.median`([1,1,1,2,3,4,5,100,1000]) as median

    BQ throws the error: "Function not found: bqutil.fn.median". Is there a way to explicitly import the BQ utils, or any other suggestions to address this issue?

    davidvanrooij opened on 2021-07-06
  • langchain-ai/langchainjs

    fix handling of multibyte characters in streams for google-gauth

    Fixes #5285. There were only integration tests in google-gauth, but I have added some unit tests. Is this okay? I also used npm link to verify locally that the issue has been resolved.
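
    As a rough illustration of what such a unit test can check (a hypothetical helper, not the actual test added in this PR): split a multibyte character across two chunks and assert that streaming decode reassembles it.

    ```typescript
    import { strict as assert } from "node:assert";

    // Decode chunks the way a streaming consumer should: keep partial byte
    // sequences pending between chunks, then flush at the end.
    function decodeChunks(chunks: Uint8Array[]): string {
      const decoder = new TextDecoder("utf-8");
      let text = "";
      for (const chunk of chunks) {
        text += decoder.decode(chunk, { stream: true });
      }
      return text + decoder.decode(); // flush any trailing bytes
    }

    const bytes = Buffer.from("こんにちは👋", "utf8");
    const split = [bytes.subarray(0, 5), bytes.subarray(5)]; // boundary inside a character
    assert.equal(decodeChunks(split), "こんにちは👋");
    ```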

    pokutuna opened on 2024-05-04
  • dataform-co/dataform

    External contributors cannot pass the test commands in contributing.md

    contributing.md provides the following guide for running tests (dataform/contributing.md, lines 35 to 43 in f58bbdd):

        ### Test
        The following command runs all unit tests:
        ```bash
        bazel test --build_tests_only -- ... -tests/integration/...
        ```
        This runs all tests excluding those that rely on encrypted secrets. If you need to run integration tests, please [get in touch](mailto:opensource@dataform.co) with the team.

    However, it fails with the following output:

        ERROR: /home/pokutuna/dataform/test_credentials/BUILD:5:14: Action test_credentials/bigquery.json failed: (Exit 1): gcloud failed: error executing command external/gcloud_sdk/bin/gcloud kms decrypt '--ciphertext-file=test_credentials/bigquery.json.enc' '--plaintext-file=bazel-out/k8-py2-fastbuild/bin/test_credentials/bigquery.json' ... (remaining 4 arguments skipped)
        ERROR: (gcloud.kms.decrypt) PERMISSION_DENIED: Request had insufficient authentication scopes.
        - '@type': type.googleapis.com/google.rpc.ErrorInfo
          domain: googleapis.com
          metadata:
            method: google.cloud.kms.v1.KeyManagementService.Decrypt
            service: cloudkms.googleapis.com
          reason: ACCESS_TOKEN_SCOPE_INSUFFICIENT

    The documented command is supposed to run all tests except those depending on secrets, but the tests under tests/cli/ also rely on Cloud KMS. As a result, external contributors cannot run these tests. This line depends on a secret (dataform/tests/cli/BUILD, line 17 in f58bbdd):

        "//test_credentials:bigquery.json",

    pokutuna opened on 2023-05-05
  • hashicorp/terraform-provider-google

    Validation pattern is narrower than actually used/generated for `google_monitoring_custom_service` and SLO.

    Community Note: Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Please do not leave +1 or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request. If you are interested in working on this issue or have submitted a pull request, please leave a comment. If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

    Terraform Version: Terraform v1.5.7 on darwin_arm64 + provider registry.terraform.io/hashicorp/google v4.82.0

    Affected Resource(s): google_monitoring_custom_service, google_monitoring_slo

    Terraform Configuration Files:

        terraform {
          required_version = "1.5.7"
          required_providers {
            google = {
              source  = "hashicorp/google"
              version = "4.82.0"
            }
          }
          backend "local" {
            path = "terraform.tfstate"
          }
        }

        # my Google Cloud project
        provider "google" {
          project = "pokutuna-playground"
        }

        # to be imported
        import {
          to = google_monitoring_custom_service.example
          id = "projects/my-project/services/gs-ReZdgRiuY5DWEldJnSA"
        }

        import {
          to = google_monitoring_slo.example
          id = "projects/my-project/services/gs-ReZdgRiuY5DWEldJnSA/serviceLevelObjectives/c3nU6dECTzSjFSEmMCyRyA"
        }

    Debug Output: The following gist includes the output of the operations I actually executed in my Google Cloud project.

        $ cat main.tf
        $ TF_LOG=DEBUG terraform plan -generate-config-out=imported.tf
        $ cat imported.tf
        $ TF_LOG=DEBUG terraform plan

    https://gist.github.com/pokutuna/0f84c03e0eb18ac26a91b031afa1a419

    Panic Output: N/A

    Expected Behavior: The actually existing service_id and slo_id do not trigger validation errors.

    Actual Behavior: When running plan with import, or apply after import, the following validation errors are printed. (Other errors are also included, but they are not mentioned in this issue.)

        │ Error: "service_id" ("gs-ReZdgRiuY5DWEldJnSA") doesn't match regexp "^[a-z0-9\\-]+$"
        │
        │   with google_monitoring_custom_service.example,
        │   on imported.tf line 8, in resource "google_monitoring_custom_service" "example":
        │    8: service_id = "gs-ReZdgRiuY5DWEldJnSA"

        │ Error: "slo_id" ("c3nU6dECTzSjFSEmMCyRyA") doesn't match regexp "^[a-z0-9\\-]+$"
        │
        │   with google_monitoring_slo.example,
        │   on imported.tf line 25, in resource "google_monitoring_slo" "example":
        │   25: slo_id = "c3nU6dECTzSjFSEmMCyRyA"

    The service_id and slo_id are automatically generated when created from the console. The IDs I used in the example were also automatically generated. In other words, the provider is validating with a pattern that's narrower than what Cloud Monitoring actually generates.

    Steps to Reproduce: Define a custom service and SLO on the Cloud Monitoring console. Describe the defined resources in the import block. Execute the steps included in the log.

        $ cat main.tf
        $ TF_LOG=DEBUG terraform plan -generate-config-out=imported.tf
        $ cat imported.tf
        $ TF_LOG=DEBUG terraform plan

    Important Factoids: There's nothing special about my account. I'm using the Application Default Credentials created with gcloud auth application-default login. I suspect that the pattern ^[a-z0-9\\-]+$ comes from the following API documentation: Method: services.create and Method: services.serviceLevelObjectives.create. I believe the pattern in these documents is also incorrect (I've provided feedback on it).

    The pattern that actually works on Cloud Monitoring can be obtained from the API error:

        $ curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" "https://monitoring.googleapis.com/v3/projects/$GOOGLE_PROJECT/services?serviceId=%F0%9F%A5%BA"
        {
          "error": {
            "code": 400,
            "message": "Resource names must match pattern `^[a-zA-Z0-9-_:.]+$`. Got value \"🥺\"",
            "status": "INVALID_ARGUMENT"
          }
        }

    Therefore, ^[a-zA-Z0-9-_:.]+$ is the pattern that represents the actually possible IDs. We can call this API to create a custom service and SLO with the ID prefix:lower_UPPER-01.23:

        $ export GOOGLE_PROJECT=pokutuna-playground
        $ export ACCEPTABLE_ID=prefix:lower_UPPER-01.23
        $ curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" "https://monitoring.googleapis.com/v3/projects/$GOOGLE_PROJECT/services?serviceId=$ACCEPTABLE_ID" -d '{"custom":{}}' -H 'Content-Type: application/json'
        > {
        >   "name": "projects/744005832574/services/prefix:lower_UPPER-01.23",
        >   "custom": {},
        >   "telemetry": {}
        > }
        $ curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H 'Content-Type: application/json' "https://monitoring.googleapis.com/v3/projects/$GOOGLE_PROJECT/services/$ACCEPTABLE_ID/serviceLevelObjectives?serviceLevelObjectiveId=$ACCEPTABLE_ID" -d @- <<JSON
        {
          "serviceLevelIndicator": {
            "requestBased": {
              "distributionCut": {
                distributionFilter: "metric.type=\"appengine.googleapis.com/http/server/response_latencies\" resource.type=\"gae_app\"",
                "range": { "min": 0, "max": 1000 }
              }
            }
          },
          "goal": 0.001,
          "calendarPeriod": "WEEK"
        }
        JSON
        > {
        >   "name": "projects/744005832574/services/prefix:lower_UPPER-01.23/serviceLevelObjectives/prefix:lower_UPPER-01.23",
        >   "serviceLevelIndicator": {
        >     "requestBased": {
        >       "distributionCut": {
        >         "distributionFilter": "metric.type=\"appengine.googleapis.com/http/server/response_latencies\" resource.type=\"gae_app\"",
        >         "range": { "max": 1000 }
        >       }
        >     }
        >   },
        >   "goal": 0.001,
        >   "calendarPeriod": "WEEK"
        > }

    References: #11696. This PR addresses the issue of importing service in google_monitoring_slo, but it has been left for a year. It seems that google_monitoring_service and google_monitoring_custom_service use the same API; however, google_monitoring_service does not have service_id validation. mmv1/products/monitoring/Service.yaml (google_monitoring_custom_service), mmv1/products/monitoring/GenericService.yaml (google_monitoring_service)

    pokutuna opened on 2023-09-13
  • googleapis/nodejs-datastore

    Unable to connect to emulator running on docker compose with client 7.5.1

    I have development environments and CI that run the Datastore emulator and an application that connects to it on Docker Compose. Those connections are resolved by service name on the overlay network within it, such as datastore:8081. Since client version 7.5.1, these cannot connect to the emulator.

    This was triggered by this PR: #1101. When the baseUrl_ does not include a part that indicates the local network, grpc.credentials.createInsecure is no longer used. This change is intended for custom endpoints, but the endpoint is given by the DATASTORE_EMULATOR_HOST environment variable. As a result, authentication is not skipped when the emulator host is something like datastore:8081. To support custom endpoints requiring authentication, how about using another environment variable instead of reusing the existing one (like DATASTORE_CUSTOM_HOST)? I think users who set DATASTORE_EMULATOR_HOST are expecting development use and not expecting authentication to be required.

    Workaround: By setting network_mode: "host" and exposing the emulator port so it joins the host network, we can include localhost in the endpoint URL. However, this occupies a port on the host and may need to be adjusted.

    Environment details: OS: macOS (host) & Linux (container); Node.js version: v20.2.0; npm version: 9.6.6; @google-cloud/datastore version: 7.5.1

    Steps to reproduce: Using Docker Compose, set up the Datastore emulator and an application container that uses the client library. You will encounter the error "Could not load the default credentials." during connection. This is the reproduction code using two different client versions, including docker-compose.yaml: https://gist.github.com/pokutuna/314248d183f6fbfe60154f63751d3655
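
    For context, here is a sketch of the setup described above (the service name, project ID, and entity are illustrative, not taken from the reproduction repo): the client reads the emulator endpoint from DATASTORE_EMULATOR_HOST, and with a non-localhost hostname the 7.5.1+ client attempts real authentication.

    ```typescript
    import { Datastore } from "@google-cloud/datastore";

    // In docker-compose this is usually injected via the service environment:
    //   DATASTORE_EMULATOR_HOST=datastore:8081   (service name on the Compose network)
    process.env.DATASTORE_EMULATOR_HOST ??= "datastore:8081";

    async function main() {
      // "demo-project" is an illustrative project ID for the emulator.
      const datastore = new Datastore({ projectId: "demo-project" });
      const key = datastore.key(["Task", "sample"]);
      // With client >= 7.5.1 this fails with "Could not load the default credentials."
      // because "datastore:8081" is not recognized as a local endpoint, so insecure
      // gRPC credentials are no longer used.
      await datastore.save({ key, data: { done: false } });
    }

    main().catch(console.error);
    ```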

    pokutuna opened on 2023-06-03